2011 Computer Science Research Conference

Final Program and Abstracts

Friday. April 8, 2011. Lov 103

Judging Panel

Download Judging Form

Guest Judge

Faculty Judge

Student Judge

Stephen Hines

Daniel Chang

Peter Gavin

Time Student Title
10:00-10:20 Sereyvathana Ty Systems Approach to Steganalysis
10:20-10:40 Fernando Sanchez Blocking Spam by Separating End-User Machines from Legitimate Mail Sever Machines
10:40-11:00 Wangchao Le Rewriting Queries on SPARQL Views  ---Honorable Mention---
11:00-11:20 Jeffery Jestes Building Wavelet Histograms on Large Data in MapReduce  ---Best Overall---
11:20-11:40 Bradley Neff Instant Approximate 1-Center on Road Networks Via Embeddings
11:40-12:00 Santosh Kumar Mahapatra Load Balancing Mechanisms in Data Center Networks
12:00-1:00 Lunch Break Lunch Break
1:00-1:20 Yuval Peress The Use of Memory Pages in L1 Caches
1:20-1:40 Ian Finlayson Improving Program Efficiency with Static Pipelining  ---Honorable Mention---
1:40-2:00 Ciaran Hanningan PhoneStar
2:20-2:40 Michael Mitchell Context and Bio-Aware Applications for Android  ---Honorable Mention---
2:40-3:00 Chris Meyers Tags: A Unifying Primitive to Build Storage Data Paths for Swiftly Evolving Workloads and Storage Media
3:00-:3:20 Bin Yao Optimal location queries in road network databases
3:20-4:20 Judges Deliberation Judges Deliberation

Abstracts

Systems Approach to Steganalysis.

Advisor: Xiuwen Liu

Digital Steganographic algorithms are routines that hide secret messages in seemingly innocent cover objects such as images. In recent years, steganographic and steganalysis algorithms have rapidly evolved, resulting in effective steganographic algorithms with minimum distortions and associated steganalysis algorithms. While existing steganographic and steganalysis algorithms often have original properties, they are typically developed and evaluated under unrealistic settings. As a result, the inherent deficiency renders most of the methods ineffective. In this paper we propose a system approach to steganalysis for reliably detecting steganographic objects among a large number of images, acknowledging that most digital images are intact images. The system consists of a cascade of intrinsic image formations filters (IIFFs), where the IIFFs in the early stage are designed to filter out non-stego images based on real world constraints, and the IIFFs in the late stage are designed to detect specific intrinsic features of steganographic routines. The proposed approach allows us to maximally utilize the available constraints that lead to robust detection performance and low probability of false alarm. Our systematic experimental results using large-scale images from Flickr.com demonstrate the potential of the proposed approach for real world large-scale repositories.

Blocking Spam by Separating End-User Machines from Legitimate Mail Sever Machines.

Advisor: Zhenhai Duan

Spamming botnets present a critical challenge in the control of spam messages due to the sheer volume and wide spread of the botnet members. In this paper we advocate the approach for recipient mail servers to filter messages directly delivered from remote end-user (EU) machines, given that the majority of spamming bots are EU machines. We develop a Support Vector Machine (SVM) based classifier to separate EU machines from legitimate mail server (LMS) machines, using a set of machine features that cannot be easily manipulated by spammers. We investigate the efficacy and performance of the SVM-based classifier using a number of real-world data sets. Our performance studies show that the SVM-based classifier is indeed a feasible and effective approach in distinguishing EU machines from LMS machines. For example, training and testing on an aggregated data set containing both EU machines and LMS machine, the SVM-based classifier can achieve a 99.27% detection accuracy, with very small false positive rate (0.44%) and false negative rate (1.1%), significantly outperforming eight DNS-based blacklists widely used today.

Rewriting Queries on SPARQL Views.

Advisor: FeiFei Li

The problem of answering SPARQL queries over virtual SPARQL views is commonly encountered in a number of settings, including while enforcing security policies to access RDF data, or when integrating RDF data from disparate sources. We approach this problem by rewriting SPARQL queries over the views to equivalent queries over the underlying RDF data, thus avoiding the costs entailed by view materialization and maintenance. We show that SPARQL query rewriting combines the most challenging aspects of rewriting for the relational and XML cases: like the relational case, SPARQL query rewriting requires synthesizing multiple views; like the XML case, the size of the rewritten query is exponential to the size of the query and the views. In this paper, we present the first native query rewriting algorithm for SPARQL. For an input SPARQL query over a set of virtual SPARQL views, the rewritten query resembles a union of conjunctive queries and can be of exponential size. We propose optimizations overthe basic rewriting algorithm to (i) minimize each conjunctive query in the union; (ii) eliminate conjunctive queries with empty results from evaluation; and (iii) efficiently prune out big portions of the search space of empty rewritings. The experiments, performed on two RDF stores, show that our algorithms arescalable and independent of the underlying RDF stores. Furthermore, our optimizations have order of magnitude improvements over the basic rewriting algorithm in both the rewriting size and evaluation time.

Building Wavelet Histograms on Large Data in MapReduce.

Advisor: FeiFei Li

MapReduce is becoming the de facto framework for storing and processing massive data, due to its excellent scalability, reliability, and elasticity. In many MapReduce applications, obtaining a compact accurate summary of data is essential. Among various data summarization tools, histograms have proven to be particularly important and useful for summarizing data, and the wavelet histogram is one of the most widely used histograms. In this paper, we investigate the problem of building wavelet histograms efficiently on large datasets in MapReduce. We measure the efficiency of the algorithms by both end-to-end running time and communication cost. We demonstrate straightforward adaptations of existing exact and approximate methods for building wavelet histograms to MapReduce clusters are highly inefficient. To that end, we design new algorithms for computing exact and approximate wavelet histograms and discuss their implementation in MapReduce. We illustrate our techniques in Hadoop, and compare to baseline solutions with extensive experiments performed in a heterogeneous Hadoop cluster of 16 nodes, using large real and synthetic datasets, up to hundreds of gigabytes. The results suggest significant (often several orders of magnitude) performance improvement achieved by our new algorithms.

Instant Approximate 1-Center on Road Networks Via Embeddings.

We present the 1-center problem on road networks, an important problem in GIS. Using Euclidean embeddings, and reduction to fast nearest neighbor search, we devised an approximation algorithm for this problem. On real world data sets, we conducted extensive experiments that indicate fast computation of constant factor approximate solutions for query sets much larger than previously computable using exact techniques.

Load Balancing Mechanisms in Data Center Networks.

Advisor: Xin Yuan

We study the network traffic load balancing mechanisms in Data Center Network environments, which are commonly built with scaleout topologies such as folded-clos or fat trees. We investigate the Valiant Load Balancing (VLB) technique that uses randomization to deal with the highly volatile traffic in the Data Center Networks. The focus of our research is the granularity at which VLB can be utilized to maximize the peak performance of such networks. Through simulation experiments, we show that flow-level VLB can be significantly worse than the packet level VLB, which is the ideal for all cases, for non-uniform traffic. We propose two adaptive load balance mechanisms that utilizes the network state information to achieve load balancing. The first mechanism selects the next hop switch with the least queue length. The second mechanism probes a set of the available alternate paths for possible congestion and selects the one with the least congestion. Our experimental results indicate that these techniques can improve the performance over flow level VLB in many situations. Currently we are investigating the benefits of using VLB at the flowlet level.

The Use of Memory Pages in L1 Caches.

Advisor: Gary Tyson

L1 cache design has remained relatively unchanged. Instead modern alterations are made to peripheral devices/heuristics such as clever pre-fetchers and reorder buffers. This work proposes a split of the L1 data cache that captures specific well behaved lines of memory. To do so, memory pages are used as a means to make the selection process easier and more cost effective. The end result of which is an efficient selection algorithm for highly referenced pages of memory that can now be caches in smaller/faster parallel structures in parallel to the L1 data cache.

Improving Program Efficiency with Static Pipelining.

Advisors: Gary Tyson and David Whalley

A new generation of mobile applications requires reduced energy consumption without sacrificing execution performance. In this talk we introduce a statically pipelined processor in which the control during each cycle is explicitly represented in each instruction. Thus the pipelining is in effect statically determined by the compiler. The benefits of this approach include simpler hardware and that it allows the compiler to perform optimizations that are not possible on traditional architectures. The results indicate that static pipelining can significantly reduce power consumption without adversely affecting performance.

PhoneStar.

Advisor: Gary Tyson

On-star is the most popular automated service feature found in recent automobiles. On-star provides continuous monitoring of vehicle condition including both performance and safety features. The system automatically informs service personnel in the event of an accident. The cost of this service is a monthly fee. Many users eventually drop the service contract due to cost. Most of the features available with the service contract can be obtained using a combination of a smart phone, a blue-tooth enabled scantool and a social network willing to react to a reported emergency. Our Android application will collect vehical real-time information from the scantool. With this information the phone can monitor the state of the airbags, which are clear indicators of an accident. Upon the detection of an accident, the phone alerts the driver of a detected accident and asks if they are alright. On user request, or when no interaction occurs, the phone will automatically contact friends/family who can assess the situation and contact emergency services if required. By relying on friends and family, PhoneStar can avoid the high service fee found in other monitoring services.

Context and Bio-Aware Applications for Android.

Advisors: Gary Tyson and Andy Wang

The dramatically expanding capabilities of the modern smart phone are enabling the creation of entirely new classes of applications for health, wellness, and entertainment. Data collected from on-board sensors, web services, social media and external biosensors can be integrated to determine the context of the device, user, and environment. These combined contexts allow the device to achieve a new level of awareness of the user and surroundings, thereby enabling the creation of rich, biologically driven applications. Collected and derived context is stored in a context provider - an extension of the Android content provider - offering a unified, query-able interface to all contextual data on the device. This allows for rapid development of new context and bio-aware applications and helps simplify the deployment model. Presented bio-aware applications include: BEAT, a medical monitoring and alert framework, and TempoTrainer, a biofeedback driven music recommendation system.

Tags: A Unifying Primitive to Build Storage Data Paths for Swiftly Evolving Workloads and Storage Media.

Advisor: Andy Wang

With increasing capacity and declining costs, digital storage affects our lives in unprecedented ways. However, legacy storage data paths are not adequately prepared to integrate with swiftly evolving storage media (i.e., various memory-based storage) and workload demands (e.g., energy efficiency, security, real- time constraints, maintenance, etc.). In particular, legacy data paths show: (1) limited application support to control low-level storage mechanisms; (2) difficulties of migrating from the assumptions of disks to memory-based storage; (3) poor communications within a data path for each component to coordinate.

Optimal location queries in road network databases.

Advisor: Feifei Li

Optimal location (OL) queries are a type of spatial queries particularly useful for the strategic planning of resources. Given a set of existing facilities and a set of clients, an OL query asks for a location to build a new facility that optimizes a certain cost metric (defined based on the distances between the clients and the facilities). Several techniques have been proposed to address OL queries, assuming that all clients and facilities reside in an Lp space. In practice, however, movements between spatial locations are usually confined by the underlying road network, and hence, the actual distance between two locations can differ significantly from their Lp distance. Motivated by the deficiency of the existing techniques, this paper presents the first study on OL queries in road networks. We propose a unified framework that addresses three variants of OL queries that find important applications in practice, and we instantiate the framework with several novel query processing algorithms. We demonstrate the efficiency of our solutions through extensive experiments with real data.