Bayesian Analysis in Visual Analytics (BAVA)
Abstract: The goal of this research is to combine two areas, Visual Analytics and
Bayesian Statistics. Currently, visualizations display inflexible
deterministic transformations of data that inherently separate data
visualization from visual synthesis. Analysts cannot manipulate
displays to inject domain-specific knowledge and formally assess the
merger of their expert judgment with the data. However, when the
data transformation is changed from deterministic to probabilistic
Bayesian methods, manipulations of a display can be interpreted
quantitatively. Thus, a new visualization model is developed that
offers editable representations to promote bidirectional flow between
analyst and data.
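As a rough illustration of how an analyst's judgment and the data can be merged quantitatively, the sketch below uses a conjugate normal-normal update. This is an assumption made for illustration only: the abstract does not say which Bayesian model the project uses, and the function name `normal_posterior` and all of its parameters are hypothetical.

```python
def normal_posterior(prior_mean, prior_var, observations, noise_var):
    """Combine a normal prior (the analyst's judgment) with normally
    distributed observations of known noise variance.

    Returns the posterior mean and variance of the unknown quantity.
    Hypothetical helper; not taken from the BAVA project itself.
    """
    n = len(observations)
    prior_precision = 1.0 / prior_var
    data_precision = n / noise_var
    post_precision = prior_precision + data_precision
    data_mean = sum(observations) / n if n else 0.0
    # Posterior mean is the precision-weighted average of prior and data.
    post_mean = (prior_precision * prior_mean + data_precision * data_mean) / post_precision
    return post_mean, 1.0 / post_precision


# Analyst believes the value is near 0; four observations all equal 2.
post_mean, post_var = normal_posterior(0.0, 1.0, [2.0, 2.0, 2.0, 2.0], 1.0)
print(post_mean, post_var)
```

In this toy setting, moving a point in a display could be read as changing `prior_mean` or `prior_var`, and the posterior shows how strongly that change is supported or overridden by the data.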
Differential geometry approach for virus surface formation, evolution and visualization
Abstract: Viruses are contagious agents and can cause epidemics and pandemics. The importance of the prevention and control of viral epidemics and pandemics to homeland security and daily life cannot be overemphasized. Viruses cannot grow and/or reproduce outside host cells. Their infection starts with the attachment of a virus on the host cell surface, with possible fusion of viral capsid surface and the host cellular membrane, followed by virus penetration into the host cell. These processes involve mostly non-bonding interactions between the virus capsid surface and the aquatic environment, as well as the host surface membrane or receptor.
Dimension Reduction and Data Reduction: Foundations for Visualization
Abstract: The FODAVA (Foundations of Data Analysis and Visualization) Lead research team at the Georgia Institute of Technology brings together expertise in the areas critical to leading the FODAVA effort, including machine learning and computational statistics, information visualization, massive-dataset algorithms and data structures, and optimization theory. The team focuses on fundamental theory and approaches that enable breakthroughs in data representations and transformations.
Efficient Data Reduction and Summarization
Abstract: The ubiquitous phenomenon of massive data (including data streams) imposes considerable challenges in data visualization and exploratory data analysis. About 15 years ago, terabyte datasets were still considered "ridiculous." However, modern datasets managed by the Stanford Linear Accelerator Center (SLAC), NASA, NSA, etc. have reached the petabyte scale or larger. Corporations such as Amazon, Wal-Mart, eBay, and search engine firms are also major generators and users of massive data. The general theme of data reduction and summarization has become an active and highly interdisciplinary area of research. This project proposes to develop various approximation techniques, which generate a "fingerprint" or "sketch" of the massive data by transforming the original data.
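To make the idea of a "sketch" concrete, here is a minimal Count-Min sketch, one well-known way to summarize a stream in fixed memory. It is a swapped-in illustration, not necessarily one of the approximation techniques this project develops; the class name, the default `width` and `depth`, and the use of SHA-256 for hashing are all assumptions.

```python
import hashlib


class CountMinSketch:
    """Fixed-size summary of item counts in a stream.

    Estimates never undercount; they may overcount when items collide.
    """

    def __init__(self, width=64, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # Derive one hash per row by salting the item with the row number.
        digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # The smallest counter across rows is the least-contaminated estimate.
        return min(self.table[row][self._index(item, row)]
                   for row in range(self.depth))


stream = ["a"] * 50 + ["b"] * 20 + ["c"] * 5
sketch = CountMinSketch()
for x in stream:
    sketch.add(x)
print(sketch.estimate("a"), sketch.estimate("b"), sketch.estimate("c"))
```

The memory used is `width * depth` counters regardless of how long the stream is, which is the trade the abstract describes: a small transformed summary in exchange for approximate answers.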
Formal Models, Algorithms, and Visualizations for Storytelling Analytics
Abstract: Modern direct manipulation and visualization systems have made key strides in bringing powerful data transformations and algorithms to the analyst's desktop. But to further promote the vision of powerful visual analytics, wherein automated algorithms and visual representations complement each other to yield new insight, we must continually increase the expressiveness with which analysts interact with data. This project focuses on the task of storytelling, that is, the stringing together of seemingly unconnected pieces of data into a coherent thread or argument.
Foundations of Comparative Analytics for Uncertainty in Graphs
Abstract: This is a collaborative research effort bringing together expertise of
Lise Getoor, University of Maryland College Park (0937094), Alex Pang,
University of California-Santa Cruz (0937073) and Lisa Singh,
Georgetown University (0937070).
Global Structure Discovery on Sampled Spaces
Abstract: Over the past decade, the precipitous drop in the cost of disk storage and the build-up of world-wide high-bandwidth fiber optic communications have made massive amounts of data of different modalities (text, images, video) easily available to everyone over the Web. In science, engineering, business, and medicine, high-bandwidth sensors, large-scale simulations, and data collection bots generate immense data sets that need to be analyzed. Making sense of all this disparate data is becoming increasingly challenging and difficult. Unlike traditional databases where data is carefully massaged to adhere to rigid schemata, much of the above data comes unstructured, is often dynamic rather than static, can contain large amounts of noise or even errors, and can be incomplete.
Interactive Discovery and Semantic Labeling of Patterns in Spatial Data
Abstract: Finding and labeling semantic patterns in large, spatial data sets is one of the most important problems facing computer scientists today. Massive spatial data sets are being acquired in almost every scientific discipline, such as medicine, geology, biology, astrophysics, and others. Finding meaningful patterns in those data is often the bottleneck to scientific discovery. The proposed research is to develop a transformative machine learning methodology, where the process of discovering semantic patterns in large spatial data sets is interactive and semi-autonomous.
Mathematical Foundations of Multiscale Graph Representations and Interactive Learning
Abstract: The analysis of large high-dimensional data sets and graphs is motivated by many important applications, such as the study of databases of images and documents, and the modeling of complex dynamical systems (e.g. transaction data, weather patterns, molecular dynamics). This research involves the development of novel mathematical techniques for extracting and visualizing information from large data sets. The data layout, visualization, and human interaction are centered around multi-scale representations, which make it possible to access the data, the derived information and the inference processes associated with it at multiple levels of resolution.
New Geometric Methods of Mixture Models for Interactive Visualization
Abstract: This research project will extend the theoretical foundations of mixture modeling for statistical learning by novel mathematical tools that can probe into the precise geometry of mixture models. Based on the theoretical results, the investigators will develop new approaches to clustering, dimension reduction, variable selection, and temporal analysis. These methods will open promising paths for interactively visualizing complex data and for data summarization. A suite of statistical tools will be integrated as the technical backbone into a new visualization system. Applications to very large-scale, high dimensional, and temporally evolving data will be explored.
Principles for Scalable Dynamic Visual Analytics
Abstract: The human eye is often capable of identifying interesting patterns and
trends from a well-presented data set, whereas computational algorithms
may have difficulties with such a task. Yet, there are limits to human
ability, both with the scale of the data set in terms of objects and
attributes and with dynamic changes over time. This project develops an
analytic and computational framework to support the visual analysis of
large-scale dynamic data with network structure.
Scalable Visualization and Model Building
Abstract: Developing new algorithms, visualization tools, and mathematical models that can predict and explain patterns in data is fundamental to machine learning and statistics. They enable the predictive modeling that is fundamental to science and engineering. Visualization is critical in all phases of data analysis, from the moment the data are collected, when data checking and cleaning are needed, to the final presentation of results. Visualization facilitates model building by allowing the analyst to critically assess the predictive power of a model, and to diagnose problems in fitting the patterns in the data.
Uncertainty-Aware Data Transformations for Collaborative Reasoning
Abstract: The ability to obtain insight from massive, dynamic, and likely
incomplete digital data is absolutely essential to those who collect
these data for time-critical decision-making. This research is
developing mathematical formulations for the quantification,
propagation, and aggregation of such data to support collaborative
reasoning using visual means. The fundamental importance of this
research is to give analysts a more trustworthy view of data with the
consideration and incorporation of uncertainty due to data
transformations and the propagation of that information to the
reasoning stage.
Visualization of Analytical Processes
Abstract: There is currently a major discrepancy between the dramatic improvements in hardware for sensing, communication, and storage of raw data and the capacity of humans to analyze and act on this data in a meaningful way. There is every reason to believe that this development will continue in the near future, given the revolutionary changes to hardware and software in the World Wide Web, the Sensor Web, the network of hand-held and mobile devices, and the Smart Grid.
Visualizing Audio for Anomaly Detection
Abstract: The goal of this proposal is to transform large audio corpora into a form suitable for visualization. Specifically, this proposal addresses the type of audio anomalies that human data analysts hear instantly: angry shouting, trucks at midnight on a residential street, gunshots. The human ear detects anomalies of this type rapidly and with high accuracy. Unfortunately, a data analyst can listen to only one sound at a time.
Visually-Motivated Characterizations of Point Sets Embedded in High-Dimensional Geometric Spaces
Abstract: The proposed research exploits an idea of John Tukey that was never published. Called scagnostics (a Tukey neologism for "scatterplot diagnostics"), the original idea leads to a more general characterization of high-dimensional point sets using visually-based geometric and graph-theoretic measures. These measures comprise a canonical set of 9 features of pointwise data typically observed by experienced statisticians. Computing these measures on all possible 2D axis-parallel orthogonal projections in a p-dimensional space results in a p(p-1)/2 × 9 matrix of measures.
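The size of that matrix follows from counting unordered pairs of axes: each pair defines one 2D projection, and each projection contributes one row of 9 measures. The sketch below only enumerates the pairs and reports the resulting shape; it does not compute any scagnostic measure, and the helper names are hypothetical.

```python
from itertools import combinations

N_MEASURES = 9  # number of scagnostic measures stated in the abstract


def projection_pairs(p):
    """Return all unordered axis pairs (i, j) with i < j for p dimensions."""
    return list(combinations(range(p), 2))


def measure_matrix_shape(p):
    """Shape of the measures matrix: one row per 2D projection."""
    return (len(projection_pairs(p)), N_MEASURES)


p = 5
print(projection_pairs(p))
print(measure_matrix_shape(p))  # (10, 9), since 5 * 4 / 2 = 10
```

Because the row count grows quadratically in p, a data set with 100 variables already yields 4950 scatterplots to characterize.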

