Latest News and Events

The SAMSI-FODAVA Workshop on Interactive Visualization and Analysis of Massive Data will be held on December 10-12, 2012.
Posted: October 02, 2012
The FODAVA Annual Meeting (Dec 12-13) will immediately follow the SAMSI/FODAVA joint workshop at the same location.
Posted: September 05, 2012
Many of the modern data sets such as text and image data can be represented in high-dimensional vector spaces and have benefited from computational methods that utilize advanced techniques from num
Posted: June 30, 2012

Scalable Visualization and Model Building

Developing new algorithms, visualization tools, and mathematical models that can predict and explain patterns in data is fundamental to machine learning and statistics. These tools enable the predictive modeling on which science and engineering depend. Visualization is critical in all phases of data analysis, from the moment the data are collected, when checking and cleaning are needed, to the final presentation of results. Visualization facilitates model building by allowing the analyst to critically assess the predictive power of a model and to diagnose problems in fitting the patterns in the data. The investigators are carrying out research in approaches, methods, and models for describing patterns in data, with a strong emphasis on visualization and on comprehensive analysis of massive datasets.

The research is addressing two broad topics. The first is a framework for the integration of visual analysis and statistical modeling. We envision a system that facilitates an iterative modeling process. The modeling cycle includes multiple stages, starting with descriptive visualization, then model selection, model fitting, diagnosis and evaluation, and finally iterative model refinement. The second topic is a general approach to visualization and modeling that scales from small to massive datasets, together with the development of new methods specifically for the scaling of data visualization. We approach scaling by partitioning the data into subsets, sampling the subsets, and applying modeling and visualization to each sampled subset. The investigators are carrying out the research in the context of two challenging data analysis projects in homeland security: (1) daily counts of chief complaints from 76 emergency departments of the Indiana Public Health Emergency Surveillance System; and (2) Internet packet traces for network security, collected on the campuses of Purdue University and Stanford University.
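
The partition, sample, and model steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the investigators' implementation: the function names (`partition`, `sample_subsets`, `fit_line`, `model_per_subset`), the choice of uniform random sampling of subsets, and the use of a straight-line least-squares fit as the per-subset model are all hypothetical stand-ins chosen for brevity.

```python
import random
from collections import defaultdict
from statistics import mean


def partition(records, key):
    """Group records into subsets by the value of key(record)."""
    subsets = defaultdict(list)
    for rec in records:
        subsets[key(rec)].append(rec)
    return dict(subsets)


def sample_subsets(subsets, k, seed=0):
    """Pick up to k subset names uniformly at random, without replacement."""
    rng = random.Random(seed)
    names = sorted(subsets)
    return rng.sample(names, min(k, len(names)))


def fit_line(points):
    """Ordinary least-squares fit y = a + b*x; returns (a, b)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    if sxx == 0:
        # All x values identical: fall back to a flat line at the mean.
        return ybar, 0.0
    b = sum((x - xbar) * (y - ybar) for x, y in points) / sxx
    return ybar - b * xbar, b


def model_per_subset(records, key, xy, k, seed=0):
    """Partition the records, sample k subsets, and fit a line to each one."""
    subsets = partition(records, key)
    chosen = sample_subsets(subsets, k, seed)
    return {name: fit_line([xy(r) for r in subsets[name]]) for name in chosen}


if __name__ == "__main__":
    # Synthetic (site, day, count) records, invented for illustration only.
    records = [("A", d, 2 * d + 1) for d in range(5)]
    records += [("B", d, 10 - d) for d in range(5)]
    fits = model_per_subset(
        records, key=lambda r: r[0], xy=lambda r: (r[1], r[2]), k=2
    )
    for site, (a, b) in sorted(fits.items()):
        print(site, "intercept:", a, "slope:", b)
```

Because each sampled subset is fitted independently, the per-subset work can be distributed across machines, and the per-subset fits are small enough to plot side by side even when the full dataset is not.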