Scalable Visualization and Model Building
Developing new algorithms, visualization tools, and mathematical models that can predict and explain patterns in data is fundamental to machine learning and statistics. These tools enable the predictive modeling on which science and engineering depend. Visualization is critical in all phases of data analysis, from the moment the data are collected, when checking and cleaning are needed, to the final presentation of results. Visualization facilitates model building by allowing the analyst to critically assess the predictive power of a model and to diagnose problems in fitting the patterns in the data. The investigators are carrying out research in approaches, methods, and models for describing patterns in data, with a strong emphasis on visualization and on comprehensive analysis of massive datasets.
The research is addressing two broad topics. The first is a framework for the integration of visual analysis and statistical modeling. We envision a system that facilitates an iterative modeling process. The modeling cycle includes multiple stages, starting with descriptive visualization, then model selection, model fitting, diagnosis and evaluation, and finally iterative model refinement. The second topic is a general approach to visualization and modeling that scales from small to massive datasets, together with the development of new methods specifically for the scaling of data visualization. We approach scaling by partitioning the data into subsets, sampling the subsets, and applying modeling and visualization to each sampled subset. The investigators are carrying out the research in the context of two challenging data analysis projects in homeland security: (1) daily counts of chief complaints from 76 emergency departments of the Indiana Public Health Emergency Surveillance System; and (2) Internet packet traces for network security that we collect on the campuses of Purdue University and Stanford University.
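The partition, sample, and per-subset modeling steps described above can be illustrated with a minimal Python sketch. It is not the investigators' system. The function names (partition, sample_subsets, fit_line, analyze) and the record layout (group, x, y) are hypothetical. Simple random sampling of whole subsets and an ordinary least-squares line are assumptions, used as stand-ins for whatever sampling scheme and model a real analysis would use.

```python
import random
from collections import defaultdict
from statistics import mean


def partition(records):
    """Split (group, x, y) records into subsets keyed by group."""
    subsets = defaultdict(list)
    for group, x, y in records:
        subsets[group].append((x, y))
    return dict(subsets)


def sample_subsets(subsets, k, seed=0):
    """Pick at most k subsets by simple random sampling (an assumption)."""
    rng = random.Random(seed)
    names = sorted(subsets)
    chosen = rng.sample(names, min(k, len(names)))
    return {name: subsets[name] for name in chosen}


def fit_line(points):
    """Least-squares line y = slope * x + intercept (stand-in model).

    Assumes the x values in the subset are not all identical.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def analyze(records, k, seed=0):
    """Partition the data, sample k subsets, and fit a model to each."""
    sampled = sample_subsets(partition(records), k, seed)
    return {name: fit_line(points) for name, points in sampled.items()}
```

Each sampled subset is small enough to model and plot on its own, so the per-subset fits can be inspected individually or compared across subsets. In the emergency department project, for example, one natural (assumed) choice of group would be the department, with x the day and y the daily count.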