Fast Algorithms and Data Structures for Visualization and Machine Learning on Massive Datasets

Alex Gray

I will discuss mathematical approaches that enable fast, real-time visualization of huge datasets. Nonlinear manifold learning methods provide a powerful way to visualize high-dimensional data in two or three dimensions, but they require expensive computations that render them intractable for large real-world datasets. I will briefly review manifold methods, then describe our approaches to the underlying bottleneck computations: all-nearest-neighbors, kernel summation, singular value decomposition, and convex optimization. The two novel approaches we introduce can be seen as principled forms of exact or approximate data reduction with rigorous error control, using multi-scale data structures, series expansions, and Monte Carlo ideas. The first is a generalization of the famous Fast Multipole Method of computational physics, and the second introduces a new data structure for linear algebra called the Cosine Tree. Results so far include the fastest practical algorithms for several of the aforementioned problems, making datasets that are orders of magnitude larger tractable.
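
To make the idea of exact data reduction with a multi-scale data structure concrete, here is a minimal sketch of the all-nearest-neighbors bottleneck in plain Python. It is not the method presented in the talk: it uses an ordinary kd-tree with bounding-box pruning, queried once per point, and every name in it (build_kdtree, nearest, all_nearest_neighbors, leaf_size) is a hypothetical choice made for illustration. The point it shows is that a whole group of points can be discarded at once, with no loss of exactness, whenever the group's bounding box is provably farther away than the best neighbor found so far.

```python
import random

def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def build_kdtree(points, indices=None, depth=0, leaf_size=8):
    """Recursively split point indices at the median of one coordinate."""
    if indices is None:
        indices = list(range(len(points)))
    dim = len(points[0])
    # Bounding box of this node, used later to prune whole subtrees.
    lo = [min(points[i][d] for i in indices) for d in range(dim)]
    hi = [max(points[i][d] for i in indices) for d in range(dim)]
    node = {"lo": lo, "hi": hi, "indices": indices, "children": None}
    if len(indices) > leaf_size:
        axis = depth % dim
        ordered = sorted(indices, key=lambda i: points[i][axis])
        mid = len(ordered) // 2
        node["children"] = (
            build_kdtree(points, ordered[:mid], depth + 1, leaf_size),
            build_kdtree(points, ordered[mid:], depth + 1, leaf_size),
        )
    return node

def min_sq_dist_to_box(q, node):
    """Lower bound on the squared distance from q to any point in the node."""
    total = 0.0
    for x, lo, hi in zip(q, node["lo"], node["hi"]):
        if x < lo:
            total += (lo - x) ** 2
        elif x > hi:
            total += (x - hi) ** 2
    return total

def nearest(points, node, qi, best=(float("inf"), -1)):
    """Return (squared distance, index) of the nearest neighbor of points[qi]."""
    q = points[qi]
    # Exact pruning: nothing inside this box can beat the current best.
    if min_sq_dist_to_box(q, node) >= best[0]:
        return best
    if node["children"] is None:
        for i in node["indices"]:
            if i != qi:
                d = sq_dist(q, points[i])
                if d < best[0]:
                    best = (d, i)
        return best
    # Visit the closer child first so the bound tightens early.
    for child in sorted(node["children"], key=lambda c: min_sq_dist_to_box(q, c)):
        best = nearest(points, child, qi, best)
    return best

def all_nearest_neighbors(points, leaf_size=8):
    """Nearest-neighbor index for every point, using one shared tree."""
    root = build_kdtree(points, leaf_size=leaf_size)
    return [nearest(points, root, qi)[1] for qi in range(len(points))]

def brute_force(points):
    """Quadratic-time reference answer, for checking the tree-based result."""
    n = len(points)
    return [
        min((i for i in range(n) if i != qi), key=lambda i: sq_dist(points[qi], points[i]))
        for qi in range(n)
    ]

# Small demonstration on synthetic data (the data itself is an assumption).
random.seed(0)
demo_points = [(random.random(), random.random(), random.random()) for _ in range(200)]
demo_result = all_nearest_neighbors(demo_points)
print(demo_result[:5])
```

The tree-based routine returns the same neighbors as the brute-force loop while skipping most distance evaluations on low-dimensional data. The savings from this kind of pruning tend to shrink as the dimension grows, which is one reason the bottleneck computations listed above are hard for high-dimensional datasets.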