Latest News and Events
Fast Algorithms and Data Structures for Visualization and Machine Learning on Massive Datasets
I will discuss mathematical approaches that enable fast, real-time visualization of huge datasets. Nonlinear manifold learning methods provide a powerful way to visualize high-dimensional data in two or three dimensions, but they require expensive computations that render them intractable for large real-world datasets. I will briefly review manifold methods, then describe our approaches to the underlying bottleneck computations: all-nearest-neighbors, kernel summation, singular value decomposition, and convex optimization. The two novel approaches we introduce can be seen as principled forms of exact or approximate data reduction with rigorous error control, using multi-scale data structures, series expansions, and Monte Carlo ideas. The first is a generalization of the famous Fast Multipole Method of computational physics; the second introduces a new data structure for linear algebra called the Cosine Tree. Results so far include the fastest practical algorithms for several of these problems, making datasets that are orders of magnitude larger tractable.
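To give a sense of why all-nearest-neighbors and kernel summation are bottlenecks, here is a minimal brute-force sketch of both. It is only the naive baseline, not the speaker's method: each function performs on the order of N^2 distance evaluations for N points, which is the cost that tree-based and series-expansion approaches aim to avoid. The function names, the Gaussian kernel, and the `bandwidth` parameter are illustrative assumptions, not taken from the talk.

```python
import math

def all_nearest_neighbors(points):
    """Brute force (assumed baseline): for each point, return the index of
    its closest other point. Uses O(N^2) distance evaluations."""
    n = len(points)
    result = []
    for i in range(n):
        best_j, best_d = -1, math.inf
        for j in range(n):
            if i == j:
                continue  # a point is not its own neighbor
            d = math.dist(points[i], points[j])
            if d < best_d:
                best_j, best_d = j, d
        result.append(best_j)
    return result

def gaussian_kernel_sums(points, bandwidth):
    """Brute force (assumed baseline): for each point x_i, compute
    sum_j exp(-||x_i - x_j||^2 / (2 * bandwidth^2)). Also O(N^2)."""
    denom = 2.0 * bandwidth ** 2
    return [
        sum(math.exp(-math.dist(p, q) ** 2 / denom) for q in points)
        for p in points
    ]

# Tiny hypothetical dataset: two well-separated pairs of 2-D points.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.5)]
neighbors = all_nearest_neighbors(points)
sums = gaussian_kernel_sums(points, bandwidth=1.0)
print(neighbors)
print(sums)
```

With a million points, each of these loops would need roughly 10^12 distance evaluations, which is why reducing the work with controlled error matters at that scale.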