Efficient Data Reduction and Summarization
The ubiquitous phenomenon of massive data (including data streams) poses considerable challenges for data visualization and exploratory data analysis. About 15 years ago, terabyte datasets were still considered "ridiculous." Modern datasets managed by the Stanford Linear Accelerator Center (SLAC), NASA, the NSA, and others, however, have reached the petabyte scale or larger. Corporations such as Amazon, Wal-Mart, eBay, and the search engine firms are also major generators and users of massive data. The general theme of data reduction and summarization has become an active and highly interdisciplinary area of research. This project proposes to develop approximation techniques that generate a compact "fingerprint" or "sketch" of a massive dataset by transforming the original data. These "sketches" are small (and hence easy to store) yet can provide approximate answers that are usually accurate enough for practical purposes.
This proposal concerns the fundamental problems of processing and transforming massive (and possibly dynamic) data. In particular, it focuses on (A) developing systematic, fundamental tools for effective data reduction and efficient data summarization, and (B) applying these tools to improve numerical analysis, visualization, and exploratory data analysis. Two lines of theoretically sound techniques for data reduction and summarization will be developed and further improved: (1) the method of stable random projections (SRP), effective for heavy-tailed data, and (2) the method of Conditional Random Sampling (CRS), designed mainly for sparse data; both are illustrated with small code examples below. Concrete applications of SRP and CRS will be investigated: widely used basic numerical algorithms can be rewritten to take advantage of SRP or CRS, and popular methods and tools for exploratory data analysis will also benefit considerably from these data reduction techniques.
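To make the SRP idea concrete, here is a minimal Python sketch of the alpha = 1 stable case (an illustration, not part of the proposal; all dimensions, parameter values, and data below are synthetic assumptions): both vectors are projected through one shared Cauchy random matrix, and the sample median of the absolute projected differences estimates their l1 distance, the natural metric for heavy-tailed data.

    import numpy as np

    rng = np.random.default_rng(42)
    n, k = 10_000, 500                 # original dimension, sketch size (k << n)
    x = rng.standard_cauchy(n)         # synthetic heavy-tailed data vectors
    y = rng.standard_cauchy(n)

    # One shared alpha = 1 stable (Cauchy) projection matrix.
    R = rng.standard_cauchy((k, n))
    sx, sy = R @ x, R @ y              # the k-dimensional sketches

    # By 1-stability, each entry of R @ (x - y) is Cauchy with scale
    # ||x - y||_1, so the median of absolute values estimates that scale.
    est = np.median(np.abs(sx - sy))
    print(f"estimated l1 distance: {est:.1f}   true: {np.abs(x - y).sum():.1f}")

The median (rather than the mean, which does not exist for Cauchy variables) is what makes this kind of estimator robust in the heavy-tailed regime.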
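For CRS, the following hypothetical sketch shows the core mechanism on two sparse binary vectors (the binary setting, the parameter values, and all variable names are assumptions made for the example): after one shared random relabeling of the column IDs, each vector keeps its k smallest nonzero IDs; for any pair, everything at or below Ds = min(largest ID in sketch 1, largest ID in sketch 2) is fully observed in both sketches and behaves like a random sample of the columns, so the intersection count scales up to an estimate of the inner product.

    import numpy as np

    rng = np.random.default_rng(7)
    D, k = 100_000, 500                       # total columns, sketch size
    # Two sparse binary vectors, stored as sets of nonzero column IDs.
    f1 = set(rng.choice(D, 3_000, replace=False).tolist())
    f2 = set(rng.choice(D, 3_000, replace=False).tolist())

    perm = rng.permutation(D)                 # one shared random relabeling
    sk1 = sorted(perm[list(f1)])[:k]          # k smallest permuted IDs
    sk2 = sorted(perm[list(f2)])[:k]

    # At or below Ds, both sketches record every nonzero of their vector, so
    # the first Ds + 1 permuted columns form a random sample of all D columns.
    Ds = min(sk1[-1], sk2[-1])
    a = len({i for i in sk1 if i <= Ds} & {j for j in sk2 if j <= Ds})
    est = a * D / (Ds + 1)                    # scale the sample count up
    print(f"estimated inner product: {est:.0f}   true: {len(f1 & f2)}")

Because the effective sample size Ds adapts to the pair at hand (sparser pairs yield a larger Ds), the same fixed-size sketches can serve many different pairwise estimates, which is the appeal of this approach for sparse data.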