Latest News and Events

The SAMSI-FODAVA Workshop on Interactive Visualization and Analysis of Massive Data will be held on December 10-12, 2012.
Posted: October 02, 2012
The FODAVA Annual Meeting (December 12-13) will immediately follow the SAMSI/FODAVA joint workshop at the same location.
Posted: September 05, 2012
Many modern data sets, such as text and image data, can be represented in high-dimensional vector spaces and have benefited from computational methods that use advanced techniques from num…
Posted: June 30, 2012

Manifold Alignment of High-Dimensional Data Sets

As digital information repositories continue to grow in availability and size, the problem of extracting deep semantic structure from high-dimensional data becomes increasingly critical. This project addresses the fundamental problem of transfer learning; in particular, it investigates methods for aligning multiple heterogeneous data sets to find correspondences between them and extract their shared latent semantic structure. Application domains include automatic machine translation, bioinformatics, cross-lingual information retrieval, perceptual learning, robotic control, and sensor-based activity modeling. The research will develop a geometric framework for transfer learning that finds correspondences between data sets by aligning their projections onto lower-dimensional manifolds. It will explore a broad spectrum of approaches to manifold alignment: one-step vs. two-step, instance-based vs. feature-based, semi-supervised vs. unsupervised, and one-level vs. multi-scale alignment. Visualization tools that use alignment information will be developed to support interactive learning and data analysis. To handle large data sets, the project will exploit the parallel computing power of modern graphics processing units (GPUs).
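The two-step, semi-supervised style of manifold alignment mentioned above can be illustrated with a small sketch. It assumes that each data set has already been projected onto a two-dimensional embedding by some dimensionality-reduction method (not shown) and that a few instances are known to correspond across the two sets. Procrustes analysis, one standard choice and not necessarily the method used in this project, then fits the scale, rotation, and translation that best map one embedding onto the other; remaining points can be paired by nearest neighbor. The function names `procrustes_align_2d` and `nearest_matches` are hypothetical.

```python
import math


# Hypothetical helper: 2D Procrustes analysis (scale + proper rotation +
# translation) fitted on paired points. Assumes both inputs are already
# low-dimensional (2D) embeddings of the two data sets.
def procrustes_align_2d(source, target):
    """Return (scale, theta, transform) minimizing
    sum ||scale * R(theta) (p - mu_p) + mu_q - q||^2 over paired points."""
    n = len(source)
    if n != len(target) or n < 2:
        raise ValueError("need at least two paired points of equal count")

    # Centroids of the two point sets.
    mu_px = sum(p[0] for p in source) / n
    mu_py = sum(p[1] for p in source) / n
    mu_qx = sum(q[0] for q in target) / n
    mu_qy = sum(q[1] for q in target) / n

    # Cross-covariance terms on centered coordinates.
    a = b = norm = 0.0
    for (px, py), (qx, qy) in zip(source, target):
        px, py = px - mu_px, py - mu_py
        qx, qy = qx - mu_qx, qy - mu_qy
        a += px * qx + py * qy  # coefficient of cos(theta)
        b += px * qy - py * qx  # coefficient of sin(theta)
        norm += px * px + py * py
    if norm == 0.0:
        raise ValueError("source points are all identical")

    # Closed-form optimal rotation angle and isotropic scale in 2D.
    theta = math.atan2(b, a)
    scale = math.hypot(a, b) / norm
    c, s = math.cos(theta), math.sin(theta)

    def transform(p):
        # Center on the source centroid, rotate, scale, then shift
        # to the target centroid.
        x, y = p[0] - mu_px, p[1] - mu_py
        return (scale * (c * x - s * y) + mu_qx,
                scale * (s * x + c * y) + mu_qy)

    return scale, theta, transform


def nearest_matches(aligned, target):
    """For each aligned source point, return the index of its nearest target point."""
    return [min(range(len(target)), key=lambda j: math.dist(p, target[j]))
            for p in aligned]
```

For example, if the source embedding is the target rotated by 90 degrees, doubled in size, and shifted, the fitted transform recovers a scale of 0.5 and an angle of -π/2 and maps each source point back onto its partner. In a real two-step pipeline the transform would be fitted on the few labeled pairs and then applied to all points before matching.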

Given the rapidly increasing availability of digital data sets from diverse domains, the scientific challenge of extracting knowledge from massive, unstructured information repositories is becoming ever more critical. The proposed research combines the study of machine learning algorithms that discover latent correspondences between seemingly disparate data sets with the development of visualization tools that aid human interpretation of high-dimensional data. Empirical studies will be carried out on a variety of real-world applications involving bioinformatics data, Internet web archives, multilingual text, and sequential time-series data sets. The broader impacts of the research include algorithmic advances in the analysis and visualization of high-dimensional data, together with the results of these empirical studies. The data sets and software developed in this research will be disseminated on the web, and the results will be communicated through conferences, workshops, and seminars in disciplines including computer science, engineering, mathematics, and statistics. The PIs will make significant efforts to recruit members of underrepresented groups, including women and minorities, into this research. New course material on advanced data analysis and visualization will be developed based on this research.