
The Disappearing Second Derivative of Quadratics: Perceptual, Mathematical, and Statistical Properties of Judging Dependence on Visual Displays

William S. Cleveland

When y is plotted against x to see how y depends on x, whether it is data or a function that is displayed, the aspect ratio (height/width) is a critical factor in our ability to visually decode information about the dependence. Dependence is in part decoded by judging the slopes of line segments: the local segments of a curve that is displayed or of a virtual curve that forms from the underlying pattern of displayed data points. A change in the aspect ratio changes the orientations of the physical slopes of the segments, which in turn changes our ability to visually decode slope to judge the rate of change of y with x.

Work by Cleveland and McGill showed that we can greatly enhance our ability to judge rate of change by "banking to 45 degrees": choosing the aspect ratio to center the absolute orientations of the segments on 45 degrees. R.A. Fisher, who founded modern statistics along with mathematical genetics, seems to have understood this result for a special case.

The open question is how to define centering. Several banking algorithms have been put forward, including some very interesting recent work by Heer and Agrawala.
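One simple way to make "centering" concrete is median-absolute-slope banking: choose the aspect ratio so that the median absolute physical slope of the segments is 1, that is, 45 degrees. The following is a minimal Python sketch of that idea, not the specific algorithms discussed in the talk. The function names are hypothetical, and the sketch assumes the data rectangle exactly fills a plot region whose height/width ratio is the aspect ratio.

```python
import math
from statistics import median

def normalized_slopes(xs, ys):
    """Slopes of consecutive segments after scaling x and y to unit range."""
    rx = max(xs) - min(xs)
    ry = max(ys) - min(ys)
    slopes = []
    for i in range(len(xs) - 1):
        dx = (xs[i + 1] - xs[i]) / rx
        dy = (ys[i + 1] - ys[i]) / ry
        if dx != 0 and dy != 0:  # skip vertical and flat segments
            slopes.append(dy / dx)
    return slopes

def median_slope_aspect(xs, ys):
    """Aspect ratio (height/width) that makes the median absolute physical slope 1."""
    return 1.0 / median(abs(s) for s in normalized_slopes(xs, ys))

def orientations_deg(xs, ys, aspect):
    """Absolute physical orientations, in degrees, of the segments at a given aspect ratio."""
    return [math.degrees(math.atan(aspect * abs(s))) for s in normalized_slopes(xs, ys)]

# Example: the quadratic y = x^2 sampled on [-1, 1].
xs = [-1 + i / 25 for i in range(51)]
ys = [x * x for x in xs]
aspect = median_slope_aspect(xs, ys)
print(aspect, median(orientations_deg(xs, ys, aspect)))
```

The physical slope of a segment is the aspect ratio times its normalized slope, so dividing by the median absolute normalized slope places the median orientation at 45 degrees. For this quadratic the sketch gives an aspect ratio of about 0.5.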

Performance studies of the algorithms in the past have been purely empirical: segments are generated, the algorithms are applied, and the resulting distribution of segment orientations is assessed.
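An empirical study of this kind can be sketched in a few lines of Python. This is an illustration only: the segment generator, the choice of median-absolute-slope banking as the algorithm under test, and the summary statistics are all assumptions, not the setup of any particular published study.

```python
import math
import random
from statistics import mean, median

def random_segments(n, seed=0):
    """Hypothetical generator: n segments (dx, dy) in normalized units, with dx > 0."""
    rng = random.Random(seed)
    return [(rng.uniform(0.01, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]

def median_slope_aspect(segments):
    """Aspect ratio (height/width) that makes the median absolute physical slope 1."""
    return 1.0 / median(abs(dy) / dx for dx, dy in segments)

def orientation_summary(segments, aspect):
    """Summarize the absolute physical orientations (degrees) at a given aspect ratio."""
    angles = [math.degrees(math.atan2(aspect * abs(dy), dx)) for dx, dy in segments]
    return {
        "median": median(angles),
        "mean_abs_dev_from_45": mean(abs(a - 45.0) for a in angles),
    }

segments = random_segments(501)        # 1. generate segments
aspect = median_slope_aspect(segments) # 2. apply a banking algorithm
print(aspect, orientation_summary(segments, aspect))  # 3. assess the orientations
```

Swapping in a different banking function at step 2 and comparing the summaries is the essence of such a comparison. It describes what an algorithm does on the generated segments but does not explain why, which is the gap a theoretical treatment addresses.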

We are studying the properties of banking algorithms theoretically using visual perception, geometry, and statistics. Mathematical descriptions of the curvature that the human visual system perceives elucidate why we see what we do on data displays such as a graph of a quadratic polynomial. We discovered that a geometrically motivated banking algorithm, resultant-vector banking, leads to simple formulas for the aspect ratio of a banked display that are tractable and enable mathematical and statistical investigations of properties.
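The abstract does not give the resultant-vector formula, so the Python sketch below rests on an assumed reading: each segment is reflected into the first quadrant, the segments are summed as vectors, and the aspect ratio is chosen so that the resultant points at 45 degrees. Under that assumption the aspect ratio (height/width) is the total normalized horizontal extent divided by the total normalized vertical extent. The function name is hypothetical.

```python
def resultant_vector_aspect(xs, ys):
    """Aspect ratio (height/width) under the assumed resultant-vector rule.

    Assumption: with x and y scaled to unit range, the physical resultant of the
    reflected segments is (w * sum|dx|, h * sum|dy|); setting its orientation to
    45 degrees gives h / w = sum|dx| / sum|dy|.
    """
    rx = max(xs) - min(xs)
    ry = max(ys) - min(ys)
    sum_dx = sum(abs(xs[i + 1] - xs[i]) for i in range(len(xs) - 1)) / rx
    sum_dy = sum(abs(ys[i + 1] - ys[i]) for i in range(len(ys) - 1)) / ry
    return sum_dx / sum_dy

# Example: the quadratic y = x^2 sampled on [-1, 1].
xs = [-1 + i / 25 for i in range(51)]
ys = [x * x for x in xs]
print(resultant_vector_aspect(xs, ys))
```

Under this reading the result depends only on how far the curve travels in total along each axis. That is one way a banking rule can yield a closed-form aspect ratio: a monotone curve gets an aspect ratio of 1, and the symmetric quadratic above gets about 0.5.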

Moving from the discrete case of past work on a finite set of line segments to the continuous case provides additional mathematical insights.

Joint work with Saptarshi Guha, Department of Statistics, Purdue University

William S. Cleveland is a Professor of Statistics and Courtesy Professor of Computer Science at Purdue University. Previously, he was a Distinguished Member of Technical Staff in the Statistics Research Department at Bell Labs, Murray Hill; for 12 of his years at Bell Labs he was a Department Head. His areas of research have included data visualization, computer networking, machine learning, data mining, time series, statistical modeling, visual perception, environmental science, and seasonal adjustment. Cleveland has been involved in many projects requiring the mining, statistical analysis, and modeling of data from several fields including environmental science, customer opinion polling, visual perception, and computer networking. In the course of this work he has developed many new statistical models and methods, including visualization methods, that are widely used in engineering, science, medicine, and business. He has participated in the design and implementation of software for the trellis display framework for visualization that he and colleagues developed, and for the loess approach to nonparametric function estimation that he introduced into statistics and machine learning. The software is now a part of many commercial systems. Cleveland has published over 120 papers on his research in a wide range of scientific journals, refereed proceedings, and books. In the area of data visualization he has written three books and one user's manual, edited two books, and edited a special issue of the Journal of the American Statistical Association. He was the editor-in-chief of the seven volumes of the Collected Works of John W. Tukey, and for ten years was an editor of the Wadsworth Probability and Statistics Series. His two books The Elements of Graphing Data and Visualizing Data have been reviewed in dozens of journals, and Elements was selected for the Library of Science.
He is a principal investigator in the Network Modeling and Simulation Program of DARPA where he works on statistical modeling for generating background packet-level traffic and source-level traffic in simulators, on bandwidth allocation, on validation of network simulator models, and on packet sampling. Cleveland has twice won the Wilcoxon Prize and once won the Youden Prize from the statistics journal Technometrics. He is a Fellow of the American Statistical Association, the Institute of Mathematical Statistics, and the American Association for the Advancement of Science, and is an elected member of the International Statistical Institute. In 1996 he was chosen Statistician of the Year by the Chicago Chapter of the American Statistical Association. In 2002 he was selected as a Highly Cited Researcher by the American Society for Information Science & Technology in the newly formed mathematics category. He was the founding chair of the Graphics Section of the American Statistical Association, and has served on the Council of the Institute of Mathematical Statistics, the Committee on Applied and Theoretical Statistics of the National Research Council, and the Council of the Statistics Section of the American Association for the Advancement of Science. Cleveland received an A.B. in Mathematics from Princeton where his senior thesis advisor was probabilist William Feller. He received his Ph.D. in Statistics from Yale University where his Ph.D. thesis advisor was statistician Leonard Jimmie Savage.