The SAMSI-FODAVA Workshop on Interactive Visualization and Analysis of Massive Data will be held on December 10-12, 2012.
Posted: October 02, 2012
The FODAVA Annual Meeting (Dec 12-13) will immediately follow the SAMSI/FODAVA joint workshop at the same location.
Posted: September 05, 2012
Many of the modern data sets such as text and image data can be represented in high-dimensional vector spaces and have benefited from computational methods that utilize advanced techniques from num
Posted: June 30, 2012

The Mathematical Foundation of Analytic Visualizations

Leland Wilkinson

The Grammar of Graphics (GoG) is the title of a book that lays out the mathematical foundation of analytic visualizations: statistical, cartographic, and other quantitative graphics designed to represent observed or abstract data. Analytic visualizations are distinguished from other graphics by their mathematical formalism. Informal diagrams, by contrast, are designed to communicate ideological, artistic, religious, or other metaphorical information.

The GoG foundation is based on the conventional definition of the graph of a function: a collection of ordered pairs (x, f(x)). A graphic is a visual representation of the graph of a function. In analytic visualizations, this function operates on observed or abstract data.
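Written out in set notation (our own rendering, assuming a function f with domain X and codomain Y), the graph that a graphic represents is:

```latex
\operatorname{graph}(f) = \{\, (x, f(x)) \mid x \in X \,\} \subseteq X \times Y
```

In an analytic visualization, X and f are supplied by observed or abstract data, and the graphic is a visual representation of this set.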

GoG decomposes the global visualization function into seven orthogonal classes that form a totally ordered function chain. Each class has a collection of member functions that are composable with functions in adjacent classes of the chain. The first class (Variables) maps data to an object called a varset (a set of variables). The next two classes (Algebra, Scales) are transformations on varsets. The next class (Statistics) takes a varset and creates a statistical graph (a statistical summary). The next class (Geometry) maps a statistical graph to a geometric graph. The next class (Coordinates) embeds a graph in a coordinate space. The last class (Aesthetics) maps a graph to a visible or perceivable display called a graphic.
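The function chain can be sketched as plain function composition. The Python below is a toy illustration under our own assumptions, not the book's formalism or the API of any GoG-based library: the stage names (gog_variables, gog_algebra, and so on), the sales data, the rescaling to thousands, and the text-bar rendering are all hypothetical simplifications, and each class is represented by a single member function (the Algebra stage, for instance, only crosses two variables).

```python
from functools import reduce
from statistics import mean

# Toy stand-ins for the seven GoG classes (hypothetical names and data).
# Each stage consumes the previous stage's output, so rendering is plain
# function composition applied in a fixed, totally ordered sequence.


def gog_variables(rows):
    """Variables: map raw records to a varset (variable name -> list of values)."""
    return {
        "region": [r["region"] for r in rows],
        "sales": [r["sales"] for r in rows],
    }


def gog_algebra(varset):
    """Algebra: cross two variables (region * sales) into (x, y) tuples."""
    return list(zip(varset["region"], varset["sales"]))


def gog_scales(pairs):
    """Scales: linearly rescale the y variable (here, to thousands)."""
    return [(x, y / 1000) for x, y in pairs]


def gog_statistics(pairs):
    """Statistics: summarize each x group by the mean of its y values."""
    groups = {}
    for x, y in pairs:
        groups.setdefault(x, []).append(y)
    return [(x, mean(ys)) for x, ys in groups.items()]


def gog_geometry(summary):
    """Geometry: turn each (x, y) summary point into an interval (bar) from 0 to y."""
    return [{"x": x, "y0": 0.0, "y1": y} for x, y in summary]


def gog_coordinates(bars):
    """Coordinates: embed bars in a Cartesian frame at positions 0, 1, 2, ..."""
    return [dict(bar, pos=i) for i, bar in enumerate(bars)]


def gog_aesthetics(placed):
    """Aesthetics: map each placed bar, in position order, to a row of '#' marks."""
    ordered = sorted(placed, key=lambda b: b["pos"])
    return [f"{b['x']:<5} {'#' * round(b['y1'] - b['y0'])}" for b in ordered]


CHAIN = [gog_variables, gog_algebra, gog_scales, gog_statistics,
         gog_geometry, gog_coordinates, gog_aesthetics]


def render(rows, chain=CHAIN):
    """Apply the chain left to right, i.e. aesthetics(...(variables(rows)))."""
    return reduce(lambda acc, stage: stage(acc), chain, rows)


data = [
    {"region": "East", "sales": 4000},
    {"region": "East", "sales": 6000},
    {"region": "West", "sales": 3000},
]

for line in render(data):
    print(line)
```

With this data, render(data) yields the lines "East  #####" and "West  ###". Swapping one stage for another member of the same class (a count instead of a mean, say) leaves the rest of the chain untouched, which is what class orthogonality buys.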

A consequence of this class-orthogonality is a high degree of expressiveness: the product set of these seven function classes produces a huge variety of graphical forms or chart types. In fact, it is claimed that virtually the entire corpus of known statistical charts can be generated by this relatively parsimonious system, and perhaps a great number of meaningful but undiscovered chart types as well.
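The combinatorial side of this claim can be illustrated with a toy sketch. The member-function names below are hypothetical placeholders, and only three of the seven classes vary. The sketch also shows a commonly cited GoG example: the same stacked proportions read as one stacked bar under Cartesian coordinates and as pie wedges when the stacked axis is mapped to angle in polar coordinates. The functions stack, cartesian, and polar_theta are our own simplified illustrations, not part of any library.

```python
from itertools import product

# Hypothetical member functions for three of the seven classes; the other
# four classes are held fixed. Each tuple in the product set is one chart spec.
STATISTICS = ["identity", "mean", "count"]
GEOMETRIES = ["point", "line", "interval"]
COORDINATES = ["cartesian", "polar"]

specs = list(product(STATISTICS, GEOMETRIES, COORDINATES))
print(len(specs))  # 3 * 3 * 2 = 18 specs


def stack(counts):
    """Stack proportions into consecutive intervals (start, end) on [0, 1]."""
    total = sum(v for _, v in counts)
    intervals, start = [], 0.0
    for label, v in counts:
        end = start + v / total
        intervals.append((label, start, end))
        start = end
    return intervals


def cartesian(intervals, height=100):
    """Cartesian: each interval is a segment of one stacked bar of the given height."""
    return [(label, s * height, e * height) for label, s, e in intervals]


def polar_theta(intervals):
    """Polar, stacked axis mapped to angle: each interval becomes a pie wedge in degrees."""
    return [(label, 360 * s, 360 * e) for label, s, e in intervals]


shares = stack([("A", 1), ("B", 1), ("C", 2)])
print(cartesian(shares))    # segments of one stacked bar
print(polar_theta(shares))  # wedges of a pie
```

Both outputs describe the same statistical graph; only the Coordinates member function differs, yet one reads as a stacked bar chart and the other as a pie chart.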

The second principal claim of GoG is that this function chain encapsulates the meaning of what we do when we construct formal statistical graphics, charts, and visualizations. It is not a taxonomy. It is a computational system based on the underlying mathematics of representing functions of data. A consequence of this claim is that charts not definable within the GoG chain should be examined carefully for the possibility that they are ill-formed (meaningless).

This talk will include concrete examples to illustrate distinguishing characteristics of visualization languages based on GoG: simplicity, expressiveness, coherence, and meaningfulness. I will also survey software systems based on GoG that have been developed since the book was first published in 1999.

Leland Wilkinson is Executive VP of SYSTAT Software Inc., Adjunct Professor of Statistics at Northwestern University, and Adjunct Professor of Computer Science at the University of Illinois Chicago. He received an A.B. degree from Harvard in 1966, an S.T.B. degree from Harvard Divinity School in 1969, and a Ph.D. from Yale in 1975. Wilkinson wrote the SYSTAT statistical package and founded SYSTAT Inc. in 1984. After the company grew to 50 employees, he sold SYSTAT to SPSS in 1994 and worked there for ten years on research and development of visualization systems. SPSS eventually sold SYSTAT to Cranes Software International, and Wilkinson rejoined SYSTAT in 2008. Wilkinson is a Fellow of the American Statistical Association, an elected member of the International Statistical Institute, and a Fellow of the American Association for the Advancement of Science. He has won the best speaker award at the National Computer Graphics Association and the Youden Prize for the best expository paper in the statistics journal Technometrics. He has served on the Committee on Applied and Theoretical Statistics of the National Research Council and has been Vice Chair of the Board of the National Institute of Statistical Sciences (NISS). In addition to authoring journal articles, the original SYSTAT computer program and manuals, and patents in visualization and distributed analytic computing, Wilkinson is the author of The Grammar of Graphics and a coauthor (with Grant Blank and Chris Gruber) of Desktop Data Analysis with SYSTAT.