Archive

Posts Tagged ‘urbanek’

Interactive graphics for data analysis

1st September, 2011 2 comments

Rocking out, reading Theus & Urbanek

I got a copy of Martin Theus and Simon Urbanek’s Interactive Graphics for Data Analysis a couple of years ago, whence it’s been sat on my bookshelf. Since I’ve recently become a self-proclaimed expert on interactive graphics I thought it was about time I read the thing. Which is exactly what I did last weekend at the Leeds Festival (in between rocking out).

It’s a book of two halves, and despite the title the interactivity isn’t really the focus. The book is actually a guide on how to do exploratory data analysis. The first half of the book works like an advanced chart chooser, explaining which plots are useful for which types of data, and what types of interactivity they can benefit from. For me, it was worth it for the many rare plots, like spineplots and interaction plots and mosaic plots and fluctuation diagrams. If you’re bored of barcharts, this is a great way to expand your graphical vocabulary. The second half of the book consists entirely of case studies, where you can practice a workflow for exploring data, which is something that’s always worthwhile doing.

The really big takeaway that I got is that exploratory graphics have different priorities to publication graphics. When you are in the courting stage with a dataset, just getting to know each other, you don’t really care so much about whether the greek letters in your axis label are formatted correctly or whether the shade of pink in your dots is quite right. All you really need is to be able to generate lots and lots of plots quickly, and to be able to see the relationships between them.

It is this last point that the authors claim interactivity is most useful for. Perhaps the canonical example of this is clicking a bar on a histogram or barchart, and having corresponding points on a scatterplot highlighted. To demonstrate this, here’s an example using Simon’s Acinonyx package (shortly to be renamed ix for “iplots Extreme”). Acinonyx isn’t yet available on CRAN, see its home page
for installation details.

library(Acinonyx)        
library(MASS)
data(Cars93)
interactive_scatter <- with(Cars93, iplot(Horsepower, MPG.city))  
interactive_histo <- with(Cars93, ihist(EngineSize))

Click a bar in the histogram and the the corresponding points in the scatterplot are highlighted. Likewise, drag to select points in the scatterplot and fractions of the histgram are highlighted.

The equivalent static version would be to use trellising and draw each possible graph combination. Splitting a scatterplot into different groups depending upon bars of a histogram works something like this:

library(ggplot2)
Cars93$EngineSizeGroup <- cut(Cars93$EngineSize, 11)
(static_trellis_scatter <- ggplot(Cars93, aes(Horsepower, MPG.city)) +
  geom_point() +
  facet_wrap(~ EngineSizeGroup)
)

(We don’t actually need to bother with the histograms, since they are a little boring.) The reverse operation – going from a selected region of scatterplot to a higlighted region of bar chart is also possible, but trickier. In this case, we do need both graphs.

Cars93 <- within(Cars93, 
{
  selected <- ifelse(
    Horsepower < 200 & MPG.city > 20 & MPG.city < 30, 
    "selected", 
    "unselected"
  )
})
(static_scatter_with_highlight <-
  ggplot(Cars93, aes(Horsepower, MPG.city, colour = selected)) +
  geom_point()
)
(static_histo_with_highlight <- 
  ggplot(Cars93, aes(EngineSizeGroup, fill = selected)) +
  geom_histogram() + 
  opts(axis.text.x = theme_text(angle = 30, hjust = 1, vjust = 1))
)

My conclusion from reading the book, and from my initial experimentation with Acinonyx is that anything you can do interactively is also possible by drawing many static graphs, but the interaction can let you see things quicker.

Follow

Get every new post delivered to your Inbox.

Join 228 other followers