GUI building in R: gWidgets vs Deducer

20th February, 2012 4 comments

I’ve been a user (and fan) of gWidgets for a couple of years now for GUI building in R. (See my introduction to it here.) However, it’s always good to check out the competition so I’ve been playing around with Deducer to see how they compare.

R can access a number of GUI building frameworks including tcltk, GTK, qt, and Java, not to mention HTML. gWidgets’ big selling point is that is provides a high-level wrapper to all the R wrappers for each framework, so you can write code in a toolkit independent way. Switching between tcltk and GTK and qt won’t often be that useful, but if you think you might want to move from a desktop based GUI to a web app, it makes the transition easier. By contrast, Deducer based upon the rJava, and provides access to the Java Swing framework. It’s a slightly lower level library (which means you have to write more lines of code to achieve the same thing), but since you get full access to Swing, it’s a little more flexible. Deducer also has some features to integrate your GUIs with JGR, so if you use that for running R, it’s perhaps the most natural choice.

To test the two frameworks, I wrote a small GUI for running the Kolmogorov-Smirnoff test (that one of the ones for checking whether or not a variable seems to have been sampled from a particular distribution). Take a look at the code below to see the comparison. (Regular reader may notice I’ve switched from my usual under_casing to camelCasing. Both the frameworks use this style, so I thought I’d follow suit for cleanliness.)

First, here are some common variables (labels and the like).

#Some sample data to test against
x1 <- rnorm(100)
x2 <- runif(100)

#Widget labels
labelX <- "Variable name for data: "
labelY <- "Distribution to compare to: "
labelAlternative <- "One or two sided test?: "
labelP <- "The p-value is: "

#Choices for comboboxes
choicesAlternative <- eval(formals(ks.test)$alternative)
distributions <- c(
    normal = pnorm, 
    exponential = pexp,
    F = pf,
    "log-normal" = plnorm,
    "Student's t" = pt,
    uniform = punif
)

This is the gWidgets GUI

createKsTestGwidgets <- function()
{
  library(gWidgetstcltk)
  options(guiToolkit = "tcltk")
  win <- gwindow("KS Test, gWidgets edition", visible = FALSE)
  
  frmX <- gframe("x", container = win)
  lblX <- glabel(labelX, container = frmX)
  txtX <- gedit(container = frmX)
  
  frmY <- gframe("y", container = win)
  lblY <- glabel(labelY, container = frmY)
  cmbY <- gcombobox(names(distributions), container = frmY)
  
  frmAlternative <- gframe("alternative", container = win)
  lblAlternative <- glabel(labelAlternative, container = frmAlternative)
  cmbAlternative <- gcombobox(choicesAlternative, container = frmAlternative)
  
  btnCalc <- gbutton("Calculate", container = win,
      handler = function(h, ...)
      {
        x <- get(svalue(txtX), mode = "numeric")
        y <- distributions[[svalue(cmbY)]]
        alternative <- svalue(cmbAlternative)
        ans <- ks.test(x, y, alternative = alternative)
        svalue(txtP) <- format(ans$p.value, digits = 3)
      }
  )
  frmResults <- gframe("results", container = win)
  lblP <- glabel(labelP, container = frmResults)
  txtP <- gedit(container = frmResults)
  visible(win) <- TRUE
  
}

createKsTestGwidgets()

…and here’s the Deducer equivalent.

createKsTestDeducer <- function()
{
  library(Deducer)
  win <- new(RDialog)
  win$setSize(300L, 500L)
  win$setTitle("KS TEST, Deducer edition")
  
  JLabel <- J("javax.swing.JLabel")
  lblX <- new(JLabel, labelX)
  addComponent(win, lblX, 1, 1000, 50, 1, rightType = "REL")
  txtX <- new(TextAreaWidget, "x")
  addComponent(win, txtX, 51, 1000, 150, 1, rightType = "REL")
  
  lblY <- new(JLabel, labelY)
  addComponent(win, lblY, 151, 1000, 200, 1, rightType = "REL")
  
  cmbY <- new(ComboBoxWidget, names(distributions))
  cmbY$setDefaultModel(names(distributions)[1])
  addComponent(win, cmbY, 201, 1000, 300, 1, rightType = "REL")
  
  lblAlternative <- new(JLabel, labelAlternative)
  addComponent(win, lblAlternative, 301, 1000, 400, 1, rightType = "REL")
  
  cmbAlternative <- new(ComboBoxWidget, choicesAlternative)
  cmbAlternative$setDefaultModel(choicesAlternative[1])
  addComponent(win, cmbAlternative, 401, 1000, 500, 1, rightType = "REL")
  
  JButton <- J("javax.swing.JButton")
  btnCalc <- new(JButton, "Calculate")
  addComponent(win, btnCalc, 501, 1000, 601, 1, rightType = "REL")
  ActionListener <- J("org.rosuda.deducer.widgets.event.RActionListener")
  listener <- new(ActionListener)
  calculationHandler <- function(cmd, ActionEvent)
  {
    x <- get(txtX$getText())
    y <- distributions[[cmbY$getModel()]]
    alternative <- cmbAlternative$getModel()
    ans <- ks.test(x, y, alternative = alternative)
    print(ans)
    txtP$setText(format(ans$p.value, digits = 3))
  }
  listener$setFunction(toJava(calculationHandler))
  btnCalc$addActionListener(listener)
  
  lblP <- new(JLabel, labelP)
  addComponent(win, lblP, 601, 1000, 650, 1, rightType = "REL")
  
  txtP <- new(TextAreaWidget, "results")
  addComponent(win, txtP, 651, 1000, 750, 1, rightType = "REL")
    
  win$run()
}

createKsTestDeducer()

Note that the Deducer example works perfectly under JGR, though I couldn’t get the button handler to fire when running it from eclipse. This is likely due to my inexperience with the toolkit rather than a fundamental problem with the framework. Many of the lines are more or less a one-to-one comparison, but Deducer requires you to explicitly specify positions of widgets, and is a little more verbose when you come to add event handling logic.

Either or these frameworks is suitable for the obvious use case of GUI building in R (rapid prototyping of front-ends for non technical users, and for teaching demos), so don’t sweat your decision too much.

Edit: Fixed variable name casing issues.

R hits 10000 questions on stackoverflow

17th February, 2012 Leave a comment

R's 10000th question on stackoverflow

A milestone, though not that exciting as questions go. Still, if you haven’t yet joined the cult of Stack Exchange, take a look here.

Tags: ,

Viewing the internals of MATLAB Matrices

31st January, 2012 Leave a comment

A cool undocumented trick I just learnt from The MathWorks’ Bob Gilmore. If you type

format debug

Then printing any vector reveals information about its internal representation. For example:

x = magic(3)

x =


Structure address = 6bc1ab0 
m = 3
n = 3
pr = d8dccf0 
pi = 0
     8     1     6
     3     5     7
     4     9     2

The structure address is the address in memory where the matrix is stored, m and n are the number of rows and columns respectively of the matrix, and pr and pi are pointers to the addresses of the matrices storing the real and imaginary components of the matrix.

One interesting thing to look at is the representation of scalar numbers.

 y = 1

y =


Structure address = 6bc31e0 
m = 1
n = 1
pr = d790b90 
pi = 0
     1

Yep: they are stored in exactly the same way as matrices: in the same way the “everything in R is a vector”, everything in MATLAB is a matrix. To finish up, here are some more examples for you to explore:

% higher dimensional arrays
rand(2, 3, 4)
% cell arrays (unfortunately not that revealing)
{1, magic(3)}
% sparse matrices (very interesting)
sparse(ones(3))

Exploring the functions in a package

26th January, 2012 4 comments

Sometimes it can be useful to list all the functions inside a package. This is done in the same way that you would list variables in your workspace. That is, using ls. The syntax is ls(pos = "package:packagename"), which is easy enough if you can remember it. Unfortunately, I never can, and have to type search() first to see what the format of that string is.

Today, that problem is solved with a tiny utility function to save remembering things, and to save typing.

lsp <- function(package, all.names = FALSE, pattern) 
{
  package <- deparse(substitute(package))
  ls(
      pos = paste("package", package, sep = ":"), 
      all.names = all.names, 
      pattern = pattern
  )
}

all.names and pattern behave in the same way as they do in regular ls. You use it like this:

lsp(base)
lsp(base, TRUE)
lsp(base, pattern = "^is")


EDIT: I’ve had a couple of questions about the use case, and there are some interesting comments on alternatives. My thinking behind this function was that I sometimes know I’ve seen a function in a package but can’t remember what it’s called. If you can hazard a guess at the name, then apropos is probably better, though it looks everywhere on the search path rather than in a particular package. Autocompletion is also useful for this, but you need to know the first few characters of what you are looking for. (Activate autocompletions by pressing TAB in R GUI or Rstudio or CTRL+space in eclipse. I can’t remember what the shortcut is in emacs, but you probably just mash CTRL+META until you have RSI.) Finally, the unknownR package is useful for finding new functions that you hadn’t heard of yet.

A quick primer on split-apply-combine problems

16th December, 2011 5 comments

I’ve just answered my hundred billionth question on Stack Overflow that goes something like

I want to calculate some statistic for lots of different groups.

Although these questions provide a steady stream of easy points, its such a common and basic data analysis concept that I thought it would be useful to have a document to refer people to.

First off, you need to data in the right format. The canonical form in R is a data frame with one column containing the values to calculate a statistic for and another column containing the group to which that value belongs. A good example is the InsectSprays dataset, built into R.

head(InsectSprays)
  count spray
1    10     A
2     7     A
3    20     A
4    14     A
5    14     A
6    12     A

These problems are widely known as split-apply-combine problems after the three steps involved in their solution. Let’s go through it step by step.

First, we split the count column by the spray column.

(count_by_spray <- with(InsectSprays, split(count, spray)))

Secondly, we apply the statistic to each element of the list. Lets use the mean here.

(mean_by_spray <- lapply(count_by_spray, mean))

Finally, (if possible) we recombine the list as a vector.

unlist(mean_by_spray)

This procedure is such a common thing that there are many functions to speed up the process. sapply and vapply do the last two steps together.

sapply(count_by_spray, mean)
vapply(count_by_spray, mean, numeric(1))

We can do even better than that however. tapply, aggregate and by all provide a one-function solution to these S-A-C problems.

with(InsectSprays, tapply(count, spray, mean))
with(InsectSprays, by(count, spray, mean))
aggregate(count ~ spray, InsectSprays, mean)

The plyr package also provides several solutions, with a choice of output format. ddply takes a data frame and returned another data frame, which is what you’ll want most of the time. dlply takes a data frame and returns the uncombined list, which is useful if you want to do another processing step before combining.

ddply(InsectSprays, .(spray), summarise, mean.count = mean(count))
dlply(InsectSprays, .(spray), summarise, mean.count = mean(count))

You can read much more on this type of problem and the plyr solution in The Split-Apply-Combine Strategy for Data Analysis, in the Journal of Statistical Software, by the ubiquitous Hadley Wickham.

One tiny variation on the problem is when you want the output statistic vector to have the same length as the original input vectors. For this, there is the ave function (which provides mean as the default function).

with(InsectSprays, ave(count, spray))

MATLAB’s stand out new feature

6th October, 2011 Leave a comment

It’s been a while since my last MATLAB post, not because I don’t love the language, but more because I do most of my blogging from home, where I have no license, and because (mostly thanks to R-bloggers) I get ten times as many page views for the R posts. (TODO: Create MATLAB-bloggers service.)

Having returned from holiday (it was lovely, thanks for asking) I’ve been trying out the latest release of MATLAB – R2011b. So far, the standout new feature is the automatic variable renaming. If you change the name of a variable at the point where it was declared, then pressing Shift+Enter lets MATLAB rename all other instances. IDEs for statically-typed languages have had this feature for years, but to see it in a dynamically-typed language is very impressive.

MATLAB's variable auto-renaming in action

A Great European Bailout Infographic

8th September, 2011 1 comment

Whenever there’s a financial crisis, I tend to assume that it’s Dirk Eddelbuettel‘s fault, though apparently the EMU debt crisis is more complicated than that. JP Morgan have just released a white paper about the problem, including a Lego infographic of who is asking who for money. Created, apparently, by Peter Cembalest, aged nine. Impressive stuff.

European bailout graph

Found via The Register.

Interactive graphics for data analysis

1st September, 2011 2 comments

Rocking out, reading Theus & Urbanek

I got a copy of Martin Theus and Simon Urbanek’s Interactive Graphics for Data Analysis a couple of years ago, whence it’s been sat on my bookshelf. Since I’ve recently become a self-proclaimed expert on interactive graphics I thought it was about time I read the thing. Which is exactly what I did last weekend at the Leeds Festival (in between rocking out).

It’s a book of two halves, and despite the title the interactivity isn’t really the focus. The book is actually a guide on how to do exploratory data analysis. The first half of the book works like an advanced chart chooser, explaining which plots are useful for which types of data, and what types of interactivity they can benefit from. For me, it was worth it for the many rare plots, like spineplots and interaction plots and mosaic plots and fluctuation diagrams. If you’re bored of barcharts, this is a great way to expand your graphical vocabulary. The second half of the book consists entirely of case studies, where you can practice a workflow for exploring data, which is something that’s always worthwhile doing.

The really big takeaway that I got is that exploratory graphics have different priorities to publication graphics. When you are in the courting stage with a dataset, just getting to know each other, you don’t really care so much about whether the greek letters in your axis label are formatted correctly or whether the shade of pink in your dots is quite right. All you really need is to be able to generate lots and lots of plots quickly, and to be able to see the relationships between them.

It is this last point that the authors claim interactivity is most useful for. Perhaps the canonical example of this is clicking a bar on a histogram or barchart, and having corresponding points on a scatterplot highlighted. To demonstrate this, here’s an example using Simon’s Acinonyx package (shortly to be renamed ix for “iplots Extreme”). Acinonyx isn’t yet available on CRAN, see its home page
for installation details.

library(Acinonyx)        
library(MASS)
data(Cars93)
interactive_scatter <- with(Cars93, iplot(Horsepower, MPG.city))  
interactive_histo <- with(Cars93, ihist(EngineSize))

Click a bar in the histogram and the the corresponding points in the scatterplot are highlighted. Likewise, drag to select points in the scatterplot and fractions of the histgram are highlighted.

The equivalent static version would be to use trellising and draw each possible graph combination. Splitting a scatterplot into different groups depending upon bars of a histogram works something like this:

library(ggplot2)
Cars93$EngineSizeGroup <- cut(Cars93$EngineSize, 11)
(static_trellis_scatter <- ggplot(Cars93, aes(Horsepower, MPG.city)) +
  geom_point() +
  facet_wrap(~ EngineSizeGroup)
)

(We don’t actually need to bother with the histograms, since they are a little boring.) The reverse operation – going from a selected region of scatterplot to a higlighted region of bar chart is also possible, but trickier. In this case, we do need both graphs.

Cars93 <- within(Cars93, 
{
  selected <- ifelse(
    Horsepower < 200 & MPG.city > 20 & MPG.city < 30, 
    "selected", 
    "unselected"
  )
})
(static_scatter_with_highlight <-
  ggplot(Cars93, aes(Horsepower, MPG.city, colour = selected)) +
  geom_point()
)
(static_histo_with_highlight <- 
  ggplot(Cars93, aes(EngineSizeGroup, fill = selected)) +
  geom_histogram() + 
  opts(axis.text.x = theme_text(angle = 30, hjust = 1, vjust = 1))
)

My conclusion from reading the book, and from my initial experimentation with Acinonyx is that anything you can do interactively is also possible by drawing many static graphs, but the interaction can let you see things quicker.

Nomograms everywhere!

30th August, 2011 Leave a comment

At useR!, Jonty Rougier talked about nomograms, a once popular visualisation that has fallen by the wayside with the rise of computers. I’d seen a few before, but hadn’t understood how they worked or why you’d want to use them. Anyway, since that talk I’ve been digging around in biology books from the 60s and 70s, and it seems they are full of them. So for those of you who haven’t seen the talk, here’s how they work.

A basic nomogram consists of three scales. By reading off known values from two of the scales, you can estimate a third one. Here’s an example I found in the ICRP‘s reference manual.

Nomogram estimating skin surface area from height and bodyweight

It’s difficult to measure people’s skin surface area, but height and bodyweight are very straightforward. To use the nomogram, you place a ruler (or other straight edge) on the height* and weight scales and read of the point where the ruler crosses the surface area scale. I’m 177cm tall and weigh 72kg, so according to this, my estimated skin surface area is 1.89m2.

Of course nowadays, the standard way to solve this sort of problem is to write a function. Jonty suggested that the main modern use of nomograms is in fieldwork situations, where computers aren’t handily available. (His case study was Kenyan vets trying to estimate the weight of donkeys form there height and girth.)

Altman and Dittmer’s Respiration and Circulation has many more pretty nomograms. I was particularly impressed by those on blood pH, reproduced below for your pleasure.

Nomogram estimating excess base in blood from pH and CO2 pressure via Siggard Andersen

Nomogram estimating excess base in blood from pH and CO2 pressure via Singer Hastings

Your homework is to dig out a pre-1980 textbook and hunt for more nomograms.

*Gruesomely, the fact that the scale is labelled “length” rather than “height” makes me suspect that the bodies that provided the data were in a permanent lying down position – that is, they were corpses.

Follow

Get every new post delivered to your Inbox.

Join 41 other followers