How R will turn into SQL
Up until very recently the only way of running R code was through the standard R distribution. Of course you could use another IDE, but somewhere underneath it all you would be running the same, standard R engine from the R-core team.
This is no longer your only option. A couple of weeks ago Radford Neal released pqR, a research project that reimplements pieces of the R engine to make it “pretty quick”. Much of pqR is likely to be folded back into the main R engine, but that isn’t the only new player in town.
A team at Tibco, including software architect Michael Sannella, have rewritten the R engine from the ground up for TERR, a commercial R distribution. This leads to some very interesting possibilities. Maybe in the next few years we could see many more R engines appearing. Oracle have recently invested heavily in R, and it’s certainly imaginable that they could create a high-performance R that is tightly coupled to their database products. Google are also big R investors, and I could easily see them creating an R engine that has parallelism built in.
After that, perhaps even Microsoft might notice R and fix the fact that its integration with .NET is rubbish. IronR, anyone?
This situation isn’t as far-fetched as you may think: there is a very obvious precedent in the data analysis world. There are dozens of different database vendors that use a common interface language: SQL. In the world of databases, the engine is separate from the programming language.
This is a good thing: multiple vendors provide competition, differentiation and innovation. You have a full spectrum of products from big corporate databases (Oracle again) down to miniature data stores like SQLite.
The fact that SQL has led the way means that some of the potential traps are visible before we embark down this road. My biggest bugbear with SQL is that it isn’t quite universal: different vendors have their own non-standard extensions, which means that not all SQL code is portable. A situation like HTML or C++ or Fortran, where a standards committee defines an official version of the language, would be preferable. Whether R-core or another body would set this standard is a matter to be decided. (I suspect that R-core would not welcome the additional administrative burden, and commercial vendors may want more control in defining the spec, so a separate R-standards group is more likely.)
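To make the portability trap concrete, here is a minimal sketch in Python using the standard library's sqlite3 module. It runs two dialect spellings of "give me the first row" against a single engine: the LIMIT form used by SQLite, MySQL and PostgreSQL, and the TOP form used by SQL Server. The table name, the sample rows and the try_queries helper are all made up for illustration.

```python
import sqlite3

# Two dialect-specific ways of asking for the first row only.
# (The SQL standard has a third spelling, FETCH FIRST n ROWS ONLY.)
QUERIES = {
    "limit": "SELECT name FROM engines ORDER BY name LIMIT 1",  # SQLite, MySQL, PostgreSQL
    "top": "SELECT TOP 1 name FROM engines ORDER BY name",      # SQL Server
}


def try_queries(queries):
    """Run each query against an in-memory SQLite database.

    Returns a dict mapping each label to either the fetched rows or a
    string describing why the engine rejected the query.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data: a few of the R engines named in this post.
    conn.execute("CREATE TABLE engines (name TEXT)")
    conn.executemany(
        "INSERT INTO engines VALUES (?)",
        [("pqR",), ("TERR",), ("Renjin",)],
    )
    results = {}
    for label, sql in queries.items():
        try:
            results[label] = conn.execute(sql).fetchall()
        except sqlite3.OperationalError as err:
            # SQLite does not understand this dialect's syntax.
            results[label] = f"rejected: {err}"
    conn.close()
    return results


if __name__ == "__main__":
    for label, outcome in try_queries(QUERIES).items():
        print(label, "->", outcome)
```

The same intent, written for a different vendor, simply fails to parse. If R engines multiply without an agreed standard, I would expect R scripts to run into the same kind of problem.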
These are interesting times for R, and I look forward to seeing how the separation of language and engine progresses.
- A sloppy sentence in a previous version of this post made it sound like Michael Sannella ran Tibco. He’s actually a software architect.
- The future is happening faster than we think. Renjin and Riposte are two other R engines.