How R will turn into SQL

Up until very recently the only way of running R code was through the standard R distribution. Of course you could use another IDE, but somewhere underneath it all you would be running the same, standard R engine from the R-core team.

This is no longer your only option. A couple of weeks ago Radford Neal released pqR, a research project that reimplements pieces of the R engine to make it “pretty quick”. Much of pqR is likely to be folded back into the main R engine, but that isn’t the only new player in town.

A team at Tibco, including software architect Michael Sannella, have rewritten the R engine from the ground up for TERR, a commercial R distribution. This leads to some very interesting possibilities. Maybe in the next few years we could see many more R engines appearing. Oracle have recently invested heavily in R, and it’s certainly imaginable that they could create a high performance R that is tightly coupled to their database products. Google are also big R investors, and I could easily see them creating an R engine that has parallelism built in.

After that, perhaps even Microsoft might notice R and fix the fact that its integration with .NET is rubbish. IronR, anyone?

This situation isn’t as far-fetched as you may think: there is a very obvious precedent in the data analysis world. There are dozens of different database vendors that use a common interface language: SQL. In the world of databases, the engine is separate from the programming language.

This is a good thing – multiple vendors provide competition, differentiation and innovation. You have a full spectrum of products from big corporate databases (Oracle again) down to miniature data stores like sqlite.

The fact that SQL has led the way means that some of the potential traps are visible before we embark down this road. My biggest bugbear with SQL is that it isn’t quite universal: different vendors have their own non-standard extensions, which means that not all SQL code is portable. A situation like HTML or C++ or Fortran, where a standards committee defines an official version of the language, would be preferable. Whether R-core or another body would set this standard is a matter to be decided. (I suspect that R-core would not welcome the additional administrative burden, and commercial vendors may want more control in defining the spec, so a separate R-standards group is more likely.)
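
To make the portability trap concrete, here is a minimal sketch (in Python, using only the standard library's sqlite3 module) that asks for "the first two rows" in three dialect spellings and checks which of them SQLite will parse. The table `t`, the helper `runs_on_sqlite` and the dialect labels are made up for illustration; the point is only that the same request is written differently depending on the engine.

```python
import sqlite3


def runs_on_sqlite(query: str) -> bool:
    """Return True if an in-memory SQLite database accepts the query."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical toy table, used only for this illustration.
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
        conn.execute(query).fetchall()
        return True
    except sqlite3.OperationalError:
        # SQLite reports syntax it does not understand as OperationalError.
        return False
    finally:
        conn.close()


# The same request, "give me the first two rows", in three spellings.
queries = {
    "LIMIT": "SELECT x FROM t ORDER BY x LIMIT 2",
    "TOP": "SELECT TOP 2 x FROM t ORDER BY x",
    "FETCH FIRST": "SELECT x FROM t ORDER BY x FETCH FIRST 2 ROWS ONLY",
}

results = {label: runs_on_sqlite(q) for label, q in queries.items()}

for label, accepted in results.items():
    print(label, "->", "accepted" if accepted else "rejected")
```

SQLite accepts only the LIMIT spelling here; TOP is the SQL Server form and FETCH FIRST is the form from the SQL standard. If R engines multiply without an agreed spec, R code could end up with the same kind of per-engine variation.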

These are interesting times for R, and I look forward to seeing how the separation of language and engine progresses.

Bootnotes:

  1. A sloppy sentence in a previous version of this post made it sound like Michael Sannella ran Tibco. He’s actually a software architect.
  2. The future is happening faster than we think. Renjin and Riposte are two other R engines.
  1. 17th July, 2013 at 19:34 pm

    SQL does have a standard, but from the first version (1986) it was more an idea than a requirement. The last few have split the standard into “standard” parts and “optional” parts, such that most anything that claims to use SQL can claim compliance with some part of some version of standard SQL.

    What R lacks, and has needed from the start, is a BDFL (Guido, in Python terms), who can settle the division between stat pack command language and general purpose (sort of) programming language. Much of the angst with R is that it’s simply neither fish nor fowl.

    And from the title of the post, I was expecting an essay on the beauty of PL/R (and some discussion of similar for other databases), with the analytics part of the database, R driven by SQL; rather than the other way round. Which is how 99.44% of people integrate R and RDBMS.

    • 17th July, 2013 at 23:03 pm

      Yep, crappy post title.

      PL/R looks interesting; I must look into it. Thanks.

  2. alex
    18th July, 2013 at 15:01 pm

    There is also Renjin, another independent implementation of an R interpreter. We have similar challenges to those that the TIBCO team are facing, though we’ve been able to keep all of the parts of GNU R written in R because Renjin is also open source.

    We’re taking a pragmatic approach to language definition issues – we test each release against all existing CRAN packages and their 20,000+ examples and tests. We haven’t reached 100% compatibility yet but we’re getting closer: http://packages.renjin.org

    • 18th July, 2013 at 15:51 pm

      Wow, that’s a really interesting project.

      Sounds like it could make it easier for the machine learning community, where there seems to be a split between R and Java.

  3. 18th July, 2013 at 19:03 pm

    I suppose it had to happen, but the analogy isn’t SQL but Ruby, and what happened to Ruby is a tragedy.

    • 18th July, 2013 at 21:50 pm

      I know that there are several Ruby implementations, but I haven’t used the language other than to play around so forgive my ignorance.

      What’s the tragedy?

  4. anon
    22nd July, 2013 at 3:16 am

    Also check out Riposte: https://github.com/jtalbot/riposte

  5. 26th July, 2013 at 19:38 pm

    In developing TERR (TIBCO Enterprise Runtime for R), we take R compatibility very seriously. Early on, we heard loud and clear from our customers that R compatibility is critical, and they do not want to be locked into a proprietary flavor of the R language, and so we run thousands of tests regularly vs. R and CRAN packages. We also try to be very open about our R compatibility: the TERR Community Site, https://www.tibcommunity.com/community/products/analytics/terr, has resources which detail what parts of R and CRAN we currently work with, and provides a link to download a free Developer’s Edition of TERR, to enable the R community to use it widely, and benefit from our work.

    If you’d like to learn more, feel free to contact me (I’m the Product Manager for TERR), or post on the community site.

  6. Michael Sannella
    26th July, 2013 at 19:57 pm

    A small clarification: The sentence “Tibco, led by Michael Sannella,
    have rewritten…” would be more accurately rewritten as “A group
    within TIBCO, including Michael Sannella (software architect), have
    rewritten…”. I’m not leading TIBCO, unless they’ve given me a
    promotion they haven’t told me about.

    I am also curious about what “the tragedy of Ruby” was.

  7. Chris
    4th September, 2013 at 15:00 pm

    In my organization the one thing limiting use of R is GPL licensing issues. How do any of these things overcome that?

    • 13th October, 2013 at 19:24 pm

      @Chris Would be curious to know what licensing issues you’re encountering? The GPL certainly imposes certain restrictions on developing works that “derive” from the R interpreter (and yes I know defining “derive” is very problematic), but it shouldn’t impact your use of the R language any more than compiling your own C programs with GCC. With the Renjin project, we’re very interested in making the project useful to as wide an audience as possible (while still meeting our obligations under the GPL). Is there anything we could do specifically to facilitate adoption? LGPL certain portions, for example?

  8. 30th October, 2013 at 22:41 pm

    TERR is not under the GPL license. We implemented TERR from the ground up to provide comparable functionality to the open source R engine, but did so in a clean-room manner, based on our resources and long expertise as the makers of S+. For those customers for whom GPL is a concern, TERR provides an alternative.
