Make your data famous!

I’m writing a book on R for O’Reilly, and I need interesting datasets for the examples. Any data that you provide will get you a mention in the book and in the publicity material, so it’s a great opportunity to publicise your work or your organisation.

Datasets from any area or industry are suitable; the only constraint is that they can be analysed with a few pages of R code to produce a result that makes a general reader go “ooh”. There’s a chapter on data cleaning, so even dirty data is suitable!

All the data will be provided in an R package to accompany the book, so you need to be willing to make it publicly available. I can help you anonymise the data, or strip out commercially sensitive parts, if you require.

If you can provide anything, or you know someone who might be able to, then drop me an email at richierocks AT gmail DOT com. Thanks.

EDIT: There are some (quite) frequently asked questions already! Here are the answers; you can use your Jeopardy! skills to guess the questions.
1. The book is called “Learning R”, and it’s a fairly gentle introduction to the language, covering both how you program in R, and how you analyse data.
2. If you provide data, then yes, you can have a PDF of the pre-release version to make sure I haven’t done something silly with your dataset.

  1. 31st October, 2012 at 0:52 am

    Check out the WDI package on CRAN and let me know what you think. Contact info and usage examples posted on github: http://github.com/vincentarelbundock/WDI

  2. toke
    31st October, 2012 at 12:06 pm

    Would you be interested in house price data with a lot of explanatory variables?

    • 31st October, 2012 at 16:08 pm

      Absolutely. Please drop me an email. Thanks.

  3. Anonymous
    31st October, 2012 at 22:56 pm

    What’s the focus of your book?

  4. anspiess
    1st November, 2012 at 8:08 am

    How about some high-dimensional (31 x 55000) microarray data to show the merits and differences of hierarchical clustering and PCA?

    • 1st November, 2012 at 8:41 am

      Sounds wonderful. You might have to talk me through those merits and differences; it’s a while since I’ve done any PCA. Email me at richierocks AT gmail DOT com with your thoughts and data. Thanks.

  5. 15th July, 2014 at 18:34 pm

    I’ve added a new R package called archdata that contains 11 data sets from archaeological sites. Eventually it will be about double that by adding the data sets on my webpage at http://people.tamu.edu/~dcarlson/quant/data/index.html
