<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>4D Pie Charts</title>
	<atom:link href="http://4dpiecharts.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://4dpiecharts.com</link>
	<description>Scientific computing, data viz and general geekery, with examples in R and MATLAB.</description>
	<lastBuildDate>Fri, 03 May 2013 13:19:05 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='4dpiecharts.com' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>4D Pie Charts</title>
		<link>http://4dpiecharts.com</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://4dpiecharts.com/osd.xml" title="4D Pie Charts" />
	<atom:link rel='hub' href='http://4dpiecharts.com/?pushpress=hub'/>
		<item>
		<title>A brainfuck interpreter for R</title>
		<link>http://4dpiecharts.com/2013/04/24/a-brainfuck-interpreter-for-r/</link>
		<comments>http://4dpiecharts.com/2013/04/24/a-brainfuck-interpreter-for-r/#comments</comments>
		<pubDate>Wed, 24 Apr 2013 22:51:25 +0000</pubDate>
		<dc:creator>richierocks</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[brainfuck]]></category>
		<category><![CDATA[interpreter]]></category>
		<category><![CDATA[r]]></category>
		<category><![CDATA[turing-tarpit]]></category>

		<guid isPermaLink="false">http://4dpiecharts.com/?p=515</guid>
		<description><![CDATA[The deadline for my book on R is fast approaching, so naturally I&#8217;m in full procrastination mode.  So much so that I&#8217;ve spent this evening creating a brainfuck interpreter for R.  brainfuck is a very simple programming language: you get an array of 30000 bytes, an index, and just eight commands.  You move the [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=515&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>The deadline for my book on R is fast approaching, so naturally I&#8217;m in full procrastination mode.  So much so that I&#8217;ve spent this evening creating a brainfuck interpreter for R.  brainfuck is a very simple programming language: you get an array of 30000 bytes, an index, and just eight commands.  You move the index left or right along the array with <code>&lt;</code> and <code>&gt;</code>; increase or decrease the value at the current position with <code>+</code> and <code>-</code>; write and read characters using <code>.</code> and <code>,</code>; and start and end loops with <code>[</code> and <code>]</code>.</p>
<p>There seem to be two approaches to creating a brainfuck interpreter: directly execute the commands, or generate code in a sensible language and execute that.  I&#8217;ve opted for the latter approach because it&#8217;s easier, at least in R.  Generating R code and then calling <code>eval</code> is probably a little slower than directly executing commands, but that&#8217;s the least of your worries with brainfuck.  Even writing a trivial page-long program will take you many million times longer than it takes to execute.</p>
<p>The fact that you have to mix data variables (that 30000 element raw vector and an index) with commands means that an object oriented approach is useful.  The whole interpreter is stored in a single reference class, of type brainfuck.  Rather than me showing you all the code here, I suggest that you take a look at it (or clone it) from <a href="https://bitbucket.org/richierocks/brainfuck">its repository on bitbucket</a>.  (I&#8217;ll submit to CRAN soon.)</p>
<p>Here&#8217;s a Hello World example taken from <a href="http://en.wikipedia.org/wiki/Brainfuck#Hello_World.21">Wikipedia</a>.  To use the brainfuck package, you just create/import your brainfuck program as a character vector (non-command characters are ignored, so you can comment your code).  Call <code>fuckbrain</code> once to create the interpreter variable, then call its <code>interpret</code> method on each program that you want to run.</p>
<pre class="brush: r; title: ; notranslate">
library(brainfuck)
hello_world &lt;- &quot;+++++ +++++  initialize counter (cell #0) to 10
[                            use loop to set the next four cells to 70/100/30/10
    &gt; +++++ ++               add  7 to cell #1
    &gt; +++++ +++++            add 10 to cell #2 
    &gt; +++                    add  3 to cell #3
    &gt; +                      add  1 to cell #4
    &lt;&lt;&lt;&lt; -                   decrement counter (cell #0)
]                   
&gt; ++ .                       print 'H'
&gt; + .                        print 'e'
+++++ ++ .                   print 'l'
.                            print 'l'
+++ .                        print 'o'
&gt; ++ .                       print ' '
&lt;&lt; +++++ +++++ +++++ .       print 'W'
&gt; .                          print 'o'
+++ .                        print 'r'
----- - .                    print 'l'
----- --- .                  print 'd'
&gt; + .                        print '!'
&gt; .                          print '\n'&quot;
bfi &lt;- fuckbrain()
bfi$interpret(hello_world)
</pre>
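<p>The generate-then-evaluate approach described above can be sketched in a few lines.  This is a minimal illustration rather than the code from the brainfuck package: the function names <code>bf_to_r</code> and <code>run_bf</code> are hypothetical, the <code>,</code> input command is left out, cells are assumed to wrap around at 256, and the output is collected into a string instead of being printed.</p>
<pre class="brush: r; title: ; notranslate">
# Hypothetical sketch, not the brainfuck package's implementation.
# Translate each brainfuck command into one line of R code.
bf_to_r &lt;- function(program)
{
  translations &lt;- c(
    &quot;&gt;&quot; = &quot;ptr &lt;- ptr + 1L&quot;,
    &quot;&lt;&quot; = &quot;ptr &lt;- ptr - 1L&quot;,
    &quot;+&quot; = &quot;tape[ptr] &lt;- (tape[ptr] + 1L) %% 256L&quot;,
    &quot;-&quot; = &quot;tape[ptr] &lt;- (tape[ptr] - 1L) %% 256L&quot;,
    &quot;.&quot; = &quot;out &lt;- c(out, tape[ptr])&quot;,
    &quot;[&quot; = &quot;while(tape[ptr] != 0L) {&quot;,
    &quot;]&quot; = &quot;}&quot;
  )
  # Split the program into single characters and drop non-commands
  commands &lt;- unlist(strsplit(program, &quot;&quot;))
  commands &lt;- commands[commands %in% names(translations)]
  unname(translations[commands])
}

# Evaluate the generated code in its own environment
run_bf &lt;- function(program, n_cells = 30000L)
{
  env &lt;- new.env()
  env$tape &lt;- integer(n_cells)  # the data array, all zeroes
  env$ptr &lt;- 1L                 # R indexes from 1
  env$out &lt;- integer()          # character codes written by '.'
  eval(parse(text = bf_to_r(program)), envir = env)
  intToUtf8(env$out)
}

# 8 * 8 + 1 = 65, the character code for 'A'
run_bf(&quot;++++++++[&gt;++++++++&lt;-]&gt;+.&quot;)
</pre>
<p>Non-command characters are dropped before translation, which is why comments in a program are harmless as long as they avoid the eight command characters.</p>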
<br /> Tagged: <a href='http://4dpiecharts.com/tag/brainfuck/'>brainfuck</a>, <a href='http://4dpiecharts.com/tag/interpreter/'>interpreter</a>, <a href='http://4dpiecharts.com/tag/r/'>r</a>, <a href='http://4dpiecharts.com/tag/turing-tarpit/'>turing-tarpit</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/4dpiecharts.wordpress.com/515/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/4dpiecharts.wordpress.com/515/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=515&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://4dpiecharts.com/2013/04/24/a-brainfuck-interpreter-for-r/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/85c1a01d3843ecaa30abd003b65811a8?s=96&#38;d=identicon&#38;r=PG" medium="image">
			<media:title type="html">richierocks</media:title>
		</media:content>
	</item>
		<item>
		<title>A little Christmas Present for you</title>
		<link>http://4dpiecharts.com/2012/12/25/a-little-christmas-present-for-you/</link>
		<comments>http://4dpiecharts.com/2012/12/25/a-little-christmas-present-for-you/#comments</comments>
		<pubDate>Tue, 25 Dec 2012 08:34:07 +0000</pubDate>
		<dc:creator>richierocks</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[bad data handbook]]></category>
		<category><![CDATA[book]]></category>
		<category><![CDATA[chemistry]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[HSL]]></category>
		<category><![CDATA[statistics]]></category>

		<guid isPermaLink="false">http://4dpiecharts.com/?p=504</guid>
		<description><![CDATA[Here&#8217;s an excerpt from my chapter &#8220;Blood, sweat and urine&#8221; from The Bad Data Handbook. Have a lovely Christmas! I spent six years working in the statistical modeling team at the UK’s Health and Safety Laboratory. A large part of my job was working with the laboratory’s chemists, looking at occupational exposure to various nasty substances to see [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=504&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Here&#8217;s an excerpt from my chapter &#8220;Blood, sweat and urine&#8221; from <a href="http://shop.oreilly.com/product/0636920024422.do" title="The Bad Data Handbook">The Bad Data Handbook</a>.  Have a lovely Christmas!</p>
<blockquote><p>I spent six years working in the statistical modeling team at the UK’s Health and Safety<br />Laboratory. A large part of my job was working with the laboratory’s chemists, looking<br />at occupational exposure to various nasty substances to see if an industry was adhering<br />to safe limits. The laboratory gets sent tens of thousands of blood and urine samples<br />each year (and sometimes more exotic fluids like sweat or saliva), and has its own team<br />of occupational hygienists who visit companies and collect yet more samples.<br />The sample collection process is known as “biological monitoring.” This is because when<br />the occupational hygienists get home and their partners ask “How was your day?,” “I’ve<br />been biological monitoring, darling” is more respectable to say than “I spent all day<br />getting welders to wee into a vial.”<br />In 2010, I was lucky enough to be given a job swap with James, one of the chemists.<br />James’s parlour trick is that, after running many thousands of samples, he can tell the<br />level of creatinine in someone’s urine with uncanny accuracy, just by looking at it. This<br />skill was only revealed to me after we’d spent an hour playing “guess the creatinine level”<br />and James had suggested that “we make it more interesting.” I’d lost two packets of fig<br />rolls before I twigged that I was onto a loser.</p>
<p>The principle of the job swap was that I would spend a week in the lab assisting with<br />the experiments, and then James would come to my office to help out generating the<br />statistics. In the process, we’d both learn about each other’s working practices and find<br />ways to make future projects more efficient.<br />In the laboratory, I learned how to pipette (harder than it looks), and about the methods<br />used to ensure that the numbers spat out of the mass spectrometer were correct. So as<br />well as testing urine samples, within each experiment you need to test blanks (distilled<br />water, used to clean out the pipes, and also to check that you are correctly measuring<br />zero), calibrators (samples of a known concentration for calibrating the instrument),<br />and quality controllers (samples with a concentration in a known range, to make sure<br />the calibration hasn’t drifted). On top of this, each instrument needs regular maintaining<br />and recalibrating to ensure its accuracy.<br />Just knowing that these things have to be done to get sensible answers out of the<br />machinery was a small revelation. Before I’d gone into the job swap, I didn’t really think<br />about where my data came from; that was someone else’s problem. From my point of<br />view, if the numbers looked wrong (extreme outliers, or otherwise dubious values) they<br />were a mistake; otherwise they were simply “right.” Afterwards, my view is more<br />nuanced. Now all the numbers look like, maybe not quite a guess, but certainly only an<br />approximation of the truth. This measurement error is important to remember, though<br />for health and safety purposes, there’s a nice feature. Values can be out by an order of<br />magnitude at the extreme low end for some tests, but we don’t need to worry so much<br />about that. It’s the high exposures that cause health problems, and measurement error<br />is much smaller at the top end.</p>
</blockquote>
<p> </p>
<br /> Tagged: <a href='http://4dpiecharts.com/tag/bad-data-handbook/'>bad data handbook</a>, <a href='http://4dpiecharts.com/tag/book/'>book</a>, <a href='http://4dpiecharts.com/tag/chemistry/'>chemistry</a>, <a href='http://4dpiecharts.com/tag/data/'>data</a>, <a href='http://4dpiecharts.com/tag/hsl/'>HSL</a>, <a href='http://4dpiecharts.com/tag/statistics/'>statistics</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/4dpiecharts.wordpress.com/504/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/4dpiecharts.wordpress.com/504/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=504&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://4dpiecharts.com/2012/12/25/a-little-christmas-present-for-you/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/85c1a01d3843ecaa30abd003b65811a8?s=96&#38;d=identicon&#38;r=PG" medium="image">
			<media:title type="html">richierocks</media:title>
		</media:content>
	</item>
		<item>
		<title>Have my old job!</title>
		<link>http://4dpiecharts.com/2012/11/14/have-my-old-job/</link>
		<comments>http://4dpiecharts.com/2012/11/14/have-my-old-job/#comments</comments>
		<pubDate>Wed, 14 Nov 2012 19:55:36 +0000</pubDate>
		<dc:creator>richierocks</dc:creator>
				<category><![CDATA[MATLAB]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[acslX]]></category>
		<category><![CDATA[career]]></category>
		<category><![CDATA[HSL]]></category>
		<category><![CDATA[job]]></category>
		<category><![CDATA[matlab]]></category>
		<category><![CDATA[r]]></category>

		<guid isPermaLink="false">https://4dpiecharts.wordpress.com/?p=502</guid>
		<description><![CDATA[My old job at the Health &#38; Safety Laboratory is being advertised, and at a higher pay grade to boot.  (Though it is still civil service pay, and thus not going to make you rich.) You&#8217;ll need to have solid mathematical modelling skills, particularly solving systems of ODEs, and be proficient at writing scientific code, [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=502&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>My old job at the Health &amp; Safety Laboratory is being <a href="http://www.HSL.gov.uk/careers/current-vacancies/senior-mathemtical-modeller.aspx">advertised</a>, and at a higher pay grade to boot.  (Though it is still civil service pay, and thus not going to make you rich.)</p>
<p>You&#8217;ll need to have solid mathematical modelling skills, particularly solving systems of ODEs, and be proficient at writing scientific code, preferably R or MATLAB or acslX. From chats with a few people at the lab, management are especially keen to get someone who can bring in money, so grant writing and blagging skills are important too.</p>
<p>It&#8217;s a smashing place to work and the people are lovely.  Also, you get flexitime and loads of holiday.  If you are looking for a maths job in North West* England then I can heartily recommend applying.</p>
<p>*Buxton is sometimes in North West England (when we get BBC local news) and sometimes in the East Midlands (like when we vote in European elections).</p>
<br /> Tagged: <a href='http://4dpiecharts.com/tag/acslx/'>acslX</a>, <a href='http://4dpiecharts.com/tag/career/'>career</a>, <a href='http://4dpiecharts.com/tag/hsl/'>HSL</a>, <a href='http://4dpiecharts.com/tag/job/'>job</a>, <a href='http://4dpiecharts.com/tag/matlab/'>matlab</a>, <a href='http://4dpiecharts.com/tag/r/'>r</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/4dpiecharts.wordpress.com/502/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/4dpiecharts.wordpress.com/502/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=502&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://4dpiecharts.com/2012/11/14/have-my-old-job/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/85c1a01d3843ecaa30abd003b65811a8?s=96&#38;d=identicon&#38;r=PG" medium="image">
			<media:title type="html">richierocks</media:title>
		</media:content>
	</item>
		<item>
		<title>Indexing with factors</title>
		<link>http://4dpiecharts.com/2012/11/08/indexing-with-factors/</link>
		<comments>http://4dpiecharts.com/2012/11/08/indexing-with-factors/#comments</comments>
		<pubDate>Thu, 08 Nov 2012 10:59:49 +0000</pubDate>
		<dc:creator>richierocks</dc:creator>
				<category><![CDATA[R]]></category>

		<guid isPermaLink="false">http://4dpiecharts.com/?p=500</guid>
		<description><![CDATA[This is a silly problem that bit me again recently. It&#8217;s an elementary mistake that I&#8217;ve somehow repeatedly failed to learn to avoid in eight years of R coding. Here&#8217;s an example to demonstrate. Suppose we create a data frame with a categorical column, in this case the heights of ten adults along with their [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=500&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>This is a silly problem that bit me again recently.  It&#8217;s an elementary mistake that I&#8217;ve somehow repeatedly failed to learn to avoid in eight years of R coding.  Here&#8217;s an example to demonstrate.</p>
<p>Suppose we create a data frame with a categorical column, in this case the heights of ten adults along with their gender.</p>
<pre class="brush: r; title: ; notranslate">
(heights &lt;- data.frame(
  height_cm = c(153, 181, 150, 172, 165, 149, 174, 169, 198, 163),
  gender    = c(&quot;female&quot;, &quot;male&quot;, &quot;female&quot;, &quot;male&quot;, &quot;male&quot;, &quot;female&quot;, &quot;female&quot;, &quot;male&quot;, &quot;male&quot;, &quot;female&quot;)
))
</pre>
<p>Using a factory fresh copy of R, the gender column will be assigned a factor with two levels: &#8220;female&#8221; and then &#8220;male&#8221;.  This is all well and good, though the column can be kept as characters by setting <code>stringsAsFactors = FALSE</code>.</p>
<p>Now suppose that we want to assign a body weight to these people, based upon a gender average.</p>
<pre class="brush: r; title: ; notranslate">
avg_body_weight_kg &lt;- c(male = 78, female = 63)
</pre>
<p>Pop quiz: what does this next line of code give us?</p>
<pre class="brush: r; title: ; notranslate">
avg_body_weight_kg[heights$gender]  
</pre>
<p>Well, the first value of <code>heights$gender</code> is &#8220;female&#8221;, so the first value should be 63, and the second value of <code>heights$gender</code> is &#8220;male&#8221;, so the second value should be 78, and so on.  Let&#8217;s try it.</p>
<pre class="brush: r; title: ; notranslate">
avg_body_weight_kg[heights$gender]  
#  male female   male female female   male   male female female   male 
#    78     63     78     63     63     78     78     63     63     78 
</pre>
<p>Uh-oh, the values are reversed.  So what really happened?  When you use a factor as an index, <em>R silently converts it to an integer vector</em>. That means that the first index of &#8220;female&#8221; is converted to 1, giving a value of 78, and so on.</p>
<p>The fundamental problem is that there are two natural interpretations of a factor index &ndash; character indexing or integer indexing.  Since these can give conflicting results, ideally R would provide a warning when you use a factor index.  Until such a change gets implemented, I suggest that best practice is to always explicitly convert factors to integer or to character before you use them in an index.</p>
<pre class="brush: r; title: ; notranslate">         
avg_body_weight_kg[as.character(heights$gender)]  
avg_body_weight_kg[as.integer(heights$gender)]
</pre>
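<p>To see the difference between the two conversions, here is a small self-contained check.  It is only a sketch: the three-element <code>gender</code> vector is made up for illustration, and the factor is built explicitly with <code>factor</code> because, since R 4.0.0, <code>data.frame</code> no longer converts strings to factors by default.</p>
<pre class="brush: r; title: ; notranslate">
# Illustrative data; the factor levels are sorted alphabetically,
# so female has code 1 and male has code 2
gender &lt;- factor(c(&quot;female&quot;, &quot;male&quot;, &quot;female&quot;))
avg_body_weight_kg &lt;- c(male = 78, female = 63)

# A bare factor index uses the underlying integer codes
by_factor &lt;- avg_body_weight_kg[gender]

# Explicit conversions make the intent clear
by_name &lt;- avg_body_weight_kg[as.character(gender)]
by_code &lt;- avg_body_weight_kg[as.integer(gender)]

identical(by_factor, by_code)  # TRUE: the factor was used as integers
unname(by_name)                # 63 78 63, matched by name
unname(by_code)                # 78 63 78, matched by position
</pre>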
<br />  <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/4dpiecharts.wordpress.com/500/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/4dpiecharts.wordpress.com/500/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=500&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://4dpiecharts.com/2012/11/08/indexing-with-factors/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/85c1a01d3843ecaa30abd003b65811a8?s=96&#38;d=identicon&#38;r=PG" medium="image">
			<media:title type="html">richierocks</media:title>
		</media:content>
	</item>
		<item>
		<title>Make your data famous!</title>
		<link>http://4dpiecharts.com/2012/10/30/make-your-data-famous/</link>
		<comments>http://4dpiecharts.com/2012/10/30/make-your-data-famous/#comments</comments>
		<pubDate>Tue, 30 Oct 2012 19:38:51 +0000</pubDate>
		<dc:creator>richierocks</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[book]]></category>
		<category><![CDATA[data]]></category>
		<category><![CDATA[r]]></category>
		<category><![CDATA[stats]]></category>

		<guid isPermaLink="false">http://4dpiecharts.com/?p=496</guid>
		<description><![CDATA[I&#8217;m writing a book on R for O&#8217;Reilly, and I need interesting datasets for the examples. Any data that you provide will get you a mention in the book and in the publicity material, so it&#8217;s a great opportunity to publicise your work or your organisation. Datasets from any area or industry are suitable; the [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=496&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>I&#8217;m writing a book on R for O&#8217;Reilly, and I need interesting datasets for the examples.  Any data that you provide will get you a mention in the book and in the publicity material, so it&#8217;s a great opportunity to publicise your work or your organisation.</p>
<p>Datasets from any area or industry are suitable; the only constraint is that the data can be analysed with a few pages of R code to provide a result that might make a general reader go &#8220;ooh&#8221;.  There&#8217;s a chapter on data cleaning, so even dirty data is suitable!</p>
<p>All the data will be provided in an R package to accompany the book, so you need to be willing to make it publicly available.  I can help you anonymise the data, or strip out commercially sensitive parts if you require.  </p>
<p>If you can provide anything, or you know someone who might be able to, then drop me an email at richierocks AT gmail DOT com.  Thanks.</p>
<p>EDIT: There are some (quite) frequently asked questions already!  Here are the answers; you can use your Jeopardy! skills to guess the questions.<br />
1. The book is called &#8220;Learning R&#8221;, and it&#8217;s a fairly gentle introduction to the language, covering both how you program in R, and how you analyse data.<br />
2. If you provide data, then yes, you can have a PDF of the pre-release version to make sure I haven&#8217;t done something silly with your dataset.</p>
<br /> Tagged: <a href='http://4dpiecharts.com/tag/book/'>book</a>, <a href='http://4dpiecharts.com/tag/data/'>data</a>, <a href='http://4dpiecharts.com/tag/r/'>r</a>, <a href='http://4dpiecharts.com/tag/stats/'>stats</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/4dpiecharts.wordpress.com/496/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/4dpiecharts.wordpress.com/496/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=496&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://4dpiecharts.com/2012/10/30/make-your-data-famous/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/85c1a01d3843ecaa30abd003b65811a8?s=96&#38;d=identicon&#38;r=PG" medium="image">
			<media:title type="html">richierocks</media:title>
		</media:content>
	</item>
		<item>
		<title>Look ma! No typing! Autorunning code on R startup</title>
		<link>http://4dpiecharts.com/2012/07/20/look-ma-no-typing-autorunning-code-on-r-startup/</link>
		<comments>http://4dpiecharts.com/2012/07/20/look-ma-no-typing-autorunning-code-on-r-startup/#comments</comments>
		<pubDate>Fri, 20 Jul 2012 13:47:25 +0000</pubDate>
		<dc:creator>richierocks</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[gui]]></category>
		<category><![CDATA[gWidgets]]></category>
		<category><![CDATA[r]]></category>
		<category><![CDATA[startup]]></category>

		<guid isPermaLink="false">http://4dpiecharts.com/?p=493</guid>
		<description><![CDATA[Regular readers may know that I often make R-based GUIs. They&#8217;re great for giving non-technical users safe and easy access to statistical models. The safety comes from the restrictions of a GUI: you can limit what the user does more easily than with a command line, helping to reduce the number of opportunities for bad [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=493&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Regular readers may know that I often make R-based GUIs.  They&#8217;re great for giving non-technical users safe and easy access to statistical models.  The safety comes from the restrictions of a GUI: you can limit what the user does more easily than with a command line, helping to reduce the number of opportunities for bad science.  My tool of choice for GUI building is John Verzani&#8217;s set of gWidgets packages; see my <a href="http://4dpiecharts.com/2010/10/06/creating-guis-in-r-with-gwidgets/">introduction</a> and <a href="http://4dpiecharts.com/2012/02/20/gui-building-in-r-gwidgets-vs-deducer/">comparison with Deducer</a>.</p>
<p>Since the target audience is non-technical, an important aim is to reduce the amount of typing at the R command prompt.  Typically, I wrap the GUI into a package, and have a single function call to load the GUI, so the users will have to start R, then type something like</p>
<pre class="brush: r; title: ; notranslate">
library(myGui)
gui &lt;- runTheGui()

#Here's an example GUI to play with now; it needs the gWidgets package
library(gWidgets)
runTheGui &lt;- function()
{
  win &lt;- gwindow(&quot;Test&quot;, visible = FALSE)
  rad &lt;- gradio(letters[1:4], cont = win)
  visible(win) &lt;- TRUE
  focus(win)
  list(win = win, rad = rad)
}
</pre>
<p>Typing two lines isn&#8217;t <em>too</em> onerous, but the ideal situation would involve no typing at all.  That is, you double click a shortcut that opens R and then the GUI.  With a little help from the internet, I have two solutions.  Which one is best depends upon your setup.</p>
<p>The first solution was suggested to me by my collaborators Simon and Mark over at <a href="http://drunks-and-lampposts.com/2012/06/18/r-creating-a-shortcut-to-run-a-gwidgets-gui/">Drunks &amp; Lampposts</a>, who got it from <a href="http://stackoverflow.com/questions/10312417/running-an-r-script-using-a-windows-shortcut">Greg Snow</a>.  I&#8217;ve refined the technique to make it simpler.</p>
<p>There are two tricks involved.  Firstly, when R (at least R GUI; Eclipse, RStudio, emacs, etc. may require configuration) starts up, by default it will run a function named <code>.First</code>, if that function exists.  So our first task is to put those previous lines of code inside that function.</p>
<pre class="brush: r; title: ; notranslate">
.First &lt;- function()
{
  library(myGui)
  gui &lt;&lt;- runTheGui()
}
</pre>
<p>Then, we save that function into an R binary workspace file.</p>
<pre class="brush: r; title: ; notranslate">
save(.First, file = &quot;~/Desktop/runTheGui.RData&quot;)
</pre>
<p>The second trick is that (assuming your operating system has been <a href="http://windows.microsoft.com/en-us/windows7/Change-which-programs-Windows-uses-by-default">configured correctly</a>), double-clicking a <code>.RData</code> file will start R GUI, loading said <code>.RData</code> file, and running the contents of that <code>.First</code> function.</p>
<p>So all the user needs to do is double click the RData file, and the GUI will run.</p>
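<p>Before handing the file to users, it can be worth checking that the workspace file really contains the <code>.First</code> function.  The following is just a sketch of one way to do that: it uses a placeholder <code>.First</code> that prints a message instead of starting a GUI, and a temporary file rather than the Desktop path used above.</p>
<pre class="brush: r; title: ; notranslate">
# Placeholder startup function; the real one would load the GUI package
.First &lt;- function()
{
  message(&quot;The GUI would start here&quot;)
}

# Save it to a temporary workspace file (illustrative location)
rdata_file &lt;- tempfile(fileext = &quot;.RData&quot;)
save(.First, file = rdata_file)

# Load the file into a fresh environment and inspect its contents
check_env &lt;- new.env()
loaded_names &lt;- load(rdata_file, envir = check_env)
loaded_names                   # the names of the objects in the file
is.function(check_env$.First)  # should be TRUE
</pre>
<p>Loading into a separate environment keeps the check from overwriting anything in your current workspace.</p>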
<p>This is exactly what we wanted, but it has a small drawback in that R GUI isn&#8217;t available on all platforms.  Also, if you really don&#8217;t want users to type things, then you may not want R GUI at all.  In that case, using <code>Rscript</code> (as suggested by <a href="http://stackoverflow.com/a/11565945/134830">Dirk Eddelbuettel</a>) is a better solution.  <code>Rscript</code> is a little bit like batch mode.  It opens R in a terminal, runs a script, then closes R again.  So for this solution, we need to create a script.  This time we don&#8217;t need to wrap the contents inside a function.  We do need to add something to the end of the script to prevent R closing down once the script has run, such as a check that the window is still open.  (Note that since the R console won&#8217;t be available this time, this solution isn&#8217;t that useful for non-GUI purposes.)</p>
<pre class="brush: r; title: ; notranslate">
library(myGui)
gui &lt;- runTheGui()
while(isExtant(gui$win)) Sys.sleep(1)
</pre>
<p>Now to get the GUI running, you create a shortcut to Rscript, with the script file as an argument.  Change the path to R and to the script file as appropriate.</p>
<pre class="brush: plain; title: ; notranslate">
&quot;%ProgramW6432%\R\R-2.15.1\bin\Rscript.exe&quot; &quot;path/to/your/script/runTheGui.R&quot;
</pre>
<p>And voila! We have a GUI running in R, again from a simple, single double-click, and this time R closes itself down when the GUI closes.</p>
<br /> Tagged: <a href='http://4dpiecharts.com/tag/gui/'>gui</a>, <a href='http://4dpiecharts.com/tag/gwidgets/'>gWidgets</a>, <a href='http://4dpiecharts.com/tag/r/'>r</a>, <a href='http://4dpiecharts.com/tag/startup/'>startup</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/4dpiecharts.wordpress.com/493/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/4dpiecharts.wordpress.com/493/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=493&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://4dpiecharts.com/2012/07/20/look-ma-no-typing-autorunning-code-on-r-startup/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/85c1a01d3843ecaa30abd003b65811a8?s=96&#38;d=identicon&#38;r=PG" medium="image">
			<media:title type="html">richierocks</media:title>
		</media:content>
	</item>
		<item>
		<title>How long does it take to get pregnant?</title>
		<link>http://4dpiecharts.com/2012/06/15/how-long-does-it-take-to-get-pregnant/</link>
		<comments>http://4dpiecharts.com/2012/06/15/how-long-does-it-take-to-get-pregnant/#comments</comments>
		<pubDate>Fri, 15 Jun 2012 15:09:07 +0000</pubDate>
		<dc:creator>richierocks</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[conception]]></category>
		<category><![CDATA[dataviz]]></category>
		<category><![CDATA[fertility]]></category>
		<category><![CDATA[ggplot2]]></category>
		<category><![CDATA[pregnancy]]></category>
		<category><![CDATA[r]]></category>
		<category><![CDATA[stats]]></category>

		<guid isPermaLink="false">http://4dpiecharts.com/?p=486</guid>
		<description><![CDATA[My girlfriend&#8217;s biological clock is ticking, and so we&#8217;ve started trying to spawn. Since I&#8217;m impatient, that has naturally led to questions like &#8220;how long will it take?&#8221;. If I were to believe everything on TV, the answer would be easy: have unprotected sex once and pregnancy is guaranteed. A more cynical me suggests that [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=486&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>My girlfriend&#8217;s biological clock is ticking, and so we&#8217;ve started trying to spawn.  Since I&#8217;m impatient, that has naturally led to questions like &#8220;how long will it take?&#8221;.  If I were to believe everything on TV, the answer would be easy: <a href="http://tvtropes.org/pmwiki/pmwiki.php/Main/CantGetAwayWithNuthin">have unprotected sex once and pregnancy is guaranteed</a>.</p>
<p>A more cynical me suggests that this isn&#8217;t the case.  Unfortunately, it is surprisingly difficult to find out the monthly chance of getting pregnant (technical jargon: the &#8220;monthly fecundity rate&#8221;, or MFR), given that you are having regular sex in the days leading up to ovulation.  Everyone agrees that age has a big effect, with women&#8217;s peak fertility occurring somewhere around the age of 25.  Beyond that point, the internet is filled with near-useless summary statistics like the chance of conceiving after one year.  For example, the usually reliable <a href="http://www.nhs.uk/chq/Pages/2295.aspx?CategoryID=54&amp;SubCategoryID=127">NHS site</a> says</p>
<blockquote><p>Women become less fertile as they get older. For women aged 35, about 94 out of every 100<br />
who have regular unprotected sex will get pregnant after three years of trying. However, for<br />
women aged 38, only 77 out of every 100 will do so.</p></blockquote>
<p>I found a couple of reasonably sciency links (<a href="http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3017326/">George and Kamath</a>, <a href="http://www.socalfertility.com/age-and-fertility.html">Socal Fertility</a>) that suggest that the MFR is about 25% for a woman aged 25, and 10% at age 35.  The Socal link also gives rates of 15% at age 30, 5% at age 40 and less than 1% at age 45.  If the woman is too fat, too thin, a smoker, or has hormone problems, or is stressed, then the rate needs reducing.</p>
<p>Given the MFR, the probability of getting pregnant after a given number of months can be calculated with a negative binomial distribution.  (With only one success required, this is the same as a geometric distribution.)</p>
<pre class="brush: r; title: ; notranslate">
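# Months of trying, and the assumed MFR for each age (named by age in years)
months &lt;- 0:60
p_preg_per_month &lt;- c(&quot;25&quot; = 0.25, &quot;30&quot; = 0.15, &quot;35&quot; = 0.1, &quot;40&quot; = 0.05, &quot;45&quot; = 0.01)
# pnbinom(q, 1, p) is the chance of at most q failed months before the first
# success, so conceiving within n months corresponds to q = n - 1
p_success &lt;- unlist(lapply(
  p_preg_per_month, 
  function(p) pnbinom(months - 1, 1, p)
))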
</pre>
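<p>As a sanity check on this calculation, here is a small sketch (not part of the original analysis) that compares <code>pnbinom</code> with the closed form for a geometric distribution: the chance of conceiving within <code>n</code> months is <code>1 - (1 - p)^n</code>.  The names <code>p</code>, <code>n</code>, <code>closed_form</code> and <code>via_pnbinom</code> are only for illustration.</p>
<pre class="brush: r; title: ; notranslate">
p &lt;- 0.25                         # assumed MFR at age 25
n &lt;- 1:24                         # months of trying
# Chance of at least one success in n independent months
closed_form &lt;- 1 - (1 - p) ^ n
# pnbinom counts failures before the first success, hence n - 1
via_pnbinom &lt;- pnbinom(n - 1, size = 1, prob = p)
all.equal(closed_form, via_pnbinom)   # TRUE
round(closed_form[c(1, 2, 12)], 2)    # 0.25 0.44 0.97
</pre>
<p>The second value is the &#8220;almost half within two months&#8221; figure for healthy 25 year olds.</p>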
<p>Now we just create a data frame suitable for passing to ggplot2 &#8230;</p>
<pre class="brush: r; title: ; notranslate">
mfr_group &lt;- paste(
  &quot;MFR =&quot;, 
  format(p_preg_per_month, digits = 2), 
  &quot;at age&quot;, 
  names(p_preg_per_month)
)
mfr_group &lt;- factor(mfr_group, levels = mfr_group)
preg_data &lt;- data.frame(
  months = rep.int(months, length(mfr_group))  ,
  mfr_group = rep(mfr_group, each = length(months)),
  p_success = p_success
)
</pre>
<p>and draw the plot.</p>
<pre class="brush: r; title: ; notranslate">
library(ggplot2)
(p &lt;- ggplot(preg_data, aes(months, p_success, colour = mfr_group)) +
  geom_point() +
  scale_x_continuous(breaks = seq.int(0, 60, 12)) +
  scale_y_continuous(breaks = seq.int(0, 1, 0.1), limits = c(0, 1)) +
  scale_colour_discrete(&quot;Monthly fecundity rate&quot;) +
  xlab(&quot;Months&quot;) +
  ylab(&quot;Probability of conception&quot;) +
  theme(panel.grid.major = element_line(colour = &quot;grey60&quot;))
)
</pre>
<p><a href="http://4dpiecharts.files.wordpress.com/2012/06/probability_of_conception_by_month.png"><img src="http://4dpiecharts.files.wordpress.com/2012/06/probability_of_conception_by_month.png?w=595" alt="The plot shows the probability of conception by number of months of trying for different age groups. " title="Probability of conception by number of months of trying"   class="aligncenter size-full wp-image-487" /></a></p>
<p>So almost half of the (healthy) 25 year olds get pregnant in the first <del datetime="2012-07-03T15:47:50+00:00">month</del> two months, and after two years (the point when doctors start considering you to have fertility problems) more than 90% of 35 year olds should conceive.  By contrast, just over 20% of 45 year old women will.  In fact, even this statistic is over-optimistic: at this age, fertility is rapidly decreasing, so a 1% MFR at age 45 will mean a much lower MFR at age 47, and the negative binomial model, which assumes a constant rate, breaks down.</p>
<p>Of course, from a male point of view, conception is an embarrassingly parallel problem: you can dramatically reduce the time to conceive a child by sleeping with lots of women at once.  (DISCLAIMER: Janette, if you&#8217;re reading this, I&#8217;m not practising or advocating this technique!)</p>
<br /> Tagged: <a href='http://4dpiecharts.com/tag/conception/'>conception</a>, <a href='http://4dpiecharts.com/tag/dataviz/'>dataviz</a>, <a href='http://4dpiecharts.com/tag/fertility/'>fertility</a>, <a href='http://4dpiecharts.com/tag/ggplot2/'>ggplot2</a>, <a href='http://4dpiecharts.com/tag/pregnancy/'>pregnancy</a>, <a href='http://4dpiecharts.com/tag/r/'>r</a>, <a href='http://4dpiecharts.com/tag/stats/'>stats</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/4dpiecharts.wordpress.com/486/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/4dpiecharts.wordpress.com/486/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=486&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://4dpiecharts.com/2012/06/15/how-long-does-it-take-to-get-pregnant/feed/</wfw:commentRss>
		<slash:comments>38</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/85c1a01d3843ecaa30abd003b65811a8?s=96&#38;d=identicon&#38;r=PG" medium="image">
			<media:title type="html">richierocks</media:title>
		</media:content>

		<media:content url="http://4dpiecharts.files.wordpress.com/2012/06/probability_of_conception_by_month.png" medium="image">
			<media:title type="html">Probability of conception by number of months of trying</media:title>
		</media:content>
	</item>
		<item>
		<title>Be assertive!</title>
		<link>http://4dpiecharts.com/2012/05/30/be-assertive/</link>
		<comments>http://4dpiecharts.com/2012/05/30/be-assertive/#comments</comments>
		<pubDate>Wed, 30 May 2012 12:14:38 +0000</pubDate>
		<dc:creator>richierocks</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[assertive]]></category>
		<category><![CDATA[packages]]></category>
		<category><![CDATA[r]]></category>
		<category><![CDATA[robust-code]]></category>
		<category><![CDATA[stats]]></category>

		<guid isPermaLink="false">http://4dpiecharts.com/?p=480</guid>
		<description><![CDATA[assertive, my new package for writing robust code, is now on CRAN. It consists of lots of is functions for checking variables, and corresponding assert functions that throw an error if the condition doesn&#8217;t hold. For example, is_a_number checks that the input is numeric and scalar. In the last two cases, the return value of [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=480&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p><a href="http://4dpiecharts.files.wordpress.com/2012/05/assert_package_is_awesome.png"><img src="http://4dpiecharts.files.wordpress.com/2012/05/assert_package_is_awesome.png?w=595" alt="assert_package_is_awesome(&quot;assertive&quot;) returns TRUE." title="This function doesn&#039;t exist, but if it did, it would return TRUE."   class="aligncenter size-full wp-image-481" /></a></p>
<p>assertive, my new package for writing robust code, is now on CRAN.  It consists of lots of <code>is</code> functions for checking variables, and corresponding <code>assert</code> functions that throw an error if the condition doesn&#8217;t hold.  For example, <code>is_a_number</code> checks that the input is numeric and scalar.</p>
<pre class="brush: r; title: ; notranslate">
is_a_number(1)     #TRUE
is_a_number(&quot;a&quot;)   #FALSE
is_a_number(1:10)  #FALSE
</pre>
<p>In the last two cases, the return value of FALSE has an attribute &#8220;<code>cause</code>&#8221; that indicates the cause of failure. When &#8220;a&#8221; is the input, the cause is &#8220;<code>"a" is not of type 'numeric'.</code>&#8221;, whereas for <code>1:10</code>, the cause is &#8220;<code>1:10 does not have length one.</code>&#8221;.  You can get or set the cause attribute with the <code>cause</code> function.</p>
<pre class="brush: r; title: ; notranslate">
m &lt;- lm(uptake ~ 1, CO2)
ok &lt;- is_empty_model(m)
if(!ok) cause(ok)
</pre>
<p>The <code>assert</code> functions call an <code>is</code> function, and if the result is FALSE, they throw an error; otherwise they do nothing.</p>
<pre class="brush: r; title: ; notranslate">
assert_is_a_number(1)   #OK
assert_is_a_number(&quot;a&quot;) #Throws an error
</pre>
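<p>To make the <code>is</code>/<code>assert</code> pattern concrete, here is a minimal base R sketch of how such a pair could be built.  It is an illustration only, not assertive&#8217;s actual implementation; <code>false_because</code>, <code>is_single_number</code> and <code>assert_is_single_number</code> are hypothetical names invented for this example.</p>
<pre class="brush: r; title: ; notranslate">
# Return FALSE, with the reason attached as a &quot;cause&quot; attribute
false_because &lt;- function(msg) structure(FALSE, cause = msg)

# Hypothetical check: TRUE only for a numeric vector of length one
is_single_number &lt;- function(x) {
  if(!is.numeric(x)) return(false_because(&quot;x is not of type 'numeric'.&quot;))
  if(length(x) != 1) return(false_because(&quot;x does not have length one.&quot;))
  TRUE
}

# The matching assertion turns a failed check into an error
assert_is_single_number &lt;- function(x) {
  ok &lt;- is_single_number(x)
  if(!isTRUE(ok)) stop(attr(ok, &quot;cause&quot;), call. = FALSE)
  invisible(x)
}

is_single_number(1)                          # TRUE
attr(is_single_number(1:10), &quot;cause&quot;)   # &quot;x does not have length one.&quot;
# Capture the error message rather than halting the script
tryCatch(assert_is_single_number(&quot;a&quot;), error = conditionMessage)
</pre>
<p>Keeping the reason on the return value means the check and the error message live in one place, and the assertion is just a thin wrapper.</p>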
<p>There are also some <code>has</code> functions, primarily for checking the presence of attributes.</p>
<pre class="brush: r; title: ; notranslate">
has_names(c(foo = 1, bar = 4, baz = 9))
has_dims(matrix(1:12, nrow = 3))
</pre>
<p>Some functions apply to properties of vectors.  In this case, the <code>assert</code> functions can check that all the values conform to the condition, or any of the values conform.</p>
<pre class="brush: r; title: ; notranslate">
x &lt;- -2:2
is_positive(x)              #The last two are TRUE
assert_any_are_positive(x)  #OK
assert_all_are_positive(x)  #Error
</pre>
<p>&#8220;Why would you want to use these functions?&#8221;, you may be asking.  The dynamic typing and extreme flexibility of R means that it is very easy to have variables that are in the wrong format.  This is particularly true when you are dealing with user input.  So while you know that the sales totals passed to your function should be a vector of non-negative numbers, or that the regular expression should be a single string rather than a character vector, your user may not.  You need to check for these invalid conditions, and return an error message that the user can understand.  assertive makes it easy to do all this.</p>
<p>Since this is the first public release of assertive, it hasn&#8217;t been widely tested.  I&#8217;ve written a moderately comprehensive unit-test suite, but there are likely to be a few minor bugs here and there.  In particular, I suspect there may be one or two typos in the documentation.  Please give the package a try, and let me know if you find any errors, or if you want any other functions adding.</p>
<br /> Tagged: <a href='http://4dpiecharts.com/tag/assertive/'>assertive</a>, <a href='http://4dpiecharts.com/tag/packages/'>packages</a>, <a href='http://4dpiecharts.com/tag/r/'>r</a>, <a href='http://4dpiecharts.com/tag/robust-code/'>robust-code</a>, <a href='http://4dpiecharts.com/tag/stats/'>stats</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/4dpiecharts.wordpress.com/480/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/4dpiecharts.wordpress.com/480/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=480&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://4dpiecharts.com/2012/05/30/be-assertive/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/85c1a01d3843ecaa30abd003b65811a8?s=96&#38;d=identicon&#38;r=PG" medium="image">
			<media:title type="html">richierocks</media:title>
		</media:content>

		<media:content url="http://4dpiecharts.files.wordpress.com/2012/05/assert_package_is_awesome.png" medium="image">
			<media:title type="html">This function doesn&#039;t exist, but if it did, it would return TRUE.</media:title>
		</media:content>
	</item>
		<item>
		<title>Benford&#8217;s Law and fraud in the Russian election</title>
		<link>http://4dpiecharts.com/2012/03/05/benfords-law-and-fraud-in-the-russian-election/</link>
		<comments>http://4dpiecharts.com/2012/03/05/benfords-law-and-fraud-in-the-russian-election/#comments</comments>
		<pubDate>Mon, 05 Mar 2012 23:18:45 +0000</pubDate>
		<dc:creator>richierocks</dc:creator>
				<category><![CDATA[R]]></category>
		<category><![CDATA[Benford's Law]]></category>
		<category><![CDATA[election]]></category>
		<category><![CDATA[forensic statistics]]></category>
		<category><![CDATA[goldacre]]></category>
		<category><![CDATA[r]]></category>
		<category><![CDATA[russia]]></category>

		<guid isPermaLink="false">http://4dpiecharts.com/?p=467</guid>
		<description><![CDATA[Earlier today Ben Goldacre posted about using Benford&#8217;s Law to try and detect fraud in the Russian elections. Read that now, or the rest of this post won&#8217;t make sense. This is a loose R translation of Ben&#8217;s Stata code. The data is held in a Google doc. While it is possible to directly retrieve [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=467&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Earlier today Ben Goldacre <a href="http://www.badscience.net/2012/03/is-there-statistical-evidence-of-fraud-in-the-russian-election-data/">posted</a> about using Benford&#8217;s Law to try and detect fraud in the Russian elections.  Read that now, or the rest of this post won&#8217;t make sense.  This is a loose R translation of Ben&#8217;s Stata code.</p>
<p>The data is held in a <a href="https://docs.google.com/spreadsheet/ccc?key=0Aj-gvVaugJowdDRkUnM4S2FjOUJjTVphM1djam9VOUE#gid=0">Google doc</a>.  While it is possible to directly retrieve the contents with R, for a single document it is easier to save it as a CSV, and load it from your own machine.</p>
<pre class="brush: r; title: ; notranslate">
russian &lt;- read.csv(&quot;Russian observed results - FullData.csv&quot;)
</pre>
<p>There are loads of ways of manipulating data and plotting it in R, and while you can do everything in the base R distribution, I&#8217;m going to use a few packages to make it easier.</p>
<pre class="brush: r; title: ; notranslate">
library(plyr)     # provides ddply, used below
library(reshape)
library(stringr)
library(ggplot2)
</pre>
<p>A little transformation is needed.  We take only the columns containing the counts and manipulate the data into a &#8220;long&#8221; format with only one value per row.</p>
<pre class="brush: r; title: ; notranslate">
russian &lt;- melt(
    russian[, c(&quot;Zhirinovsky&quot;, &quot;Zyuganov&quot;, &quot;Mironov&quot;, &quot;Prokhorov&quot;, &quot;Putin&quot;)], 
    variable_name = &quot;candidate&quot;
)
</pre>
<p>Now we add columns containing the first and last digits, extracted using regular expressions.</p>
<pre class="brush: r; title: ; notranslate">
russian &lt;- ddply(
    russian, 
    .(candidate), 
    transform, 
    first.digit = str_extract(value, &quot;[123456789]&quot;),
    last.digit  = str_extract(value, &quot;[[:digit:]]$&quot;))
</pre>
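<p>To see what the two patterns pick out, here is a small base R sketch on made-up values (the vector <code>counts</code> is invented for illustration and is not taken from the election data).  It uses <code>regmatches</code> and <code>regexpr</code> rather than stringr, purely to keep the example free of dependencies.</p>
<pre class="brush: r; title: ; notranslate">
counts &lt;- c(120, 7, 3045, 98)   # hypothetical vote counts
txt &lt;- as.character(counts)
# First non-zero digit, reading from the left
first_digit &lt;- regmatches(txt, regexpr(&quot;[1-9]&quot;, txt))
# Final digit, anchored to the end of the string
last_digit &lt;- regmatches(txt, regexpr(&quot;[[:digit:]]$&quot;, txt))
first_digit   # &quot;1&quot; &quot;7&quot; &quot;3&quot; &quot;9&quot;
last_digit    # &quot;0&quot; &quot;7&quot; &quot;5&quot; &quot;8&quot;
</pre>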
<p>The table function gives us the counts of each number, and we compare these against the counts predicted by Benford&#8217;s Law.</p>
<pre class="brush: r; title: ; notranslate">
first_digit_counts &lt;- as.vector(table(russian$first.digit))
first_digit_actual_vs_expected &lt;- data.frame(
  digit            = 1:9,
  actual.count     = first_digit_counts,    
  actual.fraction  = first_digit_counts / nrow(russian),
  benford.fraction = log10(1 + 1 / (1:9))
)
</pre>
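<p>If you want to check the expected fractions without the election data, powers of two are a classic sequence whose leading digits follow Benford&#8217;s Law closely.  The following is a sketch under that assumption; <code>powers</code>, <code>lead</code>, <code>observed</code> and <code>benford</code> are illustrative names.</p>
<pre class="brush: r; title: ; notranslate">
powers &lt;- 2 ^ (1:1000)     # 2^1000 is still finite as a double
# Leading digit: divide by the largest power of ten not exceeding the value
lead &lt;- floor(powers / 10 ^ floor(log10(powers)))
# Tabulate against all nine digits so that none can be silently dropped
observed &lt;- as.vector(table(factor(lead, levels = 1:9))) / length(lead)
benford &lt;- log10(1 + 1 / (1:9))
round(cbind(digit = 1:9, observed, benford), 3)
</pre>
<p>Wrapping the digits in <code>factor(..., levels = 1:9)</code> guarantees nine counts even if a digit never occurs, which a bare <code>table</code> call does not.</p>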
<p>The counts of the last digit can be obtained in a similar way.</p>
<pre class="brush: r; title: ; notranslate">
last_digit_counts &lt;- as.vector(table(russian$last.digit))
last_digit_actual_vs_expected &lt;- data.frame(
    digit     = 0:9,
    count     = last_digit_counts,    
    fraction  = last_digit_counts / nrow(russian)
)
last_digit_actual_vs_expected$cumulative.fraction &lt;- cumsum(last_digit_actual_vs_expected$fraction)
</pre>
<p>Here is the line graph&#8230;</p>
<pre class="brush: r; title: ; notranslate">
a_vs_e &lt;- melt(first_digit_actual_vs_expected[, c(&quot;digit&quot;, &quot;actual.fraction&quot;, &quot;benford.fraction&quot;)], id.var = &quot;digit&quot;)
(fig1_lines &lt;- ggplot(a_vs_e, aes(digit, value, colour = variable)) +
    geom_line() +
    scale_x_continuous(breaks = 1:9) +
    scale_y_continuous(labels = scales::percent) +
    ylab(&quot;Counts with this first digit&quot;) +
    theme(legend.position = &quot;none&quot;)
)
</pre>
<p><a href="http://4dpiecharts.files.wordpress.com/2012/03/fig_1_actual_vs_benford1.png"><img src="http://4dpiecharts.files.wordpress.com/2012/03/fig_1_actual_vs_benford1.png?w=595" alt="Fig 1. Actual percentages of first digits vs. those predicted by Benford&#039;s Law" title="Fig 1. Actual percentages of first digits vs. those predicted by Benford&#039;s Law"   class="aligncenter size-full wp-image-470" /></a></p>
<p>and the histogram</p>
<pre class="brush: r; title: ; notranslate">
(fig2_hist &lt;- ggplot(russian, aes(value)) +
    geom_histogram(binwidth = 20)
)
</pre>
<p><a href="http://4dpiecharts.files.wordpress.com/2012/03/fig_2_vote_counts.png"><img src="http://4dpiecharts.files.wordpress.com/2012/03/fig_2_vote_counts.png?w=595" alt="Fig 2. Histogram of vote counts in the Russian election" title="Fig 2. Histogram of vote counts in the Russian election"   class="aligncenter size-full wp-image-469" /></a></p>
<br /> Tagged: <a href='http://4dpiecharts.com/tag/benfords-law/'>Benford's Law</a>, <a href='http://4dpiecharts.com/tag/election/'>election</a>, <a href='http://4dpiecharts.com/tag/forensic-statistics/'>forensic statistics</a>, <a href='http://4dpiecharts.com/tag/goldacre/'>goldacre</a>, <a href='http://4dpiecharts.com/tag/r/'>r</a>, <a href='http://4dpiecharts.com/tag/russia/'>russia</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/4dpiecharts.wordpress.com/467/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/4dpiecharts.wordpress.com/467/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=467&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://4dpiecharts.com/2012/03/05/benfords-law-and-fraud-in-the-russian-election/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/85c1a01d3843ecaa30abd003b65811a8?s=96&#38;d=identicon&#38;r=PG" medium="image">
			<media:title type="html">richierocks</media:title>
		</media:content>

		<media:content url="http://4dpiecharts.files.wordpress.com/2012/03/fig_1_actual_vs_benford1.png" medium="image">
			<media:title type="html">Fig 1. Actual percentages of first digits vs. those predicted by Benford&#039;s Law</media:title>
		</media:content>

		<media:content url="http://4dpiecharts.files.wordpress.com/2012/03/fig_2_vote_counts.png" medium="image">
			<media:title type="html">Fig 2. Histogram of vote counts in the Russian election</media:title>
		</media:content>
	</item>
		<item>
		<title>Radical Statistics was radical</title>
		<link>http://4dpiecharts.com/2012/02/25/radical-statistics-was-radical/</link>
		<comments>http://4dpiecharts.com/2012/02/25/radical-statistics-was-radical/#comments</comments>
		<pubDate>Sat, 25 Feb 2012 09:34:10 +0000</pubDate>
		<dc:creator>richierocks</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[conference]]></category>
		<category><![CDATA[RadStats]]></category>
		<category><![CDATA[stats]]></category>

		<guid isPermaLink="false">https://4dpiecharts.wordpress.com/?p=465</guid>
		<description><![CDATA[Today I went to the Radical Statistics conference in London. RadStats was originally a sort of left wing revolutionary group for statisticians, but these days the emphasis is on exposing dubious statistics by companies and politicians. Here&#8217;s a quick rundown of the day. First up Roy Carr-Hill spoke about the problems with trying to collect [...]<img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=465&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></description>
				<content:encoded><![CDATA[<p>Today I went to the Radical Statistics conference in London. RadStats was originally a sort of left wing revolutionary group for statisticians, but these days the emphasis is on exposing dubious statistics by companies and politicians.</p>
<p>Here&#8217;s a quick rundown of the day.</p>
<p>First up Roy Carr-Hill spoke about the problems with trying to collect demographic data and estimate soft measures of societal progress like wellbeing. (Household surveys exclude people not in households, like the homeless, soldiers and old people in care homes; and English people claim to be 70% satisfied regardless of the question.)</p>
<p>Next was Val Saunders who started with a useful debunking of some methodological flaws in schizophrenia research, then blew it by detailing her own methodologically flawed research and making overly strong claims to have found the cause of that disease.</p>
<p>Aubrey Blumsohn and David Healy both talked about ways that the pharmaceutical industry fudges results. The list was impressively long, leading me to suspect that far too many people have spent far too long thinking of ways to game the system. The two main recommendations that resonated with me were to extend the trials register to phase 1 trials to avoid unfavourable studies being buried and for raw data to be made available for transparent analysis. Pipe dreams.</p>
<p>After lunch Prem Sikka pointed out that tax avoidance isn&#8217;t just shady companies trying to scam the system: accountancy firms actually pay people to dream up new wheezes and sell them to those companies.</p>
<p>Ann Pettifor and final speaker Howard Reed gave similar talks evangelising Keynesian stimulus (roughly, big government spending in times of recession) for the UK economy amongst some economic myth debunking. Thought provoking, though both speakers neglected to mention the limitations of such stimuli &#8211; you have to avoid spending on pork barrel nonsense (see Japan in the 90s, that buy-a-banger scheme in the UK in 2009) and you have to find a way to turn off the taps when the recession is over.</p>
<p>The other speaker was Allyson Pollock who discussed debunking a dubious study by Zack Cooper claiming that patients being allowed to choose their surgeon improved success rates in treating acute myocardial infarction. Such patients are generally unconscious while having their heart attack so it was inevitably nonsense.</p>
<p>Overall a great day.</p>
<br /> Tagged: <a href='http://4dpiecharts.com/tag/conference/'>conference</a>, <a href='http://4dpiecharts.com/tag/radstats/'>RadStats</a>, <a href='http://4dpiecharts.com/tag/stats/'>stats</a> <a rel="nofollow" href="http://feeds.wordpress.com/1.0/gocomments/4dpiecharts.wordpress.com/465/"><img alt="" border="0" src="http://feeds.wordpress.com/1.0/comments/4dpiecharts.wordpress.com/465/" /></a> <img alt="" border="0" src="http://stats.wordpress.com/b.gif?host=4dpiecharts.com&#038;blog=15320431&#038;post=465&#038;subd=4dpiecharts&#038;ref=&#038;feed=1" width="1" height="1" />]]></content:encoded>
			<wfw:commentRss>http://4dpiecharts.com/2012/02/25/radical-statistics-was-radical/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://2.gravatar.com/avatar/85c1a01d3843ecaa30abd003b65811a8?s=96&#38;d=identicon&#38;r=PG" medium="image">
			<media:title type="html">richierocks</media:title>
		</media:content>
	</item>
	</channel>
</rss>
