## Have my old job!

My old job at the Health & Safety Laboratory is being advertised, and at a higher pay grade to boot. (Though it is still civil service pay, and thus not going to make you rich.)

You’ll need to have solid mathematical modelling skills, particularly solving systems of ODEs, and be proficient at writing scientific code, preferably R or MATLAB or acslX. From chats with a few people at the lab, management are especially keen to get someone who can bring in money so grant writing and blagging skills are important too.

It’s a smashing place to work and the people are lovely. Also, you get flexitime and loads of holiday. If you are looking for a maths job in North West* England then I can heartily recommend applying.

*Buxton is sometimes North West England (when we get BBC local news) and sometimes in the East Midlands (like when we vote in European elections).

## Viewing the internals of MATLAB Matrices

A cool undocumented trick I just learnt from The MathWorks’ Bob Gilmore. If you type

format debug

Then printing any vector reveals information about its internal representation. For example:

x = magic(3) x = Structure address = 6bc1ab0 m = 3 n = 3 pr = d8dccf0 pi = 0 8 1 6 3 5 7 4 9 2

The structure address is the address in memory where the matrix is stored, `m`

and `n`

are the number of rows and columns respectively of the matrix, and `pr`

and `pi`

are pointers to the addresses of the matrices storing the real and imaginary components of the matrix.

One interesting thing to look at is the representation of scalar numbers.

y = 1 y = Structure address = 6bc31e0 m = 1 n = 1 pr = d790b90 pi = 0 1

Yep: they are stored in exactly the same way as matrices: in the same way the “everything in R is a vector”, everything in MATLAB is a matrix. To finish up, here are some more examples for you to explore:

% higher dimensional arrays rand(2, 3, 4) % cell arrays (unfortunately not that revealing) {1, magic(3)} % sparse matrices (very interesting) sparse(ones(3))

## MATLAB’s stand out new feature

It’s been a while since my last MATLAB post, not because I don’t love the language, but more because I do most of my blogging from home, where I have no license, and because (mostly thanks to R-bloggers) I get ten times as many page views for the R posts. (TODO: Create MATLAB-bloggers service.)

Having returned from holiday (it was lovely, thanks for asking) I’ve been trying out the latest release of MATLAB – R2011b. So far, the standout new feature is the automatic variable renaming. If you change the name of a variable at the point where it was declared, then pressing Shift+Enter lets MATLAB rename all other instances. IDEs for statically-typed languages have had this feature for years, but to see it in a dynamically-typed language is very impressive.

## Friday Function: nclass

When you draw a histogram, an important question is “how many bar should I draw?”. This should inspire an indignant response. You didn’t become a programmer to answer questions, did you? No. The whole point of programming is to let your computer do your thinking for you, giving you more time to watch videos of fluffy kittens.

Fortunately, R contains three functions to automate the answer, namely `nclass.Sturges`

, `nclass.scott`

and `nclass.FD`

. (FD is short for Freedman-Diaconis; watch out for the fact that `scott`

isn’t capitalised.)

The differences depend upon length and spread of data. For longer vectors, Scott and Freedman-Diaconis tend to give bigger answers.

short_normal <- rnorm(1e2) nclass.Sturges(short_normal) #8 nclass.scott(short_normal) #8 nclass.FD(short_normal) #12

long_normal <- rnorm(1e5) nclass.Sturges(long_normal) #18 nclass.scott(long_normal) #111 nclass.FD(long_normal) #144

For strongly skewed data, you are best to use some sort of transformation before you draw a histogram, but for the record, Freedman-Diaconis again gives bigger answers for highly skewed (and thus wider) vectors.

short_lognormal <- rlnorm(1e2) nclass.Sturges(short_lognormal) #8 nclass.scott(short_lognormal) #9 nclass.FD(short_lognormal) #20

long_lognormal <- rlnorm(1e5) nclass.Sturges(long_lognormal) #18 nclass.scott(long_lognormal) #443 nclass.FD(long_lognormal) #1134

My feeling is that since each of the three algorithms is rather dumb, it is safest to calculate all three, then pick the middle one.

nclass.all <- function(x, fun = median) { fun(c( nclass.Sturges(x), nclass.scott(x), nclass.FD(x) )) } log_islands hist(log_islands, breaks = nclass.all(log_islands))

I also wrote a MATLAB implementation of this a couple of years ago.

It is worth noting that ggplot2 doesn’t accept a number-of-bins argument to `geom_histogram`

, because

In practice, you will need to use multiple bin widths to

discover all the signal in the data, and having bins with

meaningful widths (rather than some arbitrary fraction of the

range of the data) is more interpretable.

That’s fine if you are interactively exploring the data, but if you want a purely automated solution, then you need to make up a number of bins.

calc_bin_width <- function(x, ...) { rangex <- range(x, na.rm = TRUE) (rangex[2] - rangex[1]) / nclass.all(x, ...) } p <- ggplot(movies, aes(x = votes)) + geom_histogram(binwidth = calc_bin_width(log10(movies$votes))) + scale_x_log10() p

## supercalifragilisticexpialidocious = 1

I notice that the latest version of R has upped the maximum length of variable names from 256 characters to a whopping 10 000! (See `?name`

.) It makes the 63 character limit in MATLAB look rather pitiful by comparison. Come on MathWorks! Let’s have the ability to be stupidly verbose in our variable naming!

## Presentation on testing frameworks

It’s testing week in the Software Carpentry course. To commemorate the occasion, I’ve uploaded a presentation I gave a couple of years ago to evangelise the use of testing frameworks to make your life easier. Just for you, I’ve recorded an all new audio track to accompany the slides. The example uses MATLAB but the principle is applicable to any language.

## When 1 * x != x

Trying to dimly recall things from my maths degree, it seems that in most contexts the whole point of the number one is that it is a multiplicative identity. That is, for any x in your set, 1 * x is equal to x. It turns out that when you move to floating point numbers, *in some programming lanugages*, this isn’t always true.

In R, try the following

x <- Inf + 0i 1 * x [1] Inf + NaNi

Something odd is happening, and my first reaction was that this is a bug. In fact, I reported it as such (in a slightly different form) but the terrifyingly wise Brian Ripley came up with the cause.

When you multiply two complex numbers, this is what happens

In this instance

So, there is a reasonable argument that R is doing the right thing.

MATLAB works a little differently since all it’s numbers are implicitly complex. The inner workings of MATLAB are somewhat opaque for commercial reasons, but in order for compiled MATLAB code to work, I think it’s reasonable to assume that MATLAB stores its variables in a class that is similar to the MWArray class used by compiled code. As far as I can tell, each `double`

matrix contains two matrices for storing real and imaginary components, and has an `IsComplex`

property that is true whenever the complex part has any nonzero values. If `IsComplex`

is false, only the real matrix is used for arithmetic.

Thus in MATLAB, when you define `x = Inf + 0i`

it immediately resolves itself to simply `Inf`

, and we don’t get the same weirdness.

x = Inf + 0i 1 * x Inf

Other languages are divided on the subject: python and ruby agree with R but Mathematica (or at least Wolfram Alpha) agrees with MATLAB.

Thinking about this from a purely mathematical sense,

This concurs with the MATLAB answer and I’m inclined to agree that it makes more sense, but the issue isn’t entirely clear cut. Changing the way complex arithmetic works for a single edge case is likely to introduce more confusion than clarity. It is perhaps sufficient for you to remember that complex infinity is a bit pathological with arithemtic in some languages, and add checks in your code if you think that that will cause a problem.

## Counting the number of dimensions

Knowing how many dimensions your array has is important, if only to know that your pie chart really is 4d. MATLAB is a little odd in that scalar values are, under the bonnet, matrices with one row and one column. Likewise, vectors are matrices with one dimension of length one. Mostly, you don’t notice this but when you are trying to count the number of dimensions, you can see some odd results.

ndims(1) % 2 ndims(1:5) % 2

Often, you may prefer to know the number of nontrivial dimensions: that is, how many dimensions an array has that have length greater than one. I noticed this problem while browsing the code of the rather excellent xml_io_tools by Jarek Tuszynski. The solution is easy to wrap into a function.

function n = nntdims(x) %NNTDIMS Number of non-trivial dimensions. n = nnz(size(x) > 1); end nntdims(1) % 0 nntdims(1:5) % 1

## Two amigos: follow up

Brett and Jiro have announced the results of the competition to make a Bob-free image. There were five entries, two prizes and … I didn’t win either. Still, it was a fun challenge and a useful learning experience so I’m consoling myself with cliches like “it’s not the winning that’s important but the taking part”. I’m certainly not using MATLAB to construct a voodoo-doll image of Brett and Jiro.

%% Read in image and display theAmigos = imread('the amigos better blur.jpg'); image(theAmigos) %% Add lines pinColour = [.5 .5 .5]; xcoords = { ... [130 180] ... [132 182] ... [136 184] ... [140 186] ... [148 190] ... [165 195] ... [182 200] ... [200 205] ... [215 214] ... [230 223] ... [243 228] ... [255 234] ... [270 237] ... [283 244] ... [295 247] ... [300 246] ... [303 248] ... ... [465 515] ... [465 516] ... [466 517] ... [469 519] ... [475 522] ... [487 526] ... [505 534] ... [528 540] ... [548 546] ... [567 551] ... [588 554] ... [606 557] ... [621 560] ... [628 563] ... [633 566] ... [633 567] ... [634 568] ... }; ycoords = { ... [295 300] ... [275 290] ... [260 280] ... [240 275] ... [225 274] ... [220 274] ... [215 273] ... [212 273] ... [212 273] ... [214 273] ... [217 274] ... [221 274] ... [230 275] ... [240 277] ... [250 280] ... [275 285] ... [290 292] ... ... [320 322] ... [305 315] ... [288 310] ... [272 304] ... [253 300] ... [240 296] ... [233 292] ... [230 291] ... [230 291] ... [232 292] ... [236 294] ... [246 297] ... [262 300] ... [280 302] ... [296 307] ... [309 312] ... [320 320] ... }; xstart = cellfun(@(x) x(1), xcoords); ystart = cellfun(@(x) x(1), ycoords); hold on cellfun(@(x, y) line(x, y, 'Color', pinColour), xcoords, ycoords); arrayfun(@(x, y) plot(x, y, '.', 'Color', pinColour), xstart, ystart); hold off %% Remove the extra bits created by plot calls and write to file set(gca, 'Visible', 'off') set(gca, 'Position', [0 0 1 1]) print(gcf, '-djpeg', 'the amigos voodoo.jpg')

## Two amigos MATLAB contest

Today I discovered a MATLAB mini-contest called The Two Amigos. The idea is two use MATLAB to remove Bob from a photo of the three Pick-of-the-Week bloggers. The contest officially closed last week but they had no entries by submission day so you’re still in with a chance, if you’re quick.

I hadn’t done any image processing in MATLAB until earlier today, and I don’t have access to the image processing toolbox, so my attempt is pretty basic. I’m posting my submission here to give you a headstart. As with all the other code on this blog, it is licensed under the WTFPL, so you can literally do “what the f*ck” you want with it. ( If you submit something based upon my code though, an attribution would be appreciated.)

First up: reading and displaying an image in MATLAB is easy.

theAmigos = imread('threeamigos-800w.jpg'); image(theAmigos)

My first idea was to simply place a black rectangle over a region roughly corresponding to Bob.

% Take a copy of the basic image theAmigosBlackout = theAmigos; % Select 'Bob' region bobRectX = 220:563; bobRectY = 300:470; % Make this region black theAmigosBlackout(bobRectX, bobRectY, : ) = 0; image(theAmigosBlackout)

Hmm. The black region looks a little severe. It would be easier on the eye to simply blur him out. My homemade blur technique uses the `filter2`

function to create a moving average filter (a very simple smoother) to blur the region.

theAmigosBlur = theAmigos; blurRadius = 20; % Calculate weights for the blur filter w = [(blurRadius + 1):-1:1 2:(blurRadius + 1)]; w2 = repmat(w, length(w), 1); weights = 1 ./ (w2 + w2'); weights = weights / sum(weights(:)); % Apply filter to each colour channel for i = 1:3 theAmigosBlur(bobRectX, bobRectY, i) = ... filter2(weights, theAmigosBlur(bobRectX, bobRectY, i)); end image(theAmigosBlur)

The problem here is that the edges of the blurred region look darker. Presumably missing values are considered as black, for some reason. To solve this we only use the central “valid” region of the filter. This involves some fiddling to extend the rectangle’s region, which in turn involves some fiddling to extend the base of the image.

theAmigosBetterBlur = theAmigos; % Temporarily extend image bottom bottom = repmat(theAmigosBetterBlur(end, :, : ), blurRadius, 1); theAmigosBetterBlur = [theAmigosBetterBlur; bottom]; % Extend rectangle extendedX = (min(bobRectX) - blurRadius):(max(bobRectX) + blurRadius); extendedY = (min(bobRectY) - blurRadius):(max(bobRectY) + blurRadius); % Apply filter to valid region for i = 1:3 theAmigosBetterBlur(bobRectX, bobRectY, i) = ... filter2(weights, theAmigosBetterBlur(extendedX, extendedY, i), 'valid'); end % Remove the extension to the bottom of the image theAmigosBetterBlur = theAmigosBetterBlur(1:(end - blurRadius), :, : ); image(theAmigosBetterBlur)

Can you do better than this? Maybe you can figure out how to do edge detection, or find a better way to find the ‘Bob’ region? Can you think of a more appropriate substitute than blurring? Maybe a few MATLAB logos in there might swing the judges decision. Or you could recreate the special effect used when they transport people on Star Trek.

Let me know how you get on.