Archive
How long does it take to get pregnant?
My girlfriend’s biological clock is ticking, and so we’ve started trying to spawn. Since I’m impatient, that has naturally lead to questions like “how long will it take?”. If I were to believe everything on TV, the answer would be easy: have unprotected sex once and pregnancy is guaranteed.
A more cynical me suggests that this isn’t the case. Unfortunately, it is surpisingly difficult to find out the monthly chance of getting pregnant (technical jargon: the “monthly fecundity rate”, or MFR), given that you are having regular sex in the days leading up to ovulation. Everyone agrees that age has a big effect, with women’s peak fertility occuring somewhere around the age of 25. Beyond that point, the internet is filled with near-useless summary statistics like the chance of conceiving after one year. For example, the usually reliable NHS site says
Women become less fertile as they get older. For women aged 35, about 94 out of every 100
who have regular unprotected sex will get pregnant after three years of trying. However, for
women aged 38, only 77 out of every 100 will do so.
I found a couple of reasonably sciency links(George and Kamath, Socal Fertility) that suggest that the MFR is about 25% for a women aged 25, and 10% at age 35. The Scoal link also gives rates of 15% at age 30, 5% at age 40 and less than 1% at age 45. If the woman is too fat, too thin, a smoker, or has hormone problems, or is stressed, then the rate needs reducing.
Given the MFR, the probability of getting pregnant after a given number of months can be calculated with a negative binomial distribution.
months <- 0:60
p_preg_per_month <- c("25" = 0.25, "30" = 0.15, "35" = 0.1, "40" = 0.05, "45" = 0.01)
p_success <- unlist(lapply(
p_preg_per_month,
function(p) pnbinom(months, 1, p)
))
Now we just create a data frame suitable for passing to ggplot2 …
mfr_group <- paste( "MFR =", format(p_preg_per_month, digits = 2), "at age", names(p_preg_per_month) ) mfr_group <- factor(mfr_group, levels = mfr_group) preg_data <- data.frame( months = rep.int(months, length(mfr_group)) , mfr_group = rep(mfr_group, each = length(months)), p_success = p_success )
and draw the plot.
library(ggplot2)
(p <- ggplot(preg_data, aes(months, p_success, colour = mfr_group)) +
geom_point() +
scale_x_continuous(breaks = seq.int(0, 60, 12)) +
scale_y_continuous(breaks = seq.int(0, 1, 0.1), limits = c(0, 1)) +
scale_colour_discrete("Monthly fecundity rate") +
xlab("Months") +
ylab("Probability of conception") +
opts(panel.grid.major = theme_line(colour = "grey60"))
)
So almost half of the (healthy) 25 year olds get pregnant in the first monthtwo months, and after two years (the point when doctors start considering you to have fertility problems) more than 90% of 35 year olds should conceive. By contrast, just over 20% of 45 year old women will. In fact, even this statistic is over-optimistic: at this age, fertility is rapidly decreasing, and a 1% MFR at age 45 will mean a much lower MFR at age 47 and the negative binomial model breaks down.
Of course, from a male point of view, conception is an embarrassingly parallel problem: you can dramatically reduce the time to conceive a child by sleeping with lots of women at once. (DISCLAIMER: Janette, if you’re reading this, I’m not practising or advocating this technique!)
Legendary Plots
I was recently pointed in the direction of a thermal comfort model by the engineering company Arup (p27–28 of this pdf). Figure 3 at the top of p28 caught my attention.
It’s mostly a nice graph; there’s not too much junk in it. One thing that struck me was that there is an awful lot of information in the legend, and that I found it impossible to retain all that information while switching between the plot and the legend.
The best way to improve this plot then is to find a way to simplify the legend. Upon closer inspection, it seems that there is a lot of information that is repeated. For example, there are only two temperature combinations, and three levels of direct solar energy. Humidity and diffused solar energy are kept the same in all cases. That makes it really easy for us: our five legend options are
| Outdoor temp (deg C) | Direct solar energy (W/m^2) |
|---|---|
| 32 | 700 |
| 32 | 150 |
| 32 | 500 |
| 29 | 500 |
| 29 | 150 |
Elsewhere we can explain that the mezannine/platform temps are always 2/4 degrees higher than outdoors, and that the humidity is always 50%, and that the diffused solar energy is always 100W/m^2.
Living in Buxton, one of the coldest, rainiest towns in the UK, it amuses me to see that their “low” outdoor temperature is 29°C.
The other thing to note is that we have two variables mapped to the hue. For just five cases, this is just about acceptable, but it isn’t the best option and it won’t scale to many more categories. It’s generally considered best practice to work in HCL color space when mapping variables to colours. I would be tempted to map temperature to hue – whether you pick red as hot and blue as cold or the other way around depends upon how many astronomers you have in your target audience. Then I’d map luminance (lightness) to solar energy: more sunlight = lighter line.
I don’t have the values to exactly recreate the dataset, but here are some made up numbers with the new legend. Notice the combined outdoor temp/direct solar energy variable.
time_points <- 0:27
n_time_points <- length(time_points)
n_cases <- 5
comfort_data <- data.frame(
time = rep.int(time_points, n_cases),
comfort = jitter(rep(-2:2, each = n_time_points)),
outdoor.temperature = rep(
c(32, 29),
times = c(3 * n_time_points, 2 * n_time_points)
),
direct.solar.energy = rep(
c(700, 150, 500, 500, 150),
each = n_time_points
)
)
comfort_data$combined <- with(comfort_data,
factor(paste(outdoor.temperature, direct.solar.energy, sep = ", "))
)
We manually pick the colours to use in HCL space (using str_detect to examine the factor levels).
library(stringr)
cols <- hcl(
h = with(comfort_data, ifelse(str_detect(levels(combined), "29"), 0, 240)),
c = 100,
l = with(comfort_data,
ifelse(str_detect(levels(combined), "150"), 20,
ifelse(str_detect(levels(combined), "500"), 50, 80))
)
)
Drawing the plot is very straightforward, it’s just a line plot.
library(ggplot2)
p <- ggplot(comfort_data, aes(time, comfort, colour = combined)) +
geom_line(size = 2) +
scale_colour_manual(
name = expression(paste(
"Outdoor temp (", degree, C, "), Direct solar (", W/m^2, ")"
)),
values = cols) +
xlab("Time (minutes)") +
ylab("Comfort")
p
Sensible people should stop here, and write the additional detail in the figure caption. There is currently no sensible way of writing annotations outside of the plot area (annotate only works inside panels). The following hack was devised by Baptiste Auguie, read this forum thread for other variations.
library(gridExtra)
caption <- tableGrob(
matrix(
expression(
paste(
"Mezzanine temp is 2", degree, C, " warmer than outdoor temp"
),
paste(
"Platform temp is 4", degree, C, " warmer than outdoor temp"
),
paste("Humidity is always 50%"),
paste(
"Diffused solar energy is always 100", W/m^2
)
)
),
parse = TRUE,
theme = theme.list(
gpar.corefill = gpar(fill = NA, col = NA),
core.just = "center"
)
)
grid.arrange(p, sub=caption)




