
As everyone knows, the sky is blue because it’s a reflection of the ocean, which is also blue. And the ocean, well, it’s a reflection of the sky.
The real reason has something to do with Rayleigh scattering, although as xkcd likes to reiterate, that doesn’t tell the full story either. In the next couple of posts I’d like to explore the different phenomena at play. In this first post I’m going to start by trying to understand the interplay between perception, physics and computer graphics in perceiving and displaying colours. The goal will be to develop a translation from a power spectrum to the equivalent colour in RGB space.
In the next post we can then focus on how to derive the power spectrum of the sky. To give you a sneak peek, we’ll need to look at black-body radiation, scattering and some geometry to understand how light from the sun eventually reaches your eyes. But first, we will need to cast our minds back nearly 100 years, to the origin of how colours are displayed today.
The standard observer
In the late 1920s and early 1930s Wright and Guild independently ran a series of experiments to understand how a mixture of illumination sources of different wavelengths can be perceptually identical to another illumination source, despite the two having very different power spectra. This concept is known as metamerism, and the two power spectra are known as metamers if they look the same to the human eye. It had been hypothesised at the time that all colours could be produced by mixing three so-called primary colours, and in fact this is something that many people are still taught today. Guild notes, with some degree of sass, that this is not the case:
Maxwell’s results indicate that the spectrum he employed was very impure, particularly in the blue-green part of the spectrum. This gave an entirely fictitious approximation of the spectrum locus to two sides of the colour triangle and led Maxwell to conclude that “. . . all the colours of the spectrum may be compounded of those which lie at the angles of this triangle,” and that there was “strong reason to believe that these are the three primary colours corresponding to those modes of sensation in the organ of vision.” These conclusions have long been known to be quite untenable, there being no spectral colours which are in any sense Primary colours or from which all other colours can be compounded.
Maybe he was just jealous that Maxwell unravelled the beautiful relationship between electric and magnetic fields and was hoping to have his own equations named after him. Anyway, moving on.
The way they quantified metamerism was to set up an experiment as follows:
- A circular disk is placed in front of an observer at a distance such that it subtends a 2˚ angle of the subject’s vision. This is important because colour vision is different in different parts of our retina, owing to the different densities of rods and cones on the retina. More on this later.
- One half of the disk is illuminated with light of a known wavelength, e.g. a particular line from the mercury emission spectrum.
- The other half is illuminated by a mix of three illumination sources. In Wright’s experiment they were 650 nm (red), 530 nm (green) and 460 nm (blue).
- The observer varies the intensity of the three primary colours until the two halves match.
The procedure was repeated for different target wavelengths and different observers in order to produce average colour matching functions. Here you can see the results from Wright’s paper:

The keen observer will note that some of the values are negative. It is of course not possible to illuminate something with a negative amount of light. Instead, the experimenters came up with an ingenious method based on the assumption of linearity in light perception.
In 1853 Hermann Grassmann proposed a set of laws relating to colour matching. Of interest is the third law:
There are lights with different spectral power distributions that appear identical. First corollary: such identical appearing lights must have identical effects when added to a mixture of light. Second corollary: such identical appearing lights must have identical effects when subtracted (i.e., filtered) from a mixture of light.
When they couldn’t match a target exactly, observers were asked to add light to the target instead. Using Grassmann’s third law (Which is just linearity with extra steps) we can then infer what light we would have had to add to match the original target. We simply subtract the light that was added to the target from both halves of the disk. In this sense we can end up with a negative amount of a primary colour.
In other words, if we cannot match a specific target, we can instead add some amount of, say, red, to it first. If we can match this reddened target that would be equivalent to “subtracting” red to match the original target.
The CIE 1931 colour spaces
By combining the results from the above experiments, the International Commission on Illumination (CIE) introduced the CIE 1931 colour spaces. The first of these is the CIE RGB space, which is directly based on the experiments above. They define so-called tristimulus values R, G and B, corresponding to the amount of each primary colour needed to match a particular power spectrum. Assuming linearity, we can apply the colour matching functions obtained for monochromatic targets to a whole spectrum P(lambda) as such:
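$$R = \int \bar{r}(\lambda)\, P(\lambda)\, \mathrm{d}\lambda$$
$$G = \int \bar{g}(\lambda)\, P(\lambda)\, \mathrm{d}\lambda$$
$$B = \int \bar{b}(\lambda)\, P(\lambda)\, \mathrm{d}\lambda$$

where $\bar{r}$, $\bar{g}$ and $\bar{b}$ are the colour matching functions.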
At the time, computation was still done by hand using slide rules and pen and paper. Therefore, although the colour matching functions above were perfectly reasonable, they were not ideal to do calculations with, and the CIE decided to define another colour space, CIE XYZ. The Wikipedia page lists the properties of this space. Most importantly, the coordinates are all non-negative for any physically realisable colour. In general the article is a great starting point if you want to go down the rabbit hole of colour theory… For our purposes, suffice it to say that there is a reversible linear transformation between the two, and as a consequence there is also a set of colour matching functions for the XYZ space.
Note that this RGB space is just one of many possible RGB spaces, and the one that’s used by your display is almost certainly not CIE RGB. If you mess around with the display settings on your Mac you can see there are lots of different presets that (not so) subtly change the colours you see on the screen.

A way to visualise how the different RGB colour spaces relate to each other is to project the XYZ space into two dimensions. If you normalise each component by the sum of the three, you can plot x and y (denoted with a lower-case when normalised) in two dimensions. You can ignore z since it’s implied by the other two. In this space the different RGB spaces can be visualised as triangles, with corners where their primaries are. Note that not all points in this space correspond to physically realisable colours; those that don’t are known as imaginary colours. Some RGB spaces use imaginary colours as primaries, which means that not all points in the space can actually be displayed.
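Explicitly, the normalised coordinates are

$$x = \frac{X}{X + Y + Z}, \qquad y = \frac{Y}{X + Y + Z}, \qquad z = \frac{Z}{X + Y + Z} = 1 - x - y.$$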

With all this in mind the translation from spectrum to colour will be as follows:
- Using the colour matching functions, integrate the power spectrum you want to visualise to generate the coordinate in XYZ space.
- Convert this value to your favourite RGB space.
First, we project our power spectrum to an XYZ value using the official colour matching spectra. Then we need to find the linear translation between the two spaces. I’m going to use the space sRGB as an example, whose definition is given in the W3 standard. Although my Mac uses the P3 space by default, everything on the web is assumed to use sRGB colours.
Each RGB space is defined by its three primaries in xy space, as well as its whitepoint. The whitepoint is the XYZ value that corresponds to R = G = B = 1. Let’s write down some equations to relate these definitions. First, we’ll parameterise the linear transformation from XYZ to sRGB space with an unknown matrix T.
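That is, for (linear) sRGB values:

$$\begin{pmatrix} R \\ G \\ B \end{pmatrix} = T \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}$$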
Next, let’s write down the equations defining the primary colours:
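One way to write them, with $(x_i, y_i)$ the chromaticity of primary $i$ (1 = red, 2 = green, 3 = blue) and $z_i = 1 - x_i - y_i$:

$$T\, k_1 \begin{pmatrix} x_1 \\ y_1 \\ z_1 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \qquad T\, k_2 \begin{pmatrix} x_2 \\ y_2 \\ z_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}, \qquad T\, k_3 \begin{pmatrix} x_3 \\ y_3 \\ z_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$$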
I’ve introduced a variable k_i per equation to indicate that we don’t know the exact scaling of T from the primary colours (They’re only given in chromaticities, xy). The scaling we will get from the whitepoint, which in sRGB is the D65 point, named after the spectrum of a 6500 K black-body1:
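In other words, D65 must map to equal amounts of each primary (its chromaticity is roughly $x = 0.3127$, $y = 0.3290$):

$$T \begin{pmatrix} X_{\mathrm{D65}} \\ Y_{\mathrm{D65}} \\ Z_{\mathrm{D65}} \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$$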
We have a total of 12 equations (3 primary colours * 3 components + 3 components for the whitepoint) and 12 unknowns (9 elements of T + k_1, k_2, k_3). With a little bit of algebra, the following matrix comes out on the other side:
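To make this concrete, here is a minimal NumPy sketch of setting up and solving those equations (not the code from this post; the variable and function names are mine):

```python
import numpy as np

# Chromaticities (x, y) of the sRGB primaries and the D65 whitepoint,
# as given in the sRGB definition.
primaries_xy = np.array([
    [0.64, 0.33],   # red
    [0.30, 0.60],   # green
    [0.15, 0.06],   # blue
])
white_xy = np.array([0.3127, 0.3290])

def xy_to_xyz(x, y):
    """Unscaled XYZ vector for a chromaticity (x, y), normalised so X + Y + Z = 1."""
    return np.array([x, y, 1.0 - x - y])

# Columns of P are the (unscaled) XYZ vectors of the three primaries.
P = np.stack([xy_to_xyz(x, y) for x, y in primaries_xy], axis=1)

# Whitepoint XYZ, conventionally scaled so that Y = 1.
white_xyz = xy_to_xyz(*white_xy) / white_xy[1]

# Whitepoint condition: with R = G = B = 1 the scaled primaries must sum to D65,
# i.e. P @ k = white_xyz, which fixes the three scale factors k.
k = np.linalg.solve(P, white_xyz)

# The RGB -> XYZ matrix has the scaled primaries as columns; T is its inverse.
T = np.linalg.inv(P * k)
print(T)  # roughly [[ 3.24, -1.54, -0.50], [-0.97, 1.88, 0.04], [ 0.06, -0.20, 1.06]]
```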
Let’s use this to see what the full spectrum of monochromatic colours would look like. First, here are the XYZ colour matching functions:

Note that the absolute scale does not matter, since it is calibrated via T such that the whitepoint has RGB = 1, 1, 1. However, the relative scale between the three curves does matter. Therefore the above curves are scaled to have the same integral, and the positions of the primaries in the different standards are set with this scaling in mind. Having figured out the elements in T we can apply the transformation to get the RGB values:

Again we notice some pesky negative values! These negative values correspond to colours that are outside the colour triangle defined by the three primaries. Unfortunately there’s not much we can do about these, other than to try to find the closest match within the triangle. They are simply unrepresentable colours in the RGB space.
“Closest match” is a somewhat subjective term. There have been some attempts to make this formal by defining colour spaces where points that are close in the Euclidean sense are also perceptually similar2. However, in this case we’re going to do a quick and dirty trick, which is to add some white (equal amounts of each primary) to the colours which have negative values. This desaturates them, bringing them into the colour triangle.
There is one more subtlety we have to take into account before we can actually see some colours. The RGB values we have calculated thus far are proportional to the intensities we want to display. But under the hood the screen controls the LEDs via a specific voltage V, not the output intensity I. The actual intensity follows a power law:
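$$I \propto V^{\gamma}$$

with $\gamma$ typically around 2.2.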
The gamma in that equation gives this correction the name “gamma correction” (hehe). The sRGB standard defines a specific equation to perform this correction. Doing so gives us the spectrum below:
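For concreteness, here is a minimal sketch of that encoding step in Python (the constants are the ones given in the sRGB standard; the function name is my own):

```python
import numpy as np

def srgb_encode(c):
    """Gamma-encode linear sRGB values in [0, 1] using the piecewise sRGB curve."""
    c = np.clip(c, 0.0, 1.0)
    return np.where(c <= 0.0031308,
                    12.92 * c,                              # linear segment near black
                    1.055 * np.power(c, 1 / 2.4) - 0.055)   # power-law segment
```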

This spectrum looks pretty decent but there are some strange “discontinuities”, such as the blue band just above 450 nm. We can get rid of these artifacts by handling the negative values differently. We can add a constant amount of white to all the colours, such that they’re all within the colour triangle:

This spectrum looks much smoother but is also more washed out. For the next blog posts this won’t matter too much since I’m not going to be displaying a lot of monochromatic light. We’ll therefore sit comfortably within the range of colours that screens can display.
It can also be interesting to look at what these spectra look like if we convert them back into XYZ space. Here they are:

As you can see both methods convert the original points such that they now sit inside the colour triangle. Method 2 touches the triangle in one spot, which corresponds to its minimum RGB value being 0.
And that’s it for the main quest! We now have everything we need to proceed in determining the colour of the sky. However, I do want to take a quick detour to talk about why using exactly three primary colours works so well and how this all relates to what’s going on in our eyes.
Rods and cones
As someone who never did any more biology than necessary I’m not going to pretend to understand the intricacies of sight. However, the process boils down to light hitting photosensitive cells on our retinas, causing a signal to propagate to the brain. The brain then combines these signals to form an image. There are two types of light sensors in our eyes: rods and cones. Rods are far more sensitive than cones and so they tend to be used particularly in low-light conditions. However, while different wavelengths excite the rods by a different amount, there is no way for the brain to distinguish between a bright light at one wavelength vs a dimmer one from another wavelength3. That’s where cones enter the picture. They come in three types: short, medium and long. Depending on the incoming power spectrum they will each get excited a different amount, and it’s the combination of the three that the brain uses to create colour.
In other words, our eyes do not actually measure the power spectrum directly but project it onto three dimensions, which we interpret as colour.
This webpage tabulates cone sensitivity spectra, also known as cone fundamentals. The website has a few different options for models and formats. Below you can see the linear normalised sensitivity spectra for each type of cone based on the 2-degree model from Stiles & Burch.

With these spectra we can easily determine the level of signal in each cone based on an incoming spectrum P. For a sensitivity spectrum S, the level of excitation e is
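$$e = \int S(\lambda)\, P(\lambda)\, \mathrm{d}\lambda$$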
If we want to create sRGB values from these cone fundamentals we will need to know the power spectrum of the primaries. Unfortunately, no such thing exists, since the standard is defined in terms of xy values4. However, all hope is not lost. If we can find a power spectrum that gives the same xy value as a primary, we can use that instead. Given that there are an infinite number of spectra that will produce the same xy values, that shouldn’t be too hard. Let’s parameterise a simple model based on a normal distribution:
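$$P(\lambda) = \exp\!\left(-\frac{(\lambda - \lambda_0)^2}{2\sigma^2}\right)$$

The overall normalisation doesn’t matter here, since it drops out of the xy chromaticity.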
We can then calculate the XYZ value of this spectrum by using the colour matching functions (See the three consecutive equations for R, G and B above). By varying lambda_0 and sigma we can minimise the distance between the XYZ values we get and the desired one for each primary. For the sRGB primaries these are the results I get:
- Red: desired xy = [0.64, 0.33], actual xy = [0.654, 0.345], with lambda_0 = 770.49 nm, sigma = 70.13 nm
- Green: desired xy = [0.3, 0.6], actual xy = [0.300, 0.600], with lambda_0 = 536.63 nm, sigma = 33.61 nm
- Blue: desired xy = [0.15, 0.06], actual xy = [0.140, 0.060], with lambda_0 = 430.63 nm, sigma = 37.56 nm
And here is what the spectra look like:

We will also need to figure out the relative scale between these. We can do this with the whitepoint. It should be no surprise that we require the exact same data to pin down the sRGB values as before. There is no such thing as a free lunch! We need to match the XYZ value of the whitepoint by scaling the unscaled XYZ values of each primary.
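In symbols (calling the unknown scale factors $c_1$, $c_2$, $c_3$ here, with $(X_i, Y_i, Z_i)$ the unscaled XYZ value of primary $i$ computed from its fitted spectrum and $(X_w, Y_w, Z_w)$ the D65 whitepoint):

$$\begin{pmatrix} X_w \\ Y_w \\ Z_w \end{pmatrix} = c_1 \begin{pmatrix} X_1 \\ Y_1 \\ Z_1 \end{pmatrix} + c_2 \begin{pmatrix} X_2 \\ Y_2 \\ Z_2 \end{pmatrix} + c_3 \begin{pmatrix} X_3 \\ Y_3 \\ Z_3 \end{pmatrix}$$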
We can rewrite the RHS as a matrix multiplication, which is easily solvable using np.linalg.solve or similar.
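As a tiny sketch (function name mine), that step is a one-liner with NumPy:

```python
import numpy as np

def primary_scales(primaries_xyz, white_xyz):
    """Solve for the per-primary scale factors.

    primaries_xyz: 3x3 array whose columns are the unscaled XYZ values of the
                   red, green and blue primary spectra.
    white_xyz:     XYZ of the whitepoint (D65 here).
    """
    return np.linalg.solve(np.asarray(primaries_xyz, dtype=float),
                           np.asarray(white_xyz, dtype=float))
```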
With the primaries in hand we can determine sRGB values for any spectrum. First, let’s define the LMS colour space to be a vector indicating the excitement of each cone type. This is analogous to the XYZ or CIE RGB spaces, with the cone fundamentals acting as colour matching functions. We will parameterise the relationship between LMS space and sRGB with a matrix A:
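With the orientation I’ll assume here (LMS on the left, so that inverting A later takes us from LMS to sRGB):

$$\begin{pmatrix} L \\ M \\ S \end{pmatrix} = A \begin{pmatrix} R \\ G \\ B \end{pmatrix}$$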
We can then use each primary (P_r, P_g, P_b) to solve for A. Here is the equation for P_r:
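Writing the cone fundamentals as $S_L$, $S_M$ and $S_S$, and noting that the red primary corresponds to RGB = (1, 0, 0):

$$\begin{pmatrix} \int S_L(\lambda)\, P_r(\lambda)\, \mathrm{d}\lambda \\ \int S_M(\lambda)\, P_r(\lambda)\, \mathrm{d}\lambda \\ \int S_S(\lambda)\, P_r(\lambda)\, \mathrm{d}\lambda \end{pmatrix} = A \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$$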
The RHS simply evaluates to the first column of A, so the equation ends up very easy to solve. By performing the integrals for P_g and P_b we get the second and third columns as well.
As a proof of principle let’s now use A to generate the sRGB values for the spectrum of monochromatic colours from above. All we have to do is invert A and apply it to the LMS values computed from the cone fundamentals. Here they are:

It is the same plot as the one we saw earlier! But instead of arriving at it directly through colour matching functions we used the cone fundamentals and the5 power spectra of the primary colours. Of course, we still had to use the colour matching functions but only because that’s the space in which sRGB is defined. In a more modern standard, Rec. 2020, the primaries are actually defined in terms of pure wavelengths and so we could skip the XYZ space entirely.
Why 3?
Knowing that there are three types of cone cells, it’s not too surprising to learn that all of these colour spaces also have three components (RGB, XYZ, LMS, LAB, etc.). This is of course no coincidence. It is possible to define a colour space with more or fewer primaries but you quickly run into trouble. With just two primaries you can only represent a very limited range of colours: just one plane in XYZ space or, discounting luminosity, just one line in xy space. With more than three primaries you end up with a non-unique representation of colours. This is of course possible (See the CMYK space for example6) but it leads to mathematical difficulties when converting back and forth. All of our matrices become non-square, so they don’t have inverses.
This non-uniqueness can also be understood more intuitively without considering the linear algebra. With four primaries the gamut of the space becomes the interior of the quadrilateral formed by the four corners. If you pick any three corners to form a triangle, then do it again with another three corners, it’s easy to see that the two triangles must have interior points in common. That means that there are colours which can be formed by a combination of three colours in two different ways.
100 years later
It is fascinating that something as fundamental to computer graphics as how colours are displayed is still based on sensory experiments from almost 100 years ago. Although the colour matching functions have been refined throughout the years, it is incredible that something as seemingly subjective as the perception of colour can prove to be so enduring, and match so well to physical measurements of cone cells. There are no Guild equations as far as I’m aware, but his and his colleagues’ work has certainly had a huge impact. We are indeed standing on the shoulders of giants.
- Fun fact: D65 actually matches the spectrum of a 6504 K black-body, because the value of Planck’s constant has been updated since the introduction of D65. ↩︎
- See the CIELAB and CIELUV spaces, as well as this great post from Björn Ottosson. ↩︎
- As I’m writing this I wonder if our brain uses the signal from both rods and cones to infer colour. From what I’ve read rods tend to be ignored in the discussion of colour but I don’t see why. Perhaps it has something to do with the different absolute sensitivities of the two types of cells ¯\_(ツ)_/¯ ↩︎
- It makes a lot of sense for the standard to specify its primaries in terms of xy values instead of prescribing a specific power spectrum. Different devices will be able to produce different power spectra that correspond to the same xy values. The xy definition gives them better flexibility in implementation, without causing the primaries to look any different to the human eye. ↩︎
- Or rather, one possible power spectrum of each primary. ↩︎
- The CMYK space is not an additive colour space but a subtractive one. That means the equations for how spectra relate to the colour space change slightly. However, it is still a 4-dimensional colour space. ↩︎