Solution to fivethirtyeight Riddler’s puzzle, Shared Birthdays (December 9, 2022).

Suppose people walk into a room, one at a time. Their birthdays happen to be randomly distributed throughout the 365 days of the year (and no one was born on a leap day). The moment two people in the room have the same birthday, no more people enter the room and everyone inside celebrates by eating cake, regardless of whether that common birthday happens to be today.

On average, what is the expected number of people in the room when they eat cake?

Extra credit: Suppose everyone eats cake the moment three people in the room have the same birthday. On average, what is this expected number of people?

Monte Carlo Simulation

import random

random.seed(123)

def simulate_data(trials, days, rep_birthdays):
    lengths = []
    for trial in range(1, trials+1):
        x = []
        for k in range(1, days + 2):
            z = random.sample(range(1, days+1), 1)[0]
            if x.count(z) == rep_birthdays - 1:
                lengths.append(k)
                break
            else:
                x.append(z)
    return lengths

Running the simulation for a number of trials:

trials = int(1e5)
n_people = simulate_data(trials, 365, 2)

We can now calculate the expectation (mean) and the 95% confidence intervals for the sample.

import numpy as np

avg = np.mean(n_people)
se = np.std(n_people) / np.sqrt(trials)
ci = (avg - 1.96 * se, avg + 1.96 * se)
ci = [round(num, 4) for num in ci]
ci = "[" + ", ".join(map(str, ci)) + "]"

Answer

The answer to the puzzle is 24.7021 with a bound of [24.6266, 24.7775].

Extra credit:

trials = int(1e4)
select_matches = list(range(2, 10))

results = [np.mean(simulate_data(trials, 365, x)) for x in select_matches]

print("answer to extra credit: ", results[1])
## answer to extra credit:  88.6022

Additional Notes

I am curious to see how this value grows as we add more “matches”.

results <- py$results

enframe(results, name = "select_matches", value = "results") |>
  unnest() |>
  ggplot(aes(select_matches, results)) +
  geom_point() +
  scale_x_continuous(n.breaks = 10) +
  geom_hline(yintercept = 365, col = "red", linetype = "dashed") +
  labs(
    x = "Number of People With Same Birthday",
    y = "Number of People Eating Cake"
  )