This post is a simulation-based walkthrough, starting with simple Bernoulli trials and building up to random walks with drift. The random walk section also connects naturally to regression to the mean, a related idea worth understanding alongside the central limit theorem.

The central limit theorem (CLT) states that, for independent draws from almost any data-generating distribution (any with finite variance), the sampling distribution of the sample mean converges to a normal distribution as the sample size increases. The shape of the underlying distribution is not preserved: averaging aggregates many independent random fluctuations, and it is this aggregation that produces the Gaussian form.
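As a quick illustration (a minimal base-R sketch; the exponential distribution is an arbitrary non-normal example), the means of repeated samples concentrate around the true mean and look increasingly Gaussian as the sample size grows:

```r
set.seed(42)

# means of repeated samples from a skewed Exponential(rate = 1) distribution
sample_means <- function(n, reps = 1e4) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

means_small <- sample_means(n = 5)   # sampling distribution still right-skewed
means_large <- sample_means(n = 500) # approximately Gaussian

# the spread shrinks roughly like 1 / sqrt(n) around the true mean of 1
c(sd(means_small), sd(means_large))
```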

Closely linked to the CLT is regression to the mean. Because the small fluctuations added at each trial are independent, they tend to cancel in the long run, so an extreme observation is likely to be followed by one closer to the underlying mean. Importantly, this is not a corrective force acting on the particle (subject); it is simply that an unusually lucky or unlucky streak of fluctuations is unlikely to repeat.
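A small sketch makes this concrete (a minimal base-R example with hypothetical test scores; the score = true value + noise setup and all numbers are assumptions for illustration):

```r
set.seed(42)

# each subject has a true score plus independent noise on two occasions
true_score <- rnorm(1e4, mean = 0, sd = 1)
test_1 <- true_score + rnorm(1e4, sd = 1)
test_2 <- true_score + rnorm(1e4, sd = 1)

# subjects who scored in the top 5% on the first test...
extreme <- test_1 > quantile(test_1, 0.95)

# ...score closer to the overall mean on the second test, on average,
# because the lucky part of their first score does not repeat
c(first = mean(test_1[extreme]), second = mean(test_2[extreme]))
```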

Simulate CLT

The binomial distribution is given by,

\[P(X=x) = \binom{n}{x} \cdot p^x \cdot (1-p)^{n-x} = \frac{n!}{x!(n-x)!} \cdot p^x \cdot (1-p)^{n-x},\] where \(n\) is the number of trials and \(p\) is the probability of success in each trial, with expected value \(E(X) = n \cdot p\) and variance \(\mathrm{Var}(X) = n \cdot p \cdot (1-p)\).
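The formula can be checked numerically against R's built-in dbinom() (a minimal sketch; the n = 10, p = 0.3 values are arbitrary):

```r
n <- 10
p <- 0.3
x <- 0:n

# probability mass computed directly from the formula above
pmf <- choose(n, x) * p^x * (1 - p)^(n - x)

# matches R's built-in binomial density
all.equal(pmf, dbinom(x, size = n, prob = p)) # TRUE

# the expected value E(X) = n * p = 3
sum(x * pmf)
```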

Here I will generate data from binomial trials with size = 1, so each observation follows a Bernoulli distribution with probability of success, p.

As a side note, I will let p itself fluctuate randomly from trial to trial, drawing it from a beta distribution, i.e., p ~ Beta(1,1), which is uniform between 0 and 1. For example,

# packages used throughout: tidyverse (ggplot2, dplyr, purrr, tibble) and patchwork
library(tidyverse)
library(patchwork)

trials <- 1e3 # number of observations, or trials

plot_beta <- function() {
  # draw a fresh p ~ Beta(1,1) for each Bernoulli trial
  replicate(trials, rbinom(n = 1, size = 1, prob = rbeta(n = 1, shape1 = 1, shape2 = 1))) |>
    as_tibble() |>
    list() |>
    list_rbind(names_to = "simulation") |>
    ggplot(aes(value)) +
    # histogram scaled to density
    geom_histogram(aes(y = after_stat(density))) +
    scale_x_continuous(breaks = scales::pretty_breaks(n = 3)) +
    theme(axis.title = element_blank())
}

p_1 <- plot_beta()
p_2 <- plot_beta()
p_3 <- plot_beta()

gridExtra::grid.arrange(
  patchworkGrob(p_1 + p_2 + p_3),
  left = "density",
  bottom = "prob",
  top = "Beta(1,1) Distributions for Bernoulli Trials"
)

The Beta(1,1) distribution is exactly the uniform distribution on the unit interval, so drawing p from it is equivalent to picking p uniformly at random. Since the marginal probability of success is then 0.5, the Bernoulli outcomes 0 and 1 appear with roughly equal frequency in each panel.

Next is a simulation of Bernoulli outcomes. For each replication, the function Reduce() sums the observations to obtain the sample sum, i.e., the count of successes across the trials. The replicate() function then repeats this process a specified number of times.

replicate(n = trials, expr = Reduce(f = sum, x = rbinom(n = trials, size = 1, prob = 0.5))) |>
  as_tibble() |>
  list() |>
  list_rbind(names_to = "simulation") |>
  ggplot(aes(value)) +
  geom_histogram(aes(y = after_stat(density)),
    binwidth = 1
  ) +
  stat_density(
    # mean = n * p = 500, sd = sqrt(n * p * (1 - p)) = sqrt(1000 * 0.25) ~ 15.8
    data = tibble(value = rnorm(n = 1e6, mean = 500, sd = 15.8)),
    col = "#619CFF",
    geom = "line", linewidth = 1
  ) +
  geom_vline(xintercept = 0.5 * trials) +
  scale_x_continuous(breaks = scales::breaks_pretty(n = 5)) +
  labs(title = "Distribution of Bernoulli Trial Outcomes, blue (~ Gaussian)")

The distribution has the characteristic bell shape even with a modest number of replications (1,000).
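The bell shape can also be checked against the theory directly: with n = 1000 trials and p = 0.5, the CLT predicts mean n·p = 500 and standard deviation sqrt(n·p·(1−p)) ≈ 15.8 (a minimal base-R sketch, independent of the plotting code above):

```r
set.seed(42)
trials <- 1e3

# 1000 replications of the sum of 1000 Bernoulli(0.5) trials
sums <- replicate(trials, sum(rbinom(n = trials, size = 1, prob = 0.5)))

c(
  empirical_mean = mean(sums), # close to 500
  empirical_sd   = sd(sums),   # close to sqrt(1000 * 0.25) ~ 15.8
  theoretical_sd = sqrt(trials * 0.5 * 0.5)
)
```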

You can expect similar outcomes from the following binomial trials, as compared to the simulation above:

rbinom(n = trials, size = trials, prob = 0.5) |>
  as_tibble() |>
  list() |>
  list_rbind(names_to = "simulation") |>
  ggplot(aes(value)) +
  geom_histogram(aes(y = after_stat(density)),
    binwidth = 1
  ) +
  stat_density(
    # same mean and sd as above: 500 and sqrt(1000 * 0.25) ~ 15.8
    data = tibble(value = rnorm(n = 1e6, mean = 500, sd = 15.8)),
    col = "#619CFF",
    geom = "line", linewidth = 1
  ) +
  geom_vline(xintercept = 0.5 * trials) +
  scale_x_continuous(breaks = scales::breaks_pretty(n = 5)) +
  ggtitle(label = "CLT for Binomial Trial Outcomes, blue (~ Gaussian)")

Simulate Random Walks With Drift

Let's take this one step further and consider a random walk with a positive drift.

\[y_t = y_{t-1} + \mu + e_t,\]

where \(y_t\) is the value at time \(t\), \(y_{t-1}\) is the value at time \(t-1\), \(\mu\) is the drift, a constant added at each step, and \(e_t\) is the error term at time \(t\), a white noise series. The result is a time trend \(t\mu\) combined with a random walk in \(e_t\).
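Iterating the recursion makes the trend explicit:

\[y_t = y_{t-1} + \mu + e_t = y_{t-2} + 2\mu + e_{t-1} + e_t = \dots = y_0 + t\mu + \sum_{i=1}^{t} e_i,\]

so the process decomposes into a deterministic linear trend \(t\mu\) plus the accumulated noise \(\sum_{i=1}^{t} e_i\), whose variance grows with \(t\).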

The next piece of code generates a specified number of Bernoulli random walk processes, each with a given number of steps. At each step (trial), the function rbinom() generates either 0 or 1 with a probability of success of 0.5, equivalent to flipping a coin, and the walk moves up (1) or stays put (0). Since each step has mean 0.5, the walk drifts upward at rate \(\mu = 0.5\), with the centered Bernoulli fluctuations playing the role of \(e_t\).

Here is a plot with 100 simulations with cross sections as a reference:

simulations <- 100
trials <- 300
lines <- c(10, 50, 100, 200, 300)

map(
  1:simulations,
  # accumulate() turns the Bernoulli draws into a cumulative sum: the walk itself
  \(i) accumulate(rbinom(n = trials, size = 1, prob = 0.5), `+`)
) |>
  set_names(paste0("sim", 1:simulations)) |>
  map(\(x) tibble(value = x, trial = 1:trials)) |>
  list_rbind(names_to = "simulation") |>
  ggplot(aes(x = trial, y = value)) +
  geom_step(aes(color = simulation), alpha = 0.4) +
  geom_vline(xintercept = lines, alpha = 0.5, linetype = "dashed") +
  scale_x_continuous(breaks = scales::breaks_pretty(n = 20)) +
  labs(
    title = "Simulated Random Walks With Drift Using Bernoulli Fluctuations",
    subtitle = "Vertical lines represent cross sections of the random walk process",
    x = "step (Bernoulli trial)"
  ) +
  guides(col = "none")

Here are the plots for these cross sections:

simulations <- 1e4
trials <- 300

random_walk_sim <- map(
  1:simulations,
  \(i) accumulate(rbinom(n = trials, size = 1, prob = 0.5), `+`)
) |>
  set_names(paste0("sim", 1:simulations)) |>
  map(\(x) tibble(value = x, trial = 1:trials)) |>
  list_rbind(names_to = "simulation") |>
  group_by(simulation) |>
  filter(trial %in% lines) |>
  group_by(trial) |>
  mutate(
    dist_mean = mean(value),
    trial = factor(trial)
  )

random_walk_sim |>
  ggplot(aes(value, fill = trial)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 1) +
  geom_vline(
    data = random_walk_sim |> distinct(dist_mean, .keep_all = TRUE),
    aes(xintercept = dist_mean, col = trial), alpha = 1, linetype = "dashed"
  ) +
  labs(
    title = "Distributions at Cross Sections for Bernoulli Random Walks",
    fill = "Cross Section",
    x = "Mean value (0.5 * Bernoulli trials)"
  ) +
  scale_x_continuous(breaks = scales::breaks_pretty(n = 10)) +
  guides(col = "none")

At each cross section of the Bernoulli random walk process, the distribution of the simulated values is expected to follow a normal distribution, as predicted by the CLT. In our example, the earliest distributions do not look Gaussian, but spread starts to build in the tails. After around 100 steps, the characteristic bell shape appears. From then on, the distribution keeps the bell shape but gets wider and flatter: the variance at step \(t\) is \(t \cdot p \cdot (1-p)\), so it grows linearly with the number of steps.
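The spreading is quantifiable. Each cross section is a sum of Bernoulli draws, so its standard deviation should grow like sqrt(t·p·(1−p)) (a minimal base-R sketch, independent of the plotting code above):

```r
set.seed(42)
p <- 0.5
steps <- c(10, 50, 100, 200, 300)

# 1e4 walks as a 300-row matrix: cumulative sums of Bernoulli(0.5) draws
walks <- replicate(1e4, cumsum(rbinom(n = 300, size = 1, prob = p)))

# empirical sd at each cross section vs the theoretical sqrt(t * p * (1 - p))
rbind(
  empirical   = apply(walks[steps, ], 1, sd),
  theoretical = sqrt(steps * p * (1 - p))
)
```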

Conclusion

In the context of a Bernoulli random walk with drift, the mean refers to the expected value of the process at a particular time; here it grows linearly, at 0.5 per step. The drift term shifts this expected path but does not alter the behavior of the fluctuations around it. Because future fluctuations are independent of the streak that produced an extreme value, a walk that is extreme relative to the typical spread at one cross section tends to be less extreme at later ones. This is an example of regression to the mean.
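This can be demonstrated directly (a minimal base-R sketch using the same Bernoulli-walk setup as above; the top-5% cutoff and step numbers are arbitrary choices). Standardizing each cross section by its mean and spread, walks that are extreme at step 100 are, on average, noticeably less extreme at step 300:

```r
set.seed(42)
p <- 0.5

# 1e4 walks as a 300-row matrix: cumulative sums of Bernoulli(0.5) draws
walks <- replicate(1e4, cumsum(rbinom(n = 300, size = 1, prob = p)))

# standardize a cross section: (value - t * p) / sqrt(t * p * (1 - p))
z <- function(t) (walks[t, ] - t * p) / sqrt(t * p * (1 - p))

# walks in the top 5% at step 100...
extreme <- z(100) > quantile(z(100), 0.95)

# ...are closer to the mean, in standardized units, at step 300, because
# the fluctuations after step 100 are independent of the earlier lucky streak
c(step_100 = mean(z(100)[extreme]), step_300 = mean(z(300)[extreme]))
```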