The Law of Large Numbers states that as the size of a sample drawn from a random variable increases, the sample mean gets closer and closer to the true population mean. We are going to simulate this using the steps below and then answer some questions:
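The statement can also be seen with a running mean of one long sequence of draws. The sketch below assumes independent draws with replacement (plain runif() calls), which differs from the without-replacement sample() used in the exercise. The seed and the 10000 draws are arbitrary choices, and the theoretical mean of a Uniform(a, b) variable is (a + b) / 2, which is 1985 here.

```r
# Sketch: running mean of independent Uniform(1950, 2020) draws
set.seed(1)                       # make the run reproducible
a <- 1950
b <- 2020
draws <- runif(10000, min = a, max = b)
# running_mean[k] is the mean of the first k draws
running_mean <- cumsum(draws) / seq_along(draws)
mu <- (a + b) / 2                 # theoretical mean of Uniform(a, b)
# Absolute distance from mu after 10, 100, 1000 and 10000 draws
abs_error <- abs(running_mean[c(10, 100, 1000, 10000)] - mu)
print(round(abs_error, 3))
```

The error typically shrinks as more draws are averaged, although it need not fall at every single step.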
First simulation:
# Generate a simulated population (you could also use real data here)
population <- runif(10000, min=1950, max=2020)
# Set the largest sample size to simulate
max_sample_size <- 1000
# This vector will hold the sample mean calculated for each sample size
mean_vec <- rep(0, max_sample_size)
# Illustrate the law of large numbers by calculating the mean of one random sample of every size from n=1 to n=max_sample_size
for (n in 1:max_sample_size) {
  # Draw a random sample of size n (without replacement) from the population
  values <- sample(population, n)
  # Calculate the sample mean and store it in mean_vec
  mean_vec[n] <- mean(values)
}
# Finally, plot the sample mean against the sample size
plot(seq(1, max_sample_size), mean_vec, xlab="Sample size", ylab="Sample mean")
# Draw a red horizontal line at the population mean (mu)
abline(h=mean(population), col="red")
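# The same plot, drawn with ggplot2
library(ggplot2)
lln_mean <- data.frame(Sample_Size=seq(1, max_sample_size), mean_vec)
# linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
sp_lln_mean <- ggplot(data=lln_mean, aes(x=Sample_Size, y=mean_vec)) +
  geom_point(shape=1) +
  geom_hline(yintercept=mean(population), color="red", linewidth=1) +
  labs(title="Simulated Uniform Distribution Means for Different Sample Sizes", x="Sample size", y="Sample mean")
# Assigning a ggplot to a variable does not draw it; print it to display the plot
print(sp_lln_mean)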
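The pre-allocated loop above can also be written with vapply(), which builds the result vector directly. This is a minimal sketch assuming the same uniform population and sample sizes; the seed is an arbitrary addition for reproducibility.

```r
# Sketch: the same simulation as the loop above, written with vapply()
set.seed(2)
population <- runif(10000, min = 1950, max = 2020)
max_sample_size <- 1000
# For each n, draw one sample of size n and return its mean (one number per n)
mean_vec <- vapply(seq_len(max_sample_size),
                   function(n) mean(sample(population, n)),
                   numeric(1))
# Compare the estimate at the largest sample size with the population mean
print(c(sample_mean = mean_vec[max_sample_size],
        population_mean = mean(population)))
```

The numeric(1) template makes vapply() check that every call returns exactly one number, which a plain sapply() would not enforce.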
(a) Why, even at the maximum sample size, does the sample mean still not exactly equal the population mean? (Hint: increase max_sample_size and see how the graph reacts. Because sample() draws without replacement by default, max_sample_size cannot exceed the population size of 10000.)
(b) Try changing the population from a uniform distribution to a binomial one, rbinom(10000, size=3, prob=0.30), and rerun the code. Does the Law of Large Numbers appear to depend on the distribution of the random variable?
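For question (b), only the line that generates the population needs to change. The sketch below assumes a population of 10000 values, like the uniform case, and an arbitrary seed. Each binomial value counts successes in 3 trials, so it lies between 0 and 3, and the theoretical mean is size * prob = 0.9.

```r
# Sketch: the mean simulation with a Binomial(size = 3, prob = 0.30) population
set.seed(3)
population <- rbinom(10000, size = 3, prob = 0.30)
max_sample_size <- 1000
mean_vec <- rep(0, max_sample_size)
for (n in 1:max_sample_size) {
  # One random sample of size n, without replacement
  mean_vec[n] <- mean(sample(population, n))
}
print(c(sample_mean = mean_vec[max_sample_size],
        population_mean = mean(population),
        theoretical_mean = 3 * 0.30))
```

Because the plotting code draws its red line at mean(population), it adapts to the new population without further changes.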
5. Second simulation (sample median):
# Generate a simulated population
population <- runif(10000, min=1950, max=2020)
# Set the largest sample size to simulate
max_sample_size <- 1000
# This vector will hold the sample median calculated for each sample size
median_vec <- rep(0, max_sample_size)
# Calculate the median of one random sample of every size from n=1 to n=max_sample_size
for (n in 1:max_sample_size) {
  # Draw a random sample of size n (without replacement) from the population
  values <- sample(population, n)
  # Calculate the sample median and store it in median_vec
  median_vec[n] <- median(values)
}
# Finally, plot the sample median against the sample size
plot(seq(1, max_sample_size), median_vec, xlab="Sample size", ylab="Sample median")
# Draw a red horizontal line at the population median
abline(h=median(population), col="red")
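# The same plot, drawn with ggplot2
library(ggplot2)
lln_median <- data.frame(Sample_Size=seq(1, max_sample_size), median_vec)
# The reference line must be the population median, matching the base plot above
sp_lln_median <- ggplot(data=lln_median, aes(x=Sample_Size, y=median_vec)) +
  geom_point(shape=1) +
  geom_hline(yintercept=median(population), color="red", linewidth=1) +
  labs(title="Simulated Uniform Distribution Medians for Different Sample Sizes", x="Sample size", y="Sample median")
# Print the plot so that it is displayed
print(sp_lln_median)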
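A single sample per size gives only one noisy point per n. One way to look more closely is to repeat each sample size many times and measure how much the sample median varies. This is a minimal sketch: the sample sizes (10, 100, 1000), the 200 repetitions and the seed are illustrative assumptions, not part of the exercise.

```r
# Sketch: spread of the sample median across repeated samples of the same size
set.seed(4)
population <- runif(10000, min = 1950, max = 2020)
sizes <- c(10, 100, 1000)   # illustrative sample sizes
n_reps <- 200               # repeated samples per size
# For each size, the standard deviation of the sample median over n_reps samples
median_spread <- sapply(sizes, function(size) {
  sd(replicate(n_reps, median(sample(population, size))))
})
names(median_spread) <- sizes
print(round(median_spread, 2))
```

If the spread narrows as the sample size grows, the sample medians are clustering more tightly, which is useful evidence for question (a) below.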
(a) In your textbook, the law of large numbers is applied only to means. Based on the plot you just produced, do you think it applies to medians?
(b) Change the statistic in the code to the standard deviation, sd(). Do you think the law of large numbers applies to it as well? Why? (Note that sd() returns NA for a sample of size 1, so the first point will be missing from the plot.)
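Swapping the statistic is easier if it is passed in as a function. The helper below, simulate_statistic(), is a hypothetical name introduced for illustration and is not part of the exercise; it is a minimal sketch assuming the same uniform population and one sample per size, as in the loops above.

```r
# Hypothetical helper: apply stat_fun to one random sample of every size
# from 1 to max_sample_size (samples drawn without replacement)
simulate_statistic <- function(population, stat_fun, max_sample_size = 1000) {
  vapply(seq_len(max_sample_size),
         function(size) stat_fun(sample(population, size)),
         numeric(1))
}

set.seed(5)
population <- runif(10000, min = 1950, max = 2020)
sd_vec <- simulate_statistic(population, sd)
# sd() of a single value is NA, so sd_vec[1] is NA
print(c(first = sd_vec[1], last = sd_vec[1000], population_sd = sd(population)))
```

The same helper accepts mean or median in place of sd. To draw the result, use plot(seq_along(sd_vec), sd_vec, xlab="Sample size", ylab="Sample standard deviation") followed by abline(h=sd(population), col="red").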