A. Define units = c(1,2,3,4) and probs = c(.3,.5,.1,.1) and use vector arithmetic in R to quickly compute E(X) and Var(X). (You should get 2 and .8.) Execute plot(units, probs, lwd=5, col="red", type="h") to display the PMF. Show your code, results, and plot.
Note you can generate a sample of size, say, 10, taken from units with respective probabilities probs by executing sample(units, 10, replace=TRUE, prob=probs). So e.g. if you execute hist(sample(units, 10000, replace=TRUE, prob=probs)) then the resulting histogram should mimic the PMF plotted in Part A.
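As a rough sketch of the vector arithmetic Part A has in mind (the names mu, sigma2, x, and emp are illustrative and not part of the assignment), the mean and variance can be computed straight from the PMF, and a large sample can be tabulated to check that its relative frequencies track probs:

```r
# PMF of X, as given in the Tate's Rents example
units <- c(1, 2, 3, 4)
probs <- c(.3, .5, .1, .1)

# E(X) = sum of x * p(x); Var(X) = E(X^2) - E(X)^2
mu     <- sum(units * probs)
sigma2 <- sum(units^2 * probs) - mu^2
mu      # theoretical mean
sigma2  # theoretical variance

# Empirical check: relative frequencies from a large sample
set.seed(1)  # arbitrary seed, used only for reproducibility
x   <- sample(units, 10000, replace = TRUE, prob = probs)
emp <- table(factor(x, levels = units)) / length(x)
rbind(theoretical = probs, empirical = as.numeric(emp))
```

Tabulating with table() is one alternative to hist() here; because X takes only four values, comparing the relative frequencies side by side with probs can be easier to read than a histogram with default breaks.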
B. [2pts] Suppose that T is the total number of units rented by the next 100 customers. Compute the theoretical mean and standard deviation of T. Simulate T (using, e.g., replicate(), sum(), and sample(), say with 10000 replications) and verify the results are close to their predicted values. Find the theoretical normal distribution that the CLT predicts will approximate the distribution of T. Overlay it on a density histogram of your simulated values, verify the two are close, and display the resulting plot. You can search Google for how to overlay a normal density on a density histogram - the first result says that curve(dnorm(x, mean=m, sd=s), col="darkblue", lwd=2, add=TRUE, yaxt="n") will overlay a normal PDF with mean m and SD s. Show all code, calculations, and plots.
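One possible way to set up Part B is sketched below. It assumes the 100 customers rent independently, so that E(T) = 100 E(X) and SD(T) = sqrt(100) SD(X); the names mT, sT, and values, the seed, and the choice of unit-width histogram bins are illustrative assumptions rather than requirements.

```r
units <- c(1, 2, 3, 4)
probs <- c(.3, .5, .1, .1)
mu    <- sum(units * probs)
sigma <- sqrt(sum(units^2 * probs) - mu^2)

n  <- 100
mT <- n * mu            # theoretical mean of T
sT <- sqrt(n) * sigma   # theoretical SD of T (independent customers assumed)

# Each replication sums the rentals of 100 simulated customers
set.seed(1)  # arbitrary seed, used only for reproducibility
values <- replicate(10000, sum(sample(units, n, replace = TRUE, prob = probs)))
c(mean = mean(values), sd = sd(values))  # compare with mT and sT

# Density histogram with one bin per integer value of T, plus the CLT normal curve
hist(values, freq = FALSE,
     breaks = seq(min(values) - 0.5, max(values) + 0.5, by = 1),
     main = "Simulated T", xlab = "T")
curve(dnorm(x, mean = mT, sd = sT), col = "darkblue", lwd = 2, add = TRUE)
```

Setting freq = FALSE puts the histogram on the density scale, which is what makes the overlay with dnorm() comparable.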
C. Approximate the probability that T is over 195 from the simulated results. Compare this to the value predicted by the CLT. Show all code and results. Hint: Once you have your 10000 simulated values of T, you can estimate, e.g., P(T > 195) by executing mean(values > 195) (where values is your vector of 10000 simulated values of T). This is the proportion of simulated values that exceed 195. Note: T is discrete so P(T > 195) = P(T >= 196). You may notice a discrepancy between your two results, which can be accounted for by computing P(T > 195.5) (using the normal approximation). This shift by a half unit roughly interpolates between the normal approximations evaluated at 195 and at 196, and should give a better approximation. It is known as a "continuity correction."
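The comparison in Part C might look something like the following sketch. It repeats the simulation so that it runs on its own; p_sim, p_clt, and p_cc are illustrative names, and pnorm() with lower.tail = FALSE is used for the upper-tail normal probabilities.

```r
units <- c(1, 2, 3, 4)
probs <- c(.3, .5, .1, .1)
mu    <- sum(units * probs)
sigma <- sqrt(sum(units^2 * probs) - mu^2)
n     <- 100
mT    <- n * mu
sT    <- sqrt(n) * sigma

set.seed(1)  # arbitrary seed, used only for reproducibility
values <- replicate(10000, sum(sample(units, n, replace = TRUE, prob = probs)))

# Simulated estimate: proportion of simulated totals above 195
p_sim <- mean(values > 195)

# Normal approximation without and with the continuity correction
p_clt <- pnorm(195,   mean = mT, sd = sT, lower.tail = FALSE)
p_cc  <- pnorm(195.5, mean = mT, sd = sT, lower.tail = FALSE)

c(simulated = p_sim, clt = p_clt, clt_corrected = p_cc)
```

Because 195.5 lies to the right of 195, the corrected value is always somewhat smaller than the uncorrected one; how closely either matches the simulated estimate will vary a little with the seed.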
D. Suppose that Tate's will stay in business if their next 100 customers rent more than 185 units, and they will be profitable if their next 100 customers rent more than 195 units. Find the probability they are profitable given they stay in business in two different ways: first using your simulated values of T, and second using the CLT and the theoretically predicted normal distribution. Compare these two conditional probabilities. Show all code and results.
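Since T > 195 implies T > 185, the conditional probability in Part D reduces to P(T > 195) / P(T > 185). A minimal sketch of both calculations follows; the names cond_sim and cond_clt are illustrative, and applying the continuity correction to both normal tail probabilities is an assumption of this sketch, not something the problem requires.

```r
units <- c(1, 2, 3, 4)
probs <- c(.3, .5, .1, .1)
mu    <- sum(units * probs)
sigma <- sqrt(sum(units^2 * probs) - mu^2)
n     <- 100
mT    <- n * mu
sT    <- sqrt(n) * sigma

set.seed(1)  # arbitrary seed, used only for reproducibility
values <- replicate(10000, sum(sample(units, n, replace = TRUE, prob = probs)))

# Simulation: among the runs that stay in business, the share that are profitable
cond_sim <- mean(values[values > 185] > 195)

# CLT: ratio of normal upper-tail probabilities (continuity-corrected by assumption)
cond_clt <- pnorm(195.5, mean = mT, sd = sT, lower.tail = FALSE) /
            pnorm(185.5, mean = mT, sd = sT, lower.tail = FALSE)

c(simulated = cond_sim, clt = cond_clt)
```

Subsetting with values[values > 185] gives the same number as mean(values > 195) / mean(values > 185), which mirrors the ratio used in the normal calculation.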
E. [Extra Credit, 1pt] If you thought the last part was hard then here's a tougher one: How many customers does Tate's need to service before the probability that the mean number of units rented per customer is less than 1.85 is less than 10%? In other words, if X_n is the mean number of units rented by the next n customers, what is the smallest n for which P(X_n < 1.85) < .1? It is possible to estimate this by simulation, or by using the normal approximation from the CLT (assuming n is large enough for the CLT to give a good approximation, which turns out to be the case). I would be impressed if you could compute it using either method, and even more impressed if you got similar answers from both. For an additional challenge see if you can display a plot of the values of P(X_n < 1.85) for various values of n, say n = 1, 2, ..., 100. You should be able to see about where the probabilities fall below 10%, and if you connect the points with lines it should show an interesting decreasing sawtooth pattern.
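The normal-approximation route in Part E can be sketched by solving pnorm((1.85 - mu) * sqrt(n) / sigma) < .1 for n. For the plot, the sketch below swaps in a different technique from the simulation the problem suggests: it builds the exact PMF of the total one customer at a time (a discrete convolution), which avoids simulation noise. The names n_clt, n_max, p, and pmf are illustrative, and independence between customers is assumed.

```r
units <- c(1, 2, 3, 4)
probs <- c(.3, .5, .1, .1)
mu    <- sum(units * probs)
sigma <- sqrt(sum(units^2 * probs) - mu^2)

# Normal approximation: P(Xbar_n < 1.85) is roughly pnorm((1.85 - mu) * sqrt(n) / sigma).
# Requiring this to be below 0.1 gives n > (qnorm(0.1) * sigma / (1.85 - mu))^2.
n_clt <- ceiling((qnorm(0.1) * sigma / (1.85 - mu))^2)
n_clt

# Exact P(Xbar_n < 1.85) for n = 1..n_max.
# pmf[j] holds P(T_n = j - 1), where T_n is the total for n customers.
n_max <- 100
p     <- numeric(n_max)
pmf   <- 1  # T_0 = 0 with probability 1
for (n in 1:n_max) {
  new <- numeric(length(pmf) + max(units))
  for (k in seq_along(units)) {
    idx      <- units[k] + seq_along(pmf)   # shift totals by units[k]
    new[idx] <- new[idx] + probs[k] * pmf
  }
  pmf    <- new
  totals <- seq_along(pmf) - 1
  # Xbar_n < 1.85 is the event T_n < 1.85 * n; the small tolerance guards
  # against floating-point error when 1.85 * n is a whole number
  p[n] <- sum(pmf[totals < 1.85 * n - 1e-9])
}

which(p < 0.1)[1]  # first n at which the exact probability is below 10%

plot(1:n_max, p, type = "b", xlab = "n", ylab = "P(mean < 1.85)")
abline(h = 0.1, lty = 2)  # 10% reference line
```

Because of the sawtooth, the first n at which the exact probability dips below 10% need not coincide with the n from the normal approximation, and the probability may rise back above 10% for some later n, so it is worth stating which sense of "smallest n" an answer uses.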
Tate's Rents Example
Tate's Rents loans out heavy equipment. Let X be the number of units rented by a random customer and suppose X is supported on {1,2,3,4} with probabilities
P(X=1)=.3, P(X=2)=.5, P(X=3)=.1, P(X=4)=.1.