Since the 1980s, sabermetricians have used a summary statistic other than batting average to evaluate players. They realized that walks were important and that doubles, triples, and home runs (HR) should be weighted more heavily than singles. As a result, they proposed the following metric:
BB/PA + (Singles + 2 x Doubles + 3 x Triples + 4 x HR)/AB
They called this on-base percentage plus slugging percentage (OPS). Although the sabermetricians probably did not use regression, here we show that this metric is close to what one gets with regression.
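As a quick illustration of the formula above, here is a minimal Python sketch that computes OPS from raw counts. The function name `ops` and the numbers in the usage line are hypothetical placeholders, not values from any real team; with real data, singles would typically have to be derived as hits minus doubles, triples, and home runs.

```python
def ops(bb, pa, singles, doubles, triples, hr, ab):
    """OPS as defined above: BB/PA + (Singles + 2 x Doubles + 3 x Triples + 4 x HR)/AB."""
    walk_rate = bb / pa  # the BB/PA term
    slugging = (singles + 2 * doubles + 3 * triples + 4 * hr) / ab  # the total-bases term
    return walk_rate + slugging


# Hypothetical counts, chosen only to exercise the function.
example = ops(bb=50, pa=500, singles=100, doubles=20, triples=5, hr=10, ab=450)
print(round(example, 3))
```

The same function can be applied row by row to a table of team totals to get one OPS value per team.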
1. Compute the OPS for each team in the 2001 season. Then plot runs per game versus OPS.
2. For every year since 1961, compute the correlation between runs per game and OPS, then plot these correlations as a function of year.
3. Note that we can rewrite OPS as a weighted average of BB, singles, doubles, triples, and HR. We know that the weights for doubles, triples, and HR are 2, 3, and 4 times that of singles. But what about BB? What is the weight for BB relative to singles? Hint: the weight for BB relative to singles will be a function of AB and PA.
4. Note that the weight for BB, AB/PA, will change from team to team. To see how variable it is, compute and plot this quantity for each team for each year since 1961. Then plot it again, but instead of computing it for every team, compute and plot the ratio for the entire year. Then, once you are convinced that there is not much of a time or team trend, report the overall average.
5. So now we know that the formula for OPS is proportional to 0.91 x BB + singles + 2 x doubles + 3 x triples + 4 x HR. Let's see how these coefficients compare to those obtained with regression. Fit a regression model to the data after 1961, as done earlier, using per-game statistics for each team in each year. After fitting this model, report the coefficients as weights relative to the coefficient for singles.
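The last step, reporting regression coefficients as weights relative to singles, amounts to dividing every slope by the singles slope. The sketch below shows one way to do this with ordinary least squares in NumPy. The helper `relative_weights` is a hypothetical name, and the data are synthetic and noise-free with arbitrary slopes, used only to check the mechanics; they say nothing about real baseball. With real data, the columns of `X` would be the per-game BB, singles, doubles, triples, and HR for each team-season, and `y` would be runs per game.

```python
import numpy as np


def relative_weights(X, y, ref_col):
    """Fit y ~ intercept + X by least squares; return slopes divided by the slope of column ref_col."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    slopes = coef[1:]  # drop the intercept
    return slopes / slopes[ref_col]


# Synthetic placeholder data (an assumption for illustration, not real team statistics).
rng = np.random.default_rng(0)
columns = ["BB", "singles", "doubles", "triples", "HR"]
X = rng.uniform(0.0, 10.0, size=(200, 5))
arbitrary_slopes = np.array([3.0, 5.0, 7.0, 11.0, 13.0])  # made-up values
y = 1.0 + X @ arbitrary_slopes

weights = relative_weights(X, y, ref_col=columns.index("singles"))
print(dict(zip(columns, weights.round(3))))
```

By construction the singles weight is 1, so the remaining entries can be compared directly with the OPS weights of 0.91, 2, 3, and 4.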