Correlation and Regression

Correlation and regression both describe the association between two variables, and in some fields the two terms are used interchangeably. In statistics, however, they are two different techniques. A correlation coefficient is a number between -1 and 1 that describes the strength and direction of the linear association between two variables. A correlation coefficient of 1 indicates a perfect positive linear association, 0 indicates no linear association, and -1 indicates a perfect negative linear association. A regression coefficient, by contrast, is the slope of the fitted regression equation: it estimates how much the dependent variable changes for a one-unit change in the independent variable. It is not restricted to the range from -1 to 1, and its size depends on the units of the variables, so it does not by itself measure the strength of the association. In both cases, the sign of the coefficient indicates the direction of the association.

Correlation is a general term for a statistical relationship between two variables, while regression is a specific technique that models the relationship. In a regression analysis, the relationship of interest is represented by an equation, which may be linear or nonlinear, that expresses the dependent variable as a function of the independent variable(s). A nonlinear function is used when the relationship between the dependent and independent variables is not a straight line. For example, if the relationship between weight and height is curved, a model that describes it with a parabola may be appropriate. Correlation analysis does not model the relationship between variables. Instead, it is used to determine whether two variables are related and, if so, how strongly.
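To make the distinction concrete, here is a minimal Python sketch that computes both a correlation coefficient and a regression slope for the same data. The height and weight values are made up for illustration, and the helper names `pearson_r` and `slope` are not from any textbook. Changing the unit of height from metres to centimetres changes the slope by a factor of 100 but leaves the correlation coefficient unchanged.

```python
import math

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def slope(xs, ys):
    """Least-squares slope b1 of the line y-hat = b0 + b1 * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data: heights in metres and weights in kilograms.
heights_m = [1.50, 1.60, 1.70, 1.80, 1.90]
weights_kg = [52.0, 58.0, 69.0, 74.0, 85.0]
heights_cm = [100 * h for h in heights_m]  # same heights, different unit

r_m = pearson_r(heights_m, weights_kg)
r_cm = pearson_r(heights_cm, weights_kg)
b_m = slope(heights_m, weights_kg)    # kilograms per metre
b_cm = slope(heights_cm, weights_kg)  # kilograms per centimetre

print(f"r (metres) = {r_m:.4f}, r (centimetres) = {r_cm:.4f}")
print(f"slope (kg per m) = {b_m:.2f}, slope (kg per cm) = {b_cm:.4f}")
```

The slope here is far larger than 1, which is one way to see that a regression coefficient is not a bounded measure of strength.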

Scatter Plots and Correlation

29 Practice Problems
Elementary Statistics

Construct a scatterplot, and find the value of the linear correlation coefficient $r$. Also find the $P$-value or the critical values of $r$ from Table A-6. Use a significance level of $\alpha=0.05$. Determine whether there is sufficient evidence to support a claim of a linear correlation between the two variables. (Save your work because the same data sets will be used in Section 10-2 exercises.) Repeat the preceding exercise using diameters and volumes.

Correlation
Kaylee Mcclellan
Elementary Statistics

Construct a scatterplot, and find the value of the linear correlation coefficient $r$. Also find the $P$-value or the critical values of $r$ from Table A-6. Use a significance level of $\alpha=0.05$. Determine whether there is sufficient evidence to support a claim of a linear correlation between the two variables. (Save your work because the same data sets will be used in Section 10-2 exercises.) Listed below are amounts of bills for dinner and the amounts of the tips that were left. The data were collected by students of the author. Is there sufficient evidence to conclude that there is a linear correlation between the bill amounts and the tip amounts? If everyone were to tip with the same percentage, what should be the value of $r$?
$$\begin{array}{l|c|c|c|c|c|c}\hline \text{Bill (dollars)} & 33.46 & 50.68 & 87.92 & 98.84 & 63.60 & 107.34 \\ \hline \text{Tip (dollars)} & 5.50 & 5.00 & 8.08 & 17.00 & 12.00 & 16.00 \\ \hline\end{array}$$
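One way to check this exercise by computer is sketched below in Python, using the bill and tip values from the table. The critical value 0.811 for $n=6$ pairs at $\alpha=0.05$ is an assumption taken from a standard table of critical values of $r$, since Table A-6 is not reproduced on this page. The 15% rate in the second part is an arbitrary choice for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample linear correlation coefficient r for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

bills = [33.46, 50.68, 87.92, 98.84, 63.60, 107.34]
tips = [5.50, 5.00, 8.08, 17.00, 12.00, 16.00]

r = pearson_r(bills, tips)

# Assumed critical value of r for n = 6 and alpha = 0.05 (two-tailed);
# the table itself is not part of this page.
CRITICAL_R = 0.811
is_significant = abs(r) > CRITICAL_R

# If every tip were the same percentage of the bill, all points would
# lie exactly on a straight line with positive slope.
same_pct_tips = [0.15 * b for b in bills]
r_same_pct = pearson_r(bills, same_pct_tips)

print(f"r = {r:.3f}, exceeds critical value: {is_significant}")
print(f"r with same-percentage tips = {r_same_pct:.3f}")
```

Because $|r|$ is compared with a table value, this sketch gives a reject or fail-to-reject decision but not a $P$-value.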

Correlation
Kaylee Mcclellan
Elementary Statistics

Construct a scatterplot, and find the value of the linear correlation coefficient $r$. Also find the $P$-value or the critical values of $r$ from Table A-6. Use a significance level of $\alpha=0.05$. Determine whether there is sufficient evidence to support a claim of a linear correlation between the two variables. (Save your work because the same data sets will be used in Section 10-2 exercises.) A classic application of correlation involves the association between the temperature and the number of times a cricket chirps in a minute. Listed below are the numbers of chirps in 1 min and the corresponding temperatures in °F (based on data from The Song of Insects, by George W. Pierce, Harvard University Press). Is there sufficient evidence to conclude that there is a linear correlation between the number of chirps in 1 min and the temperature?
$$\begin{array}{l|c|c|c|c|c|c|c|c}\hline \text{Chirps in 1 min} & 882 & 1188 & 1104 & 864 & 1200 & 1032 & 960 & 900 \\ \hline \text{Temperature } (^{\circ}\mathrm{F}) & 69.7 & 93.3 & 84.3 & 76.3 & 88.6 & 82.6 & 71.6 & 79.6 \\ \hline\end{array}$$
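As an alternative to looking up critical values of $r$, the same decision can be reached with the test statistic $t = r\sqrt{(n-2)/(1-r^{2})}$, which has $n-2$ degrees of freedom under the null hypothesis of no linear correlation. The Python sketch below applies it to the chirp data. The critical value 2.447 (two-tailed, $\alpha=0.05$, 6 degrees of freedom) is an assumption taken from a standard $t$ table, not from this page.

```python
import math

chirps = [882, 1188, 1104, 864, 1200, 1032, 960, 900]
temps_f = [69.7, 93.3, 84.3, 76.3, 88.6, 82.6, 71.6, 79.6]

n = len(chirps)
mean_x, mean_y = sum(chirps) / n, sum(temps_f) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(chirps, temps_f))
sxx = sum((x - mean_x) ** 2 for x in chirps)
syy = sum((y - mean_y) ** 2 for y in temps_f)
r = sxy / math.sqrt(sxx * syy)

# Test statistic for H0: no linear correlation, with n - 2 degrees of freedom.
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

# Assumed two-tailed critical t for alpha = 0.05 and 6 degrees of freedom.
T_CRITICAL = 2.447
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"r = {r:.3f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```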

Correlation
Kaylee Mcclellan

Regression

19 Practice Problems
Elementary Statistics

If a scatterplot reveals a nonlinear (not a straight line) pattern that you recognize as another type of curve, you may be able to apply the methods of this section. For the data given below, find the linear equation $\left(\hat{y}=b_{0}+b_{1} x\right)$ that best fits the sample data, and find the logarithmic equation $(\hat{y}=a+b \ln x)$ that best fits the sample data. (Hint: Begin by replacing each $x$ value with $\ln x$.) Which of these two equations fits the data better? Why?
$$\begin{array}{l|cccc}
x & 2 & 48 & 377 & 4215 \\
\hline y & 1 & 4 & 6 & 10
\end{array}$$
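The hint can be followed directly in code. The Python sketch below fits a least-squares line twice, once to $(x, y)$ and once to $(\ln x, y)$, and compares the two fits by their coefficients of determination $r^{2}$. The helper `fit_line` is a hypothetical name, and comparing by $r^{2}$ is one reasonable criterion rather than the only one.

```python
import math

def fit_line(xs, ys):
    """Return (intercept, slope, r_squared) of the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope, sxy ** 2 / (sxx * syy)

xs = [2, 48, 377, 4215]
ys = [1, 4, 6, 10]

# Linear model: y-hat = b0 + b1 * x
b0_lin, b1_lin, r2_lin = fit_line(xs, ys)

# Logarithmic model: y-hat = a + b * ln(x), fitted by replacing x with ln(x)
ln_xs = [math.log(x) for x in xs]
a_log, b_log, r2_log = fit_line(ln_xs, ys)

print(f"linear:      y = {b0_lin:.3f} + {b1_lin:.5f} x,    r^2 = {r2_lin:.3f}")
print(f"logarithmic: y = {a_log:.3f} + {b_log:.3f} ln x, r^2 = {r2_log:.3f}")
```

With only four data pairs, the comparison is illustrative and should not be over-interpreted.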

Regression
James Kiss
Elementary Statistics

Use the same Appendix B data sets as Exercises 29-32 in Section 10-2. In each case, find the regression equation, letting the first variable be the predictor ($x$) variable. Find the indicated predicted values following the prediction procedure summarized in Figure 10-5 (figure not reproduced).
Refer to Data Set 4 in Appendix B and use the tar and nicotine data from king size cigarettes. Find the best predicted amount of nicotine in a king size cigarette with $10 \mathrm{mg}$ of tar.

Regression
James Kiss
Elementary Statistics

Use the same Appendix B data sets as Exercises 29-32 in Section 10-2. In each case, find the regression equation, letting the first variable be the predictor ($x$) variable. Find the indicated predicted values following the prediction procedure summarized in Figure 10-5 (figure not reproduced).
Refer to Data Set 9 in Appendix B and use the paired data consisting of movie budget amounts and the amounts that the movies grossed. Find the best predicted amount that a movie will gross if its budget is $\$120$ million.

Regression
James Kiss

Coefficient of Determination and Standard Error of the Estimate

11 Practice Problems
Understandable Statistics, Concepts and Methods

When we take measurements of the same general type, a power law of the form $y=\alpha x^{\beta}$ often gives an excellent fit to the data. A lot of research has been conducted as to why power laws work so well in business, economics, biology, ecology, medicine, engineering, social science, and so on. Let us just say that if you do not have a good straight-line fit to data pairs $(x, y)$, and the scatter plot does not rise dramatically (as in exponential growth), then a power law is often a good choice. College algebra can be used to show that power law models become linear when we apply logarithmic transformations to both variables. To see how this is done, please read on. Note: For power law models, we assume all $x>0$ and all $y>0$.
Suppose we have data pairs $(x, y)$ and we want to find constants $\alpha$ and $\beta$ such that $y=\alpha x^{\beta}$ is a good fit to the data. First, make the logarithmic transformations $x^{\prime}=\log x$ and $y^{\prime}=\log y$. Next, use the $\left(x^{\prime}, y^{\prime}\right)$ data pairs and a calculator with linear regression keys to obtain the least-squares equation $y^{\prime}=a+b x^{\prime}$. Note that the equation $y^{\prime}=a+b x^{\prime}$ is the same as $\log y=a+b(\log x)$. If we raise 10 to the power of each side of this equation and use some college algebra, we get $y=10^{a} x^{b}$. In other words, for the power law model, we have $\alpha \approx 10^{a}$ and $\beta \approx b$.
In the electronic design of a cell phone circuit, the buildup of electric current (Amps) is an important function of time (microseconds). Let $x=$ time in microseconds and let $y=$ Amps built up in the circuit at time $x$.
(a) Make the logarithmic transformations $x^{\prime}=\log x$ and $y^{\prime}=\log y$. Then make a scatter plot of the $\left(x^{\prime}, y^{\prime}\right)$ values. Does a linear equation seem to be a good fit to this plot?
(b) Use the $\left(x^{\prime}, y^{\prime}\right)$ data points and a calculator with regression keys to find the least-squares equation $y^{\prime}=a+b x^{\prime}$. What is the sample correlation coefficient?
(c) Use the results of part (b) to find estimates for $\alpha$ and $\beta$ in the power law $y=\alpha x^{\beta}$. Write the power law giving the relationship between time and Amp buildup.
Note: The TI-84Plus/TI-83Plus/TI-nspire calculators fully support the power law model. Place the original $x$ data in list L1 and the corresponding $y$ data in list L2. Then press STAT, followed by CALC, and scroll down to option A: PwrReg. The output gives values for $\alpha$, $\beta$, and the sample correlation coefficient $r$.
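The log-log procedure described above can be sketched in a few lines of Python. The circuit data for this exercise are not reproduced on this page, so the values below are hypothetical and noise-free, generated from $y=2x^{0.5}$ so that the recovered constants are easy to check. Real measurements would not fit exactly. The helper name `fit_line` is an assumption for illustration.

```python
import math

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data generated from y = 2 * x**0.5 (all x > 0 and y > 0).
xs = [1.0, 4.0, 9.0, 16.0, 25.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]

# Step 1: transform both variables with base-10 logarithms.
log_xs = [math.log10(x) for x in xs]
log_ys = [math.log10(y) for y in ys]

# Step 2: fit the line y' = a + b * x' to the transformed pairs.
a, b = fit_line(log_xs, log_ys)

# Step 3: convert back to the power law y = alpha * x**beta.
alpha, beta = 10 ** a, b

print(f"alpha = {alpha:.4f}, beta = {beta:.4f}")
```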

Linear Regression and the Coefficient of Determination
Carolyn Behr-Jerome
Understandable Statistics, Concepts and Methods

Please do the following.
(a) Draw a scatter diagram displaying the data.
(b) Verify the given sums $\Sigma x, \Sigma y, \Sigma x^{2}, \Sigma y^{2},$ and $\Sigma x y$ and the value of the sample correlation coefficient $r$.
(c) Find $\bar{x}, \bar{y}, a,$ and $b$. Then find the equation of the least-squares line $\hat{y}=a+b x$.
(d) Graph the least-squares line on your scatter diagram. Be sure to use the point $(\bar{x}, \bar{y})$ as one of the points on the line.
(e) Interpretation Find the value of the coefficient of determination $r^{2} .$ What percentage of the variation in $y$ can be explained by the corresponding variation in $x$ and the least-squares line? What percentage is unexplained? Answers may vary slightly due to rounding.
Cricket Chirps: Temperature. Anyone who has been outdoors on a summer evening has probably heard crickets. Did you know that it is possible to use the cricket as a thermometer? Crickets tend to chirp more frequently as temperatures increase. This phenomenon was studied in detail by George W. Pierce, a physics professor at Harvard. In the following data, $x$ is a random variable representing chirps per second and $y$ is a random variable representing temperature (°F). These data are also available for download at the Online Study Center. Complete parts (a) through (e), given $\Sigma x=249.8$, $\Sigma y=1200.6$, $\Sigma x^{2}=4200.56$, $\Sigma y^{2}=96,725.86$, $\Sigma x y=20,127.47$, and $r \approx 0.835$.
(f) What is the predicted temperature when $x=19$ chirps per second?
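Parts (c), (e), and (f) can be worked from the given sums alone. The Python sketch below does this with the usual computational formulas. The data table is not reproduced on this page, so the number of data pairs is an assumption: $n=15$ is used because it reproduces the stated $r \approx 0.835$ from the given sums. The function name is hypothetical.

```python
import math

def least_squares_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Return (a, b, r) for the line y-hat = a + b * x from summary sums."""
    sxx = sum_x2 - sum_x ** 2 / n
    syy = sum_y2 - sum_y ** 2 / n
    sxy = sum_xy - sum_x * sum_y / n
    b = sxy / sxx
    a = sum_y / n - b * (sum_x / n)
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r

# Assumption: 15 data pairs (not stated on this page, but consistent
# with the given value r = 0.835).
N_PAIRS = 15

a, b, r = least_squares_from_sums(
    N_PAIRS, 249.8, 1200.6, 4200.56, 96725.86, 20127.47
)
r_squared = r ** 2
predicted_temp = a + b * 19  # part (f): x = 19 chirps per second

print(f"y-hat = {a:.2f} + {b:.3f} x")
print(f"r = {r:.3f}, r^2 = {r_squared:.3f}")
print(f"predicted temperature at x = 19: {predicted_temp:.1f} F")
```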

Linear Regression and the Coefficient of Determination
Tyler Moulton
Understandable Statistics, Concepts and Methods

Please do the following.
(a) Draw a scatter diagram displaying the data.
(b) Verify the given sums $\Sigma x, \Sigma y, \Sigma x^{2}, \Sigma y^{2},$ and $\Sigma x y$ and the value of the sample correlation coefficient $r$.
(c) Find $\bar{x}, \bar{y}, a,$ and $b$. Then find the equation of the least-squares line $\hat{y}=a+b x$.
(d) Graph the least-squares line on your scatter diagram. Be sure to use the point $(\bar{x}, \bar{y})$ as one of the points on the line.
(e) Interpretation Find the value of the coefficient of determination $r^{2} .$ What percentage of the variation in $y$ can be explained by the corresponding variation in $x$ and the least-squares line? What percentage is unexplained? Answers may vary slightly due to rounding.
Education: Violent Crime. The following data are based on information from the book Life in America's Small Cities (by G. S. Thomas, Prometheus Books). Let $x$ be the percentage of 16- to 19-year-olds not in school and not high school graduates. Let $y$ be the reported violent crimes per 1000 residents. Six small cities in Arkansas (Blytheville, El Dorado, Hot Springs, Jonesboro, Rogers, and Russellville) reported the following information about $x$ and $y$. Complete parts (a) through (e), given $\Sigma x=112.8$, $\Sigma y=32.4$, $\Sigma x^{2}=2167.14$, $\Sigma y^{2}=290.14$, $\Sigma x y=665.03$, and $r \approx 0.764$.
(f) If the percentage of 16- to 19-year-olds not in school and not graduates reaches $24 \%$ in a similar city, what is the predicted rate of violent crimes per 1000 residents?
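As a check on parts (c), (e), and (f), here is a worked sketch from the given sums with $n=6$ cities. The shorthand $S_{xx}$, $S_{yy}$, and $S_{xy}$ for the corrected sums is notation introduced here for convenience and may differ from the textbook's. Values are rounded, so answers may differ slightly in the last digit.

$$\begin{aligned}
S_{xx} &= \Sigma x^{2}-\frac{(\Sigma x)^{2}}{n} = 2167.14-\frac{(112.8)^{2}}{6} = 46.50 \\
S_{yy} &= \Sigma y^{2}-\frac{(\Sigma y)^{2}}{n} = 290.14-\frac{(32.4)^{2}}{6} = 115.18 \\
S_{xy} &= \Sigma x y-\frac{(\Sigma x)(\Sigma y)}{n} = 665.03-\frac{(112.8)(32.4)}{6} = 55.91 \\
\bar{x} &= \frac{112.8}{6} = 18.80, \qquad \bar{y} = \frac{32.4}{6} = 5.40 \\
b &= \frac{S_{xy}}{S_{xx}} = \frac{55.91}{46.50} \approx 1.202 \\
a &= \bar{y}-b \bar{x} \approx 5.40-(1.2024)(18.80) \approx -17.204 \\
r &= \frac{S_{xy}}{\sqrt{S_{xx} S_{yy}}} = \frac{55.91}{\sqrt{(46.50)(115.18)}} \approx 0.764, \qquad r^{2} \approx 0.584 \\
\hat{y} &\approx -17.204+1.202(24) \approx 11.65 \quad \text{when } x=24
\end{aligned}$$

So about 58.4% of the variation in $y$ is explained by the least-squares line and about 41.6% is unexplained, and the predicted rate at $x=24$ is roughly 11.7 violent crimes per 1000 residents.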

Linear Regression and the Coefficient of Determination
Tyler Moulton

Multiple Regression

6 Practice Problems
Elementary Statistics

A confidence interval for the regression coefficient $\beta_{1}$ is expressed as
$$b_{1}-E<\beta_{1}<b_{1}+E$$
where
$$E=t_{\alpha / 2} s_{b_{1}}$$
The critical $t$ score is found using $n-(k+1)$ degrees of freedom, where $k$, $n$, and $s_{b_{1}}$ are as described in Exercise 17. Use the sample data in Table 10-6 and the Minitab display in Example 1 to construct $95 \%$ confidence interval estimates of $\beta_{1}$ (the coefficient for the variable representing height of the mother) and $\beta_{2}$ (the coefficient for the variable representing height of the father). Does either confidence interval include 0, suggesting that the variable be eliminated from the regression equation?
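The interval formula above is short enough to sketch in Python. Table 10-6 and the Minitab display are not reproduced on this page, so every number below is hypothetical: the coefficient `b1`, its standard error `s_b1`, the sample size `n`, and the number of predictors `k` are made up for illustration. The critical value 2.110 (two-tailed, $\alpha=0.05$, 17 degrees of freedom) is taken from a standard $t$ table.

```python
def coefficient_confidence_interval(b, s_b, t_critical):
    """Return (lower, upper) for b - E < beta < b + E, with E = t_critical * s_b."""
    margin = t_critical * s_b
    return b - margin, b + margin

# Hypothetical values, not taken from the textbook's Minitab display.
b1 = 0.70    # sample regression coefficient
s_b1 = 0.13  # standard error of b1
n, k = 20, 2  # sample size and number of predictor variables

degrees_of_freedom = n - (k + 1)
T_CRITICAL = 2.110  # assumed table value for 17 degrees of freedom, 95% confidence

lower, upper = coefficient_confidence_interval(b1, s_b1, T_CRITICAL)
includes_zero = lower <= 0 <= upper

print(f"df = {degrees_of_freedom}")
print(f"{lower:.4f} < beta_1 < {upper:.4f}, includes 0: {includes_zero}")
```

An interval that includes 0 would suggest the corresponding predictor may not be needed in the equation, which is the question the exercise asks.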

Multiple Regression
Elementary Statistics

Refer to Data Set 9 in Appendix B and find the best regression equation with movie gross amount (in millions of dollars) as the response (y) variable. Ignore the MPAA ratings. Why is this equation best? Is this "best" equation good for predicting the amount of money that a movie will gross? Does the combination of predictor variables make sense?

Multiple Regression
Elementary Statistics

Identify the multiple regression equation that expresses the amount of nicotine in terms of the amount of tar and carbon monoxide (CO).
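A multiple regression equation with two predictors, $\hat{y}=b_{0}+b_{1} x_{1}+b_{2} x_{2}$, can be found by solving the normal equations. The Python sketch below does this with plain lists. The cigarette data set is not reproduced on this page, so the tar, CO, and nicotine values are made up, constructed from a known equation so the result can be checked. The function names are hypothetical, and statistical software would normally be used instead.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for i in range(n):
            if i != col:
                factor = m[i][col] / m[col][col]
                m[i] = [v - factor * p for v, p in zip(m[i], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_two_predictors(x1, x2, y):
    """Least-squares [b0, b1, b2] for y-hat = b0 + b1*x1 + b2*x2."""
    rows = [[1.0, u, v] for u, v in zip(x1, x2)]
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)

# Hypothetical data: tar (mg), carbon monoxide (mg), and nicotine (mg)
# generated exactly from nicotine = 0.1 + 0.06 * tar + 0.01 * co.
tar = [5.0, 8.0, 10.0, 12.0, 15.0, 16.0]
co = [6.0, 9.0, 10.0, 14.0, 13.0, 17.0]
nicotine = [0.1 + 0.06 * t + 0.01 * c for t, c in zip(tar, co)]

b0, b1, b2 = fit_two_predictors(tar, co, nicotine)
print(f"nicotine-hat = {b0:.3f} + {b1:.3f} tar + {b2:.3f} CO")
```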

Multiple Regression
James Kiss
