Correlation and correlation coefficient
On a scatterplot, when the y
variable increases as the x
variable increases we say there is a positive correlation between the variables
r
= Correlation coefficient
r
is a measurement of how close a linear line fits through points on a scatterplot.- “correlation coefficient is a measure of how well a line can describe the relationship between X and Y”
r
will always be-1 <= r >= 1
between -1 and 1r = 1
- When one variable gets larger the other variable gets larger as well and you can draw a perfect line between multiple points in a scatterplotr = 0
- You can’t fit a line at all. It is not linear in any wayr < 0
when the linear line is negativer > 0
when the linear line is positive- The closer that a line can be fit directly through all points on the scatterplot, the closer it is to 1.
- The more random the dots and farther away from a line they are, the closer it is to 0
- The line always goes through the mean of X and Y
- It is trying to “minimize the square of the distance between each of the points on the scatterplot”
To calculate r
\(r = \frac{1}{n - 1} \sum{(\frac{(x_i - \bar{x})}{s_x})(\frac{(y_i - \bar{y})}{s_y})}\)
- Reads as: r equals 1 over n minus 1 times the sum of the z-score for each xy pair in the set
Graph:
- 113.030 Statistics - Correlation and correlation coefficient to 113.031 Statistics - Linear Regression
- 113.020 Math - Statistics to 113.030 Statistics - Correlation and correlation coefficient