Folder:
113 Math
File:
113.030 Statistics - Correlation and correlation coefficient

# Correlation and correlation coefficient

On a scatterplot, when the y variable tends to increase as the x variable increases, we say there is a positive correlation between the variables. When y tends to decrease as x increases, the correlation is negative.

## r = Correlation coefficient

• r is a measurement of how closely a straight line fits the points on a scatterplot.
• "correlation coefficient is a measure of how well a line can describe the relationship between X and Y"
• r is always between -1 and 1: -1 <= r <= 1
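• r = 1 - When one variable gets larger the other variable gets larger as well, and every point on the scatterplot lies exactly on a single upward-sloping line
• r = -1 - When one variable gets larger the other gets smaller, and every point lies exactly on a single downward-sloping line
• r = 0 - There is no linear relationship, so no line describes the points better than a flat one. The variables can still be related in a non-linear way
• r < 0 when the best-fit line slopes downward (negative slope)
• r > 0 when the best-fit line slopes upward (positive slope)
• The closer a line can be fit directly through all points on the scatterplot, the closer r is to 1 or -1.
• The more scattered the dots and the farther away from a line they are, the closer r is to 0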
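• The best-fit (least-squares regression) line always goes through the point (mean of X, mean of Y)
• The best-fit line is chosen to "minimize the square of the distance between each of the points on the scatterplot" and the line, where the distance is measured vertically (in the y direction) and the squares are summed over all points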
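
The last two bullets can be checked numerically. The sketch below is only an illustration: the function name `fit_line` and the sample data are made up for this note. It uses the standard least-squares formulas (slope = sum of products of deviations divided by the sum of squared x deviations, intercept = mean of y minus slope times mean of x) and then confirms that the fitted line passes through the point of means.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line for paired data.

    Hypothetical helper for illustration; assumes at least two points
    and that the x values are not all identical.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sum of products of deviations, and sum of squared x deviations.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("all x values are identical; slope is undefined")

    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up sample data.
xs = [1, 2, 3, 4]
ys = [2, 3, 5, 4]
slope, intercept = fit_line(xs, ys)

mean_x = sum(xs) / len(xs)
mean_y = sum(ys) / len(ys)

# The value the line predicts at the mean of x equals the mean of y.
predicted_at_mean = slope * mean_x + intercept
print(slope, intercept, predicted_at_mean, mean_y)
```

Because the intercept is defined as the mean of y minus slope times the mean of x, the line is forced through the point of means regardless of the data.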

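To calculate r:
$$r = \frac{1}{n - 1} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s_x}\right)\left(\frac{y_i - \bar{y}}{s_y}\right)$$
- Reads as: r equals 1 over n minus 1 times the sum, over every xy pair in the set, of the z-score of x multiplied by the z-score of y
- Here n is the number of pairs, x̄ and ȳ are the sample means, and s_x and s_y are the sample standard deviations (computed with n - 1 in the denominator)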
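
As a rough sketch of the formula above, the following plain-Python function computes r step by step. The name `pearson_r` and the sample data are illustrative assumptions, not part of the original notes. It uses the sample standard deviation (dividing by n - 1), matching the formula.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r as the averaged product of z-scores.

    Hypothetical helper for illustration; assumes at least two pairs
    and that neither variable is constant.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sample standard deviations (n - 1 in the denominator).
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    if s_x == 0 or s_y == 0:
        raise ValueError("r is undefined when a variable has zero spread")

    # Sum of (z-score of x) * (z-score of y), divided by n - 1.
    total = sum(((x - mean_x) / s_x) * ((y - mean_y) / s_y)
                for x, y in zip(xs, ys))
    return total / (n - 1)


# Made-up data: a perfect upward line, a perfect downward line,
# and a symmetric V shape with no linear trend.
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [6, 4, 2]))
print(pearson_r([-1, 0, 1], [1, 0, 1]))
```

The third data set is clearly related (y = |x|) yet has r equal to 0, which illustrates that r only measures the linear part of a relationship.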

• statistics