113 Math
113.030 Statistics - Correlation and correlation coefficient

Correlation and correlation coefficient

On a scatterplot, when the y variable increases as the x variable increases we say there is a positive correlation between the variables

r = Correlation coefficient

  • r is a measurement of how close a linear line fits through points on a scatterplot.
    • "correlation coefficient is a measure of how well a line can describe the relationship between X and Y"
  • r will always be -1 <= r >= 1 between -1 and 1
  • r = 1 - When one variable gets larger the other variable gets larger as well and you can draw a perfect line between multiple points in a scatterplot
  • r = 0 - You can't fit a line at all. It is not linear in any way
  • r < 0 when the linear line is negative
  • r > 0 when the linear line is positive
  • The closer that a line can be fit directly through all points on the scatterplot, the closer it is to 1.
  • The more random the dots and farther away from a line they are, the closer it is to 0
  • The line always goes through the mean of X and Y
  • It is trying to "minimize the square of the distance between each of the points on the scatterplot"

To calculate r
$$r = \frac{1}{n - 1} \sum{(\frac{(x_i - \bar{x})}{s_x})(\frac{(y_i - \bar{y})}{s_y})}$$
- Reads as: r equals 1 over n minus 1 times the sum of the z-score for each xy pair in the set

  • statistics