Folder:

113 Math

File:

113.030 Statistics - Correlation and correlation coefficient

# Correlation and correlation coefficient

On a scatterplot, when the `y`

variable increases as the `x`

variable increases we say there is a positive correlation between the variables

`r`

= Correlation coefficient

`r`

is a measurement of how close a linear line fits through points on a scatterplot.- "correlation coefficient is a measure of how well a line can describe the relationship between X and Y"

`r`

will*always*be`-1 <= r >= 1`

between -1 and 1`r = 1`

- When one variable gets larger the other variable gets larger as well and you can draw a perfect line between multiple points in a scatterplot`r = 0`

- You can't fit a line at all. It is not linear in any way`r < 0`

when the linear line is negative`r > 0`

when the linear line is positive- The closer that a line can be fit directly through all points on the scatterplot, the closer it is to 1.
- The more random the dots and farther away from a line they are, the closer it is to 0
- The line
*always*goes through the mean of X and Y - It is trying to "minimize the square of the distance between each of the points on the scatterplot"

**To calculate r**

$$r = \frac{1}{n - 1} \sum{(\frac{(x_i - \bar{x})}{s_x})(\frac{(y_i - \bar{y})}{s_y})}$$

- Reads as: r equals 1 over n minus 1 times the sum of the z-score for each xy pair in the set

##### Tags:

- statistics