Pearson product-moment correlation coefficient

In mathematics, and particularly in statistics, the Pearson product-moment correlation coefficient (r) is a measure of how well a linear equation describes the relation between two variables X and Y measured on the same object or organism. It is defined as the sum of the products of the standard scores of the two measures divided by the degrees of freedom:

r={\frac {\sum z_{x}z_{y}}{N-1}}
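As a minimal sketch of this definition, the following Python function converts each sample to standard scores and sums their products. The function name `pearson_r` and the sample data are illustrative assumptions, not part of the definition itself.

```python
from statistics import mean, stdev


def pearson_r(xs, ys):
    """Pearson r as the sum of z-score products divided by N - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = mean(xs), mean(ys)
    # stdev() is the sample standard deviation (N - 1 denominator),
    # matching the N - 1 in the formula above.
    sx, sy = stdev(xs), stdev(ys)
    zx = [(x - mx) / sx for x in xs]
    zy = [(y - my) / sy for y in ys]
    return sum(a * b for a, b in zip(zx, zy)) / (n - 1)


# Hypothetical measurements, made up for illustration only.
heights = [150, 160, 165, 172, 181]
weights = [52, 58, 63, 70, 79]
r = pearson_r(heights, weights)
print(round(r, 4))
```

The sketch fails with a division by zero if either sample is constant, since its standard deviation is then zero and r is undefined.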

The result is equivalent to dividing the covariance between the two variables by the product of their standard deviations. In general, the magnitude of a correlation coefficient is the square root of the coefficient of determination (r²), which is the ratio of explained variation to total variation:

r^{2}={\sum (Y'-{\overline {Y}})^{2} \over \sum (Y-{\overline {Y}})^{2}}

where:

Y = a score on a random variable Y

Y' = corresponding predicted value of Y, given the correlation of X and Y and the value of X

{\overline {Y}} = mean of Y
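The two descriptions above can be checked against each other numerically. The sketch below assumes that the predicted values Y' come from an ordinary least-squares line of Y on X; the function names and the data are hypothetical.

```python
from math import sqrt
from statistics import mean


def r_from_covariance(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    n = len(xs)
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / (n - 1))
    sy = sqrt(sum((y - my) ** 2 for y in ys) / (n - 1))
    return cov / (sx * sy)


def r_squared_from_variation(xs, ys):
    """Explained variation over total variation, with Y' taken from a
    least-squares line (an assumption of this sketch)."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    predicted = [intercept + slope * x for x in xs]
    explained = sum((p - my) ** 2 for p in predicted)
    total = sum((y - my) ** 2 for y in ys)
    return explained / total


# Hypothetical data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.7]
r = r_from_covariance(xs, ys)
r2 = r_squared_from_variation(xs, ys)
print(abs(r * r - r2) < 1e-9)
```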

The correlation coefficient adds a sign to show the direction of the relationship. The formula for the Pearson coefficient conforms to this definition, and applies when the relationship is linear.

The coefficient ranges from -1 to 1. A value of 1 shows that a linear equation describes the relationship perfectly and positively, with all data points lying on the same line and with Y increasing with X. A value of -1 shows that all data points lie on a single line but that Y decreases as X increases. A value of 0 shows that a linear model is inappropriate, that is, there is no linear relationship between the variables.
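The three boundary cases can be illustrated with small made-up data sets. In this sketch the helper `pearson_r` and all data are assumptions for illustration; the third case is a symmetric parabola, chosen because its points are strongly related yet have no linear trend.

```python
from math import isclose, sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson r computed from sums of deviation products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


xs = [1, 2, 3, 4, 5]

# Perfect increasing line: r is 1 (up to rounding).
r_up = pearson_r(xs, [2 * x + 1 for x in xs])

# Perfect decreasing line: r is -1 (up to rounding).
r_down = pearson_r(xs, [10 - 3 * x for x in xs])

# Symmetric parabola: Y depends on X, but not linearly, so r is 0.
r_curve = pearson_r(xs, [(x - 3) ** 2 for x in xs])

print(isclose(r_up, 1.0), isclose(r_down, -1.0),
      isclose(r_curve, 0.0, abs_tol=1e-12))
```

The parabola case shows why a coefficient of 0 rules out only a linear relationship, not every relationship.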

The Pearson coefficient is a statistic which estimates the correlation of the two given random variables.

The linear equation that best describes the relationship between X and Y can be found by linear regression. If X and Y are both normally distributed, this can be used to "predict" the value of one measurement from knowledge of the other. That is, for each value of X the equation calculates a value which is the best estimate of the values of Y corresponding to that specific value of X. We denote this predicted variable by Y'.

Any value of Y can therefore be defined as the sum of Y' and the difference between Y and Y':

Y=Y^{\prime }+(Y-Y^{\prime })

The variance of Y is equal to the sum of the variances of the two components of Y:

s_{y}^{2}=s_{y^{\prime }}^{2}+s_{y.x}^{2}

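Since the definition of the coefficient of determination implies that s_y.x² = s_y²(1 − r²), substituting this into the variance decomposition gives s_y'² = r² s_y², from which we can derive the identity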

r^{2}={s_{y^{\prime }}^{2} \over s_{y}^{2}}
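The decomposition can be verified numerically. This sketch assumes a least-squares line for Y' and uses the same N - 1 denominator for all three variances so that the identity holds exactly; the variable names and data are hypothetical.

```python
from statistics import mean, variance

# Hypothetical data, made up for illustration only.
xs = [2.0, 4.0, 5.0, 7.0, 8.0, 10.0]
ys = [3.1, 4.8, 6.9, 8.2, 9.1, 12.4]

# Least-squares line of Y on X (an assumption of this sketch).
mx, my = mean(xs), mean(ys)
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

predicted = [intercept + slope * x for x in xs]    # Y'
residuals = [y - p for y, p in zip(ys, predicted)]  # Y - Y'

# variance() divides by N - 1 for every term here.
var_y = variance(ys)            # s_y squared
var_pred = variance(predicted)  # s_y' squared
var_res = variance(residuals)   # s_y.x squared

r_squared = var_pred / var_y

print(abs(var_y - (var_pred + var_res)) < 1e-9)
print(abs(var_res - var_y * (1 - r_squared)) < 1e-9)
```

Both checks should print `True` up to floating-point rounding, because the residuals of a least-squares fit are uncorrelated with the predicted values.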

The square of r is conventionally used as a measure of the strength of the association between X and Y. For example, if the coefficient is .90, then 81% of the variance of Y is said to be explained by the changes in X and the linear relation between X and Y.

r is a parametric statistic. It assumes that the variables being assessed are normally distributed. If this assumption is violated, a non-parametric alternative such as Spearman's ρ may be more successful in detecting a relationship, since it responds to any monotonic association rather than only a linear one.
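Spearman's ρ can be computed as Pearson's r applied to the ranks of the data. The sketch below compares the two coefficients on a made-up monotonic but nonlinear data set; the helper names are hypothetical, and the simple ranking used here assumes there are no tied values.

```python
from math import sqrt
from statistics import mean


def pearson_r(xs, ys):
    """Pearson r computed from sums of deviation products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Rank from 1 (smallest) to N; assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(xs, ys):
    """Spearman's rho as Pearson r on the ranks."""
    return pearson_r(ranks(xs), ranks(ys))


# Hypothetical data: Y rises with X, but along a cubic curve.
xs = [1, 2, 3, 4, 5, 6]
ys = [x ** 3 for x in xs]

r = pearson_r(xs, ys)
rho = spearman_rho(xs, ys)
print(r < rho)
```

Here ρ is 1 (up to rounding) because the ordering of Y matches the ordering of X exactly, while r stays below 1 because the points do not lie on a straight line.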