Pearson product-moment correlation coefficient

From Wikipedia, the free encyclopedia

In mathematics, and in particular in statistics, the Pearson product-moment correlation coefficient (r) is a measure of how well a linear equation describes the relation between two variables X and Y measured on the same object or organism. It is defined as the sum of the products of the standard scores of the two measures divided by the degrees of freedom:

$$r = \frac{\sum z_X z_Y}{n - 1}$$

where z_X and z_Y are the standard scores of X and Y, and n is the number of paired observations.
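The definition in terms of standard scores can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `pearson_r` and the sample data are made up for the example, and it assumes sample standard deviations (n − 1 denominator) so that the degrees of freedom match.

```python
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Pearson r as the sum of products of standard scores over n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mx, my = mean(xs), mean(ys)
    # Sample standard deviations (n - 1 denominator); a constant sample
    # gives 0 here and makes r undefined (division by zero).
    sx, sy = stdev(xs), stdev(ys)
    zx = [(x - mx) / sx for x in xs]  # standard scores of X
    zy = [(y - my) / sy for y in ys]  # standard scores of Y
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)

# Hypothetical paired measurements, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(pearson_r(xs, ys), 4))  # 0.7746
```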

The result is equivalent to dividing the covariance between the two variables by the product of their standard deviations. In general, the magnitude of a correlation coefficient is the square root of the coefficient of determination (r²), which is the ratio of explained variation to total variation:

$$r^2 = \frac{\sum (Y' - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$

where:

Y = a score on a random variable Y
Y' = corresponding predicted value of Y, given the correlation of X and Y and the value of X
Ȳ = mean of Y
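Both statements can be checked numerically. The sketch below computes r as the covariance divided by the product of the standard deviations, and r² as explained variation over total variation, using predictions from a least-squares line. The helper names and the data are hypothetical, and the least-squares fit is an assumption about how the predicted values Y' are obtained.

```python
from statistics import mean, stdev

def covariance(xs, ys):
    """Sample covariance with an n - 1 denominator."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def r_from_covariance(xs, ys):
    """r as covariance over the product of the standard deviations."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))

def r_squared_from_variation(xs, ys):
    """r^2 as explained variation divided by total variation."""
    mx, my = mean(xs), mean(ys)
    slope = covariance(xs, ys) / covariance(xs, xs)  # least-squares slope
    intercept = my - slope * mx
    predicted = [intercept + slope * x for x in xs]  # the Y' values
    explained = sum((p - my) ** 2 for p in predicted)
    total = sum((y - my) ** 2 for y in ys)
    return explained / total

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = r_from_covariance(xs, ys)
r2 = r_squared_from_variation(xs, ys)
print(round(r, 4), round(r2, 4))  # 0.7746 0.6
```

Squaring the first value reproduces the second up to rounding, as the definition requires.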

The correlation coefficient adds a sign to show the direction of the relationship. The formula for the Pearson coefficient conforms to this definition, and applies when the relationship is linear.

The coefficient ranges from -1 to 1. A value of 1 shows that a linear equation describes the relationship perfectly and positively, with all data points lying on the same line and with Y increasing with X. A value of -1 shows that all data points lie on a single line but that Y increases as X decreases. A value of 0 shows that a linear model is inappropriate; that is, there is no linear relationship between the variables.
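The three reference values can be reproduced with small constructed data sets. The following is an illustrative sketch; the data and the function name `pearson_r` are invented for the example. The last case is a parabola, included to show that r = 0 rules out only a linear relationship.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson r from sums of squared deviations and cross-products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
rising = [2 * x + 1 for x in xs]    # exact line, Y increases with X
falling = [-3 * x + 4 for x in xs]  # exact line, Y increases as X decreases
curved = [x * x for x in xs]        # exact but nonlinear relation (parabola)

print(round(pearson_r(xs, rising), 6))   # 1.0
print(round(pearson_r(xs, falling), 6))  # -1.0
print(round(pearson_r(xs, curved), 6))   # 0.0
```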

The Pearson coefficient is a statistic which estimates the correlation of the two given random variables.

The linear equation that best describes the relationship between X and Y can be found by linear regression. If X and Y are both normally distributed, this equation can be used to "predict" the value of one measurement from knowledge of the other. That is, for each value of X the equation calculates a value which is the best estimate of the values of Y corresponding to that specific value of X. We denote this predicted variable by Y'.
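As a rough illustration of the prediction step, the sketch below fits a line by ordinary least squares and uses it to compute a predicted value Y' for a given X. Ordinary least squares is assumed here as the regression method; `fit_line`, `predict` and the data are hypothetical names and values chosen for the example.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx  # the fitted line passes through the means
    return slope, intercept

def predict(x, slope, intercept):
    """Predicted value Y' for a given value of X."""
    return intercept + slope * x

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
slope, intercept = fit_line(xs, ys)
print(round(slope, 4), round(intercept, 4))    # 0.6 2.2
print(round(predict(6, slope, intercept), 4))  # 5.8
```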

Any value of Y can therefore be written as the sum of Y' and the difference between Y and Y':

$$Y = Y' + (Y - Y')$$

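The variance of Y is equal to the sum of the variances of the two components of Y:

$$s_y^2 = s_{y'}^2 + s_{y.x}^2$$

where $s_{y'}^2$ is the variance of the predicted values Y' and $s_{y.x}^2$ is the variance of the differences Y − Y'.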

Since the coefficient of determination implies that $s_{y.x}^2 = s_y^2 (1 - r^2)$, substituting into the variance decomposition gives the identity

$$r^2 = \frac{s_{y'}^2}{s_y^2}$$

The square of r is conventionally used as a measure of the strength of the association between X and Y. For example, if the coefficient is 0.90, then 81% of the variance of Y is said to be explained by the changes in X and the linear relation between X and Y.
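The split of the variance of Y into a predicted part and a residual part, and the reading of r² as the explained share, can be verified on a small example. This sketch assumes a least-squares fit and uses the same n − 1 denominator for all three variances; the function name and data are hypothetical.

```python
from statistics import mean, variance

def variance_components(xs, ys):
    """Return (s_y^2, s_y'^2, s_y.x^2) for a least-squares line fit."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    predicted = [intercept + slope * x for x in xs]      # Y'
    residuals = [y - p for y, p in zip(ys, predicted)]   # Y - Y'
    # All three use the n - 1 denominator (an assumption of this sketch).
    return variance(ys), variance(predicted), variance(residuals)

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
s_y2, s_pred2, s_res2 = variance_components(xs, ys)
r2 = s_pred2 / s_y2  # explained share of the variance of Y

print(round(s_y2, 4), round(s_pred2, 4), round(s_res2, 4))  # 1.5 0.9 0.6
print(round(r2, 4))         # 0.6
print(round(0.90 ** 2, 2))  # 0.81, the 81% in the example above
```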

r is a parametric statistic. It assumes that the variables being assessed are normally distributed. If this assumption is violated, a non-parametric alternative such as Spearman's ρ may be more successful in detecting a relationship between the variables, since it measures monotonic rather than strictly linear association.
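To make the comparison concrete, Spearman's ρ can be computed as the Pearson coefficient of the ranks of the data. The sketch below is one simple way to do this, with tied values given their average rank; the function names and the cubic example data are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean

def pearson_r(xs, ys):
    """Pearson r from sums of squared deviations and cross-products."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho as the Pearson r of the ranked data."""
    return pearson_r(average_ranks(xs), average_ranks(ys))

# Hypothetical monotonic but nonlinear data: Y = X cubed.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(pearson_r(xs, ys) < 1.0)  # True: the relation is not a straight line
print(spearman_rho(xs, ys))     # 1.0: the ranks agree perfectly
```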