# Studentized residual


In statistics, a Studentized residual is a residual divided by an estimate of its standard deviation. The name honors William Sealey Gosset, who wrote under the pseudonym Student. Studentization of residuals is an important technique in the detection of outliers.

## Errors versus residuals

It is important to understand the difference between errors and residuals in statistics. Consider the simple linear regression model

${\displaystyle Y_{i}=\alpha _{0}+\alpha _{1}x_{i}+\varepsilon _{i},}$

where the errors ${\displaystyle \varepsilon _{i}}$, i = 1, ..., n, are independent and all have the same variance σ². The residuals are not the true errors, which are unobservable; rather, they are estimates of the errors based on the observable data. When the method of least squares is used to estimate ${\displaystyle \alpha _{0}}$ and ${\displaystyle \alpha _{1}}$, the residuals, unlike the errors, cannot be independent, since they satisfy the two constraints

${\displaystyle \sum _{i=1}^{n}{\hat {\varepsilon }}_{i}=0}$

and

${\displaystyle \sum _{i=1}^{n}{\hat {\varepsilon }}_{i}x_{i}=0.}$

(Here ${\displaystyle \varepsilon _{i}}$ is the ith error, and ${\displaystyle {\hat {\varepsilon }}_{i}}$ is the ith residual.) Moreover, the residuals, unlike the errors, do not all have the same variance: the variance decreases as the corresponding x-value gets farther from the average x-value, because points far from the mean of x have more influence on the fitted line and pull it toward themselves. The fact that the variances of the residuals differ, even though the variances of the true errors are all equal to each other, is the principal reason Studentization is needed.
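The two constraints can be checked numerically. The following is a minimal sketch in plain Python that fits the model by ordinary least squares using the usual closed-form slope and intercept; the data values and the helper name `fit_simple_ols` are made up for illustration.

```python
# Minimal sketch: fit Y = a0 + a1*x by ordinary least squares and check the
# two linear constraints that the residuals always satisfy.
# The data below are hypothetical illustration values.

def fit_simple_ols(x, y):
    """Return (a0, a1) minimizing the sum of squared residuals."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1


x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 9.7]

a0, a1 = fit_simple_ols(x, y)
residuals = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]

# Both sums are zero up to floating-point rounding, whatever the data.
sum_resid = sum(residuals)
sum_resid_x = sum(r * xi for r, xi in zip(residuals, x))
print(f"sum of residuals:       {sum_resid:.2e}")
print(f"sum of residuals * x_i: {sum_resid_x:.2e}")
```

Because the n residuals must satisfy these two linear equations, only n − 2 of them are free, which is also why the variance estimate in the next sections divides by n − 2.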

## How to Studentize

For this simple model, the "design matrix" is

${\displaystyle X=\left[{\begin{matrix}1&x_{1}\\\vdots &\vdots \\1&x_{n}\end{matrix}}\right]}$

and the "hat matrix" H is the matrix of the orthogonal projection onto the column space of the design matrix:

${\displaystyle H=X(X^{T}X)^{-1}X^{T}.}$
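As a sketch, assuming NumPy is available and using made-up x-values, the hat matrix can be built directly from the design matrix and its projection properties checked:

```python
import numpy as np

# Hypothetical predictor values; any x with at least two distinct values works.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 10.0])
n = len(x)

# Design matrix: a column of ones (intercept) next to the x column.
X = np.column_stack([np.ones(n), x])

# Hat matrix H = X (X^T X)^{-1} X^T; solve() avoids forming the inverse explicitly.
H = X @ np.linalg.solve(X.T @ X, X.T)

# An orthogonal projection is symmetric and idempotent, leaves the columns of X
# unchanged, and has trace equal to the dimension of the column space (2 here).
print(np.allclose(H, H.T))      # symmetric
print(np.allclose(H @ H, H))    # idempotent
print(np.allclose(H @ X, X))    # fixes the column space
print(np.trace(H))              # approximately 2
```

Since the fitted values are H y, the residuals are (I − H) y, which is the form used in the later sketches.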

The "leverage" ${\displaystyle h_{ii}}$ is the ith diagonal entry of the hat matrix. The variance of the ith residual is

${\displaystyle {\mbox{var}}({\hat {\varepsilon }}_{i})=\sigma ^{2}(1-h_{ii}).}$
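For simple linear regression the leverage has the closed form h_ii = 1/n + (x_i − x̄)² / Σ_j (x_j − x̄)². The sketch below (NumPy assumed, data and σ = 1 chosen only for illustration) compares it with the diagonal of H and shows how the residual variance σ²(1 − h_ii) changes with distance from the mean of x.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 10.0])  # hypothetical data
n = len(x)
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.solve(X.T @ X, X.T)
h_diag = np.diag(H)

# Closed form for simple linear regression with an intercept:
#   h_ii = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2
x_bar = x.mean()
h_closed = 1.0 / n + (x - x_bar) ** 2 / np.sum((x - x_bar) ** 2)
print(np.allclose(h_diag, h_closed))

sigma = 1.0  # assumed error standard deviation, for illustration only
resid_var = sigma**2 * (1.0 - h_diag)

# The point x = 10, farthest from x_bar, has the largest leverage and
# therefore the smallest residual variance.
for xi, hi, vi in zip(x, h_diag, resid_var):
    print(f"x = {xi:5.1f}   h_ii = {hi:.3f}   var(residual) = {vi:.3f}")
```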

The corresponding Studentized residual is then

${\displaystyle {{\hat {\varepsilon }}_{i} \over {\hat {\sigma }}{\sqrt {1-h_{ii}\ }}}}$

where ${\displaystyle {\hat {\sigma }}}$ is an appropriate estimate of σ.

## Internal and external Studentization

One estimate of σ², using all n residuals, is

${\displaystyle {\hat {\sigma }}^{2}={1 \over n-2}\sum _{j=1}^{n}{\hat {\varepsilon }}_{j}^{2}.}$
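Putting the pieces together, the internally Studentized residuals divide each residual by σ̂√(1 − h_ii) with σ̂² computed from all n residuals. This is a minimal NumPy sketch; the function name `internally_studentized` and the data are hypothetical.

```python
import numpy as np


def internally_studentized(x, y):
    """Internally Studentized residuals for y = a0 + a1*x + error (sketch)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    H = X @ np.linalg.solve(X.T @ X, X.T)
    resid = y - H @ y                          # residuals = (I - H) y
    h = np.diag(H)                             # leverages h_ii
    sigma2_hat = np.sum(resid**2) / (n - 2)    # all n residuals, n - 2 df
    return resid / np.sqrt(sigma2_hat * (1.0 - h))


x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 9.7]
r = internally_studentized(x, y)
print(np.round(r, 3))
```

Because the ith residual itself enters σ̂², an internally Studentized residual is bounded in absolute value by √(n − 2), so a single gross outlier cannot produce an arbitrarily large value.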

However, when assessing whether the ith case is an outlier, it is desirable to exclude the ith observation from the variance estimate: a large ith residual would otherwise inflate the estimate of σ and partly mask itself. Consequently one may use the estimate

${\displaystyle {\hat {\sigma }}_{(i)}^{2}={1 \over n-3}\sum _{j\neq i}{\hat {\varepsilon }}_{j(i)}^{2},}$

based on all but the ith case, where ${\displaystyle {\hat {\varepsilon }}_{j(i)}}$ is the residual of the jth case when the model is fitted with the ith case omitted. If this latter estimate is used, excluding the ith case, the residual is said to be externally Studentized; if the former is used, including the ith case, it is internally Studentized.
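The external estimate can be computed literally, by refitting the model n times with one case left out, or without refitting via the standard identity that the leave-one-out residual sum of squares equals SSE − ε̂ᵢ²/(1 − hᵢᵢ). The sketch below (NumPy assumed; function names and data are hypothetical) does both and checks that they agree. The numerator still uses the residual and leverage from the full fit.

```python
import numpy as np


def _fit(x, y):
    """Full-data design matrix, residuals and leverages."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return X, y, y - H @ y, np.diag(H)


def externally_studentized(x, y):
    """Externally Studentized residuals via explicit leave-one-out refits."""
    X, y, resid, h = _fit(x, y)
    n = len(y)
    t = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        # Refit without case i; its residual sum of squares has n - 3 df.
        beta_i, *_ = np.linalg.lstsq(X[keep], y[keep], rcond=None)
        sse_i = np.sum((y[keep] - X[keep] @ beta_i) ** 2)
        t[i] = resid[i] / np.sqrt(sse_i / (n - 3) * (1.0 - h[i]))
    return t


def externally_studentized_fast(x, y):
    """Same quantity using SSE_(i) = SSE - e_i^2 / (1 - h_ii), no refitting."""
    _, y, resid, h = _fit(x, y)
    n = len(y)
    sse = np.sum(resid**2)
    sigma2_i = (sse - resid**2 / (1.0 - h)) / (n - 3)
    return resid / np.sqrt(sigma2_i * (1.0 - h))


x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 9.7]
t_slow = externally_studentized(x, y)
t_fast = externally_studentized_fast(x, y)
print(np.round(t_slow, 3))
print(np.allclose(t_slow, t_fast))
```

The shortcut is usually preferable in practice, since it needs only one fit.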

If the errors are independent and normally distributed with expected value 0 and variance σ², then the probability distribution of the ith externally Studentized residual is a Student's t-distribution with n − 3 degrees of freedom.
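A quick Monte Carlo check of this distributional claim, offered as an illustration rather than a proof: with a fixed, hypothetical design (n = 10, including one high-leverage point), simulate many normal error vectors and compare the externally Studentized residual of one case with the moments and tail of a t-distribution with ν = n − 3 = 7 degrees of freedom, whose variance is ν/(ν − 2) = 1.4 and whose two-sided 5% critical value is about 2.365. NumPy is assumed; the seed, error scale and replication count are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 15.0])  # fixed design
n = len(x)
X = np.column_stack([np.ones(n), x])
H = X @ np.linalg.solve(X.T @ X, X.T)
h = np.diag(H)
M = np.eye(n) - H  # residual-maker matrix: residuals = M y

reps = 20000
# The true coefficients do not matter because M X = 0, so only errors enter.
errors = rng.normal(0.0, 2.0, size=(reps, n))
resid = errors @ M  # row k is (M e_k)^T, since M is symmetric

sse = np.sum(resid**2, axis=1)
i = n - 1  # the high-leverage case x = 15
sigma2_i = (sse - resid[:, i] ** 2 / (1 - h[i])) / (n - 3)
t_i = resid[:, i] / np.sqrt(sigma2_i * (1 - h[i]))

nu = n - 3
print("simulated mean:", t_i.mean())
print("simulated variance:", t_i.var(), " t_nu variance:", nu / (nu - 2))
print("P(|t| > 2.365):", np.mean(np.abs(t_i) > 2.365), " nominal: 0.05")
```

The same check applied to internally Studentized residuals would not match a t-distribution, since those are bounded by √(n − 2).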