Rao-Blackwell theorem

In statistics, the Rao-Blackwell theorem describes a technique that can transform an arbitrarily crude estimator into an estimator that is optimal by the mean-squared-error criterion or by any of a variety of similar criteria. (Pronunciation: Rao rhymes with "cow".)

Some prerequisite definitions

An estimator is an observable random variable (that is, a statistic) used for estimating some unobservable quantity. For example, one may be unable to observe the average height of all male students at the University of X, but one may observe the heights of a random sample of 40 of them. The average height of those 40, the "sample average", may be used as an estimator of the unobservable "population average".
A sufficient statistic T(X) is an observable random variable such that the conditional probability distribution of all observable data X given T(X) does not depend on any of the unobservable quantities such as the mean or standard deviation of the whole population from which the data X was taken. In the most frequently cited examples, the "unobservable" quantities are parameters that parametrize a known family of probability distributions according to which the data are distributed.
A Rao-Blackwell estimator δ₁(X) of an unobservable quantity θ is the conditional expectation E(δ(X) | T(X)) of some estimator δ(X) given a sufficient statistic T(X). Call δ(X) the "original estimator" and δ₁(X) the "improved estimator". It is important that the improved estimator be observable, i.e., that it not depend on θ. Generally, the conditional expected value of one function of these data given another function of these data does depend on θ, but the very definition of sufficiency given above entails that this one does not.
The mean squared error of an estimator is the expected value of the square of its deviation from the unobservable quantity being estimated.

The theorem

One case of the Rao-Blackwell theorem states:

The mean squared error of the Rao-Blackwell estimator does not exceed that of the original estimator.

In other words,

E((\delta _{1}(X)-\theta )^{2})\leq E((\delta (X)-\theta )^{2}).

A more general version of the theorem is stated further below.

The essential tools of the proof, besides the definition above, are the law of total expectation and the fact that for any random variable Y, E(Y²) cannot be less than [E(Y)]². That inequality is a case of Jensen's inequality, although in a statistics course it may be shown to follow instantly from the frequently mentioned fact that

0\leq \operatorname {var} (Y)=E((Y-E(Y))^{2})=E(Y^{2})-(E(Y))^{2}.
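Putting the two tools together, the proof of the mean-squared-error case can be sketched as a single chain. This is an outline only, using the notation defined above, with T(X) the sufficient statistic and δ₁(X) = E(δ(X) | T(X)). The first equality holds because θ is a constant, the inequality is the conditional form of E(Y²) ≥ [E(Y)]² applied to Y = δ(X) − θ, and the last equality is the law of total expectation.

```latex
E\big((\delta_{1}(X)-\theta)^{2}\big)
  = E\Big(\big[E(\delta(X)-\theta \mid T(X))\big]^{2}\Big)
  \leq E\Big(E\big((\delta(X)-\theta)^{2} \mid T(X)\big)\Big)
  = E\big((\delta(X)-\theta)^{2}\big).
```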

The more general version of the Rao-Blackwell theorem speaks of the "expected loss"

E(L(\delta _{1}(X)))\leq E(L(\delta (X)))

where the "loss function" L may be any convex function. For the proof of the more general version, Jensen's inequality cannot be dispensed with.

The improved estimator is unbiased if and only if the original estimator is unbiased, as may be seen at once by using the law of total expectation. The theorem holds regardless of whether biased or unbiased estimators are used.
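Written out, the unbiasedness claim is a one-line application of the law of total expectation, shown here in the notation used above. Both estimators have the same expected value, hence the same bias.

```latex
E(\delta_{1}(X)) = E\big(E(\delta(X) \mid T(X))\big) = E(\delta(X)).
```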

The theorem seems very weak: it says only that the allegedly improved estimator is no worse than the original estimator. In practice, however, the improvement is often enormous, as an example can show.

Example

Phone calls arrive at a switchboard according to a Poisson process at an average rate of λ per minute. This rate is not observable, but the numbers of phone calls that arrived during n successive one-minute periods are observed. It is desired to estimate the probability e^−λ that the next one-minute period passes with no phone calls. The answer given by Rao-Blackwell may perhaps be unexpected.

An extremely crude estimator of the desired probability is

\delta _{0}=\left\{{\begin{matrix}1&{\mbox{if}}\ X_{1}=0\\0&{\mbox{otherwise}}\end{matrix}}\right.,

i.e., it estimates this probability to be 1 if no phone calls arrived in the first minute and zero otherwise.

The sum

X_{1}+\cdots +X_{n}

can be readily shown to be a sufficient statistic for λ, i.e., the conditional distribution of the data X₁, ..., X_n, given this sum, does not depend on λ. Therefore, we find the Rao-Blackwell estimator

\delta _{1}=E(\delta _{0}|X_{1}+\cdots +X_{n}).

After doing some algebra we have

\delta _{1}=\left(1-{1 \over n}\right)^{X_{1}+\cdots +X_{n}}.
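The algebra can be sketched as follows, writing S for the sum X₁ + ... + X_n (S and s are labels introduced here for brevity). The sketch uses two standard facts: S is Poisson with mean nλ, and X₂ + ... + X_n is Poisson with mean (n − 1)λ and independent of X₁. Since δ₀ is an indicator, its conditional expectation is a conditional probability:

```latex
E(\delta_{0} \mid S = s)
  = P(X_{1} = 0 \mid S = s)
  = \frac{P(X_{1} = 0)\, P(X_{2} + \cdots + X_{n} = s)}{P(S = s)}
  = \frac{e^{-\lambda} \cdot e^{-(n-1)\lambda}\, ((n-1)\lambda)^{s} / s!}{e^{-n\lambda}\, (n\lambda)^{s} / s!}
  = \left(1 - {1 \over n}\right)^{s}.
```

All factors involving λ cancel, as sufficiency of the sum requires.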

Since the expected value of the number X₁ + ... + X_n of calls arriving during the first n minutes is nλ, one might not be surprised if this estimator has a fairly high probability (if n is big) of being close to

\left(1-{1 \over n}\right)^{n\lambda }\approx e^{-\lambda }.

So δ₁ is clearly a very much improved estimator of that last quantity.
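The size of the improvement can be checked numerically. The following is a minimal simulation sketch, not part of the original example: the values λ = 2, n = 20, the number of trials, and the function names `sample_poisson` and `compare_estimators` are all assumptions chosen for illustration. It estimates the mean squared error of δ₀ and δ₁ against the true value e^−λ, using only the Python standard library.

```python
import math
import random


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) variate by Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    # Multiply uniforms until the running product drops to the threshold.
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def compare_estimators(lam=2.0, n=20, trials=20000, seed=0):
    """Return Monte Carlo estimates of (MSE of delta_0, MSE of delta_1)."""
    rng = random.Random(seed)
    target = math.exp(-lam)  # true probability of a minute with no calls
    se_crude = 0.0
    se_improved = 0.0
    for _ in range(trials):
        calls = [sample_poisson(lam, rng) for _ in range(n)]
        delta0 = 1.0 if calls[0] == 0 else 0.0   # crude estimator
        delta1 = (1.0 - 1.0 / n) ** sum(calls)   # Rao-Blackwell estimator
        se_crude += (delta0 - target) ** 2
        se_improved += (delta1 - target) ** 2
    return se_crude / trials, se_improved / trials


if __name__ == "__main__":
    mse0, mse1 = compare_estimators()
    print(f"MSE of crude estimator delta_0:    {mse0:.5f}")
    print(f"MSE of improved estimator delta_1: {mse1:.5f}")
```

Under these assumed settings, the simulated error of δ₁ should come out far smaller than that of δ₀, which only ever takes the values 0 and 1.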

Idempotence of the Rao-Blackwell process

In case the sufficient statistic is also a complete statistic, i.e., one which "admits no unbiased estimator of zero", the Rao-Blackwell process is idempotent, i.e., using it to improve the already improved estimator yields no further improvement, but merely returns as its output the same improved estimator.

When is the Rao-Blackwell estimator the best possible?

If the improved estimator is unbiased and the sufficient statistic on which it is based is complete, then the Lehmann-Scheffé theorem implies that it is the unique "best unbiased estimator."