Ancillary statistic

In statistics, an ancillary statistic is a statistic whose probability distribution does not depend on which of the probability distributions under consideration is the distribution of the statistical population from which the data were taken. The concept was first introduced by the statistical geneticist Sir Ronald Fisher.

Examples

Suppose X₁, ..., X_n are independent and identically distributed, and are normally distributed with expected value μ and variance 1. (Using this particular parametrized family of probability distributions, all with the same variance, as an example is unrealistic: it amounts to a situation in which the statistician somehow knows the exact value of the population variance but can only estimate the population mean from the observed values of the data X₁, ..., X_n.) Let

{\overline {X}}_{n}=(X_{1}+\,\cdots \,+X_{n})/n

be the sample mean. The random variable

{\overline {X}}_{n}-\mu

is not an ancillary statistic, even though its probability distribution does not depend on μ. That is because it is not a statistic: its value depends on the unobservable population mean μ.

The random variable

\max\{\,X_{1},\dots ,X_{n}\,\}-\min\{\,X_{1},\dots ,X_{n}\,\}

is an ancillary statistic, because

Its probability distribution does not change as μ changes, and
It depends only on the data X₁, ..., X_n and not on the unobservable parameter μ, i.e., it is a statistic.
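The first point can be checked numerically. Writing each observation as X_i = μ + Z_i with Z_i standard normal, the range equals max Z_i − min Z_i, so μ cancels. The following is a minimal simulation sketch of that fact, not a proof; the function names, sample size, trial count, and seeds are illustrative choices that do not come from the text above.

```python
import random
import statistics


def sample_ranges(mu, n=5, trials=20000, seed=0):
    """Draw `trials` samples of size n from N(mu, 1); return each sample's range."""
    rng = random.Random(seed)
    ranges = []
    for _ in range(trials):
        xs = [rng.gauss(mu, 1.0) for _ in range(n)]
        ranges.append(max(xs) - min(xs))
    return ranges


def sample_means(mu, n=5, trials=20000, seed=0):
    """Draw `trials` samples of size n from N(mu, 1); return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.gauss(mu, 1.0) for _ in range(n))
        for _ in range(trials)
    ]


if __name__ == "__main__":
    # Different seeds per mu, so any agreement is statistical rather than
    # an artifact of reusing the same random draws.
    for mu, seed in [(0.0, 1), (10.0, 2)]:
        r = sample_ranges(mu, seed=seed)
        m = sample_means(mu, seed=seed)
        print(f"mu={mu:5.1f}  range: mean={statistics.fmean(r):.3f} "
              f"sd={statistics.stdev(r):.3f}   sample mean: mean={statistics.fmean(m):.3f}")
```

Under these assumptions, the summary of the range should look essentially the same for both values of μ, while the sample mean shifts with μ, which is why the sample mean is not ancillary.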

In baseball, suppose a scout observes a batter in N at-bats. Suppose (unrealistically) that the number N is chosen by some random process that is independent of the batter's ability; say a coin is tossed after each at-bat and the result determines whether the scout will stay to watch the batter's next at-bat. The eventual data are the number N of at-bats and the number X of hits. The observed batting average X/N fails to convey all of the information available in the data because it fails to report the number N of at-bats (e.g., a batting average of 0.400, which is very high, based on only five at-bats does not inspire anywhere near as much confidence in the player's ability as a 0.400 average based on 100 at-bats). The number N of at-bats is an ancillary statistic because

It is a part of the observable data (it is a statistic), and
Its probability distribution does not depend on the batter's ability, since it was chosen by a random process independent of the batter's ability.

This ancillary statistic is an ancillary complement to the observed batting average X/N, i.e., the batting average X/N is not a sufficient statistic, in that it conveys less than all of the relevant information in the data, but conjoined with N, it becomes sufficient.
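The scouting example can be sketched as a small simulation. It assumes a fair coin (the text only says a coin is tossed), a fixed per-at-bat hit probability standing in for the batter's ability, and the usual binomial standard error sqrt(p(1 − p)/N) as a rough measure of how much confidence an average deserves. All names and parameter values below are hypothetical.

```python
import math
import random


def scout_session(p_hit, stay_prob=0.5, rng=random):
    """Simulate one scouting session; return (N, X) = (at-bats, hits)."""
    n_at_bats = 0
    hits = 0
    while True:
        n_at_bats += 1
        if rng.random() < p_hit:
            hits += 1
        # Coin toss that ignores the batter entirely: the scout leaves
        # unless the coin comes up "stay".
        if rng.random() >= stay_prob:
            break
    return n_at_bats, hits


def mean_at_bats(p_hit, sessions=50000, seed=0):
    """Average number of at-bats N over many simulated sessions."""
    rng = random.Random(seed)
    total = sum(scout_session(p_hit, rng=rng)[0] for _ in range(sessions))
    return total / sessions


def binomial_standard_error(average, n_at_bats):
    """Standard error of a batting average observed over n_at_bats at-bats."""
    return math.sqrt(average * (1.0 - average) / n_at_bats)


if __name__ == "__main__":
    # N behaves the same for a weak and a strong batter (different seeds).
    print("mean N, p_hit=0.20:", mean_at_bats(0.20, seed=1))
    print("mean N, p_hit=0.40:", mean_at_bats(0.40, seed=2))
    # The same 0.400 average is far less precise over 5 at-bats than over 100.
    print("SE of 0.400 over   5 at-bats:", binomial_standard_error(0.4, 5))
    print("SE of 0.400 over 100 at-bats:", binomial_standard_error(0.4, 100))
```

In this sketch the distribution of N is unaffected by the hit probability, yet N still controls how precise X/N is, which is the sense in which N complements the batting average.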