Normal distribution

Probability density function of Gaussian distribution (bell curve).

The normal distribution is an important probability distribution in many fields. It is also called the Gaussian distribution, especially in physics and engineering. It is actually a family of distributions of the same general form, differing only in their location and scale parameters: the mean and the standard deviation. The standard normal distribution is the normal distribution with a mean of zero and a standard deviation of one. Because the graph of its probability density resembles a bell, it is often called the bell curve.

History

The normal distribution was first introduced by de Moivre in an article in 1733 (reprinted in the second edition of his The Doctrine of Chances, 1738) in the context of approximating the binomial distribution for large n. His result was extended by Laplace in his book Analytical Theory of Probabilities (1812), and is now called the theorem of de Moivre-Laplace.

Laplace used the normal distribution in the analysis of errors of experiments. The very important method of least squares was introduced by Legendre in 1805. Gauss, who claimed to have used the same method since 1794, justified it rigorously in 1809 by assuming a normal distribution of the errors.

The name "bell curve" goes back to Jouffret, who used the term "bell surface" in 1872 for a bivariate normal with independent components. The name "normal distribution" was coined independently by Charles S. Peirce, Francis Galton and Wilhelm Lexis around 1875 [Stigler]. This terminology is unfortunate, since it reflects and encourages the fallacy that "everything is Gaussian". (See the discussion of "occurrence" below.)

That the distribution is called the normal or Gaussian distribution, instead of the de Moivrean distribution, is just an instance of Stigler's law of eponymy: "No scientific discovery is named after its original discoverer".

Specification of the normal distribution

There are various ways to specify a random variable. The most visual is the probability density function (plot at the top), which represents how likely each value of the random variable is. The cumulative distribution function is a conceptually cleaner way to specify the same information, but to the untrained eye its plot is much less informative (see below). Equivalent ways to specify the normal distribution are: the moments, the cumulants, the characteristic function, the moment-generating function, and the cumulant-generating function. Some of these are very useful for theoretical work, but not intuitive. See probability distribution for a discussion.

All of the cumulants of the normal distribution are zero, except the first two.
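
This statement can be read off the cumulant-generating function, the logarithm of the moment-generating function. As a short sketch, for a normal variable X with mean μ and standard deviation σ (the symbols K and κ are notation chosen here for illustration):

K(t)=\ln E\left[e^{tX}\right]=\mu t+{\frac {\sigma ^{2}t^{2}}{2}}

The cumulants are the coefficients of t^n/n! in this expansion, so

\kappa _{1}=\mu ,\qquad \kappa _{2}=\sigma ^{2},\qquad \kappa _{n}=0\quad (n\geq 3)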

Probability density function

The probability density function of the normal distribution with mean μ and standard deviation σ (equivalently, variance σ²) is an example of a Gaussian function,

f(x)={1 \over \sigma {\sqrt {2\pi }}}\,e^{-{(x-\mu )^{2}/2\sigma ^{2}}}

(See also exponential function and pi.) If a random variable X has this distribution, we write X ~ N(μ, σ²). If μ = 0 and σ = 1, the distribution is called the standard normal distribution, with density

f(x)={1 \over {\sqrt {2\pi }}}\,e^{-{x^{2}/2}}

The image above shows the graph of the probability density function of the normal distribution with μ = 0 and several values of σ.

For all normal distributions, the density function is symmetric about its mean value. About 68% of the area under the curve is within one standard deviation of the mean, 95.5% within two standard deviations, and 99.7% within three standard deviations. The inflection points of the curve occur at one standard deviation away from the mean.
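
The three percentages can be checked numerically. The following is a minimal sketch in Python; the function name `prob_within` is chosen here for illustration, and it relies on the identity Pr(|X − μ| < kσ) = erf(k/√2), which holds for every μ and σ.

```python
import math


def prob_within(k: float) -> float:
    """Probability that a normal variable lies within k standard deviations of its mean."""
    # Pr(|X - mu| < k*sigma) = erf(k / sqrt(2)), independent of mu and sigma.
    return math.erf(k / math.sqrt(2.0))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```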

Cumulative distribution function

The cumulative distribution function (hereafter cdf) is the probability that the variable X takes a value less than or equal to x. Expressed in terms of the density function, it is

\Pr(X\leq x)=\int _{-\infty }^{x}{\frac {1}{\sigma {\sqrt {2\pi }}}}e^{-(u-\mu )^{2}/(2\sigma ^{2})}\,du

The standard normal cdf, conventionally denoted $\Phi$ , is the general cdf evaluated with $\mu =0$ and $\sigma =1$ ,

\Phi (z)=\int _{-\infty }^{z}{1 \over {\sqrt {2\pi }}}\,e^{-{x^{2}/2}}\,dx

The standard normal cdf can be expressed in terms of a special function called the error function, as

\Phi (z)={\frac {1}{2}}\left(1+\operatorname {erf} \,{\frac {z}{\sqrt {2}}}\right)

The following graph shows the cumulative distribution function for values of z from -4 to +4:

On this graph, we see the probability that a standard normal variable has a value less than 0.25 is approximately equal to 0.60.
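
The value read off the graph can be reproduced in two ways. The sketch below, in Python, integrates the standard normal density numerically and compares the result with the error-function formula. The function names, the lower integration limit of −10 and the step count are choices made here for illustration, not part of any standard method.

```python
import math


def std_normal_pdf(x: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def cdf_by_integration(z: float, lower: float = -10.0, steps: int = 20_000) -> float:
    """Approximate Phi(z) with the composite Simpson rule on [lower, z].

    The tail below `lower` is ignored; `steps` must be even.
    """
    if z <= lower:
        return 0.0
    h = (z - lower) / steps
    total = std_normal_pdf(lower) + std_normal_pdf(z)
    for i in range(1, steps):
        weight = 4.0 if i % 2 else 2.0  # Simpson weights 4, 2, 4, ... for interior points
        total += weight * std_normal_pdf(lower + i * h)
    return total * h / 3.0


def cdf_by_erf(z: float) -> float:
    """Phi(z) expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


print(round(cdf_by_erf(0.25), 4))  # 0.5987
print(abs(cdf_by_integration(0.25) - cdf_by_erf(0.25)) < 1e-9)  # True
```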

Generating functions

Moment generating function

Characteristic function

The characteristic function is defined as the expected value of $e^{itX}$ . For a normal distribution, it can be shown that the characteristic function is

\phi _{X}(t)=E\left[e^{itX}\right]=\int _{-\infty }^{\infty }{\frac {1}{\sigma {\sqrt {2\pi }}}}\,e^{-{(x-\mu )^{2}/2\sigma ^{2}}}\,e^{itx}\,dx=e^{i\mu t-\sigma ^{2}t^{2}/2}

as can be seen by completing the square in the exponent.
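
As a sketch of that step, the combined exponent of the integrand can be rewritten as

-{\frac {(x-\mu )^{2}}{2\sigma ^{2}}}+itx=-{\frac {(x-\mu -i\sigma ^{2}t)^{2}}{2\sigma ^{2}}}+i\mu t-{\frac {\sigma ^{2}t^{2}}{2}}

The last two terms do not depend on x and can be taken outside the integral. What remains has the form of a normal density with a shifted (complex) location parameter and integrates to 1; making that last claim rigorous needs a contour-integration argument that is not reproduced here.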

Properties

1. If X ~ N(μ, σ²) and a and b are real numbers, then aX + b ~ N(aμ + b, (aσ)²).
2. If X₁ ~ N(μ₁, σ₁²) and X₂ ~ N(μ₂, σ₂²), and X₁ and X₂ are independent, then X₁ + X₂ ~ N(μ₁ + μ₂, σ₁² + σ₂²).
3. If X₁, ..., X_n are independent standard normal variables, then X₁² + ... + X_n² has a chi-square distribution with n degrees of freedom.
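
The means and variances predicted by these properties can be checked by simulation. The sketch below uses Python's standard `random` and `statistics` modules; the parameter values, seed and sample size are arbitrary choices for illustration. It compares sample moments only and does not test that the resulting distributions are actually normal or chi-square. The expected mean of the chi-square sum is n, here 3.

```python
import random
import statistics

rng = random.Random(12345)  # fixed seed so the run is repeatable
N = 200_000

x1 = [rng.gauss(1.0, 2.0) for _ in range(N)]   # X1 ~ N(1, 2^2)
x2 = [rng.gauss(-3.0, 1.5) for _ in range(N)]  # X2 ~ N(-3, 1.5^2), independent of X1

# First property: 2*X1 + 1 should have mean 2*1 + 1 = 3 and variance (2*2)^2 = 16.
affine = [2.0 * v + 1.0 for v in x1]

# Second property: X1 + X2 should have mean 1 - 3 = -2 and variance 4 + 2.25 = 6.25.
total = [a + b for a, b in zip(x1, x2)]

# Third property: sum of 3 squared standard normals, chi-square with 3 degrees of freedom.
chi2 = [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(3)) for _ in range(N)]

print("aX + b :", statistics.fmean(affine), statistics.pvariance(affine))
print("X1 + X2:", statistics.fmean(total), statistics.pvariance(total))
print("chi2(3):", statistics.fmean(chi2))
```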

Standardizing normal random variables

As a consequence of Property 1, it is possible to relate all normal random variables to the standard normal.

If X is a normal random variable with mean μ and variance σ², then

Z={\frac {X-\mu }{\sigma }}

is a standard normal random variable: Z~N(0,1). An important consequence is that the cdf of a general normal distribution is therefore

\Pr(X\leq x)=\Phi \left({\frac {x-\mu }{\sigma }}\right)={\frac {1}{2}}\left(1+\operatorname {erf} \,\left({\frac {x-\mu }{\sigma {\sqrt {2}}}}\right)\right)

Conversely, if Z is a standard normal random variable, then

X=\sigma Z+\mu \,

is a normal random variable with mean μ and variance σ².

The standard normal distribution has been tabulated, and the other normal distributions are simple transformations of the standard one. Therefore, one can use tabulated values of the cdf of the standard normal distribution to find values of the cdf of a general normal distribution.
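
As an illustration of this reduction, the sketch below computes the cdf of a general normal distribution by standardizing first and then evaluating the standard normal cdf, which here stands in for the table lookup. The function names and the example values (μ = 10, σ = 2, x = 13) are hypothetical.

```python
import math


def std_normal_cdf(z: float) -> float:
    """Standard normal cdf Phi(z); plays the role of the printed table."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """cdf of N(mu, sigma^2), obtained by standardizing x."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    z = (x - mu) / sigma  # Z = (X - mu) / sigma is standard normal
    return std_normal_cdf(z)


# Pr(X <= 13) for X ~ N(10, 2^2) equals Phi(1.5).
print(round(normal_cdf(13.0, 10.0, 2.0), 4))  # 0.9332
```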

Generating normal random variables

For computer simulations, it is often useful to generate values that have a normal distribution. There are several methods; the most basic is to invert the standard normal cdf. More efficient methods are also known. One such method is the Box-Muller transform. The Box-Muller transform takes two uniformly distributed values as input and maps them to two normally distributed values. This requires generating values from a uniform distribution, for which many methods are known. See also random number generators.

The Box-Muller transform is a consequence of Property 3 and the fact that the chi-square distribution with two degrees of freedom is an exponential distribution (which is easy to generate).
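
A minimal sketch of the basic (trigonometric) form of the Box-Muller transform in Python follows. The function name, seed and sample size are illustrative choices, and the uniform values come from Python's standard `random` module.

```python
import math
import random
import statistics


def box_muller(rng: random.Random) -> tuple[float, float]:
    """Map two uniform values to two independent standard normal values."""
    u1 = 1.0 - rng.random()  # lies in (0, 1], so log(u1) is always defined
    u2 = rng.random()
    # -2*ln(u1) is exponential with mean 2, i.e. chi-square with 2 degrees of freedom;
    # its square root is the radius of the point (z0, z1).
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2  # uniformly distributed direction
    return radius * math.cos(angle), radius * math.sin(angle)


rng = random.Random(2024)  # fixed seed so the run is repeatable
sample = []
for _ in range(50_000):
    sample.extend(box_muller(rng))

# Sample mean and variance should be close to 0 and 1.
print(statistics.fmean(sample), statistics.pvariance(sample))
```

A value with mean μ and standard deviation σ is then obtained as σZ + μ from a standard normal value Z.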

The central limit theorem

The normal distribution has the very important property that under certain conditions, the distribution of a sum of a large number of independent variables is approximately normal. This is the so-called central limit theorem.

The practical importance of the central limit theorem is that the normal distribution can be used as an approximation to some other distributions.

A binomial distribution with parameters n and p is approximately normal for large n and p not too close to 1 or 0. The approximating normal distribution has mean μ = np and standard deviation σ = √(n p (1 − p)).
A Poisson distribution with parameter λ is approximately normal for large λ. The approximating normal distribution has mean μ = λ and standard deviation σ = √λ.
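
The two approximations can be compared with the exact distributions numerically. The Python sketch below uses hypothetical example values (n = 100, p = 0.5, λ = 100). It also evaluates the normal cdf at k + 0.5 (the usual continuity correction for a discrete variable), which is an addition made here and is not part of the statements above.

```python
import math


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """cdf of N(mu, sigma^2), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def binomial_cdf(k: int, n: int, p: float) -> float:
    """Exact Pr(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def poisson_cdf(k: int, lam: float) -> float:
    """Exact Pr(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


# Binomial example: mu = n*p, sigma = sqrt(n*p*(1 - p)).
n, p, k = 100, 0.5, 55
binom_exact = binomial_cdf(k, n, p)
binom_plain = normal_cdf(k, n * p, math.sqrt(n * p * (1.0 - p)))
binom_corrected = normal_cdf(k + 0.5, n * p, math.sqrt(n * p * (1.0 - p)))

# Poisson example: mu = lam, sigma = sqrt(lam).
lam, m = 100.0, 110
pois_exact = poisson_cdf(m, lam)
pois_corrected = normal_cdf(m + 0.5, lam, math.sqrt(lam))

print("binomial:", binom_exact, binom_plain, binom_corrected)
print("poisson :", pois_exact, pois_corrected)
```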

Whether these approximations are sufficiently accurate depends on the purpose for which they are needed, and the rate of convergence to the normal distribution. It is typically the case that such approximations are less accurate in the tails of the distribution.

Occurrence

Approximately normal distributions occur in many situations, as a result of the central limit theorem. When there is reason to suspect the presence of a large number of small effects acting additively, it is reasonable to assume that observations will be normal. There are statistical methods to empirically test that assumption.

Effects can also act as multiplicative (rather than additive) modifications. In that case, the assumption of normality is not justified, and it is the logarithm of the variable of interest that is normally distributed. The distribution of the directly observed variable is then called log-normal.

Finally, if there is a single external influence which has a large effect on the variable under consideration, the assumption of normality is not justified either. This is true even if, when the external variable is held constant, the resulting distributions are indeed normal. The full distribution will be a superposition of normal variables, which is not in general normal. This is related to the theory of errors (see below).

To summarize, here's a list of situations where approximate normality is sometimes assumed. For a fuller discussion, see below.

In counting problems (so the central limit theorem includes a discrete-to-continuum approximation) where reproductive random variables are involved, such as
- Binomial random variables, associated with yes/no questions;
- Poisson random variables, associated with rare events;
In physiological measurements of biological specimens:
- The logarithm of measures of size of living tissue (length, height, skin area, weight);
- The length of inert appendages (hair, claws, nails, teeth) of biological specimens, in the direction of growth; presumably the thickness of tree bark also falls under this category;
- Other physiological measures may be normally distributed, but there is no reason to expect that a priori;
Measurement errors are assumed to be normally distributed, and any deviation from normality must be explained;
Financial variables
- The logarithm of interest rates, exchange rates, and inflation; these variables behave like compound interest, not like simple interest, and so are multiplicative;
- Stock-market indices are supposed to be multiplicative too, but some researchers claim that they are log-Lévy variables instead of lognormal;
- Other financial variables may be normally distributed, but there is no reason to expect that a priori;
Light intensity
- The intensity of laser light is normally distributed;
- Thermal light has a Bose-Einstein distribution on very short time scales, and a normal distribution on longer timescales due to the central limit theorem.

Of relevance to biology and economics is the fact that complex systems tend to display power laws rather than normality.

Photon counts

Light intensity from a single source varies with time, and is usually assumed to be normally distributed. However, quantum mechanics interprets measurements of light intensity as photon counting. Ordinary light sources, which produce light by thermal emission, should follow a Poisson distribution or Bose-Einstein distribution on very short time scales. On longer time scales (longer than the coherence time), the addition of independent variables yields an approximately normal distribution. The intensity of laser light, which is a quantum phenomenon, has an exactly normal distribution.

Measurement errors

Repeated measurements of the same quantity are expected to yield results which are clustered around a particular value. If all major sources of errors have been taken into account, it is assumed that the remaining error must be the result of a large number of very small additive effects, and hence normal. Deviations from normality are interpreted as indications of systematic errors which have not been taken into account. Note that this is the central assumption of the mathematical theory of errors.

Physical characteristics of biological specimens

The overwhelming biological evidence is that bulk growth processes of living tissue proceed by multiplicative, not additive, increments, and that therefore measures of body size should at most follow a lognormal rather than normal distribution. Despite common claims of normality, the sizes of plants and animals are approximately lognormal. The evidence and an explanation based on models of growth were first published in the classic book

Huxley, Julian: Problems of Relative Growth (1932)

Differences in size due to sexual dimorphism, or other polymorphisms like the worker/soldier/queen division in social insects, further make the joint distribution of sizes deviate from lognormality.

The assumption that linear size of biological specimens is normal leads to a non-normal distribution of weight (since weight/volume is roughly the 3rd power of length, and Gaussian distributions are only preserved by linear transformations), and conversely assuming that weight is normal leads to non-normal lengths. This is a problem, because there is no a priori reason why one of length, or body mass, and not the other, should be normally distributed. Lognormal distributions, on the other hand, are preserved by powers so the "problem" goes away if lognormality is assumed.
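
The closure of the lognormal family under powers follows from the linear-transformation property of the normal distribution applied to logarithms. As a sketch, write L for length and W for weight and assume W = cL³ for some constant c > 0 (these symbols and the exact cubic relation are illustrative assumptions). If ln L ~ N(μ, σ²), then

\ln W=\ln c+3\ln L\sim N\left(\ln c+3\mu ,\,(3\sigma )^{2}\right)

so W is again lognormal.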

Blood pressure of adult humans is supposed to be normally distributed, but only after separating males and females into different populations (each of which is normally distributed).
The length of inert appendages such as hair, nails, teeth, claws and shells is expected to be normally distributed if measured in the direction of growth. This is because the growth of inert appendages depends on the size of the root, and not on the length of the appendage, and so proceeds by additive increments. Hence, we have an example of a sum of very many small lognormal increments approaching a normal distribution. Another plausible example is the width of tree trunks, where a new thin ring is produced every year whose width is affected by a large number of factors.

Financial variables

Because of the exponential nature of interest and inflation, financial indicators such as interest rates, stock values, or commodity prices make good examples of multiplicative behaviour. As such, they should not be expected to be normal, but lognormal.

Benoît Mandelbrot, the popularizer of fractals, has claimed that even the assumption of lognormality is flawed.

Lifetime

Other examples of variables that are not normally distributed include the lifetimes of humans or mechanical devices. Examples of distributions used in this connection are the exponential distribution (memoryless) and the Weibull distribution. In general, there is no reason that waiting times should be normal, since they are not directly related to any kind of additive influence.

Test scores

The IQ score of an individual, for example, can be seen as the result of many small additive influences: many genes and many environmental factors all play a role.

IQ scores and other ability scores are approximately normally distributed. For most IQ tests, the mean is 100 and the standard deviation is 15.
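
Under these figures, and taking the normal model at face value, the share of scores in a given range can be computed directly. The sketch below uses `statistics.NormalDist` from the Python standard library (available since Python 3.8); the chosen cut-offs of 85, 115 and 130 are illustrative.

```python
from statistics import NormalDist

# Idealized model of IQ scores: mean 100, standard deviation 15.
iq = NormalDist(mu=100, sigma=15)

within_one_sd = iq.cdf(115) - iq.cdf(85)  # share of scores between 85 and 115
above_130 = 1.0 - iq.cdf(130)             # share of scores more than two sd above the mean

print(round(within_one_sd, 4))  # 0.6827
print(round(above_130, 4))      # 0.0228
```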

Criticisms: test scores are discrete variables associated with the number of correct/incorrect answers, and as such they are related to the binomial distribution. Moreover (see this USENET post), raw IQ test scores are customarily 'massaged' to force the distribution of IQ scores to be normal. Finally, there is no widely accepted model of intelligence, and the link to IQ scores, let alone a relationship between influences on intelligence and additive variations of IQ, is subject to debate.

See also

multivariate normal distribution.

External links and references

A. Kropinski's normal distribution tutorial [dead link]
S. M. Stigler: Statistics on the Table, Harvard University Press, 1999, chapter 22: History of the term "normal distribution".
Earliest Known uses of some of the Words of Mathematics. See: [1] for "normal", [2] for "Gaussian", and [3] for "error".
Earliest Uses of Symbols in Probability and Statistics. See Symbols associated with the Normal Distribution.