Bayesian inference

Bayesian inference is statistical inference in which probabilities are interpreted not as frequencies, proportions or similar quantities, but as degrees of belief. The name comes from the frequent use of Bayes' theorem in this kind of reasoning. Bayes' theorem is named after Thomas Bayes, who first introduced the method.

Evidence and the scientific method

Bayesian statisticians argue that methods of Bayesian inference are a formalisation of the scientific method, which involves collecting evidence that points towards or away from a given hypothesis. There can never be certainty, but as evidence accumulates, the degree of belief in a hypothesis changes; with enough evidence it will often approach 1 (near-certainty that the hypothesis is true) or 0 (near-certainty that it is false). Bayes' theorem provides a method for updating a degree of belief as new information arrives. Bayes' theorem states

$P(A\mid e) = P(A)\,\frac{P(e\mid A)}{P(e)}$

For our purposes, A is a hypothesis induced from some set of observations, and e is the evidence: a new observation that confirms or disconfirms the hypothesis.

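The term P(A|e) is called the posterior probability of A, given e.
The term P(A) is called the prior probability of A.
The term P(e|A) is called the likelihood of A, given e.
The term P(e) is called the prior probability of e; it acts as a normalising constant.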

Likelihood factor

The fraction

$\frac{P(e\mid A)}{P(e)}$

is a scale factor: the probability of the observation given the hypothesis, divided by the probability of the observation regardless of whether the hypothesis holds. Multiplying by it adjusts the probability of the hypothesis according to how well the hypothesis predicted the observation. Thus, if the observation is much more probable when the hypothesis is true than it is overall, the scale factor will be large.

Multiplying the prior probability of the hypothesis by this scale factor gives the posterior probability of the hypothesis, given the observation.
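As a minimal Python sketch of this relation, the posterior can be computed as the prior times the likelihood factor, with P(e) obtained from the law of total probability over A and not-A. The function names `evidence` and `posterior` and the numbers in the usage line are hypothetical, chosen only for illustration.

```python
def evidence(prior_a: float, p_e_given_a: float, p_e_given_not_a: float) -> float:
    """P(e) by the law of total probability over A and not-A."""
    return prior_a * p_e_given_a + (1.0 - prior_a) * p_e_given_not_a


def posterior(prior_a: float, p_e_given_a: float, p_e: float) -> float:
    """P(A|e) = P(A) * [P(e|A) / P(e)]: the prior times the likelihood factor."""
    if not 0.0 < p_e <= 1.0:
        raise ValueError("P(e) must lie in (0, 1]")
    scale_factor = p_e_given_a / p_e
    return prior_a * scale_factor


# Hypothetical numbers: P(A) = 0.25, P(e|A) = 0.75, P(e|not A) = 0.25.
p_e = evidence(0.25, 0.75, 0.25)
print(p_e, posterior(0.25, 0.75, p_e))  # 0.375 0.5
```

Here the scale factor is 0.75 / 0.375 = 2, so the observation doubles the probability of A, from 0.25 to 0.5.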

The key step in making an inference is assigning the prior probabilities to the hypothesis and the observation. If the prior probabilities had objective values, the theorem could be used to give an objective measure of the probability of the hypothesis. However, there is no clear way to determine objective prior probabilities. One obstacle is that an objective assignment would seem to require enumerating every possible hypothesis, which cannot be done.

Alternatively, and more often, the probabilities are taken as subjective degrees of belief held by the participant. The theorem then determines how the observation should rationally change the degree of belief in the hypothesis. In this case, however, the posterior probability is still subjective. The theorem can therefore be used to rationalise belief in a hypothesis, but at the cost of objectivity. Such a scheme cannot be used, for example, to settle conflicts between scientific paradigms objectively.

In many cases, the impact of the evidence can be summarised in a likelihood ratio, as expressed by the law of likelihood. This can be combined with the prior probability, which reflects the original degree of belief and any earlier evidence already taken into account. Before a decision is made, a loss function is also needed to reflect the consequences of making a wrong decision.
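A minimal sketch of combining a likelihood ratio with a prior, assuming the ratio is taken as P(e|A)/P(e|not A) and using the odds form of Bayes' theorem (posterior odds = prior odds × likelihood ratio), which the text does not spell out. The function name `update_with_likelihood_ratio` and the numbers are hypothetical; the loss function is not modelled here.

```python
def update_with_likelihood_ratio(prior_prob: float, likelihood_ratio: float) -> float:
    """Return P(A|e) from P(A) and the likelihood ratio P(e|A) / P(e|not A)."""
    if not 0.0 < prior_prob < 1.0:
        raise ValueError("prior_prob must be strictly between 0 and 1")
    prior_odds = prior_prob / (1.0 - prior_prob)
    posterior_odds = prior_odds * likelihood_ratio  # odds form of Bayes' theorem
    return posterior_odds / (1.0 + posterior_odds)  # convert odds back to a probability


# Hypothetical: prior belief 0.5, evidence 1.5 times as likely under A as under not-A.
p1 = update_with_likelihood_ratio(0.5, 1.5)
# A second piece of evidence, assumed conditionally independent, with ratio 4.
p2 = update_with_likelihood_ratio(p1, 4.0)
print(round(p1, 4), round(p2, 4))
```

Chaining the two updates gives the same result as a single update with ratio 1.5 × 4 = 6, which is one way to read "earlier evidence already taken into account" in the prior; this equivalence holds only when the pieces of evidence are conditionally independent given the hypothesis.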

Simple examples of Bayesian inference

From which bowl is the cookie?

To illustrate, suppose there are two bowls full of cookies. Bowl #1 has 10 chocolate chip cookies and 30 plain cookies, while bowl #2 has 20 of each. Kandar picks one of the bowls at random, and then picks a cookie from it at random. We may assume there is no reason to believe Kandar treats one bowl differently from the other, and likewise for the cookies. The cookie turns out to be a plain one. How probable is it that Kandar picked it out of bowl #1?


Intuitively, it seems clear that the answer should be more than a half, since there are more plain cookies in bowl #1. The precise answer is given by Bayes' theorem. Let H₁ correspond to bowl #1, and H₂ to bowl #2. It is given that the bowls are identical from Kandar's point of view, thus P(H₁) = P(H₂), and the two must add up to 1, so both are equal to 0.5. The "data" D consists of the observation of a plain cookie. From the contents of the bowls, we know that P(D | H₁) = 30/40 = 0.75 and P(D | H₂) = 20/40 = 0.5. Bayes' formula then yields

$P(H_1\mid D) = \frac{P(H_1)\,P(D\mid H_1)}{P(H_1)\,P(D\mid H_1) + P(H_2)\,P(D\mid H_2)} = \frac{0.5\times 0.75}{0.5\times 0.75 + 0.5\times 0.5} = 0.6$

Before observing the cookie, the probability that Kandar chose bowl #1 is the prior probability, P(H₁), which is 0.5. After observing the cookie, we revise the probability to P(H₁|D), which is 0.6.
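The bowl calculation can be reproduced with exact rational arithmetic. This is a minimal sketch; the function name `bowl_posterior` is hypothetical, and it simply normalises prior × likelihood over the two hypotheses, with the denominator P(D) obtained by the law of total probability.

```python
from fractions import Fraction


def bowl_posterior(priors, likelihoods):
    """Posterior P(H_i | D) for each hypothesis, given P(H_i) and P(D | H_i)."""
    joint = [p * lik for p, lik in zip(priors, likelihoods)]  # P(H_i) * P(D | H_i)
    p_data = sum(joint)                                       # P(D), total probability
    return [j / p_data for j in joint]


priors = [Fraction(1, 2), Fraction(1, 2)]           # P(H1), P(H2)
likelihoods = [Fraction(30, 40), Fraction(20, 40)]  # P(D | H1), P(D | H2): a plain cookie
post = bowl_posterior(priors, likelihoods)
print(post[0])  # 3/5
```

Using `Fraction` avoids floating-point rounding, so the result is exactly 3/5 = 0.6.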

False positives in a medical test

False positives are a problem in any kind of test: no test is perfect, and sometimes the test will incorrectly report a positive result. For example, if a test for a particular disease is performed on a patient, then there is a chance (usually small) that the test will return a positive result even if the patient does not have the disease. The problem lies, however, not just in the chance of a false positive prior to testing, but in determining the chance that a positive result is in fact a false positive. As we will demonstrate, using Bayes' theorem, if a condition is rare, then the majority of positive results may be false positives, even if the test for that condition is (otherwise) reasonably accurate.

Suppose that a test for a particular disease has a very high success rate:

if a tested patient has the disease, the test accurately reports this, a 'positive', 99% of the time (or, with probability 0.99), and
if a tested patient does not have the disease, the test accurately reports that, a 'negative', 95% of the time (i.e. with probability 0.95).

Suppose also, however, that only 0.1% of the population have that disease (i.e. with probability 0.001). We now have all the information required to use Bayes' theorem to calculate the probability that a positive test result is a false positive.

Let A be the event that the patient has the disease, and B be the event that the test returns a positive result. Then, using Bayes' theorem with P(B) expanded by the law of total probability, the probability of a true positive is

$P(A\mid B) = \frac{P(B\mid A)\,P(A)}{P(B\mid A)\,P(A) + P(B\mid \neg A)\,P(\neg A)} = \frac{0.99\times 0.001}{0.99\times 0.001 + 0.05\times 0.999} \approx 0.019$

and hence the probability of a false positive is about (1 − 0.019) = 0.981.

Despite the apparent high accuracy of the test, the incidence of the disease is so low (one in a thousand) that the vast majority of patients who test positive (98 in a hundred) do not have the disease. (Nonetheless, this is 20 times the proportion before we knew the outcome of the test! The test is not useless, and re-testing may improve the reliability of the result.) In particular, a test must be very reliable in reporting a negative result when the patient does not have the disease, if it is to avoid the problem of false positives. In mathematical terms, this would ensure that the second term in the denominator of the above calculation is small, relative to the first term. For example, if the test reported a negative result in patients without the disease with probability 0.999, then using this value in the calculation yields a probability of a false positive of roughly 0.5.
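The same calculation, for both values of the negative-result accuracy discussed above, can be sketched as follows. The terms "sensitivity" (P(B|A)) and "specificity" (P(not B | not A)) are standard names not used in the text, and the function name is hypothetical.

```python
def prob_disease_given_positive(prevalence, sensitivity, specificity):
    """P(A | B): probability of disease given a positive result.

    prevalence:  P(A), fraction of the population with the disease
    sensitivity: P(B | A), probability of a positive result if diseased
    specificity: P(not B | not A), probability of a negative result if healthy
    """
    true_positive = sensitivity * prevalence
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


for spec in (0.95, 0.999):
    ppv = prob_disease_given_positive(0.001, 0.99, spec)
    print(f"specificity {spec}: P(disease | positive) = {ppv:.3f}, "
          f"P(false positive | positive) = {1 - ppv:.3f}")
```

With specificity 0.95 this gives about 0.019 and 0.981; with 0.999 the false-positive share drops to roughly 0.5, matching the figures in the text.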

In this example, Bayes' theorem helps show that the accuracy of tests for rare conditions must be very high in order to produce reliable results from a single test, due to the possibility of false positives. (The probability of a 'false negative' could also be calculated using Bayes' theorem, to completely characterise the possible errors in the test results.)

In the courtroom

Bayesian inference can be used to coherently assess additional evidence of guilt in a court setting.

Let G be the event that the defendant is guilty.
Let E be the event that the defendant's DNA matches DNA found at the crime scene.
Let p(E | G) be the probability of seeing event E assuming that the defendant is guilty. (Usually this would be taken to be unity.)
Let p(G | E) be the probability that the defendant is guilty assuming the DNA match event E.
Let p(G) be the probability that the defendant is guilty, based on the evidence other than the DNA match.

Bayesian inference tells us that if we can assign a probability p(G) to the defendant's guilt before we take the DNA evidence into account, then we can revise this probability to the conditional probability p(G | E), since

$p(G\mid E) = \frac{p(G)\,p(E\mid G)}{p(E)}$

Suppose, on the basis of other evidence, a juror decides that there is a 30% chance that the defendant is guilty. Suppose also that the forensic evidence is that the probability that a person chosen at random would have DNA that matched that at the crime scene was 1 in a million, or 10⁻⁶.

The event E can occur in two ways. Either the defendant is guilty (with prior probability 0.3) and thus his DNA matches with probability 1, or he is innocent (with prior probability 0.7) and he is unlucky enough to be one of the 1 in a million matching people.

Thus the juror could coherently revise his opinion to take into account the DNA evidence as follows:

$p(G\mid E) = \frac{0.3\times 1.0}{0.3\times 1.0 + 0.7\times 10^{-6}} \approx 0.99999766667$
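A minimal sketch of the juror's update, with p(E) expanded over the two ways the match can occur (guilty, or innocent but a chance match). The function and parameter names are hypothetical; the defaults are the values from the text.

```python
def guilt_given_match(prior_guilt, p_match_if_guilty=1.0, random_match_prob=1e-6):
    """p(G | E) for a DNA match E, with p(E) expanded over guilty / innocent."""
    p_match = prior_guilt * p_match_if_guilty + (1.0 - prior_guilt) * random_match_prob
    return prior_guilt * p_match_if_guilty / p_match


print(f"{guilt_given_match(0.3):.11f}")  # 0.99999766667, as in the text
```

Keeping `prior_guilt` as an explicit argument makes it easy to see how strongly the conclusion depends on the juror's assessment of the other evidence.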

In the United Kingdom, Bayes' theorem was explained by an expert witness to the jury in the case of Regina versus Denis Adams. The case went to appeal and the Court of Appeal gave its opinion that the use of Bayes' theorem was inappropriate for jurors.

Search theory

In May 1968 the US nuclear submarine Scorpion (SSN 589) failed to arrive as expected at her home port of Norfolk, Virginia. The US Navy was convinced that the vessel had been lost off the eastern seaboard, but an extensive search failed to discover the wreck. The US Navy's deep water expert, John Craven, believed that it was elsewhere, and he organised a search southwest of the Azores based on a controversial approximate triangulation by hydrophones. He was allocated only a single ship, the USNS Mizar, and he took advice from a firm of consultant mathematicians in order to make the most of his resources. A Bayesian search methodology was adopted.

Experienced submarine commanders were interviewed to construct hypotheses about what could have caused the loss of the Scorpion. The sea area was divided up into grid squares and a probability was assigned to each square under each of the hypotheses, giving a number of probability grids, one for each hypothesis. These were then added together to produce an overall probability grid. The probability attached to each square was then the probability that the wreck was in that square. A second grid was constructed with probabilities that represented the probability of successfully finding the wreck if that square were searched and the wreck were actually there. This was a known function of water depth. Combining this grid with the previous one gives a grid of the probability of finding the wreck in each square of the sea if it were searched.

This grid was searched systematically, starting with the high probability regions and working down to the low probability regions. Each time a grid square was searched and found to be empty, its probability was reassessed using Bayes' theorem. This in turn forced the probabilities of all the other grid squares to be reassessed (upwards), also by Bayes' theorem. The approach was a major computational challenge for the time, but it was eventually successful and the Scorpion was found in October of that year.

Suppose a grid square has a probability p of containing the wreck and that the probability of successfully detecting the wreck if it is there is q. If the square is searched and no wreck is found then, by Bayes' theorem, the revised probability of the wreck being in the square is given by

$p'={\frac {p(1-q)}{(1-p)+p(1-q)}}$
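The search procedure described above can be sketched in a few lines: pick the square with the highest probability of finding the wreck (p times q), and after an unsuccessful search apply the formula above to that square while dividing every other square's probability by the same denominator, 1 − pq. This is a toy illustration, not the method actually used in 1968; the function names and the four-square grid are hypothetical.

```python
def update_after_failed_search(probs, detect, searched):
    """Bayesian update of wreck-location probabilities after an unsuccessful search.

    probs[i]:  probability that the wreck is in square i (the values sum to 1)
    detect[i]: probability q of detecting the wreck in square i if it is there
    searched:  index of the square that was searched without finding the wreck
    """
    p, q = probs[searched], detect[searched]
    # Probability of this outcome (no detection): (1 - p) + p(1 - q) = 1 - pq
    no_detection = (1.0 - p) + p * (1.0 - q)
    updated = []
    for i, prob in enumerate(probs):
        if i == searched:
            updated.append(prob * (1.0 - q) / no_detection)  # p' from the formula above
        else:
            updated.append(prob / no_detection)  # every other square goes up
    return updated


def next_square(probs, detect):
    """Square with the highest probability of finding the wreck if searched (p times q)."""
    return max(range(len(probs)), key=lambda i: probs[i] * detect[i])


# Hypothetical four-square grid, for illustration only.
probs = [0.4, 0.3, 0.2, 0.1]
detect = [0.5, 0.9, 0.8, 0.9]
for _ in range(3):
    square = next_square(probs, detect)
    probs = update_after_failed_search(probs, detect, square)
    print(square, [round(x, 3) for x in probs])
```

Because every square is divided by the same 1 − pq, the probabilities still sum to 1 after each update, and the empty square's probability falls while all the others rise.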

More mathematical examples

Naive Bayes classifier

See: naive Bayesian classification.

Posterior distribution of the binomial parameter

In this example we consider the computation of the posterior distribution for the binomial parameter. This is the same problem considered by Bayes in Proposition 9 of his essay.

We are given m observed successes and n observed failures in a binomial experiment. The experiment may be tossing a coin, drawing a ball from an urn, or asking someone their opinion, among many other possibilities. What we know about the parameter (let's call it a) is stated as the prior distribution, p(a).

For a given value of a, the probability of m successes in m+n trials is

$p(m,n\mid a) = \binom{n+m}{m}\,a^{m}(1-a)^{n}$

Since m and n are fixed, and a is unknown, this is a likelihood function for a. Bayes' theorem, with the denominator given by the continuous form of the law of total probability, gives

$p(a\mid m,n) = \frac{p(m,n\mid a)\,p(a)}{\int_{0}^{1} p(m,n\mid a)\,p(a)\,da} = \frac{\binom{n+m}{m}\,a^{m}(1-a)^{n}\,p(a)}{\int_{0}^{1}\binom{n+m}{m}\,a^{m}(1-a)^{n}\,p(a)\,da}$
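For a general prior the integral in the denominator may have no closed form, but it can be approximated numerically. The sketch below, a hypothetical helper using a simple grid and the trapezoid rule (a swapped-in numerical technique, not anything from Bayes' essay), evaluates the formula above for any prior density supplied as a function.

```python
import math


def grid_posterior(m, n, prior, num_points=1001):
    """Approximate the posterior density p(a | m, n) on an even grid over [0, 1].

    m, n:  observed successes and failures
    prior: callable returning the (possibly unnormalised) prior density p(a)
    Returns (grid, density); the denominator integral uses the trapezoid rule.
    """
    step = 1.0 / (num_points - 1)
    grid = [i * step for i in range(num_points)]
    coef = math.comb(n + m, m)  # binomial coefficient; it cancels, kept to mirror the formula
    unnormalised = [coef * a**m * (1.0 - a) ** n * prior(a) for a in grid]
    evidence = step * (sum(unnormalised) - 0.5 * (unnormalised[0] + unnormalised[-1]))
    return grid, [u / evidence for u in unnormalised]


def posterior_mean(grid, density):
    """Trapezoid-rule estimate of the posterior mean of a."""
    step = grid[1] - grid[0]
    vals = [a * d for a, d in zip(grid, density)]
    return step * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


# Uniform prior p(a) = 1 with hypothetical data: 7 successes, 3 failures.
grid, density = grid_posterior(7, 3, prior=lambda a: 1.0)
print(round(posterior_mean(grid, density), 4))
```

With a uniform prior the estimated posterior mean should come out very close to 8/12 ≈ 0.6667.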

For some special choices of the prior distribution p(a), the integral can be solved and the posterior takes a convenient form. In particular, if p(a) is a beta distribution with parameters m₀ and n₀, then the posterior is also a beta distribution, with parameters m+m₀ and n+n₀.

A conjugate prior is a prior distribution, such as the beta distribution in the above example, which has the property that the posterior is the same type of distribution.
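With a conjugate prior the update needs no integral at all: it is just addition of counts. A minimal sketch, where the class name `BetaPrior` is hypothetical and the parameter names follow the m₀, n₀ notation above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BetaPrior:
    """Beta distribution over the binomial parameter a, with parameters m0 and n0."""

    m0: float
    n0: float

    def update(self, m: int, n: int) -> "BetaPrior":
        """Conjugate update: m successes and n failures give Beta(m + m0, n + n0)."""
        return BetaPrior(self.m0 + m, self.n0 + n)

    def mean(self) -> float:
        """Mean of the distribution, m0 / (m0 + n0)."""
        return self.m0 / (self.m0 + self.n0)


uniform = BetaPrior(1, 1)          # Beta(1, 1) is the uniform prior p(a) = 1
post_beta = uniform.update(7, 3)   # hypothetical data: 7 successes, 3 failures
print(post_beta, round(post_beta.mean(), 4))
```

Updating in two batches, for example (3, 1) then (4, 2), gives the same Beta(8, 4) as updating once with (7, 3), because the posterior from one batch serves as the prior for the next.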

What is "Bayesian" about Proposition 9 is that Bayes presented it as a probability for the binomial parameter itself (called a above; Bayes calls it p). That is, not only can one compute probabilities for experimental outcomes, but also for the parameter which governs them, and the same algebra is used to make inferences of either kind. Interestingly, Bayes actually states his question in a way that might make the idea of assigning a probability distribution to a parameter palatable to a frequentist. He supposes that a billiard ball is thrown at random onto a billiard table, and that the probabilities p and q are the probabilities that subsequent billiard balls will fall above or below the first ball. By making the binomial parameter p depend on a random event, he cleverly escapes a philosophical quagmire that he most likely was not even aware was an issue.

Computer applications

Bayesian inference has been used in artificial intelligence and expert systems. Bayesian inference techniques have been a fundamental part of computerised pattern recognition techniques since the late 1950s.

Bayesian inference has also seen growing use in spam filtering, for example in Bogofilter, SpamAssassin and Mozilla.

In some applications fuzzy logic is an alternative to Bayesian inference. Fuzzy logic and Bayesian inference, however, are mathematically and semantically not compatible: You cannot, in general, understand the degree of truth in fuzzy logic as probability and vice versa.


External links

On-line textbook: Information Theory, Inference, and Learning Algorithms, by David MacKay, has many chapters on Bayesian methods, including introductory examples; compelling arguments in favour of Bayesian methods; state-of-the-art Monte Carlo methods, message-passing methods, and variational methods; and examples illustrating the intimate connections between Bayesian inference and data compression.
Naive Bayesian learning paper Archived 2003-12-22 at the Wayback Machine
A Tutorial on Learning With Bayesian Networks Archived 2003-12-07 at the Wayback Machine