Bayes’ Theorem

The foundation of modern probability

Oct 3, 2023

Introduction

In the world of probability and statistics, there exists a fundamental principle that underpins a vast array of applications, from machine learning and data science to medical diagnosis and finance. This principle is known as Bayes’ Theorem, named after the 18th-century English statistician and philosopher, Thomas Bayes. In this article, we will explore the origins of Bayes’ Theorem, its significance, and how it serves as the bedrock upon which modern probability theory is built.

I. Life and Times of Thomas Bayes

To appreciate the significance of Bayes’ Theorem, it helps to look at the life and contributions of Thomas Bayes. Born in 1701 in London, Bayes was a nonconformist minister and mathematician with interests in both probability theory and theology. He published very little during his lifetime. His most famous work, “An Essay towards solving a Problem in the Doctrine of Chances,” appeared only posthumously in 1763, after his friend Richard Price edited it and presented it to the Royal Society. That essay laid the groundwork for what would become known as Bayes’ Theorem.

Bayes’ Theorem was not widely recognized in the years following his death in 1761. It was developed and generalized by later mathematicians, most notably Pierre-Simon Laplace, who worked on it from the 1770s into the early 19th century. Today, Thomas Bayes is celebrated as one of the pioneers of probability theory.

II. Essence of Bayes’ Theorem

At its core, this theorem provides a framework for updating probabilities based on new evidence. It addresses the question of how to revise our degrees of belief when confronted with additional information. The theorem expresses this idea mathematically as follows:

$$P(A \mid B) = \frac{P(B \mid A) \cdot P(A)}{P(B)},$$

where

P(A∣B) represents the probability of event A occurring given that event B has occurred.
P(B∣A) is the probability of event B occurring given that event A has occurred.
P(A) and P(B) are the probabilities of events A and B occurring, respectively.

In other words, Bayes’ Theorem tells us that the updated probability of event A happening after observing event B depends on the prior probability of A, the probability of B given A, and the overall probability of B.
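A minimal Python sketch of the formula is shown below. The function name `bayes` and the input values are hypothetical, chosen only to illustrate the arithmetic; the inputs must be consistent, meaning P(B∣A)·P(A) cannot exceed P(B).

```python
def bayes(p_b_given_a: float, p_a: float, p_b: float) -> float:
    """Hypothetical helper: return P(A|B) = P(B|A) * P(A) / P(B)."""
    if not 0 < p_b <= 1:
        # P(A|B) is undefined when B has zero probability.
        raise ValueError("P(B) must be in (0, 1]")
    return p_b_given_a * p_a / p_b

# Illustrative (made-up) inputs: P(B|A) = 0.8, P(A) = 0.3, P(B) = 0.4.
print(round(bayes(0.8, 0.3, 0.4), 4))  # 0.6
```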

To truly grasp the essence of Bayes’ Theorem, let’s move away from the abstract letters for a bit.

Imagine you have a diagnostic test for the flu. This test is 99% accurate in correctly identifying the flu when you have it, and 95% accurate in correctly identifying that you don’t have the flu when you don’t. If you receive a positive result on this flu test, you might think there’s a 95% chance you have the flu, right?

However, the test result alone is not enough to assess your chances of having the flu. Without knowing how common the flu is in the population being tested, you can’t tell what fraction of positive results are false positives.

Let’s say that during flu season, about 1 in 100 people in the population has the flu at any given time, and you test a large group of people. On average, you’d expect that 1% of the population has the flu. The test correctly identifies 99% of those who have it, which means it would correctly identify 0.99% of the people tested as having the flu. Among the 99% who don’t have the flu, 5% would be wrongly told they do, which amounts to 4.95% of the people tested.

So, if you take this flu test, which is said to be 95% accurate, and it comes back positive, what matters is the share of positive results that are genuine. In the tested group, 0.99% + 4.95% = 5.94% of people test positive, but only 0.99% actually have the flu. Your chance of having the flu is therefore 0.99 / 5.94 ≈ 16.7%, not 95%. Put differently, about 83% of the people who receive a positive result (4.95 out of every 5.94) are actually flu-free.

Therefore, a positive result raises your chance of having the flu from the 1% prior to only about 16.7%, given the test’s accuracy and the prevalence of the flu in the population you tested. That is a significant difference from the 95% you might have initially assumed.
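One way to check this result without any formula is to simulate the scenario. The sketch below draws a large hypothetical population with 1% prevalence, 99% sensitivity and 95% specificity, then counts how many positive results belong to people who actually have the flu. The function name `simulate_flu_tests`, the population size of 200,000 and the seed are arbitrary choices for illustration. With a population this large, the estimate should land close to 1/6 ≈ 0.167.

```python
import random

def simulate_flu_tests(n_people: int, prevalence: float, sensitivity: float,
                       specificity: float, seed: int = 0) -> float:
    """Hypothetical helper: estimate P(flu | positive) by simulating a tested population."""
    rng = random.Random(seed)  # seeded for reproducible runs
    true_pos = false_pos = 0
    for _ in range(n_people):
        has_flu = rng.random() < prevalence
        if has_flu:
            # Sensitivity: chance that a person with the flu tests positive.
            if rng.random() < sensitivity:
                true_pos += 1
        else:
            # 1 - specificity: chance that a healthy person tests positive.
            if rng.random() < 1 - specificity:
                false_pos += 1
    # Share of all positive results that are true positives.
    return true_pos / (true_pos + false_pos)

estimate = simulate_flu_tests(200_000, prevalence=0.01, sensitivity=0.99, specificity=0.95)
print(f"Share of positives who have the flu: {estimate:.3f}")  # close to 0.167
```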

In statistical terms:

Prior Probability: Before taking the test, we had a “prior probability” of having the flu, which is based on the prevalence of the flu in the population. In the example, this prior probability is about 1% because 1 in 100 people in the population has the flu.
Sensitivity & Specificity: The test’s sensitivity (ability to correctly identify those who have the flu) and specificity (ability to correctly identify those who don’t have the flu) are key parameters. In the example, the test has a sensitivity of 99% and a specificity of 95%.
Positive Test Result: When we receive a positive test result, we want to know the “posterior probability” of having the flu. Bayes’ theorem helps us calculate this probability by incorporating our prior probability, the test’s sensitivity, and the test’s specificity.

Bayes’ theorem allows us to calculate the probability of having the flu given a positive test result. It multiplies the prior probability by the sensitivity and then divides by the overall probability of getting a positive result, which is the sum of the true-positive share (sensitivity × prior) and the false-positive share ((1 − specificity) × (1 − prior)).
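Plugging in the numbers from the example, with P(+ ∣ no flu) = 1 − specificity = 0.05 and P(no flu) = 0.99, the calculation works out as follows:

$$
P(\text{flu} \mid +)
= \frac{P(+ \mid \text{flu})\,P(\text{flu})}{P(+ \mid \text{flu})\,P(\text{flu}) + P(+ \mid \text{no flu})\,P(\text{no flu})}
= \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.05 \times 0.99}
= \frac{0.0099}{0.0594}
\approx 0.167
$$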
In our example, the posterior probability of having the flu after a positive test result (about 16.7%) is far lower than the 95% one might naively assume, even though it is much higher than the 1% prior. The gap comes from the low prevalence of the flu in the population combined with the test’s 5% false-positive rate.

In summary, the flu test example demonstrates how Bayes’ theorem helps us update our beliefs about having a disease based on new information (the test result), while considering both the test’s accuracy and the prevalence of the disease. It emphasizes that the accuracy of a test alone is not sufficient to determine the likelihood of having a disease; the context and prior probabilities play a crucial role in interpreting medical test results accurately.
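To see how strongly the prior drives the result, the sketch below keeps the same test (99% sensitivity, 95% specificity) and varies only the prevalence. The helper name `posterior_positive` and the list of prevalences are illustrative choices, not part of the original example. The posterior rises from roughly 2% at 0.1% prevalence to about 95% at 50% prevalence.

```python
def posterior_positive(prior: float, sensitivity: float, specificity: float) -> float:
    """Hypothetical helper: P(disease | positive test) via Bayes' Theorem."""
    true_pos = sensitivity * prior                 # P(+ | disease) * P(disease)
    false_pos = (1 - specificity) * (1 - prior)    # P(+ | healthy) * P(healthy)
    # Denominator is P(+), by the law of total probability.
    return true_pos / (true_pos + false_pos)

# Same test at different (illustrative) prevalences.
for prevalence in (0.001, 0.01, 0.10, 0.50):
    p = posterior_positive(prevalence, sensitivity=0.99, specificity=0.95)
    print(f"prevalence {prevalence:>6.1%} -> P(flu | positive) = {p:.1%}")
```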

Here’s another example (using the formula):

Imagine you’re trying to predict whether it will rain tomorrow (A) based on historical weather data and current meteorological conditions. Here’s how Bayes’ Theorem is applied:

Prior Belief (P(A)): This represents your initial belief about the likelihood of rain tomorrow based on historical averages. It’s essentially your starting point before considering any new information.
Prior Probability of Favorable Conditions (P(B)): P(B) represents your prior probability that the weather conditions are favorable for rain. This is also based on historical data and your knowledge of what conditions typically lead to rain.
Probability of Rain Given Favorable Conditions (P(A∣B)): P(A∣B) is the probability of rain (A) given that the meteorological conditions are conducive to precipitation (B). This is the posterior, the quantity Bayes’ Theorem computes from the other three.
Likelihood of Favorable Conditions Given Rain (P(B∣A)): P(B∣A) is the likelihood that the weather conditions favor rain (B) if it indeed rains tomorrow. It’s the probability that the conditions you expect for rain actually occur when it rains.

Now, let’s introduce a new piece of information, the weather forecast (event C):

Suppose the forecast calls for rain (event C) on 60% of the days when it actually rains, so P(C∣A) = 0.6, and on 10% of the days when it does not rain, so P(C∣¬A) = 0.1, where ¬A denotes “no rain tomorrow.”

With this forecast, you can use Bayes’ Theorem to update both the probability of rain tomorrow (P(A∣C)) and the probability of favorable conditions (P(B∣C)):

Updated Probability of Rain Tomorrow (P(A∣C)): This is your updated probability of rain tomorrow after considering the weather forecast. Bayes’ Theorem combines your prior knowledge P(A) with the forecast’s reliability P(C∣A) and divides by the overall probability of a rain forecast: P(A∣C) = P(C∣A)·P(A) / P(C), where P(C) = P(C∣A)·P(A) + P(C∣¬A)·(1 − P(A)) and ¬A means no rain.
Updated Probability of Favorable Conditions (P(B∣C)): Similarly, P(B∣C) = P(C∣B)·P(B) / P(C) is your updated probability that the weather conditions are favorable for rain after considering the forecast. It combines your prior belief P(B) with P(C∣B), how often the forecast calls for rain when conditions are favorable.
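As a rough sketch, the update for P(A∣C) can be computed directly. It reads the forecast’s 60% and 10% figures as P(C∣A) and P(C∣¬A), that is, how often the forecast calls for rain on days that turn out rainy versus dry, and it assumes a hypothetical historical prior of P(A) = 0.30. Both are illustrative choices, not figures from a real forecast. With these numbers, P(C) = 0.6 × 0.3 + 0.1 × 0.7 = 0.25, so the forecast lifts the probability of rain from 30% to 0.18 / 0.25 = 72%.

```python
def update_on_forecast(p_rain: float, p_forecast_given_rain: float,
                       p_forecast_given_dry: float) -> float:
    """Hypothetical helper: return P(A|C), the chance of rain given a rain forecast."""
    # Law of total probability: P(C) = P(C|A) P(A) + P(C|not A) P(not A).
    p_forecast = p_forecast_given_rain * p_rain + p_forecast_given_dry * (1 - p_rain)
    # Bayes' Theorem: P(A|C) = P(C|A) P(A) / P(C).
    return p_forecast_given_rain * p_rain / p_forecast

# Hypothetical prior: it rains on 30% of comparable days historically.
posterior = update_on_forecast(0.30, p_forecast_given_rain=0.60, p_forecast_given_dry=0.10)
print(f"P(rain | forecast says rain) = {posterior:.2f}")
```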

III. Bayesian Approach to Probability

In the Bayesian view, probabilities are subjective degrees of belief rather than objective frequencies. This perspective allows for a more flexible and intuitive way of dealing with uncertainty. It accommodates the incorporation of prior information and continually updates beliefs as new data becomes available. Bayesian probability theory has found extensive applications in fields such as machine learning, Bayesian statistics, Bayesian networks, and Bayesian decision theory.
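To make “continually updates beliefs” concrete, here is a minimal sketch of sequential updating, in which each posterior becomes the prior for the next piece of evidence. It reuses the flu-test numbers and assumes, hypothetically, that repeated test results are conditionally independent given whether you have the flu, which real repeated tests may not be. Under that assumption, three positive results in a row move the probability from 1% to about 16.7%, 79.8% and 98.7%.

```python
def update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Hypothetical helper: one Bayesian update, P(H|E) from P(H), P(E|H), P(E|not H)."""
    numerator = p_e_given_h * prior
    return numerator / (numerator + p_e_given_not_h * (1 - prior))

belief = 0.01  # prior: flu prevalence from the earlier example
for test_number in range(1, 4):
    # Each positive result: P(+|flu) = 0.99, P(+|no flu) = 1 - 0.95 = 0.05.
    # Assumes results are conditionally independent given flu status.
    belief = update(belief, 0.99, 0.05)
    print(f"after positive test {test_number}: P(flu) = {belief:.3f}")
```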

IV. Applications of Bayes’ Theorem

Medical Diagnosis: Bayes’ Theorem plays a crucial role in medical diagnosis, where doctors use prior knowledge about a patient’s medical history and symptoms to update the probability of a particular disease.
Spam Filtering: Email services employ Bayesian classifiers to filter out spam by analyzing the likelihood of certain words or patterns appearing in spam emails.
Machine Learning: In machine learning, the Bayesian framework is used for probabilistic modeling, Bayesian networks, and probabilistic graphical models to make predictions and perform statistical inference.
Finance: Bayesian methods are applied in financial modeling, where investors update their beliefs about the future state of the market based on new information.
Natural Language Processing: Bayesian models are used in natural language processing tasks like speech recognition and language understanding, where they help decipher the meaning of ambiguous words or phrases.
A/B Testing: Businesses use Bayesian statistics to analyze the results of A/B tests, enabling them to make data-driven decisions about website design, product features, and marketing strategies.
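As an illustration of the spam-filtering idea, the sketch below is a toy multinomial naive Bayes classifier. It scores a message using the prior share of spam and ham plus per-word likelihoods, assuming words are conditionally independent given the label (the “naive” assumption), with add-one (Laplace) smoothing so unseen words do not zero out a score. The four training messages and the helper names `train` and `p_spam` are made up for this example; production filters rely on far larger corpora and richer features.

```python
import math
from collections import Counter

def train(docs):
    """Hypothetical helper: count words per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in docs:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab

def p_spam(text, model):
    """Hypothetical helper: P(spam | words) under the naive independence assumption."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in ("spam", "ham"):
        # log P(label) + sum of log P(word | label), with add-one smoothing.
        score = math.log(doc_counts[label] / total_docs)
        n_words = sum(word_counts[label].values())
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (n_words + len(vocab)))
        log_scores[label] = score
    # Bayes' Theorem: normalize the joint scores so the two posteriors sum to 1
    # (subtracting the max first avoids underflow in exp).
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    return weights["spam"] / sum(weights.values())

# Made-up training data for illustration only.
model = train([
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
])
print(f"{p_spam('free prize now', model):.3f}")       # high: spam-like words
print(f"{p_spam('monday team meeting', model):.3f}")  # low: ham-like words
```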

V. Unifying Force of Bayes’ Theorem

One of the most remarkable aspects of this theorem is its unifying power. It provides a common framework for understanding and solving diverse problems across various domains. Whether you are estimating the likelihood of a medical diagnosis, predicting stock market trends, or classifying emails as spam, Bayes’ Theorem offers a consistent and rational way to update beliefs in the face of uncertainty.

Concluding remarks

Bayes’ Theorem stands as a testament to the remarkable synergy of mathematics and human ingenuity. Its elegant formulation allows us to harmonize prior knowledge with new evidence, creating a rational and adaptable framework for decision-making and inference. In an ever-uncertain world, Bayes’ Theorem remains a guiding light, illuminating the path through the intricate landscape of probability and statistics while forging connections across disciplines, solving mysteries in medicine, unraveling financial complexities, and even aiding in the quest for cosmic communication. Its enduring influence reminds us that, in the realm of human knowledge, the power of elegant mathematics knows no bounds.