Bayes’ Theorem for Computer Scientists

Use: Suppose we want to know whether a woman has cancer, and we know that she is 65. If cancer risk is related to age, Bayes’ theorem lets us use her age to assess the probability that she has cancer more accurately.

Bayes’ theorem is the foundation of Bayesian inference. It relates the probability of a known event given an unknown event to the probability of the unknown event given the known one. Using Bayes’ rule, we can update our knowledge about how these two events are related as new evidence arrives. These ideas belong to a broader school of thought called Bayesian statistics, which helps us build advanced statistical models using techniques like Markov chain Monte Carlo methods and the No-U-Turn sampler.

Assume you have a room full of men and women. 70% of the people are women and 30% are men. Additionally, we know from polling every person that 40% of the women’s favorite color is green and 75% of the men’s favorite color is green.

What proportion of people like green? If the population is 100, 70% of them are women (exactly 70 women), and 40% of those women like green, so 0.7 × 0.4 × 100 = 28 women like it. 30% are men and 75% of those men like green, so 0.3 × 0.75 × 100 = 22.5 men like it. Add those together and we get 50.5 people, or 50.5% of the room.

If we choose a person at random from the people that like green, what’s the chance that it is a woman?

We know that the number of people that like green is 50.5, and 28 of them are women. So when picking from people that like green, there is a 28/50.5 chance that it will be a woman (about 55%).
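
The room example can be checked with a few lines of Python. This is only a minimal sketch: the variable names are made up for illustration, and the numbers are the ones given above.

```python
# Shares of the room, taken from the example above.
p_woman = 0.70
p_man = 0.30

# Chance of liking green within each group.
p_green_given_woman = 0.40
p_green_given_man = 0.75

# Overall chance of liking green: add up the contribution of each group.
p_green = p_woman * p_green_given_woman + p_man * p_green_given_man

# Bayes' theorem: chance that someone who likes green is a woman.
p_woman_given_green = p_woman * p_green_given_woman / p_green

print(f"P(green) = {p_green:.3f}")
print(f"P(woman | green) = {p_woman_given_green:.3f}")
```

The second value is the same 28/50.5 ratio, just computed from probabilities instead of head counts.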

Here’s the condensed version for Bayesian newcomers like myself:

  • Tests are not the event. We have a cancer test, separate from the event of actually having cancer. We have a test for spam, separate from the event of actually having a spam message.

  • Tests are flawed. Tests detect things that don’t exist (false positive), and miss things that do exist (false negative).

  • Tests give us test probabilities, not the real probabilities. People often consider the test results directly, without considering the errors in the tests.

  • False positives skew results. Suppose you are searching for something really rare (1 in a million). Even with a good test, it’s likely that a positive result is really a false positive on somebody in the 999,999.

  • People prefer natural numbers. Saying “100 in 10,000” rather than “1%” helps people work through the numbers with fewer errors, especially with multiple percentages (“Of those 100, 80 will test positive” rather than “80% of the 1% will test positive”).

  • Even science is a test. At a philosophical level, scientific experiments can be considered “potentially flawed tests” and need to be treated accordingly. There is a test for a chemical, or a phenomenon, and there is the event of the phenomenon itself. Our tests and measuring equipment have some inherent rate of error.
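
To see how false positives swamp a rare event, here is a rough sketch that counts people instead of multiplying percentages. The 1-in-a-million rarity comes from the list above; the test quality (99% hit rate, 1% false positive rate) and the function name `positive_breakdown` are assumptions made up for this illustration.

```python
def positive_breakdown(population, prevalence, hit_rate, false_positive_rate):
    """Return expected (true_positives, false_positives) as counts of people."""
    with_condition = population * prevalence
    without_condition = population - with_condition
    true_positives = with_condition * hit_rate
    false_positives = without_condition * false_positive_rate
    return true_positives, false_positives


# Assumed test quality (not from the text): catches 99% of real cases,
# and wrongly flags 1% of the people who do not have the condition.
tp, fp = positive_breakdown(
    population=1_000_000,
    prevalence=1 / 1_000_000,
    hit_rate=0.99,
    false_positive_rate=0.01,
)

# Fraction of all positive results that point at a real case.
share_real = tp / (tp + fp)

print(f"Expected true positives:  {tp:.2f}")
print(f"Expected false positives: {fp:.2f}")
print(f"Share of positives that are real: {share_real:.6f}")
```

Under these assumed rates, roughly one real case is buried among about ten thousand false alarms, so a positive result is almost certainly a false positive.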

Bayes’ theorem converts the results from your test into the real probability of the event. For example, you can:

  • Correct for measurement errors. If you know the real probabilities and the chance of a false positive and false negative, you can correct for measurement errors.

  • Relate the actual probability to the measured test probability. Bayes’ theorem lets you relate Pr(A|X), the chance that an event A happened given the indicator X, and Pr(X|A), the chance the indicator X happened given that event A occurred. Given mammogram test results and known error rates, you can predict the actual chance of having cancer.
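
Written out with the same symbols, the relationship is the standard statement of Bayes’ theorem. The second form expands Pr(X) and assumes there are only two cases, A and not-A:

```latex
\Pr(A \mid X)
  = \frac{\Pr(X \mid A)\,\Pr(A)}{\Pr(X)}
  = \frac{\Pr(X \mid A)\,\Pr(A)}
         {\Pr(X \mid A)\,\Pr(A) + \Pr(X \mid \neg A)\,\Pr(\neg A)}
```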

     

    Consider a cancer testing scenario:

    • 1% of women have breast cancer (and therefore 99% do not).
    • 80% of mammograms detect breast cancer when it is there (and therefore 20% miss it).
    • 9.6% of mammograms detect breast cancer when it’s not there (and therefore 90.4% correctly return a negative result).

    Put in a table, the probabilities look like this:

    [Image: bayes_table.png]

    How Accurate Is The Test?

    Now suppose you get a positive test result. What are the chances you have cancer? 
     
    • Ok, we got a positive result. It means we’re somewhere in the top row of our table. Let’s not assume anything — it could be a true positive or a false positive.
    • The chances of a true positive = chance you have cancer * chance test caught it = 1% * 80% = 0.008
    • The chances of a false positive = chance you don’t have cancer * chance test caught it anyway = 99% * 9.6% = 0.09504

    The table looks like this:

    [Image: bayes_table_computed.png]

The chance of getting a true positive is 0.008. The chance of getting any type of positive result is the chance of a true positive plus the chance of a false positive (0.008 + 0.09504 = 0.10304).

So, given a positive test, our chance of cancer is 0.008/0.10304 = 0.0776, or about 7.8%.
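
The same calculation as a small Python sketch. The function name `posterior` and its parameter names are made up for illustration; the three rates are the ones from the mammogram scenario.

```python
def posterior(prior, hit_rate, false_positive_rate):
    """Chance the event is real given a positive test, via Bayes' theorem."""
    true_positive = prior * hit_rate
    false_positive = (1 - prior) * false_positive_rate
    return true_positive / (true_positive + false_positive)


# 1% of women have breast cancer, 80% of cancers are detected,
# and 9.6% of healthy women get a positive result anyway.
p_cancer_given_positive = posterior(
    prior=0.01, hit_rate=0.80, false_positive_rate=0.096
)

print(f"{p_cancer_given_positive:.4f}")  # prints 0.0776
```

In natural numbers: of 10,000 women, 100 have cancer and 80 of them test positive. Of the 9,900 without cancer, about 950 test positive anyway, so only 80 of roughly 1,030 positive results are real.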
