- The mathematics of Bayes's theorem are stunningly simple. In its most basic form, it is just an equation with three known variables and one unknown one.
- This simple formula can lead to surprising predictive insights.
Bayes and Laplace
- The intimate connection between probability, prediction, and scientific progress was thus well understood by Bayes and Laplace in the eighteenth century, the period when human societies were beginning to take the explosion of information that had become available with the invention of the printing press several centuries earlier, and finally translate it into sustained scientific, technological, and economic progress.
- Bayes’s theorem is concerned with conditional probability. That is, it tells us the probability that a hypothesis is true if some event has happened.
Probability that your partner is cheating on you, given an event
- Event: you come home from a business trip to discover a strange pair of underwear
* The Signal and the Noise: Why So Many Predictions Fail--but Some Don't, Nate Silver, 2012
The probability of underwear u given cheating c
- Probability of underwear appearing, conditional on his cheating
The probability of the underwear u appearing if there is NO cheating
- Probability of the underwear’s appearing conditional on the hypothesis being false
The probability of cheating c
- What is the probability you would have assigned to him cheating on you before you found the underwear?
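The three known quantities above combine into the one unknown, the probability of cheating given the underwear. Written in the notation of these notes (the symbol \neg c for "no cheating" is an assumption added here; the notes only name u and c), Bayes's theorem reads:

```latex
P(c \mid u) = \frac{P(u \mid c)\,P(c)}{P(u \mid c)\,P(c) + P(u \mid \neg c)\,\bigl(1 - P(c)\bigr)}
```

The numerator is the probability of seeing the underwear and the hypothesis being true; the denominator is the total probability of seeing the underwear under either hypothesis.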
Active Learning – Calculate Cheating Probability
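One way to work the exercise is sketched below. The function name `posterior` and the three input values are assumptions chosen for illustration; these notes do not state the numbers, so substitute your own estimates.

```python
def posterior(prior, p_event_given_h, p_event_given_not_h):
    """Return P(hypothesis | event) using Bayes's theorem."""
    # Probability of the event AND the hypothesis being true.
    numerator = p_event_given_h * prior
    # Total probability of the event under both hypotheses.
    evidence = numerator + p_event_given_not_h * (1.0 - prior)
    return numerator / evidence


# Illustrative inputs (assumptions, not values given in these notes).
p_c = 0.04              # prior: probability of cheating c before the event
p_u_given_c = 0.50      # probability of underwear u given cheating
p_u_given_not_c = 0.05  # probability of underwear u given NO cheating

p_c_given_u = posterior(p_c, p_u_given_c, p_u_given_not_c)
print(f"P(cheating | underwear) = {p_c_given_u:.3f}")  # prints 0.294
```

With these inputs the posterior is about 29%: far higher than the 4% prior, but still well short of certainty, because the prior was low to begin with.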
Example: Classification of Drew
- We have two classes: c1=male, and c2=female
- Classifying Drew as male or female is equivalent to asking whether it is more probable that Drew is male or female.
- The posterior probability combines the prior probability of each class with the evidence from a new event (here, observing the name Drew).
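The comparison of the two classes can be sketched in a few lines. The `records` list below is hypothetical toy data invented for this example, and `class_posteriors` is an illustrative helper, not something defined in these notes.

```python
from collections import Counter

# Hypothetical toy data (an assumption for this sketch): (name, class) pairs.
records = [
    ("Drew", "male"), ("Claudia", "female"), ("Drew", "female"),
    ("Drew", "female"), ("Alberto", "male"), ("Karin", "female"),
    ("Nina", "female"), ("Sergio", "male"),
]


def class_posteriors(name, records):
    """Return P(class | name) for every class seen in records."""
    class_counts = Counter(label for _, label in records)
    total = len(records)
    scores = {}
    for label, n_label in class_counts.items():
        prior = n_label / total  # P(class)
        n_match = sum(1 for n, lab in records if n == name and lab == label)
        likelihood = n_match / n_label  # P(name | class)
        scores[label] = likelihood * prior
    evidence = sum(scores.values())  # P(name)
    if evidence == 0:
        raise ValueError(f"name {name!r} does not occur in the data")
    return {label: score / evidence for label, score in scores.items()}


posteriors = class_posteriors("Drew", records)
prediction = max(posteriors, key=posteriors.get)
print(posteriors, prediction)
```

On this toy data the posterior is 1/3 for male and 2/3 for female, so Drew is classified as female. The denominator P(name) is the same for both classes, so comparing the unnormalized scores would give the same decision.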
Classification of Documents
Questions We Can Answer
- Is this spam?
- Who wrote which Federalist papers?
- Positive or negative movie review?
- What is the subject of this article?
- Assigning subject categories, topics, or genres
- Authorship identification
- Age/gender identification
- Language identification
- Sentiment analysis
For Active Learning we will use*
- probability that a word shows up in a language
- probability that a word is not in the language
- Multiplying lots of probabilities can result in floating-point underflow. Since log(xy) = log(x) + log(y), it is better to sum the logs of the probabilities than to multiply the probabilities themselves.
- Add probability of words (per language) using:
- At completion of each language:
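The log-sum idea above can be illustrated with a minimal sketch. This is not the exercise's own code: the two toy corpora, the function names, the equal prior for each language, and the add-one (Laplace) smoothing for unseen words are all assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical toy corpora (assumptions for this sketch).
corpora = {
    "english": "the cat sat on the mat and the dog ate the food",
    "spanish": "el gato se sentó en la alfombra y el perro comió la comida",
}


def train(corpora):
    """Count word occurrences per language and build a shared vocabulary."""
    counts = {lang: Counter(text.split()) for lang, text in corpora.items()}
    vocab = set().union(*counts.values())
    return counts, vocab


def log_score(words, lang, counts, vocab):
    """Sum of log word probabilities for one language (equal priors assumed)."""
    word_counts = counts[lang]
    total = sum(word_counts.values())
    score = 0.0
    for w in words:
        # Add-one smoothing (an assumption) so unseen words never give log(0).
        p = (word_counts[w] + 1) / (total + len(vocab))
        score += math.log(p)
    return score


def identify(text, counts, vocab):
    """Return the highest-scoring language and all per-language scores."""
    words = text.lower().split()
    scores = {lang: log_score(words, lang, counts, vocab) for lang in counts}
    return max(scores, key=scores.get), scores


counts, vocab = train(corpora)
best, scores = identify("the dog sat on the mat", counts, vocab)
print(best, scores)

# Why logs: the plain product of many small probabilities underflows to 0.0,
# while the sum of their logs stays a finite, comparable number.
tiny = [1e-5] * 100
product = 1.0
for p in tiny:
    product *= p
log_sum = sum(math.log(p) for p in tiny)
print(product, log_sum)
```

Because log is monotonic, the language with the largest sum of logs is also the one with the largest product of probabilities, so the ranking is unchanged while the underflow is avoided.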