Machine Learning

Naive Bayesian Classification

Abel Sanchez and John R. Williams

Introduction

Video Source: https://www.youtube.com/watch?v=R13BD8qKeTg

Stunningly Simple

  • The mathematics of Bayes’ theorem is stunningly simple. In its most basic form, it is just an equation with three known variables and one unknown.
  • This simple formula can lead to surprising predictive insights.

Bayes and Laplace

  • The intimate connection between probability, prediction, and scientific progress was thus well understood by Bayes and Laplace in the eighteenth century: the period when human societies were beginning to take the explosion of information that had become available with the invention of the printing press several centuries earlier, and finally translate it into sustained scientific, technological, and economic progress.

Conditional Probability

  • Bayes’s theorem is concerned with conditional probability. That is, it tells us the probability that a hypothesis is true given that some event has happened.
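Written in the lower-case p(·) notation used in the slides that follow, for a hypothesis h and an observed event e:

  p(h | e) = p(e | h) · p(h) / p(e)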

Bayes Theorem

Probability that your partner is cheating on you, given an event

  • Event: you come home from a business trip to discover a strange pair of underwear

Underwear Example*

* The Signal and the Noise: Why So Many Predictions Fail--but Some Don't, Nate Silver, 2012

p(u | c)

The probability of underwear u given cheating c

  • Probability of underwear appearing, conditional on his cheating
  • 50%

p(u | ¬c)

The probability of the underwear u appearing given NO cheating (¬c)

  • Probability of the underwear’s appearing, conditional on the hypothesis being false
  • 5%

p(c)

The probability of cheating c

  • What is the probability you would have assigned to him cheating on you before you found the underwear?
  • 4%

Active Learning

Active Learning – Calculate Cheating Probability
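Putting the three numbers from the previous slides through Bayes’ theorem gives the answer; a minimal JavaScript sketch (the function and variable names are mine):

```javascript
// Bayes' theorem with the values from the underwear example:
// prior p(c) = 0.04, p(u | c) = 0.5, p(u | no cheating) = 0.05.
function posterior(prior, likelihoodIfTrue, likelihoodIfFalse) {
  // Total probability of seeing the underwear at all.
  const evidence = likelihoodIfTrue * prior + likelihoodIfFalse * (1 - prior);
  // p(c | u) = p(u | c) * p(c) / p(u)
  return (likelihoodIfTrue * prior) / evidence;
}

posterior(0.04, 0.5, 0.05); // ≈ 0.29
```

Even with a 50% likelihood of the evidence under cheating, the small 4% prior keeps the posterior under 30%.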

Classification of Drew

Example: Classification of Drew

  • We have two classes: c1 = male and c2 = female
  • Classifying Drew as male or female is equivalent to asking: is it more probable that Drew is male or female?
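A minimal sketch of the comparison, using hypothetical counts (an assumption for illustration, not the slides’ data):

```javascript
// Hypothetical sample: how many people are in each class,
// and how many of them are named Drew.
const data = {
  male:   { total: 3, drew: 1 },
  female: { total: 5, drew: 2 },
};
const totalPeople = data.male.total + data.female.total;

// Bayes: p(class | drew) is proportional to p(drew | class) * p(class),
// so comparing these scores is enough to pick the more probable class.
function score(cls) {
  const pDrewGivenClass = data[cls].drew / data[cls].total;
  const pClass = data[cls].total / totalPeople;
  return pDrewGivenClass * pClass;
}

const prediction = score('male') > score('female') ? 'male' : 'female';
console.log(prediction);
```

The denominator p(drew) is the same for both classes, so it can be dropped when all we need is the larger score.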

Using Data

Bayesian Approach

  • The posterior probability combines the prior probability with the evidence from a new event

Classification of Documents

Questions We Can Answer

  • Is this spam?
  • Who wrote which Federalist papers?
  • Positive or negative movie review?
  • What is the subject of this article?

Text Classification

  • Assigning subject categories, topics, or genres
  • Authorship identification
  • Age/gender identification
  • Language identification
  • Sentiment analysis
  • ...

For Active Learning we will use*

* http://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering

Calculating Probabilities

  • The probability that a word shows up in a language
  • The probability that a word does not show up in the language
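A minimal sketch of these two estimates (an assumption, not the lecture’s code), counting word occurrences in a training word list:

```javascript
// Estimate the probability that a word shows up in a language,
// given a list of words drawn from training text in that language.
function wordProbability(word, corpusWords) {
  const occurrences = corpusWords.filter((w) => w === word).length;
  return occurrences / corpusWords.length;
}

const english = ['the', 'cat', 'sat', 'on', 'the', 'mat'];
const pThe = wordProbability('the', english); // 2/6 ≈ 0.33
const pNotThe = 1 - pThe;                     // complement: word does not show up
```

In practice a smoothing scheme is usually added so that unseen words do not get probability zero, which would wipe out the whole product.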

Underflow Prevention

  • Multiplying many probabilities together can result in floating-point underflow. Since log(xy) = log(x) + log(y), it is better to sum the logs of the probabilities than to multiply the probabilities themselves.
  • Add the log-probabilities of the words (per language) using the formula from the spam-filtering article above:

      η = Σ [ln(1 − pᵢ) − ln(pᵢ)]

  • In JavaScript, ln is Math.log and eˣ is Math.exp(x)
  • At the completion of each language, convert the sum back to a probability:

      p = 1 / (1 + e^η)
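The log-sum combination described in the spam-filtering article linked above can be sketched in JavaScript as follows:

```javascript
// Combine per-word probabilities without multiplying them directly
// (which risks floating-point underflow): sum eta = ln(1 - p_i) - ln(p_i)
// over the words, then recover the overall probability p = 1 / (1 + e^eta).
function combineProbabilities(wordProbs) {
  let eta = 0;
  for (const p of wordProbs) {
    eta += Math.log(1 - p) - Math.log(p); // ln is Math.log
  }
  return 1 / (1 + Math.exp(eta));         // e^eta is Math.exp(eta)
}

combineProbabilities([0.9, 0.8, 0.2]);    // ≈ 0.9
```

Because only sums of logs are computed, the intermediate values stay in a safe floating-point range no matter how many words are combined.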

THE END