## Bayes' Theorem

## Example: Classification of Drew

- We have two classes: c1 = male and c2 = female.
- Classifying Drew as male or female is equivalent to asking: is it more probable that Drew is male or that Drew is female?
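The comparison can be sketched in a few lines. The counts below are hypothetical, invented purely for illustration, and the names `counts`, `namedDrew`, and `classify` are assumptions rather than anything from the original material.

```javascript
// Hypothetical training counts, invented for illustration only.
const counts = {
  male: { total: 3, namedDrew: 1 },
  female: { total: 5, namedDrew: 2 },
};

// Score each class with P(name | class) * P(class). The shared denominator
// P(name) is skipped because only the ranking of the classes matters.
function classify(classCounts) {
  const population = Object.values(classCounts).reduce((n, c) => n + c.total, 0);
  let best = null;
  let bestScore = -Infinity;
  for (const [label, c] of Object.entries(classCounts)) {
    const prior = c.total / population; // P(class)
    const likelihood = c.namedDrew / c.total; // P(name | class)
    const score = likelihood * prior;
    if (score > bestScore) {
      best = label;
      bestScore = score;
    }
  }
  return best;
}

console.log(classify(counts));
```

With these made-up counts the two scores are 1/8 for male and 2/8 for female, so the more probable class is female.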

## Using Data

## Bayesian Approach

- The posterior probability is the prior probability updated by a newly observed event.
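In symbols, for a class $c$ and an observed event $d$ (notation assumed here for illustration), Bayes' theorem reads:

```latex
P(c \mid d) = \frac{P(d \mid c)\, P(c)}{P(d)}
```

Here $P(c)$ is the prior, $P(d \mid c)$ is the likelihood of the event under class $c$, and $P(c \mid d)$ is the posterior. Because $P(d)$ is the same for every class, comparing classes only requires the numerator.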

## Classification of Documents

## Questions We Can Answer

- Is this spam?
- Who wrote which Federalist papers?
- Positive or negative movie review?
- What is the subject of this article?

## Text Classification

- Assigning subject categories, topics, or genres
- Authorship identification
- Age/gender identification
- Language Identification
- Sentiment analysis
- ...

## For Active Learning we will use*

* http://en.wikipedia.org/wiki/Naive_Bayes_spam_filtering

## Calculating Probabilities

- Probability that a word shows up in a language
- Probability that the word does not show up in that language
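A minimal sketch of how these two quantities might be estimated from word counts. The toy corpus, the function names, and the add-one smoothing are all assumptions made for illustration. The second quantity is read here as the probability of the word in the other languages, which is one possible interpretation.

```javascript
// Hypothetical toy corpus; the words and languages are made up for illustration.
const samples = {
  english: ["the", "dog", "and", "the", "cat"],
  spanish: ["el", "perro", "y", "el", "gato"],
  french: ["le", "chien", "et", "le", "chat"],
};

// Relative frequency of `word` in a list of tokens, with add-one smoothing
// (an assumption here) so that an unseen word never gets probability 0.
function frequency(word, tokens) {
  const hits = tokens.filter((t) => t === word).length;
  return (hits + 1) / (tokens.length + 2);
}

// Returns both quantities for one word and one language.
function wordProbabilities(word, language) {
  const inLanguage = samples[language];
  const elsewhere = Object.keys(samples)
    .filter((name) => name !== language)
    .flatMap((name) => samples[name]);
  return {
    inLanguage: frequency(word, inLanguage), // P(word | language)
    notInLanguage: frequency(word, elsewhere), // P(word | other languages)
  };
}

const p = wordProbabilities("the", "english");
console.log(p.inLanguage, p.notInLanguage);
```

For the word "the", the English estimate is (2 + 1) / (5 + 2) = 3/7, while the estimate over the other two languages is (0 + 1) / (10 + 2) = 1/12.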

## Underflow Prevention

- Multiplying many probabilities can result in floating-point underflow. Since log(xy) = log(x) + log(y), it is better to sum the logs of the probabilities than to multiply the probabilities themselves.
- Add probability of words (per language) using:

- In JavaScript, ln is `Math.log` and e^x is `Math.exp(x)`
- At completion of each language:
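The sketch below illustrates the underflow problem and one common log-domain way to combine per-word probabilities. It assumes the combining formula from the Naive Bayes spam filtering article linked earlier, η = Σ (ln(1 − pᵢ) − ln pᵢ) followed by p = 1 / (1 + e^η). That choice is an assumption, not necessarily the exact formula intended here. The name `combine` and the sample probabilities are hypothetical.

```javascript
// Multiplying many small probabilities underflows to exactly 0.
const many = new Array(2000).fill(0.1);
const product = many.reduce((acc, p) => acc * p, 1);

// Summing the logs of the same probabilities stays finite.
const logSum = many.reduce((acc, p) => acc + Math.log(p), 0);

console.log(product); // 0
console.log(Number.isFinite(logSum)); // true

// Combine the per-word probabilities for one language in the log domain:
//   eta = sum(ln(1 - p_i) - ln(p_i)),  p = 1 / (1 + e^eta)
// Inputs must lie strictly between 0 and 1, otherwise the logs are infinite.
function combine(probabilities) {
  let eta = 0;
  for (const p of probabilities) {
    eta += Math.log(1 - p) - Math.log(p);
  }
  return 1 / (1 + Math.exp(eta));
}

console.log(combine([0.9, 0.8]));
```

Under this assumption, `combine` would run once per language after all of its words have been accumulated, and the language with the highest combined value would be chosen. For the inputs 0.9 and 0.8 the result equals 36/37, the same value obtained from the direct product form 0.72 / (0.72 + 0.02).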