Lecture 3: Probability and Naive Bayes Classification

Today’s Topic

Using Bayes’ theorem for machine learning. You should revise the use of Bayes’ theorem in general before this lecture. In this lecture you will look at how to use Bayes’ theorem to build a spam detector. One important idea to take away from this lecture is that there are a variety of ways of implementing spam detection: in particular, there are different feature models you can use, each giving a different way of calculating the relevant probabilities. It is important that you understand the differences between these ways of implementing spam detection.
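As a warm-up, the single-word spam detector mentioned in the reading guide can be sketched as a direct application of Bayes’ theorem: P(spam | word) = P(word | spam) P(spam) / P(word). The probabilities below are made-up values for illustration, not data from the lecture.

```python
def p_spam_given_word(p_word_given_spam, p_word_given_ham, p_spam):
    """Posterior probability that a message is spam, given that it
    contains a particular word (Bayes' theorem with two classes)."""
    p_ham = 1.0 - p_spam
    # Total probability of seeing the word across both classes.
    p_word = p_word_given_spam * p_spam + p_word_given_ham * p_ham
    return p_word_given_spam * p_spam / p_word

# Hypothetical estimates: P("lottery" | spam) = 0.1,
# P("lottery" | ham) = 0.001, prior P(spam) = 0.4.
posterior = p_spam_given_word(0.1, 0.001, 0.4)
```

With these numbers the posterior is roughly 0.985, i.e. a message containing "lottery" is almost certainly spam under these (assumed) estimates.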

Reading Guide

What should I know by the end of this lecture?

  • How do I use Bayes’ theorem?
  • How do I use Bayes’ theorem to build a simple spam detector that only uses one word?
  • What is the independence assumption in the naive Bayes algorithm?
  • What are the various models for estimating the probabilities used in spam detection?
  • What is Laplacian smoothing and how does it work in the context of spam detection?
  • When do you need to use logarithms to calculate the relevant probabilities, and how do you use them?
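Several of the questions above (the independence assumption, Laplacian smoothing, and the use of logarithms) come together in a full naive Bayes classifier. The sketch below uses a tiny made-up training corpus; it is a minimal illustration, not the implementation covered in the lecture.

```python
import math

# Tiny made-up training corpus (each document is a list of words).
spam_docs = [["win", "money", "now"], ["free", "money"]]
ham_docs = [["meeting", "tomorrow"], ["project", "update", "now"]]

vocab = sorted({w for d in spam_docs + ham_docs for w in d})

def word_counts(docs):
    counts = {}
    for d in docs:
        for w in d:
            counts[w] = counts.get(w, 0) + 1
    return counts

spam_counts, ham_counts = word_counts(spam_docs), word_counts(ham_docs)
spam_total, ham_total = sum(spam_counts.values()), sum(ham_counts.values())

def log_score(message, counts, total, prior):
    # The independence assumption lets us multiply per-word probabilities;
    # summing logarithms avoids numerical underflow on long messages.
    score = math.log(prior)
    for w in message:
        # Laplacian smoothing: add 1 to each count and the vocabulary
        # size to the denominator, so unseen words never give P = 0.
        score += math.log((counts.get(w, 0) + 1) / (total + len(vocab)))
    return score

def classify(message):
    p_spam = len(spam_docs) / (len(spam_docs) + len(ham_docs))
    s = log_score(message, spam_counts, spam_total, p_spam)
    h = log_score(message, ham_counts, ham_total, 1 - p_spam)
    return "spam" if s > h else "ham"
```

Note that without smoothing, a single word absent from one class's training data would force that class's probability to zero, which is why Laplacian smoothing is essential in practice.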