Today’s Topics
Today’s slides.
Logistic Regression
Logistic regression, overfitting and regularisation. Again, logistic regression is an algorithm that comes from statistics, but it can also be seen as a machine learning algorithm. The hypothesis is very similar to that of linear regression: it is a set of values that defines a linear function. The difference is that in logistic regression the output of the linear function is passed through a logistic function, which works as a smooth threshold function and squashes the output into the range between $0$ and $1$. Unlike linear regression, the model cannot be solved exactly, so an iterative method such as gradient descent is necessary.
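As a rough illustration of the description above, here is a minimal sketch of logistic regression trained by batch gradient descent using only the Python standard library. The function names (`sigmoid`, `predict_proba`, `fit`), the learning rate, the number of epochs and the tiny one-feature data set are all assumptions made for this example, not taken from the slides.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    # Two branches so that math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(w, b, x):
    """Linear function w.x + b passed through the logistic function."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def fit(X, y, lr=0.5, epochs=2000):
    """Batch gradient descent on the mean log loss (hypothetical helper)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # prediction minus label
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # Step against the gradient.
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy data (made up for illustration): small x is class 0, large x is class 1.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w, b = fit(X, y)
# Threshold the estimated probability at 0.5 to get a class label.
predictions = [1 if predict_proba(w, b, x) >= 0.5 else 0 for x in X]
```

The only change from a linear regression gradient step is that the prediction goes through `sigmoid` before the error is computed; the update rule otherwise has the same form.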
There are a lot of ways of thinking about how logistic regression works.
- As a modification of linear regression to get $0$ or $1$ values to divide the data set into two halves, or to find a separating hyperplane between the two classes.
- As an estimator of the probability that a point belongs to one class or the other.
- As a single neuron. You can see logistic regression as the beginning of neural networks.
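The probability view in the list above can be made concrete with a short derivation. The notation here ($\mathbf{w}$ for the weights, $b$ for the bias, $\mathbf{x}$ for a data point, $\sigma$ for the logistic function) is assumed for illustration and may differ from the slides.

$$
p = P(y = 1 \mid \mathbf{x}) = \sigma(z) = \frac{1}{1 + e^{-z}}, \qquad z = \mathbf{w}^\top \mathbf{x} + b
$$

$$
1 - p = \frac{e^{-z}}{1 + e^{-z}} \quad\Longrightarrow\quad \frac{p}{1 - p} = e^{z} \quad\Longrightarrow\quad \log \frac{p}{1 - p} = \mathbf{w}^\top \mathbf{x} + b
$$

So the linear function is the log-odds of the point being in class $1$. The three views agree: $p = 0.5$ exactly when $\mathbf{w}^\top \mathbf{x} + b = 0$, which is the separating hyperplane, and a single neuron with a logistic activation computes the same $\sigma(\mathbf{w}^\top \mathbf{x} + b)$.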
Overfitting and Regularisation
Both linear and logistic regression can be improved with a regularisation term that helps to avoid overfitting. You should begin to understand why overfitting is a problem and learn some strategies for avoiding it.
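To see what a regularisation term does in practice, here is a minimal sketch that adds an L2 penalty to gradient descent for logistic regression. The choice of L2 (rather than another penalty), the helper name `fit_l2`, the parameter `lam`, and the toy data are assumptions made for this example.

```python
import math

def sigmoid(z):
    """Logistic function; fine for the small values of z used here."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_l2(X, y, lam, lr=0.1, epochs=5000):
    """Gradient descent on mean log loss + (lam / 2) * sum(w_j ** 2)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        # The penalty adds lam * w_j to each weight gradient,
        # pulling the weights towards zero. The bias is left unpenalised.
        w = [wj - lr * (gj + lam * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

# Toy, perfectly separable data (made up for illustration).
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
w_plain, _ = fit_l2(X, y, lam=0.0)  # no regularisation
w_reg, _ = fit_l2(X, y, lam=1.0)    # with L2 regularisation
```

On separable data like this, the unregularised weight keeps growing the longer training runs, while the penalised weight stays small. Keeping weights small limits how sharply the model can bend to fit individual training points, which is the sense in which the term works against overfitting.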
Reading Guide
Logistic Regression
- Hundred-Page Machine Learning Book Chapter 3 section 3.2.
Overfitting and Regularisation
- Hundred-Page Machine Learning Book Chapter 3 section 3.1.2 and Chapter 5 sections 5.4 and 5.5.
Multiclass Classification
- One-vs-Rest and One-vs-One, an excellent article by Jason Brownlee.
Confusion Matrices
- Again, the Hundred-Page Machine Learning Book Chapter 5 section 5.6 (but not 5.6.4 or 5.6.5).
What should I know by the end of this lecture?
- What is logistic regression and how does it differ from linear regression?
- What is the cost function? What does the logistic function do?
- How do I implement gradient descent for logistic regression?
- How does logistic regression relate to log-odds, and what is its relationship with probability?
- What is overfitting?
- How does the regularisation term work in linear and logistic regression, and how does it avoid overfitting?
- How do you use a binary classifier for multi-class classification? What is one-vs-all classification?
- What is a confusion matrix?