Lecture 2: Linear Regression as Machine Learning

Today’s Topics

Linear Regression

Linear regression as a machine learning algorithm. Machine learning algorithms and hypotheses. In short, a machine learning algorithm finds the hypothesis that best explains the data. A cost function (also called an error function or a loss function) measures how far away a hypothesis is from explaining the data: the smaller the cost, the better the hypothesis.
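To make the idea of a hypothesis and its cost concrete, here is a minimal sketch. It assumes a one-variable linear hypothesis h(x) = w * x + b and uses the mean squared error as the cost function; the lecture text does not fix a particular cost, so that choice, the function names and the toy data are all assumptions made for illustration.

```python
# Illustrative sketch: make_hypothesis, cost and the toy data are
# assumptions for this example, not definitions from the lecture.

def make_hypothesis(w, b):
    """Return the hypothesis h(x) = w * x + b for fixed parameters w and b."""
    def h(x):
        return w * x + b
    return h


def cost(h, xs, ys):
    """Mean squared error of hypothesis h on the data (xs, ys)."""
    m = len(xs)
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / m


# Toy data lying exactly on the line y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

good = make_hypothesis(2.0, 1.0)  # the line the data was generated from
bad = make_hypothesis(0.0, 0.0)   # always predicts 0

cost_good = cost(good, xs, ys)
cost_bad = cost(bad, xs, ys)
print(cost_good, cost_bad)
```

The hypothesis that matches the data gets a lower cost than the one that ignores it, which is exactly the sense in which a smaller cost means a better hypothesis.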

Ideally you want an algorithm that takes the training data and gives you the hypothesis with the smallest cost value. With linear regression this is possible (using linear algebra), but in general it is not possible to compute the best hypothesis directly.
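For linear regression with a single variable, the hypothesis with the smallest squared-error cost can be written down directly. The sketch below assumes the mean squared error as the cost and uses the standard least-squares formulas for the slope and intercept, which are what the linear algebra (the normal equations) reduces to when there is only one feature. The function names and the toy data are assumptions for illustration.

```python
# Illustrative sketch: closed-form least squares for h(x) = w * x + b.
# fit_least_squares, mse and the toy data are assumptions for this example.

def fit_least_squares(xs, ys):
    """Return (w, b) minimising the mean squared error of w * x + b."""
    m = len(xs)
    mean_x = sum(xs) / m
    mean_y = sum(ys) / m
    # Sum of products of deviations, and sum of squared deviations of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx               # slope
    b = mean_y - w * mean_x     # intercept
    return w, b


def mse(w, b, xs, ys):
    """Mean squared error of the hypothesis w * x + b on (xs, ys)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data scattered around the line y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

w, b = fit_least_squares(xs, ys)
best_cost = mse(w, b, xs, ys)
print(w, b, best_cost)
```

No search is needed here: the parameters come straight out of a formula. Nudging w or b away from the returned values can only increase the cost on this data.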

If you have been reading about neural networks: each setting of a network's weights corresponds to one hypothesis, so the set of all possible weight settings roughly corresponds to the set of all possible hypotheses.

Training and Test Sets

As the course moves along we will learn more about best practices in machine learning. The first important idea is that you should split your data into a training set and a test set. If you do not do this, there is a risk that you will overfit to your data set, and your machine learning system will then perform poorly on new examples. It is also important to keep in mind that when you are using gradient descent to find the best hypothesis you use the training set, but when you are evaluating the performance of the learning algorithm you should use the test set. Later on, when we look at cross-validation, we will look at more advanced ways to divide up your data.
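The workflow described above can be sketched as follows: split the data, run gradient descent on the training set only, and then measure the cost on the held-out test set. This is a minimal sketch under assumptions that are not in the lecture: a one-variable hypothesis w * x + b, mean squared error as the cost, a 75/25 split, and a hand-picked learning rate and step count. All names and the synthetic data are hypothetical.

```python
import random

# Illustrative sketch: split_data, mse, gradient_descent, the learning rate,
# the split fraction and the synthetic data are assumptions for this example.


def split_data(pairs, test_fraction=0.25, seed=0):
    """Shuffle a copy of the data and return (train, test)."""
    rng = random.Random(seed)
    shuffled = pairs[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def mse(w, b, pairs):
    """Mean squared error of the hypothesis w * x + b on the given pairs."""
    return sum((w * x + b - y) ** 2 for x, y in pairs) / len(pairs)


def gradient_descent(train, lr=0.05, steps=2000):
    """Minimise the mean squared error on the training set only."""
    w, b = 0.0, 0.0
    m = len(train)
    for _ in range(steps):
        # Partial derivatives of the mean squared error w.r.t. w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in train) / m
        grad_b = sum(2 * (w * x + b - y) for x, y in train) / m
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Synthetic data around y = 3x + 2 with small uniform noise.
noise = random.Random(1)
data = [(i / 10, 3 * (i / 10) + 2 + noise.uniform(-0.1, 0.1)) for i in range(20)]

train, test = split_data(data)
w, b = gradient_descent(train)   # fitting sees only the training set
train_cost = mse(w, b, train)
test_cost = mse(w, b, test)      # evaluation uses only the test set
print(w, b, train_cost, test_cost)
```

The test cost is the number to report, because the test examples played no part in choosing w and b. A test cost much larger than the training cost would be a sign of overfitting.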

Links to Slides

The slides can be found here.

Reading Guide

The Hundred-Page Machine Learning Book, Chapter 3 (Section 3.1) and Chapter 4.

What should I know by the end of this lecture?

How does linear regression work with one variable?
How does linear regression work with many variables?
What is a hypothesis in a machine learning algorithm?
What is a cost function? Note that there is no standard terminology in machine learning, because the field draws on many different disciplines. Other names for the cost function are the error function and the loss function.
What is the goal of a machine learning algorithm, stated in terms of the hypothesis and the cost function? What does the cost function measure? Why is a low value of the cost function desirable?
How does gradient descent work for linear regression? Can you derive it from first principles?
Why is it necessary to split the data up into a training and a test set?