Lecture 10: Ensemble Learning

Slides

Link to the slides.

Today’s Topics: Ensemble Learning

Ensemble learning is a simple idea: instead of training one model, we train multiple models and combine their results. It is related to cross-validation, which we covered in Lecture 6. A popular ensemble learning algorithm is the random forest, which combines many decision trees trained on random subsets of the data and features. To build a classifier, a majority vote is taken over the trees. There are various techniques for improving such systems, including bagging, boosting, and gradient boosting. As you will see below, these are not limited to random forests but apply to combinations of other learning algorithms as well. There are many extensions and refinements of ensemble methods, including AdaBoost. We will only cover gradient boosting and will not go too deeply into the mathematics behind ensemble learning (that is reserved for a more advanced course). The aim of this lecture is to give you access to another powerful machine learning technique.
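To make the idea concrete, here is a minimal sketch of a random forest classifier using scikit-learn. It is an illustration only, not an example from the lecture; the toy dataset and hyperparameter values are placeholders.

```python
# Minimal random forest sketch (illustration only; dataset and
# hyperparameters are placeholders, not from the lecture).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Toy classification data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 100 trees is trained on a bootstrap sample of the rows and
# considers a random subset of the features at every split; the forest
# classifies by majority vote over the trees.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)
print("test accuracy:", forest.score(X_test, y_test))
```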

One thing that we did not cover in Lecture 8 was using decision trees for regression. Although the idea is not that complicated, you should spend some time understanding how regression trees work. Also, although you will not be examined on other algorithms for constructing decision trees (we covered ID3 in Lecture 8), you should be aware that other algorithms exist.
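As an illustration (a sketch with made-up data, not the lecture's own example), a regression tree can be fit with scikit-learn; it splits the input space into regions and predicts the mean target value of the training points in each leaf.

```python
# Minimal regression-tree sketch using scikit-learn (illustrative values only).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Noisy sine curve as a toy regression problem.
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

# The tree splits the input space and predicts the mean of y in each leaf,
# so the fitted function is piecewise constant.
tree = DecisionTreeRegressor(max_depth=3)
tree.fit(X, y)
print(tree.predict([[2.5]]))
```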

Constructing an optimal decision tree is a computationally hard problem (in fact, it is NP-complete). For machine learning we want fast, efficient learning algorithms, so most decision tree algorithms in the literature are greedy approximations.

Reading Guide

What should I know by the end of this lecture?

  • How do I use decision trees for regression?
  • What is ensemble learning?
  • How do random forests work?
  • What is boosting and bagging?
  • What is gradient boosting? (See the sketch after this list.)
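The following is a small sketch of gradient boosting using scikit-learn (an illustration with placeholder data and hyperparameters, not part of the lecture material). The key idea is that trees are added one at a time, and each new tree is fit to correct the errors of the ensemble built so far.

```python
# Minimal gradient-boosting sketch (placeholder data and hyperparameters;
# shown only to illustrate the idea of sequentially added trees).
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each of the 200 shallow trees is fit to the residual errors of the current
# ensemble; the learning rate shrinks each tree's contribution.
gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.1,
                                max_depth=3, random_state=0)
gbr.fit(X_train, y_train)
print("test R^2:", gbr.score(X_test, y_test))
```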