Linköping University 
Department of Mathematics
Lars Eldén

September 2020


SeSE Matrices and Statistics

Computer assignment

Classification of Handwritten Digits


ASSIGNMENT

Code the algorithm below in MATLAB or R for classification of handwritten digits.

ALGORITHM

  1. Training: For each digit class 0-9, collect the training images of that class as the columns of a matrix, compute its SVD, and take the first 10 left singular vectors as the basis matrix for the class.
  2. Classification: For each digit in the test set, compute its relative residual in each of the 10 class bases. Classify it as the digit whose basis gives the smallest relative residual. Check whether it was correctly classified. Count the correctly classified digits and report their percentage.
  3. If time permits: experiment to see whether 10 is the best number of basis vectors.
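
The steps above can be sketched as follows. This is a minimal MATLAB illustration on small synthetic data, not on the assignment data. It assumes that the relative residual of a test vector z in a class basis Uk (the first k left singular vectors of that class) is norm(z - Uk*(Uk'*z)) / norm(z), which is one common definition. The names A, d, T, dt, U and k and the synthetic setup are made up for the example.

```matlab
% Synthetic stand-in for the data: 3 classes, "images" of dimension 20.
% Each class lies close to its own random 2-dimensional subspace.
m = 20; nclass = 3; ntrain = 30; ntest = 10;
k = 2;                           % number of basis vectors per class
A = []; d = []; T = []; dt = [];
for c = 0:nclass-1
    B  = orth(randn(m, 2));      % hidden subspace for class c
    A  = [A,  B*randn(2, ntrain) + 0.01*randn(m, ntrain)];  % training images
    d  = [d,  c*ones(1, ntrain)];                           % training labels
    T  = [T,  B*randn(2, ntest)  + 0.01*randn(m, ntest)];   % test images
    dt = [dt, c*ones(1, ntest)];                            % test labels
end

% Training: one SVD per class, keep the first k left singular vectors.
U = cell(nclass, 1);
for c = 0:nclass-1
    [Uc, ~, ~] = svd(A(:, d == c), 'econ');
    U{c+1} = Uc(:, 1:k);
end

% Classification: pick the class with the smallest relative residual.
pred = zeros(1, size(T, 2));
for j = 1:size(T, 2)
    z = T(:, j);
    res = zeros(nclass, 1);
    for c = 0:nclass-1
        Uk = U{c+1};
        res(c+1) = norm(z - Uk*(Uk'*z)) / norm(z);
    end
    [~, imin] = min(res);
    pred(j) = imin - 1;          % classes are numbered from 0
end

% Percentage of correctly classified test vectors.
accuracy = 100 * mean(pred == dt);
fprintf('Correctly classified: %.1f %%\n', accuracy);
```

Wrapping the training and classification parts in a loop over k, and recording the accuracy for each value, is one way to approach step 3.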

DATA

The data are available at the URL https://users.mai.liu.se/larel04/kurser/. Two sets of files are provided: those with extension .mat are for MATLAB users and those with extension .dat are for R users. When downloading, you must give the full address of each file, e.g. https://users.mai.liu.se/larel04/kurser/dzip.dat

  1. dzip.mat and azip.mat (for R users, dzip.dat and azip.dat): the first is a vector that holds the digits (the numbers 0-9 that the images represent) and the second is an array of dimension 256 x 1707 that holds the training images, one image per column. Each image is a vector of dimension 256 that has been constructed from a 16 x 16 image.

  2. dtest.mat and testzip.mat (for R users, dtest.dat and testzip.dat) hold the test data.

  3. MATLAB: ima2.m takes an image vector as input and displays it.
  4. R: Download the file image-R.txt for instructions on how to display digits.
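
The data layout described above can be illustrated with a small MATLAB sketch. Random stand-in data are used so that the snippet runs without the files. The variable names azip and dzip are an assumption about what the .mat files contain (check with whos after loading), and the stand-in size n is arbitrary.

```matlab
% Stand-in data with the layout described above: a 256 x n array of
% image vectors and a 1 x n vector of digits. With the real files one
% would instead use something like
%   load azip.mat; load dzip.mat    % variable names assumed, check with whos
n = 50;
azip = rand(256, n);             % hypothetical stand-in for the training images
dzip = randi([0 9], 1, n);       % hypothetical stand-in for the digits

% All training images of one digit class, as the columns of a matrix.
c = 3;
Ac = azip(:, dzip == c);

% One image vector reshaped back into a 16 x 16 array. Whether a
% transpose is needed depends on how the pixels were stacked, which is
% not stated here; ima2.m handles the display of the real images.
x = azip(:, 1);
X = reshape(x, 16, 16);

fprintf('Class %d has %d training images\n', c, size(Ac, 2));
```

The matrix Ac is what the training step would pass to the SVD for class c.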

The data are a subset of the US Postal Service Database. We downloaded them from the web page of the book: Hastie, Tibshirani and Friedman, The Elements of Statistical Learning, Springer-Verlag, 2001.