AD1 -- SML Assignment 3

Introduction and History

Radix sort was originally used for mechanically sorting punch cards. The principle used is to classify the given numbers first according to the least significant digit, then according to the second-least significant digit, and so on. For this to work, one needs to use a stable classification method: the order between two equal numbers must be preserved. Radix sort is described in CLRS (Introduction to Algorithms), on pages 170 to 173 (you need not read up anything about counting sort, nor use a sorting algorithm to classify the numbers into the bins). You can choose any number of bins, corresponding to the base in which your algorithm views the given numbers, but indicate that base in your specification. The objective of this assignment is to assess the subtleties in efficiently implementing radix sort for lists and properly examining its actual efficiency.

Lab Instructions

These instructions won't make sense to you until you have read the relevant sections in the textbook Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein
Introduction to Algorithms (second edition) (errata). Since editions differ slightly, you'll simply have to look up Radix Sort in the table of contents.

Implement radix sort for sorting where keys are compared in non-decreasing order. That is, any element is less than or equal to its successor, if any. The algorithm should take a list L of natural numbers with maximum d digits in the chosen base: radixSort L d : int list -> int -> int list
such that the running time for n elements in the list is Θ(n).
Example: radixSort [9821, 5674, 7890, 1234, 638] 4 = [638, 1234, 5674, 7890, 9821], assuming the algorithm has 10 bins, because all the n=5 given numbers have maximum d=4 digits in base 10.
Compare radix sort against merge-sort or quick-sort implementation via experiments on the same random lists of 100, 200, ..., 1000 elements. To generate a random list, use Random.rangelist (minimalElement, maximalElement) (numberOfElements, Random.newgen()). The interval between the smallest and largest possible values of a random list should be at least 100 times larger than the number of elements of that list, otherwise there will be too many duplicates in the generated lists. Many duplicates would mean that one examines an extreme case rather than a general, random case. To measure a running time, use the function timeIt in the file ranTime.sml, which takes a function f and an argument a and returns the running time for f applied to a..
- Do measurements for different sizes of input data and present results in one diagram with curves. (Tips: use gnuplot.)
- Are the results as expected? Why / Why not?

Hand-in

Your report complying with ethics rules of the course should be handed in latest on Friday the 16th of February before 08:14 (morning) via the online course manager system, and should contain:

SML functions for question 1 from above, in a *.sml text file, running under Moscow ML version 2.01, and commented, in English, complying with coding convention.
Answer, in English, for question 2 from above, in a *.pdf, or *.rft file. Code for quick sort and merge sort can be found here