NEW:A note on specifications.

PK2 -- SML Assignment 2

Introduction and History

Radix sort was originally used for mechanically sorting punch cards. The principle used is to classify the given numbers first according to the least significant digit, then according to the second-least significant digit, and so on. For this to work, one needs to use a stable classification method: the order between two equal numbers must be preserved.

The textbook by Cormen et al., Introduction to Algorithms describes radix sort. For the second edition, look on pages 170 to 173. For other editions, you'll simply have to look up Radix Sort in the table of contents.

You need not read up anything about counting sort, nor use a sorting algorithm to classify the numbers into the bins. You can choose any number of bins, corresponding to the base in which your algorithm views the given numbers, but indicate that base in your specification. The objective of this assignment is to assess the subtleties in efficiently implementing radix sort for lists and properly examining its actual efficiency.

Lab Instructions

Read the relevant section of the textbook.
Implement radix sort for sorting where keys are compared in non-decreasing order. That is, any element is less than or equal to its successor, if any. The algorithm should take a list L of natural numbers with maximum d digits in the chosen base:
radixSort L d : int list -> int -> int list
such that the running time for n elements in the list is Θ(n).
Example: radixSort [9821, 5674, 7890, 1234, 638] 4 = [638, 1234, 5674, 7890, 9821], assuming the algorithm has 10 bins, because all the n=5 given numbers have maximum d=4 digits in base 10.
Compare radix sort against merge-sort or quick-sort implementation via experiments on the same random lists of 100, 200, ..., 1000 elements. To generate a random list, use Random.rangelist (minimalElement, maximalElement) (numberOfElements, Random.newgen()).
The interval between the smallest and largest possible values of a random list should be at least 100 times larger than the number of elements of that list, otherwise there will be too many duplicates in the generated lists. Many duplicates would mean that one examines an extreme case rather than a general, random case. To measure a running time, use the function timeIt in the file ranTime.sml, which takes a function f and an argument a and returns the running time for f applied to a..
- Do measurements for different sizes of input data and present results in one diagram with curves. (Tip: use gnuplot.)
- Are the results as expected? Why / Why not?

Hand-in

Your report complying with ethics rules of the course should be handed in latest on Friday the 20th of April before 17:00 via the online course manager system, and should contain:

SML functions for question 2 from above, in an sml file, running under Moscow ML version 2.01, and commented, complying with coding convention.
Answer, in English, for question 3 from above, in a pdf, txt, or html file. Code for quick sort and merge sort can be found here