Bit Stream Benchmarks

Implementing Bit Stream Manipulating Programs in Many languages

huffman

Description

The huffman benchmark decodes a huffman-encoded file. The structure of the file is as follows:

     -----------------------------------------------------------------------------------------------------
     | SizeofTree (2 bytes) | Tree (SizeofTree bytes) | NumofBits (4 bytes) | EncodedData (Rest of File) |  
     -- --------------------------------------------------------------------------------------------------

The huffman tree is encode in the following manner:

An internal node:

 -------------------------------------------------------------------------------------------------------------------
 | SizeofLeft (2 bytes) | LeftSubTree (SizeofLeft bytes) | SizeofRight (2 bytes) | RightSubTree (SizeofRight bytes) |
 -- ----------------------------------------------------------------------------------------------------------------

A leaf node:

 -----------------
 | Char (1 byte) |
 -- --------------

The NumofBits field in the file shows how many bits are used to encode the original file.

The task of the program is to construct the huffman tree from the file and use this tree to decode the data. The program should not use any tricks such as lookup tables, etc. to do the decoding but it should simply use the bitstream to traverse the tree until it reaches a leaf. The contents of the leaf should then go into the result. When traversing the tree, the left subtree should be chosen if the next bit is 0 and the right subtree should be chosen if the next bit is 1.


Example:

Huffman Tree:

        / \  
       /   \  
      /\   /\   
     /\ e a  s  
    n /\  
     l  o  



01 = e, 10 = a, 11 = s, 000 = n, 0010 = l, 0011 = o

Encoded bitstream = 00100011 00001101 11011011 0
Original bitstream = loneasasea


Time Measurement

The time for this benchmark should be taken when the file has been loaded but before the tree has been constructed until the entire bitstream is decoded. Functional languages are not allowed to use destructive updates or preallocated buffers in the time-sensitive part.

Iterations: 10


Source Code

This benchmark has been implemented in the following languages:

  1. C:
  2. Erlang:
  3. Haskell:
  4. Java:
  5. O'Caml:

Example input and output:

testdata.huffman and testdata.huffman.out