Fractal Models of Network Traffic

Reference

Old model: Poisson arrival process

Problem: real traffic is bursty. Even under light load, peaks are up to 5 times greater than the average.

Networks and network devices designed around Poisson process models don't perform as expected. A lot of time and money is being spent developing advanced packet network infrastructure and the network applications that will depend on it. If that development rests on inaccurate models, much of the time and money will be wasted and advanced applications may fail.

Many recent studies have measured traffic at varying scales (peer-to-peer, Ethernet, MAN, WAN) and speeds (Ethernet to very high speed networks). All of these studies confirm the self-similar nature of network traffic.

Here's a typical study done on campus Ethernet traffic at the University of Auckland.

Experimental data

They monitored the traffic in a typical university network:

For each experiment they collected at least 300,000 packets using a sniffer. The raw data consisted of the arrival time of each packet. Typical data looked like this:

<graph of number of packet arrivals versus time>

The interesting thing is to see how bursty the traffic is, even when the average load is quite small.

<3 histograms of packet per second arrival rate>
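
The counting behind plots like these can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual tooling: the function names and the sample arrival times are made up for the example. It bins packet arrival times into fixed-width intervals and reports the peak-to-mean ratio, one simple measure of burstiness.

```python
from collections import Counter


def counts_per_interval(arrival_times, width=1.0):
    """Count arrivals in consecutive bins of `width` seconds,
    starting at the earliest arrival time."""
    if not arrival_times:
        return []
    start = min(arrival_times)
    bins = Counter(int((t - start) // width) for t in arrival_times)
    # Include empty bins so quiet intervals show up as zeros.
    return [bins.get(i, 0) for i in range(max(bins) + 1)]


def peak_to_mean(counts):
    """Ratio of the busiest interval to the average interval."""
    mean = sum(counts) / len(counts)
    return max(counts) / mean


# Hypothetical arrival times in seconds (not data from the study).
arrivals = [0.1, 0.2, 0.25, 0.3, 0.35, 1.7, 3.2, 3.9]
counts = counts_per_interval(arrivals)
print(counts)                # [5, 1, 0, 2]
print(peak_to_mean(counts))  # 2.5
```

A histogram of the values in `counts` is the kind of packets-per-second picture referred to above.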

Statistical tests of the data showed that the network traffic has a self-similar, or fractal, nature and that a renewal process with infinite variance is a good model of the real data.
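
The notes do not say which statistical tests the study used. One common check for self-similarity is the variance-time method, sketched below under that assumption. The series is averaged over non-overlapping blocks of size m. For independent data the variance of the block averages falls like 1/m, while for self-similar data it falls more slowly, like m^(2H-2), where H is the Hurst parameter. H near 0.5 suggests no long-range dependence, and H near 1 suggests strong self-similarity. The function names and block sizes are illustrative choices.

```python
import math
import random


def aggregate(series, m):
    """Average the series over non-overlapping blocks of length m."""
    n_blocks = len(series) // m
    return [sum(series[i * m:(i + 1) * m]) / m for i in range(n_blocks)]


def variance(xs):
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


def hurst_variance_time(series, block_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate H from the least-squares slope of
    log(variance of block averages) against log(block size)."""
    xs = [math.log(m) for m in block_sizes]
    ys = [math.log(variance(aggregate(series, m))) for m in block_sizes]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    # slope = 2H - 2, so H = 1 + slope / 2
    return 1 + slope / 2


# Independent samples should give an estimate close to 0.5.
rng = random.Random(0)
iid_series = [rng.random() for _ in range(16384)]
print(hurst_variance_time(iid_series))
```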

Renewal process model

A renewal process is one where the inter-event times are independent and identically distributed. A renewal process with infinite variance makes a good mathematical model of the bursty nature of packet network traffic. "Infinite variance" refers to the distribution of the inter-event times: it has "heavy tails" that decay so slowly (roughly as a power law) that the second moment diverges. So how does it integrate to 1? The tail still decays fast enough for the total probability to be finite; only the integral of x^2 times the density fails to converge.
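
As a concrete illustration (the notes do not say which distribution the study fitted), the Pareto distribution is a standard heavy-tailed example: with shape parameter alpha between 1 and 2 its mean is finite but its variance is infinite. The sketch below builds a renewal process by summing independent Pareto inter-event times drawn with Python's `random.paretovariate`. The function name, alpha = 1.5, the scale and the seed are assumptions chosen for the example.

```python
import random


def pareto_renewal_arrivals(n_events, alpha=1.5, scale=1.0, seed=0):
    """Arrival times of a renewal process whose inter-event times are
    i.i.d. Pareto(alpha) multiplied by `scale` (so each gap >= scale)."""
    rng = random.Random(seed)
    t = 0.0
    arrivals = []
    for _ in range(n_events):
        t += scale * rng.paretovariate(alpha)
        arrivals.append(t)
    return arrivals


arrivals = pareto_renewal_arrivals(10_000)
gaps = [b - a for a, b in zip([0.0] + arrivals, arrivals)]
print("mean gap:   ", sum(gaps) / len(gaps))
print("largest gap:", max(gaps))
```

Most gaps are close to the minimum, but occasionally one is far longer than the rest. Those rare long gaps are what keep the sample variance from settling down as more data is collected.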

Studies of individual source/destination pairs show that they exhibit on/off behavior (an infinite variance renewal process), and it is known mathematically that aggregating many such pairs produces the self-similar behavior that is observed.
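
A toy version of this aggregation can be sketched as follows. Each hypothetical source alternates between "on" (sending one packet per time slot) and "off" (silent), with period lengths drawn from a Pareto distribution, and the aggregate load is the number of sources that are on in each slot. The number of sources, alpha = 1.5 and the one-packet-per-slot rate are assumptions for illustration, not parameters from the studies.

```python
import math
import random


def on_off_source(n_slots, alpha, rng):
    """Return a list of 0/1 values: 1 while the source is sending."""
    sending = rng.random() < 0.5  # random initial state
    out = []
    while len(out) < n_slots:
        # Heavy-tailed period length, capped at the remaining slots.
        duration = min(math.ceil(rng.paretovariate(alpha)),
                       n_slots - len(out))
        out.extend([1 if sending else 0] * duration)
        sending = not sending
    return out


def aggregate_load(n_sources, n_slots, alpha=1.5, seed=0):
    """Number of active sources in each time slot."""
    rng = random.Random(seed)
    sources = [on_off_source(n_slots, alpha, rng) for _ in range(n_sources)]
    return [sum(slot) for slot in zip(*sources)]


load = aggregate_load(n_sources=20, n_slots=1000, seed=1)
print("mean load:", sum(load) / len(load))
print("peak load:", max(load))
```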

The unanswered question is why network traffic acts this way. Researchers would like a physical explanation of the phenomenon, so that they can account for it in terms of data communications policies, algorithms, and applications.

Engineering Impact

Data is gathered by network devices and monitors and used as input to things like routing algorithms and assignment of capacity. Today most network devices gather only fairly coarse (in time) data about load. This probably comes from the assumption that the load follows a Poisson process, so that very little data is needed to accurately characterize it. One area of research is figuring out what set of parameters to gather in order to accurately characterize the traffic.

The first generation of ATM switches was designed with buffers that were too small for the real traffic they experienced. The result was high packet loss when buffers overflowed.
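
Why burstiness matters for buffer sizing can be seen in a deliberately simple slotted queue. This is a toy model, not a model of any real ATM switch: the function name, buffer size and arrival patterns are invented for the example. Two arrival patterns with the same average load are fed through the same small buffer, and only the bursty one loses packets.

```python
def lost_packets(arrivals, service_per_slot, buffer_size):
    """Count packets dropped by a finite buffer.

    In each slot the new arrivals join the queue, anything beyond
    `buffer_size` is dropped, then up to `service_per_slot` packets
    are served.
    """
    queue = 0
    lost = 0
    for a in arrivals:
        queue += a
        if queue > buffer_size:
            lost += queue - buffer_size
            queue = buffer_size
        queue = max(0, queue - service_per_slot)
    return lost


# Same total (8 packets in 8 slots), different burstiness.
smooth = [1, 1, 1, 1, 1, 1, 1, 1]
bursty = [4, 0, 0, 0, 4, 0, 0, 0]

print(lost_packets(smooth, service_per_slot=1, buffer_size=2))  # 0
print(lost_packets(bursty, service_per_slot=1, buffer_size=2))  # 4
```

A buffer sized from the average load alone looks adequate for the smooth pattern, yet drops half the packets of the bursty one.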