Data Link Layer

The datalink layer doesn't worry about getting bits on and off the medium. The design issues at this layer are the low-level concerns of communication: what about errors? what if the sender outstrips the receiver? how do we distinguish packets within a bit stream? how can we make the best use of the channel?

What isn't considered at this layer is anything more than a single-hop away, i.e. no networking issues. All communication issues are point-to-point at this layer, and the stream of bits is sequential.

One potential point of confusion when studying concepts like error and flow control: these issues arise in multiple layers, depending on the application, the medium, the design of the stack, etc.

Services

Connectionless, unacknowledged Connectionless, acknowledged Connection-oriented

Framing

The physical layer deals with bit streams. It is the datalink layer that thinks in terms of packets or frames. Everything the datalink layer does is in the context of frames. Frames can be either bit or character oriented A big question for the receiving datalink layer is, how does it convert the stream of bits from the physical layer into a sequence of frames or packets? This issue is more complicated than it might seem. Some possible approaches:
  1. size of frame (i.e. counting bits) either fixed, or communicated
  2. magic characters that mark the start/stop of frames
  3. violations of the signalling scheme used in the physical  layer

Character counts

Sending preliminary information in a header with a count of the number of characters in a frame is problematic because the count may be corrupted. Another possibility is fixing the size of the frame, like ATM cells.

Magic characters

Sending magic characters is a form of in-band signalling. The problem with in-band signalling of any sort is, how do you distinguish the magic characters from legitimate data? You need to use special characters that can't be used in the data stream. For ASCII text exchange this could work, but what about the exchange of binary (i.e. any of 256 possible 8 bit characters is allowed) data?

One possible approach is to use character stuffing, where each start/stop character that appears coincidentally in the data stream is replaced by 2 such characters. Then the receiver knows to remove the redundant characters and can distinguish them from the true start/stop characters. If you avoid being "character centric", then you can apply the same sort of idea to a bit stream, special bit flags, and bit-stuffing.

Signal violations

The last technique is to use a signal for the start/stop frames which is normally not used to transmit data. This only works if there are such "illegal" signals in the coding scheme. For example, a Manchester code works like this:

Example: ATM

The cells in ATM are 48 bytes of payload and 5 bytes of header. The header contains circuit control information (4 bytes) and a 1 byte checksum protecting only the header. The 4 control bytes are calculated by the layer above the TC layer. The TC layer must just calculate the header checksum.

Why not protect the data as well? Larger checksum would be needed and time to calculate would be longer, so only the header is protected. This means that cells won't be misdirected by corrupt headers, but they may have corrupt data. Higher layers must protect their data if so desired.

The checksum used corrects single bit errors and detects most multibit errors. The design was based on the assumption that the most common medium in use would be fiber, and hence have very low error rates. The probability of a cell having an undetected header error is about 10^-20 (at 155 Mbps that's 1 cell every 90,000 years).

Cells that have their header checksum inserted are ready to go. Whether they leave immediately or not depends on the physical medium. Synchronous medium require waiting for the start of a cell boundary, also the transmission of idle cells. Asynchronous medium allow for cells to be sent at any time.

In some cases the TC sublayer also has the responsibility of putting the ATM cells into the frames of the underlying medium. For example, ATM cells can be put into SONET frames for transmission.

The TC sublayer communicates control information over special OAM (operation and maintenance) cells.

Reception
On reception of a bitstream from the medium, the TC sublayer finds the cell boundaries, confirms the header checksum and throws out bad cells, and passes the cells up to the ATM layer.

The ATM cells don't have a structure like PPP or HDLC with special fields to mark the boundaries of the cells. How can the TC recognize the cells from the bit stream? The TC hunts for valid headers by sequentially searching the bit stream.

A valid header is one in which the header checksum agrees with the other 4 bytes of header info. A 5 byte shift register is used to search for valid headers. The checksum is computed on the 40 bits, and if it matches the HEC then it is possible that the cell boundaries are now known. The cell is discarded and the 40 bits which occur 48 bytes later (what should be the start of the next cell) are examined. Repeating this you can make the probability of finding the right cell boundaries arbitrarily high, at the cost of greater synchronization time.

<figure: state machine for cell synchronization
no_synch -> no_synch : shift, compute, no match
no_synch -> possible_synch: match
possible_synch -> possible_synch: shift 1 cell, compute, match
possible_synch -> synch: N matches
possible_synch -> no_synch: mismatch >

Probability of false synchronization = 2^(-8*N), where N is the number of correct checksums you have to find to move from the no_synch to the synch state.

Tanenbaum points out that this means of synchronization violates the fundamental principle of a protocol stack since the TC sublayer uses its knowledge of the cell header which is generated by the ATM layer. This ties the ATM layer to the TC sublayer.
 
 

Error control

Bad things happen. A bit may be corrupted in a received frame, or an entire frame may be lost. For anything other than the bare-bones connectionless, unacknowledged service these possibilities must be dealt with.

The basic idea is to provide a feedback channel from the receiver to the sender. This sort of feedback is known as acknowledgments, or ACKs.

Timers are used to cope with the possibility that frames are never received, or that ACKs are lost.

Flow control

Another reason for providing feedback to the sender is to prevent a fast sender from overloading a slow receiver. Buffers provide for some mis-match in transmission speed, but all buffers are finite, so a reliable mechanism to throttle or reduce the flow from the sender is needed.

Error Control

Thermal noise can be measured, predicted and designed for (change bandwidth used, increase signal power, alter signalling scheme). Other sources of noise are more difficult. Impulse noise, for examle, may corrupt a single bit or a sequence of bits. How many bits are corrupted by your refrigerator turning off (and the resulting EMI pulse) depends on the baud rate of communication. For example, a 0.5ms pulse will corrupt one or two signals from a 2400 baud modem (and hence 4-8 bits for V.32), but will corrupt around 1000 bits on a 10Mbps Ethernet.

Error detection

Three possibilities in communication: frame is ok, frame is bad & undetected, frame is bad and detected. Clearly the middle possibility should concern us.

Parity schemes

Block sum check

Polynomial codes (CRC codes)

Error correction

If you have a feedback channel (not all communication situations allow for this) then you can get away with just detecting errors, have the receiver ask for retransmission. If you don't have a feedback channel (e.g. space probes) then you must use some form of feed forward control where redundancy allows both detection and correction of errors.

Most computer networking is concerned with error detection, since the amount of information added to do feedforward error correction can be large.

Total length of a transmission in bits is  n = m + r (message + redundancy), where the codewords sent (distinct from the messages, remember) are n bits long.

Given two codewords, it is easy to say by how many bits they differ by (XOR and count the 1 bits).  The number of bits that two codewords differ by is known as the Hamming distance.  If two codewords are a Hamming distance d apart, it will require d single-bit errors to convert one to the other (corruption).

Most of the time when using m data bits we use all possible 2^m combinations. For example, the 7 bit ASCII code had 128 different symbols defined for it. It makes sense to use all possible combinations since to do otherwise decreases the inherent efficiency of the representation (i.e. you always send m bits, you might as well be able to communicate 2^m different symbols).

When you design a code that can be used to actually correct errors, and not just detect them, then you don't use all possible 2^n values in an n bit code. This "waste" represents redundancy which can be put to good use.

The Hamming distance of a code is the minimum Hamming distance between any two codewords of the code.

Detection: a code with a Hamming distance of d+1 can detect d single bit errors.
Correction: a code with Hamming distance of 2d+1 can correct d single bit errors.

Parity is a code with d=2, thus can detect single bit errors.

Simple correction example

Flow Control

Flow control is needed so that the receiver isn't overrun in time or buffer space by the sender. Flow control is independent of error control, but is made much more complicated by the possibilities of errors. The main goal of providing flow control is to insure that data is not lost by the sender sending too fast. The secondary goal is to use the channel as efficiently as possible.

X-on/X-off

Special characters embedded in data streams which cause the sender to stop sending (either ignore or buffer) information until a X-on is received. Also known as "software" flow control. Problems with picking special characters and binary data.

Send & Wait

With this simple scheme, the sender can only have one frame outstanding, then it must wait for an ACK from the receiver. When the receiver receives an error free frame, it returns an ACK to the sender. When the sender receives an ACK it sends another frame.

The data frames from A to B are intermixed with the acknowlegement frames from A to B, since the communication is usually full-duplex. A field in the frame header can be used to distinguish data from ACK. This way ACKs get a free ride on the next outgoing data frame (piggybacking) which results in better use of the available channel bandwidth. How about if new frames do not arrive on time to piggyback the acknowlegement? (sender will time out) Use a timer in the receiver to time out and send separate acknowlegement frames when necessary.

Efficiency

    H = number of header (overhead) bits per frame
    D = number of data bits per frame
    L = number of bits in a frame (H + D)
    C = length of the channel
    R = data rate in bits per second
    v = velocity of propogation
    Tframe = time to signal frame
    Tproc = time to process frame in physical layer, pass to datalink layer
    Tack = time to signal ACK
    Tprop = time for first bit to travel from sender to receiver

Total time to send one frame and receive an ACK for it (draw a picture of this)
    Ttotal = Tframe + Tproc + Tack + Tproc + 2Tprop

Then we can calculate the utilization of the channel with this protocol as follows:
U = Tframe / Ttotal
  ~= Tframe / (Tframe + 2Tprop)
since processing time is small and ACKs are short.
  = 1 / (1 + 2Tprop / Tframe)

Propogation delay is typically 2/3 speed of light, so Tprop = C / v and v = 2/3 * 3*10^8, where C is the distance between sender/receiver, and 3*10^8 is the speed of light in meters per second.

Tframe = L / R where L = length of frame in bits, and R is the data rate. So substituting we get

When Tprop / Tframe is less than 1 (short links, low data rates) then efficiency of send-and-wait is good.

For long links and high speeds, i.e. Tprop/Tframe is large, utilization can be very bad.

For long links, but low data rates it might be ok.

The efficiency problems of Send&Wait are because only one packet can be in transit at a time, and that frame may be much smaller than the channel length.

What about errors?

When sender sends a frame, it starts a timer. If receiver receives a frame with errors or sender receives an ACK with errors, it is discarded. If sender does not receive an ACK, it times out and recsends the frame.

If an ACK is lost, or corrupted, then sender may send two copies of the same frame.  To fix this we use a sequence number in packets. Receiver can simply discard duplicates that it receives. How many bits to use for sequence number?  You only need 1 bit (sequence number 0 or 1) since there is only ambiguity arising between a packet and its predecessor or successor. In other words, there is only one frame at a time outstanding.

The timer must be sent to at least twice what the propogation delay is, or the sender will draw false conclusions about lost frames. Time is also wasted when the receiver gets a corrupted frame, as it must wait for the sender to timeout and resend. An improvement is possible by sending explicit acknowledgement of the receipt of bad frames: NAKs.

Efficiency is also reduced since frames with errors must be resent, wasting some of the capacity of the channel.
P1 = probability that a data frame is lost or corrupted
P2 = probability that an ACK is lost or corrupted

Prob of successful transmission = (1 - P1) * (1 - P2) assuming frame and ACK loss are independent events
Prob. of failed transmission = F = 1 -  (1 - P1) * (1 - P2)

So your channel utilization is reduced by the probability you waste time sending frames that aren't successfully received and becomes

Sliding Window Protocols

Efficiency can be 100% in the absence of errors, yet the sender won't overrun the receiver.

Essence of sliding window:

  1. Some sort of maximum outstanding number of frames
  2. Requires N buffers to hold outstanding frames
  3. Receiver holds frames in a list, delivers in order to network layer.
  4. Sender need not wait for an ACK to send the next frame, but must obey point 1.
You can view send-and-wait as a sliding window protocol where the size of the window is 1. Since more than one frame can be outstanding at a time, each frame must be uniquely identified. A sequence number is used in the header of each frame to identify it. The size of the sequence number depends on how many frames can be outstanding (window size), and the way in which errors are handled.

Example of sliding window (Tannenbaum, 2nd ed, p 226)
window of size 1, sequence number of 3 bits
 
initial state first frame sent first frame received ack receive
sender next frame is 000 expects 000 
next is 001
expects 000 
next is 001
next is 001
receiver expects 000 expects 000 expects 001 expects 001
If the size of the window was increased beyond 1, then the sender could transmit the 2nd frame immediately and the gap between expects and next would widen.

Efficiency

Calculate the length of the channel in terms of number of frames - this is the key parameter for sliding window. If the channel is 10 frames long, and the send window is only 1 frame, then the maximum efficiency is 1/20, since it takes time equal to the sending of 20 frames to send and ACK just one frame.

In general, two modes to calculate efficiency for.  The first is for when the sender is never idle waiting for an ACK.  The sender can signal for a time Tframe*W, where W is the size of the sliding window. After that, the sender must wait for an ACK from the first frame, which will take Tframe + 2*Tprop to reach the sender.  The sender will never be blocked if   W* Tframe > Tframe + 2*Tprop, so the theoretical efficiency is 1.0, since the "pipeline" of transmission could always be full.

However, if W*Tframe < Tframe + 2*Tprop , or put another way
W < 1 + 2 * Tprop/Tframe
that is to say the windows size W is less than 2 times the length of the channel (plus 1) then the sender will send W frames, then wait for an ACK, send 1 more frame, wait for an ACK, etc.  This makes the steady-state efficiency

In other words, sliding window improves on send-and-wait by the size of the window (or 100%, whichever U is smaller).

Errors in sliding window

How do you handle corrupt frames with a sliding window protocol? Two possible strategies: have the sender selectively resend the data that was corrupt, or just start back in the frame sequence from the first bad frame.
Selective retransmission
Sender detects out-of-sequence ACK and retransmits the unACKd frames. Or receiver sends a NAK and the bad frames are resent.

Example with frame error (figure 4.10)
·error in frame N+1
·receiver returns ACK for all correctly received frames (...,N-1, N, N+2, ...)
·on receipt of ACK N+2, sender figures out that N+1 not received correctly
·sender retransmits N+1, changes position in list, goes on
Example with ack error (figure 4.11)
·error in ack N
·sender receives ack N+1, resends frame N
·receiver checks list, concludes that N is a duplicate, trashes
·receiver acks duplicate frame

Requires 2W+1 sequence numbers for a window of W frames
Consider case where full window is sent, all acks lost  - requires W buffer in the sender, W in the receiver.

If the probability of lost frames is high, and the frames can be large, then the space for buffers may be significant. That motivates the next strategy.

Go-back-N
Receiver detects receipt of an out-of-sequence frame and requests sender to retransmit all outstanding unACKd frames from the last correctly received frame. Or, the receiver sends a NAK to the primary when it first detects an out-of-sequence data frame.  Thereafter the receiver discards all data frames until it receives the missing frame.

If an ACK is corrupted (say N and N+1), then the sender will infer from receipt of the ACK for N+2 that both N and N+1 were received correctly, and will take the N+2 ACK as sufficient for all three frames.

Since the receiver doesn't have to store frames out-of-sequence, it doesn't require much buffer space.  Remember that it throws away the out-of-sequence frames it receives.

Requires W+1 sequence numbers for a window of W frames and W buffers in the sender, 1 in the receiver.

Go-Back-N handles duplicate (lost ACKs) frames easily by simply discarding them.  Selective-Retransmission can only know to discard a duplicate by maintaining a history list of received frames.

Most data comm is full duplex, i.e. you have as much BW in one direction as you do in another.  If you only use the reverse channel to send ACK/NAKs, then you will be wasting the BW of the reverse channel (low utilization).  One idea is to piggy-back ACK/NAK traffic on the information frames that are flowing on the reverse channel.

Errors and efficiency

P1 = probability that a data frame is lost or corrupted
P2 = probability that an ACK is lost or corrupted
Prob of successful transmission = (1 - P1) * (1 - P2) assuming frame and ACK loss are independent events
Prob. of failed transmission Assuming independent events, the probability that exactly k attempts are needed to get a frame transmitted is
    (1 - F) F^(k-1)    i.e. one successful attempt and k-1 attempts

Now you can calculate the average number of transmission attempts to get the one frame through successfully:
    N = sum of k from 1 to inf { k * (1-F)*F^(k-1) }
which is just the weighted average of all possible values for the number of times the frame must be sent and reduces to
    N = 1 / (1 - F)
and the number of re-transmission (i.e wasted channel capacity) is N - 1 or
    Nr = N -1 = F / (1 - F)

SelectiveRetransmission

GoBackN

Finite State Machine Protocol Models

Bugs in protocols are bad, protocols are complex, wouldn't it be nice to formally prove them bug-free?

Protocol machine: always in a specific state at every instant of time. Typically the states are chosen at instants that the protocol machine is waiting for the next event to happen. The events happening cause a change of state.

The state of the complete system is the combination of all the states of the two protocol machines and the channel. The state of the channel is simplified to refer to whether a frame is on it or not, and which kind of frame is on it.

Formally, a finite state machine of a protocol is a quadruple: (S,M,I,T)

It is important to formally define the protocol as a directed graph because then you can apply various graph theoretic results to determine interesting things about the protocol. For example, reachability analysis can be used detect a variety of errors in the protocol specification. Reachability analysis shows which states of all possible states may be reached.

Incompleteness of specification

Deadlock Extraneous transition: <example of stop and wait protocol machine>

What can you say about the correctness of send-and-wait from the FSM? An error in send-and-wait would be for a sender to send two frames with the same number in a row (i.e. to not alternate 0, 1, 0, 1, etc). You could state this requirement as

Complex protocols have very complex FSM representations. The number of possible states explodes in such cases, and some kinds of analysis become too expensive.