Putting it all together: TCP - IP - Ethernet

Boundaries

How do the logical, simple, clean protocol stacks actually get implemented in real computers? Not quite so logically, simply, and cleanly. There are important boundaries in most implementations:
Application - OS
  application protocols and application processes are outside of the OS kernel
  they deal with internetwork names and addresses
  for TCP/IP, this is between the app layer and the transport layer
Network - Physical addresses
  network names uniquely identify hosts in the world
  physical addresses identify a particular wire going into a particular interface card
  for TCP/IP, this boundary is found between the IP and the Host-to-network layer

Problems with the strictly layered model

A tenet of the strictly layered model is that layer N doesn't know anything about layer N-1 other than the service interface.

Efficiency can suffer if this is maintained in implementation. So, for example, the transport layer (TCP) should divide the stream of data from the app into segments that fit optimally into the frames that will be used to transmit them. But knowing the size of the data link frame is none of the transport layer's business. Even worse, the transport layer must really know the size of the headers used at the IP and data link layers, so it won't be sending segments that are just a bit too big.
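
To make the layering leak concrete, here is a minimal sketch of the arithmetic TCP has to do. It assumes the link layer reports its payload limit (the MTU) and that the header lengths are known. The function name tcp_payload_for_mtu is made up for this example and is not taken from any real kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum header sizes; options make either header larger. */
enum { IP_MIN_HEADER = 20, TCP_MIN_HEADER = 20 };

/* Largest TCP payload that fits in one frame whose payload limit (MTU)
 * is mtu bytes, given the IP and TCP header lengths in use.
 * Returns 0 if the headers alone do not fit.
 * Hypothetical helper, for illustration only. */
size_t tcp_payload_for_mtu(size_t mtu, size_t ip_hdr, size_t tcp_hdr)
{
    assert(ip_hdr >= IP_MIN_HEADER && tcp_hdr >= TCP_MIN_HEADER);
    size_t overhead = ip_hdr + tcp_hdr;
    return mtu > overhead ? mtu - overhead : 0;
}
```

With an Ethernet MTU of 1500 bytes and option-free headers, tcp_payload_for_mtu(1500, 20, 20) gives 1460 bytes per segment. Notice that every argument is a fact about a lower layer, which is exactly what the strict model says TCP should not know.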

Another problem arises with the job of de-multiplexing segments (say for TCP or UDP) from network layer packets (say IP) and data link layer frames (say Ethernet). What if the transport layer buffer an app has for incoming data is full? Then the kernel has nowhere to store the recently arrived and de-muxed segment. But look how much work was done by the kernel in the lower layers (physical, data link, network, transport) only to throw the data out since the buffer is full. A situation where the kernel invests significant CPU resources only to throw away the result is prone to thrashing. A solution would be to recognize early on in the de-multiplexing process that a particular frame of data is bound for a process where the transport layer buffer is full. Then the data can be immediately (aggressively) discarded, and wasted CPU time is minimized.
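
Here is a rough sketch of what such an early check might look like. It peeks through the Ethernet and IP headers just far enough to find the TCP destination port, then asks whether that socket's receive buffer is full. The rx_socket record and the drop_early function are invented for this example. Matching on the destination port alone is a simplification, and Ethernet II framing with IPv4 is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { ETH_HEADER_BYTES = 14, ETHERTYPE_IPV4 = 0x0800, PROTO_TCP = 6 };

/* Hypothetical per-socket record: local port plus receive-buffer usage. */
struct rx_socket {
    uint16_t port;
    size_t   buffered;  /* bytes already queued for the application */
    size_t   capacity;  /* size of the receive buffer */
};

/* Read a 16-bit big-endian (network order) value. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Returns 1 if the frame carries a TCP segment for a socket whose receive
 * buffer is already full, so it can be discarded before any further work.
 * Returns 0 when the frame should go through normal processing. */
int drop_early(const uint8_t *frame, size_t len,
               const struct rx_socket *socks, size_t nsocks)
{
    assert(frame != NULL);
    if (len < ETH_HEADER_BYTES + 20) return 0;             /* too short for IP */
    if (read_be16(frame + 12) != ETHERTYPE_IPV4) return 0; /* not an IP packet */

    const uint8_t *ip = frame + ETH_HEADER_BYTES;
    size_t ihl = (size_t)(ip[0] & 0x0F) * 4;               /* IP header length */
    if ((ip[0] >> 4) != 4 || ihl < 20) return 0;
    if (ip[9] != PROTO_TCP) return 0;                      /* not TCP */
    if ((read_be16(ip + 6) & 0x1FFF) != 0) return 0;       /* later fragment: no TCP header */
    if (len < ETH_HEADER_BYTES + ihl + 4) return 0;        /* ports not present */

    uint16_t dest_port = read_be16(ip + ihl + 2);
    for (size_t i = 0; i < nsocks; i++)
        if (socks[i].port == dest_port)
            return socks[i].buffered >= socks[i].capacity;
    return 0;                                              /* unknown port: normal path */
}
```

The point of the sketch is that the Ethernet-level code reads IP and TCP fields directly. That is fast, and it is also a clear violation of the strictly layered model.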

Efficiency sometimes means sacrificing the pureness of the layered model.

What this is

I wrote this detailed example to try to "put it all together" for students who have learned networking in the traditional approach, from the bottom up, and have studied up to and including the transport layer. The example uses TCP/IP and Ethernet. If nothing else this exemplifies the maxim that an encapsulated complexity isn't a complexity. Try to remember that all this is going on "behind the scenes" the next time you browse the web or use ftp.

Scenario

What happens when you type
ftp host.entity.domain
to launch an FTP client from a UNIX shell and transfer a file from a remote server?

Assume that the destination address (host.entity.domain) is not on the same Ethernet subnet as the station you launch the client from. Assume also that the DNS is in use on the client.

Here's a picture of the players in this scenario:

[Figure: The players. Client, DNS Server, Routers, Internet, Server]

Steps in the scenario

  1. User enters the ftp command line at the shell prompt.
  2. Shell forks a new process to be the ftp client process.
  3. New process execs the ftp client program.
  4. Convert hostname host.entity.domain to IP address a.b.c.d.
  5. struct hostent *gethostbyname(const char *name);
    
  6. Name server library functions are used to resolve the requested FQDN to an IP address. This step alone may involve the creation and use of many network connections and subsequent transactions, but for this purpose we'll just note that a DNS server somewhere on the network provides the information we need. To get an idea of what this information looks like, try the nslookup command and type in a hostname.
  7. Look up port number for well-known service FTP (in this case 21).
  8. struct servent *getservbyname(const char *name, const char *proto);
    
  9. With a few other library calls the client process has now completed the necessary data structure to create a socket to use to communicate with the server.
  10. int socket(int domain, int type, int protocol);
    

    This system call results in the client's kernel allocating various data structures to keep track of the new socket, and making entries in the process's data structures to give the process access to the socket.

  11. Connect the client's newly allocated socket to the server's pre-existing socket.
  12. int connect(int socket, struct sockaddr *serv_addr, int addrlen);
    

    The kernel first binds an unused port number to the socket, if it isn't already bound. Now the socket has a unique name: client IP address + port number.
     

  13. Next the TCP portion of the kernel (hereafter called "TCP") begins the three-way handshake to establish a TCP connection with the server. This connection will carry the FTP control traffic.
  14. TCP creates a SYN segment which says which port on the server the connection is for, and an initial sequence number to use.
  15. The TCP segment is passed to the IP portion of the kernel (hereafter referred to as "IP"). IP puts a header on the TCP segment creating an IP datagram. The header of course contains the IP address of the server that the TCP segment is bound for.
  16. IP uses the subnet mask for the Ethernet interface to decide whether the destination is local or remote, with respect to the local subnet.
  17. Since the packet is leaving the subnet, IP consults the routing tables and sends it to the appropriate router. Most likely there is a single default router to which packets for all non-local destinations are sent. Suppose the router's IP address is r.x.y.z.
  18. padda% netstat -r
    
    Routing Table:
      Destination           Gateway           Flags  Ref   Use   Interface
    -------------------- -------------------- ----- ----- ------ ---------
    ethernet-9.it.uu.se  padda.it.uu.se        U        3   9004  hme0
    BASE-ADDRESS.MCAST.net padda.it.uu.se        U        3      0  hme0
    default              r1.n.it.uu.se         UG       0  48971  
    localhost            localhost             UH       0 169334  lo0
    
    
    padda% netstat -rn
    
    Routing Table:
      Destination           Gateway           Flags  Ref   Use   Interface
    -------------------- -------------------- ----- ----- ------ ---------
    130.238.9.0          130.238.9.165         U        3   9004  hme0
    224.0.0.0            130.238.9.165         U        3      0  hme0
    default              130.238.9.125         UG       0  48971  
    127.0.0.1            127.0.0.1             UH       0 169334  lo0
    
    
  19. IP now hands the packet to the Ethernet driver to be bundled into a frame and signalled onto the local network. What NIC address is used for this frame? The frame is meant for the router, so the router's interface address is needed.
  20. The Ethernet driver doesn't know about IP addresses (this would violate the idea of a protocol stack), but needs to have the NIC address for the router. The driver asks ARP to map the IP address of the router to its Ethernet address. ARP maps 32-bit IP network layer addresses to 48-bit Ethernet LAN NIC addresses.
  21. The ARP cache is consulted to see if the desired mapping is already known. Since the router would be a common destination, there is a good chance that this is the case. But if it is not, then ARP holds the pending IP packet, generates its own Ethernet frame with a broadcast address, and gives this ARP request, carried in an Ethernet frame, to the Ethernet driver to broadcast and ask everyone on the local subnet, "who has IP address r.x.y.z?".
  22. The Ethernet interface of the router sees this broadcast ARP request and issues an ARP reply. The ARP packet in the received frame is given to ARP by the Ethernet driver. When the ARP reply is received by the client, the client knows the Ethernet address of the router, and this Ethernet<->IP mapping is stored in the ARP cache.
  23. padda% arp -a
    Net to Media Table
    Device   IP Address               Mask      Flags   Phys Addr 
    ------ -------------------- --------------- ----- ---------------
    hme0   r1.n.it.uu.se        255.255.255.255       00:10:29:8d:64:00
    hme0   fenix.it.uu.se       255.255.255.255       08:00:20:89:bd:0f
    hme0   padda.it.uu.se       255.255.255.255 SP    08:00:20:89:14:68
    hme0   BASE-ADDRESS.MCAST.net 240.0.0.0       SM    01:00:5e:00:00:00
    
  24. Now the pending IP packet containing the TCP SYN segment can be re-submitted to the Ethernet driver for transmission. The Ethernet driver now has, from ARP, the address of the destination router's NIC.
  25. The router's Ethernet device receives the Ethernet frame, checks that it is uncorrupted and of valid format and length, unbundles the data from the Ethernet frame, and passes the result (an IP packet) up to the IP layer of the router.
  26. The router consults its routing tables and decides which of its interfaces to send this IP packet out on. This is the routing decision.
  27. The IP packet is passed back down to the appropriate hardware/software datalink device for the outgoing interface. For example, the IP packet may be sent onto a T1 leased line.
  28. After meandering through the Internet, going from router to router (hops), the IP packet eventually arrives at the router which connects the server's Ethernet LAN to the rest of the world.
  29. The router uses ARP to determine the Ethernet address of the server and the IP packet, carried in an Ethernet frame, arrives at the server's hardware Ethernet NIC (a chip or card). This NIC buffers the frame and interrupts the CPU. The CPU does a context switch and jumps to the Ethernet interrupt handler. The Ethernet interrupt handler copies the data from the Ethernet NIC buffer to kernel memory (or perhaps the Ethernet NIC does DMA and then interrupts the CPU).
  30. The Ethernet driver unbundles the IP packet from the Ethernet frame and puts the Ethernet frame data in a series of kernel buffers (mbufs).
  31. The Ethernet driver then checks the type of the Ethernet frame to determine what the contents of the frame are (IP packet, ARP packet, other network protocol packet). The driver sees that this is an IP packet and puts the IP packet on the IP input queue. A software interrupt is then scheduled for the kernel to process the received IP packet. The IP input queue has some finite length (for example, 50 packets). A full queue causes the Ethernet driver to throw the packet away.
  32. The software interrupt generated by the Ethernet driver to tell the kernel that an IP packet has been received causes the IP receive code of the kernel to execute. This code verifies the packet, processes options, reassembles the packet, and finally demultiplexes (unbundles) the TCP segment out of the IP packet data field. This consists of the following steps:
  33. The kernel copies the IP packet from the (possibly many) mbufs it is stored in into a single contiguous memory area to ease verification and processing.
    Verification means checking for damaged or incorrect formats. Since IP is an unreliable protocol, bad packets are simply thrown away.
    Since IP packets may be fragmented by small MTU networks, the kernel potentially needs to do reassembly of the complete, original packet.
  34. A software interrupt is again used to notify the kernel of the arrival of a transport layer segment, in this case a TCP SYN segment.
  35. The TCP code processes the received segment, checking the format, the offset, options, updating the value of the sliding window, etc. If this were a data segment, then the data would be queued on the appropriate socket's incoming buffer. That data would then be available to the application via read().
  36. In this case, since the segment is a SYN segment, there is no data and instead a new socket is allocated, initialized to the CLOSED state, and a SYN/ACK segment is sent in response.
  37. The three-way TCP handshake is completed and the connection is established when the server receives an ACK segment to its SYN/ACK segment.
  38. Now that a TCP connection is set up, the client and server may exchange data segments.
  39. The ftp server forks and hands off processing of this new client to its child process, then returns to listening on well-known port 21. We'll now start calling the child process the server.
  40. In the case of this FTP example, the server notifies the client that it is ready, and the ftp client prompts the user for a login name.
  41. The ftp client then sends a USER message containing the login name to the server and awaits a result message.
  42. The ftp client then sends a PASS message containing the password it obtained from the user.
  43. At this point the control connection for the FTP protocol has been set up. Note that each data transfer done with FTP requires the establishment of a separate data connection.
  44. Eventually the user says "quit" to the ftp client process and the TCP connection is torn down (up to four TCP segments exchanged).
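
Steps 4 through 12 can be strung together in a few lines of C. This is only a sketch of what an ftp client might do, not the code of any real client. The names fill_server_address and ftp_control_connect are made up for this example. It uses gethostbyname() because that is the call shown in step 5, although current POSIX systems provide getaddrinfo() as its replacement.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Build the server's socket address from a 4-byte IPv4 address and a
 * port number that is already in network byte order. */
void fill_server_address(struct sockaddr_in *out, const void *ipv4,
                         in_port_t port_network_order)
{
    assert(out != NULL && ipv4 != NULL);
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;
    out->sin_port = port_network_order;
    memcpy(&out->sin_addr, ipv4, sizeof out->sin_addr);
}

/* Resolve the host, look up the FTP port, create a TCP socket and
 * connect it. Returns a connected descriptor, or -1 on any failure. */
int ftp_control_connect(const char *hostname)
{
    struct hostent *host = gethostbyname(hostname);          /* steps 4-6 */
    if (host == NULL || host->h_addrtype != AF_INET ||
        host->h_addr_list[0] == NULL)
        return -1;

    struct servent *service = getservbyname("ftp", "tcp");   /* steps 7-8 */
    if (service == NULL)
        return -1;

    struct sockaddr_in server;                               /* step 9 */
    fill_server_address(&server, host->h_addr_list[0],
                        (in_port_t)service->s_port);         /* s_port is in network order */

    int fd = socket(AF_INET, SOCK_STREAM, 0);                /* step 10 */
    if (fd < 0)
        return -1;
    if (connect(fd, (struct sockaddr *)&server, sizeof server) < 0) { /* steps 11-12 */
        close(fd);
        return -1;
    }
    return fd;
}
```

Everything from step 13 onward happens inside that one connect() call, which does not return until the handshake has completed or failed.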
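
The local-or-remote decision in steps 16 and 17 comes down to a masked comparison. Below is a minimal sketch with addresses held in host byte order. The IPV4 macro and the function names are invented for this example. The 255.255.255.0 mask in the usage note is an assumption, since the netstat output above does not show the mask.

```c
#include <assert.h>
#include <stdint.h>

/* Build a host-byte-order IPv4 address from its four dotted-quad parts. */
#define IPV4(a, b, c, d) \
    (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
     ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Two addresses are on the same subnet when their network parts match. */
int is_on_local_subnet(uint32_t own_addr, uint32_t dest_addr, uint32_t netmask)
{
    uint32_t host_bits = ~netmask;
    assert((host_bits & (host_bits + 1)) == 0);  /* mask must be contiguous */
    return (own_addr & netmask) == (dest_addr & netmask);
}

/* The IP address the frame is sent to next: the destination itself when
 * it is local, otherwise the default router. */
uint32_t next_hop(uint32_t own_addr, uint32_t dest_addr,
                  uint32_t netmask, uint32_t default_router)
{
    return is_on_local_subnet(own_addr, dest_addr, netmask)
               ? dest_addr
               : default_router;
}
```

For padda at 130.238.9.165 with default router 130.238.9.125, a destination such as 192.0.2.7 yields the router as the next hop. Any other 130.238.9.x address would be delivered directly.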
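
The ARP cache of steps 20 through 22 is, at heart, a small table from IP address to Ethernet address. Here is a sketch of one possible layout. The struct and function names are hypothetical. Entry timeouts, which real ARP implementations use to age out stale mappings, are left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { ARP_CACHE_SLOTS = 8, MAC_BYTES = 6 };

struct arp_entry {
    int      in_use;
    uint32_t ip;               /* IPv4 address, host byte order */
    uint8_t  mac[MAC_BYTES];   /* 48-bit Ethernet address */
};

struct arp_cache {
    struct arp_entry slots[ARP_CACHE_SLOTS];
    size_t next_victim;        /* slot to overwrite when the table is full */
};

/* Step 21: return the cached Ethernet address, or NULL on a miss
 * (a miss is what triggers the broadcast ARP request). */
const uint8_t *arp_lookup(const struct arp_cache *cache, uint32_t ip)
{
    assert(cache != NULL);
    for (size_t i = 0; i < ARP_CACHE_SLOTS; i++)
        if (cache->slots[i].in_use && cache->slots[i].ip == ip)
            return cache->slots[i].mac;
    return NULL;
}

/* Step 22: record the mapping learned from an ARP reply. */
void arp_store(struct arp_cache *cache, uint32_t ip, const uint8_t *mac)
{
    assert(cache != NULL && mac != NULL);
    struct arp_entry *slot = NULL;
    for (size_t i = 0; i < ARP_CACHE_SLOTS; i++) {
        struct arp_entry *e = &cache->slots[i];
        if (e->in_use && e->ip == ip) { slot = e; break; }  /* refresh existing */
        if (!e->in_use && slot == NULL) slot = e;           /* remember first free */
    }
    if (slot == NULL) {                                     /* full: overwrite in turn */
        slot = &cache->slots[cache->next_victim];
        cache->next_victim = (cache->next_victim + 1) % ARP_CACHE_SLOTS;
    }
    slot->in_use = 1;
    slot->ip = ip;
    memcpy(slot->mac, mac, MAC_BYTES);
}
```

Storing 130.238.9.125 with 00:10:29:8d:64:00 reproduces the r1.n.it.uu.se line of the arp -a output in step 23.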
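
Step 31 has two parts: looking at the frame's type field, and putting IP packets on a queue of bounded length. Here is a sketch of both, assuming Ethernet II framing, where bytes 12 and 13 hold the type. The names are made up, and the queue only shows the enqueue side.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    ETH_HEADER_BYTES = 14,
    ETHERTYPE_IPV4   = 0x0800,
    ETHERTYPE_ARP    = 0x0806,
    IP_QUEUE_LIMIT   = 50      /* the example length from step 31 */
};

enum frame_kind { FRAME_IP, FRAME_ARP, FRAME_OTHER };

/* Decide which protocol module should receive the frame's contents. */
enum frame_kind classify_frame(const uint8_t *frame, size_t len)
{
    assert(frame != NULL);
    if (len < ETH_HEADER_BYTES)
        return FRAME_OTHER;
    unsigned type = ((unsigned)frame[12] << 8) | frame[13];
    if (type == ETHERTYPE_IPV4) return FRAME_IP;
    if (type == ETHERTYPE_ARP)  return FRAME_ARP;
    return FRAME_OTHER;
}

/* Hypothetical IP input queue; dequeueing is omitted for brevity. */
struct ip_input_queue {
    const uint8_t *packets[IP_QUEUE_LIMIT];
    size_t count;
    size_t drops;              /* packets thrown away because the queue was full */
};

/* Returns 1 if the packet was queued, 0 if it was dropped. */
int ip_enqueue(struct ip_input_queue *q, const uint8_t *packet)
{
    assert(q != NULL);
    if (q->count == IP_QUEUE_LIMIT) {
        q->drops++;
        return 0;
    }
    q->packets[q->count++] = packet;
    return 1;
}
```

The drop in ip_enqueue is silent. Nothing tells the sender, and it is up to TCP's retransmission to recover.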
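
One concrete piece of the verification in step 33 is the IP header checksum. The sketch below uses the standard Internet checksum: add the header up as 16-bit words, fold any carries back in, and a valid header sums to 0xFFFF. The function name is made up, and a real kernel checks more than this (version, lengths and so on).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the one's-complement sum over the whole header, checksum
 * field included, is 0xFFFF; returns 0 for a damaged header. */
int ip_header_checksum_ok(const uint8_t *header, size_t header_len)
{
    assert(header != NULL);
    if (header_len < 20 || header_len % 2 != 0)
        return 0;

    uint32_t sum = 0;
    for (size_t i = 0; i < header_len; i += 2)
        sum += ((uint32_t)header[i] << 8) | header[i + 1];

    while (sum >> 16)                      /* fold carries back into 16 bits */
        sum = (sum & 0xFFFF) + (sum >> 16);

    return sum == 0xFFFF;
}
```

A packet that fails this test is simply thrown away, as the step says. IP makes no attempt to ask for it again.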
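
The three-way handshake of steps 13, 14, 36 and 37 can be followed as a tiny state machine. This is a toy model, not kernel code. All the names are invented, and it keeps a single endpoint on the server side, whereas step 36 has the kernel allocate a new socket for the connection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum tcp_state { TCP_CLOSED, TCP_LISTEN, TCP_SYN_SENT, TCP_SYN_RCVD, TCP_ESTABLISHED };
enum { FLAG_SYN = 0x02, FLAG_ACK = 0x10 };

struct segment  { uint8_t flags; uint32_t seq; uint32_t ack; };
struct endpoint { enum tcp_state state; uint32_t iss; uint32_t rcv_next; };

/* Client, steps 13-14: send a SYN carrying the initial sequence number. */
struct segment client_open(struct endpoint *c, uint32_t iss)
{
    struct segment syn = { FLAG_SYN, iss, 0 };
    c->state = TCP_SYN_SENT;
    c->iss = iss;
    return syn;
}

/* Server, steps 36-37. Returns 1 when a reply was written to *out. */
int server_on_segment(struct endpoint *s, struct segment in,
                      uint32_t iss, struct segment *out)
{
    assert(out != NULL);
    if (s->state == TCP_LISTEN && in.flags == FLAG_SYN) {
        s->state = TCP_SYN_RCVD;
        s->iss = iss;
        s->rcv_next = in.seq + 1;          /* the SYN uses one sequence number */
        out->flags = FLAG_SYN | FLAG_ACK;
        out->seq = iss;
        out->ack = s->rcv_next;
        return 1;
    }
    if (s->state == TCP_SYN_RCVD && (in.flags & FLAG_ACK) && in.ack == s->iss + 1)
        s->state = TCP_ESTABLISHED;        /* handshake complete */
    return 0;
}

/* Client: answer a valid SYN/ACK with the final ACK. */
int client_on_segment(struct endpoint *c, struct segment in, struct segment *out)
{
    assert(out != NULL);
    if (c->state != TCP_SYN_SENT || in.flags != (FLAG_SYN | FLAG_ACK) ||
        in.ack != c->iss + 1)
        return 0;
    c->state = TCP_ESTABLISHED;
    c->rcv_next = in.seq + 1;
    out->flags = FLAG_ACK;
    out->seq = c->iss + 1;
    out->ack = c->rcv_next;
    return 1;
}
```

If the client picks 1000 and the server picks 5000, the three segments are SYN (seq 1000), SYN/ACK (seq 5000, ack 1001) and ACK (ack 5001). Each acknowledgement number is one more than the other side's initial sequence number.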