Putting it all together: TCP - IP - Ethernet
References
TCP/IP Illustrated, Volume 1, The Protocols, W. Richard Stevens,
Addison-Wesley, 1994.
TCP/IP Illustrated, Volume 2, The Implementation, Gary R. Wright,
W. Richard Stevens, Addison-Wesley, 1995.
Boundaries
How do the logical, simple, clean protocol stacks actually get implemented
in real computers? Not quite so logically, simply, and cleanly. There are
important boundaries in most implementations:
- Application - OS
  - application protocols and application processes are outside
    of the OS kernel
  - they deal with internetwork names and addresses
  - for TCP/IP, this is between the app layer and the transport layer
- Network - Physical addresses
  - network names uniquely identify hosts in the world
  - physical addresses identify a particular wire going into a particular
    interface card
  - for TCP/IP, this boundary is found between the IP and the
    Host-to-network layer
Problems with the strictly layered model
A tenet of the strictly layered model is that layer N doesn't
know anything about layer N-1 other than the service interface.
Efficiency can suffer if this is maintained in implementation. So, for
example, the transport layer (TCP) should divide the stream of data from
the app into segments that fit optimally into the frames that will be used
to transmit them. But knowing the size of the data link frame is none of
the transport layer's business. Even worse, the transport layer must really
know the size of the headers used at the IP and data link layers, so it
won't be sending segments that are just a bit too big.
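To make the layering violation concrete, here is a minimal sketch of the arithmetic TCP has to do. The function name `tcp_payload_for_mtu` and the constants are illustrative only. The header sizes assume IPv4 and TCP headers without options (20 bytes each).

```c
#include <stddef.h>

/* Assumed header sizes: IPv4 and TCP headers without options. */
enum { IPV4_HEADER_LEN = 20, TCP_HEADER_LEN = 20 };

/*
 * Largest TCP payload (in bytes) that fits in one link-layer frame whose
 * payload limit (MTU) is link_mtu. Returns 0 if the MTU cannot even hold
 * the two headers. The name is hypothetical, not a kernel function.
 */
size_t tcp_payload_for_mtu(size_t link_mtu)
{
    size_t headers = IPV4_HEADER_LEN + TCP_HEADER_LEN;
    return link_mtu > headers ? link_mtu - headers : 0;
}
```

For the usual Ethernet MTU of 1500 bytes this gives 1460 bytes of data per segment. Note that the transport layer had to be told a data link parameter to compute it.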
Another problem arises with the job of de-multiplexing segments (say
for TCP or UDP) from network layer packets (say IP) and data link layer
frames (say Ethernet). What if the transport layer buffer an app has for
incoming data is full? Then the kernel has nowhere to store the recently
arrived and de-muxed segment. But look how much work was done by the kernel
in the lower layers (physical, data link, network, transport) only to throw
the data out since the buffer is full. A situation where the kernel invests
significant CPU resources only to throw away the result is prone to thrashing.
A solution would be to recognize early on in the de-multiplexing process
that a particular frame of data is bound for a process where the transport
layer buffer is full. Then the data can be immediately (aggressively)
discarded, and wasted CPU time is minimized.
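One way to picture such an early discard is a check that peeks through all three headers as soon as the frame arrives. The sketch below is only an illustration: `struct rx_socket` and `should_discard_early` are hypothetical names. It assumes IPv4 over Ethernet and matches on the destination port alone, where a real kernel would match the full address and port pair of the connection.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-socket bookkeeping: local port and free receive space. */
struct rx_socket {
    uint16_t local_port;
    size_t   free_bytes;
};

enum { ETH_HEADER_LEN = 14, ETHERTYPE_IPV4 = 0x0800, IP_PROTO_TCP = 6 };

/*
 * Peek into a received Ethernet frame and return 1 if it carries a TCP
 * segment for a socket whose receive buffer is already full, so the frame
 * can be dropped before any further protocol processing. Returns 0 when
 * the frame should take the normal path (anything that is not
 * unfragmented IPv4/TCP, or a socket that still has room).
 */
int should_discard_early(const uint8_t *frame, size_t len,
                         const struct rx_socket *socks, size_t nsocks)
{
    if (len < ETH_HEADER_LEN + 20)
        return 0;
    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ethertype != ETHERTYPE_IPV4)
        return 0;

    const uint8_t *ip = frame + ETH_HEADER_LEN;
    size_t ip_hlen = (size_t)(ip[0] & 0x0f) * 4;  /* IHL is in 32-bit words */
    if ((ip[0] >> 4) != 4 || ip_hlen < 20 || ip[9] != IP_PROTO_TCP)
        return 0;
    if ((((ip[6] & 0x1f) << 8) | ip[7]) != 0)
        return 0;              /* later fragment: no TCP header to peek at */
    if (len < ETH_HEADER_LEN + ip_hlen + 4)
        return 0;              /* TCP port fields not present */

    const uint8_t *tcp = ip + ip_hlen;
    uint16_t dst_port = (uint16_t)((tcp[2] << 8) | tcp[3]);

    for (size_t i = 0; i < nsocks; i++)
        if (socks[i].local_port == dst_port)
            return socks[i].free_bytes == 0;
    return 0;
}
```

The data link code here reads IP and TCP header fields directly, which is exactly the kind of cross-layer knowledge the strict model forbids.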
Efficiency sometimes means sacrificing the purity of the layered model.
What this is
I wrote this detailed example to try to "put it all together" for students
who have learned networking in the traditional approach, from the bottom
up, and have studied up to and including the transport layer. The example
uses TCP/IP and Ethernet. If nothing else this exemplifies the maxim that
an encapsulated complexity isn't a complexity. Try to remember that all
this is going on "behind the scenes" the next time you browse the web or
use ftp.
Scenario
What happens when you type
ftp host.entity.domain
to launch an FTP client from a UNIX shell and transfer a file from a remote
server?
Assume that the destination address (host.entity.domain) is not
on the same Ethernet subnet as the station you launch the client from.
Assume also that the DNS is in use on the client.
The players in this scenario are listed below.
The players
- Client
  - shell program and process - user interaction (~10,000 lines of code)
  - ftp client program - GUI or CLI
  - ftp client process
  - networking library - handy functions used by the ftp client program
  - client kernel TCP protocol stack code (~4500 lines of C)
  - client kernel IP protocol stack code (~ lines of C)
  - Ethernet device driver (~1000 lines of C)
- DNS Server
  - assumed available on the client's network; maps FQDNs to IP addresses
- Routers
  - join Ethernet LANs to the Internet
- Internet
  - connects LANs via a potentially complex internetwork of routers and
    media
- Server
  - ftp server program - concurrent server forks processes as needed, one
    per client
  - ftp server process
  - networking library - used by ftp server code
  - server kernel TCP/IP protocol stack code
  - Ethernet device driver
Steps in the scenario
- User enters the ftp command line at the shell prompt.
- Shell forks a new process to be the ftp client process.
- New process execs the ftp client program.
- Convert hostname host.entity.domain to IP address a.b.c.d.
    struct hostent *gethostbyname(const char *name);
- Name server library functions are used to resolve the requested FQDN to
  an IP address. This step alone may involve the creation and use of many
  network connections and subsequent transactions, but for this purpose
  we'll just note that a DNS server somewhere on the network provides the
  information we need. To get an idea of what this information looks like,
  try the nslookup command and type in a hostname.
- Look up the port number for the well-known service FTP (in this case 21).
    struct servent *getservbyname(const char *name, const char *proto);
- With a few other library calls, the client process has now completed the
  necessary data structures to create a socket to use to communicate with
  the server.
    int socket(int domain, int type, int protocol);
  This system call results in the client's kernel allocating various data
  structures to keep track of the new socket, and making entries in the
  process's data structures to give the process access to the socket.
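The main data structure the client fills in before connecting is the server's socket address. A minimal sketch for IPv4 might look like the following. The helper name `fill_server_address` is made up for illustration, and it starts from a dotted-quad string rather than from the result of the name lookup.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Fill *addr with an IPv4 address given as dotted-quad text and a port
 * given in host byte order. Returns 0 on success, -1 if the text is not
 * a valid IPv4 address. The helper name is hypothetical.
 */
int fill_server_address(struct sockaddr_in *addr, const char *dotted_quad,
                        unsigned short port)
{
    memset(addr, 0, sizeof *addr);
    addr->sin_family = AF_INET;
    addr->sin_port = htons(port);          /* stored in network byte order */
    if (inet_pton(AF_INET, dotted_quad, &addr->sin_addr) != 1)
        return -1;
    return 0;
}
```

The filled-in structure is what later gets passed (cast to `struct sockaddr *`) to connect().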
- Connect the client's newly allocated socket to the server's pre-existing
  socket.
    int connect(int socket, const struct sockaddr *address,
                socklen_t address_len);
  The kernel first binds an unused port number to the socket, if it isn't
  already bound. Now the socket has a unique name: client IP address + port
  number.
- Next the TCP portion of the kernel (hereafter called "TCP") begins the
  three-way handshake to establish a TCP connection with the server.
- TCP creates a SYN segment that names the port on the server the
  connection is for and carries the initial sequence number the client
  will use.
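To see what such a segment looks like on the wire, here is a sketch that lays out the 20-byte TCP header of a SYN. The function name `build_syn_header` and its parameters are illustrative, TCP options (which a real SYN normally carries) are left out, and the checksum field is left at zero.

```c
#include <stdint.h>
#include <string.h>

enum { TCP_HDR_BYTES = 20, TCP_FLAG_SYN = 0x02 };

/*
 * Write a 20-byte TCP header for a SYN segment into out[] in network
 * byte order. The checksum is left as zero here; a real stack computes
 * it over a pseudo-header, the TCP header, and any data. The helper
 * name and parameters are hypothetical.
 */
void build_syn_header(uint8_t out[TCP_HDR_BYTES], uint16_t src_port,
                      uint16_t dst_port, uint32_t initial_seq,
                      uint16_t window)
{
    memset(out, 0, TCP_HDR_BYTES);
    out[0] = (uint8_t)(src_port >> 8);
    out[1] = (uint8_t)src_port;
    out[2] = (uint8_t)(dst_port >> 8);
    out[3] = (uint8_t)dst_port;
    out[4] = (uint8_t)(initial_seq >> 24);
    out[5] = (uint8_t)(initial_seq >> 16);
    out[6] = (uint8_t)(initial_seq >> 8);
    out[7] = (uint8_t)initial_seq;
    /* bytes 8..11: acknowledgment number, zero because ACK is not set */
    out[12] = (TCP_HDR_BYTES / 4) << 4;    /* data offset in 32-bit words */
    out[13] = TCP_FLAG_SYN;
    out[14] = (uint8_t)(window >> 8);
    out[15] = (uint8_t)window;
    /* bytes 16..19: checksum and urgent pointer, left as zero */
}
```

For the FTP control connection, `dst_port` would be 21 and `src_port` the unused port the kernel bound to the client socket.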
- The TCP segment is passed to the IP portion of the kernel (hereafter
  referred to as "IP"). IP puts a header on the TCP segment, creating an IP
  datagram. The header of course contains the IP address of the server that
  the TCP segment is bound for.
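Besides the addresses, one of the fields IP fills in is a checksum over the header itself. A small sketch of that computation, the Internet checksum described in RFC 1071, follows. The function name is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum (RFC 1071) over an IPv4 header of len bytes (len is
 * even for a valid header). The checksum field, bytes 10 and 11, must be
 * zero in hdr[] while computing. The result is returned in host order
 * and is stored big-endian into bytes 10 and 11 of the header.
 */
uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

The receiver runs the same sum over the header with the checksum field included. A header that arrived intact yields zero.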
- IP uses the subnet mask for the Ethernet interface to decide whether the
  destination is local or remote, with respect to the local subnet.
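The local-or-remote test is just a masked comparison. A minimal sketch, with hypothetical helper names and addresses held as 32-bit values in host byte order:

```c
#include <stdint.h>

/* Pack four dotted-quad octets into a 32-bit value in host byte order. */
uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}

/* Return 1 if dst lies on the same subnet as the interface address. */
int is_on_local_subnet(uint32_t iface_addr, uint32_t netmask, uint32_t dst)
{
    return (iface_addr & netmask) == (dst & netmask);
}
```

For example, with interface address 130.238.9.165 and an assumed mask of 255.255.255.0, the destination 130.238.9.125 is local while 130.238.10.1 is remote.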
- Since the packet is leaving the subnet, it will be sent to the appropriate
  router after consulting the routing tables. Most likely there is a single
  default router to which packets for all non-local destinations are sent.
  Suppose the router's IP address is r.x.y.z.
    padda% netstat -r
    Routing Table:
    Destination            Gateway              Flags Ref   Use    Interface
    ---------------------- -------------------- ----- ----- ------ ---------
    ethernet-9.it.uu.se    padda.it.uu.se       U     3     9004   hme0
    BASE-ADDRESS.MCAST.net padda.it.uu.se       U     3     0      hme0
    default                r1.n.it.uu.se        UG    0     48971
    localhost              localhost            UH    0     169334 lo0
    padda% netstat -rn
    Routing Table:
    Destination            Gateway              Flags Ref   Use    Interface
    ---------------------- -------------------- ----- ----- ------ ---------
    130.238.9.0            130.238.9.165        U     3     9004   hme0
    224.0.0.0              130.238.9.165        U     3     0      hme0
    default                130.238.9.125        UG    0     48971
    127.0.0.1              127.0.0.1            UH    0     169334 lo0
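A routing table lookup like the one behind this output can be sketched as a search for the most specific matching entry. The structure and function names below are hypothetical. Note also that the netstat output does not show netmasks, so any mask used with it (for example 255.255.255.0 for 130.238.9.0) is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

struct route {
    uint32_t dest;      /* network address, host byte order */
    uint32_t mask;      /* netmask; 0 for the default route */
    uint32_t gateway;   /* next hop, host byte order */
};

/*
 * Return the matching route with the longest netmask, or NULL if
 * nothing matches. A default route (dest 0, mask 0) matches everything.
 * Comparing masks numerically assumes contiguous netmasks.
 */
const struct route *route_lookup(const struct route *table, size_t n,
                                 uint32_t dst)
{
    const struct route *best = NULL;
    for (size_t i = 0; i < n; i++) {
        if ((dst & table[i].mask) != table[i].dest)
            continue;
        if (best == NULL || table[i].mask > best->mask)
            best = &table[i];
    }
    return best;
}
```

With the entries above, any destination outside 130.238.9.0 and the multicast range falls through to the default route and is handed to the gateway 130.238.9.125.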
- IP now hands the packet to the Ethernet driver to be bundled into a frame
  and signalled onto the local network. What NIC address is used for this
  frame? The frame is meant for the router, so the router's interface
  address is needed.
- The Ethernet driver doesn't know about IP addresses (this would violate
  the idea of a protocol stack), but needs to have the NIC address for the
  router. The driver asks ARP to map the IP address of the router to its
  Ethernet address. ARP maps 32-bit IP network layer addresses to 48-bit
  Ethernet LAN NIC addresses.
- The ARP cache is consulted, to see if the desired mapping is already
  known. Since the router would be a common destination, there is a good
  chance that this is the case. But if it is not, then ARP holds on to the
  pending IP packet, builds an ARP request of its own, and gives it to the
  Ethernet driver to send in a frame with the broadcast address, asking
  everyone on the local subnet, "who has IP address r.x.y.z?".
- The Ethernet interface of the router sees this broadcast ARP request and
  issues an ARP reply. On the client, the Ethernet driver gives the ARP
  packet in the received frame to ARP. The client now knows the Ethernet
  address of the router, and this Ethernet<->IP mapping is stored in the
  ARP cache.
    padda% arp -a
    Net to Media Table
    Device IP Address             Mask            Flags Phys Addr
    ------ ---------------------- --------------- ----- -----------------
    hme0   r1.n.it.uu.se          255.255.255.255       00:10:29:8d:64:00
    hme0   fenix.it.uu.se         255.255.255.255       08:00:20:89:bd:0f
    hme0   padda.it.uu.se         255.255.255.255 SP    08:00:20:89:14:68
    hme0   BASE-ADDRESS.MCAST.net 240.0.0.0       SM    01:00:5e:00:00:00
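The cache behind a table like this can be pictured as a small array of IP-to-Ethernet mappings. The sketch below is a simplification under stated assumptions: the names and the fixed size of 8 entries are made up, and entries never time out, which a real ARP cache must handle.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { ARP_CACHE_SIZE = 8, ETH_ADDR_BYTES = 6 };

struct arp_entry {
    int      in_use;
    uint32_t ip;                      /* IPv4 address, host byte order */
    uint8_t  mac[ETH_ADDR_BYTES];     /* 48-bit Ethernet address */
};

struct arp_cache {
    struct arp_entry entries[ARP_CACHE_SIZE];
};

/* Return the cached Ethernet address for ip, or NULL on a cache miss. */
const uint8_t *arp_lookup(const struct arp_cache *c, uint32_t ip)
{
    for (size_t i = 0; i < ARP_CACHE_SIZE; i++)
        if (c->entries[i].in_use && c->entries[i].ip == ip)
            return c->entries[i].mac;
    return NULL;
}

/*
 * Record a mapping learned from an ARP reply. Updates an existing entry
 * for ip, or else takes the first free slot. Returns 0 on success, -1 if
 * the cache is full.
 */
int arp_insert(struct arp_cache *c, uint32_t ip, const uint8_t *mac)
{
    struct arp_entry *slot = NULL;
    for (size_t i = 0; i < ARP_CACHE_SIZE; i++) {
        struct arp_entry *e = &c->entries[i];
        if (e->in_use && e->ip == ip) {
            slot = e;
            break;
        }
        if (!e->in_use && slot == NULL)
            slot = e;
    }
    if (slot == NULL)
        return -1;
    slot->in_use = 1;
    slot->ip = ip;
    memcpy(slot->mac, mac, ETH_ADDR_BYTES);
    return 0;
}
```

A miss in `arp_lookup` is the case where the pending IP packet has to wait for an ARP request and reply. A hit lets the frame go out immediately.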
- Now the pending IP packet containing the TCP SYN segment can be
  re-submitted to the Ethernet driver for transmission. The Ethernet driver
  now has the NIC address of the destination router from ARP.
- The router's Ethernet device receives the Ethernet frame, checks that it
  is uncorrupted and of valid format and length, unbundles the data from
  the Ethernet frame, and passes the result (an IP packet) up to the IP
  layer of the router.
- The router consults its routing tables and decides which of its
  interfaces to send this IP packet out on. This is the routing decision.
- The IP packet is passed back down to the appropriate hardware/software
  datalink device for the outgoing interface. For example, the IP packet
  may be sent onto a T1 leased line.
- After meandering through the Internet, going from router to router (hops),
  the IP packet eventually arrives at the router which connects the server's
  Ethernet LAN to the rest of the world.
- The router uses ARP to determine the Ethernet address of the server, and
  the IP packet, carried in an Ethernet frame, arrives at the server's
  hardware Ethernet NIC (a chip or card). This NIC buffers the frame and
  interrupts the CPU. The CPU does a context switch and jumps to the
  Ethernet interrupt handler. The Ethernet interrupt handler copies the
  data from the Ethernet NIC buffer to kernel memory (or perhaps the
  Ethernet NIC does DMA and then interrupts the CPU).
- The Ethernet driver unbundles the IP packet from the Ethernet frame and
  puts the Ethernet frame data in a series of kernel buffers (mbufs).
- The Ethernet driver then checks the type of the Ethernet frame to
  determine what the contents of the frame are (IP packet, ARP packet,
  other network protocol packet). The driver sees that this is an IP packet
  and puts the IP packet on the IP input queue. A software interrupt is
  then scheduled for the kernel to process the received IP packet. The IP
  input queue has some finite length (for example, 50 packets). A full
  queue causes the Ethernet driver to throw the packet away.
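The finite IP input queue can be sketched as a fixed-size ring of packet pointers that counts what it has to drop. The names here are hypothetical, and the limit of 50 is just the example figure from the text.

```c
#include <stddef.h>

enum { IP_INPUT_QUEUE_MAX = 50 };   /* example limit from the text */

struct ip_input_queue {
    const void *packets[IP_INPUT_QUEUE_MAX];
    size_t head;      /* index of the oldest packet */
    size_t count;     /* packets currently queued */
    size_t drops;     /* packets discarded because the queue was full */
};

/* Enqueue a packet; returns 0 on success, -1 if it was dropped. */
int ipq_enqueue(struct ip_input_queue *q, const void *pkt)
{
    if (q->count == IP_INPUT_QUEUE_MAX) {
        q->drops++;
        return -1;
    }
    q->packets[(q->head + q->count) % IP_INPUT_QUEUE_MAX] = pkt;
    q->count++;
    return 0;
}

/* Dequeue the oldest packet, or return NULL if the queue is empty. */
const void *ipq_dequeue(struct ip_input_queue *q)
{
    if (q->count == 0)
        return NULL;
    const void *pkt = q->packets[q->head];
    q->head = (q->head + 1) % IP_INPUT_QUEUE_MAX;
    q->count--;
    return pkt;
}
```

In this picture the driver calls `ipq_enqueue` from the interrupt handler, and the IP receive code drains the queue with `ipq_dequeue` when the software interrupt runs.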
- The software interrupt generated by the Ethernet driver to tell the
  kernel that an IP packet has been received causes the IP receive code of
  the kernel to execute. This code verifies the packet, processes options,
  reassembles the packet, and finally demultiplexes (unbundles) the TCP
  segment out of the IP packet data field. This consists of the following
  steps:
  - The kernel copies the IP packet from the (possibly many) mbufs it is
    stored in into a single contiguous memory area to ease verification and
    processing.
  - Verification means checking for damaged or incorrectly formatted
    packets. Since IP is an unreliable protocol, bad packets are simply
    thrown away.
  - Since IP packets may be fragmented by small-MTU networks, the kernel
    may need to reassemble the complete, original packet.
- A software interrupt is again used to notify the kernel of the arrival of
  a transport layer segment, in this case a TCP SYN segment.
- The TCP code processes the received segment, checking the format, the
  offset, options, updating the value of the sliding window, etc. If this
  were a data segment, then the data would be queued on the appropriate
  socket's incoming buffer. That data would then be available to the
  application via read().
- In this case, since the segment is a SYN segment, there is no data.
  Instead, a new socket is allocated for the connection and a SYN/ACK
  segment is sent in response, leaving the new connection in the SYN_RCVD
  state.
- The three-way TCP handshake is completed and the connection is established
  when the server receives an ACK segment to its SYN/ACK segment.
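The handshake can be summarized as a few transitions of the TCP state machine. The sketch below covers only connection establishment, with made-up type and function names. Events that are not valid in a state simply leave the state unchanged here, where real TCP has more elaborate rules.

```c
/* Connection-establishment states, named after the TCP state diagram. */
enum hs_state { HS_CLOSED, HS_LISTEN, HS_SYN_SENT, HS_SYN_RCVD,
                HS_ESTABLISHED };

enum hs_event { EV_ACTIVE_OPEN,     /* application calls connect() */
                EV_RECV_SYN, EV_RECV_SYN_ACK, EV_RECV_ACK };

/* Next state for the handshake part of the TCP state machine. */
enum hs_state tcp_handshake_step(enum hs_state s, enum hs_event ev)
{
    switch (s) {
    case HS_CLOSED:
        if (ev == EV_ACTIVE_OPEN)
            return HS_SYN_SENT;        /* client sends SYN */
        break;
    case HS_LISTEN:
        if (ev == EV_RECV_SYN)
            return HS_SYN_RCVD;        /* server sends SYN/ACK */
        break;
    case HS_SYN_SENT:
        if (ev == EV_RECV_SYN_ACK)
            return HS_ESTABLISHED;     /* client sends ACK */
        break;
    case HS_SYN_RCVD:
        if (ev == EV_RECV_ACK)
            return HS_ESTABLISHED;     /* handshake complete */
        break;
    case HS_ESTABLISHED:
        break;
    }
    return s;
}
```

The client walks CLOSED, SYN_SENT, ESTABLISHED, while the server side walks LISTEN, SYN_RCVD, ESTABLISHED, one transition per segment of the handshake.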
- Now that a TCP connection is set up, the client and server may exchange
  data segments.
- The ftp server forks and hands off processing of this new client to its
  child process, then returns to listening on well-known port 21. We'll now
  start calling the child process the server.
- In the case of this FTP example, the server notifies the client that it
  is ready, and the ftp client prompts the user for a login name.
- The ftp client then sends a USER message containing the login name to the
  server and awaits a result message.
- The ftp client then sends a PASS message containing the password it
  obtained from the user.
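On the wire, these messages are short text lines on the control connection: a command word, a space, the argument, and a carriage return plus line feed. A sketch of a formatter follows. The function name is hypothetical, and actually sending the bytes with write() on the connected socket is left out.

```c
#include <stdio.h>

/*
 * Format an FTP control-connection command such as "USER alice\r\n"
 * into buf. Returns the number of bytes written (not counting the
 * terminating NUL), or -1 if the result does not fit in buf.
 */
int ftp_format_command(char *buf, size_t buflen, const char *verb,
                       const char *argument)
{
    int n = snprintf(buf, buflen, "%s %s\r\n", verb, argument);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    return n;
}
```

Calling it with "USER" and then "PASS" produces the two lines the client sends before the server accepts the login.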
- At this point the control connection for the FTP protocol has been set
  up. Note that each data transfer done with FTP requires the establishment
  of a separate data connection.
- Eventually the user says "quit" to the ftp client process and the TCP
  connection is torn down (up to four TCP segments exchanged).