Silicon-based forwarding engines today can switch at wire rate 2-3 M
packets/sec - lots of companies do this
10 G (OC192) Ethernet is ~2 years away - limits are laser optics &
standards, not forwarding performance
$5000 lasers - too much per port
No forwarding issues up to 40 Gbps (100 M packets/sec) - limits of electronics.
The big technical challenges (2-5 years) are:
flattening the net
flattening the stack
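As a rough sanity check on the forwarding figures quoted at the top of these notes, the sketch below works out the packet size implied by 40 Gbps at 100 M packets/sec. It assumes packets are sent back to back with no framing overhead, which the notes do not state; the function names are illustrative only.

```python
# Back-of-the-envelope check of the forwarding figures in the notes.
# Assumption (not from the talk): packets are forwarded back to back
# with no framing or inter-packet overhead.

def implied_packet_bytes(link_bps, packets_per_sec):
    """Packet size (bytes) at which packets_per_sec packets fill link_bps."""
    return link_bps / packets_per_sec / 8


def required_packet_rate(link_bps, packet_bytes):
    """Packet rate needed to fill link_bps with packets of packet_bytes."""
    return link_bps / (packet_bytes * 8)


# 40 Gbps at 100 M packets/sec corresponds to small, 50-byte packets.
size = implied_packet_bytes(40e9, 100e6)
print(f"implied packet size: {size:.0f} bytes")

# The same packet size on a 10 Gbps link needs a quarter of that rate.
rate = required_packet_rate(10e9, size)
print(f"10 Gbps with {size:.0f}-byte packets: {rate / 1e6:.0f} M packets/sec")
```

Under that assumption the 100 M packets/sec figure corresponds to roughly minimum-size packets, i.e. the worst case for a forwarding engine.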
Boundaries are where everything happens for the Internet: QoS, authentication, charging, security, etc. You need lots of CPU power in routers to do this.
At the core you don't need much CPU power since the actions/decisions
are much simpler.
Looking 5 years ahead, the router box looks like:
1 or 2 fibers coming in with WDM - lots of power needed
channelized OC48 out to subscribers
IP router - ATM - SONET - WDM
5 years ago; expensive boxes
IP router - SONET - WDM
not possible to shred IP packets into cells at OC192; no one even does this at OC48
this is an example of the telco controlling the WAN
drop the ATM since it doesn't buy much
IP router - WDM raw fiber
SONET mux is 2x as much money as IP router
SONET forced on campuses by the telco, since that's what they speak
this is where we need to go
Cisco could make a cheap box: small chassis, 16 GBE ports, 1 incoming fiber, $30k
but how do you hook up to the WDM of the telco? There needs to be a standard for this.
DPT - Cisco's attempt to replace SONET
Phone system
phone switches are big mainframes: they know the network topology, compute the route, and set up the connection for each call (VC and time slots)
Alternative is packet network
notice that, for the phone system:
no smarts in the network, all in the controller; the switches are very dumb
the controller is very expensive
1 billion DS0s in OC192 - the switch controller knows them all for the network it controls
scaling is: trunk speed * number of trunks (i.e. bad)
system can't adapt: only point of control is at connection setup
large setup cost means you have to have lengthy conversations to amortize the cost
not good for simple 1 packet exchanges (e.g. web, POS)
services must be constrained (e.g. only DS0 calls) since it is NP-hard to mix different services (bin packing problem)
system is very fragile - call state is spread along the path in the switches; lose one you lose the whole call
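The amortization point in the list above can be made concrete with a small calculation. This is only a sketch: the 1-second setup time and the two transfer durations are hypothetical numbers chosen for illustration, not figures from the talk.

```python
# Fraction of a connection's lifetime spent on useful data when every
# exchange must first pay a fixed setup cost.
# All numbers below are hypothetical, chosen only to show the shape of
# the trade-off.

def circuit_efficiency(setup_s, transfer_s):
    """Useful fraction of total time: transfer / (setup + transfer)."""
    return transfer_s / (setup_s + transfer_s)


SETUP_S = 1.0  # assumed call-setup time in seconds

examples = [
    ("3-minute voice call", 180.0),
    ("single-packet exchange (10 ms)", 0.010),
]

for label, transfer_s in examples:
    eff = circuit_efficiency(SETUP_S, transfer_s)
    print(f"{label}: {eff:.1%} of the time carries data")
```

With these assumed numbers the long call amortizes the setup almost completely, while the single-packet exchange spends nearly all of its time on setup, which is the case the notes flag for web and POS traffic.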
picture: cloud versus string, routers instead of switches
intelligence and work is distributed throughout the network
scaling is: trunk speed (networks can be bigger and cheaper)
architecturally agnostic: easy to incorporate strings as part of the cloud - not true in the reverse (hard to put IP clouds into ATM strings)
can adapt at any time scale consistent with the speed of light - data is self-describing, intelligence is distributed
no setup to amortize - supports single packet transactions
no constraint on services - mix and match different speeds of service
the best part: the bigger the network gets, the more reliable it is
Telcos do a great marketing con job on "five nines" reliability. They need super-high component reliability to get their system to work at all
Graph
system reliability versus network diameter (hops)
IP routers aren't particularly reliable; they are cheap. They could certainly be better. The market determines this: cost has been more important than reliability. Now there is demand for telco reliability as voice moves to IP, so IP routers will be improved. The fact that the net works at all, given the relative unreliability of the components, hints at the underlying mathematical structural advantage of clouds versus strings.
IP network - gradually rises from 0.9 with diameter
telco - decreases from 0.9 in a slow curve as the networks get bigger
This relationship forces the telcos to work very hard to get reliability - the switches must be fantastic
IP networks - routers aren't as important, the very structure gives reliability
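The two curves described above can be approximated with the two textbook extremes of reliability math. This is a sketch under strong assumptions that are not in the notes: components fail independently, a "string" is a pure series system, and a "cloud" is idealized as n fully independent alternatives (real networks sit somewhere in between). The 0.9 component reliability matches the starting point of both curves; the function names are illustrative.

```python
# Series ("string") versus idealized parallel ("cloud") reliability.
# Assumptions: independent failures, identical component reliability p.

def string_reliability(p, n):
    """Series system: works only if all n components work."""
    return p ** n


def cloud_reliability(p, n):
    """Idealized parallel system: fails only if all n alternatives fail."""
    return 1 - (1 - p) ** n


def required_component_reliability(target, n):
    """Per-component reliability a string of n hops needs to hit target."""
    return target ** (1 / n)


P = 0.9
for n in (1, 2, 5, 10, 20):
    s = string_reliability(P, n)
    c = cloud_reliability(P, n)
    print(f"n={n:2d}  string={s:.4f}  cloud={c:.6f}")

# What "five nines" end to end demands of each hop in a 10-hop string.
need = required_component_reliability(0.99999, 10)
print(f"per-hop reliability needed for five nines over 10 hops: {need:.7f}")
```

Both models start at 0.9 for n = 1; the string curve then falls with size while the idealized cloud curve rises, and the last line shows why each switch in a string has to be better than the end-to-end target.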
Theory of random graphs
A randomly connected graph whose degree is slightly above the threshold needed for connectivity (just epsilon above) is almost certainly connected. If you go further above that threshold, there is almost no failure that will disconnect the graph. This was the original motivation for the net, even though the math wasn't known at the time.
Basic reason
Strings fail if any single part fails
Clouds fail only when everything fails
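The random-graph claim above can be checked empirically. The sketch below assumes the Erdos-Renyi G(n, p) model, which the notes do not name, and sets the edge probability to p = c * ln(n) / n; the standard result is that such graphs are almost surely disconnected for c < 1 and almost surely connected for c > 1 as n grows. All function names and parameters here are illustrative.

```python
import math
import random

# Empirical look at the connectivity threshold of random graphs.
# Assumption: Erdos-Renyi G(n, p) with p = c * ln(n) / n.


def is_connected(n, edges):
    """Union-find check that n vertices form a single component."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    components = n
    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            components -= 1
    return components == 1


def random_graph(n, p, rng):
    """Include each of the n*(n-1)/2 possible edges with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < p]


def connected_fraction(n, c, trials, seed=0):
    """Fraction of sampled graphs that are connected at p = c*ln(n)/n."""
    rng = random.Random(seed)
    p = c * math.log(n) / n
    hits = sum(is_connected(n, random_graph(n, p, rng))
               for _ in range(trials))
    return hits / trials


for c in (0.5, 1.0, 1.5, 2.0):
    frac = connected_fraction(200, c, trials=30)
    print(f"c={c}: connected in {frac:.0%} of trials")
```

Only a modest margin above the threshold is needed before nearly every sampled graph is connected, which is the structural property the notes attribute to clouds.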
Sender and receiver act like mirrors - bouncing packets/acks back and forth. The network is highly non-linear. Two mirrors at the ends of a non-linear substrate make a laser. What arises is rich temporal structure from chaos. So it isn't surprising that telcos don't see this structure: they don't have mirrors. Routers remember what is done, so there is an iteration, further promoting ordered behavior.
The only lab to examine this behavior is the net itself. Lots of data must be present to see the structure, to have the behavior emerge.
In the US: the testbeds have acceptable use policies that don't let you experiment on backbones. So no testing shows up the important behavior.
We don't know why the net works as well as it does; no one knows the general principles.
US academic research is focused on producing novelty, not understanding.
US corporate research is focused on producing products and market differentiators.
Component research is what we do today. System, control, understanding is the research we need to do.
VJ: True, but this is not an artifact of the technical aspects of the system; rather it is political, economic, etc. Bad policy on good technology. A surplus of bandwidth might improve this situation as people come to value richer interconnection, rather than jealously guarding bandwidth.
Don't firewalls and netboxes (the net of today) break the picture of the net you painted?
VJ: The perception is that this is the only way to do it today. Moving toward universal connectivity is important, more so than doing it with the best possible architecture. Peer-to-peer is much better, but client-server is the way growth is taking place now.
What about routing reliability?
VJ: Wavefront routing algorithms don't converge as fast, but you don't lose as much data when a link fails. The other view is that the ends are responsible for reliability and the routers just deal with their local issues. End systems should try to improve their reliability knowing that the routers are acting only locally. There is a family of algorithms involving multicast to do this. This is driven by telephony, since dropping voice packets is bad.
My experience is that the net doesn't work, and the telco net does work.
VJ: Structurally the net is capable of greater reliability. The bigger
it gets, the more reliable it is. The telco net is only reliable because
the pieces are very reliable. This is from the combinatoric math underlying
the two types of networks (strings versus clouds). The unreliability of
the net today is driven by economics, and the fact that it works at all
is indicative of the underlying potential for greater reliability.