18   Mininet

It is certainly possible to run network experiments using a set of interconnected virtual machines. However, even pared-down virtual machines consume sufficient resources that it is hard to create a network of more than a handful of nodes. Mininet is a system that supports the creation of lightweight virtual nodes sometimes called containers, or, more accurately, network namespaces. These containers consume sufficiently few resources that networks of over a thousand nodes have been created, running on a single laptop. While Mininet was originally developed as a testbed for software-defined networking (2.7   Software-Defined Networking), it works just as well for demonstrations and experiments involving traditional networking.

A Mininet container is a process (or group of processes) that no longer has access to all the host system’s “native” network interfaces, much as a process that has executed the chroot() system call no longer has access to the full filesystem. Mininet containers then are assigned virtual Ethernet interfaces (see the ip-link man page entries for veth), which are connected to other containers through virtual Ethernet links. Any process started within a container inherits the container’s view of network interfaces.

For efficiency, Mininet containers all share the same filesystem by default. This makes setup simple, but sometimes causes problems with applications that expect individualized configuration files in specified locations. Mininet containers can be configured with different filesystem views, though we will not do this here.

Mininet is a form of network emulation, as opposed to simulation. An important advantage of emulation is that all network software, at any layer, is simply run “as is”; in a simulator environment, applications and protocol implementations need to be ported to run within the simulator before they can be used. A drawback of emulation is that as the network gets large and complex the emulation may slow down. In particular, it is not possible to emulate link speeds faster than the underlying hardware can support.

The Mininet group maintains extensive documentation; three useful starting places are the Overview, the Introduction and the FAQ.

The goal of this chapter is to present a series of Mininet examples. Most examples are in the form of a self-contained Python2 file (Mininet does not at this time support Python3). Each Mininet Python2 file configures the network and then starts up the Mininet command-line interface (which is necessary to start commands on the various node containers). The use of self-contained Python files arguably makes the configurations easier to edit, and avoids the complex command-line arguments of many standard Mininet examples.

We will occasionally need supplemental programs as well. When possible these supplemental programs will be written, perhaps confusingly, in Python3. Python2 files are run with the python command, while Python3’s command is python3. Alternatively, given that all these programs are running under linux, one can make all Python files executable and be sure that the first line is either #!/usr/bin/python or #!/usr/bin/python3 as appropriate.

18.1   Installing Mininet

Mininet runs only under the linux operating system. Windows and Mac users can, however, easily run Mininet in a single linux virtual machine. Even linux users may wish to do this, as running Mininet has a nontrivial potential to affect normal operation (a virtual-switch process started by Mininet has, for example, interfered with the suspend feature on the author’s laptop).

The Mininet group maintains a virtual machine with a current Mininet installation at their downloads site. The download file is actually a .zip file, which unzips to a modest .ovf file defining the specifications of the virtual machine and a much larger (~2 GB) .vmdk file representing the virtual disk image. (Some unzip versions have trouble with unzipping very large files; if that happens, search online for an alternative version.)

There are several choices for virtual-machine software; two options that are well supported and free (as of 2017) for personal use are VirtualBox and VMware Workstation Player. The .ovf file should open in either (in VirtualBox with the “import appliance” option). However, it may be easier simply to create a new linux virtual machine and specify that it is to use an existing virtual disk; then select the downloaded .vmdk file as that disk.

Both the login name and the password for the virtual machine are “mininet”. Once logged in, the sudo command can be used to obtain root privileges, which are needed to run Mininet. It is safest to do this on a command-by-command basis; eg sudo python switchline.py. It is also possible to keep a terminal window open that is permanently logged in as root, eg via sudo bash.

Another option is to set up a linux virtual machine from scratch (eg via the Ubuntu distribution) and then install Mininet on it, although the preinstalled version also comes with other useful software, such as the Pox controller for Openflow switches.

The preinstalled version does not, however, come with any graphical-interface desktop. One can install the full Ubuntu desktop with the command (as root) apt-get install ubuntu-desktop. This will, however, add more than 4 GB to the virtual disk. A lighter-weight option, recommended by the Mininet site, is to install the alternative desktop environment lxde; it is half the size of Ubuntu. Install it with

apt-get install xinit lxde

The standard graphical text editor included with lxde is leafpad, though of course others can be installed as well.

After desktop installation, the command startx will be necessary after login to start the graphical environment (though one can automate this). A standard recommendation for new Debian-based linux systems, before installing anything else, is

apt-get update
apt-get upgrade

Most virtual-machine software offers a special package to improve compatibility with the host system. One of the most annoying incompatibilities is the tendency of the virtual machine to grab the mouse and not allow it to be dragged outside the virtual-machine window. (Usually a special keypress releases the mouse; on VirtualBox it is the right-hand Ctrl key and on VMware Player it is Ctrl-Alt.) Installation of the compatibility package (in VirtualBox called Guest Additions) usually requires mounting a CD image, with the command

mount /dev/cdrom /media/cdrom

The Mininet installation itself can be upgraded as follows:

cd /home/mininet/mininet
git fetch
git checkout master   # Or a specific version like 2.2.1
git pull
make install

The simplest environment for beginners is to install a graphical desktop (eg lxde) and then work within it. This allows seamless opening of xterm and WireShark as necessary. Enabling copy/paste between the virtual system and the host is also convenient.

However, it is also possible to work entirely without the desktop, by using multiple ssh logins with X-windows forwarding enabled:

ssh -X -l username mininet

This does require an X-server on the host system, but these are available even for Windows (see, for example, Cygwin/X). At this point one can open a graphical program on the ssh command line, eg wireshark & or gedit mininet-demo.py &, and have the program window display properly (or close to properly).

Finally, it is possible to access the Mininet virtual machine solely via ssh terminal sessions, without X-windows, though one then cannot launch xterm or WireShark.

18.2   A Simple Mininet Example

Starting Mininet via the mn command (as root!), with no command-line arguments, creates a simple network of two hosts and one switch, h1–s1–h2, and starts up the Mininet command-line interface (CLI). By convention, Mininet host names begin with ‘h’ and switch names begin with ‘s’; numbering begins with 1.

At this point one can issue various Mininet-CLI commands. The command nodes, for example, yields the following output:

available nodes are:
c0 h1 h2 s1

The node c0 is the controller for the switch s1. The default controller action here makes s1 behave like an Ethernet learning switch (2.4.1   Ethernet Learning Algorithm). The command intfs lists the interfaces for each of the nodes, and links lists the connections, but the most useful command is net, which shows the nodes, the interfaces and the connections:

h1 h1-eth0:s1-eth1
h2 h2-eth0:s1-eth2
s1 lo:  s1-eth1:h1-eth0 s1-eth2:h2-eth0

From the above, we can see that the network looks like this:

18.2.1   Running Commands on Nodes

The next step is to run commands on individual nodes. To do this, we use the Mininet CLI and prefix the command name with the node name:

h1 ifconfig
h1 ping h2

The first command here shows the IP address of h1 (or, more properly, of h1-eth0). Note that the name ‘h2’ in the second command is recognized. The ifconfig command also shows the MAC address of h1-eth0, which may vary but might be something like 62:91:68:bf:97:a0. We will see in the following section how to get more human-readable MAC addresses.

There is a special Mininet command pingall that generates pings between each pair of hosts.

We can open a full shell window on node h1 using the Mininet command below; this works for both host nodes and switch nodes.

xterm h1

Note that the xterm runs with root privileges. From within the xterm, the command ping h2 now fails, as hostname h2 is not recognized. We can switch to pinging h2’s IP address directly, or else add entries to /etc/hosts that map the names h1 and h2 to their IP addresses.

As the Mininet system shares its filesystem with h1 and h2, this means that the names h1 and h2 are now defined everywhere within Mininet (though be forewarned: when a different Mininet configuration assigns different addresses to h1 or h2, chaos will ensue).

From within the xterm on h1 we might try logging into h2 via ssh: ssh h2 (if h2 is defined in /etc/hosts as above). But the connection is refused: the ssh server is not running on node h2. We will return to this in the following example.

We can also start up WireShark, and have it listen on interface h1-eth0, and see the progress of our pings. (We can also usually start WireShark from the mininet> prompt using h1 wireshark &.)

Similarly, we can start an xterm on the switch and start WireShark there. However, there is another option, as switches by default share all their network systems with the Mininet host system. (In terms of the container model, switches do not by default get their own network namespace; they share the “root” namespace with the host.) We can see this by running the following from the Mininet command line

s1 ifconfig

and comparing the output with that of ifconfig run on the Mininet host, while Mininet is running but outside of the Mininet process itself. We see these interfaces:


We see the same interfaces on the controller node c0, even though the net and intfs commands above showed no interfaces for c0.

Running WireShark on, say, s1-eth1 is an excellent way to observe traffic on a nearly idle network; by default, the Mininet nodes are not connected to the outside world. As an example, suppose we start up xterm windows on h1 and h2, and run netcat -l 5432 on h2 and then netcat, given h2’s IP address and port 5432, on h1. We can then watch the ARP exchange, the TCP three-way handshake, the content delivery and the connection teardown, with most likely no other traffic at all. Wireshark filtering is not needed.

18.3   Multiple Switches in a Line

The next example creates the topology below. All hosts are on the same subnet.

The Mininet-CLI command links can be used to determine which switch interface is connected to which neighboring switch interface.

The full Python2 program is switchline.py; to run it use

python switchline.py

This configures the network and starts the Mininet CLI. The default number of host/switch pairs is 4, but this can be changed with the -N command-line parameter, for example python switchline.py -N 5.

We next describe selected parts of switchline.py. The program starts by building the network topology object, LineTopo, extending the built-in Mininet class Topo, and then calls Topo.addHost() to create the host nodes.

class LineTopo( Topo ):
   def __init__( self , **kwargs):
       "Create linear topology"
       super(LineTopo, self).__init__(**kwargs)
       h = []          # list of hosts; h[0] will be h1, etc
       s = []          # list of switches

       for key in kwargs:
          if key == 'N': N=kwargs[key]

       # add N hosts  h1..hN
       for i in range(1,N+1):
          h.append(self.addHost('h' + str(i)))

Method Topo.addHost() takes a string, such as “h2”, and builds a host object of that name. We immediately append the new host object to the list h[]. Next we do the same to switches, using Topo.addSwitch():

# add N switches s1..sN
for i in range(1,N+1):
   s.append(self.addSwitch('s' + str(i)))

Now we build the links, with Topo.addLink. Note that h[0]..h[N-1] represent h1..hN. First we build the host-switch links, and then the switch-switch links.

for i in range(N):               # Add links from hi to si
   self.addLink(h[i], s[i])

for i in range(N-1):            # link switches
   self.addLink(s[i], s[i+1])
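
The link pattern can be checked without Mininet at all. The following is a minimal sketch in plain Python; the function name line_topology() is hypothetical and is not part of switchline.py. It simply lists the node names and the (node, node) pairs that the addLink() calls are meant to produce.

```python
def line_topology(N):
    """Return (hosts, switches, links) for the N-pair line topology."""
    hosts    = ['h' + str(i) for i in range(1, N + 1)]
    switches = ['s' + str(i) for i in range(1, N + 1)]
    # host-switch links: hi -- si
    links = [(hosts[i], switches[i]) for i in range(N)]
    # switch-switch links: si -- s(i+1)
    links += [(switches[i], switches[i + 1]) for i in range(N - 1)]
    return hosts, switches, links

if __name__ == '__main__':
    for a, b in line_topology(4)[2]:
        print(a, '--', b)
```

For N host/switch pairs there are N host-switch links and N-1 switch-switch links, for a total of 2N-1.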

Now we get to the main program. We use argparse to support the -N command-line argument.

def main(**kwargs):
    parser = argparse.ArgumentParser()
    parser.add_argument('-N', '--N', type=int)
    args = parser.parse_args()
    if args.N is None:
        N = 4
    else:
        N = args.N

Next we create a LineTopo object, defined above. We also set the log-level to ‘info’; if we were having problems we would set it to ‘debug’.

ltopo = LineTopo(N=N)
setLogLevel('info')

Finally we’re ready to create the Mininet net object, and start it. We’ve specified the type of switch here, though at this point that does not really matter. It does matter that we’re using the DefaultController, as otherwise the switches will not behave automatically as Ethernet learning switches. The autoSetMacs option sets the host MAC addresses to 00:00:00:00:00:01 through 00:00:00:00:00:04 (for N=4), which can be a great convenience when manually examining Ethernet addresses.

net = Mininet(topo = ltopo, switch = OVSKernelSwitch,
            controller = DefaultController,
            autoSetMacs = True
            )
net.start()

The next bit starts /usr/sbin/sshd on each node. This command automatically puts itself in the background; otherwise we would need to add a second string parameter ‘&’ to h1.cmd().

for i in range(1, N+1):
   hi = net['h' + str(i)]
   hi.cmd('/usr/sbin/sshd')

Finally we start the Mininet CLI, and, when that exits, we stop the emulation.

CLI( net)
net.stop()

Because we started sshd on each host, running ssh on h1 with h4’s IP address as the target will successfully connect to h4. The first time a connection is made from h1 to h4 (as root), ssh will ask for confirmation, and then store h4’s key in /root/.ssh/known_hosts. As this is the same file for all Mininet nodes, due to the common filesystem, a subsequent request to connect from h2 to h4 will succeed immediately; h4 has already been authenticated for all nodes.

18.4   IP Routers in a Line

In the next example we build a Mininet example involving a router rather than a switch. A router here is simply a multi-interface Mininet host that has IP forwarding enabled in its linux kernel. Mininet support for multi-interface hosts is somewhat fragile; interfaces may need to be initialized in a specific order, and IP addresses often cannot be assigned at the point when the link is created. In the code presented below we assign IP addresses using calls to Node.cmd() used to invoke the linux command ifconfig (Mininet containers do not support the use of the alternative ip addr command).

Our first router topology has only two hosts, one at each end, and N routers in between; below is the diagram with N=3. All subnets are /24. The program to set this up is routerline.py, here invoked as python routerline.py -N 3. We will use N=3 in most of the examples below. A somewhat simpler version of the program, which sets up the topology specifically for N=3, is routerline3.py.

In both versions of the program, routing entries are created to route traffic from h1 to h2, but not back again. That is, every router has a route to h2’s subnet, but only r1 knows how to reach h1’s subnet (to which r1 is directly connected). We can verify the “one-way” connectedness by running WireShark or tcpdump on h2 (perhaps first starting an xterm on h2), and then pinging h2 from h1 (perhaps using the Mininet command h1 ping h2). WireShark or tcpdump should show the arriving ICMP ping packets from h1, and also the arriving ICMP Destination Network Unreachable packets from r3 as h2 tries to reply (see 7.11   Internet Control Message Protocol).

It turns out that one-way routing is considered to be suspicious; one interpretation is that the packets involved have a source address that shouldn’t be possible, perhaps spoofed. Linux provides the interface configuration option rp_filter – reverse-path filter – to block the forwarding of packets for which the router does not have a route back to the packet’s source. This must be disabled for the one-way example to work; see the notes on the code below.

Despite the lack of connectivity, we can reach h2 from h1 via a hop-by-hop sequence of ssh connections (the program enables sshd on each host and router):

h1: slogin <r1’s address>
r1: slogin <r2’s address>
r2: slogin <r3’s address>
r3: slogin <h2’s address>

To get the one-way routing to work from h1 to h2, we needed to tell r1 and r2 how to reach h2’s subnet. This can be done with the following commands (which are executed automatically if we set ENABLE_LEFT_TO_RIGHT_ROUTING = True in the program):

r1: ip route add to <h2’s subnet> via <r2’s address>
r2: ip route add to <h2’s subnet> via <r3’s address>

To get full, bidirectional connectivity, we can create the following routes to h1’s subnet:

r2: ip route add to <h1’s subnet> via <r1’s address>
r3: ip route add to <h1’s subnet> via <r2’s address>

When building the network topology, the single-interface hosts can have all their attributes set at once (the code below is from routerline3.py):

h1 = self.addHost( 'h1', ip='', defaultRoute='via' )
h2 = self.addHost( 'h2', ip='', defaultRoute='via' )

The routers are also created with addHost(), but with separate steps:

r1 = self.addHost( 'r1' )
r2 = self.addHost( 'r2' )

self.addLink( h1, r1, intfName1 = 'h1-eth0', intfName2 = 'r1-eth0')
self.addLink( r1, r2, intfName1 = 'r1-eth1', intfName2 = 'r2-eth0')

Later on the routers get their IPv4 addresses:

r1 = net['r1']
r1.cmd('ifconfig r1-eth0')
r1.cmd('ifconfig r1-eth1')
r1.cmd('sysctl net.ipv4.ip_forward=1')
rp_disable(r1)

The sysctl command here enables forwarding in r1. The rp_disable(r1) call disables Linux’s default refusal to forward packets if the router does not have a route back to the packet’s source; this is often what is wanted in the real world but not necessarily in routing demonstrations. It too is ultimately implemented via sysctl commands.
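
As a rough illustration of what such sysctl commands look like, here is a minimal sketch that builds the command strings for a list of interface names. The helper name rp_filter_commands() is hypothetical, and the program's own rp_disable() may be written differently. The sketch relies on linux keeping a per-interface rp_filter setting under net.ipv4.conf, alongside the special entries all and default.

```python
def rp_filter_commands(interfaces):
    """Build sysctl commands that turn off reverse-path filtering.

    The kernel combines the 'all' setting with the per-interface one,
    so both are set to 0 here; 'default' covers interfaces created later.
    """
    names = ['all', 'default'] + list(interfaces)
    return ['sysctl net.ipv4.conf.{}.rp_filter=0'.format(n) for n in names]

if __name__ == '__main__':
    for cmd in rp_filter_commands(['r1-eth0', 'r1-eth1']):
        print(cmd)
```

Within a Mininet program, each of these strings could then be passed to the router's Node.cmd() method.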

18.5   IP Routers With Simple Distance-Vector Configuration

The next step is to automate the discovery of the route from h1 to h2 (and back) by using a simple distance-vector routing-update protocol. We give a partial implementation of the Routing Information Protocol, RIP, as defined in RFC 2453.

The distance-vector algorithm is described in 9.1   Distance-Vector Routing-Update Algorithm. In brief, the idea is to add a cost attribute to the forwarding table, so entries have the form (destination,next_hop,cost). Routers then send (destination,cost) lists to their neighbors; these lists are referred to in the RIP specification as update messages. Routers receiving these messages then process them to figure out the lowest-cost route to each destination. The format of the update messages is diagrammed below:

The full RIP specification also includes request messages, but the implementation here omits these. The full specification also includes split horizon, poison reverse and triggered updates (see Split Horizon and Triggered Updates); we omit these as well. Finally, while we include code for the third next_hop increase case of 9.1.1   Distance-Vector Update Rules, we do not include any test for whether a link is down, so this case is never triggered.
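
The update-message layout of RFC 2453 can be sketched with Python's struct module: a 4-byte header (command, version, two zero bytes) followed by 20-byte entries, each holding an address-family identifier, a route tag, the destination address, the subnet mask, a next_hop field and the cost. The function names build_update() and parse_update() below are hypothetical and are not taken from rip.py; the addresses in the usage example are purely illustrative.

```python
import socket
import struct

RIP_RESPONSE = 2      # command code for an update ("response") message
RIP_VERSION  = 2
AFI_IPV4     = 2      # address-family identifier for IPv4
ENTRY_FMT    = '!HH4s4s4sI'   # afi, route tag, addr, mask, next_hop, cost

def build_update(entries):
    """entries: list of (dest, netmask, cost), addresses as dotted strings."""
    msg = struct.pack('!BBH', RIP_RESPONSE, RIP_VERSION, 0)
    for dest, mask, cost in entries:
        msg += struct.pack(ENTRY_FMT, AFI_IPV4, 0,
                           socket.inet_aton(dest), socket.inet_aton(mask),
                           socket.inet_aton('0.0.0.0'), cost)
    return msg

def parse_update(msg):
    """Inverse of build_update(); returns a list of (dest, netmask, cost)."""
    command, version, _ = struct.unpack('!BBH', msg[:4])
    if command != RIP_RESPONSE or version != RIP_VERSION:
        return []
    entries = []
    for off in range(4, len(msg) - 19, 20):      # one 20-byte entry per step
        afi, _tag, dest, mask, _nh, cost = struct.unpack(ENTRY_FMT,
                                                         msg[off:off + 20])
        if afi == AFI_IPV4:
            entries.append((socket.inet_ntoa(dest),
                            socket.inet_ntoa(mask), cost))
    return entries

if __name__ == '__main__':
    m = build_update([('10.0.1.0', '255.255.255.0', 1)])
    print(len(m), parse_update(m))
```

A next_hop field of 0.0.0.0 tells the receiver to route via the sender of the message, which is the only case this sketch handles.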

The implementation is in the Python3 file rip.py. Most of the time, the program is waiting to read update messages from other routers. Every UPDATE_INTERVAL seconds the program sends out its own update messages. All communication is via UDP packets sent using IP multicast, to the official RIP multicast address 224.0.0.9. Port 520 is used for both sending and receiving.

Rather than creating separate threads for receiving and sending, we configure a short (1 second) recv() timeout, and then after each timeout we check whether it is time to send the next update. An update can be up to 1 second late with this approach, but this does not matter.
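
The timeout-driven approach can be sketched as follows. This is not the code of rip.py: the parameter names recv, send_update, handle, clock and max_iters are hypothetical, and the value of UPDATE_INTERVAL is an assumption. The receive step is passed in as a function that returns None on timeout, so that the loop can be exercised without real sockets. As a small variation, the sketch checks the send deadline on every pass through the loop, not only after a timeout.

```python
import time

UPDATE_INTERVAL = 5.0     # seconds between our own updates (assumed value)

def run_loop(recv, send_update, handle, clock=time.monotonic, max_iters=None):
    """Alternate between waiting for messages and sending periodic updates.

    recv() returns a message, or None when its (short) timeout expires.
    """
    next_send = clock() + UPDATE_INTERVAL
    iters = 0
    while max_iters is None or iters < max_iters:
        iters += 1
        msg = recv()
        if msg is not None:
            handle(msg)
        # after each message or timeout, check whether an update is due
        if clock() >= next_send:
            send_update()
            next_send += UPDATE_INTERVAL
```

With a real socket, recv might wrap a recvfrom() call on a socket whose timeout was set with settimeout(1), returning None when socket.timeout is raised.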

The program maintains a “shadow” copy RTable of the real system forwarding table, with an added cost column. The real table is updated whenever a route in the shadow table changes. In the program, RTable is a dictionary mapping TableKey values (consisting of the IP address and mask) to TableValue objects containing the interface name, the cost, and the next_hop.

To run the program, a “production” approach would be to use Mininet’s Node.cmd() to start up rip.py on each router, eg via r.cmd('python3 rip.py &') (assuming the file rip.py is located in the same directory in which Mininet was started). For demonstrations, the program output can be observed if the program is started in an xterm on each router.

18.5.1   Multicast Programming

Sending IP multicast involves special considerations that do not arise with TCP or UDP connections. The first issue is that we are sending to a multicast group – 224.0.0.9 – but don’t have any multicast routes (multicast trees, 20.5   Global IP Multicast) configured. What we would like is to have, at each router, traffic to 224.0.0.9 forwarded to each of its neighboring routers.

However, we do not actually want to configure multicast routes; all we want is to reach the immediate neighbors. Setting up a multicast tree presumes we know something about the network topology, and, at the point where RIP comes into play, we do not. The multicast packets we send should in fact not be forwarded by the neighbors (we will enforce this below by setting TTL); the multicast model here is very local. Even if we did want to configure multicast routes, linux does not provide a standard utility for manual multicast-routing configuration; see the ip-mroute.8 man page.

So what we do instead is to create a socket for each separate router interface, and configure the socket so that it forwards its traffic only out its associated interface. This introduces a complication: we need to get the list of all interfaces, and then, for each interface, get its associated IPv4 addresses with netmasks. (To simplify life a little, we will assume that each interface has only a single IPv4 address.) The function getifaddrdict() returns a dictionary with interface names (strings) as keys and pairs (ipaddr,netmask) as values. If ifaddrs is this dictionary, for example, then ifaddrs['r1-eth0'] might be a pair of dotted-decimal strings giving the address and netmask of r1-eth0. We could implement getifaddrdict() straightforwardly using the Python module netifaces, though for demonstration purposes we do it here via low-level system calls.

We get the list of interfaces using myInterfaces = os.listdir('/sys/class/net/'). For each interface, we then get its IP address and netmask (in get_ip_info(intf)) with the following:

s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
SIOCGIFADDR    = 0x8915     # from /usr/include/linux/sockios.h
SIOCGIFNETMASK = 0x891b     # also from sockios.h
intfpack = struct.pack('256s', bytes(intf, 'ascii'))
# ifreq, below, is like struct ifreq in /usr/include/linux/if.h
ifreq    = fcntl.ioctl(s.fileno(), SIOCGIFADDR, intfpack)
ipaddrn  = ifreq[20:24]     # 20 is the offset of the IP addr in ifreq
ipaddr   = socket.inet_ntoa(ipaddrn)
netmaskn = fcntl.ioctl(s.fileno(), SIOCGIFNETMASK, intfpack)[20:24]
netmask  = socket.inet_ntoa(netmaskn)
return (ipaddr, netmask)

We need to create the socket here (never connected) in order to call ioctl(). The SIOCGIFADDR and SIOCGIFNETMASK values come from the C language include file; the Python3 libraries do not make these constants available but the Python3 fcntl.ioctl() call does pass the values we provide directly to the underlying C ioctl() call. This call returns its result in a C struct ifreq; the ifreq above is a Python version of this. The binary-format IPv4 address (or netmask) is at offset 20.

createMcastSockets()

We are now in a position, for each interface, to create a UDP socket to be used to send and receive on that interface. Much of the information here comes from the linux socket.7 and ip.7 man pages. The function createMcastSockets(ifaddrs) takes the dictionary above mapping interface names to (ipaddr,netmask) pairs and, for each interface intf, configures it as follows. The list of all the newly configured sockets is then returned.

The first step is to obtain the interface’s address and mask, and then convert these to 32-bit integer format as ipaddrn and netmaskn. We then enter the subnet corresponding to the interface into the shadow routing table RTable with a cost of 1 (and with a next_hop of None), via

RTable[TableKey(subnetn, netmaskn)] = TableValue(intf, None, 1)
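
The conversion to 32-bit integers and the subnet calculation might look like the following sketch. The helper names ip_to_int(), int_to_ip() and subnet_of() are hypothetical, not taken from rip.py, and the address in the usage example is illustrative only. The subnet address is obtained by and-ing the interface address with the netmask.

```python
import socket
import struct

def ip_to_int(addr):
    """Convert a dotted-decimal IPv4 string to a 32-bit integer."""
    return struct.unpack('!I', socket.inet_aton(addr))[0]

def int_to_ip(n):
    """Convert a 32-bit integer back to dotted-decimal form."""
    return socket.inet_ntoa(struct.pack('!I', n))

def subnet_of(ipaddr, netmask):
    """Return (subnetn, netmaskn), as integers, for an interface."""
    ipaddrn  = ip_to_int(ipaddr)
    netmaskn = ip_to_int(netmask)
    return (ipaddrn & netmaskn, netmaskn)

if __name__ == '__main__':
    subnetn, netmaskn = subnet_of('10.0.1.2', '255.255.255.0')
    print(int_to_ip(subnetn), int_to_ip(netmaskn))
```

The '!I' format specifies network byte order, so the integer's high-order byte is the first byte of the dotted-decimal address.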

Next we create the socket and begin configuring it, first by setting its read timeout to a short value. We then set the TTL value used by outbound packets to 1. This goes in the IPv4 header Time To Live field (7.1   The IPv4 Header); this means that no downstream routers will ever forward the packet. This is exactly what we want; RIP uses multicast only to send to immediate neighbors.

sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)

We also want to be able to bind the same socket source address, and port 520, to all the sockets we are creating here (the actual bind() call is below):

sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

The next call makes the socket receive only packets arriving on the specified interface:

sock.setsockopt(socket.SOL_SOCKET, socket.SO_BINDTODEVICE, bytes(intf, 'ascii'))

We add the following to prevent packets sent on the interface from being delivered back to the sender; otherwise multicast delivery may do just that:

sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_LOOP, False)

The next call makes the socket send on the specified interface. Multicast packets do have IPv4 destination addresses, and normally the kernel chooses the sending interface based on the IP forwarding table. This call overrides that, in effect telling the kernel how to route packets sent via this socket. (The kernel may also be able to figure out how to route the packet from the subsequent call joining the socket to the multicast group.)

sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton(ipaddr))

Finally we can join the socket to the multicast group represented by 224.0.0.9. We also need the interface’s IP address, ipaddr.

addrpair = socket.inet_aton('224.0.0.9')+ socket.inet_aton(ipaddr)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, addrpair)

The last step is to bind the socket to the desired address and port, with sock.bind(('', 520)). This specifies the source address of outbound packets; it would fail (given that we are using the same socket address for multiple interfaces) without the SO_REUSEADDR configuration above.

18.5.2   The RIP Main Loop

The rest of the implementation is relatively nontechnical. One nicety is the use of select() to wait for arriving packets on any of the sockets created by createMcastSockets() above; the alternatives might be to poll each socket in turn with a short read timeout or else to create a separate thread for each socket. The select() call takes the list of sockets (and a timeout value) and returns a sublist consisting of those sockets that have data ready to read. Almost always, this will be just one of the sockets. We then read the data with s.recvfrom(), recording the source address src which will be used when we, next, call update_tables(). When a socket closes, it must be removed from the select() list, but the sockets here do not close; for more on this, see the discussion of dualreceive.py below.

The update_tables() function takes the incoming message (parsed into a list of RipEntry objects via parse_msg()) and the IP address from which it arrives, and runs the distance-vector algorithm of 9.1.1   Distance-Vector Update Rules. TK is the TableKey object representing the new destination (as an (addr,netmask) pair). The new destination rule from 9.1.1   Distance-Vector Update Rules is applied when TK is not present in the existing RTable. The lower cost rule is applied when newcost < currentcost, and the third next_hop increase rule is applied when newcost > currentcost but currentnexthop == update_sender.
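
The three rules can be sketched as below. This is a simplified stand-in for the update_tables() of rip.py, not a copy of it: the argument list is an assumption, keys use dotted-decimal strings for readability where the program uses integers, and the step that updates the real system forwarding table is omitted. The field order of TableValue follows the call TableValue(intf, None, 1) shown earlier.

```python
from collections import namedtuple

TableKey   = namedtuple('TableKey',   ['subnet', 'netmask'])
TableValue = namedtuple('TableValue', ['interface', 'next_hop', 'cost'])

def update_tables(RTable, entries, sender, intf):
    """Apply the three distance-vector rules; return the keys that changed.

    entries is a list of (TableKey, cost) pairs reported by the neighbor
    `sender`, whose message arrived on interface `intf`.
    """
    changed = []
    for TK, cost in entries:
        newcost = cost + 1                  # one more hop, via the sender
        current = RTable.get(TK)
        is_new     = current is None
        is_cheaper = current is not None and newcost < current.cost
        is_worse   = (current is not None and newcost > current.cost
                      and current.next_hop == sender)
        if is_new or is_cheaper or is_worse:
            RTable[TK] = TableValue(intf, sender, newcost)
            changed.append(TK)
    return changed
```

Directly connected subnets have cost 1 and a next_hop of None, so no neighbor's report can displace them: any reported route costs at least 2, and the sender never matches None.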

18.6   TCP Competition: Reno vs Vegas

The next routing example uses the following topology in order to emulate competition between two TCP connections h1→h3 and h2→h3. We introduce Mininet features to set, on the links, an emulated bandwidth and delay, and to set on the router an emulated queue size. Our first application will be to arrange a competition between TCP Reno (13   TCP Reno and Congestion Management) and TCP Vegas (15.5   TCP Vegas). The Python2 file for running this Mininet configuration is competition.py.

To create links with bandwidth/delay support, we simply set link=TCLink in the Mininet() call in main(). The TCLink class represents a Traffic Controlled Link. Next, in the topology section calls to addLink(), we add keyword parameters such as bw=BottleneckBW and delay=DELAY. To implement the bandwidth limit, Mininet then takes care of creating the virtual-Ethernet links with a rate constraint.

To implement the delay, Mininet uses a queuing hierarchy (19.7   Hierarchical Queuing). The hierarchy is managed by the tc (traffic control) command, part of the LARTC system. In the topology above, Mininet sets up h3’s queue as an htb queue (19.13.2   Linux htb) with a netem queue below it (see the man page for tc-netem.8). The latter has a delay parameter set as requested, to 110 ms in our example here. Note that this means that the delay from h3 to r will be 110 ms, and the delay from r to h3 will be 0 ms.

The queue configuration is also handled via the tc command. Again Mininet configures r’s r-eth3 interface to have an htb queue with a netem queue below it. Using the tc qdisc show command we can see that the “handle” of the netem queue is 10:; we can now set the maximum queue size to, for example, 25 with the following command on r:

tc qdisc change dev r-eth3 handle 10: netem limit 25

18.6.1   Running A TCP Competition

In order to arrange a TCP competition, we need the following tools:

  • sender.py, to open the TCP connection and send bulk data, after requesting a specific TCP congestion-control mechanism (Reno or Vegas)
  • dualreceive.py, to receive data from two connections and track the results
  • randomtelnet.py, to send random additional data to break TCP phase effects.
  • wintracker.py, to monitor the number of packets a connection has in flight (a good estimator of cwnd).

sender.py

The Python3 program sender.py is similar to tcp_stalkc.py, except that it allows specification of the TCP congestion algorithm. This is done with the following setsockopt() call:

s.setsockopt(socket.IPPROTO_TCP, TCP_CONGESTION, cong)

where cong is “reno” or “cubic” or some other available TCP flavor. The list is at /proc/sys/net/ipv4/tcp_allowed_congestion_control.
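
The following is a minimal sketch of requesting a congestion algorithm and reading back what the kernel actually selected. It is not the code of sender.py, and the helper name set_congestion() is hypothetical. It assumes a linux system and a Python3 version (3.6 or later) in which the socket module defines TCP_CONGESTION; where that is not the case the function simply returns None. Note that Python3 expects the algorithm name as bytes.

```python
import socket

def set_congestion(sock, name):
    """Request congestion algorithm `name` on a TCP socket.

    Returns the name now in effect, or None if the option is
    unavailable or the kernel refuses the requested algorithm.
    """
    opt = getattr(socket, 'TCP_CONGESTION', None)   # linux-only constant
    if opt is None:
        return None
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, name.encode('ascii'))
        raw = sock.getsockopt(socket.IPPROTO_TCP, opt, 16)
    except OSError:
        return None
    # the kernel returns a NUL-padded name; keep the part before the padding
    return raw.split(b'\0', 1)[0].decode('ascii')

if __name__ == '__main__':
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    print(set_congestion(s, 'reno'))
    s.close()
```

The option must be set before connect() is called if it is to govern the connection from its start.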

See also 15.1   Choosing a TCP on linux.

dualreceive.py

The receiver for sender.py’s data is dualreceive.py. It listens on two ports, by default 5430 and 5431, and, when both connections have been made, begins reading. The main loop starts with a call to select(), where sset is the list of all (both) connected sockets:

sl,_,_ = select(sset, [], [])

The value sl is a sublist of sset consisting of the sockets with data ready to read. It will normally be a list consisting of a single socket, though with so much data arriving it may sometimes contain both. We then call s.recv() for s in sl, and record in either count1 or count2 the running total of bytes received.

If a sender closes a socket, this results in a read of 0 bytes. At that point dualreceive.py must close the socket and remove it from sset, as it would otherwise always appear in the sl list.
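The select() loop and the handling of closed sockets can be sketched as follows. This is not the dualreceive.py code itself: the function names receive_all() and demo() are hypothetical, and the demonstration uses local socketpair() connections in place of the two TCP connections from h1 and h2.

```python
import select
import socket

def receive_all(socks):
    """Read from every socket until each has been closed by its peer.

    Returns a dict mapping each socket's position in `socks` to the
    number of bytes received on it.
    """
    counts = {i: 0 for i in range(len(socks))}
    index = {s: i for i, s in enumerate(socks)}
    sset = list(socks)
    while sset:
        sl, _, _ = select.select(sset, [], [])   # sockets ready to read
        for s in sl:
            data = s.recv(4096)
            if not data:                  # read of 0 bytes: peer has closed
                s.close()
                sset.remove(s)            # else s would be "ready" forever
            else:
                counts[index[s]] += len(data)
    return counts

def demo():
    """Send 3000 and 5000 bytes over two local connections and count them."""
    a_send, a_recv = socket.socketpair()
    b_send, b_recv = socket.socketpair()
    a_send.sendall(b"x" * 3000)
    b_send.sendall(b"y" * 5000)
    a_send.close()
    b_send.close()
    return receive_all([a_recv, b_recv])
```

The loop ends when sset is empty, that is, when both senders have closed their connections.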

We repeatedly set a timer (in printstats()) to print the values of count1 and count2 at 0.1 second intervals, reflecting the cumulative amounts of data received by the connections. If the TCP competition is fair, count1 and count2 should stay approximately equal. When printstats() detects no change in count1 and count2, it exits.

In Python, calling exit() only exits the current thread; the other threads keep running.

randomtelnet.py

In 16.3.4   Phase Effects we show that, with completely deterministic travel times, two competing TCP connections can have throughputs differing by a factor of as much as 10 simply because of unfortunate synchronizations of transmission times. We must introduce at least some degree of packet-arrival-time randomization in order to obtain meaningful results.

In 16.3.6   Phase Effects and overhead we used the ns2 overhead attribute for this. This is not available in real networks, however. The next-best thing is to introduce some random telnet-like traffic, as in 16.3.7   Phase Effects and telnet traffic. This is the purpose of randomtelnet.py.

This program sends packets at random intervals; the lengths of the intervals are exponentially distributed, meaning that to find the length of the next interval we choose X randomly between 0 and 1 (with a uniform distribution), and then set the length of the wait interval to a constant times -log(X). The packet sizes are 210 bytes (a very atypical value for real telnet traffic). Crucially, the average rate of sending is held to a small fraction (by default 1%) of the available bottleneck bandwidth, which is supplied as a constant BottleneckBW. This means the randomtelnet traffic should not interfere significantly with the competing TCP connections (which, of course, have no additional interval whatsoever between packet transmissions, beyond what is dictated by sliding windows). The randomtelnet traffic appears to be quite effective at eliminating TCP phase effects.
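The interval calculation can be sketched as below. The function names are hypothetical, and the sketch assumes the bottleneck bandwidth is given in bits per second, which may not match the units randomtelnet.py uses for BottleneckBW. The constant multiplying -log(X) is the mean interval, chosen so that the average sending rate is the requested fraction of the bottleneck bandwidth.

```python
import math
import random

def mean_interval(packet_bytes, bottleneck_bps, fraction):
    """Mean seconds between packets for an average rate of fraction * bottleneck."""
    return packet_bytes * 8 / (fraction * bottleneck_bps)

def next_interval(mean, rng=random):
    """Draw one exponentially distributed wait time with the given mean."""
    x = 1.0 - rng.random()       # uniform on (0, 1], so log(x) is always defined
    return -mean * math.log(x)
```

For example, with 210-byte packets, a 1 Mbit/s bottleneck and a 1% share, mean_interval(210, 1_000_000, 0.01) is 0.168 seconds; a sender would then sleep for next_interval(0.168) between packets.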

Randomtelnet.py sends to port 5433 by default. We will usually use netcat (12.6.2   netcat again) as the receiver, as we are not interested in measuring throughput for this traffic.

Monitoring cwnd with wintracker.py

At the end of the competition, we can look at the dualreceive.py output and determine the overall throughput of each connection, as of the time when the first connection to send all its data has just finished. We can also plot throughput at intervals by plotting successive differences of the cumulative-throughput values.
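As a small illustration of the successive-differences idea, the sketch below converts a list of cumulative byte counts, sampled at a fixed interval, into per-interval throughput. The function name interval_throughput() is hypothetical; the default interval of 0.1 seconds matches the dualreceive.py reporting interval.

```python
def interval_throughput(cumulative, interval=0.1):
    """Convert cumulative byte counts, sampled every `interval` seconds,
    into per-interval throughput in bytes per second."""
    return [(b - a) / interval for a, b in zip(cumulative, cumulative[1:])]
```

A list of n cumulative values yields n-1 throughput values, one for each interval between successive samples.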

However, this does not give us a view of each connection’s cwnd, which is readily available when modeling competition in a simulator. Indeed, getting direct access to a connection’s cwnd is nearly impossible, as it is a state variable in the sender’s kernel.

However, we can do the next best thing: monitor the number of packets (or bytes) a connection has in flight; this is the difference between the highest byte sent and the highest byte acknowledged. The highest byte ACKed is one less than the value of the ACK field in the most recent ACK packet, and the highest byte sent is one less than the value of the SEQ field, plus the packet length, in the most recent DATA packet.

To get these ACK and SEQ numbers, however, requires eavesdropping on the network connections. We can do this using a packet-capture library such as libpcap. The Pcapy Python2 (not Python3) module is a wrapper for libpcap.

The program wintracker.py uses Pcapy to monitor packets on the interfaces r-eth1 and r-eth2 of router r. It would be slightly more accurate to monitor on h1-eth0 and h2-eth0, but that entails separate monitoring of two different nodes, and the difference is small as the h1–r and h2–r links have negligible delay and no queuing. Wintracker.py must be configured to monitor only the two TCP connections that are competing.

The way libpcap, and thus Pcapy, works is that we first create a packet filter to identify the packets we want to capture. The filter for both connections is

host and tcp and portrange 5430-5431

The host is, of course, h3; packets are captured if either source host or destination host is h3. Similarly, packets are captured if either the source port or the destination port is either 5430 or 5431. The connection from h1 to h3 is to port 5430 on h3, and the connection from h2 to h3 is to port 5431 on h3.

For the h1–h3 connection, each time a packet arrives heading from h1 to h3 (in the code below we determine this because the destination port dport is 5430), we save in seq1 the TCP header SEQ field plus the packet length. Each time a packet is seen heading from h3 to h1 (that is, with source port 5430), we record in ack1 the TCP header ACK field. The packets themselves are captured as arrays of bytes, but we can determine the offset of the TCP header and read the four-byte SEQ/ACK values with appropriate helper functions:

_,p = cap1.next()                       # p is the captured packet
(_,iphdr,tcphdr,data) = parsepacket(p)          # find the headers
sport = int2(tcphdr, TCP_SRCPORT_OFFSET)        # extract port numbers
dport = int2(tcphdr, TCP_DSTPORT_OFFSET)
if dport == port1:                              # port1 == 5430
    seq1 = int4(tcphdr, TCP_SEQ_OFFSET) + len(data)
elif sport == port1:
    ack1 = int4(tcphdr, TCP_ACK_OFFSET)

Separate threads are used for each connection, as there is no variant of select() available to return the next captured packet of either connection.
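The helper functions named in the code above might look something like the following. This is a sketch only, not the wintracker.py implementation: it is written for Python3 bytes objects (Pcapy under Python2 delivers strings), it assumes untagged Ethernet frames carrying IPv4, and sample_frame() is a hypothetical test helper that builds a frame with zeroed addresses and checksums.

```python
import struct

ETH_HEADER_LEN = 14          # assumes no VLAN tag
TCP_SRCPORT_OFFSET = 0       # offsets within the TCP header
TCP_DSTPORT_OFFSET = 2
TCP_SEQ_OFFSET = 4
TCP_ACK_OFFSET = 8

def int2(buf, offset):
    """Read a 2-byte big-endian unsigned integer."""
    return struct.unpack_from("!H", buf, offset)[0]

def int4(buf, offset):
    """Read a 4-byte big-endian unsigned integer."""
    return struct.unpack_from("!I", buf, offset)[0]

def parsepacket(p):
    """Split an Ethernet/IPv4/TCP frame into (ethhdr, iphdr, tcphdr, data)."""
    ip_start = ETH_HEADER_LEN
    ihl = (p[ip_start] & 0x0F) * 4               # IP header length in bytes
    ip_total = int2(p, ip_start + 2)             # IP total length
    tcp_start = ip_start + ihl
    tcp_len = (p[tcp_start + 12] >> 4) * 4       # TCP header length in bytes
    data_start = tcp_start + tcp_len
    data = p[data_start:ip_start + ip_total]     # excludes any Ethernet padding
    return p[:ip_start], p[ip_start:tcp_start], p[tcp_start:data_start], data

def sample_frame(sport, dport, seq, ack, payload):
    """Build a minimal frame for testing (addresses and checksums zeroed)."""
    total = 20 + 20 + len(payload)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, total, 0, 0, 64, 6, 0,
                     bytes(4), bytes(4))
    tcp = struct.pack("!HHIIBBHHH", sport, dport, seq, ack, 5 << 4, 0x10,
                      0, 0, 0)
    return bytes(ETH_HEADER_LEN) + ip + tcp + payload
```

Using the IP total-length field to find the end of the data, rather than the end of the captured frame, avoids counting Ethernet padding on short packets.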

Both the SEQ and ACK fields have had the sender’s initial sequence number (ISN) added to them, but this will cancel out when we subtract. The SEQ and ACK values are subject to 32-bit wraparound, but subtraction again saves us here.
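In Python, where integers do not wrap around on their own, the subtraction can be reduced modulo 2^32 explicitly. The sketch below uses a hypothetical function name, in_flight(); here seq_end is the SEQ field plus the packet length from the most recent data packet, and ack is the ACK field from the most recent ACK.

```python
def in_flight(seq_end, ack):
    """Bytes sent but not yet acknowledged, using 32-bit modular subtraction."""
    return (seq_end - ack) & 0xFFFFFFFF
```

The mask gives the right answer both when seq_end has wrapped past zero while ack has not, and when SEQ plus the packet length exceeds 32 bits.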

As with dualreceive.py, a timer fires every 100 ms and prints out the differences seq1-ack1 and seq2-ack2. This isn’t completely thread-safe, but it is close enough. There is some noise in the results; we can minimize that by taking the average of several differences in a row.

Synchronizing the start

The next issue is to get both senders to start at about the same time. We could use two ssh commands, but ssh commands can take several hundred milliseconds to complete. A faster method is to use netcat to trigger the start. On h1 and h2 we run shell scripts like the one below (separate values for $PORT and $CONG are needed for each of h1 and h2, which is simplest to implement with separate scripts, say h1.sh and h2.sh):

netcat -l 2345
python3 sender.py $BLOCKS $PORT $CONG

We then start both at very close to the same time with the following on r (not on h3, due to the delay on the r–h3 link); these commands typically complete in under ten milliseconds.

echo hello | netcat h1 2345
echo hello | netcat h2 2345

The full sequence of steps is

  • On h3, start the netcat -l ... for the randomtelnet.py output (on two different ports)
  • On h1 and h2, start the randomtelnet.py senders
  • On h3, start dualreceive.py
  • On h1 and h2, start the scripts (eg h1.sh and h2.sh) that wait for the signal and start sender.py
  • On r, send the two start triggers via netcat

This is somewhat cumbersome; it helps to incorporate everything into a single shell script with ssh used to run subscripts on the appropriate host.

Reno vs Vegas results

In the Reno-Vegas graph at 16.5   TCP Reno versus TCP Vegas, we set the Vegas parameters α and β to 3 and 6 respectively. The implementation of TCP Vegas on the Mininet virtual machine does not, however, support changing α and β, and the default values are more like 1 and 3. To give Vegas a fighting chance, we reduce the queue size at r to 10 in competition.py. Here is the graph, with the packets-in-flight monitoring above and the throughput below:

TCP Vegas is getting a smaller share of the bandwidth (overall about 40% to TCP Reno’s 60%), but it is consistently holding its own. It turns out that TCP Vegas is greatly helped by the small queue size; if the queue size is doubled to 20, then Vegas gets a 17% share.

In the upper part of the graph, we can see the Reno sawteeth versus the Vegas triangular teeth (sloping down as well as sloping up); compare to the red-and-green graph at 16.5   TCP Reno versus TCP Vegas. The tooth shapes are somewhat mirrored in the throughput graph as well, as throughput is proportional to queue utilization which is proportional to the number of packets in flight.

18.7   TCP Competition: Reno vs BBR

We can apply the same technique to compare TCP Reno to TCP BBR. This was done to create the graph at 15.13   TCP BBR. The Mininet approach was usable as soon as a TCP BBR module for linux was released (in source form); using a simulator, on the other hand, would have entailed waiting for TCP BBR to be ported to the simulator.

One important detail is that the fq queuing discipline must be enabled for the TCP BBR sender. If that is h2, for example, then the following Mininet code (perhaps in competition.py) removes any existing queuing discipline and adds fq:

h2.cmd('tc qdisc del dev h2-eth0 root')
h2.cmd('tc qdisc add dev h2-eth0 root fq')

18.8   The POX Controller

In the switchline.py example from earlier, we ignored the controller specification. Inside the Mininet() call was a parameter controller=DefaultController, which makes each switch behave like an ordinary Ethernet learning switch. In this section we will look at the internals of the POX controller for OpenFlow (2.7.1   OpenFlow Switches) switches, which is preinstalled on the Mininet virtual machine, and see some other possible approaches to switch operation.

Pox is, like Mininet, written in Python2. It receives and sends OpenFlow messages. Received messages, for our purposes here, can be grouped into the following categories:

  • PacketIn: a switch is informing the controller about an arriving packet, usually because the switch does not know how to forward the packet or does not know how to forward the packet without flooding. Often, but not always, PacketIn events will result in the controller providing new forwarding instructions.
  • ConnectionUp: a switch has connected to the controller. This will be the point at which the controller gives the switch its initial packet-handling instructions.
  • LinkEvent: a switch is informing the controller of a link becoming available or becoming unavailable; this includes initial reports of link availability.
  • BarrierEvent: a switch’s response to an OpenFlow Barrier message, meaning the switch has completed its responses to all messages received before the Barrier and now may begin to respond to messages received after the Barrier.

The Pox program comes with several demonstration modules illustrating how controllers can be programmed; these are in the pox/misc and pox/forwarding directories. The starting point for Pox documentation is the Pox wiki, which among other things includes brief outlines of these programs. We now review a few of these programs; most were written by James McCauley and are licensed under the Apache license.

The Pox code data structures are very closely tied to the OpenFlow Switch Specification, versions of which can be found at the OpenNetworking.org technical library.

18.8.1   hub.py

As a first example of Pox, suppose we take a copy of the switchline.py file and make the following changes:

  • change the controller specification, inside the Mininet() call, from controller=DefaultController to controller=RemoteController.
  • add the following lines immediately following the Mininet() call:
c = RemoteController( 'c', ip='', port=6633 )

If we now run this modified switchline.py, then pings fail because the RemoteController, c, does not yet exist; in the absence of a controller, the switches’ default response is to do nothing.

We now start Pox, in the directory /home/mininet/pox, as follows; this loads the file pox/forwarding/hub.py:

./pox.py forwarding.hub

Ping connectivity should be restored! The switch connects to the controller at IPv4 address (more on this below) and TCP port 6633. At this point the controller is able to tell the switch what to do.

The hub.py example configures each switch as a simple hub, flooding each arriving packet out all other interfaces (though for the linear topology of switchline.py, this doesn’t matter much). The relevant code is here:

def _handle_ConnectionUp (event):
    msg = of.ofp_flow_mod()                  # of is pox.openflow.libopenflow_01
    msg.actions.append(of.ofp_action_output(port = of.OFPP_FLOOD))
    event.connection.send(msg)               # send the new rule to the switch

This is the handler for ConnectionUp events; it is invoked when a switch first reports for duty. As each switch connects to the controller, the hub.py code instructs the switch to forward each arriving packet to the virtual port OFPP_FLOOD, which means to forward out all other ports.

The event parameter is of class ConnectionUp, a subclass of class Event. It is defined in pox/openflow/__init__.py. Most switch-event objects throughout Pox include a connection field, which the controller can use to send messages back to the switch, and a dpid field, representing the switch identification number. Generally the Mininet switch s1 will have a dpid of 1, etc.

The code above creates an OpenFlow modify flow table message, msg; this is one of several types of controller-to-switch messages that are defined in the OpenFlow standard. The field msg.actions is a list of actions to be taken; to this list we append the action of forwarding on the designated (virtual) port OFPP_FLOOD.

Normally we would also append to the list msg.match the matching rules for the packets to be forwarded, but here we want to forward all packets and so no matching is needed.

A different – though functionally equivalent – approach is taken in pox/misc/of_tutorial.py. Here, the response to the ConnectionUp event involves no communication with the switch (though the connection is stored in Tutorial.__init__()). Instead, as the switch reports each arriving packet to the controller, the controller responds by telling the switch to flood the packet out every port (this approach does result in sufficient unnecessary traffic that it would not be used in production code). The code (slightly consolidated) looks something like this:

def _handle_PacketIn (self, event):
    packet = event.parsed # This is the parsed packet data.
    packet_in = event.ofp # The actual ofp_packet_in message.
    self.act_like_hub(packet, packet_in)

def act_like_hub (self, packet, packet_in):
    msg = of.ofp_packet_out()
    msg.data = packet_in
    action = of.ofp_action_output(port = of.OFPP_ALL)
    msg.actions.append(action)
    self.connection.send(msg)    # connection saved in Tutorial.__init__()

The event here is now an instance of class PacketIn. This time the controller sends a packet-out message to the switch. The packet and packet_in objects are two different views of the packet; the first is parsed and so is generally easier to obtain information from, while the second represents the entire packet as it was received by the switch. It is the latter format that is sent back to the switch in the msg.data field. The virtual port OFPP_ALL is, for our purposes here, equivalent to OFPP_FLOOD.

For either hub implementation, if we start Wireshark on h2 and then ping from h4 to h1, we will see the pings at h2. This demonstrates, for example, that s2 is behaving like a hub rather than a switch.

18.8.2   l2_pairs.py

The next Pox example, l2_pairs.py, implements a real Ethernet learning switch. This is the pairs-based switch implementation discussed in 2.7.2   Learning Switches in OpenFlow. This module acts at the Ethernet address layer (layer 2, the l2 part of the name), and flows are specified by (src,dst) pairs of addresses. The l2_pairs.py module is started with the Pox command ./pox.py forwarding.l2_pairs.

A straightforward implementation of an Ethernet learning switch runs into a problem: the switch needs to contact the controller whenever the packet source address has not been seen before, so the controller can send back to the switch the forwarding rule for how to reach that source address. But the primary lookup in the switch flow table must be by destination address. The approach used here uses a single OpenFlow table, versus the two-table mechanism of 18.8.3   l2_nx.py. However, the learned flow table match entries will all include match rules for both the source and the destination address of the packet, so that a separate entry is necessary for each pair of communicating hosts. The number of flow entries thus scales as O(N²), which presents a scaling problem for very large switches but which we will ignore here.

When a switch sees a packet with an unmatched (dst,src) address pair, it forwards it to the controller, which has two cases to consider:

  • If the controller does not know how to reach the destination address from the current switch, it tells the switch to flood the packet. However, the controller also records, for later reference, the packet source address and its arrival interface.
  • If the controller knows that the destination address can be reached from this switch via switch port dst_port, it sends to the switch instructions to create a forwarding entry for (dst,src)→dst_port. At the same time, the controller also sends to the switch a reverse forwarding entry for (src,dst), forwarding via the port by which the packet arrived.

The controller maintains its partial map from addresses to switch ports in a dictionary table, which takes a (switch,destination) pair as its key and which returns switch port numbers as values. The switch is represented by the event.connection object used to reach the switch, and destination addresses are represented as Pox EthAddr objects.

The program handles only PacketIn events. The main steps of the PacketIn handler are as follows. First, when a packet arrives, we put its switch and source into table:

table[(event.connection,packet.src)] = event.port

The next step is to check to see if there is an entry in table for the destination, by looking up table[(event.connection,packet.dst)]. If there is not an entry, then the packet gets flooded by the same mechanism as in of_tutorial.py above: we create a packet-out message containing the to-be-flooded packet and send it back to the switch.

If, on the other hand, the controller finds that the destination address can be reached via switch port dst_port, it proceeds as follows. We first create the reverse entry; event.port is the port by which the packet just arrived:

msg = of.ofp_flow_mod()
msg.match.dl_dst = packet.src       # reversed dst and src
msg.match.dl_src = packet.dst       # reversed dst and src
msg.actions.append(of.ofp_action_output(port = event.port))
event.connection.send(msg)          # install the reverse entry

This is like the forwarding rule created in hub.py, except that we here are forwarding via the specific port event.port rather than the virtual port OFPP_FLOOD, and, perhaps more importantly, we are adding two packet-matching rules to msg.match.

The next step is to create a similar matching rule for the src-to-dst flow, and to include the packet to be retransmitted. The modify-flow-table message thus does double duty as a packet-out message as well.

msg = of.ofp_flow_mod()
msg.data = event.ofp                # Forward the incoming packet
msg.match.dl_src = packet.src       # not reversed this time!
msg.match.dl_dst = packet.dst
msg.actions.append(of.ofp_action_output(port = dst_port))
event.connection.send(msg)          # install the entry and resend the packet

The msg.match object has quite a few potential matching fields; the following is taken from the Pox-Wiki:

Attribute   Meaning
in_port     Switch port number the packet arrived on
dl_src      Ethernet source address
dl_dst      Ethernet destination address
dl_type     Ethertype / length (e.g. 0x0800 = IPv4)
nw_tos      IPv4 TOS/DS bits
nw_proto    IPv4 protocol (e.g., 6 = TCP), or lower 8 bits of ARP opcode
nw_src      IPv4 source address
nw_dst      IP destination address
tp_src      TCP/UDP source port
tp_dst      TCP/UDP destination port

It is also possible to create a msg.match object that matches all fields of a given packet.
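The controller-side decision logic of l2_pairs.py, as described above, can be modeled in plain Python without Pox. The sketch below is a toy simulation, not Pox code: the class PairsController and its packet_in() method are hypothetical names, switches and addresses are represented by strings, and the returned dictionaries stand in for the flow-table entries the controller would send.

```python
class PairsController:
    """Toy model of the l2_pairs PacketIn logic (no Pox or OpenFlow involved)."""

    def __init__(self):
        self.table = {}     # (switch, address) -> switch port

    def packet_in(self, switch, src, dst, in_port):
        """Return ("flood", []) or ("install", [reverse_rule, forward_rule])."""
        self.table[(switch, src)] = in_port          # learn where src lives
        dst_port = self.table.get((switch, dst))
        if dst_port is None:
            return ("flood", [])                     # destination not yet known
        # The reverse rule sends dst-to-src traffic out the arrival port;
        # the forward rule sends src-to-dst traffic out dst_port.
        reverse = {"dl_src": dst, "dl_dst": src, "out_port": in_port}
        forward = {"dl_src": src, "dl_dst": dst, "out_port": dst_port}
        return ("install", [reverse, forward])
```

If h1's first packet to h4 arrives at s2 on port 2, the model returns a flood decision; when h4's reply then arrives on port 3, it returns the pair of rules for both directions.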

We can watch the forwarding entries created by l2_pairs.py with the linux program ovs-ofctl. Suppose we start switchline.py and then the Pox module l2_pairs.py. Next, from within Mininet, we have h1 ping h4 and h2 ping h4. If we now run the command (on the Mininet virtual machine but from a linux prompt)

ovs-ofctl dump-flows s2

we get

cookie=0x0, duration=45.54s, table=0, n_packets=3, n_bytes=238, idle_age=40, dl_src=00:00:00:00:00:01,dl_dst=00:00:00:00:00:04 actions=output:3
cookie=0x0, duration=41.425s, table=0, n_packets=5, n_bytes=378, idle_age=36, dl_src=00:00:00:00:00:04,dl_dst=00:00:00:00:00:02 actions=output:1
cookie=0x0, duration=41.426s, table=0, n_packets=4, n_bytes=336, idle_age=36, dl_src=00:00:00:00:00:02,dl_dst=00:00:00:00:00:04 actions=output:3
cookie=0x0, duration=45.539s, table=0, n_packets=4, n_bytes=280, idle_age=40, dl_src=00:00:00:00:00:04,dl_dst=00:00:00:00:00:01 actions=output:2

Because we used the autoSetMacs=True option in the Mininet() call in switchline.py, the Ethernet addresses assigned to hosts are easy to follow: h1 is 00:00:00:00:00:01, etc. The first and fourth lines above result from h1 pinging h4; we can see from the output port at the end of each line that s1 must be reachable from s2 via port 2 and s3 via port 3. Similarly, the middle two lines result from h2 pinging h4; h2 lies off s2’s port 1. These port numbers correspond to the interface numbers shown in the diagram at 18.3   Multiple Switches in a Line.
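To compare entries like these programmatically, each line of output can be split into its fields. The sketch below assumes the line layout shown above (comma-separated key=value fields followed by a space and actions=...); the function name parse_flow() is hypothetical, and other versions of ovs-ofctl may format their output differently.

```python
def parse_flow(line):
    """Parse one line of flow-dump output, in the layout shown above,
    into a dict of strings; the action list is stored under "actions"."""
    left, _, actions = line.strip().partition(" actions=")
    entry = {}
    for field in left.split(","):
        key, _, value = field.strip().partition("=")
        if key:
            entry[key] = value
    entry["actions"] = actions
    return entry
```

Applied to the first line above, parse_flow() gives an entry whose dl_src is 00:00:00:00:00:01, whose dl_dst is 00:00:00:00:00:04 and whose actions value is output:3.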

18.8.3   l2_nx.py

The l2_nx.py example accomplishes the same Ethernet-switch effect as l2_pairs.py, but using only O(N) space. It does, however, use two OpenFlow tables, one for destination addresses and one for source addresses. In the implementation here, source addresses are held in table 0, while destination addresses are held in table 1; this is the reverse of the multiple-table approach outlined in 2.7.2   Learning Switches in OpenFlow. The l2 again refers to network layer 2, and the nx refers to the so-called Nicira extensions to Pox, which enable the use of multiple flow tables.

Initially, table 0 is set up so that it tries a match on the source address. If there is no match, the packet is forwarded to the controller, and sent on to table 1. If there is a match, the packet is sent on to table 1 but not to the controller.

Table 1 then looks for a match on the destination address. If one is found then the packet is forwarded to the destination, and if there is no match then the packet is flooded.

Using two OpenFlow tables in Pox requires the loading of the so-called Nicira extensions (hence the “nx” in the module name here). These require a slightly more complex command line:

./pox.py openflow.nicira --convert-packet-in forwarding.l2_nx

Nicira will also require, eg, nx.nx_flow_mod() instead of of.ofp_flow_mod().

The no-match actions for each table are set during the handling of the ConnectionUp events. An action becomes the default action when no msg.match rules are included, and the priority is low; recall (2.7.1   OpenFlow Switches) that if a packet matches multiple flow-table entries then the entry with the highest priority wins. The priority is here set to 1; the Pox default priority – which will be used (implicitly) for later, more-specific flow-table entries – is 32768. The first step is to arrange for table 0 to forward to the controller and to table 1.

msg = nx.nx_flow_mod()
msg.table_id = 0              # not necessary as this is the default
msg.priority = 1              # low priority
msg.actions.append(of.ofp_action_output(port = of.OFPP_CONTROLLER))
msg.actions.append(nx.nx_action_resubmit.resubmit_table(table = 1))
event.connection.send(msg)

Next we tell table 1 to flood packets by default:

msg = nx.nx_flow_mod()
msg.table_id = 1
msg.priority = 1
msg.actions.append(of.ofp_action_output(port = of.OFPP_FLOOD))
event.connection.send(msg)

Now we define the PacketIn handler. First comes the table 0 match on the packet source; if there is a match, then the source address has been seen by the controller, and so the packet is no longer forwarded to the controller (it is forwarded to table 1 only).

msg = nx.nx_flow_mod()
msg.table_id = 0
msg.match.of_eth_src = packet.src     # match the source
msg.actions.append(nx.nx_action_resubmit.resubmit_table(table = 1))
event.connection.send(msg)

Now comes table 1, where we match on the destination address. All we know at this point is that the packet with source address packet.src came from port event.port, and we forward any packets addressed to packet.src via that port:

msg = nx.nx_flow_mod()
msg.table_id = 1
msg.match.of_eth_dst = packet.src     # this rule applies only for packets to packet.src
msg.actions.append(of.ofp_action_output(port = event.port))
event.connection.send(msg)

Note that there is no network state maintained at the controller; there is no analog here of the table dictionary of l2_pairs.py.

Suppose we have a simple network h1–s1–h2. When h1 sends to h2, the controller will add to s1’s table 0 an entry indicating that h1 is a known source address. It will also add to s1’s table 1 an entry indicating that h1 is reachable via the port on s1’s left. Similarly, when h2 replies, s1 will have h2 added to its table 0, and then to its table 1.
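The h1–s1–h2 example can be traced with a toy model of the two-table pipeline. This is a plain-Python simulation rather than Pox or OpenFlow code: the class TwoTableSwitch and its process() method are hypothetical names, and the model folds the controller's response into the switch for brevity.

```python
class TwoTableSwitch:
    """Toy model of the l2_nx two-table pipeline (not Pox or OpenFlow code)."""

    def __init__(self):
        self.table0 = set()    # known source addresses
        self.table1 = {}       # destination address -> output port

    def process(self, src, dst, in_port):
        """Return (sent_to_controller, output); output is a port or "flood"."""
        to_controller = src not in self.table0       # table 0: source lookup
        output = self.table1.get(dst, "flood")       # table 1: destination lookup
        if to_controller:
            # Stands in for the controller's PacketIn handler, which
            # installs an entry for src in each of the two tables.
            self.table0.add(src)
            self.table1[src] = in_port
        return to_controller, output
```

With h1 on port 1 and h2 on port 2, the first packet from h1 to h2 goes to the controller and is flooded; h2's reply goes to the controller and is forwarded out port 1; after that, traffic in either direction is forwarded directly without involving the controller.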

18.9   Exercises

  1. Show that, in the example of 18.8.3   l2_nx.py, table 0 and table 1 will always have the same entries.
  2. Suppose we implement l2_pairs a little differently, so that when h1 pings h4, then as soon as the controller knows what ports to use to reach h1 and/or h4 from s2, it installs on s2 ordinary destination-only forwarding rules for reaching those destinations. Specifically, if a packet arrives at s2 that s2 forwards to the controller, and the controller knows how s2 should forward the packet to its destination, it installs the forwarding rule on s2.

Why does this not work? Hint: consider h1 sending to h4, h4 sending to h1, h3 sending to h1, and finally h1 sending to h3.