Linux Advanced Routing & Traffic Control HOWTO
Bert Hubert
Netherlabs BV
Thomas Graf (Section Author)
tgraf%suug.ch
Gregory Maxwell (Section Author)
Remco van Mook (Section Author)
Martijn van Oosterhout (Section Author)
Paul B Schroeder (Section Author)
Jasper Spaans (Section Author)
Pedro Larroy (Section Author)
piotr%member.fsf.org
A very hands-on approach to iproute2, traffic shaping and a bit of netfilter.
Revision History
Revision $Revision$ $Date$
DocBook Edition
Chapter 1. Dedication
This document is dedicated to lots of people, and is my attempt to do something back. To list but a few:
- Rusty Russell
- Alexey N. Kuznetsov
- The good folks from Google
- The staff of Casema Internet
Chapter 2. Introduction
Welcome, gentle reader.
This document hopes to enlighten you on how to do more with Linux 2.2/2.4 routing. Unbeknownst to
most users, you already run tools which allow you to do spectacular things. Commands like route and
ifconfig are actually very thin wrappers for the very powerful iproute2 infrastructure.
I hope that this HOWTO will become as readable as the ones by Rusty Russell of (amongst other things)
netfilter fame.
You can always reach us by posting to the mailing list (see the relevant section) if you have comments or
questions about, or somewhat related to, this HOWTO. We are not a free helpdesk, but we will often answer
questions asked on the list.
Before losing your way in this HOWTO, if all you want to do is simple traffic shaping, skip everything
and head to the Other possibilities chapter, and read about CBQ.init.
2.1. Disclaimer & License
This document is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
PURPOSE.
In short, if your STM-64 backbone breaks down and distributes pornography to your most esteemed
customers - it’s never our fault. Sorry.
Copyright (c) 2002 by bert hubert, Gregory Maxwell, Martijn van Oosterhout, Remco van Mook, Paul B.
Schroeder and others. This material may be distributed only subject to the terms and conditions set forth
in the Open Publication License, v1.0 or later (the latest version is presently available at
http://www.opencontent.org/openpub/).
Please freely copy and distribute (sell or give away) this document in any format. It’s requested that
corrections and/or comments be forwarded to the document maintainer.
It is also requested that if you publish this HOWTO in hardcopy that you send the authors some samples
for “review purposes” :-)
2.4. Housekeeping notes
There are several things which should be noted about this document. While I wrote most of it, I really
don’t want it to stay that way. I am a strong believer in Open Source, so I encourage you to send
feedback, updates, patches etcetera. Do not hesitate to inform me of typos or plain old errors. If my
English sounds somewhat wooden, please realize that I’m not a native speaker. Feel free to send
suggestions.
If you feel you are better qualified to maintain a section, or think that you can author and maintain new
sections, you are welcome to do so. The SGML of this HOWTO is available via GIT, and I very much
envision more people working on it.
In aid of this, you will find lots of FIXME notices. Patches are always welcome! Wherever you find a
FIXME, you should know that you are treading in unknown territory. This is not to say that there are no
errors elsewhere, but be extra careful. If you have validated something, please let us know so we can
remove the FIXME notice.
In this HOWTO I will take some liberties along the way. For example, I postulate a 10Mbit Internet
connection, while I know full well that those are not very common.
2.5. Access, GIT & submitting updates
The canonical location for the HOWTO is here (http://lartc.org/).
We now have anonymous GIT access available to the world at large. This is good in a number of ways.
You can easily upgrade to newer versions of this HOWTO and submitting patches is no work at all.
Furthermore, it allows the authors to work on the source independently, which is good too.
$ git clone git://repo.or.cz/lartc.git
or (if you’re behind a firewall which only allows HTTP)
$ git clone http://repo.or.cz/r/lartc.git
Enter the checked out directory (git names it after the repository, without the .git suffix):
$ cd lartc
If you want to update your local copy, run
$ git pull
If you made changes and want to contribute them, run git diff and mail the output to the LARTC
mailing list; we can then integrate it easily. Thanks! By the way, please make sure
that you edit the .db file; the other files are generated from that one.
A Makefile is supplied which should help you create postscript, dvi, pdf, html and plain text. You may
need to install docbook, docbook-utils, ghostscript and tetex to get all formats.
Be careful not to edit 2.4routing.sgml! It contains an older version of the HOWTO. The right file is
lartc.db.
2.6. Mailing list
The authors receive an increasing amount of mail about this HOWTO. Because of the clear interest of the
community, it has been decided to start a mailing list where people can talk to each other about Advanced
Routing and Traffic Control. You can subscribe to the list here
(http://mailman.ds9a.nl/mailman/listinfo/lartc).
It should be pointed out that the authors are very hesitant to answer questions not asked on the list.
We would like the archive of the list to become some kind of knowledge base. If you have a question,
please search the archive, and then post to the mailing list.
2.7. Layout of this document
We will be doing interesting stuff almost immediately, which also means that there will initially be parts
that are explained incompletely or are not perfect. Please gloss over these parts and assume that all will
become clear.
Routing and filtering are two distinct things. Filtering is documented very well by Rusty’s HOWTOs,
available here:
- Rusty’s Remarkably Unreliable Guides (http://netfilter.samba.org/unreliable-guides/)
We will be focusing mostly on what is possible by combining netfilter and iproute2.
Chapter 3. Introduction to iproute
Some parts of iproute require you to have certain kernel options enabled. It should also be noted that all
releases of RedHat up to and including 6.2 come without most of the traffic control features in the default
kernel.
RedHat 7.2 has everything in by default.
Also make sure that you have netlink support, should you choose to roll your own kernel. Iproute2 needs
it.
3.4. Exploring your current configuration
This may come as a surprise, but iproute2 is already configured! The current commands ifconfig and
route already use the advanced syscalls, but mostly with very default (i.e. boring) settings.
The ip tool is central, and we’ll ask it to display our interfaces for us.
3.4.1. ip shows us our links
[ahu@home ahu]$ ip link list
1: lo: mtu 3924 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: dummy: mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
3: eth0: mtu 1400 qdisc pfifo_fast qlen 100
    link/ether 48:54:e8:2a:47:16 brd ff:ff:ff:ff:ff:ff
4: eth1: mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:e0:4c:39:24:78 brd ff:ff:ff:ff:ff:ff
3764: ppp0: mtu 1492 qdisc pfifo_fast qlen 10
    link/ppp
Your mileage may vary, but this is what it shows on my NAT router at home. I’ll only explain part of the
output as not everything is directly relevant.
We first see the loopback interface. While your computer may function somewhat without one, I’d advise
against it. The MTU size (Maximum Transmission Unit) is 3924 octets, and it is not supposed to queue,
which makes sense because the loopback interface is a figment of your kernel’s imagination.
I’ll skip the dummy interface for now, and it may not be present on your computer. Then there are my
two physical network interfaces, one at the side of my cable modem, the other one serves my home
ethernet segment. Furthermore, we see a ppp0 interface.
Note the absence of IP addresses. iproute disconnects the concept of ’links’ and ’IP addresses’. With IP
aliasing, the concept of ’the’ IP address had become quite irrelevant anyhow.
It does show us the MAC addresses though, the hardware identifier of our ethernet interfaces.
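If you want to read this listing from a script, the per-link header lines are easy to pick apart. The following is only a sketch: the function name list_mtus is made up for this example, and it assumes that each link starts with a line of the form "index: name: ... mtu N ...", as in the listing above.

```shell
#!/bin/sh
# list_mtus (hypothetical helper): read `ip link list` output on stdin and
# print "<interface> <mtu>" for every link header line.
# Usage: ip link list | list_mtus
list_mtus() {
    awk '/^[0-9]+: / {
        name = $2
        sub(/:$/, "", name)            # "eth0:" -> "eth0"
        for (i = 3; i < NF; i++)       # the MTU is the word after "mtu"
            if ($i == "mtu") print name, $(i + 1)
    }'
}
```

The indented link/ether and link/loopback lines are skipped because they do not start with an interface index.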
3.4.2. ip shows us our IP addresses
[ahu@home ahu]$ ip address show
1: lo: mtu 3924 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
2: dummy: mtu 1500 qdisc noop
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
3: eth0: mtu 1400 qdisc pfifo_fast qlen 100
    link/ether 48:54:e8:2a:47:16 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.1/8 brd 10.255.255.255 scope global eth0
4: eth1: mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:e0:4c:39:24:78 brd ff:ff:ff:ff:ff:ff
3764: ppp0: mtu 1492 qdisc pfifo_fast qlen 10
    link/ppp
    inet 212.64.94.251 peer 212.64.94.1/32 scope global ppp0
This contains more information. It shows all our addresses, and to which cards they belong. ’inet’ stands
for Internet (IPv4). There are lots of other address families, but these don’t concern us right now.
Let’s examine eth0 somewhat closer. It says that it is related to the inet address ’10.0.0.1/8’. What does
this mean? The /8 stands for the number of bits that are in the Network Address. An IPv4 address has 32
bits, so 24 bits are left to number the hosts within our network. The first 8 bits of 10.0.0.1 correspond
to 10.0.0.0, our Network Address, and our netmask is 255.0.0.0.
Every address that shares those first 8 bits is directly connected to this interface, so 10.250.3.13 is
directly available on eth0, as is 10.0.0.1, for example.
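To make the prefix arithmetic concrete, here is a minimal shell sketch. The function names (prefix_to_netmask, ip_to_int, in_network) are invented for this example and are not part of iproute2; the code assumes a shell with at least 64-bit arithmetic, which is the usual case on current systems.

```shell
#!/bin/sh
# prefix_to_netmask N: print the dotted-quad netmask for an N-bit prefix.
prefix_to_netmask() {
    bits=$1
    mask=""
    for i in 1 2 3 4; do
        if [ "$bits" -ge 8 ]; then
            octet=255
            bits=$((bits - 8))
        else
            octet=$((256 - (1 << (8 - bits))))   # top $bits bits of the octet set
            bits=0
        fi
        mask="${mask:+$mask.}$octet"
    done
    echo "$mask"
}

# ip_to_int A.B.C.D: print the address as a single 32-bit number.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1            # split the address on dots
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# in_network ADDR NETWORK PREFIX: succeed if ADDR shares the first
# PREFIX bits with NETWORK.
in_network() {
    m=$(( (4294967295 << (32 - $3)) & 4294967295 ))
    [ $(( $(ip_to_int "$1") & m )) -eq $(( $(ip_to_int "$2") & m )) ]
}
```

With these definitions, prefix_to_netmask 8 prints 255.0.0.0, and in_network 10.250.3.13 10.0.0.0 8 succeeds, which matches the statement that 10.250.3.13 is directly reachable on eth0.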
The same concept applies to ppp0, though the numbers are different. Its address is 212.64.94.251,
without a subnet mask. This means that we have a point-to-point connection and that every address, with
the exception of 212.64.94.251, is remote. There is more information, however. It tells us that on the
other side of the link there is, yet again, only one address, 212.64.94.1. The /32 tells us that all 32 bits
are network bits, so there are no host bits left: the ’network’ consists of that single address.
It is absolutely vital that you grasp these concepts. Refer to the documentation mentioned at the
beginning of this HOWTO if you have trouble.
You may also note ’qdisc’, which stands for Queueing Discipline. This will become vital later on.
3.4.3. ip shows us our routes
Well, we now know how to find 10.x.y.z addresses, and we are able to reach 212.64.94.1. This is not
enough however, so we need instructions on how to reach the world. The Internet is available via our ppp
[root@espa041 /home/src/iputils]# ip neigh show
9.3.76.42 dev eth0 lladdr 00:60:08:3f:e9:f9 nud reachable
9.3.76.1 dev eth0 lladdr 00:06:29:21:73:c8 nud reachable
As you can see my machine espa041 (9.3.76.41) knows where to find espa042 (9.3.76.42) and espagate
(9.3.76.1). Now let’s add another machine to the arp cache.
[root@espa041 /home/paulsch/.gnome-desktop]# ping -c 1 espa043
PING espa043.austin.ibm.com (9.3.76.43) from 9.3.76.41 : 56(84) bytes of data.
64 bytes from 9.3.76.43: icmp_seq=0 ttl=255 time=0.9 ms

--- espa043.austin.ibm.com ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 0.9/0.9/0.9 ms

[root@espa041 /home/src/iputils]# ip neigh show
9.3.76.43 dev eth0 lladdr 00:06:29:21:80:20 nud reachable
9.3.76.42 dev eth0 lladdr 00:60:08:3f:e9:f9 nud reachable
9.3.76.1 dev eth0 lladdr 00:06:29:21:73:c8 nud reachable
As a result of espa041 trying to contact espa043, espa043’s hardware address/location has now been
added to the arp/neighbor cache. So until the entry for espa043 times out (as a result of no
communication between the two) espa041 knows where to find espa043 and has no need to send an ARP
request.
Now let’s delete espa043 from our arp cache:
[root@espa041 /home/src/iputils]# ip neigh delete 9.3.76.43 dev eth0
[root@espa041 /home/src/iputils]# ip neigh show
9.3.76.43 dev eth0 nud failed
9.3.76.42 dev eth0 lladdr 00:60:08:3f:e9:f9 nud reachable
9.3.76.1 dev eth0 lladdr 00:06:29:21:73:c8 nud stale
Now espa041 has again forgotten where to find espa043 and will need to send another ARP request the
next time it needs to communicate with espa043. You can also see from the above output that espagate
(9.3.76.1) has been changed to the "stale" state. This means that the location shown is still valid, but it
will have to be confirmed at the first transaction to that machine.
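The state of each entry can also be pulled out of the listing with a few lines of shell. This is a sketch only: neigh_states is a name invented for this example, and it assumes the state is the last word on each line. That holds for the "nud reachable" format shown above and, as far as I know, for newer iproute2 releases, which print the state in capitals without the nud keyword.

```shell
#!/bin/sh
# neigh_states (hypothetical helper): read `ip neigh show` output on stdin
# and print "<address> <state>", with the state folded to lower case.
# Usage: ip neigh show | neigh_states
neigh_states() {
    awk 'NF > 0 { print $1, tolower($NF) }'
}
```

Entries reported as stale or failed are the ones for which the next packet will trigger a confirmation or a fresh ARP request.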
Chapter 4. Rules - routing policy database
If you have a large router, you may well cater for the needs of different people, who should be served
differently. The routing policy database allows you to do this by having multiple sets of routing tables.
If you want to use this feature, make sure that your kernel is compiled with the "IP: advanced router" and
"IP: policy routing" features.
When the kernel needs to make a routing decision, it finds out which table needs to be consulted. By
default, there are three tables. The old ’route’ tool modifies the main and local tables, as does the ip tool
(by default).
The default rules:
[ahu@home ahu]$ ip rule list
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
This lists the priority of all rules. We see that all rules apply to all packets (’from all’). We’ve seen the
’main’ table before; it is output by ip route ls. The ’local’ and ’default’ tables are new.
If we want to do fancy things, we generate rules which point to different tables which allow us to
override system wide routing rules.
For the exact semantics on what the kernel does when there are more matching rules, see Alexey’s
ip-cref documentation.
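Rules are consulted in order of increasing priority number, so the listing above also tells you in which order the tables are tried. A small sketch that makes this order explicit follows; the function name rule_tables is made up for this example, and only rules with a lookup action are considered.

```shell
#!/bin/sh
# rule_tables (hypothetical helper): read `ip rule list` output on stdin and
# print "<priority> <table>" sorted by priority, i.e. in lookup order.
# Usage: ip rule list | rule_tables
rule_tables() {
    awk '{
        prio = $1
        sub(/:$/, "", prio)            # "32766:" -> "32766"
        for (i = 2; i < NF; i++)       # the table name follows "lookup"
            if ($i == "lookup") print prio, $(i + 1)
    }' | sort -n
}
```

For the default rules this prints local first, then main, then default.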
4.1. Simple source policy routing
Let’s take a real example once again. I have 2 (actually 3, about time I returned them) cable modems,
connected to a Linux NAT (’masquerading’) router. People living here pay me to use the Internet.
Suppose one of my housemates only visits hotmail and wants to pay less. This is fine with me, but
they’ll end up using the low-end cable modem.
The ’fast’ cable modem is known as 212.64.94.251 and is a PPP link to 212.64.94.1. The ’slow’ cable
modem is known by various ip addresses, 212.64.78.148 in this example and is a link to 195.96.98.253.
The local table:
[ahu@home ahu]$ ip route list table local
broadcast 127.255.255.255 dev lo proto kernel scope link src 127.0.0.1
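                                          +------------+
                                          |            |
                            +-------------+ Provider 1 +-------+
        __                  |             |            |       |
    ___/  \_         +------+-------+     +------------+       |
  _/        \__      |     if1      |                          |
 /             \     |              |                          |
| Local network -----+ Linux router |                      Internet
 \_           __/    |              |                          |
   \__     __/       |     if2      |                          |
      \___/          +------+-------+     +------------+       |
                            |             |            |       |
                            +-------------+ Provider 2 +-------+
                                          |            |
                                          +------------+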
There are usually two questions given this setup.
4.2.1. Split access
The first is how to route answers to packets coming in over a particular provider, say Provider 1, back out
again over that same provider.
Let us first set some symbolic names. Let $IF1 be the name of the first interface (if1 in the picture
above) and $IF2 the name of the second interface. Then let $IP1 be the IP address associated with $IF1
and $IP2 the IP address associated with $IF2. Next, let $P1 be the IP address of the gateway at Provider
1, and $P2 the IP address of the gateway at Provider 2. Finally, let $P1_NET be the IP network $P1 is in,
and $P2_NET the IP network $P2 is in.
One creates two additional routing tables, say T1 and T2. These are added in /etc/iproute2/rt_tables.
Then you set up routing in these tables as follows:
ip route add $P1_NET dev $IF1 src $IP1 table T1
ip route add default via $P1 table T1
ip route add $P2_NET dev $IF2 src $IP2 table T2
ip route add default via $P2 table T2
Nothing spectacular, just build a route to the gateway and build a default route via that gateway, as you
would do in the case of a single upstream provider, but put the routes in a separate table per provider.
Note that the network route suffices, as it tells you how to find any host in that network, which includes
the gateway, as specified above.
Next you set up the main routing table. It is a good idea to route things to the direct neighbour through
the interface connected to that neighbour. Note the ‘src’ arguments, they make sure the right outgoing IP
address is chosen.
ip route add $P1_NET dev $IF1 src $IP1
ip route add $P2_NET dev $IF2 src $IP2
Then, your preference for default route:
ip route add default via $P1
Next, you set up the routing rules. These actually choose what routing table to route with. You want to
make sure that you route out a given interface if you already have the corresponding source address:
ip rule add from $IP1 table T1
ip rule add from $IP2 table T2
This set of commands makes sure all answers to traffic coming in on a particular interface get answered
from that interface.
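Put together, the split access setup could be scripted roughly as follows. This is a sketch, not a drop-in configuration: the interface names and addresses are hypothetical placeholders taken from the documentation address ranges, the function name split_access is invented here, and the function only prints the commands so that you can review them before running anything as root. It also assumes that T1 and T2 already exist in /etc/iproute2/rt_tables.

```shell
#!/bin/sh
# Hypothetical example values; replace them with your own.
# T1 and T2 must first be added to /etc/iproute2/rt_tables, for example as
# the lines "201 T1" and "202 T2" (the numbers are arbitrary unused ids).
IF1=eth1
IP1=192.0.2.10
P1=192.0.2.1
P1_NET=192.0.2.0/24
IF2=eth2
IP2=198.51.100.10
P2=198.51.100.1
P2_NET=198.51.100.0/24

# split_access: print (do not run) the commands for the two-provider setup.
split_access() {
    cat <<EOF
ip route add $P1_NET dev $IF1 src $IP1 table T1
ip route add default via $P1 table T1
ip route add $P2_NET dev $IF2 src $IP2 table T2
ip route add default via $P2 table T2
ip route add $P1_NET dev $IF1 src $IP1
ip route add $P2_NET dev $IF2 src $IP2
ip route add default via $P1
ip rule add from $IP1 table T1
ip rule add from $IP2 table T2
EOF
}
```

Running split_access shows the nine commands with the values filled in. Once they look right, the same output can be fed to a root shell. Printing first is a deliberate choice, since a mistake in a routing rule can cut off your own remote session.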
Warning
Reader Rod Roark notes that if $P0_NET is the local network and $IF0 is its
interface, the following additional entries are desirable:
ip route add $P0_NET dev $IF0 table T1
ip route add $P2_NET dev $IF2 table T1
ip route add 127.0.0.0/8 dev lo table T1
ip route add $P0_NET dev $IF0 table T2
ip route add $P1_NET dev $IF1 table T2
ip route add 127.0.0.0/8 dev lo table T2
Now, this is just the very basic setup. It will work for all processes running on the router itself, and for
the local network, if it is masqueraded. If it is not, then you either have IP space from both providers or
you are going to want to masquerade to one of the two providers. In both cases you will want to add rules
selecting which provider to route out from based on the IP address of the machine in the local network.
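Such per-machine rules follow the same pattern as the rules for $IP1 and $IP2. A minimal sketch, assuming tables T1 and T2 as set up above; the function name host_rules and the addresses in the usage line are hypothetical, and the commands are only printed, not executed.

```shell
#!/bin/sh
# host_rules TABLE ADDR...: print one `ip rule` command per local address,
# so that traffic sourced from that address is routed using TABLE.
# Usage: host_rules T2 10.0.0.10 10.0.0.11
host_rules() {
    table=$1
    shift
    for addr in "$@"; do
        echo "ip rule add from $addr table $table"
    done
}
```

Machines without a rule of their own keep using the main table and therefore the default route chosen there.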
4.2.2. Load balancing
The second question is how to balance traffic going out over the two providers. This is actually not hard
if you already have set up split access as above.
Instead of choosing one of the two providers as your default route, you now set up the default route to be
a multipath route. In the default kernel this will balance routes over the two providers. It is done as