7.12.07

Advanced routing mini-HOWTO

Timur A. Bolokhov, timur@tepkom.ru

This document describes the new routing features of the 2.1.X development and upcoming 2.2.X stable Linux kernels. Among them are source-based routing and Network Address Translation (NAT).

Introduction

Somewhere in the middle of the 2.1 development kernel series the routing code was rewritten by Alexey Kuznetsov (kuznet@ms2.inr.ac.ru), and many new features such as policy (source-based) routing, Network Address Translation and scheduling were added. Networking is now managed with the ip, tc and rtmon utilities from the iproute2 package. I hope this document will help novices get into the new concepts.

Regrets

This document is written by a USER, so even some basic notions may be incorrect. The ip utility is very powerful, as you can see from its syntax in the appendix, and only a small part of its capabilities is described here; I hope you can guess the rest. Nothing is said about cooperation with tc or about tc itself. There are no pictures yet. Expect bad language, punctuation and general mistakes.

Preliminary reading

I assume that you already have some experience with Linux routing, or have at least studied the NET-3 HOWTO and the IP-Alias, IP-Subnetworking, IP-Masquerading and Proxy-Arp mini-HOWTOs. The Kernel-HOWTO will help you compile a kernel with the new features.

Where to find them

  • The iproute2 package is available at ftp://ftp.inr.ac.ru/ip-routing/. There is at least one mirror, but I could not even resolve it in DNS. Maybe the situation will change.

  • HOWTOs are, as usual, in /usr/doc/ or on the nearest mirror of sunsite.unc.edu.

  • The ipchains utility is hosted at

    http://www.adelaide.net.au/rustcorp/ipfwchains.

  • This document: I hope the current version will be somewhere under

    ftp://post.tepkom.ru/pub/Linux/

Convention

A value in square brackets [ ] is optional.

Software

The author of this document uses kernel 2.1.121 with glibc-2.0.7 and iproute2-ss980827, along with gated-3.5.9. The iproute2-glibc2-patch?? was also applied. This combination has only seen a week of uptime; I could not test it longer.

How it was before

Let me briefly recall how routing worked in the 2.0.X kernel series. When an IP packet arrives at one of the router's interfaces, the kernel first applies the rules of the input firewall chain to it. If the packet survives and forwarding is enabled (/proc/sys/net/ipv4/ip_forward is nonzero), it is passed to another interface according to the routing table and the forward firewall chain; if its destination is one of the router's own interfaces, its journey simply ends there. Normally the routing table describes paths to all possible IP destinations. Destinations are gathered into groups called networks. Each network is uniquely described by its network address (the first address in the group) and its netmask (mask length), which determines the number of addresses in the group ($2^{32 - \mathrm{masklen}}$ addresses). The routing table has two main columns:

DESTINATION:            HOWTO_REACH_IT

Indeed, look at the example:

router># route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
192.168.1.32    0.0.0.0         255.255.255.224 U     0      0       12 eth1:1
192.168.0.0     0.0.0.0         255.255.255.0   U     0      0       34 eth0
192.168.2.0     0.0.0.0         255.255.255.0   U     0      0        3 eth1
192.168.3.0     192.168.0.3     255.255.255.0   UG    1      0        8 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        1 lo
0.0.0.0         192.168.0.4     0.0.0.0         UG    1      0        3 eth0
We have two network devices and three interfaces (not counting loopback): eth0, eth1 and the alias eth1:1. Three networks are connected directly, so their gateway is 0.0.0.0; one network sits behind the gateway 192.168.0.3; and a wise router, 192.168.0.4, knows how to forward packets to the rest of the world. The kernel scans the routing table from top to bottom. When the destination is found within some network (or there is a special "host" entry for it), the packet is forwarded to the specified gateway via the corresponding interface.

Note that networks are sorted strictly in order of decreasing netmask (masklen). If a smaller network inside a bigger one has its own gateway, it appears higher in the table and so gets its chance to be routed correctly.
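
The top-to-bottom scan over a table sorted by decreasing masklen can be sketched in a few lines of Python. This is only a user-space model of the idea, not kernel code; the ROUTES list is copied from the route -n listing above, and the name lookup is made up for illustration.

```python
import ipaddress

# Routes copied from the `route -n` listing: (network, gateway, interface).
# A gateway of None stands for 0.0.0.0, i.e. a directly connected network.
ROUTES = [
    ("192.168.1.32/27", None, "eth1:1"),
    ("192.168.0.0/24", None, "eth0"),
    ("192.168.2.0/24", None, "eth1"),
    ("192.168.3.0/24", "192.168.0.3", "eth0"),
    ("127.0.0.0/8", None, "lo"),
    ("0.0.0.0/0", "192.168.0.4", "eth0"),
]


def lookup(dest, routes=ROUTES):
    """Return (gateway, interface) of the first matching route, or None."""
    addr = ipaddress.ip_address(dest)
    # Keep the table sorted by decreasing mask length, most specific first.
    ordered = sorted(
        routes,
        key=lambda r: ipaddress.ip_network(r[0]).prefixlen,
        reverse=True,
    )
    # Scan from top to bottom and stop at the first network holding dest.
    for net, gateway, iface in ordered:
        if addr in ipaddress.ip_network(net):
            return gateway, iface
    return None


print(lookup("192.168.1.40"))  # falls into the /27 behind eth1:1
print(lookup("10.1.2.3"))      # only the default route matches
```

Because the /27 sorts above the /24 networks, a host inside 192.168.1.32/27 never reaches the less specific entries below it.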

Now let me remind you how to build such a table. Here is the basic syntax:

ifconfig DEVICE [ADDRESS] [netmask MASK] [broadcast ADDR] [up,down]
route {add,del,flush} [-net,-host] [NETWORK] [netmask MASK] \
>[gw GATEWAY] [dev DEVICE]
and the real commands:
router># ifconfig lo 127.0.0.1 netmask 255.0.0.0 broadcast 127.255.255.255 up
router># ifconfig eth0 192.168.0.1 netmask 255.255.255.0 up
router># ifconfig eth1 192.168.2.1 up
router># ifconfig eth1:1 192.168.1.35 netmask 255.255.255.224 \
> broadcast 192.168.1.63 up
router># route add -net 127.0.0.0 dev lo
router># route add -net 192.168.0.0 netmask 255.255.255.0 dev eth0
router># route add -net 192.168.2.0 dev eth1
router># route add -net 192.168.3.0 netmask 255.255.255.0 gw 192.168.0.3
router># route add -net 192.168.1.32 netmask 255.255.255.224 dev eth1:1
router># route add default gw 192.168.0.4
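
If the netmask and broadcast arithmetic behind these commands is not obvious, the following Python sketch may help. It uses only the standard ipaddress module; the function name describe is made up for illustration. It derives the network address, mask length, broadcast address and group size for the eth1:1 alias.

```python
import ipaddress


def describe(address, netmask):
    """Derive network parameters from an interface address and its netmask."""
    iface = ipaddress.ip_interface(f"{address}/{netmask}")
    net = iface.network
    return {
        "network": str(net.network_address),    # first address in the group
        "masklen": net.prefixlen,               # number of leading 1 bits
        "broadcast": str(net.broadcast_address),
        "size": net.num_addresses,              # 2 ** (32 - masklen)
    }


print(describe("192.168.1.35", "255.255.255.224"))
```

For 192.168.1.35 with netmask 255.255.255.224 this yields network 192.168.1.32, masklen 27, broadcast 192.168.1.63 and 32 addresses, which agrees with the values typed by hand above.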

What it is now

A short description of the new routing mechanisms can be found in linux/Documentation/Policy-routing.txt. Below I'll try to give it in more detail.

Now we have not just one table of correspondences

DESTINATION:            HOWTO_REACH_IT
but a set of such tables (called classes in the document referenced above), each applied to the packets that satisfy certain conditions. These conditions are set with the ip rule syntax of the ip utility, while the routing tables are filled with ip route. There are three built-in tables (classes): local, main and default. Here is how the rules bind them:
router># ip rule
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
The kernel scans the rules in order of their preference (the number before the colon). In this initial setup, the path to the destination of any arriving packet is looked up first in table local and, if it is not found there, in tables main and then default.

When an interface is configured with ifconfig (or ip link and ip addr), host entries for its IP and broadcast addresses appear in table local, and the route to its attached network appears in table main. All this is done automatically; you do not need to type any command. To check what is in table N, just type ip route list table N.

The ifconfig and route utilities from net-tools are still available under 2.1.X, so the setup from the previous section can be done exactly as above (but without adding the routes to attached networks). The other option is to use ip:

router># ip link set eth0 up
router># ip addr add 192.168.0.1/24 broadcast 192.168.0.255 \
> label eth0 dev eth0
router># ip link set eth1 up
router># ip addr add 192.168.2.1/24 broadcast 192.168.2.255 \
> label eth1 dev eth1
router># ip addr add 192.168.1.35/27 broadcast 192.168.1.63 \
> label eth1:1 dev eth1
router># ip route add 192.168.3.0/24 via 192.168.0.3 table main
router># ip route add 0/0 via 192.168.0.4 table main
The static and default routes from this example could also have been put into any other table that is looked up after table main (with preference greater than 32766). For example:

router># ip route add 192.168.3.0/24 via 192.168.0.3 table 1
router># ip route add 0/0 via 192.168.0.4 table 2
router># ip rule add [from 0/0] table 1 pref 32800
router># ip rule add [from 0/0] table 2 pref 32810
so that ip rule gives:

router># ip rule
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
32800: from all lookup 1
32810: from all lookup 2
But we won't consider this variant below.

So how does the new routing scheme differ from the previous one? The main difference is that IP packets can now be sorted by their source address and TOS field and, maybe in the future, by special marks put on them by an external classifier (like ipchains). Suppose that in our example we want packets [with TOS 0x10 (minimum delay)] coming from 192.168.1.32/27 to be routed through the default gateway 192.168.0.5. Then we type (after our interfaces are up):

router># ip route add 192.168.3.0/24 via 192.168.0.3 table main
router># ip route add 0/0 via 192.168.0.5 table 3
router># ip route add 0/0 via 192.168.0.4 table 4
router># ip rule add from 192.168.1.32/27 [tos 0x10] table 3 pref 32900
router># ip rule add from 0/0 table 4 pref 32910
The rules now look like this:

router># ip rule
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
32900: from 192.168.1.32/27 [tos 0x10] lookup 3
32910: from all lookup 4
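
To see how the rule list and the tables work together, here is a small Python model of this setup. It is only a sketch under simplifying assumptions: the optional TOS selector is ignored, table local holds just the router's own addresses (no broadcast entries), and the names TABLES, RULES, table_lookup and route are made up for illustration.

```python
import ipaddress

# Tables as lists of (prefix, next hop). "direct" marks an attached network.
TABLES = {
    "local": [("192.168.0.1/32", "local"), ("192.168.2.1/32", "local"),
              ("192.168.1.35/32", "local")],
    "main": [("192.168.0.0/24", "direct"), ("192.168.2.0/24", "direct"),
             ("192.168.1.32/27", "direct"), ("192.168.3.0/24", "192.168.0.3")],
    "default": [],
    "3": [("0.0.0.0/0", "192.168.0.5")],
    "4": [("0.0.0.0/0", "192.168.0.4")],
}

# Rules as (preference, source selector, table), as printed by `ip rule`.
RULES = [
    (0, "0.0.0.0/0", "local"),
    (32766, "0.0.0.0/0", "main"),
    (32767, "0.0.0.0/0", "default"),
    (32900, "192.168.1.32/27", "3"),
    (32910, "0.0.0.0/0", "4"),
]


def table_lookup(table, dst):
    """Most specific next hop for dst in one table, or None if no route."""
    matches = [(ipaddress.ip_network(p), hop) for p, hop in TABLES[table]
               if dst in ipaddress.ip_network(p)]
    if not matches:
        return None
    return max(matches, key=lambda m: m[0].prefixlen)[1]


def route(src, dst):
    """Walk the rules by preference; the first table with a route wins."""
    src = ipaddress.ip_address(src)
    dst = ipaddress.ip_address(dst)
    for pref, selector, table in sorted(RULES):
        if src not in ipaddress.ip_network(selector):
            continue  # rule does not apply to this source address
        hop = table_lookup(table, dst)
        if hop is not None:
            return table, hop
    return None


print(route("192.168.1.40", "10.0.0.1"))     # source in the /27: table 3
print(route("192.168.2.7", "10.0.0.1"))      # any other source: table 4
print(route("192.168.1.40", "192.168.3.9"))  # found in main before rule 32900
```

Note that table main has no default route here. That is why packets for the outside world fall through main and default and reach the source-based rules at the end of the list.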

A similar setup may be useful for organizations connected to the net through two or more ISPs via one Linux gateway (of course, there is no need to check the TOS field here; just route packets from the network assigned by the second ISP to its gateway or ppp interface). It is even possible to have a script notice a problem on one link and redirect critical outgoing connections (in combination with NAT) to another ISP's link. This won't work for incoming connections unless you change your DNS entries accordingly or have multihomed servers.

Here is a syntax for ipchains to set the TOS field:

ipchains -A input -p PROTO -s SOURCE [port] -d DEST [port] -t 0x01 0x10
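
The two numbers after -t are an AND mask and an XOR mask: ipchains first ANDs the packet's TOS byte with the first value and then XORs the result with the second. The arithmetic can be checked with a tiny Python sketch (the function name apply_tos_masks is made up for illustration):

```python
def apply_tos_masks(tos, and_mask, xor_mask):
    """Model of `-t AND XOR`: AND the TOS byte first, then XOR the result."""
    return (tos & and_mask) ^ xor_mask


# With the masks 0x01 0x10 used above, a packet with TOS 0x00 or 0x08
# leaves with TOS 0x10 (minimum delay), which the tos 0x10 rule can match.
print(hex(apply_tos_masks(0x00, 0x01, 0x10)))
print(hex(apply_tos_masks(0x08, 0x01, 0x10)))
```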

NATs

You should be extremely careful when playing with NAT, especially in a network with a complex topology, one routed by routing protocols, or one simply connected to other networks through more than one router.

Translation of a packet's destination address is always done in routing table local. The syntax is the following:

ip route add nat WHAT/MASKLEN via WHERE table local
So to translate all packets arriving for 192.168.1.50 into packets destined for 192.168.2.25, you type:

router># ip route add nat 192.168.1.50 via 192.168.2.25 table local
And to translate the whole subnet 192.168.1.40/29 into 192.168.2.48/29 the command is
router># ip route add nat 192.168.1.40/29 via 192.168.2.48 table local
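
For a whole subnet, each address in WHAT/MASKLEN is mapped to the address at the same position in the block starting at WHERE. The Python sketch below models that mapping. It assumes the translation keeps the host part of the address unchanged, and the function name translate is made up for illustration.

```python
import ipaddress


def translate(addr, what, where):
    """Map addr from the block `what` onto the block starting at `where`.

    Assumption: the host part (offset inside the block) is preserved.
    Addresses outside `what` are returned unchanged.
    """
    what_net = ipaddress.ip_network(what)
    address = ipaddress.ip_address(addr)
    if address not in what_net:
        return str(address)
    offset = int(address) - int(what_net.network_address)
    return str(ipaddress.ip_address(where) + offset)


print(translate("192.168.1.43", "192.168.1.40/29", "192.168.2.48"))
print(translate("192.168.1.50", "192.168.1.50/32", "192.168.2.25"))
```

Under this assumption 192.168.1.43 becomes 192.168.2.51, while the single-host case maps 192.168.1.50 straight to 192.168.2.25.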

Translation of source addresses should be set by means of rules:

ip rule add from REAL_SOURCE/MASKLEN nat PSEUDO_SOURCE table TABLEID

According to this routing concept, IP packets coming from REAL_SOURCE have their source addresses translated to PSEUDO_SOURCE and are routed according to table TABLEID. The translation applies only to packets whose destination is found in this table.

Let's illustrate this. Suppose that in our example 192.168.2.0/24 is address space from the ISP with gateway 192.168.0.4, and 192.168.1.32/27 is from the ISP with gateway 192.168.0.5. We suddenly want to relink the hosts in subnetwork 192.168.2.48/29 to the other ISP. We have wisely reserved a spare subnet, 192.168.1.40/29, for this. But we want no translation when 192.168.2.48/29 talks to local nets, especially to 192.168.1.0. The following commands do what we need:

router># ip route add nat 192.168.1.40/29 via 192.168.2.48 table local
router># ip rule add from 192.168.2.48/29 nat 192.168.1.40 table 3 pref 32820
(Recall that table 3 contains the default gateway 192.168.0.5.) Our setup is now:
router># ip rule
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
32820: from 192.168.2.48/29 nat 192.168.1.40 lookup 3
32900: from 192.168.1.32/27 lookup 3
32910: from all lookup 4
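
The effect of this rule list on source addresses can be modelled in Python. This is a rough sketch, not a description of the kernel code: table local is left out, each table is reduced to the list of prefixes it can reach, the host part of a translated address is assumed to be preserved, and the names TABLES, RULES and outgoing_source are made up for illustration.

```python
import ipaddress


def net(prefix):
    return ipaddress.ip_network(prefix)


# Each table reduced to the prefixes it has a route for.
TABLES = {
    "main": ["192.168.0.0/24", "192.168.1.32/27",
             "192.168.2.0/24", "192.168.3.0/24"],
    "default": [],
    "3": ["0.0.0.0/0"],   # default gateway 192.168.0.5
    "4": ["0.0.0.0/0"],   # default gateway 192.168.0.4
}

# (preference, source selector, nat target or None, table)
RULES = [
    (32766, "0.0.0.0/0", None, "main"),
    (32767, "0.0.0.0/0", None, "default"),
    (32820, "192.168.2.48/29", "192.168.1.40", "3"),
    (32900, "192.168.1.32/27", None, "3"),
    (32910, "0.0.0.0/0", None, "4"),
]


def outgoing_source(src, dst):
    """Return (table used, source address after the router), or None."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for pref, selector, nat, table in RULES:
        if s not in net(selector):
            continue  # rule does not match this source
        if not any(d in net(p) for p in TABLES[table]):
            continue  # destination not in this table: try the next rule
        if nat is None:
            return table, str(s)
        # Translate, keeping the host's offset inside the /29.
        offset = int(s) - int(net(selector).network_address)
        return table, str(ipaddress.ip_address(nat) + offset)
    return None


print(outgoing_source("192.168.2.50", "10.0.0.1"))     # translated, table 3
print(outgoing_source("192.168.2.50", "192.168.3.9"))  # main wins, untouched
```

A packet from 192.168.2.50 to the outside world leaves with source 192.168.1.42 through table 3, while the same host talking to 192.168.3.0/24 is caught by table main first and keeps its real address.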

Want the same translation when going to 192.168.1.0 too? Ok, just type

router># ip rule add from 192.168.2.48/29 nat 192.168.1.40 table 5
router># ip route add 192.168.1.0/24 via 192.168.0.3 table 5
Then you'll get
router># ip rule
0: from all lookup local
32765: from 192.168.2.48/29 nat 192.168.1.40 lookup 5
32766: from all lookup main
32767: from all lookup default
32820: from 192.168.2.48/29 nat 192.168.1.40 lookup 3
32900: from 192.168.1.32/27 lookup 3
32910: from all lookup 4

Note that you should always think about where your rule appears in the list, i.e. control its preference. Otherwise the result may be very confusing. Guess why we could not just put the route to 192.168.1.0/24 into table 3 with

router># ip route add 192.168.1.0/24 via 192.168.0.3 table 3
instead of the last two commands (ip rule add ... and ip route add ...)?

I hope these imaginary examples will help you organize your real system.

Appendix

The full syntax of the ip utility is gathered here.

ip

Usage: ip [ OPTIONS ] OBJECT { COMMAND | help }
where OBJECT := { link | addr | route | rule | neigh | tunnel }
OPTIONS := { -s[tatistics] | -f[amily] { inet | inet6 }}

ip link

Usage: ip link set DEVICE { up | down | arp { on | off } |
multicast { on | off } | txqueuelen PACKETS |
name NEWNAME }
ip link show [ DEVICE ]

ip addr

Usage: ip addr [ add | del ] IFADDR dev STRING
ip addr show [ dev STRING ] [ ipv4 | ipv6 | link | all ] [txqueuelen]
IFADDR := PREFIX [ local ADDR ]
[ broadcast ADDR ] [ anycast ADDR ]
[ label STRING ] [ scope SCOPE ]
SCOPE := [ host | link | global | NUMBER ]

ip route

Usage: ip route list SELECTOR
ip route { change | del | add | append | replace | monitor } ROUTE
SELECTOR := [ root PREFIX ] [ match PREFIX ] [ exact PREFIX ]
[ table TABLE_ID ] [ proto RTPROTO ]
[ type TYPE ] [ scope SCOPE ]
ROUTE := NODE_SPEC [ INFO_SPEC ]
NODE_SPEC := [ TYPE ] PREFIX [ tos TOS ]
[ table TABLE_ID ] [ proto RTPROTO ]
[ type TYPE ] [ scope SCOPE ]
INFO_SPEC := NH OPTIONS FLAGS [ nexthop NH ]...
NH := [ via ADDRESS ] [ dev STRING ] [ weight NUMBER ] NHFLAGS
OPTIONS := FLAGS [ mtu NUMBER ] [ rtt NUMBER ] [ window NUMBER ]
[ flowid CLASSID ]
TYPE := [ unicast | local | broadcast | multicast | throw |
unreachable | prohibit | blackhole | nat ]
TABLE_ID := [ local | main | default | all | NUMBER ]
SCOPE := [ host | link | global | NUMBER ]
NHFLAGS := [ onlink | pervasive ]
RTPROTO := [ kernel | boot | static | NUMBER ]

ip rule

Usage: ip rule [ list | add | del ] SELECTOR ACTION
SELECTOR := [ from PREFIX ] [ to PREFIX ] [ tos TOS ]
[ dev STRING ] [ pref NUMBER ]
ACTION := [ table TABLE_ID ] [ nat ADDRESS ]
[ prohibit | reject | unreachable ]
[ flowid CLASSID ]
TABLE_ID := [ local | main | default | new | NUMBER ]

ip neigh

Usage: ip neigh { add | del } { ADDR [ lladdr LLADDR ]
[ nud { permanent | noarp | stale | reachable } ]
| proxy ADDR } [ dev DEVICE ]
ip neigh show [ ipv4 | ipv6 | all ]

ip tunnel

Usage: ip tunnel { add | change | del | show } [ NAME ]
[ mode { ipip | gre | sit } ] [ remote ADDR ] [ local ADDR ]
[ [i|o]seq ] [ [i|o]key KEY ] [ [i|o]csum ]
[ ttl TTL ] [ tos TOS ] [ nopmtudisc ] [ dev PHYS_DEV ]

Where: NAME := STRING
ADDR := { IP_ADDRESS | any }
TOS := { NUMBER | inherit }
TTL := { 1..255 | inherit }
KEY := { DOTTED_QUAD | NUMBER }
