TC
1. Introduction
The tc command is used to configure Traffic Control in the Linux kernel and is part of the iproute2 package.
The tc framework in Linux offers a very capable, flexible, and comprehensive set of features. tc, in combination with SwitchDev or DSA, also offers hardware offloading. Compared with more traditional switch or network OSes, tc covers the following features:
- Classifying
- Scheduling
- Shaping and policing
- ACLs
- Push/pop VLANs
When tc is running in SW without any HW offload, it offers an unlimited number of lookups, and all shapers, policers, and filters are available with the full set of parameters. However, the performance (in terms of bandwidth, latency, real-time processing, etc.) is limited by the CPU and the connectivity between the network ports and the CPU. On the other hand, when tc is offloaded to HW, it can typically offer wire-speed processing, and latency and real-time performance are typically of nanosecond granularity, but the features, scaling, and flexibility are limited by the capabilities of the underlying HW.
This documentation will focus on tc being used on platforms with SwitchDev/DSA offloading, and only cover the features which can be offloaded by the relevant Microchip switches.
2. tc terminology and architecture
tc operates on a single network interface and was originally designed to work on egress only. Later on, tc was enhanced to also work on ingress, but with a more limited feature set. Each network interface has two hooks where tc can affect the traffic. These hooks are adjacent to the network interface, as shown in the simplified illustration below.
Note: Not all traffic is forwarded. The user application may consume a frame without further forwarding, and the user application may also produce frames.
Other hooks, such as netfilter hooks, are located further away from the network interface, which means that tc is consulted first after receiving an Ethernet frame and last before transmitting an Ethernet frame. Hooks other than the tc hooks are not supported by Microchip switches.
tc consists of the following components:
- Qdisc
- Class
- Filter
- Chain
- Shared filter blocks
All components are identified by an ID with the same syntax: <major>:<minor>. Both major and minor are hexadecimal numbers. The naming conventions for the ID differ between the components and are described under each component.
2.1. Qdisc
A qdisc (queuing discipline) is defined as a scheduler and/or shaper that decides which frame to send next. There are two qdiscs that do not adhere to this definition, namely the ingress qdisc and the clsact qdisc. These are not really queueing disciplines but create a location where filters can be attached.
The ingress qdisc works on ingress only, while the clsact qdisc works on both ingress and egress. As the clsact qdisc is more general than the ingress qdisc, all examples will use the clsact qdisc. The clsact qdisc can be used simultaneously with all the other qdiscs, as those only work on egress. On ingress, the clsact qdisc is typically used for classifying, dropping, or policing frames. On egress, the clsact qdisc is typically used to add, modify, or delete VLAN tags. A clsact qdisc is created by using the tc qdisc command:

# tc qdisc add dev eth0 clsact
The rest of the qdiscs are for egress only and can be subdivided into two types: classful qdiscs and classless qdiscs. Each interface has a default root egress qdisc attached that depends on the network interface. The default root qdisc can be replaced using the tc qdisc command:

# tc qdisc add dev eth0 root handle 1:0 mqprio
All qdiscs must be created with a parent and an optional handle. If no handle is specified, the system will create one. The example above creates a new root qdisc of type mqprio with root as the parent and a handle of 1:0. The minor number of a qdisc must always be 0 (or simply omitted, in which case the handle can be specified as either <major>: or <major>).
2.1.1. Classful vs Classless qdiscs
Classful qdiscs allow you to create a separate policy for different traffic classes by assigning an arbitrary number of classes to the classful qdisc. These classes can again contain other classes or qdiscs. It is possible to create a very complex tree structure by combining classful qdiscs and classes. A filter attached to the root qdisc or to any of the classes is used to steer the traffic to a specific class.
Classless qdiscs do not support classes and filters. The behavior of the qdisc is determined at creation time. Some of the classless qdiscs map traffic flows to traffic classes, but these are not real classes as in a classful qdisc. In most cases, it is possible to assign another qdisc to each of these traffic classes.
It is the chosen implementation inside the kernel that decides whether a qdisc is classful or classless. When understanding the TC framework, and when choosing which qdiscs to use in a given configuration scenario, it is much more important to understand the concepts of schedulers and shapers. The current implementation in the kernel offers different schedulers (some implemented as classful, others as classless), and likewise with shapers (some implemented as classful, others as classless). The following sections provide an introduction to schedulers and shapers.
The qdiscs in TC are very flexible and allow objects to be nested recursively. HW, on the other hand, is fixed, and to allow HW offload we need to align with the limitations of the HW. The skip_sw flag will cause the driver to return an error if a given configuration cannot be offloaded.
2.1.2. Schedulers
A scheduler splits traffic into different traffic classes and decides which frame to send next.
The schedulers supported for HW offload by Microchip switches are:
- mqprio - Multiqueue Priority Qdisc (classless)
- taprio - Time Aware Priority Shaper (classless)
- ets - Enhanced Transmission Selection (classful)
mqprio

The mqprio qdisc is the simplest to understand. It basically splits traffic up into eight different queues based on the priority of the frame, where frames in high-priority queues are sent first. Adding an mqprio qdisc does nothing by itself, as this is the way the egress interface works by default. The reason to use it is that it creates eight qdisc classes which map 1:1 to the eight priority queues. On each of these qdisc classes, it is possible to attach another qdisc, such as a shaper.
An mqprio qdisc can be illustrated like this:
In the example above, a cbs shaper is attached to traffic class 1:5 (priority 4). A default qdisc is automatically attached to all the other traffic classes.
See the Strict scheduling section on the Scheduling page to see how the example above can be implemented.
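As an illustrative sketch (the interface name eth0 and the identity priority-to-queue mapping are assumptions, not taken from this document), an mqprio root qdisc with one queue per traffic class could be created like this:

```shell
# Create an mqprio root qdisc with 8 traffic classes, an identity
# priority-to-class map, and one HW queue per class; "hw 1" requests
# HW offload where the driver supports it.
tc qdisc add dev eth0 root handle 1:0 mqprio \
    num_tc 8 \
    map 0 1 2 3 4 5 6 7 \
    queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
    hw 1
```

The resulting qdisc classes (such as 1:5 in the illustration above) can then be used as parents for shapers such as cbs.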
taprio

The taprio qdisc is basically an mqprio qdisc with added support for scheduled traffic as described in IEEE Std 802.1Q-2018 Section 8.6.8.4, also known as Time-Aware Scheduling (TAS). A cyclic schedule opens and closes each priority queue relative to a known timescale, e.g. controlled via PTP. When a queue is closed, all frames are held back in the queue, and when it opens again, the frames are transmitted in priority order. This cycle repeats forever.
See the Time Aware Scheduling section on the Scheduling page to see how Time-Aware Scheduling can be implemented.
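As a sketch of the syntax documented in man tc-taprio (the interface name, times, and gate masks are illustrative assumptions), a 1 ms cycle that reserves the first 300 us exclusively for queue 7 might look like:

```shell
# Cyclic schedule: gate mask 0x80 opens only queue 7 for 300 us, then
# 0x7f opens queues 0-6 for 700 us. base-time anchors the cycle to the
# timescale (e.g. PTP time); flags 0x2 requests full HW offload.
tc qdisc replace dev eth0 parent root handle 100 taprio \
    num_tc 8 \
    map 0 1 2 3 4 5 6 7 \
    queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
    base-time 1000000000 \
    sched-entry S 0x80 300000 \
    sched-entry S 0x7f 700000 \
    flags 0x2
```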
ets

The ets qdisc is basically an mqprio qdisc with added support for Enhanced Transmission Selection (ETS) as described in IEEE Std 802.1Q-2018 Section 37. With the ets qdisc, you can either have eight strict priority queues, in which case it works in the same way as the mqprio qdisc, or you can have up to eight weighted queues, where you can configure each queue to have a guaranteed bandwidth as a percentage of the total bandwidth. If you have fewer than eight weighted queues, the rest of the queues work as strict priority queues. The weighted queues are always allocated from the priority queues with the lowest priority.
See the Strict and DWRR scheduling section on the Scheduling page to see how Enhanced Transmission Selection can be implemented.
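As a sketch of the syntax documented in man tc-ets (the band counts and weights are illustrative assumptions), six strict bands plus two weighted bands with a 3:1 weight ratio could be configured like this:

```shell
# bands 8 = 6 strict bands + 2 DWRR bands; the quanta are the relative
# weights of the two lowest-priority (DWRR) bands. In ets, band 0 is
# the highest priority, so this priomap sends priority 7 to band 0
# and priority 0 to band 7.
tc qdisc add dev eth0 root handle 1:0 ets \
    bands 8 strict 6 quanta 3000 1000 \
    priomap 7 6 5 4 3 2 1 0
```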
2.1.3. Shapers
A shaper sends out traffic at a specified maximum bitrate and buffers all traffic that exceeds this bitrate. In other words: it smooths the traffic and removes bursts. Frames are never discarded unless the buffer is full.
The shapers supported for HW offload by Microchip switches are:
- tbf - Token Bucket Filter (classful)
- cbs - Credit Based Shaper (classless)
tbf

The tbf qdisc implements a shaper based on the Token Bucket algorithm. A tbf qdisc is created using the tc qdisc command and can be attached either to the root or to a traffic class in one of the supported schedulers mentioned above. See the Priority Shaping section on the Shaping page for a description of how to implement a tbf shaper.
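As an illustrative sketch (the rate, burst, and limit values are assumptions), a tbf shaper attached to the root could look like:

```shell
# Shape all egress traffic to 100 Mbit/s, allowing bursts of up to
# 4 KiB and queuing at most 10 KB of excess traffic before dropping.
tc qdisc add dev eth0 root handle 1:0 tbf \
    rate 100mbit burst 4096 limit 10000
```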
cbs

The cbs qdisc implements the shaper algorithm described in IEEE Std 802.1Q-2018 Section 8.6.8.2. A cbs qdisc is created using the tc qdisc command and can be attached to a traffic class in one of the supported schedulers mentioned above. See the Priority Shaping section on the Shaping page for a description of how to implement a cbs shaper.
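As a sketch following the example in man tc-cbs (the values assume a 1 Gbit/s link and an existing mqprio root with handle 1:, which are assumptions here), a cbs shaper reserving roughly 20 Mbit/s on class 1:5 could look like:

```shell
# idleslope/sendslope are in kbit/s (20000 - 1000000 = -980000 for a
# 1 Gbit/s port); hicredit/locredit are in bytes; offload 1 requests
# HW offload of the shaper.
tc qdisc replace dev eth0 parent 1:5 cbs \
    idleslope 20000 sendslope -980000 \
    hicredit 30 locredit -1470 \
    offload 1
```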
2.2. Class
Classes in the traffic control framework can be described by the following statements:
- A class represents a traffic class and can only exist inside a classful qdisc of the same type as the class.
- A class can contain other child classes or a single qdisc, which can be a classful or classless qdisc.
- A class that does not contain a child class is called a leaf class and will always contain a default simple classless qdisc unless another qdisc is assigned.
- A class must be assigned a parent and a class ID when it is created.
The following illustration shows an example of a classful qdisc with child classes.
The different classes are selected by filters attached to the root qdisc.
Classful qdiscs with user-defined classes are not supported by Microchip switches.
2.3. Filter
A tc filter is used to match frames in some way and apply actions to the matching frames. The filters supported for HW offload by Microchip switches are:
- matchall - Matches all frames
- flower - Matches packets via a set of keys, such as src_mac and dst_ip

Microchip switches only support using filters on the clsact qdisc on either ingress or egress.
Filters support several general parameters that are independent of the filter type:
- prio (or pref) - The priority of the filter. The lowest value has the highest priority and is checked first.
- handle - The filter ID to be used when modifying or deleting the filter.
- protocol - The protocol to match, such as all, ip, ipv6, or 802.1q.
- skip_sw - The filter is HW offloaded.
All examples expect that a clsact qdisc is created as shown in the Qdisc section. The clsact qdisc creates two new handles, ingress and egress, which are used as parents in the filter commands.
2.3.1. tc filter actions
Every filter contains one or more actions that are applied when the filter is hit.
Not all kinds of actions are supported by each filter as it depends on both the context and the capabilities of the hardware.
The following actions are supported in at least one configuration:
- pass - Do nothing
- trap - Send the packet to the CPU
- drop - Drop the packet
- skbedit priority <PRIO> - Modify the packet priority to PRIO (QoS class)
- vlan pop - Pop the VLAN tag
- vlan modify [ protocol <PROTO> ] id <VID> [ priority <PCP> ] - Modify VID and PCP. <PROTO> is one of 802.1q (default) or 802.1ad
- vlan push [ protocol <PROTO> ] id <VID> [ priority <PCP> ] - Push a new VLAN tag. <PROTO> is one of 802.1q (default) or 802.1ad
- police rate <BPS> burst <BYTES> - Police traffic
- mirred egress mirror dev <MONITOR_PORT> - Mirror traffic to a monitor port
- goto chain <CHAIN_INDEX> - Go to the specified chain
Actions are always added last in the filter-specific options.
Use tc filter add matchall action <action> help to see all action parameters.
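As an illustrative sketch combining the action syntax above with a matchall filter (the interface, VID, and PCP values are assumptions, and a clsact qdisc is assumed to exist on eth0), a VLAN tag could be pushed on egress like this:

```shell
# Push an 802.1Q tag with VID 100 and PCP 3 onto all frames
# transmitted on eth0; skip_sw ensures the rule is HW offloaded.
tc filter add dev eth0 egress prio 10 handle 1 protocol all matchall \
    skip_sw \
    action vlan push id 100 priority 3
```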
The following sections briefly describe how to use matchall and flower filters. They are described in more detail in the sections where the actual usage is shown.
2.3.2. matchall filter
The matchall filter matches all packets and applies one or more actions.

# tc filter add dev eth0 ingress prio 10 handle 1 protocol all matchall \
    skip_sw \
    action skbedit priority 4
Use tc filter add matchall help to see all parameters.
2.3.3. flower filter
The flower filter is able to match packets using a set of keys and apply one or more actions to the matching packets.

# tc filter add dev eth0 ingress prio 20 handle 2 protocol all flower \
    skip_sw \
    dst_mac 00:11:22:33:44:55 \
    action drop
Use tc filter add flower help to see all parameters.
2.4. Chain
tc filter chains allow us to jump from one filter chain to another using the goto action. In the following example, chain 10000 contains a filter that matches on SMAC and drops the matching packets:

# tc filter add dev eth0 ingress prio 20 handle 2 protocol all matchall \
    skip_sw \
    action goto chain 10000
# tc filter add dev eth0 ingress prio 21 handle 3 protocol all chain 10000 flower \
    skip_sw \
    src_mac 00:22:33:44:55:66 \
    action drop
A chain template can be created with the tc chain command:

# tc chain add dev eth0 ingress chain 10000 protocol ip flower \
    dst_ip 0.0.0.0 \
    ip_proto tcp
The command above creates a chain template that will limit filters created in this chain to only specify ip as the protocol and dst_ip and ip_proto as keys. The key values given in the chain template are not fixed, only the names of the keys.
# tc filter add dev eth0 ingress prio 1 handle 1 protocol ip chain 10000 flower \
    skip_sw \
    dst_ip 10.10.10.10 \
    ip_proto udp \
    action drop
# tc filter add dev eth0 ingress prio 2 handle 2 protocol ip chain 10000 flower \
    skip_sw \
    src_ip 20.20.20.20 \
    ip_proto udp \
    action drop
2.5. Shared Filter Blocks
tc operates on a single interface only, but if two or more interfaces need exactly the same filter configuration, then shared filter blocks come to the rescue. The downside is that you cannot combine shared filter blocks with individual filters on a specific interface, so it is all or nothing. Instead of adding a clsact qdisc directly on an interface, we add the same ingress_block on the interfaces that should share the filter configuration:
# tc qdisc add dev eth0 ingress_block 10 clsact
# tc qdisc add dev eth1 ingress_block 10 clsact
# tc qdisc add dev eth3 ingress_block 10 clsact

# tc filter add block 10 chain 12000 prio 1 handle 1 protocol all flower skip_sw \
    dst_mac 00:01:01:00:00:00/ff:ff:ff:00:00:00 \
    src_mac 00:02:02:02:02:02 \
    action goto chain 20002
# tc filter add block 10 chain 20002 prio 2 handle 2 protocol all flower skip_sw \
    action drop \
    action goto chain 21000
The kernel will now apply all filters to each interface.
Note: It is possible to use a mask in many of the filter keys, as shown above for dst_mac. This allows us to match on only a subset of the key.
The primary reason for using shared filter blocks is that it allows optimizing the use of TCAM entries where the port mask would be the only difference between the entries. In the example above, only one TCAM rule is needed. Secondarily, it makes it easier to set up the filters.
3. References
Documentation for tc can be found in the man pages:

$ man tc
$ man tc-mqprio
$ man tc-taprio
$ man tc-ets
$ man tc-cbs
$ man tc-tbf
$ man tc-matchall
$ man tc-flower
https://tldp.org/HOWTO/Traffic-Control-HOWTO/index.html (version 1.0.2 Oct 2006)
https://tldp.org/en/Traffic-Control-HOWTO/ (version 1.1 Jan 2016)
https://tldp.org/HOWTO/Adv-Routing-HOWTO/ (version 1.1 Jul 2002)
http://borg.uu3.net/traffic_shaping/index.html
https://lwn.net/Articles/736338/
https://lwn.net/Articles/743391/