TC

1. Introduction

The tc command is used to configure Traffic Control in the Linux kernel and is part of the iproute2 package.

The tc framework in Linux offers a very capable, flexible, and comprehensive set of features. tc in combination with SwitchDev or DSA, also offers hardware offloading.

Compared with more traditional switch or network OSes, tc covers the following features:

  • Classifying

  • Scheduling

  • Shaping and policing

  • ACLs

  • VLAN push/pop

When tc runs in SW without any HW offload, it offers an unlimited number of lookups, and all shapers, policers, and filters are available with the full set of parameters. The performance (bandwidth, latency, real-time processing, etc.), however, is limited by the CPU and by the connectivity between the network ports and the CPU. When tc is offloaded to HW, it can typically offer wire-speed processing, with latency and real-time performance in nanosecond granularity, but the features, scaling, and flexibility are limited by the capabilities of the underlying HW.

This documentation will focus on tc being used on platforms with SwitchDev/DSA offloading, and only cover the features which can be offloaded by the relevant Microchip switches.

2. tc terminology and architecture

tc operates on a single network interface and was originally designed to work on egress only.

Later on tc was enhanced to also work on ingress but with a more limited feature set.

Each network interface has two hooks where tc can affect the traffic. These hooks are adjacent to the network interface as shown in the simplified illustration below.

tc flow
Not all traffic is forwarded. The user application may consume the frame without further forwarding. Also, the user application may produce frames.

Other hooks, such as netfilter hooks, are located further away from the network interface, which means that tc is consulted first after receiving an Ethernet frame and last before transmitting an Ethernet frame. Hooks other than the tc hooks are not supported by Microchip switches.

tc consists of the following components:

  • Qdisc

  • Class

  • Filter

  • Chain

  • Shared filter blocks

All components are identified by an ID that has the same syntax: <major>:<minor>. Both major and minor are hexadecimal numbers.

The naming conventions of the ID differ between the components and are described under each component.

2.1. Qdisc

A qdisc (queuing discipline) is defined as a scheduler and/or shaper that decides which frame to send next.

There are two qdiscs that do not adhere to this definition, namely the ingress qdisc and the clsact qdisc. These are not really queueing disciplines but create locations where filters can be attached. The ingress qdisc works on ingress only, while the clsact qdisc works on both ingress and egress. As the clsact qdisc is more general than the ingress qdisc, all examples use the clsact qdisc.

The clsact qdisc can be used simultaneously with all other qdiscs that only work on egress. On ingress, the clsact qdisc is typically used for classifying, dropping, or policing frames.
On egress, the clsact qdisc is typically used to add, modify or delete VLAN tags.

The clsact qdisc is created by using the tc qdisc command:
# tc qdisc add dev eth0 clsact

The rest of the qdiscs are for egress only and can be subdivided into two types: classful qdiscs and classless qdiscs. Each network interface has a default root egress qdisc attached; which qdisc this is depends on the type of interface.

The default root qdisc can be replaced with another qdisc by using the tc qdisc command:
# tc qdisc add dev eth0 root handle 1:0 mqprio

All qdiscs are created with a parent and an optional handle. If no handle is specified, the system creates one.

The example above creates a new root qdisc of type mqprio and with root as the parent and a handle of 1:0.

The minor number of a qdisc must always be 0 (or simply omitted in which case handle can be specified as either <major>: or <major>).

2.1.1. Classful vs Classless qdiscs

Classful qdiscs allow you to create a separate policy for different traffic classes by assigning an arbitrary number of classes to the classful qdisc. These classes can again contain other classes or qdiscs. It is possible to create a very complex tree structure by combining classful qdiscs and classes. A filter attached to the root qdisc or to any of the classes is used to steer the traffic to a specific class.

Classless qdiscs do not support classes and filters. The behavior of the qdisc is determined at creation time. Some of the classless qdiscs map traffic flows to traffic classes, but these are not real classes as in a classful qdisc. In most cases, it is possible to assign another qdisc to each of these traffic classes.

Whether a qdisc is classful or classless is decided by its implementation inside the kernel. When learning the tc framework, and when choosing which qdiscs to use in a given configuration scenario, it is much more important to understand the concepts of schedulers and shapers. The current kernel implementation offers several schedulers (some implemented as classful qdiscs, others as classless) and likewise several shapers. The following sections provide an introduction to schedulers and shapers.

The qdiscs in tc are very flexible and allow objects to be nested recursively. HW, on the other hand, is fixed, and to allow HW offload the configuration must align with the limitations of the HW. The skip_sw flag causes the driver to return an error if a given configuration cannot be offloaded.

2.1.2. Schedulers

A scheduler splits traffic into different traffic classes and decides which frame to send next.

The schedulers supported for HW offload by Microchip switches are:

  • mqprio - Multiqueue Priority Qdisc (classless)

  • taprio - Time Aware Priority Shaper (classless)

  • ets - Enhanced Transmission Selection (classful)

mqprio

The mqprio qdisc is the simplest to understand.

It basically splits traffic up into eight different queues based on the priority of the frame; frames in high-priority queues are sent first.

Adding an mqprio qdisc does nothing by itself as this is the way the egress interface works by default.

The reason to use it is that it creates eight qdisc classes which map 1:1 to the eight priority queues.

On each of these qdisc classes, it is possible to attach another qdisc such as a shaper.

An mqprio qdisc can be illustrated like this:

tc mqprio

In the example above a cbs shaper is attached on traffic class 1:5 (priority 4). A default qdisc is automatically attached to all the other traffic classes.

See the Strict scheduling section on the Scheduling page to see how the example above can be implemented.
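As a sketch (the interface name and handle are placeholders, and hw 1 assumes the driver supports mqprio offload), an mqprio qdisc with a 1:1 mapping between priorities, traffic classes, and queues could be created like this:

# Create an mqprio root qdisc with 8 traffic classes,
# priority N mapped to traffic class N, and one queue per class:
# tc qdisc add dev eth0 root handle 1: mqprio \
  num_tc 8 \
  map 0 1 2 3 4 5 6 7 \
  queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
  hw 1

This creates the qdisc classes 1:1 through 1:8, so class 1:5 corresponds to queue 4 and, with the map above, to priority 4 as in the illustration.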

taprio

The taprio qdisc is basically an mqprio qdisc with added support for scheduled traffic as described in IEEE Std 802.1Q-2018 Section 8.6.8.4, also known as Time-Aware Scheduling (TAS).

A cyclic schedule opens and closes each priority queue relative to a known timescale, e.g. controlled via PTP.

When a queue is closed, all frames are held back in the queue, and when it opens again the frames are transmitted in priority order.

This cycle repeats forever.

See the Time Aware Scheduling section on the Scheduling page to see how Time-Aware Scheduling can be implemented.
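As a sketch (the interface name, base-time, and gate intervals are placeholder values), a taprio schedule with a 1 ms cycle that reserves 300 µs exclusively for priority 7 and 700 µs for the remaining priorities could look like this:

# Create a taprio qdisc with two gate entries per cycle.
# sched-entry S <gate-mask> <interval-in-ns>; bit N opens traffic class N.
# flags 0x2 requests full HW offload of the schedule.
# tc qdisc replace dev eth0 parent root handle 100 taprio \
  num_tc 8 \
  map 0 1 2 3 4 5 6 7 \
  queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
  base-time 1000000000 \
  sched-entry S 0x80 300000 \
  sched-entry S 0x7f 700000 \
  flags 0x2

The schedule starts at base-time (in nanoseconds, relative to the PTP clock) and repeats the two entries forever.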

ets

The ets qdisc is basically an mqprio qdisc with added support for Enhanced Transmission Selection (ETS) as described in IEEE Std 802.1Q-2018 Section 37.

With the ets qdisc, you can either have eight strict priority queues, in which case it works in the same way as the mqprio qdisc, or you can have up to eight weighted queues where you can configure each queue to have a guaranteed bandwidth in percent of the total bandwidth.

If you have fewer than eight weighted queues, the rest of the queues work as strict priority queues.

The weighted queues are always allocated from the priority queues with the lowest priority.

See the Strict and DWRR scheduling section on the Scheduling page to see how Enhanced Transmission Selection can be implemented.
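As a sketch (the interface name, quanta, and priomap are placeholder values), an ets qdisc with five strict bands and three weighted bands could be created like this:

# bands = strict bands + weighted bands (here 5 + 3 = 8).
# The quanta set the relative weights of the weighted bands
# (3000:2000:1000, i.e. roughly 50%, 33%, and 17% of the weighted share).
# The weighted bands always occupy the lowest-priority bands.
# priomap maps priority 0 to band 7, ..., priority 7 to band 0.
# tc qdisc add dev eth0 root handle 1: ets \
  bands 8 strict 5 quanta 3000 2000 1000 \
  priomap 7 6 5 4 3 2 1 0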

2.1.3. Shapers

A shaper sends out the traffic with a specified maximum bitrate and buffers all traffic that exceeds this bitrate. In other words: It smooths the traffic and removes bursts.

Frames are never discarded unless the buffer is full.

The shapers supported for HW offload by Microchip switches are:

  • tbf - Token Bucket Filter (classful)

  • cbs - Credit Based Shaper (classless)

tbf

The tbf qdisc implements a shaper based on the Token Bucket algorithm.

A tbf qdisc is created using the tc qdisc command and can be attached either to the root or to a traffic class in one of the supported schedulers mentioned above.

See the Priority Shaping section on the Shaping page for a description of how to implement a tbf shaper.
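As a sketch (the interface name, rate, and parent class are placeholders; parent 1:5 assumes a supported scheduler with handle 1: is already attached), a tbf shaper could be added to a traffic class like this:

# Shape the traffic of class 1:5 to 10 Mbit/s with an 8 KiB bucket
# and at most 100 KiB of buffered (queued) data:
# tc qdisc add dev eth0 parent 1:5 handle 10: tbf \
  rate 10mbit burst 8kb limit 100kb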

cbs

The cbs qdisc implements the shaper algorithm described in IEEE Std 802.1Q-2018 Section 8.6.8.2.

A cbs qdisc is created using the tc qdisc command and can be attached to a traffic class in one of the supported schedulers mentioned above.

See the Priority Shaping section on the Shaping page for a description of how to implement a cbs shaper.
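As a sketch (the interface name and all numbers are placeholder values for a 1 Gbit/s link; parent 1:5 assumes a supported scheduler with handle 1: is already attached), a cbs shaper could be added to a traffic class like this:

# Reserve 20 Mbit/s for class 1:5.
# idleslope/sendslope are in kbit/s (sendslope = idleslope - link speed),
# hicredit/locredit are in bytes, and offload 1 requests HW offload.
# tc qdisc replace dev eth0 parent 1:5 cbs \
  idleslope 20000 sendslope -980000 hicredit 30 locredit -1470 \
  offload 1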

2.2. Class

Classes in the traffic control framework can be described by the following statements:

  • A class represents a traffic class and can only exist inside a classful qdisc of the same type as the class.

  • A class can contain other child classes or a single qdisc, which can be a classful or classless qdisc.

  • A class that does not contain a child class is called a leaf class and will always contain a default simple classless qdisc unless another qdisc is assigned.

  • A class must be assigned a parent and a class id when it is created.

The following illustration shows an example of a classful qdisc with child classes.

tc class

The different classes are selected by filters attached to the root qdisc.

Classful qdiscs with user-defined classes are not supported by Microchip switches.

2.3. Filter

A tc filter is used to match frames in some way and apply actions on the matching frames.

The filters supported for HW offload by Microchip switches are:

  • matchall - Matches all frames

  • flower - Matches packets via a set of keys, such as src_mac and dst_ip.

Microchip switches only support using filters on the clsact qdisc on either ingress or egress.

Filters support several general parameters that are independent of the filter type:

  • prio (or pref) - The priority of the filter. The lowest value has the highest priority and is checked first.

  • handle - The filter ID to be used when modifying or deleting the filter.

  • protocol - The protocol to match, such as all, ip, ipv6 or 802.1q.

  • skip_sw - Only install the filter in HW; the command fails if the filter cannot be offloaded.

All examples assume that a clsact qdisc has been created as shown in the Qdisc section. The clsact qdisc creates two new handles, ingress and egress, which are used as parents in the filter commands.
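The prio and handle together identify a filter when listing, modifying, or deleting it. As a sketch (assuming a matchall filter was installed on eth0 with prio 10 and handle 1):

# List all ingress filters on the interface:
# tc filter show dev eth0 ingress

# Delete the specific filter identified by prio, handle, and type:
# tc filter del dev eth0 ingress prio 10 handle 1 protocol all matchall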

2.3.1. tc filter actions

Every filter contains one or more actions that are applied when the filter is hit.

Not all kinds of actions are supported by each filter as it depends on both the context and the capabilities of the hardware.

The following actions are supported in at least one configuration:

  • pass - do nothing

  • trap - send packet to CPU

  • drop - drop packet

  • skbedit priority <PRIO> - modify packet priority to PRIO (qos class)

  • vlan pop - pop vlan tag

  • vlan modify [ protocol <PROTO> ] id <VID> [ priority <PCP> ] - modify VID and PCP. <PROTO> is one of 802.1q (default) or 802.1ad

  • vlan push [ protocol <PROTO> ] id <VID> [ priority <PCP> ] - push a new vlan tag. <PROTO> is one of 802.1q (default) or 802.1ad

  • police rate <BPS> burst <BYTES> - police traffic

  • mirred egress mirror dev <MONITOR_PORT> - mirror traffic on monitor port

  • goto chain <CHAIN_INDEX> - goto specified chain

Actions are always added last in the filter-specific options.

Use tc filter add matchall action <action> help to see all action parameters.
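As an example of the police action (a sketch; the interface name, rate, and burst are placeholder values), all ingress traffic on a port can be rate-limited like this:

# Police all ingress traffic to 10 Mbit/s with a 64 KiB burst,
# dropping frames that exceed the rate:
# tc filter add dev eth0 ingress prio 30 handle 3 protocol all matchall \
  skip_sw \
  action police rate 10mbit burst 64k conform-exceed drop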

The following sections describe briefly how to use matchall and flower filters.

They will be described in more detail in the sections where the actual usage is shown.

2.3.2. matchall filter

The matchall filter matches all packets and applies one or more actions.

Add matchall filter that sets the priority to 4 on all packets received on eth0:
# tc filter add dev eth0 ingress prio 10 handle 1 protocol all matchall \
  skip_sw \
  action skbedit priority 4

Use tc filter add matchall help to see all parameters.

2.3.3. flower filter

The flower filter matches packets using a set of keys and applies one or more actions on the matching packets.

Add flower filter that matches on DMAC and drops the matching packets:
# tc filter add dev eth0 ingress prio 20 handle 2 protocol all flower \
  skip_sw \
  dst_mac 00:11:22:33:44:55 \
  action drop

Use tc filter add flower help to see all parameters.

2.4. Chain

tc filter chains allow us to jump from one filter chain to another using the goto action.

Here is an example where a matchall filter in chain 0 (the default chain) jumps to chain 10000.

Chain 10000 contains a filter that matches on SMAC and drops the matching packets:

# tc filter add dev eth0 ingress prio 20 handle 2 protocol all matchall \
  skip_sw \
  action goto chain 10000

# tc filter add dev eth0 ingress prio 21 handle 3 protocol all chain 10000 flower \
  skip_sw \
  src_mac 00:22:33:44:55:66 \
  action drop
Chains can also be created explicitly with the tc chain command:
# tc chain add dev eth0 ingress chain 10000 protocol ip flower \
  dst_ip 0.0.0.0 \
  ip_proto tcp

The command above creates a chain template that restricts filters created in this chain to use protocol ip and only the keys dst_ip and ip_proto.

The key values given in the chain template are not fixed; only the key names are.

This filter will be accepted:
# tc filter add dev eth0 ingress prio 1 handle 1 protocol ip chain 10000 flower \
  skip_sw \
  dst_ip 10.10.10.10 \
  ip_proto udp \
  action drop
This filter will NOT be accepted as it violates the template:
# tc filter add dev eth0 ingress prio 2 handle 2 protocol ip chain 10000 flower \
  skip_sw \
  src_ip 20.20.20.20 \
  ip_proto udp \
  action drop

2.5. Shared Filter Blocks

tc operates on a single interface only, but if two or more interfaces need exactly the same filter configuration, shared filter blocks come to the rescue.

The downside is that you cannot combine shared filter blocks with individual filters on a specific interface, so it is all or nothing.

Instead of creating the clsact qdisc directly on an interface, we add the same ingress_block on the interfaces that should share the filter configuration:

# tc qdisc add dev eth0 ingress_block 10 clsact
# tc qdisc add dev eth1 ingress_block 10 clsact
# tc qdisc add dev eth3 ingress_block 10 clsact
Now create the filters in the shared block:
# tc filter add block 10 chain 12000 prio 1 handle 1 protocol all flower skip_sw \
  dst_mac 00:01:01:00:00:00/ff:ff:ff:00:00:00 \
  src_mac 00:02:02:02:02:02 \
  action goto chain 20002

# tc filter add block 10 chain 20002 prio 2 handle 2 protocol all flower skip_sw \
  action drop \
  action goto chain 21000

The kernel will now apply all filters to each interface.

It is possible to use a mask in many of the filter keys as shown above for dst_mac.
This allows us to match on only a subset of the key.

The primary reason for using shared filter blocks is that it allows optimizing the use of TCAM entries where the port mask would be the only difference between the entries. In the example above only one TCAM rule is needed.

Secondarily, it also makes it easier to set up the filters.

3. References

More information about tc can be found in the man pages:
$ man tc
$ man tc-mqprio
$ man tc-taprio
$ man tc-ets
$ man tc-cbs
$ man tc-tbf
$ man tc-matchall
$ man tc-flower
An introduction to clsact qdisc:

https://lwn.net/Articles/671458/

An introduction to tc filter chains:

https://lwn.net/Articles/723067/

An introduction to tc shared filter blocks:

https://lwn.net/Articles/736338/
https://lwn.net/Articles/743391/