Scheduling
The classified priority is a number from 0 to 7 where 0 is the lowest priority and 7 is the highest priority.
1. Strict scheduling.
Strict scheduling is the default behaviour at egress.
A port has eight priority queues and frames in the highest priority queue are always transmitted before frames in lower priority queues.
If shapers are needed on one or more of these eight priority queues then the mqprio qdisc can be used.
The mqprio qdisc does nothing by itself but serves as an attachment point for shapers.
To create an mqprio qdisc on eth0 and attach a cbs shaper on priority queue 4:
$ tc qdisc add dev eth0 root handle 1:0 mqprio
$ tc qdisc replace dev eth0 parent 1:5 handle 2 cbs \
idleslope 10000 sendslope -990000 hicredit 15 locredit -990 offload 1
The mqprio handle is 1:0 and the major part is 1. The minor part of a qdisc handle must always be 0.
The cbs parent is 1:5, where the major part (1) must match the major part of the mqprio handle.
The minor part is the priority queue number plus one, so minor part 5 designates priority queue 4.
See the Priority Shaping section on the Shaping page for a description of the supported shapers and their parameters.
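The offset-by-one rule for the parent handle can be captured in a small helper. This is only a sketch: `queue_parent` is a hypothetical name, not part of tc. It prints the minor part in hexadecimal because tc parses handle parts as hexadecimal numbers (for queues 0 to 7 this is the same as decimal).

```shell
#!/bin/sh
# Hypothetical helper: print the tc "parent" handle for a priority queue.
# $1 = major part of the mqprio handle, $2 = priority queue number (0-7).
# The minor part is the queue number plus one, printed in hex.
queue_parent() {
    major=$1
    queue=$2
    printf '%s:%x\n' "$major" $((queue + 1))
}

queue_parent 1 4   # prints 1:5
```

The printed value could then be passed as the parent argument of the cbs command shown above.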
2. Strict and DWRR scheduling.
By default, all frames are subject to strict priority scheduling.
Alternatively, some priorities can be made subject to Enhanced Transmission Selection (ETS), where they share bandwidth using DWRR.
These are always the lowest priorities, so frames from the strict priorities are always transmitted first.
Configuration is done using the tc qdisc add ets command.
This command configures bands as either strict or bandwidth-sharing (DWRR).
According to tc-ets(8) man page:
When dequeuing, strict bands are tried first, if there are any. Band 0 is tried first.
This means that the first band - band 0 - has the highest priority.
In the Microchip switch, there are 8 priorities 0-7 and it is priority 7 that has the highest priority.
This is not configurable.
This means that band 0 always corresponds to priority 7.
This command configures a mix of strict and ETS priorities:
tc qdisc add dev eth0 handle 1: root ets bands 8 strict 5 quanta 1000 1000 1000 priomap 7 6 5 4 3 2 1 0
The device (port) is eth0.
The handle of this qdisc is 1: (major part 1).
There are (always) 8 bands.
The first 5 bands are strict.
The next 3 bands each have a quantum of 1000.
The priomap is (always) 7 6 5 4 3 2 1 0.
The quanta parameter gives the DWRR weight of each bandwidth-sharing band. In this case each of the three bands gets 1/3 of the bandwidth left over by the strict bands.
The priomap lists the band for each priority, starting with priority 0.
As priority 0 is the lowest priority, it maps to band 7, which is the lowest-priority band.
The priomap must always be configured as shown above.
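As an illustration of the two rules above (priority p maps to band 7 - p, and DWRR bandwidth is shared in proportion to the quanta), the following sketch generates the fixed priomap and computes the share of each bandwidth-sharing band. The function names `ets_priomap` and `dwrr_shares` are hypothetical helpers and not tc commands.

```shell
#!/bin/sh
# Build the fixed priomap: priority p maps to band 7 - p,
# listed for priorities 0..7 in that order.
ets_priomap() {
    map=""
    p=0
    while [ "$p" -le 7 ]; do
        map="$map $((7 - p))"
        p=$((p + 1))
    done
    echo "${map# }"
}

# Print each quantum's share of the DWRR bandwidth as a whole percentage.
# Integer division is used, so the values are rounded down.
dwrr_shares() {
    total=0
    for q in "$@"; do total=$((total + q)); done
    out=""
    for q in "$@"; do out="$out $((q * 100 / total))"; done
    echo "${out# }"
}

ets_priomap               # prints 7 6 5 4 3 2 1 0
dwrr_shares 600 300 100   # prints 60 30 10
```

With three equal quanta of 1000, `dwrr_shares` prints 33 for each band, which matches the 1/3 split described above.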
The following commands create an ets qdisc and then change the quantum of a specific band:
tc qdisc add dev eth0 handle 1: root ets bands 8 strict 5 quanta 600 300 100 priomap 7 6 5 4 3 2 1 0
tc class change dev eth0 classid 1:6 ets quantum 800
The device (port) is eth0.
The classid is 1:6.
The quantum is 800.
According to tc-ets(8) man page:
The minor number of `classid` to use when referring to a band is the band number increased by one.
So in this case the changed band is band 5, the first bandwidth-sharing band, and its quantum changes from 600 to 800.
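Combining the priomap rule (band = 7 - priority) with the classid rule (minor = band + 1) gives the classid for any priority. The sketch below assumes the fixed priomap described earlier; `priority_to_classid` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print the ets classid for a switch priority.
# $1 = major part of the ets handle, $2 = priority (0-7).
# Assumes the fixed priomap 7 6 5 4 3 2 1 0, i.e. band = 7 - priority.
priority_to_classid() {
    major=$1
    prio=$2
    band=$((7 - prio))
    # The classid minor is the band number increased by one (hex).
    printf '%s:%x\n' "$major" $((band + 1))
}

priority_to_classid 1 2   # priority 2 -> band 5 -> prints 1:6
```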
3. Time-Aware Scheduling
Time-aware scheduling, as described in IEEE Std 802.1Q-2018 Section 8.6.8.4, is implemented using the taprio qdisc.
A cyclic schedule opens and closes each priority queue relative to a known timescale, e.g. controlled via PTP.
While a queue is closed, its frames are held back in the queue. When the queue opens again, the frames are transmitted in priority order.
A taprio qdisc is created using the tc qdisc command.
$ tc qdisc add dev eth0 root handle 1:0 taprio \
num_tc 8 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 map 0 1 2 3 4 5 6 7 \
flags 2 base-time 0 cycle-time 50000000 \
sched-entry S 0x80 10000000 \
sched-entry S 0x7f 40000000
The taprio qdisc is a little more complicated to set up, as the tc taprio command does not (yet) support the same defaults as the tc mqprio command. You must manually set the number of traffic classes (num_tc), the queue assigned to each traffic class (queues), the mapping from priority to traffic class (map), and the flags that indicate HW offload (flags 2).
The rest of the parameters are for the time-aware scheduler configuration:
- base-time: The PTP time when the cycle should start. Set to 0 if the cycle should start immediately.
- cycle-time: The total cycle time in nanoseconds.
- sched-entry: A single entry in the schedule, where S is the command 'SetGateStates', followed by a mask that selects the queues to open (LSB is the lowest priority queue) and finally the duration of the entry in nanoseconds.
In the example above the cycle-time is 50 milliseconds, queue 7 is open for 10 milliseconds and all other queues are open for 40 milliseconds. This cycle repeats forever.
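The gate masks in the example can be derived from the queue numbers, since bit n of the mask selects queue n. The following sketch computes a mask from a list of queues and sums the entry durations so that the total can be compared with the cycle-time. Both function names are hypothetical helpers.

```shell
#!/bin/sh
# Build the sched-entry gate mask for a list of queue numbers.
# Bit n of the mask opens queue n (LSB is the lowest priority queue).
gate_mask() {
    mask=0
    for q in "$@"; do
        mask=$((mask | (1 << q)))
    done
    printf '0x%02x\n' "$mask"
}

# Sum the entry durations (ns) for comparison with cycle-time.
sum_intervals() {
    total=0
    for d in "$@"; do total=$((total + d)); done
    echo "$total"
}

gate_mask 7                       # prints 0x80
gate_mask 0 1 2 3 4 5 6           # prints 0x7f
sum_intervals 10000000 40000000   # prints 50000000
```

In the example schedule the two durations add up to exactly the configured cycle-time of 50000000 ns.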
To update a running schedule without deleting and re-adding the qdisc, use
tc qdisc replace:
$ tc qdisc replace dev eth0 root handle 1:0 taprio \
num_tc 8 queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 map 0 1 2 3 4 5 6 7 \
flags 2 base-time 0 cycle-time 50000000 \
sched-entry S 0x80 20000000 \
sched-entry S 0x7f 30000000
The hardware performs a hitless transition from the old (oper) schedule to the new (admin) schedule at the next cycle boundary.
It is possible to add a shaper on one or more of the eight priority queues in the same way as on the mqprio qdisc.
See the Priority Shaping section on the Shaping page for a description of the supported shapers and their parameters.
3.1. Hardware Limits
The following table summarizes the TAS hardware limits per platform.
| Parameter | LAN9645x | LAN969x | LAN966x | Sparx5 |
|---|---|---|---|---|
| Min cycle time / entry interval | 1 µs | 1 µs | 1 µs | 1 µs |
| Max cycle time | < 1 s (999,999,999 ns) | < 1 s (999,999,999 ns) | < 1 s (999,999,999 ns) | < 1 s (999,999,999 ns) |
| Total GCL entries (shared pool) | 900 | 3000 | 256 | 10000 |
| TAS lists (admin + oper) | 18 | 60 | - | 130 |
| Lists per port | 2 | 2 | 2 | 2 |
The GCL entry pool is shared across all ports. There is no per-port maximum — a single port’s schedule can consume as many entries from the pool as needed, limited only by the total pool size.
3.2. tc command: maximum sched-entries
The tc command from iproute2 allocates a fixed 1024-byte buffer for netlink attributes.
Each sched-entry requires 28 bytes of netlink attribute space, and the fixed overhead (qdisc type, priomap, base-time, cycle-time, flags) consumes approximately 140 bytes.
This limits tc to a practical maximum of 31 sched-entries, since only whole entries fit in the remaining space:
(1024 - 140) / 28 = 31.57, rounded down to 31
Attempting to add more entries will fail with:
addattr_l ERROR; message exceeded bound of 1024
This is a userspace limitation in iproute2, not a kernel or hardware limit. To configure schedules with more than 31 entries, netlink messages must be constructed directly.
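The arithmetic behind the limit can be reproduced with shell integer division, which rounds down in the same way. The sketch uses the buffer size, overhead and per-entry size quoted above; the overhead is stated as approximate, so the result is an estimate. `max_sched_entries` is a hypothetical helper name.

```shell
#!/bin/sh
# Estimate how many sched-entries fit in the tc netlink attribute buffer.
# $1 = buffer size, $2 = fixed overhead, $3 = bytes per sched-entry.
# Shell arithmetic uses integer division, so the result is rounded down.
max_sched_entries() {
    buf=$1
    overhead=$2
    per_entry=$3
    echo $(( (buf - overhead) / per_entry ))
}

max_sched_entries 1024 140 28   # prints 31
```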
3.3. Guard Band and max-sdu
Before a gate closes, the hardware applies a guard band so that no frame is still being transmitted when the gate transition occurs. By default, the guard band is sized for maximum-length frames (MTU 1500), which at 1 Gbps corresponds to approximately 12.3 µs.
For short cycle times, this default guard band can consume a significant portion
of the open window. The max-sdu parameter allows the guard band to be reduced
by specifying the maximum expected frame size per traffic class:
$ tc qdisc add dev eth0 parent root handle 100 taprio \
num_tc 8 map 0 1 2 3 4 5 6 7 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0 cycle-time 20000 \
sched-entry S 0x01 5000 \
sched-entry S 0x00 15000 \
max-sdu 100 0 0 0 0 0 0 0 \
flags 0x2
The max-sdu parameter takes 8 values, one per traffic class (TC 0-7). A value
of 0 means the default guard band (MTU-based) is used for that traffic class.
For example, if traffic class 0 only carries small control frames (80 bytes),
setting max-sdu to 100 reduces the guard band to approximately 1 µs, making
short cycle times (e.g. 20 µs) practical.
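The guard band figures quoted in this section can be approximated from the frame size and the link speed. The sketch below is an estimate and not the hardware's exact calculation: it assumes a per-frame overhead of 38 bytes (14 bytes Ethernet header, 4 bytes FCS, 8 bytes preamble and SFD, 12 bytes inter-frame gap), which is an assumption chosen because it reproduces the 12.3 µs figure for MTU 1500 at 1 Gbps. `guard_band_ns` is a hypothetical helper name.

```shell
#!/bin/sh
# Estimate the guard band in nanoseconds for a given max-sdu.
# $1 = max-sdu in bytes, $2 = link speed in Mbps.
# ASSUMPTION: 38 bytes of per-frame overhead
# (14 header + 4 FCS + 8 preamble/SFD + 12 inter-frame gap).
guard_band_ns() {
    sdu=$1
    speed_mbps=$2
    echo $(( (sdu + 38) * 8 * 1000 / speed_mbps ))
}

guard_band_ns 1500 1000   # prints 12304 (about 12.3 us)
guard_band_ns 100 1000    # prints 1104 (about 1 us)
```

Under this assumption, a max-sdu of 100 leaves most of the 5 µs open window in the example available for transmission, whereas the default guard band would exceed the whole window.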