The man page tc-hfsc(8) just describes the command line syntax, but even for that simple task manages to be incomplete. tc-hfsc(7) is a good summary of the original paper, but like the original paper isn't helpful to newbies trying to get an intuitive grasp of what HFSC does. In other words, don't blame yourself, the official doco really is bad.
Try reading this: http://linux-tc-notes.sourceforge.net/tc/doc/sch_hfsc.txt
Answering your specific questions. He is claiming he gets a 78ms advantage in latency if he sends a 1500 byte VOIP packet at 400kbit rather than 100kbit. Under the real time curve, the 1500 byte VOIP packet will be sent as soon as the line is free AND VOIP's service curve says it has earned 1500 bytes worth of transmit time. The latency is the maximum amount of time between receiving the packet and completing the send of its last byte:
= Time_to_earn_credit + Time_to_send_VOIP_packet
Time_to_send_VOIP_packet is 12ms:
= 1500 [bytes] * 8 [bits/byte] / 1000000 [bits/sec]
Time_to_earn_credit for sending the VOIP packet depends on its allowed rate. At 100kbit it is 120ms:
= 1500 [bytes] * 8 [bits/byte] / 100000 [bits/sec]
Similarly, Time_to_earn_credit at 400kbit is 1/4 of that, or 30ms. The difference between the two latencies is 90ms:
= (120ms + 12ms) - (30ms + 12ms)
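If you want to check the arithmetic yourself, here it is as a few lines of Python (a back-of-envelope check only; the 1000kbit link speed is from his example):
# A back-of-envelope check of the latency figures above.
PACKET_BITS = 1500 * 8   # one MTU sized packet, in bits
LINK_RATE = 1000000      # his 1000kbit link, in bits/sec

def latency_ms(earn_rate):
    # Worst case latency = time to earn credit at the class's rate,
    # plus time to send the packet at full link speed.
    time_to_earn_ms = PACKET_BITS * 1000 / earn_rate
    time_to_send_ms = PACKET_BITS * 1000 / LINK_RATE
    return time_to_earn_ms + time_to_send_ms

print(latency_ms(100000))                       # 132.0 (= 120ms + 12ms)
print(latency_ms(400000))                       # 42.0  (= 30ms + 12ms)
print(latency_ms(100000) - latency_ms(400000))  # 90.0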
In other words, given the facts as stated, his claim that he gets a 78ms advantage is wrong. It should be 90ms. Note that in his diagram below the text you quote he confirms this. There he shows the difference being (120ms - 30ms) = 90ms.
Next, he claims the burst granted to VOIP increased the delay before User A Data starts sending from 30ms to 52.5ms. Let's assume the situation he is talking about is VOIP and User A Data both queueing an MTU sized packet at time 0. We saw above that Time_to_earn_credit for sending User A Data at 400kbit is indeed 30ms.
With the extra 300kbit (= 400kbit - 100kbit) burst awarded to VOIP, there must be some other class that drops in speed by the same 300kbit, otherwise the link will be overcommitted. We cannot tell what he intended from the commands at the bottom because, as we will see, they are simply wrong (they overcommit the link). But I will make a guess and assume it comes from User A Data. So it must drop to 100kbit for 30ms, then go back to 400kbit. This makes the calculation of Time_to_earn_credit for User A Data complex. It becomes 52.5ms:
= 30ms + (1500 [bytes] * 8 [bits/byte] - Data_sent_in_30ms) / 400000 [bits/sec]
= 30ms + (12000 [bits] - 0.03 [sec] * 100000 [bits/sec]) / 400000 [bits/sec]
= 30ms + (12000 [bits] - 3000 [bits]) / 400000 [bits/sec]
= 30ms + 22.5ms
= 52.5ms
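The same check in Python, under my assumption that User A Data is throttled to 100kbit while VOIP bursts:
packet_bits = 1500 * 8                     # 12000 bits to earn
earned_in_burst = 30 * 100000 / 1000       # 3000 bits earned at 100kbit in 30ms
remaining = packet_bits - earned_in_burst  # 9000 bits left to earn at 400kbit
print(30 + remaining * 1000 / 400000)      # 52.5 (= 30ms + 22.5ms)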
So that claim is indeed correct - it did go from 30ms to 52.5ms. He goes on to discuss how virtual time works (which isn't a bad description).
Finally, he gives the specification. Here are the relevant bits for User A Data and VOIP:
classid 1:11 hfsc sc umax 1500b dmax 53ms rate 400kbit ul rate 1000kbit
classid 1:12 hfsc sc umax 1500b dmax 30ms rate 100kbit ul rate 1000kbit
At this point we need to understand how umax and dmax work. tc-hfsc(8) describes them in terms of an alternate syntax, which is:
m1 BURST-RATE d BURST-SEC m2 STEADY-RATE
This means you get BURST-RATE for BURST-SEC, and then STEADY-RATE thereafter. The man page then goes on to say:
Obviously, m1 is simply umax/dmax.
It doesn't say, so let's assume dmax has the same meaning as d. (It's a good assumption - that's what tc's source says too.) His code guarantees a VOIP burst speed of 400kbit:
= 1500 [bytes] * 8 [bits/byte] / 0.03 [seconds]
So he has given VOIP 400kbit for 30ms, which is exactly what he intended. But, his code guarantees User A Data a burst speed of around 226kbit:
= 1500 [bytes] * 8 [bits/byte] / 0.053 [seconds]
Notice the total 626kbit (=226kbit + 400kbit) exceeds what he wanted to give User A (500 kbit). Also notice that he has guaranteed 500kbit to User B. So under this simplified scenario he has guaranteed a total of 1126kbit (=626kbit + 500kbit) on a 1000kbit link. Clearly this won't work.
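Here are those burst numbers again, as a quick Python check of the naive m1 = umax/dmax reading:
def naive_m1(umax_bytes, dmax_ms):
    # The man page's claim: the burst rate m1 is simply umax/dmax.
    return umax_bytes * 8 * 1000 / dmax_ms  # bits/sec

voip = naive_m1(1500, 30)           # 400000.0, i.e. 400kbit
user_a_data = naive_m1(1500, 53)    # ~226415,  i.e. ~226kbit
print(voip + user_a_data + 500000)  # ~1126415 guaranteed on a 1000000 bit/sec link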
Sadly the umax/dmax thing does not do what the man page says. As you will see, they are so complex only their mother could love them. We mere mortals should use the alternative syntax "m1 ... d ... m2 ..." instead. However, Patrick McHardy is their mother, so perhaps he knew what he was doing.
The umax dmax rate syntax only does the obvious thing if the resultant burst rate is faster than the speed allocated by rate. For VOIP, 400kbit does indeed exceed 100kbit, so that is OK. For User A Data, 226kbit doesn't exceed 400kbit, so the umax/dmax thing in fact means a burst speed of 0(!) for 23 milliseconds. The 23 milliseconds comes from this calculation (taken from tc's source code, in q_hfsc.c):
= 53ms - Time_to_send_MTU_at_*rate*
= 53ms - 30ms
Unfortunately this doesn't fix the link overcommit. Now he has committed VOIP to a burst of 400kbit for 30ms, and after 23ms User A Data also gets 400kbit, so for 7ms (=30ms - 23ms) User A has been guaranteed 800kbit (=400kbit VOIP + 400kbit User A Data) and User B has been guaranteed 500kbit. So he has guaranteed 1300kbit (=800kbit + 500kbit) on a 1000kbit link, proving not even the inventor of umax/dmax can use them correctly.
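For the curious, here is that umax/dmax to m1/d/m2 conversion paraphrased in Python (a sketch of the logic described above, not tc's verbatim source):
def umax_dmax_to_m1_d_m2(umax_bytes, dmax_ms, rate_bps):
    burst_bps = umax_bytes * 8 * 1000 / dmax_ms
    if burst_bps > rate_bps:
        # Burst faster than rate: the "obvious" man page reading applies.
        return (burst_bps, dmax_ms, rate_bps)
    # Otherwise: a burst of 0 lasting dmax minus the time to send umax at rate.
    d_ms = dmax_ms - umax_bytes * 8 * 1000 / rate_bps
    return (0, d_ms, rate_bps)

print(umax_dmax_to_m1_d_m2(1500, 30, 100000))  # (400000.0, 30, 100000) - VOIP
print(umax_dmax_to_m1_d_m2(1500, 53, 400000))  # (0, 23.0, 400000) - User A Data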
If I understand his intentions correctly this will do what he wanted:
tc qdisc add dev eth0 root handle 1: hfsc
tc class add dev eth0 parent 1: classid 1:1 hfsc ls rate 1000kbit ul rate 1000kbit
tc class add dev eth0 parent 1:1 classid 1:10 hfsc ls rate 500kbit
tc class add dev eth0 parent 1:1 classid 1:20 hfsc ls rate 500kbit
tc class add dev eth0 parent 1:10 classid 1:11 hfsc ls m2 400kbit
tc class add dev eth0 parent 1:10 classid 1:12 hfsc rt m1 400kbit d 30ms m2 100kbit
There are a number of changes:
- All ul's bar the one at the top are gone. The rest are redundant.
- The sc's for the internal nodes have become ls. sc sets both rt and ls, but internal nodes ignore rt.
- The sc for User A Data has been replaced by a simple ls. User A Data clearly needs no latency guarantees. Anything more is overkill.
- The sc for VOIP has become rt. The umax/dmax notation is gone because, as we have seen, it is unreliable. As it doesn't have an ls allocation, the 100kbit is also a maximum - it will never get more than that after the burst. The ls's don't need to grant rt the extra burst, as rt takes what it needs and ls gets the rest.
If User A Data were in fact latency sensitive (obviously it isn't), its line would become:
tc class add dev eth0 parent 1:10 classid 1:11 hfsc sc m1 100kbit d 30ms m2 400kbit
Notice sc is used here to set both rt (for the latency) and ls. The ls must be there, otherwise it won't get a share of any unused capacity. The rt probably isn't necessary, as something called User A Data probably doesn't need hard latency guarantees. Providing hard latency guarantees is the only reason you use rt.
Notice also that it is now trivial to check you haven't overcommitted the link. Just add up all the leaf m1's and m2's. For real time, the respective sums must be below the link capacity. (For link share it doesn't matter.) For the d's, just ensure all classes that are granted a fast burst have d's no bigger than those of the classes that don't. In this case the rt m1's (recall an sc is also an rt) add up to 500kbit, so that's good. The m2's add up to 500kbit, so that's good. And the longest fast burst d is 30ms, and no slow burst d is smaller than that, so that's good.
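That check as one last Python sketch, for the hypothetical config above (VOIP's rt line plus the sc variant of User A Data):
# Leaf rt/sc curves as (m1, d_ms, m2). The ls curves are only relative
# weights between siblings, so they cannot overcommit the link.
rt_curves = {
    "VOIP": (400000, 30, 100000),
    "User A Data": (100000, 30, 400000),  # the hypothetical sc line above
}
LINK = 1000000
print(sum(m1 for m1, d, m2 in rt_curves.values()) <= LINK)  # True: m1's sum to 500kbit
print(sum(m2 for m1, d, m2 in rt_curves.values()) <= LINK)  # True: m2's sum to 500kbit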
That took longer than I thought. The takeaway is: it's not you, the official documentation truly does suck.