Bandwidth Estimator

Question

Bandwidth Estimator

Sean-Der opened this issue 4 years ago · comments

The issue covers everything we want to do around Congestion Estimation and Pion. This covers a Bandwidth Estimator in Pion, and generating feedback for remote peers.

@mengelbart is the owner of this work. If you have feedback/suggestions/ideas he knows this all the best!

bc30165 Implement Sender/Receiver Reports
c8a26a2 Implement Transport Wide Congestion Control Feedback
Design a interface/API that all Bandwidth Estimators will satify
Merge the NetworkConditoner so we can test our implementations
- Start with basic tests (hard limits on total bandwidth/start queueing at certain bitrate]
- Implement full test suite from RFC8867
Implement Google Congestion Control
- Have it be Sender estimates only using TWCC. No REMB our Receiver Reports.
Implement FlexFEC
- This will be needed by servers attempting to deliver Simulcast/SVC
Make WebRTC Bandwidth Estimation accessiable to others
- Write blog posts about what you are implementing/challenges
- Write a dedicated chapter in WebRTC for the Curious about Bandwidth Estimation
Make work accessible for other projects.

In the future we will also need

Switch from Transport Wide Congestion Control to cc-feedback-message
- This hasn't been adopted by anyone yet
Investigate AdaptiveFEC
- Paper

Juliusz Chroboczek · Answer 1 · Sat Jan 09 2021 01:32:19 GMT+0800 (China Standard Time)

Accurate SR/RR is the most urgent bit — without that, nothing works correctly. SR are essential for correct lipsynch, and GCC will yield acceptable performance on non-bufferbloated networks with just correct RR.

Let me expand on that. GCC consists of two congestion controllers:

a fairly traditional loss-based controller, that reacts to the loss rate communicated in RR;
a rather novel delay-based controller, that reacts to the second derivative of the packet arrival time.

The loss-based controller is trivial to implement, it works reasonably well, except that in the absence of a low-delay AQM at the bottleneck router it tends to cause latency proportional to the amount of buffering. The delay-based controller reacts earlier than the loss-based one, thus limiting latency, but it is underspecified and difficult to implement.

Here is Galène's implementation of the loss-based controller:

https://github.com/jech/galene/blob/master/rtpconn/rtpconn.go#L962

Here's Galène's implementation of SR and RR:

https://github.com/jech/galene/blob/master/rtpconn/rtpconn.go#L879
https://github.com/jech/galene/blob/master/rtpconn/rtpconn.go#L788

Sean DuBois · Answer 2 · Sat Jan 09 2021 02:24:17 GMT+0800 (China Standard Time)

Thank you so much @jech. I am going to move forward with everything until the TWCC. I am hoping I can make things good enough that you can simplify Galene.

When I start the delay based one I will send emails to the IETF list and see if I can get clarifications/help!

Alessandro Ros · Answer 3 · Fri Feb 05 2021 07:25:26 GMT+0800 (China Standard Time)

Sender and Receiver Report Generator

I've updated the PR with sender and receiver reports:

reordered packets are now supported
the incomplete bandwidth estimation part has been removed

Google Congestion Control Interceptor that uses Sender Reports - This is an optional interceptor that the user opts into. It then provides a callback with estimated bandwidth.

The estimated bandwidth is provided by the RTCP packet with payload 206, as confirmed in the paper, which is ReceiverEstimatedMaximumBitrate in pion/rtcp.

I've always seen ReceiverEstimatedMaximumBitrate and SenderReport shipped together in the same UDP packet - how can they be shipped together if there are two separate interceptors, one for Sender Reports and the other for congestion control?

What's the plan to make the two parts communicate with each other?

Dave Täht · Answer 4 · Tue Feb 16 2021 09:30:15 GMT+0800 (China Standard Time)

I am just logging in on this bug so I can keep track.

Alessandro Ros · Answer 5 · Thu Feb 18 2021 16:11:06 GMT+0800 (China Standard Time)

Ok, since the SR/RR part has been merged, the bandwidth estimation part is the next one... but i repeat, in my opinion the bandwidth estimation must be a plugin of the RR interceptor, not a separate interceptor.

Sean DuBois · Answer 6 · Sat Feb 20 2021 15:08:33 GMT+0800 (China Standard Time)

@aler9 We couldn't not even worry about REMB if that makes things simpler! Looking at Chromium's RTCP parser I don't think they care. Mostly in rtcp_receiver.cc

We should start with a Receiver Report driven congestion controller. It reads the incoming Receiver Reports and adjusts accordingly. This would just be the loss based estimation of Google Congestion Control. We could then later add a TWCC driven congestion controller. That could be GCC/NADA/SCReAM.

If we do need to communicate between interceptors my plan was to use Attributes.

The only things we can't compromise with the CC design is

User needs to be able to construct and choose which one they use
User needs to get info in their code. My goal is to provide a example that looks like this

for estimate := range CongestionController.Estimate() {
  case high:
    //send frame from high.mp4
  case med:
    //send frame from med.mp4
   case low:
    // send frame from low.mp4
}

Dave Täht · Answer 7 · Sun Feb 21 2021 09:05:53 GMT+0800 (China Standard Time)

My principal requests at this point would be:

ability to set and inspect the values of the ecn bits - and to be able to link an action to a change thereof
ability to set and inspect dscp also.

I REALLY want to gain the ability to put voice and video on different tuples also.

Scream and nada want to futz with ecn, and my gcc variant (unpublished) used the voice clock to calculate when a probe for more video bandwidth was succeeding or failing in a fq_codel environment. being able to set dscp meant that video calls could go into the wifi VI queue.

Sean DuBois · Answer 8 · Mon Feb 22 2021 14:13:47 GMT+0800 (China Standard Time)

@dtaht I had no idea DSCP/ECN would be needed. That is going to be somewhat of a pain because our ICE Agent only deals in bytes.

Putting audio/video on different tuples shouldn't be so bad! I just didn't do it before because we didn't have demand. I always perceived it as worse (because of extra port usage) so didn't worry about it. I would love to get it in though.

Have you ever seen any numbers about how much better Loss Based (GCC part 1) vs Latency Based (GCC part 1+2)? I do want to build everything eventually. I just don't want to invest in something with smaller returns then other Pion projects I would be ignoring.

Thank you so much for sharing all this knowledge. It is so hard to find people who knows this stuff and are willing to share.

Juliusz Chroboczek · Answer 9 · Tue Feb 23 2021 23:34:18 GMT+0800 (China Standard Time)

@dtaht what about hashing the SSID into the IPv6 flow-id? Would that yield the same benefits as using different ports?

Dave Täht · Answer 10 · Tue Mar 02 2021 09:39:44 GMT+0800 (China Standard Time)

Sorry for the delay in reply. I need to put github notifications somewhere I read more regularly.

scream mandates ecn usage. worse, it presently requires l4s style markings, where I would vastly prefer to attempt SCE style.
an early version of nada did some intelligent things with ecn. I actually favor using ecn on the moral equivalent of iframes,
if there was a way for the encoder and congestion control mechanisms to communicate and some experimentation possible.

I really really really appreciate how clean the pion stuff is, compared to, say, trying to hack on the browsers or janus was. The browsers also make it really, really hard to get at the dscp/ecn headers, being single return (C, C++) based systems, whereas go can return multiple variables really cleanly.

All the gcc/scream/nada variants had REALLY slow probe times when I was last active in the rmcat working group circa 2012-2013. Since then "BBR" showed the way towards probing for more bandwidth and RTT separately which I think would be
interesting - my core idea was sending a ton of duplicate packets for a probe during a delta phase and seeing how many got through and observe rtt inflation before switching the encoder over to a higher rate. (I can point to a couple cool papers on this idea if I can remember the titles on google scholar This one was not horrible, but I can't find the one I'm thinking of

https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8737672)

You can't run out of udp ports with ipv6. There's a lot of ipv6 going around, especially in the lte world. Given that a given galene conference is under 100 people, and you can bind to as many addresses as you like in ipv6... as for ipv4, well, maybe not even try negotiating separate tuples for it, but I'd really like to try....

Dave Täht · Answer 11 · Tue Mar 02 2021 09:42:28 GMT+0800 (China Standard Time)

while I'm citing papers, I adored @jech 's paper on mosh. https://arxiv.org/abs/1502.02402 - in my case I very much like the idea of an ipv6 address tunneled from elsewhere being an endpoint in webrtc. Short term this provides a degree of anyomnity not present in any webrtc implementation I know of, longer term might point to a way to proxying more users into a single conference.

Dave Täht · Answer 12 · Tue Mar 02 2021 09:44:44 GMT+0800 (China Standard Time)

@jech I have NO idea what happens to flow labels on the path. None. It would be an interesting stat to collect. I'm under the impression that an early use case was for scribbling on it along the path as a load balancing technique, and have no idea if that still exists, but certainly the prospect of that crippled the flowlabel for any other purpose.

And I have no idea what you can accomplish by hashing the SSID onto it.

Dave Täht · Answer 13 · Tue Mar 02 2021 09:53:30 GMT+0800 (China Standard Time)

@Sean-Der Going back to your workload... are you a funded entity? I have a few good connections that are looking for viable proposals, please contact me via email for more details.

In my case I cannot refute or confirm the claims of the scream folk without somehow getting access to the ecn bits. It's been 8 years of pointless debate vs a simulation, and the same goes for my own attempt at work here - and my dream has been to try and get something like this, located on a edge gateway, to work worth a damn, on fiber, at least, across town.

https://lola.conts.it/

Blasting raw video packets works. :) It's really hard to get gear that uses scanlines anymore, sad to say.

I've been at the jamaphone project for a really, really, really long time.

https://www.internetsociety.org/wp-content/uploads/2013/09/28_towards_imperceptible_latency.pdf

And see:

https://ccrma.stanford.edu/groups/soundwire/publications/papers/schuett_honorThesis2002.pdf

Dave Täht · Answer 14 · Tue Mar 02 2021 12:00:11 GMT+0800 (China Standard Time)

@jech ok, I see the germ of your idea. Hashing anything onto the flow id "voice", "video" might make some sense if it is actually preserved at least somewhat across the internet and save on using different ports for it. Same level of security exposure for both concepts. But you still have to get at the flowid, and if you do that you can also get to the port number, dscp, etc. So having an extended udp library for go that "did the right thing" with the extra info that can be extracted from the os (note that linux also supports per packet timestamping). My guess is that the current one could be copy/pasted and extended but I don't know much about it....

being able to effectively use "sendmmsg" might also be a way to ensure bursts hit wifi faster and as one txop. same goes for recvmmsg.

Dave Täht · Answer 15 · Tue Mar 02 2021 14:01:58 GMT+0800 (China Standard Time)

heh... and just to get rid of a pointless researchy idea - udp-lite is fully implemented on many OSes. It passes through many ipv6 implementations and routers. It's now pointless to have udp-lites' only partial crc coverage given the integrity features of crypto, but ya know, if you are worried about running out of udp ports and want to run towards the future, attempting to use it ought to be very interesting. And natting it over ipv4 does work, but is rather rare today. It's only a couple lines of code to use/listen on it. I used to run test versions of babeld over it, just because I could.

Pay no attention to the mad speculator behind the curtain...

John Davis · Answer 16 · Thu Nov 04 2021 06:27:58 GMT+0800 (China Standard Time)

@Sean-Der @mengelbart are there use cases where having all connection stats available real-time/globally in something like redpanda/kafka would be useful to pion users? The intention would be connection orchestration, bandwidth concerns, and business logic specific to a user, which up/down scales automatically using kafka consumer groups. In all likelihood it would be a separate repo. What would be useful about it is being able to try out new strategies more quickly/easily. Could even bring something like tensorflow into the picture for bandwidth concerns, etc.

Dave Täht · Answer 17 · Fri Nov 05 2021 00:02:22 GMT+0800 (China Standard Time)

I'd really like to be able to construct a lag meter somehow.

Juliusz Chroboczek · Answer 18 · Fri Nov 05 2021 04:26:10 GMT+0800 (China Standard Time)

@dtaht you already have one — receiver reports contain all the information needed to compute application-layer RTT. Have a look at /stats.html next time you use Galene, where all of that is exported.

@unicomp21 In my experience, the accounting libraries tend to use a set of global atomics to keep statistics, which causes cache-line bouncing in higly concurrent applications. Since Galene has been carefully tuned to scale well across multiple cores, that's something that makes me nervous. Please don't add dependencies on third-party accounting libraries in low-level code.

Dave Täht · Answer 19 · Fri Nov 05 2021 08:33:15 GMT+0800 (China Standard Time)

@jech on-screen, much like you can see a lagmeter in many games. If the data is there, can the javascript pull it out? galene Push it to each client? (e.g. see the lag others are experiencing). Example:

https://www.earthli.com/quake/lagometer.php

https://en.wikipedia.org/wiki/Lagometer

Juliusz Chroboczek · Answer 20 · Fri Nov 05 2021 23:36:59 GMT+0800 (China Standard Time)

That's pretty much off-topic for this thread, perhaps you'll want to open an issue in Galene or raise the issue on Galene's mailing list.

Dave Täht · Answer 21 · Sat Nov 06 2021 02:21:58 GMT+0800 (China Standard Time)

agreed. Apologies.

Thibault Saunier · Answer 22 · Mon May 30 2022 22:10:08 GMT+0800 (China Standard Time)

You should maybe update the Todo list from the issue description now that gcc has landed? :-)

Sean DuBois · Answer 23 · Mon Sep 25 2023 23:24:57 GMT+0800 (China Standard Time)

Sorry I missed this @thiblahute. Updated the tracking issue

Are you using the GCC implementation at all? Any feedback or questions I would always love to talk about it!

Thibault Saunier · Answer 24 · Wed Sep 27 2023 22:28:47 GMT+0800 (China Standard Time)

@Sean-Der I am not using it no, I just read your implementation while implementing GCC for GStreamer: https://gstreamer.freedesktop.org/documentation/rsrtp/rtpgccbwe.html?gi-language=c#rtpgccbwe

Alex Pokotilo · Answer 25 · Fri Dec 29 2023 17:07:42 GMT+0800 (China Standard Time)

First of all I want to thanks everybody involved in GCC implementation! Thanks for very useful comments in this thread either.

I faced with a problem during my tests and I'm seeking a bug in GCC implementation to report it appropriately, but as GCC code is full of math let me explain what I encountered and may be @mengelbart or somebody else give me some advises to find this problem quickly.
Problem reproduced with fresh version of Firefox(120.0.1 (64-bit)) and Chrome(Version 120.0.6099.129 (Official Build) (64-bit)).

Unfortunately it's hard to just provide a test to reproduce this case. I'm sorry.
But there are some simple steps I followed to reproduce this problem.

c.InterceptorFactory, err = cc.NewInterceptor(func() (cc.BandwidthEstimator, error) {
	return gcc.NewSendSideBWE(
		gcc.SendSideBWEMinBitrate(300000),        // set loweset bitrate of abr streams
		gcc.SendSideBWEInitialBitrate(1000000),   // set initial bitrate of abr streams
		gcc.SendSideBWEMaxBitrate(2000000),       // set maximum bitrate of abr streams
		gcc.SendSideBWEPacer(gcc.NewNoOpPacer())) // I set NoOp pacer for simplicity sake
})

if err != nil {
	return nil, nil
}

i.Add(c.InterceptorFactory)
if err = webrtc.ConfigureTWCCHeaderExtensionSender(m, i); err != nil {
	return nil, nil
}

I send single bitrate stream of 750K. I don't change stream bitrate during the test
Once start playback I wait till estimator reached maximum bitrate

target bitrate 2000000
target stats {
"averageLoss": 2.253618279230226e-14,
"delayEstimate": -0.203,
"delayMeasurement": -0.065,
"delayTargetBitrate": 2000000,
"delayThreshold": 6,
"lossTargetBitrate": 2000000,
"state": "increase",
"usage": "normal"
}

Once Max bitrate reached by estimator I run following script

sudo tc qdisc del dev lo root netem delay 0ms
sudo tc qdisc add dev lo root netem delay 0ms

echo "start playback and wait till GCC estimator shows maximum bandwidth then press Enter"
read -n 1 x;

echo "delay 5ms"
sudo tc qdisc change dev lo root netem delay 5ms
sleep 9
echo "delay 1ms"
sudo tc qdisc change dev lo root netem delay 1ms
sleep 9
echo "delay 0ms"
sudo tc qdisc del dev lo root netem delay 0ms

make sure estimator still shows max bitrate and press Enter

During script execution estimated bandwidth drops to min bandwidth and sometimes even below(rarely).
The problem is that estimated bandwidth goes around minimum bandwidth once problem reproduced and doesn't reach max bitrate anymore.

I've attached observed stats after script execution.
stats.zip

Unfortunately problem is not 100% reproducible, but if you run this script several times, you will eventually face it.
I'm trying to find where is the bug or it's browser problem. I run my tests on localhost interface.
Any help appreciated

Dave Täht · Answer 26 · Sun Jan 07 2024 18:24:20 GMT+0800 (China Standard Time)

Interesting. Part of my issue is in trusting netem to change anything without screwing up the link. can you also generate tc -s qdisc show stats? a few times after each change?

Are you observing any drops at all? Any queue depth at all?

As a completely opposite test of what you are trying to do, could you try this with cake, changing the bandwidth parameter? I trust cake to take the change command, and want to permute the gcc estimator to what cake presents as bandwidth and delay (e.g, if you change it to 500k, the delay will skyrocket, and loss start to happen), rather than just applying delay.

(btw, you can get rid of a del/add cycle with replace)

tc qdisc replace dev lo root cake bandwidth 2.5mbit #

tc qdisc change dev lo root cake bandwidth 500kbit # etc

Alex Pokotilo · Answer 27 · Thu Jan 11 2024 01:11:49 GMT+0800 (China Standard Time)

@dtaht it's a pleasure to be in the same thread with the legend. I've prepared scripts and even run some but had to dig into
https://www.linkedin.com/pulse/explaining-schcakes-statistics-dave-taht to understand tc stats.
I'll prepare test results and get back. I just need some time to not look like a dummy

Alex Pokotilo · Answer 28 · Fri Jan 12 2024 19:19:58 GMT+0800 (China Standard Time)

@dtaht, I've checked netem and it doesn't hurt link.
If I run the same netem command(with the same delay value) several times nothing happens with BWE.
The problem I described reproduced only in case if I set new delay with big dispersion from original.
E.g if I set 200ms before I start test and then set delays ~ 200ms+-0.06*(200ms) everything works fine.
If I set delay to 0ms and then set delay to 5m the problem is reproduced immediately even though delay increased very little in absolute values.

I've checked loss_based_bwe.go. I both checked and debugged code as well as patched "updateLossEstimate" method and hardcoded "lossRatio" to always be "0". Problem reproduced with these patches. This means the problem is not in loss_based_bwe.go. I'll inspect delay based bwe.

My quess is that delay based bwe calculates delay "mean" and if current "dispersion" >= some X% of mean then this is a signal to decrease bandwidth. The problem is probably in fact that in case of initial 0ms delay in localhost interface then I set 5ms this is a huge dispersion from delay estimator POV even though absolute delay value is very little.
Most probably I will add simple condition that if the square of the standard deviation is less than 10ms we should just consider this fluctuation as white noise.
I'll get back once I check delay based bwe.

I've performed cake-based tests. They reproduced the problem the same way as netem based tests but netem is more simple to calculate.

Alex Pokotilo · Answer 29 · Thu Jan 18 2024 21:33:21 GMT+0800 (China Standard Time)

Hi @dtaht,
I was able to reach @mengelbart and ask him to help.
He made a fix #221 and this fix helps in eliminating reset target bitrate to minBitrate.
I found two problems either:

one with variance calculation in exponentialMovingAverage
but much more important problem is that rateController never reset latestDecreaseRate but rfc states we need to reset it.
Since we don't reset latestDecreaseRate our current bitrate could always fall into latestDecreaseRate condition in the future and we will reset target bitrate in "increase" state if our source bitrate is less that target bitrate and close to latestDecreaseRate.
So we MUST reset latestDecreaseRate.
We will work with @mengelbart to fix this properly.

Another problem is why we come to overuse state. But we will fix it a little later.

Alex Pokotilo · Answer 30 · Fri Jan 26 2024 00:38:24 GMT+0800 (China Standard Time)

Hi all,
I'm proposing to discuss this #226 request as a start point. This version works fine to me and I am going to use it in production.
@dtaht if you know details about math around gcc BWE, you could really help me with review or maybe we can setup a call so you can guide me and check my results.
@mengelbart could you please review #226 and share your opinion too?
@Sean-Der you asked who is using this. I'm going to use this in production and if somebody can help me with review I can sponsor this job. I will all myself, just help to find somebody to review

Dave Täht · Answer 31 · Sun Jan 28 2024 20:48:03 GMT+0800 (China Standard Time)

It has been 10+ years since I looked at gcc. My principal critique of the current codebase was that delay is more important than loss.

Dave Täht · Answer 32 · Sun Jan 28 2024 20:48:26 GMT+0800 (China Standard Time)

I am otherwise kind of low on brain cells!

Alex Pokotilo · Answer 33 · Mon Jan 29 2024 04:01:33 GMT+0800 (China Standard Time)

Dave, Your suggestion about ‘cake’ was very helpful. I’m sorry bothering you, but I needed someone to review my results so I tried my luck with you;)