dbry / WavPack

WavPack encode/decode library, command-line programs, and several plugins

Integrate low latency mode into regular releases

mardab opened this issue · comments

I recently came across the low latency mode document from 13 years ago; it looks like its branch has been untouched since the initial experiment. Since then, Opus emerged and became the standard low-latency lossy codec with a multitude of applications, while using only a single-digit percentage of CPU, even at its most complex settings and with a GRU neural network, simply because of a combination of a decade of optimization and technological advancement in modern processors, even in the IoT space. To my surprise, there is no codec that is open source, low latency, and hybrid/lossless, besides that experiment from 14 years ago, while applications for such a codec keep popping up from time to time.

I propose this integration so that there are no differences/errors whenever a decoder encounters "continuation blocks", and to introduce a low-latency option to standard encoder builds.

When I created the low-latency version I thought that latency was the only issue (which it addresses superbly). Now I understand that in many, and perhaps most, low-latency types of applications, error resilience and recovery are actually as important as latency.

The scheme I should have pursued is simply making the existing block overhead as small as possible (the existing preamble structure was optimized for blocks of thousands of samples). That was the idea and motivation for my newer wavpack-stream project, which targets audio streaming applications where low latency and error resilience are both important. It creates blocks with about 1/3 the overhead of regular WavPack and has the hybrid and lossy modes as well. Additionally, it has an option to terminate blocks at a specific byte length rather than a number of samples, which can be useful in network scenarios that have a maximum or optimum block size. I have successfully integrated this library into a multi-room speaker system.
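The byte-length termination idea can be sketched roughly as follows. This is a hypothetical illustration, not wavpack-stream's actual API: the function name, the frame representation, and the 6-byte header figure used as a per-block cost are all assumptions made for the example.

```python
# Hypothetical sketch of terminating blocks at a byte budget rather
# than a sample count. Names here are invented for illustration only.

def pack_blocks(frames, max_block_bytes, header_bytes=6):
    """Greedily fill blocks up to max_block_bytes, starting a new
    block when the next encoded frame would overflow the budget."""
    blocks, current, used = [], [], header_bytes
    for frame in frames:
        if current and used + len(frame) > max_block_bytes:
            blocks.append(b"".join(current))
            current, used = [], header_bytes
        current.append(frame)
        used += len(frame)
    if current:
        blocks.append(b"".join(current))
    return blocks

# e.g. ten 100-byte encoded frames under a 512-byte block budget
blocks = pack_blocks([b"\x00" * 100] * 10, 512)
```

With these numbers each block holds five frames (6 + 500 = 506 bytes fits; a sixth frame would overflow), so the ten frames split into two blocks.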

In any event, that's why the low-latency branch was abandoned, and I believe that this is a suitable replacement. As for integration with regular WavPack, I'm not sure I understand the value of that, since the formats are not interchangeable and the regular WavPack library has many features that are useless in a streaming scenario.

I see, and I understand.

However, I'm going to continue being an advocate for one common wavpack project/format/library/toolset.

Of course, that would mean a lot of work, including a change to the format, which I'm pretty sure is far too soon to be an acceptable option. Right now I see no better option for having stream/low-latency capability while staying up to date with fixes and improvements from regular releases without making a mess.

You bring up a great point. I need to make sure that any fixes to the mainline WavPack code (which is constantly being fuzz tested) get ported into the wavpack-stream library.

However, it's probably unfortunate that the wavpack-stream project has WavPack in its name because they really are completely independent, despite having a common origin. Because WavPack is supported in a lot of older software and hardware devices, I certainly would not want the standard encoder to generate low-latency packets because that would break all the decoders out there. That ship has sailed.

And since the wavpack-stream library is intended for low-level (possibly embedded) streaming applications, there's no reason to include support for stuff like seeking and tagging. Being able to rip all that out helps make the project so much more manageable (no more iconv dependency!).

All that said, if I were starting all over I would probably use the smaller headers of the streaming version because having smaller packets is just better. But again, it's too late for that now.

"Completely independent"? So they don't share the underlying codec algorithm? If they do, the difference is in the I/O, which for compatibility has to be kept separate; however, the codec core could be shared between them, right?

Yes, they share the underlying codec algorithm and, for instance, the assembly language optimizations. However it's not just the I/O on top that's different, the file format is completely different. Whereas regular WavPack has a 32-byte header, the streaming library has a 6- or 12-byte header depending on the implementation. Also, the data after the header, which contains the initial coefficients of the decoder, is different (designed to be more concise but less accurate in the streaming version).
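The header-size difference matters most for short, low-latency blocks, where fixed overhead dominates. A quick back-of-the-envelope comparison (the 256-byte payload is just an illustrative figure for a short block; the header sizes are the ones quoted above):

```python
# Rough arithmetic on per-block header overhead: 32-byte regular
# WavPack header vs. the 6-byte streaming header.

def overhead_pct(header_bytes, payload_bytes):
    """Header bytes as a percentage of the total block size."""
    return 100.0 * header_bytes / (header_bytes + payload_bytes)

# A short low-latency block might carry ~256 bytes of compressed audio.
regular = overhead_pct(32, 256)    # ~11.1 %
streaming = overhead_pct(6, 256)   # ~2.3 %
```

At block sizes of thousands of samples the 32-byte header is negligible, which is why the regular format never needed a smaller one.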

So while it would be possible to merge the source code for the two projects, the result would be code full of conditionals that would be far more difficult to understand and maintain, especially by an outsider. I think it would be far easier to simply port across any applicable bug fixes and improvements when they arise. And like I said previously, there really should not be a question about which library a given application should use.

I recently got a different idea and wonder if this would work. Since I'd like to stay up to date with regular WavPack's bugfixes and optimizations, but implanting streaming support would be too much to maintain, how about making a sort of library for converting the streaming format "on the fly" to be readable/writable by the standard WavPack codec? This way, adding streaming support wouldn't have to be a major change to WavPack and would still be optional and relatively easy to keep in line with releases, but as of today I have no idea how to write such a translator "the right way".

I'm pretty sure it would be possible to create a utility to convert a wavpack-stream to a regular WavPack stream/file that could be decoded by the regular library. However, the reverse would not be possible because regular WavPack blocks are too long, and even if you got around that the quantized values generated by the regular WavPack library cannot be represented in the streaming version.

I'm still having a hard time understanding your concerns, and it might help if you let me know what you're actually doing with the streaming library. I think it's unlikely that there will ever be a fix/change in the regular WavPack library that would need to be ported into the streaming code. Virtually all of the fixes and improvements in the future will be in the WavPack command-line programs, and the command-line programs in wavpack-stream are just placeholders for experimenting with the library; they should not appear in an actual product.

It seems like if you are using the wavpack-stream library for streaming then you would have no use for the regular WavPack library at all. You should bring in upstream changes to that library unless a change is made that breaks old decoders (in which case you would need to decide what to do based on your application, although I certainly have no plans for this now and it would have to offer a significant improvement to be considered).

If you ask me for an application, which prompted me to file this request in the first place, that would be basically most of the current applications of the Opus codec: more specifically, cases where quality is as important as low latency (with the assumption of a much higher compute budget).

Although I didn't have time in November to experiment further with my copy of this repository, I thought of another point for this feature set: a hybrid-mode stream with packet QoS. Since low-latency packets are different from regular WavPack anyway, why not give them a lossy/correction flag? This way, when the stream has insufficient bandwidth for fully lossless audio, be it temporarily or constantly, one could know which packets can be "safely" dropped.

The low-latency packets do contain that information, and in fact I used this library to create a system for reliably streaming audio over UDP. I created both a lossless and lossy stream on the host and normally sent just the lossless packets (and for network efficiency I used the mode that creates uniformly sized packets).

If a re-transmit was required, then I would send the much smaller lossy version of the missing packet, and these lossy packets were seamlessly merged on the receiving end if the lossless one had not shown up in time. If too many packets were going missing or arriving late, then I would temporarily switch to sending only the lossy packets, which could cut the required bandwidth by at least half (and even more at higher sampling rates).
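The receive-side merge described above can be sketched in a few lines. This is a hedged illustration of the idea, not code from the actual system; the class and field names are invented:

```python
# Hypothetical sketch of per-packet lossless/lossy merging on the
# receiver: keep the lossy re-transmit as a fallback and upgrade to
# the lossless packet if it arrives before the playout deadline.

class PacketSlot:
    def __init__(self):
        self.lossless = None
        self.lossy = None

    def best(self):
        # Prefer the lossless payload; fall back to the lossy one.
        return self.lossless if self.lossless is not None else self.lossy

slot = PacketSlot()
slot.lossy = "lossy#42"        # small re-transmitted packet arrives first
assert slot.best() == "lossy#42"
slot.lossless = "lossless#42"  # lossless packet shows up after all
assert slot.best() == "lossless#42"
```

At the playout deadline the receiver simply decodes whatever `best()` returns, so a late lossless packet degrades the output to transparent lossy rather than to silence.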

Of course this was all inaudible, because the lossy bitrate was high enough to be completely transparent (about 6 bits / sample @ 44.1/48 kHz), but the system would remain lossless whenever the network would support it.
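For a sense of scale, the quoted figure works out as follows (stereo is assumed here, which the thread does not state explicitly):

```python
# Back-of-the-envelope bitrate for the ~6 bits/sample lossy mode
# quoted above, compared against uncompressed 16-bit PCM.

def bitrate_kbps(bits_per_sample, sample_rate, channels=2):
    return bits_per_sample * sample_rate * channels / 1000.0

lossy_48k = bitrate_kbps(6, 48000)   # 576.0 kbps
pcm_16bit = bitrate_kbps(16, 48000)  # 1536.0 kbps
```

So the lossy fallback runs at a bit over a third of the raw PCM rate, consistent with the claim that switching to lossy-only packets cuts the required bandwidth substantially.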