codename0og / RVC_Onnx_Infer

RVC Onnx Infer - Upgraded and simplified-ish

Let's Improve This.

MikuAuahDark opened this issue

So, I have some experiments related to RVC and ONNX lying around on my laptop. The goal was to gain a deeper understanding of how it all works so I can create a Java Android port of the RVC pipeline with full on-device inference. So far I have these working:

  • Full ONNX-only inference.
  • RMVPE f0 pitch estimator. I only used RMVPE in the first place.
  • Support for both 32-bit and 16-bit float content-vec 768. There's no noticeable difference between 32-bit and 16-bit content-vec.
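
As a minimal illustration of the fp32/fp16 point above: ONNX Runtime reports each input's declared element type, so one code path can serve both precisions. This is only a sketch; the file name and waveform shape below are assumptions, not the actual layout of any particular content-vec export.

    import numpy as np
    import onnxruntime as ort

    # File name and input shape are placeholders for illustration.
    sess = ort.InferenceSession("content-vec-768.onnx",
                                providers=["CPUExecutionProvider"])
    inp = sess.get_inputs()[0]

    # The declared type string reads e.g. "tensor(float)" or
    # "tensor(float16)", so we can pick the matching NumPy dtype.
    dtype = np.float16 if "float16" in inp.type else np.float32
    audio = np.zeros((1, 1, 16000), dtype=dtype)  # 1 s of 16 kHz silence
    features = sess.run(None, {inp.name: audio})[0]
    print(features.shape, features.dtype)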

With my implementation, I can run inference on audio up to 36 seconds long before ONNX Runtime starts throwing errors. It also isn't perfect. There are certain issues and notes:

  • It uses okada's RMVPE .onnx file.
  • The voice ONNX must be exported from okada's voice-changer.
  • For some reason, using CPU + okada's RMVPE results in bad quality. Using DML works, though (a provider-selection sketch follows this list).
  • Faiss is not used, which means the index file goes unused. Since installing Faiss through pip is not supported, I didn't bother.
  • Considering all of that, my code theoretically works up to Python 3.12.
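
On the CPU-vs-DML point above, here is a sketch of how a script could prefer DML and fall back to CPU. DmlExecutionProvider comes from the onnxruntime-directml package, and the RMVPE file name is a placeholder:

    import onnxruntime as ort

    # Prefer DirectML when the installed build provides it, otherwise
    # fall back to plain CPU execution.
    preferred = ["DmlExecutionProvider", "CPUExecutionProvider"]
    providers = [p for p in preferred if p in ort.get_available_providers()]
    rmvpe = ort.InferenceSession("rmvpe.onnx", providers=providers)
    print("RMVPE is running on:", rmvpe.get_providers()[0])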

I don't want it to go to waste, but at the same time I need some insights on how to improve it further. If you'd like to discuss it further, please let me know your Discord username (or perhaps another way to communicate) by sending me an email (it's on my GitHub profile). Perhaps we could merge all the improvements into this repository instead.

There are a few things to consider.

  • W-okada's ONNX export isn't really optimal for RVC purposes due to some key and dictionary differences. It is recommended to export using RVC (I have a properly working exporter in my WIP fork). The exporter in standard RVC, and most likely in other forks, is faulty and has some issues.
  • W-okada's ONNX is meant to work with DML, so AMD and most likely Intel, but I'm not sure about that one. Perhaps there's a way to incorporate Vulkan into it, but I'm not the person to ask about that, really.
  • Faiss not being used isn't that big of a deal, given the constraints.
  • Generally, back when RVC still 'worked' in the DML manner, it was using the bundled rmvpe.onnx rather than w-okada's.

Overall, I think that chasing ONNX isn't a worthy pursuit, mainly because the current pytorch -> onnx exporting methodology is based on the old static tracing. If one were to create an ONNX dynamo-based exporter where things stay dynamic, it would definitely improve the models' performance and "reproduction quality".
https://pytorch.org/docs/stable/onnx.html
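
For reference, a toy sketch of the dynamo-based export path in PyTorch 2.x (the module is a stand-in, not an RVC model, and the export dependencies such as onnxscript are assumed to be installed; newer releases expose the same path as torch.onnx.export(..., dynamo=True)):

    import torch

    class TinyNet(torch.nn.Module):
        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return torch.relu(x) * 2.0

    model = TinyNet().eval()
    example = torch.randn(1, 16)

    # The dynamo exporter traces through TorchDynamo/FX rather than the
    # legacy static tracer, so dynamic shapes can survive the export.
    onnx_program = torch.onnx.dynamo_export(model, example)
    onnx_program.save("tinynet.onnx")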

Other than that:

  • Both you and I encounter memory errors on longer-ish inferences. I was able to squeeze a max of roughly 50 seconds out of it. Without some more advanced and actually working internal slicing/segmenting followed by concatenation, yeah, rip.
  • Having no ability to use Faiss, plus worse ONNX quality compared to pytorch, is already a downside.
  • Despite it being counterintuitive, ONNX inference isn't that much faster than pytorch (RVC native) on CPU, and there we have no inference-length constraints, plus the index remains an option.

And for some closing words: yeah, I kinda abandoned the idea after seeing how few benefits, if any, it provides while introducing issues.
However, you might find this interesting for your Android-port purposes, perhaps? (I considered going that way, but nah, it doesn't seem any better, especially in terms of memory issues ~ at least on PC.)

https://axinc.jp/en/solutions/ailia_sdk.html
https://github.com/axinc-ai/ailia-models
https://github.com/axinc-ai/ailia-models/tree/master/audio_processing/rvc

In any case, good luck! I personally am out of it.
In the best-case scenario, I'll just add native ONNX support to my WIP RVC fork.

W-okada's onnx export isn't really optimal for rvc purposes due to some key and dictionary differences.

For me, it's not an issue. My script can detect the available input parameter names and handle the differences as needed, thus supporting both okada's and mainline ONNX. The reason I'm using okada's exporter is that the mainline exporter doesn't export the model correctly: it produces an ONNX file, but that file throws a (non-memory) error during inference.
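
One way such detection could look (a sketch, not my script verbatim): query the session for the input names the graph declares and feed only the tensors it actually asks for, whichever naming scheme the exporter used.

    import onnxruntime as ort

    def build_feed(sess: ort.InferenceSession, prepared: dict) -> dict:
        # `prepared` maps every tensor name we know how to produce,
        # under both naming schemes (okada's and mainline), to its array.
        declared = {i.name for i in sess.get_inputs()}
        feed = {k: v for k, v in prepared.items() if k in declared}
        missing = declared - set(feed)
        if missing:
            raise ValueError(f"no tensor prepared for inputs: {missing}")
        return feed

    # Usage: output = sess.run(None, build_feed(sess, prepared))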

W-okada's onnx is meant to be working with DML

Probably. I'm only hitting the issue specifically with RMVPE though. I should try mainline RMVPE and see how it goes.

Both you and me encounter memory errors in longer ish inferences.

My error is not related to memory. I'm inferencing on an integrated AMD APU, which means the VRAM is CPU RAM. Rather, the error is about the tensor itself being unable to handle very long inputs, around 1860 × 960 samples for a V2 48k model (1860 frames × 960 samples per frame at 48 kHz is roughly 37 seconds, which lines up with the ~36-second ceiling mentioned above).

If one was to create a onnx dynamo based exporting where stuff's dynamic, it'd definitely improve the models' performance and " reproduction quality ".

I think this would be a good idea to try out, if dynamo export is supported on Windows.

Without some more advanced and actually working internal slicing / segmenting and then concatenation, yeah, rip.

For my script, I have 2 strategies to solve this issue:

  1. Silence detection. I'm using a 1D convolution to compute the RMS of the samples. If the average is below a threshold, I treat those samples as silence and slice there (sketched after this list).
  2. Segmentation interpolation. If the segment produced by silence detection runs longer than 30 seconds, I fall back to interpolating the samples: I infer the previous 0.1-second buffer, 0.2 seconds of the desired audio, and a 0.1-second lookahead buffer, then interpolate between those 0.1-second buffers. This results in worse quality, though.
    Surely there could be better segmentation. The mainline repository is probably a good place to look at this (if it has one).
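
A sketch of both strategies under stated assumptions (the window size, threshold, and linear crossfade are placeholders, not my exact parameters):

    import numpy as np

    def rms_envelope(audio: np.ndarray, win: int = 480) -> np.ndarray:
        # Strategy 1's measurement: windowed RMS via a 1D convolution
        # of the squared signal with a normalized box kernel.
        kernel = np.ones(win) / win
        return np.sqrt(np.convolve(audio ** 2, kernel, mode="same"))

    def silence_indices(audio: np.ndarray, thresh: float = 0.01,
                        win: int = 480) -> np.ndarray:
        # Sample positions whose local RMS falls below the threshold;
        # these are candidate slice points for segmenting long inputs.
        return np.flatnonzero(rms_envelope(audio, win) < thresh)

    def crossfade(a: np.ndarray, b: np.ndarray, overlap: int) -> np.ndarray:
        # Strategy 2's interpolation reduced to a linear ramp: blend the
        # tail of segment a into the head of segment b over `overlap`
        # samples (both segments must be at least `overlap` long).
        fade = np.linspace(1.0, 0.0, overlap)
        mixed = a[-overlap:] * fade + b[:overlap] * (1.0 - fade)
        return np.concatenate([a[:-overlap], mixed, b[overlap:]])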

Despite it being counterintuitive, onnx inference isn't that much faster than pytorch ( rvc native ) on cpu

Yeah. In my case, though, the intended target won't be able to run Python code at all, let alone NumPy or PyTorch. At least ONNX Runtime has an NNAPI (low-level Android API for neural networks) backend, so it may be able to leverage the Hexagon DSP in my phone.

As for the suggested solution, I'm not quite happy about resorting to a commercial product, sadly. The reason I wrote my own is that I want to eventually open-source the app's source code and improve trust by using only OSS libraries. Their code can still be used as a reference to figure out what certain parameters do, though.

At least in my case the issues were mostly bound to memory leaks, but yeah.
Hexagon, hmm... that's right, Snapdragons surely would manage it.

About the broken exporter: I mentioned the issue quite a while ago, but the devs kinda don't care, so I had to fix it on my own, effectively porting the older RVC exporter, which works properly but is only RVC-compatible due to the shapes, yeah.
Generally, I've been quite busy lately, and ONNX, as mentioned, isn't much of an interest of mine any longer. However, if you need to get in contact with me, catch my Discord: .codename0.

Thanks. Friend request sent. My Discord username is similar to my GitHub username, you can't miss it. You'll know it when you see it.

Well, so far I haven't seen anything ~ Are you sure you invited the right person?