collabora / WhisperSpeech

An Open Source text-to-speech system built by inverting Whisper.

Home Page: https://collabora.github.io/WhisperSpeech/

possibly use MLX for macOS users with WhisperSpeech

BBC-Esq opened this issue · comments

The purpose is to discuss possibly implementing MLX support for macOS users. For example, PyTorch currently doesn't support the FFT operation on the MPS backend, whereas MLX does. This means WhisperSpeech must keep certain models and/or tensors on the CPU for macOS users, whereas CUDA users get the full speedup.
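As a rough illustration of the fallback macOS users hit today, here is a minimal sketch (the helper name and parameters are hypothetical) of computing an STFT when the input tensor lives on MPS:

```python
import torch

def stft_magnitude(x: torch.Tensor, n_fft: int = 1024, hop: int = 256) -> torch.Tensor:
    """Hypothetical helper: run the FFT on the CPU when the input lives on MPS."""
    device = x.device
    if device.type == "mps":
        x = x.cpu()  # torch.stft has no MPS kernel, so fall back to the CPU
    window = torch.hann_window(n_fft, device=x.device)
    spec = torch.stft(x, n_fft, hop_length=hop, window=window, return_complex=True)
    # Complex tensors are poorly supported on MPS, so move the magnitude back instead.
    return spec.abs().to(device)
```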

Possibly compatible with the operators WhisperSpeech uses (screenshot not reproduced here)

Option 1 - implement MLX only where MPS can't be used (see the fallback sketch after this list)
Option 2 - completely replace MPS with MLX
Option 3 - replace MPS with MLX as much as possible, going model by model through WhisperSpeech and checking whether each one can run under MLX
Option 4 - offer MLX IN ADDITION to MPS for all macOS users
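For context, Option 1 is roughly what PyTorch's own CPU-fallback flag already does for ops that lack an MPS kernel, except the fallback target would be MLX instead of the CPU. A minimal sketch of the current behavior (the environment variable is real; the MLX routing is the hypothetical part):

```python
import os

# PyTorch's escape hatch: ops without an MPS kernel silently run on the CPU.
# It must be set before `import torch`.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
# Under Option 1, missing ops would instead be routed to MLX,
# avoiding the MPS -> CPU round trip that the fallback flag implies.
```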

According to the article's benchmarks, MLX provides a 2-3x speedup over MPS in most cases.

Here are the benchmark sections from the Medium article (screenshots not reproduced here):

- Benchmark Setup
- Linear Layer
- Softmax
- Sigmoid
- Concatenation
- Binary Cross Entropy
- Sort
- Conv2D
- Unified Memory Gamechanger

MLX:

https://github.com/ml-explore/mlx

MLX Examples:

https://github.com/ml-explore/mlx-examples/tree/main/llms/llama

MLX Community:

https://huggingface.co/mlx-community

MLX Bark:

https://huggingface.co/mlx-community/mlx_bark (speed-wise, this would beat all of WhisperSpeech's current MPS implementations)

Sample MLX Whisper Script:

https://github.com/ml-explore/mlx-examples/blob/main/whisper/whisper/transcribe.py

Example MLX Whisper model:

https://huggingface.co/mlx-community/whisper-large-v2-mlx

Great initiative @BBC-Esq! I'll definitely circle back to this one as soon as possible.

@signalprime It would take someone with more programming experience than me to implement this, especially since I don't own a Mac, but I thought I'd start the discussion anyway. Interested as always in what you find out.

UPDATE: Looks like PyTorch might be getting support sooner rather than later...

pytorch/pytorch@53bfae2

I'm definitely looking into it. Reviewing the Vocos model today

I'd love to learn if you want to keep me posted and teach me along the way, just FYI. This is not my profession but a hobby.

Absolutely @BBC-Esq, I will keep you in the loop about it. MLX mimics the PyTorch API in most ways. I've been building models since before we had frameworks like TF and Torch, and in this case I'll be rebuilding the Vocos model using the MLX library. It just depends on time constraints.

I recently finished a long project with ML/RL in the finance domain and put in an application with Collabora last week. Would you put in a nice word for me @jpc?

I'm getting closer... almost reached the end of the hole. We already have a standard Whisper model established for MLX.

I was able to convert the Vocos model and weights to MLX, but ran into many issues with its feature extractor. MLX doesn't have weight_norm established yet. I've dug into the code and am debating, when I have time, adding a _weight_norm primitive to the C++ MLX library:

https://github.com/pytorch/pytorch/blob/834c7a1d3ea07878ad87d127ee28606fc140b552/aten/src/ATen/native/WeightNorm.cpp#L50

I'd like to do a little more research before trying that, because it could perhaps be handled another way, or not be needed at all, kind of like a quick initial pass-through. I removed those references and there are some other issues; kinda out of energy for this today.

Interesting...

Good thing I waited. I got a response that it should be possible using existing ops.
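For reference, weight normalization just reparameterizes a weight as w = g * v / ||v||, with the norm taken over every axis except the output-channel one, so a version built from existing MLX ops might look like this sketch (the helper name is mine; semantics follow PyTorch's weight_norm with dim=0):

```python
import mlx.core as mx

def weight_norm(v: mx.array, g: mx.array, axis: int = 0) -> mx.array:
    # Norm over every axis except `axis`, matching torch's weight_norm(dim=0).
    norm_axes = tuple(i for i in range(v.ndim) if i != axis)
    norm = mx.sqrt(mx.sum(v * v, axis=norm_axes, keepdims=True))
    return g * v / norm
```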

Here is the whisper model in MLX format, which is used during voice cloning.

I was working on MLX conversions for all the parts of the Vocos model. Transferring weights wasn't an issue, but the components used in the forward functions likely need to be updated as well. I'm still getting familiar with it, but it seems parts can be mixed and matched: a tensor can be converted to an MLX array, passed to an MLX component, and converted back to a tensor later. That seems necessary, since I wouldn't want to keep patching deeper and deeper into torchaudio, for example. Ideally we just drop in replacements for the components where torch doesn't yet support the ops.
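That mix-and-match pattern could look something like this sketch (the bridge goes through NumPy; the helper names are hypothetical):

```python
import numpy as np
import torch
import mlx.core as mx

def torch_to_mlx(t: torch.Tensor) -> mx.array:
    # Bridge via NumPy; an MPS tensor has to hop to the CPU first.
    return mx.array(t.detach().cpu().numpy())

def mlx_to_torch(a: mx.array, device: str = "cpu") -> torch.Tensor:
    return torch.from_numpy(np.array(a)).to(device)

# Run one op in MLX where torch lacks coverage, then hand back to torch.
x = torch.randn(4, 8)
y = mlx_to_torch(mx.softmax(torch_to_mlx(x), axis=-1))
```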


That's what my intuition was telling me based on what I read about MLX, but I am far from an expert and have no way to verify it. My initial hypothesis was that it might be possible to use MLX for some (but not all) of the necessary operations, mixing and matching like you were saying. Math is math... but again, this is totally a novice-intuition kind of thing.

Let me know if I can help out any...

Not sure if it's relevant, but apparently aten::upsample_linear1d has been implemented on PyTorch's development branch (not included in a release yet, though):

pytorch/pytorch#116630 (comment)

@signalprime how's it going? Any updates?

Hi @BBC-Esq, I haven't had an opportunity to resume work on this unfortunately, my friend.

Hey @signalprime, I hope you don't stop working on this kind of stuff even if you don't get the job with Collabora. I enjoy working with you and look forward to improving this all-around kick-ass library. Just throwing that out there!

Likewise @BBC-Esq, I'll keep it in mind and make time to return to the effort. It's definitely not related to Collabora; rather it's the launch of another project, meetings, and the occasional things that pull us away from our desks. On the next go I'll try the mixed approach: rather than converting everything to MLX, just use MLX ops where coverage is still missing in torch. If that works, it should keep things simpler. I've been spending a lot of time working with autonomous agents, and giving them a good voice, in whatever style we prefer, is an important feature.

@signalprime Sure, I'll see what I can do :)

@signalprime Btw, do you have a Discord? Maybe we could have a chat there?

@jpc yes absolutely, I sent you an email with details. Looking forward to it!

Is it still working with MPS? I couldn't get the current main branch to run on it; it uses the CPU only.