[RFC] Deprecate/Stop TorchText releases starting with PyTorch release 2.4
atalman opened this issue · comments
Deprecation of TorchText releases
As of September 2023 we have paused active development of TorchText because our focus has shifted away from building out this library offering.
We would like to do the following:
- For TorchText releases 0.17.2 and 0.18.x, the TorchData dependency is removed from TorchText [COMPLETED]
- Users can still install torchdata manually to use the datasets
- Minor PyTorch release 2.3 will be the last release that ships with a TorchText release (0.18)
- Starting with PyTorch release 2.4, we would like to stop releasing TorchText.
- TorchText will still be available from nightlies on a best-effort basis, with no guarantee that we will fix issues or breakages
For reference here is the PyTorch Release schedule:
https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-cadence
Do we recommend any alternatives? For example, TorchServe has a text_classifier handler, with tests associated with it, that uses TorchText:
https://github.com/pytorch/serve/blob/master/ts/torch_handler/text_classifier.py
So I'm wondering what the strategy is. Should we replace it with HuggingFace, with PyTorch coming up with another solution at a later date?
Can we release TorchText in PyTorch 2.3 for all platforms (e.g. aarch64; not sure which other platforms were missing it for PyTorch 2.2)?
Yes, we will release the same set of binaries as for PyTorch 2.2:
https://hud2.pytorch.org/hud/pytorch/text/release%2F0.18/1?per_page=50
Do we recommend any alternatives?
This would be case-by-case. For the TorchServe example the simple alternative is to copy/paste the one functionality that was used from torchtext into the example. It's very short and simple, so that's a viable solution.
https://github.com/pytorch/text/blob/main/torchtext/data/utils.py#L207-L228
> Do we recommend any alternatives?
>
> This would be case-by-case. For the TorchServe example the simple alternative is to copy/paste the one functionality that was used from torchtext into the example. It's very short and simple, so that's a viable solution.
> https://github.com/pytorch/text/blob/main/torchtext/data/utils.py#L207-L228
Thanks. This seems like a good idea. We also use `from torchtext.data.utils import get_tokenizer`. Looking at the code, it doesn't seem too complicated to copy/paste it for basic_english.
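For anyone else vendoring this, here is a minimal, self-contained sketch of what that copy/paste could look like. It approximates (does not exactly reproduce) torchtext's `basic_english` normalization: lowercase, pad common punctuation with spaces, collapse whitespace, then split. The function name `basic_english_tokenize` is my own; check the linked torchtext source for the exact substitution list before relying on identical token output.

```python
import re

# Regex substitutions approximating torchtext's basic_english
# normalization (see torchtext/data/utils.py): pad punctuation with
# spaces, drop double quotes, collapse whitespace. Order matters.
_PATTERNS = [
    (re.compile(r"\'"), " ' "),
    (re.compile(r"\""), ""),
    (re.compile(r"\."), " . "),
    (re.compile(r"<br \/>"), " "),
    (re.compile(r","), " , "),
    (re.compile(r"\("), " ( "),
    (re.compile(r"\)"), " ) "),
    (re.compile(r"\!"), " ! "),
    (re.compile(r"\?"), " ? "),
    (re.compile(r"\;"), " "),
    (re.compile(r"\:"), " "),
    (re.compile(r"\s+"), " "),
]


def basic_english_tokenize(line: str) -> list[str]:
    """Lowercase, normalize punctuation, and split on whitespace."""
    line = line.lower()
    for pattern, replacement in _PATTERNS:
        line = pattern.sub(replacement, line)
    return line.split()
```

With this vendored in, call sites that used `get_tokenizer("basic_english")` can pass `basic_english_tokenize` directly, with no torchtext dependency.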
cc @matthewdzmura @seemethere: the releng team and @malfet propose to stop releasing TorchText as of release 2.3, since we can't ensure the quality of the release.
What would be the alternative if I need preprocessing for BERT / vocab / regex operations compiled with my model?
keras.io is a viable alternative ...
Is there any alternative for C++-only environments that need native tokenizers now?