Cached Usage docs of SparseBytePairFeaturizer contain dense BytePairFeaturizer
namhoai167 opened this issue · comments
The Cached Usage section of the SparseBytePairFeaturizer docs shows the dense BytePairFeaturizer. I checked sparse_bytepair.md and it correctly uses SparseBytePairFeaturizer. There was a commit that fixed this, but the change isn't showing on the docs site.
That's a fair comment!
One thing though; are you using the sparse BytePair featurizer? I'm currently prepping the repository for Rasa 3.0 and my impression was that the feature was barely used and I was considering dropping it. Would you happen to have an anecdote that suggests that I should keep it around?
Thanks @koaning for your reply; you and Dr. Rachael are the two people who have taught me most of what I know about NLP since January. I'm not using the sparse BytePair featurizer at the moment, so you can drop it if you want. I came here from your videos (this, this and this). CountVectorsFeaturizer does a great job of handling spelling errors, but I want to try stacking a subword featurizer (the dense BytePair one) on top and benchmark whether it increases the DIET classification score on my small dataset of artificial errors.
As for the sparse BytePair featurizer, I'm not sure how it works. It uses terminology I don't know, like "BytePair tokeniser"; I'm trying to figure out whether that's the BPE tokenizer or something else. Once I understand it well, I may give it a shot.
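For what it's worth, the "BytePair" in the name does refer to byte-pair encoding (BPE): a subword tokenizer that starts from individual characters and repeatedly merges the most frequent adjacent symbol pair to build a subword vocabulary. Here's a minimal sketch of the merge loop (this is just an illustration with a made-up toy vocabulary, not Rasa's actual implementation):

```python
from collections import Counter

def most_frequent_pair(vocab):
    """Count adjacent symbol pairs across the vocabulary (word -> frequency)
    and return the most frequent one."""
    pairs = Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs.most_common(1)[0][0]

def merge_pair(pair, vocab):
    """Replace every occurrence of the pair with the merged symbol."""
    old = " ".join(pair)
    new = "".join(pair)
    return {word.replace(old, new): freq for word, freq in vocab.items()}

# Toy corpus: each word is a sequence of space-separated symbols,
# initially single characters, mapped to its corpus frequency.
vocab = {"l o w": 5, "l o w e r": 2, "n e w e s t": 6, "w i d e s t": 3}

merges = []
for _ in range(3):
    pair = most_frequent_pair(vocab)
    merges.append(pair)
    vocab = merge_pair(pair, vocab)

print(merges)  # learned merge rules, e.g. ('e', 's') then ('es', 't'), ...
print(vocab)   # words now contain multi-character subword symbols
```

The learned merge rules are what lets BPE split an unseen (or misspelled) word into known subword units, which is why it can complement CountVectorsFeaturizer on noisy text.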
Happy to hear that you find our content useful :)
I will drop the sparse featurizer then, which will also resolve this issue.