RasaHQ / rasa-nlu-examples

This repository contains examples of custom components for educational purposes.

Home Page: https://RasaHQ.github.io/rasa-nlu-examples/


Cached Usage of SparseBytePairFeaturizer docs contains dense BytePairFeaturizer

namhoai167 opened this issue

The Cached Usage section for the SparseBytePairFeaturizer shows the dense BytePairFeaturizer instead. I checked sparse_bytepair.md and it does use SparseBytePairFeaturizer there. There was a commit that fixed this, but the change isn't showing up on the docs site.

Thanks for raising this issue! @alopez will get back to you about it soon ✨

Please also check out the docs and the forum in case your issue was raised there too 🤗

That's a fair comment!

One thing though; are you using the sparse BytePair featurizer? I'm currently prepping the repository for Rasa 3.0 and my impression was that the feature was barely used and I was considering dropping it. Would you happen to have an anecdote that suggests that I should keep it around?

Thanks @koaning for your reply; you and Dr. Rachael are the two people who have taught me most of what I know about NLP since January. I'm not using the sparse BytePair featurizer at the moment, so you can drop it if you want. I came here from your videos (this, this and this). CountVectorsFeaturizer is doing a great job at handling spelling errors, but I want to try stacking a subword featurizer on top (the dense BytePairFeaturizer) and benchmark whether it increases the DIET classification score on my small dataset of artificial spelling errors.
As for the sparse BytePair featurizer, I'm not sure how it works. It uses terminology I don't know, like "BytePair tokeniser", and I'm trying to figure out whether that's the BPE tokenizer or something else. Once I understand it properly, I may give it a shot.
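For anyone trying the same experiment: stacking the dense BytePairFeaturizer alongside CountVectorsFeaturizer looks roughly like this in a Rasa pipeline. This is a sketch based on the rasa-nlu-examples docs; the exact component path and the parameter values (`lang`, `vs`, `dim`) are assumptions you should check against the docs for the version you're running.

```yaml
# Sketch of an NLU pipeline stacking a subword featurizer on top of
# CountVectorsFeaturizer, both feeding DIETClassifier.
pipeline:
  - name: WhitespaceTokenizer
  # Sparse character n-gram features (helps with spelling errors)
  - name: CountVectorsFeaturizer
    analyzer: char_wb
    min_ngram: 1
    max_ngram: 4
  # Dense subword (BPE) embeddings from rasa-nlu-examples;
  # component path and parameters are assumed, verify against the docs
  - name: rasa_nlu_examples.featurizers.dense.BytePairFeaturizer
    lang: en     # language of the pretrained BPEmb model
    vs: 10000    # vocabulary size of the BPE model
    dim: 100     # embedding dimension
  - name: DIETClassifier
    epochs: 100
```

Benchmarking is then a matter of running `rasa test nlu` with and without the BytePairFeaturizer entry and comparing the intent classification reports.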

Happy to hear that you find our content useful :)

I will drop the sparse featurizer then, which will also resolve this issue.