Create Asian-language specific sections
NirantK opened this issue
To become the go-to resource, we should be able to curate good tools and datasets in at least the following languages (in addition to English):
- Chinese
- Korean
- Japanese
Of course, we are willing to accept PRs for all other languages.
Please feel free to raise a PR or simply comment on this issue itself and we will add it on your behalf.
Thoughts? Is this a good direction to take?
Can you take up any of the languages? Sorry for assuming your Asian heritage; I saw the South Korean flag on your GitHub account.
A quick search for Chinese NLP resources turned up these:
- A CNN for Chinese text classification in TensorFlow
- Using deep learning for Chinese information extraction
- Recognizers by Microsoft for Chinese (Korean and Japanese support is currently in development)
For Korean:
- An R package for Korean NLP: koNLP (not yet in this awesome list; koNLPy is already listed)
- A whole new list of corpora, communities and POS taggers
Didn't have time for Japanese, though :sweat_smile:
@NirantK, do you want me to start on Korean? If so, can you explain the initial steps?
Edit: A small search landed me here
@the-ethan-hunt thanks for creating the Korean section for us. Would you like to go ahead and create more for Chinese?
@NirantK I searched for Chinese resources but found very few. Almost all the libraries are already in the list, and not much has changed since I posted this comment:
- A CNN for Chinese text classification in TensorFlow
- Using deep learning for Chinese information extraction
- Recognizers by Microsoft for Chinese (Korean and Japanese support is currently in development)
Do you want me to get going with it? IMHO, I don't think we need a PR for that.
Thanks for the update @the-ethan-hunt
Our present coverage of Chinese NLP tools is not exhaustive, and we need to improve that soon. I'll look into it.
Could you consider Japanese or, frankly, any other open issue?
I'd be grateful for any help on #96 if you are looking at open issues. If nothing else, remove content that doesn't make sense and reformat the rest to meet awesome-lint expectations.
Following up from issue #96, I have found lists of NLP resources for:
- Arabic
- Chinese
- Spanish (this list is mentioned in the original awesome list)
Should I add each of them in individual PRs?
Second, just a question: I now have a fair number of contributions too (see @keon's comment on #80). Does this make me eligible to become a collaborator? 😅
I'd suggest a single PR with additions for all languages. You can keep 3 separate commits, and I won't squash them, so it's easy to see later when each change was made.
I'll defer the collaborator question to @keon. I am still new and frequently wait to hear from him/her on major changes like #110, where I've heavily revamped the Python section.
Just logging some major progress by @the-ethan-hunt in #111, where he added Chinese, Arabic and Spanish tools and corpora.
TODO:
- Spanish Tools, preferably Python/PyTorch/TF
- More Japanese tools and corpora