keon / awesome-nlp

:book: A curated list of resources dedicated to Natural Language Processing (NLP)


Create Asian-language specific sections

NirantK opened this issue

To become the go-to resource, we should be able to curate good tools and datasets in at least the following languages (in addition to English):

  • Chinese
  • Korean
  • Japanese

Of course, we are willing to accept PRs for all other languages.

Please feel free to raise a PR or simply comment on this issue, and we will add it on your behalf.

@keon

  1. Thoughts? Is this a good direction to take?

  2. Can you take up any of the languages? Sorry for assuming your Asian heritage; the South Korean flag is on your GitHub account.

A short search for a Chinese NLP section led me to these places:

  • A CNN for Chinese text classification in TensorFlow
  • Using deep learning for Chinese information extraction
  • Recognizers by Microsoft for Chinese (currently developing Korean and Japanese)

For Korean:

  • An R package for Korean NLP: koNLP (not yet in this awesome list; koNLPy is already mentioned)
  • A whole new list of corpora, communities and POS taggers

Didn't have time for Japanese, though :sweat_smile:
@NirantK, do you want me to start on Korean? If so, can you explain the initial steps?
Edit: A small search landed me here

@the-ethan-hunt, thanks for creating the Korean section for us. Would you like to go ahead and create one for Chinese?

@NirantK, I searched for Chinese resources but found very few. Almost all the libraries are already in the list, and not much has changed since I posted this comment:

  • A CNN for Chinese text classification in TensorFlow
  • Using deep learning for Chinese information extraction
  • Recognizers by Microsoft for Chinese (currently developing Korean and Japanese)

Do you want me to get going with it? IMHO, I don't think we need a PR for that.

Thanks for the update, @the-ethan-hunt.

Our present coverage of Chinese NLP tools is not exhaustive; we need to improve that soon. I'll look into it.

Would you consider Japanese or, frankly, any other open issue?

I'd be grateful for any help on #96 if you are looking at open issues. If nothing else, remove content that doesn't make sense and re-format the rest as per awesome-lint expectations.

Following up on issue #96, I have found lists of NLP resources for:

  • Arabic
  • Chinese
  • Spanish (this list is already mentioned in the original awesome list)

Should I add each of them in individual PRs?

Secondly, just a question: I have now made a fair number of contributions too (see @keon's comment on #80). Does this make me eligible to become a collaborator? 😅

I'd suggest a single PR with additions for all languages. You can keep 3 separate commits, and I won't squash them, so it's easy to see later when each change was made.

I'll defer that to @keon. I am still new and often wait to hear from him/her on major changes like #110, where I've heavily revamped the Python section.

Just logging some major progress by @the-ethan-hunt in #111, where he added Chinese, Arabic and Spanish tools and corpora.

TODO:

  • Spanish Tools, preferably Python/PyTorch/TF
  • More Japanese tools and corpora

Merging this issue into #113 to streamline contributor attention.