ykdojo / editdojo

(I'm no longer working on this - currently working on https://github.com/ykdojo/defaang)

Home Page: https://www.csdojo.io/edit


Automatically detect if the given text is Japanese or English with Python

ykdojo opened this issue · comments

commented

I think I'm going to release the Twitter-based version of this product for Japanese and English first. So, we should be able to detect with Python whether a given tweet is written in Japanese or English. This way, we can show native speakers only the Japanese tweets that come from Japanese learners. The same goes for English.
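As a rough illustration of what "detect Japanese or English" could mean in the simplest case, here is a minimal, dependency-free sketch that counts Japanese-script characters against ASCII letters. The function names `is_japanese_char` and `guess_ja_or_en` are hypothetical, and this Unicode-range heuristic is only a stand-in, not what the project uses. It will misjudge mixed tweets and tweets dominated by URLs or hashtags.

```python
def is_japanese_char(ch: str) -> bool:
    """Return True if ch falls in a Unicode block commonly used for Japanese."""
    code = ord(ch)
    return (
        0x3040 <= code <= 0x309F      # Hiragana
        or 0x30A0 <= code <= 0x30FF   # Katakana
        or 0x4E00 <= code <= 0x9FFF   # CJK Unified Ideographs (kanji)
    )


def guess_ja_or_en(text: str) -> str:
    """Guess 'ja' or 'en' by majority script; 'unknown' if neither appears.

    Assumption: the text is either Japanese or English, as in this issue.
    Kanji are shared with Chinese, so this cannot tell those two apart.
    """
    japanese = sum(1 for ch in text if is_japanese_char(ch))
    latin = sum(1 for ch in text if ch.isascii() and ch.isalpha())
    if japanese == 0 and latin == 0:
        return "unknown"
    return "ja" if japanese >= latin else "en"


if __name__ == "__main__":
    for sample in ["今日はいい天気ですね", "The weather is nice today", "12345 !!!"]:
        print(repr(sample), guess_ja_or_en(sample))
```

A statistical detector would likely handle short or mixed tweets better than this character count, but the sketch shows the kind of result the filtering logic needs: a language code per tweet.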

@ykdojo Does this mean that whenever someone posts a Japanese tweet, only people who are familiar with Japanese will be able to see it, or will all members of the community see it? If we notify only people familiar with Japanese, then when they use this Twitter app they must be registered as someone who knows Japanese and is learning English, right? Is your thinking similar to this? That is what I have understood. By the way, I am very interested in contributing to this app idea, since I can gain more knowledge from it. We could do this for other languages as well, here in India :)

Small doubt :(

commented

Hmm here's an example to clarify.

Suppose User A is learning Japanese, and her native language is English.

She starts using one of her Twitter accounts, say, @user_a_jp, to tweet stuff in Japanese.

Then, Japanese native speakers should start seeing these tweets so they can fix them.

However, my only concern is: what if @user_a_jp starts tweeting stuff in both Japanese and English? We should probably be able to ignore all English tweets in that case.

commented

For something like this, could we look into the langdetect library? Following the example above, if langdetect returns 'en' for a tweet from @user_a_jp, we would ignore that tweet.
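A minimal sketch of that filtering idea, assuming the detector is passed in as a function so that the logic does not depend on one library. `keep_tweets_in_language` and `langdetect_detector` are hypothetical names, not code from this repo or from PR #29. The helper imports langdetect's `detect` lazily and sets `DetectorFactory.seed`, since langdetect's results are otherwise non-deterministic on short or ambiguous text.

```python
from typing import Callable, Iterable, List


def keep_tweets_in_language(
    tweets: Iterable[str],
    target_lang: str,
    detect_fn: Callable[[str], str],
) -> List[str]:
    """Return only the tweets whose detected language code equals target_lang.

    detect_fn is any function mapping text to a code such as 'ja' or 'en'.
    Tweets the detector cannot classify are skipped rather than shown.
    """
    kept = []
    for tweet in tweets:
        try:
            lang = detect_fn(tweet)
        except Exception:
            # langdetect raises LangDetectException for text with no usable
            # features (e.g. empty strings); treat such tweets as unknown.
            continue
        if lang == target_lang:
            kept.append(tweet)
    return kept


def langdetect_detector() -> Callable[[str], str]:
    """Build a detector backed by langdetect (requires `pip install langdetect`)."""
    from langdetect import DetectorFactory, detect

    DetectorFactory.seed = 0  # make results repeatable across runs
    return detect
```

With langdetect installed, tweets from @user_a_jp could then be filtered with something like `keep_tweets_in_language(tweets, "ja", langdetect_detector())`, which would drop any tweet detected as 'en'.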

commented

Oh yeah, the langdetect library looks good!

commented

Would you like me to go ahead and create a few functions that make use of the library? @ykdojo

commented

NOTE: there's already a PR for this. #29

Will come back to this when it's more immediately useful.

Would it be easier to use Google Translate's automatic language detection feature, or is that something extra and unnecessary? @ykdojo

commented

Yeah, actually I think that will be ideal.