ashalogic / Persian-Word-Embedding

Persian word embedding ( نشاننده واژه ها فارسی | تعبیه سازی کلمات فارسی )

*still under construction

Persian Word Embedding

What is a word embedding model?

Word embedding is one of the most popular representations of document vocabulary. It can capture the context of a word in a document, semantic and syntactic similarity, relations with other words, and so on.
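
To make "semantic similarity" concrete, here is a minimal sketch of how two word vectors are usually compared, using cosine similarity. The three-dimensional vectors below are invented for illustration only; real models use hundreds of dimensions, and the helper names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors, made up for this example (not taken from any trained model).
toy_vectors = {
    "کتاب": [0.9, 0.1, 0.0],  # "book"
    "دفتر": [0.8, 0.2, 0.1],  # "notebook"
    "سیب": [0.0, 0.2, 0.9],   # "apple"
}


def most_similar(word, vectors):
    """Return the other word whose vector has the highest cosine similarity."""
    target = vectors[word]
    candidates = (w for w in vectors if w != word)
    return max(candidates, key=lambda w: cosine_similarity(target, vectors[w]))


print(most_similar("کتاب", toy_vectors))
```

With these toy numbers, "book" ends up closer to "notebook" than to "apple", which is the kind of relation a trained embedding is expected to capture.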








So what now?

It may not sound like much, but this may be the only place where you can find all the word embedding models for Persian, either to train yourself or to download as pretrained versions. More importantly, I collected the best current models (as of 2019) and made a Lite version of each, so you can use them in a JS, Android, C#, or other application without relying on an online API.

Important note: some models already have a pretrained version for Persian, so for those I only made the Lite version.
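
The README does not say how the Lite versions are produced. One simple possibility, shown here only as an assumption, is to keep the first N words of a text-format .vec file. This sketch assumes the usual layout (a header line with the word count and dimension, then one word per line followed by its values) and that words are listed from most to least frequent. The function name `make_lite_vec` and the file names in the usage note are hypothetical.

```python
from itertools import islice


def make_lite_vec(src_path, dst_path, keep):
    """Write a smaller .vec file holding only the first `keep` word vectors.

    Assumes the first line is "<word_count> <dimension>" and that words are
    ordered by descending frequency, so the kept words are the most common.
    Returns (number_of_words_written, dimension).
    """
    with open(src_path, encoding="utf-8") as src:
        header = src.readline().split()
        dim = int(header[1])
        lines = list(islice(src, keep))  # at most `keep` vector lines

    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(f"{len(lines)} {dim}\n")  # header reflects the new size
        dst.writelines(lines)

    return len(lines), dim
```

For example, `make_lite_vec("Model.vec", "Model_Lite.vec", 50000)` would keep the 50,000 most frequent words. Words outside that set simply have no vector in the Lite file, which is the trade-off for the smaller size.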

Corpus? Wikipedia!




Where to Download Wikipedia Corpus?

Wikipedia publishes the dump (backup) status for each language and lists the dump versions available for Persian Wikipedia. Choose "latest", since we want the newest version, and download fawiki-latest-pages-articles-multistream.xml.bz2 from the file list.
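
Once the file is downloaded, the articles can be read without unpacking the whole archive. The following is a minimal sketch using only the Python standard library. It assumes the standard MediaWiki export layout (`page` elements containing a `title` and a revision `text`). The function name `iter_article_texts` is hypothetical and is not part of this repository.

```python
import bz2
import xml.etree.ElementTree as ET


def _local_name(tag):
    """Drop the XML namespace prefix, e.g. '{...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_article_texts(dump_path):
    """Yield (title, wikitext) pairs from a bz2-compressed MediaWiki XML dump."""
    with bz2.open(dump_path, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if _local_name(elem.tag) != "page":
                continue
            title, text = None, ""
            for node in elem.iter():
                name = _local_name(node.tag)
                if name == "title":
                    title = node.text
                elif name == "text":
                    text = node.text or ""
            yield title, text
            elem.clear()  # release the page's children to limit memory use
```

The yielded text is raw wiki markup, so templates, links, and tables still need to be cleaned before training a model on it.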

Here are some other corpora to download




Models

  • FastText: an open-source, free, lightweight library for learning text representations and text classifiers. It runs on standard, generic hardware, and models can later be reduced in size to fit even on mobile devices.

Project status

  • Google’s Universal Sentence Encoder (this one is not publicly available)
    • Train.py
    • Train.ipynb
    • Model.bin
    • Model_Lite.bin
    • Model.vec
    • Model_Lite.vec
  • FastText
    • Train.py
    • Train.ipynb
    • Model.bin
    • Model_Lite.bin
    • Model.vec
    • Model_Lite.vec
  • ELMo
    • Train.py
    • Train.ipynb
    • Model.bin
    • Model_Lite.bin
    • Model.vec
    • Model_Lite.vec
  • Word2Vec
    • Train.py
    • Train.ipynb
    • Model.bin
    • Model_Lite.bin
    • Model.vec
    • Model_Lite.vec
  • GloVe
    • Train.py
    • Train.ipynb
    • Model.bin
    • Model_Lite.bin
    • Model.vec
    • Model_Lite.vec

About

License: MIT License


Languages

Language: Python 100.0%