LSTM_Projects

Implementation of some fun projects with the LSTM (long short-term memory) architecture, using the PyTorch library.

(figures: LSTM architecture diagrams)

Persian name generator

In this project, I've tried to generate Persian names with a character-level RNN (LSTM). I've used "دیتاست-اسامی-نام-های-فارسی.csv" (a dataset of Persian first names) as the names dataset; it contains 4,055 names written in Persian.
The network generates up to ten characters in a row, so the maximum name length is 10.
I've trained the network once with the embedding-vector method and once with the one-hot-vector method. Finally, to generate new names, you specify the first few characters and K, where K is the number of top predictions kept at each time step; the model randomly selects one of these predictions. A rough sketch of that sampling loop is shown below.
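
As a rough illustration of the top-K sampling procedure described above, here is a minimal PyTorch sketch. The CharLSTM model, the char2idx/idx2char mappings, and the '<eos>' token are illustrative assumptions, not the repository's actual code.

```python
# Minimal sketch of character-level name generation with top-K sampling.
# Model definition and vocabulary mappings are assumptions for illustration.
import torch
import torch.nn as nn


class CharLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        out, state = self.lstm(self.embed(x), state)
        return self.fc(out), state


def generate_name(model, prefix, char2idx, idx2char, k=3, max_len=10):
    """Feed the given prefix, then repeatedly pick one of the top-K
    predicted characters at random until <eos> or max_len is reached."""
    model.eval()
    chars = list(prefix)
    state = None
    x = torch.tensor([[char2idx[c] for c in prefix]])
    with torch.no_grad():
        while len(chars) < max_len:
            logits, state = model(x, state)
            probs = torch.softmax(logits[0, -1], dim=-1)
            top_p, top_i = probs.topk(k)                     # K best next characters
            idx = top_i[torch.multinomial(top_p, 1)].item()  # sample one of them
            if idx2char[idx] == '<eos>':                     # assumed end-of-name token
                break
            chars.append(idx2char[idx])
            x = torch.tensor([[idx]])
    return ''.join(chars)
```

For example, a call such as generate_name(model, prefix, char2idx, idx2char, k=3) would extend the given starting characters one sampled character at a time, up to the ten-character limit.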

You can see how to train and test the model in the figures below:

(screenshots: model training and name-generation results)

Emojify

Emojify is similar to emotion classification, except that we describe sentences with emojis, i.e., ❤️, ⚾, 😄, 😞 and 🍴.
As in the project above, I've used an LSTM model and trained it once with the embedding-vector method and once with the one-hot-vector method. (Using the embedding-vector method is essential here, because the vocabulary size is 400,000; the one-hot-vector method needs far more resources for training and ultimately performs worse on vocabulary that did not appear in the training examples.)
There are only 132 sentences for training the models and 56 sentences for testing them.
"glove.6B.50d.txt" is the word embedding I've used in this project; it was pre-trained on large corpora by GloVe. It transforms every word index into a 50-dimensional embedding vector. You have to download this file from here and put it in the "glove" folder.

You can see the model architecture in the figure below:

(screenshots: Emojify model architecture and results)

Neural machine translation

In the last project, I've built a neural machine translation (NMT) model to translate human-readable dates ("25th of June, 2009") into machine-readable dates ("2009-06-25").
I've implemented this NMT once with an attention model and once with a simple sequence-to-sequence model. You can see a clear difference between the models' results, and the attention weights show why the attention model performs much better.
The model I've built here could be adapted to translate from one language to another, such as from English to Persian. (However, language translation requires massive datasets and usually takes days of training on GPUs.)
I generated 11,000 examples: 10,000 for training and 1,000 for testing the models. A rough sketch of an attention step is given below.
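
As a rough illustration of how attention weights the encoder states at each decoding step, here is a minimal additive-attention sketch in PyTorch. The layer sizes and names are illustrative assumptions, not the repository's actual architecture.

```python
# Minimal sketch of one attention step over the encoder outputs.
import torch
import torch.nn as nn


class Attention(nn.Module):
    def __init__(self, enc_dim=64, dec_dim=128, attn_dim=32):
        super().__init__()
        # Small MLP that scores each encoder position against the decoder state.
        self.score = nn.Sequential(
            nn.Linear(enc_dim + dec_dim, attn_dim),
            nn.Tanh(),
            nn.Linear(attn_dim, 1),
        )

    def forward(self, enc_outputs, dec_hidden):
        # enc_outputs: (batch, src_len, enc_dim), dec_hidden: (batch, dec_dim)
        src_len = enc_outputs.size(1)
        repeated = dec_hidden.unsqueeze(1).expand(-1, src_len, -1)
        energies = self.score(torch.cat([enc_outputs, repeated], dim=-1))  # (batch, src_len, 1)
        weights = torch.softmax(energies, dim=1)        # attention weight per source position
        context = (weights * enc_outputs).sum(dim=1)    # weighted sum -> (batch, enc_dim)
        return context, weights.squeeze(-1)


# Example: attend over 30 input characters ("25th of June, 2009", padded)
# before predicting each of the 10 output characters ("2009-06-25").
attn = Attention()
context, w = attn(torch.randn(1, 30, 64), torch.randn(1, 128))
```

The returned weights are what the attention-weight visualizations below display: for each predicted output character, how strongly the model attended to each input character.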

You can see the architectures of the two models (attention & sequence-to-sequence) in the figures below:

(screenshots: sequence-to-sequence model architecture and results; attention model architecture and results)

Attention weights for each character of the predicted output:

(figures: attention-weight visualizations)

Inspiration

Coursera course by Andrew Ng
