ravikg / deep-learning-nlp-kaggle-toxic-comment

Tutorial for Deep Learning for NLP on Kaggle's Toxic Comment Classification Challenge Dataset

Deep Learning for NLP

This tutorial introduces the use of deep learning algorithms in natural language processing (NLP).

It was prepared using content (theory and code) from the following sources:

  1. Deep Learning with Python, Book by François Chollet
  2. Neural Network Methods in Natural Language Processing, Book by Yoav Goldberg
  3. CS224d: Deep Learning for Natural Language Processing

The practice code uses Kaggle's Toxic Comment Classification Challenge dataset.

Table of Contents

  1. Use Cases

    1. Sequence classification
      1. Language detection
      2. Category classification (Sentiment, topics etc.)
      3. Keyword classification (name-gender, place/person name)
    2. Sequence to sequence (Seq2Seq)
      1. Translation
      2. Gmail smart reply
      3. Conversational AI: Chat bots
    3. Others
      1. Name, story, poem, and dialog generation
      2. Image captioning
      3. Part of speech tagging
      4. Named entity recognition
  2. System Setup

    1. Python 3.6
    2. pip
    3. Virtualenv
    4. Libraries:
      • Keras
      • Tensorflow
      • Jupyter
      • matplotlib
  3. Datasets to play with

    1. IMDB review dataset
    2. Kaggle (Toxic comment classification challenge) Wikipedia comment dataset
    3. Ubuntu dialog corpora
    4. Translation dataset
    5. Other datasets
  4. Data Analysis

    1. General Analysis
  5. Sequence Representation

    1. Representation
    2. One Hot Encoding
    3. Word Embeddings
      • Pre-trained embeddings
        • Word2vec
        • GloVe
  6. Models

    1. Embedding to Class: Model 1
    2. Embedding connected to a 1-layer RNN (Recurrent Neural Network): Model 2 and Model 2 Extended
    3. Bidirectional RNN: Model 3 and Model 3 Extended
  7. Modern RNN architecture

    1. Long short-term memory (LSTM)
    2. Gated Recurrent Unit (GRU)
    3. Seq2seq
    4. Attention
    5. Beam Search
  8. Keras

    1. API & keywords
      1. Optimizers
      2. Loss
      3. Activation
      4. Metrics
    2. Deploy model to production and inference
  9. Model optimization techniques

    1. Dropout
    2. Truncated backpropagation through time (TBPTT)
    3. Vanishing Gradient Problem
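
As a taste of the "One Hot Encoding" topic listed under Sequence Representation, here is a minimal sketch in plain Python. It is not taken from the tutorial notebooks: the helper names `build_vocab` and `one_hot_encode`, the whitespace tokenizer, and the choice to reserve index 0 for unknown words are all assumptions made for illustration.

```python
def build_vocab(texts):
    """Map each distinct lowercase token to an integer index.

    Index 0 is reserved for unknown tokens (an assumption of this sketch).
    """
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab) + 1
    return vocab


def one_hot_encode(text, vocab, max_len):
    """Return a max_len x (len(vocab) + 1) matrix with one 0/1 row per token.

    Texts longer than max_len are truncated; shorter texts are padded
    with all-zero rows.
    """
    width = len(vocab) + 1
    tokens = text.lower().split()[:max_len]
    matrix = [[0] * width for _ in range(max_len)]
    for pos, token in enumerate(tokens):
        matrix[pos][vocab.get(token, 0)] = 1
    return matrix


# Hypothetical toy comments, not drawn from the Kaggle dataset.
comments = ["you are great", "you are rude"]
vocab = build_vocab(comments)
encoded = one_hot_encode("you are rude", vocab, max_len=4)
```

Each row of `encoded` has a single 1 at the index of the token in that position, which is why one-hot vectors grow with the vocabulary size; word embeddings, the next topic in the outline, replace them with short dense vectors.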

Languages

Jupyter Notebook 99.9%, Dockerfile 0.1%