gallupliu / CNN-Text-Pairs-Classification

About Text Pairs (Sentence Level) Classification (Similarity Modeling) Based on CNN.

Convolutional Neural Networks for Text Pairs Classification

This project was built for my bachelor's graduation project; it also serves as a study of TensorFlow and deep learning (CNN, RNN, LSTM, etc.).

The main objective of the project is to determine, using Convolutional Neural Networks, whether two given sentences have similar meanings (a binary classification problem).
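
As a rough illustration of the setup described above, the sketch below runs a forward pass of a two-branch CNN over a pair of tokenized sentences using NumPy only. Everything in it is an assumption made for illustration: the names `conv_max_pool`, `make_params`, and `similarity_probability`, the filter width, the layer sizes, and the choice to concatenate the two branch outputs are hypothetical and are not taken from the project's code, which is written in TensorFlow.

```python
import numpy as np


def conv_max_pool(embedded, filters, bias):
    """Convolve over the token axis, apply ReLU, then max-pool over time.

    embedded: (seq_len, embed_dim) word vectors of one sentence.
    filters:  (num_filters, width, embed_dim); seq_len must be >= width.
    Returns a (num_filters,) feature vector.
    """
    width = filters.shape[1]
    seq_len = embedded.shape[0]
    # One window per valid position: (positions, width, embed_dim).
    windows = np.stack([embedded[i:i + width] for i in range(seq_len - width + 1)])
    # Contract the width and embedding axes against every filter: (positions, num_filters).
    activations = np.tensordot(windows, filters, axes=([1, 2], [1, 2])) + bias
    return np.maximum(activations, 0.0).max(axis=0)


def make_params(vocab_size=20, embed_dim=8, num_filters=4, width=3, seed=0):
    """Random, untrained parameters (hypothetical sizes, for illustration only)."""
    rng = np.random.default_rng(seed)
    return {
        "embedding": rng.normal(scale=0.1, size=(vocab_size, embed_dim)),
        "filters_a": rng.normal(scale=0.1, size=(num_filters, width, embed_dim)),
        "bias_a": np.zeros(num_filters),
        "filters_b": rng.normal(scale=0.1, size=(num_filters, width, embed_dim)),
        "bias_b": np.zeros(num_filters),
        "w_out": rng.normal(scale=0.1, size=2 * num_filters),
        "b_out": 0.0,
    }


def similarity_probability(ids_a, ids_b, params):
    """Return the model's probability that the two sentences are similar."""
    emb = params["embedding"]
    # Each sentence goes through its own convolutional branch.
    feat_a = conv_max_pool(emb[ids_a], params["filters_a"], params["bias_a"])
    feat_b = conv_max_pool(emb[ids_b], params["filters_b"], params["bias_b"])
    # Assumed combination step: concatenate, then a single sigmoid output unit.
    combined = np.concatenate([feat_a, feat_b])
    logit = combined @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit))


params = make_params()
sentence_a = np.array([1, 2, 3, 4, 5, 6])   # token ids of the first sentence
sentence_b = np.array([7, 8, 9, 10, 11])    # token ids of the second sentence
prob = similarity_probability(sentence_a, sentence_b, params)
print(f"P(similar) = {prob:.3f}")
```

With untrained weights the output is close to 0.5; training would fit the parameters so that similar pairs score near 1 and dissimilar pairs near 0.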

The project is based on dennybritz/cnn-text-classification-tf. It extends the data helper to support Chinese (a task requirement) and modifies the network structure to fit my task.

Requirements

  • Python 3.x
  • TensorFlow 1.0.0+
  • NumPy
  • Gensim

Data

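The research data may be protected by copyright under Chinese law, so this repository contains only code.

The experimental data belongs to a joint project between the lab and a company and involves trade secrets, so it is not provided here. Thank you for your understanding.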

Pre-trained Word Vectors

The word vectors are pre-trained on my data with the gensim package.

Network Structure

Innovation

  1. Make the data helper support both Chinese and English (using gensim makes this easy).
  2. Support using your own pre-trained word vectors.
  3. Design two subnetworks to meet the task requirements.
  4. Add a new Highway Layer.
  5. Add the AUC performance measure, since the data is imbalanced.
  6. Support either training the model directly or restoring it from a checkpoint.
  7. Add model test code.
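
Two of the items above, the Highway Layer and the AUC measure, can be sketched in plain NumPy. This is not the project's TensorFlow code: the function names `highway_layer` and `roc_auc`, the ReLU transform, and the pairwise AUC computation are assumptions chosen to keep the example short. The highway layer follows the usual form y = T(x) * H(x) + (1 - T(x)) * x, and the AUC is computed as the fraction of positive/negative pairs that the scores rank correctly.

```python
import numpy as np


def highway_layer(x, w_h, b_h, w_t, b_t):
    """Highway layer: y = T(x) * H(x) + (1 - T(x)) * x.

    x: (batch, dim); w_h and w_t: (dim, dim); b_h and b_t: (dim,).
    H is a ReLU transform and T is a sigmoid gate that decides how much
    of the transformed signal replaces the unchanged input.
    """
    h = np.maximum(x @ w_h + b_h, 0.0)
    t = 1.0 / (1.0 + np.exp(-(x @ w_t + b_t)))
    return t * h + (1.0 - t) * x


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a correct pair. Compares every positive with every
    negative, so it is only suitable for small evaluation sets.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("AUC needs at least one positive and one negative example")
    diff = pos[:, None] - neg[None, :]
    return float(((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size)


# Imbalanced toy data: nine negatives, one positive, and a constant score.
labels = [0] * 9 + [1]
constant_scores = [0.0] * 10
accuracy = float(np.mean(np.asarray(constant_scores) == np.asarray(labels)))
print(accuracy, roc_auc(labels, constant_scores))
```

The toy data shows why AUC suits imbalanced data: a classifier that always predicts the majority class reaches 90% accuracy here, while its AUC stays at chance level (0.5).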

About Me

黄威,Randolph

SCU SE Bachelor; USTC CS Master

Email: chinawolfman@hotmail.com

My Blog: randolph.pro

LinkedIn: randolph's linkedin

License:Apache License 2.0


Languages

Language:Python 100.0%