connormeaton / CNN_for_text_features

A CNN for feature generation from text roughly based on Kim (2014) in Keras

CNN_for_text_features

This repo is a Keras implementation of a CNN for generating features from text, roughly based on Kim (2014). The goal of this work is to transform new data into the format expected by the DialogueGCN model from this repo.

Model overview:

The model works as follows:

  • Convert text data to int tokens
  • Pass tokens into word2vec embedding
  • Dropout
  • Extract features from the sequences using three parallel convolutional layers (filter sizes 2, 4, and 5)
  • 1D max pooling
  • Flatten
  • Concatenate the tensors from each convolution
  • Dropout
  • Output tensor
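
The steps above can be sketched in Keras roughly as follows. This is a minimal illustration, not the code in this repo: the function name `build_text_cnn` and every default value (`vocab_size`, `embed_dim`, `seq_len`, `num_filters`, `dropout_rate`) are assumptions chosen for the example, and the embedding is randomly initialised unless a word2vec matrix is passed in as `embedding_matrix`. Pooling over the full convolution output (max-over-time, as in Kim 2014) is also an assumption about how the "1D max pooling" step is configured.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def build_text_cnn(vocab_size=1000, embed_dim=50, seq_len=30,
                   num_filters=16, kernel_sizes=(2, 4, 5),
                   dropout_rate=0.5, embedding_matrix=None):
    """Build a feature extractor: int tokens in, one feature vector out."""
    tokens = keras.Input(shape=(seq_len,), dtype="int32")

    # Embedding lookup. If a (vocab_size, embed_dim) word2vec matrix is
    # supplied, use it as the initial weights; otherwise start random.
    if embedding_matrix is not None:
        init = keras.initializers.Constant(embedding_matrix)
    else:
        init = "uniform"
    x = layers.Embedding(vocab_size, embed_dim,
                         embeddings_initializer=init)(tokens)
    x = layers.Dropout(dropout_rate)(x)

    # One convolution branch per filter size, each followed by max pooling
    # over the whole sequence and a flatten.
    branches = []
    for k in kernel_sizes:
        conv = layers.Conv1D(num_filters, k, activation="relu")(x)
        # With "valid" padding the conv output has seq_len - k + 1 steps,
        # so this pool size reduces each filter to a single value.
        pooled = layers.MaxPooling1D(pool_size=seq_len - k + 1)(conv)
        branches.append(layers.Flatten()(pooled))

    # Concatenate the branches and apply the final dropout.
    merged = layers.Concatenate()(branches)
    features = layers.Dropout(dropout_rate)(merged)
    return keras.Model(tokens, features)


model = build_text_cnn()
batch = np.random.randint(0, 1000, size=(4, 30))  # 4 fake token sequences
features = model.predict(batch, verbose=0)
print(features.shape)  # (4, 48): 3 branches x 16 filters
```

Under these assumptions, the width of the output tensor is `num_filters` times the number of filter sizes, so the feature dimension can be changed by adjusting `num_filters`.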

Updates:

  • 4/02/2020: The model now runs on SPAFF data; however, some questions remain:
    • How should the data be tokenized? Should the word:int mapping be conversation-wide or corpus-wide?
    • Currently, the output tensor has shape [5290, 50]. This is problematic, as I believe the DialogueGCN model is built to accept [n, n, 100]. Work in progress here...
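
To make the tokenization question concrete, here is a small standard-library sketch comparing the two mappings. The helpers `build_vocab` and `encode`, the `<pad>`/`<unk>` ids, the whitespace tokenizer, and the toy conversations are all hypothetical and are not taken from this repo or the SPAFF data.

```python
def build_vocab(utterances):
    """Assign each new word the next free integer id."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for utterance in utterances:
        for word in utterance.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(utterance, vocab, seq_len):
    """Map words to ids, then truncate or right-pad to seq_len."""
    ids = [vocab.get(w, vocab["<unk>"]) for w in utterance.lower().split()]
    ids = ids[:seq_len]
    return ids + [vocab["<pad>"]] * (seq_len - len(ids))


# Toy data: two conversations, each a list of utterances.
conversations = [
    ["how was your day", "it was fine"],
    ["your day was long"],
]

# Corpus-wide: one mapping shared by every conversation.
corpus_vocab = build_vocab(u for conv in conversations for u in conv)

# Conversation-wide: a separate mapping per conversation.
conv_vocabs = [build_vocab(conv) for conv in conversations]

print(encode("your day", corpus_vocab, 4))
# [4, 5, 0, 0]
print([encode("your day", v, 4) for v in conv_vocabs])
# [[4, 5, 0, 0], [2, 3, 0, 0]]
```

With per-conversation mappings the same words receive different ids in different conversations. Since an embedding layer looks up one row per id, a single corpus-wide mapping is the option that keeps each row tied to one word across the whole dataset.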

License:MIT License


Languages

Language:Python 100.0%