liangkai / CDSSM

CDSSM implementation in torch

Note: Still under development.

##Implementation of (C)DSSM in Torch

##Dependencies:

  • torch
  • tds (DSSM dense)

##Data Preprocessing

###Related Functions:

  • Batch.lua
  • WordHash.lua
  • ComputelogPD.lua
  • Preprocess.lua
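The file name WordHash.lua suggests the word hashing step of the DSSM paper, where each word is broken into letter trigrams. The sketch below illustrates that idea in plain Lua. It is only an assumption about what the repository does: `letterTrigrams` and `hashText` are hypothetical names, not functions from WordHash.lua.

```lua
-- Hypothetical helper, not taken from WordHash.lua.
-- Splits a word into letter trigrams after adding '#' boundary markers,
-- e.g. "good" -> "#go", "goo", "ood", "od#".
local function letterTrigrams(word)
  local marked = "#" .. word:lower() .. "#"
  local grams = {}
  for i = 1, #marked - 2 do
    grams[#grams + 1] = marked:sub(i, i + 2)
  end
  return grams
end

-- Hypothetical helper: counts the trigrams of every whitespace-separated
-- word in a text, giving a sparse bag-of-trigrams representation.
local function hashText(text)
  local counts = {}
  for word in text:gmatch("%S+") do
    for _, gram in ipairs(letterTrigrams(word)) do
      counts[gram] = (counts[gram] or 0) + 1
    end
  end
  return counts
end

print(table.concat(letterTrigrams("good"), " "))
```

Trigram counts keep the input dimension bounded by the number of distinct trigrams rather than the number of distinct words, which is the usual motivation for word hashing.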

##Training

1: Generate the data from the dataset. The data format follows the C# implementation: each query and document pair is on the same line, separated by a tab.

2: Generate the vocabulary for questions and answers using WordHash.Pair2Voc(). You should get output like this:

    Creating Voc file form ...
    srcVoc contains vocabulary: 5584
    tgtVoc contains vocabulary: 10876

3: Create the Pair2Seq features and save them to a txt file using WordHash.Pair2SeqFea().

4: Convert the sequence features to a binary file using WordHash.SeqFea2Bin(). The batch size is set at this step and cannot be changed after you train the model; for the original data, the batch size is 1024. See the function for more information.
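The four steps above can be sketched in plain Lua as follows. This is a minimal illustration of the data flow (tab-separated pairs, separate source and target vocabularies, id sequences, fixed-size batches), not the code in WordHash.lua. All helper names are hypothetical, whitespace tokens stand in for whatever units the real vocabulary uses, and the sample lines are made up.

```lua
-- Hypothetical helpers for illustration; names and formats are assumptions.

-- Step 1: split one "query<TAB>document" line into its two fields.
local function splitPair(line)
  local query, doc = line:match("^([^\t]*)\t(.*)$")
  return query, doc
end

-- Step 2: add every whitespace-separated token of `text` to `voc`,
-- assigning ids in order of first appearance.
local function addToVoc(voc, text)
  for token in text:gmatch("%S+") do
    if not voc.ids[token] then
      voc.size = voc.size + 1
      voc.ids[token] = voc.size
    end
  end
end

-- Step 3: map a text to a sequence of vocabulary ids (unknown tokens skipped).
local function toSeq(voc, text)
  local seq = {}
  for token in text:gmatch("%S+") do
    local id = voc.ids[token]
    if id then seq[#seq + 1] = id end
  end
  return seq
end

-- Step 4: group items into consecutive batches of at most `batchSize` items.
local function toBatches(items, batchSize)
  local batches = {}
  for i = 1, #items, batchSize do
    local batch = {}
    for j = i, math.min(i + batchSize - 1, #items) do
      batch[#batch + 1] = items[j]
    end
    batches[#batches + 1] = batch
  end
  return batches
end

-- Made-up sample data: one query/document pair per line.
local lines = {
  "cheap flights\tbook cheap flights online",
  "torch tutorial\tgetting started with torch",
  "lua tables\tprogramming in lua tables",
}

local srcVoc = { ids = {}, size = 0 }
local tgtVoc = { ids = {}, size = 0 }
local pairList = {}
for _, line in ipairs(lines) do
  local query, doc = splitPair(line)
  addToVoc(srcVoc, query)
  addToVoc(tgtVoc, doc)
  pairList[#pairList + 1] = { query = query, doc = doc }
end

local features = {}
for _, p in ipairs(pairList) do
  features[#features + 1] = {
    src = toSeq(srcVoc, p.query),
    tgt = toSeq(tgtVoc, p.doc),
  }
end

-- A batch size of 2 is used here only to keep the example small.
local batches = toBatches(features, 2)
print("srcVoc contains vocabulary: " .. srcVoc.size)
print("tgtVoc contains vocabulary: " .. tgtVoc.size)
print("number of batches: " .. #batches)
```

Because the batches are written out at a fixed size, a model trained on them expects the same size later, which is consistent with the note in step 4.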

###Related functions

  • (Data Provider): BatchSample.lua, SequenceInputStream.lua, PairInputStream.lua
  • (Model): DSSM_Train, DSSM_MMI_Criterion.lua
  • (Training): th train.lua
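For orientation, the DSSM paper trains the model by scoring a query against one positive and several negative documents with cosine similarity, then minimizing the negative log of the softmax posterior of the positive document. The plain-Lua sketch below shows that loss for a single query. Whether DSSM_MMI_Criterion.lua computes exactly this is an assumption; `cosine`, `pairLoss`, and the `gamma` smoothing factor are hypothetical names chosen for the example.

```lua
-- Hypothetical sketch of a DSSM-style loss; not taken from the repository.

-- Cosine similarity between two dense vectors of equal length.
local function cosine(a, b)
  local dot, na, nb = 0, 0, 0
  for i = 1, #a do
    dot = dot + a[i] * b[i]
    na = na + a[i] * a[i]
    nb = nb + b[i] * b[i]
  end
  -- The small constant avoids division by zero for all-zero vectors.
  return dot / (math.sqrt(na) * math.sqrt(nb) + 1e-12)
end

-- Negative log posterior of the positive document, which is assumed to be
-- docs[1]; the remaining entries are negatives. `gamma` scales the cosine
-- scores before the softmax.
local function pairLoss(query, docs, gamma)
  local scores, maxScore = {}, -math.huge
  for i, d in ipairs(docs) do
    scores[i] = gamma * cosine(query, d)
    if scores[i] > maxScore then maxScore = scores[i] end
  end
  -- Subtract the maximum before exponentiating for numerical stability.
  local sum = 0
  for i = 1, #scores do
    sum = sum + math.exp(scores[i] - maxScore)
  end
  return -(scores[1] - maxScore - math.log(sum))
end

local query = { 1, 0 }
local good = pairLoss(query, { { 1, 0 }, { 0, 1 } }, 10) -- positive matches
local bad = pairLoss(query, { { 0, 1 }, { 1, 0 } }, 10)  -- positive mismatches
print(good < bad)
```

The loss is small when the positive document is the closest one to the query and grows as a negative document scores higher.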

##Predicting

1: Generate the feature file; refer to PreProcess.lua for details.

  • Preprocessing:

  • (Predict): th predict.lua

##To-do List

  • Test the cu implementation of sparse Linear.
  • Write the cu implementation of sparse convolution.
