joschu / cgt

Computation Graph Toolkit


Support for ragged arrays and sparse matrix/vectors

mjwillson opened this issue · comments

Two things which theano doesn't really do, and which would be really useful for sequential data and NLP applications, perhaps enough to make me take the plunge :)

In theano, ragged arrays require workarounds with padding and masking, which, aside from being quite ugly and making the code less intuitive, can also hurt performance unless you do a bunch of extra preprocessing to lump sequences of similar lengths together in minibatches.
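For concreteness, here is a minimal numpy sketch of the padding-and-masking workaround described above (the data and shapes are just illustrative, not anything from cgt or theano): every sequence gets padded to the batch maximum, and a boolean mask has to be threaded through any subsequent reduction.

```python
import numpy as np

# A hypothetical minibatch of variable-length (ragged) sequences.
seqs = [[1, 2, 3], [4, 5], [6]]

max_len = max(len(s) for s in seqs)
# Pad every sequence out to the longest length in the batch...
padded = np.zeros((len(seqs), max_len), dtype=np.float64)
# ...and keep a mask recording which entries are real vs. padding.
mask = np.zeros((len(seqs), max_len), dtype=bool)
for i, s in enumerate(seqs):
    padded[i, :len(s)] = s
    mask[i, :len(s)] = True

# Every reduction over time must now be masked by hand, e.g. a
# per-sequence mean over only the valid timesteps:
means = (padded * mask).sum(axis=1) / mask.sum(axis=1)
```

Note that the short sequences still pay for the full `max_len` worth of computation, which is why bucketing sequences of similar length into the same minibatch becomes necessary for performance.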

Sparse vectors and matrices are also very useful, and something that theano has at best second-class support for. For the common case of neural models with dense weight matrices, sparse-by-dense dot products are probably the most useful thing to implement with efficient sparse gradients. Common operations in NLP neural models can be seen as sparse-by-dense dot products: e.g. a lookup table (sparse one-hot vector by dense embedding matrix), or a "continuous bag of words" sum of word embeddings (sparse count vector by dense embedding matrix). Noise-contrastive estimation (useful for large softmax output layers) also relies for its speed advantage on efficiently backpropagating a sparse error vector from the output layer.
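To illustrate the equivalences claimed above, here is a small scipy.sparse sketch (plain numpy/scipy, not cgt or theano API; the vocabulary size, embedding matrix `E`, and indices are made up). It shows a lookup table and a CBOW sum both expressed as sparse-by-dense products, and why the gradient with respect to the dense matrix is itself sparse.

```python
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(0)
vocab, dim = 5, 3
E = rng.standard_normal((vocab, dim))  # dense embedding matrix

# Lookup table: a one-hot sparse row vector times E selects one row of E.
one_hot = sp.csr_matrix(([1.0], ([0], [2])), shape=(1, vocab))
lookup = one_hot @ E  # equals E[2]

# CBOW: a sparse count vector times E sums the embeddings of the
# words present, weighted by their counts.
counts = sp.csr_matrix(([2.0, 1.0], ([0, 0], [1, 3])), shape=(1, vocab))
cbow = counts @ E  # equals 2*E[1] + E[3]

# Backprop: for y = x @ E, the gradient w.r.t. E is x.T @ g, where g is
# the upstream gradient. Only the rows for words that actually occurred
# carry nonzero gradient, so a sparse update touches just those rows.
g = np.ones((1, dim))
grad_E = counts.T @ g  # nonzero only in rows 1 and 3
```

The gradient structure in the last step is the whole point: with a vocabulary of hundreds of thousands of words, a dense gradient update to `E` would be vastly more expensive than updating the handful of rows the minibatch actually used.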