Disclaimer: obsolete! See the new one here
Implements the sparse (one-hot input) 1-dimensional (temporal) convolution defined in [1, 2].
For NLP tasks, the convolution is applied directly over the one-hot word vectors, so that the word embedding layer can be omitted, as shown in [1, 2].
Only the "narrow convolution" is supported, i.e., the sequence size is reduced after `forward()`.
The convolution stride must be 1.
The exposed module is essentially a wrapper over `nn.LookupTable`, so that:

- The input must be a Tensor of word indices pointing into the vocabulary.
- The gradient w.r.t. the input is unavailable (`gradInput` is a dummy variable) during `backward()` by default, since this saves a lot of training time when `gradInput` is not involved. Call the method `should_updateGradInput(flag)` to explicitly enable/disable it if `gradInput` is indeed desired/undesired. See the explanations below.
- Both CPU and GPU are supported, depending on `require'nn'` or `require'cunn'`.
Interfaces and tensor size layouts are consistent with `nn.TemporalConvolution`.
The terms of `nn.TemporalConvolution` are borrowed here and aliased as follows:

- `B` = batch size
- `M` = sequence length = `nInputFrame` = #words
- `V` = `inputFrameSize` = vocabulary size
- `C` = `outputFrameSize` = #output feature maps = #hidden units = embedding size
- `kW` = convolution kernel size = kernel width
## Install

Torch 7 is required.

Run the command

```
git clone https://github.com/pengsun/onehot-temp-conv
```

then cd to the directory and run the command

```
luarocks make
```

The lib will then be installed to your Torch 7 directory. You can delete the git-cloned source directory `onehot-temp-conv` if you like.

## Usage

After installation, running `require'onehot-temp-conv'` will add the following classes to the `nn` namespace:
`module = nn.OneHotTemporalConvolution(inputFrameSize, outputFrameSize, kW [, opt])`

Applies a 1D convolution over an input sequence composed of `nInputFrame` frames. The `input` tensor in `forward(input)` must be a 2D tensor of size `BatchSize x nInputFrame`, where each element is an index ranging from `1` to `inputFrameSize`.

The output will be of size `BatchSize x nOutputFrame x outputFrameSize`, where `nOutputFrame = nInputFrame - kW + 1`.

The parameters are the following:

- `inputFrameSize`: The input frame size expected in sequences given into `forward()`.
- `outputFrameSize`: The output frame size the convolution layer will produce.
- `kW`: The kernel width of the convolution; `kW <= nInputFrame` is required.
- `opt`: Options. A Lua table with the fields:
  - `hasBias`: Flag for whether the convolution has a bias. Defaults to `false`.

See also the examples below for the NLP aliases of these terms.
Example 1:

```lua
require'onehot-temp-conv'

B = 200 -- batch size
M = 45 -- sequence length (#words)
V = 12333 -- inputFrameSize (vocabulary size)
C = 300 -- outputFrameSize (#output feature maps, or embedding size)
kW = 5 -- convolution kernel size (width)

-- inputs: the one-hot vectors as indices in the set {1,2,...,V}. size: B, M
inputs = torch.LongTensor(B, M):apply(
  function (e) return math.random(1,V) end
)

-- the 1d conv module
tf = nn.OneHotTemporalConvolution(V, C, kW, {hasBias = true})

-- outputs: the dense tensor. size: B, M-kW+1, C
outputs = tf:forward(inputs)

-- back prop: the gradients w.r.t. parameters
gradOutputs = outputs:clone():normal()
tf:backward(inputs, gradOutputs)
```
Example 2 (gpu):

```lua
require'cunn'
require'onehot-temp-conv'

B = 200 -- batch size
M = 45 -- sequence length (#words)
V = 12333 -- inputFrameSize (vocabulary size)
C = 300 -- outputFrameSize (#output feature maps, or embedding size)
kW = 5 -- convolution kernel size (width)

-- inputs: the one-hot vectors as indices in the set {1,2,...,V}. size: B, M
inputs = torch.LongTensor(B, M):apply(
  function (e) return math.random(1,V) end
):cuda()

-- the 1d conv module
tf = nn.OneHotTemporalConvolution(V, C, kW, {hasBias = true}):cuda()

-- outputs: the dense tensor. size: B, M-kW+1, C
outputs = tf:forward(inputs)

-- back prop: the gradients w.r.t. parameters
gradOutputs = outputs:clone():normal()
tf:backward(inputs, gradOutputs)
```
`OneHotTemporalConvolution:should_updateGradInput(flag)`

Sets whether it should do `updateGradInput` (defaults to `false` at construction). `flag` must be `true` or `false`.
`OneHotTemporalConvolution:index_copy_weight(vocabIdxThis, convThat, vocabIdxThat)`

Copies the weight from another `OneHotTemporalConvolution` at the respective vocabulary indices. `vocabIdxThis` and `vocabIdxThat` must be `torch.LongTensor`s. `convThat` must also be an `OneHotTemporalConvolution`.

Suppose the weight sizes are

```
this: V1, C, p
that: V2, C, p
```

where the vocabulary sizes `V1` and `V2` can be different (but the number of output feature maps `C` and the region size `p` must be the same). Then calling the method would in effect do the copying

```
this(vocabIdxThis, :, :) = that(vocabIdxThat, :, :)
```
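As a sketch (the vocabulary sizes and word indices below are made up for illustration), copying the kernel weights of a few shared words from one module into another might look like:

```lua
require'onehot-temp-conv'

-- two convolutions over different vocabularies (V1 = 100, V2 = 250),
-- with the same outputFrameSize C = 20 and kernel width kW = 3
convThis = nn.OneHotTemporalConvolution(100, 20, 3)
convThat = nn.OneHotTemporalConvolution(250, 20, 3)

-- copy the weights of words 11, 42 in convThat's vocabulary
-- into the slots of words 5, 7 in convThis's vocabulary
vocabIdxThis = torch.LongTensor{5, 7}
vocabIdxThat = torch.LongTensor{11, 42}
convThis:index_copy_weight(vocabIdxThis, convThat, vocabIdxThat)
```

This can be handy, e.g., for initializing a model over a new vocabulary from a pre-trained one that shares some words.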
## Note on gradInput and updateGradInput()

As `OneHotTemporalConvolution` is usually used as the first layer, the `gradInput` is usually unnecessary, since it does not contribute to the parameter updates during training. That's why it defaults to a dummy. When you do need it, just call `should_updateGradInput(true)`, and `gradInput` will be available after calling `updateGradInput()` or `backward()`. Use it with caution, as `gradInput` is usually very large and demands much memory.
Example:

```lua
require'cunn'
require'onehot-temp-conv'

B = 1 -- batch size
M = 225 -- sequence length (#words)
V = 30*1000 -- inputFrameSize (vocabulary size)
C = 300 -- outputFrameSize (#output feature maps, or embedding size)
kW = 5 -- convolution kernel size (width)

-- inputs: the one-hot vectors as indices in the set {1,2,...,V}. size: B, M
inputs = torch.LongTensor(B, M):apply(
  function (e) return math.random(1,V) end
):cuda()

-- the 1d conv module
tf = nn.OneHotTemporalConvolution(V, C, kW):cuda()

-- outputs: the dense tensor. size: B, M-kW+1, C
outputs = tf:forward(inputs)

-- enable backprop for input
tf:should_updateGradInput(true)

-- back prop: the gradients w.r.t. parameters and inputs
gradOutputs = outputs:clone():normal()
gradInputs = tf:backward(inputs, gradOutputs)
-- size should be B x M x V
print(gradInputs:size())

-- disable the gradInput
tf:should_updateGradInput(false)
-- should be null
gradInputs2 = tf:backward(inputs, gradOutputs)
print(gradInputs2)
```
## Note on parameters

When you need the kernel weight and its gradient, call `self:parameters()` or `self:getParameters()`. Note that `OneHotTemporalConvolution` is derived from the container `nn.Sequential`.
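A minimal sketch of retrieving the flattened parameters (the sizes are arbitrary; assuming the weight layout `V, C, p` described above, with `hasBias = false` the flattened weight holds `V*C*kW` elements):

```lua
require'onehot-temp-conv'

V, C, kW = 100, 20, 3
tf = nn.OneHotTemporalConvolution(V, C, kW)

-- flattened kernel weights and the corresponding gradient buffer
params, gradParams = tf:getParameters()
print(params:size())
print(gradParams:size())
```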
`OneHotTemporalConvolution.share_weight(tOhConv1, tOhConv2)`

Examples:

```lua
require'onehot-temp-conv'

oh1 = nn.OneHotTemporalConvolution(10, 20, 2)
oh2 = nn.OneHotTemporalConvolution(10, 20, 3)
ct = nn.ConcatTable():add(oh1):add(oh2)

-- no sharing
params = ct:getParameters()
print(params:size())

-- share at kernel position 1
nn.OneHotTemporalConvolution.share_weights({oh1,oh2}, {1,1})
params = ct:getParameters()
print(params:size())

-- share at kernel positions 1, 2
nn.OneHotTemporalConvolution.share_weights({oh1,oh2}, {1,1})
nn.OneHotTemporalConvolution.share_weights({oh1,oh2}, {2,2})
params = ct:getParameters()
print(params:size())

-- convert to cuda
require'cunn'
ct:cuda()
params = ct:getParameters()
print(params:size())
```
`module = nn.OneHotTemporalConvolutionOnlyFP(ohConv)`

Derived from `nn.Module` and initialized from a (pre-trained) `nn.OneHotTemporalConvolution`. It only does forward propagation. Back propagation is a dummy in that 1) the parameters are invisible to the outer module, and 2) neither the `gradInput` nor the `gradParameters` are updated. This module should work as a "feature extractor".

Example:

```lua
require'nn'
require'onehot-temp-conv'

V, C, p = 33, 10, 2
ohConv = nn.OneHotTemporalConvolution(V, C, p)
fetext = nn.OneHotTemporalConvolutionOnlyFP(ohConv:float())

-- fprop
B, M = 12, 5
inputs = torch.LongTensor(B, M):apply(
  function (e) return math.random(1,V) end
):float()
outputs = fetext:forward(inputs)

-- cannot get the parameters
params, grad = fetext:getParameters()
assert(params:numel()==0 and grad:numel()==0)
```
An auxiliary class. Extends `nn.Narrow` in that:

- it interprets the input as indices into one-hot tensors
- its `updateGradInput()` can be turned off during `backward()`

An auxiliary class. Extends `nn.LookupTable` in that:

- it can do `updateGradInput()`, which is not implemented in `nn.LookupTable`
- its `updateGradInput()` can be turned off during `backward()`
## Reference

[1] Rie Johnson and Tong Zhang. Effective use of word order for text categorization with convolutional neural networks. NAACL-HLT 2015.

[2] Rie Johnson and Tong Zhang. Semi-supervised convolutional neural networks for text categorization via region embedding. NIPS 2015.