mask zero and activation in HATT
zhangsh950618 opened this issue
We used this code to build our project, but found that accuracy dropped. So we reviewed the code and found the following issues:
- The "AttLayer" class does not implement masking.
- We believe the Dense layer should be implemented inside the "AttLayer" class instead of being applied outside it.
- The Dense layer is missing an activation function.
After making the above changes, accuracy increased by 4-5 percent over the baseline on our task (text classification).
Here is our "AttLayer" class. Its input is the direct output of the GRU, without an additional Dense layer:
from keras import backend as K
from keras import initializers
from keras.layers import Layer


class AttLayer(Layer):
    def __init__(self, attention_dim):
        self.init = initializers.get('normal')
        self.supports_masking = True
        self.attention_dim = attention_dim
        super(AttLayer, self).__init__()

    def build(self, input_shape):
        assert len(input_shape) == 3
        # W and b form the dense projection; u is the context vector
        self.W = K.variable(self.init((input_shape[-1], self.attention_dim)))
        self.b = K.variable(self.init((self.attention_dim, )))
        self.u = K.variable(self.init((self.attention_dim, 1)))
        self.trainable_weights = [self.W, self.b, self.u]
        super(AttLayer, self).build(input_shape)

    def compute_mask(self, inputs, mask=None):
        return mask

    def call(self, x, mask=None):
        # size of x: [batch_size, seq_len, input_dim]
        # size of u: [attention_dim, 1]
        # uit = tanh(xW + b)
        uit = K.tanh(K.bias_add(K.dot(x, self.W), self.b))
        # one unnormalized score per timestep: [batch_size, seq_len]
        ait = K.dot(uit, self.u)
        ait = K.squeeze(ait, -1)
        ait = K.exp(ait)

        if mask is not None:
            # Cast the mask to floatX to avoid float64 upcasting in theano
            ait *= K.cast(mask, K.floatx())
        # normalize over the time axis so the weights sum to 1
        ait /= K.cast(K.sum(ait, axis=1, keepdims=True) + K.epsilon(), K.floatx())
        ait = K.expand_dims(ait)
        # weighted sum of the inputs over the time axis
        weighted_input = x * ait
        output = K.sum(weighted_input, axis=1)

        return output

    def compute_output_shape(self, input_shape):
        return (input_shape[0], input_shape[-1])
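To show why the mask matters, the normalization step in `call` can be reproduced outside Keras. This is only a plain-Python sketch for a single sequence, not the layer itself; the function name `masked_attention_weights` and the score values are made up for illustration.

```python
import math


def masked_attention_weights(scores, mask, eps=1e-7):
    """Softmax-style weights over `scores`, with masked timesteps forced to 0.

    scores: raw attention scores, one per timestep (the `ait` values before exp)
    mask:   1 for real tokens, 0 for padding
    """
    # exp(score) for real tokens, exactly 0 for padding
    exp_scores = [math.exp(s) * m for s, m in zip(scores, mask)]
    # eps keeps the division safe if every timestep is padding
    total = sum(exp_scores) + eps
    return [e / total for e in exp_scores]


# Hypothetical scores; the last two timesteps are padding.
scores = [0.5, 1.2, -0.3, 0.0, 0.0]
mask = [1, 1, 1, 0, 0]

masked = masked_attention_weights(scores, mask)
unmasked = masked_attention_weights(scores, [1] * len(scores))
```

Without the mask, the padded timesteps still receive a share of the attention (`unmasked[3]` and `unmasked[4]` are positive), which takes weight away from the real tokens. With the mask, that share is zero and the real tokens are renormalized among themselves.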
Thanks for the implementation!
Issues 2 and 3 that you mentioned are also covered in issue #24.
Could you open a pull request so that everyone can benefit?
Hi, I have implemented the new attention layer, but I get an error:
`
File "D:/Hierachical_2_imbd.py", line 227, in call
uit = K.tanh(K.bias_add(K.dot(x, self.w), self.b))
AttributeError: module 'keras.backend' has no attribute 'bias_add'`
Can someone help me?
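One possible cause, stated as an assumption rather than a diagnosis, is that the installed Keras version is older than the one in which `bias_add` was added to the backend. A hedged workaround is to fall back to plain addition, which broadcasts a 1-D bias over the last axis. The helper name `bias_add_compat` below is hypothetical and not part of Keras.

```python
def bias_add_compat(backend, x, bias):
    """Add `bias` to `x`, using backend.bias_add only when it is available.

    `backend` is expected to be the Keras backend module (usually imported
    as K). If it has no `bias_add`, fall back to `x + bias`, relying on the
    tensor library's broadcasting of a 1-D bias over the last axis.
    """
    if hasattr(backend, "bias_add"):
        return backend.bias_add(x, bias)
    return x + bias
```

Inside `call`, the projection line would then read `uit = K.tanh(bias_add_compat(K, K.dot(x, self.W), self.b))`.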
I have pushed a new version of this implementation; you can review the full code in my repo.
Thanks! I will check it
How can I extract the attention weights and identify the words that matter most for the classification? I read the latest update in your post, but I don't understand your approach.
The weights are not fixed. Don't confuse them with the context vector or the weights learned in the attention layer. You need to run a forward pass to derive the importance of sentences and words; different sentences and words will produce different results. Please read the paper.
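A rough illustration of that point, in plain Python rather than Keras: the learned parameters `W`, `b` and `u` stay fixed, but the attention weights come from applying them to the hidden states of one specific input. All values below are toy numbers invented for this sketch, not taken from a trained model, and `attention_weights` is a hypothetical helper.

```python
import math


def attention_weights(hidden, W, b, u):
    """Forward pass of the attention scoring for one sequence.

    hidden: list of timestep vectors (e.g. GRU outputs), each of length d
    W: d x a matrix, b: length-a bias, u: length-a context vector
    """
    scores = []
    for h in hidden:
        # u_it = tanh(h W + b)
        uit = [
            math.tanh(sum(h[i] * W[i][j] for i in range(len(h))) + b[j])
            for j in range(len(b))
        ]
        # unnormalized score = u_it . u
        scores.append(sum(x * y for x, y in zip(uit, u)))
    # softmax over the timesteps
    exp_scores = [math.exp(s) for s in scores]
    total = sum(exp_scores)
    return [e / total for e in exp_scores]


# Toy input: four words and made-up 2-dimensional hidden states.
words = ["the", "movie", "was", "terrible"]
hidden = [[0.1, 0.0], [0.4, 0.2], [0.0, 0.1], [0.9, 0.8]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
u = [1.0, 1.0]

weights = attention_weights(hidden, W, b, u)
# Words sorted from most to least attended for this particular input.
ranked = sorted(zip(words, weights), key=lambda p: p[1], reverse=True)
```

Feeding a different sentence through the same `W`, `b` and `u` gives a different ranking. In the real model, the same idea means running the trained network on each document and reading out the normalized `ait` values at the word and sentence levels.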