Most of the parameters (hidden_layer_size, embedding_size, etc.) are configured as given in the paper. We also contacted the paper's authors regarding the optimal hyperparameters such as m (margin) and lambda. All the parameters are listed in params.config under a heading named after the dataset.
For Eurlex:
input_size = 5000
output_size = 3993
embedding_size = 100
attention_layer_size = 25
encoder_layer_size = 1000
hidden_layer_size = 100
learning_rate = 1e-3
epoch_num = 1000
batch_size = 256
m = 10
lamda = 1
All the parameters can be found in the params.config file
Model details can be found in model/README.md. Three modules are used: Feature Embedding, Encoder, and Decoder. Batch normalization is added between the two FC layers. Initialisation is done with Kaiming initialisation. If the dataset is a text dataset, GloVe embeddings are used; the code for that can be found in construct_weight_matrix.py. Otherwise, the embeddings are initialised randomly. Net.py implements the rank-based autoencoder for multi-label classification; it defines the feature embedding layer, the encoder, and the decoder.
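The GloVe initialisation step above can be sketched as follows. This is a hypothetical sketch, not the actual construct_weight_matrix.py: the function name, file format handling, and the random fallback scale are assumptions.

```python
import numpy as np

def construct_weight_matrix(vocab, glove_path, embedding_size=100):
    """Build an Embedding weight matrix from pre-trained GloVe vectors.

    vocab: list of words whose position equals the feature index in the
    BoW vector. Words missing from GloVe keep a random initialisation.
    """
    # GloVe text format: one word per line, followed by its vector components.
    glove = {}
    with open(glove_path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            glove[parts[0]] = np.asarray(parts[1:], dtype=np.float32)

    # Random init (scale is an assumption) for out-of-vocabulary words.
    weights = np.random.normal(
        0.0, 0.1, (len(vocab), embedding_size)
    ).astype(np.float32)
    for i, word in enumerate(vocab):
        if word in glove:
            weights[i] = glove[word]
    return weights
```

The resulting matrix can be loaded into the Embedding layer (e.g. via `nn.Embedding.from_pretrained`) before training.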
Below is the model summary for the Eurlex data.
-----------------------------------------------------------------------
Layer (type) Input Shape Param #
=======================================================================
FeatureEmbedding-1 [256, 5000] 0
Embedding-2 [256, 5000] 500,000
Linear-3 [256, 5000, 100] 2,525
BatchNorm1d-4 [256, 5000, 25] 10,000
ReLU-5 [256, 5000, 25] 0
Linear-6 [256, 5000, 25] 2,600
Sigmoid-7 [256, 5000, 100] 0
Linear-8 [256, 100] 10,100
Sigmoid-9 [256, 100] 0
Encoder-10 [256, 3993] 0
Linear-11 [256, 3993] 3,195,200
BatchNorm1d-12 [256, 800] 1,600
ReLU-13 [256, 800] 0
Linear-14 [256, 800] 80,100
Sigmoid-15 [256, 100] 0
Decoder-16 [256, 100] 0
Linear-17 [256, 100] 80,800
BatchNorm1d-18 [256, 800] 1,600
Sigmoid-19 [256, 800] 0
Linear-20 [256, 800] 3,198,393
=======================================================================
Total params: 7,082,918
Trainable params: 7,082,918
Non-trainable params: 0
-----------------------------------------------------------------------
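The Encoder and Decoder parameter counts in the summary are consistent with plain MLPs of shape 3993 → 800 → 100 and 100 → 800 → 3993. The following is a minimal sketch consistent with those counts; the real definitions live in Net.py and may differ in detail.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Encodes the label vector y into y_hidden (3993 -> 800 -> 100)."""
    def __init__(self, output_size=3993, mid=800, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(output_size, mid),
            nn.BatchNorm1d(mid),   # batch norm between the two FC layers
            nn.ReLU(),
            nn.Linear(mid, hidden),
            nn.Sigmoid(),
        )

    def forward(self, y):          # y: [batch, output_size] label vector
        return self.net(y)         # y_hidden: [batch, hidden]

class Decoder(nn.Module):
    """Reconstructs label scores from a hidden code (100 -> 800 -> 3993)."""
    def __init__(self, output_size=3993, mid=800, hidden=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden, mid),
            nn.BatchNorm1d(mid),
            nn.Sigmoid(),
            nn.Linear(mid, output_size),
        )

    def forward(self, h):          # h: y_hidden (train) or x_hidden (test)
        return self.net(h)

def kaiming_init(m):
    # Kaiming initialisation for the FC layers, as mentioned above.
    if isinstance(m, nn.Linear):
        nn.init.kaiming_uniform_(m.weight)
```

With these shapes, the Encoder has 3,195,200 + 1,600 + 80,100 = 3,276,900 parameters and the Decoder 80,800 + 1,600 + 3,198,393 = 3,280,793, matching the rows in the summary.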
The Solver class takes the model object, the loss function, the parameters, and lambda as input. The following functions are defined in the Solver class:
The model is trained in batches: the BoW, tf-idf, and label tensors are passed to the model, which returns x_hidden, y_hidden, and y_predicted. During training, the BoW and tf-idf features are passed to the Feature Embedding module, returning x_hidden; the labels are passed to the Encoder module, returning y_hidden, which is then passed to the Decoder module. These outputs are passed to the loss function, which calculates the hidden loss between x_hidden and y_hidden and the reconstruction loss between y_predicted and y_original. The two losses are combined using the parameter lambda. backward() is then called on the combined loss, followed by the optimizer step.
Although during training the Decoder is trained on y_hidden, the assumption is that it will also be able to predict the labels from x_hidden. For prediction, the BoW and tf-idf features are passed to the embedding module to obtain x_hidden, which is then passed to the Decoder module to produce the labels for the given test data.
The file losses.py has the implementation of:
Learning Common Embedding (Lh): we employ the mean squared error between x_hidden and y_hidden.
Reconstructing Output (Lae): a margin-based ranking loss, wherein N(y) is the set of negative label indexes, P(y) is the complement of N(y), and the margin m ∈ [0, 1] is a hyper-parameter controlling the minimal distance between positive and negative labels. The loss consists of two parts: 1) LP aims to raise the minimal score of the positive labels above all negative labels by at least m; 2) LN aims to penalize the most violated negative label, pushing it below all positive labels by m.
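A non-vectorized illustration of the two ranking terms, written directly from the description above; the actual losses.py is a vectorized PyTorch implementation, and the paper's exact normalisation may differ.

```python
def rank_loss(y_pred, y_true, m=0.1):
    """Margin-based ranking loss for one sample.

    y_pred: predicted scores per label; y_true: 0/1 label vector.
    """
    P = [i for i, t in enumerate(y_true) if t == 1]  # positive label indexes
    N = [i for i, t in enumerate(y_true) if t == 0]  # negative label indexes
    min_pos = min(y_pred[p] for p in P)              # weakest positive score
    max_neg = max(y_pred[q] for q in N)              # most violated negative

    # L_P: raise the minimal positive score above every negative by >= m.
    L_P = sum(max(0.0, m - (min_pos - y_pred[q])) for q in N) / len(N)
    # L_N: push the most violated negative below every positive by >= m.
    L_N = sum(max(0.0, m - (y_pred[p] - max_neg)) for p in P) / len(P)
    return L_P + L_N
```

The total training objective then combines the two losses as Lh + lambda * Lae, as described above.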
1. Set the dataset and paths to the appropriate locations in params.config
2. python main.py