The dataset used for this project is CNN/Daily Mail. On Linux/Unix you can download and extract it with:

```shell
wget https://s3.amazonaws.com/datasets.huggingface.co/summarization/cnn_dm.tgz
tar -xzvf cnn_dm.tgz
```
- Extract the contents of the tar.gz file.
- After extraction you should see the train.source, train.target, test.source, test.target, val.source and val.target files in the extracted directory.
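Each `.source` file holds one article per line and the matching `.target` file holds the reference summary on the same line number. A minimal loading sketch (the helper name `load_split` is ours, not part of the dataset tooling):

```python
from pathlib import Path
import tempfile

def load_split(data_dir, split):
    """Pair each article in <split>.source with its summary in <split>.target."""
    sources = Path(data_dir, f"{split}.source").read_text(encoding="utf-8").splitlines()
    targets = Path(data_dir, f"{split}.target").read_text(encoding="utf-8").splitlines()
    assert len(sources) == len(targets), "source/target line counts must match"
    return list(zip(sources, targets))

# Demo with a tiny synthetic split (the real files come from the cnn_dm archive).
with tempfile.TemporaryDirectory() as d:
    Path(d, "val.source").write_text("A long article about the weather.\n")
    Path(d, "val.target").write_text("Weather report.\n")
    print(load_split(d, "val"))  # [('A long article about the weather.', 'Weather report.')]
```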
- Transfer learning: knowledge learned on one task is reused for different tasks.
- A model is pretrained on a large corpus, and its knowledge is transferred to downstream tasks.
- New tasks are solved using the knowledge base of the pretrained model rather than training from scratch.
We used the pretrained T5 and BART transformer models and fine-tuned them for the text summarization task on the CNN/Daily Mail dataset, achieving decent results. In addition, we built our own transformer model with self-attention layers; this model requires a longer training duration and a larger pretraining dataset to yield higher performance.
- T5: Text-to-Text Transfer Transformer.
- Proposed by Google AI's Colin Raffel et al.
- Three architecture variants:
- Encoder-Decoder Transformer
  - Language Model Transformer (Autoregressive)
- Prefix Language Model Transformer
- Pretrained with a fill-in-the-blank denoising objective.
- Useful for many downstream tasks.
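The fill-in-the-blank denoising objective above can be illustrated in a few lines. This is a simplified sketch assuming whitespace tokenization and hand-picked span positions; T5 itself samples spans randomly over SentencePiece tokens:

```python
def span_corrupt(tokens, spans):
    """Replace each (start, end) token span with a sentinel token; the target
    lists each sentinel followed by the tokens it hides (T5-style denoising)."""
    inp, tgt, last = [], [], 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[last:start])
        inp.append(sentinel)
        tgt.append(sentinel)
        tgt.extend(tokens[start:end])
        last = end
    inp.extend(tokens[last:])
    return " ".join(inp), " ".join(tgt)

tokens = "Thank you for inviting me to your party last week".split()
inp, tgt = span_corrupt(tokens, [(1, 2), (5, 7)])
print(inp)  # Thank <extra_id_0> for inviting me <extra_id_1> party last week
print(tgt)  # <extra_id_0> you <extra_id_1> to your
```

The model is trained to reconstruct the masked spans (the target) from the corrupted input, which is what makes the pretraining useful for generation-style downstream tasks such as summarization.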
- Proposed by Facebook AI's Mike Lewis et al.
- BART = BERT + GPT
- Uses the GELU activation function instead of ReLU.
- Yields more semantically sensible summaries compared to the summaries generated by the T5 transformer.
- Bidirectional Encoder and Autoregressive Decoder are used in this model.
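The GELU-versus-ReLU difference mentioned above is easy to see numerically. A small dependency-free sketch using the exact erf form of GELU:

```python
import math

def relu(x):
    return max(0.0, x)

def gelu(x):
    # GELU(x) = x * Phi(x), where Phi is the standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

for x in (-2.0, 0.0, 2.0):
    print(f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}")
```

Unlike ReLU, GELU is smooth everywhere and lets small negative inputs through with a small weight instead of zeroing them out.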
- Our custom transformer model: 6 attention heads.
- 3 Encoder and Decoder Layers
- Uses pretrained fastText embeddings of 300 dimensions in the embedding layer.
- Positional encoding to provide continuous positional context.
- Dimensionality of the hidden state: 300
- Attention and padding masks as needed.
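To illustrate the positional encoding and padding masks listed above, here is a dependency-free sketch of the standard sinusoidal encoding and a boolean padding mask, assuming the 300-dimensional hidden state from the bullets (the function names and `pad_id` convention are ours):

```python
import math

def positional_encoding(max_len, d_model=300):
    """Sinusoidal encodings: PE[pos, 2i] = sin(pos / 10000^(2i/d)),
    PE[pos, 2i+1] = cos(pos / 10000^(2i/d))."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            pe[pos][i + 1] = math.cos(angle)
    return pe

def padding_mask(token_ids, pad_id=0):
    """True where attention is allowed, False at padding positions."""
    return [t != pad_id for t in token_ids]

pe = positional_encoding(max_len=4)
print(pe[0][:4])                      # position 0: [0.0, 1.0, 0.0, 1.0]
print(padding_mask([5, 9, 3, 0, 0]))  # [True, True, True, False, False]
```

These encodings are added to the embedding vectors so the self-attention layers, which are otherwise order-invariant, can distinguish token positions; the padding mask keeps attention from attending to padded slots in a batch.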