Given a `text` and a `reason`, predict whether the `text` satisfies the `reason`. Please refer to `report.pdf` for more details.
First, I conducted some exploratory analysis of the data and had the following insights:
- The data does not contain any null values or any contractions.
- The maximum `text` length in the train data was 66, and it was 186 for the test data.
- The maximum `reason` length in the train data was 16, and it was 13 for the test data.
- Most `text` sentences were neutral in polarity, like a statement or a fact.
- The same was the case with the polarity of `reason`.
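
As a minimal sketch of how these numbers could be reproduced (assuming the data loads into a pandas DataFrame with `text` and `reason` columns; the file name and column names are my guesses, and polarity here uses TextBlob, which may differ from what the report used):

```python
# EDA sketch. File and column names are illustrative assumptions.
import pandas as pd
from textblob import TextBlob

train = pd.read_csv("train.csv")

# Null-value check and maximum lengths (word counts here; the report
# may measure length differently).
print(train.isnull().sum())
print("max text length:", train["text"].str.split().str.len().max())
print("max reason length:", train["reason"].str.split().str.len().max())

# Sentence polarity: values near 0 indicate neutral statements.
train["text_polarity"] = train["text"].apply(lambda s: TextBlob(s).sentiment.polarity)
print(train["text_polarity"].describe())
```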
The baseline approach consisted of the following steps:
- Tokenization: Tokenize the `text` and `reason` features using the pre-trained tokenizer `distilbert-base-uncased`.
- Encoding: Encode the tokenized features using the pre-trained transformer-based language model `distilbert-base-uncased`.
- Concatenation: Concatenate the pooled representations of the `text` and `reason` features to get a joint representation of the input.
- Classification: Add a classification layer on top of the joint representation to predict the label (see the sketch after this list).
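
A minimal PyTorch sketch of this architecture. I'm assuming a single shared DistilBERT encoder, `[CLS]`-token pooling, and one linear head; none of these details are confirmed by the report:

```python
# Sketch of the dual-input baseline: encode text and reason separately,
# concatenate the pooled vectors, classify. Pooling choice and head
# size are assumptions, not details from report.pdf.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class TextReasonClassifier(nn.Module):
    def __init__(self, model_name="distilbert-base-uncased", num_labels=2):
        super().__init__()
        # One shared encoder for both inputs (an assumption; the report
        # may have used two separate encoders).
        self.encoder = AutoModel.from_pretrained(model_name)
        hidden = self.encoder.config.dim  # 768 for DistilBERT
        self.classifier = nn.Linear(2 * hidden, num_labels)

    def forward(self, text_inputs, reason_inputs):
        # Pool each sequence with its first ([CLS]) hidden state.
        text_vec = self.encoder(**text_inputs).last_hidden_state[:, 0]
        reason_vec = self.encoder(**reason_inputs).last_hidden_state[:, 0]
        joint = torch.cat([text_vec, reason_vec], dim=-1)  # concatenation step
        return self.classifier(joint)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = TextReasonClassifier()
text = tokenizer(["The sky is blue."], return_tensors="pt", padding=True)
reason = tokenizer(["It states a fact."], return_tensors="pt", padding=True)
logits = model(text, reason)  # shape: (1, num_labels)
```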
The baseline model was trained for 4 epochs on the `distilbert-base-uncased` model with `batch_size = 32` and `learning_rate = 1e-4`. The performance was:
- Training Loss: 0.3133
- Training Accuracy: 1.0
- Test Loss: 0.9797
- Test Accuracy: 0.3334
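
A hedged sketch of a training loop matching these hyperparameters; the optimizer (Adam), the loss (cross-entropy), and the batch format are my assumptions:

```python
# Training-loop sketch for the stated setup (4 epochs, batch_size=32,
# lr=1e-4). The dataset is assumed to yield (text_inputs, reason_inputs,
# labels) batches via a suitable collate_fn; this is a simplification.
import torch
from torch.utils.data import DataLoader

def train(model, dataset, epochs=4, batch_size=32, lr=1e-4):
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for epoch in range(epochs):
        total = 0.0
        for text_inputs, reason_inputs, labels in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(text_inputs, reason_inputs), labels)
            loss.backward()
            optimizer.step()
            total += loss.item()
        print(f"epoch {epoch + 1}: mean training loss {total / len(loader):.4f}")
```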
- Balance the dataset: As the dataset contains only positive samples, we need to generate negative samples too.
- There can be many techniques to achieve this, but one of the simplest and most effective ways is to negate the `text` sentences, so that their meaning and/or polarity is reversed.
- This was achieved using a Python module called `negator`, which uses `Spacy` and `transformers` to negate the eligible sentences (a sketch follows this list).
- Out of 2061 sentences, 1854 could be negated, increasing the dataset size to 3915.
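
A minimal sketch of the negation step. I'm assuming the module is the `negate` package (whose main class is named `Negator` and which builds on spaCy and transformers, matching the description above); if the project used a different package, the API below won't match:

```python
# Negative-sample generation via sentence negation. The `negate`
# package and its API are an assumption about which module was used.
from negate import Negator

negator = Negator()
positives = ["The product arrived on time."]

negatives = []
for sentence in positives:
    try:
        negatives.append(negator.negate_sentence(sentence))
    except Exception:
        # Not every sentence is eligible for negation; skip failures
        # (the report negated 1854 of 2061 sentences).
        pass
```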
- Training: After augmenting the dataset, the same `distilbert-base-uncased` model was evaluated after 4 epochs of training.
- With the negative samples added, the final performance was:
  - Training Loss: 0.5791
  - Training Accuracy: 0.5264
  - Test Loss: 0.8786
  - Test Accuracy: 0.3334
- Here the test accuracy is almost the same, but the loss for the neural-network classification layer has been reduced by about 0.1.
There are many more approaches which could be implemented to further increase performance but have not been tested yet, such as:
- Data augmentation by replacing various words with their synonyms (or antonyms for negative samples); see the sketch after this list.
- Multimodal learning by using various numerical features along with the sentences, such as word count/length and sentence polarity. More features could increase performance.
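
A sketch of the untested synonym-replacement idea using NLTK's WordNet; the replacement probability and word selection are illustrative assumptions:

```python
# Synonym-replacement augmentation sketch (an untested proposal, not
# the report's method). Replacement rate `p` is an arbitrary choice.
import random
import nltk
from nltk.corpus import wordnet

nltk.download("wordnet", quiet=True)

def augment(sentence, p=0.3):
    out = []
    for word in sentence.split():
        synsets = wordnet.synsets(word)
        if synsets and random.random() < p:
            # Collect synonym lemmas, excluding the original word.
            lemmas = {l.name().replace("_", " ")
                      for s in synsets for l in s.lemmas()} - {word}
            out.append(random.choice(sorted(lemmas)) if lemmas else word)
        else:
            out.append(word)
    return " ".join(out)

print(augment("The product arrived quickly and works well."))
```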
If more time were permitted, the techniques proposed above could be implemented, and more models like `GPT` could also be tested.