FinBERT: Financial Sentiment Analysis with BERT
FinBERT is a pre-trained NLP model for analyzing the sentiment of financial text. It is built by further training the BERT language model on a large financial corpus and then fine-tuning it for financial sentiment classification. For details, see the paper FinBERT: Financial Sentiment Analysis with Pre-trained Language Models.
Important Note:
The FinBERT implementation relies on Hugging Face's pytorch_pretrained_bert library and its implementation of BERT for sequence classification tasks. pytorch_pretrained_bert is an earlier version of the transformers library. Migrating the FinBERT code to transformers is a top priority for the near future.
Installing
Install the dependencies by creating the Conda environment finbert from the given environment.yml file and activating it:
conda env create -f environment.yml
conda activate finbert
Models
You can download the models from the links below:
For both of these models, the workflow is as follows:
- Create a directory for the model. For example: models/sentiment/<model directory name>
- Download the model and put it into the directory you just created.
- Put a copy of config.json in the same directory.
- Call the model with .from_pretrained(<model directory name>)
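The directory setup in the steps above can be sketched with the standard library alone. The helper names prepare_model_dir and missing_files are hypothetical and not part of the FinBERT code, and the sketch only covers creating the directory and copying config.json; downloading the weights and calling .from_pretrained are left to the steps above.

```python
from pathlib import Path
import shutil


def prepare_model_dir(model_dir, config_source):
    """Create model_dir if needed and copy config.json into it.

    Both arguments are paths chosen by the caller; nothing here is
    specific to FinBERT beyond the config.json file name.
    """
    model_dir = Path(model_dir)
    model_dir.mkdir(parents=True, exist_ok=True)
    target = model_dir / "config.json"
    if not target.exists():
        shutil.copy(config_source, target)
    return model_dir


def missing_files(model_dir, required=("config.json",)):
    """Return the names in `required` that are not present in model_dir."""
    return [name for name in required if not (Path(model_dir) / name).exists()]
```

A possible use is to call prepare_model_dir with the model directory and the path to a local config.json, place the downloaded model file in the returned directory, and check that missing_files reports nothing before passing the directory to .from_pretrained.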
Datasets
FinBERT uses two datasets. Further training of the language model is done on a subset of the Reuters TRC2 dataset. This dataset is not public, but researchers can apply for access here.
For the sentiment analysis, we used the Financial Phrase Bank from Malo et al. (2014).
The dataset can be downloaded from this link.
If you want to train the model on the same dataset, download it and then create three files under the data/sentiment_data folder: train.csv, validation.csv, and test.csv.
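One way to produce the three files is sketched below, assuming the downloaded sentences have already been parsed into (text, label) pairs. The split fractions, the column names text and label, and the tab delimiter are assumptions made for illustration; check the data loader in the training notebook for the layout it actually expects.

```python
import csv
import random
from pathlib import Path


def split_rows(rows, train_frac=0.8, validation_frac=0.1, seed=0):
    """Shuffle rows reproducibly and cut them into three named splits.

    The fractions are illustrative defaults, not values from the FinBERT repo.
    Whatever is left after train and validation goes to test.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_train = int(len(rows) * train_frac)
    n_val = int(len(rows) * validation_frac)
    return {
        "train": rows[:n_train],
        "validation": rows[n_train:n_train + n_val],
        "test": rows[n_train + n_val:],
    }


def write_splits(splits, out_dir, header=("text", "label"), delimiter="\t"):
    """Write each split to <out_dir>/<name>.csv with a header row.

    The header and delimiter are assumptions; adjust them to the loader.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, rows in splits.items():
        with open(out_dir / f"{name}.csv", "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f, delimiter=delimiter)
            writer.writerow(header)
            writer.writerows(rows)
```

Calling write_splits(split_rows(rows), "data/sentiment_data") would then create train.csv, validation.csv, and test.csv in the folder named above.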
Training the model
Training is done in the finbert_training.ipynb notebook. The trained model will be saved to models/classifier_model/finbert-sentiment. You can find the training parameters in the notebook as follows:
config = Config(data_dir=cl_data_path,
                bert_model=bertmodel,
                num_train_epochs=4.0,
                model_dir=cl_path,
                max_seq_length=64,
                train_batch_size=32,
                learning_rate=2e-5,
                output_mode='classification',
                warm_up_proportion=0.2,
                local_rank=-1,
                discriminate=True,
                gradual_unfreeze=True)
The last two parameters, discriminate and gradual_unfreeze, determine whether to apply the corresponding technique against catastrophic forgetting: discriminative fine-tuning and gradual unfreezing, respectively.
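The two techniques can be illustrated in a few lines of plain Python. This is a minimal sketch of the general ideas, not the code in the notebook: the function names, the decay factor of 1.2, the count of 12 encoder layers, and the one-layer-per-stage schedule are all assumptions chosen for the example.

```python
def discriminative_learning_rates(base_lr, num_layers, decay_factor):
    """Return one learning rate per encoder layer, lowest layer first.

    The top layer gets base_lr; each layer below it gets the rate of the
    layer above divided by decay_factor, so lower layers change more slowly.
    """
    return [base_lr / (decay_factor ** (num_layers - 1 - i))
            for i in range(num_layers)]


def trainable_layers(num_layers, stage):
    """Return the indices of the encoder layers that are unfrozen at `stage`.

    Stage 0 leaves every encoder layer frozen (only the classifier head
    would train); each later stage unfreezes one more layer from the top.
    """
    n_unfrozen = min(max(stage, 0), num_layers)
    return list(range(num_layers - n_unfrozen, num_layers))


# Illustrative values only: 12 layers and a decay factor of 1.2 are assumptions.
rates = discriminative_learning_rates(base_lr=2e-5, num_layers=12, decay_factor=1.2)
```

Both ideas aim to protect the lower layers, which hold the more general language knowledge, from large updates early in fine-tuning.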
Getting predictions
We provide a script to quickly get sentiment predictions using FinBERT. Given a .txt file, predict.py produces a .csv file containing the sentences in the text, the corresponding softmax probabilities for the three labels, the predicted label, and a sentiment score (calculated as the probability of positive minus the probability of negative).
Here's an example with the provided example text, test.txt. From the command line, simply run:
python predict.py --text_path test.txt --output_dir output/ --model_path models/classifier_model/finbert-sentiment
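The sentiment score described above can be reproduced for a single sentence as follows. This is a sketch of the arithmetic only, not the code in predict.py; the label order in LABELS and the example logits are assumptions, and the real order depends on how the model was trained.

```python
import math

# Assumed label order for illustration; verify it against the trained model.
LABELS = ("positive", "negative", "neutral")


def softmax(logits):
    """Convert raw scores to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_sentence(logits, labels=LABELS):
    """Return per-label probabilities, the predicted label, and the score.

    The score is the probability of positive minus the probability of
    negative, so it lies between -1 and 1.
    """
    probs = dict(zip(labels, softmax(logits)))
    prediction = max(probs, key=probs.get)
    sentiment_score = probs["positive"] - probs["negative"]
    return probs, prediction, sentiment_score


# Made-up logits for one sentence.
probs, prediction, score = score_sentence([2.0, 0.5, 1.0])
```

A score near 1 indicates strongly positive text, a score near -1 strongly negative text, and a score near 0 either neutral text or a balance between the two.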
Disclaimer
This is not an official Prosus product. It is the outcome of an intern research project in the Prosus AI team.
About Prosus
Prosus is a global consumer internet group and one of the largest technology investors in the world. Operating and investing globally in markets with long-term growth potential, Prosus builds leading consumer internet companies that empower people and enrich communities. For more information, please visit www.prosus.com.
Contact information
Please contact Dogu Araci (dogu.araci[at]naspers[dot]com) and Zulkuf Genc (zulkuf.genc[at]naspers[dot]com) about any FinBERT-related issues and questions.