- sentence encoding
- personality traits regression
- encode_and_map_sentences.py
- input:
- pretrained_bert/multi_cased_L-12_H-768_A-12
- spacymoji vocabulary
- myPersonalitySmall/statuses_unicode.txt
- myPersonalitySmall/big5labels.txt
- output:
- train_whole_lines.csv
- lines_skipped.csv
- input:
- mse.py
- input:
- cls_table.csv
- output:
- predictions.txt
- mse.txt
- input:
- distributions.py
- input:
- predictions.py
- predictions_eng.py // used for comparison; generate it by changing the BERT model in the previous steps
- lines_skipped.py
- train_whole_lines.csv
- output:
- data distributions as images
- Kullback-Leibler divergences between the multilingual model and the English model
- input:
* https://github.com/D2KLab/twitpersonality
* ibm_insights_script.py
* format the output to compare data distributions
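
The two metrics named in the outline above, the mean squared error written by mse.py and the Kullback-Leibler divergence computed by distributions.py, can be sketched roughly as follows. This is a minimal illustration, not the code of the actual scripts. The function names, the 10-bin histogram, the assumed 1 to 5 score range, and the small smoothing constant `eps` are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average squared difference between gold and predicted trait scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def histogram(values: Sequence[float], bins: int = 10,
              lo: float = 1.0, hi: float = 5.0, eps: float = 1e-9) -> List[float]:
    """Normalised histogram over [lo, hi].

    The 1..5 range and the bin count are assumptions for this sketch.
    eps is added to every bin so that empty bins do not produce
    log(0) or a division by zero in the KL computation.
    """
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in values:
        idx = int((v - lo) / width)
        idx = max(0, min(idx, bins - 1))  # clamp out-of-range values to the edge bins
        counts[idx] += 1
    total = sum(counts) + eps * bins
    return [(c + eps) / total for c in counts]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """D_KL(P || Q) = sum_i p_i * log(p_i / q_i), in nats."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


if __name__ == "__main__":
    # Made-up scores for one trait, purely for illustration.
    gold = [2.0, 3.5, 4.0, 3.0]
    multilingual = [2.5, 3.0, 3.5, 4.0]
    english = [2.0, 3.0, 3.5, 4.5]

    print("MSE (multilingual):", mean_squared_error(gold, multilingual))
    print("KL(multilingual || english):",
          kl_divergence(histogram(multilingual), histogram(english)))
```

The KL divergence is not symmetric, so comparing the multilingual model against the English model gives a different value than the reverse. Whichever direction is reported should be stated next to the number.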