Sohom Ghosh's repositories
FinRAD_Financial_Readability_Assessment_Dataset
FinRAD: Financial Readability Assessment Dataset - 13,000+ Definitions of Financial Terms for Measuring Readability
FinSim_Financial_Hypernym_detection
Code and models to extract hypernyms of financial terms
BERT-Relation-Extraction
PyTorch implementation for "Matching the Blanks: Distributional Similarity for Relation Learning" paper
deep-finance
Datasets, papers and books on AI & Finance.
Finance-FinNum
Numerals are a crucial part of financial documents. To understand the details of opinions in financial documents, we should not only analyze the text but also examine the numeric information in depth. Because of its informal writing style, social media data is more challenging to analyze than news and official documents. FinNum is a dataset for fine-grained numeral understanding in financial social media data, where the task is to identify the category of a numeral.
Finance-FinProLex
FinProLex provides 5,162 tokens from professional analysts' reports and financial social media posts, each with an expert-like score. The expert-like scores are calculated using pointwise mutual information (PMI).
Finance-NTUSD-Fin
NTUSD-Fin provides several scoring methods for its tokens, including frequency, CFIDF, chi-squared value, market sentiment score, and word vector. Only tokens that appear at least ten times and show a significant difference between expected and observed frequency under a chi-squared test remain in the dictionary. The predetermined significance level is 0.05. The market sentiment score is calculated by subtracting the bearish PMI from the bullish PMI. The constructed dictionary, NTUSD-Fin, contains 8,331 words, 112 hashtags, and 115 emojis.
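The market sentiment score described above (bullish PMI minus bearish PMI) can be illustrated with a minimal sketch. This is not the NTUSD-Fin authors' code. The function names, the input format (a list of `(tokens, label)` pairs), and the choice to count each token at most once per post are assumptions made for illustration. The ten-occurrence threshold follows the description, while the chi-squared filter is left out.

```python
import math
from collections import Counter


def pmi(joint_count, token_count, label_count, total):
    """PMI(token, label) = log2(P(token, label) / (P(token) * P(label)))."""
    return math.log2((joint_count * total) / (token_count * label_count))


def market_sentiment_scores(posts, min_count=10):
    """Score tokens as bullish PMI minus bearish PMI.

    posts: list of (tokens, label) pairs, label is "bullish" or "bearish".
    Counting is per post (a hypothetical choice, not taken from the source).
    """
    token_counts = Counter()
    joint_counts = {"bullish": Counter(), "bearish": Counter()}
    label_counts = Counter()

    for tokens, label in posts:
        label_counts[label] += 1
        for tok in set(tokens):  # count each token once per post
            token_counts[tok] += 1
            joint_counts[label][tok] += 1

    total = sum(label_counts.values())
    scores = {}
    for tok, n in token_counts.items():
        if n < min_count:
            continue  # keep only tokens seen at least min_count times
        bull = joint_counts["bullish"][tok]
        bear = joint_counts["bearish"][tok]
        if bull == 0 or bear == 0:
            continue  # PMI is undefined for a zero joint count; skipped here
        scores[tok] = (
            pmi(bull, n, label_counts["bullish"], total)
            - pmi(bear, n, label_counts["bearish"], total)
        )
    return scores
```

A positive score means the token co-occurs more strongly with bullish posts than with bearish ones, and a negative score means the opposite. Tokens that never appear under one of the two labels are skipped in this sketch rather than smoothed.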
Finance-Numeracy-600K
Numerals are a crucial part of narrative, especially in financial documents. We should not only analyze the text but also examine the numeric information in depth. Numeracy-600K is a dataset for testing the numeracy of machines.
LIPI_ERAI_FinNLP_EMNLP-2022
Code for the system developed by team LIPI while participating in the ERAI shared task of FinNLP, co-located with EMNLP 2022
Machine_Learning_Training
This repository contains the data of the training programs taken by Anshu Pandey
ML-ESG3_LIPI
The code corresponds to the system developed by team LIPI while participating in the ML-ESG3 shared task of FinNLP-KDF@LREC-COLING-2024.
sohom2ghosh.github.io
Github Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
sohomghosh.github.io
A beautiful, simple, clean, and responsive Jekyll theme for academics
tezansahu-website
The repository for my website
tweetfinsent
TweetFinSent: A Dataset of Stock Sentiments on Twitter