Digital-Alpha-ML

The ML repository for the Digital Alpha SaaS Analyzer problem statement at Inter IIT Tech Meet 10.0. The deployed version of the website can be found here. The report can be found here.

Files Description

dict-sentiment.ipynb -> For sentiment analysis (6 classes, lexicon-based)
- Input: The input of the file is specified in the temp_text variable.
- Output: The input is passed to the get_class_counter function, which returns the sentiment dictionary containing the results.
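As a rough illustration of lexicon-based class counting, the sketch below tallies how many tokens of a text fall into each sentiment class. The function name `count_classes`, the three class names, and the word lists are hypothetical placeholders, not the six classes or the lexicon used in the notebook.

```python
import re

# Hypothetical toy lexicon: class names and word lists are placeholders,
# not the ones used in dict-sentiment.ipynb.
LEXICON = {
    "positive": {"growth", "profit", "improve"},
    "negative": {"loss", "decline", "weak"},
    "uncertainty": {"may", "might", "uncertain"},
}


def count_classes(text, lexicon=LEXICON):
    """Count how many tokens of `text` belong to each lexicon class."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = {label: 0 for label in lexicon}
    for token in tokens:
        for label, words in lexicon.items():
            if token in words:
                counts[label] += 1
    return counts


temp_text = "Profit may improve despite a weak quarter and a small loss."
print(count_classes(temp_text))
```

A real lexicon would be much larger and would typically be loaded from a file rather than hard-coded.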
finbert_inference.ipynb -> For sentiment analysis (3 classes, transformer-based)
- Input: The input of the file is specified in the temp_text variable.
- Output: The input is passed to the get_output function, which returns the sentiment dictionary.
mdna_extractor.ipynb -> For extracting contents (section-wise)
- Input: filing_url (the URL of the filing) and section_name (the name of the section to extract)
- Output: The output is obtained from the get_section function, which returns the text of the requested section
find_company_trends_using_lda.py -> For extracting the latest trending topics relevant to the company
- Input: The input to the file is the company title and the number of tweets we want to extract
- Output: The output of the file is a list of top keywords relevant to the company
extract_metrics_from_fillings.ipynb -> For extracting metrics from the filings
- Input: The inputs are:
  - api_key - for accessing the filings using sec-api
  - url - url to the filing
  - metric - name of the metric in lowercase
  - val_type - metric data type - one of ['PERCENT', 'MONEY', 'NUMBER', 'RATIO']
  - k - window size for metric search, default = 6
  - relevant_sections - list of sections to search for the metric
- Output: The output of the file is the value of the metric extracted from the filing, stored in the correct_value variable
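To make the roles of metric, val_type, and k concrete, here is a minimal sketch of a window-based search: it finds each occurrence of the metric name and collects values of the requested type within k tokens on either side. The function name `find_metric_candidates` and the regular expressions are assumptions for illustration; the notebook's actual matching logic may differ, and this sketch works on plain text rather than fetching a filing through sec-api.

```python
import re

# Hypothetical patterns per val_type; the notebook's real patterns may differ.
VALUE_PATTERNS = {
    "PERCENT": r"\d+(?:\.\d+)?\s?%",
    "MONEY": r"\$\s?\d[\d,]*(?:\.\d+)?(?:\s?(?:million|billion))?",
    "NUMBER": r"\d[\d,]*(?:\.\d+)?",
    "RATIO": r"\d+(?:\.\d+)?\s?:\s?\d+(?:\.\d+)?",
}


def find_metric_candidates(text, metric, val_type, k=6):
    """Return values of `val_type` found within k tokens of `metric`."""
    tokens = text.split()
    metric_tokens = metric.lower().split()
    n = len(metric_tokens)
    pattern = re.compile(VALUE_PATTERNS[val_type])
    candidates = []
    for i in range(len(tokens) - n + 1):
        # Compare the next n tokens to the metric name, ignoring punctuation.
        head = [t.lower().strip(".,;:()") for t in tokens[i:i + n]]
        if head != metric_tokens:
            continue
        # Search k tokens before and after the matched metric name.
        start, end = max(0, i - k), min(len(tokens), i + n + k)
        window = " ".join(tokens[start:end])
        candidates.extend(pattern.findall(window))
    return candidates


sample = "Our net retention rate was 118% for the quarter, up from 112% a year ago."
print(find_metric_candidates(sample, "net retention rate", "PERCENT"))
```

A small k keeps only values close to the metric name; a larger k finds more candidates but also picks up unrelated numbers, so some ranking or filtering step is still needed to choose the final value.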
extract_tables.ipynb -> For extracting tables from the filings
- Input: The inputs are api_key (for accessing the filings using sec-api), url (the URL of the filing), and the section
- Output: The output of the file is the tables extracted from the filing, stored in the tables variable
qna_on_tables.ipynb -> For question answering on the tables
- Input: The inputs are table and ques (a list of questions)
- Output: The output of the file is the answers to the questions, based on the table
theme-vocab-builder.ipynb -> To build vocabulary for various sectors
- Input: any important data file related to various sectors
- Output: The output of the file is the vocabulary file for various sectors
exposure-calc.ipynb -> To calculate the exposure of a company to various sectors
- Input: The inputs are -
  - filing.txt - sec filing of a company
  - theme.txt - vocabulary file for a specific sector
- Output: The output of the file is the similarity score with respect to the vocabulary of a specific sector
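One simple way to produce such a score is cosine similarity between the filing's term counts and a binary vector over the theme vocabulary. The sketch below assumes that approach; the function name `exposure_score` and the choice of cosine similarity are illustrative assumptions and are not confirmed to be what the notebook does.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def exposure_score(filing_text, theme_vocab):
    """Cosine similarity between filing term counts and a binary theme vector."""
    counts = Counter(tokenize(filing_text))
    vocab = {word.lower() for word in theme_vocab}
    # Dot product: each vocabulary word has weight 1 in the theme vector.
    dot = sum(counts[word] for word in vocab)
    filing_norm = math.sqrt(sum(c * c for c in counts.values()))
    theme_norm = math.sqrt(len(vocab))
    if filing_norm == 0 or theme_norm == 0:
        return 0.0
    return dot / (filing_norm * theme_norm)


# In practice the two inputs would be read from filing.txt and theme.txt.
filing_text = "Cloud revenue grew as cloud security demand increased."
theme_vocab = ["cloud", "security", "encryption"]
print(round(exposure_score(filing_text, theme_vocab), 3))
```

The score lies between 0 and 1, so scores for different sector vocabularies can be compared for the same filing.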
generate_questions_answers.ipynb -> To generate questions and answers from the given text
- Input: The only input is the text
- Output: The output of the file is the generated questions and answers in the dictionary qna_dict
summarize_text.ipynb -> To summarize the given text
- Input: The only input is the text
- Output: The output of the file is the summary of the text in the variable summary
10Q_parser.ipynb -> For extracting contents of 10-Q filings (section-wise)
- Input: The input to the function is the link to the filing and the section number
- Output: The output is obtained from the parse_10q_filing function, which returns the text of the requested section
find_metric.ipynb -> Complete pipeline for extracting metrics from the filings of a company
- Input: The inputs are -
  - api_key - for accessing the filings using sec-api
  - company_cik - CIK of the company
  - metric - name of the metric in lowercase
  - val_type - metric data type - one of ['PERCENT', 'MONEY', 'NUMBER', 'RATIO']
  - k - window size for metric search, default = 6
- Output: The output of the file is the value of the metric extracted from the filings
