Yumi's starred repositories
PDF-Extract-Kit
A Comprehensive Toolkit for High-Quality PDF Content Extraction
vaderSentiment
VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.
pysentimiento
A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks
tweetnlp
TweetNLP for all the NLP enthusiasts working on Twitter! The Python library tweetnlp provides a collection of useful tools to analyze/understand tweets such as sentiment analysis, emoji prediction, and named entity recognition, powered by state-of-the-art language models specialised on Twitter.
granite-code-models
Granite Code Models: A Family of Open Foundation Models for Code Intelligence
CascadeTabNet
This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"
albumentations
Fast and flexible image augmentation library. Paper about the library: https://www.mdpi.com/2078-2489/11/2/125
DAVAR-Lab-OCR
OCR toolbox from Davar-Lab
ReadingBank
ReadingBank: A Benchmark Dataset for Reading Order Detection
Transformer-Explainability
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
OCRDatasets
A collection of OCR-related datasets
unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
open-parse
Improved file parsing for LLMs