Yumi (yumikim381)

Yumi's starred repositories

MinerU

A one-stop, open-source, high-quality data extraction tool that supports extraction from PDFs, webpages, and multi-format e-books.

Language: Python, License: AGPL-3.0, Stargazers: 7893, Issues: 0

PDF-Extract-Kit

A Comprehensive Toolkit for High-Quality PDF Content Extraction

Language: Python, License: Apache-2.0, Stargazers: 4075, Issues: 0

vaderSentiment

VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains.

Language: Python, License: MIT, Stargazers: 4354, Issues: 0

pysentimiento

A Python multilingual toolkit for Sentiment Analysis and Social NLP tasks

Language: Jupyter Notebook, License: NOASSERTION, Stargazers: 537, Issues: 0

timelms

TimeLMs: Diachronic Language Models from Twitter

Language: Jupyter Notebook, Stargazers: 100, Issues: 0

BERTweet

BERTweet: A pre-trained language model for English Tweets (EMNLP-2020)

Language: Python, License: MIT, Stargazers: 572, Issues: 0

tweetnlp

TweetNLP for all the NLP enthusiasts working on Twitter! The Python library tweetnlp provides a collection of useful tools to analyze/understand tweets such as sentiment analysis, emoji prediction, and named entity recognition, powered by state-of-the-art language models specialised on Twitter.

Language: Python, License: MIT, Stargazers: 298, Issues: 0

tweeteval

Repository for TweetEval

Language: Jupyter Notebook, Stargazers: 351, Issues: 0
Language: Python, Stargazers: 51, Issues: 0

data2neo

Data2Neo is a library that simplifies the conversion of data in relational format to a graph knowledge database.

Language: Python, License: Apache-2.0, Stargazers: 14, Issues: 0

granite-code-models

Granite Code Models: A Family of Open Foundation Models for Code Intelligence

License: Apache-2.0, Stargazers: 1028, Issues: 0

ragflow

RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.

Language: Python, License: Apache-2.0, Stargazers: 14155, Issues: 0

CascadeTabNet

This repository contains the code and implementation details of the CascadeTabNet paper "CascadeTabNet: An approach for end to end table detection and structure recognition from image-based documents"

Language: Python, License: MIT, Stargazers: 1479, Issues: 0
Language: Python, Stargazers: 23, Issues: 0

interpret

Fit interpretable models. Explain blackbox machine learning.

Language: C++, License: MIT, Stargazers: 6167, Issues: 0

albumentations

Fast and flexible image augmentation library. Paper about the library: https://www.mdpi.com/2078-2489/11/2/125

Language: Python, License: MIT, Stargazers: 13887, Issues: 0

yolov5

YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

Language: Python, License: AGPL-3.0, Stargazers: 49181, Issues: 0
Language: Jupyter Notebook, License: MIT, Stargazers: 304, Issues: 0

DAVAR-Lab-OCR

OCR toolbox from Davar-Lab

Language: Python, License: Apache-2.0, Stargazers: 726, Issues: 0

ReadingBank

ReadingBank: A Benchmark Dataset for Reading Order Detection

Stargazers: 87, Issues: 0

DocBank

DocBank: A Benchmark Dataset for Document Layout Analysis

Language: Python, License: Apache-2.0, Stargazers: 553, Issues: 0

PyMuPDF

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Language: Python, License: AGPL-3.0, Stargazers: 4831, Issues: 0

xplique

👋 Xplique is a Neural Networks Explainability Toolbox

Language: Python, License: NOASSERTION, Stargazers: 614, Issues: 0

Transformer-Explainability

[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.

Language: Jupyter Notebook, License: MIT, Stargazers: 1741, Issues: 0
Language: Python, License: Apache-2.0, Stargazers: 3, Issues: 0

OCRDatasets

A collection of OCR-related datasets

Stargazers: 88, Issues: 0

unstract

No-code LLM Platform to launch APIs and ETL Pipelines to structure unstructured documents

Language: Python, License: AGPL-3.0, Stargazers: 781, Issues: 0

unitable

UniTable: Towards a Unified Table Foundation Model

Language: Jupyter Notebook, License: MIT, Stargazers: 320, Issues: 0

unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

Language: HTML, License: Apache-2.0, Stargazers: 8062, Issues: 0

open-parse

Improved file parsing for LLMs

Language: Python, License: MIT, Stargazers: 2270, Issues: 0