Jiapeng Wang (jpWang)

Company: South China University of Technology

Location: Guangzhou, China

Organizations
SCUT-DLVCLab

Jiapeng Wang's starred repositories

spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python

Language: Python | License: MIT | Stargazers: 29125 | Issues: 557 | Issues: 5606

fastText

Library for fast text representation and classification.

llama3

The official Meta Llama 3 GitHub site

Language: Python | License: NOASSERTION | Stargazers: 22363 | Issues: 182 | Issues: 176

vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Language: Python | License: Apache-2.0 | Stargazers: 21487 | Issues: 199 | Issues: 3143

MediaCrawler

Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, and Weibo posts and comments.

Language: Python | License: NOASSERTION | Stargazers: 14708 | Issues: 86 | Issues: 223

ChatGLM3

ChatGLM3 series: open-source bilingual chat LLMs.

Language: Python | License: Apache-2.0 | Stargazers: 12910 | Issues: 98 | Issues: 749

Qwen

The official repo of Qwen (通义千问), the chat and pretrained large language model proposed by Alibaba Cloud.

Language: Python | License: Apache-2.0 | Stargazers: 12243 | Issues: 96 | Issues: 1022

trl

Train transformer language models with reinforcement learning.

Language: Python | License: Apache-2.0 | Stargazers: 8565 | Issues: 77 | Issues: 959

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python | License: Apache-2.0 | Stargazers: 7504 | Issues: 75 | Issues: 245

self-llm

"A Practical Guide to Open-Source LLMs" (《开源大模型食用指南》): quickly deploy open-source large models in a Linux environment; deployment tutorials aimed at beginners.

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 5671 | Issues: 48 | Issues: 95

CogVLM

A state-of-the-art open visual language model | multimodal pretrained model.

Language: Python | License: Apache-2.0 | Stargazers: 5533 | Issues: 65 | Issues: 395

Qwen-VL

The official repo of Qwen-VL (通义千问-VL), the chat and pretrained large vision-language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 4162 | Issues: 47 | Issues: 386

Baichuan2

A series of large language models developed by Baichuan Intelligent Technology

Language: Python | License: Apache-2.0 | Stargazers: 4018 | Issues: 40 | Issues: 386

GLM

GLM (General Language Model)

Language: Python | License: MIT | Stargazers: 3085 | Issues: 46 | Issues: 190

Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! Also supports many more LMs, such as miniGPT4, StableLM, and MOSS.

Language: Python | License: MIT | Stargazers: 2828 | Issues: 37 | Issues: 182

sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

Language: Python | License: Apache-2.0 | Stargazers: 2690 | Issues: 28 | Issues: 275

InternLM-XComposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

InternVideo

Video Foundation Models & Data for Multimodal Understanding

Language: Python | License: Apache-2.0 | Stargazers: 1077 | Issues: 30 | Issues: 125

mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Language: Python | License: Apache-2.0 | Stargazers: 1077 | Issues: 27 | Issues: 83

seqeval

A Python framework for sequence labeling evaluation (named-entity recognition, POS tagging, etc.).

Language: Python | License: MIT | Stargazers: 1058 | Issues: 9 | Issues: 66
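As an illustration of the entity-level matching behind metrics like the ones seqeval reports, here is a minimal pure-Python sketch of micro-averaged F1 over BIO tag sequences. This is a hypothetical reimplementation of the idea for clarity, not the library's own code or API:

```python
def extract_entities(tags):
    """Collect (start, end, type) spans from one BIO-tagged sequence."""
    entities, start, etype = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last span
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # current entity continues
        if etype is not None:
            entities.append((start, i - 1, etype))  # close the open span
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]  # open a new span
    return set(entities)

def entity_f1(y_true, y_pred):
    """Micro-averaged F1 counting only exact entity-span matches."""
    n_true = n_pred = n_hit = 0
    for t_seq, p_seq in zip(y_true, y_pred):
        t, p = extract_entities(t_seq), extract_entities(p_seq)
        n_true += len(t)
        n_pred += len(p)
        n_hit += len(t & p)
    precision = n_hit / n_pred if n_pred else 0.0
    recall = n_hit / n_true if n_true else 0.0
    return 2 * precision * recall / (precision + recall) if n_hit else 0.0

# One of two gold entities is predicted exactly: precision 1.0, recall 0.5.
y_true = [["B-PER", "I-PER", "O", "B-LOC"]]
y_pred = [["B-PER", "I-PER", "O", "O"]]
score = entity_f1(y_true, y_pred)
```

Note that scoring whole spans, rather than individual tags, is what makes such metrics stricter than token accuracy: a partially recovered entity counts as a miss.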

Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Language: Python | License: Apache-2.0 | Stargazers: 691 | Issues: 7 | Issues: 45

Awesome-Foundation-Models

A curated list of foundation models for vision and language tasks

License: MIT | Stargazers: 642 | Issues: 33 | Issues: 0

Multimodal-AND-Large-Language-Models

A list of papers about multimodal and large language models, used only to record papers I read in the daily arXiv listings for personal needs.

VILA

VILA: a multi-image visual language model with training, inference, and evaluation recipes, deployable from cloud to edge (Jetson Orin and laptops).

Language: Python | License: Apache-2.0 | Stargazers: 205 | Issues: 9 | Issues: 19

InstructDoc

InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)

Language: Python | License: NOASSERTION | Stargazers: 129 | Issues: 3 | Issues: 7

docile

DocILE: Document Information Localization and Extraction Benchmark

Language: Python | License: MIT | Stargazers: 113 | Issues: 12 | Issues: 4

vrdu

We identify the desiderata for a comprehensive benchmark and propose Visually Rich Document Understanding (VRDU). VRDU contains two datasets that represent several challenges: rich schemas with diverse data types, complex templates, and diverse layouts within a single document type.

baselines

The code for the baselines from the NeurIPS 2021 paper "DUE: End-to-End Document Understanding Benchmark."

Language: Python | License: MIT | Stargazers: 36 | Issues: 6 | Issues: 9

RFUND

Official release of RFUND introduced in the paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction" (arXiv:2401.03472).