jpWang

sglang

SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.

Language: Python | License: Apache-2.0 | 267700

self-llm

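*Open-Source LLM Eating Guide* (《开源大模型食用指南》): quickly deploy open-source large models in a Linux environment; a deployment tutorial better suited to ** "babies" (beginners).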

Language: Jupyter Notebook | License: Apache-2.0 | 557300

Awesome-Foundation-Models

A curated list of foundation models for vision and language tasks

63700

mPLUG-DocOwl

mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding

Language: Python | License: Apache-2.0 | 107200

Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.

Language: Python | License: MIT | 282100

Chat-UniVi

[CVPR 2024 Highlight🔥] Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding

Language: Python | License: Apache-2.0 | 68800

RFUND

Official release of RFUND introduced in the paper "PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction" (arXiv:2401.03472).

1400

baselines

Code for the baselines from the NeurIPS 2021 paper "DUE: End-to-End Document Understanding Benchmark."

Language: Python | License: MIT | 3600

Multimodal-AND-Large-Language-Models

A list of papers on multimodal and large language models, used only to record the papers I read on arXiv each day, for personal use.

43500

InternLM-XComposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

Language: Python | 187700

InternVideo

Video Foundation Models & Data for Multimodal Understanding

Language: Python | License: Apache-2.0 | 107300

docile

DocILE: Document Information Localization and Extraction Benchmark

Language: Python | License: MIT | 11300

seqeval

A Python framework for sequence labeling evaluation (named-entity recognition, POS tagging, etc.)

Language: Python | License: MIT | 105800

MediaCrawler

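Crawlers for Xiaohongshu notes and comments, Douyin videos and comments, Kuaishou videos and comments, Bilibili videos and comments, and Weibo posts and comments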

Language: Python | License: NOASSERTION | 1465200

InstructDoc

InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)

Language: Python | License: NOASSERTION | 12900

vrdu

We identify the desiderata for a comprehensive benchmark and propose Visually Rich Document Understanding (VRDU). VRDU contains two datasets that represent several challenges: rich schemas including diverse data types, complex templates, and diversity of layouts within a single document type.

6700

Jiapeng Wang's starred repositories

MiniCPM-V

VILA

vllm

trl

llama3

spaCy

sglang

self-llm

Long-CLIP

Awesome-Foundation-Models

mPLUG-DocOwl

Ask-Anything

Chat-UniVi

RFUND

baselines

Multimodal-AND-Large-Language-Models

InternLM-XComposer

InternVideo

docile

seqeval

MediaCrawler

InstructDoc

vrdu

GLM

ChatGLM3

CogVLM

Baichuan2

fastText

Qwen

Qwen-VL