Beast code in Giters

Suanyang's starred repositories

LLaVA-UHD

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Language: Python | 241 | 0 | 0

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python | License: Apache-2.0 | 7257 | 0 | 0

PQDiff

[ICLR 2024] Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach. Link: https://arxiv.org/abs/2401.15652

Language: Python | 55 | 0 | 0

CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Language: Python | License: Apache-2.0 | 1333 | 0 | 0

latent-consistency-model

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Language: Python | License: MIT | 4176 | 0 | 0

yolov9-improve

Integration of many innovations for YOLOv9

Language: Python | License: GPL-3.0 | 24 | 0 | 0

MTP

The official repo for [JSTARS'24] "MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining"

Language: Python | License: MIT | 112 | 0 | 0

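gpt_academic

Provides a practical interactive interface for large language models such as GPT and GLM, optimized in particular for reading, polishing, and writing papers. Modular design; supports custom shortcut buttons and function plugins, project analysis and self-explanation for Python, C++, and other languages, PDF/LaTeX paper translation and summarization, parallel querying of multiple LLMs, and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, moss, and others.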

Language: Python | License: GPL-3.0 | 60463 | 0 | 0

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model that approaches GPT-4V performance.

Language: Python | License: MIT | 3538 | 0 | 0

IC-Light

More relighting!

Language: Python | License: Apache-2.0 | 3780 | 0 | 0

Bunny

A family of lightweight multimodal models.

Language: Python | License: Apache-2.0 | 750 | 0 | 0

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | 5485 | 0 | 0

dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

Language: TypeScript | License: NOASSERTION | 34028 | 0 | 0

labelU

Data annotation toolbox that supports image, audio, and video data.

Language: Python | 214 | 0 | 0

Monkey

【CVPR 2024 Highlight】Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Language: Python | License: MIT | 1489 | 0 | 0

LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Language: Python | License: MIT | 587 | 0 | 0

ceval

Official github repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]

Language: Python | License: MIT | 1527 | 0 | 0

Video-LLaVA

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

Language: Python | 217 | 0 | 0

opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Language: Python | License: Apache-2.0 | 3008 | 0 | 0

EMS-YOLO

Official implementation of "Deep Directly-Trained Spiking Neural Networks for Object Detection" (ICCV 2023)

Language: Python | License: GPL-3.0 | 125 | 0 | 0

Video-LLaVA

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Language: Python | License: Apache-2.0 | 2617 | 0 | 0

Skywork

Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model parameters, training data, evaluation data, and evaluation methods.

Language: Python | License: NOASSERTION | 1149 | 0 | 0

suay1113

AutoGPTQ

InternLM-XComposer

Firefly

Qwen-VL

baipiaoOCR

PaddleOCRModelConvert

PySceneDetect

towhee