Suanyang (suay1113)

Company: Jilin University

Location: China

Suanyang's starred repositories

LLaVA-UHD

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Language: Python | Stargazers: 241 | Issues: 0

MiniCPM-V

MiniCPM-Llama3-V 2.5: A GPT-4V Level Multimodal LLM on Your Phone

Language: Python | License: Apache-2.0 | Stargazers: 7257 | Issues: 0

PQDiff

[ICLR 2024] Continuous-Multiple Image Outpainting in One-Step via Positional Query and A Diffusion-based Approach Link: https://arxiv.org/abs/2401.15652

Language: Python | Stargazers: 55 | Issues: 0

CogVLM2

GPT4V-level open-source multi-modal model based on Llama3-8B

Language: Python | License: Apache-2.0 | Stargazers: 1333 | Issues: 0

latent-consistency-model

Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference

Language: Python | License: MIT | Stargazers: 4176 | Issues: 0

yolov9-improve

Integration of many innovations for YOLOv9

Language: Python | License: GPL-3.0 | Stargazers: 24 | Issues: 0

MTP

The official repo for [JSTARS'24] "MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining"

Language: Python | License: MIT | Stargazers: 112 | Issues: 0

gpt_academic

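Provides a practical interactive interface for GPT/GLM and other large language models, specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins, analysis and self-explanation of projects in Python, C++, and other languages, PDF/LaTeX paper translation and summarization, parallel querying of multiple LLMs, and local models such as chatglm3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, moss, and others.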

Language: Python | License: GPL-3.0 | Stargazers: 60463 | Issues: 0

InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal chat model with performance approaching GPT-4V.

Language: Python | License: MIT | Stargazers: 3538 | Issues: 0

IC-Light

More relighting!

Language: Python | License: Apache-2.0 | Stargazers: 3780 | Issues: 0

Bunny

A family of lightweight multimodal models.

Language: Python | License: Apache-2.0 | Stargazers: 750 | Issues: 0

DiT

Official PyTorch Implementation of "Scalable Diffusion Models with Transformers"

Language: Python | License: NOASSERTION | Stargazers: 5485 | Issues: 0

dify

Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.

Language: TypeScript | License: NOASSERTION | Stargazers: 34028 | Issues: 0

labelU

A data annotation toolbox that supports image, audio, and video data.

Language: Python | Stargazers: 214 | Issues: 0

Monkey

[CVPR 2024 Highlight] Monkey (LMM): Image Resolution and Text Label Are Important Things for Large Multi-modal Models

Language: Python | License: MIT | Stargazers: 1489 | Issues: 0

LanguageBind

[ICLR 2024 🔥] Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Language: Python | License: MIT | Stargazers: 587 | Issues: 0

ceval

Official GitHub repo for C-Eval, a Chinese evaluation suite for foundation models [NeurIPS 2023]

Language: Python | License: MIT | Stargazers: 1527 | Issues: 0

Video-LLaVA

PG-Video-LLaVA: Pixel Grounding in Large Multimodal Video Models

Language: Python | Stargazers: 217 | Issues: 0

opencompass

OpenCompass is an LLM evaluation platform, supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.

Language: Python | License: Apache-2.0 | Stargazers: 3008 | Issues: 0

EMS-YOLO

Official implementation of "Deep Directly-Trained Spiking Neural Networks for Object Detection" (ICCV 2023)

Language: Python | License: GPL-3.0 | Stargazers: 125 | Issues: 0

Video-LLaVA

Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Language: Python | License: Apache-2.0 | Stargazers: 2617 | Issues: 0

Skywork

Skywork series models are pre-trained on 3.2TB of high-quality multilingual (mainly Chinese and English) and code data. We have open-sourced the model parameters, training data, evaluation data, and evaluation methods.

Language: Python | License: NOASSERTION | Stargazers: 1149 | Issues: 0

AutoGPTQ

An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.

Language: Python | License: MIT | Stargazers: 3990 | Issues: 0

InternLM-XComposer

InternLM-XComposer2 is a groundbreaking vision-language large model (VLLM) excelling in free-form text-image composition and comprehension.

Language: Python | Stargazers: 1862 | Issues: 0

Firefly

Firefly: a training tool for large models, supporting training of Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models.

Language: Python | Stargazers: 5075 | Issues: 0

Qwen-VL

The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.

Language: Python | License: NOASSERTION | Stargazers: 4123 | Issues: 0

baipiaoOCR

Convert PaddleOCR to torchOCR, ppocr-v3, ppocr-v4, onnx, openvino

Language: Python | Stargazers: 27 | Issues: 0

PaddleOCRModelConvert

Convert the model in PaddleOCR to ONNX format

Language: Python | License: Apache-2.0 | Stargazers: 45 | Issues: 0

PySceneDetect

Python and OpenCV-based scene cut/transition detection program & library.

Language: Python | License: BSD-3-Clause | Stargazers: 2917 | Issues: 0

towhee

Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.

Language: Python | License: Apache-2.0 | Stargazers: 3045 | Issues: 0