ranck626

A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Language: Python · License: Apache-2.0 · 91600

dive-into-llms

A series of hands-on programming tutorials for "Dive into LLMs" (动手学大模型)

300300

MultiMon

Language: Python · 2200

Tip-Adapter

Language: Python · 51800

CLIP-LoRA

An easy way to apply LoRA to CLIP. Implementation of the paper "Low-Rank Few-Shot Adaptation of Vision-Language Models" (CLIP-LoRA) [CVPRW 2024].

Language: Python · 5800

CLIP-Adapter

Language: Python · 44100

SAR-CLIP

Language: Python · 100

LanguageBind

【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment

Language: Python · License: MIT · 65400

ImageBind-LoRA

Fine-tuning "ImageBind One Embedding Space to Bind Them All" with LoRA

Language: Python · License: NOASSERTION · 16800

Chinese-LLaVA

An open-source, commercially usable multimodal model that supports bilingual (Chinese and English) vision-text dialogue.

Language: Python · License: Apache-2.0 · 34800

Fine-Tuning-the-Image-Encoder-of-clip-using-pre-Trained-CLIP-ViT-Large-Patch14

Optimize CLIP-ViT-Large-Patch14.ipynb with our tailored image encoder fine-tuning script. Quickly adapt the model to your needs for enhanced performance on image-based tasks.

Language: Jupyter Notebook · 100

executor-image-clip-encoder

CLIPImageEncoder is an image encoder that wraps the image embedding functionality using the CLIP

Language: Python · 800

ImageBind

ImageBind One Embedding Space to Bind Them All

Language: Python · License: NOASSERTION · 815700

CLIP-API-service

CLIP as a service - Embed image and sentences, object recognition, visual reasoning, image classification and reverse image search

Language: Jupyter Notebook · License: Apache-2.0 · 4600

Chinese-CLIP

Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.

Language: Python · License: MIT · 417800

Few-shot-NL2SQL-with-prompting

Language: Python · License: MIT · 29700

Matrix-Theory

Review notes for the "Matrix Theory" course at the University of Electronic Science and Technology of China

Language: TeX · License: Apache-2.0 · 500

ollama

Get up and running with Llama 3.1, Mistral, Gemma 2, and other large language models.

Language: Go · License: MIT · 8545800

RoCLIP

Robust Contrastive Language-Image Pretraining against Data Poisoning and Backdoor Attacks

Language: Python · 900

CLIP4CMR

A Comprehensive Empirical Study of Vision-Language Pre-trained Model for Supervised Cross-Modal Retrieval

Language: Python · 4000

Adversarial-Prompt-Tuning

ECCV2024: Adversarial Prompt Tuning for Vision-Language Models

Language: Python · License: MIT · 1500

VLAttack

This is an official repository of "VLAttack: Multimodal Adversarial Attacks on Vision-Language Tasks via Pre-trained Models" (NeurIPS 2023).

Language: Jupyter Notebook · License: BSD-3-Clause · 2600