Chaos's starred repositories
MMTrustEval
A toolbox for benchmarking trustworthiness of multimodal large language models (MultiTrust)
LLaVA-MOSS2
A modified LLaVA framework adapted for MOSS2, making MOSS2 a multimodal model.
AI-Scientist
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery 🧑🔬
AI_Gen_Novel
Exploring the limits of AI novel writing with large language models (LLMs) and multi-agent systems.
RemoteCLIP
🛰️ Official repository of paper "RemoteCLIP: A Vision Language Foundation Model for Remote Sensing" (IEEE TGRS)
WHU-OPT-SAR-dataset
Open-source dataset; multimodal fusion; remote sensing; optical images; SAR images; deep learning
dive-into-llms
A series of hands-on programming tutorials: "Dive into LLMs" (《动手学大模型》).
LanguageBind
【ICLR 2024🔥】 Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
ImageBind-LoRA
Fine-tuning "ImageBind One Embedding Space to Bind Them All" with LoRA
Chinese-LLaVA
An open-source, commercially usable multimodal model supporting bilingual (Chinese/English) vision-text dialogue.
Fine-Tuning-the-Image-Encoder-of-clip-using-pre-Trained-CLIP-ViT-Large-Patch14
A tailored fine-tuning script for the image encoder of CLIP-ViT-Large-Patch14 (provided as CLIP-ViT-Large-Patch14.ipynb). Quickly adapt the model to your needs for better performance on image-based tasks.
executor-image-clip-encoder
CLIPImageEncoder is an image encoder that wraps image-embedding functionality using the CLIP model.
CLIP-API-service
CLIP as a service - Embed image and sentences, object recognition, visual reasoning, image classification and reverse image search
Chinese-CLIP
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
Matrix-Theory
Review notes for the Matrix Theory course at the University of Electronic Science and Technology of China (UESTC).
Adversarial-Prompt-Tuning
ECCV2024: Adversarial Prompt Tuning for Vision-Language Models