There are 6 repositories under the clip topic.
Chinese version of CLIP which achieves Chinese cross-modal retrieval and representation generation.
OpenMMLab Pre-training Toolbox and Benchmark
Chinese NLP solutions (large models, data, models, training, inference)
Effortless data labeling with AI support from Segment Anything and other awesome models.
Easily compute clip embeddings and build a clip retrieval system with them
Rapid Android UI development that tames the quirks of native widgets
Collection of AWESOME vision-language models for vision tasks
Must-have resource for anyone who wants to experiment with and build on the OpenAI vision API 🔥
🥂 Gracefully face hCaptcha challenges with an embedded MoE (ONNX) solution.
Awesome list for research on CLIP (Contrastive Language-Image Pre-Training).
Stable Diffusion in NCNN with C++, supporting txt2img and img2img
Search photos on Unsplash using natural language
Search inside YouTube videos using natural language
"Video-ChatGPT" is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Pocket-Sized Multimodal AI for content understanding and generation across multilingual texts, images, and 🔜 video, up to 5x faster than OpenAI CLIP and LLaVA 🖼️ & 🖋️
[ICCV 2021 Oral] Official PyTorch implementation of Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers, a novel method to visualize any Transformer-based network. Includes examples for DETR and VQA.
An optimized text-to-image model based on Stable Diffusion. It accepts both Chinese and English text inputs and can generate high-quality images in several modern art styles.
React component for truncating multi-line spans and adding an ellipsis.
Keras beit, caformer, CMT, CoAtNet, convnext, davit, dino, efficientdet, edgenext, efficientformer, efficientnet, eva, fasternet, fastervit, fastvit, flexivit, gcvit, ghostnet, gpvit, hornet, hiera, iformer, inceptionnext, lcnet, levit, maxvit, mobilevit, moganet, nat, nfnets, pvt, swin, tinynet, tinyvit, uniformer, volo, vanillanet, yolor, yolov7, yolov8, yolox, gpt2, llama2, alias kecam
[CVPR'23] OpenScene: 3D Scene Understanding with Open Vocabularies
ZMJImageEditor is a WeChat-style picture editing component. It is powerful and easy to integrate, supporting drawing, text, rotation, cropping, stickers, and other functions.
👁️ + 💬 + 🎧 = 🤖 Curated list of top foundation and multimodal models! [Paper + Code + Examples + Tutorials]
Extract video features from raw videos using multiple GPUs. We support RAFT flow frames as well as S3D, I3D, R(2+1)D, VGGish, CLIP, ResNet features.
Android Easy Reveal Library
Open-source evaluation toolkit for large vision-language models (LVLMs), supporting GPT-4v, Gemini, QwenVLPlus, 30+ HF models, and 15+ benchmarks
Getting the latest versions of Disco Diffusion to work locally instead of on Colab, including how I run this on Windows despite some Linux-only dependencies ;)