Repositories under the foundation-models topic:
Making large AI models cheaper, faster and more accessible
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.
[CVPR2024 Highlight] [VideoChatGPT] ChatGPT with video understanding, plus support for additional LMs such as MiniGPT-4, StableLM, and MOSS.
SuperCLUE: A comprehensive benchmark for general-purpose Chinese foundation models
EVA Series: Visual Representation Fantasies from BAAI
Chronos: Pretrained (Language) Models for Probabilistic Time Series Forecasting
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Images to inference with no labeling (use foundation models to train supervised models).
Emu Series: Generative Multimodal Models from BAAI
An automatic evaluator for instruction-following language models. Human-validated, high-quality, cheap, and fast.
Awesome things about LLM-powered agents. Papers / Repos / Blogs / ...
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting
A general representation model across vision, audio, and language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
TorchXRayVision: A library of chest X-ray datasets and models. Classifiers, segmentation, and autoencoders.
Overview of Japanese LLMs
A professional list on Large (Language) Models and Foundation Models (LLM, LM, FM) for Time Series, Spatiotemporal, and Event Data.
Creative interactive views of any dataset.
[ICLR 2024] Official PyTorch implementation of FasterViT: Fast Vision Transformers with Hierarchical Attention
[MICCAI 2019] [MEDIA 2020] Models Genesis
Must-read Papers on Knowledge Editing for Large Language Models.
A curated list of foundation models for vision and language tasks
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks.
Official implementation for HyenaDNA, a long-range genomic foundation model built with Hyena
PyTorch Implementation of EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via Self-Supervision
[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
[ECCV 2024] Tokenize Anything via Prompting
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
[ICLR'24 Spotlight] Uni3D: 3D Visual Representation from BAAI
[ECCV 2024] PointLLM: Empowering Large Language Models to Understand Point Clouds
A unified multi-task time series model.