my-foundation-models
This repository collects foundation-model resources that I find useful for my work.
Language foundation models
- Chat completion, text generation: GPT-4
- Text embeddings: OpenAI API embeddings, CLIP and OpenCLIP, T5
Vision foundation models
- Class-agnostic segmentation models: SAM, HQ-SAM
- ImageNet-22K pretrained: Swin Transformer
- Self-supervised models: MAE, DINOv2
Vision-Language models
- Image tag generation: RAM, RAM++
- Region-level grounding models: GLaMM, Grounding DINO, Grounded-SAM
- VQA, caption generation: BLIP-2, CaSED, LLaVA