andrehuang / my-foundation-models

my-foundation-models

This repository documents foundation-model resources that I find useful for my work.

Language foundation models

Chat completion, text generation: GPT-4
Text embeddings: OpenAI API embeddings, CLIP and OpenCLIP, T5

Vision foundation models

Class-agnostic segmentation models: SAM, HQ-SAM
ImageNet-22k pretrained: Swin Transformer?
Self-supervised models: MAE, DINOv2

Vision-Language models

CLIP, DiHT, SigLIP
Image tag generation: RAM, RAM++
Region-level grounding models: GLaMM, Grounding DINO, Grounded-SAM
VQA, caption generation: BLIP-2, CaSED, LLaVA
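
As a rough illustration of how the contrastive models listed above (CLIP and its relatives) are typically used for zero-shot classification, here is a minimal sketch in plain Python. It assumes the image and text embeddings have already been produced by some encoder. The toy vectors, the `zero_shot_scores` helper, and the `temperature=100.0` value are all illustrative assumptions, not the API of any particular library. Note also that SigLIP is trained with a sigmoid loss, so the softmax used here follows the CLIP-style recipe only.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def zero_shot_scores(image_emb, text_embs, temperature=100.0):
    """Softmax over scaled cosine similarities between one image and several prompts.

    `temperature` stands in for the learned logit scale of a CLIP-style model
    (assumed value, for illustration only).
    """
    img = l2_normalize(image_emb)
    logits = []
    for text_emb in text_embs:
        txt = l2_normalize(text_emb)
        logits.append(temperature * sum(a * b for a, b in zip(img, txt)))
    # Subtract the max logit before exponentiating for numerical stability.
    max_logit = max(logits)
    exps = [math.exp(z - max_logit) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Toy 3-dimensional embeddings (hypothetical; real models use hundreds of dimensions).
labels = ["a photo of a cat", "a photo of a dog"]
text_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
image_emb = [0.9, 0.1, 0.0]

probs = zero_shot_scores(image_emb, text_embs)
best = labels[max(range(len(probs)), key=probs.__getitem__)]
print(best, probs)
```

In practice the embeddings would come from the image and text encoders of one of the listed models; only the scoring step is sketched here.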
