nahidalam's repositories
annotated_deep_learning_paper_implementations
60 implementations/tutorials of deep learning papers with side-by-side notes, including transformers (original, XL, Switch, Feedback, ViT, ...), optimizers (Adam, AdaBelief, Sophia, ...), GANs (CycleGAN, StyleGAN2, ...), reinforcement learning (PPO, DQN), CapsNet, distillation, ...
datacomp
DataComp: In search of the next generation of multimodal datasets
DHS-LLM-Workshop
DHS 2023 LLM Workshop by Sourab Mangrulkar
groundingLMM
Grounding Large Multimodal Model (GLaMM), the first-of-its-kind model capable of generating natural language responses that are seamlessly integrated with object segmentation masks [CVPR 2024].
huggingface.js
Utilities to use the Hugging Face Hub API
imp
A family of multimodal small language models
inspect_ai
Inspect: A framework for large language model evaluations
llama.cpp
LLM inference in C/C++
LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
LLaVA-RLHF
Aligning LMMs with Factually Augmented RLHF
llm.c
LLM training in simple, raw C/CUDA
LLMs-from-scratch
Implementing a ChatGPT-like LLM from scratch, step by step
matmulfreellm
Implementation of the MatMul-free LM.
ml-tic-clip
Repository for the paper: "TiC-CLIP: Continual Training of CLIP Models".
MMFM-Challenge
Official repository for the MMFM challenge
MobileVLM
Strong and Open Vision Language Assistant for Mobile Devices
MobiLlama
MobiLlama: a small language model tailored for edge devices
ollama
Get up and running with Llama 3, Mistral, Gemma, and other large language models.
open_clip
An open-source implementation of CLIP.
OpenMoE
A family of open-source Mixture-of-Experts (MoE) large language models
PLLaVA
Official repository for the PLLaVA paper
StoryDiffusion
Create Magic Story!
ultralytics
NEW - YOLOv8 in PyTorch > ONNX > OpenVINO > CoreML > TFLite
Video-ChatGPT
[ACL 2024] Video-ChatGPT is a video conversation model capable of generating meaningful conversations about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Video-LLaVA
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Yi
A series of large language models trained from scratch by developers @01-ai