PeterPham's repositories
applied-llm
Everything about LLMs in production.
Awesome-Diffusion-Models
A collection of resources and papers on Diffusion Models
DiffSynth-Studio
Enjoy the magic of Diffusion models!
HSIConvKAN
How to Learn More? Exploring the Possibility of Kolmogorov-Arnold Networks for Hyperspectral Image Classification
MimicBrush
Official implementation of the paper "Zero-shot Image Editing with Reference Imitation"
transformers.js
State-of-the-art Machine Learning for the web. Run 🤗 Transformers directly in your browser, with no need for a server!
videollm-online
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
BentoBLIP
How to build an image captioning application on top of a BLIP model with BentoML
CosmicMan
CosmicMan: A Text-to-Image Foundation Model for Humans (CVPR 2024)
DDMI
Official Implementation (PyTorch) of "DDMI: Domain-Agnostic Latent Diffusion Models for Synthesizing High-Quality Implicit Neural Representations", ICLR 2024
Depth-Anything-V2
Depth Anything V2. A More Capable Foundation Model for Monocular Depth Estimation
grokfast-pytorch
Explorations into the proposal from the paper "Grokfast, Accelerated Grokking by Amplifying Slow Gradients"
hallo
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
MagicTime
MagicTime: Time-lapse Video Generation Models as Metamorphic Simulators
MultiPly
MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild (CVPR 2024 Oral)
MV-VTON
MV-VTON: Multi-View Virtual Try-On with Diffusion Models
OpenCLAY
CLAY: A Controllable Large-scale Generative Model for Creating High-quality 3D Assets
OpenYOLO3D
Our OpenYOLO3D model achieves state-of-the-art performance in Open Vocabulary 3D Instance Segmentation on the ScanNet200 and Replica datasets, with up to ∼16x speedup compared to the best existing method in the literature.
RAG-Survey
A collection of awesome papers on RAG for AIGC. We propose a taxonomy of RAG foundations, enhancements, and applications in the paper "Retrieval-Augmented Generation for AI-Generated Content: A Survey".
SMILE-Dataset
[NAACL'24] Repository for "SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models"
top-cvpr-2024-papers
This repository is a curated collection of the most exciting and influential CVPR 2024 papers. 🔥 [Paper + Code + Demo]
typer
Typer, build great CLIs. Easy to code. Based on Python type hints.
VGen
Official repo for VGen: a holistic video generation ecosystem built on diffusion models