Camilo Fosco's starred repositories
open-interpreter
A natural language interface for computers
insightface
State-of-the-art 2D and 3D Face Analysis Project
generative_agents
Generative Agents: Interactive Simulacra of Human Behavior
Awesome-Multimodal-Large-Language-Models
✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
Rerender_A_Video
[SIGGRAPH Asia 2023] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation
Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
awesome-segment-anything
Tracking and collecting papers/projects/others related to Segment Anything.
chameleon-llm
Code for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models".
GPT-4V-Act
AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI
agent-protocol
Common interface for interacting with AI agents. The protocol is tech-stack agnostic: you can use it with any framework for building agents.
ControlVideo
[ICLR 2024] Official PyTorch implementation of "ControlVideo: Training-free Controllable Text-to-Video Generation"
pgvector-python
pgvector support for Python
Text-To-Video-Finetuning
Finetune ModelScope's Text To Video model using Diffusers 🧨
Woodpecker
✨✨ Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.
vid2vid-zero
Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models
fMRI-reconstruction-NSD
fMRI-to-image reconstruction on the NSD dataset.
VideoControlNet
Official PyTorch implementation of "VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet"
emergent_analogies_LLM
Code for 'Emergent Analogical Reasoning in Large Language Models'