Camilo Fosco's starred repositories

bun

Incredibly fast JavaScript runtime, bundler, test runner, and package manager – all in one

Language: Zig | License: NOASSERTION | Stargazers: 71030 | Issues: 616 | Issues: 6812

open-interpreter

A natural language interface for computers

Language: Python | License: AGPL-3.0 | Stargazers: 42175 | Issues: 317 | Issues: 747

Fooocus

Focus on prompting and generating

Language: Python | License: GPL-3.0 | Stargazers: 36055 | Issues: 272 | Issues: 1336

insightface

State-of-the-art 2D and 3D Face Analysis Project

Language: Python | License: MIT | Stargazers: 21480 | Issues: 502 | Issues: 2410

generative_agents

Generative Agents: Interactive Simulacra of Human Behavior

pgvector

Open-source vector similarity search for Postgres

Language: C | License: NOASSERTION | Stargazers: 9607 | Issues: 79 | Issues: 466

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.

Rerender_A_Video

[SIGGRAPH Asia 2023] Rerender A Video: Zero-Shot Text-Guided Video-to-Video Translation

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 2896 | Issues: 26 | Issues: 102

mavo

Create web applications entirely by writing HTML and CSS!

Language: JavaScript | License: MIT | Stargazers: 2822 | Issues: 57 | Issues: 740

Video-LLaMA

[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

Language: Python | License: BSD-3-Clause | Stargazers: 2472 | Issues: 30 | Issues: 135

Painter

Painter & SegGPT Series: Vision Foundation Models from BAAI

Language: Python | License: MIT | Stargazers: 2433 | Issues: 36 | Issues: 64

LISA

Project Page for "LISA: Reasoning Segmentation via Large Language Model"

Language: Python | License: Apache-2.0 | Stargazers: 1487 | Issues: 10 | Issues: 121

TokenFlow

Official Pytorch Implementation for "TokenFlow: Consistent Diffusion Features for Consistent Video Editing" presenting "TokenFlow" (ICLR 2024)

Language: Python | License: MIT | Stargazers: 1474 | Issues: 77 | Issues: 40

awesome-segment-anything

Tracking and collecting papers/projects/others related to Segment Anything.

chameleon-llm

Codes for "Chameleon: Plug-and-Play Compositional Reasoning with Large Language Models".

Language: Jupyter Notebook | License: Apache-2.0 | Stargazers: 1021 | Issues: 19 | Issues: 9

shell-ai

LangChain powered shell command generator and runner CLI

Language: Python | License: MIT | Stargazers: 956 | Issues: 13 | Issues: 19

GPT-4V-Act

AI agent using GPT-4V(ision) capable of using a mouse/keyboard to interact with web UI

agent-protocol

Common interface for interacting with AI agents. The protocol is tech stack agnostic - you can use it with any framework for building agents.

Language: Python | License: MIT | Stargazers: 787 | Issues: 12 | Issues: 39

ControlVideo

[ICLR 2024] Official pytorch implementation of "ControlVideo: Training-free Controllable Text-to-Video Generation"

Language: Python | License: MIT | Stargazers: 708 | Issues: 21 | Issues: 30

pgvector-python

pgvector support for Python

Language: Python | License: MIT | Stargazers: 704 | Issues: 12 | Issues: 53

Text-To-Video-Finetuning

Finetune ModelScope's Text To Video model using Diffusers 🧨

Language: Python | License: MIT | Stargazers: 623 | Issues: 18 | Issues: 68

Woodpecker

✨✨ Woodpecker: Hallucination Correction for Multimodal Large Language Models. The first work to correct hallucinations in MLLMs.

vstar

PyTorch Implementation of "V* : Guided Visual Search as a Core Mechanism in Multimodal LLMs"

Language: Python | License: MIT | Stargazers: 440 | Issues: 10 | Issues: 13

vid2vid-zero

Zero-Shot Video Editing Using Off-The-Shelf Image Diffusion Models

fMRI-reconstruction-NSD

fMRI-to-image reconstruction on the NSD dataset.

Language: Jupyter Notebook | License: MIT | Stargazers: 260 | Issues: 4 | Issues: 32

llark

Code for the paper "LLark: A Multimodal Foundation Model for Music" by Josh Gardner, Simon Durand, Daniel Stoller, and Rachel Bittner.

Language: Python | License: NOASSERTION | Stargazers: 251 | Issues: 7 | Issues: 6

LLaVAR

Code/Data for the paper: "LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding"

Language: Python | License: Apache-2.0 | Stargazers: 237 | Issues: 5 | Issues: 20

VideoControlNet

Official Pytorch Implementation for "VideoControlNet: A Motion-Guided Video-to-Video Translation Framework by Using Diffusion Model with ControlNet"

IMProv

IMProv: Inpainting-based Multimodal Prompting for Computer Vision Tasks

emergent_analogies_LLM

Code for 'Emergent Analogical Reasoning in Large Language Models'

Language: Python | License: NOASSERTION | Stargazers: 36 | Issues: 2 | Issues: 0