joslefaure

Gueter Josmy Faure's starred repositories

ollama

Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.

anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.

Language: JavaScript · MIT · 23839 189 1562

Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. Devika aims to be a competitive open-source alternative to Devin by Cognition AI.

Language: Python · MIT · 18344 203 386

TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Language: Python · Apache-2.0 · 7700 108 156

Depth-Anything

[CVPR 2024] Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data. Foundation Model for Monocular Depth Estimation

Language: Python · Apache-2.0 · 6830 49 211

Awesome-Transformer-Attention

An ultimately comprehensive paper list of Vision Transformer/Attention, including papers, codes, and related websites

4564 128 30

Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Language: Python · Apache-2.0 · 2881 28 179

Awesome-Text-to-Image

(ෆ`꒳´ෆ) A Survey on Text-to-Image Generation/Synthesis.

MIT · 2100 72 7

DemoFusion

Let us democratise high-resolution generation! (CVPR 2024)

Language: Jupyter Notebook · 1971 33 44

MoE-LLaVA

Mixture-of-Experts for Large Vision-Language Models

Language: Python · Apache-2.0 · 1934 24 90

LLaMA-VID

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)

Language: Python · Apache-2.0 · 693 14 104

TinyLLaVA_Factory

A Framework of Small-scale Large Multimodal Models

Language: Python · Apache-2.0 · 587 13 109

DoRA

[ICML2024 (Oral)] Official PyTorch implementation of DoRA: Weight-Decomposed Low-Rank Adaptation

Language: Python · NOASSERTION · 582 9 16

StreamMultiDiffusion

Official code for the paper "StreamMultiDiffusion: Real-Time Interactive Generation with Region-Based Semantic Control."

Language: Jupyter Notebook · MIT · 526 10 15

meshed-memory-transformer

Meshed-Memory Transformer for Image Captioning. CVPR 2020

Language: Python · BSD-3-Clause · 516 13 97

AlphAction

Spatio-Temporal Action Localization System

Language: Python · 400 18 95

llava-phi

Language: Python · 361 27 24

TransformerCompression

For releasing code related to compression methods for transformers, accompanying our publications

Language: Python · MIT · 360 9 45

unmasked_teacher

[ICCV2023 Oral] Unmasked Teacher: Towards Training-Efficient Video Foundation Models

Language: Python · MIT · 285 13 47

ReST

[ICCV 2023] ReST: A Reconfigurable Spatial-Temporal Graph Model for Multi-Camera Multi-Object Tracking

Language: Python · MIT · 137 5 21

videoCC-data

VideoCC is a dataset containing (video-URL, caption) pairs for training video-text machine learning models. It is created using an automatic pipeline starting from the Conceptual Captions Image-Captioning Dataset.

CC-BY-4.0 · 76 3 2

HIT

Official Implementation of our WACV2023 paper: “Holistic Interaction Transformer Network for Action Detection”

Language: Python · 55 6 49

Koala-video-llm

Language: Python · BSD-3-Clause · 28 1 8

Tensorflow-JS-Projects

Web projects using Tensorflow JS, Plotly, D3, Echarts, NumJS, and NumericJS

Language: JavaScript · Apache-2.0 · 19 60

iCLIP

[ICCVW 2023] Interaction-Aware Prompting for Zero-Shot Spatio-Temporal Action Detection

Language: Python · 16 3 2