richardhahahaha

Otter

🦦 Otter, a multi-modal model based on OpenFlamingo (open-sourced version of DeepMind's Flamingo), trained on MIMIC-IT and showcasing improved instruction-following and in-context learning ability.

Language: Python | License: MIT | 3483 100 159

T-Rex

API for T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy

Language: Python | License: NOASSERTION | 1940 39 59

MotionBERT

[ICCV 2023] PyTorch Implementation of "MotionBERT: A Unified Perspective on Learning Human Motion Representations"

Language: Python | License: Apache-2.0 | 892 21 130

diart

A Python package to build AI-powered real-time audio applications

Language: Python | License: MIT | 841 20 139

ar5iv

A web service offering HTML5 articles from arXiv.org, converted with LaTeXML

Language: Rust | License: MIT | 728 7 464

autogen-ui

Web UI for AutoGen (a framework for multi-agent LLM applications)

Language: TypeScript | License: MIT | 611 18 17

AFFiNE.pro

AFFiNE official website, source for affine.pro

Language: Vue | License: AGPL-3.0 | 571 13 16

APE

[CVPR 2024] Aligning and Prompting Everything All at Once for Universal Visual Perception

Language: Python | License: Apache-2.0 | 444 6 46

Open-NLLB

An effort to open-source NLLB checkpoints.

Language: Python | License: MIT | 383 9 24

LocalAIVoiceChat

Local AI voice chat with a custom voice, based on the Zephyr 7B model. Uses RealtimeSTT with faster_whisper for transcription and RealtimeTTS with Coqui XTTS for synthesis.

Language: Python | License: NOASSERTION | 344 6 11

PCT

Official implementation of the CVPR 2023 paper "Human Pose as Compositional Tokens" (https://arxiv.org/pdf/2303.11638.pdf)

Language: Python | License: MIT | 269 5 37

StyleSync_PyTorch

PyTorch implementation of "StyleSync: High-Fidelity Generalized and Personalized Lip Sync in Style-based Generator"

Language: Python | 180 13 3

MeMOTR

[ICCV 2023] MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking

Language: Python | License: MIT | 129 5 17

APTM

The official code of "Towards Unified Text-based Person Retrieval: A Large-scale Multi-Attribute and Language Search Benchmark"

Language: Python | License: MIT | 117 4 19

BUCTD

[ICCV 2023] "Rethinking pose estimation in crowds: overcoming the detection information-bottleneck and ambiguity"

Language: Python | License: Apache-2.0 | 80 10 12

Speaker_diarization

Speech Diarization for scrum automation

Language: Jupyter Notebook | License: MIT | 80 1 1

ContextAware-PoseFormer

Official implementation of the paper "A Single 2D Pose With Context is Worth Hundreds for 3D Human Pose Estimation".

Language: Python | 57 0 12

svt

Scattering Vision Transformer

Language: Python | 41 2 1

Lightweight-Face-Detector-Pruning

Code and pruned models for our paper: K. Gkrispanis, N. Gkalelis, V. Mezaris, "Filter-Pruning of Lightweight Face Detectors Using a Geometric Median Criterion", Proc. IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW 2024), Waikoloa, Hawaii, USA, Jan. 2024. Repository updated in April 2024.

Language: Python | 9 0 0

create-high-quality-dataset-for-computer-vision

This project generates a diverse, realistic dataset for computer vision training using ChatGPT and a realistic-vision image generation model. The process dynamically creates prompts, uses ChatGPT to generate image descriptions, and generates images from those descriptions.

Language: Jupyter Notebook | 7 0 0

CGB_ULD

Language: Python | 4 0 0

Richard Chen's starred repositories

screenshot-to-code

penpot

everyone-can-use-english

Transformers-Tutorials

rags

gpt-fast

annotated-transformer

AnyText

Otter