Siri-2001's starred repositories
naturalspeech2-pytorch
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
Fay
Fay is an open-source digital human framework integrating language models and digital characters. It offers retail, assistant, and agent versions for diverse applications like virtual shopping guides, broadcasters, assistants, waiters, teachers, and voice or text-based mobile assistants.
anything-llm
The all-in-one Desktop & Docker AI application with full RAG and AI Agent capabilities.
audioWhisper
Listen to any audio stream on your machine and print out the transcribed or translated audio.
faster-whisper
Faster Whisper transcription with CTranslate2
VoiceTyping
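Real-time text input by voice (speaking). Built as a secondary development on top of the PaddleSpeech project; supports offline, air-gapped deployment and GPU inference. The client currently supports Windows only.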
Glance-Focus
This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (Accepted in NeurIPS 2023)
annotated_deep_learning_paper_implementations
🧑‍🏫 60 Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
VecFloorSeg
Source code repo for VectorFloorSeg: Two-Stream Graph Attention Network for Vectorized Roughcast Floorplan Segmentation
Room-Segmentation
Automatic Room Segmentation
VQA_to_multimodal_survey
Update 2020
llm-action
This project aims to share the technical principles behind large models, along with hands-on experience.
py_floor_plan_segmenter
A Python package to segment cluttered 2D floor plans based on down-sampling.
awesome-pretrained-chinese-nlp-models
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
COCO-WholeBody
ECCV2020 paper "Whole-Body Human Pose Estimation in the Wild"
wholebody3d
Official repository of Human3.6M 3D WholeBody (H3WB) dataset
INR-V-VideoGenerationSpace
The Official Implementation for INR-V: A Continuous Representation Space for Video-based Generative Tasks
slt_how2sign_wicv2023
Sign Language Translation for Instructional Videos - CVPR WiCV 2023
Chinese-LLaMA-Alpaca-2
Phase 2 of the Chinese LLaMA-2 & Alpaca-2 LLM project, plus 64K long-context models
stable-diffusion-webui-extension-templates
A template for stable-diffusion-webui extensions