Hon-Wong

Bytedance Inc.

Hon-Wong's starred repositories

whisper

Robust Speech Recognition via Large-Scale Weak Supervision

Language: Python | License: MIT | 64866 5420

grok-1

Grok open release

Language: Python | License: Apache-2.0 | 49186 561 202

bark

🔊 Text-Prompted Generative Audio Model

Language: Jupyter Notebook | License: MIT | 33888 316 423

spaCy

💫 Industrial-strength Natural Language Processing (NLP) in Python

Language: Python | License: MIT | 29328 558 5612

awesome-nlp

📖 A curated list of resources dedicated to Natural Language Processing (NLP)

License: CC0-1.0 | 16295 611 55

ChatGLM2-6B

ChatGLM2-6B: An Open Bilingual Chat LLM

Language: Python | License: NOASSERTION | 15635 134 615

wechat-chatgpt

Use ChatGPT On Wechat via wechaty

Language: TypeScript | 13192 950

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

ChatRWKV

ChatRWKV is like ChatGPT but powered by RWKV (100% RNN) language model, and open source.

Language: Python | License: Apache-2.0 | 9343 90 116

textract

extract text from any document. no muss. no fuss.

Language: HTML | License: MIT | 3833 82 241

Awesome-Video-Datasets

Video datasets

Awesome-LLMs-for-Video-Understanding

🔥🔥🔥Latest Papers, Codes and Datasets on Vid-LLMs.

shikra

Language: Python | License: NOASSERTION | 709 8 63

text-dedup

All-in-one text de-duplication

Language: Python | License: Apache-2.0 | 552 4 57

vmoe

Language: Jupyter Notebook | License: Apache-2.0 | 542 14 15

Video-MME

✨✨Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Youku-mPLUG

Youku-mPLUG: A 10 Million Large-scale Chinese Video-Language Pre-training Dataset and Benchmarks

Language: Python | License: Apache-2.0 | 271 5 29

GroundingGPT

[ACL 2024] GroundingGPT: Language-Enhanced Multi-modal Grounding Model

Language: Python | License: Apache-2.0 | 265 14 10

Multilingual-PR

Phoneme recognition using the pre-trained models Wav2vec2, HuBERT and WavLM. This project compares three self-supervised models, Wav2vec (2019, 2020), HuBERT (2021) and WavLM (2022), each pretrained on a corpus of English speech, and uses them in various ways to perform phoneme recognition for different languages with a network trained using the Connectionist Temporal Classification (CTC) algorithm.

Language: Python | 186 4 5

FreestyleNet

[CVPR 2023 Highlight] Freestyle Layout-to-Image Synthesis

Language: Python | License: MIT | 138 5 15

orange3-text

🍊 📄 Text Mining add-on for Orange3

Language: Python | License: NOASSERTION | 125 20 356

Text2NeRF

Official implementation of 'Text2NeRF: Text-Driven 3D Scene Generation with Neural Radiance Fields'

Language: Python | License: MIT | 111 15 14

Elysium

[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM

Language: Python | 30 9 5

PTSEFormer

[ECCV2022] PTSEFormer: Progressive Temporal-Spatial Enhanced TransFormer Towards Video Object Detection

Language: Python | License: MIT | 28 2 14