Austin Zhang's repositories
cocktailparty
Multi-Modal Multi-Channel System and Corpus For Cocktail Party Problem
ASRT_SpeechRecognition
A Deep-Learning-Based Chinese Speech Recognition System
AugLy
A data augmentation library for audio, image, text, and video.
chinese_text_normalization
Chinese text normalization for speech processing
google-research
Google Research
Hey-Jetson
Deep-learning-based automatic speech recognition with attention for the NVIDIA Jetson.
jetson-inference
Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
local-llms-analyse-finance
In this project, I explored how local LLMs can be used to label data and support analyses. Specifically, I used the Llama 2 model to automatically categorise my bank transaction data.
Megatron-LLM
Distributed trainer for LLMs
pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding
pyctcdecode
A fast and lightweight python-based CTC beam search decoder for speech recognition.
PythonRobotics
Python sample code for robotics algorithms.
pytorch-struct
Fast, general, and tested differentiable structured prediction in PyTorch
rendezvous
Next generation videoconference system
rnnt_decoder_cuda
An efficient implementation of RNN-T Prefix Beam Search in C++/CUDA.