hsiangyuzhao

Shanghai Jiao Tong University

Shanghai

hsiangyuzhao.github.io

Xiangyu Zhao's starred repositories

Combined_Dataset_for_Speech_Emotion_Recognition

A collection of 8 English speech datasets combined for speech emotion recognition (SER).

Language: Jupyter Notebook · License: MIT

audiomentations

A Python library for audio data augmentation. Inspired by albumentations. Useful for machine learning.

Language: Python · License: MIT

depression-detect

Predicting depression from acoustic features of speech using a Convolutional Neural Network.

Language: Python

speechbrain

A PyTorch-based Speech Toolkit

Language: Python · License: Apache-2.0

openspeech

Open-Source Toolkit for End-to-End Speech Recognition leveraging PyTorch-Lightning and Hydra.

Language: Python · License: MIT

zouxian

Permanent Apple Intelligence + Xcode Predictive Code Completion for Chinese-market Mac computers

Language: Shell · License: MIT

iRingo

解锁完整的 Apple功能和集成服务

Language: Vim Snippet · License: GPL-3.0

fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

Language: Python · License: MIT

audiolm-pytorch

Implementation of AudioLM, a SOTA language modeling approach to audio generation from Google Research, in PyTorch

Language: Python · License: MIT

EmoLLM

The Big Model of Mental Health: a mental-health LLM fine-tuned on InternLM2, InternLM2.5, Qwen, ChatGLM, Baichuan, DeepSeek, Mixtral, Llama3, GLM4, Qwen2, and Llama3.1

Language: Python · License: MIT

OneLLM

[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language

Language: Python · License: unrecognized (NOASSERTION)

cambrian

Cambrian-1 is a family of multimodal LLMs with a vision-centric design.

Language: Python · License: Apache-2.0

ShiArthur03

Language: MATLAB · License: GPL-3.0

NExT-GPT

Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model

Language: Python · License: BSD-3-Clause

AnyGPT

Code for "AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling"

Language: Python

Awesome-Multimodal-Large-Language-Models

✨✨ Latest Advances on Multimodal Large Language Models

speech-trident

Awesome speech/audio LLMs, representation learning, and codec models

downkyi

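哔哩下载姬 (downkyi), a video downloader for the Bilibili website. It supports batch downloads and 8K, HDR, and Dolby Vision, and provides a toolbox (audio/video extraction, watermark removal, etc.).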

Language: C# · License: GPL-3.0

LLaVA-NeXT

Language: Python · License: Apache-2.0

CVPR2024-Papers-with-Code

A collection of CVPR 2024 papers and open-source projects

connected-components-3d

Connected components on discrete and continuous multilabel 3D and 2D images. Handles 26-, 18-, and 6-connected variants; periodic boundaries (4, 8, and 6).

Language: C++ · License: LGPL-3.0

nnUNet

Language: Python · License: Apache-2.0

learning_research

My research experience

CLIP-Driven-Universal-Model

[ICCV 2023] CLIP-Driven Universal Model; ranked first in the MSD competition.

Language: Python · License: unrecognized (NOASSERTION)

AbdomenAtlas

[NeurIPS 2023] AbdomenAtlas 1.0 (5,195 CT volumes + 9 annotated classes)

Language: Python · License: unrecognized (NOASSERTION)

open_lm

A repository for research on medium-sized language models.

Language: Python · License: MIT

text-generation-webui

A Gradio web UI for Large Language Models.

Language: Python · License: AGPL-3.0

ollama

Get up and running with Llama 3.2, Mistral, Gemma 2, and other large language models.

Language: Go · License: MIT

mmagic

OpenMMLab Multimodal Advanced, Generative, and Intelligent Creation Toolbox. Unlock the magic 🪄: Generative AI (AIGC), easy-to-use APIs, an awesome model zoo, and diffusion models for text-to-image generation, image/video restoration and enhancement, etc.

Language: Jupyter Notebook · License: Apache-2.0

mmsegmentation

OpenMMLab Semantic Segmentation Toolbox and Benchmark.

Language: Python · License: Apache-2.0