Mortyzhou-Shef-BIT

UoS -> NUS & BIT

https://mortyzaigc.netlify.app/

yhzhouowo's starred repositories

ego-AV-spatial-correspondence

[CVPR 2024] Code and datasets for 'Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos'

MIT · 300

Vitron

A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Language: Python · 27000

AudioVisualLLM

Language: Python · Apache-2.0 · 1200

av2av

[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Language: Python · MIT · 1900

GeoSeg

LSKNet for Remote Sensing Segmentation. This Repo is Based on UNetFormer official GitHub.

Language: Python · GPL-3.0 · 2700

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: Jupyter Notebook · NOASSERTION · 1069100

AudioEditingCode

Language: Python · 12900

Awesome-Simultaneous-Translation

Paper list of simultaneous translation / streaming translation, including text-to-text machine translation and speech-to-text translation.

TruthX

Code for ACL 2024 paper "TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space"

Language: Python · GPL-3.0 · 9600

BayLing

BayLing ("百聆") is an English/Chinese large language model based on LLaMA with enhanced language alignment. It shows superior capability in English/Chinese generation, instruction following and multi-turn interaction, and reaches 90% of ChatGPT's performance on a range of tests covering multilingual and general tasks.

Language: Python · GPL-3.0 · 29200

Video_Call_MOS

A video quality MOS prediction model for videoconferencing calls that takes temporal distortions into account

Language: Python · CC-BY-4.0 · 3300

Multimodal-AND-Large-Language-Models

Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.

audio-retrieval-benchmark

Implementation of "Audio Retrieval with Natural Language Queries: A Benchmark Study".

Language: Python · 4500

Visionary-Vids

Multi-modal transformer approach for natural language query based joint video summarization and highlight detection

Language: Jupyter Notebook · NOASSERTION · 1100

bissa

[Pattern Recognition'24] Looking Beyond Input Frames: Self-Supervised Adaptation for Video Super-Resolution

Language: Python · MIT · 1200

moment_detr

[NeurIPS 2021] Moment-DETR code and QVHighlights dataset

Language: Python · MIT · 25700

TeleSpeech-ASR

Language: Python · 43400

unified-io-2

Language: Python · Apache-2.0 · 55200

SegMamba

SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation

Language: Python · 31900

awesome-speech-to-speech-translation

List of direct speech-to-speech translation papers.

2600

AV-Deepfake1M

[ACM MM] AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset

NOASSERTION · 6100

MONET

Transparent medical image AI via an image–text foundation model grounded in medical literature

Language: Python · NOASSERTION · 3900

UniAV

Unified Audio-Visual Perception for Multi-Task Video Localization

Language: Python · MIT · 1500

images-that-sound

Official repo for Images that Sound: special spectrograms, generated by diffusion models, that can be viewed as images and played as sound

Language: Python · MIT · 20400

TempoTokens

This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

Language: Python · MIT · 9800

efficient-kan

An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).

Language: Python · MIT · 377400

SAM-Adapter-PyTorch

Adapting Meta AI's Segment Anything to Downstream Tasks with Adapters and Prompts

Language: Python · MIT · 94200

ssamba

The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model

Language: Python · BSD-3-Clause · 8900

SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Language: Python · MIT · 44800

CompA

Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

Language: Python · 1100