yhzhouowo (Mortyzhou-Shef-BIT)

Location: UoS -> NUS & BIT

Home Page: https://mortyzaigc.netlify.app/

yhzhouowo's starred repositories

ego-AV-spatial-correspondence

[CVPR 2024] Code and datasets for 'Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos'

License: MIT | Stargazers: 3 | Issues: 0

Vitron

A Unified Pixel-level Vision LLM for Understanding, Generating, Segmenting, Editing

Language: Python | Stargazers: 270 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 12 | Issues: 0

av2av

[CVPR 2024] AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Language: Python | License: MIT | Stargazers: 19 | Issues: 0

GeoSeg

LSKNet for remote sensing segmentation. This repo is based on the official UNetFormer GitHub repository.

Language: Python | License: GPL-3.0 | Stargazers: 27 | Issues: 0

seamless_communication

Foundational Models for State-of-the-Art Speech and Text Translation

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 10691 | Issues: 0
Language: Python | Stargazers: 129 | Issues: 0

Awesome-Simultaneous-Translation

Paper list of simultaneous translation / streaming translation, including text-to-text machine translation and speech-to-text translation.

Stargazers: 558 | Issues: 0

TruthX

Code for ACL 2024 paper "TruthX: Alleviating Hallucinations by Editing Large Language Models in Truthful Space"

Language: Python | License: GPL-3.0 | Stargazers: 96 | Issues: 0

BayLing

BayLing (百聆) is a LLaMA-based English/Chinese large language model with enhanced language alignment. It shows superior capability in English/Chinese generation, instruction following and multi-turn interaction, and reaches 90% of ChatGPT's performance on a range of multilingual and general-task evaluations.

Language: Python | License: GPL-3.0 | Stargazers: 292 | Issues: 0

Video_Call_MOS

A video quality MOS prediction model for videoconferencing calls that takes temporal distortions into account

Language: Python | License: CC-BY-4.0 | Stargazers: 33 | Issues: 0

Multimodal-AND-Large-Language-Models

Paper list about multimodal and large language models, only used to record papers I read in the daily arxiv for personal needs.

Stargazers: 497 | Issues: 0

audio-retrieval-benchmark

Implementation of "Audio Retrieval with Natural Language Queries: A Benchmark Study".

Language: Python | Stargazers: 45 | Issues: 0

Visionary-Vids

Multi-modal transformer approach for natural language query based joint video summarization and highlight detection

Language: Jupyter Notebook | License: NOASSERTION | Stargazers: 11 | Issues: 0

bissa

[Pattern Recognition'24] Looking Beyond Input Frames: Self-Supervised Adaptation for Video Super-Resolution

Language: Python | License: MIT | Stargazers: 12 | Issues: 0

moment_detr

[NeurIPS 2021] Moment-DETR code and QVHighlights dataset

Language: Python | License: MIT | Stargazers: 257 | Issues: 0
Language: Python | Stargazers: 434 | Issues: 0
Language: Python | License: Apache-2.0 | Stargazers: 552 | Issues: 0

SegMamba

SegMamba: Long-range Sequential Modeling Mamba For 3D Medical Image Segmentation

Language: Python | Stargazers: 319 | Issues: 0

awesome-speech-to-speech-translation

List of direct speech-to-speech translation papers.

Stargazers: 26 | Issues: 0

AV-Deepfake1M

[ACM MM] AV-Deepfake1M: A Large-Scale LLM-Driven Audio-Visual Deepfake Dataset

License: NOASSERTION | Stargazers: 61 | Issues: 0

MONET

Transparent medical image AI via an image–text foundation model grounded in medical literature

Language: Python | License: NOASSERTION | Stargazers: 39 | Issues: 0

UniAV

Unified Audio-Visual Perception for Multi-Task Video Localization

Language: Python | License: MIT | Stargazers: 15 | Issues: 0

images-that-sound

Official repo for "Images that Sound": special spectrograms, generated by diffusion models, that can be viewed as images and played as sound

Language: Python | License: MIT | Stargazers: 204 | Issues: 0

TempoTokens

This repo contains the official PyTorch implementation of: Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

Language: Python | License: MIT | Stargazers: 98 | Issues: 0

efficient-kan

An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).

Language: Python | License: MIT | Stargazers: 3774 | Issues: 0

SAM-Adapter-PyTorch

Adapting Meta AI's Segment Anything to Downstream Tasks with Adapters and Prompts

Language: Python | License: MIT | Stargazers: 942 | Issues: 0

ssamba

The official implementation of SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model

Language: Python | License: BSD-3-Clause | Stargazers: 89 | Issues: 0

SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Language: Python | License: MIT | Stargazers: 448 | Issues: 0

CompA

Code for ICLR 2024 Paper: CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

Language: Python | Stargazers: 11 | Issues: 0