medbar

Special Technological Center Ltd.

Saint Petersburg

https://scholar.google.ru/citations?user=T-kDfn4AAAAJ&hl=ru

Anton Mitrofanov's starred repositories

duckdb

DuckDB is an analytical in-process SQL database management system

Language: C++, License: MIT, Stars: 23097

m2d

Masked Modeling Duo: Towards a Universal Audio Pre-training Framework

Language: Jupyter Notebook, License: NOASSERTION, Stars: 64

GoodbyeDPI

GoodbyeDPI — Deep Packet Inspection circumvention utility (for Windows)

Language: C, License: Apache-2.0, Stars: 23871

prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

Language: Python, License: Apache-2.0, Stars: 15954

FAdam_PyTorch

An implementation of FAdam (Fisher Adam) in PyTorch

Language: Python, License: MIT, Stars: 31

fense

Fluency ENhanced Sentence-bert Evaluation (FENSE), a metric for audio caption evaluation, with the benchmark datasets AudioCaps-Eval and Clotho-Eval.

Language: Python, Stars: 19

Dasheng

Source for the Interspeech 2024 Paper "Scaling up masked audio encoder learning for general audio classification"

Language: Python, License: Apache-2.0, Stars: 40

ONE-PEACE

A general representation model across vision, audio, language modalities. Paper: ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

Language: Python, License: Apache-2.0, Stars: 943

AIR-Bench

AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension

Language: Python, Stars: 38

DataProcessingFramework

Framework for processing and filtering datasets

Language: Python, License: Apache-2.0, Stars: 25

AudioLLM

Audio Large Language Models

Stars: 68

AudioBench

AudioBench: A Universal Benchmark for Audio Large Language Models

Language: Python, License: NOASSERTION, Stars: 74

FunASR

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc.

Language: Python, License: NOASSERTION, Stars: 6154

zeta

Build high-performance AI models with modular building blocks

Language: Python, License: Apache-2.0, Stars: 384

Qwen2-Audio

The official repo of Qwen2-Audio chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python, Stars: 1119

py-webrtcvad

Python interface to the WebRTC Voice Activity Detector

Language: C, License: NOASSERTION, Stars: 2023

webdataset

A high-performance Python-based I/O system for large (and small) deep learning problems, with strong support for PyTorch.

Language: Python, License: BSD-3-Clause, Stars: 2230

YaFSDP

YaFSDP: Yet another Fully Sharded Data Parallel

Language: Python, License: Apache-2.0, Stars: 824

SLAM-LLM

Speech, Language, Audio, Music Processing with Large Language Model

Language: Python, License: MIT, Stars: 512

Awesome-Speaker-Diarization

Some comprehensive papers about speaker diarization

rir-classifier

Recipe for training and testing RIR-Classifier

Language: Python, License: MIT, Stars: 3

jsalt2020_simulate

Training data simulation

Language: Python, License: Apache-2.0, Stars: 40

CTranslate2

Fast inference engine for Transformer models

Language: C++, License: MIT, Stars: 3256

einops

Flexible and powerful tensor operations for readable and reliable code (for pytorch, jax, TF and others)

Language: Python, License: MIT, Stars: 8395

C8DASR-Baseline-NeMo

NeMo: a toolkit for conversational AI

Language: Python, License: Apache-2.0, Stars: 12

Pengi

An Audio Language model for Audio Tasks

Language: Python, License: MIT, Stars: 282

meeteval

MeetEval - A meeting transcription evaluation toolkit

Language: Python, License: MIT, Stars: 75

chime-utils

Scripts for data generation, scoring and data manifest preparation for CHiME-8 DASR task.

Language: Python, License: MIT, Stars: 20

NOTSOFAR1-Challenge

NOTSOFAR-1 Challenge: Distant Diarization and ASR

Language: Python, License: MIT, Stars: 42

Qwen-Audio

The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.

Language: Python, License: NOASSERTION, Stars: 1418