Beast code in Giters

wgansir's starred repositories

AI-Job-Notes

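Job-hunting guide for AI algorithm positions (covering preparation strategies, a practice-problem guide, internal referrals, a list of AI companies, and other materials)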

SenseVoice

Multilingual Voice Understanding Model

Language: Python | License: NOASSERTION | 157500

speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.

Language: Python | License: MIT | 17100

CNNDetection

Code for the paper: CNN-generated images are surprisingly easy to spot... for now https://peterwang512.github.io/CNNDetection/

Language: Python | License: NOASSERTION | 80400

MeloTTS

High-quality multilingual text-to-speech library by MyShell.ai. Supports English, Spanish, French, Chinese, Japanese, and Korean.

Language: Python | License: MIT | 411500

whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Language: Python | License: BSD-2-Clause | 1020400

demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Language: Python | License: MIT | 796500

sahi

Framework-agnostic sliced/tiled inference + interactive UI + error analysis plots

Language: Python | License: MIT | 383100

detectree2

Python package for automatic tree crown delineation based on the Detectron2 implementation of Mask R-CNN

Language: Jupyter Notebook | License: MIT | 14900

supervoice-separate

Supervoice Speaker Separation Network

Language: Jupyter Notebook | 1200

USIS10K

[ICML 2024] Official repository of the paper: "Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset"

Language: Python | License: Apache-2.0 | 6400

pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Language: Jupyter Notebook | License: MIT | 558400

Bert-VITS2

vits2 backbone with multilingual-bert

Language: Python | License: AGPL-3.0 | 755100

Variations-of-SFANet-for-Crowd-Counting

The official implementation of "Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting"

Language: Jupyter Notebook | License: GPL-3.0 | 10800

Rethinking-Counting

[CVPR 2022] Rethinking Spatial Invariance of Convolutional Networks for Object Counting

Language: Python | 5800

neural-style-pytorch

Neural Style implementation in PyTorch! 🎨

Language: Jupyter Notebook | 6400

neural-style-pytorch

A fast PyTorch implementation of "A Neural Algorithm of Artistic Style"

Language: Python | 500

CPIAD

Grid Patch Attack for Object Detection

Language: Python | 4200

RepRTADet

Implementation of paper - Rep-RTADet: Reparameterized Real-Time Algae Object Detectors Enhanced through Dynamic Cache-Based Poisson Fusion

Language: Python | License: AGPL-3.0 | 700

datasketch

MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW

Language: Python | License: MIT | 246600

GFPGAN

GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.

Language: Python | License: NOASSERTION | 3516800

denoiser

Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it can remove various kinds of background noise, both stationary and non-stationary, as well as room reverb. Additionally, we suggest a set of data augmentation techniques, applied directly to the raw waveform, that further improve model performance and generalization.

Language: Python | License: NOASSERTION | 161800

wgansir's starred repositories

AI-Job-Notes

SenseVoice

speech-dataset-generator

GenImage

CNNDetection

AIGCDetectBaseline

MeloTTS

whisperX

demucs

sahi

detectree2

supervoice-separate

USIS10K

pyannote-audio

Bert-VITS2

Variations-of-SFANet-for-Crowd-Counting

Rethinking-Counting

neural-style-pytorch

neural-style-pytorch

pmf_cvpr22

CAML

CPIAD

RepRTADet

datasketch

GFPGAN

denoiser

DragGAN

StyleNeRF

google-images-download

stable-diffusion-webui