wgansir

wgansir's starred repositories

AI-Job-Notes

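Job-hunting guide for AI algorithm positions (covers preparation strategies, a coding-practice guide, referrals, a list of AI companies, and other materials)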

Stargazers: 4965 | Issues: 0

SenseVoice

Multilingual Voice Understanding Model

Language: Python | License: NOASSERTION | Stargazers: 1575 | Issues: 0

speech-dataset-generator

🔊 Create labeled datasets, enhance audio quality, identify speakers, support diverse dataset types. 🎧👥📊 Advanced audio processing.

Language: Python | License: MIT | Stargazers: 171 | Issues: 0
Language: Python | License: NOASSERTION | Stargazers: 276 | Issues: 0

CNNDetection

Code for the paper: CNN-generated images are surprisingly easy to spot... for now https://peterwang512.github.io/CNNDetection/

Language: Python | License: NOASSERTION | Stargazers: 804 | Issues: 0

AIGCDetectBaseline

AIGCDetectBaseline

Language: Python | Stargazers: 10 | Issues: 0

MeloTTS

High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean.

Language: Python | License: MIT | Stargazers: 4115 | Issues: 0

whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Language: Python | License: BSD-2-Clause | Stargazers: 10204 | Issues: 0

demucs

Code for the paper Hybrid Spectrogram and Waveform Source Separation

Language: Python | License: MIT | Stargazers: 7965 | Issues: 0

sahi

Framework agnostic sliced/tiled inference + interactive ui + error analysis plots

Language: Python | License: MIT | Stargazers: 3831 | Issues: 0

detectree2

Python package for automatic tree crown delineation based on the Detectron2 implementation of Mask R-CNN

Language: Jupyter Notebook | License: MIT | Stargazers: 149 | Issues: 0

supervoice-separate

Supervoice Speaker Separation Network

Language: Jupyter Notebook | Stargazers: 12 | Issues: 0

USIS10K

[ICML 2024] Official repository of the paper: "Diving into Underwater: Segment Anything Model Guided Underwater Salient Instance Segmentation and A Large-scale Dataset"

Language: Python | License: Apache-2.0 | Stargazers: 64 | Issues: 0

pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Language: Jupyter Notebook | License: MIT | Stargazers: 5584 | Issues: 0

Bert-VITS2

vits2 backbone with multilingual-bert

Language: Python | License: AGPL-3.0 | Stargazers: 7551 | Issues: 0

Variations-of-SFANet-for-Crowd-Counting

The official implementation of "Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting"

Language: Jupyter Notebook | License: GPL-3.0 | Stargazers: 108 | Issues: 0

Rethinking-Counting

[CVPR 2022] Rethinking Spatial Invariance of Convolutional Networks for Object Counting

Language: Python | Stargazers: 58 | Issues: 0

neural-style-pytorch

Neural Style implementation in PyTorch! :art:

Language: Jupyter Notebook | Stargazers: 64 | Issues: 0

neural-style-pytorch

A fast PyTorch implementation of "A Neural Algorithm of Artistic Style"

Language: Python | Stargazers: 5 | Issues: 0
Language: Python | Stargazers: 176 | Issues: 0
Language: Python | License: MIT | Stargazers: 36 | Issues: 0

CPIAD

Grid Patch Attack for Object Detection

Language: Python | Stargazers: 42 | Issues: 0

RepRTADet

Implementation of paper - Rep-RTADet: Reparameterized Real-Time Algae Object Detectors Enhanced through Dynamic Cache-Based Poisson Fusion

Language: Python | License: AGPL-3.0 | Stargazers: 7 | Issues: 0

datasketch

MinHash, LSH, LSH Forest, Weighted MinHash, HyperLogLog, HyperLogLog++, LSH Ensemble and HNSW

Language: Python | License: MIT | Stargazers: 2466 | Issues: 0

GFPGAN

GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.

Language: Python | License: NOASSERTION | Stargazers: 35168 | Issues: 0

denoiser

Real Time Speech Enhancement in the Waveform Domain (Interspeech 2020). We provide a PyTorch implementation of the paper Real Time Speech Enhancement in the Waveform Domain, in which we present a causal speech enhancement model that works on the raw waveform and runs in real time on a laptop CPU. The proposed model is based on an encoder-decoder architecture with skip connections. It is optimized in both the time and frequency domains, using multiple loss functions. Empirical evidence shows that it is capable of removing various kinds of background noise, including stationary and non-stationary noise, as well as room reverb. Additionally, we suggest a set of data augmentation techniques, applied directly to the raw waveform, that further improve model performance and its generalization abilities.

Language: Python | License: NOASSERTION | Stargazers: 1618 | Issues: 0

DragGAN

Official Code for DragGAN (SIGGRAPH 2023)

Language: Python | License: NOASSERTION | Stargazers: 35603 | Issues: 0

StyleNeRF

This is the open source implementation of the ICLR2022 paper "StyleNeRF: A Style-based 3D-Aware Generator for High-resolution Image Synthesis"

Language: Python | Stargazers: 952 | Issues: 0

google-images-download

Python Script to download hundreds of images from 'Google Images'. It is a ready-to-run code!

Language: Python | License: MIT | Stargazers: 356 | Issues: 0

stable-diffusion-webui

Stable Diffusion web UI

Language: Python | License: AGPL-3.0 | Stargazers: 136580 | Issues: 0