chwlsunny's repositories
ast
Code for the Interspeech 2021 paper "AST: Audio Spectrogram Transformer".
bottom-up-attention
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
Computer_Vision_primer
Introduction to computer vision
CPlusPlusThings
All about C++
d2-net
D2-Net: A Trainable CNN for Joint Description and Detection of Local Features
DeepLearning-500-questions
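Deep Learning 500 Questions: a question-and-answer treatment of commonly used topics in probability, linear algebra, machine learning, deep learning, computer vision, and other popular areas, written to help the author and any readers who need it. The book has 18 chapters and more than 500,000 characters. Given the author's limited expertise, readers are kindly asked to point out any shortcomings. To be continued... For collaboration, contact scutjy2015@163.com. All rights reserved; infringement will be pursued. Tan 2018.06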
deeplearningbook-chinese
Deep Learning Book Chinese Translation
DF-GAN
Deep Fusion Generative Adversarial Networks for Text-to-Image Synthesis
MatchZoo-py
Facilitating the design, comparison and sharing of deep text matching models.
mcan-vqa
Deep Modular Co-Attention Networks for Visual Question Answering
Multi-Source-Sound-Localization
This repo aims to perform sound localization in complex audiovisual scenes, where multiple objects are making sounds.
nvim-config
My custom Neovim configuration with full support for Python, Markdown, LaTeX and more...
openvqa
A lightweight, scalable, and general framework for visual question answering (VQA) research
pytorch-cnn-visualizations
PyTorch implementation of convolutional neural network visualization techniques
PyTorch-GAN
PyTorch implementations of Generative Adversarial Networks.
pytorch-grad-cam
PyTorch implementation of Grad-CAM
PyTorchTricks
Some PyTorch tricks...
ResDAVEnet-VQ
Official code for the paper "Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech"
ros_exploring
Source code for the book "ROS Robot Development Practice" (《ROS机器人开发实践》)
rubi.bootstrap.pytorch
RUBi: Reducing Unimodal Biases for Visual Question Answering
Semantics-AssistedVideoCaptioning
Source code for Semantics-Assisted Video Captioning Model Trained with Scheduled Sampling Strategy
show-control-and-tell
Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019
speaksee
PyTorch library for Visual-Semantic tasks
speech2image
Neural network implementation of a speech to image system. Networks are trained to embed images and corresponding captions to the same vector space.
Up-Down-Captioner
Automatic image captioning model based on Caffe, using features from bottom-up attention.
voice_datasets
🔊 A comprehensive list of open-source datasets for voice and sound computing (40+ datasets).
vqa_lol
Visual Reasoning :
vse_infty
Code for "Learning the Best Pooling Strategy for Visual Semantic Embedding", CVPR 2021
VSRN
PyTorch code for ICCV'19 paper "Visual Semantic Reasoning for Image-Text Matching"