hzhang57

hzhang57's repositories

AdaFocusV2

000

AS-MLP

This is an official implementation for "AS-MLP: An Axial Shifted MLP Architecture for Vision".

MIT | 000

awesome-attention-mechanism-in-cv

:punch: Commonly used attention modules in CV; plug-and-play modules; ViT models. PyTorch Implementation Collection of Attention Module and Plug&Play Module

Language: Python | MIT | 010

awesome-hand-pose-estimation

Awesome work on hand pose estimation/tracking

Language: Python | 010

CAA

CAA: Channelized Axial Attention for Semantic Segmentation

000

CMT_CNN-meet-Vision-Transformer

A PyTorch implementation of CMT based on paper CMT: Convolutional Neural Networks Meet Vision Transformers.

MIT | 000

Compact-Transformers

[Preprint] Escaping the Big Data Paradigm with Compact Transformers, 2021

Apache-2.0 | 000

ConvNeXt

Code release for ConvNeXt model

Language: Python | MIT | 010

Convolutional-MLPs

[Preprint] ConvMLP: Hierarchical Convolutional MLPs for Vision, 2021

Apache-2.0 | 000

deepvecfont

[SIGGRAPH Asia 2021] DeepVecFont: Synthesizing High-quality Vector Fonts via Dual-modality Learning

MIT | 000

DynamicViT

[NeurIPS 2021] DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification

MIT | 000

ffcv

FFCV: Fast Forward Computer Vision (and other ML workloads!)

Language: Python | Apache-2.0 | 010

GFNet

[NeurIPS 2021] Global Filter Networks for Image Classification

Language: Jupyter Notebook | MIT | 010

GrabNet

GrabNet: A Generative model to generate realistic 3D hands grasping unseen objects (ECCV2020)

NOASSERTION | 000

Hand3DResearch

Language: Python | 010

how-do-vits-work

(ICLR 2022 Spotlight) Official PyTorch implementation of "How Do Vision Transformers Work?"

Language: Python | Apache-2.0 | 010

img2dataset

Easily turn large sets of image URLs into an image dataset. Can download, resize and package 100M URLs in 20h on one machine.

MIT | 000

LLVIP

LLVIP: A Visible-infrared Paired Dataset for Low-light Vision

000

MotionSqueeze

Official PyTorch Implementation of MotionSqueeze, ECCV 2020

BSD-2-Clause | 000

MoViNet-pytorch

MoViNets PyTorch implementation: Mobile Video Networks for Efficient Video Recognition

MIT | 000

MOW

Language: Python | 010

MSG-Transformer

MSG-Transformer: Exchanging Local Spatial Information by Manipulating Messenger Tokens

Apache-2.0 | 000

MultiBench

[NeurIPS 2021] Multiscale Benchmarks for Multimodal Representation Learning

MIT | 000

PASS

The PASS dataset: pretrained models and how to get the data

MIT | 000

poster_template

Some academic posters as references. May we have in-person poster sessions soon!

000

SlowFast

PySlowFast: video understanding codebase from FAIR for reproducing state-of-the-art video models.

Apache-2.0 | 000

SPACH

MIT | 000

temporal-adaptive-module

TAM: Temporal Adaptive Module for Video Recognition

Apache-2.0 | 000

vidaug

Effective Video Augmentation Techniques for Training Convolutional Neural Networks

MIT | 000

Visformer

Language: Python | 000