Jeff-Zilence's starred repositories
latent-diffusion
High-Resolution Image Synthesis with Latent Diffusion Models
Awesome-Transformer-Attention
A comprehensive paper list on Vision Transformer/Attention, including papers, code, and related websites
poolformer
PoolFormer: MetaFormer Is Actually What You Need for Vision (CVPR 2022 Oral)
InternVideo
Video Foundation Models & Data for Multimodal Understanding
Video-ChatGPT
[ACL 2024 🔥] Video-ChatGPT is a video conversation model capable of generating meaningful conversation about videos. It combines the capabilities of LLMs with a pretrained visual encoder adapted for spatiotemporal video representation. We also introduce a rigorous 'Quantitative Evaluation Benchmarking' for video-based conversational models.
Paint-by-Example
Paint by Example: Exemplar-based Image Editing with Diffusion Models
VideoMAEv2
[CVPR 2023] VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
lamar-benchmark
Source code for the ECCV 2022 paper "Benchmarking Localization and Mapping for Augmented Reality".
deep-visual-geo-localization-benchmark
Official code for CVPR 2022 (Oral) paper "Deep Visual Geo-localization Benchmark"
TransGeo2022
Official repository for TransGeo: Transformer Is All You Need for Cross-view Image Geo-localization
mapillary_sls
Mapillary Street-level Sequences Dataset
SSHarmonization
[ICCV'2021] "SSH: A Self-Supervised Framework for Image Harmonization", Yifan Jiang, He Zhang, Jianming Zhang, Yilin Wang, Zhe Lin, Kalyan Sunkavalli, Simon Chen, Sohrab Amirghodsi, Sarah Kong, Zhangyang Wang
CrossViewMetricLocalization
ECCV2022: Visual Cross-View Metric Localization with Dense Uncertainty Estimates
Vision-DiffMask
Official PyTorch implementation of Vision DiffMask, a post-hoc interpretation method for vision models.
instructpix2pix-sdxl
Training InstructPix2Pix with SDXL.
Street-to-Satellite_Image_Matching
Street-to-Satellite Image Matching thesis at the Intelligent Vehicles Group of the TU Delft.