OmkarThawakar

Omkar Thawakar's starred repositories

DiffusionDet

[ICCV2023 Best Paper Finalist] PyTorch implementation of DiffusionDet (https://arxiv.org/abs/2211.09788)

Language: Python, License: NOASSERTION, 2034 17 112

multimodal-prompt-learning

[CVPR 2023] Official repository of paper titled "MaPLe: Multi-modal Prompt Learning".

Language: Python, License: MIT, 572 6 75

object-centric-ovd

[NeurIPS 2022] Official repository of paper titled "Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection".

Language: Jupyter Notebook, License: Apache-2.0, 283 5 23

ViFi-CLIP

[CVPR 2023] Official repository of paper titled "Fine-tuned CLIP models are efficient video learners".

Language: Python, License: MIT, 230 9 21

OW-DETR

[CVPR 2022] Official PyTorch code for OW-DETR: Open-world Detection Transformer

Language: Python, 227 6 62

Handwriting-Transformers

Handwriting-Transformers (ICCV21)

Language: Python, License: MIT, 164 10 27

BIPNet

[CVPR 2022 Oral, Best Paper Finalist] Burst Image Restoration and Enhancement. SOTA for Burst Super-resolution, Low-light Burst Image Enhancement, and Burst Image Denoising

Language: Python, 132 7 20

Open-World-Tracking

Official code for "Opening up Open World Tracking" (CVPR 2022)

Language: Python, 53 5 11

doodleformer

DoodleFormer: Creative Sketch Drawing with Transformers (ECCV22)

Language: Python, 24 3 1

data_science_interview

Interview questions asked in Data Science/Machine Learning interviews

19 50

Abstract. Person search is a challenging problem with various real-world applications that aims to jointly detect and re-identify a query person in uncropped gallery images. Although previous studies focus on learning rich feature information, it is still hard to retrieve the query person because of appearance deformations and background distractors. In this paper, we propose a novel attention-aware relation mixer (ARM) module for person search, which exploits the global relation between different local regions within the RoI of a person and makes it robust against various appearance deformations and occlusion. The proposed ARM is composed of a relation mixer block and a spatio-channel attention layer. The relation mixer block introduces a spatially attended spatial mixing and a channel-wise attended channel mixing to effectively capture discriminative relation features within an RoI. These discriminative relation features are further enriched by a spatio-channel attention layer, in which foreground and background discriminability is strengthened in a joint spatio-channel space. Our ARM module is generic and does not rely on fine-grained supervision or topological assumptions, so it can be easily integrated into any Faster R-CNN based person search method. Comprehensive experiments are performed on two challenging benchmark datasets: CUHK-SYSU [1] and PRW [2]. Our PS-ARM achieves state-of-the-art performance on both datasets. On the challenging PRW dataset, our PS-ARM achieves an absolute gain of 5% in the mAP score over SeqNet, while operating at a comparable speed.

Language: Python, License: MIT, 13 4 1