dogvane / CVPR2023-Papers-with-Code

CVPR 2023 papers and open-source projects collection (edition with titles translated into Chinese)

CVPR 2023 Papers and Open-Source Projects Collection (Papers with Code)

25.78% = 2360 / 9155

CVPR 2023 decisions are now available on OpenReview! This year, we received a record number of 9155 submissions (a 12% increase over CVPR 2022), and accepted 2360 papers, for a 25.78% acceptance rate.
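The acceptance rate quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the two counts are the ones stated in the announcement, and the variable names are assumptions introduced here.

```python
# Counts taken from the CVPR 2023 announcement quoted above.
submissions = 9155
accepted = 2360

# Acceptance rate = accepted papers / total submissions.
acceptance_rate = accepted / submissions

# Papers that were not accepted (simple difference of the two counts).
not_accepted = submissions - accepted

# Format as a percentage with two decimals, matching the announcement's precision.
rate_text = f"{acceptance_rate:.2%}"
print(rate_text, not_accepted)
```

Rounded to two decimals, the ratio matches the 25.78% figure stated in the announcement.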

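Note 0: This project is derived from https://github.com/amusi/CVPR2023-Papers-with-Code. The titles from the original were converted to Chinese with a translation tool and have not been revised; they are for reference only.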

Note 1: Everyone is welcome to submit issues and share CVPR 2023 papers and open-source projects!

Note 2: For CV top-conference papers from previous years, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision

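If you want to keep up with the latest and best CV papers, open-source projects, and learning materials, scan the QR code to join the CVer academic exchange group! Learn from each other and improve together.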

CVPR 2023 Open-Source Papers Directory

Backbone

Integrally Pre-Trained Transformer Pyramid Networks

Stitchable Neural Networks 缝合神经网络

Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks 跑步,不走路:追逐更高的FLOPS,为更快的神经网络加速

BiFormer: Vision Transformer with Bi-Level Routing Attention BiFormer: 具有双层路由注意力机制的视觉变分器

DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network 深度MAD: 深度卷积神经网络的数学架构设计

Vision Transformer with Super Token Sampling 具有超令牌采样的视觉变压器

Hard Patches Mining for Masked Image Modeling 硬质补丁挖掘用于遮罩图像建模

  • Paper: None
  • Code: None

SMPConv: Self-moving Point Representations for Continuous Convolution SMPConv: 连续卷积的自移动点表示

Making Vision Transformers Efficient from A Token Sparsification View 从令牌稀疏化的角度优化视觉变压器

CLIP

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis GALIP:用于文本到图像生成的生成对抗网络

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation ΔEdit:探索无需文本驱动的图像编辑

MAE

Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders

Generic-to-Specific Distillation of Masked Autoencoders 隐藏的自动编码器泛化到特定

GAN

DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation ΔEdit:探索无文本训练,用于文本驱动的图像操作

NeRF

NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior NoPe-NeRF: 优化神经辐射场,无需姿态先验

Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures 潜在的NeRF用于形状引导生成3D形状和纹理

NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis 手掌中的NeRF:通过新颖的视图合成为机器人提供校正增强

Panoptic Lifting for 3D Scene Understanding with Neural Fields 带有神经场的3D场景理解中的全景提升

NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer NeRFLiX: 由学习退化驱动的交互式多视角器

HNeRV: A Hybrid Neural Representation for Videos HNeRV:一种用于视频的混合神经表示

DETR

DETRs with Hybrid Matching

Prompt

Diversity-Aware Meta Visual Prompting 多样性感知的元视觉提示

NAS

PA&DA: Jointly Sampling PAth and DAta for Consistent NAS PA&DA:联合采样PA路径和DAta以实现一致的NAS

Avatars

Structured 3D Features for Reconstructing Relightable and Animatable Avatars 结构化的3D特征用于重建可重用和可动画化角色

Learning Personalized High Quality Volumetric Head Avatars from Monocular RGB Videos 从单目RGB视频中学取个性化高质量体积头戴式表盘

ReID (Re-Identification)

Clothing-Change Feature Augmentation for Person Re-Identification 服装变更特征增强用于人员重新识别

  • Paper: None
  • Code: None

MSINet: Twins Contrastive Search of Multi-Scale Interaction for Object ReID MSINet:多尺度交互对比搜索对象的重新识别

Shape-Erased Feature Learning for Visible-Infrared Person Re-Identification 形状消去特征学习可视红外人体重新识别

Large-scale Training Data Search for Object Re-identification 大规模训练数据搜索用于对象识别

Diffusion Models

Video Probabilistic Diffusion Models in Projected Latent Space

Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models 使用预训练的2D扩散模型解决3D反演问题

Imagic: Text-Based Real Image Editing with Diffusion Models 异想天开:基于扩散模型的文本驱动式图像编辑

Parallel Diffusion Models of Operator and Image for Blind Inverse Problems 盲目问题中的算子异质扩散模型

DiffRF: Rendering-guided 3D Radiance Field Diffusion DiffRF: 基于渲染的3D辐射场扩散

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation MM-Diffusion:学习联合音频和视频生成的多模态扩散模型

HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising HouseDiffusion:通过具有离散和连续去噪的扩散模型生成矢量 floorplan。

TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets TrojDiff:针对具有多样目标的扩散模型的 Trojan 攻击

Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption 回到原文:基于扩散的适应性测试时间损坏

DR2: Diffusion-based Robust Degradation Remover for Blind Face Restoration DR2:基于扩散的盲人脸修复增强去除器

Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion 跟踪与速度:通过引导轨迹扩散可控制的行人动画

Generative Diffusion Prior for Unified Image Restoration and Enhancement 生成对抗扩散优先用于统一图像恢复和增强

Conditional Image-to-Video Generation with Latent Flow Diffusion Models 条件图像到视频生成:潜在流动扩散模型

Long-Tail

Long-Tailed Visual Recognition via Self-Heterogeneous Integration with Knowledge Excavation 长尾视觉识别通过自下而上的异质整合与知识挖掘

Vision Transformer

Integrally Pre-Trained Transformer Pyramid Networks

Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors Mask3D: 使用遮罩3D生成器进行预训练的2D视觉变压器

Learning Trajectory-Aware Transformer for Video Super-Resolution 学习轨迹感知的变分自注意力器用于视频超分辨率

Vision Transformers are Parameter-Efficient Audio-Visual Learners 视觉变分器是一种参数高效的音频-视觉学习者

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes 我们所处位置以及我们关注的焦点:基于层次和场景的全世界图像地理定位查询

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets DSVT:动态稀疏 voxel 转换器 with 旋转集

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting DeepSolo:为文本检测提供显式点生成器

BiFormer: Vision Transformer with Bi-Level Routing Attention BiFormer: 具有双层路由注意力机制的视觉Transformer

Vision Transformer with Super Token Sampling 具有超令牌采样的Transformer视觉模型

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision BEVFormer v2:通过视角监督调整现代图像骨干网络以实现 bird's-eye-view 识别

BAEFormer: Bi-directional and Early Interaction Transformers for Bird’s Eye View Semantic Segmentation BAEFormer: 双向和早期交互变压器用于鸟瞰式语义分割

  • Paper: None
  • Code: None

Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention 视觉依赖变体:依赖树从反转注意力中产生

Making Vision Transformers Efficient from A Token Sparsification View 从令牌稀疏化的角度优化视觉Transformer

Vision-Language

GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods GIVL: 改进视觉语言模型在地理包容性方面的表现,通过预训练方法

Teaching Structured Vision&Language Concepts to Vision&Language Models 教授结构化视觉与语言概念给视觉与语言模型

Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks Uni-Perceiver v2:大规模视觉和视觉语言任务的泛模型

Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training 向可扩展性视频 moment 检索:视觉动态注入到图像文本预训练

CapDet: Unifying Dense Captioning and Open-World Detection Pretraining 捕获目标检测:统一稠密标注和开放世界检测预训练

FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks FAME-ViL:多任务视觉语言模型,用于异质时尚任务

Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding 元探索:使用场景对象谱聚类进行探究式层次视觉语言导航

All in One: Exploring Unified Video-Language Pre-training 万物皆可统一:探索统一视频语言预训练

Position-guided Text Prompt for Vision Language Pre-training 面向位置的文本提示,用于视觉语言预训练

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding EDA:显式文本解耦和稠密对齐 3D 视觉定位

Align and Attend: Multimodal Summarization with Dual Contrastive Losses 对齐和参加:带有双对比损失的多元摘要

Multi-Modal Representation Learning with Text-Driven Soft Masks 多模态表示学习与文本驱动的软掩码

Learning to Name Classes for Vision and Language Models 学习为视觉和语言模型命名类别

Object Detection

YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors YOLOv7: 训练有素的教学套件为实时物体检测创造了新境界

DETRs with Hybrid Matching

Enhanced Training of Query-Based Object Detection via Selective Query Recollection 基于选择性查询回调的查询为基础对象检测的增强训练

Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection 面向对象的蒸馏 Pyramid 模型 Open-Vocabulary 对象检测

Object Tracking

Simple Cues Lead to a Strong Multi-Object Tracker 简单的提示导致一个强大的多目标跟踪器

Joint Visual Grounding and Tracking with Natural Language Specification 联合视觉基础和跟踪与自然语言规格

Semantic Segmentation

Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos 高效的语义分割通过改变压缩视频的分辨率来实现

FREDOM: Fairness Domain Adaptation Approach to Semantic Scene Understanding 自由:公平领域自适应方法用于语义场景理解

Medical Image Segmentation

Label-Free Liver Tumor Segmentation 标签无肝肿瘤分割

Directional Connectivity-based Segmentation of Medical Images 基于方向连接的医学图像分割

Bidirectional Copy-Paste for Semi-Supervised Medical Image Segmentation 双向复制粘贴半监督医学图像分割

Devil is in the Queries: Advancing Mask Transformers for Real-world Medical Image Segmentation and Out-of-Distribution Localization 恶魔在查询中:用于真实医学图像分割和异构局部化的先进面具变压器

Fair Federated Medical Image Segmentation via Client Contribution Estimation 公平联合医学图像分割通过客户端贡献估计

Ambiguous Medical Image Segmentation using Diffusion Models 模糊医学图像分割使用扩散模型

Orthogonal Annotation Benefits Barely-supervised Medical Image Segmentation 线性注释:几乎没有监督的医学图像分割

MagicNet: Semi-Supervised Multi-Organ Segmentation via Magic-Cube Partition and Recovery MagicNet: 半监督多器官分割通过魔方划分和恢复

MCF: Mutual Correction Framework for Semi-Supervised Medical Image Segmentation MCF: 半监督医疗图像分割相互校正框架

Rethinking Few-Shot Medical Segmentation: A Vector Quantization View 重新思考少量样本医疗分割:一个向量化视角

Pseudo-label Guided Contrastive Learning for Semi-supervised Medical Image Segmentation 伪标签引导对比学习半监督医学图像分割

SDC-UDA: Volumetric Unsupervised Domain Adaptation Framework for Slice-Direction Continuous Cross-Modality Medical Image Segmentation SDC-UDA:用于医学图像分割的卷积无监督领域自适应框架

DoNet: Deep De-overlapping Network for Cytology Instance Segmentation DoNet: 深度重叠网络用于细胞学实例分割

Video Object Segmentation

Two-shot Video Object Segmentation 双击视频对象分割

Video Instance Segmentation

Mask-Free Video Instance Segmentation 无 mask 视频实例分割

Referring Image Segmentation

PolyFormer: Referring Image Segmentation as Sequential Polygon Generation PolyFormer:将图像分割视为序列多边形生成

3D Point Cloud

Physical-World Optical Adversarial Attacks on 3D Face Recognition 物理世界对3D人脸识别的光学对抗攻击

IterativePFN: True Iterative Point Cloud Filtering 迭代PFN:真实的迭代点云过滤

Attention-based Point Cloud Edge Sampling 基于注意力的点云边缘采样

3D Object Detection

DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets DSVT:动态稀疏立方体变换器

FrustumFormer: Adaptive Instance-aware Resampling for Multi-view 3D Detection FrustumFormer: 适应性实例感测多视角3D检测

3D Video Object Detection with Learnable Object-Centric Global Optimization 3D 视频对象检测与可学习的目标中心全局优化

  • Paper: None
  • Code: None

Hierarchical Supervision and Shuffle Data Augmentation for 3D Semi-Supervised Object Detection 分层监督和随机数据增强对于3D半监督目标检测

3D Semantic Segmentation

Less is More: Reducing Task and Model Complexity for 3D Point Cloud Semantic Segmentation 少即是多:为3D点云语义分割减少任务和模型复杂性

3D Semantic Scene Completion

3D Registration

Robust Outlier Rejection for 3D Registration with Variational Bayes 稳健的异常排斥对于3D配准使用变分贝叶斯

3D Human Pose Estimation

3D Human Mesh Estimation

3D Human Mesh Estimation from Virtual Markers 3D 人体网格估计从虚拟标记

Low-level Vision

Causal-IR: Learning Distortion Invariant Representation for Image Restoration from A Causality Perspective 因果-IR:从因果视角学习图像恢复的失真不变表示

Burstormer: Burst Image Restoration and Enhancement Transformer 暴风雨:暴风雨图像恢复和增强变压器

Super-Resolution

Super-Resolution Neural Operator 超级分辨率神经算子

Video Super-Resolution

Learning Trajectory-Aware Transformer for Video Super-Resolution 学习轨迹感知的变分自注意力器用于视频超分辨率

Denoising

Image Denoising

Masked Image Training for Generalizable Deep Image Denoising 隐藏图像训练用于一般深度图像去噪

Image Generation

GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis GALIP: 生成对抗网络CLIPs用于文本到图像合成

MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis MAGE:要求生成对抗网络统一表示学习和图像生成

Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation 面向可验证和可重复的人类评估文本到图像生成

Few-shot Semantic Image Synthesis with Class Affinity Transfer 少样本语义图像生成与类别迁移

TopNet: Transformer-based Object Placement Network for Image Compositing 顶级网络:基于Transformer的图像合成物体放置网络

Video Generation

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation MM-Diffusion:学习联合音频和视频生成的多模态扩散模型

Conditional Image-to-Video Generation with Latent Flow Diffusion Models 条件图像到视频生成与潜在流动扩散模型

Video Understanding

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge 从自然脚本知识中学习可转移的时空表示

Frame Flexible Network 框架灵活网络

Masked Motion Encoding for Self-Supervised Video Representation Learning 遮蔽运动编码用于自监督视频表示学习

MARLIN: Masked Autoencoder for facial video Representation LearnING 玛琳:面部视频表征的遮蔽自动编码器学习

Action Detection

TriDet: Temporal Action Detection with Relative Boundary Modeling TriDet: 基于相对边界模型的时间动作检测

Text Detection

DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting DeepSolo:带有显式点对文本检测的Transformer解码器

Knowledge Distillation

Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation 在对抗性数据无监督知识蒸馏中学会保留:对抗分布漂移的防御

Generic-to-Specific Distillation of Masked Autoencoders 隐藏变量自动编码器的泛化到特定蒸馏

Model Pruning

DepGraph: Towards Any Structural Pruning DepGraph:迈向任何结构剪枝

Image Compression

Context-Based Trit-Plane Coding for Progressive Image Compression 基于内容的渐进图像压缩

Anomaly Detection

Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images 深度特征回填在X射线图像的无监督异常检测中的应用

3D Reconstruction

OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields OReX:从平面截面中使用神经场进行物体重构

SparsePose: Sparse-View Camera Pose Regression and Refinement 稀疏姿态:稀疏视图相机姿态回归和细化

NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction 神经可塑锚:用于高精度隐式表面重构的高维神经网络

Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition Vid2Avatar:通过自监督场景分解从野外视频重建3D Avatar

To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision 是否符合:基于模型的面部重建和遮盖分割弱监督

Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction 结构多层图像:连接神经视图合成和3D重建

3D Cinemagraphy from a Single Image 单张图像的3D电影短片

Revisiting Rotation Averaging: Uncertainties and Robust Losses 重新回顾旋转平均:不确定的性和鲁棒损失

FFHQ-UV: Normalized Facial UV-Texture Dataset for 3D Face Reconstruction FFHQ-UV: 用于3D人脸重建的标准化面部UV纹理数据集

A Hierarchical Representation Network for Accurate and Detailed Face Reconstruction from In-The-Wild Images 一种从野外图像中进行准确详细人脸重建的层次表示网络

Depth Estimation

Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation 轻量级单通道深度估计:一个用于自监督单通道深度估计的轻量级CNN和Transformer架构

Trajectory Prediction

IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction IPCC-TP:联合多智能体轨迹预测的增量Pearson相关系数

EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning 均衡多智能体运动预测与不变交互推理

Lane Detection

Anchor3DLane: Learning to Regress 3D Anchors for Monocular 3D Lane Detection 锚3D通道:学习为单目3D车道检测训练3D锚点

BEV-LaneDet: An Efficient 3D Lane Detection Based on Virtual Camera via Key-Points BEV-LaneDet:基于虚拟相机的关键点高效3D车道检测

Image Captioning

ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing ConZIC:基于采样的可控制零样本图像标题识别

Cross-Domain Image Captioning with Discriminative Finetuning 跨域图像标题识别与有监督微调

Model-Agnostic Gender Debiased Image Captioning 模型无关的性别偏置图像标题生成

Visual Question Answering

MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering 混合PHM:针对低资源视觉问答的冗余性感知参数高效调整

Sign Language Recognition

Continuous Sign Language Recognition with Correlation Network 连续手语识别与相关网络

  • Paper: https://arxiv.org/abs/2303.03202
  • Code: https://github.com/hulianyuyy/CorrNet

Video Prediction

MOSO: Decomposing MOtion, Scene and Object for Video Prediction MOSO: 分解运动、场景和对象,用于视频预测

Novel View Synthesis

3D Video Loops from Asynchronous Input

Zero-Shot Learning

Bi-directional Distribution Alignment for Transductive Zero-Shot Learning 双向分布对齐对于导电零样本学习

Semantic Prompt for Few-Shot Learning 少样本学习语义提示

  • Paper: None
  • Code: None

Stereo Matching

Iterative Geometry Encoding Volume for Stereo Matching 迭代几何编码体积用于立体匹配

Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation 联合差异和不确定性估计中,学习立体匹配中错误分布

Feature Matching

Adaptive Spot-Guided Transformer for Consistent Local Feature Matching 适应性空间引导的Transformer模型用于一致的局部特征匹配

Scene Graph Generation

Prototype-based Embedding Network for Scene Graph Generation 基于原型表征的网络用于场景图生成

Implicit Neural Representations

Polynomial Implicit Neural Representations For Large Diverse Datasets 多项式隐式神经表示对于大规模多样化数据集

Image Quality Assessment

Re-IQA: Unsupervised Learning for Image Quality Assessment in the Wild 重新计算:在野外进行图像质量评估的无监督学习

Datasets

Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes 人类艺术:一个将自然和人工场景结合在一起的多功能人类中心化数据集

Align and Attend: Multimodal Summarization with Dual Contrastive Losses 对齐和参加:具有双对比损失的多模态摘要

GeoNet: Benchmarking Unsupervised Adaptation across Geographies 地理网络:地理环境中的无监督适应性基准研究

CelebV-Text: A Large-Scale Facial Text-Video Dataset 赛博-文本:大规模面部文本-视频数据集

Others

Interactive Segmentation as Gaussian Process Classification 交互式分割作为高斯过程分类

Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger 针对深度图像压缩的暗网攻击通过自适应频率触发

SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries SplineCam:深度网络几何和决策边界精确可视化和表征

SCOTCH and SODA: A Transformer Video Shadow Detection Framework SCOTCH 和 SODA:一种用于视频阴影检测的 Transformer 框架

DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization DeepMapping2: 深度学习自监督大规模LiDAR地图优化

RelightableHands: Efficient Neural Relighting of Articulated Hand Models 可重新调整的手:高效的手部关节模型重新定位

Token Turing Machines 令牌图灵机

Single Image Backdoor Inversion via Robust Smoothed Classifiers 单张图像背景门通过鲁棒平滑分类器

To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision 是否适合:基于模型的面部重建和遮挡分割

HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics HOOD:用于衣物动力学一般建模的分层图形

A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others 挥棒子困境:简化的方法在多个层面上存在,一个简化的方法会强化其他简化的方法。

RelightableHands: Efficient Neural Relighting of Articulated Hand Models 重新定位手部模型:高效的手部关节重新定位

Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation 神经调节的Hebbian学习用于完全测试时间适应

Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression 解码对抗性样本和因果接种对鲁棒网络的因果特征

UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy UniDexGrasp: 统一机器人灵活抓取通过学习多样化的建议生成和目标条件策略

Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness 解开两正交平面为室内全景房间布局估计的二维扭曲感知

Learning Neural Parametric Head Models 学习神经参数高维头模型

A Meta-Learning Approach to Predicting Performance and Data Requirements 元学习方法预测性能和数据需求

MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision MACARONS:使用RGB在线自监督实现图层映射和覆盖预测

Masked Images Are Counterfactual Samples for Robust Fine-tuning 遮罩图像是用于稳健微调的反事实样本

HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling hairstep:使用丝线和深度图将合成头发模型转换为真实效果

Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization 分解,调整,组合:通过改变频率进行域推广的有效归一化

Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization 梯度范数感知最小化寻求一阶平滑性并改进泛化

Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples 无监督学习聚类:指向标签无关的无监督学习示例

Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes 我们所处之处和我们所关注的:基于层次和场景的查询全球图像地理定位

UniHCP: A Unified Model for Human-Centric Perceptions UniHCP:统一的人类感知模型

CUDA: Convolution-based Unlearnable Datasets CUDA:基于卷积的不学习数据集

Masked Images Are Counterfactual Samples for Robust Fine-tuning 遮蔽图像是对齐的样本用于稳健的微调

AdaptiveMix: Robust Feature Representation via Shrinking Feature Space 自适应混合:通过收缩特征空间实现稳健特征表示

Physical-World Optical Adversarial Attacks on 3D Face Recognition 针对3D人脸识别的物理世界光学对抗攻击

DPE: Disentanglement of Pose and Expression for General Video Portrait Editing DPE: pose 和表达的解码

SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation 悲伤谈话者:为时尚化音频驱动的单张图像聊天机器人学习真实的3D运动系数

Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models 内生物理概念发现与以对象为中心的预测模型

  • Paper: None
  • Code: None

Sharpness-Aware Gradient Matching for Domain Generalization 深度可分离梯度匹配在领域泛化

Mind the Label-shift for Augmentation-based Graph Out-of-distribution Generalization 注意:基于增强的图卷积自监督泛化

  • Paper: None
  • Code: None

Blind Video Deflickering by Neural Filtering with a Flawed Atlas 盲视频去抖动通过神经滤波器与有缺陷的地图

RiDDLE: Reversible and Diversified De-identification with Latent Encryptor RiDDLE:可逆和多元化的反向身份识别与潜在加密器

PoseExaminer: Automated Testing of Out-of-Distribution Robustness in Human Pose and Shape Estimation 姿势评估器:人类姿势和形状估计中的分布式外模式鲁棒性自动测试

Upcycling Models under Domain and Category Shift 领域和类别迁移下的升级模型

Modality-Agnostic Debiasing for Single Domain Generalization 模式无关的消元处理单领域泛化

Progressive Open Space Expansion for Open-Set Model Attribution 为Open-Set模型分配渐进式开放空间扩展

Dynamic Neural Network for Multi-Task Learning Searching across Diverse Network Topologies 动态神经网络在多任务学习中搜索 diverse网络拓扑

GFPose: Learning 3D Human Pose Prior with Gradient Fields GFPose: 学习3D人体姿态优先与梯度场

PRISE: Demystifying Deep Lucas-Kanade with Strongly Star-Convex Constraints for Multimodel Image Alignment PRISE:通过强星凸约束多模态图像对齐解开深度洛卡西德

Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings 轮廓2识别:从人类绘画中检测突出对象

Boundary Unlearning 边界消融

ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing 图像网络-E:通过属性编辑测试神经网络的鲁棒性

Zero-shot Model Diagnosis 零样本模型诊断

GeoNet: Benchmarking Unsupervised Adaptation across Geographies GeoNet:无监督适应性在地理空间上的基准测试

Quantum Multi-Model Fitting 量子多模型拟合

DivClust: Controlling Diversity in Deep Clustering DivClust:控制深度聚类中的多样性

Neural Volumetric Memory for Visual Locomotion Control 神经体积记忆视觉定位控制

MonoHuman: Animatable Human Neural Field from Monocular Video MonoHuman: 从单眼视频中的可动画人类神经场

Trace and Pace: Controllable Pedestrian Animation via Guided Trajectory Diffusion 追踪与速度:通过引导轨迹扩散控制的行人动画

Bridging the Gap between Model Explanations in Partially Annotated Multi-label Classification 部分注释多标签分类模型中的模型解释 gap 之间的桥梁

HyperCUT: Video Sequence from a Single Blurry Image using Unsupervised Ordering HyperCUT:从单个模糊图像中生成视频序列的无监督排序

On the Stability-Plasticity Dilemma of Class-Incremental Learning 在类别递增学习中的稳定性-塑性困境

Defending Against Patch-based Backdoor Attacks on Self-Supervised Learning 防御基于补丁的自我监督学习中的补丁后门攻击

VNE: An Effective Method for Improving Deep Representation by Manipulating Eigenvalue Distribution VNE:通过操纵特征值分布的有效方法来改进深度表示

Detecting and Grounding Multi-Modal Media Manipulation 检测和定位多模态媒体操作

Meta-causal Learning for Single Domain Generalization 元因果学习用于单领域推广

Disentangling Writer and Character Styles for Handwriting Generation 解开手写风格生成中作家与角色的风格分离

DexArt: Benchmarking Generalizable Dexterous Manipulation with Articulated Objects DexArt:使用关节对象进行灵活度基准测试

Hidden Gems: 4D Radar Scene Flow Learning Using Cross-Modal Supervision 隐藏的宝藏:使用跨模态监督的4D雷达场景流学习

Marching-Primitives: Shape Abstraction from Signed Distance Function Marching-Primitives: 从签名距离函数中提取形状

Towards Trustable Skin Cancer Diagnosis via Rewriting Model's Decision 皮肤癌诊断的可靠途径是通过重写模型的决策
