CVPR 2023 Papers and Open-Source Projects Collection (Papers with Code)!
25.78% = 2360 / 9155
CVPR2023 decisions are now available on OpenReview! This year, we received a record number of 9155 submissions (a 12% increase over CVPR2022), and accepted 2360 papers, for a 25.78% acceptance rate.
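Note 1: Everyone is welcome to open an issue to share CVPR 2023 papers and open-source projects!
Note 2: For papers from previous years' top CV conferences, other high-quality CV papers, and roundups, see: https://github.com/amusi/daily-paper-computer-vision
If you want to keep up with the latest high-quality CV papers, open-source projects, and learning materials, scan the QR code to join the CVer academic exchange group! Let's learn from each other and improve together~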
Integrally Pre-Trained Transformer Pyramid Networks
Stitchable Neural Networks
- Homepage: https://snnet.github.io/
- Paper: https://arxiv.org/abs/2302.06586
- Code: https://github.com/ziplab/SN-Net
Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
BiFormer: Vision Transformer with Bi-Level Routing Attention
- Paper: None
- Code: https://github.com/rayleizhu/BiFormer
DeepMAD: Mathematical Architecture Design for Deep Convolutional Neural Network
- Paper: https://arxiv.org/abs/2303.02165
- Code: https://github.com/alibaba/lightweight-neural-architecture-search
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation
Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders
Generic-to-Specific Distillation of Masked Autoencoders
DeltaEdit: Exploring Text-free Training for Text-driven Image Manipulation
NoPe-NeRF: Optimising Neural Radiance Field with No Pose Prior
- Homepage: https://nope-nerf.active.vision/
- Paper: https://arxiv.org/abs/2212.07388
- Code: None
Latent-NeRF for Shape-Guided Generation of 3D Shapes and Textures
NeRF in the Palm of Your Hand: Corrective Augmentation for Robotics via Novel-View Synthesis
- Paper: https://arxiv.org/abs/2301.08556
- Code: None
Panoptic Lifting for 3D Scene Understanding with Neural Fields
- Homepage: https://nihalsid.github.io/panoptic-lifting/
- Paper: https://arxiv.org/abs/2212.09802
- Code: None
DETRs with Hybrid Matching
- Paper: https://arxiv.org/abs/2207.13080
- Code: https://github.com/HDETR
PA&DA: Jointly Sampling PAth and DAta for Consistent NAS
Structured 3D Features for Reconstructing Relightable and Animatable Avatars
- Homepage: https://enriccorona.github.io/s3f/
- Paper: https://arxiv.org/abs/2212.06820
- Code: None
- Demo: https://www.youtube.com/watch?v=mcZGcQ6L-2s
Video Probabilistic Diffusion Models in Projected Latent Space
- Homepage: https://sihyun.me/PVDM/
- Paper: https://arxiv.org/abs/2302.07685
- Code: https://github.com/sihyun-yu/PVDM
Solving 3D Inverse Problems using Pre-trained 2D Diffusion Models
- Paper: https://arxiv.org/abs/2211.10655
- Code: None
Imagic: Text-Based Real Image Editing with Diffusion Models
- Homepage: https://imagic-editing.github.io/
- Paper: https://arxiv.org/abs/2210.09276
- Code: None
Parallel Diffusion Models of Operator and Image for Blind Inverse Problems
- Paper: https://arxiv.org/abs/2211.10656
- Code: None
DiffRF: Rendering-guided 3D Radiance Field Diffusion
- Homepage: https://sirwyver.github.io/DiffRF/
- Paper: https://arxiv.org/abs/2212.01206
- Code: None
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
HouseDiffusion: Vector Floorplan Generation via a Diffusion Model with Discrete and Continuous Denoising
- Homepage: https://aminshabani.github.io/housediffusion/
- Paper: https://arxiv.org/abs/2211.13287
- Code: https://github.com/aminshabani/house_diffusion
TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
Back to the Source: Diffusion-Driven Adaptation to Test-Time Corruption
Integrally Pre-Trained Transformer Pyramid Networks
Mask3D: Pre-training 2D Vision Transformers by Learning Masked 3D Priors
- Homepage: https://niessnerlab.org/projects/hou2023mask3d.html
- Paper: https://arxiv.org/abs/2302.14746
- Code: None
Learning Trajectory-Aware Transformer for Video Super-Resolution
Vision Transformers are Parameter-Efficient Audio-Visual Learners
Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes
- Paper: https://arxiv.org/abs/2303.04249
- Code: None
DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
BiFormer: Vision Transformer with Bi-Level Routing Attention
- Paper: None
- Code: https://github.com/rayleizhu/BiFormer
GIVL: Improving Geographical Inclusivity of Vision-Language Models with Pre-Training Methods
- Paper: https://arxiv.org/abs/2301.01893
- Code: None
Teaching Structured Vision&Language Concepts to Vision&Language Models
- Paper: https://arxiv.org/abs/2211.11733
- Code: None
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training
- Paper: https://arxiv.org/abs/2303.00040
- Code: None
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
- Paper: https://arxiv.org/abs/2303.02489
- Code: None
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
- Paper: https://arxiv.org/abs/2303.02483
- Code: None
Meta-Explore: Exploratory Hierarchical Vision-and-Language Navigation Using Scene Object Spectrum Grounding
- Homepage: https://rllab-snu.github.io/projects/Meta-Explore/doc.html
- Paper: https://arxiv.org/abs/2303.04077
- Code: None
All in One: Exploring Unified Video-Language Pre-training
Position-guided Text Prompt for Vision Language Pre-training
EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding
CapDet: Unifying Dense Captioning and Open-World Detection Pretraining
- Paper: https://arxiv.org/abs/2303.02489
- Code: None
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
DETRs with Hybrid Matching
- Paper: https://arxiv.org/abs/2207.13080
- Code: https://github.com/HDETR
Enhanced Training of Query-Based Object Detection via Selective Query Recollection
Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
Simple Cues Lead to a Strong Multi-Object Tracker
- Paper: https://arxiv.org/abs/2206.04656
- Code: None
Efficient Semantic Segmentation by Altering Resolutions for Compressed Videos
Label-Free Liver Tumor Segmentation
PolyFormer: Referring Image Segmentation as Sequential Polygon Generation
- Code: None
Physical-World Optical Adversarial Attacks on 3D Face Recognition
DSVT: Dynamic Sparse Voxel Transformer with Rotated Sets
Super-Resolution Neural Operator
- Paper: https://arxiv.org/abs/2303.02584
- Code: https://github.com/2y7c3/Super-Resolution-Neural-Operator
Learning Trajectory-Aware Transformer for Video Super-Resolution
GALIP: Generative Adversarial CLIPs for Text-to-Image Synthesis
MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting
Learning to Retain while Acquiring: Combating Distribution-Shift in Adversarial Data-Free Knowledge Distillation
- Paper: https://arxiv.org/abs/2302.14290
- Code: None
Generic-to-Specific Distillation of Masked Autoencoders
DepGraph: Towards Any Structural Pruning
Context-Based Trit-Plane Coding for Progressive Image Compression
Deep Feature In-painting for Unsupervised Anomaly Detection in X-ray Images
OReX: Object Reconstruction from Planar Cross-sections Using Neural Fields
- Paper: https://arxiv.org/abs/2211.12886
- Code: None
SparsePose: Sparse-View Camera Pose Regression and Refinement
- Paper: https://arxiv.org/abs/2211.16991
- Code: None
NeuDA: Neural Deformable Anchor for High-Fidelity Implicit Surface Reconstruction
- Paper: https://arxiv.org/abs/2303.02375
- Code: None
Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition
- Homepage: https://moygcc.github.io/vid2avatar/
- Paper: https://arxiv.org/abs/2302.11566
- Code: https://github.com/MoyGcc/vid2avatar
- Demo: https://youtu.be/EGi47YeIeGQ
To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision
- Paper: https://arxiv.org/abs/2106.09614
- Code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA
Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
- Paper: https://arxiv.org/abs/2303.05937
- Code: None
3D Cinemagraphy from a Single Image
- Homepage: https://xingyi-li.github.io/3d-cinemagraphy/
- Paper: https://arxiv.org/abs/2303.05724
- Code: https://github.com/xingyi-li/3d-cinemagraphy
Revisiting Rotation Averaging: Uncertainties and Robust Losses
Lite-Mono: A Lightweight CNN and Transformer Architecture for Self-Supervised Monocular Depth Estimation
IPCC-TP: Utilizing Incremental Pearson Correlation Coefficient for Joint Multi-Agent Trajectory Prediction
- Paper: https://arxiv.org/abs/2303.00575
- Code: None
ConZIC: Controllable Zero-shot Image Captioning by Sampling-Based Polishing
- Paper: https://arxiv.org/abs/2303.02437
- Code: None
MixPHM: Redundancy-Aware Parameter-Efficient Tuning for Low-Resource Visual Question Answering
Continuous Sign Language Recognition with Correlation Network
- Paper: https://arxiv.org/abs/2303.03202
- Code: https://github.com/hulianyuyy/CorrNet
MOSO: Decomposing MOtion, Scene and Object for Video Prediction
3D Video Loops from Asynchronous Input
- Homepage: https://limacv.github.io/VideoLoop3D_web/
- Paper: https://arxiv.org/abs/2303.05312
- Code: https://github.com/limacv/VideoLoop3D
Human-Art: A Versatile Human-Centric Dataset Bridging Natural and Artificial Scenes
- Paper: https://arxiv.org/abs/2303.02760
- Code: None
Interactive Segmentation as Gaussian Process Classification
- Paper: https://arxiv.org/abs/2302.14578
- Code: None
Backdoor Attacks Against Deep Image Compression via Adaptive Frequency Trigger
- Paper: https://arxiv.org/abs/2302.14677
- Code: None
SplineCam: Exact Visualization and Characterization of Deep Network Geometry and Decision Boundaries
- Homepage: http://bit.ly/splinecam
- Paper: https://arxiv.org/abs/2302.12828
- Code: None
SCOTCH and SODA: A Transformer Video Shadow Detection Framework
- Paper: https://arxiv.org/abs/2211.06885
- Code: None
DeepMapping2: Self-Supervised Large-Scale LiDAR Map Optimization
- Homepage: https://ai4ce.github.io/DeepMapping2/
- Paper: https://arxiv.org/abs/2212.06331
- Code: https://github.com/ai4ce/DeepMapping2
Token Turing Machines
- Paper: https://arxiv.org/abs/2211.09119
- Code: None
Single Image Backdoor Inversion via Robust Smoothed Classifiers
To fit or not to fit: Model-based Face Reconstruction and Occlusion Segmentation from Weak Supervision
- Paper: https://arxiv.org/abs/2106.09614
- Code: https://github.com/unibas-gravis/Occlusion-Robust-MoFA
HOOD: Hierarchical Graphs for Generalized Modelling of Clothing Dynamics
- Homepage: https://dolorousrtur.github.io/hood/
- Paper: https://arxiv.org/abs/2212.07242
- Code: https://github.com/dolorousrtur/hood
- Demo: https://www.youtube.com/watch?v=cBttMDPrUYY
A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
RelightableHands: Efficient Neural Relighting of Articulated Hand Models
- Homepage: https://sh8.io/#/relightable_hands
- Paper: https://arxiv.org/abs/2302.04866
- Code: None
- Demo: https://sh8.io/static/media/teacher_video.923d87957fe0610730c2.mp4
Neuro-Modulated Hebbian Learning for Fully Test-Time Adaptation
- Paper: https://arxiv.org/abs/2303.00914
- Code: None
Demystifying Causal Features on Adversarial Examples and Causal Inoculation for Robust Network by Adversarial Instrumental Variable Regression
- Paper: https://arxiv.org/abs/2303.01052
- Code: None
UniDexGrasp: Universal Robotic Dexterous Grasping via Learning Diverse Proposal Generation and Goal-Conditioned Policy
- Paper: https://arxiv.org/abs/2303.00938
- Code: None
Disentangling Orthogonal Planes for Indoor Panoramic Room Layout Estimation with Cross-Scale Distortion Awareness
Learning Neural Parametric Head Models
- Homepage: https://simongiebenhain.github.io/NPHM
- Paper: https://arxiv.org/abs/2212.02761
- Code: None
A Meta-Learning Approach to Predicting Performance and Data Requirements
- Paper: https://arxiv.org/abs/2303.01598
- Code: None
MACARONS: Mapping And Coverage Anticipation with RGB Online Self-Supervision
- Homepage: https://imagine.enpc.fr/~guedona/MACARONS/
- Paper: https://arxiv.org/abs/2303.03315
- Code: None
Masked Images Are Counterfactual Samples for Robust Fine-tuning
- Paper: https://arxiv.org/abs/2303.03052
- Code: None
HairStep: Transfer Synthetic to Real Using Strand and Depth Maps for Single-View 3D Hair Modeling
- Paper: https://arxiv.org/abs/2303.02700
- Code: None
Decompose, Adjust, Compose: Effective Normalization by Playing with Frequency for Domain Generalization
- Paper: https://arxiv.org/abs/2303.02328
- Code: None
Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization
- Paper: https://arxiv.org/abs/2303.03108
- Code: None
Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples
- Paper: https://arxiv.org/abs/2301.01217
- Code: https://github.com/jiamingzhang94/Unlearnable-Clusters
Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes
- Paper: https://arxiv.org/abs/2303.04249
- Code: None
UniHCP: A Unified Model for Human-Centric Perceptions
CUDA: Convolution-based Unlearnable Datasets
- Paper: https://arxiv.org/abs/2303.04278
- Code: https://github.com/vinusankars/Convolution-based-Unlearnability
AdaptiveMix: Robust Feature Representation via Shrinking Feature Space
Physical-World Optical Adversarial Attacks on 3D Face Recognition
DPE: Disentanglement of Pose and Expression for General Video Portrait Editing
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation