Edited by 极市平台
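The CVPR 2023 acceptance results are out: 2,360 papers were accepted this year, an acceptance rate of 25.78%. Ahead of the conference itself, to help readers find and study the latest computer vision research sooner, 极市 is tracking the newest CVPR 2023 papers, including papers grouped by research direction, a roundup of code releases, and livestreamed technical talks on the papers.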
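The CVPR 2023 paper collection, organized by research direction, is being continuously updated in the 极市 community and so far covers 381 papers. Project page: https://www.cvmart.net/community/detail/7422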
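Below are the most recently added CVPR 2023 papers, covering detection, segmentation, faces, video processing, medical imaging, neural network architecture, multimodal learning, few-shot learning, and other directions.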
Download link: https://www.cvmart.net/community/detail/7454
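- Detection
- Segmentation
- Video processing
- Estimation
- Faces
- Object tracking
- Image & video retrieval / video understanding
- Medical imaging
- GAN / generative / adversarial
- Image generation / image synthesis
- Neural network architecture design
- Data processing
- Model training / generalization
- Image feature extraction and matching
- Visual representation learning
- Model evaluation
- Multimodal learning
- Visual prediction
- Datasets
- Few-shot learning / zero-shot learning
- Continual learning
- Transfer learning / domain adaptation
- Scene graphs
- Visual localization / pose estimation
- Visual reasoning / visual question answering
- Contrastive learning
- Reinforcement learning
- Robotics
- Semi-supervised / weakly supervised / unsupervised / self-supervised learning
- Other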
[1]Object-Aware Distillation Pyramid for Open-Vocabulary Object Detection
paper:https://arxiv.org/abs/2303.05892
[1]Bi3D: Bi-domain Active Learning for Cross-domain 3D Object Detection
paper:https://arxiv.org/abs/2303.05886
[2]PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
paper:https://arxiv.org/abs/2303.08129
code:https://github.com/blvlab/pimae
[3]MSF: Motion-guided Sequential Fusion for Efficient 3D Object Detection from Point Cloud Sequences
paper:https://arxiv.org/abs/2303.08316
[4]CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
paper:https://arxiv.org/abs/2303.10209
code:https://github.com/PaddlePaddle/Paddle3D
[5]Weakly Supervised Monocular 3D Object Detection using Multi-View Projection and Direction Consistency
paper:https://arxiv.org/abs/2303.08686
[6]AeDet: Azimuth-invariant Multi-view 3D Object Detection
paper:https://arxiv.org/abs/2211.12501
code:https://github.com/fcjian/AeDet
[1]DeSTSeg: Segmentation Guided Denoising Student-Teacher for Anomaly Detection
paper:https://arxiv.org/abs/2211.11317
[1]UniDAformer: Unified Domain Adaptive Panoptic Segmentation Transformer via Hierarchical Mask Calibration
paper:https://arxiv.org/abs/2206.15083
[1]MSeg3D: Multi-modal 3D Semantic Segmentation for Autonomous Driving
paper:https://arxiv.org/abs/2303.08600
code:https://github.com/jialeli1/lidarseg3d
[2]Side Adapter Network for Open-Vocabulary Semantic Segmentation
paper:https://arxiv.org/abs/2302.12242
code:https://github.com/mendelxu/san
[3]Multi-view Inverse Rendering for Large-scale Real-world Indoor Scenes
paper:https://arxiv.org/abs/2211.10206
[1]FastInst: A Simple Query-Based Model for Real-Time Instance Segmentation
paper:https://arxiv.org/abs/2303.08594
[2]SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation
paper:https://arxiv.org/abs/2303.08578
code:https://github.com/lslrh/sim
[3]DynaMask: Dynamic Mask Selection for Instance Segmentation
paper:https://arxiv.org/abs/2303.07868
code:https://github.com/lslrh/dynamask
[1]MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
paper:https://arxiv.org/abs/2303.07815
[2]InstMove: Instance Motion for Object-centric Video Segmentation
paper:https://arxiv.org/abs/2303.08132
code:https://github.com/wjf5203/vnext
[3]Unified Mask Embedding and Correspondence Learning for Self-Supervised Video Segmentation
paper:https://arxiv.org/abs/2303.10100
[1]MobileVOS: Real-Time Video Object Segmentation Contrastive Learning meets Knowledge Distillation
paper:https://arxiv.org/abs/2303.07815
[2]InstMove: Instance Motion for Object-centric Video Segmentation
paper:https://arxiv.org/abs/2303.08132
code:https://github.com/wjf5203/vnext
[3]Video Dehazing via a Multi-Range Temporal Alignment Network with Physical Prior
paper:https://arxiv.org/abs/2303.09757
code:https://github.com/jiaqixuac/map-net
[4]Blind Video Deflickering by Neural Filtering with a Flawed Atlas
paper:https://arxiv.org/abs/2303.08120
code:https://github.com/chenyanglei/all-in-one-deflicker
[1]3D Cinemagraphy from a Single Image
paper:https://arxiv.org/abs/2303.05724
[2]VideoFusion: Decomposed Diffusion Models for High-Quality Video Generation
paper:https://arxiv.org/abs/2303.08320
code:https://github.com/modelscope/modelscope
[1]Towards High-Quality and Efficient Video Super-Resolution via Spatial-Temporal Data Overfitting
paper:https://arxiv.org/abs/2303.08331
[1]Rethinking Optical Flow from Geometric Matching Consistent Perspective
paper:https://arxiv.org/abs/2303.08384
code:https://github.com/dqiaole/matchflow
[1]Fully Self-Supervised Depth Estimation from Defocus Clue
paper:https://arxiv.org/abs/2303.10752
code:https://github.com/ehzoahis/dered
[1]Mutual Information-Based Temporal Difference Learning for Human Pose Estimation in Video
paper:https://arxiv.org/abs/2303.08475
[2]Markerless Camera-to-Robot Pose Estimation via Self-supervised Sim-to-Real Transfer
paper:https://arxiv.org/abs/2302.14338
[1]CVT-SLR: Contrastive Visual-Textual Transformation for Sign Language Recognition with Variational Alignment
paper:https://arxiv.org/abs/2303.05725
[1]DeltaEdit: Exploring Text-free Training for Text-Driven Image Manipulation
paper:https://arxiv.org/abs/2303.06285
code:https://github.com/yueming6568/deltaedit
[1]Contrastive Semi-supervised Learning for Underwater Image Restoration via Reliable Bank
paper:https://arxiv.org/abs/2303.09101
code:https://github.com/huang-shirui/semi-uir
[1]ACR: Attention Collaboration-based Regressor for Arbitrary Two-Hand Reconstruction
paper:https://arxiv.org/abs/2303.05938
code:https://github.com/zhengdiyu/arbitrary-hands-3d-reconstruction
[1]StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields
paper:https://arxiv.org/abs/2303.10598
[2]Fix the Noise: Disentangling Source Feature for Transfer Learning of StyleGAN
paper:https://arxiv.org/abs/2204.14079
code:https://github.com/LeeDongYeun/FixNoise
[1]Local Region Perception and Relationship Learning Combined with Feature Fusion for Facial Action Unit Detection
paper:https://arxiv.org/abs/2303.08545
[2]Multi Modal Facial Expression Recognition with Transformer-Based Fusion Networks and Dynamic Sampling
paper:https://arxiv.org/abs/2303.08419
[1]Robust Model-based Face Reconstruction through Weakly-Supervised Outlier Segmentation
paper:https://arxiv.org/abs/2106.09614
code:https://github.com/unibas-gravis/Occlusion-Robust-MoFA
[1]MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking
paper:https://arxiv.org/abs/2303.10404
[2]Visual Prompt Multi-Modal Tracking
paper:https://arxiv.org/abs/2303.10826
code:https://github.com/jiawen-zhu/vipt
[1]Data-Free Sketch-Based Image Retrieval
paper:https://arxiv.org/abs/2303.07775
[2]DAA: A Delta Age AdaIN operation for age estimation via binary code transformer
paper:https://arxiv.org/abs/2303.07929
[3]Dual-path Adaptation from Image to Video Transformers
paper:https://arxiv.org/abs/2303.09857
code:https://github.com/park-jungin/dualpath
[1]Dual-Stream Transformer for Generic Event Boundary Captioning
paper:https://arxiv.org/abs/2207.03038
code:https://github.com/gx77/dual-stream-transformer-for-generic-event-boundary-captioning
[1]Video Test-Time Adaptation for Action Recognition
paper:https://arxiv.org/abs/2211.15393
[1]TranSG: Transformer-Based Skeleton Graph Prototype Contrastive Learning with Structure-Trajectory Prompted Reconstruction for Person Re-Identification
paper:https://arxiv.org/abs/2303.06819
code:https://github.com/kali-hac/transg
[1]Neuron Structure Modeling for Generalizable Remote Physiological Measurement
paper:https://arxiv.org/abs/2303.05955
code:https://github.com/lupaopao/nest
[2]Unsupervised Contour Tracking of Live Cells by Mechanical and Cycle Consistency Losses
paper:https://arxiv.org/abs/2303.08364
code:https://github.com/junbongjang/contour-tracking
[3]Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification
paper:https://arxiv.org/abs/2303.08446
[2]Graph Transformer GANs for Graph-Constrained House Generation
paper:https://arxiv.org/abs/2303.08225
[1]Cross-GAN Auditing: Unsupervised Identification of Attribute Level Similarities and Differences between Pretrained Generative Models
paper:https://arxiv.org/abs/2303.10774
[1]3DQD: Generalized Deep 3D Shape Prior via Part-Discretized Diffusion Process
paper:https://arxiv.org/abs/2303.10406
code:https://github.com/colorful-liyu/3dqd
[2]A Dynamic Multi-Scale Voxel Flow Network for Video Prediction
paper:https://arxiv.org/abs/2303.09875
code:https://github.com/megvii-research/CVPR2023-DMVFN
[3]Regularized Vector Quantization for Tokenized Image Synthesis
paper:https://arxiv.org/abs/2303.06424
[1]Controllable Mesh Generation Through Sparse Latent Point Diffusion Models
paper:https://arxiv.org/abs/2303.07938
[2]Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
paper:https://arxiv.org/abs/2303.08134
code:https://github.com/zrrskywalker/point-nn
[3]Rotation-Invariant Transformer for Point Cloud Matching
paper:https://arxiv.org/abs/2303.08231
[4]Deep Graph-based Spatial Consistency for Robust Non-rigid Point Cloud Registration
paper:https://arxiv.org/abs/2303.09950
code:https://github.com/qinzheng93/graphscnet
[1]Masked Wavelet Representation for Compact Neural Radiance Fields
paper:https://arxiv.org/abs/2212.09069
[2]Decoupling Human and Camera Motion from Videos in the Wild
paper:https://arxiv.org/abs/2302.12827
[3]Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction
paper:https://arxiv.org/abs/2303.05937
[4]NEF: Neural Edge Fields for 3D Parametric Curve Reconstruction from Multi-view Images
paper:https://arxiv.org/abs/2303.07653
[5]PartNeRF: Generating Part-Aware Editable 3D Shapes without 3D Supervision
paper:https://arxiv.org/abs/2303.09554
[6]SDFusion: Multimodal 3D Shape Completion, Reconstruction, and Generation
paper:https://arxiv.org/abs/2212.04493
code:https://github.com/yccyenchicheng/SDFusion
[1]Robust Dynamic Radiance Fields
paper:https://arxiv.org/abs/2301.02239
[2]I2-SDF: Intrinsic Indoor Scene Reconstruction and Editing via Raytracing in Neural SDFs
paper:https://arxiv.org/abs/2303.07634
[3]MobileNeRF: Exploiting the Polygon Rasterization Pipeline for Efficient Neural Field Rendering on Mobile Architectures
paper:https://arxiv.org/abs/2208.00277
code:https://github.com/google-research/jax3d
[1]LargeKernel3D: Scaling up Kernels in 3D Sparse CNNs
paper:https://arxiv.org/abs/2206.10555
code:https://github.com/dvlab-research/largekernel3d
[1]Randomized Adversarial Training via Taylor Expansion
paper:https://arxiv.org/abs/2303.10653
code:https://github.com/alexkael/randomized-adversarial-training
[2]Alias-Free Convnets: Fractional Shift Invariance via Polynomial Activations
paper:https://arxiv.org/abs/2303.08085
code:https://github.com/hmichaeli/alias_free_convnets
[1]BiFormer: Vision Transformer with Bi-Level Routing Attention
paper:https://arxiv.org/abs/2303.08810
code:https://github.com/rayleizhu/biformer
[2]Making Vision Transformers Efficient from A Token Sparsification View
paper:https://arxiv.org/abs/2303.08685
[1]Turning Strengths into Weaknesses: A Certified Robustness Inspired Attack Framework against Graph Neural Networks
paper:https://arxiv.org/abs/2303.06199
[1]TINC: Tree-structured Implicit Neural Compression
paper:https://arxiv.org/abs/2211.06689
code:https://github.com/richealyoung/tinc
[1]On the Effects of Self-supervision and Contrastive Alignment in Deep Multi-view Clustering
paper:https://arxiv.org/abs/2303.09877
code:https://github.com/danieltrosten/deepmvc
[1]HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining
paper:https://arxiv.org/abs/2303.05675
[2]Universal Instance Perception as Object Discovery and Retrieval
paper:https://arxiv.org/abs/2303.06674
code:https://github.com/MasterBin-IIAU/UNINEXT
[3]Sharpness-Aware Gradient Matching for Domain Generalization
paper:https://arxiv.org/abs/2303.10353
code:https://github.com/wang-pengfei/sagm
[2]Iterative Geometry Encoding Volume for Stereo Matching
paper:https://arxiv.org/abs/2303.06615
code:https://github.com/gangweix/igev
[1]Referring Image Matting
paper:https://arxiv.org/abs/2206.05149
code:https://github.com/jizhizili/rim
[1]MARLIN: Masked Autoencoder for facial video Representation LearnINg
paper:https://arxiv.org/abs/2211.06627
code:https://github.com/ControlNet/MARLIN
[1]TrojDiff: Trojan Attacks on Diffusion Models with Diverse Targets
paper:https://arxiv.org/abs/2303.05762
code:https://github.com/chenweixin107/trojdiff
[1]Mutilmodal Feature Extraction and Attention-based Fusion for Emotion Estimation in Videos
paper:https://arxiv.org/abs/2303.10421
code:https://github.com/xkwangcn/abaw-5th-rt-iai
[2]Emotional Reaction Intensity Estimation Based on Multimodal Data
paper:https://arxiv.org/abs/2303.09167
[3]Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers
paper:https://arxiv.org/abs/2303.09164
[4]Understanding and Constructing Latent Modality Structures in Multi-modal Representation Learning
paper:https://arxiv.org/abs/2303.05952
[1]Watch or Listen: Robust Audio-Visual Speech Recognition with Visual Corruption Modeling and Reliability Scoring
paper:https://arxiv.org/abs/2303.08536
code:https://github.com/joannahong/av-relscore
[2]CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective
paper:https://arxiv.org/abs/2303.06357
[1]Lana: A Language-Capable Navigator for Instruction Following and Generation
paper:https://arxiv.org/abs/2303.08409
code:https://github.com/wxh1996/lana-vln
[1]TBP-Former: Learning Temporal Bird's-Eye-View Pyramid for Joint Perception and Prediction in Vision-Centric Autonomous Driving
paper:https://arxiv.org/abs/2303.09998
[1]A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
paper:https://arxiv.org/abs/2212.04825
code:https://github.com/facebookresearch/Whac-A-Mole
[2]MVImgNet: A Large-scale Dataset of Multi-view Images
paper:https://arxiv.org/abs/2303.06042
[3]SLOPER4D: A Scene-Aware Dataset for Global 4D Human Pose Estimation in Urban Environments
paper:https://arxiv.org/abs/2303.09095
code:https://github.com/climbingdaily/SLOPER4D
[1]DiGeo: Discriminative Geometry-Aware Learning for Generalized Few-Shot Object Detection
paper:https://arxiv.org/abs/2303.09674
code:https://github.com/phoenix-v/digeo
[2]Hubs and Hyperspheres: Reducing Hubness and Improving Transductive Few-shot Learning with Hyperspherical Embeddings
paper:https://arxiv.org/abs/2303.09352
code:https://github.com/uitml/nohub
[3]Bi-directional Distribution Alignment for Transductive Zero-Shot Learning
paper:https://arxiv.org/abs/2303.08698
code:https://github.com/zhicaiwww/bi-vaegan
[1]Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning
paper:https://arxiv.org/abs/2303.09483
code:https://github.com/kim-sanghwan/ancl
[1]Trainable Projected Gradient Method for Robust Fine-tuning
paper:https://arxiv.org/abs/2303.10720
[2]DA-DETR: Domain Adaptive Detection Transformer with Information Fusion
paper:https://arxiv.org/abs/2103.17084
[3]Instance Relation Graph Guided Source-Free Domain Adaptive Object Detection
paper:https://arxiv.org/abs/2203.15793
code:https://github.com/vibashan/irg-sfda
[1]PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
paper:https://arxiv.org/abs/2211.16312
code:https://github.com/cvmi-lab/pla
[1]PSVT: End-to-End Multi-person 3D Pose and Shape Estimation with Progressive Video Transformers
paper:https://arxiv.org/abs/2303.09187
[2]StructVPR: Distill Structural Knowledge with Weighting Samples for Visual Place Recognition
paper:https://arxiv.org/abs/2212.00937
[1]Divide and Conquer: Answering Questions with Object Factorization and Compositional Reasoning
paper:https://arxiv.org/abs/2303.10482
code:https://github.com/szzexpoi/poem
[2]Generative Bias for Robust Visual Question Answering
paper:https://arxiv.org/abs/2208.00690
[1]Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation
paper:https://arxiv.org/abs/2303.10323
code:https://github.com/mlii0117/dcl
[1]EqMotion: Equivariant Multi-agent Motion Prediction with Invariant Interaction Reasoning
paper:https://arxiv.org/abs/2303.10876
code:https://github.com/mediabrain-sjtu/eqmotion
[1]Efficient Map Sparsification Based on 2D and 3D Discretized Grids
paper:https://arxiv.org/abs/2303.10882
[1]Extracting Class Activation Maps from Non-Discriminative Features as well
paper:https://arxiv.org/abs/2303.10334
code:https://github.com/zhaozhengchen/lpcam
[2]TeSLA: Test-Time Self-Learning With Automatic Adversarial Augmentation
paper:https://arxiv.org/abs/2303.09870
code:https://github.com/devavrattomar/tesla
[3]LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding
paper:https://arxiv.org/abs/2303.09665
[4]MixTeacher: Mining Promising Labels with Mixed Scale Teacher for Semi-Supervised Object Detection
paper:https://arxiv.org/abs/2303.09061
code:https://github.com/lliuz/mixteacher
[5]Semi-supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial Discrimination
paper:https://arxiv.org/abs/2303.06380
[6]Non-Contrastive Unsupervised Learning of Physiological Signals from Video
paper:https://arxiv.org/abs/2303.07944
[1]Facial Affective Analysis based on MAE and Multi-modal Information for 5th ABAW Competition
paper:https://arxiv.org/abs/2303.10849
[2]Partial Network Cloning
paper:https://arxiv.org/abs/2303.10597
code:https://github.com/jngwenye/pncloning
[3]Uncertainty-Aware Optimal Transport for Semantically Coherent Out-of-Distribution Detection
paper:https://arxiv.org/abs/2303.10449
code:https://github.com/lufan31/et-ood
[4]Adversarial Counterfactual Visual Explanations
paper:https://arxiv.org/abs/2303.09962
code:https://github.com/guillaumejs2403/ace
[5]A New Benchmark: On the Utility of Synthetic Data with Blender for Bare Supervised Learning and Downstream Domain Adaptation
paper:https://arxiv.org/abs/2303.09165
code:https://github.com/huitangtang/on_the_utility_of_synthetic_data
[6]Taming Diffusion Models for Audio-Driven Co-Speech Gesture Generation
paper:https://arxiv.org/abs/2303.09119
code:https://github.com/advocate99/diffgesture
[7]Skinned Motion Retargeting with Residual Perception of Motion Semantics & Geometry
paper:https://arxiv.org/abs/2303.08658
code:https://github.com/kebii/r2et
[8]Towards Compositional Adversarial Robustness: Generalizing Adversarial Training to Composite Semantic Perturbations
paper:https://arxiv.org/abs/2202.04235
code:https://github.com/twweeb/composite-adv
[9]Backdoor Defense via Deconfounded Representation Learning
paper:https://arxiv.org/abs/2303.06818
code:https://github.com/zaixizhang/cbd
[10]Label Information Bottleneck for Label Enhancement
paper:https://arxiv.org/abs/2303.06836
[11]LayoutDM: Discrete Diffusion Model for Controllable Layout Generation
paper:https://arxiv.org/abs/2303.08137
code:https://github.com/CyberAgentAILab/layout-dm
[12]Diversity-Aware Meta Visual Prompting
paper:https://arxiv.org/abs/2303.08138
code:https://github.com/shikiw/dam-vp
Last updated: 2024-08-28