This column collects computer vision papers. Date: March 31, 2021. Source: Paper Digest.
1, TITLE: 3D AffordanceNet: A Benchmark for Visual Object Affordance Understanding
AUTHORS: Shengheng Deng ; Xun Xu ; Chaozheng Wu ; Ke Chen ; Kui Jia
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we present a 3D AffordanceNet dataset, a benchmark of 23k shapes from 23 semantic object categories, annotated with 18 visual affordance categories.
2, TITLE: Weakly Supervised Temporal Action Localization Through Learning Explicit Subspaces for Action and Context
AUTHORS: ZIYI LIU et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To address this challenge, we introduce a framework that learns two feature subspaces respectively for actions and their context.
3, TITLE: Contrastive Embedding for Generalized Zero-Shot Learning
AUTHORS: Zongyan Han ; Zhenyong Fu ; Shuo Chen ; Jian Yang
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: To tackle this issue, we propose to integrate the generation model with the embedding model, yielding a hybrid GZSL framework.
4, TITLE: Learnable Graph Matching: Incorporating Graph Partitioning with Deep Feature Learning for Multiple Object Tracking
AUTHORS: Jiawei He ; Zehao Huang ; Naiyan Wang ; Zhaoxiang Zhang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Therefore, in this paper we propose a novel learnable graph matching method to address these issues.
5, TITLE: Repopulating Street Scenes
AUTHORS: YIFAN WANG et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present a framework for automatically reconfiguring images of street scenes by populating, depopulating, or repopulating them with objects such as pedestrians or vehicles.
6, TITLE: Differentiable Drawing and Sketching
AUTHORS: Daniela Mihai ; Jonathon Hare
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present a bottom-up differentiable relaxation of the process of drawing points, lines and curves into a pixel raster.
7, TITLE: Deep Learning and Machine Vision for Food Processing: A Survey
AUTHORS: Lili Zhu ; Petros Spachos ; Erica Pensini ; Konstantinos Plataniotis
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper, we provide an overview on the traditional machine learning and deep learning methods, as well as the machine vision techniques that can be applied to the field of food processing. We present the current approaches and challenges, and the future trends.
8, TITLE: Large Scale Autonomous Driving Scenarios Clustering with Self-supervised Feature Extraction
AUTHORS: Jinxin Zhao ; Jin Fang ; Zhixian Ye ; Liangjun Zhang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This article proposes a comprehensive data clustering framework for a large set of vehicle driving data.
9, TITLE: Generalized Organ Segmentation By Imitating One-shot Reasoning Using Anatomical Correlation
AUTHORS: HONG-YU ZHOU et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we show that such process can be integrated into the one-shot segmentation task which is a very challenging but meaningful topic.
10, TITLE: Learning Parallel Dense Correspondence from Spatio-Temporal Descriptors for Efficient and Robust 4D Reconstruction
AUTHORS: Jiapeng Tang ; Dan Xu ; Kui Jia ; Lei Zhang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we present a novel pipeline to learn a temporal evolution of the 3D human shape through spatially continuous transformation functions among cross-frame occupancy fields.
11, TITLE: Differentiable Network Adaption with Elastic Search Space
AUTHORS: Shaopeng Guo ; Yujie Wang ; Kun Yuan ; Quanquan Li
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper we propose a novel network adaption method called Differentiable Network Adaption (DNA), which can adapt an existing network to a specific computation budget by adjusting the width and depth in a differentiable manner.
12, TITLE: Two-Stage Monte Carlo Denoising with Adaptive Sampling and Kernel Pool
AUTHORS: Tiange Xiang ; Hongliang Yuan ; Haozhi Huang ; Yujin Shi
CATEGORY: cs.CV [cs.CV, cs.GR]
HIGHLIGHT: In this paper, we tackle the problems in Monte Carlo rendering by proposing a two-stage denoiser based on the adaptive sampling strategy.
13, TITLE: High-fidelity Face Tracking for AR/VR Via Deep Lighting Adaptation
AUTHORS: LELE CHEN et. al.
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: This paper addresses previous limitations by learning a deep learning lighting model, that in combination with a high-quality 3D face tracking algorithm, provides a method for subtle and robust facial motion transfer from a regular video to a 3D photo-realistic avatar.
14, TITLE: ICE: Inter-instance Contrastive Encoding for Unsupervised Person Re-identification
AUTHORS: Hao Chen ; Benoit Lagadec ; Francois Bremond
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To address this issue, we propose Inter-instance Contrastive Encoding (ICE) that leverages inter-instance pairwise similarity scores to boost previous class-level contrastive ReID methods.
15, TITLE: Source-Free Domain Adaptation for Semantic Segmentation
AUTHORS: Yuang Liu ; Wei Zhang ; Jun Wang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To cope with this issue, we propose a source-free domain adaptation framework for semantic segmentation, namely SFDA, in which only a well-trained source model and an unlabeled target domain dataset are available for adaptation.
16, TITLE: Learning Monocular 3D Reconstruction of Articulated Categories from Motion
AUTHORS: Filippos Kokkinos ; Iasonas Kokkinos
CATEGORY: cs.CV [cs.CV, cs.GR, cs.LG]
HIGHLIGHT: In this work we use video self-supervision, forcing the consistency of consecutive 3D reconstructions by a motion-based cycle loss.
17, TITLE: Active Learning for Deep Object Detection Via Probabilistic Modeling
AUTHORS: Jiwoong Choi ; Ismail Elezi ; Hyuk-Jae Lee ; Clement Farabet ; Jose M. Alvarez
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a novel deep active learning approach for object detection.
18, TITLE: Kaleido-BERT: Vision-Language Pre-training on Fashion Domain
AUTHORS: MINGCHEN ZHUGE et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present a new vision-language (VL) pre-training model dubbed Kaleido-BERT, which introduces a novel kaleido strategy for fashion cross-modality representations from transformers.
19, TITLE: Distribution Alignment: A Unified Framework for Long-tail Visual Recognition
AUTHORS: Songyang Zhang ; Zeming Li ; Shipeng Yan ; Xuming He ; Jian Sun
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG]
HIGHLIGHT: Motivated by our discovery, we propose a unified distribution alignment strategy for long-tail visual recognition.
20, TITLE: Large Scale Visual Food Recognition
AUTHORS: WEIQING MIN et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we introduce Food2K, which is the largest food recognition dataset with 2,000 categories and over 1 million images. Compared with existing food recognition datasets, Food2K surpasses them in both categories and images by one order of magnitude, and thus establishes a new challenging benchmark to develop advanced models for food visual representation learning.
21, TITLE: Learning Domain Invariant Representations for Generalizable Person Re-Identification
AUTHORS: YI-FAN ZHANG et. al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this work, we introduce causality into person ReID and propose a novel generalizable framework, named Domain Invariant Representations for generalizable person Re-Identification (DIR-ReID).
22, TITLE: Recognizing Actions in Videos from Unseen Viewpoints
AUTHORS: AJ Piergiovanni ; Michael S. Ryoo
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we show that current convolutional neural network models are unable to recognize actions from camera viewpoints not present in their training data (i.e., unseen view action recognition). Further, we introduce a new, challenging dataset for unseen view recognition and show the approach's ability to learn viewpoint invariant representations.
23, TITLE: 3D Human Pose and Shape Regression with Pyramidal Mesh Alignment Feedback Loop
AUTHORS: HONGWEN ZHANG et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To address this issue, we propose a Pyramidal Mesh Alignment Feedback (PyMAF) loop to leverage a feature pyramid and rectify the predicted parameters explicitly based on the mesh-image alignment status in our deep regressor.
24, TITLE: Quantifying The Scanner-Induced Domain Gap in Mitosis Detection
AUTHORS: MARC AUBREVILLE et. al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: Models trained on images of the same scanner yielded an average F1 score of 0.683, while models trained on a single other scanner only yielded an average F1 score of 0.325.
25, TITLE: Self-Guided and Cross-Guided Learning for Few-Shot Segmentation
AUTHORS: Bingfeng Zhang ; Jimin Xiao ; Terry Qin
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a simple but effective self-guided learning approach, where the lost critical information is mined.
26, TITLE: Comparison of Different Convolutional Neural Network Activation Functions and Methods for Building Ensembles
AUTHORS: Loris Nanni ; Gianluca Maguolo ; Sheryl Brahnam ; Michelangelo Paci
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: The objective of this study is to examine the performance of CNN ensembles made with different activation functions, including six new ones presented here: 2D Mexican ReLU, TanELU, MeLU+GaLU, Symmetric MeLU, Symmetric GaLU, and Flexible MeLU.
27, TITLE: Free-form Description Guided 3D Visual Graph Network for Object Grounding in Point Cloud
AUTHORS: MINGTAO FENG et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we address all three challenges.
28, TITLE: SD-6DoF-ICLK: Sparse and Deep Inverse Compositional Lucas-Kanade Algorithm on SE(3)
AUTHORS: Timo Hinzmann ; Roland Siegwart
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: This paper introduces SD-6DoF-ICLK, a learning-based Inverse Compositional Lucas-Kanade (ICLK) pipeline that uses sparse depth information to optimize the relative pose that best aligns two images on SE(3).
29, TITLE: What Causes Optical Flow Networks to Be Vulnerable to Physical Adversarial Attacks
AUTHORS: Simon Schrodi ; Tonmoy Saikia ; Thomas Brox
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we analyze the cause of the problem and show that the lack of robustness is rooted in the classical aperture problem of optical flow estimation in combination with bad choices in the details of the network architecture.
30, TITLE: Endo-Depth-and-Motion: Localization and Reconstruction in Endoscopic Videos Using Depth Networks and Photometric Constraints
AUTHORS: David Recasens ; José Lamarca ; José M. Fácil ; J. M. M. Montiel ; Javier Civera
CATEGORY: cs.CV [cs.CV, cs.LG, cs.RO]
HIGHLIGHT: In this paper we present Endo-Depth-and-Motion, a pipeline that estimates the 6-degrees-of-freedom camera pose and dense 3D scene models from monocular endoscopic videos.
31, TITLE: Class-Aware Robust Adversarial Training for Object Detection
AUTHORS: Pin-Chun Chen ; Bo-Han Kung ; Jun-Cheng Chen
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, to address the issue, we present a novel class-aware robust adversarial training paradigm for the object detection task.
32, TITLE: Diagonal Attention and Style-based GAN for Content-Style Disentanglement in Image Generation and Translation
AUTHORS: Gihyun Kwon ; Jong Chul Ye
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Inspired by a mathematical understanding of normalization and attention, here we present novel hierarchical adaptive Diagonal spatial ATtention (DAT) layers to separately manipulate the spatial contents from styles in a hierarchical manner.
33, TITLE: Graph Stacked Hourglass Networks for 3D Human Pose Estimation
AUTHORS: Tianhan Xu ; Wataru Takano
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a novel graph convolutional network architecture, Graph Stacked Hourglass Networks, for 2D-to-3D human pose estimation tasks.
34, TITLE: Visual Room Rearrangement
AUTHORS: Luca Weihs ; Matt Deitke ; Aniruddha Kembhavi ; Roozbeh Mottaghi
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: In this paper, we propose a new dataset and baseline models for the task of Rearrangement.
35, TITLE: Rethinking Spatial Dimensions of Vision Transformers
AUTHORS: BYEONGHO HEO et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: From the successful design principles of CNN, we investigate the role of the spatial dimension conversion and its effectiveness on the transformer-based architecture.
36, TITLE: Revisiting Deep Local Descriptor for Improved Few-Shot Classification
AUTHORS: Jun He ; Richang Hong ; Xueliang Liu ; Mingliang Xu ; Meng Wang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To this end, we present a new method named DCAP in which we investigate how one can improve the quality of embeddings by leveraging Dense Classification and Attentive Pooling.
37, TITLE: CoLA: Weakly-Supervised Temporal Action Localization with Snippet Contrastive Learning
AUTHORS: Can Zhang ; Meng Cao ; Dongming Yang ; Jie Chen ; Yuexian Zou
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we argue that learning by comparing helps identify these hard snippets and we propose to utilize snippet Contrastive learning to Localize Actions, CoLA for short.
38, TITLE: FONTNET: On-Device Font Understanding and Prediction Pipeline
AUTHORS: Rakshith S ; Rishabh Khurana ; Vibhav Agarwal ; Jayesh Rajkumar Vachhani ; Guggilla Bhanodai
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: In this paper, we propose two engines: Font Detection Engine, which identifies the font style, color and size attributes of text in an image and a Font Prediction Engine, which predicts similar fonts for a query font.
39, TITLE: Learning Target Candidate Association to Keep Track of What Not to Track
AUTHORS: Christoph Mayer ; Martin Danelljan ; Danda Pani Paudel ; Luc Van Gool
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose to keep track of distractor objects in order to continue tracking the target.
40, TITLE: Causal Hidden Markov Model for Time Series Disease Forecasting
AUTHORS: Jing Li ; Botong Wu ; Xinwei Sun ; Yizhou Wang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose a causal hidden Markov model to achieve robust prediction of irreversible disease at an early stage, which is safety-critical and vital for medical treatment in early stages.
41, TITLE: Locate Then Segment: A Strong Pipeline for Referring Image Segmentation
AUTHORS: YA JING et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Previous methods usually focus on designing an implicit and recurrent feature interaction mechanism to fuse the visual-linguistic features to directly generate the final segmentation mask without explicitly modeling the localization information of the referent instances.
42, TITLE: Progressively Complementary Network for Fisheye Image Rectification Using Appearance Flow
AUTHORS: Shangrong Yang ; Chunyu Lin ; Kang Liao ; Chunjie Zhang ; Yao Zhao
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To solve these two problems, in this paper, we focus on the interpretable correction mechanism of the distortion rectification network and propose a feature-level correction scheme.
43, TITLE: Single Test Image-Based Automated Machine Learning System for Distinguishing Between Trait and Diseased Blood Samples
AUTHORS: Sahar A. Nasser ; Debjani Paul ; Suyash P. Awate
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: We introduce a machine learning-based method for fully automated diagnosis of sickle cell disease from poor-quality unstained images taken with a mobile microscope.
44, TITLE: Pre-training Strategies and Datasets for Facial Representation Learning
AUTHORS: ADRIAN BULAT et. al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: To this end, we make the following 4 contributions: (a) we introduce, for the first time, a comprehensive evaluation benchmark for facial representation learning consisting of 5 important face analysis tasks. We will release code, pre-trained models and data to facilitate future research.
45, TITLE: Augmented Transformer with Adaptive Graph for Temporal Action Proposal Generation
AUTHORS: Shuning Chang ; Pichao Wang ; Fan Wang ; Hao Li ; Jiashi Feng
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To deal with these problems, we present an augmented transformer with adaptive graph network (ATAG) to exploit both long-range and local temporal contexts for TAPG.
46, TITLE: Unsupervised Learning of 3D Object Categories from Videos in The Wild
AUTHORS: PHILIPP HENZLER et. al.
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: Our goal is to learn a deep network that, given a small number of images of an object of a given category, reconstructs it in 3D.
47, TITLE: Self-supervised Image-text Pre-training With Mixed Data In Chest X-rays
AUTHORS: Xiaosong Wang ; Ziyue Xu ; Leo Tam ; Dong Yang ; Daguang Xu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we introduce an image-text pre-training framework that can learn from these raw data with mixed data inputs, i.e., paired image-text data, a mixture of paired and unpaired data.
48, TITLE: Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
AUTHORS: Antoine Miech ; Jean-Baptiste Alayrac ; Ivan Laptev ; Josef Sivic ; Andrew Zisserman
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Our objective is language-based search of large-scale image and video datasets.
49, TITLE: Is Segmentation Uncertainty Useful?
AUTHORS: Steffen Czolbe ; Kasra Arnavaz ; Oswin Krause ; Aasa Feragen
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We consider two common use cases of segmentation uncertainty, namely assessment of segmentation quality and active learning.
50, TITLE: Identity-Aware CycleGAN for Face Photo-Sketch Synthesis and Recognition
AUTHORS: Yuke Fang ; Jiani Hu ; Weihong Deng
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Recently, generative adversarial networks (GANs) based methods have significantly improved the quality of image synthesis, but they have not explicitly considered the purpose of recognition.
51, TITLE: Noise-resistant Deep Metric Learning with Ranking-based Instance Selection
AUTHORS: CHANG LIU et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a noise-resistant training technique for DML, which we name Probabilistic Ranking-based Instance Selection with Memory (PRISM).
52, TITLE: Multi-modal Trajectory Prediction for Autonomous Driving with Semantic Map and Dynamic Graph Attention Network
AUTHORS: BO DONG et. al.
CATEGORY: cs.CV [cs.CV, cs.RO]
HIGHLIGHT: Inspired by people's natural habit of navigating traffic with attention to their goals and surroundings, this paper presents a unique dynamic graph attention network to solve all those challenges.
53, TITLE: Deep Gaussian Processes for Few-Shot Segmentation
AUTHORS: Joakim Johnander ; Johan Edstedt ; Martin Danelljan ; Michael Felsberg ; Fahad Shahbaz Khan
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To tackle this issue, we propose a few-shot learner formulation based on Gaussian process (GP) regression.
54, TITLE: SIMstack: A Generative Shape and Instance Model for Unordered Object Stacks
AUTHORS: ZOE LANDGRAF et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To this end we propose SIMstack, a depth-conditioned Variational Auto-Encoder (VAE), trained on a dataset of objects stacked under physics simulation.
55, TITLE: The Elastic Lottery Ticket Hypothesis
AUTHORS: XIAOHAN CHEN et. al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, stat.ML]
HIGHLIGHT: We conduct extensive experiments on CIFAR-10 and ImageNet, and propose a variety of strategies to tweak the winning tickets found from different networks of the same model family (e.g., ResNets).
56, TITLE: MT3: Meta Test-Time Training for Self-Supervised Test-Time Adaption
AUTHORS: Alexander Bartler ; Andre Bühler ; Felix Wiewel ; Mario Döbler ; Bin Yang
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: We combine meta-learning, self-supervision and test-time training to learn to adapt to unseen test distributions.
57, TITLE: 3D-MAN: 3D Multi-frame Attention Network for Object Detection
AUTHORS: Zetong Yang ; Yin Zhou ; Zhifeng Chen ; Jiquan Ngiam
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we present 3D-MAN: a 3D multi-frame attention network that effectively aggregates features from multiple perspectives and achieves state-of-the-art performance on Waymo Open Dataset.
58, TITLE: Learning Representational Invariances for Data-Efficient Action Recognition
AUTHORS: Yuliang Zou ; Jinwoo Choi ; Qitong Wang ; Jia-Bin Huang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we investigate various data augmentation strategies that capture different video invariances, including photometric, geometric, temporal, and actor/scene augmentations.
59, TITLE: Automated Cleanup of The ImageNet Dataset By Model Consensus, Explainability and Confident Learning
AUTHORS: Csaba Kertész
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: This paper describes automated heuristics based on model consensus, explainability and confident learning to correct labeling mistakes and remove ambiguous images from this dataset.
60, TITLE: Leveraging Self-Supervision for Cross-Domain Crowd Counting
AUTHORS: Weizhe Liu ; Nikita Durasov ; Pascal Fua
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: Unfortunately, due to domain shift, the resulting models generalize poorly on real imagery.
61, TITLE: Physics-based Differentiable Depth Sensor Simulation
AUTHORS: Benjamin Planche ; Rajat Vikram Singh
CATEGORY: cs.CV [cs.CV, cs.GR]
HIGHLIGHT: In this paper, we introduce a novel end-to-end differentiable simulation pipeline for the generation of realistic 2.5D scans, built on physics-based 3D rendering and custom block-matching algorithms.
62, TITLE: Progressive Domain Expansion Network for Single Domain Generalization
AUTHORS: LEI LI et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose a novel learning framework called progressive domain expansion network (PDEN) for single domain generalization.
63, TITLE: Grounding Physical Concepts of Objects and Events Through Dynamic Visual Reasoning
AUTHORS: ZHENFANG CHEN et. al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.CL, cs.LG, cs.SC]
HIGHLIGHT: In this paper, we present the Dynamic Concept Learner (DCL), a unified framework that grounds physical objects and events from video and language.
64, TITLE: Robust Audio-Visual Instance Discrimination
AUTHORS: Pedro Morgado ; Ishan Misra ; Nuno Vasconcelos
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present a self-supervised learning method to learn audio and video representations.
65, TITLE: Tasting The Cake: Evaluating Self-supervised Generalization on Out-of-distribution Multimodal MRI Data
AUTHORS: ALEX FEDOROV et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we evaluate a range of current contrastive self-supervised methods on out-of-distribution generalization in order to evaluate their applicability to medical imaging.
66, TITLE: Sign Language Production: A Review
AUTHORS: Razieh Rastgoo ; Kourosh Kiani ; Sergio Escalera ; Mohammad Sabokrou
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this survey, we review recent advances in Sign Language Production (SLP) and related areas using deep learning.
67, TITLE: Diagnosing Vision-and-Language Navigation: What Really Matters
AUTHORS: WANRONG ZHU et. al.
CATEGORY: cs.CV [cs.CV, cs.AI, cs.CL]
HIGHLIGHT: In this work, we conduct a series of diagnostic experiments to unveil agents' focus during navigation.
68, TITLE: Boundary IoU: Improving Object-Centric Image Segmentation Evaluation
AUTHORS: Bowen Cheng ; Ross Girshick ; Piotr Dollár ; Alexander C. Berg ; Alexander Kirillov
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present Boundary IoU (Intersection-over-Union), a new segmentation evaluation measure focused on boundary quality.
69, TITLE: Multi-View Radar Semantic Segmentation
AUTHORS: Arthur Ouaknine ; Alasdair Newson ; Patrick Pérez ; Florence Tupin ; Julien Rebut
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we propose several novel architectures, and their associated losses, which analyse multiple "views" of the range-angle-Doppler radar tensor to segment it semantically.
70, TITLE: Dynamic Attention Guided Multi-Trajectory Analysis for Single Object Tracking
AUTHORS: XIAO WANG et. al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we propose to introduce more dynamics by devising a dynamic attention-guided multi-trajectory tracking strategy.
71, TITLE: Broaden Your Views for Self-Supervised Video Learning
AUTHORS: ADRIÀ RECASENS et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We introduce BraVe, a self-supervised learning framework for video.
72, TITLE: Deep Regression on Manifolds: A 3D Rotation Case Study
AUTHORS: Romain Brégier
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we establish a set of properties that such mapping should satisfy to allow proper training, and illustrate it in the case of 3D rotations.
73, TITLE: Fully Convolutional Scene Graph Generation
AUTHORS: Hengyue Liu ; Ning Yan ; Masood S. Mortazavi ; Bir Bhanu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: This paper presents a fully convolutional scene graph generation (FCSGG) model that detects objects and relations simultaneously.
74, TITLE: XVFI: EXtreme Video Frame Interpolation
AUTHORS: Hyeonjun Sim ; Jihyong Oh ; Munchurl Kim
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: XVFI: EXtreme Video Frame Interpolation
75, TITLE: Bilevel Online Adaptation for Out-of-Domain Human Mesh Reconstruction
AUTHORS: Shanyan Guan ; Jingwei Xu ; Yunbo Wang ; Bingbing Ni ; Xiaokang Yang
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: Our general idea is to dynamically fine-tune the source model on test video streams with additional temporal constraints, such that it can mitigate the domain gaps without over-fitting the 2D information of individual test frames.
76, TITLE: Fast and Accurate Normal Estimation for Point Cloud Via Patch Stitching
AUTHORS: JUN ZHOU et. al.
CATEGORY: cs.CV [cs.CV, cs.GR]
HIGHLIGHT: This paper presents an effective normal estimation method adopting multi-patch stitching for an unstructured point cloud.
77, TITLE: Temporal Memory Relation Network for Workflow Recognition from Surgical Video
AUTHORS: YUEMING JIN et. al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we propose a novel end-to-end temporal memory relation network (TMRNet) for relating long-range and multi-scale temporal patterns to augment the present features.
78, TITLE: Face Forensics in The Wild
AUTHORS: Tianfei Zhou ; Wenguan Wang ; Zhiyuan Liang ; Jianbing Shen
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: On existing public benchmarks, face forgery detection techniques have achieved great success. To take face forgery detection to a new level, we construct a novel large-scale dataset, called FFIW-10K, which comprises 10,000 high-quality forgery videos, with an average of three human faces in each frame.
79, TITLE: Spatiotemporal Transformer for Video-based Person Re-identification
AUTHORS: TIANYU ZHANG et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To solve this problem, we propose a novel pipeline where the model is pre-trained on a set of synthesized video data and then transferred to the downstream domains with the perception-constrained Spatiotemporal Transformer (STT) module and Global Transformer (GT) module.
80, TITLE: A Simple Approach for Zero-Shot Learning Based on Triplet Distribution Embeddings
AUTHORS: Vivek Chalumuri ; Bac Nguyen
CATEGORY: cs.CV [cs.CV, cs.LG]
HIGHLIGHT: We address this issue by leveraging the use of distribution embeddings.
81, TITLE: DeepWORD: A GCN-based Approach for Owner-Member Relationship Detection in Autonomous Driving
AUTHORS: ZIZHANG WU et. al.
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: To address these issues, we propose an innovative relationship prediction method, namely DeepWORD, by designing a graph convolution network (GCN).
82, TITLE: In-Place Scene Labelling and Understanding with Implicit Scene Representation
AUTHORS: Shuaifeng Zhi ; Tristan Laidlow ; Stefan Leutenegger ; Andrew J. Davison
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We show the benefit of this approach when labels are either sparse or very noisy in room-scale scenes.
83, TITLE: Benchmarking Representation Learning for Natural World Image Collections
AUTHORS: GRANT VAN HORN et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In order to facilitate progress in this area we present two new natural world visual classification datasets, iNat2021 and NeWT.
84, TITLE: Improving Robustness Against Common Corruptions with Frequency Biased Models
AUTHORS: Tonmoy Saikia ; Cordelia Schmid ; Thomas Brox
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we introduce a mixture of two expert models specializing in high and low-frequency robustness, respectively.
85, TITLE: A Multiplexed Network for End-to-End, Multilingual OCR
AUTHORS: JING HUANG et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose an E2E approach, Multiplexed Multilingual Mask TextSpotter, that performs script identification at the word level and handles different scripts with different recognition heads, all while maintaining a unified loss that simultaneously optimizes script identification and multiple recognition heads.
86, TITLE: Read and Attend: Temporal Localisation in Sign Language Videos
AUTHORS: Gül Varol ; Liliane Momeni ; Samuel Albanie ; Triantafyllos Afouras ; Andrew Zisserman
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: The objective of this work is to annotate sign instances across a broad vocabulary in continuous sign language. Our contributions are as follows: (1) we demonstrate the ability to leverage large quantities of continuous signing videos with weakly-aligned subtitles to localise signs in continuous sign language; (2) we employ the learned attention to automatically generate hundreds of thousands of annotations for a large sign vocabulary; (3) we collect a set of 37K manually verified sign instances across a vocabulary of 950 sign classes to support our study of sign language recognition; (4) by training on the newly annotated data from our method, we outperform the prior state of the art on the BSL-1K sign language recognition benchmark.
87, TITLE: SPatchGAN: A Statistical Feature Based Discriminator for Unsupervised Image-to-Image Translation
AUTHORS: Xuning Shao ; Weidong Zhang
CATEGORY: cs.CV [cs.CV, cs.AI, cs.LG, eess.IV]
HIGHLIGHT: For unsupervised image-to-image translation, we propose a discriminator architecture which focuses on the statistical features instead of individual patches.
88, TITLE: Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection
AUTHORS: LI WANG et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: The objective of this paper is to learn context- and depth-aware feature representation to solve the problem of monocular 3D object detection.
89, TITLE: Assessing YOLACT++ for Real Time and Robust Instance Segmentation of Medical Instruments in Endoscopic Procedures
AUTHORS: Juan Carlos Angeles Ceron ; Leonardo Chang ; Gilberto Ochoa-Ruiz ; Sharib Ali
CATEGORY: cs.CV [cs.CV, cs.AI]
HIGHLIGHT: In this paper, we propose the addition of attention mechanisms to the YOLACT architecture that allows real-time instance segmentation of instruments with improved accuracy on the ROBUST-MIS dataset.
90, TITLE: Delving Into Localization Errors for Monocular 3D Object Detection
AUTHORS: XINZHU MA et. al.
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, by intensive diagnosis experiments, we quantify the impact introduced by each sub-task and find that the 'localization error' is the vital factor in restricting monocular 3D detection.
91, TITLE: AGQA: A Benchmark for Compositional Spatio-Temporal Reasoning
AUTHORS: Madeleine Grunde-McLaughlin ; Ranjay Krishna ; Maneesh Agrawala
CATEGORY: cs.CV [cs.CV, cs.CL]
HIGHLIGHT: We present Action Genome Question Answering (AGQA), a new benchmark for compositional spatio-temporal reasoning. We also provide a balanced subset of 3.9M question answer pairs, 3 orders of magnitude larger than existing benchmarks, that minimizes bias by balancing the answer distributions and types of question structures.
92, TITLE: Enabling Data Diversity: Efficient Automatic Augmentation Via Regularized Adversarial Training
AUTHORS: Yunhe Gao ; Zhiqiang Tang ; Mu Zhou ; Dimitris Metaxas
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: To automate medical data augmentation, we propose a regularized adversarial training framework via two min-max objectives and three differentiable augmentation models covering affine transformation, deformation, and appearance changes.
93, TITLE: Head2HeadFS: Video-based Head Reenactment with Few-shot Learning
AUTHORS: Michail Christos Doukas ; Mohammad Rami Koujan ; Viktoriia Sharmanska ; Stefanos Zafeiriou
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We propose head2headFS, a novel, easily adaptable pipeline for head reenactment.
94, TITLE: DiNTS: Differentiable Neural Network Topology Search for 3D Medical Image Segmentation
AUTHORS: Yufan He ; Dong Yang ; Holger Roth ; Can Zhao ; Daguang Xu
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we focus on three important aspects of NAS in 3D medical image segmentation: flexible multi-path network topology, high search efficiency, and budgeted GPU memory usage.
95, TITLE: Adaptive Pseudo-Label Refinement By Negative Ensemble Learning for Source-Free Unsupervised Domain Adaptation
AUTHORS: Waqar Ahmed ; Pietro Morerio ; Vittorio Murino
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this work, we cast UDA as a pseudo-label refinery problem in the challenging source-free scenario.
96, TITLE: Domain-robust VQA with Diverse Datasets and Methods But No Target Labels
AUTHORS: Mingda Zhang ; Tristan Maidment ; Ahmad Diab ; Adriana Kovashka ; Rebecca Hwa
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: To emulate the setting of real-world generalization, we focus on unsupervised domain adaptation and the open-ended classification task formulation.
97, TITLE: Does It Work Outside This Benchmark? Introducing The Rigid Depth Constructor Tool, Depth Validation Dataset Construction in Rigid Scenes for The Masses
AUTHORS: Clément Pinard ; Antoine Manzanera
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: We present a protocol to construct your own depth validation dataset for navigation. Using this application, we propose two new real datasets, outdoor and indoor, readily usable in a UAV navigation context.
98, TITLE: Beltrami Signature: A Novel Invariant 2D Shape Representation for Object Classification
AUTHORS: Chenran Lin ; Lok Ming Lui
CATEGORY: cs.CV [cs.CV, math.CV]
HIGHLIGHT: There has been growing interest in shape analysis in recent years, and in this paper we present a novel contour-based shape representation named Beltrami signature for 2D bounded simply connected domains.
99, TITLE: TransFill: Reference-guided Image Inpainting By Merging Multiple Color and Spatial Transformations
AUTHORS: Yuqian Zhou ; Connelly Barnes ; Eli Shechtman ; Sohrab Amirghodsi
CATEGORY: cs.CV [cs.CV]
HIGHLIGHT: In this paper, we propose TransFill, a multi-homography transformed fusion method to fill the hole by referring to another source image that shares scene contents with the target image.
100, TITLE: Flow-based Kernel Prior with Application to Blind Super-Resolution
AUTHORS: Jingyun Liang ; Kai Zhang ; Shuhang Gu ; Luc Van Gool ; Radu Timofte
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: To address this issue, this paper proposes a normalizing flow-based kernel prior (FKP) for kernel modeling.
101, TITLE: Using Low-rank Representation of Abundance Maps and Nonnegative Tensor Factorization for Hyperspectral Nonlinear Unmixing
AUTHORS: LIANRU GAO et. al.
CATEGORY: cs.CV [cs.CV, eess.IV]
HIGHLIGHT: In this article, we extend the linear tensor method to the nonlinear tensor method and propose a nonlinear low-rank tensor unmixing algorithm to solve the generalized bilinear model (GBM).
102, TITLE: Automating Defense Against Adversarial Attacks: Discovery of Vulnerabilities and Application of Multi-INT Imagery to Protect Deployed Models
AUTHORS: Josh Kalin ; David Noever ; Matthew Ciolino ; Dominick Hambrick ; Gerry Dozier
CATEGORY: cs.CR [cs.CR, cs.CV]
HIGHLIGHT: This work proposes an automated approach to defend these models.
103, TITLE: Online Defense of Trojaned Models Using Misattributions
AUTHORS: Panagiota Kiourti ; Wenchao Li ; Anirban Roy ; Karan Sikka ; Susmit Jha
CATEGORY: cs.CR [cs.CR, cs.CV, stat.ML]
HIGHLIGHT: This paper proposes a new approach to detecting neural Trojans on Deep Neural Networks during inference.
104, TITLE: Foveated Neural Radiance Fields for Real-Time and Egocentric Virtual Reality
AUTHORS: NIANCHEN DENG et. al.
CATEGORY: cs.GR [cs.GR, cs.CV]
HIGHLIGHT: Tailored for the future portable, low-storage, and energy-efficient VR platforms, we present the first gaze-contingent 3D neural representation and view synthesis method.
105, TITLE: HapTable: An Interactive Tabletop Providing Online Haptic Feedback for Touch Gestures
AUTHORS: Senem Ezgi Emgin ; Amirreza Aghakhani ; T. Metin Sezgin ; Cagatay Basdogan
CATEGORY: cs.HC [cs.HC, cs.CV, cs.GR, cs.MM]
HIGHLIGHT: We present HapTable, a multimodal interactive tabletop that allows users to interact with digital images and objects through natural touch gestures, and receive visual and haptic feedback accordingly.
106, TITLE: Model-Contrastive Federated Learning
AUTHORS: Qinbin Li ; Bingsheng He ; Dawn Song
CATEGORY: cs.LG [cs.LG, cs.AI, cs.CV]
HIGHLIGHT: In this paper, we propose MOON: model-contrastive federated learning.
107, TITLE: Training Sparse Neural Network By Constraining Synaptic Weight on Unit Lp Sphere
AUTHORS: Weipeng Li ; Xiaogang Yang ; Chuanxiang Li ; Ruitao Lu ; Xueli Xie
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: Here we demonstrate that constraining the synaptic weights on the unit Lp-sphere enables flexible control of the sparsity with p and improves the generalization ability of neural networks.
108, TITLE: PointBA: Towards Backdoor Attacks in 3D Point Cloud
AUTHORS: XINKE LI et. al.
CATEGORY: cs.LG [cs.LG, cs.CV]
HIGHLIGHT: We present the backdoor attacks in 3D with a unified framework that exploits the unique properties of 3D data and networks.
109, TITLE: Reconstructing Interactive 3D Scenes By Panoptic Mapping and CAD Model Alignments
AUTHORS: MUZHI HAN et. al.
CATEGORY: cs.RO [cs.RO, cs.AI, cs.CV]
HIGHLIGHT: In this paper, we rethink the problem of scene reconstruction from an embodied agent's perspective: While the classic view focuses on the reconstruction accuracy, our new perspective emphasizes the underlying functions and constraints such that the reconstructed scenes provide actionable information for simulating interactions with agents.
110, TITLE: Detecting and Mapping Trees in Unstructured Environments with A Stereo Camera and Pseudo-Lidar
AUTHORS: Brian H. Wang ; Carlos Diaz-Ruiz ; Jacopo Banfi ; Mark Campbell
CATEGORY: cs.RO [cs.RO, cs.CV, eess.IV]
HIGHLIGHT: We present a method for detecting and mapping trees in noisy stereo camera point clouds, using a learned 3-D object detector. We generate detector training data with a novel automatic labeling process that clusters a fused global point cloud. We collect a data set for tree detection consisting of 8680 stereo point clouds, and validate our method on an outdoor test sequence.
111, TITLE: A Tutorial on SE(3) Transformation Parameterizations and On-manifold Optimization
AUTHORS: José Luis Blanco-Claraco
CATEGORY: cs.RO [cs.RO, cs.CV]
HIGHLIGHT: A Tutorial on SE(3) Transformation Parameterizations and On-manifold Optimization
112, TITLE: Environmental Sound Analysis with Mixup Based Multitask Learning and Cross-task Fusion
AUTHORS: Weiping Zheng ; Dacan Jiang ; Gansen Zhao
CATEGORY: cs.SD [cs.SD, cs.CV, cs.MM, eess.AS]
HIGHLIGHT: In this letter, a two-stage method is proposed for the above tasks.
113, TITLE: Iterative Gradient Encoding Network with Feature Co-Occurrence Loss for Single Image Reflection Removal
AUTHORS: Sutanu Bera ; Prabir Kumar Biswas
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this study, we propose an iterative gradient encoding network for single image reflection removal.
114, TITLE: Assessing The Role of Random Forests in Medical Image Segmentation
AUTHORS: Dennis Hartmann ; Dominik Müller ; Iñaki Soto-Rey ; Frank Kramer
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: For this purpose, two random forest approaches were compared with a state-of-the-art deep convolutional neural network.
115, TITLE: Is Image-to-Image Translation The Panacea for Multimodal Image Registration? A Comparative Study
AUTHORS: Jiahao Lu ; Johan Öfverstedt ; Joakim Lindblad ; Nataša Sladoje
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: We conduct an empirical study of the applicability of modern I2I translation methods for the task of multimodal biomedical image registration.
116, TITLE: Machine Learning Method for Light Field Refocusing
AUTHORS: Eisa Hedayati ; Timothy C. Havens ; Jeremy P. Bos
CATEGORY: eess.IV [eess.IV, cs.CV, cs.GR, cs.LG]
HIGHLIGHT: In this paper we introduce a machine learning based refocusing technique that is capable of extracting 16 refocused images with refocusing parameters of α = 0.125, 0.250, 0.375, ..., 2.0 in real-time.
117, TITLE: Adversarially Learned Iterative Reconstruction for Imaging Inverse Problems
AUTHORS: Subhadip Mukherjee ; Ozan Öktem ; Carola-Bibiane Schönlieb
CATEGORY: eess.IV [eess.IV, cs.CV, cs.LG]
HIGHLIGHT: Motivated by the maximum-likelihood principle, we propose an unsupervised learning framework for solving ill-posed inverse problems.
118, TITLE: DualNorm-UNet: Incorporating Global and Local Statistics for Robust Medical Image Segmentation
AUTHORS: Junfei Xiao ; Lequan Yu ; Lei Xing ; Alan Yuille ; Yuyin Zhou
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: In this paper, we propose to incorporate the semantic class information into normalization layers, so that the activations corresponding to different regions (i.e., classes) can be modulated differently.
119, TITLE: Automatic Airway Segmentation from Computed Tomography Using Robust and Efficient 3-D Convolutional Neural Networks
AUTHORS: A. Garcia-Uceda Juarez ; R. Selvan ; Z. Saghir ; H. A. W. M. Tiddens ; M. de Bruijne
CATEGORY: eess.IV [eess.IV, cs.CV]
HIGHLIGHT: This paper presents a fully automatic and end-to-end optimised airway segmentation method for thoracic computed tomography, based on the U-Net architecture.