[Personal notes] Study notes on an object detection survey

A Survey of Deep Learning-based Object Detection

2021/12/15

the purpose of object detection: locating instances of semantic objects of a certain class

*two branches: general object detection and domain-specific object detection

most state-of-the-art object detectors use deep networks as their backbone and detection network: the backbone extracts features from input images (or videos), and the detection network performs classification and localization

well-researched domains of object detection include multi-category detection, edge detection, salient object detection, pose detection, scene text detection, face detection, pedestrian detection, etc.

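*benchmark: the commonly accepted reference standard of a field, in practice the datasets and evaluation metrics that papers in the field consistently use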

  • two kinds of object detectors

    two-stage (e.g., Faster R-CNN); one-stage (e.g., YOLO)

    two-stage detectors achieve higher localization and object recognition accuracy, whereas one-stage detectors achieve higher inference speed

most backbone networks for detection are classification networks with the last fully connected (FC) layer removed

2021/12/16

Two-stage Detectors

  • R-CNN (one of the first deep learning-based detectors)
  • Fast R-CNN (introduces RoI pooling)
  • Faster R-CNN (introduces the region proposal network (RPN) with multi-scale anchors)
  • Mask R-CNN (extends Faster R-CNN to instance segmentation; uses a feature pyramid network (FPN) backbone and introduces RoIAlign)
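The multi-scale anchors that the RPN relies on can be sketched in plain Python. This is a minimal illustration under assumptions, not the paper's code: the function name `generate_anchors`, the stride of 16, and the three scales and three aspect ratios are assumed values chosen to resemble common defaults.

```python
import math


def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Hypothetical helper: anchors as (x1, y1, x2, y2) in input-image pixels.

    One anchor per (scale, ratio) pair is centred on every feature-map cell.
    `ratio` is height / width, and every anchor keeps the area scale ** 2.
    """
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # centre of feature-map cell (i, j) projected back onto the image
            cx = (j + 0.5) * stride
            cy = (i + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w = s / math.sqrt(r)  # w * h == s * s and h / w == r
                    h = s * math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(2, 3)
print(len(anchors))  # 2 * 3 cells * 9 anchors per cell
```

Each feature-map cell gets `len(scales) * len(ratios)` anchors; the RPN then predicts an objectness score and box offsets for every one of them.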

*the classification layer is (N+1)-way: N for the object classes and 1 for the background
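A minimal sketch of how an (N+1)-way output could be turned into a label, assuming index 0 is the background class (frameworks differ on where the background slot sits). `softmax` and `classify_region` are hypothetical helpers, not taken from any particular detector.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_region(logits, class_names):
    """logits has N + 1 entries; index 0 is ASSUMED to be the background."""
    assert len(logits) == len(class_names) + 1
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if best == 0:
        return "background", probs[0]
    return class_names[best - 1], probs[best]


print(classify_region([0.1, 2.0, 0.3], ["cat", "dog"]))
```

The extra background slot lets the head reject proposals that contain no object at all instead of forcing them into one of the N classes.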

One-stage Detectors

  • YOLO (real-time detection on full images and webcam video)
  • YOLOv2 (adopts a series of design choices from earlier work together with novel concepts; new backbone)
  • YOLOv3 (an improved version of YOLOv2)
  • SSD (a single-shot detector for multiple categories)
  • DSSD (Deconvolutional SSD, a modified version of SSD)
  • RetinaNet (introduces focal loss)
  • M2Det (uses a multi-level feature pyramid network, MLFPN)
  • RefineDet (a single-shot detector with an anchor refinement module and an object detection module)
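RetinaNet's focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), scales the cross-entropy so that easy, well-classified examples (mostly background) contribute very little. Below is a minimal per-anchor binary sketch; α = 0.25 and γ = 2 follow the values reported for RetinaNet, while the function names are hypothetical.

```python
import math


def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss for one anchor and one class.

    p: predicted probability that the anchor is an object of this class
    y: 1 for a positive (object), 0 for a negative (background)
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = min(max(p_t, eps), 1.0)  # avoid log(0)
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def cross_entropy(p, y, eps=1e-12):
    """Plain binary cross-entropy, for comparison."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(min(max(p_t, eps), 1.0))


# an easy negative: the detector is already 99% sure it is background
print(cross_entropy(0.01, 0), focal_loss(0.01, 0))
```

With γ = 0 and α_t = 1 the focal loss reduces to ordinary cross-entropy; raising γ shrinks the loss of easy examples far more than that of hard ones.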

detecting an object means stating which specified class it belongs to and locating it in the image

the localization of an object is typically represented by a bounding box
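Bounding boxes are commonly stored either as corners (x1, y1, x2, y2) or as centre and size (cx, cy, w, h), and the overlap between two boxes is usually measured by intersection over union (IoU). A minimal sketch, assuming continuous coordinates (some datasets, e.g. PASCAL VOC tooling, add +1 for inclusive pixel indices); the helper names are hypothetical.

```python
def xyxy_to_cxcywh(box):
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def cxcywh_to_xyxy(box):
    cx, cy, w, h = box
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7
```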

benchmarks

datasets:

  • PASCAL VOC (the basic, classic one)
  • MS COCO (many images per class)
  • ImageNet (a large number of classes)
  • VisDrone2018 (images and videos captured by drones)
  • Open Images V5 (a large-scale dataset released by Google)

metrics:

  • Recall
  • Precision
  • Average Precision (AP)
  • mean Average Precision (mAP)
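How precision, recall, AP and mAP relate can be sketched as follows. This assumes the detections of one class have already been matched to ground truth (for example with IoU ≥ 0.5, as in PASCAL VOC) and uses all-point interpolation; COCO instead averages AP over IoU thresholds 0.50 to 0.95 and samples 101 recall points. Function names are hypothetical.

```python
def precision_recall_ap(detections, num_gt):
    """detections: list of (score, is_true_positive) for ONE class.
    num_gt: number of ground-truth objects of that class (assumed > 0)."""
    dets = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    precisions, recalls = [], []
    for _, is_tp in dets:  # walk down the ranked list
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)

    # interpolate: precision at each rank becomes the max precision to its right
    interp = precisions[:]
    for i in range(len(interp) - 2, -1, -1):
        interp[i] = max(interp[i], interp[i + 1])

    # AP = area under the interpolated precision-recall curve
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, interp):
        ap += (r - prev_r) * p
        prev_r = r
    return precisions, recalls, ap


def mean_ap(ap_per_class):
    """mAP: the mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


_, _, ap = precision_recall_ap([(0.9, True), (0.8, False), (0.7, True)], num_gt=2)
print(ap)
```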

deep neural network-based object detection pipeline:

  • image pre-processing: resize the raw input and perform data augmentation
  • feature extraction: a key step for the subsequent detection
  • classification and localization: produce classification scores and bounding box coordinates
  • post-processing: remove weak or redundant detections (e.g., NMS)
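The post-processing step can be illustrated with a minimal greedy NMS sketch: drop low-scoring boxes, then repeatedly keep the highest-scoring box and discard remaining boxes that overlap it too much. The thresholds and helper names are assumptions for illustration; libraries such as torchvision ship their own optimized versions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5, score_threshold=0.0):
    """Greedy NMS. Returns indices of kept boxes, highest score first."""
    # discard weak detections, then sort the rest by score
    order = sorted((i for i in range(len(boxes)) if scores[i] >= score_threshold),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # suppress every remaining box that overlaps the kept one too much
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
print(nms(boxes, [0.9, 0.8, 0.7]))
```

The second box overlaps the first with IoU ≈ 0.68, so it is suppressed, while the distant third box survives.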

to obtain precise detection results, there are several methods that can be used alone or in combination:

  • Enhanced features: extract more effective features from input images (e.g., FPN, attention)
  • Increasing localization accuracy: design novel loss functions
  • Solving the negative-positive imbalance issue: mainly for one-stage detectors, e.g., hard negative mining or modifying the classification loss (as focal loss does)
  • Improving post-processing NMS methods
  • Combining one-stage and two-stage detectors to get the best of both
  • Complicated scene solutions (e.g., small objects and heavily occluded objects)
  • Anchor-free: a relatively new direction worth further research
  • Training from scratch: some datasets really do need training from scratch to ensure stable and accurate results
  • Designing new architectures
  • Speeding up detection
  • Achieving fast and accurate detection

typical application areas:

  • Security field: Face detection, Pedestrian detection, Anomaly detection
  • Military field: Remote sensing object detection
  • Transportation field
  • Medical field: Computer Aided Diagnosis (CAD) systems
  • Daily life field: Pattern detection, Image caption generation