3D Detection Get Started


1. LiDAR and Point Clouds


  • https://zhuanlan.zhihu.com/p/33792450
  • https://pdal.io/workshop/lidar-introduction.html
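
For orientation before reading the links above: a LiDAR point cloud is, at its simplest, an unordered list of points, each with x, y, z coordinates and usually an intensity or reflectance value. The sketch below assumes the flat layout used by KITTI's Velodyne scans (little-endian float32 values, four per point). The function name `read_point_cloud` and the synthetic two-point file are made up for illustration and are not part of any dataset toolkit.

```python
import array
import os
import struct
import sys
import tempfile


def read_point_cloud(path):
    """Read a flat little-endian float32 file into (x, y, z, reflectance) tuples."""
    values = array.array("f")  # "f" is a 4-byte float on common platforms
    with open(path, "rb") as f:
        values.frombytes(f.read())
    if sys.byteorder == "big":
        values.byteswap()  # the file is assumed to be little-endian
    if len(values) % 4 != 0:
        raise ValueError("expected 4 floats per point")
    return [tuple(values[i:i + 4]) for i in range(0, len(values), 4)]


# Synthetic two-point scan, written only so the sketch runs without a dataset.
demo_points = [(1.0, 2.0, 0.5, 0.25), (-3.0, 4.0, 1.5, 0.75)]
fd, demo_path = tempfile.mkstemp(suffix=".bin")
with os.fdopen(fd, "wb") as f:
    for point in demo_points:
        f.write(struct.pack("<4f", *point))

cloud = read_point_cloud(demo_path)
os.remove(demo_path)
print(len(cloud), "points, first:", cloud[0])
```

In practice a library such as Open3D or NumPy would do this in one call; the standard-library version is only meant to show how little structure the raw data has.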

2. Datasets

  • PASCAL3D+ (2014) [Link] 12 categories, on average 3k+ objects per category, for 3D object detection and pose estimation.
  • ModelNet (2015) [Link] 127,915 3D CAD models from 662 categories. ModelNet10: 4,899 models from 10 categories. ModelNet40: 12,311 models from 40 categories, all uniformly oriented.
  • ShapeNet (2015) [Link] 3 million+ models and 4K+ categories. A dataset that is large in scale, well organized and richly annotated. ShapeNetCore [Link]: 51,300 models across 55 categories.
  • NYU Depth Dataset V2 (2012) [Link] 1449 densely labeled pairs of aligned RGB and depth images from Kinect video sequences for a variety of indoor scenes.
  • SUNRGB-D 3D Object Detection Challenge [Link] 19 object categories for predicting 3D bounding boxes in real-world dimensions. Training set: 10,355 RGB-D scene images. Testing set: 2,860 RGB-D images.
  • ScanNet (2017) [Link] An RGB-D video dataset containing 2.5 million views in more than 1500 scans, annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations.
  • Facebook House3D: A Rich and Realistic 3D Environment (2017) [Link] House3D is a virtual 3D environment which consists of 45K indoor scenes equipped with a diverse set of scene types, layouts and objects sourced from the SUNCG dataset. All 3D objects are fully annotated with category labels. Agents in the environment have access to observations of multiple modalities, including RGB images, depth, segmentation masks and top-down 2D map views.


  • KITTI Benchmark

    paper link

    The KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset is a widely used computer vision benchmark released in 2012. A Volkswagen station wagon was fitted with grayscale and color cameras, a Velodyne 3D laser scanner, and a GPS/IMU system. The recordings cover various scenarios such as urban, residential, highway, and campus.

  • nuScenes Benchmark

    paper link

    nuTonomy scenes (nuScenes) is the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars and 1 lidar, all with full 360 degree field of view. nuScenes comprises 1000 scenes, each 20s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset.
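
Several of the datasets above annotate objects with 3D bounding boxes. As a rough sketch of what such an annotation encodes, the snippet below turns a box given as center, size and heading into its 8 corner points. It assumes a z-up, LiDAR-style frame and a single yaw angle about the z axis; this parametrization is an assumption for illustration, since each dataset has its own convention (for example, different axes or a full quaternion instead of one angle). The function name `box_corners` is hypothetical.

```python
import math


def box_corners(cx, cy, cz, length, width, height, yaw):
    """Return the 8 corners of a box rotated by `yaw` radians about the z axis."""
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-0.5, 0.5):
        for dy in (-0.5, 0.5):
            for dz in (-0.5, 0.5):
                # Offset in the box frame: x along length, y along width, z up.
                lx, ly, lz = dx * length, dy * width, dz * height
                # Rotate about z, then translate to the box center.
                corners.append((cx + c * lx - s * ly,
                                cy + s * lx + c * ly,
                                cz + lz))
    return corners


# A car-sized box (4 m x 2 m x 1.5 m), 10 m ahead, turned by 90 degrees.
for corner in box_corners(10.0, 0.0, 0.75, 4.0, 2.0, 1.5, math.pi / 2):
    print(tuple(round(v, 3) for v in corner))
```

Corner coordinates like these are what IoU computation and visualization code typically work from, so it helps to check the convention of a dataset against a small case like this before training on it.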

Papers List

Voxel-based Methods

Point-based Methods

BEV & Multi-View

Depth & Monocular

Sensor Fusion & Tracking



  • PCL
  • Open3D (recommended)


  • An earlier talk on 3D: https://zhuanlan.zhihu.com/p/58734240