3D Detection Get Started
- PASCAL3D+ (2014) [Link] 12 categories, on average 3k+ objects per category, for 3D object detection and pose estimation.
- ModelNet (2015) [Link] 127,915 3D CAD models from 662 categories. ModelNet10: 4,899 models from 10 categories. ModelNet40: 12,311 models from 40 categories. All are uniformly oriented.
- ShapeNet (2015) [Link] 3 million+ models and 4K+ categories. A large-scale, well-organized, and richly annotated dataset. ShapeNetCore [Link]: 51,300 models for 55 categories.
- NYU Depth Dataset V2 (2012) [Link] 1449 densely labeled pairs of aligned RGB and depth images from Kinect video sequences for a variety of indoor scenes.
- SUNRGB-D 3D Object Detection Challenge [Link] 19 object categories for predicting a 3D bounding box in real-world dimensions. Training set: 10,355 RGB-D scene images. Testing set: 2,860 RGB-D images.
- ScanNet (2017) [Link] An RGB-D video dataset containing 2.5 million views in more than 1500 scans, annotated with 3D camera poses, surface reconstructions, and instance-level semantic segmentations.
- Facebook House3D: A Rich and Realistic 3D Environment (2017) [Link] House3D is a virtual 3D environment which consists of 45K indoor scenes equipped with a diverse set of scene types, layouts and objects sourced from the SUNCG dataset. All 3D objects are fully annotated with category labels. Agents in the environment have access to observations of multiple modalities, including RGB images, depth, segmentation masks and top-down 2D map views.
The KITTI (Karlsruhe Institute of Technology and Toyota Technological Institute) dataset is a widely used computer vision benchmark released in 2012. A Volkswagen station wagon was fitted with grayscale and color cameras, a Velodyne 3D laser scanner, and a GPS/IMU system. The dataset covers various scenarios such as urban, residential, highway, and campus.
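As a starting point for working with the Velodyne scans mentioned above, here is a minimal sketch of a reader for KITTI-style `.bin` point cloud files. It assumes the commonly documented layout of four little-endian float32 values per point (x, y, z, reflectance); the function name `read_velodyne_bin` and the file name `000000.bin` are illustrative, and the demo writes a tiny synthetic scan instead of touching real data. It uses only the standard library so it runs anywhere; in practice a NumPy-based reader is the usual choice.

```python
import struct
import tempfile
from pathlib import Path

# Assumed layout: x, y, z, reflectance stored as little-endian float32.
POINT_FORMAT = "<4f"
POINT_SIZE = struct.calcsize(POINT_FORMAT)  # 16 bytes per point


def read_velodyne_bin(path):
    """Return a list of (x, y, z, reflectance) tuples from a KITTI-style scan."""
    data = Path(path).read_bytes()
    if len(data) % POINT_SIZE != 0:
        raise ValueError("file size is not a multiple of 16 bytes")
    return list(struct.iter_unpack(POINT_FORMAT, data))


# Demo on a synthetic two-point scan (values are exactly representable in float32).
demo_points = [(1.0, 2.0, 0.5, 0.25), (10.0, -3.0, 1.5, 0.75)]
with tempfile.TemporaryDirectory() as tmp:
    scan_path = Path(tmp) / "000000.bin"  # hypothetical file name
    scan_path.write_bytes(
        b"".join(struct.pack(POINT_FORMAT, *p) for p in demo_points)
    )
    loaded = read_velodyne_bin(scan_path)

print(len(loaded), "points loaded")
```

The size check catches truncated or wrongly formatted files early, before any point is interpreted.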
nuTonomy scenes (nuScenes) is the first dataset to carry the full autonomous vehicle sensor suite: 6 cameras, 5 radars, and 1 lidar, all with a full 360-degree field of view. nuScenes comprises 1000 scenes, each 20 s long and fully annotated with 3D bounding boxes for 23 classes and 8 attributes. It has 7x as many annotations and 100x as many images as the pioneering KITTI dataset.
- SECOND: Sparsely Embedded Convolutional Detection
- VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
- PIXOR: Real-time 3D Object Detection from Point Clouds
- PointPillars: Fast Encoders for Object Detection from Point Clouds
- PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation
- PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space
- PointSIFT: A SIFT-like Network Module for 3D Point Cloud Semantic Segmentation
- SO-Net: Self-Organizing Network for Point Cloud Analysis
- PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud
- IPOD: Intensive Point-based Object Detector for Point Cloud
- [Deep Hough Voting for 3D Object Detection in Point Clouds](http://arxiv.org/abs/1904.09664)
BEV & Multi-View
- Complex-YOLO: An Euler-Region-Proposal for Real-time 3D Object Detection on Point Clouds
- Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds
- Joint 3D Proposal Generation and Object Detection from View Aggregation
- Multi-View 3D Object Detection Network for Autonomous Driving
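The bird's-eye-view (BEV) idea behind this group of methods can be illustrated with a small rasterization sketch: points are dropped onto a ground-plane grid and each cell keeps the maximum height seen. This is a simplified illustration under assumed ranges and cell size, not the encoding used by any of the papers listed; the function name `points_to_bev` and all parameter values are hypothetical.

```python
def points_to_bev(points, x_range=(0.0, 4.0), y_range=(-2.0, 2.0), cell=1.0):
    """Rasterize (x, y, z) points into a grid holding the max height per cell.

    Rows index the forward (x) axis and columns the lateral (y) axis.
    Cells that receive no point stay None.
    """
    n_rows = int((x_range[1] - x_range[0]) / cell)
    n_cols = int((y_range[1] - y_range[0]) / cell)
    grid = [[None] * n_cols for _ in range(n_rows)]
    for x, y, z in points:
        # Skip points outside the region of interest.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Clamp to guard against floating-point rounding at the upper edge.
        row = min(int((x - x_range[0]) / cell), n_rows - 1)
        col = min(int((y - y_range[0]) / cell), n_cols - 1)
        if grid[row][col] is None or z > grid[row][col]:
            grid[row][col] = z
    return grid


# Four synthetic points: two share a cell, one is out of range.
demo_cloud = [
    (0.5, -1.5, 0.2),
    (0.7, -1.2, 1.1),
    (3.2, 1.9, -0.4),
    (9.0, 0.0, 1.0),
]
bev = points_to_bev(demo_cloud)
for row in bev:
    print(row)
```

Real pipelines typically stack several such channels (for example height, intensity, and point density) and feed the result to a 2D convolutional network.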
Depth & Monocular
Sensor Fusion & Tracking
- Open3D (recommended)
- A previous talk on 3D: https://zhuanlan.zhihu.com/p/58734240