Researchers at Audi have released the Audi Autonomous Driving Dataset (A2D2) for developing self-driving cars. The dataset includes camera images, LiDAR point clouds, and vehicle control information, and over 40,000 frames have been segmented and labelled for use in supervised learning. The dataset can be used for commercial purposes.
The research team described the dataset and compared it with other similar datasets in a paper published on arXiv. A2D2 was captured using six cameras, five LiDAR units, and vehicle bus data, which includes steering and throttle control state as well as speed and acceleration data. 41,277 frames contain image and point cloud labels from semantic segmentation, with every pixel assigned one of 38 labels such as "pedestrian" or "truck." 12,497 of those frames also contain 3D bounding boxes of the objects. The dataset is published under the CC BY-ND 4.0 license, which allows commercial use subject to the terms of the license. The research team says,
We release A2D2 to foster research, in keeping with our ethos of promoting innovation and actively participating in the research community.
The data collection platform was an Audi Q7 e-tron, with the six cameras and five LiDAR units rack-mounted on the roof. Three cameras faced front, one to the rear, and one to each side, providing 360° coverage. Three LiDAR units faced front and two to the rear. The LiDAR scan patterns were set up to provide maximum overlap with the camera images, which allows them to scan large areas above the vehicle and identify tall buildings, making the dataset "particularly relevant for SLAM and 3D map generation." The dataset includes "extensive" vehicle bus data; according to the team, "to the best of our knowledge other multimodal datasets do not provide such data." The data was collected in a variety of urban and rural locations. Besides the labelled data for use in supervised learning, there are an additional 390,000 unlabeled sequential frames "suitable for self-supervised approaches."
Audi's paper compares A2D2 to several other publicly available autonomous driving datasets, including Waymo Open Dataset (WOD) and Lyft Level 5 (LL5), which were released last year. While all three datasets were collected from comparable numbers of cameras and LiDARs, the Lyft and Waymo datasets were captured exclusively in urban sites. Lyft's dataset contains no vehicle data, while Waymo's contains only vehicle velocity, in contrast to A2D2's extensive vehicle data.
Commenters on Twitter and Reddit have also noted that A2D2's license, unlike that of most other publicly available autonomous driving datasets, allows commercial use:
The good thing about this dataset is that, unlike KITTI, Waymo, etc, you can use this for commercial works. This is because it is licensed under CC By-ND 4.0.
A2D2, WOD, and LL5 join the growing list of datasets from both commercial and academic sources. Udacity recently made headlines when the dataset from its self-driving car project was found to contain images with "thousands of unlabeled vehicles, hundreds of unlabeled pedestrians, and dozens of unlabeled cyclists." Udacity has since updated its repository to note that the data is "intended for educational purposes only," urging users to "explore newer, more complete datasets."