10 Open Source Datasets For Autonomous Driving

Data is the oil of the artificial intelligence era. As the automotive industry develops and autonomous driving moves into real business scenarios, autonomous driving algorithms have become particularly important. These algorithms require large amounts of high-quality data. In this article, I share 10 open source datasets for autonomous driving models.

1. KITTI Dataset

The KITTI dataset was created jointly by the Karlsruhe Institute of Technology in Germany and the Toyota Technological Institute at Chicago. It is a benchmark dataset for evaluating computer vision algorithms in autonomous driving scenarios, covering tasks such as stereo, optical flow, visual odometry, 3D object detection and 3D tracking in vehicle environments.

2. Waymo Open Dataset

The Waymo Open Dataset contains data collected by Waymo vehicles over millions of miles of driving in Phoenix, Arizona; Kirkland, Washington; Mountain View, California; and San Francisco. It covers urban and suburban environments by day and night, at dawn and dusk, and in sunny and rainy weather. The data is divided into 1,000 driving segments, each of which captures 20 continuous seconds of driving data from the sensors installed on Waymo vehicles: 5 custom LiDAR units and 5 front- and side-view cameras. At a 10 Hz sampling rate, that is equivalent to 200,000 frames.
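The 200,000-frame figure follows directly from the segment count, segment length, and sampling rate quoted above. A minimal sketch of that arithmetic (the constant and function names are illustrative, not part of any Waymo tooling):

```python
# Figures quoted in the paragraph above.
NUM_SEGMENTS = 1_000      # driving segments in the dataset
SEGMENT_SECONDS = 20      # length of each segment in seconds
SAMPLE_RATE_HZ = 10       # frames captured per second


def total_frames(segments: int, seconds: int, rate_hz: int) -> int:
    """Return the number of frames per sensor across all segments."""
    return segments * seconds * rate_hz


frames = total_frames(NUM_SEGMENTS, SEGMENT_SECONDS, SAMPLE_RATE_HZ)
print(frames)  # 200000
```

Note that this counts frames per sensor stream, so the number of individual camera images would be larger when all cameras are counted.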

3. A2D2 Dataset

A2D2 is Audi’s large autonomous driving dataset. It provides camera, LiDAR, and vehicle bus data, allowing developers and researchers to explore multimodal sensor fusion methods. The sensor suite includes six cameras and five LiDAR units for full 360-degree coverage. The data was recorded mainly on German streets and includes RGB images together with the corresponding 3D point clouds. All recorded data is time-synchronized.

4. nuScenes Dataset

The nuScenes dataset is a public large-scale dataset for autonomous driving developed by the team at Motional. Motional is committed to enabling safe, reliable and accessible driverless environments. By releasing some of the data to the public, Motional aims to advance research in computer vision and autonomous driving.

5. Cityscapes Dataset

Cityscapes is a public dataset jointly released by the Mercedes-Benz Autonomous Driving Laboratory, the Max Planck Institute, and Darmstadt University of Technology. It focuses on the semantic understanding of urban street scenes and contains stereo video sequences recorded in the streets of 50 different cities across different seasons and weather conditions. The Cityscapes dataset has two sets of evaluation criteria: fine and coarse. The former uses 5,000 finely annotated images; the latter uses the same 5,000 finely annotated images plus 20,000 coarsely annotated images.

6. BDD100K Dataset

In May 2018, the Berkeley AI Research lab (BAIR) at UC Berkeley released the public driving dataset BDD100K, together with an image annotation system. BDD100K contains 100,000 high-definition videos, each about 40 seconds long at 720p and 30 fps. A keyframe is sampled at the 10th second of each video, yielding 100,000 annotated images of 1280×720 pixels. The images span different weather conditions, scenes, and times of day, and include both sharp and blurred frames, which makes the dataset large and diverse.
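To make the keyframe sampling concrete, the sketch below works out which frame sits at the 10-second mark of a 30 fps clip, and roughly how many hours of video the dataset adds up to. It assumes a constant frame rate and zero-based frame indexing, and it treats the "about 40 seconds" clip length as exact, so the total is only an estimate. The function names are illustrative and do not come from the BDD100K toolkit.

```python
FPS = 30                 # frame rate quoted above
KEYFRAME_SECOND = 10     # the keyframe is taken at the 10th second
NUM_VIDEOS = 100_000     # number of clips in the dataset
CLIP_SECONDS = 40        # approximate clip length, treated as exact here


def keyframe_index(second: float, fps: int) -> int:
    """Zero-based index of the frame at a timestamp, assuming constant fps."""
    return round(second * fps)


def total_hours(num_videos: int, clip_seconds: float) -> float:
    """Total video duration in hours."""
    return num_videos * clip_seconds / 3600


print(keyframe_index(KEYFRAME_SECOND, FPS))          # 300
print(round(total_hours(NUM_VIDEOS, CLIP_SECONDS)))  # 1111
```

Under these assumptions, each annotated image is frame 300 of its clip, and the full set of videos amounts to a little over 1,100 hours of driving.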

7. ApolloCar3D Dataset

The ApolloCar3D dataset contains 5,277 driving images and over 60K car instances, where each car is fitted with an industry-grade 3D CAD model with absolute model dimensions and semantically labeled keypoints. It is more than 20 times larger than PASCAL3D+ and KITTI, the previous state of the art.

8. Argoverse Dataset

The Argoverse dataset was released by Argo AI, Carnegie Mellon University, and the Georgia Institute of Technology to support research on 3D tracking and motion forecasting for autonomous vehicles. It consists of two parts: Argoverse 3D Tracking and Argoverse Motion Forecasting.

9. H3D-HRI-US Dataset

Honda Research Institute released its H3D autonomous driving dataset in March 2019. It is a large-scale, full-surround 3D multi-object detection and tracking dataset collected using 3D LiDAR scanners. It contains 160 crowded and highly interactive traffic scenes with a total of 1 million labeled instances in 27,721 frames. With its size, rich annotations, and complex scenes, H3D is intended to stimulate research on full-surround 3D multi-object detection and tracking.
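The figures above also give a rough sense of how crowded the scenes are. The sketch below derives the average number of labeled instances per frame and frames per scene. It treats "1 million" as an exact count, which it almost certainly is not, so the results are approximate.

```python
NUM_SCENES = 160               # traffic scenes quoted above
NUM_FRAMES = 27_721            # annotated frames quoted above
NUM_INSTANCES = 1_000_000      # "1 million" labeled instances, treated as exact


def average(total: int, count: int) -> float:
    """Mean number of items per unit, rounded to one decimal place."""
    return round(total / count, 1)


instances_per_frame = average(NUM_INSTANCES, NUM_FRAMES)
frames_per_scene = average(NUM_FRAMES, NUM_SCENES)

print(instances_per_frame)  # 36.1
print(frames_per_scene)     # 173.3
```

Roughly 36 labeled objects per frame is consistent with the description of the scenes as crowded and highly interactive.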

10. Lyft Dataset

The Lyft dataset is currently the largest dataset of traffic agent motion. It includes motion logs of cars, cyclists, pedestrians, and other traffic agents encountered by an autonomous fleet, and is well suited to training motion prediction models. Specifically, it includes 1,000+ hours of traffic agent movement, 16K miles of data from 23 vehicles, and 15K semantic map annotations.

About Datatang

Founded in 2011, Datatang is a professional AI data service provider committed to providing high-quality training data and data services to AI companies worldwide. Drawing on its own data resources, technical strengths, and extensive data processing experience, Datatang provides data services to 1,000+ companies and institutions worldwide.

If you need data services, please feel free to contact us: info@datatang.com