Why can’t autonomous driving do without the power of data?

Jun 16, 2021

Recently, Xiaomi Group, the Chinese internet giant, announced that it would enter the smart electric vehicle industry, following Baidu and Alibaba. After the epidemic eased in China last year, Pony AI and Uisee Technology immediately received financing.

Besides the original self-driving companies, Tier 1 and Tier 2 suppliers and OEMs such as FAW, GAC, and Great Wall have also begun to develop autonomous driving technology. Although we don’t know whether the epidemic has promoted the development of autonomous driving, there is no doubt that autonomous driving has “again” become popular in China.

Autonomous driving needs to solve two problems: “Perception” and “Decision-making.” Perception uses multi-sensor fusion, usually combining cameras, LiDAR, millimeter-wave radar, GPS/IMU and other equipment, to perceive the road, vehicles and pedestrians. Decision-making teaches the vehicle to determine how to act based on the perceived information. Effective perception is therefore a prerequisite for making reliable decisions.

At present, the main way to solve the “Perception” problem is to feed large volumes of training data to the algorithm through supervised learning, so that the models acquire general perception capabilities. Balloons on the road, dummies, advertisements on car bodies, reflections of buildings and other cases that may be hard for us to imagine also need to be fed to the algorithm one by one, to ensure that it perceives them correctly.

Datatang is the first listed company in China’s AI data service industry. Its core business is to provide training data products under its own copyright, along with on-demand data collection and annotation services.

“Traditional data annotation relies on manually labeling a large amount of data. This approach is costly, inefficient and cannot be scaled to mass production. We have been in the field of AI data services for more than ten years, and several of our core team members come from AI companies or institutes. Algorithms are our specialty. We have been constantly looking for automated labeling methods to improve data production efficiency through automated pre-labeling in the early stage and manual quality inspection and error correction in the middle stage,” Datatang’s spokesperson said.

It is understood that Datatang has developed dozens of automatic processing and pre-labeling algorithms for image, video, speech and text, and these algorithms have been successfully applied in more than 5,000 data labeling projects.

To better serve customers and help them connect their enterprise data platform, data processing, data delivery and other stages, Datatang drew on its years of experience and launched Shujiajia Pro, a data labeling platform for private deployment, in 2019.

Shujiajia Pro Platform — Customized Data Service Expert

From data access to result data delivery, from project progress to personnel management, Shujiajia Pro covers the entire life cycle of data labeling services. It is the product of Datatang’s years of experience in data services.

Shujiajia Pro integrates external plug-ins for data processing, automatic labeling, machine quality inspection and more. It offers diverse deployment forms and user-friendly interfaces, and can be quickly deployed and integrated.

For the different labeling tasks of each data type, Datatang has developed more than forty labeling template tools, all of which have been polished and tested for years, to meet the sophisticated needs of voice, image, text, video, and point cloud data.

“Our data labeling solution supports automatic labeling, manual-assisted labeling and quality inspection. The entire project and quality control process is streamlined, and the data quality is reliable for customers,” the spokesperson from Datatang said.

The Shujiajia Pro platform supports probes in the labeling process. Labeling results are automatically compared with known answers, which makes machine quality inspection possible. Data that passes the probe is submitted to the quality inspector. Data that fails the probe is returned to the original annotator for repair, which improves the efficiency of the quality inspector. For data quality control, Datatang also combines algorithmic and manual quality inspection, which greatly improves data delivery efficiency.
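The probe check described above can be sketched roughly as follows. This is only an illustration, not Shujiajia Pro’s actual implementation: the function names, the exact-match comparison and the 0.9 pass threshold are all assumptions made for the example.

```python
from typing import Dict


def probe_accuracy(submitted: Dict[str, str], answers: Dict[str, str]) -> float:
    """Fraction of probe items whose submitted label matches the known answer."""
    if not answers:
        raise ValueError("at least one probe item is required")
    correct = sum(
        1 for item_id, answer in answers.items()
        if submitted.get(item_id) == answer
    )
    return correct / len(answers)


def route_batch(submitted: Dict[str, str],
                answers: Dict[str, str],
                min_accuracy: float = 0.9) -> str:
    """Decide where a batch goes next (threshold is a hypothetical value)."""
    if probe_accuracy(submitted, answers) >= min_accuracy:
        return "quality_inspector"   # probe passed: hand over for inspection
    return "original_annotator"      # probe failed: send back for repair


# Hypothetical probe items with known answers, mixed into an annotator's batch.
answers = {"probe_1": "car", "probe_2": "pedestrian"}
good_batch = {"probe_1": "car", "probe_2": "pedestrian", "item_7": "tree"}
bad_batch = {"probe_1": "car", "probe_2": "cyclist", "item_7": "tree"}
print(route_batch(good_batch, answers))
print(route_batch(bad_batch, answers))
```

Because failing batches go straight back to the annotator, the quality inspector only spends time on work that has already cleared the automatic check.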

Today, with the rapid development of AI technology, autonomous driving data is diverse and highly sensitive, and some customers’ data is even highly confidential. For this reason, Shujiajia Pro supports flexible deployment methods, such as SaaS, independent deployment in the cloud, and local deployment, to ensure data security.

The platform can serve multiple companies at the same time through multi-tenant permission separation, IP whitelisting, VPN login, encrypted transmission links, etc., and truly ensures that “data does not go out” of the customer’s environment.

3D point cloud powers perfect data annotation

For autonomous driving data, 3D LiDAR point cloud labeling is a very representative task, and the data quality directly affects the vehicle’s recognition results. In short, 3D LiDAR point cloud labeling is used to label target objects such as vehicles, buildings, trees and pedestrians with 3D boxes or semantic segmentation.

However, errors are hard to avoid in manual labeling. 3D LiDAR point cloud labeling places much higher demands on the abilities of annotators, project managers, technical support and other related personnel than other types of data tasks.

To ensure the accuracy of data processing, Shujiajia Pro includes multiple sets of 3D LiDAR point cloud annotation template tools, which can be used for single-frame annotation, 2D-3D joint single-frame annotation, 3D tracking annotation, 2D-3D joint tracking, 3D point cloud segmentation and other common tasks.

Meanwhile, pre-recognition and intelligent processing algorithms are built into the templates to help annotators complete more annotation tasks in a short time without worrying about quality issues. Every icon position, every function, and every algorithm in the template tools has been refined through multiple projects, with no invalid or redundant designs.

1. Default Ground Placing

All targets in 3D LiDAR point cloud labeling need to be placed on the ground, since they cannot “float” in the air. During labeling, the bottom of each labeled object should sit on the ground. The box must touch the ground but must not include the ground points themselves, that is, the points where the LiDAR beams hit the ground.

The “Default ground placing” function in the 3D LiDAR point cloud template automatically places the 3D box on the ground, reducing labeling time. Saving even a second per operation in an autonomous driving project may save the customer hundreds of thousands to millions in costs.
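As a rough illustration of what placing a box on the ground involves, the sketch below computes the center height of a box so that its bottom sits just above the ground. The function name and the small `clearance` margin (used here to keep ground returns out of the box) are assumptions for this example, not documented behavior of the template.

```python
def place_on_ground(box_height: float, ground_z: float,
                    clearance: float = 0.05) -> float:
    """Return the z of the box center so the box bottom rests `clearance`
    meters above the ground (clearance is a hypothetical margin)."""
    if box_height <= 0:
        raise ValueError("box_height must be positive")
    return ground_z + clearance + box_height / 2.0


# Example: a 1.5 m tall box where the ground lies at z = -1.7 m
# in the sensor coordinate frame.
center_z = place_on_ground(box_height=1.5, ground_z=-1.7)
bottom_z = center_z - 1.5 / 2.0
print(center_z, bottom_z)
```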

2. Ground Detection Algorithm + Automatic Color Rendering

As we know, missed labels are a very serious problem in 3D LiDAR labeling tasks. The 3D LiDAR point cloud template can calculate the ground coordinates of the point cloud using a plane algorithm. It then automatically renders points in different colors according to their distance from the ground. Annotators can identify the objects to be labeled based on the color, which reduces missed labels.
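The article does not say which plane algorithm the template uses. As one possible sketch, the code below fits a plane z = a*x + b*y + c to candidate ground points by least squares and then maps each point’s distance from that plane to a color band. The function names, the least-squares choice and the color thresholds are all assumptions; production tools often use more robust methods such as RANSAC.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]
Plane = Tuple[float, float, float]  # (a, b, c) in z = a*x + b*y + c


def _det3(m: List[List[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_ground_plane(points: Sequence[Point]) -> Plane:
    """Least-squares fit of z = a*x + b*y + c to candidate ground points."""
    sx = sum(p[0] for p in points)
    sy = sum(p[1] for p in points)
    sz = sum(p[2] for p in points)
    sxx = sum(p[0] * p[0] for p in points)
    sxy = sum(p[0] * p[1] for p in points)
    syy = sum(p[1] * p[1] for p in points)
    sxz = sum(p[0] * p[2] for p in points)
    syz = sum(p[1] * p[2] for p in points)
    # Normal equations M * [a, b, c] = v, solved with Cramer's rule.
    m = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(len(points))]]
    v = [sxz, syz, sz]
    det = _det3(m)
    if abs(det) < 1e-12:
        raise ValueError("points are degenerate; cannot fit a plane")
    coeffs = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = v[r]
        coeffs.append(_det3(replaced) / det)
    return (coeffs[0], coeffs[1], coeffs[2])


def height_above_ground(point: Point, plane: Plane) -> float:
    """Signed perpendicular distance from the fitted plane to the point."""
    a, b, c = plane
    x, y, z = point
    return (z - (a * x + b * y + c)) / math.sqrt(a * a + b * b + 1.0)


def color_for_height(height: float) -> str:
    """Map a height to a color band (thresholds are hypothetical)."""
    if height < 0.2:
        return "gray"    # ground or very close to it
    if height < 1.0:
        return "green"
    if height < 2.0:
        return "yellow"
    return "red"


ground = [(0.0, 0.0, -1.5), (10.0, 0.0, -1.5), (0.0, 10.0, -1.5), (10.0, 10.0, -1.5)]
plane = fit_ground_plane(ground)
print(color_for_height(height_above_ground((5.0, 5.0, 0.0), plane)))
```

Points that stand out from the ground color are likely objects, so an unlabeled cluster in a non-ground color is easy for the annotator to spot.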

3. Interpolation Algorithm + Pre-Labeling

In tracking tasks, the same ID needs to be marked in multiple consecutive frames. The 3D LiDAR point cloud template has a built-in interpolation algorithm to improve labeling efficiency. If the annotator marks the target ID in the first and fifth frames, the template tool automatically calculates the second, third and fourth frames and marks the ID’s position in each. Annotators then only need to slightly correct the position, which greatly reduces repetitive work.
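The first-and-fifth-frame example can be sketched with simple linear interpolation between two keyframes. The article does not specify the interpolation method, so linear interpolation, the `Box` fields and the function name below are assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class Box:
    x: float
    y: float
    z: float
    yaw: float  # heading in radians


def interpolate_boxes(frame_a: int, box_a: Box,
                      frame_b: int, box_b: Box) -> Dict[int, Box]:
    """Linearly interpolate box pose for the frames strictly between
    two annotated keyframes."""
    if frame_b <= frame_a:
        raise ValueError("frame_b must come after frame_a")
    # Shortest signed yaw difference, so headings do not spin the long way.
    dyaw = math.atan2(math.sin(box_b.yaw - box_a.yaw),
                      math.cos(box_b.yaw - box_a.yaw))
    result: Dict[int, Box] = {}
    for frame in range(frame_a + 1, frame_b):
        t = (frame - frame_a) / (frame_b - frame_a)
        result[frame] = Box(
            x=box_a.x + t * (box_b.x - box_a.x),
            y=box_a.y + t * (box_b.y - box_a.y),
            z=box_a.z + t * (box_b.z - box_a.z),
            yaw=box_a.yaw + t * dyaw,
        )
    return result


# The annotator labels frames 1 and 5; frames 2, 3 and 4 are pre-labeled.
first = Box(x=0.0, y=0.0, z=0.0, yaw=0.0)
fifth = Box(x=8.0, y=4.0, z=0.0, yaw=0.4)
pre_labels = interpolate_boxes(1, first, 5, fifth)
for frame, box in sorted(pre_labels.items()):
    print(frame, box)
```

The interpolated boxes are only a starting point: a target that accelerates or turns between keyframes will still need the small manual corrections mentioned above.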

4. Static Tracking Algorithm

Compared with the dynamic tracking approach described above, static tracking has additional advantages: it helps avoid inaccurate labels and reduced efficiency in tracking IDs.

5. Target Default Size Setting

For the parts of an object that the LiDAR cannot reach, the annotator needs to make a reasonable estimate based on the preceding and following frames and the type of the object. However, each annotator may imagine a box of a different length, width and height, which can lead to errors in the box annotation.

The “default size setting” function sets a default size for the target and quickly generates a 3D box of that size, so that the estimated box is no longer arbitrary.
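A minimal sketch of a default-size lookup is shown below. The class names, the dimensions in `DEFAULT_SIZES` and the `make_default_box` helper are hypothetical values chosen for illustration, not the defaults used by Shujiajia Pro.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical default sizes in meters: (length, width, height).
DEFAULT_SIZES: Dict[str, Tuple[float, float, float]] = {
    "car": (4.5, 1.8, 1.5),
    "pedestrian": (0.6, 0.6, 1.7),
    "cyclist": (1.8, 0.6, 1.7),
}


@dataclass
class Box3D:
    label: str
    x: float
    y: float
    z: float
    length: float
    width: float
    height: float
    yaw: float


def make_default_box(label: str, x: float, y: float, z: float,
                     yaw: float = 0.0) -> Box3D:
    """Create a box of the configured default size for the given class."""
    if label not in DEFAULT_SIZES:
        raise KeyError(f"no default size configured for class {label!r}")
    length, width, height = DEFAULT_SIZES[label]
    return Box3D(label, x, y, z, length, width, height, yaw)


# Two annotators placing a partly occluded car now start from the same box.
box = make_default_box("car", x=10.0, y=2.0, z=-0.9)
print(box)
```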

In addition, shortcut operations during labeling also improve efficiency. Shujiajia Pro has multiple shortcut functions such as automatic welting (edge snapping), self-adaptive rotation, self-adaptive best 2D image, and one-key direction rotation, which save labor and reduce repetitive operations and the difficulty of labeling.

Datatang has accumulated numerous training datasets for autonomous driving, all of which have been authorized by the people from whom the data was collected, and the authorization documents are authentic and verifiable.

In addition, Datatang supports customized data services, such as cockpit occupant behavior collection, 2D street view data collection and labeling, and multi-language, multi-demographic speech collection and labeling.

In the past few years, Datatang has been honing its fundamentals in order to provide excellent data services. From teams and products to template layouts, button shortcuts and button methods, everything has been refined and can meet highly tailored customer needs.

With a variety of template tools, flexible coordination of manpower and algorithms, and professional data services, Datatang will continue to spare no effort to complete customers’ data processing tasks.

End

If you need data services, please feel free to contact us: info@datatang.com

Written by Nexdata