Classification of AI Dialogue Systems to Boost the Next Generation of Dialogue Systems

In recent years, driven by advances in deep learning, AI dialogue systems have evolved through three generations: rule-based systems, then systems based on traditional machine learning, and now a third generation built on big data and large models. Their conversational ability has changed dramatically, and they now perform impressively in open-domain conversation.

However, because AI dialogue is a cutting-edge technology, it lacks standards, which has led to inconsistent evaluation systems and uneven quality across applications. In response, Professor Huang Minlie, Deputy Director of the Laboratory of Intelligent Technology and Systems at Tsinghua University, worked with research institutions from academia and industry to formulate the world's first "Classification Definition of AI Dialogue System" (hereinafter the "Classification Definition"). The "Classification Definition" is expected to promote the use of AI dialogue systems in virtual personal assistants, smart homes, intelligent vehicles (in-vehicle voice assistants), emotional support, and mental health, and to accelerate the development and application of the next generation of AI dialogue systems.

The "Classification Definition" evaluates AI dialogue systems along several dimensions: automatic dialogue ability, dialogue quality, support for single or multiple scenarios, cross-scenario context dependence and natural switching, degree of personification, active and continuous learning, and multimodal perception and expression. According to the "Classification Definition", AI dialogue systems are divided into six levels, from L0 to L5. The higher the level, the more intelligent the system.

"As a whole, AI dialogue systems are now on the way to L3 and L4. They are still some distance from the ideal, and getting there will take one to two years or even longer of continuous effort," Huang Minlie said. To move toward L4 and L5, many key challenges must be overcome in memory, association, reasoning, and self-learning, and producing highly expressive speech synthesis is extremely difficult. Applications in the metaverse will also require fine-grained generation of actions and facial expressions.

As one of the world's leading AI data service providers, Datatang has accumulated 40,000 hours of conversational speech data covering single-speaker and multi-speaker conversations, multiple languages, and a variety of scenarios.

American English Natural Dialogue Speech Data

A total of 2,000 speakers took part in the recording, talking face to face in a natural way. They discussed a number of given topics freely, covering a wide range of fields. The speech is natural and fluent and reflects real conversational scenarios. The audio was transcribed manually, with high accuracy.

Korean Conversational Speech Data by Mobile Phone

About 700 Korean speakers took part in the recording, talking face to face in a natural way. They discussed a number of given topics freely, covering a wide range of fields. The speech is natural and fluent and reflects real conversational scenarios. The audio was transcribed manually, with high accuracy.

Japanese Conversational Speech Data by Mobile Phone

About 1,000 speakers took part in the recording, talking face to face in a natural way. They discussed a number of given topics freely, covering a wide range of fields. The speech is natural and fluent and reflects real conversational scenarios. The audio was transcribed manually, with high accuracy.

German Conversational Speech Data by Mobile Phone

About 750 speakers took part in the recording, talking in a natural way. They discussed a number of given topics freely, covering a wide range of fields. The speech is natural and fluent and reflects real conversational scenarios. The audio was transcribed manually, with high accuracy.

Spanish Conversational Speech Data by Mobile Phone

About 700 speakers took part in the recording, talking face to face in a natural way. They discussed a number of given topics freely, covering a wide range of fields. The speech is natural and fluent and reflects real conversational scenarios. The audio was transcribed manually, with high accuracy.

To learn more about these datasets or how to acquire them, please contact us at info@datatang.com.