How to Build a Voiceprint System: A Look at Datasets
Like fingerprints, voiceprints are unique and difficult to imitate under normal circumstances. Voiceprint recognition converts acoustic signals into electrical signals, extracts identity-bearing voiceprint features from them, and uses computer algorithms to compare those features and identify the speaker.
The voiceprint recognition process is roughly divided into two steps: voice feature parameter extraction, and pattern matching and recognition judgment.
1. Voice Feature Parameter Extraction
Simply put, this step extracts parameters from the speaker's speech that characterize vocal organ structure and behavioral habits. Good feature parameters are relatively stable, do not change markedly over time or across environments, are hard to imitate, and offer strong noise immunity.
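As a rough sketch of what such extraction looks like in practice, the snippet below computes per-frame log filterbank energies, a common precursor to MFCC features. It is a simplified illustration, not the original author's pipeline: it assumes a 16 kHz mono signal and uses linear frequency bands instead of the usual mel-scaled filterbank for brevity.

```python
import numpy as np

def extract_features(signal, sample_rate=16000, frame_ms=25, step_ms=10, n_bins=26):
    """Compute simple log band-energy features per frame.

    A teaching sketch: linear bands stand in for a mel filterbank,
    and the cepstral (DCT) step of full MFCCs is omitted.
    """
    # Pre-emphasis boosts high frequencies, where many speaker traits live.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    frame_len = int(sample_rate * frame_ms / 1000)   # 400 samples at 16 kHz
    step = int(sample_rate * step_ms / 1000)         # 160 samples at 16 kHz
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // step)

    window = np.hamming(frame_len)                   # taper frame edges
    feats = []
    for i in range(n_frames):
        frame = emphasized[i * step : i * step + frame_len] * window
        spectrum = np.abs(np.fft.rfft(frame)) ** 2   # power spectrum
        # Pool the spectrum into n_bins bands and take the log energy of each.
        bands = np.array_split(spectrum, n_bins)
        feats.append([np.log(b.sum() + 1e-10) for b in bands])
    return np.array(feats)                           # shape: (n_frames, n_bins)
```

For one second of 16 kHz audio this yields a (98, 26) feature matrix, one 26-dimensional vector per 10 ms step; downstream models consume such matrices rather than raw waveforms.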
2. Pattern Matching and Recognition Judgment
After the speech feature parameters are obtained, the unrecognized parameters are matched, according to certain criteria, against the models trained in the model library, and the best match by similarity is returned as the output. Commonly used matching models include vector quantization models, stochastic models, and neural network models.
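The matching step above can be sketched with a minimal example. This assumes each speaker in the model library is represented by a fixed-length embedding vector (the function and threshold names are illustrative, not from any particular library), and uses cosine similarity as the matching criterion:

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity in [-1, 1] between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(model_library, test_embedding):
    """Return the enrolled speaker whose model best matches, with its score."""
    scores = {name: cosine_similarity(emb, test_embedding)
              for name, emb in model_library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

def verify(enrolled_embedding, test_embedding, threshold=0.7):
    """Accept the claimed identity only if similarity exceeds a threshold.

    The 0.7 threshold is an arbitrary illustration; real systems tune it
    on held-out data to trade off false accepts against false rejects.
    """
    return cosine_similarity(enrolled_embedding, test_embedding) >= threshold
```

`identify` answers "who is speaking?" (1:N search over the library), while `verify` answers "is this the claimed speaker?" (1:1 comparison); both rest on the same similarity score.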
Thanks to mature machine learning techniques and advances in sensors, microphones, communication channels, and related technologies, voiceprint recognition achieves a high success rate, although it cannot guarantee 100% detection of fraud.
Of course, continuous improvement in voiceprint recognition accuracy also depends on training data. Datatang has launched a series of voiceprint datasets covering multiple application scenarios and multiple languages. These datasets are ready to use and can improve customers' models right away.
831 Hours – Mobile Telephony British English Speech Data, recorded by 1,651 native British speakers. The recording contents cover many categories such as generic, interactive, in-car, and smart home. The texts are manually proofread to ensure a high accuracy rate. The database supports both Android and iOS devices.
1,842 native American English speakers with authentic accents participated in the recording. The recording script was designed by linguists, is scene-based, and covers a wide range of topics including generic, interactive, in-car, and home. The text is manually proofread with high accuracy. The data matches mainstream Android and Apple phones.
Each speaker's recordings span a long period of time, which better covers a person's vocal characteristics across different periods and states.
The data was collected from 203 Taiwanese speakers (137 female, 66 male) from Taipei, Kaohsiung, Taichung, Tainan, and other cities, recorded in a quiet indoor environment. It can be used for speech recognition, machine translation, and voiceprint recognition model training and algorithm research.
11,010 native Chinese speakers, with a balanced gender distribution, participated in the recording. Each speaker reads 30 sentences of 4- to 8-digit numbers.
205 People Accented Mandarin Speech Data in a Noisy Environment _ G. Speakers recorded accented Mandarin speech in various daily scenarios in noisy environments. The recordings cover categories such as in-car scenes, smart home, and smart voice assistants. The data can be used for speech recognition acoustic and language model training and algorithm research, machine translation corpus construction, and voiceprint recognition model training and algorithm research.
If you need data services, please feel free to contact us at firstname.lastname@example.org.