4 Open Source Person Re-ID Training Datasets for Your ML Project
Person ReID (short for re-identification) uses computer vision to determine whether a specific pedestrian appears in an image or video. It is a sub-task of image retrieval. The concept of ReID was first proposed at the CVPR conference in 2006. In short, ReID can stand in for face recognition to find a person in video when the cameras cannot capture the face.
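Because ReID is framed as image retrieval, its core step can be sketched as ranking a gallery of pedestrian crops by their similarity to a query crop. The snippet below is only a minimal illustration under stated assumptions, not any particular ReID system: it assumes every image has already been converted into a feature vector by some model, and the function names, crop names, and toy vectors are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_gallery(query, gallery):
    """Return gallery keys sorted from most to least similar to the query."""
    scores = {key: cosine_similarity(query, feat) for key, feat in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy 3-dimensional features (assumption); real embeddings would come
# from a trained ReID model and have hundreds of dimensions.
query_feature = [0.9, 0.1, 0.0]
gallery_features = {
    "cam2_crop_17": [0.8, 0.2, 0.1],  # looks like the query
    "cam5_crop_03": [0.0, 1.0, 0.0],
    "cam3_crop_42": [0.0, 0.0, 1.0],
}

ranking = rank_gallery(query_feature, gallery_features)
print(ranking)  # most similar gallery crop first
```

The top-ranked gallery entries are the candidate sightings of the query person across cameras.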
Public safety has become a concern for society as a whole, and video surveillance systems are now widely deployed. With 24-hour uninterrupted surveillance video from thousands of cameras, ReID technology is a powerful tool for public security agencies working to solve cases.
Consider a child who gets separated from their parents in a public place. If the child is too young to understand a public-address announcement, ReID technology can help: the parents provide a picture of the child, and the system searches the footage from all surveillance cameras in the venue in real time for matching images. This makes it much easier to find the child quickly.
4 Open Source Person Re-ID Datasets
The Market-1501 dataset was collected on the campus of Tsinghua University. It includes 1,501 pedestrians captured by 6 cameras, with 32,668 detected pedestrian bounding boxes. Each pedestrian is captured by at least 2 cameras and may have multiple images from one camera. The training set has 751 people and 12,936 images, an average of 17.2 training images per person; the test set has 750 people and 19,732 images, an average of 26.3 test images per person. The bounding boxes of the 3,368 query images were drawn manually, while those in the gallery were produced by the DPM detector.
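The Market-1501 figures quoted above can be cross-checked with a few lines of arithmetic. This is only a sanity check on the numbers in the text, and the variable names are illustrative.

```python
# Figures quoted in the text for Market-1501.
train_ids, train_images = 751, 12_936
test_ids, test_images = 750, 19_732
total_boxes = 32_668

avg_train = train_images / train_ids  # images per training identity
avg_test = test_images / test_ids     # images per test identity

print(f"identities: {train_ids + test_ids}")
print(f"avg train images per identity: {avg_train:.1f}")
print(f"avg test images per identity: {avg_test:.1f}")
print(f"train + test images: {train_images + test_images}")
```

The training and test images add up to the 32,668 detected boxes, so the 3,368 hand-drawn query boxes are counted separately from that total.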
The DukeMTMC dataset is a large-scale, labeled, multi-target multi-camera pedestrian tracking dataset released by Duke University. It consists of high-definition video recorded by 8 synchronized cameras, with more than 7,000 single-camera trajectories and more than 2,700 distinct people. DukeMTMC-reID is a subset of DukeMTMC for pedestrian re-identification and provides manually annotated bounding boxes.
CUHK03 is the first person re-identification dataset large enough for deep learning, with images collected on the campus of the Chinese University of Hong Kong (CUHK). The data is stored in a MAT file, “cuhk-03.mat”, and contains 1,467 different identities captured by 5 pairs of cameras.
At CVPR 2018, a new large-scale dataset that is closer to real-world scenes was proposed: MSMT17, short for Multi-Scene Multi-Time, which covers multiple scenes and multiple time periods. The dataset uses a network of 15 cameras installed on a campus, including 12 outdoor cameras and 3 indoor cameras. To capture raw surveillance video, 4 days with different weather conditions were selected within one month. On each of those days, three hours of video were collected per camera, covering three time periods: morning, noon, and afternoon. The raw video totals 180 hours (15 cameras × 4 days × 3 hours).
In general, ReID algorithms have achieved high performance on the above open source datasets, especially for recognizing people from multiple viewing angles. However, many difficulties remain:
● Scene occlusion or truncation
In real scenes such as shopping malls and streets, a person is often occluded by objects or by other people, or cut off by the edge of the frame when standing near it. The resulting incomplete body features make it difficult for the algorithm to identify the person.
● The same person changes clothes
When ReID identifies a target person, it relies heavily on the features of the person’s clothing. If the target person changes into clothes of a different color or style, the algorithm’s performance drops significantly.
● Different people wear the same clothes
If people of similar height and build wear the same clothes (for example, students in school uniforms or workers in company workwear), the clothing features of different people are very similar, which also interferes heavily with recognition.
● Changes in human movements
In addition to clothing, body posture is an important part of a person’s appearance features. Large changes in posture (such as squatting, crouching, or other strongly deforming movements) alter those features and reduce the performance of the algorithm.
Datatang’s People Re-ID Datasets
Datatang has developed person ReID datasets to help address the above problems. The datasets cover 21,000 subjects, collected both in real scenarios and in controlled, purpose-built scenarios.
● 10,000 People Real Scene Re-ID Data
The data includes 10,000 subjects in real scenes such as shopping malls, supermarkets, and residential communities. Each scene has about 15 cameras on average, covering a variety of mounting heights, shooting angles, and monitoring areas (for example, different monitoring areas within the same shopping mall). The data also contains the occlusion and truncation that occur in real scenes.
To address the difficulty of identifying the same person in different clothes and different people in the same clothes, this data was collected in a controlled scene. Datatang built the collection scene itself, providing 360-degree coverage with a total of 12 cameras, one every 30 degrees.
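The camera layout described above (12 cameras, one every 30 degrees) can be written down as a short sketch. The helper `nearest_camera` and the choice of placing camera 0 at 0 degrees are assumptions made for illustration; the source does not describe how the cameras are indexed.

```python
NUM_CAMERAS = 12
FULL_CIRCLE_DEG = 360

# Evenly spaced cameras: 360 / 12 = 30 degrees apart.
spacing_deg = FULL_CIRCLE_DEG / NUM_CAMERAS
camera_angles = [i * spacing_deg for i in range(NUM_CAMERAS)]


def nearest_camera(heading_deg):
    """Index of the camera closest to a heading (hypothetical helper)."""
    return round((heading_deg % FULL_CIRCLE_DEG) / spacing_deg) % NUM_CAMERAS


print(spacing_deg)
print(camera_angles)
print(nearest_camera(100))  # falls between the cameras at 90 and 120 degrees
```

With this spacing, any viewing direction is at most 15 degrees away from some camera.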
● 1033 People Monitoring Scene Data
To increase the variety of body poses, this dataset covers a total of 1,033 subjects, each captured in 30 different poses. To increase the diversity of viewing angles, each subject was also captured from both eye-level and top-down views.
Datatang’s ReID datasets far exceed open source resources in the number of subjects and the number of cameras each subject appears in, and they cover a variety of scenes. In addition, the datasets are authorized by the subjects and comply strictly with the ISO 27701 privacy management system and the ISO 27001 information security management system, so customers can use them with confidence.
If you want to know more about the datasets or how to acquire them, please feel free to contact us: firstname.lastname@example.org