Multi-Camera Tracking Works?
By: Carlton Purvis, Published on Jun 27, 2013
Being able to track people across cameras and buildings simply by using cameras and analytics is something of a 'holy grail' for the surveillance industry.
Two researchers from Carnegie Mellon University have developed just such a system, one that uses surveillance cameras to track a person's movements throughout a building, even when the person is out of a camera's view. The setup combines three different data sources that let them tag and identify people and place their movements on a map. In this note, we interview one of the researchers and break down how the system works.
Alex Hauptmann and Shoou-I Yu undertook this research because past datasets used to test multi-camera tracking usually include at least one camera with a global view of the whole environment, which is not the case in most surveillance deployments. They were especially interested in successfully tracking people in “complex indoor environments.”
The system's first live test took place in 2005 in a nursing home with narrow corridors (see the camera layout above), many walls and a moderate amount of foot traffic during the day. Hauptmann and Yu found that by combining clothing color, trajectory and facial recognition data, they could successfully track people even when no camera could see them. The research was just published this year.
The test setup included:
- 15 Panasonic PVDV953 3MP cameras with MPEG-2 encoders. Fields of view ranged from 30 to 60 degrees.
- One server for every four cameras -- “We encoded [footage] into MPEG2 in 15 min chunks, and sent it to a server. Nowadays you would just have the encoding in the cameras but we didn't back then," Hauptmann said.
- PittPatt facial recognition software (before PittPatt was acquired by Google)
- A tracking algorithm created to combine clothing color, trajectory, a pre-loaded 3D map of the facility, and facial recognition.
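The capture arithmetic implied by this list is simple: with one server per four cameras, 15 cameras need four servers, and 15-minute chunks mean 96 chunks per camera per day. The sketch below assumes continuous 24-hour recording and a static camera-to-server mapping; neither detail is given in the article, and the function names are hypothetical.

```python
import math
from typing import Dict

CAMERAS = 15             # from the article
CAMERAS_PER_SERVER = 4   # from the article
CHUNK_MINUTES = 15       # from the article


def servers_needed(cameras: int = CAMERAS, per_server: int = CAMERAS_PER_SERVER) -> int:
    """Round up: a partially filled server still counts as a server."""
    return math.ceil(cameras / per_server)


def camera_to_server(cameras: int = CAMERAS, per_server: int = CAMERAS_PER_SERVER) -> Dict[int, int]:
    """Hypothetical static mapping: cameras 0-3 -> server 0, 4-7 -> server 1, and so on."""
    return {cam: cam // per_server for cam in range(cameras)}


def chunks_per_day(chunk_minutes: int = CHUNK_MINUTES) -> int:
    """Encoded chunks per camera, assuming continuous 24-hour recording."""
    return 24 * 60 // chunk_minutes


print(servers_needed())              # 4
print(camera_to_server()[14])        # 3
print(chunks_per_day() * CAMERAS)    # 1440
```

The last server in this mapping handles only three cameras, which is why the count rounds up rather than down.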
Hauptmann says, “We didn’t actually pick this [equipment] to support facial recognition or person detection, so when we recorded, we recorded in as high a resolution as possible (720x480), and this is what we ended up with. For this application, it worked reasonably well.” They were able to track a subject within one meter 88 percent of the time.
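The "within one meter" figure is a localization accuracy rate. The article does not say exactly how it was computed; a minimal sketch, assuming one estimated and one ground-truth floor-plan position (in meters) per annotated frame, counts the fraction of frames where the Euclidean error is at most 1 m. The function name `within_radius_rate` and the sample data are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # (x, y) floor-plan position in meters


def within_radius_rate(estimates: Sequence[Point],
                       ground_truth: Sequence[Point],
                       radius_m: float = 1.0) -> float:
    """Fraction of frames whose estimated position lies within radius_m of ground truth."""
    if len(estimates) != len(ground_truth):
        raise ValueError("estimates and ground_truth must be the same length")
    if not estimates:
        return 0.0
    hits = sum(
        1 for (ex, ey), (gx, gy) in zip(estimates, ground_truth)
        if math.hypot(ex - gx, ey - gy) <= radius_m
    )
    return hits / len(estimates)


# Hypothetical example: 3 of 4 frames fall within one meter.
est = [(0.0, 0.0), (1.5, 0.0), (2.0, 2.0), (5.0, 5.0)]
gt = [(0.2, 0.1), (1.0, 0.5), (2.5, 2.5), (3.0, 3.0)]
print(within_radius_rate(est, gt))  # 0.75
```

Under this reading, "88 percent" would mean 88 of every 100 annotated frames placed the person within a one-meter circle of their true position.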
Multiple Data Sources
“Trackers not using color information will have difficulty avoiding identity switches when multiple people come very close together and split up. Exploiting color information will help in disambiguating different people,” according to their final report. Color was therefore the primary cue they used to track a person throughout the facility.
Because lighting changes made colors look lighter or darker to different cameras, the system was set up to let each camera make its own color determination; as long as a color stayed within a certain range, the camera would still recognize it. Another frequent problem was that the nursing home residents “don’t dress in fashionably different colors,” Hauptmann said. Many residents wore similar colors, which made it hard for the system to distinguish between them.
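The article does not say how clothing color was represented or compared. One common approach, assumed here and not confirmed as the researchers' method, is a coarsely binned, normalized color histogram compared with the Bhattacharyya coefficient, where "within a certain range" becomes a similarity threshold. The bin count and the 0.8 threshold below are hypothetical.

```python
import math
from typing import Dict, Iterable, Tuple

RGB = Tuple[int, int, int]
Hist = Dict[Tuple[int, int, int], float]


def color_histogram(pixels: Iterable[RGB], bins_per_channel: int = 4) -> Hist:
    """Normalized, coarsely binned RGB histogram of a person's clothing pixels.

    Coarse bins can absorb small brightness shifts caused by lighting differences.
    """
    step = 256 // bins_per_channel
    counts: Dict[Tuple[int, int, int], int] = {}
    total = 0
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        counts[key] = counts.get(key, 0) + 1
        total += 1
    return {k: v / total for k, v in counts.items()} if total else {}


def bhattacharyya(h1: Hist, h2: Hist) -> float:
    """Bhattacharyya coefficient in [0, 1]; 1.0 means identical histograms."""
    return sum(math.sqrt(p * h2.get(k, 0.0)) for k, p in h1.items())


def same_clothing(h1: Hist, h2: Hist, threshold: float = 0.8) -> bool:
    """Tolerant match: colors count as the same while similarity stays above threshold."""
    return bhattacharyya(h1, h2) >= threshold


# Same red-and-white shirt seen by a bright and a dim camera, plus a blue shirt.
bright = color_histogram([(200, 30, 30)] * 90 + [(240, 240, 240)] * 10)
dim = color_histogram([(195, 25, 25)] * 85 + [(230, 230, 230)] * 15)
blue = color_histogram([(30, 30, 200)] * 100)
print(same_clothing(bright, dim))   # True
print(same_clothing(bright, blue))  # False
```

The same sketch also shows the residents' problem: two people in similar beige clothing would produce near-identical histograms and pass the threshold, so color alone cannot separate them.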
In both cases, the algorithm used trajectory to “verify” that it was the same person as before. “Because you know in 3D where you think the person is, the new camera says ‘OK I’ve got the person in view and I’m taking this color determination for this person now.’” If two people appeared on camera wearing similar colors, the trajectory component would kick in and be weighted more heavily for identification. They later added a mutual exclusion constraint: a person cannot be in two places at one time.
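The weighting and mutual-exclusion rules described above can be illustrated with a small assignment step. This is a minimal sketch, not the researchers' algorithm: it assumes each track/detection pair already has a color similarity and a trajectory score in [0, 1], shifts weight toward trajectory when a detection's colors are ambiguous, and then searches for the best one-to-one assignment so no person is placed in two locations. The weights, the ambiguity margin and the function names are hypothetical; a real system with many people would use an assignment solver such as the Hungarian algorithm instead of brute force.

```python
from itertools import permutations
from typing import Dict, List, Tuple

# scores[(track_id, detection_id)] = (color_similarity, trajectory_score), both in [0, 1]
Scores = Dict[Tuple[str, str], Tuple[float, float]]


def fused_score(color: float, traj: float, colors_ambiguous: bool) -> float:
    """Weighted sum of the two cues; trajectory dominates when colors are ambiguous."""
    w_color = 0.2 if colors_ambiguous else 0.6  # hypothetical weights
    return w_color * color + (1.0 - w_color) * traj


def assign(tracks: List[str], detections: List[str], scores: Scores,
           ambiguity_margin: float = 0.1) -> Dict[str, str]:
    """Best one-to-one track/detection assignment, enforcing mutual exclusion.

    Brute force over permutations, so only practical for a few people at once.
    Assumes len(tracks) <= len(detections).
    """
    # A detection is color-ambiguous if its two best color matches are close.
    ambiguous: Dict[str, bool] = {}
    for d in detections:
        ranked = sorted((scores[(t, d)][0] for t in tracks), reverse=True)
        ambiguous[d] = len(ranked) > 1 and ranked[0] - ranked[1] < ambiguity_margin

    best_total, best_map = float("-inf"), {}
    # Each permutation gives every track a distinct detection, so no person
    # is placed in two locations and no detection is claimed twice.
    for chosen in permutations(detections, len(tracks)):
        total = sum(fused_score(*scores[(t, d)], ambiguous[d])
                    for t, d in zip(tracks, chosen))
        if total > best_total:
            best_total, best_map = total, dict(zip(tracks, chosen))
    return best_map


# Two residents in similar beige clothing; trajectory separates them.
scores = {
    ("A", "d1"): (0.90, 0.95), ("A", "d2"): (0.88, 0.10),
    ("B", "d1"): (0.89, 0.20), ("B", "d2"): (0.91, 0.90),
}
print(assign(["A", "B"], ["d1", "d2"], scores))  # {'A': 'd1', 'B': 'd2'}
```

Here both detections are color-ambiguous (the color scores differ by less than 0.1), so the trajectory scores decide the outcome, mirroring the behavior the article describes.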
Trajectory estimation produces an expectation of where a person should be, based on their direction of travel, a 3D model of the facility that the algorithm “sees,” and the other cameras. Even when a camera cannot see a person, its display can still show that the person is on the opposite side of a wall, indicated by a broken outline. Trajectories shorter than three seconds were ignored. One limitation noted in the study is corridors covered by only one camera.
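A simple way to form "an expectation of where a person should be" from direction of travel is constant-velocity extrapolation. The sketch below assumes that model (the article does not specify one) and applies the study's rule of ignoring trajectories shorter than three seconds. The `Track` class and its methods are hypothetical; a fuller system would also project the predicted floor position into each camera's view using the 3D facility model.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]  # floor-plan position in meters

MIN_TRACK_SECONDS = 3.0  # the study ignored trajectories shorter than three seconds


@dataclass
class Track:
    times: List[float] = field(default_factory=list)      # seconds, strictly increasing
    positions: List[Point] = field(default_factory=list)  # observed positions

    def add(self, t: float, p: Point) -> None:
        self.times.append(t)
        self.positions.append(p)

    def duration(self) -> float:
        return self.times[-1] - self.times[0] if self.times else 0.0

    def predict(self, t: float) -> Point:
        """Constant-velocity extrapolation from the last two observations (assumed model).

        Requires at least one observation.
        """
        if len(self.positions) < 2:
            return self.positions[-1]
        (x0, y0), (x1, y1) = self.positions[-2], self.positions[-1]
        dt = self.times[-1] - self.times[-2]
        vx, vy = (x1 - x0) / dt, (y1 - y0) / dt
        ahead = t - self.times[-1]
        return (x1 + vx * ahead, y1 + vy * ahead)


def keep_long_tracks(tracks: List[Track]) -> List[Track]:
    """Drop trajectories shorter than MIN_TRACK_SECONDS."""
    return [tr for tr in tracks if tr.duration() >= MIN_TRACK_SECONDS]


# A resident walking down a corridor at 1 m/s, then briefly out of view.
walker = Track()
for second in range(5):
    walker.add(float(second), (float(second), 0.0))
blip = Track()
blip.add(0.0, (10.0, 10.0))
blip.add(1.0, (10.5, 10.0))

print(walker.predict(6.0))                      # (6.0, 0.0)
print(len(keep_long_tracks([walker, blip])))    # 1
```

The sketch also shows the weakness Hauptmann describes below: if the resident stops or turns back, the extrapolated position keeps moving forward and drifts away from the truth.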
These images show the location of the same person in two camera views. One camera actually sees him; the other estimates his position from the other camera views and his trajectory.
Facial recognition was used mainly to supplement tracking when a person could not be identified by clothing color or predicted trajectory. It was also the most time-intensive component, taking about six seconds to recognize a face. In addition, some camera angles were poorly suited to clear face shots, and “many of the subjects were older people so as they were walking they were often looking down.” When a camera did get a clear shot of a person’s face, that data confirmed that it was the same person detected by trajectory and clothing color.
“The rate of [images using facial recognition] was not very high, but there was enough information coming in to let us know it was this person again,” Hauptmann said. Only 10 percent of all video frames collected produced usable facial information, but “it provided a component of error correction that you didn’t get otherwise.” Without facial recognition, the system could track a person within one meter 58 percent of the time; with facial recognition, that rose to 88 percent.
“If you just go by color, direction and trajectory, if a person stops or goes backward, you have almost no chance to keep track of them. The facial recognition gives us sort of a checkpoint that says this has to be this person here right now,” Hauptmann added.
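The "checkpoint" idea can be sketched as a relabeling pass over one track's frames. This is an illustration under stated assumptions, not the researchers' method: each frame carries the tracker's current identity guess plus an optional face match (identity and confidence) on the roughly 10 percent of frames with a usable face. A confident match fixes the identity from that frame forward, overriding identity switches the color and trajectory cues may have introduced. The function name and the 0.9 confidence threshold are hypothetical.

```python
from typing import List, Optional, Tuple

# Per-frame data for one track: (tracker_label, face_match), where face_match is
# (identity, confidence) when a usable face was seen, else None.
Frame = Tuple[str, Optional[Tuple[str, float]]]


def apply_face_checkpoints(frames: List[Frame], min_conf: float = 0.9) -> List[str]:
    """Relabel frames using confident face matches as identity checkpoints.

    A confident match fixes the identity at that frame and carries it forward
    until the next confident match. Frames before the first checkpoint keep the
    tracker's label; low-confidence face matches are ignored.
    """
    labels: List[str] = []
    anchored: Optional[str] = None
    for tracker_label, face in frames:
        if face is not None and face[1] >= min_conf:
            anchored = face[0]
        labels.append(anchored if anchored is not None else tracker_label)
    return labels


frames: List[Frame] = [
    ("Ann", None),
    ("Bob", None),            # tracker switched identity, no face yet
    ("Bob", ("Ann", 0.95)),   # confident face match: this is Ann
    ("Bob", ("Bob", 0.40)),   # low-confidence match, ignored
    ("Bob", None),
]
print(apply_face_checkpoints(frames))  # ['Ann', 'Bob', 'Ann', 'Ann', 'Ann']
```

This forward-only version leaves the frame just before the checkpoint uncorrected; propagating the confirmed identity backward to the previous checkpoint would be a natural extension.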
The researchers note, however, that the algorithm is “not effective in very crowded sequences where each person wears the same color clothes,” presumably because faces are hard to capture, trajectories become unreliable (people move too slowly or too erratically to estimate a path) and too many people share the same colors.
They present their results this week at the Computer Vision and Pattern Recognition Conference.
"We have already improved the algorithm substantially further – this project is ongoing in order to refine the technology. Currently there are no specific plans to repeat the study. This would need a significant amount of preparation and funding," Hauptmann said.
There are no plans to commercialize the software (according to university policy it is owned by the creators and the university), but it could be made available for licensing, Hauptmann says.
Hauptmann said nursing homes were the application they had in mind when they started their research. The tracking system would be a way to keep tabs on patients who might wander off, or to know where people are in a medical emergency. The technology could be useful in environments such as hospital wards, where people need to be accounted for health and safety reasons, and also in high-security areas. The first application that came to mind was secure government facilities. The Mark Center, for example, has spent millions on layers of access control to know who is entering which areas of the building. Hauptmann's technology could let it know exactly where people are in the building, not just which doors they have gone through. Another researcher is already working on similar technology that would work in crowded, larger-scale environments to track a person of interest.
On the negative side, the sheer density of camera coverage (15 cameras across a half dozen rooms) suggests significant cost and logistical constraints.
However, Carnegie Mellon was using technology from 2003 for a study conducted in 2005, so it would be interesting to see how the study would turn out if replicated in 2013 with current technology and faster processing power.