The dataset comprised 140 network cameras, each providing 24 hours of feed. Processing ran on 328 nodes; the cluster totals 47 TB of system RAM and 3 TB of GPU RAM. See page 3, section C, Computing Clusters, for details on the processing power they utilized.
Conclusion: YOLO struggles to consistently detect the same humans and cars as their positions change from one frame to the next; it also struggles to detect objects at night. The findings suggest that state-of-the-art vision solutions should be trained on data from network cameras with contextual information before they can be deployed in applications that demand highly consistent object detection.
I found this fascinating, as I'm not satisfied with the current paradigm of my simple home network using Reolink. The capture is satisfactory, but the monitoring and the tweaking of triggers and sensitivity settings seem primitive. I've been working on a system that would allow full capture followed by time-delayed processing, foregoing the real-time experience. My objective, in part, is to whittle down interesting events and use machines to produce an executive summary. This paper seems to be along the same lines; what it does demonstrate is the tremendous amount of processing power needed to achieve such a paradigm. Moreover, it confirms my assessment that varying outdoor lighting conditions play an important role in object detection, and that known stationary objects in a fixed camera's view might be excluded from model processing by way of a mask, if you will.
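To illustrate what I mean by masking, here is a minimal sketch (my own toy code, not from the paper) that zeroes out known-static regions of a stationary camera's frame before it is handed to a detector. It assumes NumPy arrays; the function name `apply_exclusion_mask` and the toy frame are hypothetical.

```python
import numpy as np

def apply_exclusion_mask(frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero out pixels where mask == 0 so a downstream detector ignores them.

    frame: HxW grayscale or HxWx3 color image.
    mask:  HxW binary array; 1 = keep, 0 = exclude.
    """
    if frame.ndim == 3:
        # Broadcast the 2-D mask across the color channels.
        return frame * mask[..., None]
    return frame * mask

# Toy example: a 4x4 all-white frame; exclude the left half of the view.
frame = np.full((4, 4), 255, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[:, 2:] = 1  # keep only the right half
masked = apply_exclusion_mask(frame, mask)
```

In a real pipeline the mask would be drawn once per camera (since the view is fixed) and applied to every frame before detection, so the model never spends cycles on regions known to be uninteresting.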
Lastly, I'm not sure whether computer vision is appropriate for this forum, so please speak up if this is too esoteric.