1) Deep learning can be used for detection and/or classification. It detects specific classes of targets in the image, classes it has been trained on under specific conditions. So, first of all, what is an “object”? A bag, a trolley, a box, a plant, a cleaning machine, a chair, a shadow, a sudden light reflection, the feet of a person partially hidden by a bench...?
2) Deep learning on its own also cannot tell whether that detected object is unattended or moving, or for how long. In short, it cannot tell that this specific object is the same target seen some frames ago. That comes from a tracker, which is another story entirely.
3) Deep learning has a great wow effect when lots of targets are detected and classified in complex areas, and of course that’s great. But nobody caught up in this “wow” ever notices the huge number of misses and false detections. If you counted each detected object as an alarm, you would cry after 40 seconds... Now imagine that over the long term and in dynamic scenarios.
4) Deep learning works well with high-resolution targets, because it needs reliable features to work with. To detect a bag in a panoramic image you may need to process a very high resolution, which can require a huge amount of computational power, at a corresponding cost.
5) I agree with Undisc. #2 and #4 when they write about how relative the real need for “left object detection” is in real environments. Is your customer really sure that left object detection makes sense and has practical value in their case? Because in my experience, 90% of the time it’s requested just out of “inertial fashion”, and in the end it would make very little sense even if it worked... If instead it really does make sense in this case, which case/environment are you talking about? Because Brian is absolutely right as well: marketing fairy tales, hyped claims and nice YouTube videos aside, making “unattended object detection” work well in real life is still a very tough problem in dynamically complex or crowded areas...
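To make point 2 a bit more concrete, here is a minimal sketch of the extra layer a detector needs before anything can be called “unattended”: matching boxes frame to frame and timing how long each one stays put. It is only an illustration, not how any specific product works. The class name, the IoU threshold, the 60-second alarm time and the 2-second gap tolerance are all my own assumptions, and a real tracker has to cope with occlusions, crowds and lighting changes that this one ignores.

```python
from dataclasses import dataclass

# A box is (x1, y1, x2, y2) in pixels, as any detector would output it.
Box = tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


@dataclass
class Track:
    box: Box           # where the object was first seen
    first_seen: float  # seconds
    last_seen: float   # seconds


class StaticObjectTracker:
    """Hypothetical helper: flags boxes that stay in place long enough."""

    def __init__(self, iou_threshold=0.7, alarm_after_s=60.0, max_gap_s=2.0):
        # All three values are illustrative assumptions, not recommendations.
        self.iou_threshold = iou_threshold
        self.alarm_after_s = alarm_after_s
        self.max_gap_s = max_gap_s
        self.tracks: list[Track] = []

    def update(self, detections: list[Box], t: float) -> list[Box]:
        """Feed one frame's detections at time t; return boxes to alarm on."""
        # Forget tracks the detector has not confirmed recently.
        self.tracks = [tr for tr in self.tracks if t - tr.last_seen <= self.max_gap_s]
        for det in detections:
            best = max(self.tracks, key=lambda tr: iou(tr.box, det), default=None)
            if best is not None and iou(best.box, det) >= self.iou_threshold:
                best.last_seen = t  # same object, still in the same place
            else:
                self.tracks.append(Track(det, t, t))  # new or moved object: timer restarts
        return [
            tr.box
            for tr in self.tracks
            if tr.last_seen == t and t - tr.first_seen >= self.alarm_after_s
        ]
```

You would call `update()` once per processed frame with whatever boxes the detector returns. Notice that every missed detection longer than `max_gap_s` and every jittery box resets the timer, and every stable false detection (a shadow, a plant) eventually raises an alarm. That is exactly where point 3 starts to hurt.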
So, to sum up: technically and theoretically it would make sense to use deep learning as the basis for detection and classification, integrated with other layers of “classical” video analysis (funny for me to call it “classical”... 🙄😅). But as of today this would still be far too demanding, and so basically out of reach in most practical applications, because the cost/benefit ratio is too high.
Maybe in the future, but not yet.
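To put rough numbers on point 4, here is a back-of-the-envelope sketch of how many pixels a bag actually gets in a wide view. Everything in it is an assumption for illustration: the function names are mine, the geometry is simplified to a flat scene of known width, and the idea that processing cost grows roughly with pixel count is only a rule of thumb for convolutional detectors, not a measurement.

```python
def pixels_on_target(image_width_px: float, scene_width_m: float, object_width_m: float) -> float:
    """Approximate width of the object in pixels for a flat, evenly sampled scene."""
    return image_width_px / scene_width_m * object_width_m


def required_width_px(scene_width_m: float, object_width_m: float, min_object_px: float) -> float:
    """Image width needed so the object spans at least min_object_px pixels."""
    return min_object_px * scene_width_m / object_width_m


def relative_cost(width_px: float, reference_width_px: float) -> float:
    """Rough cost ratio, assuming cost scales with pixel count at a fixed aspect ratio."""
    return (width_px / reference_width_px) ** 2
```

For example, with a 1920-pixel-wide image covering a 40 m wide hall, `pixels_on_target(1920, 40, 0.4)` gives about 19 pixels for a 0.4 m bag. If you decided you wanted at least 32 pixels on the bag (again, an assumed figure), `required_width_px(40, 0.4, 32)` asks for about 3200 pixels of width, and `relative_cost(3200, 640)` says that is around 25 times the work of a 640-pixel-wide input under the same assumptions.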