In addition to the notification outages, People Search is also not functioning properly.
Do the alerts do anything other than send notifications on alarms? From what is covered in the report, this sounds as if it may be affecting more than just "notifications", and may be impacting some part of the analytics running in the cloud as well.
It will be interesting to see if the search function works after things are restored, particularly on video recorded during the time of the outage.
Sorry, for the first question: the alerts are email/SMS/app notifications on selected events (people, vehicles) based on a user-defined schedule (always, business hours, after hours, etc.). The analytics seem to be working properly, as people are highlighted with bounding boxes in the timeline, but the search results were not populating.
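To make the alert behavior described above concrete, here is a minimal sketch of schedule-based alert filtering. It is not Verkada's implementation. The schedule names, the 09:00 to 17:00 business-hours window, and the function `should_notify` are all assumptions made for illustration.

```python
from datetime import datetime, time

# Assumed business-hours window; a real system would make this configurable
# and would likely account for weekends and time zones.
BUSINESS_START = time(9, 0)
BUSINESS_END = time(17, 0)


def in_business_hours(t: time) -> bool:
    return BUSINESS_START <= t < BUSINESS_END


# Hypothetical schedule names mapped to predicates over the time of day.
SCHEDULES = {
    "always": lambda t: True,
    "business_hours": in_business_hours,
    "after_hours": lambda t: not in_business_hours(t),
}


def should_notify(event_kind: str, when: datetime,
                  selected_kinds: set, schedule: str) -> bool:
    """Return True if an event of this kind, at this time, should alert."""
    return event_kind in selected_kinds and SCHEDULES[schedule](when.time())


morning = datetime(2024, 1, 15, 10, 0)
night = datetime(2024, 1, 15, 22, 0)
print(should_notify("person", morning, {"person"}, "business_hours"))
print(should_notify("person", night, {"person"}, "business_hours"))
```

The point of the sketch is that the filter sits downstream of detection: a person can be detected and drawn with a bounding box without any notification being sent.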
It is unbelievable that any enterprise would use a cloud processing solution for analytics and alerts or a strictly web-based interface hosted remotely. This is one of many ways critical events can go unnoticed.
A simple DDoS attack on the enterprise renders their GSOC useless if they are using Verkada and the like. Terrorists, domestic or foreign, will have a field day with targets who deploy these systems, just like they target anything with critical vulnerabilities.
The next Mirai botnet will now make millions of people and assets vulnerable.
It is unbelievable that any enterprise would use a cloud processing solution for analytics and alerts or a strictly web-based interface hosted remotely.
"Unbelievable"? No, not at all. Consider that their payroll, CRM, web, email, and other critical platforms are all likely entirely outsourced to cloud-hosted platforms. Why would it be "unbelievable" that their security cameras would be any different?
It might be unadvisable for them to use cloud-based cameras, or maybe short-sighted even, but if you truly think it is "unbelievable" then you as a manufacturer are in for a very rough road ahead.
A fun exercise is watching Reddit or Twitter for posts like "Office 365/Google Drive is down, guess I'll go home!" It happens more often than you'd think, and a lot of organizations seem to be okay with it.
It is "unadvisable" for you to think that losing email, payroll, CRM, etc could result in loss of life like losing a surveillance system using analytics for critical areas can and that you wonder why it would be any different. Or maybe "unbelievable", depending on what your role in the industry is.
Let's make intrusion sensors, fire alarm pull stations, and smoke detectors have to run through the cloud, too. Let's make access control readers have to go to the cloud first, before opening a door for emergency egress. It's all the same as email and CRM.
Whatever organization hires you (or that you run) for security has a very rough road ahead.
It is "unadvisable" for you to think that losing email, payroll, CRM, etc could result in loss of life like losing a surveillance system using analytics for critical areas can and that you wonder why it would be any different.
I think you missed my point. I agree that the two kinds of platforms probably should not be compared in the same way, but I can totally see how and why end-users are doing exactly that. Calling it "unbelievable" implies that you are not understanding the buying forces of the market, and are unlikely to adapt accordingly, IMO.
If you want to minimize IT systems management demands and the resultant costs of the people that are needed to manage the multitude of networked systems, move towards cloud and fairly predictable operating costs.
If you want to own all aspects of your IT for privacy, data integrity, security, redundancy, fault tolerance, up-time, etc, to minimize risk, you have to pay the price and do it in-house.
That's an over-simplification in two sentences of course, but that is the essence of it. The lesser of two evils or the greater of two goods.
Every institution will have its own set of risks to mitigate and the appropriate solutions to mitigate those risks.
If one considers video as a life safety technology, do it in house or be prepared to handle the consequences of dependency on others (the cloud). If one does not consider video as a life safety technology, cloud may make sense.
Of course there's much more to this in each circumstance but, again, I think that's the essence of it.
Yes, your point was ill-communicated. I would hope that anyone who has a career in this industry knows that simplicity and outsourcing is a huge buying factor in IT. I didn't think I had to state the obvious. What is unbelievable is the fact that this buying factor has greater priority than reliability of a system used to protect lives and assets. From an ERM point of view, "unadvisable" is an inadequate adjective to describe how ignorant and dangerous this is, IMO.
The problem is that executives/board members don't understand technology from a risk perspective, so whatever IT says, happens. If ABC cloud provider sells IT that their lives will be easier without sacrificing logical security, IT goes for it without thinking about the sacrifices in physical security.
Just a note that I was not engaging in the unbelievable vs unadvisable kerfuffle, just trying to note a need to evaluate the dependencies and reliabilities of any given solution against the risks, and the mitigation of those risks, in one's particular environment.
And your original post was a bunch of vague speculation wrapping up with anticipating a Mirai-style botnet that takes down corporations via their cloud-dependent surveillance cameras.
Most of the enterprises I have seen with their own GSOC span multiple locations and are not handicapped by a "simple" DDoS attack. Also, Verkada allows live local viewing, which does not have a cloud dependency.
What is unbelievable is the fact that this buying factor has greater priority than reliability of a system used to protect lives and assets.
Citation needed. What "facts" are you referencing that show buying factor is prioritized over net reliability of the system?
Update 12/13 - Email and SMS Notifications Available / No Mobile
As of the morning of 12/13 email and SMS notifications for alerts have been received from our cameras. Mobile app notifications are still not working. We reached out to Verkada for additional feedback and will update the report when it is provided.
Additionally, People Search is working properly again. However, for the day(s) that it was not functioning, people detection events are still not found.
I am very curious to see how Verkada is going to address the post-mortem on this.
Typically your pipeline for events is something like: Person Detected --> Rule(s) matched --> Event data logged to database --> Event info dispatched to notification service.
I find it intriguing that the event notification stack seems to be somehow commingled with the rules stack, but still separate from the person detection algorithms.
Just from what has been posted, it feels more like a failure in the rules engine that was not identifying events and sending the event off to the notification dispatcher, but of course I could not say for sure.
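As a rough illustration of the pipeline described above (detection, rule matching, logging, dispatch), here is a minimal sketch. It is only a guess at the general shape and not Verkada's architecture. The names `Detection`, `Pipeline`, and `person_rule` are hypothetical, and plain lists stand in for the database and the notification service.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Detection:
    camera: str
    kind: str          # e.g. "person" or "vehicle"
    timestamp: float


@dataclass
class Pipeline:
    rules: List[Callable[[Detection], bool]]
    # Stand-ins (assumptions): a list for the event database and a list
    # for the outbound notification queue.
    event_log: List[Detection] = field(default_factory=list)
    outbox: List[str] = field(default_factory=list)

    def handle(self, detection: Detection) -> bool:
        # Stage 2: rule matching. If nothing matches, stop here.
        if not any(rule(detection) for rule in self.rules):
            return False
        # Stage 3: log the event so it can be searched later.
        self.event_log.append(detection)
        # Stage 4: hand the event to the notification dispatcher.
        self.outbox.append(f"{detection.kind} detected on {detection.camera}")
        return True


def person_rule(d: Detection) -> bool:
    return d.kind == "person"


pipeline = Pipeline(rules=[person_rule])
pipeline.handle(Detection("lobby", "person", 0.0))   # Stage 1 output: a detection
pipeline.handle(Detection("lobby", "vehicle", 1.0))  # no rule matches
print(pipeline.event_log, pipeline.outbox)
```

In a layout like this, a failure in the rule-matching stage would leave both the event log and the notification queue empty even though detection itself still works, which would be consistent with bounding boxes appearing while search results and alerts go missing.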
It is also very common to use a third-party service for the notification handling (e.g., Pushed). It would be interesting to know if Verkada is claiming to have rolled (or hosted) their own push notification stack.
We do. For a few reasons, and I cannot say which one is the most important, so I'll just list them all. And it's not that binary...
1) By mistake. While TDD is much more efficient, originally as a startup we chose to ship the code ASAP and just have QA test manually in parallel with development. A little short-term help, but long-term tech debt. If we were to do it again, we would have a much smaller QA.
2) We use an OpenGL client, and back in the day it was much harder to automate.
3) Still, there are things where manual QA is more efficient (especially short term).
4) In our organization, QA also plays the role of usability testing. They have the authority to change functional specs and assign developers tasks to make things simpler for users.
These days a lot of our original QA members have switched from manual QA to auto testing.
But anyway, I know some companies/products do not use manual QA at all. I believe exacq would be an example from our industry. It's totally fine, even desirable.
This response is perfectly reasonable. The "sometimes it's OK not to have a QA" is not.
As you describe in the follow-up, while increasing test coverage is desirable, in practice, not having any human QA exposes an organization's customers to all types of bugs and problems.
I'd add to your list, a fifth point, that you cannot anticipate and add a test for every situation or interaction, simply because you can't think through all the possible patterns upfront that QA and users will actually try.
Sometimes I feel you are just playing devil's advocate, I can't believe you really think it's sensible to hire 100 salespeople but no one in QA...
Yes, I would hope someone inside of Verkada even randomly tried this out on staging before deploying to production. It still is funny, evidently to many of us, that it apparently took them this long to formalize it.
That said, you originally advocated for automation:
Shortly speaking, code is 100% covered with the autotests
Again, not minimizing or detracting from the role of automation, just saying that any enterprise software, whether Network Optix or now Verkada, is going to have a formalized QA program, beyond TDD.
We have received reports of continued notification outages, with no ETA for resolution provided from Verkada. A Verkada partner told us they are still having issues and have not received a response in 3 days.
We have continued to receive email/SMS notifications, with still no mobile app notifications.
Verkada has not provided a reason or resolution for the outage, but we will update the report when we receive one, or Verkada publishes a post-mortem.
That starts to sound like they pushed a software commit that also massively changed a database schema, or altered some other fundamental underpinnings of the architecture that are very hard to roll back.
If this was the fault of a 3rd party service provider, we would have most likely been told that as a root cause, possibly along with Verkada switching to a different 3rd party. Not to mention, we would be hearing of other platforms having notification problems for other products.
If this was the result of a small bug, patch, etc., they would have rolled that back, or fixed the issue by now.
If they were running out of server resources, they would have sold a bunch of YETI tumblers on eBay to fund some more AWS cycles :)
That leads me to believe this is tied very tightly to core architecture, and changes that Verkada made that are very difficult, or impossible, to reverse.
I wonder if Verkada customers will ever be given the choice about how and when new changes, enhancements, etc. are rolled out to their personal instances and accounts. This is one of the major downsides of SaaS-style systems: your personal experience is only as good as the weakest link in your provider's resources.
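To illustrate why a schema change can be so much harder to reverse than a code change, here is a small sketch using Python's built-in `sqlite3`. This is purely hypothetical: the `events` table, its columns, and the `migrate_up`/`migrate_down` functions are invented for the example and say nothing about what Verkada actually changed.

```python
import sqlite3


def migrate_up(conn: sqlite3.Connection) -> None:
    # Hypothetical destructive migration: rebuild the table without
    # the "camera" column. The old values are discarded.
    conn.execute("CREATE TABLE events_v2 (id INTEGER PRIMARY KEY, kind TEXT)")
    conn.execute("INSERT INTO events_v2 (id, kind) SELECT id, kind FROM events")
    conn.execute("DROP TABLE events")
    conn.execute("ALTER TABLE events_v2 RENAME TO events")


def migrate_down(conn: sqlite3.Connection) -> None:
    # The rollback can restore the column, but not the data that was in it.
    conn.execute("ALTER TABLE events ADD COLUMN camera TEXT")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, kind TEXT, camera TEXT)")
conn.execute("INSERT INTO events (kind, camera) VALUES ('person', 'lobby')")

migrate_up(conn)
migrate_down(conn)

row = conn.execute("SELECT kind, camera FROM events").fetchone()
print(row)  # the camera value comes back as None: the schema rolled back, the data did not
```

Reverting the application code is one deploy. Recovering the dropped values needs a backup or a replay of the original events, which is the kind of work that can stretch an outage into days.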