Verkada Notification Outage

By Sean Patton, Published Dec 12, 2019, 10:52am EST (Info+)

Verkada is suffering an event notification outage and analytic search failures.

Inside, we examine what the issues are, what Verkada told IPVM and the risks of these outages for VSaaS users.

Update 12/13 - Email and SMS Notifications Available / No Mobile

As of the morning of 12/13 email and SMS notifications for alerts have been received from our cameras. Mobile app notifications are still not working. We reached out to Verkada for additional feedback and will update the report when it is provided.

Additionally, People Search is also working properly, however for the day(s) that it was not functioning, people detection events are still not found.

Outage ********

*************, *** *****, *** *** ****** app, *** *** *** *******, ********** users **** ******* ****** ***** ********** and ******** *****.

**** ********* *** ****** *** ****** at ***** * **** *** ** the **** ** **** ***********. ******* cameras *** ***** ********* ** *******'* servers *** ********* ** *********** ********, no ********* ************* ** *** ******** were **** ** *******.

*******, **** ** ********* ******* ******* on **/** ** * ** **, they ********* **** **** ***** ** the ***** *** **** ********** * fix:

******* *** ****** ** ******* ** ETA *** *** ***, *** ***** it *** *** *** ******** *** engineering:

*************, *** ******* ***** ***** **** caught *** ***** ******* ******* **** are ***** ******* ******* ** ***** homes:

** ******** * ************* ** ~* PM **, ** *** *** *****, but **** *** ******** *** *****. Verkada ****** *** ************* *** **** not *******.

****** **** ***** ** *** ***** did *** ******* *** ******:

No ******* ************

******* *** *** **** ***** ************* to ********/*** ***** ** ****** * blog ***** ***** *** ****** ** of *** ******* ** **/**, ~** hours ***** **** ****** ***** ** it. ******* **** *******.*** ******* ** ***** ** ******** 5.5 ***** ***** ***** ****** *******(* **** ***** ** *** ********).

*** ********** ***** *****.*** *********** * full ****** ******, ***** ******* ** suffering * ******* ****** ** ***** notifications *** ******** ********.

** ***** ******* **** *** ***** of *** ***** *** **** **** not ******** * ***** ********. ** will ****** **** ***************** **** ******* ***, ***** ** common ** *** ** ********.

Cloud - ***** ************ *****

*** ************ ****** ***** ***** ***** to **** ******** ****** (****** ** vehicles ** ***** ********), ********** ** remote ***** ** ***** ****** ******** hours **** ** *** *** ** actively ********** ***** *******. **** ** especially ********* *** ***** ***** *** rely ***** ** *******'* *** * ********, "***** ******** *** ******", **:

***** ***** ** ******** *********** ********* without ****** ** *********** ******* ************ around *** *****. **** ************* **** immediately ***** ************ ** ******* **********, such ** * ****** ******** ** the ******** ** * ****** ***** hours, *** ********** ** **-**** ********* is ******* ** ********** **********.

**** ** *** *** **** ***** as *** **** ** *** ******** at * ********, ***** ***** ********* alerts **** *********** ***/*** *******. ******** events *** ****** *** **** ***** VMS ******** **** ********* *** ************* have ********* *** *******, *** *** protocols ******** *** *** *********** ** complicated.

Person ********* ****** ******

In addition to the notification outages, People Search is also not functioning properly. ***** ** *** ***** ******** History ******, ****** *** ******* *** highlighted ** *** ******, ********* ** People ****** ******* "** ****** ******** within **** **** *****":

******* **** **** ***** *** ******* to *** ****** *** ******* * fix ********, ******* ** ** *** time ** **********, *** ***** ***** persists.

Cloud ** **-**** *******

******* ********* ******* ***** *** ******** of ******** ******* *** ****** **** come **** ************ *************. *** **** are ********* ***** **** ***** *** such *****.

** *** ***** ****, ** **** ongoing ******* ************ ****** *****, ***** are **** ***** **** ***'* ***** provider *** ******** **** ***** **** to ****** **** *** **** **** be ******** ** ******* ** ********.

Comments (37)

In addition to the notification outages, People Search is also not functioning properly.

Do the alerts do anything other than send notifications on alarms? From what is covered in the report, this sounds as if it may be affecting more than just "notifications", and may be impacting some part of the analytics running in the cloud as well.

It will be interesting to see if the search function works after things are restored, particularly on video recorded during the time of the outage.


Search functionality appears partially restored for detected people today, but not for some previously recorded video.

Agree
Disagree
Informative: 1
Unhelpful
Funny

Sorry, for the first question, the alerts are email/SMS/app notifications on selected events (people, vehicles) based on a user defined schedule (always, business hours, after hours, etc). The analytics seem to be working properly, as the people are highlighted with bounding boxes in the timeline, but the search results were not populating.


Thank you for continuing to report on the cons of the various technologies. Before IPVM this didn't exist.

Agree: 7
Disagree
Informative: 1
Unhelpful
Funny

It would be super beneficial if IPVM made this article public.

Agree: 12
Disagree: 1
Informative
Unhelpful: 1
Funny: 1

It is unbelievable that any enterprise would use a cloud processing solution for analytics and alerts or a strictly web-based interface hosted remotely. This is one of many ways critical events can go unnoticed.

A simple DDoS attack on the enterprise renders their GSOC useless if they are using Verkada and the like. Terrorists, domestic or foreign, will have a field day with targets that deploy these systems, just as they target anything with critical vulnerabilities.

The next Mirai botnet will now make millions of people and assets vulnerable.

Agree: 9
Disagree: 6
Informative
Unhelpful: 1
Funny: 2

It is unbelievable that any enterprise would use a cloud processing solution for analytics and alerts or a strictly web-based interface hosted remotely.

"Unbelievable", no not at all. Consider that their payroll, CRM, web, email, and other critical platforms are all likely entirely outsourced to cloud-hosted platforms. Why would it be "unbelievable" that their security cameras would be any different?

It might be unadvisable for them to use cloud-based cameras, or maybe short-sighted even, but if you truly think it is "unbelievable" then you as a manufacturer are in for a very rough road ahead.

Agree: 6
Disagree: 1
Informative: 3
Unhelpful
Funny: 1

A fun exercise is watching Reddit or Twitter for posts like "Office 365/Google Drive is down, guess I'll go home!" It happens more often than you'd think, and a lot of organizations seem to be okay with it.

Agree: 2
Disagree: 1
Informative
Unhelpful
Funny

It is "unadvisable" for you to think that losing email, payroll, CRM, etc could result in loss of life like losing a surveillance system using analytics for critical areas can and that you wonder why it would be any different. Or maybe "unbelievable", depending on what your role in the industry is.

Let's make intrusion sensors, fire alarm pull stations, and smoke detectors have to run through the cloud, too. Let's make access control readers have to go to the cloud first, before opening a door for emergency egress. It's all the same as email and CRM.

Whatever organization hires you (or that you run) for security has a very rough road ahead.

Agree: 2
Disagree: 3
Informative: 2
Unhelpful: 5
Funny

It is "unadvisable" for you to think that losing email, payroll, CRM, etc could result in loss of life like losing a surveillance system using analytics for critical areas can and that you wonder why it would be any different.

I think you missed my point. I agree that the two kinds of platforms probably should not be compared in the same way, but I can totally see how and why end-users are doing exactly that. Calling it "unbelievable" implies that you are not understanding the buying forces of the market, and are unlikely to adapt accordingly, IMO.

Agree: 5
Disagree: 1
Informative
Unhelpful: 1
Funny

If you want to minimize IT systems management demands and the resultant costs of the people needed to manage the multitude of networked systems, move towards cloud and fairly predictable operating costs.

If you want to own all aspects of your IT for privacy, data integrity, security, redundancy, fault tolerance, up-time, etc, to minimize risk, you have to pay the price and do it in-house.

That's an over-simplification in two sentences of course, but that is the essence of it. The lesser of two evils or the greater of two goods.

Every institution will have its own set of risks to mitigate and the appropriate solutions to mitigate those risks.

If one considers video as a life safety technology, do it in house or be prepared to handle the consequences of dependency on others (the cloud). If one does not consider video as a life safety technology, cloud may make sense.

Of course there's much more to this in each circumstance but, again, I think that's the essence of it.

Agree: 4
Disagree
Informative
Unhelpful
Funny

Yes, your point was ill-communicated. I would hope that anyone who has a career in this industry knows that simplicity and outsourcing are huge buying factors in IT. I didn't think I had to state the obvious. What is unbelievable is the fact that this buying factor has greater priority than reliability of a system used to protect lives and assets. From an ERM point of view, "unadvisable" is an inadequate adjective to describe how ignorant and dangerous this is, IMO.

The problem is that executives/board members don't understand technology from a risk perspective, so whatever IT says, happens. If ABC cloud provider sells IT on the idea that their lives will be easier without sacrificing logical security, IT goes for it without thinking about the sacrifices in physical security.

Agree
Disagree: 1
Informative
Unhelpful: 2
Funny

FYI, I pitch cloud based services all the time, when appropriate to the risk level.

Agree
Disagree: 1
Informative
Unhelpful
Funny

Just a note that I was not engaging in the unbelievable vs undesirable kerfuffle, just trying to note the need to evaluate the dependencies and reliability of any given solution against the risks, and the mitigation of those risks, in one's particular environment.

Agree: 1
Disagree
Informative: 1
Unhelpful
Funny

Yes, your point was ill-communicated.

And your original post was a bunch of vague speculation wrapping up with anticipating a Mirai-style botnet that takes down corporations via their cloud-dependent surveillance cameras.

Most of the enterprises I have seen with their own GSOC span multiple locations and are not handicapped by a "simple" DDoS attack. Also, Verkada allows live local viewing, which does not have a cloud dependency.

What is unbelievable is the fact that this buying factor has greater priority than reliability of a system used to protect lives and assets.

Citation needed. What "facts" are you referencing that show buying factor is prioritized over net reliability of the system?

Agree: 1
Disagree: 2
Informative
Unhelpful: 1
Funny

Update 12/13 - Email and SMS Notifications Available / No Mobile

As of the morning of 12/13 email and SMS notifications for alerts have been received from our cameras. Mobile app notifications are still not working. We reached out to Verkada for additional feedback and will update the report when it is provided.

Additionally, People Search is also working properly, however for the day(s) that it was not functioning, people detection events are still not found.

Agree
Disagree
Informative: 1
Unhelpful
Funny

I am very curious to see how Verkada is going to address the post-mortem on this.

Typically your pipeline for events is something like: Person Detected --> Rule(s) matched --> Event data logged to database --> Event info dispatched to notification service.

I find it intriguing that the event notification stack seems to be somehow commingled with the rules stack, but still separate from the person detection algorithms.

Just from what has been posted, it feels more like a failure in the rules engine that was not identifying events and sending the event off to the notification dispatcher, but of course I could not say for sure.
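As a rough illustration of the pipeline described above, here is a minimal Python sketch. It is hypothetical and not Verkada's actual architecture: the names `Detection`, `Pipeline`, `rules_engine_healthy`, and `person_rule` are invented for this example, and in-memory lists stand in for the event database and the notification service. It shows how a fault at the rule-matching stage would leave detection working while both logged events and notifications go missing.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Detection:
    camera: str
    kind: str  # e.g. "person" or "vehicle"


@dataclass
class Pipeline:
    # Hypothetical pipeline: rules -> event log -> notification dispatch.
    rules: List[Callable[[Detection], bool]]
    rules_engine_healthy: bool = True  # assumption: a flag simulating a rules-engine fault
    event_log: List[Detection] = field(default_factory=list)  # stands in for the database
    sent: List[str] = field(default_factory=list)  # stands in for the notification service

    def handle(self, detection: Detection) -> None:
        # Stage 2: rule matching. If this stage fails, nothing downstream runs.
        if not self.rules_engine_healthy:
            return
        if not any(rule(detection) for rule in self.rules):
            return
        # Stage 3: log the event so that it can be searched later.
        self.event_log.append(detection)
        # Stage 4: hand the event to the notification dispatcher.
        self.sent.append(f"{detection.kind} detected on {detection.camera}")


def person_rule(detection: Detection) -> bool:
    return detection.kind == "person"


healthy = Pipeline(rules=[person_rule])
broken = Pipeline(rules=[person_rule], rules_engine_healthy=False)

# Stage 1 (person detected) succeeds in both cases; only the later stages differ.
detection = Detection(camera="lobby", kind="person")
healthy.handle(detection)
broken.handle(detection)

print(healthy.sent)      # one notification
print(broken.sent)       # no notification
print(broken.event_log)  # no searchable event either
```

In this sketch a single upstream fault produces both symptoms at once, which is consistent with (but does not prove) the rules-engine guess above.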

It is also very common to use a third-party service for notification handling (e.g., Pushed), so it would be interesting to know whether Verkada claims to have rolled (or hosted) its own push notification stack.

Agree
Disagree
Informative: 3
Unhelpful
Funny

Related and a bit funny, evidently Verkada did not have a QA team until recently, from a just published Verkada interview:

So Verkada built up a 100-person sales team before adding a QA team.

Agree
Disagree
Informative: 2
Unhelpful
Funny: 7

While a lot of people find it funny, sometimes it's OK not to have a QA.
There are test-driven development techniques:
test driven development - Google Search

Shortly speaking, code is 100% covered with the autotests, which is much more reliable and efficient than manual QA. QA should only be used if something could not be automated.


Sergey, does Network Optix have a QA team? Why or why not?


We do, for a few reasons. I cannot say which one is the right one or the more important one, so I will just list them all. And it's not that binary...

1) By mistake. While TDD is much more efficient, originally as a startup we chose to ship the code ASAP and have QA test manually in parallel with development. A little short-term help, but long-term tech debt. If we did it again, we would have a much smaller QA team.

2) We use an OpenGL client, and back in the day it was much harder to automate.

3) Still, there are things where manual QA is more efficient (especially short term).

4) In our organization, QA also plays the role of usability testing. They have the authority to change functional specs and assign developers tasks to make things more simple for users.

These days a lot of our original QA members switched from manual QA to auto testing.

But anyway, I know some companies/products do not use manual QA at all. I believe exacq would be an example from our industry. It's totally fine, even desirable.


This response is perfectly reasonable. The "sometimes it's OK not to have a QA" is not.

As you describe in the follow-up, while increasing test coverage is desirable, in practice, not having any human QA exposes an organization's customers to all types of bugs and problems.

I'd add to your list, a fifth point, that you cannot anticipate and add a test for every situation or interaction, simply because you can't think through all the possible patterns upfront that QA and users will actually try.

Sometimes I feel you are just playing devil's advocate; I can't believe you really think it's sensible to hire 100 salespeople but no one in QA...


From the given developer interview we know that they had a lack of formalized QA (see your quote). It means some people could have played the QA role in an informal fashion (I'm sure they did).

So it's not about "no one in QA.." or "not having any human QA".


Yes, I would hope someone inside of Verkada even randomly tried this out on staging before deploying to production. It still is funny, evidently to many of us, that it apparently took them this long to formalize it.

That said, you originally advocated for automation:

Shortly speaking, code is 100% covered with the autotests

Again, not minimizing or detracting from the role of automation, just saying that any enterprise software, whether Network Optix or now Verkada, is going to have a formalized QA program, beyond TDD.


I realized it's worth noting that I have heard good things about exacq's product quality.


What happens to CIOs that buy this snake oil?

Agree
Disagree
Informative
Unhelpful
Funny: 2

We have received reports of continued notification outages, with no ETA for resolution provided from Verkada. A Verkada partner told us they are still having issues and have not received a response in 3 days.

We have continued to receive email/SMS notifications, with still no mobile app notifications.

Verkada has not provided a reason or resolution for the outage, but we will update the report when we receive one, or Verkada publishes a post-mortem.


That starts to sound like they pushed a software commit that also massively changed a database schema, or altered some other fundamental underpinnings of the architecture that are very, very hard to roll back.

If this was the fault of a 3rd party service provider, we would have most likely been told that as a root cause, possibly along with Verkada switching to a different 3rd party. Not to mention, we would be hearing of other platforms having notification problems for other products.

If this was the result of a small bug, patch, etc., they would have rolled that back, or fixed the issue by now.

If they were running out of server resources, they would have sold a bunch of YETI tumblers on eBay to fund some more AWS cycles :)

That leads me to believe this is tied very tightly to core architecture, and to changes Verkada made that are very difficult, or impossible, to reverse.

I wonder if Verkada customers will ever be given the choice about how and when new changes, enhancements, etc. are rolled out to their personal instances and accounts. This is one of the major downsides of SaaS-style systems: your personal experience is only as good as the weakest link in your provider's resources.


Interesting Verkada's marketing touts them as being 'always reliable, always on':

Beyond the obvious irony of this situation, companies rarely claim to be 'always' reliable or 'on', as any service (whether it's Verkada or Gmail or AWS, etc.) or any product can have issues sometimes.

Agree: 1
Disagree
Informative
Unhelpful
Funny: 1

Interesting Verkada's marketing touts them as being 'always reliable, always on'

Well, at this point the service has been impacted long enough that you can reliably assume it won't get fixed quickly.

Agree: 1
Disagree
Informative
Unhelpful
Funny

I'm guessing they are rethinking allowing IPVM to sign up as a customer about now.

Agree
Disagree
Informative
Unhelpful
Funny: 1

We purchased our cameras through a local Verkada dealer and registered our system through standard procedures.

While I suppose they could block accounts created with our email address/domain, I do not know if they can. They have an open channel; anyone can purchase through CDW/distribution.

Agree
Disagree
Informative
Unhelpful
Funny: 1

Would you ever consider partnering with a local integrator to do some covert testing, if a manufacturer tried to keep their products away from you?

I'm just asking out of curiosity; my company doesn't operate in Pennsylvania.

Agree
Disagree
Informative
Unhelpful
Funny: 1

How many 9s of uptime do they guarantee in their SLA? Do they use any service like statuspage.io to report outages?

If this is all advertising bluster with no SLA or statistics to back it up, that's... not surprising at all from this company.


Four 9s of uptime. We have heard they are offering service credits for users who have not been receiving notifications.

More details are on Verkada's SLA support page.
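For context on what "four 9s" means in practice, the short Python sketch below converts a number of nines into a downtime budget. The function name `allowed_downtime_minutes` is made up for this illustration, and the 365-day year and 30-day month are simplifying assumptions. How Verkada's SLA actually measures uptime, and whether notification failures count against it, is not stated here.

```python
def allowed_downtime_minutes(nines: int, period_hours: float) -> float:
    """Downtime budget, in minutes, for an availability of `nines` nines."""
    downtime_fraction = 10 ** -nines  # four nines -> 0.0001, i.e. 99.99% availability
    return period_hours * 60 * downtime_fraction


# Assumption: a 365-day year and a 30-day month.
per_year = allowed_downtime_minutes(4, 365 * 24)
per_month = allowed_downtime_minutes(4, 30 * 24)

print(f"{per_year:.1f} minutes per year")    # 52.6 minutes per year
print(f"{per_month:.1f} minutes per month")  # 4.3 minutes per month
```

By that arithmetic, an interruption lasting days rather than minutes would exceed a four-nines budget many times over, if the affected feature is covered by the SLA.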

Agree
Disagree
Informative: 1
Unhelpful
Funny

I am a Verkada partner. AMA.

NOTICE: This comment has been moved to its own discussion: I Am A Verkada Partner. AMA.

Agree
Disagree
Informative
Unhelpful
Funny: 1
Agree
Disagree
Informative: 2
Unhelpful
Funny