Problems With 300 Camera Milestone VMS System

Does anyone have any suggestions?

We are running XProtect Smart Client 2014 v9.0a (64-bit).

Please elaborate.

When you say 'station' you mean that the Smart Client is disconnecting from the XProtect server?

How many cameras do you have recording? How many are you trying to display at the same time? What type of network connection do you have? When you say 'disconnect' does the application crash / shut down or does video stop being displayed?

Providing some details will help us provide suggestions.

Also, if you have not already, contacting Milestone support could be beneficial.

That's right, the Smart Client disconnects and shows an INVALID TOKEN message; there is no video, and recordings are missing.

We have more than 300 full HD cameras, each with two streams (H.264 and MJPEG) at the highest resolution and 18 fps.

Every Smart Client displays 36 cameras over a Gigabit Ethernet connection. The cameras are connected via GPON.

Milestone is working on this, but I would like to have another choice.

Have you tested ISS VMS?

The VMS study dates from 2011, and many VMSes have emerged since.

If recordings are missing, this indicates a server problem.

How many cameras do you have per server? What are the specs on the servers? 300 full HD cameras require non-trivial throughput.

My advice to you, given this scale, is to carefully check and troubleshoot any load / server performance issues before considering changing any VMS. With this level of throughput, any VMS could have issues.

Also, how many clients are displaying 36 cameras? Are these all simultaneously? Because that is going to add a lot more load on the server as well. Also, if the client machine needs to decode 36 streams, that could be an issue.
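To put rough numbers on the load, here is a back-of-the-envelope sketch. The 6 Mbit/s per-stream bitrate is purely an assumed placeholder (it is not a figure from this thread), and the calculation counts only one stream per camera; substitute the bitrates your cameras actually report.

```python
# Rough load estimate. ASSUMED_MBPS_PER_STREAM is a placeholder assumption,
# not a measured value from this system.
ASSUMED_MBPS_PER_STREAM = 6.0


def aggregate_mbps(streams: int, mbps_per_stream: float) -> float:
    """Total bitrate in Mbit/s for a number of simultaneous streams."""
    return streams * mbps_per_stream


def storage_tb_per_day(total_mbps: float) -> float:
    """Continuous-recording storage per day in decimal terabytes."""
    megabytes_per_second = total_mbps / 8          # bits -> bytes
    return megabytes_per_second * 86_400 / 1_000_000  # MB/day -> TB/day


server_mbps = aggregate_mbps(300, ASSUMED_MBPS_PER_STREAM)  # all cameras recording
client_mbps = aggregate_mbps(36, ASSUMED_MBPS_PER_STREAM)   # one 36-camera view

print(f"Recording ingest (all servers): {server_mbps:.0f} Mbit/s")
print(f"One 36-camera client:           {client_mbps:.0f} Mbit/s")
print(f"Storage per day:                {storage_tb_per_day(server_mbps):.2f} TB")
```

With that placeholder, 300 cameras come to 1,800 Mbit/s of ingest across the recording servers and each 36-camera client pulls about 216 Mbit/s, which is why the per-server camera count and the number of simultaneous clients matter so much.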

Even if you change VMSes, you could easily have similar problems unless you carefully verify and optimize the load / server performance.

P.S. We have not tested ISS, but even if we had, the key question is how any VMS would behave under the demanding load you are putting on the system.

Every SmartClient does this? At the same time, or sporadically at different times? Time sync issues can cause problems similar to this as well. I'd expect that's probably the first thing Milestone would ask you to check. I've also seen (had) that error on occasion when I'm running around switching from network to network, switching my VPN on and off, etc.; basically doing enough stuff that I would expect to see some issues show up on occasion. But in general our clients are solid with regard to an issue like this. Our scale, just based on camera count, is about double yours, but that's definitely not a perfect indicator of scale. We probably have around four dozen SmartClient users ranging from infrequent to 24x7 use. We had one client about 3 months ago that was experiencing this issue oddly frequently. Just him, just on one machine. All the basic troubleshooting items checked out. We moved him to SmartClient 9.0b and his issues disappeared.

I'll also note that if you're saying that you are showing 36 cameras per SmartClient all at the same time, then that's a lot of decode resources required (as John noted). Above and beyond that, that's a lot for someone or even multiple someones to be looking at on the human side. I'd love to hear more about that use case sometime as a separate entry.

VMS options are good to have but don't expect to ever find one that is "perfect" and problem free. That does not exist in any software world. Particularly at scale, and there are folks at a lot larger scale than us, things are going to happen, bugs will be seen, system and network-level "things" become more likely to rear their ugly head, etc. They can be worked through in most cases.

More info needed:

- What make/model of cameras?
- Is this a new installation that has never worked, or was it an existing one that suddenly stopped working?
- If it was working, what has changed? Any updates applied? Any new device packs installed?
- How many servers?
- If it has never worked, does any configuration work? I.e., if you drop it back to one server and, say, 50 cameras, does it work? If yes, build it up until the point where it falls over. If it falls over at 75, is it those specific 75 cameras that make it fall over, or does any combination of 75 do it?

More information beyond "it doesn't work" would be handy.
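The build-it-up approach can be written down as a simple loop, just to make the procedure explicit. This is only a sketch: `is_stable` is a hypothetical callback standing in for whatever you do to judge a configuration (for example, running with that many cameras for a day and watching for the error), and the start and step sizes are arbitrary.

```python
from typing import Callable, Optional


def find_breaking_point(
    total_cameras: int,
    start: int,
    step: int,
    is_stable: Callable[[int], bool],
) -> Optional[int]:
    """Add cameras in steps and return the first count that is unstable.

    Returns None if the system stays stable all the way up to total_cameras.
    `is_stable` is a hypothetical check supplied by the caller.
    """
    count = start
    while True:
        count = min(count, total_cameras)  # never test more cameras than exist
        if not is_stable(count):
            return count
        if count == total_cameras:
            return None
        count += step
```

Once a breaking count is known, repeating the test with a different set of the same number of cameras tells you whether it is the load or particular cameras that cause the failure.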

Actually, I used to see this "Invalid token" error every couple of days as well, albeit with ONSSI, which does use the same recording module as Milestone. I initially didn't post because my system is only one server with 20 cameras, but maybe this is relevant anyway.

It happened when my laptop came out of hibernation after I had closed the lid without closing the client. Often, but not always, the windows would say "Invalid Token". When I researched it, the problem seemed to be time related. Apparently, for my version of NetDvr, 6.0b, the token gets refreshed once an hour, so when the laptop came back to life the token would be stale and therefore rejected.

In your case it's possible that a time sync issue between client and server would present the same problem. Are they running off the same NTP source, or have you checked them?

Though you may know this already, this was posted recently on Milestone's support forum to troubleshoot this error:

  1. Restart the Recording Server and/or Management Server computers.
  2. Ensure the Recording Server and Management Server can ping one another and/or resolve each other’s DNS host names.
  3. Ensure the Recording Server can reach the Management Server on port 9993.
  4. Ensure the Management Server can reach the Recording Server on port 7563.
  5. Check for proper time synchronization between the client, Recording Server and Management Server computers.
  6. Disable Windows Firewall.

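Steps 3 and 4 in the list above can be checked with a plain TCP connection test. Here is a minimal sketch using only Python's standard `socket` module; the host names are hypothetical placeholders, and each check has to be run from the machine named in the corresponding step. A successful connect only shows that the port is reachable, not that the service behind it is healthy.

```python
import socket

# Hypothetical host names; replace with your own servers.
MANAGEMENT_SERVER = "mgmt-server.example.local"
RECORDING_SERVER = "rec-server.example.local"


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name did not resolve
        return False


def run_checks(checks):
    """Run each (label, host, port) check and return {label: reachable}."""
    return {label: can_connect(host, port) for label, host, port in checks}


# Step 3: run this list from the Recording Server machine.
RECORDING_SERVER_CHECKS = [("management server :9993", MANAGEMENT_SERVER, 9993)]
# Step 4: run this list from the Management Server machine.
MANAGEMENT_SERVER_CHECKS = [("recording server :7563", RECORDING_SERVER, 7563)]
```

Calling `run_checks(RECORDING_SERVER_CHECKS)` on the Recording Server (and the other list on the Management Server) returns a dictionary of label to True/False, which also covers the host name resolution part of step 2, since an unresolvable name shows up as False.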
Also, the way I ended up figuring it out was by looking at the ImageServer log file, on the server. There, for ONSSI at least, it shows every token transaction between client and server. There should be some diagnostic when a token is rejected. If you are unsure how to find the log file, let me know your XProtect version and I'll be glad to see if I can find it. Hope it helps!

Ok Chris, I'll check timing and the logs.

Thank you very much.

We did the same thing for one of our clients: moved them to 9.0b and their problems went away.

I have been having a similar issue at a site running about 200 cameras spread across 4 servers. They are in a Master/Slave configuration with all clients (~30) simultaneously losing connection to the master. I am getting the "Invalid Token: Cannot Verify Master" error on all Smart Client instances. I have been in contact with Milestone daily for about 2 weeks now with no resolution to the problem yet. In the ImageServer logs I am getting an "HTTP too busy" error. The slaves are also randomly losing connection with the master.

This problem occurred after the servers had their OS upgraded to Windows Server 2008 and Milestone upgraded to 2013.

I will post a resolution if Milestone can find one.

Regarding time... I recall that a 5-minute maximum deviation is all that is allowed in Milestone between servers and cams.

At a minimum, visit each cam and set the time to be the server time.

Another possibility to consider is using the Server as an NTP source. Do a search on "windows 7 pro as ntp server" and you will see many references.

Once established, the cams can refer to the server as their time reference instead of keeping their own time.
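If you collect the clock readings by hand (or by any other means), a few lines of code will flag the devices that have drifted too far. This is a minimal sketch: the 5-minute limit is the figure recalled above and is not verified against Milestone documentation, and the device names in the usage example are made up.

```python
from datetime import datetime, timedelta
from typing import Dict

# Limit as recalled in this thread; treat it as an assumption.
MAX_DEVIATION = timedelta(minutes=5)


def out_of_sync(
    server_time: datetime,
    device_times: Dict[str, datetime],
    max_deviation: timedelta = MAX_DEVIATION,
) -> Dict[str, timedelta]:
    """Return {device: offset} for devices whose clock differs from the
    server clock by more than max_deviation (positive offset = device ahead)."""
    return {
        name: reading - server_time
        for name, reading in device_times.items()
        if abs(reading - server_time) > max_deviation
    }


# Usage with made-up readings taken at (roughly) the same moment:
server = datetime(2014, 6, 1, 12, 0, 0)
readings = {
    "cam-lobby": datetime(2014, 6, 1, 12, 0, 20),   # 20 s ahead: fine
    "cam-dock": datetime(2014, 6, 1, 11, 52, 0),    # 8 min behind: flagged
    "client-pc": datetime(2014, 6, 1, 12, 6, 30),   # 6.5 min ahead: flagged
}
print(out_of_sync(server, readings))
```

All readings need to be in the same time zone (or all in UTC) for the comparison to mean anything.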

If the issue is not the time...then you need to understand what resources the server/client are using. Task Manager and Resource Monitor are your first-line tools for this. In general, you do not want to see any resource in excess of 80% constantly.

On the storage side, observe the 'Queue length' of the storage target. A general rule here is one per physical drive in use. For example, a RAID5 with 8 drives can show a value of 7 for very short amounts of time.
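The difference between a brief spike and a resource that is "constantly" over its limit can be made concrete with a small helper applied to samples you have logged (CPU percent, queue length, and so on). This is only a sketch: treating "constantly" as at least 90% of samples over the limit is my own assumption, not a rule from Milestone or Microsoft.

```python
from typing import Sequence


def sustained_over(
    samples: Sequence[float],
    limit: float,
    min_fraction: float = 0.9,  # assumed meaning of "constantly"
) -> bool:
    """Return True if at least min_fraction of the samples exceed limit."""
    if not samples:
        return False
    over = sum(1 for value in samples if value > limit)
    return over / len(samples) >= min_fraction


# CPU percent sampled once a minute: every sample above 80, so this is a problem.
print(sustained_over([85, 90, 88, 92], limit=80))
# Disk queue length on an 8-drive RAID5, limit 7: one short spike is tolerated.
print(sustained_over([2, 9, 3, 1], limit=7))
```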

If you see that the network is key...then turn to tools like Wireshark and trace with a filter to a specific camera if possible.

Regarding the NIC hardware itself along with the Windows drivers... be sure you are using better-class hardware here. For example, use Intel over Realtek. And of course be sure to have a current set of device drivers in play.

