Subscriber Discussion

Milestone Database Rebuild After Recording Server Service Restart

Undisclosed Manufacturer #1
Nov 06, 2018

I am frequently seeing a problem in Milestone (Pro+ and Corp) when manually stopping the Recording Server service on systems with large archive drives.  When the service is stopped and then restarted, it enters a state where it begins a full database check and a rebuild of its cache file.  An error message states that Milestone could not make a connection to the Default Storage Path, yet the drive is available and functioning as expected.

What is the cause of this condition?  The only similar thing I found on the Milestone Support Community page stated that the "reasons could be a power failure, a crash, or disk problems. This rebuild can take a long time, depending on the size of the archive. Anything between a few hours and a few days is quite normal."   You can say that again!  It takes a long time!

I suspect that the drive subsystem can't handle the load (write throughput).  Could this be the cause?  If so, what is the best way or tool to measure whether a server's disk subsystem can handle a specific video load?  Does Milestone provide such a tool?

Other than reducing that load, are there any other best practices to follow/apply when trying to resolve this issue?

Any assistance will be greatly appreciated as always!

Mike Dotson
Nov 06, 2018
Formerly of Seneca • IPVMU Certified

Milestone keeps logs of system functions and status in c:\programdata\milestone\recordingserver\logs.

The ones you want to look at are the recordingserver log, the devicehandling log, and the ones that have "database" in the name.

The devicehandling one will have a message that includes the word 'overflow'.   If you see this a lot, that is indicative of an issue.

The database ones tell you what it is trying to do out in the storage area.  For example, you can see when the archiving function actually starts and ends.

If you are not archiving and just have single-level storage (usually called the LiveDB), you can see messages about its status as well.

A bad message to see is that the LiveDB is full, which will cause overflows in the other log file as it tries to clean up.

Also, a large LiveDB-only installation can have this exact issue you described when the SQL index does not match what is out on the LiveDB.   The reason it takes so long is that, without the archive process, the LiveDB is the entire size of the storage rather than the size defined for use with the archive process.

The recordingserver log also has assorted other status messages that can suggest reasons for the issue at hand.
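
If you want a quick way to tally those overflow messages across the log folder, here is a minimal Python sketch. The log folder path and the assumption that the logs are plain text with one message per line are mine; adjust both to match what you actually see on your server.

```python
# Minimal sketch: count "overflow" entries per log file in the Recording Server
# log folder. The folder path and the plain-text, one-message-per-line format
# are assumptions; adjust both to match your installation.
from pathlib import Path

LOG_DIR = Path(r"C:\ProgramData\Milestone\RecordingServer\Logs")  # assumed location

def count_overflows(log_dir: Path) -> dict:
    counts = {}
    for log_file in sorted(log_dir.glob("*.log")):
        try:
            text = log_file.read_text(errors="ignore")
        except OSError:
            continue  # skip files the service currently has locked
        hits = sum("overflow" in line.lower() for line in text.splitlines())
        if hits:
            counts[log_file.name] = hits
    return counts

if __name__ == "__main__":
    for name, hits in count_overflows(LOG_DIR).items():
        print(f"{name}: {hits} overflow message(s)")
```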

Mike Dotson
Nov 06, 2018
Formerly of Seneca • IPVMU Certified

Here are two related Milestone articles in case you haven't seen them:

Article Number: 000002377

Article Number: 000003614

Josh Hendricks
Nov 06, 2018
Milestone Systems

Also, a large LiveDB-only installation can have this exact issue you described when the SQL index does not match what is out on the LiveDB. The reason it takes so long is that, without the archive process, the LiveDB is the entire size of the storage rather than the size defined for use with the archive process.

This is partially true - rebuilding the index for a live-only system causes more down time because the live drive(s) must be completely indexed before the Recording Server will pull a live stream or store any new recordings. I only want to point out that there is zero coordination between Recording Server and SQL. The media database index is stored as a cache.xml and archives_cache.xml file in the folder where the recordings are stored, and the SQL server has no knowledge of the contents or index.
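
If you want to sanity-check a storage bank directly, a quick way is to look for those index files yourself. Here is a minimal Python sketch; the storage paths are placeholders I made up, so substitute the recording and archive paths configured on your system.

```python
# Minimal sketch: report whether the media database index files (cache.xml and
# archives_cache.xml) exist in each recording storage path and when they were
# last written. The storage paths below are placeholders, not real defaults.
import datetime as dt
from pathlib import Path

STORAGE_PATHS = [Path(r"D:\MediaDatabase"), Path(r"E:\ArchiveStorage")]  # placeholders
INDEX_FILES = ["cache.xml", "archives_cache.xml"]

for storage in STORAGE_PATHS:
    for name in INDEX_FILES:
        index_file = storage / name
        if index_file.exists():
            modified = dt.datetime.fromtimestamp(index_file.stat().st_mtime)
            print(f"{index_file}: last written {modified:%Y-%m-%d %H:%M}")
        else:
            print(f"{index_file}: missing (a reindex may be needed at next startup)")
```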

For a site with failover recording servers, a live-only configuration is a great setup assuming your failovers have the capacity to record for at least as long as a typical rebuild period.

For a system with a live and archive drive, the live drive should ideally be sized to handle recording for a duration equal to or greater than the time it would take to perform a reindex on the archive, to avoid the chance that the live drive gets full before the archive drive can complete a reindex.

 

Mike Dotson
Nov 06, 2018
Formerly of Seneca • IPVMU Certified

Thanks for the clarification.

Undisclosed Integrator #2
Nov 06, 2018

I'd agree with Mike if you have a large LiveDB. I recently had a support call with Milestone and the technician advised setting up an archive on our Corporate instance. Even though their training said it was okay to keep one large LiveDB, he advised me to create an archive for this exact reason: when it rebuilds, it does not need to rebuild the archive before recording starts.

Josh Hendricks
Nov 06, 2018
Milestone Systems

Hi UM1, in short: make sure to update to the latest version you're entitled to and/or apply any available hotfixes for the installed version. If the Recording Server is unable to stop gracefully, or the service takes more than a few minutes to stop, it's something we should look at as a support case. It's possible that it could be caused by the storage system, and we can help you determine whether or not that is the case.

Within the last year an issue was resolved where, on some systems, the Recording Server could fail to write the archives_cache.xml file when the service stops. Without a valid archives_cache.xml, the Recording Server needs to enumerate the contents of the media database on startup, and this can take a lot of time depending on the amount of data and the storage system.

The message you see during startup while the media database index is rebuilding, suggesting that the Recording Server could not connect to the database, comes from the Recording Server periodically asking the embedded media database service "What's the status of storage X?" The media database service hasn't "mounted" that storage bank yet because it still needs to be reindexed, so it responds with "I don't know." It doesn't indicate a problem connecting to the drive at the storage level, just that the media database isn't ready yet at the application level.

With regard to the debate of having a single large drive versus a live drive with a separate archive drive: at the simplest level, archiving is work, and having a single storage configuration with a single path to store all recordings means less work for the Recording Server. There is nothing wrong with this design, but there may be legitimate reasons why it's not the right design for a particular site.

Namely, if a re-index is necessary, the Recording Server (in current versions) will not start until all live drives are online. If it takes 6-12 hours to rebuild the index for a 60TB live drive, that means no live video or recording during that time frame unless you have a failover recording server available.

On the flip side, if you have a small live drive with a separate archive, a reindex may only take 5 minutes and then you'll have live/recording ability in a short time without the need of a failover recording server. But if the archive drive needs to be re-indexed too, it won't be able to receive data from the live drive until that is completed. In this case, there is a risk that the live drive will reach capacity before it can complete an archive session. When that happens, data will be dropped from the live drive in a first-in first-out fashion until archiving is possible.
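
To make that trade-off concrete, here is a small back-of-the-envelope calculation. Every number in it is an assumption chosen purely for illustration; plug in your own live drive size, aggregate camera bitrate, and an observed or estimated reindex time.

```python
# Back-of-the-envelope check: will the live drive fill up before an archive
# reindex finishes? Every value here is an illustrative assumption.
LIVE_DRIVE_TB = 4.0          # usable live storage in TB (assumed)
TOTAL_BITRATE_MBPS = 400.0   # aggregate camera bitrate in megabits/second (assumed)
ARCHIVE_REINDEX_HOURS = 8.0  # assumed time to reindex the archive drive

# Convert the aggregate bitrate into terabytes written per hour (decimal TB).
tb_per_hour = TOTAL_BITRATE_MBPS / 8 * 3600 / 1_000_000  # Mbit/s -> MB/s -> MB/h -> TB/h

hours_until_full = LIVE_DRIVE_TB / tb_per_hour
print(f"Live drive fills in roughly {hours_until_full:.1f} hours at {TOTAL_BITRATE_MBPS:.0f} Mbit/s")

if hours_until_full < ARCHIVE_REINDEX_HOURS:
    print("Risk: the live drive may fill before the archive reindex completes (FIFO data loss)")
else:
    print("The live drive should outlast the assumed archive reindex window")
```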

There are other design considerations, like hard drive expense: small, fast drives for live and big, slower, less expensive drives for archive. Some storage vendors will spin down disks while not in use, and the energy savings and eco-friendliness of that may make it attractive to separate archives from the live drive.

Jared Tarter
Nov 06, 2018
Milestone Systems

Hi UM1,

While this condition is something that can happen at times, it is definitely not something that should be happening on a regular basis (most systems will never see it happen).  If you are continually seeing it, I would strongly recommend opening a case with our tech support so they can dig into it and find the underlying cause.
