IPVMU Certified | 05/19/15 11:48am
Even non-networked electronic devices can benefit from a restart every so often. I remember my old iPod (with a FireWire connection) used to occasionally freeze so that all controls would stop responding. It was not a networked device, but restarting it would clear the occasional freeze.
There are many causes of problems that can be overcome by a restart. One common cause is a memory leak. Memory leaks, due to buggy code, can use up all the free RAM over a period of time until the device stops working correctly. Restarting the device doesn't fix the underlying bug, but it does purge the RAM so the device can keep working for a while until the leak (probably) builds up again.
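To make that failure mode concrete, here is a hedged sketch, in Python rather than the C a camera would actually run, of what a leak looks like: a hypothetical long-lived handler that keeps a reference to every frame it touches, so its memory use only ever grows until a restart empties it.

```python
# Sketch of a memory leak: this hypothetical frame handler caches every
# frame it processes but never evicts anything, so memory use grows for
# the life of the process. A restart empties the cache (the RAM), which
# is why rebooting "fixes" it -- until the cache fills up again.

class LeakyFrameHandler:
    def __init__(self):
        self._cache = []  # grows forever: nothing is ever removed

    def handle(self, frame: bytes) -> None:
        self._cache.append(frame)  # bug: reference kept indefinitely

    @property
    def cached_bytes(self) -> int:
        return sum(len(f) for f in self._cache)


handler = LeakyFrameHandler()
for _ in range(1000):
    handler.handle(b"\x00" * 1024)  # a 1 KiB "frame"

print(handler.cached_bytes)  # 1000 frames x 1024 bytes = 1024000
```

After a restart the process starts from an empty cache, which is exactly the temporary relief the reboot provides.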
A, how often is this? As Luke mentions, there is some benefit sometimes, but if it is truly 'often', say once a month or so, that points to a bigger issue.
Can you describe what type of cameras you have and what type of environment this is in? i.e., indoor office, outdoor perimeter, or...?
IPVMU Certified | 05/19/15 05:18pm
Cameras are usually Unix-derivative machines with many dependent OS- and application-level processes, all of which may need to be running for the camera to be deemed 'working'.
Often when new software is deployed, even though it has passed QA, it may still crash after working for days. As Luke said, a process may be greedy and continually request that more memory be allocated for new structures, without releasing memory it is done with. Eventually there is none left, and the program may crash, hypocritically blaming the OS for not being able to provide any more. Although called a memory leak, it's more like memory hoarding. New developer test tools have been making inroads into eliminating this bane of (typically C) programming.
With regard to networking, TCP/IP connections have numerous error and timeout states in various combinations that may not appear even in thorough testing. So a program may crash every so often, roulette-style, when 00 comes up. Over time companies find these rarer occurrences and fix them, or else add the affected processes to a watchdog list so they are automatically restarted when they fail.
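That watchdog-list idea can be sketched as a loop that checks whether each listed process is alive and restarts any that died. A minimal illustration (the process names and restart behavior are hypothetical; a real camera would do this in C or a shell script on the device):

```python
# Hedged sketch of a process watchdog. One pass over the watchdog list
# restarts anything that has died, rather than waiting for a root-cause fix.

WATCHED = ["rtsp-server", "motion-detect"]  # hypothetical process names

def watchdog_pass(watched, is_running, restart):
    """Check each watched process; restart (and report) any that died."""
    restarted = []
    for name in watched:
        if not is_running(name):
            restart(name)  # e.g. re-exec the binary or signal init
            restarted.append(name)
    return restarted

# Demo with stubbed checks: pretend "motion-detect" has crashed.
alive = {"rtsp-server": True, "motion-detect": False}
restart_log = []
print(watchdog_pass(WATCHED, lambda name: alive[name], restart_log.append))
# -> ['motion-detect']
```

In practice the liveness check and the restart action would be real process-table lookups and service restarts, run on a timer.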
Also, power glitches can cause machines to end up in random logic states that leave them in limbo. External watchdog devices, like some managed switches, can detect these failures and perform a clean power cycle automatically.
When your tech reboots, he should pull the system log and retain it with the system history. This can help you determine whether the proximate cause of failure is a repetitive one, even if it's not clear what the log is saying. Forwarding the logs to the manufacturer will help them as well.
Though no system should have to rely on them, scheduled reboots can help with SOME failures, like memory leaks; other failures must be handled by watchdog programs.
Linux server uptime can be measured in years for mission-critical systems. I wonder, then, if size and cost are the differentiators. You wouldn't necessarily expect multi-level fail-over in a camera server...
IP cameras are embedded systems that should be able to run continuously and reliably indefinitely. The workload is clearly defined and they are rarely asked to run 'ad-hoc' programs that might risk hanging up the system. So there's really no excuse for cameras hanging and requiring a reset. You should work with the camera vendor to resolve the issue.
Troubleshoot incrementally. What do the link lights look like when the cameras are in this state? Can you ping them? If you can ping them, can you communicate with them in any other way, such as via their API (while they presumably do not stream video)? Do you see the problem in some cameras more than others, or in cameras with certain firmware versions? What about on certain networks?
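Those incremental checks amount to a health-check ladder: run the cheapest test first, and the first layer that fails narrows down where the fault is. A hedged sketch (the host is a placeholder and the probes are generic, not a specific camera's API):

```python
import subprocess
import urllib.error
import urllib.request

def can_ping(host: str) -> bool:
    """Layer 3: does the camera answer ICMP? (Shells out to `ping`.)"""
    return subprocess.run(["ping", "-c", "1", "-W", "2", host],
                          capture_output=True).returncode == 0

def http_reachable(host: str, timeout: float = 3.0) -> bool:
    """Layer 7: does the camera's web interface answer at all?"""
    try:
        urllib.request.urlopen(f"http://{host}/", timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # server answered (e.g. 401) -- the app is alive
    except OSError:
        return False

def diagnose(host: str, ping=can_ping, http=http_reachable) -> str:
    """Walk up the stack; the first failing layer narrows the fault."""
    if not ping(host):
        return "no ICMP reply: check link lights, cabling, PoE, switch port"
    if not http(host):
        return "pings but no HTTP: OS is up, camera application likely hung"
    return "network and web UI respond: look at the streaming/VMS side"

# Demo with stubbed probes (no real camera needed):
print(diagnose("192.0.2.10", ping=lambda h: True, http=lambda h: False))
```

A camera that pings but won't answer HTTP points at a hung application rather than a network problem, which is exactly the distinction the questions above are driving at.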
That being said, one area to look at is your network's router (not necessarily the switch(es), since those are not what controls the network). Are the cameras using DHCP? If so, think about situations where a lease expires and the camera needs to reacquire an IP. That too should be a smooth process, but it could expose 'defective' behavior in the way the camera reacquires an IP.
I had a similar problem on a mixed network (lots of traffic having nothing to do with the CCTV), where a router between the cameras and the VMS stopped working after a couple of days. I solved it by configuring the router to restart every night at 3 am. There is some route redirection by another router, which I believe is messing up the ARP tables. So messy company networks may create problems. A simple LAN with only switches, computers, and cameras is usually no problem. But when you start messing with routers, IP networking is no longer "plug-and-play".
So I guess my point is that sometimes smartass solutions may lose to "simplify, simplify, simplify" in the long run. Not because the smartass solution shouldn't work, but because one or two devices may not be working as they should.