Why Do Networked Devices Need To Be Restarted Every So Often?

Our service techs often get to sites where customers are complaining of a camera outage. They reset the camera and now everything works fine. Why does this happen?


Even non-networked electronic devices can benefit from a restart every so often. I remember my old iPod (with a FireWire connection) used to occasionally freeze, and all controls would fail to respond. It was not a networked device, but restarting it would overcome the occasional freeze.

There are many causes of problems which can be overcome by a restart. One common cause of such problems is a memory leak. Memory leaks, due to buggy code, can lead to all the free RAM being used up over a period of time so the device stops working correctly. Restarting the device doesn't fix the problem but it does purge the RAM so the device can continue working for a while until the memory leak (probably) occurs again.

A, how often is this? As Luke mentions, an occasional restart has some benefit, but if it is truly 'often', say once a month or so, that points to a bigger issue.

Can you describe what type of cameras you have and what type of environment they are in, e.g. indoor office, outdoor perimeter, or something else?

Cameras are usually Unix-derivative machines with many dependent OS and application-level processes, all of which may need to be running for the camera to be deemed 'working'.

Often when new software is deployed, even though it has passed QA, it may still crash after working for days. As Luke said, a process may be greedy and continually request more memory be allocated for new structures, without releasing memory that it is done with. Eventually there is no more left, and the program may crash, hypocritically blaming the OS for not being able to provide any more. Although called a memory leak, it's more like memory hoarding. New developer test tools have been making inroads into eliminating this bane of C (typically) programming.
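The hoarding pattern can be sketched in a few lines. This is Python rather than C, and every name in it (`_frame_cache`, `handle_frame_leaky`, and so on) is invented for illustration; it only shows how references that are never released grow without bound, while a bounded buffer stays flat.

```python
from collections import deque
from typing import Tuple

# Hypothetical per-frame handler that "hoards": every frame is appended
# to a module-level list and nothing is ever removed.
_frame_cache = []

def handle_frame_leaky(frame: bytes) -> None:
    _frame_cache.append(frame)  # reference kept forever, so memory grows

# Bounded alternative: a deque with maxlen silently drops the oldest entry.
_recent_frames = deque(maxlen=100)

def handle_frame_bounded(frame: bytes) -> None:
    _recent_frames.append(frame)

def simulate(n_frames: int, frame_size: int = 1024) -> Tuple[int, int]:
    """Feed n_frames through both handlers; return (leaky, bounded) counts."""
    _frame_cache.clear()
    _recent_frames.clear()
    for _ in range(n_frames):
        frame = bytes(frame_size)
        handle_frame_leaky(frame)
        handle_frame_bounded(frame)
    return len(_frame_cache), len(_recent_frames)
```

After `simulate(1000)` the leaky list holds all 1000 frames, while the bounded buffer holds only the last 100. A reboot corresponds to emptying `_frame_cache`: the symptom goes away, the bug does not.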

With regard to networking, TCP/IP connections have numerous error and timeout states, in various combinations, that may not appear even in thorough testing. So a program may crash every so often, roulette-style, when 00 comes up. Over time companies find these rarer occurrences and fix them, or else add them to a watchdog list of processes that will be automatically restarted when they fail.
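A software watchdog of this kind can be very small. The following is a minimal sketch, not how any particular camera firmware does it: `supervise` is a made-up helper that reruns a command whenever it exits with a non-zero status, and the restart cap is an assumption added so the example terminates.

```python
import subprocess
import sys
import time
from typing import List

def supervise(cmd: List[str], max_restarts: int = 3, delay: float = 0.0) -> int:
    """Run cmd and restart it whenever it exits with a non-zero status.

    Returns the number of restarts performed. Stops when the process
    exits cleanly (status 0) or when max_restarts has been reached.
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return restarts          # clean exit, nothing to do
        if restarts >= max_restarts:
            return restarts          # give up; a real watchdog might alert here
        restarts += 1
        time.sleep(delay)            # brief back-off before the next attempt
```

For example, `supervise([sys.executable, "-c", "import sys; sys.exit(1)"], max_restarts=2)` launches the failing command three times and returns 2. A production watchdog would normally also log each restart so the underlying fault can still be found.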

Also, power glitches can cause machines to end up in random logic states that leave them in limbo. External Watchdog devices, like some managed switches, can detect these failures and do a clean power cycle automatically.

When your tech reboots, he should pull the system log and retain it with the system history. This can help you determine whether the proximate cause of failure is a repetitive one, even if it's not clear what the log is saying. Forwarding the logs to the manufacturer will help them as well.

Though no system should have to rely on them, scheduled reboots can help with SOME failures, like memory leaks; others must rely on watchdog programs.

Chris,

Can you recommend some watchdog programs or switches that have this feature? Constant lock-ups aren't a problem, but I'm responsible for over 2000 channels of video internationally and it sure would be nice not to have to tell the client to pull the patch cable on a camera and plug it back in to reset. Managed switches and manually dropping power help save on that phone call, but I have not figured out how to have the switch automatically reset troubled devices, or even identify whether a device is troubled. We're using switches like the HP 1920/1910 series and the Cisco 300 series, if that helps.

It can be a little bit time consuming to set up initially, but Nagios can do what you want in terms of monitoring. If you have managed switches you could even script an automatic PoE toggle in the event a camera stops responding after X time.
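The "toggle PoE after X time" logic might look roughly like the sketch below. It is standalone Python, not Nagios configuration, and it makes two assumptions: the camera is considered alive if a TCP connection to its web port succeeds, and `power_cycle` is a placeholder callback you would have to write for your own switch (via SNMP or its CLI), since that part differs per vendor.

```python
import socket
from typing import Callable

def is_reachable(host: str, port: int = 80, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

class CameraMonitor:
    """Calls power_cycle(host) after `threshold` consecutive failed checks."""

    def __init__(self, host: str,
                 check: Callable[[str], bool],
                 power_cycle: Callable[[str], None],
                 threshold: int = 3) -> None:
        self.host = host
        self.check = check
        self.power_cycle = power_cycle   # hypothetical: toggles the PoE port
        self.threshold = threshold
        self.failures = 0

    def poll(self) -> bool:
        """Run one check; return True if a power cycle was triggered."""
        if self.check(self.host):
            self.failures = 0            # any success resets the counter
            return False
        self.failures += 1
        if self.failures >= self.threshold:
            self.power_cycle(self.host)
            self.failures = 0            # start counting again after the reset
            return True
        return False
```

Calling `poll()` once a minute with `threshold=3` means a camera is power-cycled after roughly three minutes of silence. Requiring several consecutive failures avoids resetting a camera over a single dropped packet.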

Linux server uptime can be measured in years for mission-critical systems. I wonder, then, if size and cost are the differentiators. You wouldn't necessarily expect multi-level fail-over in a camera server...

IP Cameras are embedded systems that should be able to run continuously and reliably indefinitely. The workload is clearly defined and they rarely are asked to run 'ad-hoc' programs that might risk hanging up the system. So there's really no excuse for cameras hanging and requiring a reset. You should work with the camera vendor to resolve the issue.

Troubleshoot incrementally. What do the link lights look like when the cameras are in this state? Can you ping them? If you can ping them, can you communicate with them in any other way, such as via their API (while they presumably do not stream video)? Do you see the problem in some cameras more than others, or in cameras with certain firmware versions? What about on certain networks?
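The software half of that sequence can be scripted so every tech runs the same checks in the same order. A minimal sketch, assuming the system `ping` command is available and that the camera serves HTTP on its root URL (a placeholder; a real API path would be vendor-specific). All function names are invented here.

```python
import platform
import subprocess
import urllib.error
import urllib.request
from typing import Callable, List, Optional, Tuple

Check = Tuple[str, Callable[[], bool]]

def first_failure(checks: List[Check]) -> Optional[str]:
    """Run checks in order; return the name of the first that fails, or None."""
    for name, probe in checks:
        try:
            ok = probe()
        except Exception:
            ok = False               # a probe that blows up counts as a failure
        if not ok:
            return name
    return None

def ping_ok(host: str) -> bool:
    """Send one ICMP echo via the system ping command (exit codes vary by OS)."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(["ping", count_flag, "1", host],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0

def http_answers(url: str, timeout: float = 3.0) -> bool:
    """True if the web server replies at all, even with an error status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        return True                  # server is alive; it just rejected the request
    except (urllib.error.URLError, OSError):
        return False

def camera_checks(host: str) -> List[Check]:
    # Ordered from the lowest layer up; the URL is a placeholder.
    return [
        ("ping", lambda: ping_ok(host)),
        ("http", lambda: http_answers(f"http://{host}/")),
    ]
```

`first_failure(camera_checks("192.0.2.10"))` returns `"ping"` if the camera is off the network entirely, `"http"` if it answers ping but its web server is dead, and `None` if both respond. Recording that result per camera and firmware version makes the patterns asked about above easier to spot.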

That being said, one area to look at is your network's router (not necessarily the switch(es), since those are not what controls the network). Are the cameras using DHCP? If so, think about situations where a lease expires and the camera needs to reacquire an IP. That too should be a smooth process, but it could expose 'defective' behavior in the way the camera reacquires an IP.

I cannot agree more here! DHCP might be a great time saver in the initial stages of camera deployment, BUT it can easily become an issue in the long run. Especially if the network admin/installer is mixing PC workstations, network laser printers, VoIP phones and all sorts of equipment on the same LAN, your neatly ordered hosts can quickly become a mess and you will waste time troubleshooting. Ideally all cameras and DVR/NVRs should be on their own security LAN, but this is NOT always the case, for several reasons beyond this topic.

What has worked for me so far:

- Either manually change each camera to use a static IP (time consuming). The manufacturer-provided search utility might or might not help, depending on how well the software was designed. I've worked with some utilities that do no more than show you a list of found cameras based on each unique MAC address. Others can batch-assign a unique IP to each camera without you having to open a browser, type the IP and get to the exact menu where you can change the IP.

- Or go into the Windows Server DHCP server settings and create a reservation for each dynamically assigned IP address, so that the next time it renews, each camera will be re-assigned the same IP address it had before.

- Then, on an Excel sheet, record all this information (IP addresses, administrator passwords, network topology, etc.) so that technicians can follow up later without wasting excessive time in the field singling out the failure points. MOST IMPORTANTLY I think, from my experience servicing demanding customers' case tickets, DO NOT FORGET to send yourself an e-mail with this recorded info and type a clear description in the e-mail's subject field. You will be happy you did when you are inside a server room fumbling with your phone trying to search for that e-mail (coz no one remembers any IP address after a week). Send another one to your customer (if you don't mind risking that said customer will give it to another sub-contractor and not contact you again for future work related to this initial project).
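If the discovery utility can export a list of MAC addresses, the address plan and the record sheet can be generated together. The sketch below is only an illustration: `build_ip_plan` and `plan_to_csv` are invented names, the starting offset of 10 is an arbitrary assumption, and it covers the MAC-to-IP mapping only. The CSV output opens directly in Excel.

```python
import csv
import io
import ipaddress
from typing import Dict, List

def build_ip_plan(macs: List[str], network: str, first_host: int = 10) -> Dict[str, str]:
    """Map each MAC to a consecutive address in `network`.

    Addresses start at network_address + first_host. MACs are lower-cased,
    de-duplicated and sorted so the plan is reproducible.
    """
    net = ipaddress.ip_network(network)
    base = net.network_address + first_host
    plan: Dict[str, str] = {}
    for i, mac in enumerate(sorted({m.lower() for m in macs})):
        addr = base + i
        if addr not in net or addr == net.broadcast_address:
            raise ValueError("network too small for the camera list")
        plan[mac] = str(addr)
    return plan

def plan_to_csv(plan: Dict[str, str]) -> str:
    """Render the plan as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["mac", "ip"])
    for mac, ip in plan.items():
        writer.writerow([mac, ip])
    return buf.getvalue()
```

The same plan can then be used for either approach above: typed into each camera as a static address, or entered as DHCP reservations.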

I had a similar problem on a mixed network (lots of traffic not having to do with the CCTV), where a router between the cameras and the VMS stopped working after a couple of days. I solved it by configuring the router to restart every night at 3 am. There is some route redirection by another router that I believe is messing up the ARP tables. So messy company networks may create problems. A simple LAN with only switches, computers and cameras is usually no problem. But when you start messing with routers, IP networking is no longer "plug-and-play".

So I guess my point is that sometimes smartass solutions may lose to "simplify, simplify, simplify" in the long run. Not because the smartass solution shouldn't work, but because one or two devices may not be working as they should.