I work in the research and development field, so keep in mind that I'm a geek whose products are seldom required to fully function in the real world :)
My thinking is, you are looking for a software tool that can be reasonably automated to serve your needs.
These ideas cannot provide a complete answer to your question, but they may be supporting elements of your solution.
--------------------------------------------------------------
Leverage the properties of the Scale-Invariant Feature Transform (SIFT) for image analysis. I haven't investigated, but I expect you can find implementations online capable of operating on video of varying frame rates and resolutions. It requires two inputs: an image or a video stream, and a reference image you want to match within that image or stream.
Taking a leaf from IPVM's playbook, build a large poster representative of issues you wish to test. The details are not terribly important: traditional eye charts with increasingly small letters, standard video test charts with radial lines and such, imagery of objects of interest, or other test charts could all be suitable.
Try to place your poster within each camera's field of view, on the boresight of the camera. Off-boresight placement can limit SIFT relevance because it may require a mathematical camera model to compensate for non-linear distortions, which would unnecessarily complicate your solution.
You should try to place the poster at such a range that, for each camera, it takes up the same percentage of the field of view. That normalizes the result for effective range (e.g., even a "bad" camera can produce a high score if the reference image nearly fills its field of view, compared to a "good" camera where the reference image fills only 5% of the field of view).
SIFT provides an indication of the position of the reference image within the camera's field of view, as well as an indication of the "goodness" of the match. If the video imagery is degraded, the match will be poorer than if the imagery is crisp and clear.
Cons:
1) you have to figure out what SIFT is, how to use it, where to get a suitable implementation, etc.
2) the approach only works during those times that your poster is in the camera field of view.
Pros:
1) it's a pretty reliable quality measure
2) relevance: your poster is representative of your camera imagery needs
Variations:
Once you have investigated, developed, and placed this arrow in your quiver, you can implement it in a number of ways. A continuous quality measure can be available in those fields of view that have sufficiently detailed permanent feature sets. For example, a field of view with resolvable text or some fairly reliable constant detail can be continuously monitored for image quality.
You might use a high-quality camera with a zoom lens that allows you to capture an image from the same point of view as each camera of interest, with similar parameters. Snap the image of that space without any transient objects such as people or vehicles in the scene: this becomes the reference image. Run SIFT on the video stream vs. the reference image. Note quality and degradation over time.
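A minimal sketch (entirely my own naming) of the "note quality and degradation with time" part: feed whatever per-frame quality score you compute (SIFT match fraction, entropy, etc.) into a rolling baseline and flag sudden drops.

```python
from collections import deque

class QualityMonitor:
    """Rolling-baseline watcher for any per-frame quality score (0..1)."""
    def __init__(self, window=100, drop_fraction=0.5, warmup=10):
        self.history = deque(maxlen=window)
        self.drop_fraction = drop_fraction
        self.warmup = warmup

    def update(self, score):
        """Return True when score falls below drop_fraction of the recent mean."""
        degraded = (len(self.history) >= self.warmup and
                    score < self.drop_fraction *
                    (sum(self.history) / len(self.history)))
        self.history.append(score)
        return degraded

mon = QualityMonitor()
alerts = [mon.update(s) for s in [0.9] * 50 + [0.2] * 5]
# healthy frames raise no alert; the sudden drop to 0.2 does
```

The thresholds (window, drop fraction) are arbitrary placeholders; you'd tune them against how noisy your chosen score is.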
--------------------------------------------------------------
Alternatively, there are a number of off-the-shelf functions you can use to provide a first-pass approximation of image and video quality. One example is information entropy. A normalized entropy function returns a result between 0 and 1 that represents the amount of information in the sample. You can see how this might apply. A blurry image will have less information content in each frame than will a crisp image. A camera with "stuff" on the lens will provide less information content over time than will an un-occluded one. An image with impaired dynamic range will have less information content than a wide-dynamic-range image.
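To make the 0-to-1 range concrete: Shannon entropy of an 8-bit histogram tops out at log2(256) = 8 bits, so dividing by 8 normalizes it. A sketch with NumPy (the function name is my own):

```python
import numpy as np

def normalized_entropy(image_u8):
    """Shannon entropy of the 8-bit histogram, scaled to 0..1."""
    hist = np.bincount(image_u8.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]                                   # avoid log(0)
    return float(-(p * np.log2(p)).sum() / np.log2(256))

flat = np.full((100, 100), 128, dtype=np.uint8)    # featureless scene
noisy = np.random.default_rng(0).integers(0, 256, (100, 100), dtype=np.uint8)
# flat -> 0.0 (no information); full-range noise -> close to 1.0
```

Note this histogram version measures dynamic-range and contrast problems well; for blur you'd typically see it drop because blur squeezes the histogram toward the middle values.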
Just as a quick example of how an information entropy function can be useful in video, we use it for auto-focus. Calculate the entropy of an image, auto-change the focus a few steps, and calculate the entropy of the resultant image. Entropy (information content) either improved (+) or degraded (-). Auto-change the focus by an amount proportional to that change in entropy. Repeat. Result: continual small focus adjustments about the point of perfect focus.
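The loop above can be sketched as a simple hill-climb. The "lens" below is a made-up toy model (entropy falling off linearly from a best-focus position), purely to show the probe-and-step mechanics:

```python
def simulated_entropy(focus, best_focus=42.0):
    # Toy stand-in for a real camera: entropy peaks at best_focus
    return 1.0 - min(1.0, abs(focus - best_focus) / 100.0)

def autofocus(focus, gain=80.0, probe=0.5, iterations=150):
    for _ in range(iterations):
        e0 = simulated_entropy(focus)
        e1 = simulated_entropy(focus + probe)   # probe a small focus change
        focus += gain * (e1 - e0)               # step proportional to change
    return focus

final = autofocus(10.0)
# final should have converged near the (simulated) best focus of 42.0
```

The gain and probe step are invented numbers; on real hardware they'd be tuned to the lens, and the loop keeps making small corrections around peak focus just as described above.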
Also, we run entropy across two domains. First, we use it on each video frame. This is used for auto-focus and general quality assessment of blurry, foggy (e.g., dusty lens), or low-dynamic-range video. Second, we use it over time on the video stream. This selects segments of interest across video streams, because a static scene has low information content (once you've seen one frame, you can mostly predict all subsequent frames).
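The temporal domain in sketch form: take the entropy of the frame-to-frame difference. A static pair differences to all zeros (entropy 0); motion lights up the difference histogram. The synthetic frames and helper names here are my own stand-ins:

```python
import numpy as np

def normalized_entropy(values_u8):
    """Shannon entropy of an 8-bit histogram, scaled to 0..1."""
    hist = np.bincount(values_u8.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum() / np.log2(256))

def temporal_entropy(frame_a, frame_b):
    """Entropy of the inter-frame difference: ~0 for a static scene."""
    diff = np.abs(frame_a.astype(int) - frame_b.astype(int)).astype(np.uint8)
    return normalized_entropy(diff)

rng = np.random.default_rng(0)
scene = rng.integers(0, 256, (120, 160), dtype=np.uint8)
static_next = scene.copy()                                # nothing moved
active_next = scene.copy()
active_next[40:80, 40:80] = rng.integers(0, 256, (40, 40), dtype=np.uint8)
# temporal_entropy(scene, static_next) -> 0.0; with motion it rises
```

Thresholding that temporal value is one way to pick out the segments of interest mentioned above.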
--------------------------------------------------------------
I can see where this wouldn't be a high priority, but wouldn't it be interesting if VMS manufacturers embedded such a capability for any arbitrary video input, either via a generic quality metric such as entropy, or programmed against a manufacturer-specific test chart (which they could sell for a ridiculous sum so as to make even more money)?