Avigilon - It Is Not Transcoding!

Here's the full comment from Alain Bolduc:

"Alex - Apparently, you were right.

Avigilon held a webinar yesterday, titled Surveillance in the new year: 5 trends for 2014 hosted by Willem Ryan, their Senior Product Marketing Manager. If you jump to 42:07 of the playback, someone asks the question about processing resource utilisation, assuming in the question that H4 uses transcoding - was that you John? ;) -, to which Willem replies:

It's not quite transcoding at all.

As could have been expected, he doesn't elaborate on the process used, but goes on to say that there are no special server requirements, apart from having a server that is up to current technology standards, to allow HDSM to stream high quality region of interest video back to the console based on the original high resolution image.

The webinar was obviously geared towards promoting Avigilon's product strengths, but listening to the entire presentation was still informative and the presenter knowledgeable, made particularly evident during the 15 minute or so Q&A at the end."

I love Avigilon's marketing and how they continue to redefine fundamental terms.

First, here's the entire section (listen to the original audio for yourself):

"It’s not quite transcoding at all but to answer the main question there, how much server processing power does it require, it can run on your standard servers today, sort of the well spec'd servers today and does not necessarily require additional or beyond average processing capabilities that we see today.

Does it require some computing resources? Absolutely."

So Avigilon has a method whereby they ingest a stream, process it and then send out a different stream. That's transcoding!

Transcoding is not a bad thing. I think it's a plus that Avigilon has added this in. But let's call it what it is so we can appropriately understand the tradeoffs.

As I asked in the original discussion:

"So we are still left with the question of how they are converting standard H.264 streams into 12 sub streams without transcoding, any overhead nor quality loss."

Now the answer is clear. There is additional overhead.

What remains to be determined is how much overhead. How much more processing does it take for cameras using this than for those that do not? Video processing is very CPU intensive. So I am sure it will run on 'well spec'd servers today'; we just need to understand how 'well' that is (i5s, Xeons, multiple Xeons, etc.), how it impacts the total number of cameras on a server (if a given server ran 40 cameras without transcoding / HDSM 2.0, how many will it run with it?) and what the load factor is (i.e., what happens if 2 or 3 or 4 clients request processed streams at the same time?).


To me that looks to set a worrying precedent. For forensic video analysis, and the correct interpretation of an image or video, all worldwide guidance is to use the original encoded video or image. If a company is transcoding their input stream into something else, then as an analyst I would need to know what it was originally and the impact of the transcoding process on the image.

Best practice = first encoded stream is retained, and that is what is exported as evidence when required.

Have to keep an eye on this one!

David, I am pretty sure you can still download / export the original stream. That's typical in these approaches. It's mainly done as a convenience for operators who want to quickly watch video over lower bandwidth connections. If you really need to analyze and/or submit to court, I would expect you can still get the original.

Ah ha! I see now! Thanks for pointing that out John.

To be fair John, anything that involves a server, a network, and a client requires overhead, but the question about exactly what he means when he says "the well spec'd servers" is also fair.

It got me wondering about how their high res cameras capture the video images, and whether there might not be any transcoding involved to provide the "region of interest" images simply because that's how they are captured by the camera: using multiple image sensors, multistreamed over the network and stored that way. But maybe I'm just letting myself get thrown off by that ARGUS example discussed at the beginning of the presentation.

Obviously, I'm far from being an expert in the field and all I'm doing is throwing the idea out there. How feasible would that be?

Either way, I guess that would also involve some kind of transcoding/compositing to provide the user with a less detailed, situational awareness image, which might require even more computing power to process 12 images into one and reduce the final image resolution.

Unless they have something else going on.

To me transcoding involves decompression (decoding?) and re-compression (encoding?) of the whole or part of the image between the video streaming source and its subscriber. So I think this has to be called on-the-fly transcoding of a cropped portion of a video stream.

Alain, Peter, the impact whatever 'this' is has on the system is more important than if they (or we) agree that it is 'transcoding'.

When I say overhead, I mean additional overhead from this HDSM 2.0 process / method / technology. Willem Ryan's statement pretty clearly confirms that there is additional overhead.

For someone looking to use this, regardless of what is going on in the black box, you need to know:

  • How much more processing power do you need for HDSM 2.0 for a given camera count, resolution, client usage pattern? Is it a 10% increase, 50%, 100%? This can have a big impact on new machine selection as well as whether or not existing Avigilon appliances or boxes can add this on.
  • What type of latency does this incur relative to not using it? No latency? Trivial latency that is not noticeable to the user? Significant latency of multiple seconds?
  • What type of quality reduction does this incur? Any? Not noticeable? Significant?
  • Does this work only on H4 cameras? (That's my understanding) To the extent that it depends on processing on the cameras, what type of cost increase does this bring? How will the cost compare to the numerous 12MP cameras that will be released this Spring from Avigilon's competitors using the same chipset provider (Ambarella)?
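To make the first question concrete, here is a rough back-of-the-envelope sketch in Python. It assumes CPU is the only bottleneck and that per-camera load scales linearly, neither of which Avigilon has confirmed. The function name `cameras_supported` is made up for illustration, and the 10% / 50% / 100% figures are just the hypothetical increases from the list above, not measurements.

```python
import math

def cameras_supported(baseline_cameras, overhead_fraction):
    """Estimate how many cameras fit on a server once each camera
    costs (1 + overhead_fraction) times its original CPU load.

    Assumption: the server was CPU-bound at baseline_cameras and
    load grows linearly with camera count.
    """
    if overhead_fraction < 0:
        raise ValueError("overhead_fraction must be non-negative")
    return math.floor(baseline_cameras / (1.0 + overhead_fraction))

# Hypothetical overheads taken from the question above.
for overhead in (0.10, 0.50, 1.00):
    count = cameras_supported(40, overhead)
    print(f"{overhead:.0%} extra processing: about {count} cameras")
```

On a server that handled 40 cameras before, this works out to roughly 36, 26 and 20 cameras. Only real testing would show which of those, if any, is close.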

Based on Avigilon's strong product track record, I bet overall this will be a useful feature. However, Avigilon also has a track record of omitting or downplaying real technical tradeoffs, so it is important to carefully go through and understand any potential issues involved.

"It’s not quite transcoding at all..."

Was this person a political speech writer in a past life??

How dare you snark at Avigilon! :)

It's definitely a hedge though... (update: here's the original audio clip)

Interesting thread....

John, you claim the new Pro cameras stream one 8, 12 or 16MP image to the server?

Server then restreams/transcodes images to clients?

I am claiming that Avigilon's statement in the webinar is purposefully vague and is likely obscuring some important operational issues. I have already described the specific questions a smart integrator or user should ask above.

Couldn't this just be that the cameras can transmit several streams at the same time, one of them at full resolution and some others (maybe 12, maybe not) from just a cropped sub-region of the image (maybe a fixed 1/12th region, or maybe a dynamically selected one, depending on the area the client is requesting), more or less like Arecont cameras do? I think this wouldn't be so hard to implement, provided the DSPs are powerful (which I expect they are), and it would be a good solution, not involving transcoding or SVC... The marketing terminology they use, although probably not precise, seems to suggest this, since they speak about just Stream Management, which would be the case... What do you think?

The marketing terminology and the webinar statements both repeatedly emphasize the role of the server. If this was just camera side, I would not expect to see statements like "[HDSM] stores the video information on the server as small packets. Simultaneously, HDSM separates the video into multiple useable segments."

While their marketing indicates the H4 cameras play a role, their other statements indicate the need for server processing, which implies some form of transcoding / video munging / manipulation, etc.

HDSM = High Definition Stream Management


HDST = High Definition Stream Transcoding

The server will manage different streams coming from the cameras.


HDSM first meant SVC (i.e., JPEG2000), then it meant H.264 multistreaming, now it means whatever magic Avigilon is claiming.

Don't cite the acronym of a marketing term as a technical explanation.

Suggestion: Why don't y'all get your hands on one of them suckers and let's kick its tires and put it up on a lift?

Supposedly they dog n ponied them in Dubai a couple of weeks back.

As a side note, this also isn't anything new under the Sun. IQinVision was doing "Region of Interest Streaming" up to 64 sub-streams, each with its own available compression, resolution, and frame rate, 6 years ago.

Many cameras today support region-of-interest streaming - it's not new technology.

And yes, it's transcoding. Every time. Note there are also server-side transcoders available. It just depends where your choke points are and how you want to use it.

The reality is, most VMS's today will accept any stream or sub-stream served to them, but very few have engines or UI's designed to request a specific sub-stream, so in the end this ends up being a really cool marketing feature that few, if anybody, uses.

I'd certainly like to see more VMS's support the feature, as it can deliver some really cool things, particularly to end-users who are bandwidth and/or storage constrained. Perhaps Avigilon's renaissance proselytizing will breathe some new life into the concept. We shall see.

"The HD Pro series will begin shipping in Q2 2014."

H.264 breaks each image into a grid of 16x16 "macroblocks" and encodes the blocks individually using DCTs. For inter-coded frames it also searches for similar areas in successive frames and just encodes the differences.

In theory, if you wanted to transmit only a small area of an image instead of the whole thing at the same pixel density, it should be possible to do this by simply rearranging the already encoded stream to include only the macroblocks that are within the region of interest, possibly with a little extra work to adjust motion vectors and tidy up the blocks around the edges. This would not involve any decompression or recompression, and would be much, much less work for the CPU than decoding each macroblock and re-coding it, as you'd have to do if you wanted to change the resolution or encoding quality. It would also be a completely lossless process.
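As a rough illustration of the selection step described above, the Python sketch below works out which 16x16 macroblocks overlap a region of interest. It is only the geometry: it does not touch a real H.264 bitstream, and it ignores dependencies between blocks (prediction from neighbours, motion vectors pointing outside the region), which is where the "extra work" would come in. The function name and the example coordinates are made up for illustration.

```python
MB_SIZE = 16  # H.264 macroblock width and height in pixels

def macroblocks_in_roi(x, y, w, h, frame_w, frame_h, mb=MB_SIZE):
    """Return (col, row) indices of macroblocks overlapping the ROI.

    The ROI is given as top-left corner (x, y) plus width and height,
    in pixels, and is clamped to the frame boundaries.
    """
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(frame_w, x + w), min(frame_h, y + h)
    if x0 >= x1 or y0 >= y1:
        return []  # ROI lies entirely outside the frame
    col0, row0 = x0 // mb, y0 // mb
    col1, row1 = (x1 - 1) // mb, (y1 - 1) // mb
    return [(c, r)
            for r in range(row0, row1 + 1)
            for c in range(col0, col1 + 1)]

# Hypothetical example: a 300 x 200 region inside a 1280 x 720 frame.
selected = macroblocks_in_roi(100, 100, 300, 200, 1280, 720)
total = (1280 // MB_SIZE) * (720 // MB_SIZE)
print(f"{len(selected)} of {total} macroblocks overlap the region")
```

Because the region is rounded out to whole macroblocks, the selected area is slightly larger than the requested one; the client would presumably crop the edges on display.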

I don't know if this is what Avigilon are actually doing but that would be my guess. If so I think it would be reasonably accurate to describe it as "not quite transcoding".

Please tell me the exact server load impact, increase, etc. needed.

I do not care if it's transcoding or not, just how it performs.

I didn't view the video as I don't feel like getting inundated with marketing bunk. Taking multiple streams of video and combining them into one is multiplexing (MUX). Producing a multiview output, i.e., 12 streams combined into one, is not transcoding. Transcoding is conversion from one format to another. The multiview is created by a MUX, which takes the source video, shrinks the image down, combines it with the other shrunken images and sends a single stream with the combined (shrunken) images to the receiving device. This traditionally was done by hardware and has evolved to software with the increased capabilities of computing power. A software MUX would use a considerable amount of processing power to perform this function. If, for example, the input streams are H.264 and the output stream is H.264, this is not transcoding; it is multiplexing.

Looking at Avigilon's HDSM primer (Video Security | Avigilon), it appears they are using software to stream the target area video rather than the entire image, thus reducing bandwidth to the workstation. Not really magic, just an interesting video processing technique done at the server vs. the client (Digital PTZ). It's not really that much different from a VMS output to a client looking at multiple streams. These are typically multiplexed in a similar way, where you don't receive the full stream until you focus on a single camera. Avigilon is taking this a step further, allowing you to focus on a particular view of a camera and only sending that component in the stream.

Yes, lots of CPU cycles to accomplish this.


I understand that along with the 12 streams they also send 2 or 3 additional streams with the entire image at reduced resolution.

What do you think?

Using the Avigilon link as a reference, they are sending multiple streams from the camera: stream 1 with full resolution (e.g., 1280 x 720) and stream 2 with the area of focus (e.g., a cropped 300 x 200 pixel section). The 2nd stream is not represented as lower resolution, just a section of the full stream. So the 2nd stream could be full resolution but only the cropped area of focus, thus significantly less bandwidth. If the client is subscribed to the 2nd stream, it would consume less bandwidth without necessarily being lower resolution than the original. It is a full resolution view of the cropped area.
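For a sense of scale, here is a quick arithmetic sketch in Python using the example sizes above. It only compares pixel counts; actual bitrate also depends on scene content and encoder settings, so treat the result as a loose indication rather than a bandwidth figure. The function name is made up for illustration.

```python
def crop_pixel_fraction(crop_w, crop_h, full_w, full_h):
    """Fraction of the full frame's pixels covered by the cropped area.

    Assumption: the crop lies entirely inside the full frame.
    """
    if crop_w > full_w or crop_h > full_h:
        raise ValueError("crop must fit inside the full frame")
    return (crop_w * crop_h) / (full_w * full_h)

# Example sizes from the post: 300 x 200 crop of a 1280 x 720 stream.
fraction = crop_pixel_fraction(300, 200, 1280, 720)
print(f"Cropped stream carries about {fraction:.1%} of the pixels")
```

So the cropped stream covers roughly one fifteenth of the pixels of the full stream, at the same pixel density.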


I was talking from camera to server

12 +2 (or 3)


A 16 MP image is divided into 12 full resolution tiles and 2 lower resolution streams.

According to them, the ACC server stores the full resolution tiles and the low resolution streams.

I could be wrong, but that's how I understand HDSM 2.
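If that understanding is right, the server would only need to send the tiles that overlap whatever the client is looking at. Here is a minimal Python sketch of that lookup. The 4 x 3 grid layout and the 4896 x 3264 frame size are my assumptions for illustration (the thread only says "12 tiles" and "16 MP"), and `tiles_for_view` is a made-up name, not anything from Avigilon.

```python
def tiles_for_view(view_x, view_y, view_w, view_h,
                   frame_w=4896, frame_h=3264, cols=4, rows=3):
    """Return (col, row) indices of the tiles overlapping the viewed area.

    Assumptions: the frame splits evenly into cols x rows equal tiles,
    and the view is given in full-resolution pixel coordinates.
    """
    tile_w, tile_h = frame_w // cols, frame_h // rows
    x0, y0 = max(0, view_x), max(0, view_y)
    x1 = min(frame_w, view_x + view_w)
    y1 = min(frame_h, view_y + view_h)
    if x0 >= x1 or y0 >= y1:
        return []  # view lies outside the frame
    col0, row0 = x0 // tile_w, y0 // tile_h
    col1, row1 = (x1 - 1) // tile_w, (y1 - 1) // tile_h
    return [(c, r)
            for r in range(row0, row1 + 1)
            for c in range(col0, col1 + 1)]

# A small zoomed-in view near the top-left needs only a couple of tiles.
print(tiles_for_view(1000, 0, 500, 500))
# The whole frame needs all 12 (or, presumably, a low resolution stream).
print(len(tiles_for_view(0, 0, 4896, 3264)))
```

Whether the server repackages stored tiles as-is or re-encodes them for the client is exactly the open question in this thread; the sketch does not settle that.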

I stand corrected. This white paper provides a good description of how it works and a better understanding of the variation in resolution used during different viewing modes.

It's definitely not transcoding. I would interpret the feature as image scaling based on what you are viewing. The server is creating multiple segmented streams from a single stream vs. multiple streams being sent from the camera. The reduced bandwidth benefit is from the Server to client not from the camera to server. This is far different than how I was originally interpreting the information.

"HDSM stores the video information on the server as small packets. Simultaneously, HDSM separates the video into multiple useable segments: lower resolution and smaller size streams for situational awareness, and much larger streams for full image detail. HDSM then intelligently manages these streams based on what the user is viewing."