I love Avigilon's marketing and how they continue to redefine fundamental terms.
First, here's the entire section (listen to the original audio for yourself):
"It’s not quite transcoding at all but to answer the main question there, how much server processing power does it require, it can run on your standard servers today, sort of the well spec'd servers today and does not necessarily require additional or beyond average processing capabilities that we see today.
Does it require some computing resources? Absolutely."
So Avigilon has a method whereby they ingest a stream, process it and then send out a different stream. That's transcoding!
Transcoding is not a bad thing. I think it's a plus that Avigilon has added this in. But let's call it what it is so we can appropriately understand the tradeoffs.
As I asked in the original discussion:
"So we are still left with the question of how they are converting standard H.264 streams into 12 sub streams without transcoding, any overhead nor quality loss."
Now the answer is clear. There is additional overhead.
What remains to be determined is how much overhead. How much more processing does it take for cameras using this than for those that do not? Video processing is very CPU intensive. So I am sure it will run on 'well spec'd servers today'; we just need to understand how 'well' that is (i5s, Xeons, multiple Xeons, etc.), how it impacts the total number of cameras on a server (if a given server ran 40 cameras without HDSM 2.0 transcoding, how many will it run with it?) and what the load factor is (i.e., what happens if 2 or 3 or 4 clients request processed streams at the same time?).
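To make the capacity question concrete, here is a hypothetical back-of-envelope calculation. All the overhead percentages are made up for illustration; Avigilon has not published any of these numbers.

```python
# Hypothetical capacity math: if per-camera processing cost grows by
# some overhead fraction, the same server supports proportionally
# fewer cameras. The overhead values here are illustrative guesses.

def cameras_with_overhead(baseline_cameras: int, overhead_fraction: float) -> int:
    """Cameras a server can handle once each stream costs (1 + overhead) as much."""
    return int(baseline_cameras / (1 + overhead_fraction))

baseline = 40  # cameras the server handled without the feature
for overhead in (0.10, 0.50, 1.00):
    print(f"{overhead:.0%} overhead -> ~{cameras_with_overhead(baseline, overhead)} cameras")
```

This is first-order arithmetic only; real capacity also depends on client load, resolution mix, and storage throughput.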
To me that looks to set a worrying precedent. For forensic video analysis, and the correct interpretation of an image or video, all worldwide guidance is to use the original encoded video or image. If a company is transcoding their input stream into something else, then as an analyst I would need to know what it was originally and the impact of the transcoding process on the image.
Best practice = first encoded stream is retained, and that is what is exported as evidence when required.
Have to keep an eye on this one!
To me, transcoding involves decompression (decoding) and re-compression (encoding) of the whole or part of the image between the video streaming source and its subscriber. So I think this has to be called on-the-fly transcoding of a cropped portion of a video stream.
Alain, Peter, the impact that whatever 'this' is has on the system is more important than whether they (or we) agree that it is 'transcoding'.
When I say overhead, I mean additional overhead from this HDSM 2.0 process / method / technology. Willem Ryan's statement pretty clearly confirms that there is additional overhead.
For someone looking to use this, regardless of what is going on in the black box, you need to know:
- How much more processing power do you need for HDSM 2.0 for a given camera count, resolution, client usage pattern? Is it a 10% increase, 50%, 100%? This can have a big impact on new machine selection as well as whether or not existing Avigilon appliances or boxes can add this on.
- What type of latency does this incur relative to not using it? No latency? Trivial latency that is not noticeable to the user? Significant latency of multiple seconds?
- What type of quality reduction does this incur? Any? Not noticeable? Significant?
- Does this work only on H4 cameras? (That's my understanding) To the extent that it depends on processing on the cameras, what type of cost increase does this bring? How will the cost compare to the numerous 12MP cameras that will be released this Spring from Avigilon's competitors using the same chipset provider (Ambarella)?
Based on Avigilon's strong product track record, I bet overall this will be a useful feature. However, Avigilon also has a track record of omitting or downplaying real technical tradeoffs, so it is important to carefully go through and understand any potential issues involved.
"It’s not quite transcoding at all..."
Was this person a political speech writer in a past life??
John, you claim the new Pro cameras stream one 8, 12, or 16MP image to the server?
Server then restreams/transcodes images to clients?
Couldn't this just be that the cameras can transmit several streams at the same time, 1 of them full resolution, some others (maybe 12, maybe not) from just a cropped sub-region of the image (maybe a fixed 1/12th region, or maybe a dynamically selected one, depending on the area the client is requesting), more or less like Arecont cameras do? I think this wouldn't be so hard to implement, provided the DSPs are powerful (which I expect they are), and it would be a good solution, not involving transcoding or SVC. The marketing terminology they use, although probably not precise, seems to suggest this, since they speak about just Stream Management, which would be the case. What do you think?
HDSM = High Definition Stream Management
HDST = High Definition Stream Transcoding
The server will manage different streams coming from the cameras.
As a side note, this also isn't anything new under the Sun. IQinVision was doing "Region of Interest Streaming" up to 64 sub-streams, each with its own available compression, resolution, and frame rate, 6 years ago.
Many cameras today support region-of-interest streaming - it's not new technology.
And yes, it's transcoding. Every time. Note there are also server-side transcoders available. It just depends where your choke points are and how you want to use it.
The reality is, most VMS's today will accept any stream or sub-stream served to them, but very few have engines or UI's designed to request a specific sub-stream, so in the end this ends up being a really cool marketing feature that few, if anybody, uses.
I'd certainly like to see more VMS's support the feature, as it can deliver some really cool things, particularly to end users who are bandwidth and/or storage constrained. Perhaps Avigilon's renaissance proselytizing will breathe some new life into the concept. We shall see.
H.264 breaks each image into a grid of 16x16 "macroblocks" and encodes the blocks individually using DCTs. For inter-coded frames it also searches for similar areas in successive frames and just encodes the differences.
In theory, if you wanted to transmit only a small area of an image instead of the whole thing at the same pixel density, it should be possible to do this by simply rearranging the already encoded stream to include only the macroblocks that are within the region of interest, possibly with a little extra work to adjust motion vectors and tidy up the blocks around the edges. This would not involve any decompression or recompression, and would be much, much less work for the CPU than decoding each macroblock and re-coding it, as you'd have to do if you wanted to change the resolution or encoding quality. It would also be a completely lossless process.
I don't know if this is what Avigilon are actually doing but that would be my guess. If so I think it would be reasonably accurate to describe it as "not quite transcoding".
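The macroblock-selection idea above can be sketched in a few lines: map a pixel region of interest onto the 16x16 macroblock grid that H.264 uses, and keep only those blocks. This is a toy illustration of the selection step, not Avigilon's actual method; the hard part (splicing the kept blocks into a valid bitstream with correct slice headers and motion vectors) is omitted.

```python
# Toy sketch: which 16x16 H.264 macroblocks overlap a pixel ROI?
# Selecting blocks is the cheap part; rewriting the bitstream around
# them is the real engineering work, not shown here.

MB = 16  # H.264 macroblock size in pixels

def roi_macroblocks(x, y, w, h):
    """Return (col, row) indices of every macroblock overlapping
    the pixel rectangle (x, y, w, h)."""
    first_col, first_row = x // MB, y // MB
    last_col, last_row = (x + w - 1) // MB, (y + h - 1) // MB
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]

# A 300x200 crop starting at (100, 50) in a 1280x720 frame:
blocks = roi_macroblocks(100, 50, 300, 200)
print(len(blocks), "macroblocks instead of", (1280 // MB) * (720 // MB))
```

Note how small the kept set is relative to the full frame's 3,600 macroblocks, which is exactly why this approach would be so cheap compared to decode/re-encode.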
Please tell me the exact server load impact, increase, etc. needed.
I do not care if it's transcoding or not, just how it performs.
I didn't view the video as I don't feel like getting inundated with marketing bunk. Taking multiple streams of video and combining them into one is multiplexing (MUX). The output is a multiview; i.e., 12 streams combined into one is not transcoding. Transcoding is conversion from one format to another. The multiview is created by a MUX, which takes the source video, shrinks the image down, then combines it with the other shrunken images and sends a single stream with the combined (shrunken) images to the receiving device. This traditionally was done by hardware and has evolved to software with the increased capabilities of computing power. A software MUX would use a considerable amount of processing power to perform this function. If the input streams, for example, are H.264 and the output stream is H.264, this is not transcoding; it is multiplexing.
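The software MUX described above can be sketched as a toy: shrink each decoded frame and tile the results into one multiview frame. Frames are modeled here as numpy arrays; this is an illustration of the tiling step only, not any vendor's implementation.

```python
# Toy software MUX: naively downscale four decoded frames and
# composite them into a single 2x2 multiview frame.
import numpy as np

def make_multiview(frames, grid=(2, 2)):
    """Subsample each frame by the grid factor and tile the results
    into one frame of the original size."""
    rows, cols = grid
    h, w = frames[0].shape[:2]
    out = np.zeros_like(frames[0])
    for i, frame in enumerate(frames[:rows * cols]):
        small = frame[::rows, ::cols]  # crude downscale by subsampling
        r, c = divmod(i, cols)
        out[r * (h // rows):r * (h // rows) + small.shape[0],
            c * (w // cols):c * (w // cols) + small.shape[1]] = small
    return out

# Four fake 720p "cameras", each a flat gray level for visibility:
cams = [np.full((720, 1280), fill, dtype=np.uint8) for fill in (10, 20, 30, 40)]
view = make_multiview(cams)
print(view.shape)  # one 720x1280 frame carrying all four cameras
```

A real MUX would of course operate on decoded video and re-encode the composite, which is where the "considerable amount of processing power" goes.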
Looking at Avigilon's HDSM primer, it appears they are using software to stream the target area video rather than the entire image, thus reducing bandwidth to the workstation. Not really magic, just an interesting video processing technique at the server vs. the client (digital PTZ). It's not really that much different than a VMS output to a client looking at multiple streams. These are typically multiplexed in a similar method, where you don't receive the full stream until you focus on a single camera. Avigilon is taking this a step further, allowing you to focus on a particular view of a camera and only sending that component in the stream.
Yes, lots of CPU cycles to accomplish this.
Using the Avigilon link as a reference, they are sending multiple streams from the camera: stream 1 with full resolution (i.e., 1280 x 720) and stream 2 with the area of focus (i.e., a cropped 300 x 200 effective pixel section). The 2nd stream is not represented as lower resolution, just a section of the full stream. So the 2nd stream could be full resolution but only the cropped area of focus, thus significantly less bandwidth. If the client is subscribed to the 2nd stream, it would consume less bandwidth and not necessarily be lesser resolution than the original. It is the full resolution of the cropped area.
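The bandwidth saving in that example is easy to estimate: at a fixed quality, bitrate scales roughly with pixel count, so the pixel ratio between the crop and the full frame is a reasonable first-order estimate. The numbers are the ones from the example above.

```python
# First-order bandwidth estimate for the cropped-stream example:
# a full-resolution 300x200 crop vs. the full 1280x720 frame.

full_pixels = 1280 * 720   # 921,600 px in the full frame
crop_pixels = 300 * 200    # 60,000 px in the cropped region
ratio = crop_pixels / full_pixels
print(f"Cropped stream carries {ratio:.1%} of the pixels")
```

So the crop carries roughly 6.5% of the pixels, and likely a similar fraction of the bandwidth, while retaining full pixel density in the region of interest.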
I stand corrected. This white paper provides a good description of how it works and a better understanding of the variation in resolution used during different viewing modes.
It's definitely not transcoding. I would interpret the feature as image scaling based on what you are viewing. The server is creating multiple segmented streams from a single stream vs. multiple streams being sent from the camera. The reduced bandwidth benefit is from the Server to client not from the camera to server. This is far different than how I was originally interpreting the information.
"HDSM stores the video information on the server as small packets. Simultaneously, HDSM separates the video into multiple useable segments: lower resolution and smaller size streams for situational awareness, and much larger streams for full image detail. HDSM then intelligently manages these streams based on what the user is viewing."