Does Anybody Know Another VMS Which Has The Same/Similar Streaming Technology As Avigilon HDSM2?


A more basic question is: what is HDSM2? Something we debated in the past.

We own 2 Avigilon cameras that use HDSM2 so that's in queue for us to test it.

...what is HDSM2?

Easy. It's NOT transcoding. ;)

I've waded thru the white papers, mulled over the media, delved into the discussions, and perused the patent. And still I can't be certain.

The basic idea seems to be: don't stream any more to the client than it needs/wants or can use.

So using sub streams when possible is part of it. Also digitally cropping streams before transmission when the client has zoomed in on the FOV is another.

The patent mentions recompression as well as frame-rate reduction.

Whether some of these are literally "transcoding" or not is an irrelevant point to me.

IMHO, the important point is whether the stream has to be decoded/recoded in order to accomplish this, and I can't imagine how it wouldn't have to, at least in the case of recompression or digital cropping.

Some implementations of transcoding do fully decode each image followed by a full re-encode, which in itself isn't necessarily that bad. But there are tricks to speed up transcoding, such as re-using motion vectors of the original stream so that new motion vectors don't have to be recomputed for the second stream, and this saves a lot of computation. So if their marketing documentation states "our method is so great because we don't do transcoding", then I don't think it is such a big deal.

Looking forward to this test as there are a lot of people that don't understand how HDSM V1 works let alone how HDSM V2 works, especially on this site. For those of us that do this should be an interesting test.

How well does what the patent describes match the actual implementation, IYO?

UD1, you are welcome to ask John for my e-mail

I will try my best to explain

then you can pass your knowledge to IPVM Team :)

Will I have to sign a non-undisclosure agreement?

Absolutely not!

and I am not joking

It's true. I've never been able to find a white paper AT ALL that explains HDSM, but what I do know, by their own estimation tool, is that the bandwidth, CPU and storage requirements are heavy, and hence so is the real cost. If you want to dissect it, think about the solution in playback ... how does all that information, i.e. focal planes, fields of view etc., get into the system "after the fact"? It doesn't..... it goes in live. Mmmmmm howdaydodat?

http://avigilon.com/white-papers/high-definition-stream-management-part-2-the-technical-details/

Does HDSM2 work with only their cameras, or all h.264 cameras?

"what I do know and by their own estimation tool, the bandwidth, CPU and storage requirements are heavy and hence the real cost"

Sorry but this is not true.

"We will solve this mystery!"

"Looks a lot like transcoding!"

So it doesn't transcode, I agree.

But without transcoding and without JPEG2000, HDSM 2 cannot do what HDSM 1 could, notably send a reduced resolution stream to a client using a higher resolution source.

Without a progressive CODEC, like SVC or JPEG2000, one must decode the stream to do anything with the resolution. Alternatively the camera can stream multiple resolutions, but this is like most other cameras on the market.

What HDSM 2 DOES do (on the camera) is cut up the picture into multiple images or tiles before sending them to ACC. Then when you are digitally zoomed in on an FOV, it only sends enough tiles to show what is necessary.

Since the tiles are split at the camera, no transcoding is necessary to do this.

Any technical arguments welcome.

So it is only if you are digitally zoomed on a high resolution stream between the VMS server and VMS client that there is really an advantage over other methods. No advantage between the VMS server and camera as that would need to record the highest resolution stream at all times anyway.

Looks to me that there is little advantage over transcoding unless you are zoomed in on a large number of streams simultaneously, since transcoding a single stream doesn't have that much overhead (say compared to transcoding every stream on a VMS).

Am I correct?

Looks to me that there is little advantage over transcoding unless you are zoomed in on a large number of streams simultaneously...

To be fair to Avigilon, their solutions can incorporate some extremely high megapixel cameras > 16 MP; in those cases the savings would be much greater, and the response time (speculating) faster.

Outside of the tiling/digital zoom thing though, I am unable to find any particular technique of HDSM 2 not in use by other vendors.

Though I am open to hearing of one...

To be fair to Avigilon, their solutions can incorporate some extremely high megapixel cameras > 16 MP; in those cases the savings would be much greater, and the response time (speculating) faster.

The frame rates of those high MP cameras are usually quite low, aren't they, around 5 fps or so for a 30 MP camera... That would take the load off the decoders/transcoders quite a bit. 200 ms per frame is quite a lot of time to re-encode.

But even without transcoding, I am thinking that when zoomed in, there would probably have to be a reasonable amount of motion in the stream/image outside the area the client is viewing before you see significant savings in bandwidth. That is, h.264 by its nature wouldn't be sending many changes in those areas in that case, (unless it has a low GOV, which is inefficient in itself). That seems to limit the situation where hdsm2 is useful a little bit further still. However, if there is a lot of motion, and they are only transmitting the h.264 macroblocks that are in the zoomed region, then again, because of the way h.264 works with motion estimation etc, decoding those macroblocks may be dependent on a macroblock from a previous frame that was outside of the region of interest and hence wasn't transmitted previously. What I am thinking, however, is that perhaps they are dividing the image into tiles and encoding each h.264 tile independently, that would solve that problem, but if they are doing that, they are not letting the h.264 encoder work as efficiently as it could do (again, in the case where there is a lot of motion).
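The dependency problem described above can be shown with a toy model. This is only a sketch: the block grid, the `missing_references` helper and the sample references are all hypothetical, and real H.264 motion compensation is far more involved (sub-pixel vectors, multiple reference frames, intra blocks). It just shows why sending only the blocks inside the zoomed region can leave the decoder without a block it needs, and why encoding each tile independently avoids that.

```python
def missing_references(sent_blocks, references):
    """Return current-frame blocks whose reference block was never sent.

    sent_blocks: set of (col, row) block positions transmitted for the
                 previous frame (the zoomed region, in this toy model).
    references:  dict mapping a current-frame block to the previous-frame
                 block it is predicted from.
    """
    return sorted(b for b, ref in references.items() if ref not in sent_blocks)


# Hypothetical zoomed region: a 4x4 group of blocks in the top-left corner.
roi = {(c, r) for c in range(4) for r in range(4)}

# Single encoder over the whole image: a block inside the region may be
# predicted from a block just outside it, e.g. an object moving in from the right.
whole_frame_refs = {(0, 0): (0, 0), (3, 1): (4, 1), (2, 2): (1, 2)}
print(missing_references(roi, whole_frame_refs))   # [(3, 1)]

# Independent tile encoding: references are confined to the tile itself,
# so nothing is missing, at the cost of a less ideal prediction.
tile_refs = {(0, 0): (0, 0), (3, 1): (3, 1), (2, 2): (1, 2)}
print(missing_references(roi, tile_refs))          # []
```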

Now, I don't mean to discredit any good work the Avigilon engineers may have done, my knowledge of h.264 isn't that profound, and I am guessing a bit. But the only way to know for sure is to measure it, otherwise it is hard to tell if it is a genuine technological achievement or just marketing...

What I am thinking, however, is that perhaps they are dividing the image into tiles and encoding each h.264 tile independently, that would solve that problem, but if they are doing that, they are not letting the h.264 encoder work as efficiently as it could do (again, in the case where there is a lot of motion).

Yes, this is what I believe they are doing. That way they don't need to decode...

are the 'tile' regions fixed? i.e. does HDSM always use the same size tiles over the full scene?

regardless of the answer to that, what happens if you zoom into an area that encompasses 2 or more different tiles?

further, if HDSM only streams the high res stream area that has been zoomed into, what is the bandwidth savings of sending high res stream 'regions' (areas that are zoomed into) vs sending low res streams that cover the entire FOV?

i don't understand, without the high res streams you don't have the details, so?

High res (zoomed-into area) = ?kb

Low res (full scene) = ?kb

If the sales talking point is the bandwidth savings of only sending the data required to render what is currently being displayed via digital zoom VS sending the low res stream of the entire scene, don't you first have to quantify what the avg for each of these is in kb rather than inferring any 'savings' by using this different method of delivering video data?

Or is the inference that other 'smart' multi-streaming VMS solutions send the entire high res stream (when digitally zooming or changing the size of the scene in a layout) and not only the zoomed-in portion of the stream - as HDSM purports to do?

is the inference that other 'smart' multi-streaming VMS solutions send the entire high res stream (when digitally zooming or changing the size of the scene in a layout) and not only the zoomed-in portion of the stream - as HDSM purports to do?

Yes, I believe that the usual move would be to handle the digital zoom locally, on the high res, without the server being involved.

If the sales talking point is the bandwidth savings of only sending the data required to render...

I suppose there is also the potential benefit of reduced CPU/memory load on the client for the decoding when zoomed in, but others can achieve that using other methods.

only sending the data required to render what is currently being displayed via digital zoom VS sending the low res stream of the entire scene,

No, it is as you say in your last paragraph. But it looks to me, if you look at Ethan's clip below, that when zoomed in they might be sending the full low res stream as well as the zoomed high res tile stream, so the client can use it as a fallback until the zoomed tile arrives - that is, when the view is quickly moved to a different position.

Or is the inference that other 'smart' multi-streaming VMS solutions send the entire high res stream (when digitally zooming or changing the size of the scene in a layout) and not only the zoomed-in portion of the stream - as HDSM purports to do?

Yes, the other VMSs either do it like you said, or they transcode a stream for the client on the server side to achieve the same results as HDSM2, but allegedly at the cost of high CPU load on the server.

are the 'tile' regions fixed? i.e. does HDSM always use the same size tiles over the full scene?

Don't know for sure, but would say they are fixed at the time of encoding by the camera. Meaning that for different models they could have different sizes.

if you zoom into an area of two or more tiles I would assume they send both tiles and crop at the client.

The tile regions are fixed. The PRO line uses 12, the H4 4K uses 9.

If you digitally zoom and quickly move you can see the delay before the new tile loads and clears up, like so:

Also, if you zoom into the intersection of two or more tiles, it loads all of them. So you could end up displaying 4 of 9, for example, or 2 of 9.
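To make the tile behaviour concrete, here is a minimal sketch of how a server might pick which fixed tiles to send for a given zoomed view. It is not Avigilon's implementation: the 3x3 layout for the 9 tiles, the 3840x2160 frame size and the function name `tiles_for_view` are all assumptions for illustration.

```python
def tiles_for_view(x0, y0, x1, y1, frame_w=3840, frame_h=2160, cols=3, rows=3):
    """Return the (col, row) indices of the fixed tiles a view rectangle touches.

    The view is given in pixels; (x0, y0) is the top-left corner and
    (x1, y1) is the exclusive bottom-right corner.
    The 3x3 grid and 3840x2160 frame are assumed values, not confirmed ones.
    """
    if not (0 <= x0 < x1 <= frame_w and 0 <= y0 < y1 <= frame_h):
        raise ValueError("view must be a non-empty rectangle inside the frame")
    tile_w = frame_w // cols
    tile_h = frame_h // rows
    first_col = x0 // tile_w
    last_col = min((x1 - 1) // tile_w, cols - 1)
    first_row = y0 // tile_h
    last_row = min((y1 - 1) // tile_h, rows - 1)
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


# The same 640x360 view needs a different number of tiles depending on
# where it sits relative to the fixed tile boundaries.
print(len(tiles_for_view(100, 100, 740, 460)))    # 1 (inside one tile)
print(len(tiles_for_view(1000, 100, 1640, 460)))  # 2 (straddles a vertical boundary)
print(len(tiles_for_view(1000, 600, 1640, 960)))  # 4 (sits on a corner)
```

With a grid like this, the client would crop the received tiles down to the view, and the number of tiles sent depends only on where the view falls, not on how much of each tile is actually displayed.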

If they do that, they'd be sending multiple h.264 streams in effect, to the VMS server from the camera, each encoding a different part of the overall image. And the VMS Server would store it in this format. But I am thinking that this might be less efficient than the standard approach of a single encoder encoding the entire image as one, especially in the case where there is a lot of motion. So where HDSM2 may give some advantage in some situations, it could have disadvantages elsewhere, storage capacity perhaps. See what I mean?

So where HDSM2 may give some advantage in some situations, it could have disadvantages elsewhere, storage capacity perhaps. See what I mean?

Sure it's possible. How much efficiency is typically lost by splitting into 8 tiles and then encoding? I don't have an idea, but it's easily tested.

How much efficiency is typically lost by splitting into 8 tiles and then encoding?

I don't know. If you have a lot of constant movement from one tile to the next for instance, there might be a bit, but I suspect not much. That could happen if it is a busy street perhaps with vehicles moving up and down. If it is a large football stadium, maybe not that much, as there is little movement between tiles.

According to Mr Honovich:

that's in queue for us to test it

I suggest test it with a lot of constant motion covering the entire FOV of the camera, and look at the bandwidth between the camera and VMS server compared to a camera with a similar h.264 configuration (same GOV, FPS, quality, resolution etc) but even then, I am not sure how much that will tell us.

Gentlemen, why don't all of you log into the Avigilon demo
and play with HDSM2 as much as you want?
Avigilon demo site info can gladly be provided by partners, sales, and IPVM :)

Bottom line for me is that the "tiling solution" of HDSM is a unique and practical way of reducing bandwidth usage when digitally zooming.

That said, it's also kludgey and arbitrary. Consider that depending on where you zoom in on the screen, you might require 1x, 2x, 3x or 4x the bandwidth because of where the fixed tile boundaries lie.

IMO, HDSM 1 with JPEG2000 was more elegant, due to the format's built-in progressive scaling.

Avigilon allows a user to change the display quality within the client to 4 different levels. I think HDSM must play a role in that quality selection because the cameras appear to only send a high and a low resolution stream. I'm not sure how many companies provide that setting, but it's very useful for Avigilon.

Those quality settings in the client help HDSM determine when the client gets a high or low resolution stream from the server. However, HDSM must somehow reduce the video decompression needed even further, and it's happening on the client because the server doesn't mind how much work the clients are doing.

So is HDSM really a streaming technology or just a decompression method?

Will anybody answer the main question of this topic, please?

Does Anybody Know Another VMS Which Has The Same/Similar Streaming Technology As Avigilon HDSM2?

My answer would be yes, most VMS companies have something similar, but not exactly the same.

They have something similar in the sense that they all face the same problem of managing high definition streams and deal with it to the extent they feel they need to. They may go about this using a combination of efficient transcoding, multi-streaming, H.265, smart codecs, and perhaps other techniques that remain trade secrets. These methods have an advantage in that they all work with most third party cameras adhering to open protocols. Avigilon has the opposite advantage, in that they control both ends of the channel, the VMS server and the camera, so can do clever things.

Will anybody answer the main question of this topic, please?

Bottom line for me is that the "tiling solution" of HDSM is a unique and practical way of reducing bandwidth usage when digitally zooming.