As an aside, you will see more and more Gigabit Ethernet IP cameras on the market soon. The cost of going 10/100/1000 is getting smaller and smaller as more commodity items switch over.
There are real benefits to be gained from switching. Many of the problems he spoke of in the video are related to poor infrastructure choices, but Cat 5E vs Cat 6 is rarely the problem.
H.264 is very 'bursty' when it communicates. I frames are big but few; P frames are small but numerous. The 'instantaneous' bandwidth of the stream can easily exceed 100Mbps even for a 4Mbps stream (heavy emphasis on instantaneous). Cameras are also routinely asked to provide multiple concurrent streams, so you have this mishmash of I and P frames from several streams all trying to get through at once.
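To make "instantaneous" concrete, here's a back-of-the-envelope sketch. All of the numbers (30-frame GOP, 10:1 I-to-P size ratio, I frames from concurrent streams happening to collide) are illustrative assumptions, not measurements:

```python
# Back-of-the-envelope: instantaneous vs average bitrate for H.264.
# Assumed: 4 Mbps, 30 fps, 30-frame GOP (1 I + 29 P), I frame ~10x a P frame.

STREAM_BPS = 4_000_000      # average bitrate
FPS        = 30
GOP        = 30             # frames per GOP: 1 I + 29 P
I_TO_P     = 10             # assumed I:P size ratio

bits_per_gop = STREAM_BPS * GOP / FPS      # one second of video here
p_bits = bits_per_gop / (29 + I_TO_P)      # solve 29*P + 10*P = total
i_bits = I_TO_P * p_bits                   # ~1 Mbit, i.e. ~125 KiB

# The camera wants the whole I frame on the wire within one frame
# interval (1/30 s), so the short-term rate for that interval is:
instantaneous_bps = i_bits * FPS
print(f"I frame: {i_bits/8/1024:.0f} KiB, "
      f"burst rate: {instantaneous_bps/1e6:.1f} Mbps per stream")

# Four concurrent streams whose I frames happen to line up:
print(f"4-stream collision: {4*instantaneous_bps/1e6:.0f} Mbps")
```

One stream's I-frame burst is roughly 30Mbps over its frame interval; a few streams colliding blows past 100Mbps even though the averages sum to a modest 16Mbps.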
What happens is (simplistically speaking) you get a flurry of P frames trying to get out the same pipe, usually over a pretty congested network link. Switches are REALLY cheap these days and do a pretty crappy job of handling a bunch of traffic on their internal backbone. The P frames usually manage to fight their way through and arrive more or less in order, without too much delay between frames.
Then you get a giant I frame that comes along like Shamu on a waterslide. Frames start to pile up behind it in the camera, causing buffering issues. The aggregate bitrate is far less than 10/100 capacity, but while the I frame is going through, it can take longer than the frame interval allows to make it from point A to point B.
With just 1 stream and a 1080P30 camera, 10/100 is usually fine. With 8 streams all going into a $30 switch, you can see problems, especially when the switch *itself* only has a 10/100 connection to the world.
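A toy drain calculation shows why that 8-streams-into-one-uplink case hurts. The I-frame size and worst-case collision are assumptions for illustration, not a real switch model:

```python
# Toy model: 8 cameras' I frames collide at a switch; how long does the
# uplink take to drain the burst vs the 33 ms frame interval?
# Assumed: ~1 Mbit (~125 KiB) I frame per camera, worst-case alignment.

I_FRAME_BITS   = 1_000_000
STREAMS        = 8
FRAME_INTERVAL = 1 / 30          # 30 fps -> ~33 ms between frames

for uplink_bps in (100e6, 1000e6):
    drain_s = STREAMS * I_FRAME_BITS / uplink_bps
    verdict = "keeps up" if drain_s < FRAME_INTERVAL else "queue grows"
    print(f"{uplink_bps/1e6:.0f} Mbps uplink: {drain_s*1000:.0f} ms "
          f"to drain the burst -> {verdict}")
```

At 100Mbps the burst takes ~80ms to drain, longer than the 33ms frame interval, so the queue grows and a cheap switch's tiny buffer starts dropping packets. At gigabit the same burst drains in ~8ms with room to spare.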
The packets get congested, arrive out of order, or go missing entirely, and you end up with 'jittery' or torn video. TCP/IP does a lot to make these problems invisible to the user, but for live video it's still a major problem.
H.264 is very allergic to dropped frames, so tearing and other badness is not hard to achieve on a congested network.
Consumer applications like YouTube and Netflix solve this by buffering several seconds' worth of data, so it really doesn't matter if it took 1-2 seconds for a frame of video to be repaired via TCP/IP retransmissions. By the time the data gets sent to the video player, it's all properly ordered and intact.
Security applications try very hard to reduce the amount of latency on the video.
Buffering creates latency. As such, buffers on the playback side (and the sending side) are usually as small as possible. Any packets that arrive out of order or go missing aren't given enough time to be fixed, which results in playback errors.
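The trade-off boils down to one comparison: can the playout buffer hide the time it takes to repair a loss? The buffer depths and the retransmission delay below are assumed, round numbers for illustration:

```python
# Sketch of the latency/robustness trade-off.
# Assumed: one lost packet takes ~250 ms to recover via TCP retransmit.

RETRANSMIT_DELAY_S = 0.25

for app, buffer_s in (("Netflix-style player", 2.0),
                      ("live security viewer", 0.10)):
    # If the repair takes longer than the buffer can cover,
    # the player runs dry and the viewer sees a glitch.
    glitch = RETRANSMIT_DELAY_S > buffer_s
    print(f"{app} ({buffer_s*1000:.0f} ms buffer): "
          f"{'glitch' if glitch else 'seamless'}")
```

Same network, same loss: the big consumer buffer absorbs the repair invisibly, while the small low-latency buffer turns every repair into a visible artifact.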
By going to gigabit, those I frames are allowed to flow through the system like whale $#!T through an ice floe, so there's less congestion and far fewer problems with dropped packets and other 'weirdness' in the system.
Crappy switches are far more of a menace to IP video than Cat 5E.