Right now, ONVIF is just another protocol to support. Actually, it is several protocols, because different camera manufacturers interpret the ONVIF spec differently (no wonder). It's a protocol that manufacturers can follow if they so please. In time it might become a de facto standard, but I am not waiting with bated breath.
ONVIF should forget about device discovery, authentication, gateways and whatever else they are trying to solve. Just focus on three simple things: 1) getting video, 2) getting events, and 3) optical control.
To figure out which video formats the camera supports, we could agree on a URI: a client opens e.g. http://ip/ONVIF/videoformats and gets back a list of supported formats, as JSON for example. For each format, we could define a URI that returns the supported parameters, e.g. http://ip/ONVIF/formatcaps?formatid=[id]. It might tell you which resolutions the camera supports in the specified format.
I believe that it would be reasonable to define a set of options that must be supported (for H.264 VBR and CBR), and thus we wouldn't need to query those caps. But resolutions do differ on different cameras, and we need a way to query this.
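To make the idea concrete, here is a sketch of what such a discovery exchange could look like. The JSON shape and field names (`formats`, `resolutions`, `ratecontrol`) are entirely made up for illustration; the point is just that a client could parse one agreed-upon format from every camera:

```python
import json

# Hypothetical response a camera might return from
# http://ip/ONVIF/videoformats -- the shape is invented for illustration.
videoformats_response = """
{
  "formats": [
    {"id": "h264", "name": "H.264"},
    {"id": "mjpeg", "name": "Motion JPEG"}
  ]
}
"""

# Hypothetical response from http://ip/ONVIF/formatcaps?formatid=h264,
# listing the resolutions this particular camera supports.
formatcaps_response = """
{
  "formatid": "h264",
  "resolutions": ["1920x1080", "1280x720", "640x480"],
  "ratecontrol": ["CBR", "VBR"]
}
"""

formats = json.loads(videoformats_response)["formats"]
caps = json.loads(formatcaps_response)

print([f["id"] for f in formats])  # -> ['h264', 'mjpeg']
print(caps["resolutions"])        # camera-specific, hence worth querying
```

With a mandatory baseline (H.264 CBR/VBR, as argued above), the only thing a client would actually need to query is the camera-specific part, like resolutions.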
Once we have determined the camera's capabilities, the RTSP URI that you open should ALWAYS be the same - e.g. rtsp://ip/stream/[and then the capabilities that we have negotiated].
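A minimal sketch of what building such a uniform URI could look like. The path (`/stream/`) and parameter names are my own invention, not anything ONVIF defines; the point is that the same function would work against every camera:

```python
from urllib.parse import urlencode

def rtsp_uri(ip, format_id, resolution, ratecontrol, bitrate_kbps):
    """Build a hypothetical standardized RTSP URI from negotiated
    capabilities. Path and parameter names are invented here --
    the argument is that every camera should accept one scheme."""
    params = urlencode({
        "format": format_id,
        "resolution": resolution,
        "ratecontrol": ratecontrol,
        "bitrate": bitrate_kbps,
    })
    return f"rtsp://{ip}/stream/?{params}"

print(rtsp_uri("192.168.1.10", "h264", "1280x720", "CBR", 2000))
# -> rtsp://192.168.1.10/stream/?format=h264&resolution=1280x720&ratecontrol=CBR&bitrate=2000
```

No per-model lookup table, no digging through manuals: swap the camera and the same code still works.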
Instead, every single manufacturer wants to invent a different URI for their particular camera. That alone means the user/integrator has to dig this info out of the manual (who reads those?), enter it for every single camera, and update the URIs whenever they change the camera or update the firmware. It. makes. no. sense.
If camera manufacturers could agree on, at the very least, a standard RTSP URI we'd have come a long way - even if we omitted the capability discovery mechanism it would make things much simpler.
PTZ controls and changes to aperture, gain and so on could follow the same principle. How many ways do you need to express the desire to move the camera to the left, or to change the f-stop?
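The same uniform-URI idea, sketched for PTZ. Again, the path and parameters (`/ONVIF/ptz/move`, normalized pan/tilt/zoom in [-1.0, 1.0]) are assumptions of mine, not an existing API:

```python
from urllib.parse import urlencode

def ptz_command_uri(ip, pan=0.0, tilt=0.0, zoom=0.0):
    """Hypothetical uniform PTZ command: a relative move expressed as
    normalized values in [-1.0, 1.0]. URI and parameter names are
    invented -- the argument is only that one scheme should suffice."""
    params = urlencode({"pan": pan, "tilt": tilt, "zoom": zoom})
    return f"http://{ip}/ONVIF/ptz/move?{params}"

# "Move the camera to the left" should look the same on every camera:
print(ptz_command_uri("192.168.1.10", pan=-0.5))
# -> http://192.168.1.10/ONVIF/ptz/move?pan=-0.5&tilt=0.0&zoom=0.0
```

There is exactly one way to say "pan left"; everything camera-specific (motor speed, optics) stays behind the interface.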
Events are slightly different. The simplest support would be for the camera to send a TCP packet with an agreed-upon syntax to a configured IP address when "something happens". Alternatively, the server can connect and the camera can send the info downstream. More advanced events that carry metadata would require a little more work; I expect a simple key-value array could be sent from the camera to the server.
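A sketch of how trivial such a key-value event wire format could be. The `key=value` line syntax is invented for illustration, and the socketpair just stands in for a real TCP connection between camera and server:

```python
import socket

def encode_event(kv):
    """Serialize an event as key=value lines, terminated by a blank
    line. The wire format is invented for illustration."""
    return ("".join(f"{k}={v}\n" for k, v in kv.items()) + "\n").encode()

def decode_event(data):
    """Parse the key=value lines back into a dict."""
    lines = data.decode().split("\n")
    return dict(line.split("=", 1) for line in lines if "=" in line)

# Simulate camera -> server delivery over a TCP-like byte stream.
camera, server = socket.socketpair()
camera.sendall(encode_event({"event": "motion", "zone": "2", "ts": "1712345678"}))
camera.close()

received = b""
while chunk := server.recv(1024):
    received += chunk
server.close()

event = decode_event(received)
print(event["event"], event["zone"])  # -> motion 2
```

That is the entire "advanced" case: a handful of keys and values, parseable in a dozen lines, with no SOAP envelope in sight.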
When I started in this business around 2000, I drafted a (naive) universal spec. I spoke to Axis about it. At the time they said "everyone else can just use VAPIX" which was a great idea, because VAPIX is simple and very easy to use. Then they started suing people who did. Axis should scrap ONVIF and make VAPIX a standard.
But I guess being an armchair coach is always easy :)