There is a factually correct answer to this, and it should not be based on other users' numbers. For example, someone might say 2Mb/s and another person might say 4Mb/s, and they may well be right for their scenarios and camera models. The big issue is that the 'right' answer depends on your specific conditions.
You should use VBR with a cap, but what the cap should be depends on a few factors:
- What is the frame rate? The average and maximum required bit rates increase as frame rate increases (see: Testing Bandwidth vs Frame Rate).
- What are the scenes you are monitoring? Cameras covering busy areas (e.g., an intersection) need higher caps than those monitoring empty ones (e.g., stairwells). Even within your 1000+ cameras, some will likely need higher caps than others.
- What camera models are you using? Even if all the cameras use H.264 with the same profile, resolution, and compression, the bit rates they consume will vary depending on the sensor used and the image settings of each camera.
The main reason for using a cap is night time. Whether or not you are using IR, and even if your night-time scenes are moderately bright, you are almost always going to see bandwidth spikes at night, compared to the day, that add nothing in terms of usable video quality (see: Tested: Lowering Bandwidth at Night is Good).
What you want to do is set the cap so that it is higher than whatever each camera needs during the day but lower than its night-time spikes. Here is one real-world example: Camera A consumes a max of 2Mb/s during the day but spikes to a steady 8Mb/s at night. You could easily set the cap at 3Mb/s and save a lot of bandwidth / storage.
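To make the Camera A arithmetic concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a vendor tool. The function names, the 1.5x headroom factor over the daytime peak, and the 12 hours of night are hypothetical choices made for this example. It also assumes the capped stream sits at the cap all night, in place of the uncapped 8Mb/s.

```python
def pick_cap(day_peak_mbps: float, night_mbps: float, headroom: float = 1.5) -> float:
    """Hypothetical helper: cap = daytime peak x headroom (1.5 is an assumed factor).

    Raises ValueError if that cap would not be below the night-time rate,
    i.e. the cap would never actually limit the night spikes.
    """
    cap = day_peak_mbps * headroom
    if cap >= night_mbps:
        raise ValueError("cap would not limit night-time bandwidth")
    return cap


def daily_savings_gb(night_mbps: float, cap_mbps: float, night_hours: float) -> float:
    """Decimal GB saved per day, assuming the uncapped stream stays at night_mbps
    for the whole night and the capped stream stays at cap_mbps."""
    saved_mbps = max(night_mbps - cap_mbps, 0.0)
    megabits = saved_mbps * night_hours * 3600  # Mb/s x seconds of night
    return megabits / 8 / 1000  # megabits -> megabytes -> gigabytes


# Camera A from the example: 2 Mb/s daytime peak, steady 8 Mb/s at night.
cap = pick_cap(2.0, 8.0)
print(f"cap: {cap} Mb/s")
print(f"saved per day (12 h night, assumed): {daily_savings_gb(8.0, cap, 12):.1f} GB")
```

With these assumptions the cap comes out at 3Mb/s, and cutting 5Mb/s for 12 hours saves about 27 GB per camera per day. Across a large fleet, that is where the bandwidth and storage savings come from. The check in `pick_cap` reflects the rule above: a cap only helps if it sits between the daytime need and the night-time spike.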
As for your second question:
"Also, for 30 days of video, how large of a file on average are you seeing given these resolutions?"
If you can describe your camera models, scenes, etc., then we can give a better estimate. Otherwise, abstractly, you can reasonably see a 20x difference in storage consumption for a given resolution, simply because of differences in frame rate, compression level, cap used, camera type, scenes monitored, etc. I am not even counting motion vs. continuous recording, which would also have an impact.
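The underlying storage arithmetic is simple once you know a camera's average bit rate. The Python sketch below is a hypothetical helper, not a vendor calculator. The sample bit rates are illustrative inputs rather than measurements, and `record_fraction` is an assumed, rough stand-in for motion-based recording (the share of time actually recorded).

```python
SECONDS_PER_DAY = 86_400


def storage_gb(avg_mbps: float, days: int = 30, record_fraction: float = 1.0) -> float:
    """Decimal GB for one camera: average bit rate x recorded seconds.

    record_fraction < 1.0 roughly approximates motion-based recording
    (fraction of the time the camera is actually recording).
    """
    seconds = days * SECONDS_PER_DAY * record_fraction
    return avg_mbps * seconds / 8 / 1000  # megabits -> megabytes -> gigabytes


# Illustrative average bit rates only; measure your own cameras.
for avg in (0.5, 2.0, 10.0):
    print(f"{avg:>4} Mb/s -> {storage_gb(avg):,.0f} GB per 30 days (continuous)")
```

Continuous recording for 30 days works out to 162 GB at 0.5Mb/s and 3,240 GB at 10Mb/s. Both bit rates are plausible for the same resolution depending on the factors above, which is why the same resolution can show a 20x spread in storage.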