From what you describe, it sounds like you have a network with several hops, possibly some bridging, and maybe even some routing.
Ignoring pros and cons of physical access to the recorders for things like maintenance/environment, etc., you're basically weighing two approaches:
1) Recorders closest to the VMS clients/operators (again, from a pure networking perspective, not a physical one) will minimize latency when accessing stored video and generally make the system seem more responsive.
2) Recorders closest to the cameras will minimize chances of lost or degraded video due to network latency, network outages, etc.
Increased costs might not be as bad as first assumed, since most larger deployments will end up with more than one recording platform anyway. So, you might end up with only 2x-3x the minimum amount of required equipment, even if you have more than just 2 remote sites. There may also be cost savings in network infrastructure from not having to aggregate all the data back to a single point.
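To put a rough number on the aggregation point, here is a quick back-of-the-envelope sketch in Python. The site names, camera counts, and the 4 Mbit/s per-camera bitrate are all made-up assumptions for illustration; plug in your own figures. It only compares the recording traffic that has to reach one central recorder against what stays on each site's local network when recorders sit near the cameras.

```python
def aggregate_mbps(cameras_per_site, mbps_per_camera):
    """Recording traffic (Mbit/s) that must reach a single central recorder."""
    return sum(count * mbps_per_camera for count in cameras_per_site.values())


def per_site_mbps(cameras_per_site, mbps_per_camera):
    """Recording traffic (Mbit/s) that stays local when each site has its own recorder."""
    return {site: count * mbps_per_camera for site, count in cameras_per_site.items()}


# Hypothetical deployment: names, counts, and bitrate are assumptions.
sites = {"site_a": 40, "site_b": 25, "site_c": 10}
BITRATE_MBPS = 4  # assumed average per-camera stream

print("Central recorder must ingest:", aggregate_mbps(sites, BITRATE_MBPS), "Mbit/s")
for site, mbps in per_site_mbps(sites, BITRATE_MBPS).items():
    print(f"{site}: {mbps} Mbit/s stays on the local network")
```

With local recorders, the links between sites only need to carry whatever live or playback streams operators actually pull up, which is usually a fraction of the full recording load.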
The "best" solution might depend on how the overall system will be used. If it's weighted more towards real-time use and loss of recorded video is acceptable, then centralized recorders might be "better". If the system is likely going to be part of crime reduction/prosecution, then you might want to ensure the highest chance of getting the best possible recording, which would tend to mean recorders close to the cameras.
As you might see in some other discussions, there can be hybrid approaches depending on the equipment and budget: multicast to both local and remote recorders, SD cards in cameras (generally not efficient for pulling video from, but "in case of emergency" far better than nothing at all), unicast to multiple recorders, etc.