A lot of good (and curious) comments in this thread.
First off, IMO, the fundamental idea of "less is more" is certainly true here. Fewer devices in the field, fewer moving parts, and fewer things to break and maintain is certainly the ideal approach. You have to weigh that against the overall cost of the various solutions. I think when you look at all the factors and the state of technology today, you'll find that there aren't very many acceptable options for edge analytics where the analytic comes from a 3rd party running on some other camera hardware.
Axis has some native on-board edge analytics: tripwire and coarse motion detection. This isn't a terrible choice when you're looking for a simple solution covering a small area in a low-activity outdoor environment. By low activity I mean minimal motion at all, not just low people activity. It's fairly cheap and serves a certain set of use cases OK. I don't think even Axis would tell you it's a head-to-head competitive option against more advanced products, though.
You can also use analytics software designed to be hosted on a general-purpose camera. This option tends to perform a little better, but the cost increases significantly, as does the setup and maintenance complexity. If you have full-time operators who are highly trained in both systems, they may be able to get by with this, continuously "tweaking" things to slowly filter out false-alarm objects over time. It can be high-maintenance, but workable.
Server-based systems are pretty rare these days, but a few exist. They tend to fall between the "free" analytics and the good edge-based stuff, giving you decent performance in less demanding environments where the overall performance of the analytics isn't one of the top-3 measured criteria for the deployed system.
If you test all these options and look at total system cost, ongoing setup and maintenance costs, and false alarm rates, you'll likely conclude that your best option is a thermal camera optimally suited to the viewing task, paired with an analytics option optimally suited to the analyzing task.
There are lots of thermal camera options out there. One thing I always suggest people look at when evaluating cameras is the overall exposure control options on the camera. IMO, FLIR has some of the best knobs and dials in this capacity. Ideally, you can go with out-of-the-box settings, but as with optical cameras, you may need to tune things for the specific environment to get the highest-contrast image.
We (VideoIQ) can take an analog or IP feed from a thermal camera and add our edge storage and analytics with our Rialto A4 or I4 respectively. I think you'd find that relative to the other options discussed (and not discussed) so far, we'll end up giving you the best overall performance and flexibility in this scenario. Of course, I'm paid to say that, but I only cash those paychecks because I firmly believe in the product.
Simone also makes some good points in his comments: you need to tailor your design to what the customer really expects or is willing to tolerate. His numbers are a little more conservative than what I usually tell people, but they get to the same point. You are essentially building a sensor network, and if you try to push those sensors to the limit, you are likely to suffer decreased performance, just like with any other sensor.
For thermal cameras we've been working pretty closely with FLIR over the years. I've come up with the following chart that shows typical semi-conservative coverage ranges/areas for various FLIR cameras coupled to our Rialto analytics appliance:
This is neither a "best case" nor a "worst case" chart; it's designed to show typical deployed coverage ranges. This assumes you have decent object contrast, a clear shot (you're seeing most of the person, and they're not heavily obscured by foliage, for example), and so forth. It's designed to give enough pixels on target to analyze things properly, to intelligently IGNORE as well as DETECT. Also, this isn't showing all the FLIR models; it focuses heavily on the newer FC cameras, as those have been tremendously popular lately, but you can use the same basic HFOV data for other F-Series cameras not listed here.
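To make the "enough pixels on target" point concrete, here's a quick back-of-the-envelope sketch of the underlying geometry. The camera specs and target size below are illustrative assumptions, not published FLIR numbers; check the datasheet for your actual model.

```python
import math

def pixels_on_target(hfov_deg, sensor_width_px, range_ft, target_width_ft):
    """Approximate horizontal pixels across a target at a given range.

    Simple pinhole geometry: the scene width covered at `range_ft` is
    2 * range * tan(HFOV / 2); the target occupies a proportional share
    of the sensor's horizontal pixel count.
    """
    scene_width_ft = 2.0 * range_ft * math.tan(math.radians(hfov_deg) / 2.0)
    return sensor_width_px * (target_width_ft / scene_width_ft)

# Illustrative example: a hypothetical 640-pixel-wide thermal imager
# with a 24-degree HFOV, viewing a person ~1.5 ft wide at 500 ft.
px = pixels_on_target(hfov_deg=24, sensor_width_px=640,
                      range_ft=500, target_width_ft=1.5)
```

Halving the range doubles the pixels on target, which is why the chart's "typical" ranges sit well inside the distances at which a person is merely visible.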
Obviously, as you try to opt for longer ranges you run greater risk of various things in the environment preventing you from getting good clear shots over the entire distance. I don't recommend you try to get detection beyond about 1000' unless you fully understand the environment and all of the components of the solution.
The next question people usually ask is "what rule works best?". In most of our deployments it's pretty simple: you draw an ROI (Region of Interest) and tell the system to alert you when a person is active in that ROI during a specified time period. It can be 24/7, or only at certain times or on certain days. Many people initially think that some kind of tripwire rule is best, but in many cases a tripwire is a mask for a deficient product. It's used to reduce false alarms because the system is only looking at activity in a very narrow area, and then just looks for blobs of pixels crossing a line in a particular direction. This reduces false alarms by ignoring most of the image, but if something prevents you from seeing the person just as they are crossing the tripwire, they are "home free" in the scene relatively quickly.

We are able to analyze the entire FOV continuously and accurately, so we don't need to do tricks with the rules to filter out nuisance objects. Ideally, we see the person as they enter the ROI and trigger an alarm, essentially acting as a tripwire around the edges of the ROI. But if there is some rain that night, or the person has snuck in next to an overgrown bush, and we don't get to "see" them until they are 20' into the perimeter (and past the point where any tripwire would have been), we will still generate an alarm. By making the analytics smarter, we reduce the need to overthink the rules.
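The ROI rule described above boils down to three checks: is the object classified as a person, is the system armed at that moment, and is the object inside the region. Here's a minimal sketch of that decision logic; the `detection` dict, function names, and coordinates are my own illustrative assumptions, not VideoIQ's actual API.

```python
from datetime import time

def point_in_roi(px, py, roi):
    """Ray-casting point-in-polygon test; roi is a list of (x, y) vertices."""
    inside = False
    j = len(roi) - 1
    for i in range(len(roi)):
        xi, yi = roi[i]
        xj, yj = roi[j]
        # Toggle when the edge (j -> i) crosses the horizontal ray from (px, py)
        if (yi > py) != (yj > py) and px < (xj - xi) * (py - yi) / (yj - yi) + xi:
            inside = not inside
        j = i
    return inside

def should_alarm(detection, roi, start, end):
    """Alarm when a classified person is inside the ROI during the armed window.

    `detection` is a hypothetical dict like
    {"class": "person", "x": ..., "y": ..., "time": datetime.time(...)}.
    Handles armed windows that wrap past midnight (e.g. 22:00 to 06:00).
    """
    if detection["class"] != "person":
        return False  # nuisance objects (animals, debris, headlights) are ignored
    t = detection["time"]
    armed = (start <= t <= end) if start <= end else (t >= start or t <= end)
    return armed and point_in_roi(detection["x"], detection["y"], roi)

# Example: a square ROI, armed overnight from 22:00 to 06:00.
roi = [(0, 0), (100, 0), (100, 100), (0, 100)]
hit = should_alarm({"class": "person", "x": 50, "y": 50, "time": time(23, 30)},
                   roi, start=time(22, 0), end=time(6, 0))
```

Note that the rule fires anywhere inside the polygon, not just on its boundary, which is exactly why a late first detection (20' past the fence line) still produces an alarm where a pure tripwire would not.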
There are also "behavior" analytics options, but those have been more marketing gimmick than reality. Customers know what they want: "Tell me when a person is entering my property." We don't need to develop a pattern of behavior in the area; we know what we want to catch, where we want to catch it, and when we want to catch it. Also, if there are lots of intrusions, you run the risk of those behaviors becoming "normal" over time. Similarly, if the customer wants to run regular intrusion tests of their own on the system, you want to ensure those activities don't contribute negatively to the system's profile of the scene.
Sorry for the long reply, I wasn't intending to write that much, but hopefully it gives you some additional things to consider. As always, if you'd like to challenge my recommendations, you're welcome to get an eval unit and test it for yourself :)