Every real AI system has two kinds of parts. There are the parts that must be exact: the billing, the validation, the access rules, the things an auditor will ask about. These are best written the old way: rules, written by people, that run the same every time. And there are the parts that need judgment, such as reading a complaint, drafting a reply, working out what is wrong with something. These are where a model earns its place, because the rules for them could not be written down even if you tried.
The line between those two kinds of parts is the most important decision in the build. It is also the easiest to get wrong, because it is rarely made explicitly. A team builds, and the line ends up wherever the building happened to put it.
A line drawn carelessly fails in one of two directions. Drawn too far one way, a model is handed something that should have been exact, and a system ends up with a model deciding what a number is, which is precisely what no one wants. Drawn too far the other way, rules are written for something that genuinely needs judgment, and the system is brittle, unable to handle the case no one anticipated.
Drawing the line well takes deliberate work. For each part of the system, the question is plain: does this have to be exact, or does it need judgment? The answer is usually clear once the question is actually asked. The discipline is in asking it for every part, rather than letting the answer fall out of how the code happened to be written, and in documenting the reasoning, so the line can be defended later to a reviewer, an auditor, or a board.
This is unglamorous work, and it is the work that most determines whether a system holds up. A carefully drawn line is not visible in a demo. It becomes visible in the third month, when the system meets the cases the demo did not, and handles them, because each case reaches the part of the system built to deal with it.