The $400,000 Question: When Should AI Make Decisions in Your Business?
An Executive Brief on Strategic AI Automation
Futurist AJ Bubb, founder of MxP Studio and host of Facing Disruption, bridges people and AI to accelerate innovation and business growth.
In December 2024, Deloitte Australia signed a contract worth $440,000 AUD to deliver an independent assurance review for the Australian Department of Employment and Workplace Relations. The assignment seemed straightforward: review the IT system used to automate penalties in Australia’s welfare system.
The report Deloitte delivered was polished and authoritative. It contained detailed analysis, cited court judgments, and referenced academic research. It looked exactly like what you’d expect from a Big Four consulting firm.
Then someone actually read it carefully.
The quote from a federal court judgment? Fabricated. The academic research papers cited throughout? They didn’t exist. The footnotes and references? Wrong.
This wasn’t a small project handled by junior staff. This was Deloitte - one of the world’s premier professional services firms - delivering work to a government client. The kind of work that gets scrutinized. The kind where accuracy isn’t optional.
The Australian government demanded answers. Deloitte refunded $63,000 USD, published a revised version of the report, and became an international case study in what happens when AI-generated content bypasses proper human oversight.
The technology worked perfectly. Deloitte’s judgment about when to rely on it didn’t.
The Illusion of Progress
If this story sounds extreme, it shouldn’t. We’re watching it play out across industries with numbing regularity.
Air Canada’s chatbot promised a customer a bereavement fare policy that didn’t exist. When the customer held them accountable, Air Canada argued the chatbot was “a separate legal entity” responsible for its own actions. A tribunal wasn’t amused. The airline paid.
A legal tech company’s AI drafted briefs citing cases that never existed. Lawyers submitted them to court. Sanctions followed.
Marketing teams automate social media, only to have their AI post tone-deaf content during a crisis because nobody thought to add human oversight when the context changed.
The pattern is always the same: sophisticated technology, impressive demos, confident deployment - and then the moment when everyone realizes nobody asked the most important question.
Not “Can AI do this?”
But “Should AI do this, and under what conditions?”
The Real Crisis Isn’t Technical
Here’s what keeps me up at night: According to RAND Corporation, 80% of AI projects never make it past the pilot stage. Gartner reports that 85% of AI projects deliver inaccurate outcomes.
The common assumption is that these failures are technical - models that aren’t accurate enough, systems that aren’t robust enough, infrastructure that isn’t ready.
That assumption is wrong.
The failures are almost always about judgment. About organizations that can identify what AI is capable of but can’t systematically evaluate whether deployment is appropriate. About teams operating without a framework to assess the real risks they’re taking.
You’ve felt this pressure. The board asks why your competitors are “leveraging AI” and you’re not. Your team talks about “falling behind.” Industry analysts publish breathless reports about transformation and disruption. The CEO forwards articles with subject lines like “Is this us in 5 years?”
So you move fast. You pilot tools. You automate processes. You chase efficiency.
And sometimes - often - you create risk you didn’t fully understand and can’t effectively manage.
What Actually Matters
After two years of working with organizations implementing AI, I’ve realized the hardest part isn’t teaching people about large language models or prompt engineering or RAG architectures.
The hardest part is teaching people to slow down and think clearly about risk.
Think about what happened at Deloitte. This wasn’t a startup experimenting with new technology. This wasn’t a tech team running an unsanctioned pilot. This was one of the most respected professional services firms in the world, delivering work to a government client under a formal contract.
They had the expertise. They had the resources. They had every reason to get it right.
What they apparently didn’t have was a systematic way to assess when AI output needed human verification and when it could be trusted.
Because the truth is this: With enough time, money, and engineering effort, AI can probably do most tasks. The question that matters - the only question that matters - is whether it should.
That question has three components most organizations never systematically consider:
What happens when things go wrong? Not what happens on average. Not what happens in demos with cherry-picked examples. What happens in the worst case, when the AI fails in exactly the way you didn’t anticipate?
How quickly will you know about it? Errors caught in an hour are manageable. Errors discovered after a week - or a month, or when a government client demands a refund - are catastrophic.
Can you actually fix it? Some mistakes you can take back with an apology and a corrected email. Others require refunds, revised reports, and become international news stories about your firm’s quality control failures.
Impact. Detection speed. Reversibility.
Three questions that determine whether automation is strategic or reckless.
The Framework That Changes Everything
The Traffic Light Framework is almost embarrassingly simple. That’s the point.
Red means stop. Human judgment remains non-negotiable. AI can assist - doing research, preparing briefings, drafting materials - but humans make every decision and own every output. Legal work. Strategic decisions. Anything with serious consequences. When the stakes are high, speed isn’t the goal. Accuracy is.
Yellow means proceed with caution. AI does the heavy lifting, but qualified experts review everything before it goes live. Not junior team members rubber-stamping outputs. Not perfunctory checks that take thirty seconds. Real review by people who could do the task themselves and know what good looks like. Customer-facing content. First-draft contracts. Support responses. The reviewer’s expertise matters more than the AI’s capability.
Green means go. Automate confidently with spot-checks, not systematic review. These are the repetitive, low-stakes tasks draining your team’s time and energy. Expense categorization. Meeting scheduling. Data entry. Document formatting. When errors are obvious, fixes are fast, and consequences are minimal, you’re not being cautious by reviewing everything manually - you’re being inefficient.
The elegance is in the clarity. Every task gets classified. Every classification has clear rules about human involvement. No ambiguity about who’s responsible when something goes wrong.
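For teams that want to make the classification mechanical, here is a minimal sketch of how the three questions - impact, detection speed, reversibility - might map to the three lights. The thresholds and labels are my own illustrative assumptions, not part of the framework itself; tune them to your organization's risk tolerance.

```python
from dataclasses import dataclass
from enum import Enum

class Light(Enum):
    RED = "human decides, AI assists"
    YELLOW = "AI drafts, qualified expert reviews before release"
    GREEN = "automate, spot-check samples"

@dataclass
class Task:
    name: str
    worst_case_impact: str  # "low" | "medium" | "high" - what happens when it goes wrong?
    detection: str          # "fast" | "slow" - how quickly will you know?
    reversible: bool        # can you actually fix it?

def classify(task: Task) -> Light:
    """Illustrative mapping of the three questions to a light.
    These specific thresholds are assumptions for the sketch,
    not rules stated by the framework."""
    if task.worst_case_impact == "high" or (not task.reversible and task.detection == "slow"):
        return Light.RED
    if task.worst_case_impact == "medium" or not task.reversible or task.detection == "slow":
        return Light.YELLOW
    return Light.GREEN

# Example classifications (hypothetical tasks):
print(classify(Task("legal brief citations", "high", "slow", False)))     # Light.RED
print(classify(Task("support response drafts", "medium", "fast", True)))  # Light.YELLOW
print(classify(Task("expense categorization", "low", "fast", True)))      # Light.GREEN
```

The code isn't the point; the forcing function is. Writing the rules down means nobody can automate a task without first answering all three questions.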
What Success Actually Looks Like
Let me tell you a different story.
Duolingo wanted to expand their educational content into forty languages. Traditional translation was slow and expensive. AI translation was fast and cheap but potentially inaccurate.
So they started with 100% human review - yellow light treatment. Native speakers checked every translation before publication. They monitored quality metrics obsessively. They tracked which types of errors appeared and refined their approach.
After three months of validated quality, they moved to spot-checking 10% of translations for established content types. Green light, earned through demonstrated performance.
The result? They reduced translation costs by 40% while maintaining quality scores. New language courses launched three times faster than before.
The key wasn’t the AI. The key was the systematic assessment of risk and the discipline to earn each step of increased automation through proven results.
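That discipline can be encoded directly. Below is a sketch of a review-rate policy that only steps down after sustained quality. The 10% spot-check rate and the three-month proving period come from the story above; the 95% quality target is an assumption I've added for illustration, not Duolingo's actual system.

```python
def review_rate(days_at_or_above_target: int, quality_score: float,
                target: float = 0.95, proving_period_days: int = 90) -> float:
    """Return the fraction of outputs a human reviews.

    Start at 100% review (yellow light). Drop to 10% spot-checks (green)
    only after quality has held at the target for the full proving period.
    Any dip below target resets to full review."""
    if quality_score < target:
        return 1.0   # quality slipped: back to reviewing everything
    if days_at_or_above_target < proving_period_days:
        return 1.0   # still proving out: review everything
    return 0.10      # earned green light: spot-check a sample

print(review_rate(days_at_or_above_target=30, quality_score=0.97))   # 1.0 - still earning it
print(review_rate(days_at_or_above_target=120, quality_score=0.97))  # 0.1 - green light
print(review_rate(days_at_or_above_target=120, quality_score=0.90))  # 1.0 - reset
```

Notice the asymmetry: trust is earned slowly and revoked instantly. That's what "earning each step through proven results" looks like in practice.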
The Risk Nobody’s Talking About
Here’s what worries me most: Classification isn’t static.
The automation you deployed six months ago under one set of conditions might need different oversight today.
Your social media automation works great - until your company becomes involved in a public controversy and suddenly every post is being screenshotted and analyzed. What was low-stakes yesterday is high-stakes today.
Your customer service chatbot handles routine inquiries well - until it starts making promises that create legal obligations. Now you're Air Canada, arguing before a tribunal that your chatbot is its own legal entity.
Your pricing algorithm optimizes effectively - until someone notices it’s subtly discriminatory and you’re facing regulatory action.
Scale changes risk profiles. Context changes risk profiles. New regulations change risk profiles.
Smart organizations don’t just classify tasks once. They reassess quarterly and have clear triggers for when to immediately add more human control. They understand that “set it and forget it” is how you end up making front-page news for the wrong reasons.
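One lightweight way to build that reassessment in: keep a standing list of escalation triggers and check it whenever conditions change, not just at the quarterly review. The specific triggers below are illustrative examples drawn from the scenarios above, not an exhaustive list.

```python
# Illustrative escalation triggers: any one of these should force an
# immediate reclassification review, ahead of the quarterly cycle.
ESCALATION_TRIGGERS = {
    "public_controversy",        # routine posts are suddenly high-stakes
    "new_legal_exposure",        # outputs now create binding promises
    "regulatory_change",         # new rules redefine what "compliant" means
    "order_of_magnitude_scale",  # 10x volume changes the risk profile
}

def needs_reclassification(active_events: set[str]) -> bool:
    """True if any active event matches a standing escalation trigger."""
    return bool(active_events & ESCALATION_TRIGGERS)

# A chatbot that has started making promises with legal weight:
print(needs_reclassification({"new_legal_exposure"}))  # True
```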
The Choice You’re Actually Making
Let’s return to Deloitte for a moment.
Here’s what makes their situation particularly instructive: according to their own statement, “the substance” of the review was retained. The actual analysis, the core findings, the recommendations - those were apparently sound.
What failed were the citations. The academic credibility. The supporting evidence that makes the difference between a professional deliverable and something that looks professional but can’t withstand scrutiny.
In other words, they got the hard part right and failed on what should have been the easy part: verification.
That’s the insidious thing about AI errors. They don’t look like errors. They look authoritative. They’re grammatically perfect, properly formatted, and confidently stated. The fabricated court quote probably read better than the real one would have. The nonexistent research papers probably had perfectly plausible titles.
Someone at Deloitte made a call - probably unconsciously, probably under time pressure - that this work didn’t need the level of verification that would have caught those errors. Maybe they thought AI-generated citations were low-risk. Maybe they assumed the AI wouldn’t fabricate sources. Maybe they simply didn’t have a framework to assess when AI output needed human verification.
Whatever the reason, the result was the same: a $63,000 refund, a revised report, and a case study that will be taught in professional services firms for years as an example of what not to do.
You’re going to automate. That’s not the question.
Your competitors are already doing it. Your team expects it. Your customers will increasingly demand the speed and efficiency it enables.
The question is whether you’ll automate strategically or recklessly.
Whether you’ll have a systematic way to assess risk or make decisions based on demos and pressure and the assumption that “AI is good at this kind of thing.”
Whether you’ll build sustainable competitive advantage or accumulate technical debt and brand risk that will eventually explode in ways you can’t predict or control.
The Traffic Light Framework isn’t revolutionary. It’s a structured application of risk management principles to automation decisions. But in an environment where everyone feels pressure to “do more with less” and fears missing out on AI’s potential, having a clear method to assess these decisions turns out to be surprisingly valuable.
The companies that will win aren’t the ones automating the most tasks the fastest.
They’re the ones automating the right tasks, with appropriate safeguards, creating value they can sustain and defend.
What This Means for You
You don’t need to automate everything this quarter.
You need to automate strategically. You need to know the difference between tasks where AI assistance makes you faster and tasks where AI autonomy creates unmanaged risk. You need systems that learn from each implementation instead of repeating the same mistakes.
Most importantly, you need to answer one question clearly and honestly for every automation you consider:
“What happens when this goes wrong - not if, but when - and can we live with those consequences?”
If you can answer that question and still sleep well at night, automate.
If you can’t, slow down. Add oversight. Build capability. Earn the right to automate through demonstrated performance and proven safeguards.
The goal isn’t speed.
The goal is judgment.
And judgment is what separates the organizations that will thrive with AI from those that will become cautionary tales about moving too fast without thinking clearly about risk.
AJ Bubb is a futurist, innovation strategy consultant, and founder of MxP Studio. He helps organizations navigate AI implementation through practical, risk-based frameworks that create sustainable value. His work has appeared in Forbes, and he hosts the Facing Disruption podcast for 15,000+ innovation leaders. Learn more at mxp.studio.