The debate about whether AI replaces humans misses the part the data actually shows. The combination wins. Not because of optimism. Because the numbers say so. Three industries, three studies, one pattern.

Radiology: human plus AI roughly doubled the catch rate for malignant nodules versus AI alone

Radiology has had AI tools longer than most fields. The studies are mature enough to tell us something real.

A structured review published in 2025 looked at AI deployments across radiology workflows. The consistent picture was that AI on its own missed cases the human caught, and the human on their own missed cases the AI caught. Combined, the catch rate for malignant nodules approximately doubled compared to the AI working alone.1

That review also reported AI-driven reductions of 30% to 75% in scan time and 30% to 50% faster reporting, with no change in patient mortality.1 Volume went up. Time per case went down. Quality held.

A separate study tracking more than 100,000 pulmonary embolism scans showed a more nuanced picture of how humans and AI actually disagree. When the AI flagged an embolism, radiologists agreed with the call 84% of the time. When the AI predicted no embolism, they agreed 97% of the time. The interesting part was that disagreement changed over time. In year one of deployment, radiologists rejected AI positive flags 30% of the time. By year two, the rejection rate had dropped to 12%.2

That last number is what most people miss. The combination got better as both sides learned what the other was good at. Static replacement does not show this pattern. Collaboration does.

Customer service: less experienced agents got more skilled, faster

The Stanford and MIT study of more than 5,000 customer service agents at a Fortune 500 company is the cleanest data we have on human plus AI in a high-volume work environment.

On average, agents using a generative AI assistant resolved 14% more issues per hour. That average does not tell the whole story.

When researchers broke the data down by agent experience, less skilled and less experienced workers improved most. They resolved 34% more issues per hour. Agents with two months on the job who used the AI performed at the level of agents with six months on the job who did not have it.3

The senior agents barely moved. The junior agents leapt forward.

The takeaway is not that AI replaces senior people. The takeaway is that AI shortens the apprenticeship curve. A new hire becomes useful faster. The senior person spends less time correcting basic mistakes and more time on the calls that actually need their judgment.

Consulting: AI helped the lower performers more than the top ones

The Harvard Business School and Boston Consulting Group study had 758 consultants run through 18 realistic consulting tasks. Half had GPT-4 access. Half did not.4

The group with AI completed 12.2% more tasks and finished them 25.1% faster. The quality of their work was rated more than 40% higher than the control group's.

The most interesting finding was the spread. Below-average consultants improved by 43%. Above-average consultants improved by 17%. The combination compressed the performance gap between the bottom and the top of the team.

For a small business with a small team this matters more than the average. If your weakest hire becomes meaningfully better and your strongest hire becomes a little better, your team is now operating at a higher floor without changing who is in it.
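To make the floor effect concrete, here is a small worked sketch in Python. The baseline scores (4.0 and 6.0 on an arbitrary quality scale) are hypothetical, and applying the study's group-level gains of 43% and 17% to two individual hires is an assumption made only for illustration.

```python
# Hypothetical baseline scores on an arbitrary quality scale (assumed, not from the study).
baseline = {"weakest hire": 4.0, "strongest hire": 6.0}

# Relative gains reported above for below-average and above-average consultants.
# Applying group-level gains to individual hires is an illustrative assumption.
uplift = {"weakest hire": 0.43, "strongest hire": 0.17}


def apply_uplift(scores, gains):
    """Return each person's score after their relative gain is applied."""
    return {name: round(score * (1 + gains[name]), 2) for name, score in scores.items()}


with_ai = apply_uplift(baseline, uplift)

# The "floor" is the weakest score on the team; the "gap" is top minus bottom.
floor_before = min(baseline.values())
floor_after = min(with_ai.values())
gap_before = round(max(baseline.values()) - min(baseline.values()), 2)
gap_after = round(max(with_ai.values()) - min(with_ai.values()), 2)

print(f"floor: {floor_before} -> {floor_after}")
print(f"gap:   {gap_before} -> {gap_after}")
```

With these assumed numbers the floor rises from 4.0 to 5.72 while the gap between the two hires narrows from 2.0 to 1.3. Both people improve, but the weaker one closes distance on the stronger one.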

What the three industries have in common

Three different fields. Three different studies. One pattern.

AI catches things humans miss. Specifically, the low frequency cases. Things humans see rarely enough that pattern recognition is weak.

Humans catch things AI misses. Specifically, the edge cases. The context that does not fit the training data. The patient with a backstory the model never saw.

The combination is better than either side alone. Not by a small margin. By enough that operating without it is starting to look like the more expensive option.

The gain is bigger for the weaker side. Junior people improve more than senior ones. Below average performers improve more than top ones. The combination raises the floor faster than it raises the ceiling.

What stops this from working

Three things.

The human stops thinking. When the AI is mostly right, the human starts rubber stamping. Then the AI is wrong on case 47 and the human ships the wrong call.

The AI is wrong in a way the human cannot see. If the model fabricates a citation, a number, or a name, the human reading it might not catch the fabrication. This is why every output that matters needs to be verifiable, not just plausible.

The system has no feedback loop. If the human disagreed with the AI and was right, that should feed back into how the AI gets used. If it does not, the same disagreement keeps happening.

The radiology data above shows what happens when the feedback loop works. Radiologists' rejection rate for AI positive flags dropped from 30% in year one to 12% by year two, which is consistent with the humans learning when to trust the AI and the AI getting better at the calls humans trusted.
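One lightweight way to build that feedback loop is to log every disagreement along with who turned out to be right, then review the tally by type of case. The sketch below is a minimal illustration in Python. The case types, the sample records, and the function name are all hypothetical.

```python
from collections import Counter

# Hypothetical log of cases where the human and the AI disagreed.
# Each record is (case_type, who_turned_out_to_be_right).
disagreements = [
    ("billing", "human"),
    ("billing", "human"),
    ("billing", "ai"),
    ("scheduling", "ai"),
    ("scheduling", "ai"),
]


def human_win_rate(records):
    """For each case type, the share of disagreements the human got right."""
    totals = Counter(case for case, _ in records)
    human_wins = Counter(case for case, winner in records if winner == "human")
    return {case: human_wins[case] / totals[case] for case in totals}


rates = human_win_rate(disagreements)
for case, rate in rates.items():
    print(f"{case}: human was right in {rate:.0%} of disagreements")
```

A high human win rate for a case type suggests the tool should not be trusted there yet. A low one suggests the human may be overriding calls the tool handles well.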

How a small business uses this

Pick one role where the work is repetitive and the stakes per decision are moderate. Customer service, scheduling, intake, drafting, data entry.

Pair the human in that role with one AI tool that handles the structured parts. Let the human handle the judgment parts.

Track two things: how much faster the work gets done, and how often the human overrides the AI. Both numbers should move in the right direction over a few months. If they do not, the tool is wrong for the work or the workflow is wrong for the tool.
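The two numbers can be tracked with very little tooling. The sketch below is one possible way to do it in Python, assuming each handled case is logged with the month, the minutes spent, what the AI suggested, and what the human finally decided. The field names and sample records are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Decision:
    month: str         # e.g. "2025-01"
    minutes: float     # time the human spent on the case
    ai_answer: str     # what the tool suggested
    final_answer: str  # what the human actually decided


def monthly_metrics(decisions):
    """Average handling time and human override rate for each month."""
    by_month = defaultdict(list)
    for d in decisions:
        by_month[d.month].append(d)

    report = {}
    for month in sorted(by_month):
        rows = by_month[month]
        # An override is any case where the human's decision differs from the AI's.
        overrides = sum(1 for d in rows if d.ai_answer != d.final_answer)
        report[month] = {
            "avg_minutes": sum(d.minutes for d in rows) / len(rows),
            "override_rate": overrides / len(rows),
        }
    return report


# Hypothetical sample log for two months.
log = [
    Decision("2025-01", 12.0, "refund", "refund"),
    Decision("2025-01", 10.0, "refund", "escalate"),
    Decision("2025-02", 8.0, "escalate", "escalate"),
    Decision("2025-02", 6.0, "refund", "refund"),
]

report = monthly_metrics(log)
for month, stats in report.items():
    print(month, f"{stats['avg_minutes']:.1f} min", f"{stats['override_rate']:.0%} overridden")
```

A spreadsheet with the same four columns would work just as well. The point is that both numbers come from the same log, so they can be read side by side each month.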

This is the only configuration where the studies above translate into a small business reality. Without that structure, you are not running a human plus AI system. You are running either an unsupervised AI or an unaided human. Both of those are worse than the combination.

Sources

  1. "AI in radiology and interventions: a structured narrative review of workflow automation, accuracy, and efficiency gains" (2025). PMC. 30 to 75% scan time reduction; 30 to 50% faster reporting; AI plus human approximately doubles malignant nodule detection vs. AI alone. pmc.ncbi.nlm.nih.gov
  2. "Human-AI Collaboration in Radiology: The Case of Pulmonary Embolism" (2026). arXiv 2601.13379. Yale, Stanford Hoover. arxiv.org
  3. Brynjolfsson, Li, Raymond (2023). "Generative AI at Work." NBER Working Paper 31161. Stanford Digital Economy Lab and MIT. nber.org
  4. Dell'Acqua et al. (2023). "Navigating the Jagged Technological Frontier." Harvard Business School and Boston Consulting Group. 758 consultants, 18 tasks, 12.2% more tasks and 25.1% faster with GPT-4. Performance gap compression. hbs.edu
