Police forces in England and Wales have been ordered to immediately halt the use of AI tools in court statements. The directive came from senior officials after it emerged that forces had been deploying commercial AI — including Microsoft Copilot — to draft official evidence without any prior assessment. The accompanying “solution”: tell officers to review everything the AI produces. That is not a solution. It is the problem restated.
How UK Police Got Here
The backstory starts in November 2025. West Midlands Police used Microsoft Copilot to compile an intelligence report for a Europa League match between Maccabi Tel Aviv and Aston Villa. Copilot obliged by inventing a match between Maccabi Tel Aviv and West Ham — a game that has never been played — and populated it with reports of “violent clashes and hate crime offences.” West Ham was playing Olympiacos that week.
Police used that hallucinated report to justify banning Maccabi fans from attending the real match. When the error surfaced, Chief Constable Craig Guildford first told Parliament no AI had been involved. He later admitted the truth in writing. In January 2026, Home Secretary Shabana Mahmood said she had lost confidence in his leadership. Guildford resigned. And by February 2026, 21 police forces were still using Copilot, even after a police chief had lost his job over it.
The Directive and Its Gap
The national halt directive addresses the symptom, AI-drafted court statements, but the accompanying policy reveals how little has been learned. Officials told forces that “all forces will have a policy that says, ‘Check everything that it produces.’” The Hacker News discussion on this story cut straight to the problem in three words: “Check against what?”
When source material is body camera audio processed by AI into a transcript, then processed again into a report, errors compound at each step and become invisible. A reviewer reading the final document has no original to compare against. They see a professional-sounding statement, formatted like a real one, written in the register of a real one. Human complacency does the rest. The “review it” mandate is how you get an officer swearing under oath to what a machine believed it heard.
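A back-of-the-envelope sketch of that compounding, assuming each stage independently preserves a given detail with some fixed probability. The stage names and accuracy figures are hypothetical and chosen only for illustration; they are not measurements of Copilot, Draft One, or any other real system.

```python
from math import prod


def chance_detail_survives(stage_accuracies):
    """Probability that one detail passes through every stage unaltered,
    assuming the stages err independently (a simplifying assumption)."""
    return prod(stage_accuracies)


# Hypothetical per-stage accuracies; illustrative numbers only.
pipeline = {
    "audio_to_transcript": 0.95,
    "transcript_to_report": 0.95,
}

survival = chance_detail_survives(pipeline.values())
print(f"Chance one detail survives both stages: {survival:.4f}")

# Chance that a statement containing 20 such details has no error at all.
details = 20
all_clean = survival ** details
print(f"Chance all {details} details survive: {all_clean:.4f}")
```

Under these invented figures, a detail that each stage handles correctly 95% of the time comes through both stages about 90% of the time, and the odds that a statement with 20 such details is entirely clean fall to roughly 13%. The exact numbers do not matter; the point is that per-stage error rates multiply, and the reviewer sees only the final output.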
The Same Pattern, Everywhere
The UK is not alone. Connecticut’s Chief State’s Attorney imposed a moratorium on AI-generated police reports after the same failure pattern emerged with Axon’s Draft One software, which converts body camera audio into written statements. Defense lawyers in Connecticut pointed out that officers would effectively be signing off on whatever a computer inferred from chaotic, multi-speaker, emotionally charged scenes. One illustrative case from Utah: a body camera captured a television playing in the background, and the AI generated a report stating that an officer had “turned into a frog.”
The legal fallout is mounting globally. Courts have documented over 1,200 fabricated citations generated by AI in legal filings. US courts imposed more than $145,000 in sanctions for AI hallucinations in Q1 2026 alone. In the UK, the High Court in Godwin v Godwin found it could not be confident that AI-assisted witness statements reflected the witnesses’ own words and ordered that the evidence be treated with caution. This pattern of AI fabricating details in official documents is not limited to law enforcement — but the stakes in criminal justice are uniquely severe.
What “Properly Assessed” Actually Requires
Officials said forces had deployed AI “before it had been properly assessed.” Proper assessment in a criminal justice context means testing for hallucinations at scale under adversarial conditions, with legal disclosure requirements built in from the start. It means establishing what happens to a case when an AI-drafted report is later found to contain fabricated details. It means giving defendants the right to know which parts of the evidence against them were AI-generated.
What it does not mean is deploying a commercial productivity tool, putting “please review the outputs” in the policy manual, and calling it governance. Police forces in England and Wales were doing exactly that. Now a police chief is gone, a national directive has been issued, and 21 forces were still running the same tool months after the first public failure. The pattern is not a bug in one deployment. It is what happens when efficiency is treated as a justification and assessment is treated as a formality that comes later, if it comes at all.