Can AI Think Like a Judge? A New Framework Tackles the Hardest Question in Legal Tech
A decade-long competition and a new research paper reveal why legal reasoning remains AI’s toughest challenge and what it will take to overcome it
For over ten years, computer scientist Randy Goebel and his colleagues in Japan have been quietly running one of the most revealing experiments in artificial intelligence: a legal reasoning competition based on the Japanese bar exam. The challenge is to have AI systems retrieve the relevant laws and then answer the core question at the heart of every legal case: was the law broken? That yes/no decision, it turns out, is where AI stumbles hardest. And that struggle has profound implications for how, and whether, AI can be ethically and effectively deployed in courtrooms, law offices, and judicial systems under pressure to deliver justice quickly and fairly.
A new paper, “LLMs for legal reasoning: A unified framework and future perspectives”, builds on this competition and outlines the types of reasoning AI must master to “think” like legal professionals. The accompanying article, “Is AI ready for the courtroom?”, explores the stakes and shortcomings of current AI tools, especially large language models (LLMs), in legal contexts. Together, they offer a roadmap for how AI might one day support, not replace, human judgment in law.
The Problem: Why Legal Reasoning Is So Hard for AI
Legal reasoning isn’t just about reading laws. It’s about interpreting them in context, weighing competing facts, and constructing plausible narratives. The authors of the paper are tackling the fundamental problem of how to equip AI systems with the ability to reason like lawyers and judges. That means moving beyond pattern recognition and text prediction to something deeper: logical inference, contextual understanding, and ethical judgment. Current LLMs can summarize documents and mimic legal language, but they often “hallucinate” facts or fail to connect legal principles to real-world scenarios. In high-stakes environments like courtrooms, that kind of error is dangerous.
The Three Types of Reasoning AI Must Learn
To function effectively in legal settings, AI must master three distinct types of reasoning:
Case-Based Reasoning
Rule-Based Reasoning
Abductive Reasoning
Each plays a different role in how legal professionals think through problems.
Case-Based Reasoning: Learning from Past Decisions
This is stare decisis, the legal doctrine of standing by precedent. Lawyers and judges often look at previous cases to see how similar situations were handled. AI systems using case-based reasoning compare the facts of a new case to past ones, identifying patterns and outcomes that might apply. It’s like saying, “In a similar case last year, the court ruled X, so that might apply here too.” LLMs are relatively good at this because they’ve been trained on massive datasets that include legal texts. But they still struggle to weigh which cases are most relevant or how subtle differences might change the outcome.
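To make the idea concrete, here is a minimal sketch of case-based retrieval, using invented cases and a crude keyword-overlap score. It is not the competition system or anything described in the paper; it only shows the shape of the comparison step.

```python
# Illustrative sketch only: a toy case-based reasoner that ranks past cases
# by how much their facts overlap with a new case. The cases, fact labels,
# and outcomes below are invented for demonstration.

PAST_CASES = [
    {"id": "Case A",
     "facts": {"shop", "item", "concealed", "left", "no_payment"},
     "outcome": "theft established"},
    {"id": "Case B",
     "facts": {"shop", "item", "forgot", "returned", "paid_later"},
     "outcome": "theft not established"},
]

def jaccard(a: set, b: set) -> float:
    """Overlap between two fact sets, from 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def closest_precedents(new_facts: set, top_k: int = 1):
    """Return past cases ranked by factual similarity to the new case."""
    ranked = sorted(PAST_CASES,
                    key=lambda c: jaccard(new_facts, c["facts"]),
                    reverse=True)
    return ranked[:top_k]

new_case = {"shop", "item", "concealed", "left"}
for precedent in closest_precedents(new_case):
    print(precedent["id"], "->", precedent["outcome"])
```

The hard part, as the article notes, is not computing similarity but deciding which factual differences actually matter, something a raw overlap score cannot capture.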
Rule-Based Reasoning: Applying the Law to the Facts
This is the bread and butter of legal analysis. Rule-based reasoning involves taking written laws such as statutes, regulations, and codes, and applying them to the facts of a specific case. For example, if the law says theft requires “intent to permanently deprive,” the AI must determine whether that intent existed in the case at hand. AI can handle this to a degree, especially when the rules are clear and the facts are straightforward. But real-world cases often involve ambiguity, conflicting rules, or exceptions that require human judgment.
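A simple way to picture rule-based reasoning is as a checklist of statutory elements. The sketch below is purely illustrative; the elements and facts are simplified stand-ins, not an encoding of any real statute.

```python
# Illustrative sketch only: a rule is satisfied only when every element is
# present. The elements here are invented simplifications.

from dataclasses import dataclass

@dataclass
class CaseFacts:
    took_property: bool
    property_belonged_to_another: bool
    intent_to_permanently_deprive: bool

def theft_established(facts: CaseFacts) -> bool:
    """Apply the rule: all elements must hold for the offence to be made out."""
    return (facts.took_property
            and facts.property_belonged_to_another
            and facts.intent_to_permanently_deprive)

# Clear facts are easy for a rule engine.
print(theft_established(CaseFacts(True, True, True)))   # True
# Ambiguous intent ("I meant to return it") is where human judgment enters;
# the engine can only work with whatever value it is handed.
print(theft_established(CaseFacts(True, True, False)))  # False
```

The brittleness is visible even in this toy: the difficult work is not evaluating the conjunction but deciding whether “intent to permanently deprive” is true in the first place.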
Abductive Reasoning: Building Plausible Narratives
This is where AI falters most. Abductive reasoning is about constructing the most plausible explanation for a set of facts. It’s the kind of thinking that asks, “What could have happened here?” and then builds a narrative that fits the evidence. In legal terms, it’s the difference between saying, “The man had a knife” and asking, “Did he stab the victim, or did something else happen?” This kind of reasoning requires imagination, context, and a sense of plausibility, qualities that current LLMs lack.
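One crude way to frame abduction computationally is “which candidate story accounts for the most evidence?” The sketch below, with invented evidence and hypotheses, shows that framing; real abductive reasoning also weighs prior plausibility and background knowledge, which this toy scoring ignores.

```python
# Illustrative sketch only: abduction reduced to evidence coverage.
# Evidence items and hypotheses are invented for demonstration.

EVIDENCE = {"man_had_knife", "victim_wounded", "no_witnesses"}

HYPOTHESES = {
    "he stabbed the victim":       {"man_had_knife", "victim_wounded"},
    "the victim was already hurt": {"victim_wounded"},
    "nothing happened":            set(),
}

def best_explanation(evidence: set, hypotheses: dict) -> str:
    """Pick the hypothesis that accounts for the largest share of the evidence."""
    return max(hypotheses, key=lambda h: len(hypotheses[h] & evidence))

print(best_explanation(EVIDENCE, HYPOTHESES))
```

Even here the gap is obvious: counting covered facts says nothing about whether a story is coherent, physically possible, or legally meaningful, which is exactly the judgment current LLMs struggle to supply.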
What the Researchers Did
Goebel and his team designed a framework that breaks down legal reasoning into these three types and tested AI systems against legal problems drawn from the Japanese bar exam. These problems are complex, realistic, and require nuanced judgment. The researchers didn’t just look at whether the AI could retrieve laws; they wanted to know whether it could reason through them.
The results were sobering. While AI systems could handle case-based and rule-based reasoning to some extent, they consistently failed at abductive reasoning. They couldn’t build coherent narratives or explain why a particular outcome made sense. Worse, they often invented facts or misapplied laws, making them unreliable in legal contexts.
This finding underscores the critical point that AI is not ready to make legal decisions on its own. But it also highlights a path forward. By developing specialized reasoning frameworks and combining them with LLMs, researchers may be able to build tools that support human decision-making rather than replace it.
Advancing AI in Legal and Judicial Systems
The framework proposed by Goebel’s team could help create modular AI tools tailored to specific legal tasks like retrieving statutes, summarizing cases, and identifying relevant precedents, without pretending to offer perfect judgment. This approach respects the complexity of law and the ethical stakes involved. Rather than chasing a “godlike” AI that can do everything, the researchers advocate for a toolbox of specialized systems, each designed to assist with a particular aspect of legal work. That’s a more realistic and responsible vision for AI in law.
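As a rough picture of that “toolbox” idea, the sketch below composes a few narrow, stubbed-out components and deliberately returns material for human review rather than a verdict. The function names and outputs are hypothetical and do not reflect the architecture in the paper.

```python
# Illustrative sketch only: a toolbox of narrow components, each with a
# limited job, assembled for a human reviewer. All bodies are stubs.

def retrieve_statutes(query: str) -> list[str]:
    """Return candidate statutes relevant to the query (stubbed)."""
    return ["Penal Code Art. 235 (theft)"]

def summarize_case(text: str) -> str:
    """Return a short factual summary of a case document (stubbed)."""
    return "Defendant took an item from a shop without paying."

def find_precedents(facts: str) -> list[str]:
    """Return similar past decisions for comparison (stubbed)."""
    return ["2019 ruling: concealment plus leaving the shop showed intent"]

def assist(case_text: str, query: str) -> dict:
    """Assemble material for a lawyer or judge; deliberately no verdict field."""
    return {
        "statutes": retrieve_statutes(query),
        "summary": summarize_case(case_text),
        "precedents": find_precedents(case_text),
    }

print(assist("...case file text...", "shoplifting with intent"))
```

The design choice worth noticing is what is absent: none of the components claims to decide the case, which mirrors the paper’s argument for specialized assistance over “godlike” judgment.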
If developed and deployed carefully, these tools could help legal professionals manage heavy caseloads, reduce delays, and improve access to justice. In countries like Canada, where the Supreme Court’s Jordan decision imposed strict ceilings on how long criminal cases can take to reach trial, such tools could prevent serious cases from being thrown out due to procedural delays.
But the risks are real. Misapplied AI could lead to wrongful convictions, biased outcomes, or erosion of public trust. That’s why transparency, oversight, and human judgment must remain central.
The Way Ahead
The next phase of research will likely focus on integrating these reasoning frameworks into real-world legal workflows. That means testing AI tools in live environments, gathering feedback from judges and lawyers, and refining the systems to ensure accuracy and reliability. It also means confronting ethical questions head-on. Who is responsible when AI gets it wrong? How do we ensure fairness and accountability? And how do we balance efficiency with justice?
The path forward isn’t about replacing lawyers; it’s about augmenting their capabilities with tools that respect the complexity of legal reasoning. Goebel’s work offers a blueprint for how to do that, one careful step at a time.