AI Is Crushing Law School Exams—But Crumbling in Court With Costly Hallucinations

As impressive as AI may be at some tasks, lawyers who use these tools in their daily work face a slew of problems.

Image by Adobe Stock/Bluum

David L. Brown

June 20, 2025 09:29 AM

First, the positive news: In a just-released study, researchers found that a recently developed artificial intelligence model is scoring just as well as high-performing students on law school tests—a marked improvement over results from a similar study conducted a few years ago.

Less encouraging is what continues to happen when lawyers and law firms actually try to apply artificial intelligence as a drafting or research tool, particularly in litigation. If a series of recent, high-profile snafus in U.S. and international courts is any indication, AI appears to be hallucinating more often than guests at an ayahuasca retreat.

The legal profession has a lot riding on the AI revolution.

In most surveys, vast numbers of lawyers say they believe artificial intelligence will transform the way they work. And some see a bright future where AI helps remake the law firm business model to allow lawyers to focus on premium work rather than mundane and less-profitable tasks.

Those dreams could be delayed or derailed if firms—including large, elite players—continue to face errors, ethical lapses and sanctions when they use AI tools.

AI’s A+ Work

In early June, researchers from the University of Maryland Francis King Carey School of Law said they had tested OpenAI’s new reasoning model, o3, to determine how it would perform on the law school’s final exams. Answers were graded on the same curve as students’ exams, they said, and the experiment mirrored one conducted three semesters earlier with an older GPT model.

The previous model received grades ranging between B+ and D, the researchers said. The o3 model, on the other hand, “got three A+s, one A, one A-, two B+s and a B.” The highest grades were for questions involving constitutional law, property law and the legal profession. The lowest grade—a B—was in administrative law, although this appears to have occurred because the model’s data had not yet been updated to incorporate the Supreme Court case overturning the Chevron doctrine.

As Reuters reported, the research joins earlier studies showing that AI chatbots can pass the nation’s Multistate Professional Responsibility Exam, and can do so with higher scores than most human test-takers. One such study, from 2023, was conducted with GPT-4, an earlier model behind ChatGPT.

Making Mistakes

As impressive as those results may be, the problem for practitioners is the practical application of AI tools in their daily work. A legal scholar tracking the issue globally has identified 103 cases in 2025 alone containing hallucinations produced by generative AI. About two-thirds of those cases were in the United States.

While pro se litigants and small firms with limited resources are often most vulnerable to AI’s pitfalls, the problems are spreading to larger, better-heeled firms, as well.

According to news reports, lawyers from Butler Snow are facing potential sanctions in federal court over AI-generated errors made in a case involving abuses in the Alabama prison system. In May, a lawyer from the firm used ChatGPT to prepare filings that would expedite the deposition of a prisoner involved in the case. Four case citations included by ChatGPT did not exist.

U.S. District Judge Anna Manasco, quoted in an article by The Guardian, said the errors were “proof positive” that sanctions for misuse of AI tools have been insufficient. “If they were, we wouldn’t be here,” she said, according to The Guardian’s report.

Also in May, K&L Gates was sanctioned in the U.S. District Court for the Central District of California for failing to check citations in AI-generated material prepared by a co-counsel, even after the judge had flagged errors in the text.

"Plaintiff’s use of AI affirmatively misled me,” Judge Michael Wilner wrote. “I read their brief, was persuaded (or at least intrigued) by the authorities that they cited, and looked up the decisions to learn more about them – only to find that they didn’t exist. That’s scary.” Wilner added that “strong deterrence is needed to make sure that attorneys don’t succumb to this easy shortcut.”

A National and International Issue

During the last two years, lawyers in Wyoming, New York, Colorado, Massachusetts, Utah and Florida have faced sanctions for submitting documents that included citations and quotes concocted by AI. And the problems are not limited to the United States. Lawyers in Israel, the United Kingdom, Brazil, Canada, Australia, South Africa and Trinidad & Tobago, among others, have also turned in filings with fake citations.

For instance, in early June, the High Court of England and Wales warned U.K. lawyers that they could face contempt of court and even criminal charges if they cite cases and quotes invented by artificial intelligence. “Practical and effective measures must now be taken by those within the legal profession” to prevent AI-related errors, Judge Victoria Sharp, president of the King’s Bench Division of the High Court, wrote in a ruling joined by Judge Jeremy Johnson. The ruling was triggered by fictitious, AI-created material included in two recent cases before the court.

Problems arise because the large language models powering chatbots like ChatGPT “hallucinate.” That is, they perceive “patterns or objects that are nonexistent or imperceptible to humans, creating outputs that are nonsensical or altogether inaccurate,” according to a definition by IBM. This can occur, Google notes, because of “insufficient training data, incorrect assumptions made by the model, or biases in the data used to train the model.” Chatbots are also designed to be helpful, and a model may invent material to satisfy a user’s prompt.

Rules Multiply

Federal and state courts and bar associations have issued a flurry of opinions and rules during the last two years attempting to address lawyers’ AI usage and potential ethical pitfalls.

In 2024, the American Bar Association weighed in with Formal Opinion 512. While it acknowledged that generative AI is “a rapidly moving target,” the ABA said lawyers have three fundamental duties when using AI. They must: be competent and understand how the technology works and the risks associated with it; ensure client confidentiality is protected; and maintain strict oversight over AI-related work.

Local court rules and individual judges routinely go further than the ABA and local and state bar associations. A 2024 order in the U.S. District Court for the Eastern District of Pennsylvania is a typical example.

U.S. District Judge Gene E.K. Pratter requires lawyers to disclose the use of AI in any complaint, answer, motion, brief or other paper filed with the court and to certify that citations in AI-generated materials are accurate. Counsel must “in a clear and plain factual statement, disclose that generative AI has been used in any way,” Pratter’s standing order says, “and certify that each and every citation to the law or the record in the filing has been verified as authentic and accurate.”
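Orders like Pratter’s put the burden of verification squarely on counsel. As a purely illustrative sketch, not drawn from any court rule or commercial product, the short Python snippet below shows one way a reviewer might mechanically pull reporter-style citations out of a draft to build a manual verification checklist. The regex pattern, function name and sample citations are invented for demonstration, and the output is only a prompt for human checking; it cannot tell a real case from a hallucinated one.

```python
import re

# Hypothetical helper for building a citation-verification checklist from a draft.
# The regex is a rough, illustrative pattern for common U.S. reporter citations
# (e.g., "123 U.S. 456" or "45 F.4th 1012"); it will miss statutes, short forms,
# and parallel citations, and it does not validate that any cited case exists.
CITATION_PATTERN = re.compile(
    r"\b\d{1,4}\s+"                                                        # volume number
    r"(?:U\.S\.|S\.\s?Ct\.|F\.(?:2d|3d|4th)?|F\.\s?Supp\.(?:\s?2d|\s?3d)?)"  # reporter
    r"\s+\d{1,4}\b"                                                        # first page
)

def citation_checklist(draft_text: str) -> list[str]:
    """Return the unique reporter-style citations found in a draft, in order of appearance."""
    found: list[str] = []
    for match in CITATION_PATTERN.finditer(draft_text):
        cite = match.group(0)
        if cite not in found:
            found.append(cite)
    return found

if __name__ == "__main__":
    # The case name and citations below are invented purely for demonstration.
    sample_draft = (
        "Defendant relies on Smith v. Jones, 123 F.3d 456 (9th Cir. 1997), "
        "and the reasoning in 45 F.4th 1012."
    )
    for cite in citation_checklist(sample_draft):
        print(f"[ ] verify against the official reporter: {cite}")
```

Even a crude checklist like this only tells a reviewer what to look up; the actual check still means pulling each decision and reading it, which is exactly the step the sanctioned filings skipped.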

Great Expectations

In its 2024 survey report on AI and the future of the legal profession, Thomson Reuters found that 77% of legal professionals surveyed believe artificial intelligence will transform or have a major impact on the way they work “now and into the future.” That figure was 10 percentage points higher than in the previous year’s survey. The most common strategic priority cited among law firms was “exploring the potential for and implementing AI.”

“The broad consensus on AI’s expanding role reflects its proven return on investment (ROI) in driving growth and enhancing internal processes,” the report said. “Even risk-averse sectors are recognizing AI’s potential, maintaining a focus on trusted content and human oversight.”

Yet a number of law firm users also expressed a note of caution. Most who have used AI see it as a basic starting point for their work, rather than as a “robot lawyer” capable of producing sophisticated legal materials and content. And a significant percentage of professionals—37%—said they have yet to try AI tools. When asked what is stopping them, “concern about the accuracy of outputs” was their most commonly cited reason.

A February report from Harvard Law School’s Center on the Legal Profession, based upon interviews with chief operating officers and partners at Am Law 100 firms, found that firms are testing various use cases for AI tools. The results are promising. In high-volume litigation matters, associate time spent working on complaint responses has declined from 16 hours to 3-4 minutes. “Lawyers have seen productivity gains greater than 100 times,” the report noted. “Using AI for the automation of initial drafting has demonstrated not only time savings but also increased accuracy.”

The key, however, is that firms successfully integrating AI are adopting specialized tools specifically designed to fit legal tasks. Hopping on ChatGPT may prove tempting to a lawyer facing a time crunch. But as the growing body of evidence shows, expecting it to perform complex legal work is a risky move that may yield hallucinations and potential sanctions. As one tech news commentator has put it, “ChatGPT continues to be a bad lawyer.”

--

David L. Brown is a legal affairs writer and consultant who has served as head of editorial at ALM Media, editor-in-chief of The National Law Journal and Legal Times, and executive editor of The American Lawyer. He consults on thought leadership strategy, creates in-depth content for legal industry clients, and works closely with Best Law Firms as a senior content consultant.
