All the Case Law.
- Harvard Law Library’s collection of digitized case law is now available to all.
- Free Law Project now has everything from this collection, with nearly one million enhancements.
- Free Law Project is focused on transparency and continual improvement.
Free Law Project began collecting case law in 2009, but before we entered the scene, the community around free access to law already shared an audacious goal. That goal, put simply, was to be able to say, “It’s done. All the case law is now online.”
One group that took a big swing at this daunting goal is the Library Innovation Lab at Harvard Law Library. The Lab spent three years working with Ravel Law to digitize the library’s enormous collection of case law. They completed that work several years ago, but it was not fully available until recently — when Harvard released the entire thing to the public.
A world-class collection containing centuries of case law — over forty million pages from forty thousand books — is now available to all. This is nothing short of incredible.
Our work in this area began in 2009, when we wrote the first code for CourtListener. From the very beginning, we knew we’d have to gather case law, and we have been working to build a complete collection of it ever since.
We’ve had a number of milestones over the years, and last fall, we took a big step towards that goal by launching a page on our website that shows in great detail what cases we have.
Today, we’re announcing our latest milestone. We have worked with the Harvard data for several years and have made over a million enhancements to it. With that complete, we have added their entire collection of nearly nine million decisions to our case law collection. This brings our total count to nearly ten million decisions.
Working at this scale is tremendously challenging and time-consuming. The enhancements we have made include:
- We analyzed every citation throughout American legal history so that it’s properly parsed and understood.
- We normalized every court name so that it can be stored properly in a database.
- Using machine learning, we improved and refined the data structure of millions of cases.
- We developed code to merge the best parts of our data with the best parts of Harvard’s.
This work required a combination of legal research and technical knowhow. In the face of errors in the scanned original books, we had to answer questions like “Did this court even exist, and if so, when?” We had to grapple with technical questions like how to match a case from our existing data to one in the Harvard collection. We had to identify every possible way — correct or not — that somebody could make a citation to case law.
We solved all of these challenges in the open, providing vast public enhancements to our database of courts and our tool for identifying citations. All of our corrections and enhancements to the data itself are online for others to use and build on.
Into the Future
CourtListener is now one of the most comprehensive and transparent sources of case law, but there is always more work to do. In the coming months and years, we will identify gaps that might exist between the conclusion of Harvard’s scanning effort and today. We will attempt a first-of-its-kind comparison between the data in for-profit systems against that of Harvard and CourtListener to learn how each can be enhanced. We will add more citations, better star pagination, and better typography. We will launch a project to scan even more books, so we can get content like citations and page numbers that are only available in print.
In 2024, we celebrate our 15th year collecting and disseminating case law. Through continued community effort and investment, our collection just gets better and better. No collection of case law is ever done, but by achieving digital equivalence with the Harvard Law Library's case law collection, we’ve taken another huge step towards ensuring there’s no case law research that you can’t do either on CourtListener itself or in the ecosystem of organizations we empower.