Legal Data Resources

We are committed to building complete, reliable, and permanent sources of legal data. In service of this mission, we maintain a large and growing archive of datasets.

These datasets may be cited in research papers according to our citation guidance.

Case Law Database

CourtListener has one of the most comprehensive collections of American legal jurisprudence on the Internet.

Our database encompasses more than 99% of all precedential legal case law published in the United States. We have over nine million decisions from over 2,000 courts, and we gather more opinions from state and federal courts every day.

Our data is sourced from a wide range of institutions and organizations, including Resource.org, the Supreme Court Database, Columbia, Harvard, and the Law Library of Congress, among others. We also partner with courts to enable direct publishing of case law.

To ensure that our users have access to the most recent case law, we collect new decisions as the courts release them. This lets us continuously update and expand our database, so users can find new case law as soon as it is published.

While we emphasize quantity, we also prioritize quality. Our team is constantly working to improve each and every decision, and has used machine learning and human review to make over a million corrections and enhancements to other datasets.

RECAP Archive Database

The RECAP Archive is the biggest open collection of federal court data on the Internet.

It contains hundreds of millions of docket entries, nearly every federal case, and millions of documents. It grows by thousands of documents each day.

The data in RECAP is collected from many different sources. A significant source is the more than 30,000 people who have installed the RECAP Extension in their browsers. As they use PACER, the extension sends copies of their purchases to RECAP.

People who receive ECF notification emails from PACER can use the @recap.email system to seamlessly add documents to the RECAP Archive.

We host a free API that organizations use to gather content from PACER. We also scrape PACER documents, crawl federal district court RSS feeds, and scrape metadata for district and bankruptcy cases.

Additionally, Free Law Project consults with organizations to gather PACER data. In these arrangements, we purchase thousands of items from PACER, which we add to the archive for all to access.

Judge and Disclosure Database

For the benefit of researchers, practitioners, and the public, we are proud to host a comprehensive structured database of judges and their financial interests.

The database contains information about more than sixteen thousand state and federal judges, making it a treasure trove for those wishing to do judicial analytics. This data has been sourced from books, court websites, commercial data sources, public records requests, and federal financial disclosure forms.

Information currently includes biographical data about each person, their financial disclosure records, the roles they have held before, during, and after their time in the judicial branch, their political affiliations, their education, and any retention events that kept them in a judicial position (such as a reappointment).

The financial disclosure information is a one-of-a-kind resource that we built by processing over 250,000 pages of disclosure forms. It has information about:

Over 1.7 million investment records.
Around 14,000 sources of income outside of regular investments.
More than 1,900 gifts received by federal judges.

This collection was used by the Wall Street Journal in their groundbreaking series on judicial conflicts of interest.

In 2023, ProPublica used this dataset in an investigation into the Supreme Court that went on to win the Pulitzer Prize. The Pulitzer committee praised ProPublica's "groundbreaking and ambitious reporting that pierced the thick wall of secrecy surrounding the Supreme Court."

We hope this database, its APIs, and its bulk data will be valuable tools for practitioners and researchers across the country.

Free Law Project created this project with support from and in collaboration with Pre/Dicta, Justia, Elliott Ash of Princeton University, the National Science Foundation, and the John S. and James L. Knight Foundation.

What's next?

The database is a constant work in progress, and we are regularly adding more judges and financial information. We need your support to do more of this work.

To learn more about this data, or to get involved, please get in touch or see the related blog posts.

You can also search for a judge, search financial records, or download the bulk data.

Reporters Database

Our Reporters Database, available as a Python library or a JSON file, provides structured information about nearly every American legal reporter.

The database can be used in a variety of tools. For example, we use the Reporters Database to extract citations from text, and to look them up in our database.
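As a rough illustration of how a list of reporter abbreviations can drive citation extraction, the Python sketch below builds one regular expression that finds citations of the form `volume reporter page`. The abbreviation list, the function names, and the simple three-part citation shape are assumptions for illustration; this is not the Reporters Database API, and real citation parsers handle many more citation forms.

```python
import re
from typing import List, Sequence, Tuple

# Hypothetical sample of reporter abbreviations; the real Reporters Database
# covers roughly a thousand reporters plus their variant spellings.
REPORTER_ABBREVIATIONS = ("U.S.", "S. Ct.", "F.2d", "F.3d", "F. Supp. 2d")


def build_citation_pattern(abbreviations: Sequence[str]):
    """Compile a regex that matches '<volume> <reporter> <page>' citations."""
    # Longest abbreviations first, so that when several alternatives could
    # match at the same position, the most specific reporter wins.
    ordered = sorted(abbreviations, key=len, reverse=True)
    alternation = "|".join(re.escape(a) for a in ordered)
    return re.compile(rf"\b(\d+)\s+({alternation})\s+(\d+)\b")


def extract_citations(
    text: str, abbreviations: Sequence[str] = REPORTER_ABBREVIATIONS
) -> List[Tuple[int, str, int]]:
    """Return (volume, reporter, page) tuples for every citation in text."""
    pattern = build_citation_pattern(abbreviations)
    return [(int(vol), rep, int(page)) for vol, rep, page in pattern.findall(text)]
```

For example, `extract_citations("Roe v. Wade, 410 U.S. 113 (1973)")` returns `[(410, 'U.S.', 113)]`. Each match can then be looked up by its volume, reporter, and page.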

In the Reporters Database, you will find information about nearly 1,000 different reporters dating from 1754 to today. For each reporter, we provide carefully researched, structured information such as the dates it began and ended, its official abbreviation, and any variant abbreviations that we've found while parsing over fifty million legal citations.
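The kind of per-reporter record described above might be modeled as in the following sketch. It maps every known spelling back to one reporter and checks whether a date falls within the reporter's publication run. The `Reporter` class, its fields, and the helper functions are hypothetical simplifications; the actual Reporters Database uses its own JSON schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional


@dataclass
class Reporter:
    """A simplified, hypothetical reporter record (not the library's schema)."""

    name: str
    abbreviation: str  # official abbreviation
    start: date  # date the reporter began
    end: Optional[date] = None  # None means it is still being published
    variations: List[str] = field(default_factory=list)

    def covers(self, day: date) -> bool:
        """Return True if the reporter was being published on the given day."""
        return self.start <= day and (self.end is None or day <= self.end)


def build_variation_index(reporters: List[Reporter]) -> Dict[str, Reporter]:
    """Map each official abbreviation and known variant to its reporter."""
    index: Dict[str, Reporter] = {}
    for reporter in reporters:
        index[reporter.abbreviation] = reporter
        for variant in reporter.variations:
            index[variant] = reporter
    return index


def canonical_abbreviation(index: Dict[str, Reporter], spelling: str) -> Optional[str]:
    """Return the official abbreviation for a spelling, or None if unknown."""
    reporter = index.get(spelling)
    return reporter.abbreviation if reporter else None
```

With an index built this way, a citation that uses a nonstandard spelling can be normalized to the official abbreviation with a single dictionary lookup, and `covers()` can flag a citation whose date falls outside the reporter's run.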

We believe that with the exception of niche, small, regional reporters, this database is complete and ready for production.

The Reporters Database is written in Python, but its data is also available as JSON, so it can easily be used from other languages.

Read More on GitHub

Courts Database

Our Courts Database, available as a Python library or a JSON file, provides structured information about nearly every American court, past or present.

The database has two main purposes. The first is to provide a unified list of court names and identifiers to make legal tools more interoperable. The second is to normalize court names into those identifiers using regular expressions.

Using this tool, you can take an unstructured string describing a court and convert it to an identifier you can trust.
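Here is a minimal sketch of that idea, under stated assumptions. Each court identifier is paired with a few hand-written regular expressions, and a raw string is normalized to every identifier whose patterns match it. The identifiers, patterns, and `normalize_court` function below are illustrative only. They are not the Courts Database's actual data or API.

```python
import re
from typing import List

# Hypothetical sample: a few court identifiers, each with hand-written
# regular expressions. The real Courts Database has over 700 identifiers
# and over 2,100 expressions; these are illustrative only.
COURT_REGEXES = {
    "scotus": [r"supreme court of the united states", r"^u\.?\s?s\.? supreme court$"],
    "ca9": [r"ninth circuit", r"9th cir"],
    "nysd": [r"southern district of new york", r"s\.d\.?\s?n\.?y"],
}

# Compile once, case-insensitively, at import time.
_COMPILED = {
    court_id: [re.compile(p, re.IGNORECASE) for p in patterns]
    for court_id, patterns in COURT_REGEXES.items()
}


def normalize_court(raw: str) -> List[str]:
    """Return the IDs of every court whose patterns match the raw string.

    An empty list means the string was not recognized; more than one ID
    means it is ambiguous and may need human review.
    """
    text = " ".join(raw.split())  # collapse stray whitespace
    return [cid for cid, pats in _COMPILED.items() if any(p.search(text) for p in pats)]
```

For instance, `normalize_court("U.S.  Supreme   Court")` returns `['scotus']`. Returning a list rather than a single ID makes ambiguous strings visible instead of silently picking one.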

The Courts Database was tested against sixteen million court strings, has over 700 court identifiers, and uses over 2,100 hand-crafted regular expressions to do its work.

The Courts Database is written in Python, but its data is also available as JSON, so it can easily be used from other languages.

Read More on GitHub

Court Seals Database and Service

The Judicial Seals Repository is a small project to collect and organize all of the court seals in use in the United States. At present, it has approximately 250 seals.

This project can be useful if you need to display or look up the seal of a court, such as the seal of the Supreme Court.

As of version 2.0, we serve these images through a free, scalable service. This saves you the trouble of hosting or maintaining them yourself. Of course, if you prefer to build your own system, the raw data is available in our git repository.
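If the hosted service offers each seal at a small set of fixed pixel sizes, a client only needs to build the right URL. The sketch below assumes that layout. The base URL is a placeholder and the size list is an assumption, so check the project's documentation for the real values.

```python
from urllib.parse import urljoin

# Placeholder values: the real base URL and available sizes may differ.
SEAL_BASE_URL = "https://seals.example.org/v2/"
SEAL_SIZES = (128, 256, 512, 1024)  # assumed widths in pixels, ascending


def seal_url(court_id: str, min_px: int = 128) -> str:
    """Return the URL of the smallest assumed seal size at least min_px wide.

    Falls back to the largest size if min_px exceeds every available size.
    """
    size = next((s for s in SEAL_SIZES if s >= min_px), SEAL_SIZES[-1])
    return urljoin(SEAL_BASE_URL, f"{size}/{court_id}.png")
```

Under these assumptions, `seal_url("scotus", min_px=200)` returns `https://seals.example.org/v2/256/scotus.png`. Linking to the hosted image in this way avoids copying seal files into your own application.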

To learn more about these seals, see their page on the Python Package Index (PyPI) or on GitHub.

Read More on GitHub

Judicial Portraits Database and Service

Our database of judicial portraits is an open collection of meticulously cropped, edited, and organized photos of judges that can be dropped into websites or other applications.

With over 1,000 portraits of state and federal judges, this collection adds a visual element to applications that raw data alone lacks.

This project has been supported by NSF Grant SES-1260875 and Pre/Dicta.

Read More on GitHub

Oral Argument Recording Archive

Launched in 2014, our archive of oral argument recordings is the largest open collection of legal audio on the Internet.

This collection was created when we learned that federal circuit courts were deleting oral argument recordings from their websites due to lack of space. At the time, nobody was collecting these recordings, and they were disappearing into a digital black hole.

Federal circuit court oral arguments are historical documents that should not be deleted from the Internet. That's why, in addition to collecting these files and saving them in our database, we also upload them to the Internet Archive.

We also use AI to generate transcripts for all federal circuit court arguments just minutes after the audio is released by the courts, allowing users to search and set alerts for anything said in court, including statutes, citations, and keywords.
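One simple way to match saved alerts against a fresh transcript is sketched below. The `Alert` class and `matching_alerts` function are hypothetical, and the page does not say how CourtListener actually evaluates alerts. This version does case-insensitive phrase matching that tolerates extra whitespace and avoids matches inside longer words or numbers.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Alert:
    """A hypothetical saved alert: who created it and the phrase to watch for."""

    owner: str
    phrase: str


def matching_alerts(transcript: str, alerts: List[Alert]) -> List[Alert]:
    """Return the alerts whose phrase appears in the transcript.

    Matching ignores case and runs of whitespace, and a phrase must not be
    directly attached to another letter or digit on either side.
    """
    hits = []
    for alert in alerts:
        words = alert.phrase.split()
        if not words:
            continue  # an empty phrase would otherwise match everything
        body = r"\s+".join(re.escape(w) for w in words)
        if re.search(rf"(?<!\w){body}(?!\w)", transcript, re.IGNORECASE):
            hits.append(alert)
    return hits
```

Using lookarounds instead of `\b` lets a phrase that ends in punctuation, such as `42 U.S.C.`, still match, while an alert for `198` will not fire on `1983`.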

In addition:

We make every oral argument fully searchable.
We make podcasts for every court and allow you to make custom podcasts.
We convert each audio file to an MP3, and we add good metadata and album art to the file.
We support alerts for oral argument files.
The recordings are available in our APIs.
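Podcast apps read RSS feeds, so a per-court podcast can be produced by emitting one feed per court. The following minimal sketch uses Python's standard `xml.etree.ElementTree`. The `Recording` class and the feed fields chosen here are assumptions, and a production feed would likely carry more channel metadata (such as artwork) than this.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Recording:
    """A hypothetical oral argument recording and its MP3 file."""

    title: str
    mp3_url: str
    size_bytes: int
    pub_date: str  # RSS 2.0 expects an RFC 822 date, e.g. "Mon, 01 Jan 2024 00:00:00 +0000"


def build_podcast_feed(court_name: str, feed_link: str, recordings: List[Recording]) -> str:
    """Return a minimal RSS 2.0 podcast feed for one court's recordings."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"{court_name} Oral Arguments"
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = f"Oral argument recordings from the {court_name}."
    for rec in recordings:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = rec.title
        ET.SubElement(item, "pubDate").text = rec.pub_date
        # Podcast apps download the audio file named in <enclosure>.
        ET.SubElement(
            item, "enclosure", url=rec.mp3_url, length=str(rec.size_bytes), type="audio/mpeg"
        )
    return ET.tostring(rss, encoding="unicode")
```

A custom podcast can reuse the same function by passing only the recordings that match a user's saved query.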

Our collection of SCOTUS recordings goes back to 2013. For earlier recordings, we recommend the wonderful Oyez Project, which has SCOTUS audio going back to 1955.

This work has been possible in part thanks to a grant from Columbia University Library. If you find this work valuable, please consider making a donation so that it may continue to thrive.

Supreme Court Data in Bulk and Via a REST API

The Supreme Court is the most important court in the United States, and for many years we have worked to create the most complete and detailed collection of Supreme Court data possible.

To build this collection, we have combined many sources of data, including the Supreme Court Database, the opinions available on Resource.org, the Library of Congress, and downloads that we have collected directly from the Supreme Court website. All of this data has been carefully merged, and innumerable errors have been fixed both manually and with automated systems.
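The page does not describe how the sources are reconciled. One common approach, shown here as a hypothetical sketch, is field-level precedence. You rank the sources, take each field from the highest-ranked source that has a non-empty value, and record where every value came from. The function name, the source keys, and the ranking are all assumptions.

```python
from typing import Any, Dict, List


def merge_records(
    records_by_source: Dict[str, Dict[str, Any]], priority: List[str]
) -> Dict[str, Any]:
    """Merge one case's records from several sources using field-level precedence.

    Sources are consulted in priority order; for each field, the first
    non-empty value wins. A "_provenance" entry records which source
    supplied each field.
    """
    merged: Dict[str, Any] = {}
    provenance: Dict[str, str] = {}
    for source in priority:
        # A source named in the priority list but absent for this case is skipped.
        for key, value in records_by_source.get(source, {}).items():
            if key not in merged and value not in (None, ""):
                merged[key] = value
                provenance[key] = source
    merged["_provenance"] = provenance
    return merged
```

Keeping provenance alongside the merged values makes it easier to trace an error back to the source that introduced it, so that a correction can be applied once and kept when the data is rebuilt.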

Some examples of the work we have completed so far include:

Let us know if you have any ideas for improvement or questions we can help with.
