Uploading PACER Dockets and Oral Argument Recordings to the Internet Archive

Michael Lissner

September 11, 2018

Highlights

We have begun uploading six million PACER dockets to the Internet Archive. Docket uploads will be completed quarterly going forward.
Our collection of oral argument recordings now supports all circuit courts and the Supreme Court. Previously, it lacked the 10th and 11th Circuits because their recordings were inaccessible.
We have uploaded more than 40 thousand oral argument recordings consisting of over one thousand days of audio to the Internet Archive. New oral argument recordings will be uploaded nightly.

At Free Law Project, we collect a lot of legal information. In our RECAP initiative, we collect (or are donated) around one hundred thousand items from PACER every day. Separately, in our collection of oral argument recordings, we have gathered more than 1.4 million minutes of legal recordings — more than anywhere else on the web. All of this content comes from a variety of sources, and we merge it all together to make a searchable collection of PACER dockets and a huge archive of oral argument recordings.

Part of our mission at Free Law Project is to share this information and to ensure its long-term distribution and preservation. A great way to do that is to give it to a neutral third party so that no matter what happens, the information will always be available. For years, we have been lucky to partner with the Internet Archive for this purpose, and today we are pleased to share two pieces of news about the information we give them.

The first news is that we are fulfilling our promise from last November to do quarterly uploads of the PACER dockets that we have in the RECAP Archive. This is currently about six million dockets comprising nearly 40 million docket entries. We already upload the PDFs from these cases each night, so the change here is that we are now also uploading all of the dockets themselves. These are being uploaded as JSON files, a standard format that developers can readily parse.

We hoped to fulfill this promise back in December of last year, but learned — repeatedly — that this was a difficult and complex task. This first batch of dockets will take a while to complete, but we are happy to finally have it uploading. We apologize for the delay in accomplishing this.

For the techies: This information is being uploaded as JSON data, in a format that closely mirrors that of our APIs. Here's an example of a JSON object for the Manafort case in DC. To learn more about the data or the fields, please see the CourtListener API documentation. Unfortunately, we no longer support the old XML format.
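For anyone who wants to work with these files, here is a minimal Python sketch of reading one docket. The field names (`case_name`, `docket_number`, `docket_entries`, `entry_number`, `date_filed`, `description`) are assumptions modeled on the CourtListener API, and the sample record is invented for illustration, so please check the API documentation for the actual schema.

```python
import json

# Invented sample record. The field names are assumptions modeled on the
# CourtListener API, not a guaranteed schema for the uploaded files.
SAMPLE_DOCKET_JSON = """
{
  "case_name": "Example v. Sample",
  "docket_number": "1:18-cv-00001",
  "docket_entries": [
    {"entry_number": 1, "date_filed": "2018-01-02", "description": "COMPLAINT"},
    {"entry_number": 2, "date_filed": "2018-01-15", "description": "ANSWER"}
  ]
}
"""


def summarize_docket(raw_json):
    """Return a header line plus one line per docket entry."""
    docket = json.loads(raw_json)
    header = "{} ({})".format(
        docket.get("case_name", "(unknown)"),
        docket.get("docket_number", "n/a"),
    )
    lines = [header]
    # Use .get() throughout so that a missing field does not raise an error.
    for entry in docket.get("docket_entries", []):
        lines.append(
            "  #{} {}: {}".format(
                entry.get("entry_number"),
                entry.get("date_filed"),
                entry.get("description"),
            )
        )
    return lines


if __name__ == "__main__":
    print("\n".join(summarize_docket(SAMPLE_DOCKET_JSON)))
```

Reading fields with `.get()` keeps the sketch tolerant of dockets that lack a given field, which seems likely in a collection gathered from many sources.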

We think this is big news! More than six million PACER dockets will be permanently preserved as high-quality data at the Internet Archive, with more regularly on the way.

Check It Out

You can find all of our oral argument recordings on the Internet Archive in a new collection we have created for this purpose.


Our second piece of news is that as of a few days ago, we uploaded our entire collection of oral argument recordings to a new collection at the Internet Archive. This is around one thousand days of continuous audio from over 40 thousand federal circuit court recordings. Every night from now on, we will upload any new recordings that we gather.

We are also announcing today that our database of oral argument recordings now supports both the 10th and 11th Circuits. For years we have been writing letters to the circuit courts, urging them to post their oral argument recordings and explaining the historical, educational, and legal importance of these recordings.

With these last two courts added, all of the circuit courts now post their oral argument recordings on their websites. Prior to this milestone, numerous circuit courts required expensive and bureaucratic paperwork to get a single recording, and consequently, such recordings rarely saw the light of day.

Finally, we are also announcing today that we are in the process of gathering more than twenty-five thousand oral argument recordings from the 9th Circuit's website. As with the rest, we will add these to our collection and upload them to the Internet Archive.

We could not be more thrilled and proud to have helped bring this data into the open. With this, we are finally gathering oral argument recordings from all of the circuit courts, making them searchable, uploading them to the Internet Archive, and even making them into podcasts. The Internet Archive is also generating written transcripts of all of the recordings we send them, allowing you to search for what was said in a circuit court on a given day.

This is what's possible when data is readily available.

What's Next

There is always more work to do. Here's what's next:

We are looking for help downloading the oral argument transcriptions from Internet Archive and making them searchable on CourtListener.
We are now uploading the raw data of these dockets to the Internet Archive, but it'd be great if it could be easily displayed for humans. We are looking for help creating a JavaScript library that could convert the data into nice HTML pages that'd be displayed on the Internet Archive.
We need to continue monitoring our first big upload of dockets to make sure that it finishes cleanly and completely. It is a big job for our server and it requires a watchful eye.
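To give a sense of what the docket display in the list above might involve, here is a rough Python sketch of the same transformation a JavaScript library would perform: turning one docket's JSON data into a small HTML fragment. The function name `docket_to_html` and the field names (`case_name`, `docket_entries`, `entry_number`, `date_filed`, `description`) are hypothetical and chosen for illustration, not taken from the uploaded files.

```python
from html import escape


def docket_to_html(docket):
    """Render a docket dict as a minimal HTML fragment (heading plus table).

    The field names used here are assumptions for illustration only.
    """
    rows = []
    for entry in docket.get("docket_entries", []):
        # Escape every value so that text from a filing cannot inject markup.
        rows.append(
            "<tr><td>{}</td><td>{}</td><td>{}</td></tr>".format(
                escape(str(entry.get("entry_number", ""))),
                escape(str(entry.get("date_filed", ""))),
                escape(str(entry.get("description", ""))),
            )
        )
    title = escape(str(docket.get("case_name", "Untitled docket")))
    return "<h1>{}</h1>\n<table>\n{}\n</table>".format(title, "\n".join(rows))


if __name__ == "__main__":
    example = {
        "case_name": "Example v. Sample",
        "docket_entries": [
            {"entry_number": 1, "date_filed": "2018-01-02", "description": "COMPLAINT"},
        ],
    }
    print(docket_to_html(example))
```

Whatever language the library ends up in, escaping every value before it reaches the page matters, since docket text comes from court filings and cannot be assumed to be safe markup.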

The steps we are taking today are hard ones that we have been working on for years. We hope you will support our efforts so we can do more of this work.

Tagged: internet archive, pacer, recap, oral arguments

Started in 2010, Free Law Project is the leading non-profit using technology, data, and advocacy to make the legal ecosystem more equitable and competitive. We host major open databases of opinions, federal filings, judges, financial disclosures, and oral arguments. We build open source tools like eyecite, juriscraper, and x-ray.

We rely on your donations for our continued success.
