We Have Every Free PACER Opinion on CourtListener.com

Free Opinion Report Dropdown

At Free Law Project, we have gathered millions of court documents over the years, but it’s with distinct pride that we announce that we have now completed our biggest crawl ever. After nearly a year of work, and with support from the U.S. Department of Labor and Georgia State University, we have collected every free written order and opinion that is available in PACER. To accomplish this we used PACER’s “Written Opinion Report,” which provides many opinions for free.

This collection contains approximately 3.4 million orders and opinions from approximately 1.5 million federal district and bankruptcy court cases dating back to 1960. More than four hundred thousand of these documents were scanned and required OCR, amounting to nearly two million pages of text extraction that we completed for this project.

All of the documents amassed are available for search in the RECAP Archive of PACER documents and via our APIs. New opinions will be downloaded every night to keep the collection up to date.

The RECAP Archive now has more than twenty million documents.

With this additional collection, the RECAP Archive now has information about more than twenty million PACER documents.

As a backup and permanent repository, we are continuing our partnership with the Internet …

more ...

Why We Are Downloading all Free Opinions and Orders from PACER

PACER Logo

Today we are launching a new project to download all of the free opinions and orders that are available on PACER. Since we do not want to unduly impact PACER, we are doing this process slowly, giving it several weeks or months to complete, and slowing down if any PACER administrators get in touch with issues.

In this project, we expect to download millions of PDFs, all of which we will add to both the RECAP Archive that we host, and to the Internet Archive, which will serve as a publicly available backup.1 In the RECAP Archive, we will be immediately parsing the contents of all the PDFs as we download them. Once that is complete we will extract the content of scanned documents, as we have done for the rest of the collection.
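The throttled approach described above can be sketched roughly as follows; the delay values and the shape of the `fetch` callable are illustrative assumptions, not our actual crawler code.

```python
import time

def crawl(urls, fetch, base_delay=5.0):
    """Fetch each URL politely: a fixed delay between requests,
    with exponential backoff when a request fails."""
    delay = base_delay
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
            delay = base_delay           # success: return to the base pace
        except IOError:
            delay = min(delay * 2, 300)  # failure: back off, up to 5 minutes
        time.sleep(delay)
    return results
```

The key design choice is simply to be a good citizen: go slowly by default, and slow down further at the first sign of trouble.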

This project will create an ongoing expense for Free Law Project—hosting this many files costs real money—and so we want to explain two major reasons why we believe this is an important project. The first is that these documents have monumental value, and until now they have not been easily available to the public. These documents are a critical …

more ...

Free Law Project and Princeton/Columbia Researchers Launch First-of-its-Kind Judicial Database


President Taft’s Biography Page

Today we’re extremely proud and excited to be launching a comprehensive database of judges and the judiciary, to be linked to CourtListener’s corpus of legal opinions authored by those judges. We hope that this database, its APIs, and its bulk data will become a valuable tool for attorneys and researchers across the country. This new database has been developed with support from the National Science Foundation and the John S. and James L. Knight Foundation, in conjunction with Elliott Ash of Princeton University and Bentley MacLeod of Columbia University.

At launch, the database has nearly 8,500 judges from federal and state courts, all of whom are available via our APIs, in bulk data, and via a new judicial search interface that we’ve created.

The database aims to be comprehensive, including as many facts about as many judges as possible. At the outset, we are collecting the following kinds of information about the judges:

  • Biographical information including their full name, race, gender, birth and death dates and locations, and any aliases or nicknames that a judge may have.

  • Their educational information including which schools they went to, when they went, and …

more ...

Releasing V3 of the API, Deprecating V1 and V2

This post is one with mixed news, so I’ll start with the good news, which is that version 3.0 of the CourtListener API is now available. It’s a huge improvement over versions 1 and 2:

  • It is now browsable. Go check it out. You can click around the API and peruse the data without doing any programming. At the top of every page there is a button that says Options. Click that button to see all the filtering and complexity that lies behind an API endpoint.
  • It can be sampled without authentication. Previously, if you wanted to use the API, you had to log in. No more. In the new version, you can sample the API and click around. If you want to use it programmatically, you’ll still need to authenticate.
  • It conforms with the new CourtListener database. More on this in a moment, but the important part is that version 3 of the API supports Dockets, Opinion Clusters and Sub-Opinions, linking them neatly to Judges.
  • The search API supports Citation Searching. Our new Citation Search is a powerful feature that’s now available in the API.
  • Bulk data now has metadata. In response to a …
more ...

Millions of New “Short Form” Case Names Now on CourtListener

While working on a soon-to-be-released feature of CourtListener, we needed to create “short form” case names for all the cases that we could. We’re happy to share that we’ve created about 1.8M short form case names, including complete coverage for all Supreme Court cases going back to 1947, when the Supreme Court Database begins.

If you’re not familiar with the term, short form case names are the ones you might use in a later citation to an authority you’ve already discussed in a document. For example, the first time you mention a case you might say:

Kellogg Brown & Root Services, Inc. v. United States Ex Rel. Carter

But later references might just be:

Kellogg Brown at 22

The Bluebook doesn’t have a lot to say about this format, but does say the short form must make it “clear to the reader…what is being referenced.” Also:

When using only one party name in a short form citation, use the name of the first party, unless that party is a geographical or governmental unit or other common litigant.

With these rules in mind, we made an algorithm that attempts to generate good short form …
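As a toy illustration of these rules, something along these lines works; the stop lists here are invented placeholders, far smaller than what a real implementation needs:

```python
# Illustrative stop lists only; a real implementation needs far more
# complete lists of governmental units and common litigants.
COMMON_LITIGANTS = {"united states", "state", "people", "commissioner"}
GEO_OR_GOVT = {"california", "new york", "texas", "city", "county"}

def short_form(case_name):
    """Pick one party for a short-form citation: the first party,
    unless it is geographical/governmental or a common litigant."""
    parties = [p.strip() for p in case_name.split(" v. ")]
    if len(parties) != 2:
        return case_name  # not a simple two-party caption; punt
    first, second = parties
    if first.lower() in COMMON_LITIGANTS | GEO_OR_GOVT:
        return second  # skip the governmental/common first party
    return first

print(short_form("Roe v. Wade"))             # Roe
print(short_form("United States v. Nixon"))  # Nixon
```

The hard part, of course, is the stop lists and the messy captions that don’t split cleanly on “ v. ”.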

more ...

Reporters Database

United States Reporters

A long time ago in a courthouse not too far away, people started making books of every important decision made by the courts. These books became known as reporters and were generally created by librarian-types of yore such as Mr. William Cranch and Alex Dallas.

These men—for they were all men—were busy for the next few centuries and created thousands of these books, culminating in what we know today as West’s reporters or as regional reporters like the “Dakota Reports” or the thoroughly-named “Synopses of the Decisions of the Supreme Court of Texas Arising from Restraints by Conscript and Other Military Authorities (Robards).”

Motivated by our need to identify citations to these reporters, we’ve taken a stab at aggregating a few facts about them, such as variations in their name, abbreviation, or years they were published, and put all that information into our reporters database. Until recently, this database lived deep inside CourtListener and was only discovered by intrepid hackers rooting around, but a few months ago we pulled it out, put it in its own repository, and converted it to better formats so anyone could more easily re-use it.
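To give a flavor of how such a database gets used, here is a hedged sketch of abbreviation normalization; the entries below are illustrative samples, not the database’s actual schema or contents:

```python
# Toy entries in the spirit of the reporters database; the real file
# covers hundreds of reporters with many more name variations.
REPORTERS = {
    "F.3d": {"name": "Federal Reporter, Third Series",
             "variations": ["F. 3d", "F.3d."]},
    "U.S.": {"name": "United States Reports",
             "variations": ["U. S.", "US"]},
}

# Reverse index: every variation (and canonical key) -> canonical key.
CANONICAL = {}
for key, info in REPORTERS.items():
    CANONICAL[key] = key
    for variant in info["variations"]:
        CANONICAL[variant] = key

def normalize(abbrev):
    """Map a possibly non-standard reporter abbreviation to its
    canonical form, leaving unknown abbreviations untouched."""
    return CANONICAL.get(abbrev.strip(), abbrev)

print(normalize("F. 3d"))  # F.3d
print(normalize("US"))     # U.S.
```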

Currently, it’s ready to use …

more ...

CourtListener is Now Integrated with the Supreme Court Database

Earlier this week somebody on the Internet pinged us with some code and asked that we integrate the data from the Supreme Court Database (SCDB). Well, we’re happy to share that less than a week later we’ve taken the code they provided and used it to upgrade CourtListener’s database.

The Supreme Court Database includes data for about 8,500 Supreme Court opinions from 1946 to 2013, and this first pass merges that data with CourtListener so that:

  • Our copies of these opinions are enhanced with better parallel citations. You can now look these items up by United States Reports (U.S.), Supreme Court Reporter (S.Ct.), Lawyers’ Edition (L.Ed.), or even LEXIS citation (U.S. LEXIS). This should make our citation graph much more robust and should help people like Colin Starger at the University of Baltimore who are doing great analyses with this data. Many of these items were screen-scraped directly from the Supreme Court website, meaning that for these items, this is the first time they have had proper citations. Here’s an example of the many parallel citations items now have:

Roe v. Wade
Citations

  • All Supreme Court Opinions from 1946 to 2013 have a new …
more ...

Announcing Oral Arguments on CourtListener

We’re very excited to announce that CourtListener is currently in the process of rolling out support for Oral Argument audio. This is a feature that we’ve wanted for at least four years — our name is CourtListener, after all — and one that will bring a raft of new features to the project. We already have about 500 oral arguments on the site, and we’ve got many more we’ll be adding over the coming weeks.

For now we are getting oral argument audio in real time from ten federal appellate courts. As we get this audio, we are using it to power a number of features:

  • Oral Argument files become immediately available in our search results.
  • A podcast is automatically available for every jurisdiction we support and for any query that you can dream up. Want a custom podcast containing all of the 9th circuit arguments for a particular litigant? You got it.
  • You can now get alerts for oral arguments so you can be sure that you keep up with the latest coming out of the courts.
  • For developers, there are a number of new endpoints in both our REST API and our bulk data API for …
more ...

The Importance of Backups

Burning of the Library of Alexandria

The Burning of the Library of Alexandria, an illustration from ‘Hutchinson’s History of the Nations’, c. 1910.

At least since the destruction of the Ancient Library of Alexandria, the world has known the importance of having a backup. The RECAP Archive of documents from PACER is a partial backup of the documents taken offline by five federal courts. It is impossible to determine how complete that backup is: without a complete list of what used to be available, you cannot even tell which documents are missing, and no such lists exist for the documents from these five courts.

But as coverage of this surprising and unprecedented action by PACER officials continues (see techdirt), the BBC has an article that takes an interesting approach by pointing out some of the landmark civil rights cases taken off PACER through this action.

The BBC mentions the case Ricci v. DeStefano, which was decided at the Second Circuit while Sonia Sotomayor was a Circuit Judge. Sotomayor, now a Supreme Court Justice, had her role in deciding the case closely scrutinized during her Supreme Court confirmation hearings. Many who dug in to Sotomayor’s background during those hearings …

more ...

Free Law Project Joins Request for Access to Offline PACER Documents

A recent announcement on the federal PACER website indicated that PACER documents from five courts prior to certain dates (pre-2010 for two courts, pre-2012 for one court, etc.) would no longer be available on PACER. The announcement was reported widely by news organizations, including the Washington Post and Ars Technica. The announcement has now been changed to explain, “As a result of these architectural changes, the locally developed legacy case management systems in the five courts listed below are now incompatible with PACER; therefore, the judiciary is no longer able to provide electronic access to the closed cases on those systems.” See a screenshot of the earlier announcement without this explanation:

Original PACER announcement

This morning, Free Law Project signed on to five letters from the non-profit Public.Resource.Org, headed by Carl Malamud, asking the Chief Judge of each of these five courts to provide us with access to these newly offline documents. The letter proposes that we be provided access in order to conduct privacy research, particularly with respect to the presence of social security numbers in court records, as Public.Resource.Org has done previously in several contexts. In addition, we offer to host all the documents …

more ...

Our RECAP partnership with Princeton University’s CITP

Today Free Law Project announced that it is partnering with Princeton University’s Center for Information Technology Policy to manage the operation and development of the RECAP platform. Most readers here will know that the RECAP platform utilizes free browser extensions to improve the experience of using PACER, the electronic public access system for U.S. federal courts, and crowdsources the creation of a free and open archive of public court records.

I have been frustrated with PACER for a long time: as a member of the public, as a law student, as a litigator, as an academic, and as one trying to build systems for public access to court documents. I’ve been frustrated by the price per page, by the price for searches with no results, by the shocking price for inadvertent searches with thousands of results, by the occasional price for judicial opinions that are supposed to be free, by the price in light of the fact that Congress made clear that the Judicial Conference “may, only to the extent necessary, prescribe reasonable fees… for access to information available through automatic data processing equipment” when it has been demonstrated time and again that PACER revenues grossly exceed …

more ...

CiteGeist Powers CourtListener’s Newly Improved Search Results


The citation graph is made into a network to compute CiteGeist scores.

We’re excited to announce that beginning today our relevancy engine will provide significantly better results than it has in the past. Whenever you place a query, we will analyze which opinions are the most cited, and we will use that to provide the best results possible. We’re calling this the CiteGeist score because it finds the spirit of your query (“Geist”) and gives you the best possible results. This is currently enabled for our corpus from the 1750s up through about 1985, and the remaining years will get the CiteGeist treatment over the next few days.

The details of how CiteGeist works are in our code, but the basic idea is to give a high CiteGeist score to opinions that are cited many times by other important opinions, and to give a lower CiteGeist to opinions that have not been cited or that have only been cited by unimportant opinions. Once we’ve established the CiteGeist score, we combine it with a query’s keyword-based (TF/IDF) relevancy. Together, we get a combined score which is a measure of how …
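The idea can be sketched roughly as follows; the damping factor, iteration count, and blend weight below are illustrative assumptions, not the parameters CourtListener actually uses:

```python
# A PageRank-style sketch of the CiteGeist idea: opinions cited by
# other well-cited opinions score highly, and that score is then
# blended with keyword relevancy.

def citation_scores(cites_to, iterations=20, damping=0.85):
    """cites_to maps each opinion id to a list of the ids it cites."""
    ids = set(cites_to) | {c for cited in cites_to.values() for c in cited}
    n = len(ids)
    score = {i: 1.0 / n for i in ids}
    for _ in range(iterations):
        nxt = {i: (1 - damping) / n for i in ids}
        for citer, cited in cites_to.items():
            if cited:
                share = damping * score[citer] / len(cited)
                for c in cited:
                    nxt[c] += share  # pass importance along each citation
        score = nxt
    return score

def combined_score(keyword_score, citegeist_score, weight=0.5):
    """Blend keyword (TF/IDF) relevancy with the citation-based score."""
    return weight * keyword_score + (1 - weight) * citegeist_score

graph = {"Marbury": [], "CaseA": ["Marbury"], "CaseB": ["Marbury"]}
scores = citation_scores(graph)
# "Marbury" ends up with the highest score: it is cited twice.
```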

more ...

Want to Merge Millions of Legal Opinions? It Won’t Be Easy.

Note: This is the third in the series of posts explaining the work that we did to release the data donation from Lawbox LLC. This is a very technical post exploring and documenting the process we use for extracting metadata and merging it with our current collection. If you’re not technically inclined (or at least curious), you may want to scoot along.

Working with legal data is hard. We all know that, but this post serves to document the many reasons why that’s the case and then delves deeply into the ways we dealt with the problems we encountered while importing the Lawbox donation. The data we received from Lawbox contains about 1.6M HTML files and we’ve spent the past several months working with them to extract good metadata and then merge it with our current corpus. This post is a long and technical one and below I’ve broken it into two sections explaining this process: Extraction and Merging.

Extraction

Extraction is a difficult process when working with legal data because it’s inevitably quite dirty: Terms aren’t used consistently, there are no reliable identifiers, formats vary across jurisdictions, and the data was …

more ...

Free Law Project Adds More than 1.5M Opinions to its Collection Thanks to Data Donation

For Immediate Release — Berkeley, CA

After many years of collecting and curating data, today CourtListener crossed some incredible milestones. Thanks to a generous data donation from Lawbox LLC, our computers are currently adding more than 1.5M new opinions to CourtListener, expanding our coverage to a total of more than 350 jurisdictions. This new data gives legal professionals and researchers insight into data that has never before been available in bulk and greatly enhances the data we previously had. It will be slowly rolling out in our front end, and will soon be available in bulk from our bulk downloads page. A new version of our coverage page was developed, and, as always, you can see our current coverage for any jurisdiction we support.

It’s difficult to overstate the importance of this new data. In addition to being a massive expansion of our coverage, it also brings some notable improvements to the project:

  1. For all of the new data and much of our old data, we have added star pagination throughout. For the first time, this will make pinpoint citations possible using the CourtListener platform.
  2. We’ve re-organized our database for more accurate citations enabling for the first …
more ...

Free Law Virtual Machine Available for Academics and Developers

Update: This tool has been deprecated. We heartily recommend our new process using Vagrant.

A goal of the Free Law Project is to make development of legal tools as easy as possible. In that vein, we’re excited to share that as of today we’re officially taking the wraps off what we’re calling the Free Law Virtual Machine.

For those not familiar with this, a virtual machine is a snapshot of a computer that can be run by anybody, anywhere. With this release, we’ve created a computer running Ubuntu Linux that our developers or academics can download, and which has all of the Free Law Project’s efforts pre-loaded and ready to go.

In addition to a number of minor improvements, the following are installed and configured:

  • CourtListener
  • Juriscraper
  • Development tools such as IntelliJ, Meld, vim, and Kiki
  • Bookmarks of all American courts

In addition to providing a simple virtual machine that you can install, we’re also releasing sample data that can easily be imported into the CourtListener platform. This data is available in groups of 50, 500, 5,000 or 50,000 records so that anybody can easily begin working or experimenting with our platform …

more ...

A few small API changes

We’re updating our code in a number of ways today and that is resulting in a number of changes to the format of our data dumps. If you use them in an automated fashion, please note the following changes:

  • dateFiled is now date_filed
  • precedentialStatus is now precedential_status
  • docketNumber is now docket_number
  • westCite is now west_cite
  • lexisCite is now lexis_cite

Additionally, a new field, west_state_cite, has been added, which will have any citations to West’s state reporters. We’ve made these changes in preparation for a proper API that will return XML and JSON. Before we release that API, we need to clean up some old field values so they are more consistent. From this point on, we expect better consistency in the fields of our XML.
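If you consume the dumps programmatically, a small helper along these lines can translate old records; the mapping covers exactly the renames listed above:

```python
# Translate the old camelCase field names to the new snake_case ones.
RENAMES = {
    "dateFiled": "date_filed",
    "precedentialStatus": "precedential_status",
    "docketNumber": "docket_number",
    "westCite": "west_cite",
    "lexisCite": "lexis_cite",
}

def migrate_record(record):
    """Return a copy of the record with updated field names; fields
    that were not renamed pass through unchanged."""
    return {RENAMES.get(key, key): value for key, value in record.items()}

old = {"dateFiled": "2012-01-01", "docketNumber": "10-1234"}
print(migrate_record(old))
# {'date_filed': '2012-01-01', 'docket_number': '10-1234'}
```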

If this causes any inconvenience or if you need any help with these changes, please let us know.

more ...

New Courts at CourtListener with Historical Data

I mentioned in my last post that we’ve added some new courts to the site. Today we’ve added the historical data for these courts that was available on their website.

This amounts to about 1,500 new cases on CourtListener:

  • 112 from November 2003 to today at the Court of Appeals for the Armed Forces.
  • 764 from January 2000 to today at the Court of Appeals for Veterans Claims.
  • 600 from January 2008 to today at the Court of International Trade.

All of these docs are immediately available for search, via RSS, or via our dump API, and will be in our dump of all our cases when it is regenerated at the end of the month.

This also marks an important achievement for the Juriscraper library. Since CourtListener now has scrapers for all federal courts of special jurisdiction, we’re officially moving it to version 0.2. It’s taken longer than we wanted to get it here, but this is a huge step for the library.

Freeing 1,000 docs at a time.

more ...

Announcing the third series of the Federal Reporter!

Following on Friday’s big announcement about our new citator, today I’m excited to share that we’ve completed incorporating volumes 1 to 491 of the third series of the Federal Reporter (F.3d). This has been a monumental task over the past six months. Since we already have many cases from the same time period and jurisdiction, we had to work very hard on our duplicate merging algorithm. In the end, we were able to get upwards of 99% accuracy with our merging code, and any cases that could not be merged automatically were handled by human review. The outcome of this work is an improved dataset beyond any that has been available previously: in tens of thousands of cases, we have been able to merge the metadata on Resource.org with data that we obtained directly from the court websites.
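As an illustration of the general approach (not our actual algorithm), a duplicate classifier can compare names and dates and route borderline pairs to human review; the similarity thresholds below are invented for the example:

```python
from datetime import date
from difflib import SequenceMatcher

def name_similarity(a, b):
    """Crude case-name similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def classify(name_a, date_a, name_b, date_b):
    """Return 'merge', 'distinct', or 'review' for a candidate pair."""
    if date_a != date_b:
        return "distinct"      # different filing dates: not duplicates
    sim = name_similarity(name_a, name_b)
    if sim > 0.9:
        return "merge"         # clearly the same case
    if sim < 0.5:
        return "distinct"
    return "review"            # ambiguous: send to a human

print(classify("Smith v. Jones", date(2001, 5, 1),
               "Smith v. Jones, Inc.", date(2001, 5, 1)))  # review
```

The “review” bucket is what makes the 99%-plus-human-review workflow tractable: the automated pass only has to be confident, not omniscient.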

These new cases bring our total number of cases up to 756,713, and we hope to hit a million by the end of the year. With this done, our next task is to begin incorporating data from all of the appellate-level state courts. We will be working on this in a …

more ...

Building a Citator on CourtListener

I’m incredibly excited today to announce that over the past few weeks we have successfully rolled out a Citator on CourtListener. This feature was developed by UC Berkeley School of Information students Karen Rustad and Rowyn McDonald after a thorough design and development cycle which included everything from user interviews to performance optimizations of our citation finding algorithm.

As you’re browsing the site, you’ll immediately see three big new features. First, all Federal citations to documents that we have in our collection are now links. So as you’re reading, if there’s a reference to a prior case that you feel might be useful to your research, you can just click the link to that case and continue your research there. This allows you to go upstream in your research, looking at the important cases that came before.

The second big change you’ll see is a new sidebar on all case pages that lists the top five cases that reference the one you’re reading. This allows you to go downstream from the case you’re reading, where you’ll be able to identify how the case was later interpreted by other courts.

At the …

more ...

Announcements, Updates and the Current Roadmap

Just a quick note today to share some exciting news and updates about CourtListener.

First, I am elated to announce that the CourtListener project is now supported in part by a grant from Public.Resource.Org. With this support, we are now able to develop much more ambitious improvements to the site that would otherwise not be possible. Over the next few months, the site should be changing greatly thanks to this support, and I’d like to take a moment to share both what we’ve already been able to do, and the coming changes we have planned.

One feature that we added earlier this week is a single location where you can download the entire CourtListener corpus. With a single click, you can download 2.2GB of court cases in XML format. Check out the information on the dump page for more details about when the dump is generated, and how you can get it: http://courtlistener.com/dump-info/

The second exciting feature that we’ve been working on is a platform change that enables CourtListener to support a much larger corpus. In the past, we’ve had difficulty with jobs being performed synchronously with the court scrapers …

more ...