New formats for dump files

As mentioned in a previous post, we are currently making some changes to our back end to allow better citation metadata and finer search granularity. As part of these changes, we have made two small changes to our dump formats.

The first change is to list docketNumber, westCite and lexisCite instead of caseNumber and westCitation. We previously had many West-style citations listed as generic case numbers. This wasn’t very accurate, so we’ve re-organized this to have better granularity.

The second change we’ve made is to how we handle missing or incomplete data. Previously, if a case was missing data, we would simply not include it in a dump. This was not the best solution, so we’re now including any information we have about a case in every dump we create. Sometimes this produces partial cases that lack vital metadata.

We hope these changes will be easy to work with, and that they’ll cause no disruption.
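For anyone consuming the dumps, the practical effect of both changes is that any of the citation fields may now be absent. Below is a minimal sketch of defensive handling, assuming each case has already been parsed into a Python dict keyed by the field names above. The dict representation and the `normalize_case` helper are illustrative assumptions, not part of the dump format itself.

```python
# Hypothetical helper: the dict input and the function name are assumptions
# made for illustration; only the three field names come from the post.
CITATION_FIELDS = ("docketNumber", "westCite", "lexisCite")


def normalize_case(record):
    """Return (case, missing) for one parsed case record.

    `case` always has all three citation fields, using None for anything
    absent or blank; `missing` lists the fields that had no usable value.
    """
    case = {}
    missing = []
    for field in CITATION_FIELDS:
        value = record.get(field)
        # Treat empty or whitespace-only strings the same as absent fields.
        if value is None or not str(value).strip():
            case[field] = None
            missing.append(field)
        else:
            case[field] = str(value).strip()
    return case, missing


if __name__ == "__main__":
    partial = {"docketNumber": "09-1234", "westCite": ""}
    print(normalize_case(partial))
```

Returning the list of missing fields alongside the record lets a caller decide whether a partial case is still usable, rather than silently dropping it.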

more ...

The abolishment of the Emergency Court of Appeals (April 18, 1962)

One of the coming features at CourtListener is an API for the law. Part of that feature is going to be some basic information about the courts themselves, so I spent some time over the weekend researching courts that served a special purpose but were since abolished.

One such court was the Emergency Court of Appeals. It was created during World War II to set prices, and, naturally, was the court of appeals for many cases. The creation date of the court is prominently published in various places on the Internet, but the abolishment history of the court was very difficult to find. After researching online for some time, and learning that my library card had expired (sigh), I put in a query with the Library of Congress, which provides free research of these types of things.

Within a couple of days, they provided me with this amazing response, which I’m sharing here, and on the above Wikipedia article:

As stated in the Legislative Notes to 50 U.S. Code Appendix §§ 921 to 926, as posted at——000-notes.html, the following explanation is given regarding the amendment and repeal of Act …

more ...

Site refresh and new features now live!

After many months of work and about 100 revisions to the code, today we’ve rolled out the latest version of the site. This version comes with some great enhancements:

  • We rolled this out to our Twitter stream a few months ago, but we finally have proper branding and a proper logo. We’re still keeping things simple, but this should make things a little prettier.
  • We’ve added the search box to all pages so searches are easier to make and so you can see what search brought you to the document you’re looking at.
  • A new favorites feature has been added that allows you to make notes about cases that interest you, and to see all of your notes in your profile.
  • The sidebar has been moved to the left in preparation for faceted searching and browsing.
  • Lots of code clean up, lots of aesthetic fixes and dozens of small fixes here, there and everywhere.

We’re really happy with this refresh and the new features that are coming along with it. If you notice anything that’s not working properly or that could be better, we’re always happy to hear your feedback.

more ...

Updated Supreme Court Case Dates and The First Release of Early SCOTUS Data in Machine-Readable Form

A few years ago, the Library of Congress released a PDF that listed the exact dates that the early Supreme Court Cases were decided. Since the written record only contained the month and year of the decision, this list served as the official record for the cases.

While it was great for the Library of Congress to publish this report, unfortunately they did so in a large PDF rather than a more useful format that could be used by projects such as CourtListener. Our attempts to obtain the original version of the document from the Library of Congress were unsuccessful, so we converted the PDF into both a CSV and an ODS spreadsheet so that the data can be easily read by a computer. I’m happy to be releasing these files today so that they can be used by others.

The second project we have been working on at Free Law Project was to import this data into our system. Because citations in the file are not always unique, we had to devise a heuristic algorithm to link up the data in the CSV with the data in our system. Today, we’re happy to share that we did …
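The excerpt does not describe the heuristic itself, so the following is only a sketch of one way such a linkage could work: narrow the candidates to those sharing the row's citation, then pick the one whose case name is most similar. The dict keys `citation` and `case_name`, the `best_match` helper, and the 0.6 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def best_match(csv_row, candidates, threshold=0.6):
    """Pick the candidate that best matches one CSV row, or None.

    `csv_row` and each candidate are dicts with the hypothetical keys
    "citation" and "case_name". Only candidates sharing the row's citation
    are considered; among those, the most similar case name wins. Returning
    None below the threshold leaves ambiguous rows for manual review.
    """
    same_cite = [c for c in candidates if c["citation"] == csv_row["citation"]]
    best, best_score = None, 0.0
    for cand in same_cite:
        # ratio() is a similarity score between 0.0 and 1.0.
        score = SequenceMatcher(
            None, csv_row["case_name"].lower(), cand["case_name"].lower()
        ).ratio()
        if score > best_score:
            best, best_score = cand, score
    return best if best_score >= threshold else None
```

Refusing to guess when no candidate clears the threshold trades coverage for accuracy, which seems the safer default when the result is a decision date attached to a real case.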

more ...

Schultze and Lee on RECAP at NYLS

On February 15, Steve and Tim spoke at New York Law School on “PACER, RECAP, and Free Law.” Video of the event is below:

![]({static}/images/recap/20110215_Lee_Schultze_RECAP_NYLS.png)

more ...

Changes and Plans at

A few weeks ago, we made a fairly major change at to include ID numbers in all of our case URLs. This change meant that links that were previously like this:

Are now like this:

Most of the old links should continue to work, but using the new links should be much faster and more reliable. The major difference between the two is the ID number, which is encoded as a short string of characters (in this case V5o). This ID corresponds directly with the ID number in our database, aiding us greatly in serving up cases quickly and accurately.
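The post does not say which encoding produces a token like V5o. As an illustration of the general idea only, here is a sketch of a reversible base-62 encoding of an integer database ID. The alphabet and its ordering are assumptions, so this code will not necessarily reproduce the tokens used on the site.

```python
import string

# Assumed alphabet: digits, then lowercase, then uppercase (62 symbols).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)


def encode_id(n):
    """Encode a non-negative integer ID as a short base-62 string."""
    if n < 0:
        raise ValueError("ID must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(chars))


def decode_id(token):
    """Decode a base-62 string back to the integer ID."""
    n = 0
    for ch in token:
        n = n * BASE + ALPHABET.index(ch)
    return n
```

Because the token decodes straight to a primary key, a lookup by URL needs no search over case names, which fits the speed and reliability gains described above.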

Around the same time as this change, we added social networking links to all of our case pages to make them easier to share with friends and colleagues. These links use our new tiny domain,, and should thus be ideal for websites like Twitter or Reddit.

In the next few months we will be getting a major new server, and will be migrating our data to it. This will allow us to serve more data, and—drum roll please—will allow us to begin …

more ...

RECAP Extension 0.8 Beta Released

We are proud to announce beta version 0.8 of RECAP.

This release of RECAP fixes an issue introduced by the newest version of PACER, which has been deployed to several district courts. We’d like to thank the users who brought this issue to our attention and also encourage all RECAP users to contact us if you notice any irregularities in the future. Each district court operates its own version of PACER, so there are often small differences in code which can affect the way that RECAP operates.

In addition, we’ve added a feature that will allow CM/ECF users to more conveniently contribute documents to the RECAP archive. A substantial number of our users are attorneys who have a separate “ECF” login as well as a standard PACER account. Many of these users find it easy to download and pay for PACER documents while logged into the ECF system, but previous versions of RECAP would not upload these documents to the shared archive. Version 0.8 changes this behavior, allowing ECF users to contribute these documents to the RECAP archive.

When we released RECAP over a year ago, we intentionally disabled the extension when it detected an …

more ...

RECAP Extension 0.7 Beta Released

We are proud to announce beta version 0.7 of RECAP. This release adds support for Firefox 4 beta, for those of you living on the cutting edge.

We’ve also added a feature requested by our users. Before this release, the only way to see if RECAP had any free documents for a particular case was to purchase and examine the docket report for that case. In version 0.7, RECAP will notify you before you run a docket report if there is already a free archived docket available. On the docket query page for a case that has archived information, you should see a box appear at the bottom of your screen. Clicking on the link in that box will take you to RECAP’s summary page, which includes any docket information we have on the case as well as links to any documents we may have. Here’s an example of what you should see:

Visual of new RECAP

Version 0.7 also fixes a number of bugs, both minor and major. Thanks to a few extremely helpful users, we were able to fix a problem that prevented RECAP from working correctly behind certain types of proxy servers. Users behind a corporate proxy or firewall …

more ...

RECAP Firefox Search Plugin

One of the ideas behind the RECAP project is that once government data is made accessible in a free and open format, people will find useful new ways to search and process that data. We have heard from many folks looking to do interesting things with the documents archived by RECAP, and last year a group of students built the searchable web-based RECAP Archive. Today, Brian Carver shared a simple tool he built on top of that — a Firefox RECAP search plugin. You know that little search box in the top-right corner of Firefox? If you install his plugin you can choose the RECAP Archive as one of the search engines in the drop-down menu, so that finding free federal court documents is even easier.

Pretty cool!

more ...

Assessing PACER’s Access Barriers

The U.S. Courts recently conducted a year-long assessment of their Electronic Public Access program which included a survey of PACER users. While the results of the assessment haven’t been formally published, the Third Branch Newsletter has an interview with Bankruptcy Judge J. Rich Leonard that discusses a few high-level findings of the survey. Judge Leonard has been heavily involved in shaping the evolution of PACER since its inception twenty years ago and continues to lead today.

The survey covered a wide range of PACER users—“the courts, the media, litigants, attorneys, researchers, and bulk data collectors”—and Judge Leonard claims they found “a remarkably high level of satisfaction”: around 80% of those surveyed were “satisfied” or “very satisfied” with the service.

If we compare public access before we had PACER to where we are now, there is clearly much success to celebrate. But the key question is not only whether current users are satisfied with the service but also whether PACER is reaching its entire audience of potential users. Are there artificial obstacles preventing potential PACER users—who admittedly would be difficult to poll—from using the service? The satisfaction statistic may be fine at face value, assuming …

more ...

New Search and Browsing Interface for the RECAP Archive

Update: We wound down this version of the archive, but we replaced it with something much better.

One of the most-requested RECAP features is a better web interface to the archive. Today we’re releasing an experimental system for searching and browsing, at There are also a couple of extra features that we’re eager to get feedback on. For example, you can subscribe to an RSS feed for any case in order to get updates when new documents are added to the archive. We’ve also included some basic tagging features that let anybody add tags to any case. We’re sure that there will be bugs to be fixed or improvements that can be made.

The first version of the system was built by an enterprising team of students in Professor Ed Felten’s “Civic Technologies” course: Jen King, Brett Lullo, Sajid Mehmood, and Daniel Mattos Roberts. Dhruv Kapadia has done many of the subsequent updates. The links from the RECAP Archive pages point to files on our gracious host, the Internet Archive.

See, for example, the RECAP Archive page for United States of America v. Arizona, State of, et al. This is the Arizona …

more ...

More RECAP Events

On Tuesday June 22, Harlan Yu will be on Capitol Hill speaking to Congressional Staffers at an event sponsored by the Advisory Committee on Transparency. The event is called Transparency Made Easy: How to Make the Government More Open and Accountable. It is open only to staffers, but the Sunlight Foundation will post video of the event afterward.

On Friday the 25th, both Harlan and Steve will be at the 20th Annual CALI Conference in Camden NJ, explaining how “Capturing PACER for Open Access” can benefit the cause of legal education.

Update: The video and other materials from the Transparency Caucus are now available. You can watch Harlan’s remarks below:

more ...

New PACER Fee Data, RECAP Appearances This Week

This week, RECAP team member Steve Schultze released new data on how PACER fees are collected and spent in a post entitled, “What Does It Cost to Provide Electronic Public Access to Court Records?” He will discuss this new data and other aspects of public access to federal case materials at a event in DC tomorrow (live streaming video). Steve will also be on a panel at the Virginia State Bar Association’s annual meeting on Thursday, speaking about how the federal electronic filing experience can inform state efforts. That same day, RECAP developer Dhruv Kapadia will be at the event at Harvard.

There will be more RECAP developments and public appearances in the coming weeks.

Update: The video for the Law.Gov Event at the Center for American Progress is now posted on their site. You can watch the excerpt of Steve’s remarks below:

Video of Steve

more ...


I’m elated to announce today that I am officially taking the ropes off my final project and letting it loose into the wild. It’s been seven months since development on it officially started and finally, the beta version is done.

If you haven’t been following along, the project itself is an open source legal research tool which allows anybody to keep up to date with federal precedents as they are set by the 13 Federal Circuit courts. Right now, it has more than 130,000 documents in its corpus, including almost all of the Supreme Court record dating back to 1754. Every day it downloads the latest documents within about a half hour of when each court publishes them.

One thing we’ve focused on while building the site has been making it as useful as possible for as many people as possible. Since not everybody likes getting updates in their inbox, we’ve also tied the search engine in with an Atom feed generator so that you can search for whatever you want, and then follow updates in your feed reader.

Everything we’ve built uses a powerful boolean search engine on the backend. At present, there …

more ...

RECAP, The Press, and Judicial Transparency

USA Today just posted an article covering the debate over Arizona’s new immigration law. The article is essentially a roundup of relevant coverage, comments, and the wave of pending lawsuits. It is an example of the core activities of journalism in action — namely information gathering, synthesis, and dissemination on issues relevant to the public. The article is also notable because it embeds the actual text of a complaint from one of the lawsuits filed yesterday. This document came from PACER.

Some time yesterday, a RECAP user also downloaded that document, thus contributing it to the public archive. You can see the docket and document here. As a result, anyone can freely search for, download, or re-post the document. People can also follow the progress of the case (assuming RECAP users continue to download new documents as they are posted). The fact that this is all happening automatically is an exciting success for RECAP.

However, that success is inherently limited. Although the system worked in this particular case, there are literally millions of other cases that have not been similarly liberated from the PACER paywall. Many of these are highly relevant to American citizens, but they are not of broad …

more ...

RECAP Documents Now More Searchable Via Internet Archive

We recently made a small change to the way that documents uploaded by RECAP users are made available on the Internet Archive. Until today, the Internet Archive had served primarily as a bulk hosting provider, without much ability to browse or search the archive. This was enforced in two ways: First, it was not possible to search for documents using the Internet Archive’s search tools. Second, external search engines were prevented from indexing the site. We decided to do this in order to be especially cautious with respect to privacy concerns that we have previously discussed.

Since we launched, we have spent a great deal of time examining these issues, and we decided to make a small incremental step in making the documents more findable without (yet) allowing in-depth full-text search of all documents. We have enabled Internet Archive indexing, as well as search engine indexing, for the case summary pages on the Internet Archive. That means, for example, that the relatively limited information on the AT&T v. Hepting case summary page is now searchable.

You can find this case through the Internet Archive search engine by doing a query like this: …

more ...

Designing CourtListener

Over the past week, I’ve been working to create scrapers for each of the 13 federal appeals courts. Last night I finally finished the last of them, so today I’m moving on to the design of the site. Design is always much better when people work in a team, so I’m putting these designs here so others can look at them and give me feedback. Please, please do!

So far, I’ve sketched out four of the major pages that the site will have. A user will begin using the site on its homepage. Here, they will be given a few options. Basically, they can log in, register for an account, make a search, or read one of the ancillary pages such as the “About” or “Privacy” page:

first sketch

Also, note the advanced button under the search field. When this is clicked, it expands to show the advanced search queries that the site will support, as you can see on the next page.

If people are logged in, their homepage becomes the “Create new alert” page, which you can see below. For now, this allows users to create very complicated queries by hand. In the future, it would be …

more ...

Converting PDF Files to HTML

For my final project, we are considering posting court cases on our site, and so I did some work today analyzing how best to convert the PDF files the courts give us to HTML that people can actually use. I looked briefly at Google Docs, since it has an amazing tool that converts PDF files to something resembling text, but short of spending a few days hacking the site, I couldn’t figure out any easy way to leverage their technology in any sort of automated way.

The other two tools I have looked at today are pdftotext and pdftohtml, which, not surprisingly, do what their names claim they do. Since we’re going to be pulling cases from the 13 federal circuit courts, I wanted to figure out which method works best for which court, and which method will provide us with the most generalizable solution across whatever PDF a court may crank out.

The short version is that the best option seems to be:

pdftotext -htmlmeta -layout -enc 'UTF-8' yourfile.pdf

This creates an HTML file with the text of the case laid out as well as possible, some basic HTML metadata applied, and the UTF-8 encoding …
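To run the same conversion over many files, the command can be wrapped in a small script. This is a minimal sketch that assumes the `pdftotext` binary is installed and on the PATH; the helper names `build_pdftotext_command` and `convert` are hypothetical.

```python
import subprocess


def build_pdftotext_command(pdf_path, out_path=None):
    """Build the argument list for the pdftotext invocation shown above."""
    cmd = ["pdftotext", "-htmlmeta", "-layout", "-enc", "UTF-8", pdf_path]
    if out_path is not None:
        # Optional second positional argument: where to write the output.
        cmd.append(out_path)
    return cmd


def convert(pdf_path, out_path=None):
    """Run pdftotext; raises CalledProcessError if it exits non-zero."""
    subprocess.run(build_pdftotext_command(pdf_path, out_path), check=True)


if __name__ == "__main__":
    # Print the command rather than running it, so this works without a PDF.
    print(" ".join(build_pdftotext_command("yourfile.pdf")))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, which court-supplied PDFs sometimes do.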

more ...