RECAP, The Press, and Judicial Transparency

USA Today just posted an article covering the debate over Arizona’s new immigration law. The article is essentially a roundup of relevant coverage, comments, and the wave of pending lawsuits. It is an example of the core activities of journalism in action — namely information gathering, synthesis, and dissemination on issues relevant to the public. The article is also notable because it embeds the actual text of a complaint from one of the lawsuits filed yesterday. This document came from PACER.

Some time yesterday, a RECAP user also downloaded that document, thus contributing it to the public archive. You can see the docket and document here. As a result, anyone can freely search for, download, or re-post the document. People can also follow the progress of the case (assuming RECAP users continue to download new documents as they are posted). The fact that this is all happening automatically is an exciting success for RECAP.

However, that success is inherently limited. Although the system worked in this particular case, there are literally millions of other cases that have not been similarly liberated from the PACER paywall. Many of these are highly relevant to American citizens, but they are not of broad …

more ...

RECAP Documents Now More Searchable Via Internet Archive

We recently made a small change to the way that documents uploaded by RECAP users are made available on the Internet Archive. Until today, the Internet Archive had served primarily as a bulk hosting provider, without much ability to browse or search the archive. This was enforced in two ways: First, it was not possible to search for documents using the Internet Archive’s search tools. Second, external search engines were prevented from indexing the site. We decided to do this in order to be especially cautious with respect to privacy concerns that we have previously discussed.

Since we launched, we have spent a great deal of time examining these issues, and we decided to make a small incremental step in making the documents more findable without (yet) allowing in-depth full-text search of all documents. We have enabled Internet Archive indexing, as well as search engine indexing, for the case summary pages on the Internet Archive. That means, for example, that the relatively limited information on the AT&T v. Hepting case summary page is now searchable.

You can find this case through the Internet Archive search engine by doing a query like this: …

more ...

Designing CourtListener

Over the past week, I’ve been working to create scrapers for each of the 13 federal appeals courts. Last night I finally finished the last of them, so today I’m moving on to the design of the site. Design is always much better when people work in a team, so I’m putting these designs here so others can look at them and give me feedback. Please, please do!

So far, I’ve sketched out four of the major pages that the site will have. A user will begin using the site on its homepage. Here, they will be given a few options. Basically, they can log in, register for an account, make a search, or read one of the ancillary pages such as the “About” or “Privacy” page:

first sketch

Also, note the advanced button under the search field. When this is clicked, it expands to show the advanced search queries that the site will support, as you can see on the next page.

If people are logged in, their homepage becomes the “Create new alert” page, which you can see below. For now, this allows users to create very complicated queries by hand. In the future, it would be …

more ...

Converting PDF Files to HTML

For my final project, we are considering posting court cases on our site, and so I did some work today analyzing how best to convert the PDF files the courts give us to HTML that people can actually use. I looked briefly at Google Docs, since it has an amazing tool that converts PDF files to something resembling text, but short of spending a few days hacking the site, I couldn’t figure out any easy way to leverage their technology in any sort of automated way.

The other two tools I have looked at today are pdftotext and pdftohtml, which, not surprisingly, do what their names claim they do. Since we’re going to be pulling cases from the 13 federal circuit courts, I wanted to figure out which method works best for which court, and which method will provide us with the most generalizable solution across whatever PDF a court may crank out.

The short version is that the best option seems to be:

pdftotext -htmlmeta -layout -enc 'UTF-8' yourfile.pdf

This creates an HTML file with the text of the case laid out as well as possible, some basic HTML metadata applied, and the UTF-8 encoding …
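To run the same command over a batch of court PDFs, it can be wrapped in a short script. The following is a minimal sketch in Python, assuming `pdftotext` is installed and on the PATH. The function names and the directory layout are hypothetical, chosen only for illustration.

```python
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path):
    """Return the argument list for the pdftotext invocation shown above."""
    pdf_path = Path(pdf_path)
    # pdftotext accepts an output file as a second positional argument;
    # naming it explicitly keeps the .html extension under our control.
    html_path = pdf_path.with_suffix(".html")
    return [
        "pdftotext",
        "-htmlmeta",
        "-layout",
        "-enc", "UTF-8",
        str(pdf_path),
        str(html_path),
    ]


def convert_directory(directory):
    """Convert every PDF in `directory` and return the HTML paths written."""
    written = []
    for pdf_path in sorted(Path(directory).glob("*.pdf")):
        command = build_pdftotext_command(pdf_path)
        # check=True raises CalledProcessError if pdftotext fails on a file.
        subprocess.run(command, check=True)
        written.append(Path(command[-1]))
    return written
```

Keeping the command construction separate from the call to `subprocess.run` makes it easy to try different flag combinations per court without touching the batch loop.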

more ...

RECAP Extension 0.6 Beta Released

The Mozilla Foundation released version 3.6 of Firefox today, and we’re proud to release the corresponding version of the RECAP extension, beta version 0.6. In addition to Firefox 3.6 compatibility, we’ve also thrown in a new feature suggested by our users: the option to save documents using filenames that we describe as “lawyer style” in contrast to the “Internet Archive style” we’ve traditionally used. For example, rather than saving a document as “gov.uscourts.cand.204881.46.0.pdf,” you can now configure the extension to store a document as “N.D.Cal._3-08-cv-03251_46_0.pdf.” Those who prefer the traditional filenames are free to continue using those as well.
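The difference between the two naming schemes can be illustrated with a short sketch. This is not the extension's actual code. The function and parameter names are hypothetical, and the rule for the "lawyer style" name (court abbreviation, then the docket number with its colon replaced by a hyphen) is an assumption inferred only from the two example filenames above.

```python
def internet_archive_filename(court_id, case_id, doc_num, attachment_num):
    """Build an "Internet Archive style" name from numeric identifiers."""
    return f"gov.uscourts.{court_id}.{case_id}.{doc_num}.{attachment_num}.pdf"


def lawyer_style_filename(court_abbrev, docket_number, doc_num, attachment_num):
    """Build a "lawyer style" name from a court abbreviation and docket number."""
    # Assumption: docket numbers look like "3:08-cv-03251"; the colon is
    # replaced because it is not a safe filename character on every system.
    safe_docket = docket_number.replace(":", "-")
    return f"{court_abbrev}_{safe_docket}_{doc_num}_{attachment_num}.pdf"


if __name__ == "__main__":
    print(internet_archive_filename("cand", 204881, 46, 0))
    print(lawyer_style_filename("N.D.Cal.", "3:08-cv-03251", 46, 0))
```

With the inputs shown, the two functions reproduce the example filenames from the paragraph above. The docket number `3:08-cv-03251` is itself an assumed reconstruction from the "lawyer style" filename.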

We’ve also improved our docket-parsing code, allowing us to extract more metadata from court dockets. New fields we’re now scraping include “Assigned to”, “Referred to”, “Cause”, “Nature of Suit”, “Jury Demand”, “Jurisdiction”, and “Demand.” We also scrape information about parties, including names, contact information, and attorneys. You can see a good example here (to choose a case at random).

If you’re an existing Firefox user, Firefox periodically checks for updates to extensions and should automatically fetch the new version of the RECAP …

more ...

RECAP in the Columbia Science and Technology Law Review

RECAP’s place at the heart or the periphery of the movement remains to be seen. Like any crowdsourcing application, RECAP’s usefulness increases as more people use it. Yet PACER’s prime users are large, bill-paying law firms, which tend to be wary about adopting new technology and have little incentive to contribute documents they paid for to a free database.

“Success” for RECAP may not be mainstream adoption, however. Merely by creating the working plugin and calling attention to the problem of restricted access to court documents, CITP has advanced the cause of reforming and opening up access to PACER. That alone is “Turning PACER around.”

One point this misses is that using RECAP can directly reduce firms’ PACER fees. It’s true, of course, that most firms pass these costs along to their clients. However, in today’s economic climate, clients are increasingly pressing their law firms for cost savings. Adopting RECAP is a painless way for firms to demonstrate cost-consciousness. And the cost savings from RECAP adoption will only get bigger as RECAP’s user base continues to grow. So while we think judicial transparency is reason enough to use RECAP, installing RECAP is good for every …

more ...

Google Project Shows Value of Open Judicial Records

We’re excited to see Google has unveiled a dramatic expansion of Google Scholar to include Supreme Court decisions going back to the 18th century, lower federal court decisions since the 1920s, and state Supreme Court and appellate decisions going back to the 1950s. They’ve done an impressive job with automated parsing of legal citations, transforming them into hyperlinks and allowing Google to do automated analysis of case similarity.

This type of project was precisely what we had in mind when some of us wrote “Government Data and the Invisible Hand” last year. The judiciary may be the foundation of a free society, but it’s not especially good at building websites or search engines. By making public records easily available for re-publication by third parties, the judiciary (and the other branches of government) can enable private parties to dramatically expand public access to public information.

In this case, the state and federal courts haven’t made it easy to download bulk data, so Google had to get the information from third parties. Google is a big company with significant resources at its disposal. But in an ideal world, it wouldn’t take the resources of a large company …

more ...

RECAP Media Recap

Last week, we got our first major media coverage from across the pond, as the Guardian gave us a generous write-up. They call RECAP “an ingenious twist on peer to peer networking” and write that “since the system launched in August, legal circles have been buzzing with support for the idea.”

Meanwhile, RECAP continues to generate interest from the legal profession. Earlier this month, RECAP’s own Tim Lee spoke to a group of New Jersey lawyers about how the software can save their clients money while expanding access to the public domain. And Arizona Attorney magazine has an in-depth article about RECAP and the debate over public access. They write that “there appears to be nothing illegal about the use of RECAP by those who are paying PACER users” (we agree). And they conclude that we’ve “carefully thought through the ethical implications and goals of the program.” We like to think so. The December issue of Virginia Lawyer magazine profiles RECAP, describing in detail the efforts so far to liberate PACER documents.

more ...

RECAP in Minnesota Lawyer

Word about RECAP continues to spread through the legal profession. The latest issue of Minnesota Lawyer covers the case of a Minneapolis lawyer who was sanctioned for inadvertently including the Social Security numbers and dates of birth of dozens of individuals in court documents, when the rules of civil procedure mandate that only the last four digits of a Social Security number and the year of birth be disclosed in documents filed with the court.

The article then mentions RECAP as one reason for attorneys to be careful about redaction when they’re filing court documents:

Friedemann said that concern over the publication of sensitive information has been elevated by recent Web programs like RECAP, which has made it easier to access public court filings.

RECAP automatically uploads all PACER documents a user is viewing onto an archive maintained by the non-profit group Internet Archive. When the next RECAP user attempts to view a PACER document that has already been archived, RECAP automatically uploads the copy to prevent that user from paying for those materials. The system allows users of PACER to slowly create a secondary archive of these public documents that can be accessed for free.

Friedemann explained that …

more ...

An Effort to Define the Ideal “”

A group of academics has been convened by Public.Resource.Org in order to define recommendations for a proposed federal government site: The group will study the feasibility of creating the equivalent of a for legal materials. The process will define a concrete path forward for the government. Specifically, it will deliver:

  • Detailed technical specifications for markup, authentication, bulk access, and other aspects of a distributed registry.
  • A bill of lading defining which materials should be made available on the system.
  • A detailed business plan and budget for the organization in the government running the new system.
  • Sample enabling legislation.
  • An economic impact statement detailing the effect on federal spending and economic activity.
  • Procedures for auditing materials on the system to ensure authenticity.

Ed Felten, Executive Director of Princeton’s Center for Information Technology Policy (which also produced RECAP), is one of the co-conveners.

more ...

Schultze on RECAP at Yale

Last week RECAP’s Steve Schultze and Harlan Yu visited Yale Law School to give a talk sponsored by Yale’s Information Society Project. Yale librarian Jason Eiseman produced a short interview with Steve that he describes as “a little Blair Witch.” Steve talks about the origins of RECAP, discusses some of the current challenges faced by RECAP, and talks briefly about RECAP’s newest sister project, FedThread.

more ...

RECAP in the Los Angeles Times and Elsewhere

Monday’s Los Angeles Times has a great article talking about the growing movement for government transparency. It focuses on three of our favorite transparency advocates: Ellen Miller, co-founder of the Sunlight Foundation; Josh Tauberer, a regular at CITP conferences; and Carl Malamud, whose non-profit, Public.Resource.Org, is a key RECAP partner.

The article discusses RECAP in some detail, describing it as “a sort of digital Kumbaya.” We’re always happy to have news outlets help spread the word about RECAP, and we’re also glad that the article makes clear that RECAP is part of a broader movement for web-enabled government transparency. Folks like Carl, Josh, and Ellen have been pushing the envelope on these issues longer than we have.

One minor correction that’s worth noting: the article refers to “the courts’ PACER revenue of $10 million a year.” In reality, the expected revenue for 2009 is $87 million. This and many other details about PACER’s budget can be found in RECAP co-author Steve Schultze’s recently-released paper on the subject.

RECAP has been a subject of discussion in other venues as well. Ars Technica discussed the courts’ reaction to RECAP in its story about the …

more ...

RECAP’s Steve Schultze at the Gov 2.0 Expo

RECAP co-author Steve Schultze is having a busy month. Last week, he released a new paper called “Electronic Public Access Fees and the United States Federal Courts’ Budget: An Overview.” It provides a comprehensive overview of PACER’s budget. It explains how the courts decide how much to charge for PACER and how the money is spent. It’s an invaluable roadmap for anyone interested in understanding the debate over PACER’s future.

Today, Steve is at the Gov 2.0 Expo giving a talk about RECAP. If you’re at the expo as well, we hope you’re planning to go to the talk, which starts at 10:50. If not, you can see a pre-recorded version of his talk here:

teaser image

Finally, next week Steve will start his new job as associate director of the Center for Information Technology Policy at Princeton, which is the home of RECAP and its other co-authors. The rest of the RECAP team is excited that we’ll soon have Steve as a colleague as well as a co-author.

more ...

RECAP in the Wall Street Journal

Last week we did a round-up of leading technology-focused sites that have covered RECAP. Now, it seems that news of RECAP is spreading beyond the “tech blogosphere,” as more mainstream publications have begun writing about our software. Foreign Policy’s Evgeny Morozov covered RECAP, calling it “smart and subversive.” On Wednesday NextGov, a National Journal publication widely read within the government IT community, ran a thorough write-up of RECAP by Aliya Sternstein. It included some good background on how RECAP fits into the larger debate about judicial transparency.

Finally, Katherine Mangu-Ward has penned a piece for the Wall Street Journal about RECAP. Katherine calls RECAP “a sleek little add-on” with “a stylish and subversive touch.” She writes:

With the possible exception of the ever-leaky CIA, no aspect of government remains more locked down than the secretive, hierarchical judicial branch. Digital records of court filings, briefs and transcripts sit behind paywalls like Lexis and Westlaw. Legal codes and judicial documents aren’t copyrighted, but governments often cut exclusive distribution deals, rendering other access methods a bit legally questionable. Supreme Court decisions are easy to get, but the briefs and decisions of lower courts can be hard to come by.

Last week …

more ...

A Note on RECAP’s Commitment to Privacy

We’ve gotten our first official reaction from the judiciary, in the form of a statement on the New Mexico Bankruptcy court’s website. It contains two important points about the PACER terms of use, and a misleading statement about privacy that we want to correct.

First, the good news: the court acknowledges a point we’ve made before, namely that use of RECAP is consistent with the law and the PACER terms of use. The only potential exception is if you’ve received a fee waiver for PACER. In that case, use of RECAP could violate the terms of the fee waiver, which reads: “Any transfer of data obtained as the result of a fee exemption is prohibited unless expressly authorized by the court.” We’re not lawyers, so we don’t know if the court’s interpretation is correct, but we encourage our users to honor the terms of the fee waiver.

Now, an important correction. The statement raises the concern that RECAP could compromise sealed or private documents that attorneys access via CM/ECF, the system attorneys use for electronic filing and retrieval of documents in pending cases. Protecting privacy is our top priority, and we specifically designed …

more ...

Tell The Courts to Improve PACER

One way to promote broader public access to the public record is to use RECAP to share documents with others. A complementary approach is to tell the U.S. Courts directly what should change. Recently, Stanford Law Librarian Erika Wayne launched a petition to “Improve PACER,” which suggested several changes:

  1. Provide document authentication. As the raw materials of adjudication become digitized and disseminated online, we must have some means of knowing that they are genuine. This is a dilemma that RECAP faces in helping users to trust the documents they download.
  2. Lower costs, improve interfaces. Our ultimate goal is to remove PACER’s paywall entirely and free the database up for third parties to build interfaces. But in the meantime, it would certainly benefit the public to gain less expensive access to the law through more useful interfaces. The petition recommends that the U.S. Courts reduce the transaction costs of access, and make that access more usable.
  3. Free access from Federal Depository Libraries

Erika will deliver the petition to the Administrative Office of the Courts in the near future. If you support these goals, consider signing the petition.

more ...

Accessing the RECAP Repository without PACER

Of all the questions we’ve received, probably the most common is whether it will be possible to access the documents in our archive without using PACER at all. The answer is yes, but at the moment we don’t offer any good browsing or searching tools.

The big reason has to do with privacy. One of our top priorities in developing RECAP was making sure we don’t inadvertently compromise the privacy of individuals who are the subject of court records. A lot of sensitive personal information is revealed in the course of federal court cases. A variety of private parties might be interested in using the information contained in these records for illicit purposes such as identity theft, stalking, and witness intimidation. We wanted to make sure we weren’t inadvertently facilitating those types of activities.

In theory, the courts have redaction rules designed to deal with these problems. Judges can order particularly sensitive documents to be sealed, and the rest of the documents are supposed to be redacted to prevent inadvertent disclosure of private information. Unfortunately, this process is far from perfect. Private information does sometimes wind up in the public version of court documents.

When court …

more ...

Law Professors, Librarians, and Think Tankers Praise RECAP

We’ve been getting a ton of helpful feedback from users over the weekend. We’re grateful for all the supportive emails, comments, and tweets we’ve received. We’re also grateful for the bug reports and feature requests we’ve gotten. We need this kind of feedback to make RECAP better.

Most of the questions we’ve received are now answered by our Frequently Asked Questions. Stay tuned for some upcoming blog posts where we’ll address some of these questions in more detail. But first, we wanted to highlight some more of the commentary that RECAP’s release has generated.

James Grimmelmann, a law professor at New York Law School who has done some great writing on public access to the law, gives RECAP this generous endorsement:

The great part about this is that because the Archive is providing the server space for free, every RECAP user is saving the court system work. Each time you download through RECAP, you avoid having to go through PACER’s servers at all. So yes, RECAP will mean a decrease in PACER’s revenues, but it also means a decrease in the things those revenues need to pay for. It …

more ...

The Blogosphere Weighs in on RECAP

We’re thrilled at the reception RECAP has gotten in its first few hours. Among the notable reactions, TechCrunch discusses the legal issues and concludes that using RECAP doesn’t violate copyright law. RECAP is a hot topic of conversation at Slashdot. CNET also weighed in, highlighting one of the challenges RECAP may face in the coming months:

There are some potential problems. One is that because the RECAP developers plan to make the source code available, it wouldn’t be hard for someone to seed the Internet Archive with “official court documents” that had been modified in some way. (The answer is for users to pay to download important files from PACER, or for the courts to employ digital signatures.)

Techdirt calls RECAP “ingenious”, and concludes that “this is a fantastic idea that hopefully will help to open up public domain court information that has been locked behind PACER’s paywalls for too long.”

Finally, Ars Technica does its usual thorough job of covering RECAP, writing:

The RECAP project could also illuminate potential solutions to the problems that are blocking a more complete PACER overhaul. Despite growing pressure from Congress to reform the PACER system and make data available …

more ...