More RECAP Events

On Tuesday June 22, Harlan Yu will be on Capitol Hill speaking to Congressional Staffers at an event sponsored by the Advisory Committee on Transparency. The event is called Transparency Made Easy: How to Make the Government More Open and Accountable. It is open only to staffers, but the Sunlight Foundation will post video of the event afterward.

On Friday the 25th, both Harlan and Steve will be at the 20th Annual CALI Conference in Camden, NJ, explaining how “Capturing PACER for Open Access” can benefit the cause of legal education.

Update: The video and other materials from the Transparency Caucus are now available. You can watch Harlan’s remarks below:

[![]({filename}/images/recap/Harlan_Yu-Transparency_Advisory_Committee.png)](http://recap.s3.amazonaws.com/Harlan_Yu-Transparency_Advisory_Committee.mp4)
more ...

New PACER Fee Data, RECAP Appearances This Week

This week, RECAP team member Steve Schultze released new data on how PACER fees are collected and spent in a post titled “What Does It Cost to Provide Electronic Public Access to Court Records?” He will discuss this new data and other aspects of public access to federal case materials at a Law.gov event in DC tomorrow (live streaming video). Steve will also be on a panel at the Virginia State Bar Association’s annual meeting on Thursday, speaking about how the federal electronic filing experience can inform state efforts. That same day, RECAP developer Dhruv Kapadia will be at the Law.gov event at Harvard.

There will be more RECAP developments and public appearances in the coming weeks.

Update: The video for the Law.gov event at the Center for American Progress is now posted on their site. You can watch the excerpt of Steve’s remarks below:

Video of Steve

more ...

Announcing CourtListener.com

I’m elated to announce today that I am officially taking the ropes off my final project and letting it loose into the wild. It’s been seven months since development on it officially started, and finally the beta version is done.

If you haven’t been following along, the project itself is an open source legal research tool which allows anybody to keep up to date with federal precedents as they are set by the 13 Federal Circuit courts. Right now, it has more than 130,000 documents in its corpus, including almost all of the Supreme Court record dating back to 1754. Every day it downloads the latest documents within about a half hour of when each court publishes them.

One thing we’ve focused on while building the site has been making it as useful as possible for as many people as possible. Since not everybody likes getting updates in their inbox, we’ve also tied the search engine in with an Atom feed generator so that you can search for whatever you want, and then follow updates in your feed reader.

Everything we’ve built uses a powerful boolean search engine on the backend. At present, there …

more ...

RECAP, The Press, and Judicial Transparency

USA Today just posted an article covering the debate over Arizona’s new immigration law. The article is essentially a roundup of relevant coverage, comments, and the wave of pending lawsuits. It is an example of the core activities of journalism in action — namely information gathering, synthesis, and dissemination on issues relevant to the public. The article is also notable because it embeds the actual text of a complaint from one of the lawsuits filed yesterday. This document came from PACER.

Some time yesterday, a RECAP user also downloaded that document, thus contributing it to the public archive. You can see the docket and document here. As a result, anyone can freely search for, download, or re-post the document. People can also follow the progress of the case (assuming RECAP users continue to download new documents as they are posted). The fact that this is all happening automatically is an exciting success for RECAP.

However, that success is inherently limited. Although the system worked in this particular case, there are literally millions of other cases that have not been similarly liberated from the PACER paywall. Many of these are highly relevant to American citizens, but they are not of broad …

more ...

RECAP Documents Now More Searchable Via Internet Archive

We recently made a small change to the way that documents uploaded by RECAP users are made available on the Internet Archive. Until today, the Internet Archive had served primarily as a bulk hosting provider, without much ability to browse or search the archive. This was enforced in two ways: First, it was not possible to search for documents using the Internet Archive’s search tools. Second, external search engines were prevented from indexing the site. We decided to do this in order to be especially cautious with respect to privacy concerns that we have previously discussed.

Since we launched, we have spent a great deal of time examining these issues, and we decided to make a small incremental step in making the documents more findable without (yet) allowing in-depth full-text search of all documents. We have enabled Internet Archive indexing, as well as search engine indexing, for the case summary pages on the Internet Archive. That means, for example, that the relatively limited information on the AT&T v. Hepting case summary page is now searchable.

You can find this case through the Internet Archive search engine by doing a query like this: http://www.archive.org/search.php …

more ...


Designing CourtListener

Over the past week, I’ve been working to create scrapers for each of the 13 federal appeals courts. Last night I finally finished the last of them, so today I’m moving on to the design of the site. Design is always much better when people work in a team, so I’m putting these designs here so others can look at them and give me feedback. Please, please do!

So far, I’ve sketched out four of the major pages that the site will have. A user will begin using the site on its homepage. Here, they will be given a few options. Basically, they can log in, register for an account, make a search, or read one of the ancillary pages such as the “About” or “Privacy” page:

first sketch

Also, note the advanced button under the search field. When this is clicked, it expands to show the advanced search queries that the site will support, as you can see on the next page.

If people are logged in, their homepage becomes the “Create new alert page,” which you can see below. For now, this allows users to create very complicated queries by hand. In the future, it would be …

more ...

Converting PDF Files to HTML

For my final project, we are considering posting court cases on our site, and so I did some work today analyzing how best to convert the PDF files the courts give us to HTML that people can actually use. I looked briefly at Google Docs, since it has an amazing tool that converts PDF files to something resembling text, but short of spending a few days hacking the site, I couldn’t figure out any easy way to leverage their technology in any sort of automated way.

The other two tools I have looked at today are pdftotext and pdftohtml, which, not surprisingly, do what their names claim they do. Since we’re going to be pulling cases from the 13 federal circuit courts, I wanted to figure out which method works best for which court, and which method will provide us with the most generalizable solution across whatever PDF a court may crank out.

The short version is that the best option seems to be:

pdftotext -htmlmeta -layout -enc 'UTF-8' yourfile.pdf

This creates an HTML file with the text of the case laid out as well as possible, some basic HTML metadata applied, and the UTF-8 encoding …
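For converting many files at once, the command above could be wrapped in a small script. What follows is only a minimal Python sketch under a few assumptions, not the code the project actually uses: the function names `build_pdftotext_command` and `convert_pdf` are made up for illustration, and it assumes the `pdftotext` binary is installed and on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftotext_command(pdf_path, out_path=None):
    """Build the argument list for the pdftotext call shown above.

    Hypothetical helper: it only assembles the same flags as the
    one-line command and optionally appends an output path.
    """
    cmd = ["pdftotext", "-htmlmeta", "-layout", "-enc", "UTF-8", str(pdf_path)]
    if out_path is not None:
        cmd.append(str(out_path))
    return cmd


def convert_pdf(pdf_path):
    """Run pdftotext on one file and return the path of the HTML output."""
    # Assumption: pdftotext is installed; fail early with a clear message if not.
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    out_path = Path(pdf_path).with_suffix(".html")
    # check=True raises CalledProcessError if pdftotext exits with an error.
    subprocess.run(build_pdftotext_command(pdf_path, out_path), check=True)
    return out_path
```

Keeping the command construction separate from the call that runs it makes it easy to try different flag combinations per court without touching the rest of the script.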

more ...

RECAP Extension 0.6 Beta Released

The Mozilla Foundation released version 3.6 of Firefox today, and we’re proud to release the corresponding version of the RECAP extension, beta version 0.6. In addition to Firefox 3.6 compatibility, we’ve also thrown in a new feature suggested by our users: the option to save documents using filenames that we describe as “lawyer style” in contrast to the “Internet Archive style” we’ve traditionally used. For example, rather than saving a document as “gov.uscourts.cand.204881.46.0.pdf,” you can now configure the extension to store a document as “N.D.Cal._3-08-cv-03251_46_0.pdf.” Those who prefer the traditional filenames are free to continue using those as well.
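To make the difference between the two naming styles concrete, here is a rough Python sketch that splits an “Internet Archive style” filename into its parts. This is not the extension’s code: the function name, the `ArchiveName` type, and the field labels are our guesses for illustration, based only on the example filenames above (where the 46 and the 0 appear in both styles).

```python
import re
from typing import NamedTuple


class ArchiveName(NamedTuple):
    court: str       # court abbreviation, e.g. "cand"
    case_id: str     # assumed: a numeric case identifier
    document: int    # assumed: document number on the docket
    attachment: int  # assumed: attachment number, 0 for the main document


# Pattern inferred from the single example above; real names may vary.
_PATTERN = re.compile(r"^gov\.uscourts\.([a-z0-9]+)\.(\d+)\.(\d+)\.(\d+)\.pdf$")


def parse_archive_name(filename):
    """Split a filename like gov.uscourts.cand.204881.46.0.pdf into fields."""
    match = _PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not an Internet Archive style name: {filename!r}")
    court, case_id, document, attachment = match.groups()
    return ArchiveName(court, case_id, int(document), int(attachment))
```

Note that the “lawyer style” example contains a docket number (3-08-cv-03251) that does not appear in the Internet Archive style name, so going from one style to the other would need additional case metadata; the sketch only handles the parsing half.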

We’ve also improved our docket-parsing code, allowing us to extract more metadata from court dockets. New fields we’re now scraping include “Assigned to”, “Referred to”, “Cause”, “Nature of Suit”, “Jury Demand”, “Jurisdiction”, and “Demand.” We also scrape information about parties, including names, contact information, and attorneys. You can see a good example here (to choose a case at random).

If you’re an existing Firefox user, Firefox periodically checks for updates to extensions and should automatically fetch the new version of the RECAP …

more ...

RECAP in the Columbia Science and Technology Law Review

RECAP’s place at the heart or the periphery of the movement remains to be seen. Like any crowdsourcing application, RECAP’s usefulness increases as more people use it. Yet PACER’s prime users are large, bill-paying law firms, which tend to be wary about adopting new technology and have little incentive to contribute documents they paid for to a free database.

“Success” for RECAP may not be mainstream adoption, however. Merely by creating the working plugin and calling attention to the problem of restricted access to court documents, CITP has advanced the cause of reforming and opening up access to PACER. That alone is “Turning PACER around.”

One point this misses is that using RECAP can directly reduce firms’ PACER fees. It’s true, of course, that most firms pass these costs along to their clients. However, in today’s economic climate, clients are increasingly pressing their law firms for cost savings. Adopting RECAP is a painless way for firms to demonstrate cost-consciousness. And the cost savings from RECAP adoption will only get bigger as RECAP’s user base continues to grow. So while we think judicial transparency is reason enough to use RECAP, installing RECAP is good for every …

more ...

Google Project Shows Value of Open Judicial Records

We’re excited to see Google has unveiled a dramatic expansion of Google Scholar to include Supreme Court decisions going back to the 18th century, lower federal court decisions since the 1920s, and state Supreme Court and appellate decisions going back to the 1950s. They’ve done an impressive job with automated parsing of legal citations, transforming them into hyperlinks and allowing Google to do automated analysis of case similarity.

This type of project was precisely what we had in mind when some of us wrote “Government Data and the Invisible Hand” last year. The judiciary may be the foundation of a free society, but it’s not especially good at building websites or search engines. By making public records easily available for re-publication by third parties, the judiciary (and the other branches of government) can enable private parties to dramatically expand public access to public information.

In this case, the state and federal courts haven’t made it easy to download bulk data, so Google had to get the information from third parties. Google is a big company with significant resources at its disposal. But in an ideal world, it wouldn’t take the resources of a large company …

more ...

RECAP Media Recap

Last week, we got our first major media coverage from across the pond, as the Guardian gave us a generous write-up. They call RECAP “an ingenious twist on peer to peer networking” and write that “since the system launched in August, legal circles have been buzzing with support for the idea.”

Meanwhile, RECAP continues to generate interest from the legal profession. Earlier this month, RECAP’s own Tim Lee spoke to a group of New Jersey lawyers about how the software can save their clients money while expanding access to the public domain. And Arizona Attorney magazine has an in-depth article about RECAP and the debate over public access. They write that “there appears to be nothing illegal about the use of RECAP by those who are paying PACER users” (we agree). And they conclude that we’ve “carefully thought through the ethical implications and goals of the program.” We like to think so. The December issue of Virginia Lawyer magazine profiles RECAP, describing in detail the efforts so far to liberate PACER documents.

more ...

RECAP in Minnesota Lawyer

Word about RECAP continues to spread through the legal profession. The latest issue of Minnesota Lawyer covers the case of a Minneapolis lawyer who was sanctioned for inadvertently including the Social Security numbers and dates of birth of dozens of individuals in court documents, when the rules of civil procedure mandate that only the last four digits of a Social Security number and the year of birth be disclosed in documents filed with the court.

The article then mentions RECAP as one reason for attorneys to be careful about redaction when they’re filing court documents:

Friedemann said that concern over the publication of sensitive information has been elevated by recent Web programs like RECAP, which has made it easier to access public court filings.

RECAP automatically uploads all PACER documents a user is viewing onto an archive maintained by the non-profit group Internet Archive. When the next RECAP user attempts to view a PACER document that has already been archived, RECAP automatically uploads the copy to prevent that user from paying for those materials. The system allows users of PACER to slowly create a secondary archive of these public documents that can be accessed for free.

Friedemann explained that …

more ...

An Effort to Define the Ideal “Law.gov”

A group of academics has been convened by Public.Resource.Org in order to define recommendations for a proposed federal government site: law.gov. The group will study the feasibility of creating the equivalent of a data.gov for legal materials. The process will define a concrete path forward for the government. Specifically, it will deliver:

  • Detailed technical specifications for markup, authentication, bulk access, and other aspects of a distributed registry.
  • A bill of lading defining which materials should be made available on the system.
  • A detailed business plan and budget for the organization in the government running the new system.
  • Sample enabling legislation.
  • An economic impact statement detailing the effect on federal spending and economic activity.
  • Procedures for auditing materials on the system to ensure authenticity.

Ed Felten, Executive Director of Princeton’s Center for Information Technology Policy (which also produced RECAP), is one of the co-conveners.

more ...

Schultze on RECAP at Yale

Last week RECAP’s Steve Schultze and Harlan Yu visited Yale Law School to give a talk sponsored by Yale’s Information Society Project. Yale librarian Jason Eiseman produced a short interview with Steve that he describes as “a little Blair Witch.” Steve talks about the origins of RECAP, discusses some of the current challenges faced by RECAP, and talks briefly about RECAP’s newest sister project, FedThread.

more ...

RECAP in the Los Angeles Times and Elsewhere

Monday’s Los Angeles Times has a great article talking about the growing movement for government transparency. It focuses on three of our favorite transparency advocates: Ellen Miller, co-founder of the Sunlight Foundation; Josh Tauberer, a regular at CITP conferences; and Carl Malamud, whose non-profit, public.resource.org, is a key RECAP partner.

The article discusses RECAP in some detail, describing it as “a sort of digital Kumbaya.” We’re always happy to have news outlets help spread the word about RECAP, and we’re also glad that the article makes clear that RECAP is part of a broader movement for web-enabled government transparency. Folks like Carl, Josh, and Ellen have been pushing the envelope on these issues longer than we have.

One minor correction that’s worth noting: the article refers to “the courts’ PACER revenue of $10 million a year.” In reality, the expected revenue for 2009 is $87 million. This and many other details about PACER’s budget can be found in RECAP co-author Steve Schultze’s recently-released paper on the subject.

RECAP has been a subject of discussion in other venues as well. Ars Technica discussed the courts’ reaction to RECAP in its story about the …

more ...

RECAP’s Steve Schultze at the Gov 2.0 Expo

RECAP co-author Steve Schultze is having a busy month. Last week, he released a new paper called “Electronic Public Access Fees and the United States Federal Courts’ Budget: An Overview.” It provides a comprehensive overview of PACER’s budget. It explains how the courts decide how much to charge for PACER and how the money is spent. It’s an invaluable roadmap for anyone interested in understanding the debate over PACER’s future.

Today, Steve is at the Gov 2.0 Expo giving a talk about RECAP. If you’re at the expo as well, we hope you’re planning to go to the talk, which starts at 10:50. If not, you can see a pre-recorded version of his talk here:

teaser image

Finally, next week Steve will start his new job as associate director of the Center for Information Technology Policy at Princeton, which is the home of RECAP and its other co-authors. The rest of the RECAP team is excited that we’ll soon have Steve as a colleague as well as a co-author.

more ...

RECAP in the Wall Street Journal

Last week we did a round-up of leading technology-focused sites that have covered RECAP. Now, it seems that news of RECAP is spreading beyond the “tech blogosphere,” as more mainstream publications have begun writing about our software. Foreign Policy’s Evgeny Morozov covered RECAP, calling it “smart and subversive.” On Wednesday NextGov, a National Journal publication widely read within the government IT community, ran a thorough write-up of RECAP by Aliya Sternstein. It included some good background on how RECAP fits into the larger debate about judicial transparency.

Finally, Katherine Mangu-Ward has penned a piece for the Wall Street Journal about RECAP. Katherine calls RECAP “a sleek little add-on” with “a stylish and subversive touch.” She writes:

With the possible exception of the ever-leaky CIA, no aspect of government remains more locked down than the secretive, hierarchical judicial branch. Digital records of court filings, briefs and transcripts sit behind paywalls like Lexis and Westlaw. Legal codes and judicial documents aren’t copyrighted, but governments often cut exclusive distribution deals, rendering other access methods a bit legally questionable. Supreme Court decisions are easy to get, but the briefs and decisions of lower courts can be hard to come by.

Last week …

more ...

A Note on RECAP’s Commitment to Privacy

We’ve gotten our first official reaction from the judiciary, in the form of a statement on the New Mexico Bankruptcy court’s website. It contains two important points about the PACER terms of use, and a misleading statement about privacy that we want to correct.

First, the good news: the court acknowledges a point we’ve made before, namely that use of RECAP is consistent with the law and the PACER terms of use. The only potential exception is if you’ve received a fee waiver for PACER. In that case, use of RECAP could violate the terms of the fee waiver, which reads: “Any transfer of data obtained as the result of a fee exemption is prohibited unless expressly authorized by the court.” We’re not lawyers, so we don’t know if the court’s interpretation is correct, but we encourage our users to honor the terms of the fee waiver.

Now, an important correction. The statement raises the concern that RECAP could compromise sealed or private documents that attorneys access via the CM/ECF, the system attorneys use for electronic filing and retrieval of documents in pending cases. Protecting privacy is our top priority, and we specifically designed …

more ...

Tell The Courts to Improve PACER

One way to promote broader public access to the public record is to use RECAP to share documents with others. A complementary approach is to tell the U.S. Courts directly what should change. Recently, Stanford Law Librarian Erika Wayne launched a petition to “Improve PACER,” which suggested several changes:

  1. Provide document authentication. As the raw materials of adjudication become digitized and disseminated online, we must have some means of knowing that they are genuine. This is a dilemma that RECAP faces in helping users to trust the documents they download.
  2. Lower costs, improve interfaces. Our ultimate goal is to remove PACER’s paywall entirely and free the database up for third parties to build interfaces. But in the meantime, it would certainly benefit the public to gain less expensive access to the law through more useful interfaces. The petition recommends that the U.S. Courts reduce the transaction costs of access, and make that access more usable.
  3. Free access from Federal Depository Libraries

Erika will deliver the petition to the Administrative Office of the Courts in the near future. If you support these goals, consider signing the petition.

more ...