11 New Courts Added to CourtListener and Juriscraper Comprising Nearly 50,000 New Opinions

A goal of the Free Law Project is to make legal research easier and faster. One way we do that is to scrape court websites, downloading any opinions that they have, making them searchable, and finding the citation relationships among them. For many jurisdictions, we download all the opinions they host, while for others we simply start downloading their opinions on a given day and use those as fuel for our alerts whenever new material is published.

Today we are happy to share that thanks to several volunteer contributors, we’re adding a number of new jurisdictions to the project:

Combined, these new jurisdictions already add nearly 50,000 new opinions to our collection, and as always, these are immediately available for free via our bulk downloads. As these jurisdictions publish more opinions, we will have them automatically, usually within 30 minutes of when they are posted.

We will continue adding more and more jurisdictions and opinions. This is only the beginning.

more ...

Non-Profit “Free Law Project” Formed to Create an Open Legal Ecosystem

For Immediate Release — Berkeley, CA

Brian W. Carver and Michael Lissner, creators of the CourtListener platform and associated technology, are pleased to announce that after four years developing free and open legal technologies, they are launching a non-profit umbrella organization for their work: Free Law Project. Free Law Project will serve to bring legal materials and research to the public for free, formalizing the work that they have been doing, and providing a long-term home for similar projects.

“Since the birth of this country, legal materials have been in the hands of the few, denying legal justice to the many,” said Michael Lissner, co-founder of the new non-profit. “It is appalling that the public does not have free online access to the entirety of United States case law,” said Brian Carver, UC Berkeley professor and Free Law Project co-founder. “We are working to change this situation. We also provide a platform for developing technologies that can make legal research easier for both professionals and the general public.”

The official goals for the non-profit are:

  • To provide free, public, and permanent access to primary legal materials on the Internet for educational, charitable, and scientific purposes;
  • To develop, implement, and provide public …
more ...

Announcing Citation Queries and other Goodies

We’re proud to announce a big new feature today that we’ve been planning for a long time. Starting today, you can make citation queries against the CourtListener corpus. If you look at the bottom of the left-hand column, you’ll see a new slider:


Sliding the handles around, you can easily filter out any documents that are too popular or not popular enough — or both. In addition to this, we’ve added citation counts to our results list, and citation count ordering to our results. For example, you can now order the results by most cited or least cited, depending on the kind of work you’re doing.

In addition, we’re also announcing two new fields that you can query: Judges and Nature of Suit. Both of these fields are currently very limited in our corpus, but as we add more documents, we want to expose these to our users. To query by judge name, you can either type the name directly into the judge text box on the left, or you can use the “judge” operator in a query like [ judge:smith ]. For the Nature of Suit, the data is both incomplete …

more ...

A few small API changes

We’re updating our code in a number of ways today, and that is resulting in several changes to the format of our data dumps. If you use them in an automated fashion, please note the following changes:

  • dateFiled is now date_filed
  • precedentialStatus is now precedential_status
  • docketNumber is now docket_number
  • westCite is now west_cite
  • lexisCite is now lexis_cite

Additionally, a new field, west_state_cite, has been added, which will hold any citations to West’s state reporters. We’ve made these changes in preparation for a proper API that will return XML and JSON. Before releasing that API, we needed to clean up some old field values so they would be more consistent. From this point forward, we expect better consistency in the fields of our XML.
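If you consume the dumps programmatically, here is a minimal sketch of one way to bridge the rename. The mapping is taken directly from the list above; the helper function itself is hypothetical, not part of our code:

```python
# Hypothetical compatibility shim for code consuming the data dumps:
# translates the old camelCase field names to the new snake_case ones.
FIELD_RENAMES = {
    "dateFiled": "date_filed",
    "precedentialStatus": "precedential_status",
    "docketNumber": "docket_number",
    "westCite": "west_cite",
    "lexisCite": "lexis_cite",
}

def normalize_record(record):
    """Return a copy of a dump record with old field names translated."""
    return {FIELD_RENAMES.get(key, key): value for key, value in record.items()}

old = {"dateFiled": "2012-01-05", "docketNumber": "10-1234", "court": "ca9"}
print(normalize_record(old))
# {'date_filed': '2012-01-05', 'docket_number': '10-1234', 'court': 'ca9'}
```

Fields that were not renamed (like the court identifier) pass through untouched, so the shim can be applied to every record unconditionally.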

If this causes any inconvenience or if you need any help with these changes, please let us know.

more ...

Two RECAP Grants Awarded in Memory of Aaron Swartz

In memory of Internet activist Aaron Swartz, Think Computer Foundation (http://www.thinkcomputer.org) and the Center for Information Technology Policy (CITP) at Princeton University (http://citp.princeton.edu) are announcing the winners of two $5,000 grant awards for improving RECAP.

Since 2009, a team of researchers at Princeton has worked on a web browser-based system known as RECAP (https://free.law/recap/) that allows citizens to recapture public court records from the federal government’s official PACER database. The Administrative Office of the Courts charges per-page user fees for PACER documents, which makes it expensive to access these public records. RECAP allows users to easily share the records they purchase and to freely access documents that others have already purchased.

Shortly after the unexpected death of Mr. Swartz, Think Computer Foundation announced that it would fund grants worth $5,000 each to extend RECAP and make use of data contained in Think Computer Foundation’s PlainSite database of legal information.

Two of these grants are being awarded today.

Ka-Ping Yee, a Canadian software developer living in Northern California, has created a version of RECAP for Google’s Chrome browser. This gives RECAP a much larger base of …

more ...

Five new courts added to CourtListener

We’re excited to announce today that we’ve added five new courts to the list of those we support.

Today we add the Supreme Courts of:

  1. California
  2. Indiana
  3. West Virginia
  4. Wisconsin
  5. Wyoming

These are the first state courts that we support, and over the next few days we’ll be adding more as the Juriscraper library gains support for them. We already have another seven state courts in the wings!

By launching these courts today, we’re making a small change in our plans. We were previously working towards having all 50 supreme courts ready to go so we could add them in one big push, but since developing these scrapers is taking longer than we would like, we’re going to start adding state courts as they’re ready, one by one.

Today’s launch adds five courts and about 1,200 more cases to the project. We need help getting the remaining courts ready. If you’re a developer and want to help, get in touch via our contact form and we’ll get you up and coding in no time.

more ...

Live coverage graphs now available

Thanks to a great volunteer contribution, we now have amazing graphs on our coverage page instead of simply static numbers.

The old version used to simply say the number of total documents we had for a court, leaving you scratching your head. The new version shows you a timeline indicating how many documents we have in each court for each year. It’s a great improvement that brings a lot more transparency into the coverage we have on the site.

more ...

$10,000 in Further Awards for RECAP Projects

Today, teams across the country are hard at work on the Aaron Swartz Memorial Grants. These grants, offered by the Think Computer Foundation, provide $5,000 awards for three different projects related to RECAP.

We are delighted to announce additional awards. The generous folks over at Google’s Open Source Programs team have pledged to support two more RECAP-related project awards — at $5,000 each. These are open to anyone who wishes to submit a proposal for a significant improvement to the RECAP system. We will work with the proposers to scope the project and define what qualifies for the award. All projects must be open source.

There are several potential ideas. For instance, someone might propose adding support to RECAP for displaying the user’s current balance and prompting the user to liberate up to their free quarterly $15 allocation as the end of the quarter approaches (inspired by Operation Asymptote). Someone might propose improving the https://www.courtlistener.com/recap/ interface, and improving the detection and removal of private information. Someone might propose some other idea that we haven’t thought of. You may wish to watch the discussion of a few of these initial ideas …

more ...

Another new court on CourtListener

We’re on a roll, and today I’m happy to share that we’ve added yet another court to the site. Today’s court, with about 50 cases so far, is the Bankruptcy Appellate Panel for the Ninth Circuit.

We’ll be adding a historical scraper for this court soon, but for now, sit back and enjoy our super-fast results as they get delivered straight to your email.

50 today. 1,000 tomorrow.

more ...

New Courts at CourtListener with Historical Data

I mentioned in my last post that we’ve added some new courts to the site. Today we’ve added the historical data for these courts that was available on their website.

This amounts to about 1,500 new cases on CourtListener:

  • 112 from November 2003 to today at the Court of Appeals for the Armed Forces
  • 764 from January 2000 to today at the Court of Veterans Claims
  • 600 from January 2008 to today at the Court of International Trade

All of these docs are immediately available for search, via RSS, or via our dump API, and will be included in our dump of all our cases when it is regenerated at the end of the month.

This also marks an important achievement for the Juriscraper library. Since CourtListener now has scrapers for all federal courts of special jurisdiction, we’re officially moving it to version 0.2. It’s taken longer than we wanted to get it here, but this is a huge step for the library.

Freeing 1,000 docs at a time.

more ...

A few updates at CourtListener

It’s been quiet around here for a little while, so it’s about time I share what’s been going on behind the scenes. As you might imagine, just because we haven’t had a lot of news doesn’t mean that we haven’t been busy.

The biggest thing I have to share today is that we’ve moved our CourtListener infrastructure to new and bigger hardware. This task has taken months to complete and involved applying many updates to the code and infrastructure. For developers, this upgrade comes with a few changes:

  1. Our default database for CourtListener is now Postgres rather than MySQL. This is something that’s been planned for a while, but wasn’t really possible until a big upgrade like this one. The big changes that come out of this are non-locking queries for our database dumps, and better performance for many of our queries. Since Postgres is a transactional, stricter, and more featureful database, we’re convinced that it is a better way forward than MySQL. Oracle lately hasn’t been a great steward of MySQL, so it was a good time to jump ship. As a bonus, Postgres was started in Berkeley …
more ...

Announcing the Aaron Swartz Memorial Grants

Last week, our community lost Aaron Swartz. We are still reeling. Aaron was a fighter for openness and freedom, and many people have been channeling their grief into positive actions for causes that were close to Aaron’s heart. One of these people is Aaron Greenspan, creator of the open-data site PlainSite and the Think Computer Foundation. He has established a generous set of grants to be awarded to the first person (or group) that develops the following upgrades to RECAP, our court record liberation system. RECAP would not exist without the work of Aaron Swartz.

Three grants are being made available related to RECAP. Each grant is worth $5,000.00:

  1. Grant 1: Develop and release a version of RECAP for the Google Chrome browser that matches the current Firefox browser extension functionality
  2. Grant 2: Develop and release a version of RECAP for Internet Explorer that matches the current Firefox browser extension functionality
  3. Grant 3: Update the Firefox browser extension to capture appellate court documents, and update the RECAP server code to parse them and respond appropriately to browser extension requests

For more details, see The Aaron Swartz Memorial Grants. If you are interested, you must register by the …

more ...

Presentation on Juriscraper and CourtListener for LVI2012

Yesterday and today I’ve been in Ithaca, New York, participating in the Law via the Internet Conference (LVI), where I’ve been learning tons!

I had the good fortune to have my proposal topic selected for Track 4: Application Development for Open Access and Engagement.

In the interest of sharing, I’ve attached the latest version of my slides to this Blog post, and the audio for the talk may eventually get posted on the LVI site.



more ...

New tool for testing lxml XPath queries

I got a bit frustrated today, and decided that I should build a tool to fix my frustration. The problem was that we’re using a lot of XPath queries to scrape various court websites, but there was no tool that could be used to test XPath expressions efficiently.

There are a couple of tools that are quite similar to what I just built: there’s one called Xacobeo, Eclipse has one built in, and even Firebug has a tool that does something similar. Unfortunately, these each operate on a different DOM interpretation than the one that lxml builds.

So the problem I kept running into was that while these tools helped, they’d start falling over whenever the HTML got nasty.

No more! Today I built a quick Django app that can be run locally or on a server. It’s quite simple. You input some HTML and an XPath expression, and it will tell you the matches for that expression. It has syntax highlighting, and a few other tricks up its sleeve, but it’s pretty basic on the whole.

I’d love to get any feedback I can about this. It’s …

more ...

Announcing the third series of the Federal Reporter!

Following on Friday’s big announcement about our new citator, today I’m excited to share that we’ve completed incorporating volumes 1 to 491 of the third series of the Federal Reporter (F.3d). This has been a monumental task over the past six months. Since we already had many cases from the same time period and jurisdiction, we had to work very hard on our duplicate-merging algorithm. In the end, we were able to get upwards of 99% accuracy with our merging code, and any cases that could not be merged automatically were handled by human review. The outcome of this work is an improved dataset beyond any that has been available previously: in tens of thousands of cases, we have been able to merge the metadata on Resource.org with data that we obtained directly from the court websites.

These new cases bring our total number of cases up to 756,713, and we hope to hit a million by the end of the year. With this done, our next task is to begin incorporating data from all of the appellate-level state courts. We will be working on this in a …

more ...

Building a Citator on CourtListener

I’m incredibly excited today to announce that over the past few weeks we have successfully rolled out a Citator on CourtListener. This feature was developed by UC Berkeley School of Information students Karen Rustad and Rowyn McDonald after a thorough design and development cycle that included everything from user interviews to performance optimizations of our citation-finding algorithm.

As you’re browsing the site, you’ll immediately see three big new features. First, all Federal citations to documents that we have in our collection are now links. So as you’re reading, if there’s a reference to a prior case that you feel might be useful to your research, you can just click the link to that case and continue your research there. This allows you to go upstream in your research, looking at the important cases that came before.

The second big change you’ll see is a new sidebar on all case pages that lists the top five cases that reference the one you’re reading. This allows you to go downstream from the case you’re reading, where you’ll be able to identify how the case was later interpreted by other courts.

At the …

more ...

Further privacy protections at CourtListener

I’ve written previously about the lengths we go to at CourtListener to protect people’s privacy, and today we completed one more privacy enhancement.

After my last post on this topic, we discovered that although we had already blocked cases from appearing in the search results of all major search engines, we had a privacy leak in the form of our computer-readable sitemaps. These sitemaps contain links to every page within a website, and since those links contain the names of the parties in a case, it’s possible that a Google search for the party name could turn up results that should be hidden.

This was problematic, and as of now we have changed the way we serve sitemaps so that they use the noindex X-Robots-Tag HTTP header. This tells search crawlers that they are welcome to read our sitemaps, but that they should avoid serving them or indexing them.
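In concrete terms, every sitemap response now carries that header. Here is a minimal sketch of the change (CourtListener is a Django app, but the header works the same no matter what serves the file; the function name is mine):

```python
# Build a sitemap response whose headers tell crawlers they may fetch
# the file, but must not index it or serve it in search results.
def sitemap_response(xml_body):
    headers = {
        "Content-Type": "application/xml",
        "X-Robots-Tag": "noindex",
    }
    return headers, xml_body

headers, body = sitemap_response("<?xml version='1.0'?><urlset/>")
print(headers["X-Robots-Tag"])  # noindex
```

The distinction matters: robots.txt would stop crawlers from reading the sitemap at all, while the noindex header lets them use it to discover pages without ever exposing the party names in its URLs through their search results.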

more ...

My Presentation Proposals for LVI 2012

The Law Via the Internet conference is celebrating its 20th anniversary at Cornell University on October 7-9th. I will be attending, and with any luck, I’ll be presenting on the topic proposed below.

Wrangling Court Data on a National Level

Access to case law has recently become easier than ever: by simply visiting a court’s website, it is now possible to find and read thousands of cases without ever leaving your home. At the same time, there are nearly a hundred court websites, many of which suffer from poor funding or prioritization, and gaining a higher-level view of the law can be challenging. “Juriscraper” is a new project designed to ease these problems for all those who wish to collect these court opinions daily. The project is under active development, and we are looking for others to get involved.

Juriscraper is a liberally-licensed open source library that can be picked up and used by any organization to scrape the case data from court websites. In addition to simply scraping the websites and extracting metadata from them, Juriscraper has a number of other design goals:

  • Extensibility to support video, oral argument audio, and other media types
  • Support …
more ...

Announcing OCR Support on CourtListener

For the past few months, we have been blogging about our research into how to handle scanned documents at CourtListener since a number of courts have a habit of releasing their opinions in this manner. Previously when this happened, it meant that we couldn’t get the text out of the document, and as a result, it was impossible for anybody to find these cases on the site.

Obviously, this is a bad situation for our users, so we are excited to announce that as of today we have a new Optical Character Recognition (OCR) system for extracting the text from scanned documents. We’re currently extracting the text from an additional 10,000 opinions that were previously unsearchable, and going into the future we’ll do this automatically as we get cases from the courts.
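The pipeline logic behind this is straightforward: if a document already carries an embedded text layer we use it, and only scanned pages are sent through OCR. Here is a sketch of that fallback, assuming a Tesseract-style command-line OCR engine; the helper names are mine, not our exact code:

```python
# Sketch of an extraction pipeline with an OCR fallback. The helper
# names and the choice of the tesseract CLI are assumptions for
# illustration, not CourtListener's exact implementation.
import subprocess

def extract_text(embedded_text, page_image_path):
    """Prefer the PDF's embedded text layer; fall back to OCR for
    scanned pages with no extractable text. Returns (text, used_ocr)."""
    if embedded_text and embedded_text.strip():
        return embedded_text, False  # native text layer, no OCR needed
    return ocr_page(page_image_path), True

def ocr_page(page_image_path):
    """Run a command-line OCR engine (here: tesseract) on a page image."""
    result = subprocess.run(
        ["tesseract", page_image_path, "stdout"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

The key design point is that OCR is a fallback, not the default: opinions that already have real text pass through untouched, so OCR errors can only ever affect documents that were previously unsearchable anyway.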

This change further expands the breadth of our coverage, and we hope you find it to be a useful change!

more ...

Announcing CourtListener’s New Sub-Project: Juriscraper

For the past two years at CourtListener we used a mess of code to scrape the Federal Court system. This worked remarkably well, but we recently began expanding our coverage and it was clear a rewrite was needed. For the past several weeks, we’ve been building a replacement called Juriscraper that is more reliable, understandable, flexible and expandable.

Unlike our old scrapers, Juriscraper is a library that anybody can pick up and use, and which allows your project to easily scrape court websites. It is currently at version 0.1, which supports all of the courts on CourtListener, and over the next few weeks we’ll be adding many more courts until we have all of the available courts in the United States.
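To give a flavor of the design, here is a toy illustration of the pattern (not Juriscraper’s actual API; the class and attribute names are mine): each court is a small class that declares the XPath expressions for its data, and a shared base class does the parsing.

```python
# Toy illustration of a per-court scraper library: the base class owns
# the parsing machinery, and each court subclass only supplies the
# XPaths that are specific to that court's website.
from lxml import html

class Site:
    """Base class: parse one court's opinions page."""
    case_name_xpath = None
    download_url_xpath = None

    def parse(self, page_source):
        tree = html.fromstring(page_source)
        self.case_names = tree.xpath(self.case_name_xpath)
        self.download_urls = tree.xpath(self.download_url_xpath)
        return self

class ExampleCourt(Site):
    """Hypothetical court: only the XPaths are court-specific."""
    case_name_xpath = "//table//a/text()"
    download_url_xpath = "//table//a/@href"

page = "<table><tr><td><a href='/opinions/1.pdf'>Smith v. Jones</a></td></tr></table>"
site = ExampleCourt().parse(page)
print(site.case_names)     # ['Smith v. Jones']
print(site.download_urls)  # ['/opinions/1.pdf']
```

Because all the shared logic lives in one place, supporting a new court means writing a few lines of court-specific configuration rather than a whole new scraper.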

We hope that this project will be something that others will use, and that we can thus centralize our scraping efforts. There are many organizations that are currently scraping court websites, each with their own implementations that they build and maintain. This creates a lot of duplicated work, and slows down maintenance for everybody. By finally creating a liberally licensed shared scraper, we hope to bring everybody under the same scraping roof so we can share …

more ...