Responding to GDPR “Right to Erasure” Requests

The General Data Protection Regulation (GDPR) is a sweeping new data protection and privacy law out of the EU. Among other things, the GDPR lets EU citizens send “Right to Erasure” requests to websites, asking those websites to remove content that might be private. Recently, we received one of these requests from our domain registrar asking that we remove a court document from our database on CourtListener. It appears that this is a growing problem for other legal publishers too, with Techdirt writing up the issue late last week.

GDPR is a major development in the regulation of the Internet. It includes protections for individuals and a variety of regulations that apply to service providers like us. When GDPR went into effect, we were easily able to comply with its numerous privacy regulations because we were already being extremely conservative about who we shared data with and how much data we collected (see our privacy policy for details). For us, adopting GDPR-compatible procedures just meant a few tweaks. No big deal.

Until last week, that is, when we received a “Right to Erasure” request demanding that we remove a case from CourtListener. Now we have an EU regulation that’s at odds with our goal of gathering and sharing important legal information. What’s worse, if we complied with this request, we would be removing precedential information from CourtListener. Our policy is to never do that without a court order from a competent jurisdiction. In short, this takedown request is at odds with our goals, and with the design of the American legal system.

So where do we bend? Who wins in this conflict between us, the GDPR, and the individual wishing to remove content from CourtListener? What follows is our approach to responding to this kind of request. The short version is that we won’t comply. We don’t believe we are subject to the GDPR, and even if we were, it has numerous carve-outs specifically for the kind of information we provide.

Read on for the details of our approach.


Free Law Project Re-Launches RECAP Archive, a New Search Tool for PACER Dockets and Documents

After months of development, we are thrilled to share a from-scratch re-launch of the RECAP Archive. Our new archive, available immediately at https://www.courtlistener.com/recap/, contains all of the content currently in RECAP and makes it all fully searchable for the first time. At launch, the collection contains information about more than ten million PACER documents, including the extracted text from more than seven million pages of scanned documents.


The new advanced search interface for the RECAP Archive.

The search capabilities of this new system empower researchers in new ways. For example:


Further privacy protections at CourtListener

I’ve written previously about the lengths we go to at CourtListener to protect people’s privacy, and today we completed one more privacy enhancement.

After my last post on this topic, we discovered that although we had already blocked cases from appearing in the search results of all major search engines, we had a privacy leak in the form of our computer-readable sitemaps. These sitemaps contain links to every page within a website, and since those links contain the names of the parties in a case, it’s possible that a Google search for the party name could turn up results that should be hidden.

This was problematic, and as of now we have changed the way we serve sitemaps so that they use the noindex X-Robots-Tag HTTP header. This tells search crawlers that they are welcome to read our sitemaps, but that they should avoid serving them or indexing them.
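
As a rough illustration of the header described above, here is a minimal sketch of a sitemap endpoint that sends `X-Robots-Tag: noindex`. It is not CourtListener’s actual implementation: the `sitemap_app` name, the bare WSGI setup, and the example.com URL are all assumptions made for the example.

```python
# Hypothetical sitemap body; a real site would generate this from its database.
SITEMAP_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/opinion/1/example-v-example/</loc></url>
</urlset>
"""


def sitemap_app(environ, start_response):
    """WSGI app that serves a sitemap crawlers may read but should not index."""
    headers = [
        ("Content-Type", "application/xml; charset=utf-8"),
        ("Content-Length", str(len(SITEMAP_XML))),
        # Ask crawlers not to index or serve the sitemap file itself.
        ("X-Robots-Tag", "noindex"),
    ]
    start_response("200 OK", headers)
    return [SITEMAP_XML]
```

Any WSGI server can run this function. The same header could equally be set in a web framework or in the front-end web server’s configuration.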


Support for x-robots-tag and robots HTML meta tag

As part of our research for our post on how we block search engines, we looked into which search engines support which privacy standards. This information doesn’t seem to exist anywhere else on the Internet, so below are our findings, starting with the big guys, and moving towards more obscure or foreign search engines.

Google, Bing

Google (whose crawler is known as Googlebot) and Bing (whose crawler is known as Bingbot) support both the x-robots-tag header and the robots HTML meta tag. Here’s Google’s page on the topic. And here’s Bing’s. The older msnbot is retired.

Yahoo, AOL

Yahoo!’s search engine is provided by Bing. AOL’s is provided by Google. These are easy ones.

Ask, Yandex, Nutch

Ask (known as teoma) and Yandex (Russia’s search engine, known as yandex) support the robots meta tag, but do not appear to support the x-robots-tag header. Ask’s page on the topic is here, and Yandex’s is here. The popular open source crawler Nutch also supports the robots HTML meta tag, but not the x-robots-tag header. Update: Newer versions of Nutch now support x-robots-tag!
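
Since some crawlers appear to honor only the meta tag, it can be useful to check which of the two signals a given page actually carries. The following is a minimal sketch using only the Python standard library. The names `RobotsMetaParser` and `noindex_signals` are hypothetical, and the header parsing is simplified: it ignores the optional user-agent prefix that the x-robots-tag header allows.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def noindex_signals(headers, html):
    """Report whether noindex is sent via the header, the meta tag, or both."""
    header_value = ""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-robots-tag":
            header_value = value
    header_directives = {d.strip().lower() for d in header_value.split(",")}

    parser = RobotsMetaParser()
    parser.feed(html)
    return {
        "header": "noindex" in header_directives,
        "meta": "noindex" in parser.directives,
    }
```

A page that returns `{"header": True, "meta": False}` would be hidden from crawlers that support the header, but possibly not from those that only read the meta tag.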

The Internet Archive, Alexa

The Internet Archive uses Alexa’s crawler, which is known as ia_archiver. This crawler does not seem …


Respecting privacy while providing hundreds of thousands of public documents

At CourtListener, we have always taken privacy very seriously. We have over 600,000 cases currently, most of which are available on Google and other search engines. But in the interest of privacy, we make two broad exceptions to what’s available on search engines:

  1. As stated in our removal policy, if someone gets in touch with us in writing and requests that we block search engines from indexing a document, we generally attempt to do so within a few hours.
  2. If we discover a privacy problem within a case, we proactively block search engines from indexing it.

Each of these exceptions presents interesting problems. In the case of requests to prevent indexing by search engines, we’re often faced with an ethical dilemma, since in many instances, the party making the request is merely displeased that their involvement in the case is easy to discover and/or they are simply embarrassed by their past. In this case, the question we have to ask ourselves is: Where is the balance between the person’s right to privacy and the public’s need to access court records, and to what extent do changes in practical obscurity compel action on our …


RECAP Documents Now More Searchable Via Internet Archive

We recently made a small change to the way that documents uploaded by RECAP users are made available on the Internet Archive. Until today, the Internet Archive had served primarily as a bulk hosting provider, without much ability to browse or search the archive. This was enforced in two ways: First, it was not possible to search for documents using the Internet Archive’s search tools. Second, external search engines were prevented from indexing the site. We decided to do this in order to be especially cautious with respect to privacy concerns that we have previously discussed.

Since we launched, we have spent a great deal of time examining these issues, and we decided to take a small, incremental step toward making the documents more findable without (yet) allowing in-depth full-text search of all documents. We have enabled Internet Archive indexing, as well as search engine indexing, for the case summary pages on the Internet Archive. That means, for example, that the relatively limited information on the AT&T v. Hepting case summary page is now searchable.

You can find this case through the Internet Archive search engine by doing a query like this: http://www.archive.org/search.php …


A Note on RECAP’s Commitment to Privacy

We’ve gotten our first official reaction from the judiciary, in the form of a statement on the New Mexico Bankruptcy Court’s website. It contains two important points about the PACER terms of use, and a misleading statement about privacy that we want to correct.

First, the good news: the court acknowledges a point we’ve made before, namely that use of RECAP is consistent with the law and the PACER terms of use. The only potential exception is if you’ve received a fee waiver for PACER. In that case, use of RECAP could violate the terms of the fee waiver, which reads: “Any transfer of data obtained as the result of a fee exemption is prohibited unless expressly authorized by the court.” We’re not lawyers, so we don’t know if the court’s interpretation is correct, but we encourage our users to honor the terms of the fee waiver.

Now, an important correction. The statement raises the concern that RECAP could compromise sealed or private documents that attorneys access via CM/ECF, the system attorneys use for electronic filing and retrieval of documents in pending cases. Protecting privacy is our top priority, and we specifically designed …


Accessing the RECAP Repository without PACER

Of all the questions we’ve received, probably the most common is whether it will be possible to access the documents in our archive without using PACER at all. The answer is yes, but at the moment we don’t offer any good browsing or searching tools.

The big reason has to do with privacy. One of our top priorities in developing RECAP was making sure we don’t inadvertently compromise the privacy of individuals who are the subject of court records. A lot of sensitive personal information is revealed in the course of federal court cases. A variety of private parties might be interested in using the information contained in these records for illicit purposes such as identity theft, stalking, and witness intimidation. We wanted to make sure we weren’t inadvertently facilitating those types of activities.

In theory, the courts have redaction rules designed to deal with these problems. Judges can order particularly sensitive documents to be sealed, and the rest of the documents are supposed to be redacted to prevent inadvertent disclosure of private information. Unfortunately, this process is far from perfect. Private information does sometimes wind up in the public version of court documents.

When court …
