Parties, Attorneys, and Firms are Now Searchable in the RECAP Archive

Today we are launching party, attorney, and firm search for the RECAP Archive of PACER documents. This unlocks powerful new ways to do your research.

For example, consider the following queries:

Click any of the above queries to see how they were made.

To use this new feature, type the name of the party or attorney into the fields on the RECAP Archive homepage or in the sidebar to the left of any search results. These boxes also accept advanced query syntax, and there are several new fields that can be queried from the main search box including party, attorney, and firm.

For example, in the main box you can search for attorney:"eric holder"~2 firm:covington. This query finds cases in which an attorney’s name has the word “Eric” within two words of “Holder” (thus allowing for his middle name) and which were handled at the firm “Covington & Burling”.
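As a rough illustration (not taken from the CourtListener codebase), a fielded query like this can be assembled and URL-encoded in a few lines of Python. The helper name build_query and the "q" parameter name are assumptions made for the sketch; only the query syntax itself comes from the example above.

```python
from urllib.parse import urlencode


def build_query(attorney, firm, slop=2):
    """Compose a fielded query: a proximity phrase on the attorney
    field plus a single term on the firm field.

    `slop` is the maximum number of words allowed between the words
    of the phrase (2 in the example above).
    """
    return f'attorney:"{attorney}"~{slop} firm:{firm}'


query = build_query("eric holder", "covington")

# URL-encode the query so it can be placed in a query string.
# The parameter name "q" is an assumption, not a documented API.
encoded = urlencode({"q": query})

print(query)
print(encoded)
```

Keeping the query construction in one small function makes it easy to vary the proximity value, for instance to tolerate a longer middle name.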

Demo of Eric Holder at Covington & Burling

A search for Eric X Holder while …

more ...

Free Law Project Re-Launches RECAP Archive, a New Search Tool for PACER Dockets and Documents

After months of development, we are thrilled to share a from-scratch re-launch of the RECAP Archive. Our new archive, available immediately at https://www.courtlistener.com/recap/, contains all of the content currently in RECAP and makes it all fully searchable for the first time. At launch, the collection contains information about more than ten million PACER documents, including the extracted text from more than seven million pages of scanned documents.

RECAP Advanced Search Screen

The new advanced search interface for the RECAP Archive.

The search capabilities of this new system empower researchers in new ways. For example:

more ...

Free Law Project Unveils API for Court Opinions

Today marks another big day for the Free Law Project. We’re happy to share that we’ve created the first-ever API for U.S. legal opinions. An API, or Application Programming Interface, is a way for computers to talk to each other and consume each other’s data in an automated fashion. From this day forth, developers, researchers, and legal startups can begin consuming the data that we have at CourtListener in a granular and very specific manner.

For example, here are some very basic things that can be done with our API (these links will only work if you are signed in to your CourtListener account):

more ...

CiteGeist Powers CourtListener’s Newly Improved Search Results

demo graph

The citation graph is made into a network to compute CiteGeist scores.

We’re excited to announce that beginning today our relevancy engine will provide significantly better results than it has in the past. Whenever you run a query, we will analyze which opinions are the most cited, and we will use that to provide the best results possible. We’re calling this the CiteGeist score because it finds the spirit of your query (“Geist”) and gives you the best possible results. This is currently enabled for our corpus starting in the 1750s up through about 1985, and the remaining years will get the CiteGeist treatment as well over the next few days.

The details of how CiteGeist works are in our code, but the basic idea is to give a high CiteGeist score to opinions that are cited many times by other important opinions, and to give a lower CiteGeist score to opinions that have not been cited or that have only been cited by unimportant opinions. Once we’ve established the CiteGeist score, we combine it with a query’s keyword-based (TF/IDF) relevancy. Together, we get a combined score which is a measure of how …
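The basic idea can be sketched with a PageRank-style iteration over a toy citation graph. This is only an illustration under stated assumptions, not the CourtListener implementation: the function names, the damping factor, and the linear blend with keyword relevancy are all hypothetical choices.

```python
def citation_scores(cites, damping=0.85, iterations=50):
    """PageRank-style scores for a citation graph.

    `cites` maps each opinion to the list of opinions it cites.
    An opinion scores highly when it is cited by other opinions
    that themselves score highly.
    """
    nodes = set(cites) | {t for targets in cites.values() for t in targets}
    n = len(nodes)
    score = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every opinion starts each round with a small baseline score.
        new = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = cites.get(node, [])
            if targets:
                # Pass this opinion's weight on to the opinions it cites.
                share = damping * score[node] / len(targets)
                for target in targets:
                    new[target] += share
            else:
                # An opinion that cites nothing spreads its weight evenly.
                share = damping * score[node] / n
                for other in nodes:
                    new[other] += share
        score = new
    return score


def combined(keyword_relevance, citation_score, weight=0.5):
    """Blend keyword relevancy with the citation score.

    A simple weighted average, assumed here for illustration only.
    """
    return (1.0 - weight) * keyword_relevance + weight * citation_score


graph = {"A": ["C"], "B": ["C"], "C": [], "D": []}
scores = citation_scores(graph)
print(sorted(scores, key=scores.get, reverse=True))
```

In this toy graph, opinion C is cited by both A and B, so it ends up with a higher score than D, which nobody cites.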

more ...

Announcing Citation Queries and other Goodies

We’re proud to announce a big new feature today that we’ve been planning for a long time. Starting today, you can make citation queries against the CourtListener corpus. If you look at the bottom of the left-hand column, you’ll see a new slider:

sliders

Sliding the handles around, you can easily filter out any documents that are too popular or not popular enough — or both. In addition to this, we’ve added citation counts to our results list, and citation count ordering to our results. For example, you can now order the results by most cited or least cited, depending on the kind of work you’re doing.

In addition, we’re announcing two new fields that you can query: Judges and Nature of Suit. Both of these fields are currently very limited in our corpus, but as we add more documents, we want to expose these to our users. To query by judge name, you can either type the name directly into the judge text box on the left, or you can use the “judge” operator in a query like [ judge:smith ]. For the Nature of Suit, the data is both incomplete …

more ...

A few updates at CourtListener

It’s been quiet around here for a little while, so it’s about time I share what’s been going on behind the scenes. As you might imagine, just because we haven’t had a lot of news doesn’t mean that we haven’t been busy.

The biggest thing I have to share today is that we’ve moved our CourtListener infrastructure to new and bigger hardware. This task has taken months to complete and involved applying many updates to the code and infrastructure. For developers, this upgrade comes with a few changes:

  1. Our default database for CourtListener is now Postgres rather than MySQL. This is something that’s been planned for a while, but wasn’t really possible until a big upgrade like this one. The big changes that come out of this are non-locking queries for our database dumps, and better performance for many of our queries. Since Postgres is a transactional, stricter and more featureful database, we’re convinced that it is a better way forward than MySQL. Oracle lately hasn’t been a great steward of MySQL, so it was a good time to jump ship. As a bonus, Postgres was started in Berkeley …
more ...

Building a Citator on CourtListener

I’m incredibly excited today to announce that over the past few weeks we have successfully rolled out a Citator on CourtListener. This feature was developed by UC Berkeley School of Information students Karen Rustad and Rowyn McDonald after a thorough design and development cycle which included everything from user interviews to performance optimizations of our citation finding algorithm.

As you’re browsing the site, you’ll immediately see three big new features. First, all Federal citations to documents that we have in our collection are now links. So as you’re reading, if there’s a reference to a prior case that you feel might be useful to your research, you can just click the link to that case and continue your research there. This allows you to go upstream in your research, looking at the important cases that came before.

The second big change you’ll see is a new sidebar on all case pages that lists the top five cases that reference the one you’re reading. This allows you to go downstream from the case you’re reading, where you’ll be able to identify how the case was later interpreted by other courts.

At the …

more ...

Support for x-robots-tag and robots HTML meta tag

As part of our research for our post on how we block search engines, we looked into which search engines support which privacy standards. This information doesn’t seem to exist anywhere else on the Internet, so below are our findings, starting with the big guys, and moving towards more obscure or foreign search engines.

Google, Bing

Google (known as Googlebot) and Bing (known as Bingbot) support the x-robots-tag header and the robots HTML tag. Here’s Google’s page on the topic. And here’s Bing’s. The msnbot is retired.
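For readers unfamiliar with the two mechanisms, here is a minimal sketch of what each one looks like when a page is served. The helper names are hypothetical, and noindex is used only as an example directive; the header name and the meta tag form are the standard ones.

```python
def noindex_header():
    """HTTP response header form: works for any file type,
    including PDFs and other non-HTML content."""
    return {"X-Robots-Tag": "noindex"}


def noindex_meta_tag():
    """HTML form: only works when the response is an HTML page."""
    return '<meta name="robots" content="noindex">'


def render_page(body):
    """Return (headers, html) for a page that asks crawlers,
    by both mechanisms, not to index it."""
    html = (
        "<html><head>"
        f"{noindex_meta_tag()}"
        f"</head><body>{body}</body></html>"
    )
    return noindex_header(), html


headers, html = render_page("Example opinion text")
print(headers)
print(html)
```

Sending both is a belt-and-braces approach: crawlers that ignore the header may still honor the meta tag.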

Yahoo, AOL

Yahoo!’s search engine is provided by Bing. AOL’s is provided by Google. These are easy ones.

Ask, Yandex, Nutch

Ask (known as teoma) and Yandex (Russia’s search engine, known as yandex) support the robots meta tag, but do not appear to support the x-robots-tag header. Ask’s page on the topic is here, and Yandex’s is here. The popular open source crawler, Nutch, also supports the robots HTML tag, but not the x-robots-tag header. Update: Newer versions of Nutch now support x-robots-tag!

The Internet Archive, Alexa

The Internet Archive uses Alexa’s crawler, which is known as ia_archiver. This crawler does not seem …

more ...

Our Biggest Change Ever is Live!

After three months of hard development, I’m pleased to announce that the new version of CourtListener is going live at this very moment. In this version, we’ve completely rewritten vast swaths of the underlying code, and we’ve switched to a far more powerful architecture.

The new site comes with some significant improvements:

  • You can now search by casename, date, court, precedential status or citation
  • Results can be ordered by date or by relevance
  • New Boolean operators are supported, and our syntax is much more intuitive (see here for many more details)
  • If you want, you can now dig very deeply into the results. Previously, we had a cap at 1,000 results for a query. Not any more.
  • Court documents will now show up in our search results within milliseconds of being found on the court’s website. In the future, if there’s demand, we may use this to offer real-time alerts.
  • We now have snippets and highlighting on our results page.
  • Finally, some polish everywhere to make things prettier.
  • Huge performance improvements.
  • Better support for mobile devices and tablets.
  • Better support for people with disabilities and for users who prefer not to use JavaScript.

And that just …

more ...

Announcements, Updates and the Current Roadmap

Just a quick note today to share some exciting news and updates about CourtListener.

First, I am elated to announce that the CourtListener project is now supported in part by a grant from Public.Resource.Org. With this support, we are now able to develop much more ambitious improvements to the site that would otherwise not be possible. Over the next few months, the site should be changing greatly thanks to this support, and I’d like to take a moment to share both what we’ve already been able to do, and the coming changes we have planned.

One feature that we added earlier this week is a single location where you can download the entire CourtListener corpus. With a single click, you can download 2.2GB of court cases in XML format. Check out the information on the dump page for more details about when the dump is generated, and how you can get it: http://courtlistener.com/dump-info/

The second exciting feature that we’ve been working on is a platform change that enables CourtListener to support a much larger corpus. In the past, we’ve had difficulty with jobs being performed synchronously with the court scrapers …

more ...

RECAP Documents Now More Searchable Via Internet Archive

We recently made a small change to the way that documents uploaded by RECAP users are made available on the Internet Archive. Until today, the Internet Archive had served primarily as a bulk hosting provider, without much ability to browse or search the archive. This was enforced in two ways: First, it was not possible to search for documents using the Internet Archive’s search tools. Second, external search engines were prevented from indexing the site. We decided to do this in order to be especially cautious with respect to privacy concerns that we have previously discussed.

Since we launched, we have spent a great deal of time examining these issues, and we decided to make a small incremental step in making the documents more findable without (yet) allowing in-depth full-text search of all documents. We have enabled Internet Archive indexing, as well as search engine indexing, for the case summary pages on the Internet Archive. That means, for example, that the relatively limited information on the AT&T v. Hepting case summary page is now searchable.

You can find this case through the Internet Archive search engine by doing a query like this: http://www.archive.org/search.php …

more ...