We’ve Added Thousands More Citations to Historical Supreme Court Opinions

We have a small update to share today: we’ve finished adding thousands of historical Supreme Court citations to our collection. These are the original citations for the Supreme Court from 1754 to 1874, before the United States Reports began. We already had many of these citations, but as of today we can say we have historical citations for our entire SCOTUS collection.

For the unfamiliar, Supreme Court citations were originally named after the Reporter of Decisions who was in office when the opinion was published. For example, the first person to do this was Alexander Dallas, whose citations start at 1 Dall. 1 (1754) and go forward to 4 Dall. 446 (1806). After Dallas came a long line of other reporters, each of whom named his series of books after himself until 1875, when Congress began appropriating money for the full-time creation of these reports and demanded they be called the “United States Reports.”
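Because those nominative volumes were later folded into the United States Reports as volumes 1 through 90, keeping their original page numbers, converting an old citation is mostly a matter of offsetting the volume. The Python sketch below hard-codes that offset table for the Supreme Court reporters through Wallace; the table layout and the function name `to_us_reports` are illustrative assumptions, not part of CourtListener.

```python
# First U.S. Reports volume and number of volumes for each nominative
# Supreme Court reporter. Page numbers carry over unchanged.
NOMINATIVE_REPORTERS = {
    "Dall.": (1, 4),     # Alexander Dallas
    "Cranch": (5, 9),    # William Cranch
    "Wheat.": (14, 12),  # Henry Wheaton
    "Pet.": (26, 16),    # Richard Peters
    "How.": (42, 24),    # Benjamin Howard
    "Black": (66, 2),    # Jeremiah Black
    "Wall.": (68, 23),   # John Wallace
}


def to_us_reports(volume: int, reporter: str, page: int) -> str:
    """Convert a nominative citation such as 4 Dall. 446 to its U.S. Reports form."""
    if reporter not in NOMINATIVE_REPORTERS:
        raise ValueError(f"not a nominative Supreme Court reporter: {reporter!r}")
    first_us_volume, volume_count = NOMINATIVE_REPORTERS[reporter]
    if not 1 <= volume <= volume_count:
        raise ValueError(f"{reporter} has only {volume_count} volumes")
    return f"{first_us_volume + volume - 1} U.S. {page}"


print(to_us_reports(4, "Dall.", 446))   # 4 U.S. 446
print(to_us_reports(1, "Cranch", 137))  # 5 U.S. 137
```

The last Wallace volume maps to 90 U.S., which is why 91 U.S. is where the United States Reports numbering takes over.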

A snapshot of 18 Stat. 204 (1874), which allocated $25,000 to the Supreme Court for printing (about $557,100 today).

At that time, 91 U.S. 1 was the first case to be born with …

Some Citation Parsing Statistics

We want to share some quick statistics today. We just finished running our citation parser across the entire CourtListener collection. If you follow our work, you’ll know that the citation parser goes through every opinion in CourtListener and identifies every citation from one opinion to another (such as “410 U.S. 113”). Once it identifies a citation, the parser looks it up and attempts to make a hyperlink between the opinions, so that if you see a citation while reading, you can click it to go to the correct place.
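A minimal sketch of the first step, finding candidate citations in an opinion’s text, is below. It assumes a tiny hard-coded list of reporter abbreviations and a simple VOLUME REPORTER PAGE regular expression; CourtListener’s real parser handles far more reporters and edge cases, so treat this only as an illustration of the idea.

```python
import re

# Hypothetical, deliberately tiny list of reporter abbreviations.
REPORTERS = ["U.S.", "S. Ct.", "L. Ed. 2d", "F.2d", "F.3d"]

# Try longer abbreviations first so "L. Ed. 2d" is not cut short.
_reporter_pattern = "|".join(
    re.escape(r) for r in sorted(REPORTERS, key=len, reverse=True)
)
# VOLUME REPORTER PAGE, e.g. "410 U.S. 113".
CITATION_RE = re.compile(rf"\b(\d{{1,4}})\s+({_reporter_pattern})\s+(\d{{1,5}})\b")


def find_citations(text: str) -> list[tuple[int, str, int]]:
    """Return (volume, reporter, page) tuples in the order they appear."""
    return [
        (int(volume), reporter, int(page))
        for volume, reporter, page in CITATION_RE.findall(text)
    ]


print(find_citations("See Roe v. Wade, 410 U.S. 113 (1973)."))
```

Each tuple would then be looked up against the collection to decide whether a link can be made.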

As you can imagine, looking up every citation in every opinion in CourtListener takes some time, so we only run our citation parser when we need to. For this run:

  • The process ran continuously for two weeks.
  • It ran a total of 253,872,460 queries against our search engine.
  • It found 25,471,410 citations between opinions.
  • There are about three million opinions currently in CourtListener.
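Dividing those figures through gives a rough sense of the load. This sketch treats “two weeks” as exactly 14 days and “about three million” as exactly 3,000,000, so every result is an approximation.

```python
SECONDS = 14 * 24 * 60 * 60  # "two weeks", assumed to be exactly 14 days
QUERIES = 253_872_460
CITATIONS = 25_471_410
OPINIONS = 3_000_000         # "about three million", rounded

queries_per_second = QUERIES / SECONDS        # roughly 210
queries_per_opinion = QUERIES / OPINIONS      # roughly 84.6
citations_per_opinion = CITATIONS / OPINIONS  # roughly 8.5
queries_per_citation = QUERIES / CITATIONS    # roughly 10

print(f"{queries_per_second:.0f} queries per second")
print(f"{queries_per_opinion:.1f} queries per opinion")
print(f"{citations_per_opinion:.1f} citations per opinion")
print(f"{queries_per_citation:.1f} queries per citation found")
```

On these rounded assumptions, the parser averaged about ten search queries for every citation it found.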

After running the parser, the first thing I like to do is look at the search results ordered by citation count. In an upset, Strickland v. Washington, the former leader, has been pushed to third …

Our New Citation Finder

CourtListener now has a new citation finder that you can use with any citation in our system. It’s dead simple. There are two ways to use it.

Either simply type in the citation you want to look up:

Or, just make a link with a format like:

  • https://www.courtlistener.com/c/REPORTER/VOLUME/PAGE/

And you’ll get to the page for that citation. For example, thanks to parallel citations, any of these links will take you to Citizens United v. Federal Election Commission:

This new tool relies on our existing citation extractor, which extracts thousands of citations from opinions every day. As a result, these links are also able to handle alternate names for any reporter that we have encoded in our Reporters Database. For example, the United States Reports has historically also been abbreviated as “U.S.S.C.Rep.” or “USSCR”. Use either of these, and you’ll find that they also work without a hitch:
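Building these links programmatically takes only a few lines. The sketch below fills in the /c/REPORTER/VOLUME/PAGE/ pattern and first maps known variants to a canonical abbreviation. The two-entry `REPORTER_ALIASES` table is a hypothetical stand-in for the Reporters Database. Normalizing is optional, since the site accepts the variants directly, but it keeps generated links consistent.

```python
from urllib.parse import quote

# Hypothetical excerpt: reporter variation -> canonical abbreviation.
REPORTER_ALIASES = {
    "U.S.S.C.Rep.": "U.S.",
    "USSCR": "U.S.",
}


def citation_url(reporter: str, volume: int, page: int,
                 base: str = "https://www.courtlistener.com") -> str:
    """Build a /c/REPORTER/VOLUME/PAGE/ link, normalizing known aliases."""
    canonical = REPORTER_ALIASES.get(reporter, reporter)
    # Percent-encode the reporter so spaces (as in "S. Ct.") are URL-safe.
    return f"{base}/c/{quote(canonical)}/{volume}/{page}/"


print(citation_url("USSCR", 410, 113))
```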

Reporters Database

United States Reporters

A long time ago in a courthouse not too far away, people started making books of every important decision made by the courts. These books became known as reporters and were generally created by librarian-types of yore such as Mr. William Cranch and Alex Dallas.

These men (for they were all men) were busy for the next few centuries and created thousands of these books, culminating in what we know today as West’s reporters, or in regional reporters like the “Dakota Reports” or the thoroughly named “Synopses of the Decisions of the Supreme Court of Texas Arising from Restraints by Conscript and Other Military Authorities (Robards).”

Motivated by our need to identify citations to these reporters, we’ve taken a stab at aggregating a few facts about them, such as variations in their names, their abbreviations, and the years they were published, and put all that information into our reporters database. Until recently, this database lived deep inside CourtListener and was discovered only by intrepid hackers rooting around, but a few months ago we pulled it out, put it in its own repository, and converted it to better formats so anyone could more easily reuse it.
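To give a feel for the kind of record involved, here is a hypothetical, simplified shape for one entry, with a helper that lists the reporters covering a given year. The field names are illustrative assumptions, not the repository’s actual schema. The years follow the dates given for these two reporters earlier on this page.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Reporter:
    """Simplified, hypothetical record for one reporter series."""
    abbreviation: str
    name: str
    start_year: int
    end_year: Optional[int] = None  # None means still being published
    variations: list[str] = field(default_factory=list)

    def covers(self, year: int) -> bool:
        return self.start_year <= year and (
            self.end_year is None or year <= self.end_year
        )


REPORTERS = [
    Reporter("Dall.", "Dallas", 1754, 1806),
    Reporter("U.S.", "United States Reports", 1875,
             variations=["U.S.S.C.Rep.", "USSCR"]),
]


def reporters_for_year(year: int) -> list[str]:
    """Abbreviations of every reporter whose coverage includes `year`."""
    return [r.abbreviation for r in REPORTERS if r.covers(year)]


print(reporters_for_year(1800))
```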

Currently, it’s ready to use …

CourtListener is Now Integrated with the Supreme Court Database

Earlier this week somebody on the Internet pinged us with some code and asked that we integrate the data from the Supreme Court Database (SCDB). Well, we’re happy to share that less than a week later we’ve taken the code they provided and used it to upgrade CourtListener’s database.

The Supreme Court Database includes data for about 8,500 Supreme Court opinions from 1946 to 2013, and this first pass merges that data with CourtListener so that:

  • Our copies of these opinions are enhanced with better parallel citations. You can now look these items up by U.S. Reporter (U.S.), The Supreme Court Reporter (S.Ct.), Lawyers’ Edition (L.Ed.) or even LEXIS citation (U.S. LEXIS). This should make our citation graph much more robust and should help people like Colin Starger at the University of Baltimore who are doing great analyses with this data. Many of these items were screen scraped directly from the Supreme Court website, meaning that for these items, this is the first time they have had proper citations. Here’s an example of the many parallel citations items now have:

Roe v. Wade

  • All Supreme Court Opinions from 1946 to 2013 have a new …
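The parallel-citation merge in the first bullet can be pictured as filling in missing citations from an SCDB row, matched on the shared U.S. Reports citation. This sketch assumes SCDB’s citation columns (`usCite`, `sctCite`, `ledCite`, `lexisCite`) and represents a CourtListener opinion as a plain dict. It is not the code that was contributed.

```python
# SCDB citation column -> reporter label used in the post.
SCDB_CITATION_FIELDS = {
    "usCite": "U.S.",
    "sctCite": "S.Ct.",
    "ledCite": "L.Ed.",
    "lexisCite": "U.S. LEXIS",
}


def index_by_us_cite(scdb_rows: list[dict]) -> dict:
    """Index SCDB rows by U.S. citation so each opinion is matched in O(1)."""
    return {row["usCite"]: row for row in scdb_rows if row.get("usCite")}


def merge_parallel_citations(opinion: dict, scdb_row: dict) -> dict:
    """Return a copy of `opinion` with any citations SCDB has that it lacks."""
    merged = dict(opinion)
    citations = dict(opinion.get("citations", {}))
    for column, reporter in SCDB_CITATION_FIELDS.items():
        value = (scdb_row.get(column) or "").strip()
        if value and reporter not in citations:
            citations[reporter] = value  # fill gaps only; never overwrite
    merged["citations"] = citations
    return merged
```

Filling gaps without overwriting keeps citations that CourtListener already had authoritative, while still adding whatever SCDB knows.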

Our New Authorities Table Allows Traveling to the Past

For a long time we’ve had a feature that shows the items that cite an opinion, letting you look into the future and see which cases found it important down the line. As of today, we’re announcing the complementary feature that allows easy travel into the past. Starting immediately, when you look at almost any case in our collection you’ll see an Authorities section in its sidebar.

For example, Roe v. Wade looks like this:


This section shows the top five opinions cited by the one you are looking at. If you wish to see all of the opinions it cited, there is a link at the bottom that takes you to the new Table of Authorities page, which shows everything:


Now, when you’re looking at an opinion, you can easily travel through time to either opinions that came later or ones that came before. Doc Brown would be proud.
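Both directions come from the same citation graph. An opinion’s authorities are the edges leaving it, and the opinions citing it are the edges arriving at it. Here is a minimal sketch, assuming the graph is a list of (citing, cited) pairs and that “top five” means the authorities with the most citations overall. The post doesn’t say how the sidebar actually ranks them.

```python
from collections import defaultdict


def build_indexes(edges):
    """Index (citing_id, cited_id) pairs in both directions."""
    authorities = defaultdict(set)  # opinion -> opinions it cites (the past)
    cited_by = defaultdict(set)     # opinion -> opinions citing it (the future)
    for citing, cited in edges:
        authorities[citing].add(cited)
        cited_by[cited].add(citing)
    return authorities, cited_by


def top_authorities(opinion_id, authorities, cited_by, limit=5):
    """Up to `limit` authorities, most-cited first, ties broken by id."""
    ranked = sorted(authorities[opinion_id], key=lambda a: (-len(cited_by[a]), a))
    return ranked[:limit]


edges = [("A", "B"), ("A", "C"), ("D", "A"), ("D", "B"), ("E", "B")]
authorities, cited_by = build_indexes(edges)
print(top_authorities("A", authorities, cited_by))  # ['B', 'C']
print(sorted(cited_by["A"]))                        # ['D']
```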

Posted by: Michael Lissner

Free Law Project Unveils API for Court Opinions


Today marks another big day for the Free Law Project. We’re happy to share that we’ve created the first ever API for U.S. legal opinions. An API, or Application Programming Interface, is a way for computers to talk to each other and consume each other’s data in an automated fashion. From this day forth, developers, researchers and legal startups can begin consuming the data that we have at CourtListener in a granular and very specific manner.

For example, here are some very basic things that can be done with our API (these links will only work if you are signed in to your CourtListener account):
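For developers, a call boils down to an authenticated HTTP GET. The sketch below only builds the request and does not send it. The endpoint path, the `v4` version string, the `q` parameter, and the token-style `Authorization` header are all assumptions for illustration (the post itself relies on being signed in), so check the current API documentation before relying on them.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_ROOT = "https://www.courtlistener.com/api/rest/"  # assumed base path


def build_search_request(query: str, token: str, version: str = "v4") -> Request:
    """Prepare, but do not send, a search request against the assumed endpoint."""
    url = f"{API_ROOT}{version}/search/?{urlencode({'q': query})}"
    return Request(url, headers={"Authorization": f"Token {token}"})


request = build_search_request("roe v. wade", "YOUR-TOKEN")
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would actually send it. That step is left out here so the example runs without a network connection or credentials.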

CiteGeist Powers CourtListener’s Newly Improved Search Results

The citation graph is made into a network to compute CiteGeist scores.

We’re excited to announce that, beginning today, our relevancy engine will provide significantly better results than it has in the past. Whenever you place a query, we will analyze which opinions are the most cited and use that to provide the best results possible. We’re calling this the CiteGeist score because it finds the spirit (“Geist”) of your query and gives you the best possible results. This is currently enabled for our corpus from the 1750s up through about 1985, and the remaining years will get the CiteGeist treatment over the next few days.

The details of how CiteGeist works are in our code, but the basic idea is to give a high CiteGeist score to opinions that are cited many times by other important opinions, and a lower CiteGeist score to opinions that have not been cited or that have only been cited by unimportant opinions. Once we’ve established the CiteGeist score, we combine it with a query’s keyword-based (TF/IDF) relevancy. Together, we get a combined score which is a measure of how …
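Scoring an opinion highly when it is cited by other highly scored opinions is the idea behind PageRank. Below is a small PageRank-style power iteration over a citation graph. It sketches the general technique, not CiteGeist’s actual code, and the damping factor of 0.85 and the fixed 50 iterations are conventional assumptions.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over (citing, cited) pairs.

    Returns a dict mapping each opinion id to a score; the scores sum to 1.
    """
    nodes = {n for edge in edges for n in edge}
    if not nodes:
        return {}
    cites = {n: [] for n in nodes}
    for citing, cited in edges:
        cites[citing].append(cited)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Every opinion keeps a small baseline score...
        new_rank = {n: (1.0 - damping) / count for n in nodes}
        for n in nodes:
            targets = cites[n]
            if targets:
                # ...and passes the rest of its score to the opinions it cites.
                share = damping * rank[n] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Opinions that cite nothing spread their score evenly.
                share = damping * rank[n] / count
                for t in nodes:
                    new_rank[t] += share
        rank = new_rank
    return rank


scores = pagerank([("A", "C"), ("B", "C"), ("D", "C"), ("C", "E"), ("D", "F")])
print(max(scores, key=scores.get))
```

How CiteGeist then blends a score like this with TF/IDF relevancy is defined in CourtListener’s code and isn’t reproduced here.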

Want to Merge Millions of Legal Opinions? It Won’t Be Easy.

Note: This is the third in the series of posts explaining the work that we did to release the data donation from Lawbox LLC. This is a very technical post exploring and documenting the process we use for extracting metadata and merging it with our current collection. If you’re not technically inclined (or at least curious), you may want to scoot along.

Working with legal data is hard. We all know that, but this post documents the many reasons why that’s the case and then delves deeply into the ways we dealt with the problems we encountered while importing the Lawbox donation. The data we received from Lawbox contains about 1.6M HTML files, and we’ve spent the past several months working with them to extract good metadata and then merge it with our current corpus. This post is a long and technical one, and below I’ve broken it into two sections explaining this process: Extraction and Merging.


Extraction is a difficult process when working with legal data because it’s inevitably quite dirty: terms aren’t used consistently, there are no reliable identifiers, formats vary across jurisdictions, and the data was …
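On the merging side, one common approach is to treat two records as the same opinion when they share a citation, or when court and date match and the case names are nearly identical. The sketch below assumes that rule, a 0.8 similarity threshold, and Python’s `difflib` for fuzzy name matching. It illustrates the problem rather than documenting the process used for the Lawbox import.

```python
import re
from dataclasses import dataclass
from datetime import date
from difflib import SequenceMatcher


@dataclass(frozen=True)
class OpinionRecord:
    case_name: str
    court: str
    date_filed: date
    citations: frozenset = frozenset()


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def probably_same(a: OpinionRecord, b: OpinionRecord, threshold: float = 0.8) -> bool:
    """Heuristic duplicate check for two opinion records."""
    if a.citations & b.citations:  # a shared citation is strong evidence
        return True
    if a.court != b.court or a.date_filed != b.date_filed:
        return False
    ratio = SequenceMatcher(None, normalize_name(a.case_name),
                            normalize_name(b.case_name)).ratio()
    return ratio >= threshold
```

Requiring court and date to match before comparing names keeps the fuzzy step from pairing unrelated cases that happen to share common party names.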

Free Law Project Adds More than 1.5M Opinions to its Collection Thanks to Data Donation

For Immediate Release: Berkeley, CA

After many years of collecting and curating data, today CourtListener crossed some incredible boundaries. Thanks to a generous data donation from Lawbox LLC, our computers are currently adding more than 1.5M new opinions to CourtListener, expanding our coverage to more than 350 jurisdictions. This new data gives legal professionals and researchers insight into opinions that have never before been available in bulk, and it greatly enhances the data we previously had. The new data will roll out gradually on our front end and will soon be available from our bulk downloads page. We’ve also developed a new version of our coverage page, and, as always, you can see our current coverage for any jurisdiction we support.

It’s difficult to overstate the importance of this new data. In addition to being a massive expansion of our coverage, it also brings some notable improvements to the project:

  1. For all of the new data and much of our old data, we have added star pagination throughout. For the first time, this will make pinpoint citations possible using the CourtListener platform.
  2. We’ve re-organized our database for more accurate citations enabling for the first …
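Star pagination records where each page of the printed reporter begins, and that is what makes a pinpoint such as “410 U.S. 113, 114” resolvable. Here is a minimal sketch, assuming page breaks are marked inline as `*NNN`. That is a common convention but not necessarily how CourtListener stores them.

```python
import re

# Assumed inline marker: an asterisk immediately followed by a page number.
STAR_PAGE_RE = re.compile(r"\*(\d+)")


def page_at(text: str, offset: int, first_page: int) -> int:
    """Return the reporter page on which character `offset` of `text` falls."""
    page = first_page
    for marker in STAR_PAGE_RE.finditer(text):
        if marker.start() > offset:
            break
        page = int(marker.group(1))
    return page


text = "The opinion begins here. *114 The key holding appears here."
print(page_at(text, text.index("key"), first_page=113))  # 114
```

In real data, asterisks that mark footnotes can look just like page markers, so a production version needs a way to tell the two apart.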

Announcing Citation Queries and other Goodies

We’re proud to announce a big new feature today that we’ve been planning for a long time. Starting today, you can make citation queries against the CourtListener corpus. If you look at the bottom of the left-hand column, you’ll see a new slider:


Sliding the handles around, you can easily filter out any documents that are too popular, not popular enough, or both. We’ve also added citation counts to our results list and citation count ordering to our results. For example, you can now order the results by most cited or least cited, depending on the kind of work you’re doing.
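The slider and the new orderings amount to a range filter plus a sort on each result’s citation count. Here is a sketch over an in-memory list of results. The `cite_count` field name is hypothetical, and the real filtering happens inside the search engine rather than in Python.

```python
from typing import Optional


def filter_by_citations(results, min_cites: int = 0,
                        max_cites: Optional[int] = None,
                        most_cited_first: bool = True):
    """Keep results whose cite_count is in [min_cites, max_cites], then sort."""
    kept = [
        r for r in results
        if r["cite_count"] >= min_cites
        and (max_cites is None or r["cite_count"] <= max_cites)
    ]
    return sorted(kept, key=lambda r: r["cite_count"], reverse=most_cited_first)


results = [
    {"name": "a", "cite_count": 3},
    {"name": "b", "cite_count": 50},
    {"name": "c", "cite_count": 12},
]
print([r["name"] for r in filter_by_citations(results, min_cites=5)])  # ['b', 'c']
```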

In addition, we’re also announcing two new fields that you can query: Judges and Nature of Suit. Both of these fields are currently very limited in our corpus, but as we add more documents, we want to expose these to our users. To query by judge name, you can either type the name directly into the judge text box on the left, or you can place a query using the “judge” operator and a query like [ judge:smith ]. For the Nature of Suit, the data is both incomplete …

Building a Citator on CourtListener

I’m incredibly excited today to announce that over the past few weeks we have successfully rolled out a Citator on CourtListener. This feature was developed by UC Berkeley School of Information students Karen Rustad and Rowyn McDonald after a thorough design and development cycle which included everything from user interviews to performance optimizations of our citation finding algorithm.

As you’re browsing the site, you’ll immediately see three big new features. First, all Federal citations to documents that we have in our collection are now links. So as you’re reading, if there’s a reference to a prior case that you feel might be useful to your research, you can just click the link to that case and continue your research there. This allows you to go upstream in your research, looking at the important cases that came before.
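Turning known citations into links can be sketched as one substitution pass over the opinion text. The text is escaped, each citation is found, and it is wrapped in an anchor only if it resolves to a document in the collection. The `known` mapping, its URL path, and the U.S.-Reports-only pattern are hypothetical simplifications.

```python
import html
import re

# Simplified pattern for U.S. Reports citations only, e.g. "410 U.S. 113".
US_CITE_RE = re.compile(r"\b(\d{1,4}) U\.S\. (\d{1,5})\b")


def linkify(text: str, known: dict[str, str]) -> str:
    """Wrap citations found in `known` (citation -> URL) in <a> tags."""
    escaped = html.escape(text)

    def replace(match: re.Match) -> str:
        cite = match.group(0)
        url = known.get(cite)
        if url is None:
            return cite  # not in the collection: leave it as plain text
        return f'<a href="{html.escape(url)}">{cite}</a>'

    return US_CITE_RE.sub(replace, escaped)


known = {"410 U.S. 113": "/opinion/1/roe-v-wade/"}  # hypothetical URL
print(linkify("See 410 U.S. 113 & 999 U.S. 1.", known))
```

Escaping before substituting means any markup in the opinion text is shown literally, and only the inserted anchors are real HTML.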

The second big change you’ll see is a new sidebar on all case pages that lists the top five cases that reference the one you’re reading. This allows you to go downstream from the case you’re reading, where you’ll be able to identify how the case was later interpreted by other courts.

At the …
