Announcing a New Open Database of Court Information, IDs, and Parsers

William E. Palin, Esq., Michael Lissner

March 10, 2020

John Adams Courthouse — One court, many names: The "Supreme Court of Massachusetts", the "Supreme Judicial Court of Massachusetts", "The State of Mass. Supreme Court", the "Massachusetts Supreme Judicial Court", and the list goes on. Courtesy of Wikipedia

Since we launched CourtListener in 2010, one of our goals has been to build a complete, accurate, and audited collection of open case law. Today we take a major step toward that goal by announcing a new tool we have built to parse court data.

A critical step in parsing court opinions is knowing which court produced the opinion. Unfortunately, courts change their names over time and so we encounter opinions from the "Supreme Court of Massachusetts", the "Supreme Judicial Court of Massachusetts", and the "Massachusetts Supreme Judicial Court", among others. These are all names for the very same court of last resort in Massachusetts, so we created a tool that recognizes all these varied names.

We call our tool the Free Law Project Courts-DB.

Using Courts-DB, you can easily look up the name of nearly any American court with published cases going back to the 1600s. We have used this functionality to parse nearly 16 million court names, and our accuracy at parsing them stands at 99.998%. (The remaining 0.002% generally requires a human to understand.)

The Numbers

Tested against 16M court names

17,887 lines of code

718 court identifiers

361 court websites

2,100 regular expressions

Courts-DB consists of over 17,000 lines of code and has data about American courts from the 1600s to the present. Generally, if a court ever had a published case, and often even if it did not, that court is available in Courts-DB. This includes special and limited-jurisdiction courts, tribal courts, and even a couple of United States courts located in other countries (looking at you, United States Court for Berlin).

Courts-DB uses over 2,100 regular expressions to match court names, has over 300 court websites available for lookup, and provides thousands of examples, variations, typos, and other court metadata.
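To make the regular-expression approach concrete, here is a minimal Python sketch of how several name variants can resolve to one identifier. The patterns, the `COURT_PATTERNS` table, the `find_court_ids` helper, and the use of `mass` as the identifier are assumptions made for this illustration; they are not the actual patterns or API of Courts-DB.

```python
import re
from typing import List

# Hypothetical pattern table: identifier -> regexes for known name variants.
# These patterns are illustrative only; they are not taken from Courts-DB.
COURT_PATTERNS = {
    "mass": [
        r"supreme (judicial )?court of massachusetts",
        r"massachusetts supreme judicial court",
        r"state of mass\. supreme court",
    ],
}

# Compile each pattern once, ignoring case.
COMPILED = {
    court_id: [re.compile(p, re.IGNORECASE) for p in patterns]
    for court_id, patterns in COURT_PATTERNS.items()
}


def normalize(name: str) -> str:
    """Collapse runs of whitespace and trim the ends."""
    return " ".join(name.split())


def find_court_ids(name: str) -> List[str]:
    """Return every identifier with at least one pattern matching the name."""
    cleaned = normalize(name)
    return [
        court_id
        for court_id, regexes in COMPILED.items()
        if any(rx.search(cleaned) for rx in regexes)
    ]


if __name__ == "__main__":
    for variant in (
        "Supreme Court of Massachusetts",
        "Supreme Judicial Court of Massachusetts",
        "The State of Mass. Supreme Court",
        "Massachusetts Supreme Judicial Court",
    ):
        print(variant, "->", find_court_ids(variant))
```

Returning a list rather than a single identifier leaves room for names that match more than one court, which a caller could then narrow down with other metadata.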

Finally, the DB contains identifiers for all of these courts. Identifiers are an important part of building any software system, and their absence from the legal industry has been an ongoing challenge to innovation and interoperation. Many of our identifiers have already been adopted by the SALI Alliance, and we hope to soon incorporate the rest into their standards. If you are developing any sort of legal software, we hope you will consider using these identifiers.

Starting now, Courts-DB is available as open code, as a Python package, or as an extremely long JSON file.
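For readers who prefer to work with the JSON data directly, the following Python sketch shows one way such a file could be turned into a lookup table. The inline sample and its field names (`id`, `name`, `examples`) are assumptions made for this illustration and may not match the real file's schema; `build_index` and `lookup` are hypothetical helpers, not part of the Courts-DB package.

```python
import json

# A tiny stand-in for the JSON file. The field names ("id", "name",
# "examples") are assumptions for this sketch, not a documented schema.
SAMPLE_JSON = """
[
  {
    "id": "mass",
    "name": "Massachusetts Supreme Judicial Court",
    "examples": [
      "Supreme Court of Massachusetts",
      "Supreme Judicial Court of Massachusetts"
    ]
  }
]
"""


def build_index(raw_json):
    """Map every known name, lowercased, to its court identifier."""
    index = {}
    for court in json.loads(raw_json):
        for name in [court["name"], *court.get("examples", [])]:
            index[name.lower()] = court["id"]
    return index


INDEX = build_index(SAMPLE_JSON)


def lookup(name):
    """Return the identifier for an exact, case-insensitive name, or None."""
    return INDEX.get(name.strip().lower())


if __name__ == "__main__":
    print(lookup("Supreme Court of Massachusetts"))
    print(lookup("A court that is not in the sample"))
```

An exact-match index like this only finds names that appear verbatim in the data, so it complements pattern-based matching rather than replacing it.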

Courts-DB is part of larger initiatives at Free Law Project to organize and provide free and open access to every US court opinion in history. We invite users to join us and to research and test our code. In particular, we are looking for help adding court start and end dates to Courts-DB. If you're interested in lending a hand, please get in touch.

To learn more about the project, the data, and how to use it, please visit Courts-DB on GitHub.

Tagged: courts, courts-db, courtlistener, open data, history

Started in 2010, Free Law Project is the leading non-profit using technology, data, and advocacy to make the legal ecosystem more equitable and competitive. We host major open databases of opinions, federal filings, judges, financial disclosures, and oral arguments. We build open source tools like eyecite, juriscraper, and x-ray.

