Announcing OCR Support on CourtListener

Michael Lissner

March 3, 2012

For the past few months, we have been blogging about our research into handling scanned documents at CourtListener, since a number of courts have a habit of releasing their opinions as scans. Previously, when a court did this, we couldn't extract the text from the document, so nobody could find those cases by searching the site.

Obviously, this is a bad situation for our users, so we are excited to announce that as of today we have a new Optical Character Recognition (OCR) system for extracting the text from scanned documents. We're currently extracting the text from an additional 10,000 opinions that were previously unsearchable, and going forward we'll do this automatically as we receive cases from the courts.

This change further expands the breadth of our coverage, and we hope you find it to be a useful change!

Tagged: Announcements, CourtListener, OCR

