For the past few months, we have been blogging about our research into handling scanned documents at CourtListener, since a number of courts have a habit of releasing their opinions this way. Previously, when a court did this, we couldn’t get the text out of the document, and as a result, it was impossible for anybody to find these cases on the site.
Obviously, this was a bad situation for our users, so we are excited to announce that as of today we have a new Optical Character Recognition (OCR) system for extracting the text from scanned documents. We’re currently extracting the text from an additional 10,000 opinions that were previously unsearchable, and going forward we’ll do this automatically as we get cases from the courts.
This change further expands the breadth of our coverage, and we hope you find it useful!