New RECAP Archive Search Is Now Live
In 2016, we launched a major new search engine for the RECAP Archive. Since then, it has grown in both usage and size: it now holds over four hundred million items and receives about thirty thousand queries per day.
Today, after nearly a year of work, and several months in beta, we're launching a brand-new search engine for the RECAP Archive, switching the underlying system to be more accurate, scalable, and functional.
This new system fixes a number of longstanding issues and adds a variety of new features.
Among the improvements:
All Cases Are Now Searchable.
The old search system primarily searched filings instead of cases. But because of the PACER paywall, many cases in our system don't have any filings, and those cases simply would not show up in search results.
That was a big problem, and it is now fixed. If you search for the name of a case that lacks filings, the case name and its metadata now appear in the results.
This is an important fix that allows you to search the metadata for a case while we wait for filings to be added to the system.
You can search for exactly the words you want.
In general, people want searches to be broadly interpreted. For example, if you search for immigration, you probably want to also get results for immigrated, etc. But sometimes that can be annoying, and people have often asked if it could be turned off.
In the new search engine it can. Simply surround any word in double quotes, and we will use just that word and not any variation of it. Think: "Deposit" vs. "Deposition" or "McDonald" the last name vs. "McDonalds" the restaurant (plural).
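To make the distinction concrete, here is a minimal sketch of how a query could be split into quoted terms (matched exactly) and unquoted terms (open to variations). The `split_terms` helper and its return shape are hypothetical illustrations, not the RECAP Archive's actual query parser.

```python
import re

# Hypothetical helper: not the RECAP Archive's actual query parser.
# Matches either a double-quoted word or phrase, or a bare run of
# non-space characters.
_TERM = re.compile(r'"([^"]+)"|(\S+)')


def split_terms(query):
    """Return (term, exact) pairs; exact is True for double-quoted terms."""
    terms = []
    for quoted, bare in _TERM.findall(query):
        if quoted:
            terms.append((quoted, True))   # use just this word, no variations
        else:
            terms.append((bare, False))    # variations of the word are allowed
    return terms


print(split_terms('"McDonald" deposition'))
```

In this sketch, only the quoted "McDonald" is flagged for exact matching, while deposition is left open to broader interpretation.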
It's frustrating when searches for common acronyms and abbreviations don't bring back their longer forms. To fix this, we searched our data and the Bluebook to find the top 800 legal abbreviations and acronyms, and searches for them now match their longer forms too. Try a search for something like "IRS" to see this in action.
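As a rough illustration of the idea, the sketch below expands an abbreviation into its longer form using a lookup table. The `ABBREVIATIONS` mapping and the `expand_term` function are assumptions made for this example; the real list of about 800 entries and the way the search engine applies it are not shown here.

```python
# Hypothetical sample mapping; the real list of about 800 abbreviations
# and acronyms is not reproduced here.
ABBREVIATIONS = {
    "irs": "internal revenue service",
}


def expand_term(term):
    """Return the term plus its longer form, if one is known."""
    long_form = ABBREVIATIONS.get(term.lower())
    return [term] if long_form is None else [term, long_form]


print(expand_term("IRS"))
```

A term with no entry in the table comes back unchanged, so ordinary words are unaffected.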
Call it the Eric Goldman feature: You can now search for emojis and Unicode characters.
Try searching criminal cases with: 💣 OR 🔫 OR 💰.
For many years we've extracted the text from documents, but being able to search in particular fields is a superpower, and we've added a number of fields to make that possible. The new fields are:
- chapter: The bankruptcy chapter for a case.
- trustee_str: The name of the trustee for a bankruptcy case.
- entry_number: The docket entry number of a filing.
- jursidcitionType: The jurisdiction code for a case.
- plain_text: The text of the filing, only.
- cites: The opinion IDs cited by a filing (more on this soon).
- pacer_doc_id: The internal ID for a filing as assigned by PACER.
See the search help page for details.
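As a sketch of how a fielded query might be assembled, the example below combines two of the new fields into one query string and URL-encodes it. It assumes that fields are written as field:value and joined with AND; the search help page is the authority on the actual syntax. The `build_query` helper and the trustee name "Jane Doe" are hypothetical.

```python
from urllib.parse import urlencode


def build_query(**fields):
    """Join field:value pairs with AND, quoting values that contain spaces.

    Assumption: the field:value form and the AND operator are used here
    for illustration; check the search help page for the real syntax.
    """
    parts = []
    for name, value in fields.items():
        text = str(value)
        if " " in text:
            text = f'"{text}"'  # keep multi-word values together
        parts.append(f"{name}:{text}")
    return " AND ".join(parts)


# "Jane Doe" is a made-up trustee name used only for this example.
query = build_query(chapter=7, trustee_str="Jane Doe")
print(query)
print(urlencode({"q": query}))  # the query as a URL parameter named q
```

The parameter name `q` is likewise an assumption; the point is only that the quotes, colons, and spaces in a fielded query need encoding before they go into a URL.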
Better search, generally
A handful of little fixes are in place:
- Small words (like "to", "the", etc.) are now searchable.
- Queries with upper and lower case letters now work better. Try: McDonalds or WikiLeaks.
- Highlighting in search results is improved and more consistent.
- Docket number and other fielded searches are more robust.
- Timezone bugs are now fixed and dates are more consistent.
In short, we've attempted to address every bug report and feature request people have sent us in order to make the most powerful and accurate boolean search engine we could.
We think it's the best search engine we've ever released.
In the coming months we will be building on this foundation:
- We will bring these enhancements to our case law search engine and our APIs.
- We will build a new high-speed alert system for RECAP to send notifications when search queries have new results.
- We will make it possible to find filings based on the cases they cite (and this will work with the alerts too).
Try It Now!
This new system has been a huge undertaking, and we're thrilled to be launching it today. We hope you'll send us your feedback and thoughts!