Semantic Search API Now Live!
Announcing the launch of our Semantic Search API in CourtListener—a major leap forward in how you search legal data.
We've been working diligently to bring semantic search to our case law database for quite some time. Back in March, we shared a status update with extensive details on the machine learning techniques and models powering this feature. Since then, we've completed many rounds of testing and generated embeddings for all our case law data. The functionality is now live through our API, complementing the keyword search you already know and love!
Key Features
Unlike keyword search, semantic search lets you search using natural language. It looks beyond exact matches to understand the meaning and intent behind your query. This means it can surface relevant precedents even when they're phrased differently—something keyword searches often miss.
How It Works
An encoder machine learning model, fine-tuned for information retrieval, represents legal opinions as embeddings (high-dimensional vectors of numbers) that capture their semantic meaning. When you submit a query, the same model encodes it in the same way, and an approximate nearest neighbor algorithm identifies the opinions whose embeddings are closest to the query's, meaning the opinions most similar in meaning.
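To make the idea concrete, here is a minimal sketch of nearest neighbor retrieval over embeddings. It is not our production code: the three-dimensional vectors and the opinion names are made up for illustration, and it uses an exact brute-force comparison by cosine similarity, whereas a real system uses an approximate algorithm over much larger vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_opinions(query_vec, opinion_vecs, k=2):
    """Return the names of the k opinions most similar to the query."""
    scored = [
        (cosine_similarity(query_vec, vec), name)
        for name, vec in opinion_vecs.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical toy embeddings; real ones have hundreds of dimensions.
OPINIONS = {
    "opinion_a": [0.9, 0.1, 0.0],
    "opinion_b": [0.1, 0.9, 0.1],
    "opinion_c": [0.8, 0.2, 0.1],
}
QUERY = [1.0, 0.0, 0.0]

top = nearest_opinions(QUERY, OPINIONS, k=2)
print(top)
```

Brute force compares the query against every stored vector, which is why large collections rely on approximate nearest neighbor indexes that trade a little accuracy for speed.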
For a detailed comparison of keyword versus semantic search, check out our help page on Citegeist.
Try It Now
Semantic search can be frustrating when it ignores words that must be in the results. To address this, we've implemented hybrid search. This feature lets you combine keyword phrases with natural language, so you can query by both keywords and semantic meaning simultaneously. To ensure keywords are in the results, enclose your keywords in quotation marks.
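As a rough illustration of what the quotation marks do, the sketch below pulls the quoted phrases out of a query and keeps only candidate texts that contain all of them. It is a simplified model of the behavior described above, not a description of how the API implements it; the function names and sample texts are hypothetical.

```python
import re


def required_phrases(query):
    """Extract the phrases enclosed in double quotation marks."""
    return re.findall(r'"([^"]+)"', query)


def keep_matching(candidates, query):
    """Keep only candidates containing every quoted phrase (case-insensitive)."""
    phrases = [p.lower() for p in required_phrases(query)]
    return [
        text for text in candidates
        if all(p in text.lower() for p in phrases)
    ]


QUERY = 'landlord responsibility for "lead paint" exposure'

# Both texts are about the same topic, but only the first uses the phrase.
CANDIDATES = [
    "The landlord failed to remove lead paint from the unit.",
    "The owner ignored peeling, toxic wall coatings in the apartment.",
]

matches = keep_matching(CANDIDATES, QUERY)
print(matches)
```

A purely semantic search could return either text, since both describe similar facts; the quoted phrase narrows the results to the one that actually mentions "lead paint".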
All your favorite filters—dates, courts, and other parameters—work seamlessly with semantic search, just like with keyword search. And like keyword search, results are intelligently boosted by our Citegeist relevancy algorithm.
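For example, a request that combines a hybrid query with filters might be assembled like this. Treat it as a sketch under assumptions: the endpoint path and the parameter names (`q`, `semantic`, `court`, `filed_after`) are illustrative guesses, so check the API documentation for the exact names before relying on them. The code only builds a URL and does not send a request.

```python
from urllib.parse import urlencode

# Assumed endpoint; confirm against the API documentation.
BASE_URL = "https://www.courtlistener.com/api/rest/v4/search/"


def build_search_url(query, court=None, filed_after=None, semantic=True):
    """Build a search URL; every parameter name here is an assumption."""
    params = {"q": query}
    if semantic:
        params["semantic"] = "true"
    if court:
        params["court"] = court
    if filed_after:
        params["filed_after"] = filed_after
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url(
    'tenant injured by "lead paint"',
    court="scotus",
    filed_after="2000-01-01",
)
print(url)
```

`urlencode` takes care of escaping the quotation marks and spaces, so the quoted keyword phrase survives intact in the query string.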
We've put this feature through rigorous testing. Our alpha users have consistently praised the retrieval quality, and we've completed numerous machine learning benchmarks to ensure accuracy. Given the massive volume of case law data, we've also run countless rounds of speed and performance testing to guarantee fast retrieval times—something we'll continue monitoring as usage grows.
True to our mission, we've also made the case law embeddings fully open and available for download. Learn more on our bulk data page. As you might imagine, generating these embeddings is no small feat, and each download incurs about $200 in AWS fees for Free Law Project. If you find this valuable, please consider becoming a member or making a donation to support our work!
With Thanks
Launching semantic search at this scale has taken a considerable amount of work and collaboration by—among others!—the Free Law Project dev and AI teams; a number of volunteers including LegalTextAI, Nina Shamsi at Northeastern University, and Dominik Stammbach at Princeton University; and our alpha and beta testers.
Thank you all for your contributions, help, and wisdom!
What's Next?
Semantic search is currently available through our API, but we're already working to bring it to the website—stay tuned! And don't worry, keyword search isn't going anywhere. You'll have full access to both.
If you're as excited about semantic search as we are, give it a try! We can't wait to see what you'll build!