We’re excited to announce that, beginning today, our relevancy engine will provide significantly better results than it has in the past. Whenever you run a query, we will analyze which opinions are the most cited and use that to give you the best results possible. We’re calling this the CiteGeist score because it finds the spirit (“Geist”) of your query. CiteGeist is currently enabled for the part of our corpus running from the 1750s through about 1985, and the remaining years will get the CiteGeist treatment over the next few days.
The details of how CiteGeist works are in our code, but the basic idea is to give a high CiteGeist score to opinions that are cited many times by other important opinions, and a lower CiteGeist score to opinions that have not been cited or that have only been cited by unimportant opinions. Once we’ve established the CiteGeist score, we combine it with a query’s keyword-based (TF-IDF) relevancy. The combined score measures both how intrinsically important a case is (its CiteGeist score) and how closely it matches your specific query.
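To make the idea concrete, here is a minimal sketch in Python of one way such a score could be computed. It is not our production code. It assumes a PageRank-style iteration over the citation graph and a simple weighted sum for the final ranking. The function names (`citation_scores`, `combined_score`) and the `damping`, `iterations`, and `weight` parameters are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def citation_scores(
    cites: Dict[str, List[str]],
    damping: float = 0.85,   # assumed value, borrowed from classic PageRank
    iterations: int = 50,    # assumed fixed iteration count
) -> Dict[str, float]:
    """Score opinions so that citations from important opinions count more.

    `cites` maps each opinion ID to the IDs of the opinions it cites.
    This is an illustrative PageRank-style iteration, not the real CiteGeist.
    """
    # Collect every opinion, whether it cites, is cited, or both.
    nodes = set(cites)
    for targets in cites.values():
        nodes.update(targets)
    if not nodes:
        return {}

    n = len(nodes)
    scores = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Opinions that cite nothing spread their score evenly over everyone,
        # so the total score stays constant from one round to the next.
        dangling = sum(scores[node] for node in nodes if not cites.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_scores = {node: base for node in nodes}

        # Each citing opinion splits its current score among what it cites.
        for source, targets in cites.items():
            if targets:
                share = damping * scores[source] / len(targets)
                for target in targets:
                    new_scores[target] += share
        scores = new_scores

    return scores


def combined_score(keyword_relevance: float, cite_score: float,
                   weight: float = 0.5) -> float:
    """Blend keyword relevancy with the citation score (assumed weighted sum)."""
    return (1.0 - weight) * keyword_relevance + weight * cite_score


# Tiny hypothetical citation graph: "a" and "b" both cite "c"; "d" cites "a".
graph = {"a": ["c"], "b": ["c"], "c": [], "d": ["a"]}
scores = citation_scores(graph)
ranking = sorted(scores, key=scores.get, reverse=True)
```

In this toy graph, the opinion cited by two others ranks first, and an opinion cited once outranks opinions that are never cited. The real weighting between keyword relevancy and citation importance may well differ from the even split assumed here.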
As always, we give away our data in our bulk files, and soon it will be available via our API as well. We hope that this new feature will make legal research faster, easier, and more accurate, and we couldn’t be prouder to offer it.
This feature was developed by a volunteer contributor, Bo Jin (Krist). He is majoring in Software Engineering at Tianjin University and spent the summer of 2013 taking classes at UC Berkeley. While in Berkeley, Krist worked closely with us to learn our code base, and he has continued to contribute now that he’s returned to China to finish his degree. He hopes to return to the U.S. next fall to pursue a master’s degree in Computer Science.
Posted by: Michael Lissner