Facts about FJC's Integrated Database
This page collects useful information and notes for people who use the Integrated Database (IDB) provided by the Federal Judicial Center (FJC).
This database contains information about every case filed in the PACER system, and is thus a valuable resource for researchers, journalists, and legal startups. There are a few questions that aren't answered by their website, and a few pieces of missing information that are essential for understanding the data. This is the page we wish existed when we began working with this data.
Where can I find an explanation of the office codes?
The codebook states that this information is in "Guide to Judiciary Policies and Procedures, Volume XI, Appendix A," but that does not appear to be online. We requested this information and have published it here.
Download the office codes
Is there any additional information about the fields?
Yes! We have a little information about the fields beyond what's in the codebook. For example, many of the fields are artificially truncated. This appears to be a problem with the source data that the AO gives to the FJC.
We requested the lengths of the various fields in the civil data as of 1 June 2018 and have published them here.
Download the civil field info
A few weeks later, the FJC also shared the lengths of the fields in the criminal data (without us even asking!):
Download the criminal field info
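One way to use the published field lengths is to flag values that exactly fill their field, since those are the ones most likely to have been cut off. The sketch below assumes that approach; the field names and widths in it are hypothetical placeholders, not values taken from the FJC files, so substitute the real ones from the downloads above.

```python
# Hypothetical field widths for illustration only. Replace these with the
# real values from the civil or criminal field info files.
FIELD_WIDTHS = {"plaintiff": 30, "defendant": 30}


def possibly_truncated(row, widths):
    """Return the names of fields whose value fills the whole field width.

    A value that is exactly as long as its field allows may have been
    truncated. This is only a heuristic: a name can legitimately be
    exactly that long.
    """
    return [
        name
        for name, width in widths.items()
        if len(row.get(name, "")) >= width
    ]


# Made-up example row: the plaintiff value fills all 30 characters.
example_row = {"plaintiff": "X" * 30, "defendant": "DOE"}
flagged = possibly_truncated(example_row, FIELD_WIDTHS)
```

Flagged values are candidates for a closer look, for example by comparing them with the full party names on the PACER docket.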
How can I link up the IDB with PACER?
Well, you've always got the docket number, and sure enough, PACER provides a free undocumented API for looking up docket numbers and getting their unique PACER ID.
If you're familiar with PACER as a user, this is the API that is used by the website itself when you paste a docket number into the docket report form and press "Find this case."
Behind the scenes, when you do that, it does a query to a URL like:
We call this the "possible case numbers API". It will respond with something like:
<request number='3:12-cv-3879'>
  <case number='3:12-cv-3879' id='257622' title='3:12-cv-03879-VC Technology Properties Limited LLC et al v. Novatel Wireless, Inc. (closed 07/14/2015)' sortable='3:2012-cv-03879-VC'/>
</request>
A few notes:
In this response, the PACER internal ID for the docket is in the id field and has a value of 257622.
Some docket numbers, particularly in criminal cases, are ambiguous, in which case there will be several case nodes in the returned XML.
The possible case numbers API is very flexible in the format of docket numbers that it receives. If you prefer to query the docket number above as 3:12-cv-3879, that will also work.
Whatever format you put in will be converted to a standardized docket number, including the judge's initials.
If you plan to use this API, we recommend using the Juriscraper framework, which has APIs specifically for this purpose.
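If you just want to see what handling the response involves, here is a minimal sketch that parses the sample XML shown above using only Python's standard library. It is not how Juriscraper does it; the function name and the dictionary keys are our own choices for illustration. It returns a list because, as noted, an ambiguous docket number can produce several case nodes.

```python
import xml.etree.ElementTree as ET

# The sample response from above, stored as a string.
SAMPLE_RESPONSE = """<request number='3:12-cv-3879'>
  <case number='3:12-cv-3879' id='257622'
        title='3:12-cv-03879-VC Technology Properties Limited LLC et al v. Novatel Wireless, Inc. (closed 07/14/2015)'
        sortable='3:2012-cv-03879-VC'/>
</request>"""


def parse_possible_cases(xml_text):
    """Return one dict per case node found in the response XML."""
    root = ET.fromstring(xml_text)
    return [
        {
            "pacer_case_id": case.get("id"),
            "docket_number": case.get("number"),
            "title": case.get("title"),
            "sortable": case.get("sortable"),
        }
        for case in root.iter("case")
    ]


cases = parse_possible_cases(SAMPLE_RESPONSE)
for case in cases:
    print(case["pacer_case_id"], case["sortable"])
```

When more than one case comes back, you will need some other piece of information from the IDB row to decide which one is the match.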
What encoding is used for the data?
According to FJC staff, the data is encoded as ASCII. Using the following command, we were able to find three exceptions to this rule, all containing umlauts:
grep --color='auto' -P -n "[^\x00-\x7F]" --text file.txt
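If grep with -P is not available to you, the same check can be sketched in Python. The version below reads raw bytes and reports every line that contains a byte above 0x7F, which is the range the grep pattern excludes. The sample bytes are made up for illustration and assume a Latin-1 style encoding for the umlaut; we do not know which encoding the real exceptions use.

```python
import io


def find_non_ascii_lines(lines):
    """Return (line_number, raw_line) for each line with a non-ASCII byte.

    `lines` is any iterable of bytes objects, such as a file opened in
    binary mode. Working on bytes avoids guessing at an encoding.
    """
    hits = []
    for line_number, raw in enumerate(lines, start=1):
        # Iterating over bytes yields integers; ASCII stops at 0x7F.
        if any(byte > 0x7F for byte in raw):
            hits.append((line_number, raw.rstrip(b"\r\n")))
    return hits


# Made-up sample standing in for the data file.
sample = io.BytesIO(b"SMITH v. JONES\nM\xfcller v. DOE\n")
hits = find_non_ascii_lines(sample)
for line_number, raw in hits:
    print(line_number, raw)
```

To run it against the real file, pass a file object opened with open("file.txt", "rb") instead of the in-memory sample.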
How often is the data updated?
The FJC receives data from the AO on a quarterly basis around two months after the end of the quarter. FJC then post-processes the data and puts it online.
Why are the judge fields deliberately blank in the FJC data?
If you need these values, you might be able to obtain an unredacted copy of the data from the National Archive of Criminal Justice Data at the University of Michigan. They appear to have this data in unredacted form, but getting access to it is extremely onerous.