Facts about FJC's Integrated Database

This page is where we keep useful information and notes for people who are using the Integrated Database (IDB) provided by the Federal Judicial Center (FJC).
This database contains information about every case filed in the PACER system, and is thus a valuable resource for researchers, journalists, and legal startups. There are a few questions that aren't answered by the FJC's website, and a few pieces of missing information that are essential for understanding the data. This is the page we wish existed when we began working with this data.
Where can I find an explanation of the office codes?
The codebook states that this information is in "Guide to Judiciary Policies and Procedures, Volume XI, Appendix A," but that does not appear to be online. We have requested this information and have published it here.
Download the office codes
Where can I find an explanation of the criminal offense codes?
As far as we know, these aren't published online, but a copy was requested by a researcher, which we are publishing here.
Download the criminal offense codes
Is there any additional information about the fields?
Yes! We have a little information about the fields beyond what's in the codebook. For example, many of the fields are artificially truncated. This appears to be a problem with the source data that the Administrative Office of the U.S. Courts (AO) gives to the FJC.
We have requested the lengths of the various fields in the civil data as of 1 June 2018, and published them here.
Download the civil field info
A few weeks later, the FJC also shared the lengths of the fields in the criminal data (without us even asking!):
Download the criminal field info
How can I link up the IDB with PACER?
Well, you've always got the docket number, and sure enough, PACER provides a free undocumented API for looking up docket numbers and getting their unique PACER ID.
If you're familiar with PACER as a user, this is the API that is used by the website itself when you paste a docket number into the docket report form and press "Find this case."
Behind the scenes, when you do that, it does a query to a URL like:
We call this the "possible case numbers API". It will respond with something like:
A few notes:
- In this response, the PACER internal ID for the docket is in the id field and has a value of 257622.
- Some docket numbers, particularly in criminal cases, are ambiguous, in which case there will be several case nodes in the returned XML.
- The possible case numbers API is very flexible in the format of docket numbers that it receives. If you prefer to query the docket number above as 12003879 instead of 3:12-cv-3879, that will also work. Whatever format you put in will be converted to a standardized docket number, including the judge's initials.
If you plan to use this API, we recommend using the Juriscraper framework, which has APIs specifically for this purpose.
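If you would rather handle the XML yourself, the notes above suggest a simple approach: collect the id of every case node and treat more than one result as an ambiguous docket number. The following is only a minimal sketch using Python's standard library. The sample response is made up for illustration: only the case nodes and the id field come from the notes above, while the root element name, the other attributes, and the assumption that id is an XML attribute are ours. The function name extract_pacer_ids is also hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical response body, made up for illustration. Only the "case" nodes
# and their "id" field are described on this page; the root element and the
# other attributes are assumptions.
SAMPLE_RESPONSE = """
<request number="3:12-cv-3879">
  <case number="3:12-cv-3879" id="257622" title="Example v. Example"/>
</request>
"""


def extract_pacer_ids(xml_text):
    """Return the id attribute of every case node, in document order."""
    root = ET.fromstring(xml_text.strip())
    # Skip case nodes that carry no id (or an empty one).
    return [case.get("id") for case in root.iter("case") if case.get("id")]


ids = extract_pacer_ids(SAMPLE_RESPONSE)
if len(ids) == 1:
    print("PACER ID:", ids[0])
else:
    # Zero or several case nodes: the docket number was not found or is ambiguous.
    print("No unique match; candidates:", ids)
```

When several case nodes come back, you will need some other piece of information (for example, the case title) to pick the right one.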
What encoding is used for the data?
According to FJC staff, the data is encoded as ASCII. Using the following command, we found three exceptions to this rule, each containing umlauts:
grep --color='auto' -P -n "[^\x00-\x7F]" --text file.txt
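If GNU grep with -P is not available, the same check can be sketched in Python. This is a rough equivalent rather than the command we actually ran: it reads the file as raw bytes and reports every line that contains a byte outside the ASCII range. The function name find_non_ascii_lines is hypothetical.

```python
def find_non_ascii_lines(path):
    """Yield (line_number, raw_line) for each line with a non-ASCII byte."""
    # Read in binary mode so that no decoding is attempted on unknown bytes.
    with open(path, "rb") as f:
        for line_number, raw_line in enumerate(f, start=1):
            if not raw_line.isascii():
                yield line_number, raw_line.rstrip(b"\r\n")
```

For example, `for number, line in find_non_ascii_lines("file.txt"): print(number, line)` prints each offending line as a bytes literal, which makes the unexpected bytes easy to spot. The bytes isascii method requires Python 3.7 or later.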
How often is the data updated?
The FJC receives data from the AO quarterly, around two months after the end of each quarter. The FJC then post-processes the data and puts it online.
Why are the judge fields deliberately blank in the FJC data?
These are blank according to policy set by the Judicial Conference of the U.S. It was originally set in March of 1995 (see pages 21-22), and it was reaffirmed in March of 2003 (see page 20).

If you need these values, you might be able to obtain unredacted sources of the data from the National Archive of Criminal Justice Data at the University of Michigan. They appear to have this data in unredacted form, but getting access to it is extremely onerous.
Are there other resources about the IDB?
Yes! The folks at SCALES-OKN have a nice post discussing how to crosswalk the data.