options(scipen = 1000000)

library(pacman)
p_load(tidyverse, arrow, lubridate, zoo, digest, ggplot2, plotly, gghighlight, knitr, kableExtra)

df <- data.table::fread(file = here::here('write', 'input', 'ice_detentions_fy12-24ytd.csv.gz'),
                        sep = '|')

data.table::setDF(df)

detloc_aor <- df %>% 
  distinct(detention_facility_code, area_of_responsibility) %>% 
  arrange(detention_facility_code, area_of_responsibility)

This dataset of U.S. Immigration and Customs Enforcement (ICE) nationwide detention placements from October 1, 2011 to January 4, 2024 (full U.S. government fiscal years 2012-2023) includes 8,631,489 records after dropping duplicate records and records missing anonymized_identifier values,[1] representing 3,360,633 unique individuals and 1,122 unique detention facilities. The dataset has been minimally cleaned and standardized and is known to contain inconsistencies; all results should be interpreted with caution.

The data and associated code can be found in the following GitHub repository: https://github.com/UWCHR/ice-detain

Data format

Each row in this dataset represents a placement of an individual at a specific ICE detention facility. Unique people are identified by the anonymized_identifier field. One or more successive placements constitute a detention stay. Unique detention stays are identified by the combination of anonymized_identifier and stay_book_in_date_time fields. Each unique person represented in the dataset may experience one or more detention stays.

Data were released by ICE in separate annual spreadsheets containing all detention placement records for detention stays with an initial book-in during that U.S. government fiscal year. Stay records spanning multiple fiscal years do not appear to be duplicated in successive datasets. Within each spreadsheet, detention histories are sorted by anonymized_identifier and detention_book_in_date_and_time, resulting in “blocs” of records for each stay.
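Because the release is organized by fiscal year, it can help to map timestamps to their U.S. government fiscal year, which begins October 1 and is named for the calendar year in which it ends. The following is a minimal base-R sketch; the helper name `fiscal_year()` is hypothetical, and the analysis code later in this document uses `lubridate::quarter(..., fiscal_start = 10)` for the same purpose.

```r
# Hypothetical helper: map timestamps to U.S. government fiscal years
# (October 1 through September 30, named for the year in which the FY ends).
fiscal_year <- function(x) {
  lt <- as.POSIXlt(x, tz = "UTC")
  # POSIXlt stores years since 1900 and zero-based months, so 9 is October
  (lt$year + 1900L) + (lt$mon >= 9L)
}

book_in <- as.POSIXct(c("2011-09-30 23:59:00",   # last minute of FY 2011
                        "2011-10-01 00:00:00",   # first minute of FY 2012
                        "2024-01-04 12:00:00"),  # date of data production
                      tz = "UTC")

fiscal_year(book_in)
```

The last timestamp falls in FY 2024, consistent with the dataset's "fy12-24ytd" naming for data produced partway through that fiscal year.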

Completed detention placements are identified by non-null detention_book_out_date_time and detention_release_reason values; completed stays are identified by non-null stay_book_out_date_time and stay_release_reason values. In most cases, a completed stay's values match the detention_book_out_date_time and detention_release_reason values of the final placement in that stay.

See below for select fields of a representative bloc of detention placement records pertaining to a single completed individual detention stay:

sample_cols <- c("stay_book_in_date_time",
                 "detention_book_in_date_and_time", # Note irregular field name format
                 "detention_book_out_date_time",
                 "stay_book_out_date_time",
                 "citizenship_country",
                 "gender",
                 "birth_year",
                 "detention_facility",
                 "area_of_responsibility",
                 "detention_release_reason",
                 "stay_release_reason",
                 "anonymized_identifier")

example <- df[df$anonymized_identifier == "0000005694c4fd9ebcc71c6a54fca1cdfc516c36", sample_cols]

knitr::kable(example, row.names = FALSE, format = "html") %>%
  kableExtra::scroll_box(width = "100%", height = "300px")
| stay_book_in_date_time | detention_book_in_date_and_time | detention_book_out_date_time | stay_book_out_date_time | citizenship_country | gender | birth_year | detention_facility | area_of_responsibility | detention_release_reason | stay_release_reason | anonymized_identifier |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2012-03-23 03:55:00 | 2012-03-23 03:55:00 | 2012-03-23 14:30:00 | 2012-04-10 15:56:00 | MEXICO | Male | 1983 | LOS CUST CASE | Los Angeles Area of Responsibility | Transferred | Removed | 0000005694c4fd9ebcc71c6a54fca1cdfc516c36 |
| 2012-03-23 03:55:00 | 2012-03-23 14:46:00 | 2012-03-26 14:15:00 | 2012-04-10 15:56:00 | MEXICO | Male | 1983 | THEO LACY FACILITY | Los Angeles Area of Responsibility | Transferred | Removed | 0000005694c4fd9ebcc71c6a54fca1cdfc516c36 |
| 2012-03-23 03:55:00 | 2012-03-26 14:16:00 | 2012-04-02 18:06:00 | 2012-04-10 15:56:00 | MEXICO | Male | 1983 | WESTERN MEDICAL CENTER | Los Angeles Area of Responsibility | Transferred | Removed | 0000005694c4fd9ebcc71c6a54fca1cdfc516c36 |
| 2012-03-23 03:55:00 | 2012-04-02 18:06:00 | 2012-04-10 15:13:00 | 2012-04-10 15:56:00 | MEXICO | Male | 1983 | THEO LACY FACILITY | Los Angeles Area of Responsibility | Transferred | Removed | 0000005694c4fd9ebcc71c6a54fca1cdfc516c36 |
| 2012-03-23 03:55:00 | 2012-04-10 15:14:00 | 2012-04-10 15:56:00 | 2012-04-10 15:56:00 | MEXICO | Male | 1983 | SANTA ANA DRO HOLDROOM | Los Angeles Area of Responsibility | Removed | Removed | 0000005694c4fd9ebcc71c6a54fca1cdfc516c36 |

Placements that were ongoing when the data were produced on January 4, 2024 are missing detention_book_out_date_time and detention_release_reason values; the blocs of records for ongoing stays are missing stay_book_out_date_time and stay_release_reason values. See the example below of an individual with three detention stays, the third of which was ongoing when the data were released:

example <- df[df$anonymized_identifier == "8ede14ebf032fcf92ae3e43675e18a6713af2fac", sample_cols]

knitr::kable(example, row.names = FALSE, format = "html") %>%
  kableExtra::scroll_box(width = "100%", height = "300px")
| stay_book_in_date_time | detention_book_in_date_and_time | detention_book_out_date_time | stay_book_out_date_time | citizenship_country | gender | birth_year | detention_facility | area_of_responsibility | detention_release_reason | stay_release_reason | anonymized_identifier |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2018-02-21 15:56:00 | 2018-02-21 15:56:00 | 2018-03-22 14:06:00 | 2018-03-22 14:06:00 | MEXICO | Female | 1996 | BUFFALO SPC | Buffalo Area of Responsibility | U.S. Marshals or other agency (explain in Detention Comments) | U.S. Marshals or other agency (explain in Detention Comments) | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
| 2018-06-27 15:00:00 | 2018-06-27 15:00:00 | 2018-08-14 07:30:00 | 2018-08-14 07:30:00 | MEXICO | Female | 1996 | BUFFALO SPC | Buffalo Area of Responsibility | U.S. Marshals or other agency (explain in Detention Comments) | U.S. Marshals or other agency (explain in Detention Comments) | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
| 2018-08-21 16:00:00 | 2018-08-21 16:00:00 | 2019-05-13 17:23:00 | NA | MEXICO | Female | 1996 | BUFFALO SPC | Buffalo Area of Responsibility | Transferred | NA | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
| 2018-08-21 16:00:00 | 2019-05-13 19:00:00 | 2019-05-20 17:56:00 | NA | MEXICO | Female | 1996 | CENTRAL LOUISIANA ICE PROC CTR | New Orleans Area of Responsibility | Transferred | NA | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
| 2018-08-21 16:00:00 | 2019-05-20 18:59:00 | 2019-05-21 07:00:00 | NA | MEXICO | Female | 1996 | YORK COUNTY JAIL, PA | Philadelphia Area of Responsibility | Transferred | NA | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
| 2018-08-21 16:00:00 | 2019-05-21 13:15:00 | 2020-12-11 07:56:00 | NA | MEXICO | Female | 1996 | BUFFALO SPC | Buffalo Area of Responsibility | Transferred | NA | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
| 2018-08-21 16:00:00 | 2020-12-11 14:39:00 | 2021-01-28 15:00:00 | NA | MEXICO | Female | 1996 | RENSSELAER CO SHERIFF | New York City Area of Responsibility | Transferred | NA | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
| 2018-08-21 16:00:00 | 2021-01-28 17:15:00 | 2021-02-24 08:00:00 | NA | MEXICO | Female | 1996 | BUFFALO SPC | Buffalo Area of Responsibility | Transferred | NA | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
| 2018-08-21 16:00:00 | 2021-02-24 14:47:00 | 2022-08-11 17:14:00 | NA | MEXICO | Female | 1996 | RENSSELAER CO SHERIFF | New York City Area of Responsibility | Transferred | NA | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
| 2018-08-21 16:00:00 | 2022-08-11 23:00:00 | NA | NA | MEXICO | Female | 1996 | ELIZABETH CONTRACT D.F. | Newark Area of Responsibility | NA | NA | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
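The completed/ongoing distinction can be expressed as two boolean flags. This is a minimal base-R sketch on hypothetical toy records (the flag column names `placement_completed` and `stay_completed` are illustrative, not fields in the dataset); as in the example above, a placement can be completed (e.g. "Transferred") while its stay is still ongoing.

```r
# Toy records (hypothetical values): person "a" has a completed two-placement
# stay; person "b" has an ongoing stay with one completed and one ongoing placement.
placements <- data.frame(
  anonymized_identifier        = c("a", "a", "b", "b"),
  detention_book_out_date_time = as.POSIXct(c("2012-03-23 14:30:00",
                                              "2012-04-10 15:56:00",
                                              "2019-05-13 17:23:00",
                                              NA), tz = "UTC"),
  detention_release_reason     = c("Transferred", "Removed", "Transferred", NA),
  stay_book_out_date_time      = as.POSIXct(c("2012-04-10 15:56:00",
                                              "2012-04-10 15:56:00",
                                              NA, NA), tz = "UTC"),
  stay_release_reason          = c("Removed", "Removed", NA, NA)
)

# A placement is completed when both its book-out time and release reason are present
placements$placement_completed <- !is.na(placements$detention_book_out_date_time) &
  !is.na(placements$detention_release_reason)

# Likewise for the stay, using the stay-level fields repeated on every row
placements$stay_completed <- !is.na(placements$stay_book_out_date_time) &
  !is.na(placements$stay_release_reason)

placements[, c("anonymized_identifier", "placement_completed", "stay_completed")]
```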

Note that in some cases, stay_book_in_date_time and stay_book_out_date_time values do not match the detention_book_in_date_and_time of the first placement or the detention_book_out_date_time of the last placement in the stay. This is especially prevalent in records with initial detention book-in dates in FY 2011 or earlier, but occurs in all years. It is not clear what accounts for this discrepancy, which may result in under-counting of the detained population, since headcounts are calculated from placement records rather than stay dates. See for example:

example <- df[df$anonymized_identifier == "0bafe4b223d4a1afe8e4bb95821f00e9183e588b", sample_cols]

knitr::kable(example, row.names = FALSE, format = "html") %>%
  kableExtra::scroll_box(width = "100%", height = "300px")
| stay_book_in_date_time | detention_book_in_date_and_time | detention_book_out_date_time | stay_book_out_date_time | citizenship_country | gender | birth_year | detention_facility | area_of_responsibility | detention_release_reason | stay_release_reason | anonymized_identifier |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1995-08-31 00:01:00 | 2010-04-14 08:01:00 | 2017-08-24 17:54:00 | 2017-08-31 11:30:00 | CUBA | Male | 1952 | TERRE HAUTE USP | Chicago Area of Responsibility | Transferred | Removed | 0bafe4b223d4a1afe8e4bb95821f00e9183e588b |
| 1995-08-31 00:01:00 | 2017-08-24 18:04:00 | 2017-08-24 18:07:00 | 2017-08-31 11:30:00 | CUBA | Male | 1952 | PULASKI COUNTY JAIL | Chicago Area of Responsibility | Transferred | Removed | 0bafe4b223d4a1afe8e4bb95821f00e9183e588b |
| 1995-08-31 00:01:00 | 2017-08-24 20:00:00 | 2017-08-25 06:48:00 | 2017-08-31 11:30:00 | CUBA | Male | 1952 | STEWART DETENTION CENTER | Atlanta Area of Responsibility | Transferred | Removed | 0bafe4b223d4a1afe8e4bb95821f00e9183e588b |
| 1995-08-31 00:01:00 | 2017-08-26 00:01:00 | 2017-08-26 00:01:00 | 2017-08-31 11:30:00 | CUBA | Male | 1952 | KROME NORTH SPC | Miami Area of Responsibility | Transferred | Removed | 0bafe4b223d4a1afe8e4bb95821f00e9183e588b |
| 1995-08-31 00:01:00 | 2017-08-26 00:15:00 | 2017-08-31 06:45:00 | 2017-08-31 11:30:00 | CUBA | Male | 1952 | KROME NORTH SPC | Miami Area of Responsibility | Transferred | Removed | 0bafe4b223d4a1afe8e4bb95821f00e9183e588b |
| 1995-08-31 00:01:00 | 2017-08-31 07:00:00 | 2017-08-31 11:30:00 | 2017-08-31 11:30:00 | CUBA | Male | 1952 | MIAMI STAGING FACILITY | Miami Area of Responsibility | Removed | Removed | 0bafe4b223d4a1afe8e4bb95821f00e9183e588b |
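Stays like the one above can be flagged by comparing stay_book_in_date_time against the earliest detention_book_in_date_and_time within each stay. This is a base-R sketch on toy data loosely modeled on the examples in this section (values are illustrative); it keys stays on the combination of anonymized_identifier and stay_book_in_date_time, and the flag name `stay_in_mismatch` is hypothetical.

```r
# Toy placements (illustrative): stay "x" has a stay book-in years before its
# first placement record; stay "y" is internally consistent.
placements <- data.frame(
  anonymized_identifier  = c("x", "x", "y", "y"),
  stay_book_in_date_time = as.POSIXct(c("1995-08-31 00:01:00", "1995-08-31 00:01:00",
                                        "2012-03-23 03:55:00", "2012-03-23 03:55:00"),
                                      tz = "UTC"),
  detention_book_in_date_and_time = as.POSIXct(c("2010-04-14 08:01:00", "2017-08-24 18:04:00",
                                                 "2012-03-23 03:55:00", "2012-03-23 14:46:00"),
                                               tz = "UTC")
)

# One stay per (anonymized_identifier, stay_book_in_date_time) combination
stay_key <- paste(placements$anonymized_identifier,
                  format(placements$stay_book_in_date_time, "%Y-%m-%d %H:%M:%S"))

# Earliest placement book-in within each stay (seconds since epoch)
first_in <- ave(as.numeric(placements$detention_book_in_date_and_time),
                stay_key, FUN = min)

# Flag rows whose stay book-in differs from the stay's first placement book-in
placements$stay_in_mismatch <- as.numeric(placements$stay_book_in_date_time) != first_in

# Collapse to one flag per stay
tapply(placements$stay_in_mismatch, stay_key, any)
```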

Data analysis

When performing analysis of this dataset, it is important to consider whether the unit of interest is at the level of unique individuals, placements, or stays.
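The three units give different counts from the same rows. As a minimal base-R sketch on hypothetical toy data: placements are rows, stays are distinct (anonymized_identifier, stay_book_in_date_time) pairs, and individuals are distinct anonymized_identifier values.

```r
# Toy placement records (hypothetical): person "a" has one stay with two
# placements; person "b" has two stays with one placement each.
placements <- data.frame(
  anonymized_identifier  = c("a", "a", "b", "b"),
  stay_book_in_date_time = as.POSIXct(c("2012-03-23 03:55:00", "2012-03-23 03:55:00",
                                        "2018-02-21 15:56:00", "2018-06-27 15:00:00"),
                                      tz = "UTC")
)

n_placements  <- nrow(placements)
n_individuals <- length(unique(placements$anonymized_identifier))
# A stay is a distinct (anonymized_identifier, stay_book_in_date_time) pair
n_stays <- nrow(unique(placements[, c("anonymized_identifier", "stay_book_in_date_time")]))

c(placements = n_placements, stays = n_stays, individuals = n_individuals)
```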

The full dataset, unique-stays/output/ice_detentions_fy12-24ytd.feather, can be used to examine all placement records pertaining to a given individual, demographic characteristic, or specific detention facility. Note that results for calculations such as “average length of stay” will be incorrect if analysis is not restricted to one representative record per individual per stay.
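To see why, note that stay-level fields repeat on every placement row of a stay, so averaging over rows weights each stay by its number of placements. A base-R sketch on hypothetical toy data (the numeric `stay_length_days` column is an illustrative stand-in for the dataset's stay length field):

```r
# Toy data (hypothetical): stay "s1" lasted 18 days across five placements;
# stay "s2" lasted 2 days in a single placement.
placements <- data.frame(
  stayid           = c(rep("s1", 5), "s2"),
  stay_length_days = c(rep(18, 5), 2)
)

# Incorrect: every placement row counts, so multi-placement stays dominate
naive_alos <- mean(placements$stay_length_days)    # (5 * 18 + 2) / 6

# Correct: keep one representative row per stay before averaging
per_stay <- placements[!duplicated(placements$stayid), ]
alos <- mean(per_stay$stay_length_days)            # (18 + 2) / 2

c(naive = naive_alos, per_stay = alos)
```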

Per-stay characteristics can be analyzed by selecting distinct combinations of anonymized_identifier and stay_book_in_date_time values, or simply by selecting distinct stayid values. Additional analysis fields generated in unique-stays/src/unique-stays.R can be used to select representative records for each stay based on specific characteristics, such as the longest_placement per stay or last_placement per stay. To select the first placement per stay, simply filter the dataset to select records where placement_count == 1.

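dat <- df %>%
  # Keep one representative record per stay
  distinct(stayid, .keep_all = TRUE) %>% 
  # Fiscal year of initial book-in (FY begins October 1); `quarter()` returns
  # values like 2012.1, so `floor()` keeps just the fiscal year as an integer
  mutate(bookin_fy = as.integer(floor(quarter(stay_book_in_date_time, type = "year.quarter", fiscal_start = 10)))) %>% 
  filter(bookin_fy >= 2012,
         bookin_fy <= 2023) %>% 
  group_by(bookin_fy) %>% 
  summarize(alos = mean(as.numeric(stay_length, units = 'days'), na.rm = TRUE))

p1 <- dat %>% 
  # Treat fiscal year as a discrete label on the x axis
  ggplot(aes(x = factor(bookin_fy), y = alos)) +
  geom_col() +
  xlab("Initial book-in FY") +
  ylab("Average length of stay (days)") +
  labs(title = "Average length of stay, FY2012-2023")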

p1
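The per-stay selection options described above can be compared on toy data. This base-R sketch uses hypothetical values; the `stayid` and `placement_count` columns stand in for the analysis fields generated by unique-stays/src/unique-stays.R, and the stayid format shown is made up for illustration.

```r
# Toy placements (hypothetical values) for two stays
placements <- data.frame(
  anonymized_identifier  = c("a", "a", "a", "b"),
  stay_book_in_date_time = as.POSIXct(c("2012-03-23 03:55:00", "2012-03-23 03:55:00",
                                        "2012-03-23 03:55:00", "2018-02-21 15:56:00"),
                                      tz = "UTC"),
  stayid             = c("a-1", "a-1", "a-1", "b-1"),
  placement_count    = c(1, 2, 3, 1),
  detention_facility = c("LOS CUST CASE", "THEO LACY FACILITY",
                         "SANTA ANA DRO HOLDROOM", "BUFFALO SPC")
)

# Option 1: first row per distinct (anonymized_identifier, stay_book_in_date_time)
by_pair <- placements[!duplicated(placements[, c("anonymized_identifier",
                                                 "stay_book_in_date_time")]), ]

# Option 2: first row per distinct stayid (base-R analogue of distinct(stayid, .keep_all = TRUE))
by_stayid <- placements[!duplicated(placements$stayid), ]

# Option 3: the first placement of each stay
first_placements <- placements[placements$placement_count == 1, ]

sapply(list(by_pair = by_pair, by_stayid = by_stayid, first = first_placements), nrow)
```

All three return one row per stay, but options 1 and 2 keep whichever row appears first in the current sort order; sort by detention_book_in_date_and_time first if a specific placement is wanted.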

Detention headcount

An obvious use of this dataset is to calculate the detained population over time, either in total or by characteristics such as demographics or location. Because each record is a placement with book-in and book-out times, this can be done by conditionally indexing records to select the placements in range on a given date or over a timeline; e.g. df[df$detention_book_in_date_and_time <= date & df$detention_book_out_date_time > date,]. (Note that this indexing will fail to select ongoing placements missing detention_book_out_date_time unless this is accounted for in some way; we do this by setting a temporary detention_book_out_date_time_min value equal to the max date in the timeline. Because each count is a snapshot taken at a single time of day, this indexing may also miss placements shorter than 24 hours that begin and end between two consecutive snapshots.)
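A small base-R sketch of this snapshot logic on hypothetical toy placements, with snapshots at 01:00 UTC. Instead of filling in a detention_book_out_date_time_min column, it treats a missing book-out time as "still detained" directly in the condition; the helper name `count_at()` is hypothetical.

```r
# Toy placements (hypothetical): p1 spans several days, p2 is a 3-hour booking
# between two snapshots, p3 is ongoing (no book-out time).
placements <- data.frame(
  id = c("p1", "p2", "p3"),
  detention_book_in_date_and_time = as.POSIXct(c("2023-12-30 10:00:00",
                                                 "2023-12-31 09:00:00",
                                                 "2024-01-01 12:00:00"), tz = "UTC"),
  detention_book_out_date_time    = as.POSIXct(c("2024-01-02 08:00:00",
                                                 "2023-12-31 12:00:00",
                                                 NA), tz = "UTC")
)

# Daily snapshot times at 01:00 UTC
snapshots <- seq(as.POSIXct("2023-12-31 01:00:00", tz = "UTC"),
                 as.POSIXct("2024-01-04 01:00:00", tz = "UTC"), by = "day")

# Count placements in range at one snapshot; a missing book-out counts as ongoing
count_at <- function(snap, df) {
  in_range <- df$detention_book_in_date_and_time <= snap &
    (is.na(df$detention_book_out_date_time) | df$detention_book_out_date_time > snap)
  sum(in_range)
}

counts <- vapply(seq_along(snapshots),
                 function(i) count_at(snapshots[i], placements), integer(1))
data.frame(snapshot = snapshots, n = counts)
```

Placement p2 never appears in any count, which illustrates how short bookings between snapshots are missed.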

We have implemented a basic script for this operation at headcount/src/headcount.R, which by default outputs the daily detained population per detention facility at 01:00 UTC for the time period covered by the dataset as headcount/output/headcount_fy12-24ytd.csv.gz. Parameters of this script, including arbitrary grouping variables, can be set in the file headcount/Makefile. (The current implementation requires at least one grouping variable; if this is not desired, the script should be straightforward to modify.)

Results of these calculations should be regarded as approximations of the true detained population.[2] Note that the full detained population is not represented until the start of FY 2012, given the scope of the FOIA request that generated this dataset:

headcount <- read_delim(here::here('write', 'input', 'headcount_fy12-24ytd.csv.gz'), delim='|')

headcount <- headcount %>% 
  mutate(date = as.Date(date))

# Note data is grouped by `date` and `detention_facility_code`, so we have to
# sum total population by date
p1 <- headcount %>% 
  # filter(date >= '2011-10-01') %>% 
  group_by(date) %>% 
  summarize(total_pop = sum(n, na.rm=TRUE)) %>% 
  ggplot(aes(x = date, y = total_pop)) +
  geom_line() +
  ylim(0, NA) +
  labs(title = "Nationwide daily U.S. ICE detention population") +
  ylab("Total population") +
  xlab("Date")

p1

We can also filter the default headcount dataset to display population counts for a given facility or subset of facilities:

# `group_by` and `summarize` not necessary but good practice for consistency
p2 <- headcount %>% 
  filter(date >= '2011-10-01',
         detention_facility_code == "CSCNWWA") %>% 
  group_by(date) %>% 
  summarize(total_pop = sum(n, na.rm=TRUE)) %>% 
  ggplot(aes(x = date, y = total_pop)) +
  geom_line() +
  ylim(0, NA) +
  labs(title = "Daily ICE detention population",
       subtitle = "NW ICE Processing Center (Tacoma, WA)") +
  ylab("Total population") +
  xlab("Date")

p2

For smaller subsets of the data, the headcount calculation can be run inline:

# Transform data to "fill in" missing `detention_book_out_date_time` values
# in order to account for ongoing detention stays at time of release of data
max_date <- max(df$stay_book_out_date_time, na.rm=TRUE)

df <- df %>%
  mutate(detention_book_out_date_time_min = 
           case_when(is.na(detention_book_out_date_time) ~ max_date,
                     TRUE ~ detention_book_out_date_time))

# Define timeline for calculation of daily detained population
timeline_start <- min(df$stay_book_in_date_time, na.rm=TRUE)
timeline_end <- max(df$stay_book_out_date_time, na.rm=TRUE)
timeline <- seq(timeline_start, timeline_end, by='day')

# Function counts all in-range detention placement records in dataset `df` for a
# given `date` by the grouping variables named in `group_vars`. Grouping
# variables should be factors so that `.drop = FALSE` returns explicit zero
# counts for groups with no one detained on that date.
headcounter <- function(date, df, group_vars) {
  
  # `which()` drops rows where the comparison is NA (e.g. missing book-in time)
  in_range <- df[which(df$detention_book_in_date_and_time <= date &
                         df$detention_book_out_date_time_min > date), ]
  
  in_range %>% 
    count(across(all_of(group_vars)), .drop = FALSE) %>% 
    mutate(date = date)
  
}

# Generate limited sample dataset; grouping variable converted to factor
temp_df <- df %>% 
  filter(citizenship_country == "ANGOLA") %>% 
  mutate(gender = factor(gender))

# Apply function to timeline
example_headcount <- lapply(timeline, headcounter, df=temp_df, group_vars=c('gender'))

# Combine list of daily results into a single data frame
example_headcount_data <- bind_rows(example_headcount)

# Plot headcount
p1 <- example_headcount_data %>% 
  filter(date >= "2010-10-01") %>% 
  ggplot(aes(x = date, y = n, fill = gender) ) +
  geom_area() +
  labs(title = "Daily detained population by `gender`",
       subtitle = "`citizenship_country` == 'ANGOLA'")

p1

Daily population counts can be used to calculate figures such as “average daily population” (ADP) per month or year.
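For example, ADP is the mean of the daily totals within each period. This base-R sketch uses hypothetical daily totals spanning the FY 2023/FY 2024 boundary; with real headcount output, facility rows would first be summed to a daily total, and every day (including days with zero population) must be present for the mean to be correct.

```r
# Toy daily totals (hypothetical values)
daily <- data.frame(
  date      = as.Date(c("2023-08-31", "2023-09-01", "2023-09-30", "2023-10-01")),
  total_pop = c(100, 110, 120, 130)
)

daily$month <- format(daily$date, "%Y-%m")
# Fiscal year begins October 1 and is named for the year in which it ends
daily$fy <- as.integer(format(daily$date, "%Y")) +
  (as.integer(format(daily$date, "%m")) >= 10)

# Average daily population: mean of daily totals within each period
adp_month <- aggregate(total_pop ~ month, data = daily, FUN = mean)
adp_fy    <- aggregate(total_pop ~ fy, data = daily, FUN = mean)

adp_month
adp_fy
```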

Facility characteristics

Note that the datasets released by ICE contain minimal information on detention facility characteristics; this is limited to the fields detention_facility_code (also referred to as “DETLOC” in other ICE data sources), detention_facility (full facility name), area_of_responsibility (ICE area of responsibility where facility is located), and docket_control_office (ICE docket control office responsible for facility).

Detention facilities represented in this dataset include dedicated ICE detention facilities, ICE hold rooms, jails and prisons contracted by ICE, medical facilities, hotels, etc. These differ in important characteristics that are not represented in the data and must be inferred or joined from other sources. For example, some ICE facilities can only hold detained people for 72 hours or less, an important factor to control for when comparing placement or stay lengths between facilities.
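One way to control for facility type is to join an external lookup table on detention_facility_code and compare placement lengths within types. This is a base-R sketch with made-up data: the facility codes "FAC_A" and "FAC_B", the `facility_type` column, and the `placement_hours` column (which could be derived from book-in and book-out times) are all hypothetical.

```r
# Toy placements and a hypothetical facility lookup table
placements <- data.frame(
  detention_facility_code = c("FAC_A", "FAC_A", "FAC_B", "FAC_B"),
  placement_hours         = c(30, 50, 400, 1200)
)

facility_info <- data.frame(
  detention_facility_code = c("FAC_A", "FAC_B"),
  facility_type           = c("hold room", "detention center")
)

# Left join: keep every placement even if its facility is missing from the lookup
joined <- merge(placements, facility_info,
                by = "detention_facility_code", all.x = TRUE)

# Compare placement lengths within facility types rather than across them
med <- aggregate(placement_hours ~ facility_type, data = joined, FUN = median)
med
```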


  1. For counts of dropped records, see ice-detain/concat/output/concat.log.

  2. A recent Government Accountability Office (GAO) report indicates that detention population figures reported by ICE underestimate detention population figures obtained by independent calculations apparently using a version of the same dataset analyzed here. See U.S. Government Accountability Office, "Immigration Enforcement: Arrests, Removals, and Detentions Varied Over Time and ICE Should Strengthen Data Reporting", GAO-24-106233, July 23, 2024, https://www.gao.gov/products/gao-24-106233