options(scipen = 1000000)
library(pacman)
p_load(tidyverse, arrow, lubridate, zoo, digest, ggplot2, plotly, gghighlight, knitr, kableExtra)
df <- data.table::fread(file = here::here('write', 'input', 'ice_detentions_fy12-24ytd.csv.gz'),
sep = '|')
data.table::setDF(df)
detloc_aor <- df %>%
distinct(detention_facility_code, area_of_responsibility) %>%
arrange(detention_facility_code, area_of_responsibility)
This dataset of U.S. Immigration and Customs Enforcement (ICE) nationwide
detention placements from October 1, 2011 to January 4, 2024 (full U.S.
government fiscal years 2012-2023) includes 8,631,489 records after
dropping duplicate records and records missing `anonymized_identifier`
values,[1] representing 3,360,633 unique individuals and 1,122 unique
detention facilities. The dataset has been minimally cleaned and
standardized and is known to contain inconsistencies; all results should
be interpreted with caution.
The data and associated code can be found in the following GitHub repository: https://github.com/UWCHR/ice-detain
Each row in this dataset represents a placement of an individual at a
specific ICE detention facility. Unique people are identified by the
`anonymized_identifier` field. One or more successive placements
constitute a detention stay. Unique detention stays are identified by
the combination of the `anonymized_identifier` and
`stay_book_in_date_time` fields. Each unique person represented in the
dataset may experience one or more detention stays.
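The three units described above (placements, stays, people) can be counted separately. The following is a minimal base R sketch on toy data; the identifiers, dates, and facility codes are hypothetical and only the column names follow the dataset.

```r
# Toy placement records (hypothetical values; column names follow the dataset)
placements <- data.frame(
  anonymized_identifier = c("a", "a", "a", "b", "b"),
  stay_book_in_date_time = as.POSIXct(
    c("2012-03-23 03:55:00", "2012-03-23 03:55:00", "2013-01-05 10:00:00",
      "2014-06-01 08:00:00", "2014-06-01 08:00:00"),
    tz = "UTC"
  ),
  detention_facility_code = c("F1", "F2", "F1", "F3", "F1"),
  stringsAsFactors = FALSE
)

# Each row is one placement
n_placements <- nrow(placements)

# A person is a distinct identifier
n_people <- length(unique(placements$anonymized_identifier))

# A stay is a distinct identifier + stay book-in combination
stays <- unique(placements[, c("anonymized_identifier", "stay_book_in_date_time")])
n_stays <- nrow(stays)

c(placements = n_placements, stays = n_stays, people = n_people)
```

In this toy example person "a" has two stays (one of them with two placements) and person "b" has one stay with two placements.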
Data were released by ICE in separate annual spreadsheets containing
all detention placement records for detention stays with an initial
book-in during that U.S. government fiscal year. Stay records spanning
multiple fiscal years do not appear to be duplicated in successive
datasets. Within each spreadsheet, detention histories are sorted by
`anonymized_identifier` and `detention_book_in_date_and_time`,
resulting in “blocs” of records for each stay.
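Because the annual files are organized by the fiscal year of the initial book-in, it can be useful to derive that fiscal year from a timestamp. The sketch below uses base R only; `fiscal_year()` is a hypothetical helper (not part of the repository), and interpreting timestamps in UTC is an assumption.

```r
# U.S. government fiscal year: FY N runs from October 1 of calendar year N-1
# through September 30 of calendar year N.
# `fiscal_year()` is a hypothetical helper; UTC is assumed for illustration.
fiscal_year <- function(x) {
  lt <- as.POSIXlt(x, tz = "UTC")
  year <- lt$year + 1900L   # POSIXlt stores years since 1900
  month <- lt$mon + 1L      # POSIXlt stores months as 0-11
  ifelse(month >= 10L, year + 1L, year)
}

book_ins <- as.POSIXct(
  c("2011-09-30 23:59:00", "2011-10-01 00:00:00", "2024-01-04 12:00:00"),
  tz = "UTC"
)
fy <- fiscal_year(book_ins)
fy
```

The first timestamp falls in FY 2011, the second opens FY 2012, and the third falls in FY 2024.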
Completed detention placements are identified by non-null
`detention_book_out_date_time` and `detention_release_reason` fields;
completed stays are identified by non-null `stay_book_out_date_time` and
`stay_release_reason` fields. In most cases, the stay-level values match
the `detention_book_out_date_time` and `detention_release_reason` values
of the final placement of the stay.
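One way to check the relationship described above is to take the final placement of each stay and compare its release reason with the stay-level value. This is a base R sketch on toy data: the identifiers and dates are hypothetical, and `utc()` is a small helper defined here for illustration.

```r
# Helper to build UTC timestamps (hypothetical, for illustration only)
utc <- function(x) as.POSIXct(x, tz = "UTC")

# Toy records: stay "a" is consistent, stay "b" has a mismatch, stay "c" is ongoing
placements <- data.frame(
  anonymized_identifier = c("a", "a", "b", "b", "c"),
  stay_book_in_date_time = utc(c("2012-03-23 03:55:00", "2012-03-23 03:55:00",
                                 "2015-01-01 09:00:00", "2015-01-01 09:00:00",
                                 "2023-12-01 08:00:00")),
  detention_book_in_date_and_time = utc(c("2012-03-23 03:55:00", "2012-03-23 14:46:00",
                                          "2015-01-01 09:00:00", "2015-01-02 09:00:00",
                                          "2023-12-01 08:00:00")),
  detention_book_out_date_time = utc(c("2012-03-23 14:30:00", "2012-04-10 15:56:00",
                                       "2015-01-02 08:00:00", "2015-01-10 12:00:00",
                                       NA)),
  detention_release_reason = c("Transferred", "Removed", "Transferred", "Transferred", NA),
  stay_release_reason = c("Removed", "Removed", "Removed", "Removed", NA),
  stringsAsFactors = FALSE
)

# Sort so that the last row of each stay is its final placement
placements <- placements[order(placements$anonymized_identifier,
                               placements$stay_book_in_date_time,
                               placements$detention_book_in_date_and_time), ]
stay_key <- paste(placements$anonymized_identifier,
                  format(placements$stay_book_in_date_time, "%Y-%m-%d %H:%M:%S"))
final <- placements[!duplicated(stay_key, fromLast = TRUE), ]

# A placement is completed when both book-out fields are present
final$completed <- !is.na(final$detention_book_out_date_time) &
  !is.na(final$detention_release_reason)
# NA here means the stay is still ongoing
final$reason_matches <- final$detention_release_reason == final$stay_release_reason

final[, c("anonymized_identifier", "completed", "reason_matches")]
```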
See below for select fields of a representative bloc of detention placement records pertaining to a single completed individual detention stay:
sample_cols <- c("stay_book_in_date_time",
"detention_book_in_date_and_time", # Note irregular field name format
"detention_book_out_date_time",
"stay_book_out_date_time",
"citizenship_country",
"gender",
"birth_year",
"detention_facility",
"area_of_responsibility",
"detention_release_reason",
"stay_release_reason",
"anonymized_identifier")
example <- df[df$anonymized_identifier == "0000005694c4fd9ebcc71c6a54fca1cdfc516c36", sample_cols]
knitr::kable(example, row.names = FALSE, format = "html") %>%
kableExtra::scroll_box(width = "100%", height = "300px")
stay_book_in_date_time | detention_book_in_date_and_time | detention_book_out_date_time | stay_book_out_date_time | citizenship_country | gender | birth_year | detention_facility | area_of_responsibility | detention_release_reason | stay_release_reason | anonymized_identifier |
---|---|---|---|---|---|---|---|---|---|---|---|
2012-03-23 03:55:00 | 2012-03-23 03:55:00 | 2012-03-23 14:30:00 | 2012-04-10 15:56:00 | MEXICO | Male | 1983 | LOS CUST CASE | Los Angeles Area of Responsibility | Transferred | Removed | 0000005694c4fd9ebcc71c6a54fca1cdfc516c36 |
2012-03-23 03:55:00 | 2012-03-23 14:46:00 | 2012-03-26 14:15:00 | 2012-04-10 15:56:00 | MEXICO | Male | 1983 | THEO LACY FACILITY | Los Angeles Area of Responsibility | Transferred | Removed | 0000005694c4fd9ebcc71c6a54fca1cdfc516c36 |
2012-03-23 03:55:00 | 2012-03-26 14:16:00 | 2012-04-02 18:06:00 | 2012-04-10 15:56:00 | MEXICO | Male | 1983 | WESTERN MEDICAL CENTER | Los Angeles Area of Responsibility | Transferred | Removed | 0000005694c4fd9ebcc71c6a54fca1cdfc516c36 |
2012-03-23 03:55:00 | 2012-04-02 18:06:00 | 2012-04-10 15:13:00 | 2012-04-10 15:56:00 | MEXICO | Male | 1983 | THEO LACY FACILITY | Los Angeles Area of Responsibility | Transferred | Removed | 0000005694c4fd9ebcc71c6a54fca1cdfc516c36 |
2012-03-23 03:55:00 | 2012-04-10 15:14:00 | 2012-04-10 15:56:00 | 2012-04-10 15:56:00 | MEXICO | Male | 1983 | SANTA ANA DRO HOLDROOM | Los Angeles Area of Responsibility | Removed | Removed | 0000005694c4fd9ebcc71c6a54fca1cdfc516c36 |
Detention placements that were ongoing when the data were produced on
January 4, 2024 are missing `detention_book_out_date_time` and
`detention_release_reason` values; the associated blocs of records are
missing `stay_book_out_date_time` and `stay_release_reason` values. See
the example below of an individual with three detention stays, the third
of which was ongoing at the time the data were released:
example <- df[df$anonymized_identifier == "8ede14ebf032fcf92ae3e43675e18a6713af2fac", sample_cols]
knitr::kable(example, row.names = FALSE, format = "html") %>%
kableExtra::scroll_box(width = "100%", height = "300px")
stay_book_in_date_time | detention_book_in_date_and_time | detention_book_out_date_time | stay_book_out_date_time | citizenship_country | gender | birth_year | detention_facility | area_of_responsibility | detention_release_reason | stay_release_reason | anonymized_identifier |
---|---|---|---|---|---|---|---|---|---|---|---|
2018-02-21 15:56:00 | 2018-02-21 15:56:00 | 2018-03-22 14:06:00 | 2018-03-22 14:06:00 | MEXICO | Female | 1996 | BUFFALO SPC | Buffalo Area of Responsibility | U.S. Marshals or other agency (explain in Detention Comments) | U.S. Marshals or other agency (explain in Detention Comments) | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
2018-06-27 15:00:00 | 2018-06-27 15:00:00 | 2018-08-14 07:30:00 | 2018-08-14 07:30:00 | MEXICO | Female | 1996 | BUFFALO SPC | Buffalo Area of Responsibility | U.S. Marshals or other agency (explain in Detention Comments) | U.S. Marshals or other agency (explain in Detention Comments) | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
2018-08-21 16:00:00 | 2018-08-21 16:00:00 | 2019-05-13 17:23:00 | NA | MEXICO | Female | 1996 | BUFFALO SPC | Buffalo Area of Responsibility | Transferred | NA | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
2018-08-21 16:00:00 | 2019-05-13 19:00:00 | 2019-05-20 17:56:00 | NA | MEXICO | Female | 1996 | CENTRAL LOUISIANA ICE PROC CTR | New Orleans Area of Responsibility | Transferred | NA | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
2018-08-21 16:00:00 | 2019-05-20 18:59:00 | 2019-05-21 07:00:00 | NA | MEXICO | Female | 1996 | YORK COUNTY JAIL, PA | Philadelphia Area of Responsibility | Transferred | NA | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
2018-08-21 16:00:00 | 2019-05-21 13:15:00 | 2020-12-11 07:56:00 | NA | MEXICO | Female | 1996 | BUFFALO SPC | Buffalo Area of Responsibility | Transferred | NA | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
2018-08-21 16:00:00 | 2020-12-11 14:39:00 | 2021-01-28 15:00:00 | NA | MEXICO | Female | 1996 | RENSSELAER CO SHERIFF | New York City Area of Responsibility | Transferred | NA | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
2018-08-21 16:00:00 | 2021-01-28 17:15:00 | 2021-02-24 08:00:00 | NA | MEXICO | Female | 1996 | BUFFALO SPC | Buffalo Area of Responsibility | Transferred | NA | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
2018-08-21 16:00:00 | 2021-02-24 14:47:00 | 2022-08-11 17:14:00 | NA | MEXICO | Female | 1996 | RENSSELAER CO SHERIFF | New York City Area of Responsibility | Transferred | NA | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
2018-08-21 16:00:00 | 2022-08-11 23:00:00 | NA | NA | MEXICO | Female | 1996 | ELIZABETH CONTRACT D.F. | Newark Area of Responsibility | NA | NA | 8ede14ebf032fcf92ae3e43675e18a6713af2fac |
Note that in some cases, `stay_book_in_date_time` and
`stay_book_out_date_time` values do not match the corresponding
`detention_book_in_date_and_time` or `detention_book_out_date_time`
values of the first or final placement. This is especially prevalent in
records with initial detention book-in dates during FY 2011 or prior,
but is present in all years. It is not clear what accounts for this
discrepancy, which may result in under-counting of the detained
population. See for example:
example <- df[df$anonymized_identifier == "0bafe4b223d4a1afe8e4bb95821f00e9183e588b", sample_cols]
knitr::kable(example, row.names = FALSE, format = "html") %>%
kableExtra::scroll_box(width = "100%", height = "300px")
stay_book_in_date_time | detention_book_in_date_and_time | detention_book_out_date_time | stay_book_out_date_time | citizenship_country | gender | birth_year | detention_facility | area_of_responsibility | detention_release_reason | stay_release_reason | anonymized_identifier |
---|---|---|---|---|---|---|---|---|---|---|---|
1995-08-31 00:01:00 | 2010-04-14 08:01:00 | 2017-08-24 17:54:00 | 2017-08-31 11:30:00 | CUBA | Male | 1952 | TERRE HAUTE USP | Chicago Area of Responsibility | Transferred | Removed | 0bafe4b223d4a1afe8e4bb95821f00e9183e588b |
1995-08-31 00:01:00 | 2017-08-24 18:04:00 | 2017-08-24 18:07:00 | 2017-08-31 11:30:00 | CUBA | Male | 1952 | PULASKI COUNTY JAIL | Chicago Area of Responsibility | Transferred | Removed | 0bafe4b223d4a1afe8e4bb95821f00e9183e588b |
1995-08-31 00:01:00 | 2017-08-24 20:00:00 | 2017-08-25 06:48:00 | 2017-08-31 11:30:00 | CUBA | Male | 1952 | STEWART DETENTION CENTER | Atlanta Area of Responsibility | Transferred | Removed | 0bafe4b223d4a1afe8e4bb95821f00e9183e588b |
1995-08-31 00:01:00 | 2017-08-26 00:01:00 | 2017-08-26 00:01:00 | 2017-08-31 11:30:00 | CUBA | Male | 1952 | KROME NORTH SPC | Miami Area of Responsibility | Transferred | Removed | 0bafe4b223d4a1afe8e4bb95821f00e9183e588b |
1995-08-31 00:01:00 | 2017-08-26 00:15:00 | 2017-08-31 06:45:00 | 2017-08-31 11:30:00 | CUBA | Male | 1952 | KROME NORTH SPC | Miami Area of Responsibility | Transferred | Removed | 0bafe4b223d4a1afe8e4bb95821f00e9183e588b |
1995-08-31 00:01:00 | 2017-08-31 07:00:00 | 2017-08-31 11:30:00 | 2017-08-31 11:30:00 | CUBA | Male | 1952 | MIAMI STAGING FACILITY | Miami Area of Responsibility | Removed | Removed | 0bafe4b223d4a1afe8e4bb95821f00e9183e588b |
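Stays affected by this kind of discrepancy can be flagged by comparing each stay's `stay_book_in_date_time` with the earliest `detention_book_in_date_and_time` among its placements. The following base R sketch uses toy data with timestamps taken from the two examples above; the short identifiers "x" and "y", the `utc()` helper, and the `book_in_gap_days` column are hypothetical.

```r
# Helper to build UTC timestamps (hypothetical, for illustration only)
utc <- function(x) as.POSIXct(x, tz = "UTC")

placements <- data.frame(
  anonymized_identifier = c("x", "x", "y", "y"),
  stay_book_in_date_time = utc(c("2012-03-23 03:55:00", "2012-03-23 03:55:00",
                                 "1995-08-31 00:01:00", "1995-08-31 00:01:00")),
  detention_book_in_date_and_time = utc(c("2012-03-23 03:55:00", "2012-03-23 14:46:00",
                                          "2010-04-14 08:01:00", "2017-08-24 18:04:00")),
  stringsAsFactors = FALSE
)

stay_key <- paste(placements$anonymized_identifier,
                  format(placements$stay_book_in_date_time, "%Y-%m-%d %H:%M:%S"))

# Earliest placement book-in per stay, as seconds since the epoch
first_in <- ave(as.numeric(placements$detention_book_in_date_and_time),
                stay_key, FUN = min)

# Gap in days between the stay book-in and the first recorded placement book-in
placements$book_in_gap_days <-
  (first_in - as.numeric(placements$stay_book_in_date_time)) / 86400

# One row per stay whose first placement does not start at the stay book-in
mismatched <- unique(placements[placements$book_in_gap_days != 0,
                                c("anonymized_identifier",
                                  "stay_book_in_date_time",
                                  "book_in_gap_days")])
mismatched
```

Only stay "y" is flagged, with a gap of more than a decade between the stay book-in and the first placement record.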
When performing analysis of this dataset, it is important to consider whether the unit of interest is at the level of unique individuals, placements, or stays.
The full dataset, `unique-stays/output/ice_detentions_fy12-24ytd.feather`,
can be used to examine all placement records pertaining to a given
individual, demographic characteristic, or specific detention facility.
Note that results for calculations such as “average length of stay” will
be incorrect if analysis is not restricted to one representative record
per individual per stay.
Characteristics per detention stay can be analyzed by selecting distinct
combinations of `anonymized_identifier` and `stay_book_in_date_time`
values, or simply by using distinct `stayid` values. Additional analysis
fields generated in `unique-stays/src/unique-stays.R` can be used to
select representative records for each stay based on specific
characteristics, such as the `longest_placement` per stay or
`last_placement` per stay. To select the first placement per stay,
filter the dataset to select records where `placement_count == 1`.
dat <- df %>%
distinct(stayid, .keep_all = TRUE) %>%
mutate(bookin_fy = substr(quarter(stay_book_in_date_time, type="year.quarter", fiscal_start=10), 1, 4)) %>%
# `bookin_fy` is a character string, so compare against strings explicitly
filter(bookin_fy > "2011",
bookin_fy < "2024") %>%
group_by(bookin_fy) %>%
summarize(alos = mean(as.numeric(stay_length, units='days'), na.rm=TRUE))
p1 <- dat %>%
ggplot(aes(x = bookin_fy, y = alos)) +
geom_col() +
xlab("Initial book-in FY") +
ylab("Average length of stay (days)") +
labs(title = "Average length of stay, FY2012-2023")
p1
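The selection of one representative record per stay can also be sketched without the precomputed fields. The following base R example derives a per-stay sequence number and the longest placement on toy data. The `placement_seq` and `placement_days` columns and the `utc()` helper are hypothetical names chosen for illustration, and the fields generated by `unique-stays/src/unique-stays.R` may be defined differently.

```r
# Helper to build UTC timestamps (hypothetical, for illustration only)
utc <- function(x) as.POSIXct(x, tz = "UTC")

# Toy records: stay "s1" has three placements, stay "s2" has one
placements <- data.frame(
  stayid = c("s1", "s1", "s1", "s2"),
  detention_facility = c("LOS CUST CASE", "THEO LACY FACILITY",
                         "WESTERN MEDICAL CENTER", "BUFFALO SPC"),
  detention_book_in_date_and_time = utc(c("2012-03-23 03:55:00", "2012-03-23 14:46:00",
                                          "2012-03-26 14:16:00", "2018-02-21 15:56:00")),
  detention_book_out_date_time = utc(c("2012-03-23 14:30:00", "2012-03-26 14:15:00",
                                       "2012-04-02 18:06:00", "2018-03-22 14:06:00")),
  stringsAsFactors = FALSE
)

# Sort placements chronologically within each stay
placements <- placements[order(placements$stayid,
                               placements$detention_book_in_date_and_time), ]

# Length of each placement in days
placements$placement_days <- as.numeric(difftime(
  placements$detention_book_out_date_time,
  placements$detention_book_in_date_and_time,
  units = "days"
))

# Running placement number within each stay (1 = first placement)
placements$placement_seq <- ave(seq_len(nrow(placements)),
                                placements$stayid, FUN = seq_along)
first_per_stay <- placements[placements$placement_seq == 1, ]

# Longest placement within each stay (ties would return more than one row)
is_longest <- placements$placement_days ==
  ave(placements$placement_days, placements$stayid, FUN = max)
longest_per_stay <- placements[is_longest, ]

longest_per_stay[, c("stayid", "detention_facility", "placement_days")]
```

Either subset contains one row per stay in this example, so stay-level summaries computed from it count each stay once.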
An obvious use of this dataset is to calculate the detained population
over time, either in total or based on given characteristics such as
population demographics or location. Because of the structure of the
dataset, this can be done by conditionally indexing records to select
in-range detention placements for a given date or timeline; e.g.
`df[df$detention_book_in_date_and_time <= date & df$detention_book_out_date_time > date,]`.
(Note that this indexing will fail to select ongoing detention
placements missing `detention_book_out_date_time` unless this is
accounted for in some way; we do this by setting a temporary
`detention_book_out_date_time_min` value equal to the max date in the
timeline. This indexing may also miss placements shorter than 24 hours
that begin and end between two consecutive points in the timeline.)
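The behavior of this indexing, including both caveats, can be seen on a small toy example. The base R sketch below uses hypothetical timestamps and a hypothetical `count_at()` helper; it evaluates the count once per day at 01:00 UTC.

```r
# Helper to build UTC timestamps (hypothetical, for illustration only)
utc <- function(x) as.POSIXct(x, tz = "UTC")

# Toy placements: row 2 lasts under 24 hours, row 3 is ongoing (no book-out)
placements <- data.frame(
  detention_book_in_date_and_time = utc(c("2023-12-30 10:00:00", "2024-01-01 02:00:00",
                                          "2024-01-01 20:00:00", "2024-01-02 12:00:00")),
  detention_book_out_date_time = utc(c("2024-01-02 09:00:00", "2024-01-01 18:00:00",
                                       NA, "2024-01-03 12:00:00"))
)

# Fill missing book-out values with the end of the timeline
timeline_end <- utc("2024-01-04 01:00:00")
placements$detention_book_out_date_time_min <- placements$detention_book_out_date_time
ongoing <- is.na(placements$detention_book_out_date_time_min)
placements$detention_book_out_date_time_min[ongoing] <- timeline_end

# Count placements that are in range at a single point in time
count_at <- function(date, df) {
  sum(df$detention_book_in_date_and_time <= date &
        df$detention_book_out_date_time_min > date)
}

# Evaluate daily at 01:00 UTC, stopping one day before `timeline_end`
timeline <- seq(utc("2023-12-31 01:00:00"), utc("2024-01-03 01:00:00"), by = "day")
counts <- vapply(seq_along(timeline),
                 function(i) count_at(timeline[i], placements),
                 integer(1))
data.frame(date = timeline, n = counts)
```

The placement in row 2 begins and ends between two daily evaluation points, so it never contributes to the count. Because the comparison on the book-out side is strict, an ongoing placement is only counted on dates earlier than its fill value.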
We have implemented a basic script for this operation at
`headcount/src/headcount.R`, which by default outputs daily detained
population per detention facility at 01:00 UTC for the time period
covered by the dataset as `headcount/output/headcount_fy12-24ytd.csv.gz`.
Parameters of this script, including arbitrary grouping variables, can
be set in the file `headcount/Makefile`. (At least one grouping variable
is required by the current implementation; if this is not desired, it
should be trivial to modify the script.)
Results of these calculations should be regarded as approximations of the true detained population.[2] Note that the full detained population is not reached until FY 2012, per the scope of the FOIA request that generated this dataset:
headcount <- read_delim(here::here('write', 'input', 'headcount_fy12-24ytd.csv.gz'), delim='|')
headcount <- headcount %>%
mutate(date = as.Date(date))
# Note data is grouped by `date` and `detention_facility_code`, so we have to
# sum total population by date
p1 <- headcount %>%
# filter(date >= '2011-10-01') %>%
group_by(date) %>%
summarize(total_pop = sum(n, na.rm=TRUE)) %>%
ggplot(aes(x = date, y = total_pop)) +
geom_line() +
ylim(0, NA) +
labs(title = "Nationwide daily U.S. ICE detention population") +
ylab("Total population") +
xlab("Date")
p1
We can also filter the default headcount dataset to display population counts for a given facility or subset of facilities:
# `group_by` and `summarize` not necessary but good practice for consistency
p2 <- headcount %>%
filter(date >= '2011-10-01',
detention_facility_code == "CSCNWWA") %>%
group_by(date) %>%
summarize(total_pop = sum(n, na.rm=TRUE)) %>%
ggplot(aes(x = date, y = total_pop)) +
geom_line() +
ylim(0, NA) +
labs(title = "Daily ICE detention population",
subtitle = "NW ICE Processing Center (Tacoma, WA)") +
ylab("Total population") +
xlab("Date")
p2
For more limited subsets of data, this can be run inline:
# Transform data to "fill in" missing `detention_book_out_date_time` values
# in order to account for ongoing detention stays at time of release of data
max_date <- max(df$stay_book_out_date_time, na.rm=TRUE)
df <- df %>%
mutate(detention_book_out_date_time_min =
case_when(is.na(detention_book_out_date_time) ~ max_date,
TRUE ~ detention_book_out_date_time))
# Define timeline for calculation of daily detained population
timeline_start <- min(df$stay_book_in_date_time, na.rm=TRUE)
timeline_end <- max(df$stay_book_out_date_time, na.rm=TRUE)
timeline <- seq(timeline_start, timeline_end, by='day')
# Function counts all in-range detention placement records in dataset `df` for a
# given `date`, grouped by one or more grouping variables `group_vars`
headcounter <- function(date, df, group_vars) {
in_range <- df[df$detention_book_in_date_and_time <= date & df$detention_book_out_date_time_min > date,]
in_range %>%
group_by(across(all_of(group_vars))) %>%
summarize(n = n()) %>%
complete(fill = list(n = 0)) %>%
mutate(date=date)
}
# Generate limited sample dataset
temp_df <- df %>%
filter(citizenship_country == "ANGOLA")
# Apply function to timeline
example_headcount <- lapply(timeline, headcounter, df=temp_df, group_vars=c('gender'))
# Transform output into data frame
example_headcount_data <- bind_rows(example_headcount)
# Plot headcount
p1 <- example_headcount_data %>%
filter(date >= "2010-10-01") %>%
ggplot(aes(x = date, y = n, fill = gender) ) +
geom_area() +
labs(title = "Daily detained population by `gender`",
subtitle = "`citizenship_country` == 'ANGOLA'")
p1
Daily population counts can be used to calculate figures such as “average daily population” (ADP) per month or year.
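As a sketch of that calculation, the base R example below sums toy daily counts across facilities and then averages the daily totals within each calendar month. The facility codes and counts are hypothetical; only the column names `date`, `detention_facility_code`, and `n` follow the headcount output used above.

```r
# Toy daily headcount with one row per date per facility (hypothetical values)
dates <- seq(as.Date("2023-01-30"), as.Date("2023-02-02"), by = "day")
headcount <- data.frame(
  date = rep(dates, each = 2),
  detention_facility_code = rep(c("F1", "F2"), times = length(dates)),
  n = c(6, 4, 7, 5, 12, 8, 13, 9)
)

# Step 1: total detained population per day, summed across facilities
daily_total <- tapply(headcount$n, format(headcount$date, "%Y-%m-%d"), sum)

# Step 2: average of the daily totals within each calendar month
month <- substr(names(daily_total), 1, 7)
adp <- tapply(as.numeric(daily_total), month, mean)
adp
```

Summing across facilities before averaging matters: averaging the per-facility rows directly would give the mean population per facility rather than the ADP for the whole system.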
Note that the datasets released by ICE contain minimal information on
detention facility characteristics; this is limited to the fields
`detention_facility_code` (also referred to as “DETLOC” in other ICE
data sources), `detention_facility` (full facility name),
`area_of_responsibility` (ICE area of responsibility where the facility
is located), and `docket_control_office` (ICE docket control office
responsible for the facility).
Detention facilities represented in this dataset include dedicated ICE detention facilities, ICE hold rooms, jails and prisons contracted by ICE, medical facilities, hotels, etc. These differ in important characteristics that are not represented here and must be inferred or joined from other sources. For example, some ICE facilities can only hold detained people for 72 hours or less, an important factor to control for when comparing placement or stay lengths between facilities.
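Joining facility characteristics from an outside source might look like the following base R sketch. Everything here apart from the `detention_facility_code` column name is hypothetical: the facility codes, the `facility_types` lookup table, and its `short_term_hold` flag are invented for illustration and do not come from ICE data.

```r
# Toy placement lengths by facility (hypothetical codes and values)
placements <- data.frame(
  detention_facility_code = c("HOLD1", "HOLD1", "JAIL1", "JAIL1"),
  placement_days = c(0.1, 0.5, 12, 30),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table compiled from another source
facility_types <- data.frame(
  detention_facility_code = c("HOLD1", "JAIL1"),
  short_term_hold = c(TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Left join: keep every placement, even if the facility is missing from the lookup
joined <- merge(placements, facility_types,
                by = "detention_facility_code", all.x = TRUE)

# Compare placement lengths within facility type rather than across all facilities
mean_by_type <- tapply(joined$placement_days, joined$short_term_hold, mean)
mean_by_type
```

Comparing within facility type avoids attributing short placements to a facility's practices when they reflect a limit on how long it can hold people.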
[1] For counts of dropped records, see `ice-detain/concat/output/concat.log`.
[2] A recent Government Accountability Office (GAO) report indicates that detention population figures reported by ICE are lower than figures obtained by independent calculations that apparently use a version of the same dataset analyzed here. See U.S. Government Accountability Office, “Immigration Enforcement: Arrests, Removals, and Detentions Varied Over Time and ICE Should Strengthen Data Reporting”, GAO-24-106233, July 23, 2024, https://www.gao.gov/products/gao-24-106233