Data overview: Arrests

This notebook presents a national overview of arrest data from U.S. Immigration and Customs Enforcement (ICE) Enforcement and Removal Operations (ERO) Law Enforcement Systems and Analysis Division (LESA), drawn from ICE’s Integrated Decision Support (IIDS) database. The data cover nationwide arrests from October 1, 2011, through January 29, 2023 (all of U.S. Government Fiscal Years 2012 through 2022, plus the first months of FY 2023), and were obtained by the University of Washington Center for Human Rights (UWCHR) pursuant to FOIA request 2022-ICFO-09023.

For data and code used to generate this notebook, see: https://github.com/UWCHR/ice-enforce

options(scipen = 1000000)

library(pacman)

p_load(here, tidyverse, zoo, lubridate, ggplot2, plotly, gghighlight, viridis)

arr <- read_delim(here('write', 'input', 'ice_arrests_fy12-23ytd.csv.gz'), delim='|',
                  col_types = cols(aor = col_factor(),
                                  arrest_date = col_date(format="%m/%d/%Y"),
                                  departed_date = col_date(format="%m/%d/%Y"),
                                  apprehension_landmark = col_factor(),
                                  arrest_method = col_factor(),
                                  operation = col_factor(),
                                  processing_disposition = col_factor(),
                                  citizenship_country = col_factor(),
                                  gender = col_factor(),
                                  case_closed_date = col_date(format="%m/%d/%Y"),
                                  id = col_integer(),
                                  hashid = col_character()
                                  )) 

redacted <- c('removal_threat_level', 'apprehension_threat_level', 'alien_file_number')
redacted_text <- paste0('`', paste(unlist(redacted), collapse = '`, `'), '`')

arr <- arr %>% 
  dplyr::select(-all_of(redacted))

cy_months <- c("Jan","Feb","Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
fy_months <- c("Oct", "Nov", "Dec", "Jan","Feb","Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep")

arr <- arr %>% 
  mutate(aor = factor(aor, levels = sort(levels(arr$aor))),
         arrest_date = as_date(arrest_date, format="%m/%d/%Y"),
         year = year(arrest_date),
         month = factor(month(arrest_date, label=TRUE, abbr=TRUE), levels = fy_months),
         year_mth = zoo::as.yearmon(arrest_date),
         fy = as.factor(substr(quarter(arrest_date, fiscal_start=10, type="year.quarter"), 1,4)),
         gender = toupper(gender),
         operation = toupper(operation),
         processing_disposition = toupper(processing_disposition),
         citizenship_country = factor(toupper(citizenship_country)),
         apprehension_landmark = toupper(str_squish(apprehension_landmark)))

An administrative arrest (“arrest”) occurs when an individual is taken into custody by ICE and removal proceedings are initiated against them.1

The arrests dataset (arr) includes 1741174 observations of 17 variables; 3 fully redacted fields (removal_threat_level, apprehension_threat_level, alien_file_number) are dropped from analysis.

The following provides a summary of dataset characteristics via skimr::skim(arr):

skimr::skim(arr)
Data summary
Name arr
Number of rows 1741174
Number of columns 17
_______________________
Column type frequency:
character 7
Date 3
factor 5
numeric 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
apprehension_landmark 55999 0.97 1 80 0 10536 0
operation 1227173 0.30 1 79 0 512 0
processing_disposition 3526 1.00 5 44 0 50 0
gender 0 1.00 4 7 0 3 0
hashid 0 1.00 40 40 0 1741174 0
area_of_responsibility 16414 0.99 25 37 0 26 0
year_mth 0 1.00 4 16 0 136 0

Variable type: Date

skim_variable n_missing complete_rate min max median n_unique
arrest_date 0 1.00 2011-10-01 2023-01-29 2016-05-18 4139
departed_date 879760 0.49 1982-07-15 2023-01-27 2016-01-08 4174
case_closed_date 1297394 0.25 1989-05-11 2023-01-28 2018-09-21 2712

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
aor 16414 0.99 FALSE 25 DAL: 156735, SNA: 153590, HOU: 140274, ATL: 134318
arrest_method 0 1.00 FALSE 28 CAP: 715306, Non: 262757, CAP: 254005, Loc: 175016
citizenship_country 20 1.00 FALSE 221 MEX: 990668, GUA: 154021, HON: 143928, EL : 101398
month 0 1.00 TRUE 12 Oct: 170676, Nov: 156697, Jan: 151794, Mar: 147540
fy 0 1.00 FALSE 12 201: 265573, 201: 232287, 201: 183703, 201: 158581

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
id 0 1 870586.50 502633.78 0 435293.2 870586.5 1305880 1741173 ▇▇▇▇▇
year 0 1 2016.15 3.45 2011 2013.0 2016.0 2019 2023 ▇▅▆▃▃

Field definitions

The datasets were released without a data dictionary or field descriptions. Where the meaning of a field is not self-explanatory, we have attempted to cite relevant sources that provide context.

Original dataset fields

  • aor: ICE Area of Responsibility associated with arrest
  • arrest_date: Date of arrest
  • departed_date: Date of departure
  • case_closed_date: Date of closure of case
  • arrest_method: ICE ERO division or category associated with arrest
  • apprehension_landmark: Landmark or entity associated with arrest
  • operation: Operation associated with arrest
  • processing_disposition: Status of removal proceedings associated with event
  • citizenship_country: Country of citizenship of arrested individual
  • gender: Gender of arrested individual
  • apprehension_threat_level: Fully redacted in original dataset
  • removal_threat_level: Fully redacted in original dataset
  • alien_file_number: Unique individual identifier for arrested individual, fully redacted in original dataset

Additional fields created by UWCHR

  • id: Sequential record identifier (not individual identifier)
  • hashid: Unique record hash (not individual identifier)
  • year: Calendar year derived from arrest_date
  • month: Abbreviated month derived from arrest_date
  • year_mth: Calendar year and month derived from arrest_date
  • fy: U.S. government fiscal year (Oct.-Sept.) derived from arrest_date
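The fy field names each fiscal year for the calendar year in which it ends, so an arrest on October 1, 2011 falls in FY 2012. The notebook derives this field with lubridate::quarter(); the base R sketch below is only an illustration of the same convention, and the helper name fiscal_year is hypothetical rather than part of the notebook.

```r
# Hypothetical helper (not part of the notebook): U.S. government fiscal year
# for a Date vector. Fiscal years run October through September and are named
# for the calendar year in which they end.
fiscal_year <- function(d) {
  yr <- as.integer(format(d, "%Y"))
  mth <- as.integer(format(d, "%m"))
  # October, November and December belong to the following fiscal year
  yr + as.integer(mth >= 10)
}

# Example dates at the boundaries of the period covered by the dataset
dates <- as.Date(c("2011-09-30", "2011-10-01", "2022-09-30", "2023-01-29"))
fy <- fiscal_year(dates)
print(data.frame(date = dates, fy = fy))
```

Under this convention the arrests from October 1, 2022, through January 29, 2023, belong to a partial FY 2023, which is why the charts below filter on arrest_date <= "2022-09-30".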

Total arrests

p1 <- arr %>%
  filter(arrest_date >= "2011-10-01",
         arrest_date <= "2022-09-30") %>% 
  group_by(fy) %>% 
  summarize(n = n()) %>% 
  ggplot(aes(x = as.factor(fy), y=n)) +
  geom_col() +
  labs(title = "Total ICE arrests per FY") +
  theme_minimal()

p1

p2 <- arr %>%
  filter(arrest_date >= "2011-10-01",
         arrest_date <= "2022-09-30") %>% 
  group_by(year_mth) %>%
  summarize(n = n()) %>%
  ggplot(aes(x = year_mth, y = n)) +
  geom_line(aes(group=1)) +
  ylim(0, NA) +
  labs(title = "Total nationwide ICE arrests per month") +
  theme_minimal()

p2

p3 <- arr %>%
  filter(arrest_date >= "2011-10-01",
         arrest_date <= "2022-09-30") %>% 
  group_by(fy, month) %>%
  summarize(n = n()) %>%
  ggplot(aes(x = month, y = n, color = fy, group = fy)) +
  geom_line() +
  ylim(0, NA) +
  scale_color_viridis_d() +
  labs(title = "Total nationwide ICE arrests per month") +
  theme_minimal()

p3

Basic demographics

Gender

The proportion of arrests involving females has increased since FY 2021:

# arr %>%
#   mutate(gender = toupper(gender)) %>% 
#   group_by(gender) %>% 
#   summarize(n = n())

p1 <- arr %>% 
  filter(arrest_date >= "2011-10-01",
         arrest_date <= "2022-09-30") %>% 
  count(fy, gender) %>% 
  ggplot(aes(x=fy, y=n, fill=gender)) +
  geom_col(position='fill') +
  scale_y_continuous(labels = scales::percent) +
  labs(title="Total ICE arrests, % by gender") +
  theme_minimal()

p1
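The chart above uses position='fill', which rescales each fiscal year's bar to 100% so that only the shares are visible. As a rough sketch of the same calculation in tabular form, the base R example below divides each count by its fiscal-year total. The counts are hypothetical and are not taken from the dataset.

```r
# Illustrative counts only (hypothetical, not taken from the dataset)
counts <- data.frame(
  fy = c("2021", "2021", "2022", "2022"),
  gender = c("FEMALE", "MALE", "FEMALE", "MALE"),
  n = c(20, 80, 30, 70)
)

# Share of each gender within its fiscal year: n divided by the FY total.
# ave() returns the group sum repeated for every row of the group.
counts$share <- counts$n / ave(counts$n, counts$fy, FUN = sum)
print(counts)
```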

Country of citizenship

The composition of arrests by country of citizenship changes over time: arrests of citizens of Mexico, Guatemala, and El Salvador decrease, while arrests of citizens of Venezuela, Colombia, and Nicaragua increase.

cit <- arr %>%
  filter(arrest_date >= "2011-10-01",
         arrest_date <= "2022-09-30") %>% 
  mutate(citizenship_country = toupper(citizenship_country)) %>% 
  group_by(citizenship_country) %>% 
  summarize(n = n()) %>% 
  arrange(desc(n))

p1 <- arr %>% 
  filter(arrest_date >= "2011-10-01",
         arrest_date <= "2022-09-30") %>% 
  mutate(citizenship_country =
           case_when(citizenship_country %in%
                       head(cit$citizenship_country, 15) ~
                       citizenship_country,
                     TRUE ~
                       "ALL OTHERS"
  )) %>% 
  count(fy, citizenship_country) %>% 
  ggplot(aes(x=fy, y=n, fill=citizenship_country, color=citizenship_country)) +
  geom_col() +
  labs(title = "Total ICE arrests by country of citizenship (top 15)") +
  theme_minimal()

ggplotly(p1)

cit_rank <- arr %>% 
  filter(arrest_date >= "2011-10-01",
         arrest_date <= "2022-09-30") %>% 
  count(fy, citizenship_country) %>% 
  arrange(fy, desc(n), citizenship_country) %>% 
  group_by(fy) %>% 
  mutate(ranking = row_number())

p1 <- cit_rank %>%
  filter(ranking <= 10) %>% 
  ggplot(aes(x = fy, y = ranking, group = citizenship_country)) +
  geom_line(aes(color = citizenship_country), linewidth = 1) +
  geom_point(aes(color = citizenship_country), size = 2) +
  scale_y_reverse(breaks = seq(1,10)) +
  labs(title = "Ranked country of citizenship for ICE arrests") +
  theme_minimal()

ggplotly(p1)

Total arrests by AOR

Below is an interactive chart of total ICE arrests per FY by AOR:

p1 <- arr %>%
  filter(arrest_date >= "2011-10-01",
         arrest_date <= "2022-09-30") %>%   
  group_by(fy, aor) %>% 
  summarize(n = n()) %>% 
  ggplot(aes(x = as.factor(fy), y=n, color=aor, group=aor)) +
  geom_line() +
  labs(title = "Total ICE arrests per FY by AOR") +
  theme_minimal()

ggplotly(p1)

Percent change

Percent change in arrests per FY nationally and by AOR.
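Percent change is computed here as each fiscal year's count divided by the previous year's count, minus one, so the first year in each series has no value. A minimal sketch with hypothetical counts, using base R in place of the lag() call in the notebook code:

```r
# Hypothetical yearly counts, for illustration only
n <- c(200, 150, 180)
names(n) <- c("2012", "2013", "2014")

# Previous year's count for each year; the first year has none (NA),
# which mirrors what dplyr::lag(n) returns
prev <- c(NA, head(n, -1))

# Year-over-year change as a proportion of the previous year's count
pct_change <- n / prev - 1
print(round(pct_change, 3))
```

Because the denominator is the previous year's count, a small base can produce very large percentage swings, which is worth keeping in mind when reading the per-AOR facets.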

natl_pct_chg <- arr %>%
  filter(arrest_date >= "2011-10-01",
         arrest_date <= "2022-09-30") %>% 
  group_by(fy) %>%
  summarize(n = n()) %>% 
  mutate(pct_change = (n/lag(n) - 1))

p1 <- natl_pct_chg %>% 
  ggplot(aes(x = fy, y = pct_change)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(title="FY % change in total ICE arrests") +
  theme_minimal()

p1

aor_pct_chg <- arr %>%
  filter(arrest_date >= "2011-10-01",
         arrest_date <= "2022-09-30",
         !is.na(aor)) %>% 
  group_by(fy, aor) %>%
  summarize(n = n()) %>% 
  group_by(aor) %>% 
  arrange(fy, .by_group=TRUE) %>% 
  mutate(pct_change = (n/lag(n) - 1))

p2 <- aor_pct_chg %>% 
  ggplot(aes(x = fy, y = pct_change)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  scale_x_discrete(breaks=seq(2012, 2022, 4)) +
  facet_wrap(~aor)  +
  labs(title="FY % change in total ICE arrests per AOR") +
  # apply theme_minimal() first so it does not reset the axis text settings
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))

p2

Arrests by arrest_method

methods <- arr %>% 
  count(arrest_method) %>% 
  arrange(desc(n))

top_methods <- methods %>% 
  filter(n > 10000)

arr <- arr %>% 
  mutate(arrest_method_short =
           case_when(arrest_method %in%
                       unlist(top_methods$arrest_method) ~
                       as.character(arrest_method), 
                     TRUE ~ 
                       "All others"))

p1 <- arr %>% 
  filter(arrest_date >= "2011-10-01",
         arrest_date <= "2022-09-30") %>% 
  group_by(fy, arrest_method_short) %>%
  ggplot(aes(x = fy, fill=arrest_method_short)) +
  geom_bar(stat='count', position='stack') +
  theme_minimal()

ggplotly(p1)

method_pct_chg <- arr %>%
  filter(arrest_date >= "2011-10-01",
         arrest_date <= "2022-09-30",
         !is.na(aor)) %>% 
  group_by(fy, arrest_method_short) %>%
  summarize(n = n()) %>% 
  group_by(arrest_method_short) %>% 
  arrange(fy, .by_group=TRUE) %>% 
  mutate(pct_change = (n/lag(n) - 1))

p1 <- method_pct_chg %>% 
  ggplot(aes(x = fy, y = pct_change)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  scale_x_discrete(breaks=seq(2012, 2022, 4)) +
  facet_wrap(~arrest_method_short, scales='free_y', labeller = label_wrap_gen(width=20))  +
  labs(title="FY % change in total ICE arrests by arrest method") +
  theme_minimal()

p1

p2 <- arr %>% 
  mutate(fy = substr(as.character(fy), 3, 4)) %>% 
  filter(arrest_date >= "2011-10-01",
         arrest_date <= "2022-09-30",
         !is.na(aor),
         aor != "HQ") %>% 
  group_by(aor, fy, arrest_method_short) %>%
  ggplot(aes(x = fy, fill=arrest_method_short)) +
  geom_bar(stat='count') +
  scale_x_discrete(breaks=seq(12, 22, 4)) +
  facet_wrap(~aor) +
  labs(title="Total ICE arrests by arrest method per AOR") +
  theme_minimal()

ggplotly(p2)

# p3 <- arr %>%
#     filter(arrest_date >= "2011-10-01",
#          arrest_date <= "2022-09-30",
#          !is.na(aor),
#          aor != "HQ",
#          arrest_method_short == "ERO Reprocessed Arrest") %>% 
#   group_by(aor, fy, arrest_method_short) %>%
#   ggplot(aes(x = fy, fill=arrest_method_short)) +
#   geom_bar(stat='count') +
#   scale_x_discrete(breaks=seq(2012, 2022, 2)) +
#   facet_wrap(~aor) +
#   labs(title="Total ICE arrests by arrest method per AOR") +
#   theme(axis.text.x = element_text(angle = 90, vjust = 0, hjust=1))
# 
# p3

p1 <- arr %>% 
  filter(arrest_date >= "2011-10-01",
         arrest_date <= "2022-09-30") %>% 
  count(fy, gender, arrest_method_short) %>% 
  ggplot(aes(x=fy, y=n, fill=gender)) +
  geom_col(position='fill') +
  facet_wrap(~arrest_method_short, labeller = label_wrap_gen(width=20)) +
  scale_x_discrete(breaks=seq(2012, 2022, 4)) +
  scale_y_continuous(labels = scales::percent) +
  labs(title="Total ICE arrests, % by gender") +
  theme_minimal()

p1

Processing disposition

disps <- arr %>% 
  count(processing_disposition) %>% 
  arrange(desc(n))

top_disps <- disps %>% 
  filter(n > 10000)

arr <- arr %>% 
  mutate(disp_short = 
           case_when(processing_disposition %in%
                       unlist(top_disps$processing_disposition) ~
                       as.character(processing_disposition), 
                     TRUE ~
                       "ALL OTHERS"))

p1 <- arr %>%
  filter(arrest_date >= "2011-10-01",
         arrest_date <= "2022-09-30") %>% 
  group_by(fy, disp_short) %>% 
  summarize(n = n()) %>% 
  ggplot(aes(x = fy, y=n, fill=disp_short)) +
  geom_col() +
  labs(title = "Total ICE arrests per FY by processing disposition") +
  theme_minimal()

ggplotly(p1)

Apprehension Landmark

An overview of the most common apprehension_landmark values gives a sense of the diversity of this category, which includes 10537 unique values. Note that some general values likely denote the ICE sub-office or division responsible for the arrest rather than a precise location. Closer inspection is recommended at the AOR level.
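With this many distinct values, a chart is only readable if the most frequent values are kept and the remainder collapsed into a single category. The base R sketch below illustrates that step on hypothetical values. It is a simplified stand-in for the case_when() approach used in the notebook, and unlike that code it also places missing values in the catch-all category.

```r
# Hypothetical landmark values, for illustration only
x <- c("A", "A", "A", "B", "B", "C", "D", NA)

# Names of the two most frequent values (table() drops NA by default)
top_n_values <- names(sort(table(x), decreasing = TRUE))[1:2]

# Keep the top values and collapse everything else, including NA,
# into a single catch-all category
lumped <- ifelse(x %in% top_n_values, x, "ALL OTHERS")
print(table(lumped))
```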

landmarks_per_aor <- arr %>%
  group_by(aor) %>%
  summarize(n = n_distinct(apprehension_landmark))

landmarks <- arr %>% 
  count(apprehension_landmark) %>% 
  arrange(desc(n))

p1 <- arr %>% 
  filter(arrest_date >= "2011-10-01",
         arrest_date <= "2022-09-30") %>% 
  mutate(apprehension_landmark =
           case_when(apprehension_landmark %in%
                     head(landmarks$apprehension_landmark, 15) ~
                     as.character(apprehension_landmark), 
                   TRUE ~
                     "ALL OTHERS")) %>% 
  group_by(fy, apprehension_landmark) %>% 
  summarize(n = n()) %>% 
  ggplot(aes(x = fy, y=n, fill=apprehension_landmark)) +
  geom_col() +
  labs(title = "Total ICE arrests per FY by `apprehension_landmark` (top 15)") +
  theme_minimal()

ggplotly(p1)

  1. For discussion of ICE’s definition of “arrests”, see American Immigration Council, “Changing Patterns of Interior Immigration Enforcement in the United States, 2016 - 2018”, July 2019: https://www.americanimmigrationcouncil.org/research/interior-immigration-enforcement-united-states-2016-2018