Data overview: Removals

This notebook presents a national overview of U.S. Immigration and Customs Enforcement (ICE) Enforcement and Removal Operations (ERO) Law Enforcement Systems and Analysis Division (LESA) data from ICE’s Integrated Decision Support (IIDS) database regarding nationwide ICE removals for the time period from October 1, 2011, through January 29, 2023, (full U.S. Government Fiscal Years 2012 through 2022), obtained by the University of Washington Center for Human Rights pursuant to FOIA request 2022-ICFO-09023.

For data and code used to generate this notebook, see: https://github.com/UWCHR/ice-enforce

options(scipen = 1000000)

library(pacman)

p_load(here, tidyverse, zoo, lubridate, ggplot2, plotly, gghighlight)

pd_dict <- read_delim(here('share', 'hand', 'processing_disp.csv'), delim='|')

rem <- read_delim(here('write', 'input', 'ice_removals_fy12-23ytd.csv.gz'), delim='|',
                  col_types = cols(aor = col_factor(),
                                   arrest_date = col_date(format="%m/%d/%Y"),
                                  departed_date = col_date(format="%m/%d/%Y"),
                                  case_close_date = col_date(format="%m/%d/%Y"),
                                  removal_date = col_date(format="%m/%d/%Y"),
                                  apprehension_method_code = col_character(),
                                  processing_disposition_code = col_factor(),
                                  citizenship_country = col_factor(),
                                  gender = col_factor(),
                                  final_charge_section = col_factor(),
                                  id = col_integer(),
                                  hashid = col_character()
                                  )) 

redacted <- c('removal_threat_level', 'alien_file_number')
redacted_text <- paste0('`', paste(unlist(redacted), collapse = '`, `'), '`')

rem <- rem %>% 
  dplyr::select(-redacted, -case_closed_date)

cy_months <- c("Jan","Feb","Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
fy_months <- c("Oct", "Nov", "Dec", "Jan","Feb","Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep")

rem <- rem %>% 
  mutate(aor = factor(aor, levels = sort(levels(rem$aor))),
         year = year(departed_date),
         month = factor(month(departed_date, label=TRUE, abbr=TRUE), levels = fy_months),
         year_mth = zoo::as.yearmon(departed_date),
         processing_disp = toupper(coalesce(processing_disposition_code, processing_disposition)),
         fy =substr(quarter(departed_date, fiscal_start=10, type="year.quarter"), 1,4),
         gender = toupper(gender),
         processing_disposition = toupper(processing_disposition),
         citizenship_country = factor(toupper(citizenship_country)))

rem <- left_join(rem, pd_dict, by=c('processing_disp' = 'processing_disposition_raw'))

A removal occurs when an individual is issued a final order of removal and departs the United States via deportation or voluntary return.¹

The removals dataset (rem) includes 2665505 observations of 20 variables; 2 fully redacted fields (removal_threat_level, alien_file_number) are dropped from analysis.

The following provides an summary of dataset characteristics via skimr::skim(rem):

skimr::skim(rem)

Data summary
Name	rem
Number of rows	2665505
Number of columns	20
_______________________
Column type frequency:
character	9
Date	4
factor	5
numeric	2
________________________
Group variables	None

Variable type: character

skim_variable	n_missing	complete_rate	min	max	n_unique
apprehension_method_code	2503421	0.06	1	5	29
gender	0	1.00	4	7	3
processing_disposition	2297154	0.14	5	44	35
hashid	0	1.00	40	40	2665505
area_of_responsibility	0	1.00	25	37	26
year_mth	0	1.00	4	16	148
processing_disp	12290	1.00	1	44	89
fy	0	1.00	4	4	13
processing_disposition_clean	170886	0.94	5	44	53

Variable type: Date

skim_variable	n_missing	complete_rate	min	max	median	n_unique
departed_date	0	1.00	2010-10-01	2023-01-27	2015-09-28	4500
case_close_date	29356	0.99	2011-10-01	2022-10-04	2015-09-15	4018
arrest_date	439348	0.84	1968-03-09	2023-01-27	2015-12-30	11667
removal_date	778493	0.71	2013-10-01	2023-01-27	2017-05-18	3406

Variable type: factor

skim_variable	n_missing	complete_rate	ordered	n_unique	top_counts
aor	0	1.00	FALSE	25	SNA: 707834, ELP: 313355, PHO: 265235, SND: 238497
processing_disposition_code	380641	0.86	FALSE	54	REI: 918042, ER: 451871, WA/: 327493, T: 172526
citizenship_country	7	1.00	FALSE	210	MEX: 1581516, GUA: 395299, HON: 280187, EL : 188837
final_charge_section	11956	1.00	FALSE	141	212: 703614, 212: 677353, 212: 571377, 212: 303340
month	0	1.00	TRUE	12	Oct: 248369, May: 238244, Mar: 237583, Nov: 230357

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100	hist
id	0	1	1332752.00	769465.16	0	666376	1332752	1999128	2665504	▇▇▇▇▇
year	0	1	2015.57	2.96	2010	2013	2015	2018	2023	▅▇▅▆▁

Field definitions

Datasets were released without any data dictionary or field descriptions; in cases where this information is not self-explanatory, we have attempted to provide citations of relevant sources providing context.

Original dataset fields

aor: ICE Area of Responsibility associated with removal
arrest_date: Date of arrest
departed_date: Date of departure
removal_date: Date of order of removal
case_closed_date: Date of closure of case
apprehension_method_code: Abbreviated code for apprehension method associated with removal
processing_disposition_code: Abbreviated code for processing disposition associated with removal
final_charge_section: Federal code under which individual ordered removed
citizenship_country: Country of citizenship of removed individual
gender: Gender of removed individual
apprehension_threat_level: Fully redacted in original dataset
removal_threat_level`: Fully redacted in original dataset
alien_file_number: Unique individual identifier for arrested individual, fully redacted in original dataset

Additional fields created by UWCHR

id: Sequential record identifier (not individual identifier)
hashid: Unique record hash (not individual identifier)
processing_disposition_clean: Inferred full text value of processing_disposition_code
year: Calendar year derived from arrest_date
month: Abbreviated month derived from arrest_date
year_mth: Calendar year and month derived from arrest_date
fy: U.S. government fiscal year (Oct.-Sept.) derived from arrest_date

Total removals

Major decrease in removals by ICE, but note CBP Title 42 expulsions at Southern border since 2020 are not counted here.

p1 <- rem %>% 
 filter(departed_date >= "2011-10-01",
     departed_date <= "2022-09-30") %>% 
  group_by(fy) %>% 
  summarize(n = n()) %>% 
  ggplot(aes(x = as.factor(fy), y=n)) +
  geom_col() +
  labs(title = "Total removals per FY") +
  theme_minimal()

p1

p2 <- rem %>%
 filter(departed_date >= "2011-10-01",
     departed_date <= "2022-09-30") %>% 
  group_by(year_mth) %>%
  summarize(n = n()) %>%
  ggplot(aes(x = year_mth, y = n)) +
  geom_line(aes(group=1)) +
  ylim(0, NA) +
  labs(title = "Total nationwide ICE removals per month") +
  theme_minimal()

p2

p3 <- rem %>%
 filter(departed_date >= "2011-10-01",
     departed_date <= "2022-09-30") %>% 
  group_by(fy, month) %>%
  summarize(n = n()) %>%
  ggplot(aes(x = month, y = n, color = fy, group = fy)) +
  geom_line() +
  ylim(0, NA) +
  scale_color_viridis_d() +
  labs(title = "Total nationwide ICE removals per month") +
  theme_minimal()

p3

Basic demographics

Removals by `gender`

# rem %>%
#   mutate(gender = tolower(gender)) %>% 
#   group_by(gender) %>% 
#   summarize(n = n())

p1 <- rem %>% 
  filter(departed_date >= "2011-10-01",
       departed_date <= "2022-09-30") %>% 
  count(fy, gender) %>% 
  ggplot(aes(x=fy, y=n, fill=gender)) +
  geom_col(position='fill') +
  scale_y_continuous(labels = scales::percent) +
  labs(title="Total ICE removals, % by gender") +
  theme_minimal()

p1

Removals by `citizenship_country`

Note citizenship_country may not correspond with an individual’s deportation destination; deportation destination is not represented in this dataset.

cit <- rem %>%
  mutate(citizenship_country = toupper(citizenship_country)) %>% 
  group_by(citizenship_country) %>% 
  summarize(n = n()) %>% 
  arrange(desc(n))

p1 <- rem %>% 
 filter(departed_date >= "2011-10-01",
     departed_date <= "2022-09-30") %>% 
  mutate(citizenship_country = case_when(
    citizenship_country %in% head(cit$citizenship_country, 15) ~ citizenship_country,
    TRUE ~ "ALL OTHERS"
  )) %>% 
  count(fy, citizenship_country) %>% 
  ggplot(aes(x=fy, y=n, fill=citizenship_country, color=citizenship_country)) +
  geom_col() +
  labs(title = "Total ICE removals by country of citizenship (top 15)") +
  theme_minimal()

ggplotly(p1)

# % change in removal by group?

ICE removals per AOR

p1 <- rem %>%
  filter(departed_date >= "2011-10-01",
     departed_date <= "2022-09-30") %>% 
  group_by(fy, aor) %>% 
  summarize(n = n()) %>% 
  ggplot(aes(x = as.factor(fy), y=n, color=aor, group=aor)) +
  geom_line() +
  labs(title = "Total removals per FY by AOR") +
  theme_minimal()

ggplotly(p1)

natl_pct_chg <- rem %>%
  filter(departed_date >= "2011-10-01",
         departed_date <= "2022-09-30") %>% 
  group_by(fy) %>%
  summarize(n = n()) %>% 
  mutate(pct_change = (n/lag(n) - 1))

p1 <- natl_pct_chg %>% 
  ggplot(aes(x = fy, y = pct_change)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  labs(title="FY % change in total removals") +
  theme_minimal()

p1

aor_pct_chg <- rem %>%
  filter(departed_date >= "2011-10-01",
         departed_date <= "2022-09-30",
         aor != "HQ",
         !is.na(aor)) %>% 
  group_by(fy, aor) %>%
  summarize(n = n()) %>% 
  group_by(aor) %>% 
  arrange(fy, .by_group=TRUE) %>% 
  mutate(pct_change = (n/lag(n) - 1))

p2 <- aor_pct_chg %>% 
  ggplot(aes(x = fy, y = pct_change)) +
  geom_col() +
  scale_y_continuous(labels = scales::percent) +
  scale_x_discrete(breaks=seq(2012, 2022, 2)) +
  facet_wrap(~aor)  +
  labs(title="FY % change in total removals per AOR") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1)) +
  theme_minimal()

p2

Removals by processing disposition

Unlike datasets for encounters and arrests, removals data represents case processing disposition as an abbreviated processing_disposition_code. Where possible, we have inferred correspondence between full text processing_disposition values in encounters and arrests datasets and processing_disposition_code values in this dataset; cleaned values are represented in the processing_disposition_clean field.

disps <- rem %>% 
  filter(departed_date >= "2012-10-01",
         departed_date <= "2022-09-30",
         ) %>% 
  count(processing_disposition_clean) %>% 
  arrange(desc(n))

top_disp <- disps %>% 
  filter(n > 50000)

rem <- rem %>% 
  mutate(disp_short = case_when(processing_disposition_clean %in% unlist(top_disp$processing_disposition_clean) ~ as.character(processing_disposition_clean), 
                                         TRUE ~ "ALL OTHERS"))

p1 <- rem %>% 
  filter(departed_date >= "2012-10-01",
         departed_date <= "2022-09-30",
         ) %>% 
  group_by(fy, disp_short) %>% 
  summarize(n = n()) %>% 
  ggplot(aes(x = as.factor(fy), y=n, fill=disp_short)) +
  geom_col() +
  labs(title = "Total removals per FY by processing disposition") +
  theme_minimal()

ggplotly(p1)

Removals by `apprehension_method_code`

This field is largely missing data prior to FY 2022. Codes are alphanumeric abbreviations; most common codes are analogous to full text values in apprehension_method field of arrests dataset but significance of some codes is unclear; for example, the 15 top values for this field and inferred correspondence:

“287”: 287(g) arrest
“PB”: Patrol border
“CLC”: Criminal Alien Program (CAP) local custody
“CFD”: CAP federal custody
“CST”: CAP state custody
“ISP”: Inspection
“L”: Located
“NCA”: Non-custodial arrest
“TRC”: Transportation check (?)
“OA”: Other agency
“PAP”: Probation and parole
“O”: Other
“REP”: ERO reprocessed
“PI”: Patrol interior (?)
“LEA”: Law enforcement agency assist

rem <- rem %>% 
  mutate(apprehension_method_code = str_replace_all(apprehension_method_code, "287.0", "287"))

apprehension_method_code_rank <- rem %>% 
  count(apprehension_method_code) %>% 
  arrange(desc(n))

p1 <- rem %>% 
  filter(departed_date >= "2011-10-01",
         departed_date <= "2022-09-30") %>% 
  count(year_mth, apprehension_method_code) %>% 
  ggplot(aes(x = year_mth, y = n, fill = apprehension_method_code)) +
  geom_col() +
  labs(title = "Removals by `apprehension_method_code`, FY 2022") +
  theme_minimal()

ggplotly(p1)

Dates

Removals data includes four separate date fields: departed_date, case_close_date, arrest_date, and removal_date. Of these, departed_date is most complete, with no missing values; therefore we use this date as the primary field for date values in this notebook.

rem %>% 
  dplyr::select(contains('date')) %>% 
skimr::skim()

Data summary
Name	Piped data
Number of rows	2665505
Number of columns	4
_______________________
Column type frequency:
Date	4
________________________
Group variables	None

Variable type: Date

skim_variable	n_missing	complete_rate	min	max	median	n_unique
departed_date	0	1.00	2010-10-01	2023-01-27	2015-09-28	4500
case_close_date	29356	0.99	2011-10-01	2022-10-04	2015-09-15	4018
arrest_date	439348	0.84	1968-03-09	2023-01-27	2015-12-30	11667
removal_date	778493	0.71	2013-10-01	2023-01-27	2017-05-18	3406

hist(rem$departed_date, breaks='years', col='pink')

hist(rem$removal_date, breaks='years', col='lightblue')

hist(rem$arrest_date, breaks='years', col='lightyellow')

hist(rem$case_close_date, breaks='years', col='lightgreen')

The earliest dataset analyzed here, for FY 2012, includes only the departed_date and case_close_date fields; the FY 2013 dataset introduces an additional arrest_date value alongside these; and the FY 2014 and subsequent datasets include a fourth value for removal_date.

The fields departed_date and removal_date are complete for all records in datasets where these date fields appear. Only the most recent records for FY 2022 are missing case_close_date values, logically suggests that these cases remained open at the time of production of this dataset; a small proportion of records are missing arrest_date during all years since FY 2013, it is not clear what this indicates about the cases in question.

rem %>%
  filter(departed_date >= "2011-10-01",
         departed_date <= "2022-09-30") %>% 
  group_by(fy) %>% 
  summarize(missing_dep_date = sum(is.na(departed_date)),
            missing_rem_date = sum(is.na(removal_date)),
            missing_arr_date = sum(is.na(arrest_date)),
            missing_cc_date = sum(is.na(case_close_date)),
            )

## # A tibble: 11 × 5
##    fy    missing_dep_date missing_rem_date missing_arr_date missing_cc_date
##    <chr>            <int>            <int>            <int>           <int>
##  1 2012                 0           408419           402919               0
##  2 2013                 0           363144              144               0
##  3 2014                 0                0             1772               0
##  4 2015                 0                0             1836               0
##  5 2016                 0                0             2679               0
##  6 2017                 0                0             3362               0
##  7 2018                 0                0             4095               0
##  8 2019                 0                0             4723               0
##  9 2020                 0                0             4117               0
## 10 2021                 0                0             2756               0
## 11 2022                 0                0             2922            1924

We can calculate lag between different dates, such as between arrest date and departure date, which reveals an increase in average time between arrest and departure since FY 2014; and significant difference between cases by processing_disposition:

rem$dep_diff_arr <- difftime(rem$departed_date, rem$arrest_date, units='days')

hist(as.numeric(rem$dep_diff_arr))

p1 <- rem %>% 
   filter(departed_date >= "2011-10-01",
       departed_date <= "2022-09-30") %>% 
  filter(fy >= 2013) %>%
  group_by(fy) %>% 
  summarize(mean_dep_diff_arr = mean(dep_diff_arr, na.rm = TRUE)) %>% 
  ggplot(aes(x = fy, y = mean_dep_diff_arr)) +
  geom_line(group=1) +
  ylim(0, NA) +
  theme_minimal()

p1

p2 <- rem %>% 
  filter(departed_date >= "2011-10-01",
         departed_date <= "2022-09-30") %>% 
  mutate(disp_short = case_when(processing_disposition_clean %in%
                                  unlist(top_disp$processing_disposition_clean) ~
                                  as.character(processing_disposition_clean), 
                                TRUE ~ "ALL OTHERS")) %>% 
  filter(fy >= 2014) %>%
  group_by(fy, disp_short) %>% 
  summarize(mean_dep_diff_arr = mean(dep_diff_arr, na.rm = TRUE),
            med_dep_diff_arr = median(dep_diff_arr, na.rm = TRUE)) %>% 
  ggplot(aes(y = disp_short, x = mean_dep_diff_arr, color = disp_short, group=disp_short)) +
  geom_boxplot() +
  scale_y_discrete(label=function(x) abbreviate(x, minlength=10)) +
  theme_minimal()

p2

We repeat the same analysis for the differnce between removal_date and departed_date for the period FY14 on (removal_date absent for data prior to FY14):

rem$rem_diff_dep <- difftime(rem$removal_date, rem$departed_date, units='days')

hist(as.numeric(rem$rem_diff_dep))

p1 <- rem %>% 
   filter(departed_date >= "2013-10-01",
       departed_date <= "2022-09-30") %>% 
  filter(fy >= 2013) %>%
  group_by(fy) %>% 
  summarize(mean_rem_diff_dep = mean(rem_diff_dep, na.rm = TRUE)) %>% 
  ggplot(aes(x = fy, y = mean_rem_diff_dep)) +
  geom_line(group=1) +
  ylim(0, NA) +
  theme_minimal()

p1

Here we focus on characteristics of removals with nonzero difference between removal_date, departure_date:

rem_diff_dep_nonzero <- rem %>% 
  filter(rem_diff_dep != 0,
         !is.na(rem_diff_dep)) %>% 
  mutate(rem_diff_dep = as.numeric(rem_diff_dep))

hist(as.numeric(rem_diff_dep_nonzero$rem_diff_dep))

p1 <- rem_diff_dep_nonzero %>% 
   filter(departed_date >= "2013-10-01",
       departed_date <= "2022-09-30") %>% 
  filter(fy >= 2013) %>%
  group_by(fy) %>% 
  summarize(mean_rem_diff_dep = mean(rem_diff_dep, na.rm = TRUE)) %>% 
  ggplot(aes(x = fy, y = mean_rem_diff_dep)) +
  geom_line(group=1) +
  ylim(0, NA) +
  theme_minimal()

p1

p2 <- rem_diff_dep_nonzero %>% 
  filter(departed_date >= "2013-10-01",
         departed_date <= "2022-09-30") %>% 
  mutate(disp_short = case_when(processing_disposition_clean %in%
                                  unlist(top_disp$processing_disposition_clean) ~
                                  as.character(processing_disposition_clean), 
                                TRUE ~ "ALL OTHERS")) %>% 
  filter(fy >= 2014) %>%
  ggplot(aes(y = disp_short, x = rem_diff_dep, color = disp_short, group=disp_short)) +
  geom_boxplot() +
  scale_y_discrete(label=function(x) abbreviate(x, minlength=10)) +
  theme_minimal()

p2

For discussion of ICE’s definition of “removals”, see American Immigration Council, “Changing Patterns of Interior Immigration Enforcement in the United States, 2016 - 2018”, July 2019: https://www.americanimmigrationcouncil.org/research/interior-immigration-enforcement-united-states-2016-2018 ↩︎

ICE ERO-LESA nationwide removals data, FY12-22

UWCHR

2023-06-01

Data overview: Removals

Field definitions

Original dataset fields

Additional fields created by UWCHR

Total removals

Basic demographics

Removals by `gender`

Removals by `citizenship_country`

ICE removals per AOR

Removals by processing disposition

Removals by `apprehension_method_code`

Dates

ICE ERO-LESA nationwide removals data, FY12-22

UWCHR

2023-06-01

Data overview: Removals

Field definitions

Original dataset fields

Additional fields created by UWCHR

Total removals

Basic demographics

Removals by gender

Removals by citizenship_country

ICE removals per AOR

Removals by processing disposition

Removals by apprehension_method_code

Dates

Removals by `gender`

Removals by `citizenship_country`

Removals by `apprehension_method_code`