This notebook presents a national overview of U.S. Immigration and Customs Enforcement (ICE) Enforcement and Removal Operations (ERO) Law Enforcement Systems and Analysis Division (LESA) data from ICE’s Integrated Decision Support (IIDS) database regarding nationwide ICE removals for the period from October 1, 2011, through January 29, 2023 (including full U.S. Government Fiscal Years 2012 through 2022), obtained by the University of Washington Center for Human Rights pursuant to FOIA request 2022-ICFO-09023.
For data and code used to generate this notebook, see: https://github.com/UWCHR/ice-enforce
options(scipen = 1000000)
library(pacman)
p_load(here, tidyverse, zoo, lubridate, ggplot2, plotly, gghighlight)
pd_dict <- read_delim(here('share', 'hand', 'processing_disp.csv'), delim='|')
rem <- read_delim(here('write', 'input', 'ice_removals_fy12-23ytd.csv.gz'), delim='|',
col_types = cols(aor = col_factor(),
arrest_date = col_date(format="%m/%d/%Y"),
departed_date = col_date(format="%m/%d/%Y"),
case_close_date = col_date(format="%m/%d/%Y"),
removal_date = col_date(format="%m/%d/%Y"),
apprehension_method_code = col_character(),
processing_disposition_code = col_factor(),
citizenship_country = col_factor(),
gender = col_factor(),
final_charge_section = col_factor(),
id = col_integer(),
hashid = col_character()
))
redacted <- c('removal_threat_level', 'alien_file_number')
redacted_text <- paste0('`', paste(unlist(redacted), collapse = '`, `'), '`')
rem <- rem %>%
dplyr::select(-all_of(redacted), -case_closed_date)
cy_months <- c("Jan","Feb","Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")
fy_months <- c("Oct", "Nov", "Dec", "Jan","Feb","Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep")
rem <- rem %>%
mutate(aor = factor(aor, levels = sort(levels(rem$aor))),
year = year(departed_date),
month = factor(month(departed_date, label=TRUE, abbr=TRUE), levels = fy_months),
year_mth = zoo::as.yearmon(departed_date),
processing_disp = toupper(coalesce(processing_disposition_code, processing_disposition)),
fy =substr(quarter(departed_date, fiscal_start=10, type="year.quarter"), 1,4),
gender = toupper(gender),
processing_disposition = toupper(processing_disposition),
citizenship_country = factor(toupper(citizenship_country)))
rem <- left_join(rem, pd_dict, by=c('processing_disp' = 'processing_disposition_raw'))
A removal occurs when an individual is issued a final order of removal and departs the United States via deportation or voluntary return.[1]
The removals dataset (`rem`) includes 2665505 observations of 20 variables; 2 fully redacted fields (`removal_threat_level`, `alien_file_number`) are dropped from analysis.

The following provides a summary of dataset characteristics via `skimr::skim(rem)`:
skimr::skim(rem)
Name | rem |
Number of rows | 2665505 |
Number of columns | 20 |
_______________________ | |
Column type frequency: | |
character | 9 |
Date | 4 |
factor | 5 |
numeric | 2 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
apprehension_method_code | 2503421 | 0.06 | 1 | 5 | 0 | 29 | 0 |
gender | 0 | 1.00 | 4 | 7 | 0 | 3 | 0 |
processing_disposition | 2297154 | 0.14 | 5 | 44 | 0 | 35 | 0 |
hashid | 0 | 1.00 | 40 | 40 | 0 | 2665505 | 0 |
area_of_responsibility | 0 | 1.00 | 25 | 37 | 0 | 26 | 0 |
year_mth | 0 | 1.00 | 4 | 16 | 0 | 148 | 0 |
processing_disp | 12290 | 1.00 | 1 | 44 | 0 | 89 | 0 |
fy | 0 | 1.00 | 4 | 4 | 0 | 13 | 0 |
processing_disposition_clean | 170886 | 0.94 | 5 | 44 | 0 | 53 | 0 |
Variable type: Date
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
departed_date | 0 | 1.00 | 2010-10-01 | 2023-01-27 | 2015-09-28 | 4500 |
case_close_date | 29356 | 0.99 | 2011-10-01 | 2022-10-04 | 2015-09-15 | 4018 |
arrest_date | 439348 | 0.84 | 1968-03-09 | 2023-01-27 | 2015-12-30 | 11667 |
removal_date | 778493 | 0.71 | 2013-10-01 | 2023-01-27 | 2017-05-18 | 3406 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
aor | 0 | 1.00 | FALSE | 25 | SNA: 707834, ELP: 313355, PHO: 265235, SND: 238497 |
processing_disposition_code | 380641 | 0.86 | FALSE | 54 | REI: 918042, ER: 451871, WA/: 327493, T: 172526 |
citizenship_country | 7 | 1.00 | FALSE | 210 | MEX: 1581516, GUA: 395299, HON: 280187, EL : 188837 |
final_charge_section | 11956 | 1.00 | FALSE | 141 | 212: 703614, 212: 677353, 212: 571377, 212: 303340 |
month | 0 | 1.00 | TRUE | 12 | Oct: 248369, May: 238244, Mar: 237583, Nov: 230357 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
id | 0 | 1 | 1332752.00 | 769465.16 | 0 | 666376 | 1332752 | 1999128 | 2665504 | ▇▇▇▇▇ |
year | 0 | 1 | 2015.57 | 2.96 | 2010 | 2013 | 2015 | 2018 | 2023 | ▅▇▅▆▁ |
Datasets were released without any data dictionary or field descriptions; in cases where this information is not self-explanatory, we have attempted to provide citations of relevant sources providing context.
- `aor`: ICE Area of Responsibility associated with removal
- `arrest_date`: Date of arrest
- `departed_date`: Date of departure
- `removal_date`: Date of order of removal
- `case_closed_date`: Date of closure of case
- `apprehension_method_code`: Abbreviated code for apprehension method associated with removal
- `processing_disposition_code`: Abbreviated code for processing disposition associated with removal
- `final_charge_section`: Section of federal code under which the individual was ordered removed
- `citizenship_country`: Country of citizenship of removed individual
- `gender`: Gender of removed individual
- `apprehension_threat_level`: Fully redacted in original dataset
- `removal_threat_level`: Fully redacted in original dataset
- `alien_file_number`: Unique individual identifier for arrested individual, fully redacted in original dataset
- `id`: Sequential record identifier (not individual identifier)
- `hashid`: Unique record hash (not individual identifier)
- `processing_disposition_clean`: Inferred full text value of `processing_disposition_code`
- `year`: Calendar year derived from `departed_date`
- `month`: Abbreviated month derived from `departed_date`
- `year_mth`: Calendar year and month derived from `departed_date`
- `fy`: U.S. government fiscal year (Oct.-Sept.) derived from `departed_date`
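The fiscal-year derivation can be sketched in base R without `lubridate`. This is a minimal illustration, assuming the U.S. federal convention that a fiscal year is named for the calendar year in which it ends; `fiscal_year()` is a hypothetical helper and is not part of the notebook's code.

```r
# Hypothetical helper (not in the original notebook): U.S. federal fiscal
# year for a date, where FY N runs from Oct 1 of year N-1 through Sep 30 of N.
fiscal_year <- function(d) {
  d <- as.Date(d)
  yr <- as.integer(format(d, "%Y"))
  mth <- as.integer(format(d, "%m"))
  # Dates in October through December belong to the following fiscal year
  ifelse(mth >= 10, yr + 1L, yr)
}

fy_example <- fiscal_year(c("2011-09-30", "2011-10-01", "2022-09-30", "2023-01-27"))
fy_example
```

Under this convention, October 1, 2011 is the first day of FY 2012 and September 30, 2022 is the last day of FY 2022, which matches the date filters used in the plots in this notebook.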
The data show a major decrease in removals by ICE; note, however, that CBP Title 42 expulsions at the southern border since 2020 are not counted here.
p1 <- rem %>%
filter(departed_date >= "2011-10-01",
departed_date <= "2022-09-30") %>%
group_by(fy) %>%
summarize(n = n()) %>%
ggplot(aes(x = as.factor(fy), y=n)) +
geom_col() +
labs(title = "Total removals per FY") +
theme_minimal()
p1
p2 <- rem %>%
filter(departed_date >= "2011-10-01",
departed_date <= "2022-09-30") %>%
group_by(year_mth) %>%
summarize(n = n()) %>%
ggplot(aes(x = year_mth, y = n)) +
geom_line(aes(group=1)) +
ylim(0, NA) +
labs(title = "Total nationwide ICE removals per month") +
theme_minimal()
p2
p3 <- rem %>%
filter(departed_date >= "2011-10-01",
departed_date <= "2022-09-30") %>%
group_by(fy, month) %>%
summarize(n = n()) %>%
ggplot(aes(x = month, y = n, color = fy, group = fy)) +
geom_line() +
ylim(0, NA) +
scale_color_viridis_d() +
labs(title = "Total nationwide ICE removals per month") +
theme_minimal()
p3
gender
# rem %>%
# mutate(gender = tolower(gender)) %>%
# group_by(gender) %>%
# summarize(n = n())
p1 <- rem %>%
filter(departed_date >= "2011-10-01",
departed_date <= "2022-09-30") %>%
count(fy, gender) %>%
ggplot(aes(x=fy, y=n, fill=gender)) +
geom_col(position='fill') +
scale_y_continuous(labels = scales::percent) +
labs(title="Total ICE removals, % by gender") +
theme_minimal()
p1
citizenship_country
Note that `citizenship_country` may not correspond with an individual’s deportation destination; deportation destination is not represented in this dataset.
cit <- rem %>%
mutate(citizenship_country = toupper(citizenship_country)) %>%
group_by(citizenship_country) %>%
summarize(n = n()) %>%
arrange(desc(n))
p1 <- rem %>%
filter(departed_date >= "2011-10-01",
departed_date <= "2022-09-30") %>%
mutate(citizenship_country = case_when(
citizenship_country %in% head(cit$citizenship_country, 15) ~ as.character(citizenship_country),
TRUE ~ "ALL OTHERS"
)) %>%
count(fy, citizenship_country) %>%
ggplot(aes(x=fy, y=n, fill=citizenship_country, color=citizenship_country)) +
geom_col() +
labs(title = "Total ICE removals by country of citizenship (top 15)") +
theme_minimal()
ggplotly(p1)
# % change in removal by group?
p1 <- rem %>%
filter(departed_date >= "2011-10-01",
departed_date <= "2022-09-30") %>%
group_by(fy, aor) %>%
summarize(n = n()) %>%
ggplot(aes(x = as.factor(fy), y=n, color=aor, group=aor)) +
geom_line() +
labs(title = "Total removals per FY by AOR") +
theme_minimal()
ggplotly(p1)
natl_pct_chg <- rem %>%
filter(departed_date >= "2011-10-01",
departed_date <= "2022-09-30") %>%
group_by(fy) %>%
summarize(n = n()) %>%
mutate(pct_change = (n/lag(n) - 1))
p1 <- natl_pct_chg %>%
ggplot(aes(x = fy, y = pct_change)) +
geom_col() +
scale_y_continuous(labels = scales::percent) +
labs(title="FY % change in total removals") +
theme_minimal()
p1
aor_pct_chg <- rem %>%
filter(departed_date >= "2011-10-01",
departed_date <= "2022-09-30",
aor != "HQ",
!is.na(aor)) %>%
group_by(fy, aor) %>%
summarize(n = n()) %>%
group_by(aor) %>%
arrange(fy, .by_group=TRUE) %>%
mutate(pct_change = (n/lag(n) - 1))
p2 <- aor_pct_chg %>%
ggplot(aes(x = fy, y = pct_change)) +
geom_col() +
scale_y_continuous(labels = scales::percent) +
scale_x_discrete(breaks=seq(2012, 2022, 2)) +
facet_wrap(~aor) +
labs(title="FY % change in total removals per AOR") +
# Apply the complete theme first; adding theme_minimal() last would reset the axis text settings
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust=1))
p2
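The year-over-year calculation used above, `n/lag(n) - 1`, can be checked on a toy series. The following is a minimal base R sketch with made-up counts (not taken from the removals data); it shifts the series by one position in place of `dplyr::lag()`.

```r
# Toy fiscal-year counts (made-up numbers, for illustration only)
counts <- data.frame(fy = c("2019", "2020", "2021"), n = c(200, 150, 60))

# Base R stand-in for dplyr::lag(n): shift n down by one position
prev_n <- c(NA, head(counts$n, -1))

# Fractional change relative to the previous fiscal year
counts$pct_change <- counts$n / prev_n - 1

counts
```

The first fiscal year in any series has no preceding value, so its percent change is `NA`; this is why the earliest year has no bar in the percent-change plots.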
Unlike datasets for encounters and arrests, removals data represents case processing disposition as an abbreviated `processing_disposition_code`. Where possible, we have inferred correspondence between full text `processing_disposition` values in encounters and arrests datasets and `processing_disposition_code` values in this dataset; cleaned values are represented in the `processing_disposition_clean` field.
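The code-to-text correspondence works as a lookup against a dictionary table. A minimal base R sketch of that idea follows; the codes and labels are hypothetical placeholders rather than values from `processing_disp.csv`, and `match()` stands in for the `left_join()` used earlier in this notebook.

```r
# Toy lookup table; codes and labels are hypothetical placeholders
dict <- data.frame(
  processing_disposition_raw = c("AAA", "BBB"),
  processing_disposition_clean = c("EXAMPLE DISPOSITION A", "EXAMPLE DISPOSITION B"),
  stringsAsFactors = FALSE
)

# Toy records: one known code, one unknown code, one missing code
records <- data.frame(
  id = 1:3,
  processing_disp = c("aaa", "ZZZ", NA),
  stringsAsFactors = FALSE
)

# Normalize case before matching, as the notebook does with toupper()
records$processing_disp <- toupper(records$processing_disp)

# match() keeps every record (like a left join); unmatched codes become NA
idx <- match(records$processing_disp, dict$processing_disposition_raw)
records$processing_disposition_clean <- dict$processing_disposition_clean[idx]

records
```

Records whose code is absent from the dictionary, or missing altogether, end up with `NA` in the cleaned field instead of being dropped.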
disps <- rem %>%
filter(departed_date >= "2012-10-01",
departed_date <= "2022-09-30",
) %>%
count(processing_disposition_clean) %>%
arrange(desc(n))
top_disp <- disps %>%
filter(n > 50000)
rem <- rem %>%
mutate(disp_short = case_when(processing_disposition_clean %in% unlist(top_disp$processing_disposition_clean) ~ as.character(processing_disposition_clean),
TRUE ~ "ALL OTHERS"))
p1 <- rem %>%
filter(departed_date >= "2012-10-01",
departed_date <= "2022-09-30",
) %>%
group_by(fy, disp_short) %>%
summarize(n = n()) %>%
ggplot(aes(x = as.factor(fy), y=n, fill=disp_short)) +
geom_col() +
labs(title = "Total removals per FY by processing disposition") +
theme_minimal()
ggplotly(p1)
apprehension_method_code
This field is largely missing data prior to FY 2022. Codes are alphanumeric abbreviations; the most common codes are analogous to full text values in the `apprehension_method` field of the arrests dataset, but the significance of some codes is unclear. For example, the 15 top values for this field and inferred correspondence:
rem <- rem %>%
# fixed() matches "287.0" literally; as a regex, "." would match any character
mutate(apprehension_method_code = str_replace_all(apprehension_method_code, fixed("287.0"), "287"))
apprehension_method_code_rank <- rem %>%
count(apprehension_method_code) %>%
arrange(desc(n))
p1 <- rem %>%
filter(departed_date >= "2011-10-01",
departed_date <= "2022-09-30") %>%
count(year_mth, apprehension_method_code) %>%
ggplot(aes(x = year_mth, y = n, fill = apprehension_method_code)) +
geom_col() +
labs(title = "Removals by `apprehension_method_code`, FY 2022") +
theme_minimal()
ggplotly(p1)
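One detail of the code normalization step is worth illustrating: in a regular expression, `.` matches any single character, so a pattern written as `287.0` matches more than the literal text. The sketch below uses base R `gsub()` on made-up values; it does not check whether any such values actually occur in the removals data.

```r
# Toy values for illustration only
codes <- c("287.0", "28710", "287")

# Regex match: "." matches any character, so "28710" is rewritten too
regex_result <- gsub("287.0", "287", codes)

# Literal match: only the exact text "287.0" is replaced
literal_result <- gsub("287.0", "287", codes, fixed = TRUE)

regex_result
literal_result
```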
Removals data includes four separate date fields: `departed_date`, `case_close_date`, `arrest_date`, and `removal_date`. Of these, `departed_date` is most complete, with no missing values; therefore we use this date as the primary field for date values in this notebook.
rem %>%
dplyr::select(contains('date')) %>%
skimr::skim()
Name | Piped data |
Number of rows | 2665505 |
Number of columns | 4 |
_______________________ | |
Column type frequency: | |
Date | 4 |
________________________ | |
Group variables | None |
Variable type: Date
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
departed_date | 0 | 1.00 | 2010-10-01 | 2023-01-27 | 2015-09-28 | 4500 |
case_close_date | 29356 | 0.99 | 2011-10-01 | 2022-10-04 | 2015-09-15 | 4018 |
arrest_date | 439348 | 0.84 | 1968-03-09 | 2023-01-27 | 2015-12-30 | 11667 |
removal_date | 778493 | 0.71 | 2013-10-01 | 2023-01-27 | 2017-05-18 | 3406 |
hist(rem$departed_date, breaks='years', col='pink')
hist(rem$removal_date, breaks='years', col='lightblue')
hist(rem$arrest_date, breaks='years', col='lightyellow')
hist(rem$case_close_date, breaks='years', col='lightgreen')
The earliest dataset analyzed here, for FY 2012, includes only the `departed_date` and `case_close_date` fields; the FY 2013 dataset introduces an additional `arrest_date` value alongside these; and the FY 2014 and subsequent datasets include a fourth value for `removal_date`.

The fields `departed_date` and `removal_date` are complete for all records in datasets where these date fields appear. Only the most recent records for FY 2022 are missing `case_close_date` values, which suggests that these cases remained open at the time this dataset was produced. A small proportion of records are missing `arrest_date` in all years since FY 2013; it is not clear what this indicates about the cases in question.
rem %>%
filter(departed_date >= "2011-10-01",
departed_date <= "2022-09-30") %>%
group_by(fy) %>%
summarize(missing_dep_date = sum(is.na(departed_date)),
missing_rem_date = sum(is.na(removal_date)),
missing_arr_date = sum(is.na(arrest_date)),
missing_cc_date = sum(is.na(case_close_date)),
)
## # A tibble: 11 × 5
## fy missing_dep_date missing_rem_date missing_arr_date missing_cc_date
## <chr> <int> <int> <int> <int>
## 1 2012 0 408419 402919 0
## 2 2013 0 363144 144 0
## 3 2014 0 0 1772 0
## 4 2015 0 0 1836 0
## 5 2016 0 0 2679 0
## 6 2017 0 0 3362 0
## 7 2018 0 0 4095 0
## 8 2019 0 0 4723 0
## 9 2020 0 0 4117 0
## 10 2021 0 0 2756 0
## 11 2022 0 0 2922 1924
We can calculate the lag between arrest date and departure date. This reveals an increase in the average time between arrest and departure since FY 2014, and significant differences between cases by `processing_disposition`:
rem$dep_diff_arr <- difftime(rem$departed_date, rem$arrest_date, units='days')
p1 <- rem %>%
filter(departed_date >= "2011-10-01",
departed_date <= "2022-09-30") %>%
filter(fy >= 2013) %>%
group_by(fy) %>%
summarize(mean_dep_diff_arr = mean(dep_diff_arr, na.rm = TRUE)) %>%
ggplot(aes(x = fy, y = mean_dep_diff_arr)) +
geom_line(group=1) +
ylim(0, NA) +
theme_minimal()
p1
p2 <- rem %>%
filter(departed_date >= "2011-10-01",
departed_date <= "2022-09-30") %>%
mutate(disp_short = case_when(processing_disposition_clean %in%
unlist(top_disp$processing_disposition_clean) ~
as.character(processing_disposition_clean),
TRUE ~ "ALL OTHERS")) %>%
filter(fy >= 2014) %>%
group_by(fy, disp_short) %>%
summarize(mean_dep_diff_arr = mean(dep_diff_arr, na.rm = TRUE),
med_dep_diff_arr = median(dep_diff_arr, na.rm = TRUE)) %>%
ggplot(aes(y = disp_short, x = mean_dep_diff_arr, color = disp_short, group=disp_short)) +
geom_boxplot() +
scale_y_discrete(label=function(x) abbreviate(x, minlength=10)) +
theme_minimal()
p2
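The lag calculation above can be illustrated on a few made-up dates. This is a minimal base R sketch (the dates are not from the removals data); it also shows why the mean and median lag can diverge when a small number of cases have very long intervals between arrest and departure.

```r
# Toy arrest and departure dates (made-up, for illustration only)
arrest <- as.Date(c("2020-01-01", "2020-01-01", "2020-01-01", NA))
departed <- as.Date(c("2020-01-03", "2020-01-11", "2020-12-31", "2020-02-01"))

# Lag in days; records without an arrest date yield NA
lag_days <- as.numeric(difftime(departed, arrest, units = "days"))

# na.rm = TRUE drops records with a missing arrest date
mean_lag <- mean(lag_days, na.rm = TRUE)
median_lag <- median(lag_days, na.rm = TRUE)

c(mean = mean_lag, median = median_lag)
```

In this toy example a single long case pulls the mean far above the median, so comparing both statistics may help when reading the per-disposition summaries.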
[1] For discussion of ICE’s definition of “removals”, see American Immigration Council, “Changing Patterns of Interior Immigration Enforcement in the United States, 2016 - 2018”, July 2019: https://www.americanimmigrationcouncil.org/research/interior-immigration-enforcement-united-states-2016-2018