```python
# Author: University of Washington Center for Human Rights
# Title: Hidden in Plain Sight: ICE Air Data Appendix
# Date: 2019-04-24
# License: GPL 3.0 or greater
import pandas as pd
import numpy as np
import yaml
import matplotlib.pyplot as plt
```
This is an appendix to the report Hidden in Plain Sight: ICE Air and the Machinery of Mass Deportation, which uses data from ICE's Alien Repatriation Tracking System (ARTS) released by ICE Enforcement and Removal Operations pursuant to a Freedom of Information Act request by the University of Washington Center for Human Rights. This appendix is intended to provide readers with greater detail on the contents, structure, and limitations of this dataset, and on the process our researchers followed to render it suitable for social scientific analysis. The appendix is a living document that will be updated over time in order to make ICE Air data as widely accessible and transparently documented as possible.
The project repository contains all the data and code used for the production of the report.
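```python
# Get optimal data types before reading in the ARTS dataset
# (yaml.load without a Loader fails on PyYAML >= 6; safe_load reads plain mappings)
with open('input/dtypes.yaml', 'r') as yamlfile:
    column_types = yaml.safe_load(yamlfile)

# `infer_datetime_format` is omitted: it is deprecated since pandas 2.0,
# where strict format inference is already the default for parse_dates
read_csv_opts = {'sep': '|',
                 'quotechar': '"',
                 'compression': 'gzip',
                 'encoding': 'utf-8',
                 'dtype': column_types,
                 'parse_dates': ['MissionDate']}

df = pd.read_csv('input/ice-air.csv.gz', **read_csv_opts)
```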
```python
# The ARTS Data Dictionary as released by ICE
data_dict = pd.read_csv('input/ARTS_Data_Dictionary.csv.gz', compression='gzip', sep='|')
data_dict.columns = ['Field', 'Definition']
```
```python
# A YAML file containing the field names in the original ARTS dataset
with open('hand/arts_cols.yaml', 'r') as yamlfile:
    arts_cols = yaml.safe_load(yamlfile)
```
```python
# Asserting characteristics of key fields
assert df['AlienMasterID'].notnull().all()
assert df['AlienMasterID'].is_unique
assert df['MissionID'].notnull().all()
assert df['MissionNumber'].notnull().all()
assert df['MissionID'].nunique() == df['MissionNumber'].nunique()
```
According to a 2015 audit of ICE Air by the Department of Homeland Security Office of Inspector General (OIG), the ARTS dataset was created by an ICE contractor at an unknown date and transitioned to government personnel in May 2014.[1] The identity of the contractor that originally created the database is not currently known.
Following cleaning by UWCHR as detailed below, the ARTS dataset contains 1732625 records relating to ICE Air Operations charter flights during the period from October 1, 2010 to December 5, 2018, including full data for U.S. Federal Government Fiscal Years 2011 through 2018. Each record in the dataset relates to a single passenger on a single ICE Air mission.
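The federal fiscal year runs from October 1 through September 30 and is named for the calendar year in which it ends, so records from October through December 2018 belong to FY 2019, the only partial fiscal year in the dataset's range. A minimal sketch of deriving a fiscal-year column, assuming `MissionDate` has been parsed as a datetime (as `parse_dates` does when the data is loaded); the helper `fiscal_year` and the example dates are hypothetical, not part of the project code:

```python
import pandas as pd


def fiscal_year(dates: pd.Series) -> pd.Series:
    """Return the U.S. federal fiscal year for each datetime in `dates`.

    A fiscal year starts on October 1 of the previous calendar year, so
    October-December dates are assigned to the following year.
    """
    return dates.dt.year + (dates.dt.month >= 10).astype(int)


# Hypothetical example dates at the edges of the dataset's range
example = pd.Series(pd.to_datetime(['2010-10-01', '2018-09-30', '2018-12-05']))
print(fiscal_year(example).tolist())  # [2011, 2018, 2019]
```

On the full dataset, something like `fiscal_year(df['MissionDate']).value_counts().sort_index()` would give record counts per fiscal year.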
The ARTS dataset is made up of 43 fields, defined in a data dictionary provided by ICE; however, as described below, the content of these fields does not always conform to the definitions provided:
Field | Definition |
---|---|
Status | Criminal Status |
Sex | Sex |
Convictions | Convictions |
GangMember | Gang Member |
ClassLvl | Class Level |
Age | Age |
MissionDate | Mission Date |
MissionNumber | Mission Number (i.e., Flight Number) |
PULOC | Pick up location |
DropLoc | Drop off location |
StrikeFromList | Was Alien struck from mission |
ReasonStruck | Reason Struck |
R-T | Removal or Transfer |
Code | Criminality Code |
CountryOfCitizenship | Country of Citizenship |
Juvenile | Is Alien a Juvenile |
MissionWeek | Mission Week |
MissionQuarter | Mission Quarter |
MissionYear | Mission Year |
MissionMonth | Mission Month |
Criminality | Criminality |
FamilyUnitFlag | Family Unit Flag |
UnaccompaniedFlag | Unaccompanied Flag |
AlienMasterID | Unique value assigned by ARTS for Alien |
MissionID | Unique value assigned by ARTS for mission |
air_AirportID | Airport ID (numerical) |
air_AirportName | Airport Name |
air_City | Airport City |
st_StateID | State ID (Numerical) |
st_StateAbbr | State Abbreviation |
AOR_AORID | Area of Responsibility (Numerical) |
AOR_AOR | Area of Responsibility (Abbreviation) |
AOR_AORName | Area of Responsibility |
air_Country | Airport Country |
air2_AirportID | Airport ID (numerical) |
air2_AirportName | Airport Name |
air2_City | Airport City |
st2_StateID | State ID (Numerical) |
st2_StateAbbr | State Abbreviation |
aor2_AORID | Area of Responsibility (Numerical) |
aor2_AOR | Area of Responsibility (Abbreviation) |
aor2_AORName | Area of Responsibility |
air2_Country | Airport Country |
The first 25 fields relate to passenger and mission characteristics; the latter 18 fields, marked by various prefixes, relate to characteristics of the airports and locations associated with each record. (Additional fields generated by UWCHR in the course of analyzing the dataset are not enumerated here.) There is no indication that the content of any of these fields was withheld or redacted by ICE upon release of the dataset. However, according to the 2015 DHS OIG audit, the ARTS database does include additional fields that were not released to UWCHR, including passenger A-Numbers and Fingerprint IDs, as well as details on the cost of individual flights.
The ARTS dataset uses three key fields to identify passengers (`AlienMasterID`) and missions (`MissionID` and `MissionNumber`). The `AlienMasterID` field is made up of 1732625 unique values. `AlienMasterID` values are numeric strings starting at 5000 and incrementing to 2047246, with some values skipped. Each `AlienMasterID` value is used only once; repeat passengers on multiple flights cannot be identified based solely on this field. Below we display a subset of columns relating to passenger characteristics for the most recent records in the dataset.
```python
selected_cols = ['AlienMasterID', 'Status', 'Sex',
                 'Convictions', 'GangMember', 'Age', 'PULOC', 'DropLoc',
                 'R-T', 'CountryOfCitizenship']
# Last five records, dropping any selected column that is entirely empty there
sample_records = df.loc[:, selected_cols].tail().dropna(axis=1)
```
AlienMasterID | Status | Sex | Convictions | GangMember | Age | PULOC | DropLoc | R-T | CountryOfCitizenship |
---|---|---|---|---|---|---|---|---|---|
2047241 | 8F | M | NC | N | 22.0 | KIAH | MGGT | R | GUATEMALA |
2047242 | 8F | M | NC | N | 20.0 | KIAH | MGGT | R | GUATEMALA |
2047243 | 8F | M | Illegal Entry | N | 32.0 | KIAH | MGGT | R | GUATEMALA |
2047245 | 16 | M | Illegal Entry | N | 24.0 | KIAH | MGGT | R | GUATEMALA |
2047246 | 16 | M | NC | N | 27.0 | KIAH | MGGT | R | GUATEMALA |
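Because `AlienMasterID` values increase incrementally with some values skipped, the number of skipped values in the observed range follows directly from the minimum, the maximum, and the count of unique values. A minimal sketch; the helper `count_skipped_ids` and the toy IDs are hypothetical, and `pd.to_numeric` is used so the function works whether the IDs were loaded as strings or integers:

```python
import pandas as pd


def count_skipped_ids(ids: pd.Series) -> int:
    """Count integers between the minimum and maximum ID that never occur.

    ARTS IDs are described as numeric strings; pd.to_numeric accepts
    either strings or integers.
    """
    numeric = pd.to_numeric(ids)
    span = int(numeric.max()) - int(numeric.min()) + 1
    return span - numeric.nunique()


# Toy IDs: 5002 and 5004 are skipped
toy_ids = pd.Series(['5000', '5001', '5003', '5005'])
print(count_skipped_ids(toy_ids))  # 2
```

Taking the figures reported above at face value (values from 5000 to 2047246, 1732625 unique values), the same arithmetic gives 2047246 - 5000 + 1 - 1732625 = 309622 values in that range that do not appear in the cleaned dataset.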
The `MissionID` and `MissionNumber` fields each contain 14961 unique values; like `AlienMasterID`, these are also numeric strings that increase incrementally over time. Below we display records for a single ICE Air mission[2], using the pandas `groupby` function to group records by specified characteristics. Other fields, such as `R-T` or `CountryOfCitizenship`, could be added to the grouping in the same way:
```python
sample_mission_id = 47425
sample_records = df[df['MissionID'] == sample_mission_id]
sample_groupby = sample_records.groupby(['MissionDate', 'MissionID', 'MissionNumber', 'PULOC', 'DropLoc'])
# Count distinct passengers for each pickup/drop-off combination
sample_table = sample_groupby['AlienMasterID'].nunique().reset_index()
sample_table = sample_table.rename(columns={'AlienMasterID': 'PassengerCount'})
```
MissionDate | MissionID | MissionNumber | PULOC | DropLoc | PassengerCount |
---|---|---|---|---|---|
2018-12-04 | 47425 | 190331 | KBFI | KELP | 47 |
2018-12-04 | 47425 | 190331 | KBFI | KIWA | 19 |
2018-12-04 | 47425 | 190331 | KELP | KIWA | 51 |
2018-12-04 | 47425 | 190331 | KIWA | KBFI | 14 |
2018-12-04 | 47425 | 190331 | KIWA | KELP | 15 |
While mission records allow us to determine the number of individuals (summarized in the grouping above) moved between each pickup and drop-off location involved in a mission, we are not able to reconstruct the exact itinerary of the flight from these records. See below for more on the limitations of the `MissionID` and `MissionNumber` keys.
The raw ARTS dataset was released by ICE as 9 XLSX files, which were combined into a single dataset containing 1763020 records. For space reasons, the raw files are not stored in the project's Git repository, but are available via UWCHR's Google Drive. Prior to analysis, the ARTS dataset was converted from XLSX to CSV format and cleaned to standardize fields and remove some records with missing data.
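A minimal sketch of the combine-and-convert step, with small in-memory frames standing in for the XLSX files (in practice each file would be loaded with `pd.read_excel`, which needs an Excel engine such as openpyxl). The frames, the output file name, and the helper `combine_and_write` are hypothetical and are not the project's own conversion code; the output uses the same pipe delimiter and gzip compression as the options used to read `input/ice-air.csv.gz`:

```python
import tempfile
from pathlib import Path
from typing import List

import pandas as pd


def combine_and_write(frames: List[pd.DataFrame], path: Path) -> pd.DataFrame:
    """Concatenate per-file frames and write them as a pipe-delimited gzip CSV."""
    combined = pd.concat(frames, ignore_index=True)
    combined.to_csv(path, sep='|', index=False, compression='gzip', encoding='utf-8')
    return combined


# Hypothetical stand-ins for two of the released XLSX files
part1 = pd.DataFrame({'AlienMasterID': ['5000', '5001'], 'Sex': ['M', 'F']})
part2 = pd.DataFrame({'AlienMasterID': ['5003'], 'Sex': ['M']})

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / 'combined.csv.gz'
    combined = combine_and_write([part1, part2], out_path)
    # Read back with the same delimiter and compression used for the ARTS CSV
    roundtrip = pd.read_csv(out_path, sep='|', compression='gzip', dtype=str)

print(len(combined), len(roundtrip))  # 3 3
```

Reading back with `dtype=str` keeps ID fields as numeric strings rather than letting pandas coerce them to integers.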
The cleaning process is fully documented in code; see the `clean/` directory in the project repository. Selected data cleaning steps are described below:
- `AlienMasterID` values and airport metadata: In the raw ARTS dataset, 29465 `AlienMasterID` values appear more than once (at most twice). Upon close inspection, it becomes apparent that these records are repeated because of inconsistencies in certain airport metadata values (fields starting with the prefixes `air_` or `air2_`), resulting in the duplication of some passenger records, probably due to a database merge by ICE prior to release. (For example, all records for Yuma International Airport in Yuma, AZ are duplicated, with one version of the records incorrectly listing "AR" as the state associated with the airport.) The 29465 duplicate records were dropped and erroneous or missing airport metadata was corrected; see `ice-air/clean/hand/bad_airports.csv` for the values that were substituted. Airport metadata fields not used in the present analysis, such as numeric codes for US states, were not cleaned.
- Standardized fields: see `ice-air/clean/hand/clean.yaml`. Fields not used in the present analysis were not cleaned; see below for discussion of some of these fields.

While this dataset provides the first public view into the operations of ICE Air, it raises as many questions as it answers. In part, this is because it was released without an explanation of the context in which it is used: as noted in the report, we know some deportation flights are on commercial airlines, which appear not to be listed here, and others are charter flights whose records are similarly absent from this set. It is important, therefore, to be conscious of the limitations to the conclusions we can draw from this data alone, to “ground-truth” any observations by comparing to the lived experiences of immigrant communities, and to continue to demand full transparency from federal and local governments about the mechanics of deportations.
There are also several specific elements of this dataset’s design which limit its usefulness.
First, it does not permit the tracking of individual passengers; it is very likely that some individuals appear more than once in the collection, but it is not possible to identify them because individuals are not assigned a consistent identifier in this dataset.
According to the 2015 DHS OIG audit, the ARTS database reviewed by OIG includes fields for passenger A-Numbers and Fingerprint IDs, which would permit tracking of repeat passengers, though the OIG also notes inconsistencies in the usage of these identifiers. Unfortunately, these fields were not included in the database shared with UWCHR. There is no inherent way to track repeat passengers on multiple flights in the ARTS dataset as released to UWCHR by ICE; close analysis of specific combinations of passenger characteristics (e.g., age, nationality, criminal conviction status) does suggest that passengers are represented multiple times in the dataset, but systematically isolating repeat passengers without access to additional unique ID fields would be prohibitively difficult.
Second, the database does not track flight itineraries. Each record represents an individual passenger on an ICE Air mission, but reconstructing the flight path for a single mission or passenger is not possible. The database contains fields labeled `MissionID` and `MissionNumber` that might initially appear useful for such purposes, but they too have limitations. Although `MissionID` and `MissionNumber` values differ, their functions in the dataset appear to be completely equivalent: combinations of these values are strictly one-to-one, and they are not hierarchical (see code snippet below). Both fields consist of numeric strings which increase incrementally, with some values skipped. Contrary to the ARTS data dictionary released by ICE, the `MissionNumber` field appears to bear no relation to actual flight numbers.
```python
# No MissionID is associated with more than one MissionNumber, and vice versa
assert (df.groupby('MissionID')['MissionNumber'].nunique() <= 1).all()
assert (df.groupby('MissionNumber')['MissionID'].nunique() <= 1).all()
# Each mission (under either key ordering) falls on a single MissionDate
assert (df.groupby(['MissionNumber', 'MissionID'])['MissionDate'].nunique() <= 1).all()
assert (df.groupby(['MissionID', 'MissionNumber'])['MissionDate'].nunique() <= 1).all()
```
ICE Air missions as represented by the `MissionID` and `MissionNumber` fields never span more than one day, though multiple missions may occur on a given date. The `MissionDate` field only records the day of the mission; the dataset does not include any other time data, such as takeoff or landing timestamps. Each mission can include multiple combinations of pickup and drop-off locations, represented by the `PULOC` and `DropLoc` fields. These values encode the pickup and drop-off location for each passenger on the mission, not the flight itinerary of the mission. Therefore, while each mission may include multiple flight legs, it is not possible to use this version of the ARTS database to conclusively reconstruct itineraries or calculate the total number of legs on flights operated by ICE Air.
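To make the distinction concrete: the pickup/drop-off pairs for a mission reveal which airports it touched, but not the order or number of its legs. A minimal sketch using the five `PULOC`/`DropLoc` pairs shown earlier for mission 47425 (passenger counts omitted); the helper `airports_per_mission` is hypothetical:

```python
import pandas as pd


def airports_per_mission(records: pd.DataFrame) -> dict:
    """Map each MissionID to the sorted airports seen as pickup or drop-off.

    This lists airports a mission touched; it does not recover the order of
    flight legs or how many legs were flown.
    """
    stacked = pd.concat([
        records[['MissionID', 'PULOC']].rename(columns={'PULOC': 'Airport'}),
        records[['MissionID', 'DropLoc']].rename(columns={'DropLoc': 'Airport'}),
    ])
    return {mission: sorted(set(group))
            for mission, group in stacked.groupby('MissionID')['Airport']}


# PULOC/DropLoc pairs for mission 47425, as listed in the grouped table
pairs = pd.DataFrame({
    'MissionID': [47425] * 5,
    'PULOC': ['KBFI', 'KBFI', 'KELP', 'KIWA', 'KIWA'],
    'DropLoc': ['KELP', 'KIWA', 'KIWA', 'KBFI', 'KELP'],
})
print(airports_per_mission(pairs)[47425])  # ['KBFI', 'KELP', 'KIWA']
```

Five passenger pairs among three airports say nothing by themselves about the sequence in which the aircraft visited them.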
Third, ICE agents’ entry of data is inconsistent. Many of the fields are unstandardized and present significant challenges for cleaning and analysis, especially the `Status`, `GangMember`, and `Convictions` fields, which include many unique and often irrelevant values. (This concern was also noted by the DHS OIG in its 2015 audit; in several cases, data entry processes seem to have become more standardized over time, which may obscure real trends in the data.)
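One way to handle such unstandardized entries is to count values and translate only those that occur frequently, once a standardized equivalent has been identified by hand. The sketch below applies the more-than-100-occurrences cutoff UWCHR used for the `Status` field; the toy values, the mapping `TOY_MAP`, and the helper `standardize_frequent` are hypothetical and do not reproduce the project's actual translations, which live in the cleaning code:

```python
import pandas as pd


def standardize_frequent(values: pd.Series, mapping: dict, min_count: int = 100) -> pd.Series:
    """Replace values occurring more than `min_count` times using `mapping`.

    Rarer values, and frequent values with no mapping entry, are left as-is.
    """
    counts = values.value_counts()
    frequent = set(counts[counts > min_count].index)
    to_replace = {raw: std for raw, std in mapping.items() if raw in frequent}
    return values.replace(to_replace)


# Hypothetical free-text entries and placeholder standardized codes
TOY_MAP = {'free text A': 'CODE_A', 'free text B': 'CODE_B'}
toy_status = pd.Series(['free text A'] * 101 + ['free text B'] * 5 + ['CODE_A'] * 3)

cleaned = standardize_frequent(toy_status, TOY_MAP)
print(cleaned.value_counts().to_dict())  # {'CODE_A': 104, 'free text B': 5}
```

Limiting translation to frequent values keeps manual review tractable, at the cost of leaving a long tail of rare values untouched.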
Several of these fields merit additional explanation:
- The `Status` field, despite being defined in the ARTS Data Dictionary as relating to "Criminal Status", appears to relate to the status of a passenger's deportation proceedings. Some values in this field conform to a set of 29 alphanumeric codes used by ICE to categorize the status of removal processes (see Kerwin et al., 2015, for a description of each of these codes); others consist of unstandardized text descriptions, including unrelated or irrelevant values. Analysis of the distribution of `Status` values prior to data cleaning shows that the 29 standardized codes were used very rarely prior to FY 2013; after FY 2013 the standardized values are more frequent. We have translated unstandardized values with more than 100 occurrences into their standardized equivalents, where possible; see the cleaning description above and the code in the `clean/` task. After cleaning, this field still contains more than 937 unique values.
- The `Convictions`, `Criminality`, and `Code` fields all represent different ways of coding a passenger's criminal status. `Convictions` is an unstructured field with 14709 unique values, presumably representing each passenger's most serious criminal conviction. The `Criminality` field is more structured and was easily cleaned into a binary category where "NC" represents passengers without a criminal conviction and "C" represents passengers with a criminal conviction. However, this field is not always consistent with the `Convictions` field, and it contains 21163 missing values, especially in the earlier period of the dataset. The `Code` field consists of a relatively structured set of alphabetic codes which also appear to relate to criminal status, but their meaning is unclear.
- The `Age` and `Juvenile` fields are relatively self-explanatory. `Juvenile` is a binary field in which all passengers aged 17 or younger are marked True; the values in this field are consistent with the numeric values in `Age`. Values below 0 or above 99 in `Age` were set to null during the cleaning process.
- The `FamilyUnitFlag` and `UnaccompaniedFlag` fields (presumably relating to family groups and unaccompanied minors) are entirely unused in this dataset: not a single record is flagged with either of these values. As noted above, there is no indication that these values have been redacted or withheld by ICE, suggesting that they are simply not used.

[1] The DHS OIG describes the ARTS database as follows: "ICE Air records data pertaining to charter flights and detainees in the Alien Repatriation Tracking System (ARTS). A contractor who managed the program’s daily activities created the ARTS database to provide a historical record of charter flights, monthly statistical reports, and data responses to ICE required under the contract. ARTS captures information such as the dates, routes, detainees, delays or cancelations, and costs associated with each charter flight."
[2] The 2015 audit by the DHS OIG defines an ICE Air mission: "A mission begins when the aircraft departs from its initial location and ends when the aircraft reaches its final destination. Depending on the need, a mission can contain one or more stops to destinations within the United States, internationally, or a combination." In a footnote, the DHS OIG specifies, "The definition for the term “mission” is derived from the definition used in the former Office of Detention and Removal Operations’ *Policy and Procedure Manual* (June 2008)." UWCHR researchers have not been able to locate this reference.