This notebook presents results of a preliminary statistical analysis of one week of Seattle Police Department (SPD) Automated License Plate Reader (ALPR) data, obtained via public records request by the University of Washington Center for Human Rights (UWCHR).
UWCHR is grateful to Jessica Godwin and the UW Center for Studies in Demography and Ecology (CSDE) for consulting and feedback on this project.
This dataset represents one week of data from ALPR devices employed by SPD. A detailed descriptive analysis of these data is available here. Each observation in the dataset represents one license plate read by a given ALPR device, with an associated zipcode derived from the original dataset’s address fields. Per comments from Jessica Godwin, we note that “zipcode” is not a policy-relevant geographic designation; future analysts are encouraged to explore alternative geographic units such as SPD precincts or sectors.
As noted in the descriptive analysis writeup, two ALPR devices representing approximately a quarter of total reads included no address information; these observations are dropped from this analysis. The analysis is restricted to zipcodes in the city of Seattle and excludes the zipcode “98195” (the University of Washington campus), for which no household income data are available and which is policed by the University of Washington Police Department.
We attempted to model the relationship between license plate read frequency by location and a small selection of socio-economic estimates obtained via the tidycensus R package. We group the data at the level of plate reads per zipcode by device and date. The resulting data, with 166 observations, approximately conform to a Poisson distribution:
Statistic | N | Mean | St. Dev. | Min | Max |
read_freq | 166 | 412.000 | 504.110 | 1 | 2,961 |
read_wday | 166 | 3.627 | 1.837 | 1 | 7 |
read_weekend | 166 | 0.217 | 0.413 | 0 | 1 |
spd_precinct | 166 | 0.205 | 0.405 | 0 | 1 |
median_incomeE | 166 | 96,436.240 | 24,397.000 | 60,955 | 148,878 |
total_popE | 166 | 32,588.190 | 16,087.100 | 808 | 53,729 |
hispE | 166 | 2,348.699 | 1,271.048 | 79 | 4,667 |
whiteE | 166 | 19,901.240 | 12,201.840 | 479 | 40,136 |
blackE | 166 | 2,158.373 | 2,163.552 | 86 | 9,767 |
nativeE | 166 | 117.651 | 113.283 | 0 | 462 |
asianE | 166 | 5,782.873 | 2,985.629 | 101 | 13,039 |
pacislE | 166 | 89.705 | 91.909 | 0 | 351 |
otherE | 166 | 134.277 | 122.073 | 0 | 526 |
Visual inspection of the bivariate relationship between plate read frequency (read_freq) and estimated median household income per zipcode (median_incomeE) does not show a strong trend.
Models with zipcode and device modeled as random effects find a significant negative association between plate read frequency (read_freq) and median household income (median_incomeE) when including an offset for total population per zipcode. We also control for whether the reads took place on a weekend or in a zipcode with an SPD precinct office. Predictor variables are scaled per the requirements of the model. (Results are similar when modeling zipcode and device as dummy variables.)
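Written out, the model described above has roughly the following form. This is a reading of the description as a standard random-intercept Poisson model; the index notation (i for zipcode, j for device, t for date) and the symbols are assumptions introduced here, not taken from the original analysis.

```latex
\begin{aligned}
\text{read\_freq}_{ijt} &\sim \mathrm{Poisson}(\mu_{ijt}), \\
\log \mu_{ijt} &= \log(\text{total\_popE}_{i}) + \beta_0
                 + \mathbf{x}_{ijt}^{\top}\boldsymbol{\beta} + u_i + v_j, \\
u_i &\sim \mathcal{N}(0, \sigma_u^2), \qquad v_j \sim \mathcal{N}(0, \sigma_v^2).
\end{aligned}
```

Because the log-population offset enters with a fixed coefficient of 1, the linear predictor describes the log of reads per resident. With predictors scaled to unit standard deviation, each fixed-effect coefficient is then the change in log reads per resident associated with a one-standard-deviation increase in that predictor.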
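library(lme4)       # glmer()
library(stargazer)  # regression table output

# Center and scale the predictor columns named in `predictors`.
# total_popE is logged for the offset below, so it is assumed not to be among them.
reads_per_zip_device_day[, predictors] <- scale(reads_per_zip_device_day[, predictors])

# Poisson mixed model: random intercepts for zipcode and device, log-population offset.
mod3 <- glmer(read_freq ~ (1 | zipcode) + (1 | device) + read_weekend + median_incomeE + hispE + asianE + blackE + multiE + otherE + nativeE + pacislE + spd_precinct,
  offset = log(reads_per_zip_device_day$total_popE),
  data = reads_per_zip_device_day,
  family = poisson
)
stargazer(mod3, type = 'html')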
Dependent variable: read_freq
Term | Estimate | Std. Error |
read_weekend | -0.360*** | 0.012 |
median_incomeE | -0.831*** | 0.279 |
hispE | -0.590 | 0.543 |
asianE | 0.223 | 0.328 |
blackE | 0.105 | 0.445 |
multiE | 0.632 | 0.517 |
otherE | -0.045 | 0.405 |
nativeE | -0.781*** | 0.270 |
pacislE | -0.295 | 0.449 |
spd_precinct | -0.329 | 0.625 |
Constant | -4.524*** | 0.427 |
Observations | 166 | |
Log Likelihood | -29,225.520 | |
Akaike Inf. Crit. | 58,477.040 | |
Bayesian Inf. Crit. | 58,517.490 | |
Note: *p<0.1; **p<0.05; ***p<0.01
However, the model shows evidence of a high degree of overdispersion. Post-fit quasilikelihood estimation (see: https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#fitting-models-with-overdispersion) removes the significance of median_incomeE:
## chisq ratio rdf p
## 61266.9516 400.4376 153.0000 0.0000
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.5244 8.5495 -0.53 0.60
## read_weekendTRUE -0.3603 0.2334 -1.54 0.12
## median_incomeE -0.8313 5.5898 -0.15 0.88
## hispE -0.5898 10.8701 -0.05 0.96
## asianE 0.2226 6.5710 0.03 0.97
## blackE 0.1047 8.9095 0.01 0.99
## multiE 0.6317 10.3401 0.06 0.95
## otherE -0.0449 8.1099 -0.01 1.00
## nativeE -0.7813 5.4063 -0.14 0.89
## pacislE -0.2950 8.9801 -0.03 0.97
## spd_precinctTRUE -0.3294 12.5101 -0.03 0.98
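The arithmetic behind this adjustment can be reproduced from the numbers reported above. The sketch below assumes the adjustment follows the approach in the linked FAQ: standard errors are multiplied by the square root of the overdispersion ratio (Pearson chi-square divided by residual degrees of freedom). The variable names are illustrative, and the inputs are the rounded values from the regression table, so the results match the adjusted output only approximately.

```r
# Overdispersion statistics copied from the output above.
chisq <- 61266.9516
rdf   <- 153
phi   <- chisq / rdf  # overdispersion ratio

# Estimates and unadjusted standard errors for the three terms that were
# significant before adjustment (rounded values from the regression table).
est <- c(read_weekend = -0.360, median_incomeE = -0.831, nativeE = -0.781)
se  <- c(read_weekend = 0.012,  median_incomeE = 0.279,  nativeE = 0.270)

# Inflate standard errors by sqrt(phi), then recompute z and two-sided p-values.
adj_se <- se * sqrt(phi)
z      <- est / adj_se
p      <- 2 * pnorm(abs(z), lower.tail = FALSE)

print(round(cbind(Estimate = est, `Std. Error` = adj_se,
                  `z value` = z, `Pr(>|z|)` = p), 2))
```

With a ratio near 400, every standard error grows by a factor of about 20, which is why none of the three previously significant terms remains significant.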
Alternative models suffer from the same overdispersion, and their results are not robust to post-fit adjustment. While most models found a negative relationship between plate read frequency and median household income, results were not entirely consistent. No models found a consistent association between plate read frequency and race/ethnicity indicators.
Given these challenges and the large proportion of missing address data in the original dataset, we do not recommend drawing strong conclusions from this preliminary analysis.