Introduction

This notebook presents results of a preliminary statistical analysis of one week of Seattle Police Department (SPD) Automated License Plate Reader (ALPR) data, obtained via public records request by the University of Washington Center for Human Rights (UWCHR).

UWCHR is grateful to Jessica Godwin and the UW Center for Studies in Demography and Ecology (CSDE) for consultation and feedback on this project.

About the data

This dataset represents one week of data from ALPR devices employed by SPD. Detailed descriptive analysis of these data is available here. Each observation in the dataset represents one license plate read by a given ALPR device, with an associated zipcode derived from the original dataset’s address fields. Per comments from Jessica Godwin, we note that “zipcode” is not a policy-relevant geographic designation; future analysts are encouraged to explore alternative geographic units such as SPD precincts or sectors.

As noted in the descriptive analysis writeup, two ALPR devices representing approximately a quarter of total reads included no address information; these observations are dropped from this analysis. The analysis is restricted to zipcodes in the city of Seattle and excludes the zipcode “98195” (the University of Washington campus), for which no household income data are available and which is policed by the University of Washington Police Department.
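The filtering described above can be sketched in base R as follows. This is only an illustration on made-up rows: the column names (`device`, `zipcode`), the object names, and the short `seattle_zips` list are assumptions, not the notebook's actual code or data.

```r
# Toy rows with made-up values; column names are illustrative assumptions.
reads <- data.frame(
  device  = c("dev1", "dev2", "dev3", "dev1", "dev4"),
  zipcode = c("98101", NA, "98195", "98052", "98104"),
  stringsAsFactors = FALSE
)

# Hypothetical, deliberately incomplete list of Seattle zipcodes.
seattle_zips <- c("98101", "98104", "98195")

has_address <- !is.na(reads$zipcode)           # drop reads with no address-derived zipcode
in_seattle  <- reads$zipcode %in% seattle_zips # keep Seattle zipcodes only
not_uw      <- !(reads$zipcode %in% "98195")   # exclude the UW campus zipcode

reads_filtered <- reads[has_address & in_seattle & not_uw, ]
print(reads_filtered)
```

Using `%in%` rather than `!=` for the campus exclusion avoids `NA` results for rows with a missing zipcode.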

Analysis

We attempted to model the relationship between license plate read frequency by location and a small selection of socio-economic estimates obtained via the tidycensus R package. We group the data at the level of plate reads per zipcode by device and date. The resulting dataset, with 166 observations, approximately conforms to a Poisson distribution:

Summary statistics

Statistic          N        Mean    St. Dev.     Min      Max
read_freq        166     412.000     504.110       1    2,961
read_wday        166       3.627       1.837       1        7
read_weekend     166       0.217       0.413       0        1
spd_precinct     166       0.205       0.405       0        1
median_incomeE   166  96,436.240  24,397.000  60,955  148,878
total_popE       166  32,588.190  16,087.100     808   53,729
hispE            166   2,348.699   1,271.048      79    4,667
whiteE           166  19,901.240  12,201.840     479   40,136
blackE           166   2,158.373   2,163.552      86    9,767
nativeE          166     117.651     113.283       0      462
asianE           166   5,782.873   2,985.629     101   13,039
pacislE          166      89.705      91.909       0      351
otherE           166     134.277     122.073       0      526


Visual inspection of the bivariate relationship between plate read frequency (read_freq) and estimated median household income per zipcode (median_incomeE) does not show a strong trend:

Models with zipcode and device modeled as random effects find a statistically significant negative association between plate read frequency (read_freq) and median household income (median_incomeE) when an offset for total population per zipcode is included. We also control for whether the reads took place on a weekend and whether the zipcode contains an SPD precinct office. Predictor variables are scaled per the requirements of the model. (Results are similar when modeling zipcode and device as dummy variables.)
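
For reference, a Poisson model with random intercepts for zipcode and device and a log-population offset can be written out as below. The notation is ours and is meant only as a reading aid: y is read_freq for zipcode z, device k, and day d; P_z is total_popE; x collects the fixed-effect predictors.

```latex
\begin{align*}
y_{zkd} &\sim \mathrm{Poisson}(\mu_{zkd}) \\
\log \mu_{zkd} &= \log(P_z) + \beta_0 + \mathbf{x}_{zkd}^{\top}\boldsymbol{\beta} + a_z + b_k \\
a_z &\sim \mathcal{N}(0, \sigma_a^2), \qquad b_k \sim \mathcal{N}(0, \sigma_b^2)
\end{align*}
```

Because the offset enters with its coefficient fixed at 1, the remaining terms describe the expected number of reads per resident rather than the raw count.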

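library(lme4)       # provides glmer()
library(stargazer)  # formats the model summary as a table

# Standardize (center and scale) the numeric predictors listed in `predictors`
reads_per_zip_device_day[, predictors] <- scale(reads_per_zip_device_day[, predictors])

# Poisson mixed model: random intercepts for zipcode and device,
# with log(total population) as an offset
mod3 <- glmer(read_freq ~ (1 | zipcode) + (1 | device) + read_weekend + median_incomeE + hispE + asianE + blackE + multiE + otherE + nativeE + pacislE + spd_precinct,
              offset = log(reads_per_zip_device_day$total_popE),
              data = reads_per_zip_device_day,
              family = poisson)

stargazer(mod3, type = 'html')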
Dependent variable: read_freq

                     Estimate   (Std. Error)
read_weekend         -0.360***  (0.012)
median_incomeE       -0.831***  (0.279)
hispE                -0.590     (0.543)
asianE                0.223     (0.328)
blackE                0.105     (0.445)
multiE                0.632     (0.517)
otherE               -0.045     (0.405)
nativeE              -0.781***  (0.270)
pacislE              -0.295     (0.449)
spd_precinct         -0.329     (0.625)
Constant             -4.524***  (0.427)

Observations                 166
Log Likelihood       -29,225.520
Akaike Inf. Crit.     58,477.040
Bayesian Inf. Crit.   58,517.490

Note: *p<0.1; **p<0.05; ***p<0.01


However, the model shows evidence of a high degree of overdispersion. After post-fit quasi-likelihood adjustment (see: https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#fitting-models-with-overdispersion), median_incomeE is no longer statistically significant:

##      chisq      ratio        rdf          p 
## 61266.9516   400.4376   153.0000     0.0000
##                  Estimate Std. Error z value Pr(>|z|)
## (Intercept)       -4.5244     8.5495   -0.53     0.60
## read_weekendTRUE  -0.3603     0.2334   -1.54     0.12
## median_incomeE    -0.8313     5.5898   -0.15     0.88
## hispE             -0.5898    10.8701   -0.05     0.96
## asianE             0.2226     6.5710    0.03     0.97
## blackE             0.1047     8.9095    0.01     0.99
## multiE             0.6317    10.3401    0.06     0.95
## otherE            -0.0449     8.1099   -0.01     1.00
## nativeE           -0.7813     5.4063   -0.14     0.89
## pacislE           -0.2950     8.9801   -0.03     0.97
## spd_precinctTRUE  -0.3294    12.5101   -0.03     0.98
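
The adjustment above can be checked by hand. The sketch below assumes the usual quasi-likelihood approach of inflating each standard error by the square root of the dispersion ratio (Pearson chi-square divided by residual degrees of freedom). All numbers are copied from the outputs in this notebook; the variable names are ours, and the unadjusted standard error is the rounded value from the model table, so the result matches the adjusted table only approximately.

```r
# Values copied from the outputs in this notebook.
pearson_chisq <- 61266.9516   # chisq
resid_df      <- 153          # rdf
estimate      <- -0.8313      # median_incomeE coefficient
naive_se      <- 0.279        # unadjusted standard error (rounded)

# Dispersion ratio: about 1 for a well-behaved Poisson model.
phi <- pearson_chisq / resid_df

# Inflate the standard error and recompute the Wald z statistic and p-value.
adj_se <- naive_se * sqrt(phi)
adj_z  <- estimate / adj_se
adj_p  <- 2 * pnorm(abs(adj_z), lower.tail = FALSE)

print(round(c(ratio = phi, se = adj_se, z = adj_z, p = adj_p), 2))
```

With a dispersion ratio of roughly 400, every standard error grows by a factor of about 20, which is why none of the coefficients remain significant after adjustment.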

Alternative models suffer from the same overdispersion, and their results are not robust to post-fit adjustment. While most models found a negative relationship between plate read frequency and median household income, results were not entirely consistent. No model found a consistent association between plate read frequency and race/ethnicity indicators.

Given these challenges and the large proportion of missing address data in the original dataset, we do not recommend drawing strong conclusions from this preliminary analysis.