Introduction

This notebook presents results of a preliminary statistical analysis of one week of Seattle Police Department (SPD) Automated License Plate Reader (ALPR) data, obtained via public records request by the University of Washington Center for Human Rights (UWCHR).

UWCHR is grateful to Jessica Godwin and the UW Center for Studies in Demography and Ecology (CSDE) for consultation and feedback on this project.

About the data

This dataset represents one week of data from ALPR devices employed by SPD. A detailed descriptive analysis of these data is available in a separate writeup. Each observation in the dataset represents one license plate read by a given ALPR device, with an associated zipcode derived from the original dataset’s address fields. Per comments from Jessica Godwin, we note that “zipcode” is not a policy-relevant geographic designation; future analysts are encouraged to explore alternative geographic units such as SPD precincts or sectors.

As noted in the descriptive analysis writeup, two ALPR devices representing approximately a quarter of total reads included no address information; these observations are dropped from this analysis. The analysis is restricted to zipcodes in the city of Seattle and excludes the zipcode “98195” (the University of Washington campus), for which no household income data are available and which is policed by the University of Washington Police Department.

Analysis

We attempted to model the relationship between license plate read frequency by location and a small selection of socio-economic estimates obtained via the tidycensus R package. We group the data at the level of plate reads per zipcode by device and date. The resulting dataset, with 166 observations, approximately conforms to a Poisson distribution:
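The grouping step described above can be sketched in base R. This is only an illustration on toy data: the `reads` data frame, its column names, and the dates are hypothetical stand-ins, not the notebook's actual objects.

```r
# Toy stand-in for the raw reads; one row per plate read.
# Column names and values are hypothetical, not taken from the SPD data.
reads <- data.frame(
  zipcode = c("98101", "98101", "98101", "98104", "98104"),
  device  = c("A", "A", "B", "A", "A"),
  date    = as.Date(c("2021-01-04", "2021-01-04", "2021-01-04",
                      "2021-01-04", "2021-01-09"))
)

# Count reads in each zipcode x device x date cell.
reads_per_cell <- aggregate(
  data.frame(read_freq = rep(1L, nrow(reads))),
  by = reads[c("zipcode", "device", "date")],
  FUN = sum
)

# Flag weekend days (ISO weekday 6 = Saturday, 7 = Sunday).
reads_per_cell$read_weekend <- format(reads_per_cell$date, "%u") %in% c("6", "7")

print(reads_per_cell)
```

Each row of the result corresponds to one observation in the sense used in this analysis: the number of reads for a given zipcode, device, and date.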

Summary statistics


Statistic	N	Mean	St. Dev.	Min	Max
read_freq	166	412.000	504.110	1	2,961
read_wday	166	3.627	1.837	1	7
read_weekend	166	0.217	0.413	0	1
spd_precinct	166	0.205	0.405	0	1
median_incomeE	166	96,436.240	24,397.000	60,955	148,878
total_popE	166	32,588.190	16,087.100	808	53,729
hispE	166	2,348.699	1,271.048	79	4,667
whiteE	166	19,901.240	12,201.840	479	40,136
blackE	166	2,158.373	2,163.552	86	9,767
nativeE	166	117.651	113.283	0	462
asianE	166	5,782.873	2,985.629	101	13,039
pacislE	166	89.705	91.909	0	351
otherE	166	134.277	122.073	0	526

Visual inspection of the bivariate relationship between plate read frequency (read_freq) and estimated median household income per zipcode (median_incomeE) does not show a strong trend.

Models with zipcode and device modeled as random effects find a significant negative association between plate read frequency (read_freq) and median household income (median_incomeE) when including an offset for total population per zipcode. We also control for whether the reads took place on a weekend or in a zipcode with an SPD precinct office. Predictor variables are scaled per the requirements of the model. (Results are similar when modeling zipcode and device as dummy variables.)
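Written out, the model described above is a Poisson mixed model with a log link. The notation is ours, introduced only for illustration: z indexes zipcodes, d devices, and t dates; x collects the fixed-effect predictors; pop is the zipcode's total population estimate.

```latex
\begin{aligned}
y_{zdt} &\sim \mathrm{Poisson}(\mu_{zdt}) \\
\log \mu_{zdt} &= \log(\mathrm{pop}_z) + \beta_0 + \mathbf{x}_{zdt}^{\top}\boldsymbol{\beta} + u_z + v_d \\
u_z &\sim \mathcal{N}(0, \sigma_u^2), \qquad v_d \sim \mathcal{N}(0, \sigma_v^2)
\end{aligned}
```

Because the offset enters with a fixed coefficient of 1, the fixed effects describe multiplicative changes in expected reads per resident rather than in raw read counts.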

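library(lme4)       # glmer()
library(stargazer)  # regression table output

# Standardize the continuous predictors (`predictors` is defined elsewhere in the notebook)
reads_per_zip_device_day[, predictors] <- scale(reads_per_zip_device_day[, predictors])

# Poisson mixed model with random intercepts for zipcode and device;
# log(total population) enters as an offset
mod3 <- glmer(read_freq ~ (1 | zipcode) + (1 | device) + read_weekend + median_incomeE +
                hispE + asianE + blackE + multiE + otherE + nativeE + pacislE + spd_precinct,
              offset = log(reads_per_zip_device_day$total_popE),
              data = reads_per_zip_device_day,
              family = poisson)

stargazer(mod3, type = 'html')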


Dependent variable: read_freq

Term	Estimate	(Std. Error)
read_weekend	-0.360***	(0.012)
median_incomeE	-0.831***	(0.279)
hispE	-0.590	(0.543)
asianE	0.223	(0.328)
blackE	0.105	(0.445)
multiE	0.632	(0.517)
otherE	-0.045	(0.405)
nativeE	-0.781***	(0.270)
pacislE	-0.295	(0.449)
spd_precinct	-0.329	(0.625)
Constant	-4.524***	(0.427)

Observations	166
Log Likelihood	-29,225.520
Akaike Inf. Crit.	58,477.040
Bayesian Inf. Crit.	58,517.490

Note:	*p<0.1; **p<0.05; ***p<0.01
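Because the model uses a log link, exponentiating a coefficient gives a rate ratio. As a rough reading aid only, and taking the unadjusted estimates at face value, the two highly significant coefficients of most interest convert as follows. For median_incomeE the ratio applies per standard deviation of the scaled predictor (about $24,397 according to the summary statistics, assuming scaling used the same 166 rows).

```r
# Coefficients copied from the table above (log scale).
coefs <- c(read_weekend = -0.360, median_incomeE = -0.831)

# Rate ratios: multiplicative change in expected reads per resident.
rate_ratios <- exp(coefs)
print(round(rate_ratios, 2))
# read_weekend median_incomeE
#         0.70           0.44
```

Read this way, the unadjusted model implies roughly 30% fewer reads on weekends and roughly 56% fewer reads per one-standard-deviation increase in median household income, before any correction for overdispersion.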

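However, the model shows evidence of a high degree of overdispersion: the ratio of the Pearson chi-square statistic to the residual degrees of freedom is roughly 400, far above the value of about 1 expected under a well-specified Poisson model. Post-fit quasi-likelihood estimation (see: https://bbolker.github.io/mixedmodels-misc/glmmFAQ.html#fitting-models-with-overdispersion) removes the significance of median_incomeE and of every other predictor: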

##      chisq      ratio        rdf          p 
## 61266.9516   400.4376   153.0000     0.0000
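The figures in this output are related by simple arithmetic. Assuming the adjustment follows the approach in the linked FAQ, each standard error is inflated by the square root of the dispersion ratio:

```latex
\begin{aligned}
\hat{\phi} &= \frac{\chi^2_{\text{Pearson}}}{\text{rdf}} = \frac{61266.9516}{153} \approx 400.44 \\
\mathrm{SE}_{\text{adj}} &= \mathrm{SE} \cdot \sqrt{\hat{\phi}} \approx 20.01 \times \mathrm{SE}
\end{aligned}
```

For median_incomeE, for example, 0.279 × 20.01 ≈ 5.58, which matches the adjusted standard error of 5.5898 up to rounding, and the z value becomes -0.8313 / 5.5898 ≈ -0.15.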

##                  Estimate Std. Error z value Pr(>|z|)
## (Intercept)       -4.5244     8.5495   -0.53     0.60
## read_weekendTRUE  -0.3603     0.2334   -1.54     0.12
## median_incomeE    -0.8313     5.5898   -0.15     0.88
## hispE             -0.5898    10.8701   -0.05     0.96
## asianE             0.2226     6.5710    0.03     0.97
## blackE             0.1047     8.9095    0.01     0.99
## multiE             0.6317    10.3401    0.06     0.95
## otherE            -0.0449     8.1099   -0.01     1.00
## nativeE           -0.7813     5.4063   -0.14     0.89
## pacislE           -0.2950     8.9801   -0.03     0.97
## spd_precinctTRUE  -0.3294    12.5101   -0.03     0.98
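The notebook's code for this step is not shown, so the following is only a minimal sketch of the general procedure, modeled on the approach in the linked FAQ. It uses simulated counts and a plain `glm` as a stand-in for the mixed model; the helper name `overdisp` and all data here are illustrative assumptions.

```r
# Simulated, deliberately overdispersed counts (not the ALPR data).
set.seed(1)
x <- rnorm(200)
y <- rnbinom(200, mu = exp(1 + 0.3 * x), size = 0.5)

# Plain Poisson GLM as a stand-in for the mixed model.
fit <- glm(y ~ x, family = poisson)

# Pearson chi-square over residual degrees of freedom; a ratio near 1
# is what a well-specified Poisson model would give.
overdisp <- function(model) {
  rdf <- df.residual(model)
  chisq <- sum(residuals(model, type = "pearson")^2)
  c(chisq = chisq, ratio = chisq / rdf, rdf = rdf,
    p = pchisq(chisq, df = rdf, lower.tail = FALSE))
}
od <- overdisp(fit)

# Quasi-likelihood adjustment: inflate standard errors by sqrt(ratio),
# then recompute z statistics and two-sided p-values.
phi <- unname(od["ratio"])
adj <- coef(summary(fit))
adj[, "Std. Error"] <- adj[, "Std. Error"] * sqrt(phi)
adj[, "z value"] <- adj[, "Estimate"] / adj[, "Std. Error"]
adj[, "Pr(>|z|)"] <- 2 * pnorm(abs(adj[, "z value"]), lower.tail = FALSE)

print(round(od, 4))
print(round(adj, 4))
```

Point estimates are unchanged by this adjustment; only the standard errors, z values, and p-values move, which is why the coefficients in the adjusted table above equal those in the original model output.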

Alternative models suffer from the same overdispersion, and their results are likewise not robust to post-fit adjustment. While most models found a negative relationship between plate read frequency and median household income, results were not entirely consistent. No model found a consistent association between plate read frequency and race/ethnicity indicators.

Given these challenges and the large proportion of missing address data in the original dataset, we do not recommend drawing strong conclusions from this preliminary analysis.

SPD ALPR Analysis

Destiny Moreno

Phil Neff

22 May, 2023
