Beyond the Numbers: Uncovering Bias in CMPD Traffic Stops
By Alex Negron ‘27
Introduction
Measuring racial disparities in criminal justice policy is a challenge across the country. In the United States, the traffic stop is the most common point of contact between citizens and the police, and the data these stops generate can reveal evidence of bias within the policing system. In Charlotte, North Carolina, there are growing concerns about racial disparities in the Charlotte-Mecklenburg Police Department (CMPD) and about whether the department enforces the law evenly across racial groups. This article examines those concerns using data from CMPD traffic stops to evaluate whether racial disparities exist.
This analysis uses publicly available data from the Charlotte Open Data Portal and applies both statistical and machine learning techniques to look for bias. The dataset contains detailed records of traffic stops dating back to 2016, covering hundreds of thousands of observations. Before analysis, the data were cleaned in Python to remove missing or inconsistent entries and to merge low-frequency racial groups into broader categories. A Random Forest classifier was then fit to model the outcomes of traffic stops, and a z-test for proportion differences was used to determine whether the findings were statistically significant. Combining methods from data science and statistics yields an analysis that is both predictive and inferential, helping us understand how race correlates with enforcement outcomes.
Data Source
The data come from the Charlotte Open Data Portal on the City of Charlotte's website. The dataset covers hundreds of thousands of traffic stops dating back to 2016; the variables used in this study describe driver characteristics, stop context, and stop outcomes.
Preprocessing followed a standard, widely used pipeline for studies of this kind. The data were first cleaned and standardized in Python: rows with missing or inconsistent entries were removed, and low-frequency race categories were merged into an "Other/Unknown" category. Categorical fields were then encoded and numerical fields normalized. Sensitive identifiers were excluded to ensure privacy and confidentiality. This left a dataset of roughly 221,000 valid records. CMPD's transparency with the residents of Charlotte makes it possible to conduct this study with reproducible methods.
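The cleaning steps above can be sketched in pandas. This is a minimal illustration, not the study's actual code: the column names and the rarity threshold are assumptions, since the CMPD schema and the study's exact cutoff are not specified here.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, rare_threshold: int = 1000) -> pd.DataFrame:
    """Clean the stops data: drop incomplete rows, merge rare race
    categories, and encode categorical fields as integer codes."""
    # Column names are illustrative, not CMPD's actual schema.
    needed = ["Driver_Race", "Reason_for_Stop", "Result_of_Stop"]

    # Remove rows with missing entries in the columns the model relies on.
    df = df.dropna(subset=needed).copy()

    # Merge low-frequency race categories into "Other/Unknown"
    # (the threshold is an assumption, not the study's actual cutoff).
    counts = df["Driver_Race"].value_counts()
    rare = counts[counts < rare_threshold].index
    df.loc[df["Driver_Race"].isin(rare), "Driver_Race"] = "Other/Unknown"

    # Encode categorical fields as integer codes for the classifier.
    for col in needed:
        df[col + "_code"] = df[col].astype("category").cat.codes
    return df
```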
Methodology
This study combined machine learning and statistical testing to assess potential racial disparities. There were two guiding questions when conducting this study:
1. Can data predict how a traffic stop ends?
2. Do outcomes differ by race more than expected by chance?
A Random Forest classifier was used to predict stop outcomes from the input features. The method builds hundreds of decision trees and aggregates their results, which allows it to learn complex relationships while reducing random error. The encoded numerical features produced during preprocessing served as the model's inputs.
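A minimal sketch of this modeling step with scikit-learn follows. The feature matrix here is synthetic stand-in data, and the tree count and split parameters are assumptions; the study's actual hyperparameters are not reported.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the encoded stop features and outcome labels
# (the real study uses the cleaned CMPD records).
rng = np.random.default_rng(0)
X = rng.integers(0, 5, size=(2000, 4))  # encoded categorical features
y = rng.integers(0, 4, size=2000)       # four outcome classes

# Hold out a test portion to check accuracy on unseen stops.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# A forest of several hundred trees; each tree votes and the majority
# class becomes the prediction.
model = RandomForestClassifier(n_estimators=300, random_state=42)
model.fit(X_train, y_train)
acc = model.score(X_test, y_test)
```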
The model was then tested against a portion of historical data held out from the cleaned dataset. This was done to measure predictive accuracy and to guard against over- or underfitting. A z-test for proportions was used to assess the statistical significance of the observed disparities; a p-value below 0.05 was treated as significant, meaning the differences would be unlikely to arise by chance alone and providing stronger evidence that the findings were meaningful rather than random.
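The z-test for proportions can be written out directly. The sketch below implements the standard two-proportion z-statistic with a pooled standard error; the counts fed to it in practice would be each group's severe-outcome tallies, which are not reproduced here.

```python
import math

def two_proportion_ztest(x1: int, n1: int, x2: int, n2: int):
    """Two-sided z-test for the difference between two proportions,
    using the pooled-proportion standard error."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value
```

For example, comparing an 8.2% rate against a 3.5% rate in two samples of 1,000 stops each yields a large positive z and a p-value far below 0.05.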
By combining machine learning and statistical methods, we can further understand the disparities and bias that may happen at traffic stops in Charlotte.
Findings
This study found significant disparities in traffic-stop outcomes across Charlotte. The Random Forest model reached 47% accuracy, which may seem low but is typical for real-world social data. Predictions were more accurate for common outcomes such as verbal warnings and citations; rarer outcomes such as arrests or written warnings are harder to predict, since they usually involve situational factors that go unrecorded.
The predicted enforcement rates, defined as the share of stops ending in an outcome more severe than a simple warning, are listed below:
Black drivers: ~8.2%
White drivers: ~3.5%
Asian and Other/Unknown: ~2.5%
These figures show that Black drivers are more than twice as likely as White drivers to face a severe outcome at a traffic stop. The disparity suggests that enforcement outcomes may not be evenly applied across racial groups, raising important questions about consistency and fairness in policing.
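Group rates like those above amount to a grouped mean over an indicator column. The sketch below shows the computation on a tiny made-up predictions table; the column names, outcome labels, and values are all illustrative, not the study's data.

```python
import pandas as pd

# Illustrative predictions table: one row per stop, with the driver's
# race and the model's predicted outcome (values invented for the sketch).
pred = pd.DataFrame({
    "Driver_Race": ["Black", "Black", "White", "White", "Other/Unknown"],
    "Predicted_Outcome": ["Arrest", "Verbal Warning", "Citation",
                          "Verbal Warning", "Verbal Warning"],
})

# Outcomes counted as "more severe than a simple warning".
severe = {"Citation", "Arrest"}
pred["is_severe"] = pred["Predicted_Outcome"].isin(severe)

# Share of stops per group predicted to end in a severe outcome.
rates = pred.groupby("Driver_Race")["is_severe"].mean()
```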
Below are the Z-test results:
Black drivers: z=47.43, p < 0.001
White drivers: z=-39.60, p < 0.001
Asian and Other/Unknown: p < 0.001
The z-test results show that the study's findings are statistically significant and extremely unlikely to be due to chance, suggesting that bias does exist within Charlotte's policing system. These findings align with broader national studies, which report consistent disparities in traffic stops.
Combining the two methods yields both predictive and inferential insight: the Random Forest uncovers complex relationships, while the z-test establishes the statistical significance of the disparities it surfaces. This structure mirrors studies such as the Stanford Open Policing Project, and together the methods produce relevant, accurate results that can benefit the City of Charlotte.
Conclusion
The results of this analysis show a clear racial disparity in CMPD traffic stops. Black drivers are predicted to experience enforcement actions at more than twice the rate of White drivers, and the p-value for each group was below 0.001, so the findings are statistically significant. Because the disparities are consistent across multiple groups, they are unlikely to have occurred by chance, which suggests that systemic factors influence how traffic stops end in Charlotte.
A Random Forest analysis cannot assign blame to individual officers. However, its predictive power and the accompanying statistical results can reveal structural patterns that merit serious consideration. The findings highlight the importance of transparency and equitable training programs within Charlotte's policing system; implementing these recommendations would help ensure that enforcement is applied fairly across all racial groups.
More broadly, this study demonstrates how open data and analytical methods contribute to accountability in law enforcement. Combining data science with public policy analysis helps promote fairness in policing, and continued access to open data remains essential to achieving a more equitable criminal justice system in Charlotte and beyond.
Alex Negron is a junior majoring in data science.
Sources
Groulx, C., & Fernandes, A. (2021). Racial bias in traffic stops: The City of Charlotte. University of North Carolina at Charlotte Undergraduate Research Journal, 1(1), 26–48. https://journals.charlotte.edu/index.php/urj
City of Charlotte. (n.d.). Officer Traffic Stops [Data set]. Charlotte Open Data Portal. https://data.charlottenc.gov/datasets/officer-traffic-stops
Stanford Open Policing Project. (2023). Findings – The Stanford Open Policing Project. https://openpolicing.stanford.edu/findings/