I began my journey with the two CSV files, ‘fatal-police-shootings-data’ and ‘fatal-police-shootings-agencies,’ by bringing them into Jupyter Notebook. Here’s a brief account of the starting steps and issues I came across:
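ID: The ID serves as a distinctive identifier for each fatal police shooting, enabling us to uniquely reference and monitor individual occurrences. IDs range from 3 to 8696, and the dataset documents 8002 incidents with no missing ID values. Because that range spans 8694 possible values, the IDs are not consecutive: at least 692 numbers in the range are unused. The summary statistics alone do not confirm that every ID is unique, so duplicates are worth checking separately.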
Date: The date column records the date and time of each fatal police shooting incident, covering the period from January 2, 2015, to December 1, 2022. The mean date, which falls around January 12, 2019, indicates the central tendency of the incident dates. Approximately 25% of the incidents took place before January 18, 2017, and roughly 75% before January 21, 2021.
Age: The age column denotes the age of each victim at the time of the fatal police shooting. Victim ages in the dataset range from 2 to 92 years, with a mean of 37.209. The 25th and 75th percentiles shed light on the age distribution: 25% of victims were 27 years old or younger, and 75% were 45 years old or younger. The standard deviation, approximately 12.979, reflects the variability in victim ages.
Longitude: This column contains longitude coordinates of the locations where fatal police shootings occurred. The longitude values span a wide range, from around -160.007 to -67.867. The mean longitude, approximately -97.041, represents the central location. Approximately 25% of incidents occurred to the west of -112.028, and roughly 75% to the west of -83.152. The standard deviation, around 16.525, indicates the dispersion of incident locations along the longitude axis.
Latitude: This column indicates the latitude coordinates of the locations where fatal police shootings occurred. Latitude values vary from approximately 19.498 to 71.301. The mean latitude, around 36.676, represents the central location. Approximately 25% of incidents occurred to the south of 33.480, and roughly 75% to the south of 40.027. The standard deviation, about 5.380, reflects the dispersion of incident locations along the latitude axis.
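The column summaries above can be reproduced with a few lines of code. Below is a minimal sketch using only Python's standard library. The function names `id_report` and `numeric_summary` are hypothetical helpers of my own, and the toy values are made up for illustration rather than taken from the dataset. The `inclusive` quantile method interpolates linearly between data points, which I would expect to line up with the quartiles most notebook tools report by default, but treat that as an assumption to verify.

```python
import statistics


def id_report(ids):
    """Summarize an ID column: count, range, unused numbers, duplicates."""
    unique = set(ids)
    lo, hi = min(unique), max(unique)
    span = hi - lo + 1  # how many IDs the range could hold if consecutive
    return {
        "count": len(ids),
        "min": lo,
        "max": hi,
        "span": span,
        "unused_in_range": span - len(unique),
        "duplicates": len(ids) - len(unique),
    }


def numeric_summary(values):
    """Min, max, mean, sample standard deviation, and quartiles of a column."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "25%": q1,
        "50%": median,
        "75%": q3,
    }


# Made-up toy values, NOT rows from the real dataset.
toy_ids = [3, 4, 5, 8, 9, 11]
toy_ages = [23, 27, 35, 41, 45, 60]

print(id_report(toy_ids))
print(numeric_summary(toy_ages))
```

Applied to the real ID column, a nonzero `unused_in_range` would confirm that the IDs are not consecutive, and a nonzero `duplicates` would flag repeated records.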