Data Mining Project: Boston Crime Data


Background

The objective of the project is to analyze Boston Police crime data for actionable trends, including crime patterns by geographical area; time of day, week, or year; and type of crime. Sharing these trends may help law enforcement and municipal managers make management decisions, such as those about education and staffing.


The Data

The data set focuses on crimes in Boston between 2015 and 2018 and was obtained from:

https://www.kaggle.com/ankkur13/boston-crime-data#crime.csv

It contains 327,820 recorded crime entries (rows) and the following 17 columns (attributes):

  1. INCIDENT_NUMBER
  2. OFFENSE_CODE
  3. OFFENSE_CODE_GROUP
  4. OFFENSE_DESCRIPTION
  5. DISTRICT
  6. REPORTING_AREA
  7. SHOOTING
  8. OCCURRED_ON_DATE
  9. YEAR
  10. MONTH
  11. DAY_OF_WEEK
  12. HOUR
  13. UCR_PART
  14. STREET
  15. LATITUDE
  16. LONGITUDE
  17. LOCATION


More details about the attributes can be found on slide 4 of the presentation.
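
As a quick sanity check before any analysis, the 17 attribute names listed above can be compared against the header row of the CSV. The sketch below is only an illustration: the function name is hypothetical, the in-memory sample stands in for the real file, and the header spelling in the actual download may differ from the names used in this summary.

```python
import csv
import io

# Column names as listed in this summary (the real header may be spelled differently).
EXPECTED_COLUMNS = [
    "INCIDENT_NUMBER", "OFFENSE_CODE", "OFFENSE_CODE_GROUP",
    "OFFENSE_DESCRIPTION", "DISTRICT", "REPORTING_AREA", "SHOOTING",
    "OCCURRED_ON_DATE", "YEAR", "MONTH", "DAY_OF_WEEK", "HOUR",
    "UCR_PART", "STREET", "LATITUDE", "LONGITUDE", "LOCATION",
]


def missing_columns(csv_file):
    """Return the expected column names that are absent from the CSV header."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip().upper() for name in header}
    return [name for name in EXPECTED_COLUMNS if name not in present]


# Hypothetical stand-in for the real file: a header with LOCATION left out.
sample = io.StringIO(",".join(EXPECTED_COLUMNS[:-1]) + "\n")
print(missing_columns(sample))  # ['LOCATION']
```

With the downloaded file, the same function could be called on an open file handle; the file name and encoding to use are assumptions that would need checking against the actual download.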

Data Preprocessing

The data required preprocessing before it could be analyzed.

Details of the data cleanup, and of the decisions made about certain records and attributes, are on slide 6 of the presentation and in the worksheet available at the bottom of this summary.


Analysis

Weka, a data-mining tool that offers a choice of algorithms, was used to detect patterns in crime reporting.

The methods used to analyze this data were:

  • Cluster – Simple K Means
  • Classify – Decision Tree
  • Associate – Apriori
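
To give a feel for the clustering method named in the list above, here is a minimal k-means sketch in plain Python. It is an illustration under stated assumptions, not a reproduction of Weka's SimpleKMeans: the starting centroids are passed in by hand, the coordinate pairs are made up, and plain Euclidean distance on latitude and longitude is only a rough stand-in for real geographic distance.

```python
import math


def k_means(points, centroids, max_iter=100):
    """Cluster 2-D points around the given starting centroids (Lloyd's algorithm)."""
    centroids = list(centroids)
    assignment = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                updated.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid in place
        if updated == centroids:
            break  # converged: no centroid moved
        centroids = updated
    return centroids, assignment


# Made-up (latitude, longitude) pairs, not taken from the data set.
points = [(42.30, -71.10), (42.31, -71.11), (42.36, -71.06), (42.37, -71.05)]
centroids, labels = k_means(points, centroids=[points[0], points[2]])
print(labels)  # [0, 0, 1, 1]
```

Each label is the index of the cluster a point ends up in, so incidents that are close together geographically share a label.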


Outcome

The methods used, the challenges, and the outcomes are covered in slides 8 to 18 of the presentation.

Note: I’ve included the unedited worksheet in which I documented the results and the questions I was trying to answer as I worked on the analysis. It includes some of the challenges and decisions made throughout the process.

Results Worksheet (Sample)

Technologies used