 Visit the Pennsylvania State University Home Page

Data Science Tools


Data Exploration

Case Study: Altoona Crime Rates

 

Imagine this: the City of Altoona’s police department has asked you to help improve its ongoing fight against crime in the city. The department provides Altoona Crime Rates, a dataset covering its arrests over the last few years, and asks for as many insights and recommendations as you can offer. It also provides Altoona Population Estimates, a dataset covering the city’s population, to support your analysis. The following 10 modules show the steps you can take as the department’s data scientist to help make its law enforcement more efficient.

 

| # | Module | Explanation |
|---|--------|-------------|
| 1 | Accessing Data | Import the Altoona crime rates data into the repository, add it to the process, and inspect it using summary statistics. |
| 2 | Filtering & Sorting | Determine the most common crimes committed by juvenile offenders. |
| 3 | Merging & Grouping | Join the Altoona crime dataset with the Altoona population dataset to compare crime statistics with general population statistics for the area. |
| 4 | Creating & Removing Columns | Calculate the percentage of the Altoona population who are convicted offenders: overall, by race, and by ethnicity. Keep only the columns of interest. |
| 5 | Changing Types & Roles for Modeling | Prepare the data for modeling by changing column types and roles. Predict which offender profile, based on sex, age, race, and ethnicity, is most likely to commit a crime. |
| 6 | Normalization & Detecting Outliers | Normalize the Altoona crime data and remove outliers to improve the predictive model’s performance. |
| 7 | Pivoting & Advanced Renaming | Pivot the Altoona crime data from long format to wide format, then use advanced renaming of the attributes to make the final result neater. |
| 8 | Handling Missing Values | Cleanse the data by handling missing values to achieve higher data quality. |
| 9 | Macros & Sampling | Use macros to calculate a new example set size, then sample the data down to this new size. |
| 10 | Looping & Branching | Loop over the two sexes and sample the examples for each sex individually down to 500 or fewer. |
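To make a few of these steps concrete, here is a minimal sketch in plain Python of Modules 2, 3, 4, and 10. It is not the RapidMiner, Tableau, or R workflow taught in the labs. The column names (`year`, `offense`, `age`, `sex`), the toy records, the population figures, and the under-18 cutoff for juveniles are all assumptions made up for illustration; the real Altoona datasets may be structured differently.

```python
import random
from collections import Counter

# Hypothetical toy records; the real Altoona datasets have their own columns and values.
arrests = [
    {"year": 2020, "offense": "Theft", "age": 16, "sex": "M"},
    {"year": 2020, "offense": "Vandalism", "age": 15, "sex": "F"},
    {"year": 2020, "offense": "Theft", "age": 17, "sex": "M"},
    {"year": 2021, "offense": "Assault", "age": 34, "sex": "M"},
    {"year": 2021, "offense": "Theft", "age": 14, "sex": "F"},
    {"year": 2021, "offense": "Fraud", "age": 45, "sex": "F"},
]
# Invented population figures, for illustration only.
population = {2020: 44_000, 2021: 43_500}


def top_juvenile_offenses(rows, max_age=17):
    """Module 2: filter to juveniles, then sort offenses by frequency."""
    counts = Counter(r["offense"] for r in rows if r["age"] <= max_age)
    return counts.most_common()  # most frequent offense first


def offender_percentage(rows, pop_by_year):
    """Modules 3-4: group by year, join to population, add a percentage column.

    Assumes each record is one distinct offender, which may not hold in real data.
    """
    per_year = Counter(r["year"] for r in rows)
    return {
        year: round(100 * n / pop_by_year[year], 4)
        for year, n in sorted(per_year.items())
        if year in pop_by_year  # inner join: keep only years present in both
    }


def sample_per_group(rows, key, cap, seed=0):
    """Module 10: loop over the groups and sample each down to at most `cap` rows."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    groups = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r)
    sampled = []
    for value in sorted(groups):
        members = groups[value]
        sampled.extend(members if len(members) <= cap else rng.sample(members, cap))
    return sampled


print(top_juvenile_offenses(arrests))
print(offender_percentage(arrests, population))
print(len(sample_per_group(arrests, "sex", cap=2)))
```

In the labs, the cap in the last step would be 500 per sex; the toy data uses a cap of 2 only so that the sampling branch is exercised.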

Datasets (download here) 

 

 


Copyright 2025 © The Pennsylvania State University Privacy Non-Discrimination Equal Opportunity Accessibility Legal