In the last tutorial, we learned how to retrieve your data. Now it is time to learn how to use that dataset and manipulate it so you can understand what it is saying. In this tutorial, we will apply a filter to
the Altoona crime rates data to look only at the most common crimes committed by juvenile offenders.
R Script (copy the code below and paste it into RStudio; do not copy the output results)
# # # # # NSF Project “Big Data Education” (Penn State University)
# # # # # More info: http://sites.psu.edu/bigdata/
# # # # # Lab 1: Altoona Crime Rates – Module 2: Filtering & Sorting
# # # # STEP 0: SET WORKING DIRECTORY
# Set working directory to a folder with the following file:
# ‘Lab 1 Data Altoona Crime Rates.csv’
# # # # STEP 1: READ IN THE DATA
AltoonaCrimeRates <- read.csv("Lab 1 Data Altoona Crime Rates.csv", sep=",", header=TRUE)
OUTPUT (a new data frame
# Checking: See the read in data
View(AltoonaCrimeRates)
OUTPUT (the table pops out):
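View() opens a spreadsheet-style viewer, which needs RStudio or another graphical session. As a rough sketch of how the same check could be done in the console, the block below reads a tiny made-up CSV (the rows and the reduced set of columns are assumptions for illustration, not the full lab file) and inspects it with str(), dim(), and head().

```r
# Hypothetical miniature of the lab file; the real CSV has more rows and columns.
csv_text <- "Month,Offense.Code,Sex,Juvenile.Total
5/1/2015,140-Vandalism,M,26
5/1/2015,140-Vandalism,F,17
10/1/2015,290-Runaways,F,14
8/1/2013,060-Larceny-Theft,M,0"

# read.csv() accepts a `text` argument, so no file is needed for this sketch.
toy <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

str(toy)          # column names and types
dim(toy)          # number of rows, then number of columns
head(toy, n = 3)  # first three rows
```

With the real data, the same three calls can be applied to AltoonaCrimeRates right after STEP 1.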
# # # # STEP 2: FILTER DOWN TO THE DATA OF INTEREST
# Keep only rows where Juvenile.Total>0
# Below code says “define AltoonaCrimeRates as AltoonaCrimeRates where
# the rows satisfy the condition that Juvenile.Total is greater than 0”
# (because with [,] the condition before comma applies to rows,
# and condition after comma would have applied to columns, but there is none here).
AltoonaCrimeRates <- AltoonaCrimeRates[AltoonaCrimeRates$Juvenile.Total > 0, ]
OUTPUT (data frame
# Checking: Have the zeros under Juvenile.Total been filtered out?
View(AltoonaCrimeRates)
OUTPUT (the table pops out, column Juvenile.Total now has no
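To see what the [rows, columns] filter in STEP 2 does, it can help to try it on a few rows first. The sketch below uses a small made-up data frame (its values are assumptions, chosen only to include some zeros) and also shows subset(), a base R alternative that expresses the same condition.

```r
# Hypothetical toy data; only the column names mirror the lab file.
toy <- data.frame(
  Offense.Code   = c("140-Vandalism", "290-Runaways", "060-Larceny-Theft", "290-Runaways"),
  Sex            = c("M", "F", "M", "M"),
  Juvenile.Total = c(26, 0, 12, 0),
  stringsAsFactors = FALSE
)

# The condition yields one TRUE/FALSE per row.
keep <- toy$Juvenile.Total > 0

# Condition before the comma selects rows; nothing after it keeps all columns.
filtered <- toy[keep, ]

# subset() applies the same condition without repeating the data frame name.
filtered2 <- subset(toy, Juvenile.Total > 0)

print(filtered)
```

One difference to keep in mind: if the condition contains missing values (NA), the bracket form returns rows filled with NA for them, while subset() drops those rows.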
# # # # STEP 3: SORT DATA BY THE COLUMN OF INTEREST
# Sort data by Juvenile.Total
# Below code says “show first 5 rows (head & n=5) of AltoonaCrimeRates, where
# its rows are ordered by Juvenile.Total in a decreasing order, and
# its columns to keep are only Offense.Code, Juvenile.Total, Sex, and Month”.
head(AltoonaCrimeRates
     [order(AltoonaCrimeRates$Juvenile.Total,
            decreasing=TRUE),
      c("Offense.Code", "Juvenile.Total",
        "Sex", "Month")],
     n=5)
OUTPUT:
          Offense.Code Juvenile.Total Sex     Month
1201     140-Vandalism             26   M  5/1/2015
1202     140-Vandalism             17   F  5/1/2015
1452      290-Runaways             14   F 10/1/2015
295  060-Larceny-Theft             12   M  8/1/2013
710       290-Runaways             12   M  5/1/2014
508  060-Larceny-Theft             11   M  1/1/2014
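The sorting in STEP 3 packs three operations into one expression: order() computes row positions, the brackets reorder the rows and pick columns, and head() keeps the top rows. The sketch below separates those pieces on a small made-up data frame (the values are assumptions for illustration).

```r
# Hypothetical toy data; only the column names mirror the lab file.
toy <- data.frame(
  Offense.Code   = c("060-Larceny-Theft", "140-Vandalism", "290-Runaways", "140-Vandalism"),
  Juvenile.Total = c(12, 26, 14, 17),
  Sex            = c("M", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# order() does not sort the data itself; it returns the row positions
# that would put Juvenile.Total in decreasing order.
idx <- order(toy$Juvenile.Total, decreasing = TRUE)
print(idx)

# Use those positions as the row index, keep two columns, show the top 2 rows.
top2 <- head(toy[idx, c("Offense.Code", "Juvenile.Total")], n = 2)
print(top2)
```

Storing the result of order() in its own variable makes it easier to check the ordering before applying it to the full table.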
# # # # STEP 4: AGGREGATE DATA TO GET MORE INFORMATION OF INTEREST
# Collapse data by Offense.Code (one row per crime type)
# Below code says “aggregate AltoonaCrimeRates by Offense.Code
# and apply summation”.
AltoonaCrimeRates <-
aggregate(. ~ AltoonaCrimeRates$Offense.Code,
data = AltoonaCrimeRates,
FUN = sum)
OUTPUT (data frame AltoonaCrimeRates reduces from 798 to 32 observations, one for each different value of Offense.Code):
# Checking: Is the table collapsed by Offense.Code?
View(AltoonaCrimeRates)
OUTPUT (the table pops out, with a new column
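The `. ~` formula in STEP 4 sums every column, including text columns such as Month and Sex, which is why some columns stop making sense afterwards; how it behaves can also depend on whether read.csv() stored text as factors (the default before R 4.0) or as character strings. As an alternative sketch, not the lab's own method, the block below names the column to sum explicitly, using a small made-up data frame (values are assumptions).

```r
# Hypothetical toy data; only the column names mirror the lab file.
toy <- data.frame(
  Month          = c("5/1/2015", "5/1/2015", "10/1/2015", "5/1/2014"),
  Offense.Code   = c("140-Vandalism", "140-Vandalism", "290-Runaways", "290-Runaways"),
  Sex            = c("M", "F", "F", "M"),
  Juvenile.Total = c(26, 17, 14, 12),
  stringsAsFactors = FALSE
)

# Sum Juvenile.Total within each Offense.Code; other columns are left out,
# and the grouping column keeps the plain name Offense.Code.
by_offense <- aggregate(Juvenile.Total ~ Offense.Code, data = toy, FUN = sum)
print(by_offense)
```

With this form there are no leftover columns to remove and no column to rename, at the cost of having to list each numeric column you want summed.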
# Remove columns which no longer make sense
# (Columns Month, Offense.Code, and Sex are respectively columns 2-4)
AltoonaCrimeRates <- AltoonaCrimeRates[ ,-c(2:4)]
# Checking: Have the columns been removed?
View(AltoonaCrimeRates)
OUTPUT (the table pops out, with the 3 columns removed):
# Rename ‘AltoonaCrimeRates$Offense.Code’ to ‘Offense.Code’.
names(AltoonaCrimeRates)[names(AltoonaCrimeRates) == 'AltoonaCrimeRates$Offense.Code'] <- 'Offense.Code'
# Checking: Has the column been renamed?
View(AltoonaCrimeRates)
OUTPUT (the table pops out, with first column renamed to
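Removing columns by position, as in the -c(2:4) step, depends on the columns being in exactly the expected order. As a hedged sketch, the block below shows the same rename on a small made-up data frame and then drops a column by name instead of by position (the data and the choice of column to drop are assumptions for illustration).

```r
# Hypothetical toy data shaped like the aggregated table.
toy <- data.frame(
  group          = c("140-Vandalism", "290-Runaways"),
  Month          = c(3, 5),
  Juvenile.Total = c(43, 26),
  stringsAsFactors = FALSE
)
# Give the first column the awkward name that the aggregation step produces.
names(toy)[1] <- "AltoonaCrimeRates$Offense.Code"

# Rename: find the position whose name matches, then assign the new name.
names(toy)[names(toy) == "AltoonaCrimeRates$Offense.Code"] <- "Offense.Code"

# Drop by name: keep every column whose name is not in the drop list.
drop_cols <- c("Month")
toy <- toy[, !(names(toy) %in% drop_cols), drop = FALSE]

print(names(toy))
```

Dropping by name keeps working if the column order changes, which positional indexes do not.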
# Look at the resulting dataset, again ordered by Juvenile.Total
head(AltoonaCrimeRates
     [order(AltoonaCrimeRates$Juvenile.Total,
            decreasing=TRUE),
      c("Offense.Code", "Juvenile.Total")],
     n=5)
OUTPUT (data frame
                              Offense.Code Juvenile.Total
32                            290-Runaways            455
8      080-Other Assaults – Not Aggravated            340
6                        060-Larceny-Theft            258
26                          220-Liquor Law            199
28                  240-Disorderly Conduct            189
30 260-All Other Offenses (Except Traffic)            137
Challenges:
Use the outputs you just obtained to answer the following questions.
- Looking at the resulting data, what is the most common crime that male juvenile offenders committed in one month? What about female juvenile offenders? Hint: to answer for female offenders, remember that you can change how many top rows the “head” command shows (from n=5 to, e.g., n=50 or n=100).
- In what month did both of the above common crimes occur?
- Looking at the resulting data, what is the most common crime that juvenile offenders, regardless of sex, committed across all months? How many times was it committed?
- Change the code so it looks at adult offenders rather than juvenile offenders, and answer questions 1-3 for them.