Data Science Tools


R Module 1: Accessing Data

LAB 1:
ALTOONA CRIME RATES

R MODULE 1:
ACCESSING DATA

 

In this tutorial, we import data on Altoona crime rates into RStudio, and then inspect the data using summary statistics. RStudio is an integrated development environment (IDE) for the R programming language – essentially, RStudio is a more user-friendly interface for programming in R.

 

We will first (1) set our working directory, then (2) create a new R script in RStudio, and finally (3) use the R code presented further below in this Module.

 

(1) Setting a working directory in RStudio

 

Open RStudio. You will notice the following tabs:

  • Console, where code is executed and output often provided.
  • Environment, where created variables, data frames, etc. are listed.
  • Plots, where graphs are usually plotted.
  • Files, where your current working directory is shown (the Files tab doesn’t actually say “working directory” anywhere, but this is what the Files tab is showing you by default).

You can see other tabs, but do not worry about them now.

 

 

The working directory, as the name implies, is the folder in which R works by default. Unless you tell R otherwise, this is where it looks for files to read and where it saves files.

 

To set the working directory to the folder with our datasets, do the following:

  1. On RStudio’s main menu bar, click Session > Set Working Directory > Choose Directory

 

 

  2. A new window pops up. Navigate to the folder containing our datasets, and click OK:

 

The Files tab (on the right) now shows our new working directory, which contains our 2 datasets.
The Console tab (on the left) shows a call to setwd(), which is how we could have set the working directory using R code (this code achieves exactly the same result as the manual steps we just took in RStudio).
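As a minimal sketch of the code route mentioned above, the snippet below sets and checks the working directory with setwd() and getwd(). The folder used here is tempdir(), which is only a stand-in (an assumption for illustration); in the lab you would put the path to your own datasets folder instead.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# Stand-in folder for illustration only; replace with your own datasets folder.
data_dir <- tempdir()

# Set the working directory with code instead of the Session menu.
setwd(data_dir)

# Confirm where R is now working and list the files it can see there.
current_wd <- getwd()
print(current_wd)
print(list.files())

# Restore the previous working directory.
setwd(old_wd)
```

Running getwd() at any time is a quick way to check which folder R is currently using.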

 

Great, your working directory is now set!

Generally, you always want to set your working directory. This is for 2 reasons:

  1. Calling a file that is in your working directory makes the code shorter, and
  2. Reproducing code with a working directory is easier for other programmers.

In conclusion, setting up the working directory saves time, and is a good programming practice.
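To illustrate the first reason, here is a small sketch comparing a call that relies on the working directory with one that spells out the full path. The folder "wd_example" and the file "example.csv" are hypothetical and are created by the snippet itself, so it does not touch the lab data.

```r
# Hypothetical example folder and file, created only for this illustration.
example_dir <- file.path(tempdir(), "wd_example")
dir.create(example_dir, showWarnings = FALSE)
write.csv(data.frame(x = 1:3), file.path(example_dir, "example.csv"),
          row.names = FALSE)

# setwd() returns the previous working directory, so we can restore it later.
old_wd <- setwd(example_dir)

# With the working directory set, the bare file name is enough.
short_call <- read.csv("example.csv")

# Without relying on the working directory, the full path has to be given.
long_call <- read.csv(file.path(example_dir, "example.csv"))

# Restore the previous working directory.
setwd(old_wd)

# Both calls read the same data.
print(identical(short_call, long_call))
```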

 

(2) Creating a new R script in RStudio

 

There is one main tab still missing: the one for writing our R script. To add it, do the following:

  1. On RStudio’s main menu bar, click File > New File > R Script.

 

The new Untitled tab appears – this is where you will be writing your R script:

Great, your R script is ready for coding! The benefit of coding in an R script is that you can always save it for later use, without the need to rewrite the code.

(3) Coding in R

 

Now that we have set up RStudio, time to code in R!

 

Copy and paste the script below into your new Untitled R script tab in RStudio.

  • Every line of code starting with the pound key / hashtag (#) is interpreted as a comment in R. It does not affect the code, or the output. It serves as instructions. Read the comments in each block before you execute the code.
  • There is an example of output after each block of code. Make sure your own output in R matches the output presented here.
  • To execute a line of an R script, place the cursor on that line and press Ctrl+Enter (Cmd+Return on macOS).
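As a tiny illustration of how comments behave, consider the sketch below. The variable name total is made up for this example and is not part of the lab script.

```r
# This whole line is a comment; R ignores it when the script runs.
total <- 2 + 3   # Text after a # on a line of code is ignored as well.

# Only the code itself is executed, so this prints the sum.
print(total)
```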

 

R Script (copy below code, paste, and execute in RStudio; do not copy the output results)

# # # # # NSF Project “Big Data Education” (Penn State University)

# # # # # More info: http://sites.psu.edu/bigdata/

# # # # # Lab 1: Altoona Crime Rates – Module 1: Accessing Data

 

# # # # STEP 0: SET WORKING DIRECTORY

# Set working directory to a folder with the following file:

# ‘Lab 1 Data Altoona Crime Rates.csv’

# # # # STEP 1: READ IN THE DATA

# Below command says: “define variable AltoonaCrimeRates

# which reads as a csv file ‘Lab 1 Data Altoona Crime Rates.csv’,

# uses comma (,) as field separator, and has a header row”

# (a first row containing names of all columns).

# The <- operator assigns the value on its right to the name on its left.

AltoonaCrimeRates <- read.csv("Lab 1 Data Altoona Crime Rates.csv", sep=",", header=TRUE)

 

OUTPUT (a new data frame is created in the Global Environment with 2326 observations of 39 variables):
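The size and structure of a data frame can also be checked from code rather than from the Environment tab. The sketch below uses a toy CSV file written to a temporary location; its column names and values are made up for illustration and are not the lab data. It also shows that read.csv(), by default, turns spaces in column headers into dots.

```r
# Toy CSV written to a temporary file; names and values are illustrative only.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Month,Offense Code,Sex,Adult Total",
             "Jan-13,030-Robbery,M,2",
             "Jan-13,030-Robbery,F,1"),
           csv_path)

# Same style of call as in the lab script, applied to the toy file.
toy <- read.csv(csv_path, sep = ",", header = TRUE)

# Number of rows (observations) and columns (variables).
print(dim(toy))

# Column names; spaces in the header have been replaced by dots.
print(names(toy))

# Compact overview of each column's type and first values.
str(toy)
```

Applied to AltoonaCrimeRates, dim() should report the same 2326 observations and 39 variables shown in the Environment tab.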

 


 

 

# See the data set you just read into R

AltoonaCrimeRates

 

OUTPUT (shows first 25 rows for all 39 variables):


    Month                            Offense.Code Sex Adult.Total Juvenile.Total Age.18 Age.19
 1 Jan-13          01B-Manslaughter by Negligence   M           0              1      0      0
 2 Jan-13                             030-Robbery   M           2              0      0      0
 3 Jan-13                             030-Robbery   F           1              0      0      0
 4 Jan-13                  040-Aggravated Assault   M          10              0      1      0
 5 Jan-13                  040-Aggravated Assault   F           1              0      0      0
 6 Jan-13                            050-Burglary   M           4              2      2      2
 7 Jan-13                            050-Burglary   F           1              0      0      0
 8 Jan-13                       060-Larceny-Theft   M          22              2      2      2
 9 Jan-13                       060-Larceny-Theft   F          17              0      0      1
10 Jan-13                 070-Motor Vehicle Theft   M           2              0      1      0
11 Jan-13                 070-Motor Vehicle Theft   F           1              0      0      0
12 Jan-13     080-Other Assaults – Not Aggravated   M          49              7      1      1
13 Jan-13     080-Other Assaults – Not Aggravated   F          14              0      0      1
14 Jan-13                               090-Arson   M           2              0      0      0
15 Jan-13                               110-Fraud   M           1              1      0      0
16 Jan-13                               110-Fraud   F           4              0      0      0
17 Jan-13 130-Stolen Prop., Rec., Posses., Buying   M           5              1      2      1
18 Jan-13                           140-Vandalism   M           3              4      1      2
19 Jan-13                           140-Vandalism   F           1              0      0      0
20 Jan-13     150-Weapons, Carrying, Posses, Etc.   M           4              1      1      0
21 Jan-13     150-Weapons, Carrying, Posses, Etc.   F           1              0      0      0
22 Jan-13    170-Sex Offenses (Except 02 and 160)   M           1              0      0      0
23 Jan-13           18B-Drug Sale/Mfg – Marijuana   M           2              0      1      0
24 Jan-13           18C-Drug Sale/Mfg – Synthetic   M           2              0      0      0
25 Jan-13               18D-Drug Sale/Mfg – Other   M           4              0      0      0

   Age.20 Age.21 Age.22 Age.23 Age.24 Age.25.29 Age.30.34 Age.35.39 Age.40.44 Age.45.49 Age.50.54
 1      0      0      0      0      0         0         0         0         0         0         0
 2      0      0      0      0      0         0         1         1         0         0         0
 3      0      0      0      0      0         0         1         0         0         0         0
 4      0      0      0      0      0         0         5         0         1         1         1
 5      1      0      0      0      0         0         0         0         0         0         0
 6      0      0      0      0      0         0         0         0         0         0         0
 7      0      0      0      0      0         0         0         1         0         0         0
 8      0      1      0      0      1         4         6         2         3         0         1
 9      0      3      3      1      1         1         2         3         0         1         0
10      0      1      0      0      0         0         0         0         0         0         0
11      0      0      0      0      0         0         1         0         0         0         0
12      3      3      1      3      0         4         9         4         9         4         3
13      0      1      0      1      0         2         0         2         4         2         1
14      0      1      0      1      0         0         0         0         0         0         0
15      0      0      0      0      0         0         0         0         0         1         0
16      0      0      0      0      0         0         1         2         0         1         0
17      0      0      0      0      0         0         0         0         0         0         2
18      0      0      0      0      0         0         0         0         0         0         0
19      0      0      0      0      0         0         0         0         0         1         0
20      0      0      0      0      0         0         1         0         0         0         0
21      0      1      0      0      0         0         0         0         0         0         0
22      0      0      0      0      0         0         0         1         0         0         0
23      0      0      0      0      0         1         0         0         0         0         0
24      0      0      0      0      0         0         2         0         0         0         0
25      0      0      0      4      0         0         0         0         0         0         0

   Age.55.59 Age.60.64 Age.65. Adult.Race.White Adult.Race.Black Adult.Race.Native.American
 1         0         0       0                0                0                          0
 2         0         0       0                2                0                          0
 3         0         0       0                1                0                          0
 4         1         0       0                9                1                          0
 5         0         0       0                1                0                          0
 6         0         0       0                4                0                          0
 7         0         0       0                1                0                          0
 8         0         0       0               20                2                          0
 9         1         0       0               17                0                          0
10         0         0       0                2                0                          0
11         0         0       0                1                0                          0
12         3         0       1               41                8                          0
13         0         0       0               14                0                          0
14         0         0       0                0                2                          0
15         0         0       0                1                0                          0
16         0         0       0                4                0                          0
17         0         0       0                5                0                          0
18         0         0       0                3                0                          0
19         0         0       0                1                0                          0
20         2         0       0                4                0                          0
21         0         0       0                1                0                          0
22         0         0       0                1                0                          0
23         0         0       0                1                1                          0
24         0         0       0                2                0                          0
25         0         0       0                0                4                          0

   Adult.Race.Asian.Pacific Adult.Ethnic.Hispanic Adult.Ethnic.Non.Hispanic Age.Under.10 Age.11.12
 1                        0                     0                         0            0         0
 2                        0                     1                         1            0         0
 3                        0                     0                         1            0         0
 4                        0                     1                         9            0         0
 5                        0                     0                         1            0         0
 6                        0                     0                         4            0         0
 7                        0                     0                         1            0         0
 8                        0                     0                        22            0         0
 9                        0                     0                        17            0         0
10                        0                     0                         2            0         0
11                        0                     0                         1            0         0
12                        0                     3                        46            0         0
13                        0                     1                        13            0         0
14                        0                     0                         2            0         0
15                        0                     0                         1            0         0
16                        0                     0                         4            0         0
17                        0                     0                         5            0         0
18                        0                     0                         3            0         0
19                        0                     0                         1            0         0
20                        0                     0                         4            0         0
21                        0                     0                         1            0         0
22                        0                     0                         1            0         0
23                        0                     0                         2            0         0
24                        0                     0                         2            0         0
25                        0                     0                         4            0         0

   Age.13.14 Age.15 Age.16 Age.17 Juvenile.Race.White Juvenile.Race.Black
 1         0      0      0      1                   1                   0
 2         0      0      0      0                   0                   0
 3         0      0      0      0                   0                   0
 4         0      0      0      0                   0                   0
 5         0      0      0      0                   0                   0
 6         0      0      1      1                   2                   0
 7         0      0      0      0                   0                   0
 8         0      0      0      2                   2                   0
 9         0      0      0      0                   0                   0
10         0      0      0      0                   0                   0
11         0      0      0      0                   0                   0
12         1      2      1      3                   5                   2
13         0      0      0      0                   0                   0
14         0      0      0      0                   0                   0
15         0      0      0      1                   1                   0
16         0      0      0      0                   0                   0
17         0      0      0      1                   1                   0
18         0      0      1      3                   4                   0
19         0      0      0      0                   0                   0
20         1      0      0      0                   1                   0
21         0      0      0      0                   0                   0
22         0      0      0      0                   0                   0
23         0      0      0      0                   0                   0
24         0      0      0      0                   0                   0
25         0      0      0      0                   0                   0

   Juvenile.Race.Native.American Juvenile.Race.Asian.Pacific Juvenile.Ethnic.Hispanic
 1                             0                           0                        0
 2                             0                           0                        0
 3                             0                           0                        0
 4                             0                           0                        0
 5                             0                           0                        0
 6                             0                           0                        0
 7                             0                           0                        0
 8                             0                           0                        0
 9                             0                           0                        0
10                             0                           0                        0
11                             0                           0                        0
12                             0                           0                        0
13                             0                           0                        0
14                             0                           0                        0
15                             0                           0                        0
16                             0                           0                        0
17                             0                           0                        0
18                             0                           0                        0
19                             0                           0                        0
20                             0                           0                        0
21                             0                           0                        0
22                             0                           0                        0
23                             0                           0                        0
24                             0                           0                        0
25                             0                           0                        0

   Juvenile.Ethnic.Non.Hispanic
 1                            1
 2                            0
 3                            0
 4                            0
 5                            0
 6                            2
 7                            0
 8                            2
 9                            0
10                            0
11                            0
12                            7
13                            0
14                            0
15                            1
16                            0
17                            1
18                            4
19                            0
20                            1
21                            0
22                            0
23                            0
24                            0
25                            0

 [ reached getOption("max.print") -- omitted 2301 rows ]
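Printing a whole data frame quickly hits the max.print limit, as the message above shows. A common alternative is to print only a few rows with head() or tail(). The sketch below uses a made-up toy data frame (the names toy, id and value are assumptions for illustration), so it runs without the lab file.

```r
# Toy data frame with 100 rows, created only for this illustration.
toy <- data.frame(id = 1:100, value = (1:100)^2)

# First 6 rows (the default for head()).
print(head(toy))

# First 3 rows only.
print(head(toy, n = 3))

# Last 2 rows.
print(tail(toy, n = 2))

# Total number of rows, without printing them.
print(nrow(toy))

# The current printing limit that produced the "omitted rows" message.
print(getOption("max.print"))
```

In the lab, head(AltoonaCrimeRates) would show the first rows of the crime data in the same way.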

 

 

# See the data set you just read into R, in a more organized table

View(AltoonaCrimeRates)

 

OUTPUT (a new tab is created with the table):

 


 

 

# Get summary statistics of the data set

summary(AltoonaCrimeRates)

 

OUTPUT:

 


     Month                      Offense.Code                      Sex       Adult.Total
 Nov-14 :  50   060-Larceny-Theft                      : 110   F: 983   Min.   : 0.0
 Aug-16 :  49   080-Other Assaults – Not Aggravated    : 110   M:1343   1st Qu.: 1.0
 Jul-14 :  49   210-Driving Under the Influence        : 110            Median : 3.0
 Aug-15 :  48   220-Liquor Law                         : 110            Mean   : 7.9
 May-14 :  47   260-All Other Offenses (Except Traffic): 110            3rd Qu.:11.0
 Apr-16 :  46   230-Drunkenness                        : 109            Max.   :73.0
 (Other):2037   (Other)                                :1667

 Juvenile.Total        Age.18            Age.19           Age.20            Age.21
 Min.   : 0.0000   Min.   : 0.0000   Min.   : 0.000   Min.   : 0.0000   Min.   :0.0000
 1st Qu.: 0.0000   1st Qu.: 0.0000   1st Qu.: 0.000   1st Qu.: 0.0000   1st Qu.:0.0000
 Median : 0.0000   Median : 0.0000   Median : 0.000   Median : 0.0000   Median :0.0000
 Mean   : 0.8912   Mean   : 0.4329   Mean   : 0.509   Mean   : 0.3805   Mean   :0.3031
 3rd Qu.: 1.0000   3rd Qu.: 0.0000   3rd Qu.: 0.000   3rd Qu.: 0.0000   3rd Qu.:0.0000
 Max.   :26.0000   Max.   :17.0000   Max.   :22.000   Max.   :12.0000   Max.   :7.0000

     Age.22           Age.23          Age.24         Age.25.29        Age.30.34        Age.35.39
 Min.   :0.0000   Min.   :0.000   Min.   :0.0000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.0000
 1st Qu.:0.0000   1st Qu.:0.000   1st Qu.:0.0000   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.0000
 Median :0.0000   Median :0.000   Median :0.0000   Median : 0.000   Median : 0.000   Median : 0.0000
 Mean   :0.2618   Mean   :0.282   Mean   :0.3022   Mean   : 1.304   Mean   : 1.137   Mean   : 0.9058
 3rd Qu.:0.0000   3rd Qu.:0.000   3rd Qu.:0.0000   3rd Qu.: 2.000   3rd Qu.: 1.000   3rd Qu.: 1.0000
 Max.   :6.0000   Max.   :7.000   Max.   :8.0000   Max.   :30.000   Max.   :21.000   Max.   :21.0000

   Age.40.44        Age.45.49        Age.50.54        Age.55.59        Age.60.64         Age.65.
 Min.   : 0.000   Min.   :0.0000   Min.   : 0.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000
 1st Qu.: 0.000   1st Qu.:0.0000   1st Qu.: 0.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000
 Median : 0.000   Median :0.0000   Median : 0.000   Median :0.0000   Median :0.0000   Median :0.0000
 Mean   : 0.644   Mean   :0.5004   Mean   : 0.439   Mean   :0.2631   Mean   :0.1264   Mean   :0.1083
 3rd Qu.: 1.000   3rd Qu.:1.0000   3rd Qu.: 0.000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000
 Max.   :12.000   Max.   :8.0000   Max.   :10.000   Max.   :6.0000   Max.   :4.0000   Max.   :4.0000

 Adult.Race.White Adult.Race.Black  Adult.Race.Native.American Adult.Race.Asian.Pacific
 Min.   : 0.000   Min.   : 0.0000   Min.   :0.000000           Min.   :0.00000
 1st Qu.: 1.000   1st Qu.: 0.0000   1st Qu.:0.000000           1st Qu.:0.00000
 Median : 3.000   Median : 0.0000   Median :0.000000           Median :0.00000
 Mean   : 6.927   Mean   : 0.9377   Mean   :0.006019           Mean   :0.02924
 3rd Qu.:10.000   3rd Qu.: 1.0000   3rd Qu.:0.000000           3rd Qu.:0.00000
 Max.   :64.000   Max.   :22.0000   Max.   :1.000000           Max.   :4.00000

 Adult.Ethnic.Hispanic Adult.Ethnic.Non.Hispanic  Age.Under.10       Age.11.12         Age.13.14
 Min.   : 0.0000       Min.   : 0.000             Min.   :0.00000   Min.   :0.00000   Min.   :0.0000
 1st Qu.: 0.0000       1st Qu.: 1.000             1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.0000
 Median : 0.0000       Median : 3.000             Median :0.00000   Median :0.00000   Median :0.0000
 Mean   : 0.1109       Mean   : 7.789             Mean   :0.00172   Mean   :0.04815   Mean   :0.2188
 3rd Qu.: 0.0000       3rd Qu.:11.000             3rd Qu.:0.00000   3rd Qu.:0.00000   3rd Qu.:0.0000
 Max.   :10.0000       Max.   :71.000             Max.   :1.00000   Max.   :9.00000   Max.   :8.0000

     Age.15           Age.16           Age.17       Juvenile.Race.White Juvenile.Race.Black
 Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   : 0.0000     Min.   :0.00000
 1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.: 0.0000     1st Qu.:0.00000
 Median :0.0000   Median :0.0000   Median :0.0000   Median : 0.0000     Median :0.00000
 Mean   :0.1827   Mean   :0.1935   Mean   :0.2463   Mean   : 0.7911     Mean   :0.09716
 3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.: 1.0000     3rd Qu.:0.00000
 Max.   :9.0000   Max.   :9.0000   Max.   :8.0000   Max.   :26.0000     Max.   :5.00000

 Juvenile.Race.Native.American Juvenile.Race.Asian.Pacific Juvenile.Ethnic.Hispanic
 Min.   :0                     Min.   :0.00000             Min.   :0.000000
 1st Qu.:0                     1st Qu.:0.00000             1st Qu.:0.000000
 Median :0                     Median :0.00000             Median :0.000000
 Mean   :0                     Mean   :0.00301             Mean   :0.008169
 3rd Qu.:0                     3rd Qu.:0.00000             3rd Qu.:0.000000
 Max.   :0                     Max.   :1.00000             Max.   :1.000000

 Juvenile.Ethnic.Non.Hispanic
 Min.   : 0.0000
 1st Qu.: 0.0000
 Median : 0.0000
 Mean   : 0.8831
 3rd Qu.: 1.0000
 Max.   :26.0000
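When the full summary is too wide to read comfortably, the same statistics can be pulled out for one column, or a single statistic can be computed for every numeric column. This is a sketch on a made-up toy data frame (column names mimic the lab data, values are invented), not on the Altoona file itself.

```r
# Toy data frame; the values are invented for illustration.
toy <- data.frame(Sex            = c("M", "F", "M", "M"),
                  Adult.Total    = c(2, 1, 10, 4),
                  Juvenile.Total = c(0, 0, 1, 2))

# Summary statistics for a single column, selected with $.
print(summary(toy$Adult.Total))

# Find which columns are numeric, so text columns are left out.
numeric_cols <- sapply(toy, is.numeric)

# Maximum of every numeric column.
col_max <- sapply(toy[, numeric_cols], max)
print(col_max)

# Mean of every numeric column.
col_mean <- colMeans(toy[, numeric_cols])
print(col_mean)
```

Comparing the Max. and Mean values across columns in this way is one possible approach to the Challenges at the end of this module.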

 

 

# Get a count for all values of variable "Offense.Code" from AltoonaCrimeRates

table(AltoonaCrimeRates$Offense.Code)

 

OUTPUT:

 

01A-Murder and Nonnegligent Manslaughter           01B-Manslaughter by Negligence
                                       5                                        2
                                020-Rape                              030-Robbery
                                      27                                       50
                  040-Aggravated Assault                             050-Burglary
                                     102                                       76
                       060-Larceny-Theft                  070-Motor Vehicle Theft
                                     110                                       56
     080-Other Assaults – Not Aggravated                                090-Arson
                                     110                                       13
          100-Forgery and Counterfeiting                                110-Fraud
                                      68                                       91
                        120-Embezzlement  130-Stolen Prop., Rec., Posses., Buying
                                      20                                       60
                           140-Vandalism      150-Weapons, Carrying, Posses, Etc.
                                      74                                       70
160-Prostitution and Commercialized Vice     170-Sex Offenses (Except 02 and 160)
                                       4                                       56
     18A-Drug Sale/Mfg – Opium – Cocaine            18B-Drug Sale/Mfg – Marijuana
                                      78                                       83
           18C-Drug Sale/Mfg – Synthetic                18D-Drug Sale/Mfg – Other
                                      38                                       94
   18E-Drug Possession – Opium – Cocaine          18F-Drug Possession – Marijuana
                                      46                                      105
         18G-Drug Possession – Synthetic              18H-Drug Possession – Other
                                      37                                      105
  200-Offenses Against Family & Children          210-Driving Under the Influence
                                      73                                      110
                          220-Liquor Law                          230-Drunkenness
                                     110                                      109
                  240-Disorderly Conduct                             250-Vagrancy
                                     108                                        1
 260-All Other Offenses (Except Traffic) 280-Curfew and Loitering Laws (Under 18)
                                     110                                       20
                            290-Runaways
                                     105
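The counts above are listed in alphabetical order of the codes. To see the most and least frequent values at a glance, the result of table() can be passed to sort(). The sketch below uses a short made-up vector of offense codes (the vector offense is an assumption for illustration) rather than the lab data.

```r
# Made-up vector of offense codes, for illustration only.
offense <- c("030-Robbery", "050-Burglary", "030-Robbery",
             "090-Arson", "030-Robbery", "050-Burglary")

# Count how often each value appears.
counts <- table(offense)

# Order the counts from most frequent to least frequent.
sorted <- sort(counts, decreasing = TRUE)
print(sorted)

# Name of the most frequent and of the least frequent value.
print(names(sorted)[1])
print(names(sorted)[length(sorted)])
```

In the lab, the same idea would read sort(table(AltoonaCrimeRates$Offense.Code), decreasing = TRUE).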

 

 

 

 

Challenges:

 

Use the outputs you just obtained to answer the following questions.

 

  1. By Offense Code: What are the most common and least common crimes in Altoona (i.e. which Offense Codes appear in the dataset most often and least often)?
  2. By Sex: Are crimes more often committed by men or women (i.e. which Sex appears in the dataset most often)?
  3. By Age: What group has committed the highest number of offenses for a single crime in a single month – adults, or juvenile offenders?
  4. By Age: What specific age group has committed the highest number of offenses for a single crime in a single month?
  5. By Race: What racial group has committed the highest number of offenses for a single crime in a single month?
  6. By Ethnicity: What ethnic group has committed the highest number of offenses for a single crime in a single month?
  7. What is the profile of the average person committing crimes in Altoona? Look at the max values for Offense Code, and Sex; average values for Adult vs. Juvenile, Age, Race, Ethnicity.

 

Important note for writing R code in Word:

 

If you write your own R code in Word, you need to turn off Word’s autocorrecting of straight quotes (" ") into smart quotes (“ ”); otherwise, when you copy and paste the code into R, it will not work. This is because R only recognizes straight quotes (" ").
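If a script has already picked up smart quotes, one possible repair is to replace them with straight quotes using gsub(). This is only a sketch: the variable names and the file name "data.csv" are hypothetical, and the smart quotes are written as Unicode escapes (\u201c, \u201d, \u2018, \u2019) so that the snippet itself contains only straight quotes.

```r
# A line of code as it might look after Word has converted the quotes.
# "data.csv" is a hypothetical file name used only for illustration.
pasted <- "x <- read.csv(\u201cdata.csv\u201d)"

# Replace curly double quotes with straight double quotes.
fixed <- gsub("[\u201c\u201d]", "\"", pasted)

# Replace curly single quotes with straight single quotes.
fixed <- gsub("[\u2018\u2019]", "'", fixed)

# Show the repaired line of code.
cat(fixed, "\n")
```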

 

To disable “smart quotes”, in the main ribbon click File > Options:

 


In Options, click on Proofing > AutoCorrect Options:


A new window opens. In the AutoFormat tab, un-check “Straight quotes” with “smart quotes”:


In the same window, in the AutoFormat As You Type tab, un-check “Straight quotes” with “smart quotes” again:

Finally, click OK.

 

You are now all set to write your R script in Word!

 

 

Copyright 2025 © The Pennsylvania State University