Suppose you want to send someone a smaller version of your dataset, e.g. 50% of the original rows. You can randomly sample the original dataset down to that size.
R Script (copy the code below and paste it into RStudio; do not copy the output results)
# # # # # NSF Project “Big Data Education” (Penn State University)
# # # # # More info: http://sites.psu.edu/bigdata/
# # # # # Lab 1: Altoona Crime Rates – Module 9: Sampling
# # # # STEP 0: SET WORKING DIRECTORY
# Set working directory to a folder with the following file:
# ‘Lab 1 Data Altoona Crime Rates.csv’
# # # # STEP 1: READ IN THE DATA
AltoonaCrimeRates <- read.csv("Lab 1 Data Altoona Crime Rates.csv", sep=",", header=TRUE)
OUTPUT (a new data frame is created with 2326 observations of 39 variables)
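Before sampling, it can help to confirm that the import produced the expected shape. The sketch below is not part of the lab script: it writes a tiny made-up CSV to a temporary file (the file name, column names, and values are invented for illustration) so that it runs without the lab data, then checks the result with nrow(), ncol(), and dim(). Run against the real file, the same checks should match the 2326 observations and 39 variables reported for Step 1.

```r
# Hypothetical stand-in data; the real lab file is much larger.
toy <- data.frame(id = 1:6, offense = c("A", "B", "A", "C", "B", "A"))

# Write it to a temporary CSV whose name contains spaces, like the lab file.
csv_path <- file.path(tempdir(), "Toy Crime Rates.csv")
write.csv(toy, csv_path, row.names = FALSE)

# Read it back the same way the lab script does.
toy_in <- read.csv(csv_path, sep = ",", header = TRUE)

# Check the shape of the imported data frame.
n_rows <- nrow(toy_in)   # number of observations
n_cols <- ncol(toy_in)   # number of variables
print(dim(toy_in))       # rows first, then columns
```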
# # # # STEP 2: GET 50% OF THE DATA SET
# Define number as 0.5 times the number of rows (nrow) in AltoonaCrimeRates.
number <- 0.5*nrow(AltoonaCrimeRates)
OUTPUT (a new value “number” is created, equal to 1163)
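Half of 2326 rows happens to be a whole number, but most other fractions are not. A minimal sketch, assuming you want the sample size to be an explicit whole number before it is passed to sample(); the 15% fraction and the variable names are illustrative and not part of the lab script:

```r
n_rows <- 2326                       # row count reported in Step 1

half    <- 0.5 * n_rows              # a whole number
fifteen <- 0.15 * n_rows             # not a whole number

# Turn the fractional size into a whole number of rows.
size_down    <- floor(fifteen)       # always round down
size_nearest <- round(fifteen)       # round to the nearest whole number

print(c(half, fifteen, size_down, size_nearest))
```

Whether to round down or to the nearest row is a judgment call; what matters is that the choice is stated in the script rather than left implicit.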
# # # # STEP 3: RANDOMLY SAMPLE THE DATA SET DOWN TO 50%
AltoonaCrimeRates <- AltoonaCrimeRates[sample(nrow(AltoonaCrimeRates), number), ]
OUTPUT (the AltoonaCrimeRates data frame is reduced to its new sample size of 1163 observations)
# Checking: View the newly sampled data set
View(AltoonaCrimeRates)
OUTPUT (the sample keeps the row numbers from the original data set, e.g. row 264, then 701, then 1729, etc.)
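Because sample() draws rows at random, each run of Step 3 generally produces a different sample. If the person receiving the data needs to reproduce your sample, one common approach is to call set.seed() before sampling. The sketch below uses a small made-up data frame and an arbitrary seed value; both are assumptions for illustration and not part of the lab.

```r
# Hypothetical stand-in for the lab data.
toy <- data.frame(id = 1:10, value = (1:10) * 10)
number <- 0.5 * nrow(toy)             # half of the 10 rows

set.seed(42)                          # arbitrary seed, chosen for illustration
sample_a <- toy[sample(nrow(toy), number), ]

set.seed(42)                          # same seed again
sample_b <- toy[sample(nrow(toy), number), ]

# Same seed, same rows. Sampling is without replacement, so no row repeats.
print(identical(sample_a, sample_b))
print(anyDuplicated(sample_a$id) == 0)
print(rownames(sample_a))             # original row numbers are kept
```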
Challenges:
- Try to build samples of 30% and 80% of the original size. What do you need to change? What are the resulting sample sizes?
- Replace the input data with a different dataset. Do you need to change anything else, or will the script run as is? Try it!



