 Visit the Pennsylvania State University Home Page

Data Science Tools


RapidMiner Module 1: Accessing Data


Above: How your final Process panel will look in RapidMiner after you do this Module.

 

1/5

Import data into RapidMiner.

 

Getting your data into RapidMiner is usually the first step of any analysis. In this tutorial, you will learn how to import files into RapidMiner's central storage, called the Repository.

 

 

ACTIVITY

 

 

 

 

  1. Click Add Data in the Repository panel, and follow the steps in the wizard – navigate to the file Lab 1 Data Altoona Crime Rates.xlsx on your own computer, and select it for import.
  2. Continue to click “Next” as you go through the import steps.
  3. When you are finalizing the import, rename the data to Altoona Crime Rates, and store it in your own Local Repository. It will have your username next to it in brackets, e.g. “Local Repository (Ariel)”. When you are done, click “Finish”. RapidMiner will show you the data you just imported.

 

 

 

EXPLANATION

 

 

 

 

The Repository panel, in the upper left corner by default, is the place to store all your data, processes, and results. You should always import data into the repository. This simplifies the design of analytical processes considerably, because RapidMiner’s repository stores the descriptive metadata (such as column names and types) together with the data.
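
To make the idea of "metadata stored together with the data" more concrete, here is a minimal Python sketch. It is not how RapidMiner works internally: the `Repository` and `Entry` classes, the `store` and `describe` methods, and the sample rows are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """A stored data set plus the metadata that describes it."""
    rows: list
    metadata: dict


@dataclass
class Repository:
    """Hypothetical stand-in for a repository; not RapidMiner's API."""
    name: str
    entries: dict = field(default_factory=dict)

    def store(self, entry_name, rows):
        # Derive simple metadata (column names and value types) at import time
        first = rows[0] if rows else {}
        metadata = {
            "columns": list(first.keys()),
            "types": {col: type(val).__name__ for col, val in first.items()},
            "row_count": len(rows),
        }
        self.entries[entry_name] = Entry(rows=rows, metadata=metadata)

    def describe(self, entry_name):
        # Metadata can be read without going through the rows again
        return self.entries[entry_name].metadata


# Made-up sample rows; the real spreadsheet's columns may differ
sample_rows = [
    {"Offense Code": "Burglary", "Sex": "M", "Count": 3},
    {"Offense Code": "Vandalism", "Sex": "F", "Count": 1},
]

repo = Repository("Local Repository (Ariel)")
repo.store("Altoona Crime Rates", sample_rows)
info = repo.describe("Altoona Crime Rates")
print(info["columns"], info["row_count"])
```

Because the description travels with the data, anything that later uses the entry can check column names and types up front, which is roughly the convenience the repository gives you when you design a process.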

 

2/5

Add data to the process.

 

 

ACTIVITY

 

 

 

 

  1. Click the Design tab to return to the Process panel.
  2. Drag the imported Altoona Crime Rates data from the Repository panel into the Process panel.

 

 

 

 

EXPLANATION

 

 

 

 

When you drag data from the repository into the process, it transforms into a data-loading operator (in this case, Retrieve Altoona Crime Rates). Data is not actually loaded (or delivered at the round output ports of each operator) until you run the process, which is what we do in the next few steps.
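
The "nothing is loaded until you run the process" behavior can be sketched in a few lines of Python. This is only an analogy under stated assumptions: the `Retrieve` class, its `run` method, and the `stored` dictionary are hypothetical names, not part of RapidMiner.

```python
class Retrieve:
    """Hypothetical data-loading operator: holds a reference, loads on demand."""

    def __init__(self, entry_name, loader):
        self.entry_name = entry_name
        self._loader = loader      # function that fetches the rows
        self.load_count = 0        # how many times data was actually loaded

    def run(self):
        # Data is fetched only here, when the process executes
        self.load_count += 1
        return self._loader(self.entry_name)


# Made-up stored data standing in for the repository contents
stored = {"Altoona Crime Rates": [{"Offense Code": "Burglary", "Count": 3}]}

operator = Retrieve("Altoona Crime Rates", stored.__getitem__)
loads_before_run = operator.load_count   # nothing has been loaded yet
rows = operator.run()                    # loading happens now
print(loads_before_run, operator.load_count, len(rows))
```

Creating the operator is cheap because it only remembers where the data lives; the actual work is deferred to `run`, just as dragging data into the Process panel does not load it.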

 

 

3/5

Create a connection in the process for results.

 

 

ACTIVITY

 

 

 

 

  1. Connect the output port of Retrieve Altoona Crime Rates with the result port (“res”) on the right side of the Process panel.
  2. Make the connection either by dragging a line between the ports, or by clicking first on one port and then on the other port.

 

 

 

 

EXPLANATION

 

 

 

 

Only data which is delivered to one of the result ports (“res”) on the right can be seen after the execution of the process. If your process does not have at least one connection to a result port, you won’t see any results when you execute it!
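
As a rough illustration of why the connection matters, the Python sketch below runs every operator but returns only the outputs wired to a result port. The `execute` function and the shape of `connections` are assumptions made up for this example, not RapidMiner's process model.

```python
def execute(operators, connections):
    """Run every operator, but return only outputs wired to a 'res' port.

    operators:   dict mapping operator name -> zero-argument function
    connections: list of (operator name, target port) pairs
    """
    outputs = {name: op() for name, op in operators.items()}
    return [outputs[src] for src, port in connections if port.startswith("res")]


operators = {
    "Retrieve Altoona Crime Rates": lambda: [{"Offense Code": "Burglary"}],
}

# No connection to a result port: the operator runs, but nothing is shown
unconnected = execute(operators, connections=[])

# Output port wired to "res": the data appears in the results
connected = execute(operators, [("Retrieve Altoona Crime Rates", "res")])
print(len(unconnected), len(connected))
```

In both calls the operator does its work; only the second call hands anything back, which mirrors a process that runs without error yet shows no results.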

 

 

4/5

Execute the process.

 

 

ACTIVITY

 

 

 

 

  1. Click the Run button (the play icon in the toolbar) to execute the process.

 

 

 

EXPLANATION

 

 

 

 

Once the process has run, RapidMiner automatically switches to the Results view, where your results are displayed. At any time, you can click the Design tab to return to the Process panel.

 

 

5/5

Inspect data using summary statistics.

 

Congratulations, you have just imported your first data set! You will import data the same way from now on. Simply importing data into RapidMiner already gives you a lot of useful information; see the Challenge questions below for some examples.

 

 

CHALLENGE

 

 

 

 

Using the Statistics tab in the Results view, answer the following questions:

  1. By Offense Code: What are the most common and least common crimes in Altoona?
  2. By Sex: Are crimes more often committed by men or women?
  3. By Age: What group has committed the highest number of offenses for a single crime in a single month: adults or juvenile offenders?
  4. By Age: What specific age group has committed the highest number of offenses for a single crime in a single month?
  5. By Race: What racial group has committed the highest number of offenses for a single crime in a single month?
  6. By Ethnicity: What ethnic group has committed the highest number of offenses for a single crime in a single month?
  7. What is the profile of the average person committing crimes in Altoona? Look at the max values for Offense Code and Sex, and the average values for Adult vs. Juvenile, Age, Race, and Ethnicity.
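
If you want to check your reading of the Statistics tab outside RapidMiner, the questions above boil down to two kinds of summary: most and least frequent values for categorical columns, and min, max, and average for numeric ones. The Python sketch below shows one way to compute them. The rows are made up, and the column names are assumptions that may not match the real spreadsheet.

```python
from collections import Counter

# Made-up rows; column names and values are illustrative only
rows = [
    {"Offense Code": "Burglary", "Sex": "M", "Count": 3},
    {"Offense Code": "Vandalism", "Sex": "F", "Count": 1},
    {"Offense Code": "Burglary", "Sex": "M", "Count": 5},
    {"Offense Code": "Fraud", "Sex": "F", "Count": 2},
    {"Offense Code": "Burglary", "Sex": "F", "Count": 2},
    {"Offense Code": "Fraud", "Sex": "M", "Count": 1},
]


def nominal_summary(rows, column):
    """Return the most and least frequent values of a categorical column."""
    counts = Counter(row[column] for row in rows).most_common()
    return {"most": counts[0], "least": counts[-1]}


def numeric_summary(rows, column):
    """Return min, max, and average of a numeric column."""
    values = [row[column] for row in rows]
    return {
        "min": min(values),
        "max": max(values),
        "average": sum(values) / len(values),
    }


offense = nominal_summary(rows, "Offense Code")
count = numeric_summary(rows, "Count")
print(offense, count)
```

Note that a maximum tells you about the single largest row, while a most-frequent value tells you which category appears most often; keeping the two apart helps with the "single crime in a single month" questions.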

 

 


Copyright 2025 © The Pennsylvania State University Privacy Non-Discrimination Equal Opportunity Accessibility Legal