 Visit the Pennsylvania State University Home Page

Data Science Tools

RapidMiner Module 2: Filtering & Sorting

1/5

Add data into the process.

In the last tutorial, we learned how to retrieve data. Now it is time to learn how to work with that dataset and manipulate it so you can understand what it is saying. In this tutorial, we will apply a filter to the Altoona crime rates data so that we only look at crimes committed by juvenile offenders, and then find the most common ones.

ACTIVITY

  1. Drag the already imported Altoona Crime Rates data from the Repository panel into the Process panel.

EXPLANATION

Fun fact: in RapidMiner, rows are called examples, columns are called attributes, and data tables are called example sets. It is good to know this beforehand, because operator names use these terms: operators that work on rows refer to examples, operators that work on columns refer to attributes, and some operators work on the whole example set.
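To make the terminology concrete, here is a minimal Python sketch (not RapidMiner code) that models an example set as a list of dictionaries: each dictionary is one example (row), and its keys are the attributes (columns). The column names and values below are hypothetical placeholders, not the actual Altoona data, and the helper `attributes` is illustrative only.

```python
# Hypothetical toy example set: NOT the real Altoona Crime Rates data.
# Each dict is one "example" (row); each key is an "attribute" (column).
example_set = [
    {"Month": "Jan", "Offense Code": "code_A", "Juvenile Total": 2.0},
    {"Month": "Jan", "Offense Code": "code_B", "Juvenile Total": 0.0},
    {"Month": "Feb", "Offense Code": "code_A", "Juvenile Total": 5.0},
]

def attributes(examples):
    """Return the attribute (column) names, in the order of the first example."""
    return list(examples[0].keys()) if examples else []

print(len(example_set), "examples")   # number of rows in the example set
print(attributes(example_set))        # names of the columns
```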

2/5

Filter down to the data of interest.

ACTIVITY

  1. Search for the Filter Examples operator using the search box at the top of the Operator panel.
  2. Drag Filter Examples into the Process panel.
  3. Connect the output port of Retrieve Altoona Crime Rates with the input port of Filter Examples.
  4. Click on Filter Examples to select it; then in the Parameters panel click Add Filters to define a filter, and set it to Juvenile Total, >, and 0.00.

EXPLANATION

Remember, rows are called examples, so the Filter Examples operator is basically saying “filter rows by some criterion”, and its settings in the Parameters panel say “keep only the rows whose Juvenile Total is greater than 0”.
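In plain Python terms, Filter Examples behaves roughly like a list comprehension that keeps the rows meeting a condition. This is only an analogy, assuming a toy list-of-dictionaries dataset with made-up values; the function name `filter_examples` is hypothetical, not a RapidMiner API.

```python
# Hypothetical toy rows standing in for the Altoona data (invented values).
rows = [
    {"Month": "Jan", "Offense Code": "code_A", "Juvenile Total": 2.0},
    {"Month": "Jan", "Offense Code": "code_B", "Juvenile Total": 0.0},
    {"Month": "Feb", "Offense Code": "code_C", "Juvenile Total": 1.0},
]

def filter_examples(examples, attribute, predicate):
    """Keep only the examples (rows) whose value for `attribute` satisfies `predicate`."""
    return [ex for ex in examples if predicate(ex[attribute])]

# Analogy for the filter condition: Juvenile Total > 0.00
juvenile_rows = filter_examples(rows, "Juvenile Total", lambda v: v > 0.0)
for ex in juvenile_rows:
    print(ex["Month"], ex["Offense Code"], ex["Juvenile Total"])
```

Note that filtering removes rows only; every remaining example still has all of its attributes.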

When you add an operator to your process, you should immediately connect it to the previous operator in the process. Remember that data flows between operators, so the options available in an operator’s parameters depend on the data that reaches it through its connections. For instance, how could the Filter Examples operator “know” about the column Juvenile Total if it is not connected to the data source?
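The role of the connection can be illustrated with a small, hypothetical Python model: an operator can only offer or check attribute names that actually reach it through its input. This loosely mirrors the idea in the paragraph above, but the functions `available_attributes` and `check_filter_parameter` are illustrative assumptions, not RapidMiner internals.

```python
# Hypothetical sketch: an operator can only offer (or validate) attributes
# that arrive through its input connection.
def available_attributes(input_examples):
    """Attributes an operator can 'see'; empty if nothing is connected (None)."""
    if input_examples is None:  # input port not connected
        return set()
    return {name for ex in input_examples for name in ex}

def check_filter_parameter(input_examples, attribute):
    """Raise an error if the chosen attribute is not present in the input data."""
    if attribute not in available_attributes(input_examples):
        raise ValueError(f"Attribute '{attribute}' is not available at the input port")

data = [{"Offense Code": "code_A", "Juvenile Total": 2.0}]  # hypothetical toy data
check_filter_parameter(data, "Juvenile Total")  # passes: the column arrives via the connection
try:
    check_filter_parameter(None, "Juvenile Total")  # unconnected: nothing to choose from
except ValueError as err:
    print(err)
```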

3/5

Sort data by the attribute of interest.

ACTIVITY

  1. Search for the Sort operator, and then drag it into the Process panel.
  2. Connect the output port of Filter Examples with the input port of Sort.
  3. Click on Sort to select it; then in the Parameters panel set attribute name to Juvenile Total, and sorting direction to decreasing.
  4. Connect the output port of Sort to the result port (“res” port) on the right of the Process panel.
  5. Click Run to execute the process.

EXPLANATION

Remember, columns are called attributes, so the Sort operator above is saying “sort the data by column Juvenile Total in decreasing order”.
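As a rough Python analogy, again assuming a toy list-of-dictionaries dataset with invented values, sorting by one attribute in decreasing order corresponds to `sorted(..., reverse=True)` with a key that picks out that column.

```python
# Hypothetical toy rows (invented values), already filtered to Juvenile Total > 0.
rows = [
    {"Month": "Jan", "Offense Code": "code_A", "Juvenile Total": 2.0},
    {"Month": "Feb", "Offense Code": "code_B", "Juvenile Total": 5.0},
    {"Month": "Mar", "Offense Code": "code_C", "Juvenile Total": 1.0},
]

# Analogy for: attribute name = Juvenile Total, sorting direction = decreasing
sorted_rows = sorted(rows, key=lambda ex: ex["Juvenile Total"], reverse=True)
for ex in sorted_rows:
    print(ex["Juvenile Total"], ex["Offense Code"], ex["Month"])
```

Sorting reorders rows but keeps every row and every attribute, so the largest Juvenile Total simply moves to the top.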

CHALLENGE

  1. Looking at the resulting data, what is the most common crime that male juvenile offenders committed in one month? What about the female juvenile offenders?
  2. In what month did both of the above common crimes occur?

4/5

Aggregate data to get more information of interest.

ACTIVITY

  1. Return to Design view, and disconnect the Sort operator from the “res” port.
  2. Search for the Aggregate operator, and then drag it into the Process panel.
  3. Connect the output port of Sort with the input port of Aggregate.
  4. Click on Aggregate to select it; then in the Parameters panel, under aggregation attributes, set the aggregation attribute to Juvenile Total and the aggregation function to sum; also set group by attributes to Offense Code.
  5. Connect the output port of Aggregate to the “res” port.
  6. Click Run to execute the process.

EXPLANATION

The Aggregate operator above is saying “sum the values in column Juvenile Total for each value of column Offense Code”. As a result, we get the sum of all juvenile crimes broken down by offense code (that is, by type of crime) across all months in our dataset. Also note that the Aggregate operator drops all other attributes.
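A minimal Python sketch of the same idea, assuming toy data with invented values: group the rows by one attribute, sum another attribute within each group, and keep only those two columns. The function `aggregate_sum` is hypothetical, and the `sum(...)` output column name is only an illustrative label; the exact name RapidMiner gives the result column may differ.

```python
from collections import defaultdict

# Hypothetical toy rows (invented values, not the Altoona data).
rows = [
    {"Month": "Jan", "Offense Code": "code_A", "Juvenile Total": 2.0},
    {"Month": "Jan", "Offense Code": "code_B", "Juvenile Total": 1.0},
    {"Month": "Feb", "Offense Code": "code_A", "Juvenile Total": 3.0},
    {"Month": "Feb", "Offense Code": "code_B", "Juvenile Total": 4.0},
    {"Month": "Mar", "Offense Code": "code_A", "Juvenile Total": 1.0},
]

def aggregate_sum(examples, value_attr, group_attr):
    """Sum value_attr per distinct group_attr value; all other attributes are dropped."""
    totals = defaultdict(float)
    for ex in examples:
        totals[ex[group_attr]] += ex[value_attr]
    # One output row per group, keeping only the group-by and aggregated columns.
    return [{group_attr: g, f"sum({value_attr})": s} for g, s in totals.items()]

for row in aggregate_sum(rows, "Juvenile Total", "Offense Code"):
    print(row)
```

The Month attribute does not appear in the output: once rows from different months are summed together, there is no single month value left to keep.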

CHALLENGE

  1. Looking at the resulting data, what is the most common crime that juvenile offenders committed across all months? How many times was it committed?
  2. Change the process so it looks at adult offenders rather than juvenile offenders, and answer the challenge questions above for them.
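One way to think about this change: the process is the same chain of operators, just pointed at a different column. The Python sketch below, again using invented toy data, makes that column a parameter. The attribute name `Adult Total` is an assumption for illustration, so check the actual column name in your dataset; `run_process` is a hypothetical helper, not a RapidMiner feature.

```python
from collections import defaultdict

# Hypothetical toy rows; "Adult Total" is an assumed column name, values are invented.
rows = [
    {"Month": "Jan", "Offense Code": "code_A", "Juvenile Total": 2.0, "Adult Total": 0.0},
    {"Month": "Jan", "Offense Code": "code_B", "Juvenile Total": 0.0, "Adult Total": 3.0},
    {"Month": "Feb", "Offense Code": "code_A", "Juvenile Total": 1.0, "Adult Total": 4.0},
]

def run_process(examples, total_attr):
    """Filter (total > 0) -> Sort (decreasing) -> Aggregate (sum by Offense Code)."""
    filtered = [ex for ex in examples if ex[total_attr] > 0.0]
    ordered = sorted(filtered, key=lambda ex: ex[total_attr], reverse=True)
    totals = defaultdict(float)
    for ex in ordered:
        totals[ex["Offense Code"]] += ex[total_attr]
    return dict(totals)

print(run_process(rows, "Juvenile Total"))
print(run_process(rows, "Adult Total"))
```

Here the Sort step only affects the order in which rows are visited; it mirrors the structure of the process rather than changing which offense codes end up in the totals.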

5/5

Congratulations & recap.

Congratulations! You have learned how to filter, sort, and aggregate data in RapidMiner using different operators and approaches. You can sort either by using the Sort operator, or by simply clicking on the header of a column in Results view. You can reduce the data to what you need either by filtering rows with the Filter Examples operator, or by summarizing it with the Aggregate operator, which keeps only the group-by and aggregated attributes. These operators will come up routinely in your future work.

Copyright 2025 © The Pennsylvania State University