 Visit the Pennsylvania State University Home Page

Data Science Tools


RapidMiner Module 10: Looping & Branching

1/6

Set a macro for maximum size of example subsets.

 

What if you wanted a specific sample of your dataset, e.g. at most 500 examples for each value of the attribute Sex (up to 500 for Female and up to 500 for Male)? You can loop over the 2 sexes and sample the examples for each sex individually down to 500 or fewer.

 

 

ACTIVITY

 

 

 

 

  1. Drag the stored Altoona Crime Rates into the Process.
  2. Add the Set Macro operator. Connect it.
  3. Click on the Set Macro operator, then in the Parameters panel set macro to max size and value to 500.

                                                                                     

 

 

EXPLANATION

 

 

 

 

We use a macro to define the maximum size of the example subsets so that we can change it from 500 to another size in one place. Every later operator in the process that references the macro in its calculations then picks up the new value automatically.
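
As a rough analogy outside RapidMiner, a macro behaves much like a named constant that is defined once and referenced wherever it is needed. The Python sketch below is only an illustration; the names MAX_SIZE, needs_sampling and target_size are hypothetical and do not exist in RapidMiner.

```python
# Analogy only: a RapidMiner macro is like a named value defined once.
MAX_SIZE = 500  # plays the role of the "max size" macro


def needs_sampling(subset_size: int) -> bool:
    """Return True when a subset is larger than the configured maximum."""
    return subset_size > MAX_SIZE


def target_size(subset_size: int) -> int:
    """Return the size a subset should have after optional downsampling."""
    return min(subset_size, MAX_SIZE)


print(needs_sampling(620), target_size(620))  # True 500
print(needs_sampling(310), target_size(310))  # False 310
```

Changing MAX_SIZE to 1000 changes the behavior of both functions at once, which is the convenience the macro gives you in the process.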

 

 

2/6

Introduce a loop.

 

 

ACTIVITY

 

 

 

 

  1. Add the Loop Values operator. Connect it.
  2. Click on the Loop Values operator, then in the Parameters panel set attribute to Sex. Note that the iteration macro is called loop_value.
  3. Double-click on the Loop Values operator. Its sub-process opens.

 

 

 

EXPLANATION

 

 

 

 

A loop takes all values of an attribute and runs the same sub-process once for each value. For example, if the attribute is Sex with 2 different values (Female and Male), and the sub-process reduces an example subset to 500 examples, then in the first iteration the iteration macro loop_value is Female and the Female subset is reduced to 500 examples; in the second iteration loop_value is Male and the Male subset is reduced to 500 examples.
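
For readers who think in code, the idea of running one sub-process per distinct value can be sketched in Python as follows. This is an analogy under stated assumptions, not how RapidMiner is implemented: the function loop_values, the toy sub-process count_matching, the sample rows, and the alphabetical iteration order are all made up for illustration.

```python
from typing import Callable

Row = dict[str, str]


def loop_values(rows: list[Row], attribute: str,
                sub_process: Callable[[list[Row], str], object]) -> list[object]:
    """Run sub_process once per distinct value of `attribute`.

    Each run receives the full input plus the current value, much as the
    sub-process sees the whole example set plus the loop_value macro.
    """
    results = []
    # Assumption: values are visited in sorted order for reproducibility.
    for loop_value in sorted({row[attribute] for row in rows}):
        results.append(sub_process(rows, loop_value))
    return results


# Made-up rows for illustration; not the Altoona Crime Rates data.
examples = [{"Sex": "Female"}, {"Sex": "Male"}, {"Sex": "Male"}]


def count_matching(rows: list[Row], loop_value: str) -> tuple[str, int]:
    """Toy sub-process: count the examples whose Sex equals loop_value."""
    return loop_value, sum(1 for row in rows if row["Sex"] == loop_value)


print(loop_values(examples, "Sex", count_matching))
# [('Female', 1), ('Male', 2)]
```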

 

The branching icon on an operator means that the operator can have a sub-process, i.e. it can have other operators nested within it. Whenever you are in a sub-process, like now, you will see it written in the Process panel’s top left corner.

 

 

3/6

Inside the loop, start looping over the attribute’s different values.

 

ACTIVITY

 

 

 

 

  1. While in the Loop Values sub-process, add the Filter Examples operator. Connect it on the left to the sub-process’ inp port.
  2. Click on the Filter Examples operator, then in the Parameters panel set filters to Sex, equals, and %{loop_value}.
  3. Add the Branch operator. Connect it on the left to the Filter Examples operator, and on the right to the out port of the sub-process.
  4. Click on the Branch operator, then in the Parameters panel set condition type to max_examples, and condition value to %{max size}.
  5. Double-click on the Branch operator. Its sub-process opens.

 

 

 

EXPLANATION

 

 

 

 

In step 2, when we write %{loop_value}, we are calling the iteration macro loop_value from the Loop Values operator’s settings. Remember, we set the Loop Values operator’s attribute to Sex, so loop_value will take on different values of attribute Sex: in the first iteration it will do the entire sub-process as Female, in the second iteration as Male.

 

In step 4, when we write %{max size}, we are calling the max size macro that we previously defined with the Set Macro operator. Remember, we set it to 500.
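
One way to picture a %{...} reference is as a placeholder that is swapped for the macro's current value before the parameter is used. The Python sketch below mimics that substitution on plain strings. It is an assumption-laden illustration: expand_macros is a hypothetical helper, and leaving unknown names untouched is a choice made for this sketch rather than documented RapidMiner behavior.

```python
import re


def expand_macros(text: str, macros: dict[str, str]) -> str:
    """Replace each %{name} in text with macros[name]; leave unknown names as-is."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        return macros.get(name, match.group(0))

    return re.sub(r"%\{([^}]+)\}", substitute, text)


# Macro values as they would stand during the Female iteration.
macros = {"loop_value": "Female", "max size": "500"}

print(expand_macros("Sex equals %{loop_value}", macros))      # Sex equals Female
print(expand_macros("condition value: %{max size}", macros))  # condition value: 500
```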

 

The Branch operator works as a condition operator that says “if the condition is true, then do the first action, else do the second action”. We set its condition to be that the example set has at most max size examples (which we set before in the macro to 500). So when loop_value is Female, if there are at most 500 examples where the Sex attribute’s value is Female, the Branch operator will do the first action; otherwise (if there are more than 500 examples) it will do the second action. What these actions are, we define on the next page in the Branch operator’s sub-process.
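
The Branch logic described above corresponds to an ordinary if/else. A minimal Python sketch, assuming the max_examples condition means "at most this many examples" as the text states, might look like this; branch_max_examples and the two placeholder actions are hypothetical names used only here.

```python
from typing import Callable, Sequence


def branch_max_examples(examples: Sequence[object], max_size: int,
                        then_action: Callable[[Sequence[object]], object],
                        else_action: Callable[[Sequence[object]], object]) -> object:
    """Run then_action if there are at most max_size examples, else else_action."""
    if len(examples) <= max_size:
        return then_action(examples)
    return else_action(examples)


# Placeholder actions that only report which branch ran.
def first_action(examples: Sequence[object]) -> str:
    return "then"


def second_action(examples: Sequence[object]) -> str:
    return "else"


print(branch_max_examples(range(310), 500, first_action, second_action))  # then
print(branch_max_examples(range(620), 500, first_action, second_action))  # else
```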

 

 

4/6

Inside the Branch sub-process, reduce the example subset size to 500.

 

ACTIVITY

 

 

 

 

  1. While in the Branch sub-process Then, just connect the top input port on the left with the top output port on the right, without adding any operators.
  2. While in the Branch sub-process Else, add the Sample operator. Connect it on both sides.
  3. Click on the Sample operator, then in the Parameters panel set sample to absolute and sample size to %{max size}.

 

 

 

EXPLANATION

 

 

 

 

We just defined the actions that the Branch operator does for each of its 2 outcomes, then and else.

 

The Branch operator now says “if the example set has at most 500 examples, then do nothing, else use the Sample operator to sample it down to 500”.

 

The Loop Values operator within which all of this is taking place says “repeat this for each different value of the attribute Sex”.

 

So in conclusion, we are looping over both sexes of the offenders, and for each sex reducing the subset size to at most 500 examples, which is exactly what we wanted to do.
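
Put together, the loop, filter, branch and sample steps amount to the following logic. The Python sketch is a hedged equivalent, not an export of the RapidMiner process: sample_per_value is a hypothetical function, the rows are synthetic rather than the Altoona Crime Rates data, and random.Random with a fixed seed stands in for the Sample operator.

```python
import random

MAX_SIZE = 500  # stands in for the "max size" macro


def sample_per_value(rows, attribute, max_size=MAX_SIZE, seed=0):
    """Return one subset per distinct value, each with at most max_size rows."""
    rng = random.Random(seed)  # fixed seed so reruns give the same sample
    collection = []
    for loop_value in sorted({row[attribute] for row in rows}):
        # Filter Examples: keep rows where the attribute equals loop_value.
        subset = [row for row in rows if row[attribute] == loop_value]
        # Branch (max_examples): Then passes through, Else samples down.
        if len(subset) > max_size:
            subset = rng.sample(subset, max_size)
        collection.append(subset)
    return collection


# Synthetic data: 620 Female rows and 310 Male rows.
rows = [{"Sex": "Female", "id": i} for i in range(620)]
rows += [{"Sex": "Male", "id": i} for i in range(310)]

print([len(subset) for subset in sample_per_value(rows, "Sex")])  # [500, 310]
```

Note that the smaller group keeps all of its rows, which is why the result holds at most, not exactly, 500 examples per value.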

 

 

5/6

Append all sub-loop outputs together.

 

 

ACTIVITY

 

 

 

 

  1. Click in the Process panel’s top left corner on the blue Process link to navigate back from the sub-processes to the main process.
  2. Add the Append operator. Connect it on both sides.
  3. Click Run to execute the process.

 

 

 

EXPLANATION

 

 

 

 

Notice the doubled line between Loop Values and Append. This means that the output of Loop Values is actually a collection of example sets instead of a single example set. The loop will in fact deliver one example set for each of the different values of attribute Sex, one for Female and one for Male. This is why we added the Append operator: it combines both sets into one single set again.
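
In code terms, the collection is like a list of lists, and appending is concatenation into one flat list. The short Python sketch below illustrates this with made-up rows; the function name append is chosen to mirror the operator and is not a RapidMiner API.

```python
from itertools import chain


def append(collection):
    """Concatenate a collection of example sets into one list of examples."""
    return list(chain.from_iterable(collection))


# A collection with one example set per value of Sex (made-up rows).
collection = [
    [{"Sex": "Female", "id": 1}, {"Sex": "Female", "id": 2}],
    [{"Sex": "Male", "id": 3}],
]

combined = append(collection)
print(len(combined))  # 3
```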

 

 

6/6

Practice looping more.

 

Congratulations! You have just mastered the complex but very useful concepts of looping and branching. The nested sub-processes may take some time to get used to, but they can help create very powerful models.

 

 

CHALLENGE

 

 

 

 

  1. How big is your resulting dataset? How many examples does it have? Does attribute Sex have more than 500 examples for either its Female or Male value?
  2. How should you change the process so that instead of 500 examples 1000 are kept from each sex?
  3. How should you change the process so that instead of attribute Sex, you are keeping 20 examples from each value of attribute Offense Code? How many examples does the resulting dataset have?

 

 

 

 

Copyright 2025 © The Pennsylvania State University Privacy Non-Discrimination Equal Opportunity Accessibility Legal