Data Science Tools


RapidMiner Module 9: Macros & Sampling

1/5

Set a macro value.

 

What if you wanted to send someone a smaller sample of your dataset, e.g. 50% of the original? You can use macros to calculate a new example set size for any dataset, then sample the data down to that size. Macros are like variables: you can use them to store and load values dynamically in your process.

 

 

ACTIVITY

 

 

 

 

  1. Drag the stored Altoona Crime Rates into the Process.
  2. Add the Set Macro operator. Connect it.
  3. Click on the Set Macro operator, then in the Parameters panel set macro to fraction and value to 0.5.

                                                                                     

 

 

EXPLANATION

 

 

 

 

Step 3 shows how macros are like variables: you define a name and a value. This particular macro is a constant, 0.5.
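For readers who think in code, the idea behind Set Macro can be sketched in Python. This is only an analogy, not how RapidMiner is implemented: the `macros` dictionary and the `set_macro` helper are hypothetical names, and storing the value as text is an assumption made for illustration.

```python
# Hypothetical stand-in for a process's macro store: macro names mapped to values.
# Assumption for this sketch: values are kept as text.
macros = {}

def set_macro(name, value):
    """Store a value under a macro name, like the Set Macro operator in Step 3."""
    macros[name] = str(value)

# Step 3 of the activity: macro = fraction, value = 0.5
set_macro("fraction", 0.5)

print(macros["fraction"])
```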

 

 

2/5

Extract a macro value from the dataset.

 

 

ACTIVITY

 

 

 

 

  1. Add the Extract Macro operator. Connect it.
  2. Click on the Extract Macro operator, then in the Parameters panel set macro to size and macro type to number_of_examples.

 

 

 

EXPLANATION

 

 

 

 

Unlike Set Macro, which defines a macro from scratch, Extract Macro derives a macro from the existing dataset.
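As a rough Python analogy, extracting the number of examples amounts to counting the rows of the dataset and saving that count under a name. The row contents, the `macros` dictionary, and the helper name below are hypothetical. The row count of 2326 matches the Altoona Crime Rates dataset used in this module.

```python
# Hypothetical macro store; "fraction" was defined earlier with a fixed value.
macros = {"fraction": "0.5"}

def extract_number_of_examples(example_set):
    """Derive a value from the data itself: the number of rows (examples)."""
    return str(len(example_set))

# Placeholder dataset: 2326 rows with a made-up "id" column.
example_set = [{"id": i} for i in range(2326)]

# Step 2 of the activity: macro = size, macro type = number_of_examples
macros["size"] = extract_number_of_examples(example_set)

print(macros["size"])
```

If the input dataset changes, the extracted value changes with it, while the fixed `fraction` value stays the same.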

 

 

3/5

Calculate a new macro from other macros.

 

 

ACTIVITY

 

 

 

 

  1. Add the Generate Macro operator. Connect it.
  2. Click on the Generate Macro operator, then in the Parameters panel, under function descriptions, set macro name to new size and functions expressions to round(eval(%{size})*eval(%{fraction})).

 

 

 

EXPLANATION

 

 

 

 

The formula in Step 2 essentially says “multiply the value of macro size by the value of macro fraction, then round the result to the nearest whole number”.

 

When using macros in formulas, you have to use them in the %{macro} format, for instance %{fraction}.
The eval() command returns the value of the macro in the brackets. Without eval, the computer would consider %{fraction} to be just a set of characters %, {, fraction, }, but with eval(%{fraction}), the computer considers it to be a number, 0.5 in our case. The round command rounds a number to the nearest whole number. We are rounding because we want to know how many examples to keep in our new sample, and a fraction of an example would not make sense.
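The role of eval() and round can be illustrated with a short Python sketch. It assumes macro values are held as text, which is why a conversion step is needed before doing arithmetic. The `macros` dictionary and `macro_value` helper are hypothetical. Python's built-in `round` may treat exact .5 ties differently from RapidMiner, which does not matter for the numbers used here.

```python
# Hypothetical macro store holding text values.
macros = {"size": "2326", "fraction": "0.5"}

def macro_value(name):
    """Rough analogue of eval(%{name}): interpret the macro's text as a number."""
    return float(macros[name])

# Text cannot be multiplied directly, which is the problem eval() solves.
try:
    macros["size"] * macros["fraction"]
    text_multiplication_worked = True
except TypeError:
    text_multiplication_worked = False

# Analogue of round(eval(%{size})*eval(%{fraction}))
new_size = round(macro_value("size") * macro_value("fraction"))
macros["new size"] = str(new_size)

print(text_multiplication_worked, new_size)
```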

 

 

4/5

Sample the dataset to a smaller size.

 

 

ACTIVITY

 

 

 

 

  1. Add the Sample operator. Connect it on both sides (including to the res port).
  2. Click on the Sample operator, then in the Parameters panel set sample to absolute and sample size to %{new size}.
  3. Click Run to execute the process.

 

 

 

EXPLANATION

 

 

 

 

In Step 2 above, the Sample operator uses the macro new size to determine the size of the sample it should produce. By definition, new size is equal to the product of size and fraction, rounded to the nearest whole number. Going back to page 2, size is derived from the number of examples, which for our dataset is 2326. Going back to page 1, we set fraction to 0.5. So in conclusion, new size = round(2326*0.5) = 1163. This is exactly the number of examples in our Results view.
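The arithmetic above, followed by the sampling step, can be sketched in Python. This is a minimal illustration that assumes an absolute sample draws rows at random without replacement. The placeholder rows and the seed are made up, so the specific rows selected will not match RapidMiner's output, only the count.

```python
import random

# Placeholder dataset with the same number of examples as in the module.
rows = list(range(2326))

fraction = 0.5                 # the macro set on page 1
size = len(rows)               # the macro extracted on page 2
new_size = round(size * fraction)

# Draw new_size distinct rows at random; a fixed seed makes the run repeatable.
rng = random.Random(42)
sample = rng.sample(rows, new_size)

print(size, new_size, len(sample))
```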

 

 

5/5

Practice sampling more.

 

Congratulations! You have successfully produced a smaller random sample of your dataset.

 

 

CHALLENGE

 

 

 

 

  1. Try to build samples of 30% and 80% of the original size. What do you need to change? What are the resulting example set sizes?
  2. Replace the input example set with a different dataset. Do you need to change anything else or will the process execute just fine? Try it!
  3. Look at the Parameters of Sample. Can you find a setting that would let you create a 50% sample of the data without calculating the macros? Try to change the process to make it work.

 

 

 

 

Copyright 2025 © The Pennsylvania State University Privacy Non-Discrimination Equal Opportunity Accessibility Legal