1/6
Set a macro for maximum size of example subsets.
What if you want a specific sample of your dataset, e.g. exactly 500 examples for each value of the attribute Sex (500 for Female and 500 for Male)? You can loop over the two sexes and sample the examples for each sex individually down to 500 or fewer.
ACTIVITY

EXPLANATION
We use a macro to define the maximum size of the example subsets because it lets us change the size from 500 to some other value in one place. Every later operator in the process that uses the maximum size in its calculations picks up the new value automatically.
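To see why a single macro is convenient, here is a minimal Python sketch of the idea. It is an analogy, not RapidMiner's implementation: it treats a macro as a named value that is substituted wherever %{name} appears in a parameter string. The function expand_macros and the two parameter strings are hypothetical and exist only for illustration.

```python
import re

def expand_macros(text, macros):
    """Replace each %{name} placeholder in text with the value of macros[name]."""
    return re.sub(r"%\{([^}]+)\}", lambda m: str(macros[m.group(1)]), text)

# One definition of the maximum size, playing the role of the macro.
macros = {"max size": 500}

# Two hypothetical operator parameters that both refer to the same macro.
branch_parameter = expand_macros("at most %{max size} examples", macros)
sample_parameter = expand_macros("%{max size}", macros)
```

Changing the value stored in macros from 500 to, say, 200 changes both expanded parameters without editing either parameter string.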
2/6
Introduce a loop.
ACTIVITY
3/6
Inside the loop, start looping over the attribute’s different values.
ACTIVITY

EXPLANATION
In step 2, when we write %{loop_value}, we are calling the iteration macro loop_value from the Loop Values operator’s settings. Remember, we set the Loop Values operator’s attribute to Sex, so loop_value takes on each value of the attribute Sex in turn: the first iteration runs the entire sub-process for Female, the second for Male.
In step 4, when we write %{max size}, we are calling the max size macro that we defined earlier with the Set Macro operator. Remember, we set it to 500.
The Branch operator works as a condition operator that says “if the condition is true, then do the first action, else do the second action”.
4/6
Inside the sub-loop, reduce the example subset size to 500.
ACTIVITY

EXPLANATION
We have just defined what the Branch operator does for each of its two outcomes, then and else.
The Branch operator now says “if the example set has at most 500 examples, then do nothing, else use the Sample operator to sample it down to 500”.
The Loop Values operator within which all of this is taking place says “repeat this for each different value of the attribute Sex”.
In conclusion, we are looping over both sexes of the offenders and, for each sex, reducing the subset to at most 500 examples, which is exactly what we wanted to do.
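The same logic can be sketched outside RapidMiner. The following Python is a rough analogy under stated assumptions, not what the operators do internally: an example set is modelled as a list of dictionaries, and the function cap_per_value, the id field, and the fixed seed are all hypothetical names chosen for illustration.

```python
import random

def cap_per_value(examples, attribute, max_size, seed=0):
    """Return one subset per attribute value, each with at most max_size examples."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable

    # Group the examples by attribute value, keeping first-seen order.
    groups = {}
    for example in examples:
        groups.setdefault(example[attribute], []).append(example)

    subsets = []
    for loop_value, subset in groups.items():  # one iteration per value, like Loop Values
        if len(subset) <= max_size:
            pass  # "then": the subset is small enough, do nothing
        else:
            subset = rng.sample(subset, max_size)  # "else": sample down to max_size
        subsets.append(subset)
    return subsets

# Toy data: 700 Female examples and 300 Male examples.
examples = [{"Sex": "Female", "id": i} for i in range(700)]
examples += [{"Sex": "Male", "id": i} for i in range(300)]

subsets = cap_per_value(examples, "Sex", max_size=500)
sizes = [len(s) for s in subsets]  # [500, 300]
```

With this toy data the Female subset is sampled down to 500, while the Male subset already has fewer than 500 examples and passes through unchanged.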
5/6
Append all sub-loop outputs together.
EXPLANATION
Notice the double line between Loop Values and Append. It means that the output of Loop Values is a collection of example sets rather than a single example set: the loop delivers one example set for each value of the attribute Sex, one for Female and one for Male. This is why we added the Append operator: it combines both sets into a single example set again.
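As a rough analogy in Python, a collection of example sets behaves like a list of lists, and appending flattens it into one list. The function append_sets and the toy data below are hypothetical and only illustrate the idea.

```python
from itertools import chain

def append_sets(collection):
    """Combine a collection of example sets into one example set, keeping order."""
    return list(chain.from_iterable(collection))

# A collection with one example set per value of Sex.
collection = [
    [{"Sex": "Female", "id": 0}, {"Sex": "Female", "id": 1}],
    [{"Sex": "Male", "id": 2}],
]

combined = append_sets(collection)  # a single list holding all three examples
```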
6/6
Practice looping more.
Congratulations! You have just mastered the complex but very useful concepts of looping and branching. The nested sub-processes may take some time to get used to, but they can help you create very powerful models.
CHALLENGE