1/5
Add data into the process.
In the last tutorial, we learned how to retrieve your data. Now it is time to learn how to use that dataset and manipulate it so you can actually understand what it is saying. In this tutorial, we will apply a filter to the Altoona crime rates data to look only at the most common crimes committed by juvenile offenders.
EXPLANATION
Fun fact: in RapidMiner, rows are called examples, columns are called attributes, and data tables are called example sets. It is good to know this beforehand, because when we work on rows, columns, or whole tables, we will be using operators that work on examples, attributes, or the whole example set.
2/5
Filter down to the data of interest.
EXPLANATION
Remember, rows are called examples, so the Filter Examples operator is basically saying “filter rows by some criterion”, and its settings in the Parameters panel say “the criterion is all rows with Juvenile Total greater than 0”.
When you add an operator to your process, you should immediately connect it to the previous operators in the process. Remember that data flows between operators, so an operator’s connection can influence its parameters. For instance, how could the Filter Examples operator “know” about the column Juvenile Total, if it is not connected to the data source?
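RapidMiner builds this step through its graphical interface, but the filtering logic can be sketched in plain Python as a rough analogy. The rows below are made-up sample values, and the `Month` attribute, the `CODE_A`/`CODE_B` labels, and the `filter_examples` helper are assumptions introduced only for illustration; they are not part of RapidMiner or of the real Altoona data.

```python
# Hypothetical sample rows standing in for the example set.
# Values, the "Month" attribute, and the offense labels are invented for illustration.
example_set = [
    {"Offense Code": "CODE_A", "Month": 1, "Juvenile Total": 0},
    {"Offense Code": "CODE_B", "Month": 1, "Juvenile Total": 2},
    {"Offense Code": "CODE_A", "Month": 2, "Juvenile Total": 5},
]


def filter_examples(examples, attribute, threshold):
    """Keep only the examples (rows) whose attribute value is greater than threshold."""
    return [row for row in examples if row[attribute] > threshold]


# Equivalent in spirit to the condition "Juvenile Total > 0".
juvenile_only = filter_examples(example_set, "Juvenile Total", 0)

for row in juvenile_only:
    print(row)
```

Note that the helper can only test `Juvenile Total` because it receives rows that contain that attribute, which mirrors why the operator must be connected to the data source.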
3/5
Sort data by the attribute of interest.
EXPLANATION
Remember, columns are called attributes, so the Sort operator above is saying “sort the data by column Juvenile Total in descending order”.
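As a rough analogy outside RapidMiner, the same descending sort can be sketched in plain Python. The sample rows and the `CODE_*` labels are invented for illustration and do not come from the Altoona data.

```python
# Hypothetical sample rows; the values are invented for illustration.
rows = [
    {"Offense Code": "CODE_A", "Juvenile Total": 3},
    {"Offense Code": "CODE_B", "Juvenile Total": 7},
    {"Offense Code": "CODE_C", "Juvenile Total": 1},
]

# Sort by the attribute "Juvenile Total", largest value first.
sorted_rows = sorted(rows, key=lambda row: row["Juvenile Total"], reverse=True)

for row in sorted_rows:
    print(row["Offense Code"], row["Juvenile Total"])
```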
4/5
Aggregate data to get more information of interest.
EXPLANATION
The Aggregate operator above is saying “sum the data in column Juvenile Total by column Offense Code”. As a result, we get the sum of all juvenile crimes across all months in our dataset, broken down by offense code, i.e. the type of crime committed. Also note that the Aggregate operator drops all other attributes.
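The group-and-sum idea can be sketched in plain Python as a rough analogy. The sample rows, the `Month` attribute, and the output attribute name `sum(Juvenile Total)` are assumptions made for illustration, not values taken from the real dataset.

```python
from collections import defaultdict

# Hypothetical sample rows; values and the "Month" attribute are invented for illustration.
rows = [
    {"Offense Code": "CODE_A", "Month": 1, "Juvenile Total": 3},
    {"Offense Code": "CODE_B", "Month": 1, "Juvenile Total": 1},
    {"Offense Code": "CODE_A", "Month": 2, "Juvenile Total": 4},
]

# Group by "Offense Code" and sum "Juvenile Total" within each group.
totals = defaultdict(int)
for row in rows:
    totals[row["Offense Code"]] += row["Juvenile Total"]

# The result keeps only the grouping attribute and the aggregated value;
# other attributes such as "Month" are dropped.
aggregated = [
    {"Offense Code": code, "sum(Juvenile Total)": total}
    for code, total in totals.items()
]

for row in aggregated:
    print(row)
```

Because the months are collapsed into one total per offense code, the `Month` attribute no longer has a single meaningful value per row, which is why it does not appear in the result.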
5/5
Congratulations & recap.
Congratulations! You have learned how to filter, sort, and aggregate data in RapidMiner using different operators and approaches. You can sort either by using the Sort operator or by simply clicking on a column header in the Results view. You can narrow the data down with the Filter Examples operator, or summarize it with the Aggregate operator. These operators will continue to come up routinely in your future work.