1/6
Retrieve the data, and note its different types and roles.
In this tutorial, we prepare data for modeling by changing column types and roles, i.e. by determining which column (or variable) should be predicted from which other columns (or variables) in our dataset. We then predict the profile of Altoona offenders most likely to commit a crime, based on sex, age, race, and ethnicity. For the modeling, we use a classic machine learning method called Decision Tree.
EXPLANATION
Type defines the attribute’s (column’s) possible values, e.g. values can be date_time (Jan 1, 2013 12:00:00 AM EST), polynominal (01ABC-4DEF), integer (0, 1, 2, 3, -10), etc.
Role describes how the attribute will be used by machine learning operators. Attributes without an assigned role are by default regular attributes – these are used as inputs by machine learning operators. All of our attributes currently have a regular role. Attributes which we want to predict need to be assigned the role of label attributes (sometimes also called target or class attributes) – these are used as outputs by machine learning operators.
Say we want to predict the total number of crimes committed by adults. In this case, we would choose the attribute Adult Total, and change its role to label. This is what we do next.
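The distinction between type and role can be sketched in a few lines of plain Python. This is only an illustration, not how the tool stores attributes: the rows, their values, and every column name except Adult Total are invented, and `value_type` is a hypothetical helper.

```python
# Toy rows; all values and every column except "Adult Total" are invented.
rows = [
    {"Offense": "Burglary", "Adult Total": 12, "Juvenile Total": 3},
    {"Offense": "Vandalism", "Adult Total": 4, "Juvenile Total": 7},
]


def value_type(values):
    """Return a coarse type name for a column's values (hypothetical helper)."""
    if all(isinstance(v, int) for v in values):
        return "integer"
    if all(isinstance(v, (int, float)) for v in values):
        return "real"
    return "nominal"


# Type: what kind of values each attribute (column) holds.
types = {name: value_type([row[name] for row in rows]) for name in rows[0]}

# Role: every attribute starts out as a regular attribute (an input).
roles = {name: "regular" for name in rows[0]}
# The attribute we want to predict gets the label role (the output).
roles["Adult Total"] = "label"

inputs = [name for name, role in roles.items() if role == "regular"]
label = [name for name, role in roles.items() if role == "label"]
print(types)
print("inputs:", inputs, "label:", label)
```

Here the type is derived from the values themselves, while the role is a separate decision we make about how each attribute is used.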
2/6
Change the target attribute’s type for predictive modeling by discretizing it.
ACTIVITY
EXPLANATION
Discretization is a common technique for transforming an attribute’s type from numerical to polynominal (a nominal type with more than 2 values), the type needed by the machine learning method called Decision Tree, which we will use later on.
“Binning” assigns each Adult Total example (row) to one of two “bins” (groups), each covering an equal part of the overall range of values. Discretization then replaces the original numerical value in each example (row) with the name of the “bin” the value belongs to.
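Equal-width binning as described above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the sample values are invented, the bin names `range1` and `range2` are placeholders, and the handling of values that fall exactly on a bin boundary is a choice made here that may differ from the tool's.

```python
def discretize_by_binning(values, n_bins=2):
    """Replace each number with the name of its equal-width bin."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins  # every bin covers the same share of the range
    names = []
    for value in values:
        if width == 0:
            index = 0  # all values identical: everything lands in the first bin
        else:
            # Clamp so the maximum value falls into the last bin.
            index = min(int((value - low) / width), n_bins - 1)
        names.append(f"range{index + 1}")
    return names


adult_total = [0, 3, 10, 25, 49, 50]  # invented example values
bins = discretize_by_binning(adult_total)
print(bins)
```

With a range of 0 to 50 and two bins, each bin spans 25 units, so the numerical column becomes a nominal one with the values `range1` and `range2`.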
3/6
Change the target attribute’s role for predictive modeling to label.
EXPLANATION
Step 2 above essentially tells the machine (i.e. the computer) that the attribute Adult Total is the one that needs to be predicted.
In Results view, on the Data tab, you can see that the column of the attribute Adult Total is now in a different color
4/6
Keep only attributes relevant to predicting your target attribute.
ACTIVITY
EXPLANATION
We excluded all juvenile attributes because we are focusing on adult offenders only, and keeping the details of juvenile offenses would not help us better predict adult offenses.
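Dropping the juvenile attributes amounts to filtering columns by name. The sketch below assumes, purely for illustration, that juvenile columns can be recognized by a `Juvenile` prefix; the column names other than Adult Total, the values, and the helper `keep_attributes` are all hypothetical.

```python
# Toy rows; names (except "Adult Total") and values are invented.
rows = [
    {"Adult Male": 5, "Adult Female": 2, "Juvenile Male": 1, "Adult Total": "range1"},
    {"Adult Male": 30, "Adult Female": 14, "Juvenile Male": 6, "Adult Total": "range2"},
]


def keep_attributes(rows, keep):
    """Return copies of the rows containing only the attributes for which keep(name) is true."""
    return [
        {name: value for name, value in row.items() if keep(name)}
        for row in rows
    ]


# Assumption: juvenile attributes share the prefix "Juvenile".
adult_only = keep_attributes(rows, lambda name: not name.startswith("Juvenile"))
print(list(adult_only[0]))
```

The original rows are left unchanged, so the filter can be adjusted and rerun without reloading the data.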
5/6
Predict target attribute using a decision tree model.
ACTIVITY
EXPLANATION
In Design view, notice that the connections between operators are blue up to the Decision Tree operator (so-called data connections) and green after that operator (so-called model connections,
In Results view, you can see the resulting decision tree. How do we interpret it? Start from the rightmost tree “branch” and work leftward. We see that this branch has the thickest arrows, meaning the majority of data is
6/6
Practice machine learning a bit more.
Congratulations! We just finished our first machine learning model – a simple decision tree. As datasets become bigger and more complex, machine learning models like the decision tree become more and more useful in quickly giving us insights from the data that we would not have found on our own, or would have needed significantly more time for.
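To make the idea of a decision tree more concrete, here is a minimal sketch of a one-level tree (a "decision stump") in plain Python. It is not the algorithm used by the Decision Tree operator in this tutorial: it picks a single split by Gini impurity, which is one common criterion among several. The rows, attribute names, and helper functions are invented for illustration.

```python
from collections import Counter, defaultdict


def gini(labels):
    """Gini impurity: 0.0 when all labels agree, higher when they are mixed."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def group_labels(rows, attribute, label):
    """Collect the label values of the rows, grouped by the attribute's value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[label])
    return groups


def fit_stump(rows, attributes, label):
    """Pick the attribute whose split gives the purest groups, then
    predict the majority label within each group."""
    def weighted_gini(attribute):
        groups = group_labels(rows, attribute, label)
        return sum(len(g) / len(rows) * gini(g) for g in groups.values())

    best = min(attributes, key=weighted_gini)
    leaves = {
        value: Counter(labels).most_common(1)[0][0]
        for value, labels in group_labels(rows, best, label).items()
    }
    return best, leaves


# Invented toy data; "Adult Total" holds the bin names produced by discretization.
rows = [
    {"age_group": "18-24", "offense_type": "theft", "Adult Total": "range2"},
    {"age_group": "18-24", "offense_type": "assault", "Adult Total": "range2"},
    {"age_group": "25-34", "offense_type": "theft", "Adult Total": "range1"},
    {"age_group": "25-34", "offense_type": "assault", "Adult Total": "range1"},
]

split_attribute, leaves = fit_stump(rows, ["age_group", "offense_type"], "Adult Total")
print(split_attribute, leaves)
```

A full decision tree repeats this step recursively inside each group until the groups are pure or a stopping rule applies.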
CHALLENGE