1/4
Aggregate the data by attributes of interest.
In this tutorial, we pivot the Altoona crime data from long table to wide table format, then use advanced renaming of the attributes to make the final outcome neater. Pivoting rotates the data from a long table format (one attribute with a lot of examples) into a wide table format (lots of attributes with a single example). This transformation is especially useful after an aggregation since the aggregated information is always stored in a long format, while machine learning models often need the data to be stored in the wide
format.
For example, we can use pivoting to show how many months in our dataset each sex committed each type of crime, and whether both sexes commit all types of crime or not.
|
EXPLANATION |
|
|
|
|
The Results view shows you a table with three columns: Offense Code, Sex, and count(Sex), which counts how many months in the dataset had this offense code committed by this sex. For instance, 01A-Murder and Non negligent Manslaughter has been committed by F (female offenders) in exactly 2 months in the dataset. All
|
2/4
Pivot the data from long to wide format to allow
side-by-side comparison.
Next, we transform the long table format into wide so that we can compare male and female offenders side-by-side.
|
EXPLANATION |
|
|
|
|
The Results view shows you a table with less rows than before. Now each single row represents one crime from the Offense Code attribute (the group attribute parameter). On the other hand, F and M, the two different values from the Sex attribute (the index attribute) are now the two other columns – and the actual values in these two columns are the counts from aggregation. It is now easy to compare side-by-side: For instance, 01A-Murder and Non negligent Manslaughter has been committed in 2 months by females, and 3 months by males. Even more interesting, we see that the next crime, 01B-Manslaughter by Negligence has never been committed by females in our dataset!
|
3/4
Rename all new columns for a neater presentation.
The names of the new columns reflect how they were created. If we want to simplify their names, we can use operator Rename to rename each individually, or we can use Rename by Replacing which is handier for replacing multiple attributes simultaneously, and which we demonstrate below:
|
EXPLANATION |
|
|
|
|
Your Results view should now have columns that say Sex F, and Sex M.
You might already be familiar with regular expressions (this is the name for the strange parameters we have used for the renaming). The expression you have used for replace what means that you look for something, designated by (.*), between count( and )_ and then something else, again designated by (.*), after the underscore. Each time you use round brackets, you define a new so-called capturing group which you can refer to in the definition of the replacement. Since the round brackets have a special meaning here in the original expression, we need to quote the brackets in the name itself with a backslash. Finally, we can use the capturing groups in the replace by parameter with the dollar sign and the number identifying the group. $1 is for the content of the first group, which always happens to be “Sex”, and $2 is for identifying the second group. Those are the two different sexes “F”, and “M”.
We are using Store because we will need the newly formatted output in the next Module.
|
4/4
Analyze female and male offenders side-by-side with the new
pivoted table.
Congratulations! You just successfully performed a pivot of the data, and renamed it, making it easy to compare female and male monthly crimes side-by-side.
Next Page: Previous Page: