1/4
Expand the new combined dataset with new attributes for even more insights.
Once we have combined multiple datasets, we can get even more insights by creating new columns and then focusing on those insights by removing old columns we no longer need. The new columns can hold formulas that describe our data in a new way; the old columns may hold data that is not of interest for the analysis at hand.
In this tutorial, we are going to create and remove columns in our combined dataset to answer the following questions:
- What is Altoona’s crime rate, and has it really increased over the period, or not?
- Looking at the proportion of different racial and ethnic groups in Altoona’s population, are any groups significantly overrepresented or underrepresented in recorded crimes?
ACTIVITY
Attribute name | Function expression
% Crimes by Black | [sum(Adult Race …
% Pop Black | [mode(Pop Race …
% Crimes by White | [sum(Adult Race …
% Pop White | [mode(Pop Race White)]/[mode(Pop …
% Crimes in Population | [sum(Adult …
You can either copy and paste the above directly into the appropriate fields or, for the function expressions, try to build the same expressions by clicking the small calculator symbol beside the field; then, when another new …
EXPLANATION
Remember, attributes are RapidMiner lingo for columns, so the operator Generate Attributes …
The 5 new attributes (columns) created above measure, respectively: the proportion of crimes committed by black people among all crimes, the proportion of black people in the total population, the proportion of crimes committed by white people among all crimes, the proportion of white people in the total population, and the ratio of total crimes to total population, which is essentially the crime rate. The last attribute directly answers one of our questions: Altoona’s crime rate over time.
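To make the five formulas concrete outside RapidMiner, here is a minimal plain-Python sketch that computes them for one row. It assumes each row is one month of the combined dataset after aggregation (the bracketed names such as [sum(Adult Race …)] and [mode(Pop Race …)] appear to be aggregated attributes). The source field names used below (`sum_crimes_black`, `mode_pop_total`, and so on) are hypothetical stand-ins, because the original expressions are cut off, and the input numbers are toy values, not Altoona data.

```python
# Sketch of the five ratio attributes from step 1/4, in plain Python.
# ASSUMPTION: each row is one month of the combined, aggregated dataset, and the
# source field names below are hypothetical stand-ins for the truncated
# [sum(Adult Race ...)] and [mode(Pop Race ...)] attributes.

def add_ratio_attributes(row: dict) -> dict:
    """Return a copy of one monthly row with the five new attributes added."""
    out = dict(row)
    crimes_total = row["sum_crimes_total"]  # hypothetical: summed crime count
    pop_total = row["mode_pop_total"]       # hypothetical: population (monthly mode)

    # Share of recorded crimes committed by each group.
    out["% Crimes by Black"] = row["sum_crimes_black"] / crimes_total
    out["% Crimes by White"] = row["sum_crimes_white"] / crimes_total
    # Share of the population made up of each group.
    out["% Pop Black"] = row["mode_pop_black"] / pop_total
    out["% Pop White"] = row["mode_pop_white"] / pop_total
    # Total crimes relative to total population: essentially the crime rate.
    out["% Crimes in Population"] = crimes_total / pop_total
    return out


# Usage with toy numbers (not Altoona data):
toy_month = {
    "month": "2015-01",
    "sum_crimes_black": 20, "sum_crimes_white": 70, "sum_crimes_total": 100,
    "mode_pop_black": 1000, "mode_pop_white": 8000, "mode_pop_total": 10000,
}
enriched = add_ratio_attributes(toy_month)
print(enriched["% Crimes in Population"])
```

Despite the "%" in their names, these attributes are plain ratios between 0 and 1, matching the division-only expressions above; multiply by 100 only if you want them displayed as percentages.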
2/4
Use new attributes to create even newer attributes.
ACTIVITY
Attribute name | Function expression
Diff Black | [% Crimes by Black]-[% …
Diff White | [% Crimes by White]-[% …
As before, you can either …
EXPLANATION
Notice how we can use the attributes created by the first Generate Attributes operator to create new …
The 2 new attributes (columns) created above measure, first, the difference between the proportion of crimes committed by black people and the proportion of the population made up of black people, and second, the same difference for white people. These 2 attributes help us answer the other question from the start: whether certain groups are overrepresented or underrepresented in recorded crimes. For example, if in a given month the proportion of crimes committed by black people is higher than the proportion of black people in the population, the difference is positive, meaning that black people are overrepresented in recorded crimes relative to their share of the overall population.
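The same logic can be sketched in plain Python: subtract each group's population share from its crime share, then read the sign. This assumes the row already carries the four ratio attributes from step 1/4; the `describe` helper and its `tolerance` parameter are hypothetical additions for illustration, and the input ratios are toy values, not Altoona data.

```python
# Sketch of the two difference attributes from step 2/4, in plain Python.
# ASSUMPTION: the row already carries the four ratio attributes from step 1/4.

def add_difference_attributes(row: dict) -> dict:
    """Return a copy of the row with Diff Black and Diff White added."""
    out = dict(row)
    # Positive: the group's share of crimes exceeds its share of the population.
    out["Diff Black"] = row["% Crimes by Black"] - row["% Pop Black"]
    out["Diff White"] = row["% Crimes by White"] - row["% Pop White"]
    return out


def describe(diff: float, tolerance: float = 0.0) -> str:
    """Label a difference; `tolerance` is a hypothetical dead zone around zero."""
    if diff > tolerance:
        return "overrepresented"
    if diff < -tolerance:
        return "underrepresented"
    return "roughly proportional"


# Usage with toy ratios (not Altoona data):
toy = {"% Crimes by Black": 0.2, "% Pop Black": 0.1,
       "% Crimes by White": 0.7, "% Pop White": 0.8}
row = add_difference_attributes(toy)
print(describe(row["Diff Black"]), describe(row["Diff White"]))
```

A sign check like this only says which way a single month leans; it is not a statistical significance test, so one month's small positive or negative difference should not be over-read.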
3/4
Remove unimportant columns to focus on the questions at hand.
EXPLANATION
The steps above keep only 4 attributes (columns), the ones we need to answer …
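Conceptually, this step keeps a named subset of columns in every row and drops the rest. The plain-Python sketch below shows that idea; the helper name `select_attributes` is hypothetical, and the four names in `KEEP` are an assumption for illustration only, since the tutorial's actual list is cut off above.

```python
# Sketch of the column-selection step (3/4) in plain Python: keep a chosen
# subset of columns in every row and drop the rest.

def select_attributes(rows: list[dict], keep: list[str]) -> list[dict]:
    """Return new rows containing only the attributes named in `keep`, in that order."""
    # A KeyError here means a requested attribute does not exist in the data.
    return [{name: row[name] for name in keep} for row in rows]


# ASSUMPTION: these four names are illustrative; the tutorial's actual list is cut off.
KEEP = ["month", "% Crimes in Population", "Diff Black", "Diff White"]

rows = [{"month": "2015-01", "% Crimes in Population": 0.01,
         "Diff Black": 0.1, "Diff White": -0.1,
         "% Pop Black": 0.1, "% Pop White": 0.8}]
print(select_attributes(rows, KEEP))
```

Raising on a missing name, rather than silently skipping it, makes a typo in the keep list fail loudly instead of quietly dropping a column you meant to analyze.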
4/4
Answer more complex questions with ease.
Congratulations! You have created new attributes and removed old ones. Now you can easily answer the two questions from the start, as well as many more.
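For the first question, whether the crime rate has really increased, one simple check is to fit a straight line to the monthly "% Crimes in Population" values and look at the sign of its slope. The sketch below does an ordinary least-squares fit against the month index; it assumes the values are in chronological order and evenly spaced, the function name is hypothetical, and the rates are toy values, not Altoona data.

```python
# Sketch: estimate whether the monthly crime rate trends up or down by fitting
# an ordinary least-squares line to "% Crimes in Population" against month index.
# ASSUMPTION: values are in chronological order, one per month, evenly spaced.

def crime_rate_slope(rates: list[float]) -> float:
    """Return the least-squares slope of rates vs. month index (change per month)."""
    n = len(rates)
    if n < 2:
        raise ValueError("need at least two months to estimate a trend")
    mean_x = (n - 1) / 2          # mean of the indices 0, 1, ..., n-1
    mean_y = sum(rates) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(rates))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


# Usage with toy rates (not Altoona data):
toy_rates = [0.010, 0.012, 0.011, 0.014]
slope = crime_rate_slope(toy_rates)
print("increasing" if slope > 0 else "flat or decreasing", slope)
```

A positive slope suggests an upward trend, but on its own it does not tell you whether the rise is larger than normal month-to-month noise; that would need a proper statistical test.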