In this tutorial, we import data on Altoona crime rates into Tableau, and then inspect the data using summary statistics. Tableau is a visually oriented program, so we use a number of screen pictures throughout our Tableau tutorials.
Step 1: Connect the data
Tableau’s start page has 3 main panes:
- Connect – to connect to the data
- Open – to open recently used workbooks, and
- Discover – to discover content produced by the Tableau community.
- Our data is in Excel format, so in Connect pane, under To a File select Excel.
- A dialog box pops up – browse to the Lab 1 Data Altoona Crime Rates.xlsx file, and select it for opening
Step 2: Review the data connection
After connecting the data, Tableau switches from start page to data source page.
- The left pane shows you are connected to the “Lab 1 Al…a
Crime Rates” Excel file, and that it has only 1 sheet, Sheet1. - The top right pane called canvas shows that you are connected to Sheet1, and Tableau has therefore automatically titled your dataset “Sheet1 (Lab 1 Data Altoona CrimeRates)”. Canvas also shows your Connection is Live rather than an Extract.
- The bottom right pane called data grid shows the connected data’s first 1,000 rows.
- Hover the mouse in canvas over title “Sheet1 (Lab 1 Data Altoona Crime Rates)”. The title’s text box
appears – click on it, and rename the resulting dataset to just “Altoona Crime Rates”, just to keep the name simpler. - Change the Connection in canvas from Live to Extract. We usually use Live when working
with a data source that is updating in real time. For all other purposes, Extract is usually more practical because it creates an extracted copy of the data, so even if we move the original data source to a different location, Tableau can still work using the extracted copy. Wait until text appears under Live button, stating “Extract will include all data”. - Click on Sheet 1 tab in the bottom left corner to start working with the extracted data.
- A window will pop out, prompting you to save the extracted data – save it as “Altoona Crime Rates” (it will save as Tableau Extract).
Step 3: Build a data view
After selecting Sheet 1, Tableau switches from data source page to the first worksheet.
- On the left is the Side bar with Data and Analytics panes. The Data pane contains all columns from the extracted dataset, categorized into Dimensions (categorical data like Month, Offense Code, Sex), and Measures (numerical data like numbers of Adult Ethnic Hispanic offenders, Adult
Ethnic Non-Hispanic offenders, etc.). The Analytics pane contains different ways to analyze the data we select. - Towards the middle are the cards (with Pages, Filters, and Marks), and shelves (with Columns, and Rows), both of which are used to build a data view.
- On the middle and lower right is the view, which displays all data visualizations we do.
For demonstration, we will now build a simple data view. Say we use Dimension Month and Measure Adult Ethnic Hispanic:
- Click on Dimension Month, and drag it from the
Data pane into Columns shelf.
2. Similarly, click on Measure Adult Ethnic Hispanic, and drag it from the Data pane into Rows shelf.
Step 4: Upgrade the data view
- Note in the shelves that Tableau by default aggregates all variables: Month has been aggregated
by YEAR, and Adult Ethnic Hispanic has been aggregated as a SUM. Blue bubble indicates discrete variable; green indicates continuous variable. - The resulting view shows that the sum of adult offenders who are ethnically Hispanic increased every year until 2015, and then decreased every year since (bear in mind that the dataset only goes to mid-2017, but we can ignore that inconsistency for our current purposes).
We We now upgrade this view with various details:
1. Want to know the exact number for any point in the graph? Hover the mouse
over that point:
Want 2. Want to see the exact number for all points in the graph? In Marks card, click on Label, and check Show mark labels:
3. Want to see the exact number for only maximum & minimum? In Marks card, click on Label, and under Marks to Label choose Min/Max:
For now, go back to seeing all points – under Marks to Label choose All.
4.Want to view the data in a different chart, e.g. bar chart? In Marks card, click on the horizontal menu saying Automatic and change it to Bar:
Note: In Marks card, you may need to check the box saying “Allow labels to overlap other marks” if that box is not already checked, to make sure all your bars are showing values. Keep this box checked throughout your work.
5. Want to sort data by size? In the view, hover the mouse above the vertical axis saying “Adult Ethnicity Hispanic” – a small sorting icon shows up. Click on the sorting icon to sort all bars by size – one click sorts largest first, two clicks sort smallest first, and third click cancels sorting by size.
6. Want to see the bars horizontally, rather than vertically? Swap Columns & Rows in your view – in top toolbar, click on the swap icon
7. Want to compare multiple variables, e.g. Adult Ethnic Hispanic offenders to Adult Ethnic Non-Hispanic?Similar to before, click on Measure Adult Ethnic Non-Hispanic, and drag it from the Data pane into Columns shelf, next to the already existing “SUM(Adult Ethnic Hispanic)” green bubble:
We can see that adult non-Hispanic offenders constitute a much larger group than adult Hispanic offenders, across all years in the dataset.
If your graphs are too large to fully display on screen, you can easily adjust the height and width by hovering around the appropriate graph borders:
8. Want to look at the variables’ maximum values, rather thansums? Select both bubbles – “SUM(Adult Ethnic Hispanic)”, and “SUM(Adult Ethnic Non-Hispanic)” – you can select both by pressing down the CTRL key, then selecting them. Hover the mouse above either one to get the drop-down arrow – click on the arrow, and then change Measure (Sum) to Maximum:
The view updates with maximum values for both variables replacing the sums, and the bubbles are renamed “MAX(Adult Ethnic Hispanic”, and “MAX(Adult Ethnic Non-Hispanic)” respectively:
With the new view, it is easy to see that the maximum number of crimes committed by adult Hispanic offenders in a single month (because we know our data is all monthly) was 10 crimes, in 2017, while the maximum number of crimes committed by adult non-Hispanic offenders in a single month was 71 crimes, also in 2017.
9. Want to look at a single variable by itself, e.g. how many times every year appears in the data set? Clear the Columns shelf – hover the mouse above the “Columns” text; press the drop-down arrow that appears, and select Clear Shelf:
Next, click on Dimension Month (yes, the same variable that you already have in the Rows shelf), and drag it from the Data pane into Columns shelf:
We want to count how many times every year appears – so we should take the total count of times variable Month appears, and break it down by year.
To count the number of times variable Month appears, in Columns shelf, hover the mouse on top of YEAR(Month), click on the drop-down arrow, and under Measure select Count:
The resulting view shows us exactly what we wanted – a count of the number of times variable Month
appears, broken down by year, or interpreted another way, how many times each year appears in the dataset. 2014 appears in the dataset the most times, 520:
10. Want to break down the view by quarters, or months, rather than by years? In Rows shelf, click the plus sign (+) on YEAR(Month) – the view breaks down by quarter; click (+) on QUARTER(Month) – the view breaks down by month:
This Review
This Module showed how to connect the data, build a view, and update it in different ways. When you look at this introduction, the basic, core steps to building any view are:
- The variables you put into the Columns and Rows shelves, and how you set them up (e.g. SUM vs. MAX vs. CNT, or YEAR vs. QUARTER vs. MONTH, etc.).
This determines what variable(s) we want to observe, and in what relation.
The Marks card options to format the view as needed.
This helps get the insight we want (i.e. answer we want) from our observed variable(s).
Challenges:
Practice what you just learned by answering the following questions Focus on the core elements that build a view – (1) variables you put into the Columns and Rows shelves, and (2) the Marks card options.
Answer each question in a new worksheet.
- By Offense Code: What are the most common, and least common crimes in Altoona (i.e. what Offense Codes appear in the dataset most and least times)?
Hint: See Step 4.9. Break down types of Offense Code by the number of Offense Codes done.
- By Sex: Are crimes more often committed by men or women (i.e. what Sex appears in the dataset most times)?
Hint: See Step 4.9. Break down types of Sex (F, M) by the number for each Sex.
- By Age: What group has committed the highest number of offenses for a single crime in a single month – adults, or juvenile offenders?
Hint: See Steps 4.3 (on marking only maxima and minima), 4.7, (on comparing multiple variables), and 4.8 (on aggregating variables by MAX rather than SUM).
- By Age: What specific age group has committed the highest number of offenses for a single crime in a single month?
Hint: Same like question 3 above, except you will have a lot more variables for specific age groups.
- By Race: What racial group has committed the highest number of offenses for a single crime in a single month?
Hint: Same approach like question 3 above.
- By Ethnicity: What ethnic group has committed the highest number of offenses for a single crime in a single month?
Hint: Same approach like question 3 above.
- What is the profile of the average person committing crimes in Altoona? Look at the max values for Offense Code, and Sex; average values for Adult vs. Juvenile, Age, Race, Ethnicity.
Hint: Summarize in words your finding from questions 1-6. You already have answers 1-2 set up, and you just need to change answers 3-6 (drop Offense Code and change the other shelf’s aggregation of variables to AVG to get the average).
Next Page: Previous Page: