Lesson on Data Visualization and its Misuses

Posted by Keren Wang, FA 2024


In this session, we aim to achieve several key learning objectives:
  • Understand the fundamental principles of framing and visual rhetoric, exploring how they shape the design and interpretation of data visualizations.
  • Examine the art of designing and manipulating graphic systems of signs that disclose or conceal specific quantitative or qualitative information.
  • Identify common types of data visualizations, such as bar charts, pie charts, line graphs, and network graphs, along with their appropriate applications.
  • Recognize the advantages and potential misuses of data visualizations, including manipulative techniques like framing and scaling distortions.
  • Critically analyze real and hypothetical examples to detect misleading or biased visual representations.
  • Develop best practices for creating clear, honest, and effective data visualizations, ensuring accuracy and ethical integrity.

Photography and Visual Rhetoric

To truly grasp the fundamental principles and perils of data visualization, we must journey back to the birth of photography and photojournalism. When photography was first employed in news reporting, it carried an inherent demand for credibility. Unlike paintings or sketches, photographs were perceived as unfiltered, unmediated representations of reality. Ironically, as our discussion will reveal, even from its inception, photojournalism was subject to rhetorical manipulation.

The manipulation of visual information is exemplified by two notable early instances of war photography: Roger Fenton’s The Valley of the Shadow of Death (1855) and Timothy H. O’Sullivan’s Home of a Rebel Sharpshooter, Gettysburg (1863).

In “The Valley of the Shadow of Death,” taken on April 23, 1855 during the Crimean War, Fenton captured a desolate battlefield landscape strewn with cannonballs, as seen in the version on the left:

Controversy arose when a second version of the photograph surfaced, shown on the right, in which the road is largely clear of cannonballs. [1] This discrepancy led to debates about whether Fenton had arranged the cannonballs to create a more dramatic scene, highlighting the potential for photographers to alter battlefield imagery to influence public perception.

Similarly, O’Sullivan’s Home of a Rebel Sharpshooter, Gettysburg (1863) depicts a fallen Confederate soldier positioned in a rocky enclave known as “Devil’s Den.” The carefully arranged placement of the rifle and the soldier’s posture evoke the idealized visual composition of a Renaissance painting:

Subsequent analysis revealed that the body had been moved approximately 40 yards from its original location, and the rifle was placed beside it to enhance the composition. This staging underscores the ethical dilemmas faced by early war photographers, who sometimes manipulated scenes to convey a particular narrative or emotional impact. [2]

From photography’s inception, images intended to document reality were often doctored, staged, or framed to distort information and evoke specific emotional reactions. This reveals an essential truth: photographic visualization has always been more rhetorical than purely representational, subject to the same forms of manipulation as speech and writing, if not subtler ones.

Visual Framing

Framing can influence how a target audience interprets and responds to a message by strategically emphasizing certain visual or textual elements while downplaying or obscuring others. [3] This technique can evoke different emotional reactions, guide opinions, or alter the perceived significance of an issue, ultimately steering the audience’s response in a desired direction. [4]

A notable example of visual framing is the incident involving India’s state-run Press Information Bureau (PIB) during the 2015 Chennai floods. The PIB released a photograph of Prime Minister Narendra Modi surveying the flood-affected areas from an aircraft window. However, the image was later revealed to be doctored, with a separate flood scene digitally inserted into the window to enhance the visual impact:

Similarly, data visualizations, which we often consider objective graphical representations of facts, operate under the same rhetorical principles. Like statistics, they can be strategically crafted to shape audience perception and elicit intended reactions. Whether through framing, selective emphasis, or visual distortions, data visualizations share the same capacity for manipulation as photographic narratives. [5] With this context in mind, let’s explore how these principles manifest across various types of visualizations, from timelines to bar charts and beyond, and uncover the rhetorical craft that underpins their design.


Timeline

A timeline is a visual representation of events arranged in chronological order. Unlike bar or line graphs, which typically focus on numeric data, timelines visualize the sequence of events. They help viewers understand the temporal relations between events and how they unfold over time.

Timelines can be oriented either horizontally or vertically. Events are plotted along a time axis and spaced according to when they occurred, and major milestones or periods can be highlighted with markers or annotations.

For instance, a timeline could illustrate the evolution of major classical philosophical figures from ancient China during the “Hundred Schools of Thought” period as seen in this example. By including select figures from ancient Greece and Rome on the opposite side of the time axis, the timeline provides a dual perspective, helping to contextualize these key figures within a broader historical framework.

Timelines are particularly useful for highlighting historical events or developments, such as the progression of a major war or the evolution of technological advancements.

A Gantt chart is a specialized timeline used to show the sequence and duration of tasks in a project. One of the main advantages of Gantt charts is that they help organize and visualize complex sequences of events. The Gantt chart below breaks a survey study down into detailed subtasks for each major phase. This provides a clearer picture of the workflow, helping to manage and track each specific step in the process:

Misuse of Timeline – Incorrect Scaling

Incorrect scaling occurs when events on a timeline are spaced unevenly or inaccurately relative to their chronological distances. See the example below:

In the timeline above, the 5,435-year gap between the invention of ‘Writing Systems’ and ‘Electromechanical & Digital’ information technology appears visually similar to the much greater span between ‘Writing Systems’ and the advent of ‘Oral, Representational, and Semaphoric’ systems over 100,000 years ago.

This can mislead viewers into thinking that events are either closer together or farther apart than they actually are. The inaccurate spacing may result in misinterpretations of historical progression or cause-and-effect relationships.

How to Fix It: Ensure equal time intervals (e.g., years or decades) are represented by equal physical spacing on the timeline:

In this corrected timeline, with consistent time intervals and a proportional scale, events that are 100,000 years apart appear twenty times as far apart as events that are 5,000 years apart. If uneven spacing is unavoidable for readability, explicitly note the time differences between events.
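To make the proportional-spacing rule concrete, here is a minimal Python sketch that converts event dates into positions along a time axis. The function name `proportional_positions`, the axis width of 100 units, and the three event dates are all hypothetical, chosen only so that one gap is 100,000 years and the other is 5,000 years.

```python
def proportional_positions(events, width=100.0):
    """Map (label, year) pairs to positions on an axis of the given width.

    Equal spans of time always map to equal distances, which is the
    property a mis-scaled timeline violates.
    """
    years = [year for _, year in events]
    start, end = min(years), max(years)
    span = end - start
    if span == 0:
        raise ValueError("events must cover more than one point in time")
    return {label: (year - start) / span * width for label, year in events}


# Hypothetical events (negative years are BCE); not taken from the lesson's figure.
events = [
    ("Event A", -105_000),
    ("Event B", -5_000),
    ("Event C", 0),
]

positions = proportional_positions(events)
gap_ab = positions["Event B"] - positions["Event A"]  # 100,000 years
gap_bc = positions["Event C"] - positions["Event B"]  # 5,000 years

print(f"A to B: {gap_ab:.2f} units")
print(f"B to C: {gap_bc:.2f} units")
print(f"Ratio of gaps: {gap_ab / gap_bc:.1f}")
```

On a proportional axis the first gap comes out twenty times as long as the second, matching the ratio of the underlying time spans.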


Bar Chart

A bar chart or bar graph represents data with rectangular bars, where the length or height of each bar corresponds to the data value it represents. Bars can be plotted vertically or horizontally.

Each bar represents a specific category or group, with its length or height indicating the magnitude of the corresponding value. The bars are separated by spaces to emphasize that the data is discrete, rather than continuous.

Bar charts are commonly used to compare quantities across different categories. For example, to compare the number of students enrolled in different majors at a university, a bar chart can present the enrollment figures for each major side by side, clearly showing which major is the most popular:

Bar graphs are particularly effective for highlighting differences, making it easy to identify the highest or lowest values at a glance. Bar charts are simple to construct and interpret, providing a quick visual comparison. They also have the advantage of being able to display both positive and negative values.

A grouped or clustered bar graph, such as the one shown below, compares two or more groups (sub-categories) within each category. Grouped bar graphs are commonly used for comparing data across different categories and sub-categories, such as generational differences in communication preferences:

A grouped bar graph is particularly effective for illustrating relationships between two categorical variables, offering a clear visual representation of complex data sets. However, it can become visually cluttered if too many groups or sub-categories are included, which may turn the graph into a “cluster-mess.”

A stacked bar graph is similar to a grouped bar graph but stacks sub-category values within a single bar. This format is particularly useful for showing the proportion of sub-categories within each category while also allowing for comparisons of total values across categories, as seen in this example:

One advantage of a stacked bar graph is that it combines total and part-to-whole analysis, providing a comprehensive view of both the overall category size and its internal composition. Additionally, it saves space compared to a grouped bar graph, making it a more compact visualization option.
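The part-to-whole logic of a stacked bar can be sketched in a few lines of Python. Each segment starts where the previous one ended, so the top of the last segment equals the category total. The helper names `stack_segments` and `shares`, and the numbers in `data`, are hypothetical and serve only as illustration.

```python
def stack_segments(values):
    """Return (bottom, height) pairs for the segments of one stacked bar."""
    segments = []
    bottom = 0
    for value in values:
        segments.append((bottom, value))
        bottom += value  # the next segment starts where this one ends
    return segments


def shares(values):
    """Return each sub-category's proportion of the bar's total."""
    total = sum(values)
    return [value / total for value in values]


# Hypothetical counts for three sub-categories in two categories.
data = {
    "Category A": [30, 50, 20],
    "Category B": [10, 25, 15],
}

for category, values in data.items():
    print(category, "total:", sum(values))
    print("  segments:", stack_segments(values))
    print("  shares:", [f"{share:.0%}" for share in shares(values)])
```

Only the first segment of each bar sits on the common baseline, which is one reason the remaining sub-categories are harder to compare across bars.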

However, stacked bar graphs can make it difficult to compare individual sub-category values across different bars. They may also become visually discombobulating when too many sub-categories are included, potentially hindering clear interpretation: behold, the rainbow bar-code!

Misuse of Bar Chart – Truncated Y-Axis

Let’s take a look at this bar chart where the y-axis starts at a value higher than 0, exaggerating differences between categories:

The chart exaggerates the differences between the bars by truncating the y-axis. The actual differences are small, but they appear much larger because the baseline isn’t at zero.

How to Fix It: Start the y-axis at zero so that the height of each bar is proportional to the value it represents:

The y-axis now starts at zero. It might be less “visually dramatic,” but it provides an accurate visual representation of the differences between categories.
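A quick calculation shows how much a truncated baseline can inflate a difference. The sketch below compares the ratio of two bar heights as drawn under different baselines. The function name `drawn_ratio` and the values 50 and 55 are hypothetical, not data from the charts in this lesson.

```python
def drawn_ratio(a, b, baseline=0.0):
    """Ratio of the drawn heights of bars b and a when the axis starts at baseline."""
    if baseline >= min(a, b):
        raise ValueError("baseline must be below both values")
    return (b - baseline) / (a - baseline)


# Hypothetical values: b is 10% larger than a.
a, b = 50.0, 55.0

true_ratio = drawn_ratio(a, b)                      # axis starts at zero
truncated_ratio = drawn_ratio(a, b, baseline=45.0)  # axis starts at 45

print(f"Zero baseline: second bar is {true_ratio:.2f}x the height of the first")
print(f"Baseline at 45: second bar is {truncated_ratio:.2f}x the height of the first")
```

With these numbers, a 10% difference is drawn as a bar twice as tall once the axis starts at 45.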


Pie Chart

Pie charts are commonly used to visualize proportions or percentages of various subcategories within a whole. For example, the simple pie chart below illustrates the distribution of responses to a survey on communication preferences:

An exploded pie chart is similar to a simple pie chart, but one or more slices are separated from the rest to draw attention. This format is particularly useful for highlighting specific categories or outliers, such as emphasizing the most-used communication method in a survey:

A doughnut chart is another common variety of pie chart, distinguished by its hollow center. It serves a similar purpose to a pie chart but provides additional space in the center, which can be used for labels or other relevant information:

Misuse of Pie Chart – Incorrectly Labeled Percentages

Here is a misleading pie chart where the slice proportions do not accurately match the labeled percentages:

In this example, only the 10% slice looks roughly proportional; all the remaining slices are either too large or too small for their stated percentages. This can lead viewers to faulty conclusions about the data distribution.
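One way to catch this kind of mismatch is to convert each labeled percentage into the angle its slice should span (a full circle is 360 degrees) and compare that with the angle actually drawn. The following is a minimal sketch. The function names, the 5-degree tolerance, and both sets of numbers are hypothetical and are not measurements of the chart above.

```python
def slice_angles(percentages):
    """Convert labeled percentages into the angles (in degrees) they should span."""
    total = sum(percentages.values())
    if abs(total - 100) > 1e-9:
        raise ValueError(f"percentages sum to {total}, not 100")
    return {label: pct * 360 / 100 for label, pct in percentages.items()}


def mislabeled_slices(percentages, drawn_angles, tolerance=5.0):
    """Return labels whose drawn angle differs from the expected angle by more than tolerance."""
    expected = slice_angles(percentages)
    return [
        label
        for label, angle in drawn_angles.items()
        if abs(angle - expected[label]) > tolerance
    ]


# Hypothetical labels and drawn angles, for illustration only.
labeled = {"A": 40, "B": 30, "C": 20, "D": 10}
drawn = {"A": 100, "B": 140, "C": 84, "D": 36}

print("Expected angles:", slice_angles(labeled))
print("Mislabeled slices:", mislabeled_slices(labeled, drawn))
```

In this made-up case only slice D (10%, or 36 degrees) is drawn at its expected angle, so the other three are flagged.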


Line Chart

A line graph or line chart uses points connected by lines to represent data that changes over time or along a continuous variable.

Typically, the horizontal x-axis represents time or a sequential category, while the vertical y-axis represents the variable being measured, such as temperature, sales, or stock prices. Data points are plotted at the intersection of their corresponding x and y values and are then connected by lines to illustrate the changes.

Line graphs are commonly used to visualize trends over time, such as stock prices, daily temperatures, or monthly sales. They help identify patterns, including increases, decreases, or cyclical behavior. One of their key advantages is their ability to show how a variable changes over time, making it easier to detect trends, fluctuations, or periods of stability. Additionally, multiple lines can be plotted on the same graph to compare trends across different variables.

For example, this chart shows the income share of the richest 1% of the population in various countries from 1980 to 2014, measured before taxes and benefits. This line graph provides a clear visual representation of how income inequality has evolved across different nations over time. Each line represents a country, illustrating trends in the proportion of income received by the top 1%:

Misuse of Line Chart – Exaggerated Slope

Let’s plot a graph with a y-axis that starts close to the minimum value, exaggerating the slope of the line:

Notice that in this graph, the y-axis starts at 440, close to the minimum value of the data. This artificially steepens the slope of the line, making the increase in crime rates appear more dramatic than it actually is. The manipulation may lead viewers to believe that crime rates have risen sharply, which is not true.

Now, let’s plot the same data with a properly scaled y-axis:

In this version, the y-axis now starts at 0, providing a more accurate representation of the actual change in crime rates over time. The gradual increase in crime rates is evident, but it does not appear as steep or alarming as in the misleading graph.
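The effect of the axis range on perceived slope can be estimated by asking what fraction of the plot's height the line climbs. Here is a small sketch of that calculation. The function name `visual_rise_fraction` and the data series are hypothetical. Only the axis starting value of 440 comes from the example above.

```python
def visual_rise_fraction(values, y_min, y_max):
    """Fraction of the plot's vertical extent covered by the first-to-last change."""
    if y_max <= y_min:
        raise ValueError("y_max must be greater than y_min")
    return (values[-1] - values[0]) / (y_max - y_min)


# Hypothetical series rising from 445 to 460.
rates = [445, 448, 452, 455, 460]

truncated = visual_rise_fraction(rates, y_min=440, y_max=460)
full = visual_rise_fraction(rates, y_min=0, y_max=460)

print(f"Axis from 440: the line climbs {truncated:.0%} of the plot height")
print(f"Axis from 0:   the line climbs {full:.1%} of the plot height")
```

The same 15-point increase fills three quarters of the truncated plot but only a few percent of the zero-based one, which is why the first version looks so much steeper.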


Network Graph

Network graphs are visual representations of relationships between entities (nodes) and their interactions or relations (edges). In communication research, network graphs are used to analyze various phenomena, such as social networks, communication flow, and influence patterns.

Network graphs consist of several fundamental elements. Nodes represent entities, such as individuals or organizations. Edges represent the connections or interactions between these nodes, such as communication frequency or social ties. The size or color of a node is often used to encode an additional variable, such as an entity’s importance or influence (for example, the number of followers in a social network). Similarly, the weight or thickness of an edge represents the strength or frequency of an interaction, providing a visual cue about the intensity or significance of the connection.

Network graphs are widely applied in several areas. One key application is Social Network Analysis (SNA), which involves studying the structure of social relationships, such as the connections between individuals within a community. Another common use is in Communication Flow, where network graphs help visualize how information moves within an organization or across various platforms. Additionally, they are employed in Influence and Interaction Analysis, which focuses on identifying key influencers or hubs within communication networks, such as prominent social media influencers.

Here is a network graph representing hypothetical user interactions across three major anonymous discussion boards: 2channel, 4chan, and LIHKG:

*Disclaimer: This network graph is provided for illustration purposes only and does not represent actual results from a real study. It serves as a realistic hypothetical example for education. 

This network graph provides a detailed visualization of interactions among 30 users across three major discussion boards: 2channel, 4chan, and LIHKG. The nodes in the graph represent both users and discussion boards, with the board nodes in gold.

The color of each user node indicates their primary board of interaction:

  • Light blue nodes correspond to users primarily engaging with the 2channel board.
  • Light green nodes represent users interacting mainly with the 4chan board.
  • Light coral nodes signify users who are most active on the LIHKG board.

The edges connecting the nodes represent interactions between users and boards. The weight of each edge, shown as a numerical value and reflected in its thickness, indicates the frequency of these interactions.

This visualization highlights the distinct user bases associated with each board and provides valuable insights into the patterns of user engagement and cross-platform activity.
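The structure behind a graph like this can be represented as a simple weighted edge list. The sketch below derives each user's primary board (and therefore node color) and lists users active on more than one board. The user names, interaction counts, and helper functions are hypothetical. Only the board names and the color scheme follow the description above.

```python
from collections import defaultdict

# Hypothetical (user, board, interaction_count) edges, for illustration only.
edges = [
    ("user_1", "2channel", 12),
    ("user_1", "4chan", 3),
    ("user_2", "4chan", 8),
    ("user_3", "LIHKG", 15),
    ("user_3", "4chan", 2),
]

# Color scheme as described in the lesson.
BOARD_COLORS = {
    "2channel": "lightblue",
    "4chan": "lightgreen",
    "LIHKG": "lightcoral",
}


def primary_boards(edges):
    """Return each user's most frequently used board, based on edge weights."""
    weights = defaultdict(dict)
    for user, board, count in edges:
        weights[user][board] = weights[user].get(board, 0) + count
    return {user: max(boards, key=boards.get) for user, boards in weights.items()}


def cross_board_users(edges):
    """Return a sorted list of users connected to more than one board."""
    boards_by_user = defaultdict(set)
    for user, board, _ in edges:
        boards_by_user[user].add(board)
    return sorted(user for user, boards in boards_by_user.items() if len(boards) > 1)


primary = primary_boards(edges)
for user, board in primary.items():
    print(f"{user}: primary board {board}, node color {BOARD_COLORS[board]}")
print("Cross-board users:", cross_board_users(edges))
```

A plotting library could then draw these nodes and edges, but the grouping and weighting decisions shown here are what determine the story the final picture tells.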


Conclusion

Throughout this lesson, we have uncovered the complex interplay between data visualization, visual rhetoric, and framing. By examining early examples of manipulated war photography, such as Fenton’s The Valley of the Shadow of Death and O’Sullivan’s Home of a Rebel Sharpshooter, alongside more recent cases such as the doctored photo of PM Modi, we saw how visual media, from its inception, has been shaped not just to inform but to persuade and evoke emotion. These examples underscore an important truth: visual representations, far from being neutral mirrors of reality, are imbued with rhetorical intent.

We then explored how these same principles apply to common forms of data visualizations. Whether through timelines, bar charts, or network graphs, the visual presentation of data can clarify complex information but is equally susceptible to manipulation. Techniques such as truncating axes, distorting proportions, or selectively emphasizing data points can subtly, yet powerfully, shape audience perceptions.

Finally, we considered best practices for creating clear, honest, and effective data visualizations. The lesson emphasizes that while visuals can simplify and enhance communication, their design must prioritize accuracy and transparency to maintain credibility. By critically analyzing visual data and understanding its rhetorical dimensions, we become not only better interpreters of information but also more responsible creators.

Posted by Keren Wang, 11 November 2024, all rights reserved.