Lesson on Data Visualization and its Misuses

Posted by Keren Wang, FA 2024


In this session, we aim to achieve several key learning objectives:
  • Understand the fundamental principles of framing and visual rhetoric, exploring how they shape the design and interpretation of data visualizations.
  • Examine the art of designing and manipulating graphic systems of signs that disclose or conceal specific quantitative or qualitative information.
  • Identify common types of data visualizations, such as bar charts, pie charts, line graphs, and network graphs, along with their appropriate applications.
  • Recognize the advantages and potential misuses of data visualizations, including manipulative techniques like framing and scaling distortions.
  • Critically analyze real and hypothetical examples to detect misleading or biased visual representations.
  • Develop best practices for creating clear, honest, and effective data visualizations, ensuring accuracy and ethical integrity.

Photography and Visual Rhetoric

To truly grasp the fundamental principles and perils of data visualization, we must journey back to the birth of photography and photojournalism. When photography was first employed in news reporting, it carried an inherent demand for credibility. Unlike paintings or sketches, photographs were perceived as unfiltered, unmediated representations of reality. Ironically, as our discussion will reveal, even from its inception, photojournalism was subject to rhetorical manipulation.

The manipulation of visual information is exemplified by two notable early instances of war photography: Roger Fenton’s The Valley of the Shadow of Death (1855) and Timothy H. O’Sullivan’s Home of a Rebel Sharpshooter, Gettysburg (1863).

In “The Valley of the Shadow of Death,” taken on April 23, 1855 during the Crimean War, Fenton captured a desolate battlefield landscape strewn with cannonballs, as seen in the version on the left:

Controversy arose when another version of the photograph surfaced, as seen on the right: in one version cannonballs are scattered across the road, while in the other the road is largely clear. [1] This discrepancy led to debates about whether Fenton had arranged the cannonballs to create a more dramatic scene, highlighting the potential for photographers to alter battlefield imagery to influence public perception.

Similarly, O’Sullivan’s Home of a Rebel Sharpshooter, Gettysburg (1863) depicts a fallen Confederate soldier positioned in a rocky enclave known as “Devil’s Den.” The carefully arranged placement of the rifle and the soldier’s posture evoke the idealized visual composition of a Renaissance painting:

Subsequent analysis revealed that the body had been moved approximately 40 yards from its original location, and the rifle was placed beside it to enhance the composition. This staging underscores the ethical dilemmas faced by early war photographers, who sometimes manipulated scenes to convey a particular narrative or emotional impact. [2]

From its inception, photographs intended to document reality were often doctored, staged, or framed to distort information and evoke specific emotional reactions. This reveals an essential truth: photographic visualization has always been more rhetorical than purely representational, subject to the same forms of manipulation as speech and writing, if often in subtler guises.

Visual Framing

Framing can influence how a target audience interprets and responds to a message by strategically emphasizing certain visual or textual elements while downplaying or obscuring others. [3] This technique can evoke different emotional reactions, guide opinions, or alter the perceived significance of an issue, ultimately steering the audience’s response in a desired direction. [4]

A notable example of visual framing is the incident involving India’s state-run Press Information Bureau (PIB) during the 2015 Chennai floods. The PIB released a photograph of Prime Minister Narendra Modi surveying the flood-affected areas from an aircraft window. However, the image was later revealed to be doctored, with a separate flood scene digitally inserted into the window to enhance the visual impact:

Similarly, data visualizations, which we often consider objective graphical representations of facts, operate under the same rhetorical principles. Like statistics, they can be strategically crafted to shape audience perception and elicit intended reactions. Whether through framing, selective emphasis, or visual distortions, data visualizations share the same capacity for manipulation as photographic narratives. [5] With this context in mind, let’s explore how these principles manifest across various types of visualizations, from timelines to bar charts and beyond, and uncover the rhetorical craft that underpins their design.


Timeline

A timeline is a visual representation of events arranged in chronological order. Unlike bar or line graphs, which typically focus on numeric data, timelines visualize the sequence of events. They help viewers understand the temporal relations between events and how they unfold over time.

Timelines can be oriented either horizontally or vertically. Events are plotted along a time axis and spaced according to when they occurred, and major milestones or periods can be highlighted with markers or annotations.

For instance, a timeline could illustrate major classical philosophical figures from ancient China during the “Hundred Schools of Thought” period, as seen in this example. By including select figures from ancient Greece and Rome on the opposite side of the time axis, the timeline provides a dual perspective, helping to contextualize these key figures within a broader historical framework.

Timelines are particularly useful for highlighting historical events or developments, such as the progression of a major war or the evolution of technological advancements.

A Gantt chart is a specialized timeline used to show the sequence and duration of tasks in a project. One of the main advantages of Gantt charts is that they help organize and visualize complex sequences of events. Here is a more complex Gantt chart that breaks down a survey study into detailed subtasks for each major phase. This provides a clearer picture of the workflow, helping to manage and track each specific step in the process:

Misuse of Timeline – Incorrect Scaling:

A timeline with incorrect scaling occurs when events are spaced unevenly or inaccurately relative to their chronological distances. See the example below:

In the timeline above, events that are 5,435 years apart (between the invention of ‘Writing Systems’ and ‘Electromechanical & Digital’ information technology) appear visually similar to the much greater span between ‘Writing Systems’ and the advent of ‘Oral, Representational, and Semaphoric’ systems over 100,000 years ago.

This can mislead viewers into thinking that events are either closer together or farther apart than they actually are. The inaccurate spacing may result in misinterpretations of historical progression or cause-and-effect relationships.

How to Fix It: Ensure equal time intervals (e.g., years or decades) are represented by equal physical spacing on the timeline:

In this corrected timeline with consistent time intervals and proportional scale, events that are 100,000 years apart should be visually twenty times as far apart as events that are 5,000 years apart.  If uneven spacing is unavoidable for readability, explicitly note the time differences between events.
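The proportional mapping described above can be sketched in a few lines of Python. The years below are illustrative approximations, not exact historical dates, and the axis width is an arbitrary assumption:

```python
# Proportional timeline spacing: equal spans of time map to equal spans of
# pixels. Years are rough illustrative values, not exact historical dates.
events = [
    ("Oral, Representational & Semaphoric", -100000),
    ("Writing Systems", -3500),
    ("Electromechanical & Digital", 1935),
]

axis_width_px = 1000  # total drawable width of the timeline (assumed)

start = events[0][1]
end = events[-1][1]

def x_position(year):
    """Map a year to an x-pixel offset, proportional to elapsed time."""
    return (year - start) / (end - start) * axis_width_px

positions = {name: round(x_position(year), 1) for name, year in events}
print(positions)

# On a proportional scale, the ~96,500-year gap before writing is drawn
# roughly 18x wider than the ~5,435-year gap after it.
gap1 = x_position(-3500) - x_position(-100000)
gap2 = x_position(1935) - x_position(-3500)
print(round(gap1 / gap2, 1))
```

The same mapping can also be used to annotate true time gaps when, for readability, a designer chooses non-proportional spacing anyway.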


Bar Chart

A bar chart or bar graph represents data with rectangular bars, where the length or height of each bar corresponds to the data value it represents. Bars can be plotted vertically or horizontally.

Each bar represents a specific category or group, with its length or height indicating the magnitude of the corresponding value. The bars are separated by spaces to emphasize that the data is discrete, rather than continuous.

Bar charts are commonly used to compare quantities across different categories, such as student enrollment figures for various majors. For example, if we want to compare the number of students enrolled in different majors at a university, a bar chart can present the enrollment figures for each major side by side, clearly showing which major is the most popular:

Bar graphs are particularly effective for highlighting differences, making it easy to identify the highest or lowest values at a glance. Bar charts are simple to construct and interpret, providing a quick visual comparison. They also have the advantage of being able to display both positive and negative values.

A grouped or clustered bar graph such as the one shown below compares two or more groups (sub-categories) within each category. They are commonly used for comparing data across different categories and sub-categories, such as generational differences in communication preferences:

A grouped bar graph is particularly effective for illustrating relationships between two categorical variables, offering a clear visual representation of complex data sets. However, it can become visually cluttered if too many groups or sub-categories are included, which may turn the bar graph into a “cluster-mess.”

A stacked bar graph is similar to a grouped bar graph but stacks sub-category values within a single bar. This format is particularly useful for showing the proportion of sub-categories within each category while also allowing for comparisons of total values across categories, as seen in this example:

One advantage of a stacked bar graph is that it combines total and part-to-whole analysis, providing a comprehensive view of both the overall category size and its internal composition. Additionally, it saves space compared to a grouped bar graph, making it a more compact visualization option.

However, stacked bar graphs can make it difficult to compare individual sub-category values across different bars. They may also become visually discombobulating when too many sub-categories are included, potentially hindering clear interpretation: behold, the rainbow bar-code!

Misuse of Truncated Bar Chart

Let’s take a look at this bar chart where the y-axis starts at a value higher than 0, exaggerating differences between categories:

The chart exaggerates the differences between the bars by truncating the y-axis. The actual differences are small, but they appear much larger because the baseline isn’t at zero.

How to Fix It: Let’s correct the bar chart by starting the y-axis at zero, which provides an accurate visual representation of the differences:

The y-axis now starts at zero. It might be less “visually dramatic,” but it provides an accurate visual representation of the differences between categories.
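The distortion is easy to quantify. The sketch below, using hypothetical category values, shows how a baseline of 49 turns a modest real difference into a sixfold visual one:

```python
# How a truncated baseline exaggerates differences. Values are hypothetical.
values = {"A": 52, "B": 50, "C": 55, "D": 51}

def drawn_height(value, baseline):
    """Height of the bar as actually drawn above the axis baseline."""
    return value - baseline

# True ratio of largest to smallest value (baseline at 0):
true_ratio = max(values.values()) / min(values.values())  # 55 / 50 = 1.1

# Apparent ratio when the y-axis starts at 49 instead of 0:
truncated = {k: drawn_height(v, 49) for k, v in values.items()}
apparent_ratio = max(truncated.values()) / min(truncated.values())  # 6 / 1

print(true_ratio, apparent_ratio)
```

A 10% real difference is drawn as a 600% visual one, which is exactly the effect the corrected chart avoids.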


Pie Chart

Pie charts are commonly used to visualize proportions or percentages of various subcategories within a whole. For example, the simple pie chart below illustrates the distribution of responses to a survey on communication preferences:

An exploded pie chart is similar to a simple pie chart, but one or more slices are separated from the rest to draw attention. This format is particularly useful for highlighting specific categories or outliers, such as emphasizing the most-used communication method in a survey:

A doughnut chart is another common variety of pie chart, distinguished by its hollow center. It serves a similar purpose to a pie chart but provides additional space in the center, which can be used for labels or other relevant information:

Misuse of Pie Chart – Incorrectly Labeled Percentages

Here is a misleading pie chart where the slice proportions do not accurately match the labeled percentages:

In this example, only the 10% slice looks roughly proportional; all remaining slices are either too large or too small for their stated percentages. This can mislead viewers into drawing faulty conclusions about the data distribution.
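A quick arithmetic check can catch this kind of mislabeling before a chart is published. The sketch below, with hypothetical survey percentages, verifies that the labels sum to 100% and converts each label into the slice angle an honest pie chart must draw:

```python
# Sanity-checking a pie chart: labeled percentages must sum to 100, and
# each slice's angle must match its label. Percentages are hypothetical.
labels = {"Texting": 40, "Email": 25, "Social Media": 20,
          "Phone Call": 10, "Other": 5}

assert abs(sum(labels.values()) - 100) < 1e-9, "labels must sum to 100%"

def slice_angle(pct):
    """Angle in degrees a slice must span to honestly show pct percent."""
    return pct / 100 * 360

angles = {k: slice_angle(v) for k, v in labels.items()}
print(angles)  # e.g. the "Texting" slice spans roughly 144 degrees
```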


Line Chart

A line graph, or line chart, uses points connected by lines to represent data that changes over time or along a continuous variable.

Typically, the horizontal x-axis represents time or a sequential category, while the vertical y-axis represents the variable being measured, such as temperature, sales, or stock prices. Data points are plotted at the intersection of their corresponding x and y values and are then connected by lines to illustrate the changes.

Line graphs are commonly used to visualize trends over time, such as stock prices, daily temperatures, or monthly sales. They help identify patterns, including increases, decreases, or cyclical behavior. One of their key advantages is their ability to show how a variable changes over time, making it easier to detect trends, fluctuations, or periods of stability. Additionally, multiple lines can be plotted on the same graph to compare trends across different variables.

For example, this chart shows the income share of the richest 1% of the population in various countries from 1980 to 2014, measured before taxes and benefits. This line graph provides a clear visual representation of how income inequality has evolved across different nations over time. Each line represents a country, illustrating trends in the proportion of income received by the top 1%:

Misuse of Line Chart – Exaggerated Slope

Let’s plot a graph with a y-axis that starts close to the minimum value, exaggerating the slope of the line:

Notice that in this graph, the y-axis starts at 440, close to the minimum value of the data. This artificially steepens the slope of the line, making the increase in crime rates appear more dramatic than it actually is. The manipulation may lead viewers to believe that crime rates have risen sharply, which is not true.

Now, let’s plot the same data with a properly scaled y-axis:

In this version, the y-axis now starts at 0, providing a more accurate representation of the actual change in crime rates over time. The gradual increase in crime rates is evident, but it does not appear as steep or alarming as in the misleading graph.
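The degree of exaggeration can be computed directly. This sketch, using hypothetical crime-rate figures in the spirit of the example above, compares the vertical pixels-per-unit for an axis starting at 0 versus one starting at 440:

```python
# How a tight y-axis range inflates a line's visual slope.
# Hypothetical annual figures, echoing the lesson's crime-rate example.
years = [2015, 2016, 2017, 2018, 2019, 2020]
rates = [445, 450, 452, 458, 461, 465]

plot_height_px = 300  # drawable height of the chart area (assumed)

def pixels_per_unit(y_min, y_max):
    """Vertical pixels used to draw one unit of the data."""
    return plot_height_px / (y_max - y_min)

honest = pixels_per_unit(0, max(rates))        # axis starts at 0
truncated = pixels_per_unit(440, max(rates))   # axis starts near the minimum

# The same 20-point rise is drawn this many times taller on the truncated axis:
exaggeration = truncated / honest
print(round(exaggeration, 1))
```

With these assumed numbers, the identical data is drawn almost 19 times steeper on the truncated axis, which is the visual effect the misleading graph exploits.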


Network Graph

Network graphs are visual representations of relationships between entities (nodes) and their interactions or relations (edges). In communication research, network graphs are used to analyze various phenomena, such as social networks, communication flow, and influence patterns.

Network graphs consist of several fundamental elements. Nodes represent entities, such as individuals or organizations. Edges represent the connections or interactions between these nodes, such as communication frequency or social ties. The size or color of nodes is often used to indicate additional variables, such as the importance or influence of an entity, for example, the number of followers in a social network. Similarly, the weight or thickness of edges represents the strength or frequency of interactions, providing a visual cue about the intensity or significance of the connections.

Network graphs are widely applied in several areas. One key application is Social Network Analysis (SNA), which involves studying the structure of social relationships, such as the connections between individuals within a community. Another common use is in Communication Flow, where network graphs help visualize how information moves within an organization or across various platforms. Additionally, they are employed in Influence and Interaction Analysis, which focuses on identifying key influencers or hubs within communication networks, such as prominent social media influencers.

Here is a network graph representing hypothetical user interactions across three major anonymous discussion boards: 2channel, 4chan, and LIHKG:

*Disclaimer: This network graph is provided for illustration purposes only and does not represent actual results from a real study. It serves as a realistic hypothetical example for education. 

This network graph provides a detailed visualization of interactions among 30 users across three major discussion boards: 2channel, 4chan, and LIHKG. The nodes in the graph represent both users and discussion boards, with the board nodes in gold.

The color of each user node indicates their primary board of interaction:

  • Light blue nodes correspond to users primarily engaging with the board 2channel
  • Light green nodes represent users interacting mainly with the board 4chan
  • Light coral nodes signify users who are most active on the LIHKG board.

The edges connecting the nodes represent interactions between users and boards, with the thickness (weight) of each edge indicating the frequency of those interactions.
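For readers curious about the underlying data model, here is a minimal sketch of how such a graph can be represented as a weighted edge list. All user names and interaction counts are hypothetical, mirroring the illustrative example above:

```python
# Minimal data model for the network graph: users and boards as nodes,
# weighted edges as interaction counts. All figures are hypothetical.
edges = [
    ("user_01", "2channel", 14),
    ("user_01", "4chan", 2),
    ("user_02", "4chan", 9),
    ("user_03", "LIHKG", 11),
    ("user_03", "2channel", 3),
]

def primary_board(user):
    """The board with the highest interaction weight for a given user,
    which would drive the node's color in the visualization."""
    user_edges = [(board, w) for u, board, w in edges if u == user]
    return max(user_edges, key=lambda e: e[1])[0]

# Edge weight doubles as visual thickness; the strongest edge sets color.
print(primary_board("user_01"))  # -> 2channel
print(primary_board("user_03"))  # -> LIHKG
```

Graph libraries such as NetworkX implement this model directly, but the underlying structure is simply nodes plus weighted edges.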

This visualization highlights the distinct user bases associated with each board and provides valuable insights into the patterns of user engagement and cross-platform activity.


Conclusion

Throughout this lesson, we have uncovered the complex interplay between data visualization, visual rhetoric, and framing. By examining early examples of manipulated war photography, such as Fenton’s The Valley of the Shadow of Death and O’Sullivan’s Home of a Rebel Sharpshooter, alongside more recent cases such as the doctored photo of PM Modi surveying the Chennai floods, we saw how visual media, from its inception, has been shaped not just to inform but to persuade and evoke emotion. These examples underscore an important truth: visual representations, far from being neutral mirrors of reality, are imbued with rhetorical intent.

We then explored how these same principles apply to common forms of data visualizations. Whether through timelines, bar charts, or network graphs, the visual presentation of data can clarify complex information but is equally susceptible to manipulation. Techniques such as truncating axes, distorting proportions, or selectively emphasizing data points can subtly, yet powerfully, shape audience perceptions.

Finally, we considered best practices for creating clear, honest, and effective data visualizations. The lesson emphasizes that while visuals can simplify and enhance communication, their design must prioritize accuracy and transparency to maintain credibility. By critically analyzing visual data and understanding its rhetorical dimensions, we become not only better interpreters of information but also more responsible creators.

Posted by Keren Wang, 11 November 2024, all rights reserved.

 

Argumentation Lesson: Language in Argument

 

 

SCOM 2710: Language in Argument

Posted by Keren Wang, 2024

Assigned Reading for This Week

Herrick, Chapter 12 – Definition in Argument

Key Terms:

  • Argumentative and circular definitions
  • Definition Report
  • Distinction without a difference
  • Etymology
  • Euphemism
  • Labeling
  • Original intent
  • Paradigm case
  • Reclassification

Herrick, Chapter 13 – Locating and Evaluating Sources of Evidence

Key Terms:

  • Ambiguity
  • Equivocation
  • Mixed metaphor
  • Redundancy
  • Semantic and syntactic ambiguity

Definition in Argument

Imagine a city council debate over a proposed law to decriminalize the possession of small amounts of all drugs, including heroin and cocaine. Advocates called this shift a “public health approach,” emphasizing treatment over punishment for addiction. They argued that this wasn’t about condoning drug use but addressing addiction as a health crisis. Meanwhile, opponents labeled it a “soft-on-crime policy,” warning that it would lead to increased drug-related deaths, homelessness, and crime. By framing the law in these dramatically different ways, each side influenced public perception and the meaning of “decriminalization” itself.

This scenario highlights how definitions play a pivotal role in shaping arguments and controlling public opinion. In this chapter, we’ll explore how definitions are strategically used in arguments—not simply to clarify, but to guide, persuade, and even manipulate. As we’ll see, defining a term isn’t just about providing a meaning; it’s about shaping reality.

Homeless encampment in Portland, Oregon

Homeless encampment along city street in Portland, Oregon. In 2020, Oregon made headlines as the first U.S. state to decriminalize the possession of small amounts of hard drugs. After intense public debates, Oregon lawmakers voted to roll back drug decriminalization in 2024. Photo CC BY-SA 4.0 via Wikimedia Commons.

1. Importance of Definition in Argument

Definition plays a crucial role in argumentation as it sets boundaries and shapes the debate. Whoever controls the definitions often controls the argument.

2. Types of Definitions

  • Definition Report: Provides a generally accepted meaning for clarity. Example: “Deep web” as sites not indexed by search engines.
  • Argumentative Definition: A strategic definition used to support a specific argument. Example: Labeling a border wall as a “security wall” versus a “land grab” impacts public perception.

3. Strategies of Definition

  • Euphemism: Softening terms to reduce opposition. Example: Calling layoffs “downsizing” in corporate language.
  • Reclassification: Shifting a term to a new category. Example: Buckyballs marketed as “adult desktop gift items” rather than “toys” to avoid liability issues.
  • Labeling: Using suggestive names to influence perception. Example: “Fake news” to discredit media outlets without addressing specific arguments.

4. Evaluating Definitions

  • Circular Definition: Defining something by repeating the same idea. Example: Defining “free trade” as “unrestricted imports and exports,” or “crime” as “an unlawful act.”
  • Distinction Without a Difference: Claiming a new category exists without meaningful differences. Example: Defining a “registry of citizen support” instead of calling it a “petition.”

5. Sources of Definitions

  • Common Usage: Everyday meanings, often used in political and legal contexts. Example: “Parent” as biological or adoptive guardian in family law.
  • Etymology: Word origins to clarify meanings. Example: “Vocation” (from Latin vocare, “to call”) to suggest meaningful work beyond a job.
  • Paradigm Case: Using typical examples to define a term. Example: Defining “good president” by referencing qualities of Harry Truman.
  • Original Intent: Meaning based on original usage. Example: Debates on the Second Amendment and the term “militia” in the U.S. Constitution.
  • Authority: Expert definitions from recognized sources. Example: DSM-5’s definitions in mental health discussions.

Ambiguity, Equivocation, and other Misleading Uses of Language in Argumentation

Consider this: during a heated presidential campaign speech, Candidate A declares their main opponent, Candidate B “an existential threat to democracy,” warning that their actions could dismantle the country’s foundational systems. Candidate B’s supporters respond by insisting that their candidate is “a true patriot fighting for the people.” This choice of language shapes two vastly different narratives: Is this person a risk to democracy itself, or a protector of core values? Both terms—“existential threat” and “true patriot”—are vague, ambiguous, and open to interpretation, inviting audiences to project their fears, hopes, and political leanings onto these phrases.

This chapter delves into how ambiguity, equivocation, and other language tactics blur lines in arguments and public discourse. As we’ll uncover, words like these aren’t simply descriptors; they’re powerful tools that stir emotions, shift perceptions, and obscure meaning. Understanding these language strategies reveals how rhetoric can shape beliefs, mask intentions, and manipulate public understanding.

1. Ambiguity

Semantic Ambiguity: A word with multiple meanings in the same context.

  • Example: “Washington lawmakers debate concealed carry ban” (It’s unclear whether “Washington” refers to the State of Washington, the city of Washington D.C., or as a metonym for the U.S. Congress).
  • Example: “Real Cheese Flavor” (It’s unclear whether “real” refers to actual cheese content or to artificial ingredients that mimic the taste of “real” cheese).

Syntactic Ambiguity: Sentence structure creates multiple interpretations.

  • Example: “Family Sued Over Dog Bite Wins $100,000 Settlement” (Was the family sued because their dog bit someone, or were they the victims of a dog bite and received a settlement?)
  • Example: “Special Financing Available Without Credit Check” (This could suggest that all special financing options require no credit check, but it may mean only a specific option with high-interest terms is available without a credit check)

2. Equivocation

Changing the meaning of a term mid-argument, creating inconsistency.

  • Example 1: “Senator Zoidberg is a true patriot who fights for freedom. Unlike other lawmakers, he always fought vigorously for the freedom to bear arms.” “Freedom” initially suggests broad individual liberties, but in context, it may shift to mean only certain selective freedoms that align with the politician’s agenda.
  • Example 2: “This law provides protection for families. It protects their right to refuse government-mandated vaccinations.” “Protection” is first used to imply safety and security but shifts to mean protecting certain cultural or ideological beliefs.

3. Other Language Issues

Redundancy: Unnecessary repetition of ideas.

  • Example: “The economy is doing great! Inflation has slowed, consumer prices are stabilizing, cost-of-living increases have tapered off, price growth has moderated.”

Choosing the Wrong Word (for the purpose to mislead): Misuse due to similar-sounding words.

  • Example 1: Framing mass layoffs as “The company is optimizing its workforce” (to make the act of firing employees sound like an organizational improvement rather than a reduction).
  • Example 2: Framing illegal wiretapping as “special enhanced monitoring procedures for national security.”
  • Example 3: Framing illegal pyramid scheme as “Member-supported, peer-to-peer financial empowerment program, featuring network-enhanced income plan and exclusive opportunity with tiered rewards.”
  • Example 4: Framing unpaid labor as “Exposure-based volunteer opportunities for portfolio enhancement and gaining valuable industry insight.”
© 2024 SCOM 2710 – Language in Argument. All rights reserved.

 

Lesson on Statistical Evidence in Argumentation – Evaluating Survey Accuracy

 

 

Sampling & Evaluating Survey Accuracy

Posted by Keren Wang, 2024

*Before starting this lesson, make sure you have reviewed: Statistical Evidence: Survey and Opinion Polling*

Sampling Error and Polling Accuracy Case Study: 2016 U.S. Presidential Election

During the 2016 U.S. Presidential Election, many national polls predicted a victory for Democratic candidate Hillary Clinton. While Clinton won the popular vote by around 2.1%, Republican candidate Donald Trump won the Electoral College, securing the presidency. [1]

Huffington Post 2016 Election Prediction

Image: Huffington Post’s 2016 US Presidential Election prediction, updated on the eve of election day.

The challenges pollsters faced with predicting the 2016 election shed light on a common problem with using statistics in arguments: numbers can give us a false sense of certainty. Stats are powerful—they carry authority and can seem “objective”—but they’re rarely as clear-cut as they appear. In reality, they often come with hidden biases and assumptions that may not capture the full picture. [2]

In 2016, pollsters and analysts leaned heavily on polling data to forecast the election outcome, treating the numbers almost like a science. But they didn’t account for factors like sampling errors, social desirability bias, and last-minute changes in voter sentiment, all of which skewed the predictions. The result? A widespread belief that one outcome was nearly certain—until it wasn’t.

This reliance on numbers to tell a definitive story shows how easy it is to be misled by the “authority” of stats. It’s a reminder that while statistical evidence can be persuasive, it’s not infallible. To use data responsibly in arguments, we need to present it with a little humility, recognizing its limitations and the need to pair it with other forms of analysis. Instead of seeing numbers as the whole truth, we should treat them as one piece of the puzzle, open to interpretation and discussion. [3]

Evaluating the Use of Polling Evidence in 2016 US Presidential Election

1. Non-Response Bias

Impact: Non-response bias occurs when certain demographics are less likely to respond to polls, which can skew results. In 2016, many polls underrepresented rural and working-class voters, groups that tended to favor Donald Trump. These groups were harder to reach and less likely to respond to traditional polling methods. [4]

Problematic Use in Argumentation: Analysts and commentators who used these poll results often overlooked or underestimated the impact of this underrepresentation. News networks frequently relied on results from similar polling agencies, creating a feedback loop that reinforced a constructed “reality” of Clinton’s expected victory. This effect was further amplified as media outlets fed off each other’s election news stories and headlines, creating a narrative that appeared authoritative but was actually based on incomplete data. This collective overconfidence in Clinton’s chances contributed to a misleading perception that didn’t reflect the complexities and variances among voter demographics.

2. Late Shifts in Voter Preferences

Impact: Many voters made up their minds close to Election Day, influenced by last-minute campaign events, media coverage, or debates. Polling, however, generally stops a few days before the election, often missing these late shifts. In 2016, a significant portion of undecided voters shifted toward Trump in the final days, which wasn’t captured in most polling. [5] The reasons for this shift are complex, but one contributing factor may have been social desirability bias—some Trump supporters may not have honestly disclosed their preferences to pollsters, fearing negative judgment from their friends and family members. As a result, these voters remained hidden within the undecided category, skewing polling data away from an accurate portrayal of support for Trump. [6]

Problematic Use in Argumentation: This late shift was largely invisible in the polling data, leading analysts to underestimate Trump’s chances. When using this data for argumentation, commentators tended to either overlook the intrinsic time constraint of surveying, or erroneously assume that the voters who were undecided would either not vote or distribute evenly across candidates. [7] This assumption failed to account for the unpredictability of undecided voters, ultimately leading to faulty conclusions.

3. Sampling Error

Impact: Sampling error, the statistical error that occurs when a poll’s sample does not perfectly represent the population, was especially impactful in closely divided states. In 2016, even minor errors in states like Michigan, Wisconsin, and Pennsylvania, where the polls showed narrow leads for Clinton, contributed to a misleading picture. The underestimated support for Trump in these states shifted the Electoral College outcome in his favor. [8]

Studies have found that 2016 election polls across the board suffered from a range of sampling problems, often collectively referred to as “total survey error.” Reported margins of error typically capture only sampling variability and ignore systemic sources of error, such as uncertainty in defining the target population, particularly regarding who is likely to vote. An empirical analysis of the 2016 US presidential election cycle found that the average survey error was around 3.5 percentage points, approximately double the margin implied by most reported margins of error. Polls also predominantly erred on the side of overestimating Clinton’s performance, partly because major polling organizations made similar unwarranted assumptions about voter demographics. This shared “meta-bias” further compounds inaccuracies, especially in close races like the one seen in 2016. [9]
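For context, the margin of error that polls conventionally report comes from the simple-random-sample formula sketched below. As the passage notes, it captures sampling variability only; the poll figures used here are hypothetical:

```python
import math

# Margin of error for a simple random sample at 95% confidence.
# This reflects sampling variability only; it says nothing about
# non-response bias, late shifts, or errors in defining likely voters.
def margin_of_error(p, n, z=1.96):
    """Half-width of the confidence interval for proportion p with n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical state poll: 48% support among 800 respondents.
moe = margin_of_error(0.48, 800)
print(round(moe * 100, 1))  # margin of error in percentage points
```

Note that quadrupling the sample size only halves this figure, and the systemic errors discussed above do not shrink with sample size at all.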

Problematic Use in Argumentation: Polling margins of error are often presented as minor uncertainties, with little impact on the overall narrative. In 2016, this assumption was problematic because the race in key states was so close that even a small sampling error could, and did, shift the predicted outcome. The statistics carried an aura of scientific objectivity, which masked underlying biases and imperfect assumptions that remained tacit and hidden behind cold numbers. News media perpetuated this misconception by over-relying on the seemingly definitive value of these numerical data, interpreting polling results as if they offered predictable and accurate insights. [10] This contributed to overconfidence in Clinton’s prospects and led commentators to misjudge the actual electoral dynamics in crucial swing states.

 

The Problem of Sampling

Sampling is the process of selecting a subset of individuals from a larger population to make inferences about that population. Effective sampling ensures that survey findings are representative and reliable.

The goal of sampling is to accurately reflect the larger population’s characteristics by selecting a group that is both representative and adequately sized. This section of the reading covers three primary sampling methods used to create a sample that reflects the diversity and characteristics of the population being surveyed: simple random sampling, stratified random sampling, and clustered sampling.

Simple Random Samples

A simple random sample gives each member of the population an equal chance of being selected. This method involves a straightforward, single-step selection process.

Process

Researchers assign a number to each individual in the population and use a random number generator or a table of random numbers to select individuals.

Example

In a survey of university students’ media consumption habits, a researcher may use a list of all enrolled students, assign each a number, and then use a random number generator to pick students for the survey.

Benefits

This method helps prevent bias since each individual has an equal opportunity to be included. It’s often the most representative sampling method if done correctly.

Limitations

Simple random sampling can be challenging and time-consuming when dealing with large populations, as researchers need an accurate, complete list of all members.

Stratified Random Samples

Stratified random sampling involves dividing the population into subgroups (strata) based on relevant characteristics and then randomly selecting individuals within each stratum.

Process

Researchers identify categories (e.g., age, gender, ethnicity), divide the population accordingly, and randomly sample individuals from each category. This ensures each subgroup is adequately represented.

Example

If a study examines the impact of social media on mental health among high school students, researchers might stratify the sample by grade level (e.g., freshman, sophomore) and then randomly select students within each grade level to participate.

Benefits

This method increases precision by ensuring the sample reflects key population characteristics, making it valuable for ensuring representation across specific groups.

Limitations

Stratified sampling requires additional time and resources to divide the population and select individuals within each subgroup. It assumes that researchers know which subgroups are relevant to the study.

Clustered Samples

Clustered sampling selects groups (clusters) rather than individual members, which is useful for large, widely distributed populations or when a complete list of members is impractical.

Process

Researchers divide the population into naturally occurring clusters, such as geographical locations, and then randomly select clusters. Within each cluster, they may survey all members or randomly choose individuals.

Example

In a survey on internet access across a large city, researchers might select certain neighborhoods as clusters and then survey individuals within those neighborhoods.

Benefits

Cluster sampling saves time and reduces travel costs, especially for geographically dispersed populations. It’s often more practical and economical for large-scale studies.

Limitations

Clustered sampling can lead to sampling bias if clusters are not representative of the overall population and is generally less precise than other methods due to potential similarities within clusters.

Obtaining Samples

Random sampling ensures that each member of the population has a known, non-zero chance of being selected, minimizing bias and improving the representativeness of the sample. Here’s how the process works in the three sampling methods discussed earlier:

1. Simple Random Sampling

Researchers create a list of every individual in the population and assign a sequential number to each. Using a table of random numbers or software, they randomly select numbers corresponding to individuals.

  • Population Members: Imagine 100 individuals in a line, represented as gray dots labeled from 1 to 100 as shown below.
  • Random Sample Selection: 15 individuals (most commonly selected via computer-generated random numbers) are highlighted in blue among the gray dots, showing how a simple random sample is chosen without any grouping or structure.

This method works best with smaller, manageable populations where researchers have full access to an accurate population list. [11]

Simple Random Sampling
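The selection process described above can be sketched in a few lines of Python. This is a minimal illustration, not a production survey tool; the population of 100 numbered individuals and the sample of 15 mirror the hypothetical setup in the figure.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

# Population of 100 numbered individuals, as in the figure above
population = list(range(1, 101))

# Draw a simple random sample of 15 without replacement;
# every individual has an equal chance of being selected
sample = random.sample(population, k=15)

print(sorted(sample))
print(len(sample))  # 15
```

Because `random.sample` draws without replacement, no individual can appear twice, which matches the definition of a simple random sample.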

2. Stratified Random Sampling

This method typically involves dividing the population into distinct groups or strata based on relevant characteristics (such as age, income, education level). Within each stratum, a simple random sample is then conducted.

By sampling within each group, researchers can control for potential influences that specific characteristics might have on the survey’s findings, thereby increasing the sample’s representativeness. [12]

Stratified Random Sampling
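The two-step logic of stratified sampling, dividing into strata and then running a simple random sample within each, can be sketched as follows. The strata (grade levels) and group sizes here are hypothetical, and the 10% sampling fraction is an arbitrary choice for illustration.

```python
import random

random.seed(0)  # reproducible illustration

# Hypothetical strata: grade level -> list of student IDs
strata = {
    "freshman":  [f"F{i}" for i in range(1, 41)],   # 40 students
    "sophomore": [f"S{i}" for i in range(1, 61)],   # 60 students
}

# Draw a simple random sample of 10% within each stratum,
# so every subgroup is represented proportionally
sample = []
for members in strata.values():
    k = max(1, round(0.10 * len(members)))
    sample.extend(random.sample(members, k))

print(sample)  # 4 freshmen + 6 sophomores
```

Sampling proportionally within each stratum guarantees that smaller subgroups cannot be missed entirely, which is the key advantage over a single simple random sample.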

3. Clustered Sampling

When it is impractical to list every individual in a large population, researchers divide the population into clusters, often based on geographic or organizational divisions.

They randomly select entire clusters and survey individuals within those selected clusters. This can involve surveying everyone in each cluster or using random sampling within clusters for larger groups. [13]

Clustered Sampling
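The two-stage process, randomly selecting whole clusters and then surveying within them, can be sketched as below. The neighborhoods and households are hypothetical, and this sketch surveys every household in each selected cluster (the simplest variant described above).

```python
import random

random.seed(1)  # reproducible illustration

# Hypothetical clusters: neighborhood -> households
clusters = {
    "Northside": ["N1", "N2", "N3", "N4"],
    "Riverview": ["R1", "R2", "R3"],
    "Hilltop":   ["H1", "H2", "H3", "H4", "H5"],
    "Old Town":  ["O1", "O2", "O3"],
}

# Stage 1: randomly select 2 whole clusters
chosen = random.sample(list(clusters), k=2)

# Stage 2: survey every household within each selected cluster
surveyed = [hh for name in chosen for hh in clusters[name]]

print(chosen)
print(surveyed)
```

Note that only the clusters are randomized here; everyone inside a chosen cluster is surveyed. For larger clusters, stage 2 could instead apply `random.sample` within each chosen cluster.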

Evaluating Survey Accuracy

This section explores three critical factors for assessing survey accuracy: Sample Size, Margin of Error, and Confidence Level. Understanding these elements helps researchers determine the reliability of their survey results and interpret findings with appropriate caution.

1. Sample Size

Definition: The number of individuals or units chosen to represent the population in the survey.

Importance: Larger samples generally provide more accurate data. The relationship between sample size and accuracy follows the law of diminishing returns, meaning that after a certain point, increases in sample size result in only minor improvements in accuracy.

Key Concept: Sampling Error decreases as sample size increases. However, the increase in precision grows smaller as the sample size becomes very large.

Example: Imagine researchers want to understand coffee preferences across a city with 100,000 residents. They conduct a survey to find out what percentage of residents prefer iced coffee over hot coffee:

  • The researchers initially surveyed 100 residents and found that 60% prefer iced coffee. However, with only 100 people surveyed out of 100,000, this small sample has a higher margin of error, potentially making the results less representative of the entire population.
  • To get a more precise estimate, they increase the sample size to 1,000 people, which lowers the margin of error. As the sample size grows, the accuracy of the result improves, giving a clearer picture of the true percentage of residents who prefer iced coffee.

2. Margin of Error

Definition: The range within which the true population parameter is expected to fall, considering sampling variability.

Role in Surveys: The margin of error shows the possible deviation between the survey’s results and the actual population values.

Calculation: It is derived from the standard error of the sample estimate, which depends on the sample size and the variability in the population, reflecting how precisely the sample represents the population.

Example : In the same coffee preferences survey scenario:

  • With the 100-person survey, they might have a margin of error of ±10% (at 95% confidence level), meaning the true preference for iced coffee could be anywhere between 50% and 70%.
  • With the 1,000-person survey, the margin of error decreases to ±3% (at 95% confidence level), so they can now be more confident that the true preference is between 57% and 63%.
  • With a 2,000-person survey, the margin of error further goes down to ±2% (at 95% confidence level).
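The figures above can be reproduced with the standard normal-approximation formula for the margin of error of a sample proportion, MOE = z·sqrt(p(1−p)/n). This is a simplified sketch assuming simple random sampling; the 60% iced-coffee preference and the three sample sizes come from the running example.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion
    (normal approximation; z = 1.96 corresponds to a 95% confidence level)."""
    return z * math.sqrt(p * (1 - p) / n)

p = 0.60  # 60% prefer iced coffee, as in the example
for n in (100, 1000, 2000):
    print(f"n = {n:>4}: ±{margin_of_error(p, n):.1%}")
# n =  100: ±9.6%
# n = 1000: ±3.0%
# n = 2000: ±2.1%
```

These values round to the ±10%, ±3%, and ±2% quoted above, and the shrinking gains from each doubling of n illustrate the law of diminishing returns mentioned earlier.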

3. Confidence Level

Definition: The degree of certainty that the population parameter lies within the margin of error.

Common Confidence Levels: 95% confidence is standard, meaning if the survey were repeated multiple times, 95% of the results would fall within the margin of error.

Confidence Interval: This is the range constructed around the survey result to indicate where the true population parameter is likely to be, given the confidence level.

Example: In the coffee preferences survey scenario:

  • 95% Confidence Level (C.L.): The researchers can be 95% confident that the true percentage of iced coffee preference lies within their calculated margin of error (±3%).
  • 99% C.L.: If they want to be even more certain, they could use a 99% confidence level, increasing the margin of error to ±4% for the 1,000-person survey.
  • To maintain the same margin of error at a 99% confidence level, a larger sample size would be required, such as 2,000 people to achieve a ±3% margin of error.
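Working backwards, the sample size required for a desired margin of error can be estimated by inverting the same formula: n = z²·p(1−p)/E². This is a rough estimate under the normal approximation, using the example’s assumed 60% preference; it yields roughly 1,770 respondents for ±3% at 99% confidence, in line with the rough figure of about 2,000 quoted above.

```python
import math

def required_sample_size(p, moe, z):
    """Sample size needed to achieve a desired margin of error (moe)
    for a proportion p, assuming simple random sampling."""
    return math.ceil(z**2 * p * (1 - p) / moe**2)

p = 0.60  # assumed iced-coffee preference
print(required_sample_size(p, 0.03, z=1.96))   # 95% CL -> 1025
print(required_sample_size(p, 0.03, z=2.576))  # 99% CL -> 1770
```

The 95% figure of about 1,000 matches the 1,000-person survey with a ±3% margin in the earlier example; raising the confidence level to 99% while holding the margin fixed requires substantially more respondents.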

Sample Size and Margin of Error Chart

Chart: The chart above illustrates the relationship between sample size and margin of error across different confidence levels (95% and 99%).

As sample size increases, the margin of error decreases, making the survey more precise. Higher confidence levels (e.g., 99%) result in a larger margin of error, meaning we can be more confident in the results but within a wider range. The diminishing effect of increasing sample size shows that the margin of error decreases rapidly with smaller samples but flattens out at higher sample sizes.


 

Further Reading

1. Kennedy, Courtney, et al. “An Evaluation of the 2016 Election Polls in the United States.” Public Opinion Quarterly 82, no. 1 (Spring 2018).

2. Bon, Joshua J., Timothy Ballard, and Bernard Baffour. “Polling Bias and Undecided Voter Allocations: US Presidential Elections, 2004–2016.” Journal of the Royal Statistical Society Series A: Statistics in Society 182, no. 2 (February 2019): 467–493.

3. Wright, Fred A., and Alec A. Wright. “How surprising was Trump’s victory? Evaluations of the 2016 US presidential election and a new poll aggregation model.” Electoral Studies 54 (2018): 81-89.

4. Battersby, Mark. “The Rhetoric of Numbers: Statistical Inference as Argumentation.” (2003).

5. Hoeken, Hans. “Anecdotal, statistical, and causal evidence: Their perceived and actual persuasiveness.” Argumentation 15 (2001): 425-437.

6. Giri, Vetti, and M. U. Paily. “Effect of scientific argumentation on the development of critical thinking.” Science & Education 29, no. 3 (2020): 673-690.

7. Gibson, James L., and Joseph L. Sutherland. “Keeping your mouth shut: Spiraling self-censorship in the United States.” Political Science Quarterly 138, no. 3 (2023): 361-376.

8. Roeh, Itzhak, and Saul Feldman. “The rhetoric of numbers in front-page journalism: how numbers contribute to the melodramatic in the popular press.” Text-Interdisciplinary Journal for the Study of Discourse 4, no. 4 (1984): 347-368.

9. Ziliak, Stephen T., and Ron Wasserstein. “One Thing About… the Rhetoric of Statistics.” CHANCE 36, no. 4 (2023): 55-56.

Statistical Evidence – Survey and Opinion Polling

Posted by Keren Wang, 2024

This lesson covers the use of statistical evidence, specifically surveys and opinion polls, in argumentation.

Surveys, also known as opinion polls, are designed to represent the views of a population by posing a series of questions to a sample group and then extrapolating broader trends or conclusions in quantitative terms. [1] [2]  For example, in an election year, polling agencies might survey a diverse group of registered voters to gauge public support for different candidates, often reporting findings in percentages to indicate the overall popularity of each option. These results help forecast likely outcomes and provide insights into public sentiment on key issues. However, like all statistical evidence, opinion poll results are susceptible to inaccuracies and distortions, which can be misused or misinterpreted.

In this lesson, we will examine various types of surveys, their advantages and limitations, and their appropriate applications.

There are two basic types of surveys – descriptive and analytic:

Survey Types

Descriptive Survey
  • Purpose: Document and describe the characteristics, behaviors, or attitudes of a specific population at a given time.
  • Focus: Covers demographic data, behavior frequency, or opinions and preferences.
  • Use in Communication Research: Useful for establishing a snapshot of audience preferences or public opinion.
  • Example 1: Survey examining social media platform usage among college students.
  • Example 2: Survey by a PR firm to gauge public perceptions of a corporate brand.

Analytic Survey
  • Purpose: Goes beyond description to investigate why certain patterns or behaviors occur.
  • Focus: Tests hypotheses by examining relationships between variables.
  • Use in Communication Research: Used to analyze how communication factors affect audience attitudes and behaviors.
  • Example 1: Survey studying the relationship between crisis communication and trust in government.
  • Example 2: Survey exploring whether exposure to diverse news sources affects political polarization.

Advantages and Limitations of Surveys

Advantages

  • Anonymity: Anonymous surveys can increase comfort in sharing personal or sensitive information. Respondents may feel more at ease providing honest answers when their identities are not disclosed, leading to more accurate data. [3]
  • Broad Reach: Surveys / opinion polls allow researchers to reach large, diverse populations. For instance, online surveys can be distributed globally, allowing for data collection from a wide audience. [4]
  • Cost-Effective: Surveys are accessible across industries for large-scale data collection. Online surveys, in particular, are less expensive compared to other methods, such as face-to-face or telephone interviews. [5]
  • Quantifiable Data: Surveys allow for measurable insights into the target population. By using structured questions, researchers can collect data that is easy to analyze statistically, facilitating the identification of patterns and trends. When designed and applied correctly, surveys can increase consistency and reduce researcher bias. By administering the same set of questions to all respondents, surveys can bring uniformity in opinion collection and analysis.

Limitations

  1. Ineffective or Misleading Questions: Survey questions are sometimes written in vague, overly broad, or ambiguous ways that can lead to misinterpretation or inaccurate responses. Closed-ended questions, such as yes/no or rating scale questions, may lack depth. While these types of questions facilitate quantitative analysis, they often fail to capture the full complexity of respondents’ thoughts and feelings. [6]
    • Example: A closed-ended question like “Do you support increased sales taxes to improve education and healthcare? YES or NO” can be misleading. It combines two issues—education and healthcare—forcing respondents to address both at once, even if they feel differently about each. It also presents “increased taxes” as the only solution, ignoring other funding options. Imagine if 70% of respondents answered “NO”: a news headline claiming “Poll Shows 70% of Voters Don’t Want to Improve Education & Healthcare” would be misleading. Many may have chosen “NO” not because they oppose improvements, but because they doubt the effectiveness of using higher sales taxes to fund them.
  2. Low Response Rates: Surveys and opinion polls with low response rates can yield unrepresentative samples. A sample that does not accurately reflect the target population can skew the results. [7]
    • Example: In the 2016 U.S. presidential election, many opinion polls predicted a victory for Hillary Clinton. However, these polls often had low response rates, leading to unrepresentative samples that underestimated support for Donald Trump. This discrepancy highlighted how low participation can skew survey results. [8]
  3. Sampling Challenges: Surveys risk sampling bias if the sample lacks diversity. If certain groups are underrepresented, the findings may not be generalizable to the entire population.
    • Example: A 1936 Literary Digest poll for the U.S. presidential election predicted Alf Landon’s victory over Franklin D. Roosevelt by sampling its readers, car owners, and telephone users—groups not representative of the general population during the Great Depression. This sampling bias led to an incorrect prediction. [9]
  4. Response Bias: Surveys may not always provide truthful answers. Respondents might give socially desirable responses or may not recall information accurately, leading to biased data. [10] Surveys may also lead to other self-reporting inaccuracies due to memory lapses, misinterpreting questions, or self-censorship. [11]
    • Example: Surveys on sensitive topics, such as dietary habits or illicit drug use, often face response bias, with respondents under-reporting due to social desirability bias. This can result in data that grossly misrepresents actual behaviors.

Writing Effective Survey Questions

Clarity and Simplicity

Use straightforward language and avoid technical jargon or complex wording.

Example: Instead of “Do you support implementing an ETS protocol that requires corporate entities to offset GHG emissions via tradable emissions permits?” try “Do you support the creation of a program that would require companies to purchase permits for their excess emissions in order to reduce greenhouse gas emissions?”

Conciseness and Single-Focus

Keep questions short to reduce respondent fatigue and prevent confusion. Ask only one piece of information per question to avoid ambiguity.

Examples:

Split “How satisfied are you with the quality and reliability of our customer service?” into two questions: “How satisfied are you with the quality of our customer service?” and “How satisfied are you with the reliability of our customer service?”
Rather than “How satisfied are you with the company’s benefits and career development opportunities?” split it into “How satisfied are you with the company’s benefits?” and “How satisfied are you with the company’s career development opportunities?”

Neutral Language

Avoid leading questions that imply a preferred answer. Use neutral phrasing to encourage honesty.

Example: Instead of “Don’t you think our product is the best on the market?” try “How would you rate our product compared to others on the market?”

Relevance and Sensitivity

Ensure questions are relevant and appropriate for the audience. Avoid overly personal questions unless essential.

Example: Instead of “How often do you feel sad while using our service?” ask, “How does using our service affect your mood?”

Choosing Between Open-Ended and Close-Ended Questions

Researchers tend to use closed-ended questions when they need quantifiable data that is easily comparable, such as demographic details or satisfaction ratings. Open-ended questions are used when in-depth feedback is needed, to explore new topics, or to understand respondents’ reasoning. There are pros and cons to both question types.

Closed-Ended Questions

Closed-ended questions provide respondents with predefined options, such as multiple-choice, yes/no, or rating scales.

Advantages
  • Ease of Analysis: Quantifiable data is easy to code and analyze. Ensures uniform responses across participants.
  • Efficiency: Quick for respondents to answer, which improves response rates.
Limitations
  • Limited Insight: Restricts responses to preset options, missing depth.
  • Risk of Bias: Poor answer choices may bias responses. See “Limitations” section above.
Examples of Closed-Ended Questions
Multiple-Choice: “What social media platform do you use most frequently?”
Yes/No: “Do you feel our product meets your expectations?”
Likert Scale: “How satisfied are you with our service?” Options: 1. Very Dissatisfied, 2. Dissatisfied, 3. Neutral, 4. Satisfied, 5. Very Satisfied

Open-Ended Questions

These questions allow respondents to answer freely in their own words, offering more detailed insights.

Advantages
  • Nuance: Allows for detailed responses, offering deeper insights.
  • Flexibility: Enables responses that may reveal unexpected insights.
Limitations
  • Time-Consuming: Longer response times and more complex analysis.
  • Complex Analysis: Open-ended responses need qualitative coding.
Examples of Open-Ended Questions
Example 1: “What features would you like us to add to our product, and why?”
Example 2: “Describe your experience with our customer service.”
Example 3: “What motivates you to choose one news source over another?”


❖ Further Reading ❖

  • Fink, A. (2009). How to conduct surveys: A step-by-step guide (4th ed.). Thousand Oaks, CA: Sage.
  • Fowler Jr., F. J. (2013). Survey research methods. Thousand Oaks, CA: Sage Publications.
  • Mehrabi, N., et al. (2021). A survey on bias and fairness in machine learning. ACM Computing Surveys.
  • Nardi, P. M. (2018). Doing survey research: A guide to quantitative methods. Routledge.
  • Rea, L. M., & Parker, R. A. (2005). Designing and conducting survey research. San Francisco: Jossey-Bass.


Evaluating Evidence and Digital Literacy

SCOM 2710 Argumentation Lesson, Posted by Keren Wang, updated 2024

 

Overview

This week we focus on understanding and evaluating evidence, a crucial aspect of constructing persuasive arguments. We will examine how evidence interacts with values and present general tests for assessing the quality of evidence. We will also learn how to locate and evaluate various sources of evidence, choosing reliable information from books, periodicals, websites, and more. The chapter emphasizes the importance of digital literacy and the critical evaluation of different types of sources.

Ad Fontes Media Bias Chart 6.0
Media Bias Chart published by Ad Fontes Media, 2020. Fact-checking always lags behind the emergence of new biased sources of information.

Understanding Evidence

Evidence and Values

Evidence is always interpreted through personal and cultural values. Here are some examples of how values shape our interpretation of evidence:

  • Artificial Intelligence in Employment: Evidence showing that AI can improve productivity and efficiency is interpreted by some as a positive development for economic growth, whereas others see it as a threat to jobs, fearing mass unemployment and widening economic inequality.
  • Genetic Editing (CRISPR): Evidence about the successful application of CRISPR to edit genes in humans can be seen as a revolutionary medical advancement that will eliminate hereditary diseases, or as a dangerous intervention with unknown ethical and social consequences.
  • Universal Basic Income (UBI): Evidence from trials of Universal Basic Income might show improvements in mental health and poverty reduction, which is interpreted positively by proponents as proof of the policy’s benefits. However, others might see it as fostering a culture of dependency or as economically unviable, depending on their economic values.
  • Police Surveillance Technology: Evidence supporting the use of facial recognition and other surveillance technology to improve public safety can be seen as a way to effectively reduce crime. On the other hand, it is interpreted by others as a serious threat to privacy and civil liberties, especially in communities that may be disproportionately targeted.
  • Vaccination and Public Health: Evidence showing the efficacy of mandatory vaccination for school children may be interpreted as essential for public safety by some individuals, while others may view it as intrusive government overreach or distrust the pharmaceutical industry.
Robotic Sculpture at MIT Media Lab (photo by Keren Wang 2015)
Evidence that suggests a major breakthrough in general artificial intelligence may be seen either as a major technological advancement or as ethically problematic depending on the individual’s values.

General Tests of Evidence

Herrick introduces seven general tests of evidence that can help evaluate whether evidence used in an argument is reliable, credible, and sufficient to support a conclusion. These tests provide a comprehensive approach to assessing the quality of evidence. Here’s a detailed breakdown:

  1. Accessibility: Is the Evidence Available? Evidence that is accessible and open to scrutiny is generally considered more reliable.
    • Example: A public health official cites the number of COVID-19 cases reported by the Centers for Disease Control and Prevention (CDC). This evidence is accessible because the CDC publishes its data on a website that anyone can visit and verify.
    • Counterexample: Someone claims that the government has “secret documents” showing proof of extraterrestrial contact. Since these alleged documents are not accessible for review, the claim fails the test of accessibility.
  2. Credibility: Is the Source of the Evidence Reliable? This can depend on the reputation of the author or organization providing the evidence, as well as whether the source has the appropriate credentials or expertise.
    • Example: A research paper on the safety of vaccines authored by a team of immunologists and published in The New England Journal of Medicine is credible due to the expertise of the authors and the reputation of the journal.
    • Counterexample: A claim about vaccine dangers coming from an anonymous social media post lacks credibility because the author’s qualifications are unknown, and the post does not have any verifiable authority.
  3. Internal Consistency: Does the Evidence Contradict Itself? Evidence should not contradict itself. If evidence is self-contradictory, it weakens the argument and creates doubt regarding its reliability.
    • Example: A government report on unemployment must consistently present the same statistics throughout the report. If one section states an unemployment rate of 6% and another section states 8% without clarification, the evidence lacks internal consistency.
  4. External Consistency: Does the Evidence Contradict Other Evidence? Evidence that sharply contradicts most other reputable evidence is often seen as unreliable.
    • Example: A study on climate change that finds rising global temperatures should align with the majority of climate research from other scientific bodies such as NASA, the IPCC, and NOAA.
  5. Recency: Is the Evidence Up to Date? Evidence that has been superseded by more recent findings may no longer be applicable.
    • Example: Citing a 2023 meta-analysis on the effectiveness of renewable energy technologies is preferable to citing a study from 2001, as the newer study will have taken into account technological advancements.
  6. Relevance: Does the Evidence Bear on the Conclusion? Evidence that does not directly relate to the argument is not helpful.
    • Example: If a speaker argues for increasing the minimum wage, citing research that shows increased minimum wages boost consumer spending is relevant because it directly supports the argument.
  7. Adequacy: Is the Evidence Sufficient to Support Its Claim? Adequate evidence means having enough quality evidence to convincingly support the claim being made.
    • Example: If you are trying to prove that sugary drinks contribute to obesity, providing multiple studies from different credible sources, statistics on consumption rates, and expert testimony would collectively provide adequate evidence to support your claim.

Sources of Evidence

Herrick, Chapter 7 outlines different types of sources for evidence and their respective strengths and limitations:

  • Periodicals: These include scholarly journals, special-interest magazines, and news/commentary publications. Scholarly journals are considered the most reliable due to their rigorous editorial and peer-review process.
    • Scholarly journals are considered the gold standard due to their peer-review process, while special-interest publications and news magazines can offer accessible information but with less depth and more bias. They can be easily accessed via university libraries.
  • Books: Books can be useful sources of in-depth information, but it is important to consider the author’s credentials, publication date, and the type of publisher.
  • Documentaries: These can offer reliable insights but may be influenced by commercial interests or biases.
  • The Internet: Offers vast information, but requires critical assessment for credibility. Websites with recognizable authors and credible organizations are generally more reliable. Digital literacy has become an essential skill for identifying and evaluating online sources.

Digital Literacy

Digital Literacy refers to the ability to effectively navigate, evaluate, and utilize online information. Digital literacy is more than simply being able to use technology; it is about understanding how to critically evaluate the veracity and quality of digital content and its sources. Key aspects of digital literacy include:

  1. Critical Evaluation of Sources: Not all websites are created equal, and digital literacy involves determining whether an online source is credible, up-to-date, and relevant. It also requires recognizing the purpose of the content—whether it aims to inform, persuade, entertain, or mislead.
  2. Understanding Bias and Intent: It is important to understand the motives behind the creation of digital content. Websites often have particular political, social, or commercial agendas, and digital literacy involves identifying these biases. For example, a blog promoting dietary supplements might not be objective if it’s sponsored by a company that sells such products.
  3. Verification of Facts: Digital literacy requires cross-referencing information found online with multiple reliable sources. This helps verify facts and avoid falling for misinformation or “fake news.” For instance, a claim about a health benefit found on social media should be verified through medical publications or government health websites.
  4. Awareness of Digital Manipulation: The internet includes not only text but also images, videos, and audio clips, many of which may be digitally altered. Digital literacy involves assessing whether visual or multimedia evidence has been manipulated to present a biased narrative.
  5. Navigating Information Overload: The sheer volume of information available online can be overwhelming. Being digitally literate means knowing how to sift through large amounts of data to find high-quality, relevant information. This involves using effective search terms, recognizing authoritative domains (e.g., “.gov” or “.edu”), and understanding how search engine algorithms may prioritize certain content.
  6. Digital Security and Privacy: Digital literacy also includes understanding how to protect one’s privacy online and recognizing secure websites. For example, a digitally literate individual would know to look for “https://” at the beginning of a URL as an indicator of a secure website.

Example of Digital Literacy in Practice: Suppose you are researching the benefits of electric vehicles (EVs). A digitally literate approach would involve consulting a mix of sources, including reputable news organizations (e.g., Associated Press, Reuters), trusted independent technical professional organizations or public agencies (e.g., IEEE, European Alternative Fuels Observatory), and peer-reviewed journals (Energies, Transport Reviews, Journal of Power Sources). It would also involve recognizing potential biases—such as an oil company-funded blog questioning the sustainability of EVs.

Evaluating Websites

Evaluating the credibility of websites is a critical component of digital literacy. The internet contains valuable information but also a lot of misleading or false content. Here are key considerations for evaluating websites:

Key Considerations for Evaluating Websites

  • Language and Content Quality:
    • Credible websites typically use a moderate and professional tone. They avoid extreme or sensational language that appeals to emotions rather than presenting facts.
    • Grammatical accuracy and proper punctuation are often indicators of a professional and reliable website. Sites riddled with typos or casual language may lack reliability.
    • Fact-based Content: Reliable websites provide references, links to original studies, or citations to support their claims.
    • Example: A health website like Mayo Clinic (www.mayoclinic.org) provides detailed health information, cites medical sources, and avoids sensational claims about treatments.
  • Authority of the Site Creator:
    • Consider who created the website. Recognized authorities (e.g., universities, government institutions, established news organizations) provide credible content.
    • Look for the author’s credentials. An article on medical treatments should ideally be authored by a healthcare professional or medical researcher, with appropriate qualifications listed.
    • Example: The American Medical Association’s website (www.ama-assn.org) is a trustworthy source for medical information because it is maintained by a reputable professional organization.
  • External Consistency:
    • External consistency is about comparing the information on the site with other reliable sources. A credible website should not present claims that contradict established knowledge.
    • Cross-referencing helps determine if the information presented aligns with mainstream consensus or is a fringe theory.
    • Example: If a website claims that climate change is not occurring, a comparison with multiple authoritative scientific sources (e.g., NASA, NOAA, IPCC) may reveal that the claim lacks external consistency and therefore credibility.
  • Objectivity and Bias:
    • Recognize the potential bias or purpose of a website. Websites created to sell a product, promote a political agenda, or advocate for a specific cause may present information in a skewed manner.
    • Lobbying organizations, for example, may present one-sided information to persuade rather than to inform.
    • Example: Greenpeace’s website (www.greenpeace.org) provides valuable information on environmental issues but also advocates for specific policy changes. It is important to note that the content is aimed at activism and may reflect a biased perspective.
  • Currency of Information:
    • Up-to-date content is crucial, especially for topics like technology, health, or science. Websites should indicate the date the content was published or last updated.
    • Outdated information can mislead or provide inaccurate conclusions if more recent research contradicts earlier findings.
    • Example: A website discussing COVID-19 treatments that has not been updated since 2020 may not reflect recent advancements, making it less reliable for current information.
  • Security of the Website:
    • Look for “https://” in the URL: it indicates that data is transmitted securely, though it says nothing by itself about whether the content is accurate.
    • Trustworthy websites also typically have an “About Us” page that details their mission, authors, and organization’s background.
  • Cross-Referencing Sources:
    • A good practice in evaluating websites is to cross-check information with other reputable sources. If multiple authoritative sites support the same conclusion, the information is more likely to be accurate.
    • Use fact-checking websites such as Snopes (www.snopes.com) or Media Bias/Fact Check (mediabiasfactcheck.com) to verify suspicious claims and their sources.
  • Avoiding Clickbait and Sensationalism:
    • Clickbait headlines are designed to attract attention but often lack substance or reliable evidence. Reliable websites present headlines that are informative and factual rather than exaggerated or misleading.
    • Example: Compare a “clickbait” headline like “5 Ways Coffee Will Instantly Cure All Health Problems!” with a more measured one such as “Research Shows Potential Health Benefits of Moderate Coffee Consumption.” The latter is more likely to come from a reputable source.
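The considerations above are judgments a human reader makes, but it can help to record them in a structured checklist and tally the result. The sketch below is one illustrative way to do that; the class name, field names, and the equal weighting of criteria are assumptions for the example, not an established scoring rubric:

```python
from dataclasses import dataclass, fields

@dataclass
class SiteEvaluation:
    # Each field mirrors one consideration from the list above;
    # the values are the reader's own judgments, not automated checks.
    professional_tone: bool
    cites_sources: bool
    recognized_authority: bool
    externally_consistent: bool
    discloses_bias: bool
    recently_updated: bool
    uses_https: bool

    def score(self) -> int:
        """Count how many of the seven considerations the site satisfies."""
        return sum(getattr(self, f.name) for f in fields(self))

# A site meeting six of the seven criteria:
# SiteEvaluation(True, True, True, True, True, False, True).score() -> 6
```

A low score does not prove a site is unreliable, and a high score does not prove it is trustworthy; the checklist simply makes the evaluation explicit and repeatable.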

 

Further Reading:

  • Baly, Ramy, Giovanni Da San Martino, James Glass, and Preslav Nakov. “We Can Detect Your Bias: Predicting the Political Ideology of News Articles.” arXiv preprint arXiv:2010.05338 (2020).

  • Chiang, Chun-Fang, and Brian Knight. “Media Bias and Influence: Evidence from Newspaper Endorsements.” The Review of Economic Studies 78, no. 3 (2011): 795–820.

  • Finlayson, Alan. “YouTube and Political Ideologies: Technology, Populism and Rhetorical Form.” Political Studies 70, no. 1 (2022): 62–80.

  • Kulshrestha, Juhi, Motahhare Eslami, Johnnatan Messias, Muhammad Bilal Zafar, Saptarshi Ghosh, Krishna P. Gummadi, and Karrie Karahalios. “Search Bias Quantification: Investigating Political Bias in Social Media and Web Search.” Information Retrieval Journal 22 (2019): 188–227.

  • Li, Heidi Oi-Yee, Adrian Bailey, David Huynh, and James Chan. “YouTube as a Source of Information on COVID-19: A Pandemic of Misinformation?” BMJ Global Health 5, no. 5 (2020): e002604.

  • McGrew, Sarah. “Learning to Evaluate: An Intervention in Civic Online Reasoning.” Computers & Education 145 (2020): 103711.

  • Morstatter, Fred, Liang Wu, Uraz Yavanoglu, Stephen R. Corman, and Huan Liu. “Identifying Framing Bias in Online News.” ACM Transactions on Social Computing 1, no. 2 (2018): 1–18.

  • Pangrazio, Luci, and Julian Sefton-Green. “Digital Rights, Digital Citizenship and Digital Literacy: What’s the Difference?” Journal of New Approaches in Educational Research 10, no. 1 (2021): 15–27.

  • Sobbrio, Francesco. “Indirect Lobbying and Media Bias.” Quarterly Journal of Political Science 6, no. 3–4 (2011).

  • Tinmaz, Hasan, Yoo-Taek Lee, Mina Fanea-Ivanovici, and Hasnan Baber. “A Systematic Review on Digital Literacy.” Smart Learning Environments 9, no. 1 (2022): 21.