Lesson on Statistical Evidence in Argumentation – Evaluating Survey Accuracy

Sampling & Evaluating Survey Accuracy

Posted by Keren Wang, 2024

*Before starting this lesson, make sure you have reviewed: Statistical Evidence: Survey and Opinion Polling.*

Sampling Error and Polling Accuracy Case Study: 2016 U.S. Presidential Election

During the 2016 U.S. Presidential Election, many national polls predicted a victory for Democratic candidate Hillary Clinton. While Clinton won the popular vote by around 2.1%, Republican candidate Donald Trump won the Electoral College, securing the presidency. [1]

Image: Huffington Post’s 2016 US Presidential Election prediction, updated on the eve of Election Day.

The challenges pollsters faced in predicting the 2016 election shed light on a common problem with using statistics in arguments: numbers can give us a false sense of certainty. Stats are powerful—they carry authority and can seem “objective”—but they’re rarely as clear-cut as they appear. In reality, they often come with hidden biases and assumptions that may not capture the full picture. [2]

In 2016, pollsters and analysts leaned heavily on polling data to forecast the election outcome, treating the numbers almost like a science. But they didn’t account for factors like sampling errors, social desirability bias, and last-minute changes in voter sentiment, all of which skewed the predictions. The result? A widespread belief that one outcome was nearly certain—until it wasn’t.

This reliance on numbers to tell a definitive story shows how easy it is to be misled by the “authority” of stats. It’s a reminder that while statistical evidence can be persuasive, it’s not infallible. To use data responsibly in arguments, we need to present it with a little humility, recognizing its limitations and the need to pair it with other forms of analysis. Instead of seeing numbers as the whole truth, we should treat them as one piece of the puzzle, open to interpretation and discussion. [3]

Evaluating the Use of Polling Evidence in the 2016 US Presidential Election

1. Non-Response Bias

Impact: Non-response bias occurs when certain demographics are less likely to respond to polls, which can skew results. In 2016, many polls underrepresented rural and working-class voters, groups that tended to favor Donald Trump. These groups were harder to reach and less likely to respond to traditional polling methods. [4]

Problematic Use in Argumentation: Analysts and commentators who used these poll results often overlooked or underestimated the impact of this underrepresentation. News networks frequently relied on results from similar polling agencies, creating a feedback loop that reinforced a constructed “reality” of Clinton’s expected victory. This effect was further amplified as media outlets fed off each other’s election news stories and headlines, creating a narrative that appeared authoritative but was actually based on incomplete data. This collective overconfidence in Clinton’s chances contributed to a misleading perception that didn’t reflect the complexities and variances among voter demographics.

2. Late Shifts in Voter Preferences

Impact: Many voters made up their minds close to Election Day, influenced by last-minute campaign events, media coverage, or debates. Polling, however, generally stops a few days before the election, often missing these late shifts. In 2016, a significant portion of undecided voters shifted toward Trump in the final days, which wasn’t captured in most polling. [5] The reasons for this shift are complex, but one contributing factor may have been social desirability bias—some Trump supporters may not have honestly disclosed their preferences to pollsters, fearing negative judgment from their friends and family members. As a result, these voters remained hidden within the undecided category, skewing polling data away from an accurate portrayal of support for Trump. [6]

Problematic Use in Argumentation: This late shift was largely invisible in the polling data, leading analysts to underestimate Trump’s chances. When using this data for argumentation, commentators tended to either overlook the intrinsic time constraint of surveying, or erroneously assume that the voters who were undecided would either not vote or distribute evenly across candidates. [7] This assumption failed to account for the unpredictability of undecided voters, ultimately leading to faulty conclusions.

3. Sampling Error

Impact: Sampling error, the statistical error that occurs when a poll’s sample does not perfectly represent the population, was especially impactful in closely divided states. In 2016, even minor errors in states like Michigan, Wisconsin, and Pennsylvania, where the polls showed narrow leads for Clinton, contributed to a misleading picture. The underestimated support for Trump in these states shifted the Electoral College outcome in his favor. [8]

Studies have found that 2016 election polls across the board suffered from a range of sampling errors—often collectively referred to as “total survey error.” Reported margins of error typically capture only sampling variability and ignore systematic errors, such as uncertainty in defining the target population, particularly regarding who is likely to vote. An empirical analysis of the 2016 US presidential election cycle found that the average survey error was around 3.5 percentage points—approximately double what most reported margins of error implied. Polls also erred predominantly on the side of overestimating Clinton’s performance, partly because major polling organizations shared similar unwarranted assumptions about voter demographics. This shared “meta-bias” compounded the inaccuracies, especially in races as close as 2016. [9]

Problematic Use in Argumentation: Polling margins of error are often presented as minor uncertainties, with little impact on the overall narrative. In 2016, this assumption was problematic because the race in key states was so close that even a small sampling error could, and did, shift the predicted outcome. The statistics carried an aura of scientific objectivity, which masked underlying biases and imperfect assumptions that remained tacit and hidden behind cold numbers. News media perpetuated this misconception by over-relying on the seemingly definitive value of these numerical data, interpreting polling results as if they offered predictable and accurate insights. [10] This contributed to overconfidence in Clinton’s prospects and led commentators to misjudge the actual electoral dynamics in crucial swing states.

The Problem of Sampling

Sampling is the process of selecting a subset of individuals from a larger population to make inferences about that population. Effective sampling ensures that survey findings are representative and reliable.

The goal of sampling is to accurately reflect the larger population’s characteristics by selecting a group that is both representative and adequately sized. This section covers three primary sampling methods used to create a sample that reflects the diversity and characteristics of the population being surveyed: simple random sampling, stratified random sampling, and clustered sampling.

Simple Random Samples

A simple random sample gives each member of the population an equal chance of being selected. This method involves a straightforward, single-step selection process.

Process

Researchers assign a number to each individual in the population and use a random number generator or a table of random numbers to select individuals.

Example

In a survey of university students’ media consumption habits, a researcher may use a list of all enrolled students, assign each a number, and then use a random number generator to pick students for the survey.

Benefits

This method helps prevent bias since each individual has an equal opportunity to be included. It’s often the most representative sampling method if done correctly.

Limitations

Simple random sampling can be challenging and time-consuming when dealing with large populations, as researchers need an accurate, complete list of all members.

Stratified Random Samples

Stratified random sampling involves dividing the population into subgroups (strata) based on relevant characteristics and then randomly selecting individuals within each stratum.

Process

Researchers identify categories (e.g., age, gender, ethnicity), divide the population accordingly, and randomly sample individuals from each category. This ensures each subgroup is adequately represented.

Example

If a study examines the impact of social media on mental health among high school students, researchers might stratify the sample by grade level (e.g., freshman, sophomore) and then randomly select students within each grade level to participate.

Benefits

This method increases precision by ensuring the sample reflects key population characteristics, making it valuable for ensuring representation across specific groups.

Limitations

Stratified sampling requires additional time and resources to divide the population and select individuals within each subgroup. It assumes that researchers know which subgroups are relevant to the study.

Clustered Samples

Clustered sampling selects groups (clusters) rather than individual members, which is useful for large, widely distributed populations or when a complete list of members is impractical.

Process

Researchers divide the population into naturally occurring clusters, such as geographical locations, and then randomly select clusters. Within each cluster, they may survey all members or randomly choose individuals.

Example

In a survey on internet access across a large city, researchers might select certain neighborhoods as clusters and then survey individuals within those neighborhoods.

Benefits

Cluster sampling saves time and reduces travel costs, especially for geographically dispersed populations. It’s often more practical and economical for large-scale studies.

Limitations

Clustered sampling can lead to sampling bias if clusters are not representative of the overall population and is generally less precise than other methods due to potential similarities within clusters.

Obtaining Samples

Random sampling ensures that each member of the population has a known, non-zero chance of being selected, minimizing bias and improving the representativeness of the sample. Here’s how the process works in the three sampling methods discussed earlier:

1. Simple Random Sampling

Researchers create a list of every individual in the population and assign a sequential number to each. Using a table of random numbers or software, they randomly select numbers corresponding to individuals.

  • Population Members: Imagine 100 individuals in a line, represented as gray dots labeled from 1 to 100, as shown below.
  • Random Sample Selection: 15 individuals (most commonly selected via computer-generated random numbers) are highlighted in blue among the gray dots, showing how a simple random sample is chosen without any grouping or structure.

This method works best with smaller, manageable populations where researchers have full access to an accurate population list. [11]

Image: Simple random sampling illustration.
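To make the selection step concrete, here is a minimal Python sketch mirroring the illustration above (100 numbered individuals, 15 drawn at random); the population and sample sizes are the illustration’s values, not recommendations.

```python
import random

random.seed(42)  # fixed seed so the sketch is reproducible

# Population: 100 individuals, numbered 1 through 100
population = list(range(1, 101))

# Draw 15 distinct individuals; each has an equal chance of selection
sample = random.sample(population, k=15)

print(sorted(sample))
```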

2. Stratified Random Sampling

This method typically involves dividing the population into distinct groups or strata based on relevant characteristics (such as age, income, education level). Within each stratum, a simple random sample is then conducted.

By sampling within each group, researchers can control for potential influences that specific characteristics might have on the survey’s findings, thereby increasing the sample’s representativeness. [12]

Image: Stratified random sampling illustration.
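A minimal sketch of stratified random sampling under illustrative assumptions (the grade labels and the 50/50 split are hypothetical, not from the lesson): the population is divided into strata, then a simple random sample is drawn within each stratum, allocated proportionally to its size.

```python
import random
from collections import defaultdict

random.seed(42)

# Hypothetical population: (student_id, grade) pairs
population = [(i, "freshman" if i <= 50 else "sophomore") for i in range(1, 101)]

# Step 1: divide the population into strata by grade
strata = defaultdict(list)
for student_id, grade in population:
    strata[grade].append(student_id)

# Step 2: simple random sample within each stratum,
# proportional to the stratum's share of the population
total_sample_size = 20
sample = []
for grade, members in strata.items():
    k = round(total_sample_size * len(members) / len(population))
    sample.extend(random.sample(members, k))

print(len(sample), sorted(sample))
```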

3. Clustered Sampling

When it is impractical to list every individual in a large population, researchers divide the population into clusters, often based on geographic or organizational divisions.

They randomly select entire clusters and survey individuals within those selected clusters. This can involve surveying everyone in each cluster or using random sampling within clusters for larger groups. [13]

Image: Clustered sampling illustration.
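And a sketch of one-stage cluster sampling with hypothetical neighborhoods (names and sizes are illustrative): whole clusters are selected at random, then everyone in the chosen clusters is surveyed.

```python
import random

random.seed(42)

# Hypothetical clusters: neighborhood -> resident ids
clusters = {
    "north": list(range(1, 26)),
    "south": list(range(26, 51)),
    "east": list(range(51, 76)),
    "west": list(range(76, 101)),
}

# Stage 1: randomly select 2 of the 4 clusters
chosen = random.sample(list(clusters.keys()), k=2)

# Stage 2 (one-stage design): survey everyone in each chosen cluster;
# for very large clusters, researchers may instead sample within them
sample = [resident for name in chosen for resident in clusters[name]]

print(chosen, len(sample))
```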

Evaluating Survey Accuracy

This section explores three critical factors for assessing survey accuracy: Sample Size, Margin of Error, and Confidence Level. Understanding these elements helps researchers determine the reliability of their survey results and interpret findings with appropriate caution.

1. Sample Size

Definition: The number of individuals or units chosen to represent the population in the survey.

Importance: Larger samples generally provide more accurate data. The relationship between sample size and accuracy follows the law of diminishing returns, meaning that after a certain point, increases in sample size result in only minor improvements in accuracy.

Key Concept: Sampling Error decreases as sample size increases. However, the increase in precision grows smaller as the sample size becomes very large.

Example: Imagine researchers want to understand coffee preferences across a city with 100,000 residents. They conduct a survey to find out what percentage of residents prefer iced coffee over hot coffee:

  • The researchers initially surveyed 100 residents and found that 60% prefer iced coffee. However, with only 100 people surveyed out of 100,000, this small sample has a higher margin of error, potentially making the results less representative of the entire population.
  • To get a more precise estimate, they increase the sample size to 1,000 people, which lowers the margin of error. As the sample size grows, the accuracy of the result improves, giving a clearer picture of the true percentage of residents who prefer iced coffee.

2. Margin of Error

Definition: The range within which the true population parameter is expected to fall, considering sampling variability.

Role in Surveys: The margin of error shows the possible deviation between the survey’s results and the actual population values.

Calculation: It’s derived from the standard error and sample size, reflecting how representative the sample is of the population.
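(A standard approximation, not spelled out in this lesson but widely used for survey proportions: MoE = z × √(p(1 − p) / n), where p is the observed proportion, n is the sample size, and z is the z-score for the chosen confidence level, about 1.96 at 95% and 2.58 at 99%.)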

Example: In the same coffee preferences survey scenario:

  • With the 100-person survey, they might have a margin of error of ±10% (at 95% confidence level), meaning the true preference for iced coffee could be anywhere between 50% and 70%.
  • With the 1,000-person survey, the margin of error decreases to ±3% (at 95% confidence level), so they can now be more confident that the true preference is between 57% and 63%.
  • With a 2,000-person survey, the margin of error further goes down to ±2% (at 95% confidence level).
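As a rough check on these figures, here is a small Python sketch of the normal-approximation formula noted above, assuming simple random sampling and the observed 60% iced-coffee share:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

p = 0.60  # observed share preferring iced coffee
for n in (100, 1_000, 2_000):
    print(f"n={n:>5,}: ±{margin_of_error(p, n):.1%}")

# n=  100: ±9.6%   (the lesson's ±10%)
# n=1,000: ±3.0%
# n=2,000: ±2.1%   (the lesson's ±2%)
```

Note the diminishing returns: going from 100 to 1,000 respondents cuts the margin by about two thirds, while doubling again to 2,000 shaves off less than a full point, which is exactly the pattern described under Sample Size.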

3. Confidence Level

Definition: The degree of certainty that the population parameter lies within the margin of error.

Common Confidence Levels: 95% confidence is standard, meaning that if the survey were repeated many times, about 95% of the resulting intervals (result ± margin of error) would contain the true population value.

Confidence Interval: This is the range constructed around the survey result to indicate where the true population parameter is likely to be, given the confidence level.

Example: In the coffee preferences survey scenario:

  • 95% Confidence Level (C.L.): The researchers can be 95% confident that the true percentage of iced coffee preference lies within their calculated margin of error (±3%).
  • 99% C.L.: If they want to be even more certain, they could use a 99% confidence level, increasing the margin of error to ±4% for the 1,000-person survey.
  • To maintain the same margin of error at a 99% confidence level, a larger sample size would be required, such as 2,000 people to achieve a ±3% margin of error.
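Inverting the same formula gives the sample size needed for a target margin of error; a small sketch under the same assumptions:

```python
import math

def required_sample_size(p: float, moe: float, z: float = 1.96) -> int:
    """Smallest n whose normal-approximation margin of error is <= moe."""
    return math.ceil(p * (1 - p) * (z / moe) ** 2)

p = 0.60
print(required_sample_size(p, 0.03, z=1.96))   # ±3% at 95% CL -> 1025
print(required_sample_size(p, 0.03, z=2.576))  # ±3% at 99% CL -> 1770
```

The 99% figure (roughly 1,800 respondents) is in the same ballpark as the lesson’s round “2,000”; the exact number depends on the observed proportion p.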

Chart: Sample size versus margin of error across different confidence levels (95% and 99%).

As sample size increases, the margin of error decreases, making the survey more precise. Higher confidence levels (e.g., 99%) result in a larger margin of error, meaning we can be more confident in the results but within a wider range. The diminishing effect of increasing sample size shows that the margin of error decreases rapidly with smaller samples but flattens out at higher sample sizes.


Further Reading

1. Kennedy, Courtney, et al. “An Evaluation of the 2016 Election Polls in the United States.” Public Opinion Quarterly 82, no. 1 (Spring 2018).

2. Bon, Joshua J., Timothy Ballard, and Bernard Baffour. “Polling Bias and Undecided Voter Allocations: US Presidential Elections, 2004–2016.” Journal of the Royal Statistical Society Series A: Statistics in Society 182, no. 2 (2019): 467-493.

3. Wright, Fred A., and Alec A. Wright. “How Surprising Was Trump’s Victory? Evaluations of the 2016 US Presidential Election and a New Poll Aggregation Model.” Electoral Studies 54 (2018): 81-89.

4. Battersby, Mark. “The Rhetoric of Numbers: Statistical Inference as Argumentation.” (2003).

5. Hoeken, Hans. “Anecdotal, Statistical, and Causal Evidence: Their Perceived and Actual Persuasiveness.” Argumentation 15 (2001): 425-437.

6. Giri, Vetti, and M. U. Paily. “Effect of Scientific Argumentation on the Development of Critical Thinking.” Science & Education 29, no. 3 (2020): 673-690.

7. Gibson, James L., and Joseph L. Sutherland. “Keeping Your Mouth Shut: Spiraling Self-Censorship in the United States.” Political Science Quarterly 138, no. 3 (2023): 361-376.

8. Roeh, Itzhak, and Saul Feldman. “The Rhetoric of Numbers in Front-Page Journalism: How Numbers Contribute to the Melodramatic in the Popular Press.” Text – Interdisciplinary Journal for the Study of Discourse 4, no. 4 (1984): 347-368.

9. Ziliak, Stephen T., and Ron Wasserstein. “One Thing About… the Rhetoric of Statistics.” CHANCE 36, no. 4 (2023): 55-56.
