Statistical Methods for Infectious Disease Across Scales

April 5 – 7, 2024

 

Bank of America Building (103ABC)

Department of Statistics & Center for Infectious Disease Dynamics

Pennsylvania State University

 

Shweta Bansal (Georgetown University)

Title: Disentangling Social and Spatial Heterogeneity Relevant to Infectious Disease

Abstract: TBA

 

Nita Bharti (Pennsylvania State University)

Title: Navigating Gaps and Biases in Surveillance Data

Abstract:

As global health emphasizes data-driven approaches to improve health equity, it is increasingly important to assess the quality and representativeness of data that are used in decision-making. We measured the inclusion of health-vulnerable populations in mobile phone data, which are used to measure mobility, access to health care, and potential pathogen transmission. We find that the representation of health-vulnerable populations in these data is both low and biased in ways that would magnify, rather than reduce, health inequities. We discuss strategies for detecting and overcoming data biases due to exclusion.

 

Fan Bu (University of Michigan)

Title: Epidemic Models with a “Human Touch” – Incorporating Human Behavior into Mathematical Epidemic Models

Abstract:

Traditional mathematical epidemic models often involve compartmental models that partition the population into disease-status-based groups, assuming that individuals are interchangeable and well mixed. However, as evidenced in the COVID-19 pandemic, disease transmission depends on human contact patterns and sociodemographic characteristics, which could be impacted by disease spread as well. Emerging modern data, such as contact tracing, mobility data, and socio-epidemiological surveys, also provide an enriched evidence base to study the mutual influence and interplay between epidemic dynamics and human behavior. In this talk, we will discuss some statistical modeling efforts that incorporate human behavior into stochastic epidemic models to better account for individual heterogeneity as well as population-level patterns, utilizing emerging multimodal data. Applications include an influenza cohort study with high-resolution contact tracing and an HIV transmission study with deep viral sequencing. We will conclude with a discussion of open questions and future research directions.

 

Forrest Crawford (Yale University)

Title: Causal Inference For Infectious Disease Interventions

Abstract:

Vaccine trials conducted in groups of connected or potentially interacting subjects are sometimes necessary during infectious disease outbreaks. Two serious complications arise in this setting: 1) when the pathogen of interest is contagious – transmissible between study subjects – outcomes may exhibit strong dependence even in the absence of treatment; and 2) the vaccine treatment may exhibit interference or spillover, in which individuals’ infection outcomes may depend on treatments received by others. Epidemiologists have introduced several competing – and incompatible – formalisms for dealing with these problems. In fact, some approaches to estimating causal vaccine effects in randomized trials of interacting subjects guarantee sign bias in large samples, falsely indicating that a beneficial intervention is harmful, or vice versa. In this presentation I will introduce a causal framework for understanding infectious disease transmission and the effects of interventions on infection outcomes. I will discuss a synthesis of two broad research efforts: causal inference for individual vaccine effects in observational and randomized trials, and population-level transmission modeling. I will outline the causal structure of contagion, identification of meaningful individual effects, and generalization of these effects to counterfactual population-level epidemic trajectories. Finally, I will describe some of the pitfalls of ignoring contagion in studies of infectious disease interventions.

This is joint work with many other researchers, including Xiaoxuan Cai, Olga Morozova, Daniel Eck, Wen Wei Loh, and Eben Kenah.

 

Nianqiao “Phyllis” Ju (Purdue University)

Title: SNP-Slice Resolves Mixed Infections: Simultaneously Unveiling Strain Haplotypes And Linking Them To Hosts

Abstract:

Multi-strain infection is a common yet under-investigated phenomenon of many pathogens. Currently, biologists analyzing SNP information have to discard mixed infection samples, because existing downstream analyses require monogenomic inputs. Such a protocol impedes our understanding of the underlying genetic diversity, co-infection patterns, and genomic relatedness of pathogens. A reliable tool to learn and resolve the SNP haplotypes from polygenomic data is an urgent need in molecular epidemiology. In this work, we develop a slice sampling Markov chain Monte Carlo algorithm, named SNP-Slice, to learn not only the SNP haplotypes of all strains in the population but also which strains infect which hosts. Our method reconstructs SNP haplotypes and individual heterozygosities accurately without reference panels and outperforms state-of-the-art methods at estimating the multiplicity of infection and allele frequencies. Thus, SNP-Slice introduces a novel approach to analyzing polygenomic data and opens a new avenue for resolving complex infection patterns in molecular surveillance. We illustrate the performance of SNP-Slice on empirical malaria and HIV datasets and provide recommendations for the practical use of the method.

 

Dave Kennedy (Pennsylvania State University)

Title: Exploiting Selection Bias: Host Jumps and Host Heterogeneity

Abstract:

Selection bias occurs when a sample is not representative of the population from which it is supposed to have been drawn. Although typically viewed as a nuisance, in some situations selection bias itself can be leveraged to gain novel insight about infectious disease dynamics. In the first part of this talk, I present a framework for identifying how spillover frequency translates to host jump risk. I show that due to selection bias, pathogens that spill over frequently are not inherently more likely to jump hosts than pathogens that spill over rarely. In the second part of my talk, I present a novel method that can be used to detect and estimate host heterogeneity in susceptibility to infection using only contact tracing data. The premise of the method is that exposure to a pathogen creates a form of selection bias, such that individuals who are not infected after being in a contact network are likely more resistant than average. The difference in future infection risk between these individuals and individuals who have never been exposed can then be used to back-calculate heterogeneity in susceptibility.

 

Zehang “Richard” Li (University of California Santa Cruz)

Title: Domain Adaptive Mortality Surveillance Using Verbal Autopsies

Abstract:

Worldwide, two-thirds of deaths do not have a cause assigned. Verbal autopsy (VA) is a well-established tool to collect information describing deaths outside of hospitals by surveying the caregivers of a deceased person. The collected data are then analyzed by statistical algorithms to assign a cause to each death and to estimate the fraction of deaths due to each cause in the population. In the last decade, VA has expanded from research activities into large-scale routine data collection in many low- and middle-income countries. While data collection has scaled up quickly, data with high-quality labels remain sparse. Several methodological challenges exist in analyzing VA data in this new context. In this talk we discuss two projects that address the generalizability of VA models under data shift. In the first case, we examine how labelled data from multiple heterogeneous source populations can be used to improve cause-of-death assignment in a new target population without labelled data. In the second case, we discuss models and considerations for estimating time-varying cause-specific mortality fractions for small sub-populations during disease outbreaks. I will also discuss some ongoing work and challenges in extending the framework to incorporate active sampling of deaths. This is joint work with Zhenke Wu, Sam Clark, Yu Zhu, Irena Chen, and Mengbing Li.

 

Ayesha Mahmud (University of California Berkeley)

Title: Modeling Transmission Dynamics Of Directly-Transmitted Diseases Using Data From Contact Studies

Abstract:

Demography and social structures shape almost all aspects of an infectious disease outbreak in a population – from host susceptibility and exposure to transmission and health outcomes. However, the social forces that shape human behavior and contact patterns are difficult to quantify and are often omitted from mathematical epidemiological models. At the start of the COVID-19 pandemic, the Berkeley Interpersonal Contact Study (BICS) was launched to measure changes in contact rates and behavior in the U.S. over the course of the pandemic. I will present some recent work bridging statistical insights about human behavior from BICS data with mechanistic models of respiratory disease transmission. A major advantage of these data is that they allow us to examine heterogeneities in contact and health-related behavior by key demographic characteristics (such as age and race/ethnicity) as well as less well studied characteristics (such as occupation and political identity). Surveillance of contact patterns and health behaviors can help us understand heterogeneities in risk of infection over time and across population sub-groups, but many open questions remain: how to collect these data so that they are representative of the population and have sufficient spatial and temporal resolution, how to connect these types of data to disease transmission events, and to what extent observed patterns can be generalized and incorporated into mathematical transmission models.

 

Pamela Martinez (University of Illinois Urbana-Champaign)

Title: Immune History And Coronaviruses: From Serostatus To Booster Impacts

Abstract:

Using statistical models, we classified cross-sectional seroprevalence data for seasonal coronaviruses HKU1, NL63, 229E, and OC43, and identified patterns of seropositivity levels suggesting varying levels of immune history. While traditional methods use binary serocatalytic models to capture disease parameters, we expanded these models to include multiple serostatus levels, finding a sharp increase in seroreversion rates after the first seropositive level. At high seropositivity, alphacoronavirus seroconversion and seroreversion were less frequent than for betacoronaviruses. I will discuss the methodological challenges and opportunities of relating seropositivity to susceptibility to infection through neutralizing antibody titer data, and to disease transmission at the population level. During the second part of my talk, I will provide an example of how neutralizing antibody titers stratified by SARS-CoV-2 immune history can be helpful in parametrizing models of transmission, particularly the levels of protection against future infections. These models could then be used to evaluate the impact of booster vaccine formulations on populations with different degrees of immune history.

 

Volodymyr Minin (University of California Irvine)

Title: Inference, Nowcasting, and Forecasting Using Multiple Surveillance Data Streams

Abstract:

The area of statistical modeling of infectious disease dynamics is actively responding to the challenges and opportunities offered by the increasing abundance of relevant data from electronic surveillance systems, seroprevalence studies, genetic sequencing of pathogens, and wastewater sampling. Determining what combinations of data streams are optimal for particular inferential or forecasting tasks remains an open question. We describe our work in progress developing novel statistical methods to combine multiple surveillance data streams to improve both inference, including nowcasting, and forecasting of infectious disease dynamics. We present a series of semi-parametric Bayesian compartmental models and demonstrate that this class of models can effectively integrate passively collected time series of diagnostic tests, mortality data, seroprevalence data, and wastewater pathogen concentrations. Using retrospective inference on California COVID-19 data sets, we evaluate the utility of each data stream in the context of nowcasting and short-term forecasting. Lastly, we focus on healthcare demand forecasting during epidemic surges of pathogen variants capable of immune escape. We incorporate time series of cases, hospitalizations, ICU admissions, deaths, and genetic sequence counts into a Bayesian model and show that using genetic information leads to superior forecasting performance compared to traditional models.

 

Olga Morozova (University of Chicago)

Title: Integrating Novel Surveillance Data In Infectious Disease Analytics

Abstract:

Effective response to infectious disease epidemics requires reliable data to inform decisions. Correct estimation of important epidemiologic quantities, such as transmissibility and effective reproduction number, is the basis for forecasting and scenario analyses, upon which policymakers rely to identify thresholds for the implementation of interventions. Traditional infectious disease surveillance, including case, mortality, and hospital surveillance data, has historically been the primary basis for epidemiologic monitoring and forecasting. These data offer many advantages, including direct interpretation and policy relevance. However, traditional surveillance data are susceptible to long lags and time-varying biases due to incomplete ascertainment of infections, errors in recording, complex reporting structures, and changes in human behavior.

The COVID-19 pandemic triggered the wide-scale implementation of novel infectious disease surveillance mechanisms that rely on passive monitoring of human mobility and detection of biological material in wastewater and air. These novel data sources show promise in strengthening epidemiologic surveillance through routine data collection and analysis that are not susceptible to the biases of traditional surveillance. At the same time, novel surveillance data are noisy and not directly interpretable, and their relevance may depend on the stage of an epidemic and on response measures, making it challenging to use these data to identify subtle signals and integrate them into a comprehensive analytic pipeline.

In this talk, I will discuss an example of successful integration of mobile device data into epidemiologic modeling and forecasting, lessons learned from this project, and future opportunities and challenges of integrating mobility, wastewater, and air sensing surveillance into infectious disease analytics to support public health decision-making.

 

Katriona Shea (Pennsylvania State University)

Title: Uncertainty and the Management of Outbreaks: Harnessing the Power of Multiple Models

Abstract:

During outbreaks of weeds, pests and infectious diseases, uncertainty hinders our ability to forecast dynamics and to make critical decisions about management. In particular, disparate epidemiological projections from different modeling groups, arising from different scientific descriptions of the underlying biological and management processes, may hamper intervention planning and response by policy makers. Drawing on methods from expert elicitation and judgment, we can harness the expertise of multiple modeling groups in a structured decision-theoretic framework. I will discuss the use of these methods by the COVID-19 Scenario Modeling Hub, both during the pandemic and in ongoing work.

 

Kayoko Shioda (Boston University)

Title: 1) Social Contact Data for Infectious Disease Modeling; and 2) Target Trial Emulation for Vaccine Evaluation

Abstract:

1) Social Contact Data for Infectious Disease Modeling
Understanding social contact patterns is crucial for comprehending disease spread dynamics. While mechanistic mathematical models provide insights, data on contact patterns among sick individuals with acute infections are limited. Our project addresses this gap by investigating temporal changes in contact patterns among individuals with acute respiratory infections or gastroenteritis, along with their household members (exposed close contacts). Through collaboration with Kaiser Permanente Northwest, we use clinic-based approaches to recruit cases of all ages, gathering longitudinal data on social contact over a two-week period. By structuring transmission dynamic models based on empirical behavioral insights, we aim to enhance the accuracy of estimations of key transmission parameters and intervention impacts, using SARS-CoV-2 and rotavirus as examples.

2) Application of Target Trial Emulation (TTE) for Vaccine Evaluation
TTE emulates randomized controlled trials using observational data, allowing comprehensive assessments of vaccine impact across diverse populations and dosing schedules. TTE addresses selection bias and immortal time bias by explicitly defining follow-up times and accounting for infection risks during interdose intervals. However, estimating the indirect effect of vaccines within the TTE framework presents challenges, which I aim to overcome by integrating compartmental transmission modeling.

 

Saki Takahashi (Johns Hopkins University)

Title: Epidemiological Inference from Correlated Serological Data

Abstract:

Population susceptibility and immunity play a central role in driving infectious disease dynamics. As these quantities are typically unobserved, serology is often the best way to illuminate past immune exposures in the population. There has been rapid expansion in the quantity and diversity of serological data in recent years, particularly those using multiplex immunoassays which allow for simultaneous measurement of antibodies to multiple antigens. However, obtaining epidemiological inference from these data is complicated by various factors including antibody kinetics, exposure type, host factors and individual-level variation, and imperfect test performance characteristics and batch effects. In this talk, I will describe recent work in both data generation and methods development to address some of these challenges, drawing from examples of population serosurveys and longitudinal cohort studies that we and our collaborators have been conducting for various pathogens.

 

Lance Waller (Emory University)

Title: Maps: A Statistical View

Abstract:

Spatial statistical analysis builds upon the premise that where something happens can influence what happens, i.e., the location of observations can provide information on the observations themselves. Location can be defined on geographic maps and in geometric space, but geography often involves information beyond simple location, distance, and direction. Here, we will explore how geography influences inference in spatial statistical analyses and offer geographic insights on familiar statistical constructs such as data visualization, asymptotics, classical and Bayesian inference, weighted estimation, model diagnostics, and compromises between design and modeling. We will also discuss compromises between geographic and statistical precision, and between statistical precision and local and global probabilistic strategies for ensuring data confidentiality. Using historical and contemporary examples from disease ecology, we will illustrate how maps provide a critical context for data visualization and interpretation, ranging from the known (“You are here”) to the unknown (“Here be dragons”).

 

Jason Xu (Duke University)

Title: Exact Bayesian Inference for Stochastic Epidemic Models via Data Augmented MCMC

Abstract:

We propose novel data-augmented Markov chain Monte Carlo strategies to enable fast and exact Bayesian inference under the stochastic susceptible-infected-removed (SIR) model and its variants. In common surveillance studies such as the incidence data setting, where we are given only discretely observed counts of infection, significant challenges to inference arise because the data offer only a partially informative glimpse of the underlying continuous-time process. To account for the missing data while targeting the exact posterior of model parameters, we make use of latent variables that are jointly proposed from surrogates related to branching processes, carefully designed to closely resemble the SIR model. This allows several conditional sampling strategies that make classical MCMC ideas practical, surmounting the intractable observed data likelihood. The method extends to non-Markovian settings as well as to tasks such as simultaneous change-point detection under time-varying transmission.

 

Jon Zelner (University of Michigan)

Title: Spatial Mechanisms Or Social Residues? How Should We Make Sense Of Intersecting Social And Environmental Drivers Of Infection

Abstract:

Much has been written in recent years about the impact of processes of social discrimination, such as occupational and residential segregation, racial capitalism, mass incarceration and other dimensions of social and economic inequity, on the spatial and sociodemographic patterning of death from SARS-CoV-2. While the COVID-19 pandemic has redirected attention towards the centrality of these processes in the creation of spatially distinct patterns of transmission, they have been central to the patterning of infection and death in prior pandemics, such as the 1918 influenza, seasonal influenza and other acute respiratory infections, as well as HIV, tuberculosis, and other sexually transmitted infections. Much of the empirical work in this area has shown how social phenomena such as residential segregation function as high-level ‘fundamental’ causes of infection inequity that orchestrate and induce correlations between downstream, more-proximal processes of exposure, infection, and death. The spatial patterning of infection and death associated with these social-structural causes often reflects the spatial distribution of socioeconomic risk factors, e.g., concentrated poverty, lack of access to healthcare, and poor-quality housing.

This raises an important and difficult question: To what extent is the spatial clustering of this kind of risk reflective of local spatial processes of transmission versus the higher-level factors that place people at risk-of-risk? In this talk I will draw on recent results from work conducted by myself and members of my research group to discuss the implications of these differing perspectives for statistical analysis of socio-spatial patterning of infection risk, suggest approaches to closing the gap between them, and highlight the types of data that might – and might not – help us get there.

 


Organizers/moderators (Penn State)

Le Bao
Ottar Bjornstad
Matthew Ferrari
Ephraim Hanks
Murali Haran