CV

 

Education


Dual-title Ph.D. in Statistics and Social Data Analytics, Penn State University, Expected: May 2021

B.S. in Civil and Environmental Engineering, Carnegie Mellon University, Dec 2012

 

 

Awards and Fellowships


 Big Data Social Science – Integrated Graduate Education and Research Training (BDSS-IGERT) Fellow (2016-2018)

University Graduate Fellow (2015-2017)

 

Papers


Zhang A., Bao L., Daniels M. Approximate cross-validated mean estimates for Bayesian hierarchical regression models. (Submitted: arXiv, code)

 

Li X., Al-Zaidy R., Zhang A., Bao L., Baral S., Giles C.L. Document Classification with Distant-Supervision for Systematic Review of HIV Prevalence Data among Female Sex Workers. (Submitted: arXiv)

Felmlee D., Rodis P. I., & Zhang A. (2020). Sexist slurs: reinforcing feminine stereotypes online. Sex Roles, 83(1), 16-28. (link)

 

Niu X., Zhang A., Brown T., Puckett R., Mahy M., Bao L. (2017). Incorporation of hierarchical structure into estimation and projection package fitting with examples of estimating subnational HIV/AIDS dynamics. AIDS, 31, S51-S59. (link)

 

 

Research Project Summary


Completed:

 

Approximate cross-validated mean estimates for Bayesian hierarchical regression

  • Novel method for cross-validated (CV) mean estimates, usable with any CV scheme
  • Order of magnitude faster than comparable methods
  • Accuracy improves upon comparable methods; often equivalent to manually re-fitting models
  • Theoretical results and comparisons over a variety of publicly available data sets

 

Cyberbullying on Twitter. IGERT Rotation, 2016-2017

  • Collected 2.9 million tweets containing sexist slurs
  • Classified sentiment towards a target term based on word distance
  • Analyzed negative stereotypes by comparing the sentiment of tweets with key adjectives vs. those without
  • Joint work with Dr. Felmlee in the Department of Sociology

 

Ongoing:

 

Quantifying the effect of data availability on model estimates.

  • Data imbalance can have a large impact on model estimates
  • Novel method that quantifies the effect of each data point on model estimates
  • Comparing data availability to effect on estimates contextualizes data imbalance

 

 

Professional Experience


Graduate Data Science Summer Program Fellow, National Institutes of Health, Summer 2019

  • Personalized medicine: Predicted drug sensitivity in 43 cell lines with multiple myeloma using 84,738 biomarkers for 8 drug compounds at 11 dosages
  • Compared dimension reduction methods to capture high-level relationships among biomarkers and transform high-dimensional feature space

Intern, Joint United Nations Programme on HIV and AIDS (UNAIDS), Summer 2018

  • Evaluated and developed new incidence rate ratios (IRRs) used in UNAIDS’ proprietary software to estimate annual HIV infections and deaths due to HIV
  • Presented results at the UNAIDS Reference Group Meeting in September 2018

 

Technical Writer, MicroStrategy, 2013-2014

  • Translated developer-speak into client-facing documentation for APIs and GUIs

  • Identified primary needs for customers in the struggle to use

 

Technical Skills


  • Regression
  • Semiparametric models
  • Hierarchical Bayesian models
  • Statistical learning
  • R
  • Python
  • Stan

 

 

Contact


axzhang [at] psu [dot] edu