Education
Ph.D. dual title in Statistics and Social Data Analytics, Penn State University, Expected: May 2021
B.S. in Civil and Environmental Engineering, Carnegie Mellon University, Dec 2012
Awards and Fellowships
Big Data Social Science – Integrated Graduate Education and Research Training (BDSS-IGERT) Fellow (2016-2018)
University Graduate Fellow (2015-2017)
Papers
Zhang A., Bao L., Daniels M. Approximate cross-validated mean estimates for Bayesian hierarchical regression models. (Submitted: arXiv code)
Li X., Al-Zaidy R., Zhang A., Bao L., Baral S., Giles C.L. Document Classification with Distant-Supervision for Systematic Review of HIV Prevalence Data among Female Sex Workers. (Submitted: arXiv)
Felmlee D., Rodis P. I., & Zhang A. (2020). Sexist slurs: reinforcing feminine stereotypes online. Sex Roles, 83(1), 16-28. (link)
Niu X., Zhang A., Brown T., Puckett R., Mahy M., Bao L. (2017). Incorporation of hierarchical structure into estimation and projection package fitting with examples of estimating subnational HIV/AIDS dynamics. AIDS, 31, S51-S59. (link)
Research Project Summary
Completed:
Approximate cross-validated mean estimates for Bayesian hierarchical regression
- Novel method for cross-validated (CV) mean estimates, usable with any CV scheme
- Order of magnitude faster than comparable methods
- Accuracy improves upon comparable methods; often equivalent to manually re-fitting models
- Theoretical results and comparisons over a variety of publicly available data sets
Cyberbullying on Twitter. IGERT Rotation, 2016-2017
- Collected 2.9 million tweets containing sexist slurs
- Classified sentiment towards a target term based on word distance
- Analyzed negative stereotypes by comparing the sentiment of tweets with key adjectives vs. those without
- Joint work with Dr. Felmlee in the Department of Sociology
Ongoing:
Quantifying the effect of data availability on model estimates
- Data imbalance can have a large impact on model estimates
- Novel method that quantifies the effect of each data point on model estimates
- Comparing data availability to effect on estimates contextualizes data imbalance
Professional Experience
Graduate Data Science Summer Program Fellow, National Institutes of Health, Summer 2019
- Personalized medicine: Predicted drug sensitivity in 43 multiple myeloma cell lines using 84,738 biomarkers for 8 drug compounds at 11 dosages
- Compared dimension reduction methods to capture high-level relationships among biomarkers and transform high-dimensional feature space
Intern, Joint United Nations Programme on HIV/AIDS (UNAIDS), Summer 2018
- Evaluated and developed new incidence rate ratios (IRRs) used in UNAIDS’ proprietary software to estimate annual HIV infections and deaths due to HIV
- Results presented at the UNAIDS Reference Group Meeting in September 2018
Technical Writer, MicroStrategy, 2013-2014
- Translated developer-speak into client-facing documentation for APIs and GUIs
- Identified primary needs for customers in the struggle to use
Technical Skills
- Regression
- Semiparametric models
- Hierarchical Bayesian models
- Statistical learning
- R
- Python
- Stan
Contact
axzhang [at] psu [dot] edu