CV – A. Zhang


Education

Ph.D. dual title in Statistics and Social Data Analytics, Penn State University, Expected: May 2021

B.S. in Civil and Environmental Engineering, Carnegie Mellon University, Dec 2012

Awards and Fellowships

Big Data Social Science – Integrated Graduate Education and Research Training (BDSS-IGERT) Fellow (2016-2018)

University Graduate Fellow (2015-2017)

Papers

Zhang A., Bao L., Daniels M. Approximate cross-validated mean estimates for Bayesian hierarchical regression models. (Submitted: arXiv code)

Li X., Al-Zaidy R., Zhang A., Bao L., Baral S., Giles C.L. Document Classification with Distant-Supervision for Systematic Review of HIV Prevalence Data among Female Sex Workers. (Submitted: arXiv)

Felmlee D., Rodis P. I., & Zhang A. (2020). Sexist slurs: reinforcing feminine stereotypes online. Sex Roles, 83(1), 16-28. (link)

Niu X., Zhang A., Brown T., Puckett R., Mahy M., Bao L. (2017). Incorporation of hierarchical structure into estimation and projection package fitting with examples of estimating subnational HIV/AIDS dynamics. AIDS, 31, S51-S59. (link)

Research Project Summary

Completed:

Approximate cross-validated mean estimates for Bayesian hierarchical regression

Novel method for cross-validated (CV) mean estimates, usable with any CV scheme
Order of magnitude faster than comparable methods
Accuracy improves upon comparable methods; often equivalent to manually re-fitting models
Theoretical results and comparisons over a variety of publicly available data sets

Cyberbullying on Twitter. IGERT Rotation, 2016-2017

Collected 2.9 million tweets containing sexist slurs
Classified sentiment towards a target term based on word distance
Analyzed negative stereotypes by comparing sentiment of tweets with key adjectives vs. those without
Joint work with Dr. Felmlee in Department of Sociology

Ongoing:

Quantifying the effect of data availability on model estimates.

Data imbalance can have a large impact on model estimates
Novel method that quantifies the effect of each data point on model estimates
Comparing data availability to effect on estimates contextualizes data imbalance

Professional Experience

Graduate Data Science Summer Program Fellow, National Institutes of Health, Summer 2019

Personalized medicine: Predicted drug sensitivity in 43 cell lines with multiple myeloma using 84,738 biomarkers for 8 drug compounds at 11 dosages
Compared dimension reduction methods to capture high-level relationships among biomarkers and transform high-dimensional feature space

Intern, Joint United Nations Programme on HIV and AIDS (UNAIDS), Summer 2018

Evaluated and developed new incidence rate ratios (IRRs) used in UNAIDS’ proprietary software to estimate annual HIV infections and deaths due to HIV
Results presented at UNAIDS Reference Group Meeting in September 2018

Technical Writer, MicroStrategy, 2013-2014

Translated developer-speak into client-facing documentation for APIs and GUIs

Identified primary needs for customers in the struggle to use

Technical Skills

Regression
Semiparametric models
Hierarchical Bayesian models
Statistical learning
R
Python
Stan

Contact

axzhang [at] psu [dot] edu