KDD2021 Tutorial: Advances in Mining Heterogeneous Healthcare Data

Time: 12 PM — 3 PM, Aug 14, 2021 (US East)

Description: Thanks to the explosion of heterogeneous healthcare data and advances in machine learning and data mining, particularly deep learning, we now have an opportunity to make a difference in healthcare. In this tutorial, we will present state-of-the-art deep learning methods and their real-world applications, focusing on the unique characteristics of different types of healthcare data. The first half will introduce recent advances in mining structured healthcare data, including computational phenotyping, early disease detection/risk prediction, and treatment recommendation. In the second half, we will focus on challenges specific to unstructured healthcare data and introduce advanced deep learning methods for automated ICD coding, understandable medical language translation, clinical trial mining, and medical report generation. This tutorial is intended for students, engineers, and researchers who are interested in applying deep learning methods to healthcare; only minimal prerequisite knowledge is assumed. The tutorial will conclude with open problems and a Q&A session.

Outline:

  • Introduction to Electronic Health Records (EHR)
    • Various types of EHR data
    • Different applications and challenges
  • Part I: Mining structured health data
    • Phenotyping
    • Disease detection/Risk prediction
    • Treatment recommendation
  • Part II: Mining unstructured health data
    • Automated ICD coding/Disease classification
    • Understandable medical language translation
    • Medical report generation
    • Clinical trial mining
  • Conclusion and Future Outlook

KDD21_tutorial_slides

Presenters:

Fenglong Ma is currently an Assistant Professor in the College of Information Sciences and Technology at the Pennsylvania State University (PSU). He received his Ph.D. from the Department of Computer Science and Engineering, University at Buffalo (UB) in 2019, and subsequently joined PSU. His research interests lie in data mining and machine learning, with an emphasis on mining health-related data, and also include natural language processing, social network mining, and security. He has published over 60 papers in top conferences and journals such as KDD, WWW, AAAI, IJCAI, ACL, CIKM, WSDM, ICDM, SDM, and TKDE. More information can be found at his website: http://personal.psu.edu/ffm5105/

Muchao Ye is a Ph.D. student at the College of Information Sciences and Technology, the Pennsylvania State University. His research interests are data mining and machine learning, especially topics related to temporal data such as electronic health records and videos. He has published research papers in top conferences such as KDD, WWW, ACM MM, and CIKM. Muchao received his B.S. in Information Engineering from South China University of Technology, China, in 2019.

Junyu Luo is currently a Ph.D. student at the College of Information Sciences and Technology, the Pennsylvania State University. He received his B.S. degree in Computer and Technology from Sichuan University, China, in 2020. His current research interests include data mining and machine learning. More specifically, he is interested in mining sequential medical data and in medical text-related topics. His research results have been published in KDD, WWW, and CIKM.

Cao Xiao is the senior director and head of data science and machine learning at Amplitude. Her research focuses on developing machine learning and deep learning models to solve real-world healthcare and business challenges. It has been published in leading AI conferences including KDD, NeurIPS, ICLR, AAAI, IJCAI, SDM, ICDM, and WWW, and in top health informatics and data mining journals such as Nature Scientific Reports, JAMIA, Bioinformatics, and TKDE. Prior to Amplitude, she was the director of machine learning in the Analytics Center of Excellence (ACOE) of IQVIA from 2019 to 2021. Before that, she was a research staff member leading AI for Healthcare research at IBM Research AI from 2017 to 2019, and served as a member of the IBM Global Technology Outlook Committee from 2018 to 2019. She received her Ph.D. from the University of Washington, Seattle, in 2016.

Jimeng Sun is the Health Innovation Professor in the Computer Science Department and the Carle Illinois College of Medicine at the University of Illinois Urbana-Champaign. Previously, he was at the College of Computing at Georgia Institute of Technology and was a research staff member at IBM TJ Watson Research Center. His research focuses on health analytics and data mining, especially in designing tensor factorizations, deep learning methods, and large-scale predictive modeling systems. He has published over 120 papers and filed over 20 patents (5 granted). He received the SDM/IBM Early Career Research Award in 2017, the ICDM Best Research Paper Award in 2008, the SDM Best Research Paper Award in 2007, and the KDD Dissertation Runner-up Award in 2008. Dr. Sun received his B.S. and M.Phil. in Computer Science from Hong Kong University of Science and Technology in 2002 and 2003, and his M.Sc. and Ph.D. in Computer Science from Carnegie Mellon University in 2006 and 2007.