WSDM20 Tutorial: Learning with Small Data


Time: 2:00 PM – 5:00 PM Monday 3 February 2020
Room: Westheimer


In the era of big data, it is easy to collect huge amounts of image and text data. However, in some domains, such as healthcare and urban computing, we frequently face real-world problems for which only a small amount of (labeled) data is available. The challenge is how to make machine learning algorithms still work well with small data. To address this challenge, this tutorial covers state-of-the-art machine learning techniques for handling the small-data issue. In particular, we focus on the following three aspects: (1) providing a comprehensive review of recent advances in exploring the power of knowledge transfer, with a particular focus on meta-learning; (2) introducing cutting-edge techniques for incorporating human/expert knowledge into machine learning models; and (3) identifying the open challenges in data augmentation techniques, such as generative adversarial networks.


    • Introduction
    • Transfer knowledge from models
        • Transfer learning
        • Multi-task learning
        • Meta-learning
        • Applications
    • Transfer knowledge from domain experts
        • Enrich representations using knowledge graphs
        • Regularizing the loss function by incorporating domain knowledge
    • Data augmentation
        • Augmentation using labeled data
        • Augmentation using unlabeled data



Zhenhui Li is a tenured associate professor of Information Sciences and Technology at the Pennsylvania State University. She is the Haile family early career endowed professor. Prior to joining Penn State, she received her PhD degree in Computer Science from the University of Illinois at Urbana-Champaign in 2012, where she was a member of the data mining research group. Her research focuses on mining spatial-temporal data, with applications in transportation, ecology, environment, social science, and urban computing. She is a passionate interdisciplinary researcher and actively collaborates with cross-domain researchers. She has served on the organizing committee or senior program committee of many conferences, including KDD, ICDM, SDM, CIKM, and SIGSPATIAL. She has regularly offered classes on data organizing and data mining since 2012, and her classes have consistently received high student ratings. She has received the NSF CAREER award, the junior faculty excellence in research award, and the George J. McMurtry junior faculty excellence in teaching and learning award.


Huaxiu Yao is currently a Ph.D. candidate in the College of Information Sciences and Technology at the Pennsylvania State University. He received his B.Eng. degree from the University of Electronic Science and Technology of China. His research focuses on improving the generalizability of machine learning algorithms via knowledge transfer, with applications in social good. He has published over 10 papers in top conferences and journals such as ICML, ICLR, KDD, AAAI, WWW, and TIST. He has served as a program committee member for major machine learning and data mining conferences such as ICML, ICLR, KDD, AAAI, and IJCAI.


Fenglong Ma is currently an assistant professor in the College of Information Sciences and Technology at the Pennsylvania State University. He received his PhD degree from the Department of Computer Science and Engineering at the State University of New York at Buffalo in 2019, and subsequently joined the Pennsylvania State University. His research interests lie in data mining and machine learning, with an emphasis on mining health-related data. His research interests also include crowdsourcing, the Internet of Things, social network mining, and security. He has published over 40 papers in top conferences and journals such as KDD, WWW, CIKM, WSDM, ICDM, SDM, ACL, IJCAI, MobiCom, INFOCOM, and TKDE.