III:Small: Learning Latent Representations of Heterogeneous Information Networks, IIS-1717084, 8/1/2017 – 7/31/2020, $499,635, National Science Foundation.
Sponsors
Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Latest News
- Our work on Learning to Route by GAN (i.e., ProgRPGAN) is accepted to KDD 2021.
- Our work on Future Citation Forecasting (i.e., CINES) is accepted to SIGIR 2021.
- Our work on Road Network Representation Learning (i.e., RNRL) is published in ACM TIST.
- The software code for WWM [DSAA 2020] is released and made available via GitHub.
- Our work on Economic Worth-Aware Word Embeddings (i.e., WWM) is accepted to DSAA 2020.
- Our work on Co-Ordered Network Embeddings (i.e., CO2Vec) is accepted to DSAA 2020.
- Our work on Network Intervention is accepted to WWW 2020.
- The software code for DeepIST [CIKM 2019] is released and made available via GitHub.
- Our work on Trajectory Representation Learning (i.e., Trembr) is published in ACM TIST.
- Our work on Intersection Representation Learning (i.e., IRN2Vec) is accepted to SIGSPATIAL 2019.
- Our work on DeepIST is accepted to CIKM 2019.
- Our work on Order Embedding is accepted to SIGIR 2019.
- Our IA3’17 paper won the Best Artifact Award at the Seventh Workshop on Irregular Applications: Architectures and Algorithms. Congratulations!
- The HIN2Vec paper is published in CIKM 2017.
Project Overview
Feature engineering is an important preprocessing step in applying machine learning algorithms to data mining and knowledge discovery projects. Recent developments in representation learning have shed light on how to reduce the dependence of feature engineering on human knowledge and labor. Meanwhile, heterogeneous information networks have recently been proposed to model heterogeneous types of network entities and their relations in support of network data analysis and mining. In this project, new representation learning methods are proposed to learn representations (embeddings) of nodes and relations that capture rich, meaningful, and discriminative feature information in heterogeneous information networks. These node and relation representations are to be learned efficiently and automatically, and to be general-purpose, effective, and meaningful so that they can be reused in a wide variety of network data analysis and mining applications.
The overall goal of the project is to develop new neural network frameworks to learn representations of heterogeneous information networks. Specifically, the research objectives of this project are three-fold:
1) Leverage information embedded in the network structures of heterogeneous information networks to learn representations of latent features for nodes and relations in the network. Multiple relationships, specified by meta-paths in the network, are targeted to jointly learn the representations. Novel techniques are developed to address the scalability issues in learning, and new sample extraction schemes are proposed to automatically prepare data samples for representation learning.
2) Address model design and learning issues arising in heterogeneous information networks that grow with time, e.g., citation networks. The time lag in citation information networks is incorporated in representation learning. New neural network architectures and new sample data extraction schemes are devised.
3) Integrate both content and network structures in representation learning of heterogeneous information networks. Through a study of heterogeneous information networks of scientific publications, words, citations, and the knowledge flow implied by citations are exploited to coherently learn representations of papers and citations. New neural network architectures and sample data extraction schemes are devised.
As described in the objectives, this project develops a testbed consisting of new neural network frameworks for representation learning on heterogeneous information networks. Rigorous testing and comprehensive evaluation are performed on the developed models, techniques, and software, which are made available as research resources to the data mining and representation learning communities.
While these algorithms are designed for the learning frameworks in this project, the techniques may be generally applicable to other neural network architectures, potentially advancing research in data mining and machine learning.
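The objectives above mention relationships specified by meta-paths and schemes that automatically extract data samples. As a rough illustration only, the following Python sketch collects (node, node, meta-path) samples from random walks over a small typed graph. Representing a meta-path as a tuple of edge-type labels, the random-walk strategy, the function name `extract_samples`, and the parameters `walk_length`, `window`, and `walks_per_node` are all assumptions made for this example; they are not taken from the project's software.

```python
import random


def extract_samples(edges, walk_length=5, window=2, walks_per_node=2, seed=0):
    """Collect (x, y, meta_path) samples from random walks (hypothetical helper).

    edges: list of (source, edge_type, target) triples describing a typed graph.
    A meta-path is represented here as the tuple of edge types between x and y.
    """
    rng = random.Random(seed)
    adjacency = {}
    for src, etype, dst in edges:
        adjacency.setdefault(src, []).append((etype, dst))

    samples = set()
    for start in sorted(adjacency):
        for _ in range(walks_per_node):
            nodes, etypes = [start], []
            # Walk until the length limit or a node without outgoing edges.
            while len(etypes) < walk_length and nodes[-1] in adjacency:
                etype, nxt = rng.choice(adjacency[nodes[-1]])
                etypes.append(etype)
                nodes.append(nxt)
            # Pair each node with the nodes at most `window` hops ahead of it.
            for i in range(len(nodes)):
                last = min(i + window, len(nodes) - 1)
                for j in range(i + 1, last + 1):
                    samples.add((nodes[i], nodes[j], tuple(etypes[i:j])))
    return samples


# Toy bibliographic network: authors write papers, papers cite papers.
toy_edges = [
    ("a1", "writes", "p1"), ("p1", "written_by", "a1"),
    ("a2", "writes", "p2"), ("p2", "written_by", "a2"),
    ("p2", "cites", "p1"),
]
for x, y, path in sorted(extract_samples(toy_edges)):
    print(x, y, "-".join(path))
```

Each printed row is one candidate training sample: two nodes and the sequence of edge types that connects them along a walk.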
In this project, a new neural network framework, HIN2Vec [CIKM’17], that exploits the network structures of heterogeneous information networks (HINs) has been developed and implemented. HIN2Vec is designed to capture the rich semantics embedded in HINs by jointly predicting different types of relationships among nodes (specified as metapaths) to learn embeddings (latent vectors) of nodes and metapaths in the HIN. Similar to Word2Vec, the network representation learning framework HIN2Vec adopts a shallow learning paradigm that trains neural network models based on negative sampling and stochastic gradient descent. For performance optimization, an initial research effort has been made on improving the efficiency of Word2Vec on a multicore system using the One Billion Word benchmark dataset. We propose a new optimization technique called context combining that simultaneously processes multiple contexts by reusing positive and negative samples [IA3’17], which will be extended for HIN2Vec.
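The training recipe described above (predict whether a relationship holds between two nodes, then train with negative sampling and stochastic gradient descent) can be illustrated with a toy sketch. This is not the released HIN2Vec code. The scoring function (a logistic over the element-wise product of two node vectors and a sigmoid-gated metapath vector), the negative sampler that swaps in a candidate tail node, the hyperparameters, and all names such as `train`, `score`, and `tail_candidates` are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(node_emb, rel_emb, x, y, r):
    # Estimated probability that metapath r relates nodes x and y (assumed form).
    gate = [sigmoid(c) for c in rel_emb[r]]
    return sigmoid(sum(a * b * g for a, b, g in zip(node_emb[x], node_emb[y], gate)))


def sgd_step(node_emb, rel_emb, x, y, r, label, lr):
    # One stochastic gradient step on the log loss of a single sample.
    xv, yv, rv = node_emb[x], node_emb[y], rel_emb[r]
    err = score(node_emb, rel_emb, x, y, r) - label
    for i in range(len(xv)):
        g = sigmoid(rv[i])
        grad_x = err * yv[i] * g
        grad_y = err * xv[i] * g
        grad_r = err * xv[i] * yv[i] * g * (1.0 - g)
        xv[i] -= lr * grad_x
        yv[i] -= lr * grad_y
        rv[i] -= lr * grad_r


def train(samples, tail_candidates, dim=8, negatives=2, epochs=300, lr=0.2, seed=0):
    # samples: list of observed (x, y, r) triples, treated as positives.
    rng = random.Random(seed)
    nodes = sorted({n for x, y, _ in samples for n in (x, y)} | set(tail_candidates))
    rels = sorted({r for _, _, r in samples})
    node_emb = {n: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for n in nodes}
    rel_emb = {r: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for r in rels}
    positives = set(samples)
    for _ in range(epochs):
        for x, y, r in samples:
            sgd_step(node_emb, rel_emb, x, y, r, 1.0, lr)
            # Negative sampling: replace y with a random candidate tail node.
            for _ in range(negatives):
                y_neg = rng.choice(tail_candidates)
                if (x, y_neg, r) not in positives:
                    sgd_step(node_emb, rel_emb, x, y_neg, r, 0.0, lr)
    return node_emb, rel_emb


toy_samples = [("a1", "p1", "writes"), ("a2", "p2", "writes")]
node_emb, rel_emb = train(toy_samples, tail_candidates=["p1", "p2"])
print(score(node_emb, rel_emb, "a1", "p1", "writes"))  # observed pair
print(score(node_emb, rel_emb, "a1", "p2", "writes"))  # unobserved pair
```

After training on the two observed pairs, the observed pair should receive a higher score than the unobserved one. The learned node vectors in `node_emb` are what a downstream task would reuse as features.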
Faculty Members
Student Members
- Jiaming Chai
- Fang He
- Hui-Ju Hung
- Mengxiang Wang
- William Wang
Graduated Members
- Tao-Yang Fu (2020)
- Yusan Lin (2018)
Related Publications
- Fu, T.-Y., & Lee, W. (2021). ProgRPGAN: Progressive GAN for Route Planning. The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2021), Virtual Event, Singapore, August 14-18, 2021. (pp. 11). ACM. DOI: https://doi.org/10.1145/3447548.3467406
- He, F., Fu, T.-Y., Lee, W., & Lei, Z. (2021). CINES: Explore Citation Network and Event Sequences for Citation Forecasting. The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2021), Virtual Event, Canada, July 11-15, 2021. (pp. 10). ACM. DOI: https://doi.org/10.1145/3404835.3462903
- Teng, Y.-W., Shi, Y., Tai, C.-H., Yang, D.-N., Lee, W., & Chen, M.-S. (2021). Influence Maximization Based on Dynamic Personal Perception in Knowledge Graph. Proceedings of the 37th IEEE International Conference on Data Engineering, ICDE 2021, Chania, Greece, April 19-22, 2021. (pp. 12). Institute of Electrical and Electronics Engineers.
- Luo, Z., Cai, S., Chen, G., Gao, J., Lee, W., Ngiam, K. Y., & Zhang, M. (2021). Improving Data Analytics with Fast and Adaptive Regularization. IEEE Transactions on Knowledge and Data Engineering (TKDE), 33(2), 18. DOI: 10.1109/TKDE.2019.2916683
- Wang, M., Fu, T.-Y., Lee, W., & Yu, G. (2021). On Representation Learning for Road Networks. ACM Transactions on Intelligent Systems and Technology (TIST), 12(1), 27. DOI: https://doi.org/10.1145/3424346
- Lai, H.-C., Tsai, J.-Y., Shuai, H.-H., Huang, J.-L., Lee, W., & Yang, D.-N. (2020). Live Multi-Streaming and Donation Recommendations via Coupled Donation-Response Tensor Factorization. Proceedings of the 29th ACM International Conference on Information and Knowledge Management, CIKM 2020, Ireland, Oct. 19-23, 2020. (pp. 10). USA: ACM. DOI: https://doi.org/10.1145/3340531.3411925
- Chiang, M.-F., Lim, E.-P., Lee, W., & Prasetyo, P. (2020). CO2Vec: Embeddings of Co-Ordered Networks Based on Mutual Reinforcement. Proceedings of the 7th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2020, Sydney, Australia, October 6-9, 2020. (pp. 10). IEEE. DOI: 10.1109/DSAA49011.2020.00027, ISBN/ISSN: 978-1-7281-8206-3
- Lin, Y., Yin, P., & Lee, W. (2020). Economic Worth-Aware Word Embeddings. Proceedings of the 7th IEEE International Conference on Data Science and Advanced Analytics, DSAA 2020, Sydney, Australia, October 6-9, 2020. (pp. 10). IEEE. DOI: 10.1109/DSAA49011.2020.00048, ISBN/ISSN: 978-1-7281-8206-3
- Ko, S.-H., Lai, H.-C., Shuai, H.-H., Yang, D.-N., Lee, W., & Yu, P. S. (2020). Optimizing Item and Subgroup Configurations for Social-Aware VR Shopping. Proceedings of the VLDB Endowment, the 46th International Conference on Very Large Data Bases, VLDB 2020, Tokyo, Japan. ACM, pp. 1275-1289.
- Ma, Q., Gu, Y., Lee, W., Ge, Y., Liu, H., & Wu, X. (2020). REMIAN: Real-Time and Error-Tolerant Missing Value Imputation. ACM Transactions on Knowledge Discovery from Data 14(6), 38. DOI: https://doi.org/10.1145/3412364
- Ma, Q., Lee, W., Fu, T.-Y., Gu, Y., & Ge, Y. (2020). MIDIA: Exploring Denoising Autoencoders for Missing Data Imputation. Data Mining and Knowledge Discovery 34(6), 19. DOI: https://doi.org/10.1007/s10618-020-00706-8
- Hung, H.-J., Lee, W., Yang, D.-N., Shen, C.-Y., Lei, Z., & Chow, S.-M. (2020). Efficient Algorithms towards Network Intervention. Proceedings of the International Conference on World Wide Web, WWW 2020, Taipei, Taiwan. ACM. Acceptance rate: 19%
- Fu, T.-Y., & Lee, W. (2020). Trembr: Exploring Road Networks for Trajectory Representation Learning. ACM Transactions on Intelligent Systems and Technology (TIST), 11(1). ACM.
- Chen, Y.-L., Yang, D.-N., Shen, C.-Y., Lee, W., Chen, M.-S. (2019). On Efficient Processing of Group and Subsequent Queries for Social Activity Planning. IEEE Transactions on Knowledge and Data Engineering (TKDE), 31(12).
- Wang, M., Lee, W., Fu, T.-Y., Yu, G. (2019). Learning Embeddings of Intersections on Road Networks. Proceedings of the 27th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, SIGSPATIAL 2019, Chicago, IL, USA, November 5-8, 2019. ACM. pp. 309-318.
- Fu, T.-Y., Lee, W. (2019). DeepIST: Deep Image-based Spatio-Temporal Network for Travel Time Estimation. Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM 2019, Beijing, China, November 3-7, 2019. ACM. pp. 69-78.
- Chiang, M., Lim, E., Lee, W., Ashok, X., Prasetyo, P. (2019). One-Class Order Embedding Learning for Dependency Relation Prediction, Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019, July 21-25, 2019, Paris, France. Association for Computing Machinery, pp. 205-214.
- Chiang, M., Lim, E., Lee, W., Hoang, T. (2018). Inferring Trip Occupancies in the Rise of Ride-Hailing Services. Proceedings of the ACM 2018 International Conference on Information and Knowledge Management, CIKM 2018. Association for Computing Machinery. pp. 2097-2105.
- Shuai, H.-H., Shen, C.-Y., Yang, D.-N., Lan, Y.-F., Lee, W., Yu, P. S., & Chen, M.-S. (2018). A Comprehensive Study on Social Network Mental Disorders Detection via Online Social Media Mining. IEEE Transactions on Knowledge and Data Engineering (TKDE), 30(7).
- Gao, J., Ooi, B.C., Shen, Y., & Lee, W. (2018). Cuckoo Feature Hashing: Dynamic Weight Sharing for Sparse Analytics. Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, IJCAI 2018, July 13-19, 2018, Stockholm, Sweden. The AAAI Press. Acceptance rate: 20.5%
- Luo, Z., Cai, S., Gao, J., Zhang, M., Ngiam, K.Y., Chen, G., Lee, W. (2018). Adaptive Lightweight Regularization Tool for Complex Analytics. International Conference on Data Engineering, ICDE 2018, Paris, France. IEEE Computer Society. (Best Paper Award – Runner Up)
- Lin, Y.-S., Yin, P., & Lee, W. (2018). Modeling Dynamic Competition on Crowdfunding Markets. Proceedings of the International Conference on World Wide Web, WWW 2018, Lyon, France. ACM.
- Lin, Y.-S., Yin, P., & Lee, W. (2017). Modeling Menu Bundle Designs of Crowdfunding Projects. Proceedings of the ACM 2017 International Conference on Information and Knowledge Management, CIKM 2017. Association for Computing Machinery. Acceptance rate: 171/820=21%
- Fu, T.-Y., Lee, W., Lei, Z. (2017). HIN2Vec: Explore Meta-paths in Heterogeneous Information Networks for Representation Learning. Proceedings of the ACM 2017 International Conference on Information and Knowledge Management, CIKM 2017. Association for Computing Machinery. Acceptance rate: 171/820=21%
- Rengasamy, V., Fu, T.-Y., Lee, W., Madduri, K. (2017). Optimizing Word2Vec Performance on Multicore Systems. Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms, IA3 2017. Association for Computing Machinery. (The Best Artifact Award)
Released Software
- The code for WWM [DSAA 2020] is available on GitHub at https://github.com/yusanlin/word-worth-embedding.
- The code for DeepIST [CIKM 2019] is available on GitHub at https://github.com/csiesheep/deepist.
- The code for HIN2Vec [CIKM 2017] is available on GitHub at https://github.com/csiesheep/hin2vec.
- The code for [IA3 2017] is available on GitHub at https://vasupsu.github.io/publication/psgnscc-ia3.