III: Small: Learning Latent Representations of Heterogeneous Information Networks, IIS-1717084, 8/1/2017 – 7/31/2020, $499,635, National Science Foundation.
Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
- Our work on Order Embedding has been accepted to SIGIR 2019.
- Our IA3’17 paper won the Best Artifact Award at the Seventh Workshop on Irregular Applications: Architectures and Algorithms. Congratulations!
- The HIN2Vec paper was published in CIKM 2017.
Feature engineering is an important preprocessing step in applying machine learning algorithms to data mining and knowledge discovery projects. Recent developments in representation learning have shed light on how to reduce the dependence of feature engineering on human knowledge and labor. Meanwhile, heterogeneous information networks have recently been proposed to model heterogeneous types of network entities and their relations in support of network data analysis and mining. In this project, new representation learning methods are proposed to learn representations (embeddings) of nodes and relations that capture rich, meaningful, and discriminative feature information in heterogeneous information networks. These node and relation representations are to be learned efficiently and automatically, and to be general-purpose, effective, and meaningful so that they can be reused in a wide variety of network data analysis and mining applications.
The overall goal of the project is to develop new neural network frameworks to learn representations of heterogeneous information networks. Specifically, the research objectives of this project are three-fold: 1) Leverage the information embedded in the network structures of heterogeneous information networks to learn representations of latent features for nodes and relations in the network. Multiple relationships, specified by meta-paths in the network, are targeted to jointly learn the representations. Novel techniques are developed to address scalability issues in learning, and new sample extraction schemes are proposed to automatically prepare data samples for representation learning. 2) Address model design and learning issues arising in heterogeneous information networks that grow over time, e.g., citation networks. The time lag in citation information networks is incorporated into representation learning. New neural network architectures and new sample data extraction schemes are devised. 3) Integrate both content and network structures in representation learning of heterogeneous information networks. Through a study of heterogeneous information networks of scientific publications, words, citations, and the implications of knowledge flow via citations are exploited to coherently learn representations of papers and citations. New neural network architectures and sample data extraction schemes are devised. As described in the objectives, this project develops a testbed consisting of new neural network frameworks for representation learning on heterogeneous information networks. Rigorous testing and comprehensive evaluation are performed on the developed models, techniques, and software, which are made available as research resources to the data mining and representation learning communities.
While these algorithms are designed for the learning frameworks in this project, the techniques may be generally applicable to other neural network architectures, potentially advancing research in data mining and machine learning.
In this project, a new neural network framework, HIN2Vec [CIKM’17], which exploits the network structures of heterogeneous information networks (HINs), has been developed and implemented. HIN2Vec is designed to capture the rich semantics embedded in HINs by jointly predicting different types of relationships among nodes (specified as metapaths) to learn embeddings (latent vectors) of nodes and metapaths in the HIN. Similar to Word2Vec, the network representation learning framework HIN2Vec adopts a shallow learning paradigm that trains neural network models based on negative sampling and stochastic gradient descent. For performance optimization, an initial research effort has been made to improve the efficiency of Word2Vec on a multicore system using the One Billion Word benchmark dataset. We propose a new optimization technique called context combining, which simultaneously processes multiple contexts by reusing positive and negative samples [IA3’17], and which will be extended to HIN2Vec.
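The training paradigm described above (shallow embeddings, negative sampling, stochastic gradient descent) can be illustrated with a minimal sketch. This is not the released HIN2Vec code. The scoring function (an element-wise product of two node vectors gated by the sigmoid of a relation vector), the toy network, the uniform negative-sampling rule, and all names and hyperparameters below are assumptions made only for illustration.

```python
import math
import random

# Hypothetical toy HIN: authors (a*), papers (p*), venues (v*).
NODES = ["a1", "a2", "p1", "p2", "v1", "v2"]
RELATIONS = ["A-P", "P-V"]  # metapath labels, assumed for this sketch
POSITIVES = [("a1", "p1", "A-P"), ("a2", "p2", "A-P"),
             ("p1", "v1", "P-V"), ("p2", "v2", "P-V")]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(node_vec, rel_vec, x, y, r):
    """Predicted probability that nodes x and y are connected by relation r."""
    total = sum(a * b * sigmoid(c)
                for a, b, c in zip(node_vec[x], node_vec[y], rel_vec[r]))
    return sigmoid(total)


def sgd_step(node_vec, rel_vec, x, y, r, label, lr):
    """One stochastic gradient step on the log loss for a single triple."""
    vx, vy, vr = node_vec[x], node_vec[y], rel_vec[r]
    gate = [sigmoid(c) for c in vr]
    pred = sigmoid(sum(a * b * g for a, b, g in zip(vx, vy, gate)))
    err = pred - label  # derivative of the log loss w.r.t. the raw score
    for i in range(len(vx)):
        a, b, g = vx[i], vy[i], gate[i]
        vx[i] -= lr * err * b * g
        vy[i] -= lr * err * a * g
        vr[i] -= lr * err * a * b * g * (1.0 - g)


def train(positives, nodes, relations, dim=8, negatives=2,
          epochs=300, lr=0.2, seed=0):
    rng = random.Random(seed)
    node_vec = {n: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for n in nodes}
    rel_vec = {r: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for r in relations}
    known = set(positives)
    for _ in range(epochs):
        for x, y, r in positives:
            sgd_step(node_vec, rel_vec, x, y, r, 1.0, lr)
            # Negative samples: swap y for a random node (assumed sampling rule).
            for _ in range(negatives):
                y_neg = rng.choice(nodes)
                if y_neg == x or (x, y_neg, r) in known:
                    continue  # skip self-pairs and accidental positives
                sgd_step(node_vec, rel_vec, x, y_neg, r, 0.0, lr)
    return node_vec, rel_vec


if __name__ == "__main__":
    node_vec, rel_vec = train(POSITIVES, NODES, RELATIONS)
    print("observed pair  :", score(node_vec, rel_vec, "a1", "p1", "A-P"))
    print("unobserved pair:", score(node_vec, rel_vec, "a1", "p2", "A-P"))
```

In this sketch each observed (node, node, metapath) triple is a positive example, and each sampled replacement is a negative one, so node and relation vectors are updated jointly by the same loss.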
- Tao-Yang Fu
- Fang He
- Mengxiang Wang
- William Wang
- Yusan Lin
- Chiang, M., Lim, E., Lee, W., Ashok, X., Prasetyo, P. (2019). One-Class Order Embedding Learning for Dependency Relation Prediction, Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2019, July 21-25, 2019, Paris, France. Association for Computing Machinery, to appear.
- Luo, Z., Cai, S., Chen, G., Gao, J., Lee, W., Ngiam, K., Zhang, M. (2019). Improving Data Analytics with Fast and Adaptive Regularization, IEEE Transactions on Knowledge and Data Engineering (TKDE), to appear.
- Chiang, M., Lim, E., Lee, W., Hoang, T. (2018). Inferring Trip Occupancies in the Rise of Ride-Hailing Services. Proceedings of the ACM 2018 International Conference on Information and Knowledge Management, CIKM 2018. Association for Computing Machinery. pp. 2097-2105.
- Luo, Z., Cai, S., Gao, J., Zhang, M., Ngiam, K.Y., Chen, G., Lee, W. (2018). Adaptive Lightweight Regularization Tool for Complex Analytics. International Conference on Data Engineering, ICDE 2018, Paris, France. IEEE.
- Lin, Y.-S., Yin, P., & Lee, W. (2018). Modeling Dynamic Competition on Crowdfunding Markets. Proceedings of the International Conference on World Wide Web, WWW 2018, Lyon, France. ACM.
- Fu, T.-Y., Lee, W., Lei, Z. (2017). HIN2Vec: Explore Meta-paths in Heterogeneous Information Networks for Representation Learning. Proceedings of the ACM 2017 International Conference on Information and Knowledge Management, CIKM 2017. Association for Computing Machinery. Acceptance rate: 171/820=21%
- Rengasamy, V., Fu, T.-Y., Lee, W., Madduri, K. (2017). Optimizing Word2Vec Performance on Multicore Systems. Proceedings of the Seventh Workshop on Irregular Applications: Architectures and Algorithms, IA3 2017. Association for Computing Machinery. (The Best Artifact Award)
- The code for HIN2Vec [CIKM 2017] is available on GitHub at https://github.com/csiesheep/hin2vec.
- The code for [IA3 2017] is available on GitHub at https://vasupsu.github.io/publication/psgnscc-ia3.