TEAGS: time-aware text embedding approach to generate subgraphs

Hosseini, Saeid; Najafipour, Saeed; Cheung, Ngai-Man; Yin, Hongzhi; Kangavari, Mohammad Reza; Zhou, Xiaofang

doi:10.1007/s10618-020-00688-7

Published: 03 June 2020

Volume 34, pages 1136–1174, (2020)

Data Mining and Knowledge Discovery

Saeid Hosseini ORCID: orcid.org/0000-0003-1956-6373¹,
Saeed Najafipour²,
Ngai-Man Cheung³,
Hongzhi Yin⁴,
Mohammad Reza Kangavari² &
Xiaofang Zhou⁴

Abstract

Contagions (e.g. viruses and gossip) spread over the nodes in propagation graphs. We can use the temporal-textual contents of nodes to compute the edge weights and generate subgraphs with highly relevant nodes, which benefits many applications. Yet challenges abound. First, the propagation pattern between each pair of nodes may change over time. Second, the same contagion does not always propagate. Hence, current text mining approaches, including topic modeling, cannot effectively compute the edge weights. Third, since propagation is affected by time, the word–word co-occurrence patterns may differ across temporal dimensions, which adversely impacts the performance of word embedding approaches. We argue that multi-aspect temporal dimensions (hour, day, etc.) should be considered to better calculate the correlation weights between the nodes. In this work, we devise a novel framework that, on the one hand, integrates a time-aware word embedding component to construct the word vectors through multiple temporal facets and, on the other hand, uses a time-only multi-facet generative model to compute the weights. Subsequently, we propose a Max-Heap Graph cutting algorithm to generate subgraphs. We validate our model through experiments on real-world datasets. The results show that our model generates subgraphs more effectively than its rivals and that temporal dynamics must be taken into account when modeling dynamical processes.
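
The abstract names a Max-Heap Graph cutting algorithm but does not describe it here. As a rough illustration only, and not the authors' method, the sketch below shows one way a max-heap over edge weights could separate a weighted graph into subgraphs: edges are popped from heaviest to lightest, nodes joined by edges at or above a cutoff are merged, and all lighter edges are cut. The function name `heap_cut_subgraphs`, the `min_weight` cutoff, and the toy weights are assumptions made for this example.

```python
import heapq
from collections import defaultdict


def heap_cut_subgraphs(edges, min_weight):
    """Group nodes into subgraphs, keeping only edges with weight >= min_weight.

    edges: list of (u, v, weight) tuples (hypothetical input format).
    min_weight: hypothetical cutoff below which an edge is cut.
    """
    edges = list(edges)
    # heapq is a min-heap, so negate weights to pop the heaviest edge first.
    heap = [(-w, u, v) for u, v, w in edges]
    heapq.heapify(heap)

    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Register every node so isolated nodes become singleton subgraphs.
    for u, v, _ in edges:
        find(u)
        find(v)

    while heap:
        neg_w, u, v = heapq.heappop(heap)
        if -neg_w < min_weight:
            break  # every remaining edge is lighter, so all of them are cut
        parent[find(u)] = find(v)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(g) for g in groups.values())


# Toy edge weights, invented for illustration.
toy_edges = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "d", 0.1), ("d", "e", 0.7)]
subgraphs = heap_cut_subgraphs(toy_edges, min_weight=0.5)
print(subgraphs)
```

In this toy run the weak `c`-`d` edge is the only one cut, so the nodes fall into two groups. In the paper's setting the edge weights would come from the time-aware embedding and generative components, which this sketch does not attempt to reproduce.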

Notes

https://sites.google.com/view/time-aware-embedding

Acknowledgements

This work was partially supported by both ST Electronics and the National Research Foundation (NRF), Prime Minister’s Office, Singapore under Corporate Laboratory @ University Scheme (Programme Title: STEE Infosec - SUTD Corporate Laboratory).

Author information

Authors and Affiliations

Faculty of Computing and Information Technology, Sohar University, Sohar, Oman
Saeid Hosseini
Computational Cognitive Model Research Laboratory, School of Computer Engineering, Iran University of Science and Technology, Tehran, Iran
Saeed Najafipour & Mohammad Reza Kangavari
ST Electronics - SUTD Cyber Security Laboratory, Singapore University of Technology and Design, Singapore, Singapore
Ngai-Man Cheung
School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, Australia
Hongzhi Yin & Xiaofang Zhou

Corresponding author

Correspondence to Saeid Hosseini.

Additional information

Responsible editor: Evangelos Papalexakis.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Hosseini, S., Najafipour, S., Cheung, NM. et al. TEAGS: time-aware text embedding approach to generate subgraphs. Data Min Knowl Disc 34, 1136–1174 (2020). https://doi.org/10.1007/s10618-020-00688-7

Received: 27 August 2019
Accepted: 08 May 2020
Published: 03 June 2020
Issue Date: July 2020
DOI: https://doi.org/10.1007/s10618-020-00688-7
