Abstract
Contagions (e.g., viruses and gossip) spread over the nodes of propagation graphs. The temporal-textual contents of the nodes can be used to compute edge weights and to generate subgraphs of highly relevant nodes, which benefits many applications. Yet challenges abound. First, the propagation pattern between each pair of nodes may change over time. Second, the same contagion does not always propagate. Hence, current text mining approaches, including topic modeling, cannot effectively compute the edge weights. Third, since propagation is affected by time, word–word co-occurrence patterns may differ across temporal dimensions, which adversely impacts the performance of word embedding approaches. We argue that multi-aspect temporal dimensions (hour, day, etc.) should be considered to better calculate the correlation weights between nodes. In this work, we devise a novel framework that, on the one hand, integrates a time-aware word embedding component to construct word vectors through multiple temporal facets and, on the other hand, uses a time-only multi-facet generative model to compute the weights. Subsequently, we propose a Max-Heap Graph cutting algorithm to generate subgraphs. We validate our model through experiments on real-world datasets. The results show that our model generates subgraphs more effectively than its rivals and that temporal dynamics must be accounted for when modeling dynamical processes.
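The abstract names a Max-Heap Graph cutting step but does not spell it out. The following is a minimal sketch of one plausible reading, not the authors' algorithm: edges are popped from a max-heap in order of decreasing weight, edges at or above a cut-off are kept, and the nodes they connect are grouped into subgraphs. The function name `max_heap_subgraphs`, the `threshold` parameter, and the toy weights are all assumptions made for illustration.

```python
import heapq
from collections import defaultdict


def max_heap_subgraphs(edges, threshold):
    """Group nodes joined by edges whose weight is >= threshold.

    edges: iterable of (u, v, weight) tuples (hypothetical input format).
    Returns a list of sorted node lists, one per subgraph.
    Nodes that only touch cut (low-weight) edges are omitted.
    """
    # heapq is a min-heap, so weights are negated to pop the largest first.
    heap = [(-w, u, v) for u, v, w in edges]
    heapq.heapify(heap)

    parent = {}  # union-find forest over the nodes seen so far

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    while heap:
        neg_w, u, v = heapq.heappop(heap)
        if -neg_w < threshold:
            break  # every remaining edge is weaker, so all of them are cut
        parent[find(u)] = find(v)  # keep the edge: merge the two groups

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Toy propagation graph with made-up correlation weights.
toy_edges = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "d", 0.2), ("d", "e", 0.7)]
print(max_heap_subgraphs(toy_edges, threshold=0.5))
```

With the toy weights above, the weak edge between `c` and `d` is cut, so the call yields two subgraphs, `['a', 'b', 'c']` and `['d', 'e']`. In the framework described here, the weights would come from the time-aware embedding and generative components rather than being fixed by hand.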
Acknowledgements
This work was partially supported by both ST Electronics and the National Research Foundation (NRF), Prime Minister’s Office, Singapore under Corporate Laboratory @ University Scheme (Programme Title: STEE Infosec - SUTD Corporate Laboratory).
Additional information
Responsible editor: Evangelos Papalexakis.
About this article
Cite this article
Hosseini, S., Najafipour, S., Cheung, NM. et al. TEAGS: time-aware text embedding approach to generate subgraphs. Data Min Knowl Disc 34, 1136–1174 (2020). https://doi.org/10.1007/s10618-020-00688-7
DOI: https://doi.org/10.1007/s10618-020-00688-7