TEAGS: time-aware text embedding approach to generate subgraphs

Data Mining and Knowledge Discovery

Abstract

Contagions (e.g., viruses and gossip) spread over the nodes of propagation graphs. The temporal-textual contents of the nodes can be used to compute edge weights and to generate subgraphs of highly relevant nodes, which benefits many applications. Yet challenges abound. First, the propagation pattern between each pair of nodes may change over time. Second, the same contagion does not always propagate. Hence, current text mining approaches, including topic modeling, cannot effectively compute the edge weights. Third, since propagation is affected by time, word–word co-occurrence patterns may differ across temporal dimensions, which adversely impacts the performance of word embedding approaches. We argue that multi-aspect temporal dimensions (hour, day, etc.) should be considered to better calculate the correlation weights between the nodes. In this work, we devise a novel framework that, on the one hand, integrates a time-aware word embedding component to construct the word vectors through multiple temporal facets and, on the other hand, uses a time-only multi-facet generative model to compute the weights. Subsequently, we propose a Max-Heap Graph cutting algorithm to generate subgraphs. We validate our model through experiments on real-world datasets. The results show that our model generates subgraphs more effectively than rival methods and that temporal dynamics must be accounted for when modeling dynamical processes.
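
The abstract does not spell out the Max-Heap Graph cutting algorithm, so the following is only a minimal sketch of the general idea of cutting a weighted graph with a max-heap, not the authors' method. It assumes edge weights have already been computed. The function name `max_heap_cut`, the `threshold` parameter, and the rule "keep an edge only if its weight reaches the threshold" are all hypothetical choices made for illustration.

```python
import heapq
from collections import defaultdict


def max_heap_cut(edges, threshold):
    """Illustrative sketch: split a weighted graph into subgraphs.

    edges     -- iterable of (u, v, weight) tuples (weights assumed precomputed)
    threshold -- hypothetical minimum weight for an edge to survive the cut
    Returns a sorted list of sorted node lists, one per subgraph.
    """
    edges = list(edges)

    # heapq is a min-heap, so negate weights to pop the heaviest edge first.
    heap = [(-w, u, v) for u, v, w in edges]
    heapq.heapify(heap)

    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Register every node so isolated nodes still form their own subgraph.
    for u, v, _ in edges:
        find(u)
        find(v)

    while heap:
        neg_w, u, v = heapq.heappop(heap)
        if -neg_w < threshold:
            break  # every remaining edge is lighter, so all of them are cut
        parent[find(u)] = find(v)  # keep the edge: merge the two subgraphs

    groups = defaultdict(set)
    for node in parent:
        groups[find(node)].add(node)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    demo = [("a", "b", 0.9), ("b", "c", 0.8), ("c", "d", 0.2), ("d", "e", 0.7)]
    print(max_heap_cut(demo, threshold=0.5))
```

In this sketch the heap lets the loop stop as soon as the heaviest remaining edge falls below the threshold, so the lighter edges are never examined individually.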

Notes

  1. https://sites.google.com/view/time-aware-embedding

Acknowledgements

This work was partially supported by both ST Electronics and the National Research Foundation (NRF), Prime Minister’s Office, Singapore under Corporate Laboratory @ University Scheme (Programme Title: STEE Infosec - SUTD Corporate Laboratory).

Author information

Corresponding author

Correspondence to Saeid Hosseini.

Additional information

Responsible editor: Evangelos Papalexakis.

About this article

Cite this article

Hosseini, S., Najafipour, S., Cheung, NM. et al. TEAGS: time-aware text embedding approach to generate subgraphs. Data Min Knowl Disc 34, 1136–1174 (2020). https://doi.org/10.1007/s10618-020-00688-7

  • DOI: https://doi.org/10.1007/s10618-020-00688-7
