Out-of-core coherent closed quasi-clique mining from large dense graph databases

Authors:
Zhiping Zeng, Tsinghua University, Beijing, PR China
Jianyong Wang, Tsinghua University, Beijing, PR China
Lizhu Zhou, Tsinghua University, Beijing, PR China
George Karypis, University of Minnesota, Minneapolis, MN

ACM Transactions on Database Systems, Volume 32, Issue 2, pp. 13–es. https://doi.org/10.1145/1242524.1242530

Published: 01 June 2007


Abstract

Because graphs can represent generic and complicated relationships among different objects, graph mining plays a significant role in data mining and has attracted increasing attention in the data mining community. Frequent coherent subgraphs can provide valuable knowledge about the underlying internal structure of a graph database, and mining frequently occurring coherent subgraphs from large dense graph databases has found several applications and recently received considerable attention in the graph mining community. In this article, we study how to efficiently mine the complete set of coherent closed quasi-cliques from large dense graph databases, an especially challenging task because the downward-closure property no longer holds. By exploiting properties of quasi-cliques, we propose several novel optimization techniques that effectively prune unpromising and redundant subsearch spaces. We also devise an efficient closure checking scheme so that only closed quasi-cliques are discovered. Since large databases cannot be held in main memory, we further design an out-of-core solution with efficient index structures for mining coherent closed quasi-cliques from large dense graph databases. We call this solution Cocain*. A thorough performance study shows that Cocain* is very efficient and scalable for large dense graph databases.
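The remark that the downward-closure property no longer holds can be illustrated with a small sketch. The abstract does not define a quasi-clique, so the code below assumes a common degree-based definition: a vertex set of size n is a γ-quasi-clique if every vertex has at least ⌈γ·(n − 1)⌉ neighbors inside the set. The function name `is_quasi_clique` and the example graph are hypothetical and are not taken from Cocain*.

```python
from itertools import combinations
from math import ceil


def is_quasi_clique(vertices, edges, gamma):
    """Check the ASSUMED degree-based definition of a gamma-quasi-clique.

    Every vertex of the induced subgraph must have at least
    ceil(gamma * (n - 1)) neighbors inside `vertices`, where n = len(vertices).
    """
    vs = set(vertices)
    if not vs:
        return False
    need = ceil(gamma * (len(vs) - 1))
    # Build adjacency restricted to the chosen vertex set (undirected graph).
    adj = {v: set() for v in vs}
    for u, w in edges:
        if u in vs and w in vs and u != w:
            adj[u].add(w)
            adj[w].add(u)
    return all(len(adj[v]) >= need for v in vs)


# Hypothetical example graph: a cycle on five vertices, 0-1-2-3-4-0.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]

# With gamma = 0.5 each vertex needs ceil(0.5 * 4) = 2 neighbors: satisfied.
whole_cycle = is_quasi_clique(range(5), cycle_edges, 0.5)

# Every 4-vertex subset induces a path whose endpoints have only one
# neighbor, but ceil(0.5 * 3) = 2 are required, so none qualifies.
four_subsets = [
    is_quasi_clique(subset, cycle_edges, 0.5)
    for subset in combinations(range(5), 4)
]

print(whole_cycle)   # True
print(four_subsets)  # [False, False, False, False, False]
```

Under this assumed definition, the five-vertex cycle is a 0.5-quasi-clique while none of its four-vertex subsets is, so a miner cannot discard a vertex set merely because one of its subsets fails the test. This is the kind of situation that Apriori-style subset pruning cannot handle directly.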


Index Terms

1. Information systems
  1. Information systems applications
    1. Data mining


Published in

ACM Transactions on Database Systems, Volume 32, Issue 2, June 2007, 267 pages

ISSN: 0362-5915

EISSN: 1557-4644

DOI: 10.1145/1242524

Copyright © 2007 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]
Publisher

Association for Computing Machinery

New York, NY, United States
Author Tags: graph mining, coherent subgraph, frequent closed subgraph, out-of-core algorithm, quasi-clique
