Algorithms for Detecting Significantly Mutated Pathways in Cancer
Publication: Journal of Computational Biology
Volume 18, Issue Number 3
Abstract
Recent genome sequencing studies have shown that the somatic mutations that drive cancer development are distributed across a large number of genes. This mutational heterogeneity complicates efforts to distinguish functional mutations from sporadic, passenger mutations. Since cancer mutations are hypothesized to target a relatively small number of cellular signaling and regulatory pathways, a common practice is to assess whether known pathways are enriched for mutated genes. We introduce an alternative approach that examines mutated genes in the context of a genome-scale gene interaction network. We present a computationally efficient strategy for de novo identification of subnetworks in an interaction network that are mutated in a statistically significant number of patients. This framework includes two major components. First, we use a diffusion process on the interaction network to define a local neighborhood of “influence” for each mutated gene in the network. Second, we derive a two-stage multiple hypothesis test to bound the false discovery rate (FDR) associated with the identified subnetworks. We test these algorithms on a large human protein-protein interaction network using somatic mutation data from glioblastoma and lung adenocarcinoma samples. We successfully recover pathways that are known to be important in these cancers and also identify additional pathways that have been implicated in other cancers but not previously reported as mutated in these samples. We anticipate that our approach will find increasing use as cancer genome studies increase in size and scope.
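The diffusion step described in the abstract can be illustrated with a small sketch. It is a simplification under stated assumptions, not the authors' implementation: it takes the influence between genes to be the entries of the inverse of L + γI, where L is the graph Laplacian of an undirected interaction network and γ is a leakage rate, and it then groups mutated genes whose mutual influence exceeds a threshold δ. The function names `diffusion_influence` and `influence_components`, the parameters `gamma` and `delta`, and the toy network are all hypothetical, and the statistical test that bounds the FDR is not shown.

```python
import numpy as np


def diffusion_influence(adjacency, gamma=1.0):
    """Return an influence matrix for an undirected network.

    Assumption: influence is modelled as (L + gamma * I)^-1, where L is the
    graph Laplacian and gamma > 0 is a leakage rate chosen by the user.
    """
    A = np.asarray(adjacency, dtype=float)
    laplacian = np.diag(A.sum(axis=1)) - A
    return np.linalg.inv(laplacian + gamma * np.eye(A.shape[0]))


def influence_components(influence, mutated, delta):
    """Group mutated genes linked by influence greater than delta.

    Two mutated genes are joined when the smaller of their two directed
    influence values exceeds delta; the result is a list of sorted groups.
    """
    genes = sorted(set(mutated))
    strength = np.minimum(influence, influence.T)
    unseen = set(genes)
    components = []
    for start in genes:
        if start not in unseen:
            continue
        unseen.remove(start)
        stack, group = [start], []
        while stack:
            g = stack.pop()
            group.append(g)
            linked = [h for h in unseen if strength[g, h] > delta]
            for h in linked:
                unseen.remove(h)
                stack.append(h)
        components.append(sorted(group))
    return components


# Toy example: a path network 0-1-2-3-4 with hypothetical mutated genes 0, 1, 4.
toy_adjacency = np.zeros((5, 5))
for u, v in [(0, 1), (1, 2), (2, 3), (3, 4)]:
    toy_adjacency[u, v] = toy_adjacency[v, u] = 1.0

toy_influence = diffusion_influence(toy_adjacency, gamma=1.0)
toy_groups = influence_components(toy_influence, mutated=[0, 1, 4], delta=0.1)
```

In this toy network, influence decays with distance along the path, so the adjacent mutated genes 0 and 1 fall into one group while gene 4 stays on its own. Smaller values of `gamma` let influence spread further, and in practice `delta` would be chosen with a significance test rather than fixed by hand.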
Information & Authors
Information
Published In
Journal of Computational Biology
Volume 18 • Issue Number 3 • March 2011
Pages: 507 - 522
PubMed: 21385051
Copyright
Copyright 2011, Mary Ann Liebert, Inc.
History
Published online: 8 March 2011
Published in print: March 2011
Disclosure Statement
No competing financial interests exist.