Towards a comprehensive structural coverage of completed genomes: a structural genomics viewpoint

Russell L Marsden; Tony A Lewis; Christine A Orengo

doi:10.1186/1471-2105-8-86

BMC Bioinformatics. 2007 Mar 9;8:86. doi: 10.1186/1471-2105-8-86.

Authors

Russell L Marsden¹, Tony A Lewis, Christine A Orengo

Affiliation

¹ Department of Biochemistry and Molecular Biology, University College London, Gower Street, London WC1E 6BT, UK. marsden@biochem.ucl.ac.uk

Abstract

Background: Structural genomics initiatives were established with the aim of solving protein structures on a large scale. For many initiatives, such as the Protein Structure Initiative (PSI), the primary aim of target selection is to structurally characterise protein families that, so far, lack a structural representative. It is therefore of considerable interest to gain insight into the number and distribution of these families, and into the effort that may be required to achieve comprehensive structural coverage across all protein families.

Results: In this analysis we have derived a comprehensive domain annotation of the genomes using CATH, Pfam-A and Newfam domain families. We consider what proportion of structurally uncharacterised families is accessible to high-throughput structural genomics pipelines, specifically those targeting families containing multiple prokaryotic orthologues. In measuring the domain coverage of the genomes, we show the benefits of selecting targets from structurally uncharacterised domain families whilst also pursuing additional targets from large, structurally characterised protein superfamilies.

Conclusion: This work suggests that such a combined approach to target selection is essential if structural genomics is to achieve comprehensive structural coverage of the genomes, leading to greater insight into protein structure and the mechanisms that underlie protein evolution.

Publication types

Comparative Study
Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't

MeSH terms

Animals
Databases, Protein*
Genome / genetics*
Genomics* / methods
Humans
Multigene Family
Sequence Analysis, Protein / methods
Structural Homology, Protein

Grants and funding

WT_/Wellcome Trust/United Kingdom