Progress and challenges in protein structure prediction

https://doi.org/10.1016/j.sbi.2008.02.004

Depending on whether similar structures are found in the PDB library, protein structure prediction can be categorized into template-based modeling and free modeling. Although threading is an efficient tool for detecting structural analogs, methodological development in this area has reached a plateau. Encouraging progress is observed in structure refinement, which aims to draw template structures closer to the native; this has been driven mainly by the use of multiple structure templates and the development of hybrid knowledge-based and physics-based force fields. For free modeling, exciting examples have been witnessed of folding small proteins to atomic resolution. However, predicting structures for proteins larger than 150 residues remains a challenge, with bottlenecks in both the force field and the conformational search.

Introduction

In recent years, despite much debate, structural genomics has probably been one of the most noteworthy efforts in protein structure determination; it aims to obtain 3D models of all proteins by an optimized combination of experimental structure solution and computer-based structure prediction [1, 2•]. Two factors will dictate the success of structural genomics: experimental structure determination of optimally selected proteins and efficient computer modeling algorithms. Based on about 40 000 structures in the PDB library (many of them redundant) [3], 4 million models/fold-assignments can be obtained by a simple combination of PSI-BLAST search and comparative modeling [4]. The development of more sophisticated and automated computer modeling approaches will dramatically enlarge the scope of modelable proteins in the structural genomics project.

The crucial problems in the field of protein structure prediction are: first, for sequences with similar structures in the PDB (especially those only weakly or distantly homologous to the target), how to identify the correct templates and how to refine the template structure closer to the native; second, for sequences without appropriate templates, how to build models of correct topology from scratch. Progress along these directions was assessed in the recent CASP7 experiment [5] under the categories of template-based modeling (TBM) and free modeling (FM). Here, I review the new progress and challenges in these directions.

Template-based modeling

The canonical procedure of TBM consists of four steps: first, finding known structures (templates) related to the sequence to be modeled (target); second, aligning the target sequence to the template structure; third, building structural frameworks by copying the aligned regions or by satisfying spatial restraints from the templates; fourth, constructing the unaligned loop regions and adding side-chain atoms. The first two steps are actually done in a single procedure called threading (or
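The third and fourth steps listed above can be illustrated with a minimal sketch. It assumes that threading has already produced a gapped pairwise alignment and that only Cα coordinates of the template are used; the function name `copy_aligned_framework`, the `-` gap character, and the toy data are hypothetical choices made for illustration and are not taken from any program discussed in this review.

```python
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]


def copy_aligned_framework(
    target_aln: str,
    template_aln: str,
    template_coords: List[Coord],
) -> Tuple[List[Optional[Coord]], List[Tuple[int, int]]]:
    """Toy version of step 3: copy template coordinates onto aligned target residues.

    Returns one entry per target residue (None where no template residue is
    aligned) plus the unaligned stretches, as inclusive (start, end) target
    indices, that step 4 would have to build as loops.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("alignment rows must have equal length")
    if sum(c != "-" for c in template_aln) != len(template_coords):
        raise ValueError("template row does not match the number of coordinates")

    model: List[Optional[Coord]] = []
    t_idx = 0  # index of the current template residue
    for tgt, tpl in zip(target_aln, template_aln):
        if tgt != "-":
            # Aligned column: copy the coordinate. Target residue opposite a
            # template gap: leave it empty for later loop construction.
            model.append(template_coords[t_idx] if tpl != "-" else None)
        if tpl != "-":
            t_idx += 1

    # Collect consecutive unmodeled residues as loop segments.
    loops: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, xyz in enumerate(model):
        if xyz is None and start is None:
            start = i
        elif xyz is not None and start is not None:
            loops.append((start, i - 1))
            start = None
    if start is not None:
        loops.append((start, len(model) - 1))
    return model, loops


# Hypothetical toy alignment: two target residues face a template gap,
# and one template residue (K) has no counterpart in the target.
coords = [(float(i), 0.0, 0.0) for i in range(6)]
model, loops = copy_aligned_framework("ACDEFG-H", "AC--FGKH", coords)
print(loops)
```

In this sketch the residues returned in `loops` are exactly those left for the fourth step; real TBM programs additionally handle side chains, multiple templates, and restraint-based framework building, none of which is attempted here.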

Free modeling

When structural analogs do not exist in the PDB library or cannot be identified by threading (which is more often the case, as shown in Figure 1), the structure has to be predicted from scratch. This type of prediction has been termed ‘ab initio’ or ‘de novo’ modeling, a term that is easily understood as modeling ‘from first principles’. In CASP7, it is named ‘free modeling’, which I think reflects the status of the field more appropriately, since the most

Conclusions

Since a detailed physicochemical description of protein folding principles does not yet exist, the protein structure prediction problem is largely defined by the evolutionary or structural distance between the target and the solved proteins in the PDB library. For proteins with close templates, full-length models can be constructed by copying the template framework. Recent studies show that, using the best possible template structures in the PDB, state-of-the-art modeling algorithms could

References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as:

  • • of special interest

  • •• of outstanding interest

Acknowledgements

The project is supported in part by KU Start-up Fund 06194, the Alfred P. Sloan Foundation, and Grant Number R01GM083107 of the National Institute of General Medical Sciences.

References (51)

  • J.U. Bowie et al.

    A method to identify protein sequences that fold into a known three-dimensional structure

    Science

    (1991)
  • D.T. Jones et al.

    A new approach to protein fold recognition

    Nature

    (1992)
  • Y. Zhang et al.

    TM-align: a protein structure alignment algorithm based on the TM-score

    Nucleic Acids Res

    (2005)
  • Y. Zhang et al.

    The protein structure prediction problem could be solved using the current PDB library

    Proc Natl Acad Sci U S A

    (2005)
  • J. Skolnick et al.

    Development and large scale benchmark testing of the PROSPECTOR 3.0 threading algorithm

    Proteins

    (2004)
  • L. Jaroszewski et al.

    FFAS03: a server for profile–profile sequence alignments

    Nucleic Acids Res

    (2005)
  • H. Zhou et al.

    Fold recognition by combining sequence profiles derived from evolution and from depth-dependent structural alignment of fragments

    Proteins

    (2005)
  • K. Ginalski et al.

    ORFeus: detection of distant homology using sequence profiles and predicted secondary structure

    Nucleic Acids Res

    (2003)
  • J. Shi et al.

    FUGUE: sequence–structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties

    J Mol Biol

    (2001)
  • K. Karplus et al.

    Hidden Markov models for detecting remote protein homologies

    Bioinformatics

    (1998)
  • J. Söding

    Protein homology detection by HMM–HMM comparison

    Bioinformatics

    (2005)
  • J. Cheng et al.

    A machine learning information retrieval approach to protein fold recognition

    Bioinformatics

    (2006)
  • M. Gribskov et al.

    Profile analysis: detection of distantly related proteins

    Proc Natl Acad Sci U S A

    (1987)
  • R. Sadreyev et al.

    COMPASS: a tool for comparison of multiple protein alignments with assessment of statistical significance

    J Mol Biol

    (2003)
  • D. Fischer et al.

    CAFASP3: the third critical assessment of fully automated structure prediction methods

    Proteins

    (2003)