PreSSAPro: A software for the prediction of secondary structure by amino acid properties

doi:10.1016/j.compbiolchem.2007.08.010

Computational Biology and Chemistry

Volume 31, Issues 5–6, October 2007, Pages 389-392

Abstract

PreSSAPro is software, available to the scientific community as a free web service, designed to predict secondary structure from the amino acid sequence of a given protein. Predictions are based on our recently published work on amino acid propensities for secondary structures, evaluated both in large but non-homogeneous protein data sets and in smaller but homogeneous data sets corresponding to protein structural classes, i.e. all-alpha, all-beta, or alpha–beta proteins. Predictions improve when the propensities evaluated for the correct protein class are used. PreSSAPro predicts the secondary structure according to the correct protein class, if known, or otherwise gives multiple predictions, one for each structural class. Comparing these predictions offers a novel way to identify sequence regions that can assume different secondary structures depending on the structural class assignment, with a view to identifying proteins able to fold into different conformations. The service is available at the URL http://bioinformatica.isa.cnr.it/PRESSAPRO/.

Introduction

The propensities for different secondary structures are intrinsic properties of amino acids and have been used over the last three decades to investigate protein structure. In the 1970s Chou and Fasman developed their pioneering prediction method based on the statistical propensity of amino acids for secondary structures, evaluated on the few tens of proteins whose three-dimensional structures had been determined by X-ray diffraction. On the basis of such propensities, it was possible to evaluate the mean propensity for the different secondary structures along a given sequence, and so to predict its secondary structure (Chou and Fasman, 1974a; Chou and Fasman, 1974b; Chou, 1989). Propensities evaluated in the early works, or their re-evaluated versions, are still used for developing new algorithms and predictive methods (Wang and Feng, 2005; Fuchs and Alix, 2005).
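
The idea of evaluating a mean propensity along a sequence can be illustrated with a minimal Python sketch. It is a simplification, not the Chou and Fasman procedure itself, which also applies nucleation and extension rules. The propensity values, the window length of 5, the neutral default of 1.0 for residues missing from a table, and the names TABLES, mean_propensity and predict are all assumptions made for illustration only.

```python
from typing import Dict, List

# Illustrative placeholder propensities for a handful of residues.
# These are NOT the published Chou-Fasman or PreSSAPro values.
TABLES: Dict[str, Dict[str, float]] = {
    "H": {"A": 1.4, "E": 1.5, "L": 1.2, "V": 1.0, "G": 0.6, "P": 0.6},  # helix
    "E": {"A": 0.8, "E": 0.4, "L": 1.3, "V": 1.7, "G": 0.8, "P": 0.6},  # beta-strand
    "C": {"A": 0.7, "E": 0.9, "L": 0.6, "V": 0.5, "G": 1.5, "P": 1.6},  # coil
}


def mean_propensity(seq: str, table: Dict[str, float], window: int = 5) -> List[float]:
    """Mean propensity in a window centred on each residue.

    The window is truncated at the sequence ends. Residues missing from
    the table get a neutral value of 1.0 (an assumption of this sketch).
    """
    half = window // 2
    profile = []
    for i in range(len(seq)):
        start = max(0, i - half)
        end = min(len(seq), i + half + 1)
        values = [table.get(res, 1.0) for res in seq[start:end]]
        profile.append(sum(values) / len(values))
    return profile


def predict(seq: str, tables: Dict[str, Dict[str, float]] = TABLES, window: int = 5) -> str:
    """Assign to each position the state with the highest mean propensity."""
    profiles = {state: mean_propensity(seq, t, window) for state, t in tables.items()}
    return "".join(
        max(profiles, key=lambda state: profiles[state][i]) for i in range(len(seq))
    )
```

Swapping the tables passed to predict for a set evaluated on a different group of proteins changes the prediction without changing the procedure.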

The PreSSAPro service is based on our recent paper (Costantini et al., 2006), which took a new point of view on amino acid propensities. The main question in that work was which protein dataset is best for evaluating amino acid propensities, a larger but non-homogeneous set or a smaller but homogeneous one, and how the composition of the dataset affects the propensities. We evaluated the amino acid propensities for three types of secondary structure (i.e. helix, beta-strand and coil) for 2168 proteins in the PDBselect dataset. Predictions based on these propensities were more successful than those of the original Chou and Fasman method, which was based on a few tens of proteins. This dataset was then subdivided into three subsets corresponding to the secondary structural classes, i.e. all-alpha, all-beta and alpha–beta proteins, according to the definition of Nakashima et al. (1986), which considers proteins with >15% alpha-helical content and <10% beta-strand content as all-alpha proteins, those with <15% alpha-content and >10% beta-content as all-beta proteins, those with >15% alpha-content and >10% beta-content as mixed proteins, and the remainder as irregular. For each subset, the amino acid propensities were calculated and used to predict the secondary structure of the proteins belonging to that subset. The predictions improved further compared with those obtained using the propensities calculated for the whole dataset. The final consideration from that work concerns the reliability of the Chou and Fasman approach: it can improve markedly as the number of proteins in the data set used to evaluate the amino acid propensities grows, but also when smaller data sets of proteins that are homogeneous in their secondary structure content are used. These conclusions allowed us to develop novel software for the prediction of protein secondary structure, named PreSSAPro.
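
The class definition quoted above from Nakashima et al. (1986) can be written as a short Python function. This is only a sketch of the thresholds as stated in the text. The function name structural_class and the returned labels are hypothetical, and because the quoted inequalities are strict, a protein with exactly 15% helix or exactly 10% strand falls into the "irregular" group here.

```python
def structural_class(helix_pct: float, strand_pct: float) -> str:
    """Assign a structural class from helix and beta-strand content (in %).

    Thresholds follow the definition quoted in the text; values lying
    exactly on a threshold match none of the strict inequalities and are
    therefore reported as "irregular" in this sketch.
    """
    if helix_pct > 15 and strand_pct < 10:
        return "all-alpha"
    if helix_pct < 15 and strand_pct > 10:
        return "all-beta"
    if helix_pct > 15 and strand_pct > 10:
        return "alpha-beta"  # called "mixed" in the definition
    return "irregular"
```

For example, a protein with 40% helix and 5% strand content would be assigned to the all-alpha subset, and its secondary structure would then be predicted with the propensities calculated for that subset.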


Methods

PreSSAPro is a CGI script written in Perl that predicts the secondary structure of proteins using residue propensity values for the different secondary structure types (P_ij), determined from the ratio of the residue's frequency of occurrence in helices, beta-strands and coil to its frequency of occurrence evaluated in four different protein subsets (Costantini et al., 2006). The tables of amino acid propensities calculated for the whole PDBselect dataset and for three subsets
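
As an illustration of how a propensity table of this kind can be derived, the following Python sketch computes P_ij as the frequency of residue i within structure type j divided by the frequency of residue i in the whole set. This is the usual Chou and Fasman style ratio and is assumed here; the exact normalisation used by PreSSAPro is the one given in Costantini et al. (2006). The function name propensities and the input format (pairs of equal-length sequence and structure strings) are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def propensities(pairs: Iterable[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Compute P_ij for every residue i and structure type j.

    pairs: (sequence, structure) strings of equal length, e.g. structure
    letters H (helix), E (beta-strand) and C (coil).
    P_ij = (n_ij / n_j) / (n_i / N), an assumed Chou-Fasman style ratio.
    """
    n_ij: Counter = Counter()  # residue i observed in structure j
    n_i: Counter = Counter()   # residue i overall
    n_j: Counter = Counter()   # positions in structure j
    total = 0
    for seq, ss in pairs:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure must have equal length")
        for res, state in zip(seq, ss):
            n_ij[(res, state)] += 1
            n_i[res] += 1
            n_j[state] += 1
            total += 1
    return {
        (res, state): (n_ij[(res, state)] / n_j[state]) / (n_i[res] / total)
        for res in n_i
        for state in n_j
    }
```

Running the same function on the whole data set and on each structural-class subset would give one table per set, which is the kind of comparison the text describes.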

Using PreSSAPro

PreSSAPro has been developed to offer a user-friendly web tool that predicts secondary structure from the amino acid sequence of a given protein. The user has three ways to obtain a prediction, as shown in Fig. 1: (i) by indicating the helix and/or beta-strand content, as percentages, if known from experimental studies; (ii) by indicating the structural class of the input sequence, choosing among “all-alpha”, “all-beta” and “alpha–beta”; or (iii) by

References (7)

S. Costantini et al., Amino acid propensities for secondary structures are influenced by the protein structural class, Biochem. Biophys. Res. Commun. (2006)

P.Y. Chou et al., Conformational parameters for amino acids in helical, beta-sheet, and random coil regions calculated from proteins, Biochemistry (1974)

P.Y. Chou et al., Prediction of protein conformation, Biochemistry (1974)

There are more references available in the full text version of this article.
