Software Note
PreSSAPro: A software for the prediction of secondary structure by amino acid properties
Introduction
The propensities for different secondary structures are intrinsic properties of amino acids and have been used for the last three decades to investigate protein structure. In the 1970s, Chou and Fasman developed their pioneering prediction method based on the statistical propensity of amino acids for secondary structures, evaluated on the few tens of proteins whose three-dimensional structures had been determined by X-ray diffraction. From these propensities, the mean propensity for each secondary structure could be evaluated along a given sequence and used to predict its secondary structure (Chou and Fasman, 1974a; Chou and Fasman, 1974b; Chou, 1989). The propensities evaluated in these early works, or re-evaluated versions of them, are still used to develop new algorithms and predictive methods (Wang and Feng, 2005; Fuchs and Alix, 2005).
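As a rough illustration of the mean-propensity idea described above, the following Python sketch averages per-residue propensities over a sliding window and assigns each position the state with the highest mean. The propensity values, the window size and the function names are assumptions made for illustration only; the original Chou and Fasman procedure applies additional nucleation and extension rules that are not reproduced here.

```python
from typing import Dict, List

# Toy propensity table for illustration only; these numbers are NOT the
# published Chou-Fasman or PreSSAPro values.
TOY_PROPENSITY: Dict[str, Dict[str, float]] = {
    "A": {"H": 1.4, "E": 0.8, "C": 0.8},
    "V": {"H": 0.9, "E": 1.6, "C": 0.6},
    "G": {"H": 0.5, "E": 0.7, "C": 1.6},
}

STATES = ("H", "E", "C")  # helix, beta-strand, coil


def mean_propensity(seq: str, table: Dict[str, Dict[str, float]],
                    state: str, window: int = 3) -> List[float]:
    """Average the propensity for `state` over a window centred on each residue.

    The window is truncated at the sequence ends.
    """
    half = window // 2
    means = []
    for i in range(len(seq)):
        lo, hi = max(0, i - half), min(len(seq), i + half + 1)
        values = [table[aa][state] for aa in seq[lo:hi]]
        means.append(sum(values) / len(values))
    return means


def predict(seq: str, table: Dict[str, Dict[str, float]],
            window: int = 3) -> str:
    """Assign each residue the state with the highest windowed mean propensity."""
    profiles = {s: mean_propensity(seq, table, s, window) for s in STATES}
    return "".join(
        max(STATES, key=lambda s: profiles[s][i]) for i in range(len(seq))
    )
```

With the toy table, a homopolymer of a residue takes the state for which that residue has the highest propensity, which makes the behaviour easy to check by hand.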
The PreSSAPro service is based on our recent paper (Costantini et al., 2006), which took a new point of view on amino acid propensities. The main questions in that work were which protein dataset is best for evaluating the amino acid propensities, a larger but non-homogeneous set or a smaller but homogeneous one, and how the composition of the dataset affects these propensities. We evaluated the amino acid propensities for three types of secondary structure (i.e. helix, beta-strand and coil) for the 2168 proteins reported in the PDBselect dataset. Predictions based on these propensities were more successful than those of the original Chou and Fasman method, which was based on a few tens of proteins. This dataset was then subdivided into three subsets corresponding to the secondary structural classes, i.e. all-alpha, all-beta and alpha–beta proteins, according to the definition of Nakashima et al. (1986), which considers proteins with >15% alpha-helical content and <10% beta-strand content as all-alpha proteins, those with <15% alpha-content and >10% beta-content as all-beta proteins, those with >15% alpha-content and >10% beta-content as mixed proteins, and the remaining ones as irregular. For each subset, the amino acid propensities were calculated and used to predict the secondary structure of the proteins belonging to that subset. The success of these predictions was further improved in comparison with the predictions obtained using the propensities calculated for the whole dataset. The final consideration from that work concerns the reliability of the Chou and Fasman approach: its success can be increased drastically by enlarging the dataset used to evaluate the amino acid propensities, but also by using smaller datasets of proteins that are homogeneous in their secondary structure content. These conclusions allowed us to develop a novel software tool for the prediction of the secondary structure of proteins, named PreSSAPro.
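The class definition quoted above from Nakashima et al. (1986) can be written as a small decision function. This is a minimal sketch, not the PreSSAPro code: the function name and return labels are hypothetical, and the strict inequalities are taken literally from the text, so a protein lying exactly on a threshold (15% helix or 10% strand) falls into the irregular class.

```python
def structural_class(helix_pct: float, strand_pct: float) -> str:
    """Classify a protein from its helix and beta-strand content (percent).

    Thresholds follow the definition quoted in the text; the label
    "alpha-beta" corresponds to what the text also calls "mixed".
    """
    if helix_pct > 15 and strand_pct < 10:
        return "all-alpha"
    if helix_pct < 15 and strand_pct > 10:
        return "all-beta"
    if helix_pct > 15 and strand_pct > 10:
        return "alpha-beta"
    # Everything else, including values exactly on a threshold.
    return "irregular"
```

For example, `structural_class(40, 5)` gives `"all-alpha"` and `structural_class(30, 30)` gives `"alpha-beta"`.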
Methods
PreSSAPro is a CGI script written in Perl that predicts the secondary structure of proteins using the residue propensity values for the different secondary structural types (Pij). These values are determined from the ratio of the residue's frequency of occurrence in helices, beta-strands and coil to its frequency of occurrence, evaluated in four different protein subsets (Costantini et al., 2006). The tables of amino acid propensities calculated for the whole PDBselect dataset and for three subsets
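The propensity ratio described in this section can be sketched in Python as follows. The sketch assumes the common Chou-Fasman-style normalisation, P_ij = (n_ij / n_j) / (n_i / N), where n_ij counts residue i in state j, n_j counts all residues in state j, n_i counts residue i overall and N is the total number of residues; the exact normalisation used by PreSSAPro is not given in the text, and the function name and input format are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def propensities(pairs: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Compute P_ij from (sequence, structure) string pairs of equal length.

    Each structure string holds one state letter per residue,
    e.g. "H" (helix), "E" (beta-strand) or "C" (coil).
    """
    n_ij = Counter()  # counts of (residue, state)
    n_i = Counter()   # counts of each residue
    n_j = Counter()   # counts of each state
    total = 0
    for seq, states in pairs:
        if len(seq) != len(states):
            raise ValueError("sequence and structure lengths differ")
        for aa, st in zip(seq, states):
            n_ij[(aa, st)] += 1
            n_i[aa] += 1
            n_j[st] += 1
            total += 1

    table: Dict[str, Dict[str, float]] = {}
    for (aa, st), count in n_ij.items():
        freq_in_state = count / n_j[st]   # frequency of residue within the state
        freq_overall = n_i[aa] / total    # frequency of residue in the dataset
        table.setdefault(aa, {})[st] = freq_in_state / freq_overall
    return table
```

A value above 1 means the residue is over-represented in that state relative to the dataset as a whole. Running the same function on the whole dataset and on each structural-class subset would yield one table per set, in the spirit of the four sets mentioned above.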
Using PreSSAPro
PreSSAPro has been developed as a user-friendly web tool that predicts secondary structure starting from the amino acid sequence of a given protein. The user can obtain the prediction in three ways, as shown in Fig. 1: (i) by indicating the helix and/or beta-strand content, as percentages, if known from experimental studies; (ii) by indicating the structural class of the input sequence, choosing among “all-alpha”, “all-beta” and “alpha–beta”; or (iii) by
References (7)
- et al. Amino acid propensities for secondary structures are influenced by the protein structural class. Biochem. Biophys. Res. Commun. (2006)
- et al. Conformational parameters for amino acids in helical, beta-sheet, and random coil regions calculated from proteins. Biochemistry (1974)
- et al. Prediction of protein conformation. Biochemistry (1974)