Computational Design of Stable and Soluble Biocatalysts
- Milos Musil: Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic; IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, 612 66 Brno, Czech Republic; International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Hannes Konegger: Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Jiri Hon: Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic; IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, 612 66 Brno, Czech Republic; International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- David Bednar: Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Jiri Damborsky*: Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
Abstract
Natural enzymes are delicate biomolecules possessing only marginal thermodynamic stability. Poorly stable, misfolded, and aggregated proteins lead to huge economic losses in the biotechnology and biopharmaceutical industries. Consequently, there is a need to design optimized protein sequences that maximize stability, solubility, and activity over a wide range of temperatures and pH values in buffers of different composition and in the presence of organic cosolvents. This has created great interest in using computational methods to enhance biocatalysts’ robustness and solubility. Suitable methods include (i) energy calculations, (ii) machine learning, (iii) phylogenetic analyses, and (iv) combinations of these approaches. We have witnessed impressive progress in the design of stable enzymes over the last two decades, but predictions of protein solubility and expressibility are scarce. Stabilizing mutations can be predicted accurately using available force fields, and the number of sequences available for phylogenetic analyses is growing. In addition, complex computational workflows are being implemented in intuitive web tools, enhancing the quality of protein stability predictions. Conversely, solubility predictors are limited by the lack of robust and balanced experimental data, an inadequate understanding of fundamental principles of protein aggregation, and a dearth of structural information on folding intermediates. Here we summarize recent progress in the development of computational tools for predicting protein stability and solubility, critically assess their strengths and weaknesses, and identify apparent gaps in data and knowledge. We also present perspectives on the computational design of stable and soluble biocatalysts.
1. Introduction
Stable Biocatalysts

| enzyme | UniProt ID | substrate | method [j] | mutant code | mutations [a] | wild-type Tm [°C] | ΔTm [°C] [b] | t1/2 [c] | specific activity [c] | kcat/Km [c] | ref |
|---|---|---|---|---|---|---|---|---|---|---|---|
| cutinase | P52956 | 4-nitrophenyl butyrate | force field | variant 10 | 7 of 197 | 62.3 | 5.7 | 12.9× (60 °C) | 0.64× (25 °C) | n.d. [d] | (41) |
| keratinase | Q1EM64 | keratin | machine learning | quadruple mutant | 4 of 379 | n.d. | n.d. | 8.6× (60 °C) | n.d. | 4.11× (40 °C) | (42) |
| adenylate kinase | P16304 | Mg/ATP, AMP | phylogeny (ASR) | ANC1 | 66 of 218 | 53.6 | 35.4 | n.d. | n.d. | 1.79× (25 °C) | (43) |
| β-lactamase | P62593 | benzylpenicillin | phylogeny (CD) | ALL-CON | 122 of 262 | 55.0 | 23.6 | n.d. | n.d. | 0.03× (25 °C) | (44) |
| Kemp eliminase | Q06121 | 5-nitrobenzisoxazole | phylogeny (CD) [e] | R2–4/3D | 9 of 247 | 72.0 | 10.0 | n.d. | n.d. | 11.46× (25 °C) | (31) |
| haloalkane dehalogenase | P59336 | 1-iodohexane | hybrid [f] | DhaA115 | 11 of 294 | 49.0 | 24.6 | 200× (60 °C) | 0.31× (37 °C) | 2.77× (37 °C) | (45) |
| halohydrin dehalogenase | Q93D82 | rac-p-nitro-2-bromo-1-phenylethanol | hybrid [g] | HheC-H12 | 13 of 253 | 57.0 | 25.5 | n.d. | n.d. | 0.88× (30 °C) | (9) |

Soluble Biocatalysts

| enzyme | UniProt ID | substrate | method [j] | mutant code | mutations [a] | wild-type Tm [°C] | ΔTm [°C] [b] | expr. yield [c] | specific activity [c] | expr. host | ref |
|---|---|---|---|---|---|---|---|---|---|---|---|
| haloalkane dehalogenase | P59337 | 1,2-dibromoethane | phylogeny (ASR) | AncHLD2 | 69 of 317 | 53.6 | 21.9 | 4.8× (20 °C) | 1.86× (37 °C) | E. coli | (46) |
| α-galactosidase | P06280 | α-D-galactose | hybrid [h] | A348R/A368P/S405L | 3 of 397 | n.d. | n.d. | 1.4× (37 °C) | 2.00× (37 °C) | H. gartleri | (18) |
| acetylcholinesterase | P22303 | acetylcholine | hybrid [i] | dAChE4 | 51 of 542 | 44.0 | 18.3 | 2000× (20 °C) | 0.89× (25 °C) | E. coli | (47) |

[a] Number of introduced mutations and total number of residues.
[b] ΔTm value of the mutant with respect to the wild-type enzyme.
[c] Fold change in the specified property of the mutant relative to the wild-type enzyme. The temperature at which the given property was measured is given in parentheses.
[d] n.d.: not determined.
[e] Spiked Consensus Design, Directed Evolution.
[f] FireProt: Rosetta, FoldX, Consensus Design.
[g] FRESCO: Rosetta, FoldX, Disulfide Bonds, MD.
[h] SOLUBIS: TANGO, FoldX.
[i] PROSS: Consensus Design, Rosetta.
[j] CD: Consensus Design; ASR: Ancestral Sequence Reconstruction.
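
As a minimal illustration of how the ΔTm and mutation-count columns of the table above can be read, the following Python sketch derives the mutant melting temperature (wild-type Tm plus ΔTm) and the percentage of mutated residues for three rows. The `Variant` class and its field names are hypothetical helpers introduced only for this example; the numbers are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    """One table row; class and field names are illustrative, not from the article."""
    enzyme: str
    mutant_code: str
    n_mutations: int      # number of introduced mutations
    n_residues: int       # total number of residues
    wild_type_tm: float   # wild-type melting temperature in °C
    delta_tm: float       # ΔTm of the mutant relative to the wild type in °C

    @property
    def mutant_tm(self) -> float:
        # ΔTm is reported relative to the wild type, so the mutant Tm is the sum.
        return self.wild_type_tm + self.delta_tm

    @property
    def mutated_percent(self) -> float:
        # Share of the sequence that was changed, in percent.
        return 100.0 * self.n_mutations / self.n_residues


# Values taken from three rows of the table (stable biocatalysts).
VARIANTS = [
    Variant("cutinase", "variant 10", 7, 197, 62.3, 5.7),
    Variant("adenylate kinase", "ANC1", 66, 218, 53.6, 35.4),
    Variant("haloalkane dehalogenase", "DhaA115", 11, 294, 49.0, 24.6),
]

if __name__ == "__main__":
    for v in VARIANTS:
        print(f"{v.enzyme} ({v.mutant_code}): "
              f"mutant Tm = {v.mutant_tm:.1f} °C, "
              f"{v.mutated_percent:.1f}% of residues mutated")
```

Read this way, the rows suggest that a large ΔTm does not necessarily require many substitutions: DhaA115 changes fewer than 4% of its residues, whereas the ancestral variant ANC1 changes roughly 30%.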
| method | advantages | disadvantages |
|---|---|---|
| energy calculations | • granularity of predictions can be adjusted via different force fields | • high computational cost of accurate methods |
| | • web servers make predictions accessible to inexperienced users | • dependence on high-resolution structures |
| | • ever-growing structural databases together with advances in homology modeling and molecular threading | • trade-offs between stability and activity |
| | • high accuracy for the prediction of single-point mutations | • predicted stable mutants may not be expressible |
| | | • epistatic effects are not well resolved |
| machine learning | • very rapid predictions | • lack of balanced high-quality experimental data |
| | • easy to implement and use | • limited accuracy of current models |
| | • wide applicability of features | • risk of overtraining |
| | • no need to understand all dependencies | |
| | • previously unknown patterns can be discovered | |
| phylogenetics [a] | • rich abundance of sequence data | • selection of relevant sequences is nontrivial |
| | • structures not needed for predictions | • profound understanding of the gene family is required |
| | • web servers available for certain tasks | • CD: epistatic effects are not considered |
| | • CD: simple and fast | • ASR: small data set size due to computational costs |
| | • CD: several filters are available to enhance prediction accuracies | • ASR: requires technical skills and experience |
| | • ASR: prediction of highly thermostable variants is achievable | |
| | • ASR: sequences of extremophilic proteins are not required | |
| | • ASR: sequence context and epistasis are maintained | |

[a] CD, consensus design; ASR, ancestral sequence reconstruction.
2. Experimental Framework To Determine Protein Stability and Solubility
2.1. Experimental Determination of Protein Stability
2.2. Experimental Determination of Protein Solubility
3. Theoretical Framework for the Design of Robust Proteins
3.1. Principles of Methods Based on Energy Calculations
3.2. Principles of Methods Based on Machine Learning
3.3. Principles of Methods Based on Phylogenetic Analysis
4. Data Sets and Software Tools for Designing Stable Proteins
4.1. Data Sets for Protein Stability
4.2. Software Tools for Predicting Protein Stability Based on Energy Calculations
4.3. Software Tools for Predicting Protein Stability Based on Machine Learning
4.4. Software Tools for Predicting Protein Stability Based on Phylogenetics
4.5. Software Tools for Predicting Protein Stability Based on Hybrid Approaches
5. Data Sets and Software Tools for the Design of Soluble Proteins
5.1. Protein Solubility Data Sets
5.1.1. Protein Solubility Data Sets Based on Full-Length Proteins
5.1.2. Protein Solubility Data Sets Based on Protein Fragments
5.1.3. Protein Solubility Data Sets Based on Mutants
5.2. Software Tools for Predicting Protein Solubility
5.2.1. Software Tools for Protein Solubility Based on Primary Sequences
5.2.2. Software Tools for Predicting Protein Solubility Based on Sequence Profiles
5.2.3. Software Tools for Protein Solubility Based on Mutations
6. Perspectives
Protein Structures from Cryoelectron Microscopy and Hardware-Accelerated Calculations
Consistent and Balanced Stability Data Sets Are Urgently Needed
The Shift from Scores to Profiles and Specific Mutations in Solubility Predictions
High-Throughput Techniques for Highly Consistent Data Sets
Robust Scaffolds for Directed Evolution by Phylogenetic Analyses
Addressing Stability–Activity Trade-Offs Using Metadata and Negative and Multistate Designs
Enhancing Accuracy by Using Metapredictors, Consensual Force Fields, and Hybrid Methods
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acscatal.8b03613.
Data sets for prediction of protein stability (Table S1); software tools for prediction of protein stability (Table S2); data sets for prediction of protein solubility (Table S3); software tools for prediction of protein solubility (Table S4); comparison of the existing tools with the S350 data set (Table S5) (PDF)
Acknowledgments
The authors thank the Czech Ministry of Education (LM2015051, LM2015047, LM2015055, CZ.02.1.01/0.0/0.0/16_013/0001761, CZ.02.1.01/0.0/0.0/16_019/0000868, and CZ.02.1.01/0.0/0.0/16_026/0008451) and the European Commission (720776 and 722610) for financial support. H.K. is the MSCA ITN ES-Cat Research Fellow supported by the European Commission (722610). The work of M.M. and J.H. was supported by the ICT Tools, Methods and Technologies for Smart Cities Project of the Brno University of Technology (FIT-S-17-3964).
References
This article references 221 other publications.
-
1Choi, J.-M.; Han, S.-S.; Kim, H.-S. Industrial Applications of Enzyme Biocatalysis: Current Status and Future Aspects. Biotechnol. Adv. 2015, 33, 1443– 1454, DOI: 10.1016/j.biotechadv.2015.02.014Google Scholar1https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2MXksV2gsL4%253D&md5=024eb6961205da41328dd5b1c3b19244Industrial applications of enzyme biocatalysis: Current status and future aspectsChoi, Jung-Min; Han, Sang-Soo; Kim, Hak-SungBiotechnology Advances (2015), 33 (7), 1443-1454CODEN: BIADDD; ISSN:0734-9750. (Elsevier)A review. Enzymes are the most proficient catalysts, offering much more competitive processes compared to chem. catalysts. The no. of industrial applications for enzymes has exploded in recent years, mainly owing to advances in protein engineering technol. and environmental and economic necessities. Herein, we review recent progress in enzyme biocatalysis, and discuss the trends and strategies that are leading to broader industrial enzyme applications. The challenges and opportunities in developing biocatalytic processes are also discussed.
-
2Mitchell, A. C.; Briquez, P. S.; Hubbell, J. A.; Cochran, J. R. Engineering Growth Factors for Regenerative Medicine Applications. Acta Biomater. 2016, 30, 1– 12, DOI: 10.1016/j.actbio.2015.11.007Google Scholar2https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2MXhvFWlur%252FK&md5=d04bfadd38b7372647cdb2fb406331bcEngineering growth factors for regenerative medicine applicationsMitchell, Aaron C.; Briquez, Priscilla S.; Hubbell, Jeffrey A.; Cochran, Jennifer R.Acta Biomaterialia (2016), 30 (), 1-12CODEN: ABCICB; ISSN:1742-7061. (Elsevier Ltd.)Growth factors are important morphogenetic proteins that instruct cell behavior and guide tissue repair and renewal. Although their therapeutic potential holds great promise in regenerative medicine applications, translation of growth factors into clin. treatments has been hindered by limitations including poor protein stability, low recombinant expression yield, and suboptimal efficacy. This review highlights current tools, technologies, and approaches to design integrated and effective growth factor-based therapies for regenerative medicine applications. The first section describes rational and combinatorial protein engineering approaches that have been utilized to improve growth factor stability, expression yield, biodistribution, and serum half-life, or alter their cell trafficking behavior or receptor binding affinity. The second section highlights elegant biomaterial-based systems, inspired by the natural extracellular matrix milieu, that have been developed for effective spatial and temporal delivery of growth factors to cell surface receptors. Although appearing distinct, these two approaches are highly complementary and involve principles of mol. design and engineering to be considered in parallel when developing optimal materials for clin. applications. 
Growth factors are promising therapeutic proteins that have the ability to modulate morphogenetic behaviors, including cell survival, proliferation, migration and differentiation. However, the translation of growth factors into clin. therapies has been hindered by properties such as poor protein stability, low recombinant expression yield, and non-physiol. delivery, which lead to suboptimal efficacy and adverse side effects. To address these needs, researchers are employing clever mol. and material engineering and design strategies to both improve the intrinsic properties of growth factors and effectively control their delivery into tissue. This review highlights examples of interdisciplinary tools and technologies used to augment the therapeutic potential of growth factors for clin. applications in regenerative medicine.
-
3Dvořák, P.; Nikel, P. I.; Damborský, J.; de Lorenzo, V. Bioremediation 3.0: Engineering Pollutant-Removing Bacteria in the Times of Systemic Biology. Biotechnol. Adv. 2017, 35, 845– 866, DOI: 10.1016/j.biotechadv.2017.08.001Google Scholar3https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXhtlOrt7zO&md5=fdd614f3970f7ea580fd20affbbd4f50Bioremediation 3.0: Engineering pollutant-removing bacteria in the times of systemic biologyDvorak, Pavel; Nikel, Pablo I.; Damborsky, Jiri; de Lorenzo, VictorBiotechnology Advances (2017), 35 (7), 845-866CODEN: BIADDD; ISSN:0734-9750. (Elsevier)Elimination or mitigation of the toxic effects of chem. waste released to the environment by industrial and urban activities relies largely on the catalytic activities of microorganisms-specifically bacteria. Given their capacity to evolve rapidly, they have the biochem. power to tackle a large no. of mols. mobilized from their geol. repositories through human action (e.g., hydrocarbons, heavy metals) or generated through chem. synthesis (e.g., xenobiotic compds.). Whereas naturally occurring microbes already have considerable ability to remove many environmental pollutants with no external intervention, the onset of genetic engineering in the 1980s allowed the possibility of rational design of bacteria to catabolize specific compds., which could eventually be released into the environment as bioremediation agents. The complexity of this endeavour and the lack of fundamental knowledge nonetheless led to the virtual abandonment of such a recombinant DNA-based bioremediation only a decade later. In a twist of events, the last few years have witnessed the emergence of new systemic fields (including systems and synthetic biol., and metabolic engineering) that allow revisiting the same environmental pollution challenges through fresh and far more powerful approaches. The focus on contaminated sites and chems. 
has been broadened by the phenomenal problems of anthropogenic emissions of greenhouse gases and the accumulation of plastic waste on a global scale. In this article, we analyze how contemporary systemic biol. is helping to take the design of bioremediation agents back to the core of environmental biotechnol. We inspect a no. of recent strategies for catabolic pathway construction and optimization and we bring them together by proposing an engineering workflow.
-
4Vanacek, P.; Sebestova, E.; Babkova, P.; Bidmanova, S.; Daniel, L.; Dvorak, P.; Stepankova, V.; Chaloupkova, R.; Brezovsky, J.; Prokop, Z.; Damborsky, J. Exploration of Enzyme Diversity by Integrating Bioinformatics with Expression Analysis and Biochemical Characterization. ACS Catal. 2018, 8, 2402– 2412, DOI: 10.1021/acscatal.7b03523Google Scholar4https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXit1Wiur0%253D&md5=d6ecaacad16a14d9020c4c22df0220f6Exploration of enzyme diversity by integrating bioinformatics with expression analysis and biochemical characterizationVanacek, Pavel; Sebestova, Eva; Babkova, Petra; Bidmanova, Sarka; Daniel, Lukas; Dvorak, Pavel; Stepankova, Veronika; Chaloupkova, Radka; Brezovsky, Jan; Prokop, Zbynek; Damborsky, JiriACS Catalysis (2018), 8 (3), 2402-2412CODEN: ACCACS; ISSN:2155-5435. (American Chemical Society)Millions of protein sequences are being discovered at an incredible pace, representing an inexhaustible source of biocatalysts. Here, we describe an integrated system for automated in silico screening and systematic characterization of diverse family members. The workflow consists of (i) identification and computational characterization of relevant genes by sequence/structural bioinformatics, (ii) expression anal. and activity screening of selected proteins, and (iii) complete biochem./biophys. characterization and was validated against the haloalkane dehalogenase family. The sequence-based search identified 658 potential dehalogenases. The subsequent structural bioinformatics prioritized and selected 20 candidates for exploration of protein functional diversity. Out of these 20, the expression anal. and the robotic screening of enzymic activity provided 8 sol. proteins with dehalogenase activity. The enzymes discovered originated from genetically unrelated Bacteria, Eukaryota, and also Archaea. 
Overall, the integrated system provided biocatalysts with broad catalytic diversity showing unique substrate specificity profiles, covering a wide range of optimal operational temp. from 20 to 70 °C and an unusually broad pH range from 5.7 to 10. We obtained the most catalytically proficient native haloalkane dehalogenase enzyme to date (kcat/K0.5 = 96.8 mM-1s-1), the most thermostable enzyme with melting temp. 71 °C, three different cold-adapted enzymes showing dehalogenase activity at near-to-zero temps., and a biocatalyst degrading the warfare chem. sulfur mustard. The established strategy can be adapted to other enzyme families for exploration of their biocatalytic diversity in a large sequence space continuously growing due to the use of next-generation sequencing technologies.
-
5Bornscheuer, U. T.; Huisman, G. W.; Kazlauskas, R. J.; Lutz, S.; Moore, J. C.; Robins, K. Engineering the Third Wave of Biocatalysis. Nature 2012, 485, 185– 194, DOI: 10.1038/nature11117Google Scholar5https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC38XmvVeqsLk%253D&md5=5f20c530c25ea886f5f5d33dbea0075aEngineering the third wave of biocatalysisBornscheuer, U. T.; Huisman, G. W.; Kazlauskas, R. J.; Lutz, S.; Moore, J. C.; Robins, K.Nature (London, United Kingdom) (2012), 485 (7397), 185-194CODEN: NATUAS; ISSN:0028-0836. (Nature Publishing Group)A review. Over the past ten years, scientific and technol. advances have established biocatalysis as a practical and environmentally friendly alternative to traditional metallo- and organocatalysis in chem. synthesis, both in the lab. and on an industrial scale. Key advances in DNA sequencing and gene synthesis are at the base of tremendous progress in tailoring biocatalysts by protein engineering and design, and the ability to reorganize enzymes into new biosynthetic pathways. To highlight these achievements, here we discuss applications of protein-engineered biocatalysts ranging from commodity chems. to advanced pharmaceutical intermediates that use enzyme catalysis as a key step.
-
6Tokuriki, N.; Stricher, F.; Serrano, L.; Tawfik, D. S. How Protein Stability and New Functions Trade Off. PLoS Comput. Biol. 2008, 4, e1000002, DOI: 10.1371/journal.pcbi.1000002Google Scholar6https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BD1czgtFWksg%253D%253D&md5=ade16cd7f3f47d20357654c6b85ce338How protein stability and new functions trade offTokuriki Nobuhiko; Stricher Francois; Serrano Luis; Tawfik Dan SPLoS computational biology (2008), 4 (2), e1000002 ISSN:.Numerous studies have noted that the evolution of new enzymatic specificities is accompanied by loss of the protein's thermodynamic stability (DeltaDeltaG), thus suggesting a tradeoff between the acquisition of new enzymatic functions and stability. However, since most mutations are destabilizing (DeltaDeltaG>0), one should ask how destabilizing mutations that confer new or altered enzymatic functions relative to all other mutations are. We applied DeltaDeltaG computations by FoldX to analyze the effects of 548 mutations that arose from the directed evolution of 22 different enzymes. The stability effects, location, and type of function-altering mutations were compared to DeltaDeltaG changes arising from all possible point mutations in the same enzymes. We found that mutations that modulate enzymatic functions are mostly destabilizing (average DeltaDeltaG = +0.9 kcal/mol), and are almost as destabilizing as the "average" mutation in these enzymes (+1.3 kcal/mol). Although their stability effects are not as dramatic as in key catalytic residues, mutations that modify the substrate binding pockets, and thus mediate new enzymatic specificities, place a larger stability burden than surface mutations that underline neutral, non-adaptive evolutionary changes. How are the destabilizing effects of functional mutations balanced to enable adaptation? 
Our analysis also indicated that many mutations that appear in directed evolution variants with no obvious role in the new function exert stabilizing effects that may compensate for the destabilizing effects of the crucial function-altering mutations. Thus, the evolution of new enzymatic activities, both in nature and in the laboratory, is dependent on the compensatory, stabilizing effect of apparently "silent" mutations in regions of the protein that are irrelevant to its function.
-
7Dellus-Gur, E.; Toth-Petroczy, A.; Elias, M.; Tawfik, D. S. What Makes a Protein Fold Amenable to Functional Innovation? Fold Polarity and Stability Trade-Offs. J. Mol. Biol. 2013, 425, 2609– 2621, DOI: 10.1016/j.jmb.2013.03.033Google Scholar7https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3sXmtV2rsbs%253D&md5=3c572ecbdb0b1d5ace71fd67257be136What makes a protein fold amenable to functional innovation? Fold polarity and stability trade-offsDellus-Gur, Eynat; Toth-Petroczy, Agnes; Elias, Mikael; Tawfik, Dan S.Journal of Molecular Biology (2013), 425 (14), 2609-2621CODEN: JMOBAK; ISSN:0022-2836. (Elsevier Ltd.)Protein evolvability includes 2 elements, robustness (or neutrality, mutations having no effect) and innovability (mutations readily inducing new functions). How are these 2 conflicting demands bridged. Does the ability to bridge them relate to the observation that certain folds, such as TIM barrels, accommodate numerous functions, whereas other folds support only one. Here, the authors hypothesized that the key to innovability is polarity, an active site composed of flexible, loosely packed loops alongside a well-sepd., highly ordered scaffold. The authors showed that highly stabilized variants of TEM-1 β-lactamase exhibited selective rigidification of the enzyme's scaffold while the active site loops maintained their conformational plasticity. Polarity therefore results in stabilizing, compensatory mutations not trading off, but instead promoting the acquisition of new activities. Indeed, computational anal. indicated that in folds that accommodate only one function throughout evolution, e.g., dihydrofolate reductase, ≥60% of the active site residues belonged to the scaffold. In contrast, folds assocd. 
with multiple functions such as the TIM barrel showed high scaffold-active site polarity (∼20% of the active site comprised scaffold residues) and >2-fold higher rates of sequence divergence at active site positions. Thus, this work suggests structural measures of fold polarity that appear to be correlated with innovability, thereby providing new insights regarding protein evolution, design, and engineering.
-
8Johansson, K. E.; Johansen, N. T.; Christensen, S.; Horowitz, S.; Bardwell, J. C. A.; Olsen, J. G.; Willemoës, M.; Lindorff-Larsen, K.; Ferkinghoff-Borg, J.; Hamelryck, T.; Winther, J. R. Computational Redesign of Thioredoxin Is Hypersensitive toward Minor Conformational Changes in the Backbone Template. J. Mol. Biol. 2016, 428, 4361– 4377, DOI: 10.1016/j.jmb.2016.09.013Google Scholar8https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC28XhsFOku7%252FM&md5=7f8f2300f99adbd84be36e52806c9a1eComputational Redesign of Thioredoxin Is Hypersensitive toward Minor Conformational Changes in the Backbone TemplateJohansson, Kristoffer E.; Johansen, Nicolai Tidemand; Christensen, Signe; Horowitz, Scott; Bardwell, James C. A.; Olsen, Johan G.; Willemoes, Martin; Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper; Hamelryck, Thomas; Winther, Jakob R.Journal of Molecular Biology (2016), 428 (21), 4361-4377CODEN: JMOBAK; ISSN:0022-2836. (Elsevier Ltd.)Despite the development of powerful computational tools, the full-sequence design of proteins still remains a challenging task. To investigate the limits and capabilities of computational tools, we conducted a study of the ability of the program Rosetta to predict sequences that recreate the authentic fold of thioredoxin. Focusing on the influence of conformational details in the template structures, we based our study on 8 exptl. detd. template structures and generated 120 designs from each. For exptl. evaluation, we chose six sequences from each of the eight templates by objective criteria. The 48 selected sequences were evaluated based on their progressive ability to (1) produce sol. protein in Escherichia coli and (2) yield stable monomeric protein, and (3) on the ability of the stable, sol. proteins to adopt the target fold. Of the 48 designs, we were able to synthesize 32, 20 of which resulted in sol. protein. Of these, only two were sufficiently stable to be purified. 
An X-ray crystal structure was solved for one of the designs, revealing a close resemblance to the target structure. We found a significant difference among the eight template structures to realize the above three criteria despite their high structural similarity. Thus, in order to improve the success rate of computational full-sequence design methods, we recommend that multiple template structures are used. Furthermore, this study shows that special care should be taken when optimizing the geometry of a structure prior to computational design when using a method that is based on rigid conformations.
-
9Arabnejad, H.; Dal Lago, M.; Jekel, P. A.; Floor, R. J.; Thunnissen, A.-M. W. H.; Terwisscha van Scheltinga, A. C.; Wijma, H. J.; Janssen, D. B. A Robust Cosolvent-Compatible Halohydrin Dehalogenase by Computational Library Design. Protein Eng., Des. Sel. 2017, 30, 175– 189, DOI: 10.1093/protein/gzw068Google Scholar9https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXhsFWmsL7P&md5=b96764d41caee68bf2da8c229f63fa95A robust cosolvent-compatible halohydrin dehalogenase by computational library designArabnejad, Hesam; Lago, Marco Dal; Jekel, Peter A.; Floor, Robert J.; Thunnissen, Andy-Mark W. H.; van Scheltinga, Anke C. Terwisscha; Wijma, Hein J.; Janssen, Dick B.Protein Engineering, Design & Selection (2017), 30 (3), 175-189CODEN: PEDSBR; ISSN:1741-0134. (Oxford University Press)To improve the applicability of halohydrin dehalogenase as a catalyst for reactions in the presence of org. cosolvents, we explored a computational library design strategy (Framework for Rapid Enzyme Stabilization by Computational libraries) that involves discovery and in silico evaluation of stabilizing mutations. Energy calcns., disulfide bond predictions and mol. dynamics simulations identified 218 point mutations and 35 disulfide bonds with predicted stabilizing effects. Expts. confirmed 29 stabilizing point mutations, most of which were located in two distinct regions, whereas introduction of disulfide bonds was not effective. Combining the best mutations resulted in a 12-fold mutant (HheC-H12) with a 28°C higher apparent melting temp. and a remarkable increase in resistance to cosolvents. This variant also showed a higher optimum temp. for catalysis while activity at low temp. was preserved. Mutant H12 was used as a template for the introduction of mutations that enhance enantioselectivity or activity. 
Crystal structures showed that the structural changes in the H12 mutant mostly agreed with the computational predictions and that the enhanced stability was mainly due to mutations that redistributed surface charges and improved interactions between subunits, the latter including better interactions of water mols. at the subunit interfaces.
-
10Wyganowski, K. T.; Kaltenbach, M.; Tokuriki, N. GroEL/ES Buffering and Compensatory Mutations Promote Protein Evolution by Stabilizing Folding Intermediates. J. Mol. Biol. 2013, 425, 3403– 3414, DOI: 10.1016/j.jmb.2013.06.028Google Scholar10https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3sXhtFOgtLbO&md5=ff35330976ea66426ccd97b1f481eb5aGroEL/ES Buffering and Compensatory Mutations Promote Protein Evolution by Stabilizing Folding IntermediatesWyganowski, Kirsten T.; Kaltenbach, Miriam; Tokuriki, NobuhikoJournal of Molecular Biology (2013), 425 (18), 3403-3414CODEN: JMOBAK; ISSN:0022-2836. (Elsevier Ltd.)Maintaining stability is a major constraint in protein evolution because most mutations are destabilizing. Buffering and/or compensatory mechanisms that counteract this progressive destabilization during functional adaptation are pivotal for protein evolution as well as protein engineering. However, the interplay of these two mechanisms during a full evolutionary trajectory has never been explored. Here, we unravel such dynamics during the lab. evolution of a phosphotriesterase into an arylesterase. A controllable GroEL/ES chaperone co-expression system enabled us to vary the selection environment between buffering and compensatory, which smoothened the trajectory along the fitness landscape to achieve a > 104 increase in arylesterase activity. Biophys. characterization revealed that, in contrast to prevalent models of protein stability and evolution, the variants' sol. cellular expression did not correlate with in vitro stability, and compensatory mutations were linked to a stabilization of folding intermediates. Thus, folding kinetics in the cell are a key feature of protein evolvability.
-
11. Lawrence, P. B.; Gavrilov, Y.; Matthews, S. S.; Langlois, M. I.; Shental-Bechor, D.; Greenblatt, H. M.; Pandey, B. K.; Smith, M. S.; Paxman, R.; Torgerson, C. D.; Merrell, J. P.; Ritz, C. C.; Prigozhin, M. B.; Levy, Y.; Price, J. L. Criteria for Selecting PEGylation Sites on Proteins for Higher Thermodynamic and Proteolytic Stability. J. Am. Chem. Soc. 2014, 136, 17547–17560, DOI: 10.1021/ja5095183
-
12. Rueda, N.; Dos Santos, J. C. S.; Ortiz, C.; Torres, R.; Barbosa, O.; Rodrigues, R. C.; Berenguer-Murcia, Á.; Fernandez-Lafuente, R. Chemical Modification in the Design of Immobilized Enzyme Biocatalysts: Drawbacks and Opportunities. Chem. Rec. 2016, 16, 1436–1455, DOI: 10.1002/tcr.201600007
-
13. Stepankova, V.; Bidmanova, S.; Koudelakova, T.; Prokop, Z.; Chaloupkova, R.; Damborsky, J. Strategies for Stabilization of Enzymes in Organic Solvents. ACS Catal. 2013, 3, 2823–2836, DOI: 10.1021/cs400684x
-
14. Butt, T. R.; Edavettal, S. C.; Hall, J. P.; Mattern, M. R. SUMO Fusion Technology for Difficult-to-Express Proteins. Protein Expression Purif. 2005, 43, 1–9, DOI: 10.1016/j.pep.2005.03.016
-
15. LaVallie, E. R.; DiBlasio, E. A.; Kovacic, S.; Grant, K. L.; Schendel, P. F.; McCoy, J. M. A Thioredoxin Gene Fusion Expression System That Circumvents Inclusion Body Formation in the E. coli Cytoplasm. Nat. Biotechnol. 1993, 11, 187–193, DOI: 10.1038/nbt0293-187
-
16. Bloom, J. D.; Labthavikul, S. T.; Otey, C. R.; Arnold, F. H. Protein Stability Promotes Evolvability. Proc. Natl. Acad. Sci. U. S. A. 2006, 103, 5869–5874, DOI: 10.1073/pnas.0510098103
-
17. Sormanni, P.; Aprile, F. A.; Vendruscolo, M. The CamSol Method of Rational Design of Protein Mutants with Enhanced Solubility. J. Mol. Biol. 2015, 427, 478–490, DOI: 10.1016/j.jmb.2014.09.026
-
18. Ganesan, A.; Siekierska, A.; Beerten, J.; Brams, M.; Van Durme, J.; De Baets, G.; Van der Kant, R.; Gallardo, R.; Ramakers, M.; Langenberg, T.; Wilkinson, H.; De Smet, F.; Ulens, C.; Rousseau, F.; Schymkowitz, J. Structural Hot Spots for the Solubility of Globular Proteins. Nat. Commun. 2016, 7, 10816, DOI: 10.1038/ncomms10816
-
19. Zeymer, C.; Hilvert, D. Directed Evolution of Protein Catalysts. Annu. Rev. Biochem. 2018, 87, 131–157, DOI: 10.1146/annurev-biochem-062917-012034
-
20. Starr, T. N.; Thornton, J. W. Epistasis in Protein Evolution. Protein Sci. 2016, 25, 1204–1218, DOI: 10.1002/pro.2897
-
21. Goldsmith, M.; Tawfik, D. S. Enzyme Engineering: Reaching the Maximal Catalytic Efficiency Peak. Curr. Opin. Struct. Biol. 2017, 47, 140–150, DOI: 10.1016/j.sbi.2017.09.002
-
22. Currin, A.; Swainston, N.; Day, P. J.; Kell, D. B. Synthetic Biology for the Directed Evolution of Protein Biocatalysts: Navigating Sequence Space Intelligently. Chem. Soc. Rev. 2015, 44, 1172–1239, DOI: 10.1039/C4CS00351A
-
23. Rocklin, G. J.; Chidyausiku, T. M.; Goreshnik, I.; Ford, A.; Houliston, S.; Lemak, A.; Carter, L.; Ravichandran, R.; Mulligan, V. K.; Chevalier, A.; Arrowsmith, C. H.; Baker, D. Global Analysis of Protein Folding Using Massively Parallel Design, Synthesis, and Testing. Science 2017, 357, 168–175, DOI: 10.1126/science.aan0693
-
24. Sumbalova, L.; Stourac, J.; Martinek, T.; Bednar, D.; Damborsky, J. HotSpot Wizard 3.0: Web Server for Automated Design of Mutations and Smart Libraries Based on Sequence Input Information. Nucleic Acids Res. 2018, 46, W356–W362, DOI: 10.1093/nar/gky417
-
25. Kuipers, R. K.; Joosten, H.-J.; van Berkel, W. J. H.; Leferink, N. G. H.; Rooijen, E.; Ittmann, E.; van Zimmeren, F.; Jochens, H.; Bornscheuer, U.; Vriend, G.; Martins dos Santos, V. A. P.; Schaap, P. J. 3DM: Systematic Analysis of Heterogeneous Superfamily Data to Discover Protein Functionalities. Proteins: Struct., Funct., Bioinf. 2010, 78, 2101–2113, DOI: 10.1002/prot.22725
-
26. Reetz, M. T.; Carballeira, J. D. Iterative Saturation Mutagenesis (ISM) for Rapid Directed Evolution of Functional Enzymes. Nat. Protoc. 2007, 2, 891–903, DOI: 10.1038/nprot.2007.72
-
27. Liskova, V.; Stepankova, V.; Bednar, D.; Brezovsky, J.; Prokop, Z.; Chaloupkova, R.; Damborsky, J. Different Structural Origins of the Enantioselectivity of Haloalkane Dehalogenases toward Linear β-Haloalkanes: Open-Solvated versus Occluded-Desolvated Active Sites. Angew. Chem., Int. Ed. 2017, 56, 4719–4723, DOI: 10.1002/anie.201611193
-
28Bar-Even, A.; Noor, E.; Savir, Y.; Liebermeister, W.; Davidi, D.; Tawfik, D. S.; Milo, R. The Moderately Efficient Enzyme: Evolutionary and Physicochemical Trends Shaping Enzyme Parameters. Biochemistry 2011, 50, 4402– 4410, DOI: 10.1021/bi2002289Google Scholar28https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3MXlsFWnur8%253D&md5=6cca5d0e98fe4f835de63adfe4059a56The Moderately Efficient Enzyme: Evolutionary and Physicochemical Trends Shaping Enzyme ParametersBar-Even, Arren; Noor, Elad; Savir, Yonatan; Liebermeister, Wolfram; Davidi, Dan; Tawfik, Dan S.; Milo, RonBiochemistry (2011), 50 (21), 4402-4410CODEN: BICHAW; ISSN:0006-2960. (American Chemical Society)The kinetic parameters of enzymes are key to understanding the rate and specificity of most biol. processes. Although specific trends are frequently studied for individual enzymes, global trends are rarely addressed. We performed an anal. of kcat and KM values of several thousand enzymes collected from the literature. We found that the "av. enzyme" exhibits a kcat of ∼10 s-1 and a kcat/KM of ∼ 105 s-1 M-1, much below the diffusion limit and the characteristic textbook portrayal of kinetically superior enzymes. Why do most enzymes exhibit moderate catalytic efficiencies Maximal rates may not evolve in cases where weaker selection pressures are expected. We find, for example, that enzymes operating in secondary metab. are, on av., ∼ 30-fold slower than those of central metab. We also find indications that the physicochem. properties of substrates affect the kinetic parameters. Specifically, low mol. mass and hydrophobicity appear to limit KM optimization. In accordance, substitution with phosphate, CoA, or other large modifiers considerably lowers the KM values of enzymes utilizing the substituted substrates. It therefore appears that both evolutionary selection pressures and physicochem. constraints shape the kinetic parameters of enzymes. 
It also seems likely that the catalytic efficiency of some enzymes toward their natural substrates could be increased in many cases by natural or lab. evolution.
-
29Balchin, D.; Hayer-Hartl, M.; Hartl, F. U. In Vivo Aspects of Protein Folding and Quality Control. Science 2016, 353, aac4354, DOI: 10.1126/science.aac4354Google ScholarThere is no corresponding record for this reference.
-
30Colón, W.; Church, J.; Sen, J.; Thibeault, J.; Trasatti, H.; Xia, K. Biological Roles of Protein Kinetic Stability. Biochemistry 2017, 56, 6179– 6186, DOI: 10.1021/acs.biochem.7b00942Google Scholar30https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXhslentrfL&md5=b7e0dd86dd97d503913bcab967ee7495Biological Roles of Protein Kinetic StabilityColon, Wilfredo; Church, Jennifer; Sen, Jayeeta; Thibeault, Jane; Trasatti, Hannah; Xia, KeBiochemistry (2017), 56 (47), 6179-6186CODEN: BICHAW; ISSN:0006-2960. (American Chemical Society)A review. A protein's stability may range from non-existent, as in the case of intrinsically disordered proteins, to very high, as indicated by a protein's resistance to degrdn., even under relatively harsh conditions. The stability of this latter group is usually under kinetic control due to a high activation energy for unfolding that virtually traps the protein in a specific conformation, thereby conferring resistance to proteolytic degrdn. and misfolding-aggregation. The usual outcome of kinetic stability is a longer protein half-life. Thus, the protective role of protein kinetic stability is often appreciated, but relatively little is known about the extent of biol. roles related to this property. Here, we discuss several known or putative biol. roles of protein kinetic stability, including protection from stressors to avoid aggregation or premature degrdn., achieving long-term phenotypic change, and regulating cellular processes by controlling the trigger and timing of mol. motion. The picture that emerges from this anal. is that protein kinetic stability is involved in a myriad of known and yet to be discovered biol. functions via its ability to resist degrdn. and control the timing, extent, and permanency of mol. motion.
-
31Khersonsky, O.; Kiss, G.; Röthlisberger, D.; Dym, O.; Albeck, S.; Houk, K. N.; Baker, D.; Tawfik, D. S. Bridging the Gaps in Design Methodologies by Evolutionary Optimization of the Stability and Proficiency of Designed Kemp Eliminase KE59. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 10358– 10363, DOI: 10.1073/pnas.1121063109Google Scholar31https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC38XhtFWgt7rE&md5=da69b08f6e228aa3c871519287f12688Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59Khersonsky, Olga; Kiss, Gert; Rothlisberger, Daniela; Dym, Orly; Albeck, Shira; Houk, Kendall N.; Baker, David; Tawfik, Dan S.Proceedings of the National Academy of Sciences of the United States of America (2012), 109 (26), 10358-10363, S10358/1-S10358/47CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)Computational design is a test of our understanding of enzyme catalysis and a means of engineering novel, tailor-made enzymes. While the de novo computational design of catalytically efficient enzymes remains a challenge, designed enzymes may comprise unique starting points for further optimization by directed evolution. Directed evolution of two computationally designed Kemp eliminases, KE07 and KE70, led to low to moderately efficient enzymes (kcat/Km values of ≤5 × 104 M-1s-1). Here we describe the optimization of a third design, KE59. Although KE59 was the most catalytically efficient Kemp eliminase from this design series (by kcat/Km, and by catalyzing the elimination of nonactivated benzisoxazoles), its impaired stability prevented its evolutionary optimization. To boost KE59's evolvability, stabilizing consensus mutations were included in the libraries throughout the directed evolution process. The libraries were also screened with less activated substrates. 
Sixteen rounds of mutation and selection led to >2000-fold increase in catalytic efficiency, mainly via higher kcat values. The best KE59 variants exhibited kcat/Km values up to 0.6 × 106 M-1s-1, and kcat/kuncat values of ≤107 almost regardless of substrate reactivity. Biochem., structural, and mol. dynamics (MD) simulation studies provided insights regarding the optimization of KE59. Overall, the directed evolution of three different designed Kemp eliminases, KE07, KE70, and KE59, demonstrates that computational designs are highly evolvable and can be optimized to high catalytic efficiencies.
-
32Taverna, D. M.; Goldstein, R. A. Why Are Proteins Marginally Stable?. Proteins: Struct., Funct., Genet. 2002, 46, 105– 109, DOI: 10.1002/prot.10016Google ScholarThere is no corresponding record for this reference.
-
33Sanchez-Ruiz, J. M. Protein Kinetic Stability. Biophys. Chem. 2010, 148, 1– 15, DOI: 10.1016/j.bpc.2010.02.004Google Scholar33https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3cXkvFCju7c%253D&md5=b0bccbeea28fd182559a4381889877a6Protein kinetic stabilitySanchez-Ruiz, Jose M.Biophysical Chemistry (2010), 148 (1-3), 1-15CODEN: BICIAZ; ISSN:0301-4622. (Elsevier B.V.)A review. The relevance of protein stability for biol. function and mol. evolution is widely recognized. Protein stability, however, comes in 2 flavors: (1) thermodn. stability, which is related to a low amt. of unfolded and partially-unfolded states in equil. with the native, functional protein, and (2) kinetic stability, which is related to a high free energy barrier "sepg." the native state from the non-functional forms (unfolded states, irreversibly-denatured protein). Such a barrier may guarantee that the biol. function of the protein is maintained, at least during a physiol. relevant time-scale, even if the native state is not thermodynamically stable with respect to non-functional forms. Kinetic stabilization is likely required in many cases, since proteins often work under conditions (harsh extracellular or crowded intracellular environments) in which deleterious alterations (proteolysis, aggregation, undesirable interactions with other macromol. components) are prone to occur. Also, kinetic stability may provide a mechanism for the evolution of optimal functional properties. Furthermore, enhancement of kinetic stability is essential for many biotechnol. applications of proteins. Despite all of this, many published studies focus on thermodn. stability, partly because it can be easily quantified in vitro for small model proteins and, also, because of the availability of computational algorithms to est. mutation effects on thermodn. stability. Here, the opposite bias is purposely adopted: the exptl. 
evidence supporting widespread kinetic stabilization of proteins is summarized, the role of natural selection in detg. this feature is discussed, possible mol. mechanisms responsible for kinetic stability are described, and the relation between kinetic destabilization and protein misfolding diseases is highlighted.
-
34Bommarius, A. S.; Paye, M. F. Stabilizing Biocatalysts. Chem. Soc. Rev. 2013, 42, 6534– 6565, DOI: 10.1039/c3cs60137dGoogle Scholar34https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3sXhtVKhtrzI&md5=5ba9406e8a7666b99704af985258606fStabilizing biocatalystsBommarius, Andreas S.; Paye, Marietou F.Chemical Society Reviews (2013), 42 (15), 6534-6565CODEN: CSRVBR; ISSN:0306-0012. (Royal Society of Chemistry)A review. The area of biocatalysis itself is in rapid development, fueled by both an enhanced repertoire of protein engineering tools and an increasing list of solved problems. Biocatalysts, however, are delicate materials that hover close to the thermodn. limit of stability. In many cases, they need to be stabilized to survive a range of challenges regarding temp., pH value, salt type and concn., co-solvents, as well as shear and surface forces. Biocatalysts may be delicate proteins, however, once stabilized, they are efficiently active enzymes. Kinetic stability must be achieved to a level satisfactory for large-scale process application. Kinetic stability evokes resistance to degrdn. and maintained or increased catalytic efficiency of the enzyme in which the desired reaction is accomplished at an increased rate. However, beyond these limitations, stable biocatalysts can be operated at higher temps. or co-solvent concns., with ensuing redn. in microbial contamination, better soly., as well as in many cases more favorable equil., and can serve as more effective templates for combinatorial and data-driven protein engineering. To increase thermodn. and kinetic stability, immobilization, protein engineering, and medium engineering of biocatalysts are available, the main focus of this work. 
In the case of protein engineering, there are three main approaches to enhancing the stability of protein biocatalysts: (i) rational design, based on knowledge of the 3D-structure and the catalytic mechanism, (ii) combinatorial design, requiring a protocol to generate diversity at the genetic level, a large, often high throughput, screening capacity to distinguish hits' from misses', and (iii) data-driven design, fueled by the increased availability of nucleotide and amino acid sequences of equiv. functionality.
-
35Goldenzweig, A.; Fleishman, S. J. Principles of Protein Stability and Their Application in Computational Design. Annu. Rev. Biochem. 2018, 87, 105– 129, DOI: 10.1146/annurev-biochem-062917-012102Google Scholar35https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXitFyqt7k%253D&md5=d5b820508142e79dafe127543f1ad6b7Principles of Protein Stability and Their Application in Computational DesignGoldenzweig, Adi; Fleishman, Sarel J.Annual Review of Biochemistry (2018), 87 (), 105-129CODEN: ARBOAW; ISSN:0066-4154. (Annual Reviews)A review. Proteins are increasingly used in basic and applied biomedical research. Many proteins, however, are only marginally stable and can be expressed in limited amts., thus hampering research and applications. Research has revealed the thermodn., cellular, and evolutionary principles and mechanisms that underlie marginal stability. With this growing understanding, computational stability design methods have advanced over the past two decades starting from methods that selectively addressed only some aspects of marginal stability. Current methods are more general and, by combining phylogenetic anal. with atomistic design, have shown drastic improvements in soly., thermal stability, and aggregation resistance while maintaining the protein's primary mol. activity. Stability design is opening the way to rational engineering of improved enzymes, therapeutics, and vaccines and to the application of protein design methodol. to large proteins and mol. activities that have proven challenging in the past.
-
36Hansen, N.; van Gunsteren, W. F. Practical Aspects of Free-Energy Calculations: A Review. J. Chem. Theory Comput. 2014, 10, 2632– 2647, DOI: 10.1021/ct500161fGoogle Scholar36https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2cXotlWjs7k%253D&md5=096fd11727692a87b0884e50bcb4a5e3Practical Aspects of Free-Energy Calculations: A ReviewHansen, Niels; van Gunsteren, Wilfred F.Journal of Chemical Theory and Computation (2014), 10 (7), 2632-2647CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)A review. Free-energy calcns. in the framework of classical mol. dynamics simulations are nowadays used in a wide range of research areas including solvation thermodn., mol. recognition, and protein folding. The basic components of a free-energy calcn., i.e., a suitable model Hamiltonian, a sampling protocol, and an estimator for the free energy, are independent of the specific application. However, the attention that one has to pay to these components depends considerably on the specific application. Here, we review six different areas of application and discuss the relative importance of the three main components to provide the reader with an organigram and to make nonexperts aware of the many pitfalls present in free energy calcns.
-
37Polizzi, K. M.; Bommarius, A. S.; Broering, J. M.; Chaparro-Riggers, J. F. Stability of Biocatalysts. Curr. Opin. Chem. Biol. 2007, 11, 220– 225, DOI: 10.1016/j.cbpa.2007.01.685Google Scholar37https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2sXjvFegurk%253D&md5=0b7ef267bbff8fcbdc29ca772079fc57Stability of biocatalystsPolizzi, Karen M.; Bommarius, Andreas S.; Broering, James M.; Chaparro-Riggers, Javier F.Current Opinion in Chemical Biology (2007), 11 (2), 220-225CODEN: COCBF4; ISSN:1367-5931. (Elsevier B.V.)A review. Here, the authors highlight recent research on the stabilization of enzymes using both chem. and biol. means to increase the lifetime of the biocatalyst. Despite their many favorable qualities, the marginal stability of biocatalysts in many types of reaction media often has prevented or delayed their implementation for industrial-scale synthesis of fine chems. and pharmaceuticals. Consequently, there is great interest in understanding the effects of soln. conditions on protein stability, as well as in developing strategies to improve protein stability in desired reaction media. Recent methods include novel chem. modifications of proteins, lyophilization in the presence of additives, and phys. immobilization on novel supports. Rational and combinatorial protein engineering techniques have been used to yield unmodified proteins with exceptionally improved stability. Both have been aided by the development of computational tools and structure-guided heuristics aimed at reducing library sizes that must be generated and screened to identify improved mutants. The no. of parameters used to indicate protein stability can complicate discussions and investigations, and care should be taken to identify whether thermodn. or kinetic stability limits the obsd. stability of proteins. 
Although the useful lifetime of a biocatalyst is dictated by its kinetic stability, only 6% of protein stability studies use kinetic stability measures. Clearly, more effort is needed to study how soln. conditions impact protein kinetic stability.
-
38Buck, P. M.; Kumar, S.; Wang, X.; Agrawal, N. J.; Trout, B. L.; Singh, S. K. Computational Methods To Predict Therapeutic Protein Aggregation. Methods Mol. Biol. 2012, 899, 425– 451, DOI: 10.1007/978-1-61779-921-1_26Google Scholar38https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3sXitFags7c%253D&md5=a7b334801df3f41c32bb81f50bf967b4Computational methods to predict therapeutic protein aggregationBuck, Patrick M.; Kumar, Sandeep; Wang, Xiaoling; Agrawal, Neeraj J.; Trout, Bernhardt L.; Singh, Satish K.Methods in Molecular Biology (New York, NY, United States) (2012), 899 (Therapeutic Proteins), 425-451CODEN: MMBIED; ISSN:1064-3745. (Springer)A review. Protein based biotherapeutics have emerged as a successful class of pharmaceuticals. However, these macromols. endure a variety of physicochem. degrdns. during manufg., shipping, and storage, which may adversely impact the drug product quality. Of these degrdns., the irreversible self-assocn. of therapeutic proteins to form aggregates is a major challenge in the formulation of these mols. Tools to predict and mitigate protein aggregation are, therefore, of great interest to biopharmaceutical research and development. In this chapter, a no. of such computational tools developed to understand and predict the various steps involved in protein aggregation are described. These tools can be grouped into three general classes: unfolding kinetics and native state thermal stability, colloidal stability, and sequence/structure based aggregation liabilities. Chapter sections introduce each class by discussing how these predictive tools provide insight into the mol. events leading to protein aggregation. The computational methods are then explained in detail along with their advantages and limitations.
-
39Jaswal, S. S.; Sohl, J. L.; Davis, J. H.; Agard, D. A. Energetic Landscape of α-Lytic Protease Optimizes Longevity through Kinetic Stability. Nature 2002, 415, 343– 346, DOI: 10.1038/415343aGoogle Scholar39https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD38Xptlaktg%253D%253D&md5=9d150430ebb485d561b9a8d71cab305bEnergetic landscape of α-lytic protease optimizes longevity through kinetic stabilityJaswal, Shella S.; Sohl, Julie L.; Davis, Jonathan H.; Agard, David A.Nature (London, United Kingdom) (2002), 415 (6869), 343-346CODEN: NATUAS; ISSN:0028-0836. (Nature Publishing Group)During the evolution of proteins the pressure to optimize biol. activity is moderated by a need for efficient folding. For most proteins, this is accomplished through spontaneous folding to a thermodynamically stable and active native state. However, in the extracellular bacterial α-lytic protease (αLP) these two processes have become decoupled. The native state of αLP is thermodynamically unstable, and when denatured, requires millennia (t1/2 ∼ 1800 yr) to refold. Folding is made possible by an attached folding catalyst, the pro-region, which is degraded on completion of folding, leaving αLP trapped in its native state by a large kinetic unfolding barrier (t1/2 ∼ 1.2 yr). αLP faces two very different folding landscapes: one in the presence of the pro-region controlling folding, and one in its absence restricting unfolding. Here we demonstrate that this sepn. of folding and unfolding pathways has removed constraints placed on the folding of thermodynamically stable proteins, and allowed the evolution of a native state having markedly reduced dynamic fluctuations. This, in turn, has led to a significant extension of the functional lifetime of αLP by the optimal suppression of proteolytic sensitivity.
-
40Young, T. A.; Skordalakes, E.; Marqusee, S. Comparison of Proteolytic Susceptibility in Phosphoglycerate Kinases from Yeast and E. coli: Modulation of Conformational Ensembles Without Altering Structure or Stability. J. Mol. Biol. 2007, 368, 1438– 1447, DOI: 10.1016/j.jmb.2007.02.077Google Scholar40https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2sXkslCmsLs%253D&md5=31e892b4872a16ea5507c80dfe4c63d4Comparison of Proteolytic Susceptibility in Phosphoglycerate Kinases from Yeast and E. coli: Modulation of Conformational Ensembles Without Altering Structure or StabilityYoung, Tracy A.; Skordalakes, Emmanuel; Marqusee, SusanJournal of Molecular Biology (2007), 368 (5), 1438-1447CODEN: JMOBAK; ISSN:0022-2836. (Elsevier Ltd.)Escherichia coli phosphoglycerate kinase (PGK) is resistant to proteolytic cleavage while the yeast homolog from Saccharomyces cerevisiae is not. We have explored the biophys. basis of this surprising difference. The sequences of these homologs are 39% identical and 56% similar. Detn. of the crystal structure for the E. coli protein and comparison to the previously solved yeast structure reveals that the two proteins have extremely similar tertiary structures, and their global stabilities detd. by equil. denaturation are also very similar. The extrapolated unfolding rate of E. coli PGK is, however, 105 slower than that of the yeast homolog. This surprisingly large difference in unfolding rates appears to arise from a divergence in the extent of cooperativity between the two structural domains (the N and C-domains) that make up these kinases. This is supported by: (1) the C-domain of E. coli PGK cannot be expressed or fold independently of the N-domain, while both domains of the yeast protein fold in isolation into stable structures and (2) the energetics and kinetics of the proteolytically sensitive state of E. coli PGK match those for global unfolding. 
This suggests that proteolysis occurs from the globally unfolded state of E. coli PGK, while the characteristics defining the yeast homolog suggest that proteolysis occurs upon unfolding of only the C-domain, with the N-domain remaining folded and consequently resistant to cleavage.
-
41Shirke, A. N.; Basore, D.; Butterfoss, G. L.; Bonneau, R.; Bystroff, C.; Gross, R. A. Toward Rational Thermostabilization of Aspergillus Oryzae Cutinase: Insights into Catalytic and Structural Stability. Proteins: Struct., Funct., Genet. 2016, 84, 60– 72, DOI: 10.1002/prot.24955Google ScholarThere is no corresponding record for this reference.
-
42Liu, B.; Zhang, J.; Li, B.; Liao, X.; Du, G.; Chen, J. Expression and Characterization of Extreme Alkaline, Oxidation-Resistant Keratinase from Bacillus Licheniformis in Recombinant Bacillus Subtilis WB600 Expression System and Its Application in Wool Fiber Processing. World J. Microbiol. Biotechnol. 2013, 29, 825– 832, DOI: 10.1007/s11274-012-1237-5Google Scholar42https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3sXlsFWnsb4%253D&md5=ff6b515ff39b7f5d5bb752ba1a1c1ce8Expression and characterization of extreme alkaline, oxidation-resistant keratinase from Bacillus licheniformis in recombinant Bacillus subtilis WB600 expression system and its application in wool fiber processingLiu, Baihong; Zhang, Juan; Li, Ben; Liao, Xiangru; Du, Guocheng; Chen, JianWorld Journal of Microbiology & Biotechnology (2013), 29 (5), 825-832CODEN: WJMBEY; ISSN:0959-3993. (Springer)A keratin-degrading bacterium of Bacillus licheniformis BBE11-1 was isolated and its ker gene encoding keratinase with native signal peptide was cloned and expressed in Bacillus subtilis WB600 under the strong PHpaII promoter of the pMA0911 vector. In the 3-L fermenter, the recombinant keratinase was secreted with 323 units/mL when non-induced after 24 h at 37 °C. And then, keratinase was concd. and purified by hydrophobic interaction chromatog. using HiTrap Phenyl-Sepharose Fast Flow. The recombinant keratinase had an optimal temp. and the pH at 40 °C and 10.5, resp., and was stable at 10-50 °C and pH 7-11.5. We found this enzyme can retained 80 % activity after treated 5 h with 1 M H2O2, it was activated by Mg2+, Co2+ and could degraded broad substrates such as degraded feather, bovine serum albumin, casein, gelatin, the keratinase was considered to be a serine protease. Coordinate with Savinase, the keratinase could efficient prevent shrinkage and eliminate fibers of wool, which showed its potential in textile industries and detergent industries.
-
43Nguyen, V.; Wilson, C.; Hoemberger, M.; Stiller, J. B.; Agafonov, R. V.; Kutter, S.; English, J.; Theobald, D. L.; Kern, D. Evolutionary Drivers of Thermoadaptation in Enzyme Catalysis. Science 2017, 355, 289– 294, DOI: 10.1126/science.aah3717Google Scholar43https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXhtVehtbk%253D&md5=f2d5eebf186da3323268e68831d65e49Evolutionary drivers of thermoadaptation in enzyme catalysisNguyen, Vy; Wilson, Christopher; Hoemberger, Marc; Stiller, John B.; Agafonov, Roman V.; Kutter, Steffen; English, Justin; Theobald, Douglas L.; Kern, DorotheeScience (Washington, DC, United States) (2017), 355 (6322), 289-294CODEN: SCIEAS; ISSN:0036-8075. (American Association for the Advancement of Science)With early life likely to have existed in a hot environment, enzymes had to cope with an inherent drop in catalytic speed caused by lowered temp. Here we characterize the mol. mechanisms underlying thermoadaptation of enzyme catalysis in adenylate kinase using ancestral sequence reconstruction spanning 3 billion years of evolution. We show that evolution solved the enzyme's key kinetic obstacle - how to maintain catalytic speed on a cooler Earth - by exploiting transition-state heat capacity. Tracing the evolution of enzyme activity and stability from the hot-start toward modern hyperthermophilic, mesophilic, and psychrophilic organisms illustrates active pressure vs. passive drift in evolution on a mol. level, refutes the debated activity/stability trade-off, and suggests that the catalytic speed of adenylate kinase is an evolutionary driver for organismal fitness.
-
44Risso, V. A.; Gavira, J. A.; Gaucher, E. A.; Sanchez-Ruiz, J. M. Phenotypic Comparisons of Consensus Variants versus Laboratory Resurrections of Precambrian Proteins. Proteins: Struct., Funct., Genet. 2014, 82, 887– 896, DOI: 10.1002/prot.24575Google ScholarThere is no corresponding record for this reference.
-
45Bednar, D.; Beerens, K.; Sebestova, E.; Bendl, J.; Khare, S.; Chaloupkova, R.; Prokop, Z.; Brezovsky, J.; Baker, D.; Damborsky, J. FireProt: Energy- and Evolution-Based Computational Design of Thermostable Multiple-Point Mutants. PLoS Comput. Biol. 2015, 11, e1004556, DOI: 10.1371/journal.pcbi.1004556Google Scholar45https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC28XkvVKhtb4%253D&md5=82389328fe2da01f6f99eba4afe20f40FireProt: energy- and evolution-based computational design of thermostable multiple-point mutantsBednar, David; Beerens, Koen; Sebestova, Eva; Bendl, Jaroslav; Khare, Sagar; Chaloupkova, Radka; Prokop, Zbynek; Brezovsky, Jan; Baker, David; Damborsky, JiriPLoS Computational Biology (2015), 11 (11), e1004556/1-e1004556/20CODEN: PCBLBG; ISSN:1553-7358. (Public Library of Science)There is great interest in increasing proteins' stability to enhance their utility as biocatalysts, therapeutics, diagnostics and nanomaterials. Directed evolution is a powerful, but exptl. strenuous approach. Computational methods offer attractive alternatives. However, due to the limited reliability of predictions and potentially antagonistic effects of substitutions, only single-point mutations are usually predicted in silico, exptl. verified and then recombined in multiple-point mutants. Thus, substantial screening is still required. Here we present FireProt, a robust computational strategy for predicting highly stable multiple-point mutants that combines energy- and evolution-based approaches with smart filtering to identify additive stabilizing mutations. FireProt's reliability and applicability was demonstrated by validating its predictions against 656 mutations from the ProTherm database. 
We demonstrate that thermostability of the model enzymes haloalkane dehalogenase DhaA and γ-hexachlorocyclohexane dehydrochlorinase LinA can be substantially increased (ΔTm = 24°C and 21°C) by constructing and characterizing only a handful of multiple-point mutants. FireProt can be applied to any protein for which a tertiary structure and homologous sequences are available, and will facilitate the rapid development of robust proteins for biomedical and biotechnol. applications.
-
46Babkova, P.; Sebestova, E.; Brezovsky, J.; Chaloupkova, R.; Damborsky, J. Ancestral Haloalkane Dehalogenases Show Robustness and Unique Substrate Specificity. ChemBioChem 2017, 18, 1448– 1456, DOI: 10.1002/cbic.201700197Google Scholar46https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXpsFSqsLs%253D&md5=f732a67defa3a56ea468a91bc1c345ddAncestral Haloalkane Dehalogenases Show Robustness and Unique Substrate SpecificityBabkova, Petra; Sebestova, Eva; Brezovsky, Jan; Chaloupkova, Radka; Damborsky, JiriChemBioChem (2017), 18 (14), 1448-1456CODEN: CBCHFX; ISSN:1439-4227. (Wiley-VCH Verlag GmbH & Co. KGaA)Ancestral sequence reconstruction (ASR) represents a powerful approach for empirical testing structure-function relationships of diverse proteins. We employed ASR to predict sequences of five ancestral haloalkane dehalogenases (HLDs) from the HLD-II subfamily. Genes encoding the inferred ancestral sequences were synthesized and expressed in Escherichia coli, and the resurrected ancestral enzymes (AncHLD1-5) were exptl. characterized. Strikingly, the ancestral HLDs exhibited significantly enhanced thermodn. stability compared to extant enzymes (ΔTm up to 24 °C), as well as higher specific activities with preference for short multi-substituted halogenated substrates. Moreover, multivariate statistical anal. revealed a shift in the substrate specificity profiles of AncHLD1 and AncHLD2. This is extremely difficult to achieve by rational protein engineering. The study highlights that ASR is an efficient approach for the development of novel biocatalysts and robust templates for directed evolution.
-
47. Goldenzweig, A.; Goldsmith, M.; Hill, S. E.; Gertman, O.; Laurino, P.; Ashani, Y.; Dym, O.; Unger, T.; Albeck, S.; Prilusky, J.; Lieberman, R. L.; Aharoni, A.; Silman, I.; Sussman, J. L.; Tawfik, D. S.; Fleishman, S. J. Automated Structure- and Sequence-Based Design of Proteins for High Bacterial Expression and Stability. Mol. Cell 2016, 63, 337–346, DOI: 10.1016/j.molcel.2016.06.012
-
48. Hammes, G. G.; Chang, Y.-C.; Oas, T. G. Conformational Selection or Induced Fit: A Flux Description of Reaction Mechanism. Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 13737, DOI: 10.1073/pnas.0907195106
-
49. Kramer, R. M.; Shende, V. R.; Motl, N.; Pace, C. N.; Scholtz, J. M. Toward a Molecular Understanding of Protein Solubility: Increased Negative Surface Charge Correlates with Increased Solubility. Biophys. J. 2012, 102, 1907–1915, DOI: 10.1016/j.bpj.2012.01.060
-
50. Khow, O.; Suntrarachun, S. Strategies for Production of Active Eukaryotic Proteins in Bacterial Expression System. Asian Pac. J. Trop. Biomed. 2012, 2, 159–162, DOI: 10.1016/S2221-1691(11)60213-X
-
51. Sørensen, H. P.; Mortensen, K. K. Soluble Expression of Recombinant Proteins in the Cytoplasm of Escherichia coli. Microb. Cell Fact. 2005, 4, 1, DOI: 10.1186/1475-2859-4-1
-
52. Hartl, F. U.; Bracher, A.; Hayer-Hartl, M. Molecular Chaperones in Protein Folding and Proteostasis. Nature 2011, 475, 324–332, DOI: 10.1038/nature10317
-
53. Shaw, D. E.; Maragakis, P.; Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Eastwood, M. P.; Bank, J. A.; Jumper, J. M.; Salmon, J. K.; Shan, Y.; Wriggers, W. Atomic-Level Characterization of the Structural Dynamics of Proteins. Science 2010, 330, 341–346, DOI: 10.1126/science.1187409
-
54. Englander, S. W.; Mayne, L. The Case for Defined Protein Folding Pathways. Proc. Natl. Acad. Sci. U. S. A. 2017, 114, 8253–8258, DOI: 10.1073/pnas.1706196114
-
55. Voelz, V. A.; Bowman, G. R.; Beauchamp, K.; Pande, V. S. Molecular Simulation of Ab Initio Protein Folding for a Millisecond Folder NTL9(1–39). J. Am. Chem. Soc. 2010, 132, 1526–1528, DOI: 10.1021/ja9090353
-
56. Eaton, W. A.; Wolynes, P. G. Theory, Simulations, and Experiments Show That Proteins Fold by Multiple Pathways. Proc. Natl. Acad. Sci. U. S. A. 2017, 114, E9759–E9760, DOI: 10.1073/pnas.1716444114
-
57. Yang, Y.; Niroula, A.; Shen, B.; Vihinen, M. PON-Sol: Prediction of Effects of Amino Acid Substitutions on Protein Solubility. Bioinformatics 2016, 32, 2032–2034, DOI: 10.1093/bioinformatics/btw066
-
58. Broom, A.; Jacobi, Z.; Trainor, K.; Meiering, E. M. Computational Tools Help Improve Protein Stability but with a Solubility Tradeoff. J. Biol. Chem. 2017, 292, 14349–14361, DOI: 10.1074/jbc.M117.784165
-
59. Cabantous, S.; Waldo, G. S. In Vivo and in Vitro Protein Solubility Assays Using Split GFP. Nat. Methods 2006, 3, 845–854, DOI: 10.1038/nmeth932
-
60. Niwa, T.; Ying, B.-W.; Saito, K.; Jin, W.; Takada, S.; Ueda, T.; Taguchi, H. Bimodal Protein Solubility Distribution Revealed by an Aggregation Analysis of the Entire Ensemble of Escherichia coli Proteins. Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 4201–4206, DOI: 10.1073/pnas.0811922106
-
61. Eijsink, V. G.; Vriend, G.; van den Burg, B.; van der Zee, J. R.; Veltman, O. R.; Stulp, B. K.; Venema, G. Introduction of a Stabilizing 10 Residue Beta-Hairpin in Bacillus Subtilis Neutral Protease. Protein Eng., Des. Sel. 1992, 5, 157–163, DOI: 10.1093/protein/5.2.157
-
62. Lee, C.; Levitt, M. Accurate Prediction of the Stability and Activity Effects of Site-Directed Mutagenesis on a Protein Core. Nature 1991, 352, 448–451, DOI: 10.1038/352448a0
-
63. Buß, O.; Muller, D.; Jager, S.; Rudat, J.; Rabe, K. S. Improvement in the Thermostability of a β-Amino Acid Converting ω-Transaminase by Using FoldX. ChemBioChem 2018, 19, 379–387, DOI: 10.1002/cbic.201700467
-
64. Modarres, H. P.; Mofrad, M. R.; Sanati-Nezhad, A. Protein Thermostability Engineering. RSC Adv. 2016, 6, 115252–115270, DOI: 10.1039/C6RA16992A
-
65. Pace, C. N.; Scholtz, J. M.; Grimsley, G. R. Forces Stabilizing Proteins. FEBS Lett. 2014, 588, 2177–2184, DOI: 10.1016/j.febslet.2014.05.006
-
66. Lazaridis, T.; Karplus, M. Effective Energy Functions for Protein Structure Prediction. Curr. Opin. Struct. Biol. 2000, 10, 139–145, DOI: 10.1016/S0959-440X(00)00063-4
-
67. Seeliger, D.; de Groot, B. L. Protein Thermostability Calculations Using Alchemical Free Energy Simulations. Biophys. J. 2010, 98, 2309–2316, DOI: 10.1016/j.bpj.2010.01.051
-
68. Zhang, Z.; Wang, L.; Gao, Y.; Zhang, J.; Zhenirovskyy, M.; Alexov, E. Predicting Folding Free Energy Changes upon Single Point Mutations. Bioinformatics 2012, 28, 664–671, DOI: 10.1093/bioinformatics/bts005
-
69. Wickstrom, L.; Gallicchio, E.; Levy, R. M. The Linear Interaction Energy Method for the Prediction of Protein Stability Changes Upon Mutation. Proteins: Struct., Funct., Bioinf. 2012, 80, 111–125, DOI: 10.1002/prot.23168
-
70. Guerois, R.; Nielsen, J. E.; Serrano, L. Predicting Changes in the Stability of Proteins and Protein Complexes: A Study of More than 1000 Mutations. J. Mol. Biol. 2002, 320, 369–387, DOI: 10.1016/S0022-2836(02)00442-4
-
71. Mendes, J.; Guerois, R.; Serrano, L. Energy Estimation in Protein Design. Curr. Opin. Struct. Biol. 2002, 12, 441–446, DOI: 10.1016/S0959-440X(02)00345-7
-
72. Dehouck, Y.; Gilis, D.; Rooman, M. A New Generation of Statistical Potentials for Proteins. Biophys. J. 2006, 90, 4010–4017, DOI: 10.1529/biophysj.105.079434
-
73. Dehouck, Y.; Kwasigroch, J. M.; Gilis, D.; Rooman, M. PoPMuSiC 2.1: A Web Server for the Estimation of Protein Stability Changes upon Mutation and Sequence Optimality. BMC Bioinf. 2011, 12, 151, DOI: 10.1186/1471-2105-12-151
-
74. Liu, H. On Statistical Energy Functions for Biomolecular Modeling and Design. Quant. Biol. 2015, 3, 157–167, DOI: 10.1007/s40484-015-0054-x
-
75. Kumar, M. D. S.; Bava, K. A.; Gromiha, M. M.; Prabakaran, P.; Kitajima, K.; Uedaira, H.; Sarai, A. ProTherm and ProNIT: Thermodynamic Databases for Proteins and Protein–Nucleic Acid Interactions. Nucleic Acids Res. 2006, 34, D204–D206, DOI: 10.1093/nar/gkj103
-
76. Pucci, F.; Bourgeas, R.; Rooman, M. High-Quality Thermodynamic Data on the Stability Changes of Proteins Upon Single-Site Mutations. J. Phys. Chem. Ref. Data 2016, 45, 023104, DOI: 10.1063/1.4947493
-
77. Potapov, V.; Cohen, M.; Schreiber, G. Assessing Computational Methods for Predicting Protein Stability upon Mutation: Good on Average but Not in the Details. Protein Eng., Des. Sel. 2009, 22, 553–560, DOI: 10.1093/protein/gzp030
-
78. Schymkowitz, J.; Borg, J.; Stricher, F.; Nys, R.; Rousseau, F.; Serrano, L. The FoldX Web Server: An Online Force Field. Nucleic Acids Res. 2005, 33, W382–W388, DOI: 10.1093/nar/gki387
-
79. Kepp, K. P. Towards a “Golden Standard” for Computing Globin Stability: Stability and Structure Sensitivity of Myoglobin Mutants. Biochim. Biophys. Acta, Proteins Proteomics 2015, 1854, 1239–1248, DOI: 10.1016/j.bbapap.2015.06.002
-
80. Christensen, N. J.; Kepp, K. P. Accurate Stabilities of Laccase Mutants Predicted with a Modified FoldX Protocol. J. Chem. Inf. Model. 2012, 52, 3028–3042, DOI: 10.1021/ci300398z
-
81. MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiórkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102, 3586–3616, DOI: 10.1021/jp973084f
-
82. Oostenbrink, C.; Villa, A.; Mark, A. E.; van Gunsteren, W. F. A Biomolecular Force Field Based on the Free Enthalpy of Hydration and Solvation: The GROMOS Force-Field Parameter Sets 53A5 and 53A6. J. Comput. Chem. 2004, 25, 1656–1676, DOI: 10.1002/jcc.20090
-
83. Alford, R. F.; Leaver-Fay, A.; Jeliazkov, J. R.; O’Meara, M. J.; DiMaio, F. P.; Park, H.; Shapovalov, M. V.; Renfrew, P. D.; Mulligan, V. K.; Kappel, K.; Labonte, J. W.; Pacella, M. S.; Bonneau, R.; Bradley, P.; Dunbrack, R. L.; Das, R.; Baker, D.; Kuhlman, B.; Kortemme, T.; Gray, J. J. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J. Chem. Theory Comput. 2017, 13, 3031–3048, DOI: 10.1021/acs.jctc.7b00125
-
84. Davey, J. A.; Damry, A. M.; Euler, C. K.; Goto, N. K.; Chica, R. A. Prediction of Stable Globular Proteins Using Negative Design with Non-Native Backbone Ensembles. Structure 2015, 23, 2011–2021, DOI: 10.1016/j.str.2015.07.021
-
85. Ó Conchúir, S.; Barlow, K. A.; Pache, R. A.; Ollikainen, N.; Kundert, K.; O’Meara, M. J.; Smith, C. A.; Kortemme, T. A Web Resource for Standardized Benchmark Datasets, Metrics, and Rosetta Protocols for Macromolecular Modeling and Design. PLoS One 2015, 10, e0130433, DOI: 10.1371/journal.pone.0130433
-
86. Trainor, K.; Broom, A.; Meiering, E. M. Exploring the Relationships between Protein Sequence, Structure and Solubility. Curr. Opin. Struct. Biol. 2017, 42, 136–146, DOI: 10.1016/j.sbi.2017.01.004
-
87. Das, R. Four Small Puzzles That Rosetta Doesn’t Solve. PLoS One 2011, 6, e20044, DOI: 10.1371/journal.pone.0020044
-
88. Kellogg, E. H.; Leaver-Fay, A.; Baker, D. Role of Conformational Sampling in Computing Mutation-Induced Changes in Protein Structure and Stability. Proteins: Struct., Funct., Bioinf. 2011, 79, 830–838, DOI: 10.1002/prot.22921
-
89. Musil, M.; Stourac, J.; Bendl, J.; Brezovsky, J.; Prokop, Z.; Zendulka, J.; Martinek, T.; Bednar, D.; Damborsky, J. FireProt: Web Server for Automated Design of Thermostable Proteins. Nucleic Acids Res. 2017, 45, W393–W399, DOI: 10.1093/nar/gkx285
-
90. Bush, J.; Makhatadze, G. I. Statistical Analysis of Protein Structures Suggests That Buried Ionizable Residues in Proteins Are Hydrogen Bonded or Form Salt Bridges. Proteins: Struct., Funct., Bioinf. 2011, 79, 2027–2032, DOI: 10.1002/prot.23067
-
91. Stranges, P. B.; Kuhlman, B. A Comparison of Successful and Failed Protein Interface Designs Highlights the Challenges of Designing Buried Hydrogen Bonds. Protein Sci. 2013, 22, 74–82, DOI: 10.1002/pro.2187
-
92. Beerens, K.; Mazurenko, S.; Kunka, A.; Marques, S. M.; Hansen, N.; Musil, M.; Chaloupkova, R.; Waterman, J.; Brezovsky, J.; Bednar, D.; Prokop, Z.; Damborsky, J. Evolutionary Analysis As a Powerful Complement to Energy Calculations for Protein Stabilization. ACS Catal. 2018, 8, 9420–9428, DOI: 10.1021/acscatal.8b01677
-
93. Wijma, H. J.; Floor, R. J.; Jekel, P. A.; Baker, D.; Marrink, S. J.; Janssen, D. B. Computationally Designed Libraries for Rapid Enzyme Stabilization. Protein Eng., Des. Sel. 2014, 27, 49–58, DOI: 10.1093/protein/gzt061
-
94. Thiltgen, G.; Goldstein, R. A. Assessing Predictors of Changes in Protein Stability upon Mutation Using Self-Consistency. PLoS One 2012, 7, e46084, DOI: 10.1371/journal.pone.0046084
-
95. Buß, O.; Rudat, J.; Ochsenreither, K. FoldX as Protein Engineering Tool: Better Than Random Based Approaches? Comput. Struct. Biotechnol. J. 2018, 16, 25–33, DOI: 10.1016/j.csbj.2018.01.002
-
96. Allen, B. D.; Nisthal, A.; Mayo, S. L. Experimental Library Screening Demonstrates the Successful Application of Computational Protein Design to Large Structural Ensembles. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 19838–19843, DOI: 10.1073/pnas.1012985107
-
97. Barlow, K. A.; Ó Conchúir, S.; Thompson, S.; Suresh, P.; Lucas, J. E.; Heinonen, M.; Kortemme, T. Flex ddG: Rosetta Ensemble-Based Estimation of Changes in Protein-Protein Binding Affinity upon Mutation. J. Phys. Chem. B 2018, 122, 5389–5399, DOI: 10.1021/acs.jpcb.7b11367
-
98. Ludwiczak, J.; Jarmula, A.; Dunin-Horkawicz, S. Combining Rosetta with Molecular Dynamics (MD): A Benchmark of the MD-Based Ensemble Protein Design. J. Struct. Biol. 2018, 203, 54–61, DOI: 10.1016/j.jsb.2018.02.004
-
99. Davis, I. W.; Arendall, W. B.; Richardson, D. C.; Richardson, J. S. The Backrub Motion: How Protein Backbone Shrugs When a Sidechain Dances. Structure 2006, 14, 265–274, DOI: 10.1016/j.str.2005.10.007
-
100. Wei, G.; Xi, W.; Nussinov, R.; Ma, B. Protein Ensembles: How Does Nature Harness Thermodynamic Fluctuations for Life? The Diverse Functional Roles of Conformational Ensembles in the Cell. Chem. Rev. 2016, 116, 6516–6551, DOI: 10.1021/acs.chemrev.5b00562
-
101. Fan, H.; Mark, A. E. Relative Stability of Protein Structures Determined by X-Ray Crystallography or NMR Spectroscopy: A Molecular Dynamics Simulation Study. Proteins: Struct., Funct., Genet. 2003, 53, 111–120, DOI: 10.1002/prot.10496
-
102. Kuzmanic, A.; Pannu, N. S.; Zagrovic, B. X-Ray Refinement Significantly Underestimates the Level of Microscopic Heterogeneity in Biomolecular Crystals. Nat. Commun. 2014, 5, 3220, DOI: 10.1038/ncomms4220
-
103. Karshikoff, A.; Nilsson, L.; Ladenstein, R. Rigidity versus Flexibility: The Dilemma of Understanding Protein Thermal Stability. FEBS J. 2015, 282, 3899–3917, DOI: 10.1111/febs.13343
A review. The role of fluctuations in protein thermostability has recently received considerable attention. In the current literature a dualistic picture can be found as follows. On one hand, thermostability seems to be assocd. with enhanced rigidity of the protein scaffold in parallel with the redn. of flexible parts of the structure. However, in contrast with this argument it has been shown by exptl. studies and computer simulation that thermal tolerance of a protein is not necessarily correlated with the suppression of internal fluctuations and mobility. Both concepts - i.e., rigidity and flexibility - are derived from a mech. engineering perspective and represent temporally insensitive features describing static properties and neglect the notion that relative motion at certain time scales is possible in structurally stable regions of a protein. This suggests that a strict sepn. of rigid and flexible parts of a protein mol. does not correctly describe the reality of the situation. In this work the concepts of mobility/flexibility vs. rigidity will be critically reconsidered by taking into account mol. dynamics calcns. of heat capacity and conformational entropy, salt bridge networks, electrostatic interactions in folded and unfolded states, and the emerging picture of protein thermostability in view of recently developed network theories. Last, but not least, the influence of high temp. on the active site and activity of enzymes will be considered.
-
104. Der, B. S.; Kluwe, C.; Miklos, A. E.; Jacak, R.; Lyskov, S.; Gray, J. J.; Georgiou, G.; Ellington, A. D.; Kuhlman, B. Alternative Computational Protocols for Supercharging Protein Surfaces for Reversible Unfolding and Retention of Stability. PLoS One 2013, 8, e64363, DOI: 10.1371/journal.pone.0064363
Reengineering protein surfaces to exhibit high net charge, referred to as "supercharging", can improve reversibility of unfolding by preventing aggregation of partially unfolded states. Incorporation of charged side chains should be optimized while considering structural and energetic consequences, as numerous mutations and accumulation of like-charges can also destabilize the native state. A previously demonstrated approach deterministically mutates flexible polar residues (amino acids DERKNQ) with the fewest av. neighboring atoms per side chain atom (AvNAPSA). Our approach uses Rosetta-based energy calcns. to choose the surface mutations. Both protocols are available for use through the ROSIE web server. The automated Rosetta and AvNAPSA approaches for supercharging choose dissimilar mutations, raising an interesting division in surface charging strategy. Rosetta-supercharged variants of GFP (RscG) ranging from -11 to -61 and +7 to +58 were exptl. tested, and for comparison, we re-tested the previously developed AvNAPSA-supercharged variants of GFP (AscG) with +36 and -30 net charge. Mid-charge variants demonstrated ∼3-fold improvement in refolding with retention of stability. However, as we pushed to higher net charges, expression and sol. yield decreased, indicating that net charge or mutational load may be limiting factors. Interestingly, the two different approaches resulted in GFP variants with similar refolding properties. Our results show that there are multiple sets of residues that can be mutated to successfully supercharge a protein, and combining alternative supercharge protocols with exptl. testing can be an effective approach for charge-based improvement to refolding.
-
105. Chan, P.; Curtis, R. A.; Warwicker, J. Soluble Expression of Proteins Correlates with a Lack of Positively-Charged Surface. Sci. Rep. 2013, 3, 3333, DOI: 10.1038/srep03333
Prediction of protein solubility is gaining importance with the growing use of protein molecules as therapeutics, and ongoing requirements for high level expression. We have investigated protein surface features that correlate with insolubility. Non-polar surface patches associate to some degree with insolubility, but this is far exceeded by the association with positively-charged patches. Negatively-charged patches do not separate insoluble/soluble subsets. The separation of soluble and insoluble subsets by positive charge clustering (area under the curve for a ROC plot is 0.85) has a striking parallel with the separation that delineates nucleic acid-binding proteins, although most of the insoluble dataset are not known to bind nucleic acid. Additionally, these basic patches are enriched for arginine, relative to lysine. The results are discussed in the context of expression systems and downstream processing, contributing to a view of protein solubility in which the molecular interactions of charged groups are far from equivalent.
-
106. Rezaie, E.; Mohammadi, M.; Sakhteman, A.; Bemani, P.; Ahrari, S. Application of Molecular Dynamics Simulations To Design a Dual-Purpose Oligopeptide Linker Sequence for Fusion Proteins. J. Mol. Model. 2018, 24, 313, DOI: 10.1007/s00894-018-3846-x
Proteins are often monitored by combining a fluorescent polypeptide tag with the target protein. However, due to the high molecular weight and immunogenicity of such tags, they are not suitable choices for combining with fusion proteins such as immunotoxins. In this study, we designed a polypeptide sequence with a dual role (it acts as both a linker and a fluorescent probe) to use with fusion proteins. Two common fluorescent tag sequences based on tetracysteine were compared to a commonly used rigid linker as well as our proposed dual-purpose sequence. Computational investigations showed that the dual-purpose sequence was structurally stable and may be a good choice to use as both a linker and a fluorescence marker between two moieties in a fusion protein.
-
107. Folkman, L.; Stantic, B.; Sattar, A.; Zhou, Y. EASE-MM: Sequence-Based Prediction of Mutation-Induced Stability Changes with Feature-Based Multiple Models. J. Mol. Biol. 2016, 428, 1394–1405, DOI: 10.1016/j.jmb.2016.01.012
Protein engineering and characterization of non-synonymous single nucleotide variants (SNVs) require accurate prediction of protein stability changes (ΔΔGu) induced by single amino acid substitutions. Here, we have developed a new prediction method called Evolutionary, Amino acid, and Structural Encodings with Multiple Models (EASE-MM), which comprises five specialised support vector machine (SVM) models and makes the final prediction from a consensus of two models selected based on the predicted secondary structure and accessible surface area of the mutated residue. The new method is applicable to single-domain monomeric proteins and can predict ΔΔGu with a protein sequence and mutation as the only inputs. EASE-MM yielded a Pearson correlation coeff. of 0.53-0.59 in 10-fold cross-validation and independent testing and was able to outperform other sequence-based methods. When compared to structure-based energy functions, EASE-MM achieved a comparable or better performance. The application to a large dataset of human germline non-synonymous SNVs showed that the disease-causing variants tend to be assocd. with larger magnitudes of ΔΔGu predicted with EASE-MM. The EASE-MM web-server is available at http://sparks-lab.org/server/ease.
-
108. Teng, S.; Srivastava, A. K.; Wang, L. Sequence Feature-Based Prediction of Protein Stability Changes upon Amino Acid Substitutions. BMC Genomics 2010, 11 (Suppl 2), S5, DOI: 10.1186/1471-2164-11-S2-S5
-
109. Huang, L.-T.; Gromiha, M. M.; Ho, S.-Y. IPTREE-STAB: Interpretable Decision Tree Based Method for Predicting Protein Stability Changes upon Mutations. Bioinformatics 2007, 23, 1292–1293, DOI: 10.1093/bioinformatics/btm100
A web server, iPTREE-STAB, is developed for discriminating the stability of proteins (stabilizing or destabilizing) and predicting their stability changes (ΔΔG) upon single amino acid substitutions from amino acid sequence. The discrimination and prediction are mainly based on decision tree coupled with adaptive boosting algorithm, and classification and regression tree, resp., using three neighboring residues of the mutant site along N- and C-terminals. Our method showed an accuracy of 82% for discriminating the stabilizing and destabilizing mutants, and a correlation of 0.70 for predicting protein stability changes upon mutations.
-
110. Paladin, L.; Piovesan, D.; Tosatto, S. C. E. SODA: Prediction of Protein Solubility from Disorder and Aggregation Propensity. Nucleic Acids Res. 2017, 45, W236–W240, DOI: 10.1093/nar/gkx412
Soly. is an important, albeit not well understood, feature detg. protein behavior. It is of paramount importance in protein engineering, where similar folded proteins may behave in very different ways in soln. Here we present SODA, a novel method to predict the changes of protein soly. based on several physico-chem. properties of the protein. SODA uses the propensity of the protein sequence to aggregate as well as intrinsic disorder, plus hydrophobicity and secondary structure preferences to est. changes in soly. It has been trained and benchmarked on two different datasets. The comparison to other recently published methods shows that SODA has state-of-the-art performance and is particularly well suited to predict mutations decreasing soly. The method is fast, returning results for single mutations in seconds. A usage example estg. the full repertoire of mutations for a human germline antibody highlights several soly. hotspots on the surface.
-
111. Liaw, A.; Wiener, M. Classification and Regression by RandomForest. R News 2002, 2, 18–22
-
112. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32, DOI: 10.1023/A:1010933404324
-
113. Boughorbel, S.; Jarray, F.; El-Anbari, M. Optimal Classifier for Imbalanced Data Using Matthews Correlation Coefficient Metric. PLoS One 2017, 12, e0177678, DOI: 10.1371/journal.pone.0177678
Data imbalance is frequently encountered in biomedical applications. Resampling techniques can be used in binary classification to tackle this issue. However such solns. are not desired when the no. of samples in the small class is limited. Moreover the use of inadequate performance metrics, such as accuracy, leads to poor generalization results because the classifiers tend to predict the largest size class. One of the good approaches to deal with this issue is to optimize performance metrics that are designed to handle data imbalance. Matthews Correlation Coeff. (MCC) is widely used in Bioinformatics as a performance metric. We are interested in developing a new classifier based on the MCC metric to handle imbalanced data. We derive an optimal Bayes classifier for the MCC metric using an approach based on Frechet deriv. We show that the proposed algorithm has the nice theor. property of consistency. Using simulated data, we verify the correctness of our optimality result by searching in the space of all possible binary classifiers. The proposed classifier is evaluated on 64 datasets from a wide range of data imbalance. We compare both classification performance and CPU efficiency for three classifiers: 1) the proposed algorithm (MCC-classifier), 2) the Bayes classifier with a default threshold (MCC-base) and 3) imbalanced SVM (SVM-imba). The exptl. evaluation shows that MCC-classifier has a close performance to SVM-imba while being simpler and more efficient.
-
114. Ling, C. X.; Sheng, V. S. Cost-Sensitive Learning and the Class Imbalance Problem. In Encyclopedia of Machine Learning; Sammut, C., Ed.; Springer: New York, 2007.
-
115. Rao, R.; Fung, G.; Rosales, R. On the Dangers of Cross-Validation. An Experimental Evaluation. In Proceedings of the 2008 SIAM International Conference on Data Mining; Society for Industrial and Applied Mathematics: Philadelphia, PA, 2008; pp 588–596.
-
116. Stephens, Z. D.; Lee, S. Y.; Faghri, F.; Campbell, R. H.; Zhai, C.; Efron, M. J.; Iyer, R.; Schatz, M. C.; Sinha, S.; Robinson, G. E. Big Data: Astronomical or Genomical? PLoS Biol. 2015, 13, e1002195, DOI: 10.1371/journal.pbio.1002195
Genomics is a Big Data science and is going to get much bigger, very soon, but it is not known whether the needs of genomics will exceed other Big Data domains. Projecting to the year 2025, we compared genomics with three other major generators of Big Data: astronomy, YouTube, and Twitter. Our ests. show that genomics is a "four-headed beast"-it is either on par with or the most demanding of the domains analyzed here in terms of data acquisition, storage, distribution, and anal. We discuss aspects of new technologies that will need to be developed to rise up and meet the computational challenges that genomics poses for the near future. Now is the time for concerted, community-wide planning for the "genomical" challenges of the next decade.
-
117. Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic Local Alignment Search Tool. J. Mol. Biol. 1990, 215, 403–410, DOI: 10.1016/S0022-2836(05)80360-2
A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent math. results on the stochastic properties of MSP scores allow an anal. of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a no. of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the anal. of multiple regions of similarity in long DNA sequences. In addn. to its flexibility and tractability to math. anal., BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
-
118. Altschul, S. F.; Madden, T. L.; Schäffer, A. A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D. J. Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs. Nucleic Acids Res. 1997, 25, 3389–3402, DOI: 10.1093/nar/25.17.3389
The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approx. three times the speed of the original. In addn., a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approx. the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biol. relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily. The source code for the new BLAST programs is available by anonymous ftp from the machine ncbi.nlm.nih.gov, within the directory 'blast', and the programs may be run from NCBI's web site at http://www.ncbi.nlm.nih.gov/.
-
119. Eddy, S. R. Profile Hidden Markov Models. Bioinformatics 1998, 14, 755–763, DOI: 10.1093/bioinformatics/14.9.755
A review with many refs. The recent literature on profile hidden Markov model (profile HMM) methods and software is reviewed. Profile HMMs turn a multiple sequence alignment into a position-specific scoring system suitable for searching databases for remotely homologous sequences. Profile HMM analyses complement std. pairwise comparison methods for large-scale sequence anal. Several software implementations and two large libraries of profile HMMs of common protein domains are available. HMM methods performed comparably to threading methods in the CASP2 structure prediction exercise.
-
120. Remmert, M.; Biegert, A.; Hauser, A.; Söding, J. HHblits: Lightning-Fast Iterative Protein Sequence Searching by HMM–HMM Alignment. Nat. Methods 2012, 9, 173–175, DOI: 10.1038/nmeth.1818
Sequence-based protein function and structure prediction depends crucially on sequence-search sensitivity and accuracy of the resulting sequence alignments. We present an open-source, general-purpose tool that represents both query and database sequences by profile hidden Markov models (HMMs): 'HMM-HMM-based lightning-fast iterative sequence search' (HHblits; http://toolkit.genzentrum.lmu.de/hhblits/). Compared to the sequence-search tool PSI-BLAST, HHblits is faster owing to its discretized-profile prefilter, has 50-100% higher sensitivity and generates more accurate alignments.
-
121. Pearson, W. R. An Introduction to Sequence Similarity (“Homology”) Searching. Curr. Protoc. Bioinf. 2013, 42, 3.1.1–3.1.8, DOI: 10.1002/0471250953.bi0301s42
-
122. Rost, B. Twilight Zone of Protein Sequence Alignments. Protein Eng., Des. Sel. 1999, 12, 85–94, DOI: 10.1093/protein/12.2.85
-
123. Fletcher, W.; Yang, Z. The Effect of Insertions, Deletions, and Alignment Errors on the Branch-Site Test of Positive Selection. Mol. Biol. Evol. 2010, 27, 2257–2267, DOI: 10.1093/molbev/msq115
The detection of pos. Darwinian selection affecting protein-coding genes remains a topic of great interest and importance. The "branch-site" test is designed to detect localized episodic bouts of pos. selection that affect only a few amino acid residues on particular lineages and has been shown to have reasonable power and low false-pos. rates for a wide range of selection schemes. Previous simulations examg. the performance of the test, however, were conducted under idealized conditions without insertions, deletions, or alignment errors. As the test is sometimes used to analyze divergent sequences, the impact of indels and alignment errors is a major concern. Here, we used a recently developed indel-simulation program to examine the false-pos. rate and power of the branch-site test. We find that insertions and deletions do not cause excessive false positives if the alignment is correct, but alignment errors can lead to unacceptably high false positives. Of the alignment methods evaluated, PRANK consistently outperformed MUSCLE, MAFFT, and ClustalW, mostly because the latter programs tend to place nonhomologous codons (or amino acids) into the same column, producing shorter and less accurate alignments and giving the false impression that many amino acid substitutions have occurred at those sites. Our examn. of two previous studies suggests that alignment errors may impact the anal. of mammalian and vertebrate genes by the branch-site test, and it is important to use reliable alignment methods.
-
124. Vialle, R. A.; Tamuri, A. U.; Goldman, N. Alignment Modulates Ancestral Sequence Reconstruction Accuracy. Mol. Biol. Evol. 2018, 35, 1783–1797, DOI: 10.1093/molbev/msy055
It relies on multiple sequence alignment (MSA) which may introduce biases, and it remains unknown how MSA methodol. approaches impact ancestral sequence reconstruction (ASR). Here, we investigate how MSA methodol. modulates ASR using a simulation study of various evolutionary scenarios. We evaluate the accuracy of ancestral protein sequence reconstruction for simulated data and compare reconstruction outcomes using different alignment methods. Our results reveal biases introduced not only by aligner algorithms and assumptions, but also tree topol. and the rate of insertions and deletions. Under many conditions we find no substantial differences between the MSAs. However, increasing the difficulty for the aligners can significantly impact ASR. The MAFFT consistency aligners and PRANK variants exhibit the best performance, whereas FSA displays limited performance. We also discover a bias towards reconstructed sequences longer than the true ancestors, deriving from a preference for inferring insertions, in almost all MSA methodol. approaches. In addn., we find measures of MSA quality generally correlate highly with reconstruction accuracy. Thus, we show MSA methodol. differences can affect the quality of reconstructions and propose MSA methods should be selected with care to accurately det. ancestral states with confidence.
-
125. Chowdhury, B.; Garai, G. A Review on Multiple Sequence Alignment from the Perspective of Genetic Algorithm. Genomics 2017, 109, 419–431, DOI: 10.1016/j.ygeno.2017.06.007
A review. Sequence alignment is an active research area in the field of bioinformatics. It is also a crucial task as it guides many other tasks like phylogenetic anal., function, and/or structure prediction of biol. macromols. like DNA, RNA, and Protein. Proteins are the building blocks of every living organism. Although protein alignment problem has been studied for several decades, unfortunately, every available method produces alignment results differently for a single alignment problem. Multiple sequence alignment is characterized as a very high computational complex problem. Many stochastic methods, therefore, are considered for improving the accuracy of alignment. Among them, many researchers frequently use Genetic Algorithm. In this study, we have shown different types of the method applied in alignment and the recent trends in the multiobjective genetic algorithm for solving multiple sequence alignment. Many recent studies have demonstrated considerable progress in finding the alignment accuracy.
-
126. Taly, J.-F.; Magis, C.; Bussotti, G.; Chang, J.-M.; Di Tommaso, P.; Erb, I.; Espinosa-Carrasco, J.; Kemena, C.; Notredame, C. Using the T-Coffee Package to Build Multiple Sequence Alignments of Protein, RNA, DNA Sequences and 3D Structures. Nat. Protoc. 2011, 6, 1669–1682, DOI: 10.1038/nprot.2011.393
T-Coffee (Tree-based consistency objective function for alignment evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biol. sequences. The main strength of T-Coffee is its ability to combine third party aligners and to integrate structural (or homol.) information when building MSAs. The series of protocols presented here show how the package can be used to multiply align proteins, RNA and DNA sequences. The protein section shows how users can select the most suitable T-Coffee mode for their data set. Detailed protocols include T-Coffee, the default mode, M-Coffee, a meta version able to combine several third party aligners into one, PSI (position-specific iterated)-Coffee, the homol. extended mode suitable for remote homologs and Expresso, the structure-based multiple aligner. We then also show how the T-RMSD (tree based on root mean square deviation) option can be used to produce a functionally informative structure-based clustering. RNA alignment procedures are described for using R-Coffee, a mode able to use predicted RNA secondary structures when aligning RNA sequences. DNA alignments are illustrated with Pro-Coffee, a multiple aligner specific of promoter regions. We also present some of the many reformatting utilities bundled with T-Coffee. The package is an open-source freeware available from http://www.tcoffee.org/.
-
127. Pei, J.; Grishin, N. V. PROMALS3D: Multiple Protein Sequence Alignment Enhanced with Evolutionary and Three-Dimensional Structural Information. Methods Mol. Biol. 2014, 1079, 263–271, DOI: 10.1007/978-1-62703-646-7_17
Multiple sequence alignment (MSA) is an essential tool with many applications in bioinformatics and computational biology. Accurate MSA construction for divergent proteins remains a difficult computational task. The constantly increasing protein sequences and structures in public databases could be used to improve alignment quality. PROMALS3D is a tool for protein MSA construction enhanced with additional evolutionary and structural information from database searches. PROMALS3D automatically identifies homologs from sequence and structure databases for input proteins, derives structure-based constraints from alignments of three-dimensional structures, and combines them with sequence-based constraints of profile-profile alignments in a consistency-based framework to construct high-quality multiple sequence alignments. PROMALS3D output is a consensus alignment enriched with sequence and structural information about input proteins and their homologs. PROMALS3D Web server and package are available at http://prodata.swmed.edu/PROMALS3D.
-
128. Steipe, B.; Schiller, B.; Plückthun, A.; Steinbacher, S. Sequence Statistics Reliably Predict Stabilizing Mutations in a Protein Domain. J. Mol. Biol. 1994, 240, 188–192, DOI: 10.1006/jmbi.1994.1434
-
129. Sullivan, B. J.; Nguyen, T.; Durani, V.; Mathur, D.; Rojas, S.; Thomas, M.; Syu, T.; Magliery, T. J. Stabilizing Proteins from Sequence Statistics: The Interplay of Conservation and Correlation in Triosephosphate Isomerase Stability. J. Mol. Biol. 2012, 420, 384–399, DOI: 10.1016/j.jmb.2012.04.025
-
130. Lehmann, M.; Kostrewa, D.; Wyss, M.; Brugger, R.; D’Arcy, A.; Pasamontes, L.; van Loon, A. P. From DNA Sequence to Improved Functionality: Using Protein Sequence Comparisons to Rapidly Design a Thermostable Consensus Phytase. Protein Eng., Des. Sel. 2000, 13, 49–57, DOI: 10.1093/protein/13.1.49
-
131. Magliery, T. J. Protein Stability: Computation, Sequence Statistics, and New Experimental Methods. Curr. Opin. Struct. Biol. 2015, 33, 161–168, DOI: 10.1016/j.sbi.2015.09.002
-
132. Porebski, B. T.; Buckle, A. M. Consensus Protein Design. Protein Eng., Des. Sel. 2016, 29, 245–251, DOI: 10.1093/protein/gzw015
-
133. Jäckel, C.; Bloom, J. D.; Kast, P.; Arnold, F. H.; Hilvert, D. Consensus Protein Design without Phylogenetic Bias. J. Mol. Biol. 2010, 399, 541–546, DOI: 10.1016/j.jmb.2010.04.039
-
134. Goyal, V. D.; Magliery, T. J. Phylogenetic Spread of Sequence Data Affects Fitness of SOD1 Consensus Enzymes: Insights from Sequence Statistics and Structural Analyses. Proteins: Struct., Funct., Genet. 2018, 86, 609–620, DOI: 10.1002/prot.25486
-
135. Vázquez-Figueroa, E.; Chaparro-Riggers, J.; Bommarius, A. S. Development of a Thermostable Glucose Dehydrogenase by a Structure-Guided Consensus Concept. ChemBioChem 2007, 8, 2295–2301, DOI: 10.1002/cbic.200700500
-
136. Parthasarathy, S.; Murthy, M. R. Protein Thermal Stability: Insights from Atomic Displacement Parameters (B Values). Protein Eng., Des. Sel. 2000, 13, 9–13, DOI: 10.1093/protein/13.1.9
-
137. Cole, M. F.; Gaucher, E. A. Exploiting Models of Molecular Evolution to Efficiently Direct Protein Engineering. J. Mol. Evol. 2011, 72, 193–203, DOI: 10.1007/s00239-010-9415-2
-
138. Hochberg, G. K. A.; Thornton, J. W. Reconstructing Ancient Proteins to Understand the Causes of Structure and Function. Annu. Rev. Biophys. 2017, 46, 247–269, DOI: 10.1146/annurev-biophys-070816-033631
-
139. Aerts, D.; Verhaeghe, T.; Joosten, H.-J.; Vriend, G.; Soetaert, W.; Desmet, T. Consensus Engineering of Sucrose Phosphorylase: The Outcome Reflects the Sequence Input. Biotechnol. Bioeng. 2013, 110, 2563–2572, DOI: 10.1002/bit.24940
-
140. Trudeau, D. L.; Kaltenbach, M.; Tawfik, D. S. On the Potential Origins of the High Stability of Reconstructed Ancestral Proteins. Mol. Biol. Evol. 2016, 33, 2633–2641, DOI: 10.1093/molbev/msw138
-
141. Wheeler, L. C.; Lim, S. A.; Marqusee, S.; Harms, M. J. The Thermostability and Specificity of Ancient Proteins. Curr. Opin. Struct. Biol. 2016, 38, 37–43, DOI: 10.1016/j.sbi.2016.05.015
-
142. Yang, Z. PAML: A Program Package for Phylogenetic Analysis by Maximum Likelihood. Bioinformatics 1997, 13, 555–556, DOI: 10.1093/bioinformatics/13.5.555
-
143. Stamatakis, A. RAxML-VI-HPC: Maximum Likelihood-Based Phylogenetic Analyses with Thousands of Taxa and Mixed Models. Bioinformatics 2006, 22, 2688–2690, DOI: 10.1093/bioinformatics/btl446
-
144. Huelsenbeck, J. P.; Ronquist, F.; Nielsen, R.; Bollback, J. P. Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology. Science 2001, 294, 2310–2314, DOI: 10.1126/science.1065889
-
145. Goldstein, R. A.; Pollard, S. T.; Shah, S. D.; Pollock, D. D. Nonadaptive Amino Acid Convergence Rates Decrease over Time. Mol. Biol. Evol. 2015, 32, 1373–1381, DOI: 10.1093/molbev/msv041
-
146. Williams, P. D.; Pollock, D. D.; Blackburne, B. P.; Goldstein, R. A. Assessing the Accuracy of Ancestral Protein Reconstruction Methods. PLoS Comput. Biol. 2006, 2, e69, DOI: 10.1371/journal.pcbi.0020069
-
147. Eick, G. N.; Bridgham, J. T.; Anderson, D. P.; Harms, M. J.; Thornton, J. W. Robustness of Reconstructed Ancestral Protein Functions to Statistical Uncertainty. Mol. Biol. Evol. 2016, 34, 247–261, DOI: 10.1093/molbev/msw223
-
148. Gaucher, E. A.; Govindarajan, S.; Ganesh, O. K. Palaeotemperature Trend for Precambrian Life Inferred from Resurrected Proteins. Nature 2008, 451, 704–707, DOI: 10.1038/nature06510
-
149. Akanuma, S. Characterization of Reconstructed Ancestral Proteins Suggests a Change in Temperature of the Ancient Biosphere. Life (Basel, Switz.) 2017, 7, 33, DOI: 10.3390/life7030033
-
150. Gumulya, Y.; Baek, J.-M.; Wun, S.-J.; Thomson, R. E. S.; Harris, K. L.; Hunter, D. J. B.; Behrendorff, J. B. Y. H.; Kulig, J.; Zheng, S.; Wu, X.; Wu, B.; Stok, J. E.; De Voss, J. J.; Schenk, G.; Jurva, U.; Andersson, S.; Isin, E. M.; Bodén, M.; Guddat, L.; Gillam, E. M. J. Engineering Highly Functional Thermostable Proteins Using Ancestral Sequence Reconstruction. Nat. Catal. 2018, 1, 878, DOI: 10.1038/s41929-018-0159-5
-
151. Dehouck, Y.; Grosfils, A.; Folch, B.; Gilis, D.; Bogaerts, P.; Rooman, M. Fast and Accurate Predictions of Protein Stability Changes upon Mutations Using Statistical Potentials and Neural Networks: PoPMuSiC-2.0. Bioinformatics 2009, 25, 2537–2543, DOI: 10.1093/bioinformatics/btp445
-
152. Khatun, J.; Khare, S. D.; Dokholyan, N. V. Can Contact Potentials Reliably Predict Stability of Proteins? J. Mol. Biol. 2004, 336, 1223–1238, DOI: 10.1016/j.jmb.2004.01.002
-
153. Pucci, F.; Bernaerts, K. V.; Kwasigroch, J. M.; Rooman, M. Quantification of Biases in Predictions of Protein Stability Changes upon Mutations. Bioinformatics 2018, 34, 3659–3665, DOI: 10.1093/bioinformatics/bty348
-
154. Yin, S.; Ding, F.; Dokholyan, N. V. Eris: An Automated Estimator of Protein Stability. Nat. Methods 2007, 4, 466–467, DOI: 10.1038/nmeth0607-466
-
155. Benedix, A.; Becker, C. M.; de Groot, B. L.; Caflisch, A.; Böckmann, R. A. Predicting Free Energy Changes Using Structural Ensembles. Nat. Methods 2009, 6, 3–4, DOI: 10.1038/nmeth0109-3
-
156. Pronk, S.; Páll, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M. R.; Smith, J. C.; Kasson, P. M.; van der Spoel, D.; Hess, B.; Lindahl, E. GROMACS 4.5: A High-Throughput and Highly Parallel Open Source Molecular Simulation Toolkit. Bioinformatics 2013, 29, 845–854, DOI: 10.1093/bioinformatics/btt055
-
157. de Groot, B. L.; van Aalten, D. M.; Scheek, R. M.; Amadei, A.; Vriend, G.; Berendsen, H. J. C. Prediction of Protein Conformational Freedom from Distance Constraints. Proteins: Struct., Funct., Genet. 1997, 29, 240–251, DOI: 10.1002/(SICI)1097-0134(199710)29:2<240::AID-PROT11>3.0.CO;2-O
-
158. Hoppe, C.; Schomburg, D. Prediction of Protein Thermostability with a Direction- and Distance-Dependent Knowledge-Based Potential. Protein Sci. 2005, 14, 2682–2692, DOI: 10.1110/ps.04940705
-
159. Pucci, F.; Bourgeas, R.; Rooman, M. Predicting Protein Thermal Stability Changes upon Point Mutations Using Statistical Potentials: Introducing HoTMuSiC. Sci. Rep. 2016, 6, 23257, DOI: 10.1038/srep23257
-
160. Capriotti, E.; Fariselli, P.; Casadio, R. I-Mutant2.0: Predicting Stability Changes upon Mutation from the Protein Sequence or Structure. Nucleic Acids Res. 2005, 33, W306–W310, DOI: 10.1093/nar/gki375
-
161. Cheng, J.; Randall, A.; Baldi, P. Prediction of Protein Stability Changes for Single-Site Mutations Using Support Vector Machines. Proteins: Struct., Funct., Genet. 2006, 62, 1125–1132, DOI: 10.1002/prot.20810
-
162. Wainreb, G.; Wolf, L.; Ashkenazy, H.; Dehouck, Y.; Ben-Tal, N. Protein Stability: A Single Recorded Mutation Aids in Predicting the Effects of Other Mutations in the Same Amino Acid Site. Bioinformatics 2011, 27, 3286–3292, DOI: 10.1093/bioinformatics/btr576
-
163. Li, Y.; Fang, J. PROTS-RF: A Robust Model for Predicting Mutation-Induced Protein Stability Changes. PLoS One 2012, 7, e47247, DOI: 10.1371/journal.pone.0047247
-
164. Quang, D.; Chen, Y.; Xie, X. DANN: A Deep Learning Approach for Annotating the Pathogenicity of Genetic Variants. Bioinformatics 2015, 31, 761–763, DOI: 10.1093/bioinformatics/btu703
-
165. Wang, Y.; Mao, H.; Yi, Z. Protein Secondary Structure Prediction by Using Deep Learning Method. Knowl.-Based Syst. 2017, 118, 115–123, DOI: 10.1016/j.knosys.2016.11.015
-
166. Ivakhnenko, A. G. Polynomial Theory of Complex Systems. IEEE Trans. Syst., Man, Cybern. 1971, SMC-1, 364–378, DOI: 10.1109/TSMC.1971.4308320
-
167. Bengio, Y.; Boulanger-Lewandowski, N.; Pascanu, R. Advances in Optimizing Recurrent Networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing; IEEE: New York, 2013; pp 8624–8628.
-
168. Cang, Z.; Wei, G.-W. TopologyNet: Topology Based Deep Convolutional and Multi-Task Neural Networks for Biomolecular Property Predictions. PLoS Comput. Biol. 2017, 13, e1005690, DOI: 10.1371/journal.pcbi.1005690
-
169. Laimer, J.; Hofer, H.; Fritz, M.; Wegenkittl, S.; Lackner, P. MAESTRO - Multi Agent Stability Prediction upon Point Mutations. BMC Bioinf. 2015, 16, 116, DOI: 10.1186/s12859-015-0548-6
-
170. Khan, S.; Vihinen, M. Performance of Protein Stability Predictors. Hum. Mutat. 2010, 31, 675–684, DOI: 10.1002/humu.21242
-
171. Usmanova, D. R.; Bogatyreva, N. S.; Ariño Bernad, J.; Eremina, A. A.; Gorshkova, A. A.; Kanevskiy, G. M.; Lonishin, L. R.; Meister, A. V.; Yakupova, A. G.; Kondrashov, F. A.; Ivankov, D. N. Self-Consistency Test Reveals Systematic Bias in Programs for Prediction Change of Stability upon Mutation. Bioinformatics 2018, 34, 3653–3658, DOI: 10.1093/bioinformatics/bty340
-
172. Montanucci, L.; Martelli, P. L.; Ben-Tal, N.; Fariselli, P. A Natural Upper Bound to the Accuracy of Predicting Protein Stability Changes upon Mutations. 2018, arXiv:1809.10389 [q-bio.BM]. arXiv.org e-Print archive. https://arxiv.org/abs/1809.10389.
-
173. Rice, P.; Longden, I.; Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 2000, 16, 276–277, DOI: 10.1016/S0168-9525(00)02024-2
-
174. Lu, G.; Moriyama, E. N. Vector NTI, a Balanced All-in-One Sequence Analysis Suite. Briefings Bioinf. 2004, 5, 378–388, DOI: 10.1093/bib/5.4.378
-
175. Bendl, J.; Stourac, J.; Sebestova, E.; Vavra, O.; Musil, M.; Brezovsky, J.; Damborsky, J. HotSpot Wizard 2.0: Automated Design of Site-Specific Mutations and Smart Libraries in Protein Engineering. Nucleic Acids Res. 2016, 44, W479–W487, DOI: 10.1093/nar/gkw416
-
176. Stamatakis, A. RAxML Version 8: A Tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies. Bioinformatics 2014, 30, 1312–1313, DOI: 10.1093/bioinformatics/btu033
-
177. Ashkenazy, H.; Penn, O.; Doron-Faigenboim, A.; Cohen, O.; Cannarozzi, G.; Zomer, O.; Pupko, T. FastML: A Web Server for Probabilistic Reconstruction of Ancestral Sequences. Nucleic Acids Res. 2012, 40, W580–W584, DOI: 10.1093/nar/gks498
-
178. Diallo, A. B.; Makarenkov, V.; Blanchette, M. Ancestors 1.0: A Web Server for Ancestral Sequence Reconstruction. Bioinformatics 2010, 26, 130–131, DOI: 10.1093/bioinformatics/btp600
-
179. Westesson, O.; Barquist, L.; Holmes, I. HandAlign: Bayesian Multiple Sequence Alignment, Phylogeny and Ancestral Reconstruction. Bioinformatics 2012, 28, 1170–1171, DOI: 10.1093/bioinformatics/bts058
-
180. Ronquist, F.; Teslenko, M.; van der Mark, P.; Ayres, D. L.; Darling, A.; Höhna, S.; Larget, B.; Liu, L.; Suchard, M. A.; Huelsenbeck, J. P. MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice across a Large Model Space. Syst. Biol. 2012, 61, 539–542, DOI: 10.1093/sysbio/sys029
-
181. Finn, R. D.; Clements, J.; Eddy, S. R. HMMER Web Server: Interactive Sequence Similarity Searching. Nucleic Acids Res. 2011, 39, W29–W37, DOI: 10.1093/nar/gkr367
-
182. Altschul, S. F.; Gertz, E. M.; Agarwala, R.; Schäffer, A. A.; Yu, Y.-K. PSI-BLAST Pseudocounts and the Minimum Description Length Principle. Nucleic Acids Res. 2009, 37, 815–824, DOI: 10.1093/nar/gkn981
-
183. Whitehead, T. A.; Chevalier, A.; Song, Y.; Dreyfus, C.; Fleishman, S. J.; De Mattos, C.; Myers, C. A.; Kamisetty, H.; Blair, P.; Wilson, I. A.; Baker, D. Optimization of Affinity, Specificity and Function of Designed Influenza Inhibitors Using Deep Sequencing. Nat. Biotechnol. 2012, 30, 543–548, DOI: 10.1038/nbt.2214
-
184. Shimizu, Y.; Inoue, A.; Tomari, Y.; Suzuki, T.; Yokogawa, T.; Nishikawa, K.; Ueda, T. Cell-Free Translation Reconstituted with Purified Components. Nat. Biotechnol. 2001, 19, 751–755, DOI: 10.1038/90802
-
185. Niwa, T.; Kanamori, T.; Ueda, T.; Taguchi, H. Global Analysis of Chaperone Effects Using a Reconstituted Cell-Free Translation System. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 8937–8942, DOI: 10.1073/pnas.1201380109
-
186. Berman, H. M.; Gabanyi, M. J.; Kouranov, A.; Micallef, D. I.; Westbrook, J. Protein Structure Initiative - TargetTrack 2000–2017 - All Data Files. DOI: 10.5281/zenodo.821654.
-
187. Price, W. N.; Handelman, S. K.; Everett, J. K.; Tong, S. N.; Bracic, A.; Luff, J. D.; Naumov, V.; Acton, T.; Manor, P.; Xiao, R.; Rost, B.; Montelione, G. T.; Hunt, J. F. Large-Scale Experimental Studies Show Unexpected Amino Acid Effects on Protein Expression and Solubility in Vivo in E. coli. Microb. Inf. Exp. 2011, 1, 6, DOI: 10.1186/2042-5783-1-6
-
188. Hirose, S.; Kawamura, Y.; Yokota, K.; Kuroita, T.; Natsume, T.; Komiya, K.; Tsutsumi, T.; Suwa, Y.; Isogai, T.; Goshima, N.; Noguchi, T. Statistical Analysis of Features Associated with Protein Expression/Solubility in an in Vivo Escherichia coli Expression System and a Wheat Germ Cell-Free Expression System. J. Biochem. 2011, 150, 73–81, DOI: 10.1093/jb/mvr042
-
189. Pawlicki, S.; Le Béchec, A.; Delamarche, C. AMYPdb: A Database Dedicated to Amyloid Precursor Proteins. BMC Bioinf. 2008, 9, 273, DOI: 10.1186/1471-2105-9-273
-
190. Thompson, M. J.; Sievers, S. A.; Karanicolas, J.; Ivanova, M. I.; Baker, D.; Eisenberg, D. The 3D Profile Method for Identifying Fibril-Forming Segments of Proteins. Proc. Natl. Acad. Sci. U. S. A. 2006, 103, 4074–4078, DOI: 10.1073/pnas.0511295103
-
191. Beerten, J.; Van Durme, J.; Gallardo, R.; Capriotti, E.; Serpell, L.; Rousseau, F.; Schymkowitz, J. WALTZ-DB: A Benchmark Database of Amyloidogenic Hexapeptides. Bioinformatics 2015, 31, 1698–1700, DOI: 10.1093/bioinformatics/btv027
-
192. Wozniak, P. P.; Kotulska, M. AmyLoad: Website Dedicated to Amyloidogenic Protein Fragments. Bioinformatics 2015, 31, 3395–3397, DOI: 10.1093/bioinformatics/btv375
-
193. Sastry, A.; Monk, J.; Tegel, H.; Uhlen, M.; Palsson, B. O.; Rockberg, J.; Brunk, E. Machine Learning in Computational Biology to Accelerate High-Throughput Protein Expression. Bioinformatics 2017, 33, 2487–2495, DOI: 10.1093/bioinformatics/btx207
-
194. Thangakani, A. M.; Nagarajan, R.; Kumar, S.; Sakthivel, R.; Velmurugan, D.; Gromiha, M. M. CPAD, Curated Protein Aggregation Database: A Repository of Manually Curated Experimental Data on Protein and Peptide Aggregation. PLoS One 2016, 11, e0152949, DOI: 10.1371/journal.pone.0152949
-
195. Tian, Y.; Deutsch, C.; Krishnamoorthy, B. Scoring Function To Predict Solubility Mutagenesis. Algorithms Mol. Biol. 2010, 5, 33, DOI: 10.1186/1748-7188-5-33
-
196. Wilkinson, D. L.; Harrison, R. G. Predicting the Solubility of Recombinant Proteins in Escherichia coli. Nat. Biotechnol. 1991, 9, 443–448, DOI: 10.1038/nbt0591-443
-
197. Davis, G. D.; Elisee, C.; Newham, D. M.; Harrison, R. G. New Fusion Protein Systems Designed to Give Soluble Expression in Escherichia coli. Biotechnol. Bioeng. 1999, 65, 382–388, DOI: 10.1002/(SICI)1097-0290(19991120)65:4<382::AID-BIT2>3.0.CO;2-I
-
198. Magnan, C. N.; Randall, A.; Baldi, P. SOLpro: Accurate Sequence-Based Prediction of Protein Solubility. Bioinformatics 2009, 25, 2200–2207, DOI: 10.1093/bioinformatics/btp386
-
199. Smialowski, P.; Doose, G.; Torkler, P.; Kaufmann, S.; Frishman, D. PROSO II – A New Method for Protein Solubility Prediction. FEBS J. 2012, 279, 2192–2200, DOI: 10.1111/j.1742-4658.2012.08603.x
-
200. Agostini, F.; Cirillo, D.; Livi, C. M.; Delli Ponti, R.; Tartaglia, G. G. CcSOL Omics: A Webserver for Solubility Prediction of Endogenous and Heterologous Expression in Escherichia coli. Bioinformatics 2014, 30, 2975–2977, DOI: 10.1093/bioinformatics/btu420
-
201. Khurana, S.; Rawi, R.; Kunji, K.; Chuang, G.-Y.; Bensmail, H.; Mall, R. DeepSol: A Deep Learning Framework for Sequence-Based Protein Solubility Prediction. Bioinformatics 2018, 34, 2605–2613, DOI: 10.1093/bioinformatics/bty166
-
202. Chang, C. C. H.; Li, C.; Webb, G. I.; Tey, B.; Song, J.; Ramanan, R. N. Periscope: Quantitative Prediction of Soluble Protein Expression in the Periplasm of Escherichia coli. Sci. Rep. 2016, 6, 21844, DOI: 10.1038/srep21844
-
203. Hirose, S.; Noguchi, T. ESPRESSO: A System for Estimating Protein Expression and Solubility in Protein Expression Systems. Proteomics 2013, 13, 1444–1456, DOI: 10.1002/pmic.201200175
-
204. Hon, J.; Marusiak, M.; Martinek, T.; Zendulka, J.; Bednar, D.; Damborsky, J. SoluProt: Prediction of Protein Solubility. Nucleic Acids Res. 2018, in preparation.
-
205. DuBay, K. F.; Pawar, A. P.; Chiti, F.; Zurdo, J.; Dobson, C. M.; Vendruscolo, M. Prediction of the Absolute Aggregation Rates of Amyloidogenic Polypeptide Chains. J. Mol. Biol. 2004, 341, 1317–1326, DOI: 10.1016/j.jmb.2004.06.043
-
206. Tartaglia, G. G.; Pawar, A. P.; Campioni, S.; Dobson, C. M.; Chiti, F.; Vendruscolo, M. Prediction of Aggregation-Prone Regions in Structured Proteins. J. Mol. Biol. 2008, 380, 425–436, DOI: 10.1016/j.jmb.2008.05.013
-
207. Conchillo-Solé, O.; de Groot, N. S.; Avilés, F. X.; Vendrell, J.; Daura, X.; Ventura, S. AGGRESCAN: A Server for the Prediction and Evaluation of “Hot Spots” of Aggregation in Polypeptides. BMC Bioinf. 2007, 8, 65, DOI: 10.1186/1471-2105-8-65
-
208. Fernandez-Escamilla, A.-M.; Rousseau, F.; Schymkowitz, J.; Serrano, L. Prediction of Sequence-Dependent and Mutational Effects on the Aggregation of Peptides and Proteins. Nat. Biotechnol. 2004, 22, 1302–1306, DOI: 10.1038/nbt1012
-
209. Maurer-Stroh, S.; Debulpaep, M.; Kuemmerer, N.; Lopez de la Paz, M.; Martins, I. C.; Reumers, J.; Morris, K. L.; Copland, A.; Serpell, L.; Serrano, L.; Schymkowitz, J. W. H.; Rousseau, F. Exploring the Sequence Determinants of Amyloid Structure Using Position-Specific Scoring Matrices. Nat. Methods 2010, 7, 237–242, DOI: 10.1038/nmeth.1432
-
210. Walsh, I.; Seno, F.; Tosatto, S. C. E.; Trovato, A. PASTA 2.0: An Improved Server for Protein Aggregation Prediction. Nucleic Acids Res. 2014, 42, W301–W307, DOI: 10.1093/nar/gku399
-
211. Bryan, A. W.; Menke, M.; Cowen, L. J.; Lindquist, S. L.; Berger, B. BETASCAN: Probable Beta-Amyloids Identified by Pairwise Probabilistic Analysis. PLoS Comput. Biol. 2009, 5, e1000333, DOI: 10.1371/journal.pcbi.1000333
-
212. Garbuzynskiy, S. O.; Lobanov, M. Y.; Galzitskaya, O. V. FoldAmyloid: A Method of Prediction of Amyloidogenic Regions from Protein Sequence. Bioinformatics 2010, 26, 326–332, DOI: 10.1093/bioinformatics/btp691
-
213. Goldschmidt, L.; Teng, P. K.; Riek, R.; Eisenberg, D. Identifying the Amylome, Proteins Capable of Forming Amyloid-like Fibrils. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 3487–3492, DOI: 10.1073/pnas.0915166107
-
214. Ahmed, A. B.; Znassi, N.; Château, M.-T.; Kajava, A. V. A Structure-Based Approach to Predict Predisposition to Amyloidosis. Alzheimer’s Dementia 2015, 11, 681–690, DOI: 10.1016/j.jalz.2014.06.007
-
215. Krogh, A.; Vedelsby, J. Neural Network Ensembles, Cross Validation and Active Learning. In Proceedings of the 7th International Conference on Neural Information Processing Systems (NIPS’94); MIT Press: Cambridge, MA, 1994; pp 231–238.
-
216. Maclin, R.; Opitz, D. Popular Ensemble Methods: An Empirical Study. J. Artif. Intell. Res. 1999, 11, 169–198, DOI: 10.1613/jair.614
-
217. Tsolis, A. C.; Papandreou, N. C.; Iconomidou, V. A.; Hamodrakas, S. J. A Consensus Method for the Prediction of “Aggregation-Prone” Peptides in Globular Proteins. PLoS One 2013, 8, e54175, DOI: 10.1371/journal.pone.0054175
-
218. Emily, M.; Talvas, A.; Delamarche, C. MetAmyl: A METa-Predictor for AMYLoid Proteins. PLoS One 2013, 8, e79722, DOI: 10.1371/journal.pone.0079722
-
219. Zambrano, R.; Jamroz, M.; Szczasiuk, A.; Pujols, J.; Kmiecik, S.; Ventura, S. AGGRESCAN3D (A3D): Server for Prediction of Aggregation Properties of Protein Structures. Nucleic Acids Res. 2015, 43, W306–W313, DOI: 10.1093/nar/gkv359
-
220. De Baets, G.; Van Durme, J.; van der Kant, R.; Schymkowitz, J.; Rousseau, F. Solubis: Optimize Your Protein. Bioinformatics 2015, 31, 2580–2582, DOI: 10.1093/bioinformatics/btv162
-
221. Van Durme, J.; De Baets, G.; Van Der Kant, R.; Ramakers, M.; Ganesan, A.; Wilkinson, H.; Gallardo, R.; Rousseau, F.; Schymkowitz, J. Solubis: A Webserver To Reduce Protein Aggregation through Mutation. Protein Eng., Des. Sel. 2016, 29, 285–289, DOI: 10.1093/protein/gzw019
Cited By
This article is cited by 86 publications.
- Bingxin Zhou, Lirong Zheng, Banghao Wu, Yang Tan, Outongyi Lv, Kai Yi, Guisheng Fan, Liang Hong. Protein Engineering with Lightweight Graph Denoising Neural Networks. Journal of Chemical Information and Modeling 2024, Article ASAP.
- Tong Zhu, Jinyuan Sun, Hua Pang, Bian Wu. Computational Enzyme Redesign Enhances Tolerance to Denaturants for Peptide C-Terminal Amidation. JACS Au 2024, 4 (2) , 788-797. https://doi.org/10.1021/jacsau.3c00792
- Markus Braun, Christian C. Gruber, Andreas Krassnigg, Arkadij Kummer, Stefan Lutz, Gustav Oberdorfer, Elina Siirola, Radka Snajdrova. Accelerating Biocatalysis Discovery with Machine Learning: A Paradigm Shift in Enzyme Engineering, Discovery, and Design. ACS Catalysis 2023, 13 (21) , 14454-14469. https://doi.org/10.1021/acscatal.3c03417
- Antonin Kunka, Sérgio M. Marques, Martin Havlasek, Michal Vasina, Nikola Velatova, Lucia Cengelova, David Kovar, Jiri Damborsky, Martin Marek, David Bednar, Zbynek Prokop. Advancing Enzyme’s Stability and Catalytic Efficiency through Synergy of Force-Field Calculations, Evolutionary Analysis, and Machine Learning. ACS Catalysis 2023, 13 (19) , 12506-12518. https://doi.org/10.1021/acscatal.3c02575
- Lixia Liu, Shenghu Zhou, Yu Deng. Rational Design of the Substrate Tunnel of β-Ketothiolase Reveals a Local Cationic Domain Modulated Rule that Improves the Efficiency of Claisen Condensation. ACS Catalysis 2023, 13 (12) , 8183-8194. https://doi.org/10.1021/acscatal.3c01426
- Sergi Roda, Henrik Terholsen, Jule Ruth Heike Meyer, Albert Cañellas-Solé, Victor Guallar, Uwe Bornscheuer, Masoud Kazemi. AsiteDesign: a Semirational Algorithm for an Automated Enzyme Design. The Journal of Physical Chemistry B 2023, 127 (12) , 2661-2670. https://doi.org/10.1021/acs.jpcb.2c07091
- Tianjin Yang, Alessia Villois, Antonín Kunka, Fulvio Grigolato, Paolo Arosio, Zbynek Prokop, Andrew deMello, Stavros Stavrakis. Droplet-Based Microfluidic Temperature-Jump Platform for the Rapid Assessment of Biomolecular Kinetics. Analytical Chemistry 2022, 94 (48) , 16675-16684. https://doi.org/10.1021/acs.analchem.2c03009
- David A. Hueting, Sudarsana R. Vanga, Per-Olof Syrén. Thermoadaptation in an Ancestral Diterpene Cyclase by Altered Loop Stability. The Journal of Physical Chemistry B 2022, 126 (21) , 3809-3821. https://doi.org/10.1021/acs.jpcb.1c10605
- Daniel Markthaler, Maximilian Fleck, Bartosz Stankiewicz, Niels Hansen. Exploring the Effect of Enhanced Sampling on Protein Stability Prediction. Journal of Chemical Theory and Computation 2022, 18 (4) , 2569-2583. https://doi.org/10.1021/acs.jctc.1c01012
- Jiajun Chen, Ding Chen, Qiuming Chen, Wei Xu, Wenli Zhang, Wanmeng Mu. Computer-Aided Targeted Mutagenesis of Thermoclostridium caenicola d-Allulose 3-Epimerase for Improved Thermostability. Journal of Agricultural and Food Chemistry 2022, 70 (6) , 1943-1951. https://doi.org/10.1021/acs.jafc.1c07256
- Jennifer L. Kennemur, Rajat Maji, Manuel J. Scharf, Benjamin List. Catalytic Asymmetric Hydroalkoxylation of C–C Multiple Bonds. Chemical Reviews 2021, 121 (24) , 14649-14681. https://doi.org/10.1021/acs.chemrev.1c00620
- Ailan Huang, Chengcheng Chai, Jiayu Zhang, Lei Zhao, Fuping Lu, Fufeng Liu. Engineered N57P Variant of Ulvan Lyase with Improvement of Catalytic Efficiency and Thermostability via Reducing Loop Flexibility and Anchoring Substrate. ACS Sustainable Chemistry & Engineering 2021, 9 (48) , 16415-16423. https://doi.org/10.1021/acssuschemeng.1c06348
- Klara Markova, Antonin Kunka, Klaudia Chmelova, Martin Havlasek, Petra Babkova, Sérgio M. Marques, Michal Vasina, Joan Planas-Iglesias, Radka Chaloupkova, David Bednar, Zbynek Prokop, Jiri Damborsky, Martin Marek. Computational Enzyme Stabilization Can Affect Folding Energy Landscapes and Lead to Catalytically Enhanced Domain-Swapped Dimers. ACS Catalysis 2021, 11 (21) , 12864-12885. https://doi.org/10.1021/acscatal.1c03343
- Kohei Kozuka, Shogo Nakano, Yasuhisa Asano, Sohei Ito. Partial Consensus Design and Enhancement of Protein Function by Secondary-Structure-Guided Consensus Mutations. Biochemistry 2021, 60 (29) , 2309-2319. https://doi.org/10.1021/acs.biochem.1c00309
- Rae A. Corrigan, Guowei Qi, Andrew C. Thiel, Jack R. Lynn, Brandon D. Walker, Thomas L. Casavant, Louis Lagardere, Jean-Philip Piquemal, Jay W. Ponder, Pengyu Ren, Michael J. Schnieders. Implicit Solvents for the Polarizable Atomic Multipole AMOEBA Force Field. Journal of Chemical Theory and Computation 2021, 17 (4) , 2323-2341. https://doi.org/10.1021/acs.jctc.0c01286
- Megan V. Doble, Lorenz Obrecht, Henk-Jan Joosten, Misun Lee, Henriette J. Rozeboom, Emma Branigan, James H. Naismith, Dick B. Janssen, Amanda G. Jarvis, Paul C. J. Kamer. Engineering Thermostability in Artificial Metalloenzymes to Increase Catalytic Activity. ACS Catalysis 2021, 11 (6) , 3620-3627. https://doi.org/10.1021/acscatal.0c05413
- Yinglu Cui, Yanchun Chen, Xinyue Liu, Saijun Dong, Yu’e Tian, Yuxin Qiao, Ruchira Mitra, Jing Han, Chunli Li, Xu Han, Weidong Liu, Quan Chen, Wangqing Wei, Xin Wang, Wenbin Du, Shuangyan Tang, Hua Xiang, Haiyan Liu, Yong Liang, Kendall N. Houk, Bian Wu. Computational Redesign of a PETase for Plastic Biodegradation under Ambient Condition by the GRAPE Strategy. ACS Catalysis 2021, 11 (3) , 1340-1350. https://doi.org/10.1021/acscatal.0c05126
- Christoph K. Winkler, Joerg H. Schrittwieser, Wolfgang Kroutil. Power of Biocatalysis for Organic Synthesis. ACS Central Science 2021, 7 (1) , 55-71. https://doi.org/10.1021/acscentsci.0c01496
- Peishan Huang, Simon K. S. Chu, Henrique N. Frizzo, Morgan P. Connolly, Ryan W. Caster, Justin B. Siegel. Evaluating Protein Engineering Thermostability Prediction Tools Using an Independently Generated Dataset. ACS Omega 2020, 5 (12) , 6487-6493. https://doi.org/10.1021/acsomega.9b04105
- Qinglong Meng, Nikolas Capra, Cyntia M. Palacio, Elisa Lanfranchi, Marleen Otzen, Luc Z. van Schie, Henriëtte J. Rozeboom, Andy-Mark W. H. Thunnissen, Hein J. Wijma, Dick B. Janssen. Robust ω-Transaminases by Computational Stabilization of the Subunit Interface. ACS Catalysis 2020, 10 (5) , 2915-2928. https://doi.org/10.1021/acscatal.9b05223
- Stanislav Mazurenko, Zbynek Prokop, Jiri Damborsky. Machine Learning in Enzyme Engineering. ACS Catalysis 2020, 10 (2) , 1210-1223. https://doi.org/10.1021/acscatal.9b04321
- Brianne R. King, Kiera H. Sumida, Jessica L. Caruso, David Baker, Jesse G. Zalatan. Computational stabilization of a non-heme iron enzyme enables efficient evolution of new function. 2024. https://doi.org/10.1101/2024.04.18.590141
- Mohammad Reza Rahbar, Navid Nezafat, Mohammad Hossein Morowvat, Amir Savardashtaki, Mohammad Bagher Ghoshoon, Kamran Mehrabani-Zeinabad, Younes Ghasemi. Targeting Efficient Features of Urate Oxidase to Increase Its Solubility. Applied Biochemistry and Biotechnology 2024, 41 https://doi.org/10.1007/s12010-023-04819-w
- Suhyeon Kim, Seongmin Ga, Hayeon Bae, Ronald Sluyter, Konstantin Konstantinov, Lok Kumar Shrestha, Yong Ho Kim, Jung Ho Kim, Katsuhiko Ariga. Multidisciplinary approaches for enzyme biocatalysis in pharmaceuticals: protein engineering, computational biology, and nanoarchitectonics. EES Catalysis 2024, 2 (1) , 14-48. https://doi.org/10.1039/D3EY00239J
- Hiroki Ozawa, Ibuki Unno, Ryohei Sekine, Taichi Chisuga, Sohei Ito, Shogo Nakano. Development of evolutionary algorithm-based protein redesign method. Cell Reports Physical Science 2024, 5 (1) , 101758. https://doi.org/10.1016/j.xcrp.2023.101758
- Lihang Xie. Biofoundries for plant-derived bioactive compounds. 2024, 257-283. https://doi.org/10.1016/B978-0-443-15558-1.00005-9
- Carola Jerves, Rui P. P. Neves, Saulo L. da Silva, Maria J. Ramos, Pedro A. Fernandes. Rate-enhancing PETase mutations determined through DFT/MM molecular dynamics simulations. New Journal of Chemistry 2023, 48 (1) , 45-54. https://doi.org/10.1039/D3NJ04204A
- Elena Tomarelli, Bruno Cerra, Francesco G. Mutti, Antimo Gioiello. Merging Continuous Flow Technology, Photochemistry and Biocatalysis to Streamline Steroid Synthesis. Advanced Synthesis & Catalysis 2023, 365 (23) , 4024-4048. https://doi.org/10.1002/adsc.202300305
- Honglin Lu, Maoyuan Xue, Xinling Nie, Hongzheng Luo, Zhongbiao Tan, Xiao Yang, Hao Shi, Xun Li, Tao Wang. Glycoside hydrolases in the biodegradation of lignocellulosic biomass. 3 Biotech 2023, 13 (12) https://doi.org/10.1007/s13205-023-03819-1
- Md Sakib Hossen, Md. Nazmul Hasan, Munima Haque, Tawsif Al Arian, Sajal Kumar Halder, Md. Jasim Uddin, M. Abdullah-Al-Mamun, Md Salman Shakil. Immunoinformatics-aided rational design of multiepitope-based peptide vaccine (MEBV) targeting human parainfluenza virus 3 (HPIV-3) stable proteins. Journal of Genetic Engineering and Biotechnology 2023, 21 (1) , 162. https://doi.org/10.1186/s43141-023-00623-5
- Milos Musil, Andrej Jezik, Jana Horackova, Simeon Borko, Petr Kabourek, Jiri Damborsky, David Bednar. FireProt 2.0: web-based platform for the fully automated design of thermostable proteins. Briefings in Bioinformatics 2023, 25 (1) https://doi.org/10.1093/bib/bbad425
- Mahrokh Dastmalchi, Mahdiyeh Alizadeh, Omid Jamshidi-Kandjan, Hassan Rezazadeh, Maryam Hamzeh-Mivehroud, Mohammad M Farajollahi, Siavoush Dastmalchi. Expression and Biological Evaluation of an Engineered Recombinant L-asparaginase Designed by In Silico Method Based on Sequence of the Enzyme from Escherichia coli. Advanced Pharmaceutical Bulletin 2023, 13 (4) , 827-836. https://doi.org/10.34172/apb.2023.085
- Caroline Torres de Oliveira, Michelle Alexandrino de Assis, Marcio Antonio Mazutti, Gonçalo Amarante Guimarães Pereira, Débora de Oliveira. Production of recombinant cutinases and their potential applications in polymer hydrolysis: The current status. Process Biochemistry 2023, 134 , 30-46. https://doi.org/10.1016/j.procbio.2023.10.020
- Jie Luo, Chenshuo Song, Wenjing Cui, Laichuang Han, Zhemin Zhou. Counteraction of stability-activity trade-off of Nattokinase through flexible region shifting. Food Chemistry 2023, 423 , 136241. https://doi.org/10.1016/j.foodchem.2023.136241
- Liliana Mammino. Green chemistry and computational chemistry: A wealth of promising synergies. Sustainable Chemistry and Pharmacy 2023, 34 , 101151. https://doi.org/10.1016/j.scp.2023.101151
- Qing Guo, Meiling Dan, Yuting Zheng, Ji Shen, Guohua Zhao, Damao Wang. Improving the thermostability of a novel PL-6 family alginate lyase by rational design engineering for industrial preparation of alginate oligosaccharides. International Journal of Biological Macromolecules 2023, 249 , 125998. https://doi.org/10.1016/j.ijbiomac.2023.125998
- Zheng Wei, Tanja Knaus, Yuxin Liu, Ziran Zhai, Andrea F. G. Gargano, Gadi Rothenberg, Ning Yan, Francesco G. Mutti. A high-performance electrochemical biosensor using an engineered urate oxidase. Chemical Communications 2023, 59 (52) , 8071-8074. https://doi.org/10.1039/D3CC01869E
- Anwesha Chatterjee, Sonakshi Puri, Pankaj Kumar Sharma, P. R. Deepa, Shibasish Chowdhury. Nature-inspired Enzyme engineering and sustainable catalysis: biochemical clues from the world of plants and extremophiles. Frontiers in Bioengineering and Biotechnology 2023, 11 https://doi.org/10.3389/fbioe.2023.1229300
- Stefanie Hanreich, Elisa Bonandi, Ivana Drienovská. Design of Artificial Enzymes: Insights into Protein Scaffolds. ChemBioChem 2023, 24 (6) https://doi.org/10.1002/cbic.202200566
- Jie Gu, Yan Xu, Yao Nie. Role of distal sites in enzyme engineering. Biotechnology Advances 2023, 63 , 108094. https://doi.org/10.1016/j.biotechadv.2023.108094
- Hanbeen Kim, Jakyeom Seo. A Novel Strategy to Identify Endolysins with Lytic Activity against Methicillin-Resistant Staphylococcus aureus. International Journal of Molecular Sciences 2023, 24 (6) , 5772. https://doi.org/10.3390/ijms24065772
- Zhixin Dou, Yuqing Sun, Xukai Jiang, Xiuyun Wu, Yingjie Li, Bin Gong, Lushan Wang. Data-driven strategies for the computational design of enzyme thermal stability: trends, perspectives, and prospects. Acta Biochimica et Biophysica Sinica 2023, 55 (3) , 343-355. https://doi.org/10.3724/abbs.2023033
- Delaney M. Anderson, Lakshmi P. Jayanthi, Shachi Gosavi, Elizabeth M. Meiering. Engineering the kinetic stability of a β-trefoil protein by tuning its topological complexity. Frontiers in Molecular Biosciences 2023, 10 https://doi.org/10.3389/fmolb.2023.1021733
- Tianhao Yu, Aashutosh Girish Boob, Michael J. Volk, Xuan Liu, Haiyang Cui, Huimin Zhao. Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 2023, 6 (2) , 137-151. https://doi.org/10.1038/s41929-022-00909-w
- María Laura Foresti, María Luján Ferreira. Enzyme immobilization for use in nonconventional media. 2023, 165-202. https://doi.org/10.1016/B978-0-323-91317-1.00008-6
- Charu Tripathi, Twinkle Yadav. Recent approaches and innovations for enzyme engineering used in industrial biotechnology. 2023, 161-175. https://doi.org/10.1016/B978-0-323-95332-0.00017-X
- Seyyed Soheil Rahmatabadi, Keivan Mobini, Soudabeh Askari, Javad Najafian, Keyvan Karami, Bijan Soleymani, Ali Mostafaie. In silico characterization of fructosyl peptide oxidase properties from Eupenicillium terrenum. Journal of Molecular Recognition 2022, 35 (11) https://doi.org/10.1002/jmr.2980
- Zhuha Basit, Hira Akram, Muhammad Mudassir Iqbal, Gulzar Muhammad, Muhammad Shahbaz Aslam, Iram Gul, Muhammad Jamil, Mudassir Hussain Tahir. Protein Redesign and Engineering Using Machine Learning. 2022, 247-282. https://doi.org/10.1002/9781394167258.ch9
- Muhammad Naveed, Jawad-ul Hassan, Muneeb Ahmad, Nida Naeem, Muhammad Saad Mughal, Ali A. Rabaan, Mohammed Aljeldah, Basim R. Al Shammari, Mohammed Alissa, Amal A. Sabour, Rana A. Alaeq, Maha A. Alshiekheid, Safaa A. Turkistani, Abdirahman Hussein Elmi, Naveed Ahmed. Designing mRNA- and Peptide-Based Vaccine Construct against Emerging Multidrug-Resistant Citrobacter freundii: A Computational-Based Subtractive Proteomics Approach. Medicina 2022, 58 (10) , 1356. https://doi.org/10.3390/medicina58101356
- Michal Vasina, Pavel Vanacek, Jiri Hon, David Kovar, Hana Faldynova, Antonin Kunka, Tomas Buryska, Christoffel P.S. Badenhorst, Stanislav Mazurenko, David Bednar, Stavros Stavrakis, Uwe T. Bornscheuer, Andrew deMello, Jiri Damborsky, Zbynek Prokop. Advanced database mining of efficient haloalkane dehalogenases by sequence and structure bioinformatics and microfluidics. Chem Catalysis 2022, 2 (10) , 2704-2725. https://doi.org/10.1016/j.checat.2022.09.011
- Aisaraphon Phintha, Pimchai Chaiyen. Rational and mechanistic approaches for improving biocatalyst performance. Chem Catalysis 2022, 2 (10) , 2614-2643. https://doi.org/10.1016/j.checat.2022.09.026
- Yu-Jie Yang, Xiao-Qiong Pei, Yan Liu, Zhong-Liu Wu. Thermostabilizing ketoreductase ChKRED20 by consensus mutagenesis at dimeric interfaces. Enzyme and Microbial Technology 2022, 158 , 110052. https://doi.org/10.1016/j.enzmictec.2022.110052
- Erich R Kuechler, Matthew Jacobson, Thibault Mayor, Jörg Gsponer. GraPES: The Granule Protein Enrichment Server for prediction of biological condensate constituents. Nucleic Acids Research 2022, 50 (W1) , W384-W391. https://doi.org/10.1093/nar/gkac279
- Antonin Kunka, David Lacko, Jan Stourac, Jiri Damborsky, Zbynek Prokop, Stanislav Mazurenko. CalFitter 2.0: Leveraging the power of singular value decomposition to analyse protein thermostability. Nucleic Acids Research 2022, 50 (W1) , W145-W151. https://doi.org/10.1093/nar/gkac378
- Yinglu Cui, Jinyuan Sun, Bian Wu. Computational enzyme redesign: large jumps in function. Trends in Chemistry 2022, 4 (5) , 409-419. https://doi.org/10.1016/j.trechm.2022.03.001
- Yanxia Wang, Yao Chen, Ling Jiang, He Huang. Improvement of the enzymatic detoxification activity towards mycotoxins through structure-based engineering. Biotechnology Advances 2022, 56 , 107927. https://doi.org/10.1016/j.biotechadv.2022.107927
- Ziyang Huang, Xueqin Lv, Guoyun Sun, Xinzhu Mao, Wei Lu, Yanfeng Liu, Jianghua Li, Guocheng Du, Long Liu. Chitin deacetylase: from molecular structure to practical applications. Systems Microbiology and Biomanufacturing 2022, 2 (2) , 271-284. https://doi.org/10.1007/s43393-022-00077-9
- Michal Vasina, Jan Velecký, Joan Planas-Iglesias, Sergio M. Marques, Jana Skarupova, Jiri Damborsky, David Bednar, Stanislav Mazurenko, Zbynek Prokop. Tools for computational design and high-throughput screening of therapeutic enzymes. Advanced Drug Delivery Reviews 2022, 183 , 114143. https://doi.org/10.1016/j.addr.2022.114143
- Petr Rozhin, Jada Abdel Monem Gamal, Silvia Giordani, Silvia Marchesan. Carbon Nanomaterials (CNMs) and Enzymes: From Nanozymes to CNM-Enzyme Conjugates and Biodegradation. Materials 2022, 15 (3) , 1037. https://doi.org/10.3390/ma15031037
- Xavier F. Cadet, Jean Christophe Gelly, Aster van Noord, Frédéric Cadet, Carlos G. Acevedo-Rocha. Learning Strategies in Protein Directed Evolution. 2022, 225-275. https://doi.org/10.1007/978-1-0716-2152-3_15
- Michal Vasina, Pavel Vanacek, Jiri Hon, David Kovar, Hana Faldynova, Antonin Kunka, Tomas Buryska, Christoffel P. S. Badenhorst, Stanislav Mazurenko, David Bednar, Stavros Stavrakis, Uwe T. Bornscheuer, Andrew deMello, Jiri Damborsky, Zbynek Prokop. Advanced Database Mining of Efficient Biocatalysts by Sequence and Structure Bioinformatics and Microfluidics. SSRN Electronic Journal 2022, 43 https://doi.org/10.2139/ssrn.4111603
- Ziheng Cui, Shiding Zhang, Shengyu Zhang, Biqiang Chen, Yushan Zhu, Tianwei Tan. Green biomanufacturing promoted by automatic retrobiosynthesis planning and computational enzyme design. Chinese Journal of Chemical Engineering 2022, 41 , 6-21. https://doi.org/10.1016/j.cjche.2021.08.017
- Kyle Trainor, Colleen M. Doyle, Avril Metcalfe-Roach, Julia Steckner, Daša Lipovšek, Heather Malakian, David Langley, Stanley R. Krystek Jr., Elizabeth M. Meiering. Design for Solubility May Reveal Induction of Amide Hydrogen/Deuterium Exchange by Protein Self-Association. Journal of Molecular Biology 2022, 434 (2) , 167398. https://doi.org/10.1016/j.jmb.2021.167398
- Pritam Giri, Amol D. Pagar, Mahesh D. Patil, Hyungdon Yun. Chemical modification of enzymes to improve biocatalytic performance. Biotechnology Advances 2021, 53 , 107868. https://doi.org/10.1016/j.biotechadv.2021.107868
- Jinling Xu, Haisheng Zhou, Haoran Yu, Tong Deng, Ziyuan Wang, Hongyu Zhang, Jianping Wu, Lirong Yang. Computational design of highly stable and soluble alcohol dehydrogenase for NADPH regeneration. Bioresources and Bioprocessing 2021, 8 (1) https://doi.org/10.1186/s40643-021-00362-w
- Benjamin B. V. Louis, Luciano A. Abriata. Reviewing Challenges of Predicting Protein Melting Temperature Change Upon Mutation Through the Full Analysis of a Highly Detailed Dataset with High-Resolution Structures. Molecular Biotechnology 2021, 63 (10) , 863-884. https://doi.org/10.1007/s12033-021-00349-0
- Sérgio M Marques, Joan Planas-Iglesias, Jiri Damborsky. Web-based tools for computational enzyme design. Current Opinion in Structural Biology 2021, 69 , 19-34. https://doi.org/10.1016/j.sbi.2021.01.010
- Milos Musil, Rayyan Tariq Khan, Andy Beier, Jan Stourac, Hannes Konegger, Jiri Damborsky, David Bednar. FireProtASR: A Web Server for Fully Automated Ancestral Sequence Reconstruction. Briefings in Bioinformatics 2021, 22 (4) https://doi.org/10.1093/bib/bbaa337
- Yameng Xu, Yaokang Wu, Xueqin Lv, Guoyun Sun, Hongzhi Zhang, Taichi Chen, Guocheng Du, Jianghua Li, Long Liu. Design and construction of novel biocatalyst for bioprocessing: Recent advances and future outlook. Bioresource Technology 2021, 332 , 125071. https://doi.org/10.1016/j.biortech.2021.125071
- Carlos Eduardo Sequeiros-Borja, Bartłomiej Surpeta, Jan Brezovsky. Recent advances in user-friendly computational tools to engineer protein function. Briefings in Bioinformatics 2021, 22 (3) https://doi.org/10.1093/bib/bbaa150
- Jiri Hon, Martin Marusiak, Tomas Martinek, Antonin Kunka, Jaroslav Zendulka, David Bednar, Jiri Damborsky. SoluProt: prediction of soluble protein expression in Escherichia coli. Bioinformatics 2021, 37 (1) , 23-28. https://doi.org/10.1093/bioinformatics/btaa1102
- Yan Liu, Zi-Yi Li, Chao Guo, Can Cui, Hui Lin, Zhong-Liu Wu. Enhancing the thermal stability of ketoreductase ChKRED12 using the FireProt web server. Process Biochemistry 2021, 101 , 207-212. https://doi.org/10.1016/j.procbio.2020.11.018
- Jan Stourac, Juraj Dubrava, Milos Musil, Jana Horackova, Jiri Damborsky, Stanislav Mazurenko, David Bednar. FireProtDB: database of manually curated protein stability data. Nucleic Acids Research 2021, 49 (D1) , D319-D324. https://doi.org/10.1093/nar/gkaa981
- Tanatarov Dinmukhamed, Ziyang Huang, Yanfeng Liu, Xueqin Lv, Jianghua Li, Guocheng Du, Long Liu. Current advances in design and engineering strategies of industrial enzymes. Systems Microbiology and Biomanufacturing 2021, 1 (1) , 15-23. https://doi.org/10.1007/s43393-020-00005-9
- Jiahua Bi, Xiaoran Jing, Lunjie Wu, Xia Zhou, Jie Gu, Yao Nie, Yan Xu. Computational design of noncanonical amino acid-based thioether staples at N/C-terminal domains of multi-modular pullulanase for thermostabilization in enzyme catalysis. Computational and Structural Biotechnology Journal 2021, 19 , 577-585. https://doi.org/10.1016/j.csbj.2020.12.043
- Stanislav Mazurenko. Predicting protein stability and solubility changes upon mutations: data perspective. ChemCatChem 2020, 12 (22) , 5590-5598. https://doi.org/10.1002/cctc.202000933
- Klara Markova, Klaudia Chmelova, Sérgio M. Marques, Philippe Carpentier, David Bednar, Jiri Damborsky, Martin Marek. Decoding the intricate network of molecular interactions of a hyperstable engineered biocatalyst. Chemical Science 2020, 11 (41) , 11162-11178. https://doi.org/10.1039/D0SC03367G
- Qiuming Chen, Yaqin Xiao, Wenli Zhang, Wanmeng Mu. Current methods and applications in computational protein design for food industry. Critical Reviews in Food Science and Nutrition 2020, 60 (19) , 3259-3270. https://doi.org/10.1080/10408398.2019.1682513
- Marian H. Hettiaratchi, Matthew J. O’Meara, Teresa R. O’Meara, Andrew J. Pickering, Nitzan Letko-Khait, Molly S. Shoichet. Reengineering biocatalysts: Computational redesign of chondroitinase ABC improves efficacy and stability. Science Advances 2020, 6 (34) https://doi.org/10.1126/sciadv.abc6378
- Jiri Hon, Simeon Borko, Jan Stourac, Zbynek Prokop, Jaroslav Zendulka, David Bednar, Tomas Martinek, Jiri Damborsky. EnzymeMiner: automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities. Nucleic Acids Research 2020, 48 (W1) , W104-W109. https://doi.org/10.1093/nar/gkaa372
- Sara Arana-Peña, Diego Carballares, Ángel Berenguer-Murcia, Andrés Alcántara, Rafael Rodrigues, Roberto Fernandez-Lafuente. One Pot Use of Combilipases for Full Modification of Oils and Fats: Multifunctional and Heterogeneous Substrates. Catalysts 2020, 10 (6) , 605. https://doi.org/10.3390/catal10060605
- Kulandai Arockia Rajesh Packiam, Ramakrishnan Nagasundara Ramanan, Chien Wei Ooi, Lakshminarasimhan Krishnaswamy, Beng Ti Tey. Stepwise optimization of recombinant protein production in Escherichia coli utilizing computational and experimental approaches. Applied Microbiology and Biotechnology 2020, 104 (8) , 3253-3266. https://doi.org/10.1007/s00253-020-10454-w
- Yi Zhang, Alberta NA Aryee, Benjamin K Simpson. Current role of in silico approaches for food enzymes. Current Opinion in Food Science 2020, 31 , 63-70. https://doi.org/10.1016/j.cofs.2019.11.003
- Pornkanok Pongpamorn, Pratchaya Watthaisong, Panu Pimviriyakul, Aritsara Jaruwat, Narin Lawan, Penchit Chitnumsub, Pimchai Chaiyen. Identification of a Hotspot Residue for Improving the Thermostability of a Flavin‐Dependent Monooxygenase. ChemBioChem 2019, 20 (24) , 3020-3031. https://doi.org/10.1002/cbic.201900413
- Susanna Navarro, Salvador Ventura. Computational re-design of protein structures to improve solubility. Expert Opinion on Drug Discovery 2019, 14 (10) , 1077-1088. https://doi.org/10.1080/17460441.2019.1637413
- Andy Beier, Jiri Damborsky, Zbynek Prokop. Transhalogenation Catalysed by Haloalkane Dehalogenases Engineered to Stop Natural Pathway at Intermediate. Advanced Synthesis & Catalysis 2019, 361 (11) , 2438-2442. https://doi.org/10.1002/adsc.201900132
References
This article references 221 other publications.
- 1. Choi, J.-M.; Han, S.-S.; Kim, H.-S. Industrial Applications of Enzyme Biocatalysis: Current Status and Future Aspects. Biotechnol. Adv. 2015, 33, 1443–1454, DOI: 10.1016/j.biotechadv.2015.02.014
- 2. Mitchell, A. C.; Briquez, P. S.; Hubbell, J. A.; Cochran, J. R. Engineering Growth Factors for Regenerative Medicine Applications. Acta Biomater. 2016, 30, 1–12, DOI: 10.1016/j.actbio.2015.11.007
- 3. Dvořák, P.; Nikel, P. I.; Damborský, J.; de Lorenzo, V. Bioremediation 3.0: Engineering Pollutant-Removing Bacteria in the Times of Systemic Biology. Biotechnol. Adv. 2017, 35, 845–866, DOI: 10.1016/j.biotechadv.2017.08.001
- 4. Vanacek, P.; Sebestova, E.; Babkova, P.; Bidmanova, S.; Daniel, L.; Dvorak, P.; Stepankova, V.; Chaloupkova, R.; Brezovsky, J.; Prokop, Z.; Damborsky, J. Exploration of Enzyme Diversity by Integrating Bioinformatics with Expression Analysis and Biochemical Characterization. ACS Catal. 2018, 8, 2402–2412, DOI: 10.1021/acscatal.7b03523
- 5. Bornscheuer, U. T.; Huisman, G. W.; Kazlauskas, R. J.; Lutz, S.; Moore, J. C.; Robins, K. Engineering the Third Wave of Biocatalysis. Nature 2012, 485, 185–194, DOI: 10.1038/nature11117
- 6. Tokuriki, N.; Stricher, F.; Serrano, L.; Tawfik, D. S. How Protein Stability and New Functions Trade Off. PLoS Comput. Biol. 2008, 4, e1000002, DOI: 10.1371/journal.pcbi.1000002
- 7. Dellus-Gur, E.; Toth-Petroczy, A.; Elias, M.; Tawfik, D. S. What Makes a Protein Fold Amenable to Functional Innovation? Fold Polarity and Stability Trade-Offs. J. Mol. Biol. 2013, 425, 2609–2621, DOI: 10.1016/j.jmb.2013.03.033
- 8. Johansson, K. E.; Johansen, N. T.; Christensen, S.; Horowitz, S.; Bardwell, J. C. A.; Olsen, J. G.; Willemoës, M.; Lindorff-Larsen, K.; Ferkinghoff-Borg, J.; Hamelryck, T.; Winther, J. R. Computational Redesign of Thioredoxin Is Hypersensitive toward Minor Conformational Changes in the Backbone Template. J. Mol. Biol. 2016, 428, 4361–4377, DOI: 10.1016/j.jmb.2016.09.013
- 9. Arabnejad, H.; Dal Lago, M.; Jekel, P. A.; Floor, R. J.; Thunnissen, A.-M. W. H.; Terwisscha van Scheltinga, A. C.; Wijma, H. J.; Janssen, D. B. A Robust Cosolvent-Compatible Halohydrin Dehalogenase by Computational Library Design. Protein Eng., Des. Sel. 2017, 30, 175–189, DOI: 10.1093/protein/gzw068
- 10. Wyganowski, K. T.; Kaltenbach, M.; Tokuriki, N. GroEL/ES Buffering and Compensatory Mutations Promote Protein Evolution by Stabilizing Folding Intermediates. J. Mol. Biol. 2013, 425, 3403–3414, DOI: 10.1016/j.jmb.2013.06.028
- 11. Lawrence, P. B.; Gavrilov, Y.; Matthews, S. S.; Langlois, M. I.; Shental-Bechor, D.; Greenblatt, H. M.; Pandey, B. K.; Smith, M. S.; Paxman, R.; Torgerson, C. D.; Merrell, J. P.; Ritz, C. C.; Prigozhin, M. B.; Levy, Y.; Price, J. L. Criteria for Selecting PEGylation Sites on Proteins for Higher Thermodynamic and Proteolytic Stability. J. Am. Chem. Soc. 2014, 136, 17547–17560, DOI: 10.1021/ja5095183
-
12. Rueda, N.; Dos Santos, J. C. S.; Ortiz, C.; Torres, R.; Barbosa, O.; Rodrigues, R. C.; Berenguer-Murcia, Á.; Fernandez-Lafuente, R. Chemical Modification in the Design of Immobilized Enzyme Biocatalysts: Drawbacks and Opportunities. Chem. Rec. 2016, 16, 1436–1455, DOI: 10.1002/tcr.201600007
-
13. Stepankova, V.; Bidmanova, S.; Koudelakova, T.; Prokop, Z.; Chaloupkova, R.; Damborsky, J. Strategies for Stabilization of Enzymes in Organic Solvents. ACS Catal. 2013, 3, 2823–2836, DOI: 10.1021/cs400684x
-
14. Butt, T. R.; Edavettal, S. C.; Hall, J. P.; Mattern, M. R. SUMO Fusion Technology for Difficult-to-Express Proteins. Protein Expression Purif. 2005, 43, 1–9, DOI: 10.1016/j.pep.2005.03.016
-
15. LaVallie, E. R.; DiBlasio, E. A.; Kovacic, S.; Grant, K. L.; Schendel, P. F.; McCoy, J. M. A Thioredoxin Gene Fusion Expression System That Circumvents Inclusion Body Formation in the E. coli Cytoplasm. Nat. Biotechnol. 1993, 11, 187–193, DOI: 10.1038/nbt0293-187
-
16. Bloom, J. D.; Labthavikul, S. T.; Otey, C. R.; Arnold, F. H. Protein Stability Promotes Evolvability. Proc. Natl. Acad. Sci. U. S. A. 2006, 103, 5869–5874, DOI: 10.1073/pnas.0510098103
-
17. Sormanni, P.; Aprile, F. A.; Vendruscolo, M. The CamSol Method of Rational Design of Protein Mutants with Enhanced Solubility. J. Mol. Biol. 2015, 427, 478–490, DOI: 10.1016/j.jmb.2014.09.026
-
18. Ganesan, A.; Siekierska, A.; Beerten, J.; Brams, M.; Van Durme, J.; De Baets, G.; Van der Kant, R.; Gallardo, R.; Ramakers, M.; Langenberg, T.; Wilkinson, H.; De Smet, F.; Ulens, C.; Rousseau, F.; Schymkowitz, J. Structural Hot Spots for the Solubility of Globular Proteins. Nat. Commun. 2016, 7, 10816, DOI: 10.1038/ncomms10816
-
19. Zeymer, C.; Hilvert, D. Directed Evolution of Protein Catalysts. Annu. Rev. Biochem. 2018, 87, 131–157, DOI: 10.1146/annurev-biochem-062917-012034
-
20. Starr, T. N.; Thornton, J. W. Epistasis in Protein Evolution. Protein Sci. 2016, 25, 1204–1218, DOI: 10.1002/pro.2897
-
21. Goldsmith, M.; Tawfik, D. S. Enzyme Engineering: Reaching the Maximal Catalytic Efficiency Peak. Curr. Opin. Struct. Biol. 2017, 47, 140–150, DOI: 10.1016/j.sbi.2017.09.002
-
22. Currin, A.; Swainston, N.; Day, P. J.; Kell, D. B. Synthetic Biology for the Directed Evolution of Protein Biocatalysts: Navigating Sequence Space Intelligently. Chem. Soc. Rev. 2015, 44, 1172–1239, DOI: 10.1039/C4CS00351A
-
23. Rocklin, G. J.; Chidyausiku, T. M.; Goreshnik, I.; Ford, A.; Houliston, S.; Lemak, A.; Carter, L.; Ravichandran, R.; Mulligan, V. K.; Chevalier, A.; Arrowsmith, C. H.; Baker, D. Global Analysis of Protein Folding Using Massively Parallel Design, Synthesis, and Testing. Science 2017, 357, 168–175, DOI: 10.1126/science.aan0693
-
24. Sumbalova, L.; Stourac, J.; Martinek, T.; Bednar, D.; Damborsky, J. HotSpot Wizard 3.0: Web Server for Automated Design of Mutations and Smart Libraries Based on Sequence Input Information. Nucleic Acids Res. 2018, 46, W356–W362, DOI: 10.1093/nar/gky417
-
25. Kuipers, R. K.; Joosten, H.-J.; van Berkel, W. J. H.; Leferink, N. G. H.; Rooijen, E.; Ittmann, E.; van Zimmeren, F.; Jochens, H.; Bornscheuer, U.; Vriend, G.; Martins dos Santos, V. A. P.; Schaap, P. J. 3DM: Systematic Analysis of Heterogeneous Superfamily Data to Discover Protein Functionalities. Proteins: Struct., Funct., Bioinf. 2010, 78, 2101–2113, DOI: 10.1002/prot.22725
-
26. Reetz, M. T.; Carballeira, J. D. Iterative Saturation Mutagenesis (ISM) for Rapid Directed Evolution of Functional Enzymes. Nat. Protoc. 2007, 2, 891–903, DOI: 10.1038/nprot.2007.72
-
27. Liskova, V.; Stepankova, V.; Bednar, D.; Brezovsky, J.; Prokop, Z.; Chaloupkova, R.; Damborsky, J. Different Structural Origins of the Enantioselectivity of Haloalkane Dehalogenases toward Linear β-Haloalkanes: Open-Solvated versus Occluded-Desolvated Active Sites. Angew. Chem., Int. Ed. 2017, 56, 4719–4723, DOI: 10.1002/anie.201611193
-
28. Bar-Even, A.; Noor, E.; Savir, Y.; Liebermeister, W.; Davidi, D.; Tawfik, D. S.; Milo, R. The Moderately Efficient Enzyme: Evolutionary and Physicochemical Trends Shaping Enzyme Parameters. Biochemistry 2011, 50, 4402–4410, DOI: 10.1021/bi2002289
-
29Balchin, D.; Hayer-Hartl, M.; Hartl, F. U. In Vivo Aspects of Protein Folding and Quality Control. Science 2016, 353, aac4354, DOI: 10.1126/science.aac4354There is no corresponding record for this reference.
-
30Colón, W.; Church, J.; Sen, J.; Thibeault, J.; Trasatti, H.; Xia, K. Biological Roles of Protein Kinetic Stability. Biochemistry 2017, 56, 6179– 6186, DOI: 10.1021/acs.biochem.7b0094230https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXhslentrfL&md5=b7e0dd86dd97d503913bcab967ee7495Biological Roles of Protein Kinetic StabilityColon, Wilfredo; Church, Jennifer; Sen, Jayeeta; Thibeault, Jane; Trasatti, Hannah; Xia, KeBiochemistry (2017), 56 (47), 6179-6186CODEN: BICHAW; ISSN:0006-2960. (American Chemical Society)A review. A protein's stability may range from non-existent, as in the case of intrinsically disordered proteins, to very high, as indicated by a protein's resistance to degrdn., even under relatively harsh conditions. The stability of this latter group is usually under kinetic control due to a high activation energy for unfolding that virtually traps the protein in a specific conformation, thereby conferring resistance to proteolytic degrdn. and misfolding-aggregation. The usual outcome of kinetic stability is a longer protein half-life. Thus, the protective role of protein kinetic stability is often appreciated, but relatively little is known about the extent of biol. roles related to this property. Here, we discuss several known or putative biol. roles of protein kinetic stability, including protection from stressors to avoid aggregation or premature degrdn., achieving long-term phenotypic change, and regulating cellular processes by controlling the trigger and timing of mol. motion. The picture that emerges from this anal. is that protein kinetic stability is involved in a myriad of known and yet to be discovered biol. functions via its ability to resist degrdn. and control the timing, extent, and permanency of mol. motion.
-
31Khersonsky, O.; Kiss, G.; Röthlisberger, D.; Dym, O.; Albeck, S.; Houk, K. N.; Baker, D.; Tawfik, D. S. Bridging the Gaps in Design Methodologies by Evolutionary Optimization of the Stability and Proficiency of Designed Kemp Eliminase KE59. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 10358– 10363, DOI: 10.1073/pnas.112106310931https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC38XhtFWgt7rE&md5=da69b08f6e228aa3c871519287f12688Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59Khersonsky, Olga; Kiss, Gert; Rothlisberger, Daniela; Dym, Orly; Albeck, Shira; Houk, Kendall N.; Baker, David; Tawfik, Dan S.Proceedings of the National Academy of Sciences of the United States of America (2012), 109 (26), 10358-10363, S10358/1-S10358/47CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)Computational design is a test of our understanding of enzyme catalysis and a means of engineering novel, tailor-made enzymes. While the de novo computational design of catalytically efficient enzymes remains a challenge, designed enzymes may comprise unique starting points for further optimization by directed evolution. Directed evolution of two computationally designed Kemp eliminases, KE07 and KE70, led to low to moderately efficient enzymes (kcat/Km values of ≤5 × 104 M-1s-1). Here we describe the optimization of a third design, KE59. Although KE59 was the most catalytically efficient Kemp eliminase from this design series (by kcat/Km, and by catalyzing the elimination of nonactivated benzisoxazoles), its impaired stability prevented its evolutionary optimization. To boost KE59's evolvability, stabilizing consensus mutations were included in the libraries throughout the directed evolution process. The libraries were also screened with less activated substrates. 
Sixteen rounds of mutation and selection led to >2000-fold increase in catalytic efficiency, mainly via higher kcat values. The best KE59 variants exhibited kcat/Km values up to 0.6 × 106 M-1s-1, and kcat/kuncat values of ≤107 almost regardless of substrate reactivity. Biochem., structural, and mol. dynamics (MD) simulation studies provided insights regarding the optimization of KE59. Overall, the directed evolution of three different designed Kemp eliminases, KE07, KE70, and KE59, demonstrates that computational designs are highly evolvable and can be optimized to high catalytic efficiencies.
-
32Taverna, D. M.; Goldstein, R. A. Why Are Proteins Marginally Stable?. Proteins: Struct., Funct., Genet. 2002, 46, 105– 109, DOI: 10.1002/prot.10016There is no corresponding record for this reference.
-
33Sanchez-Ruiz, J. M. Protein Kinetic Stability. Biophys. Chem. 2010, 148, 1– 15, DOI: 10.1016/j.bpc.2010.02.00433https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3cXkvFCju7c%253D&md5=b0bccbeea28fd182559a4381889877a6Protein kinetic stabilitySanchez-Ruiz, Jose M.Biophysical Chemistry (2010), 148 (1-3), 1-15CODEN: BICIAZ; ISSN:0301-4622. (Elsevier B.V.)A review. The relevance of protein stability for biol. function and mol. evolution is widely recognized. Protein stability, however, comes in 2 flavors: (1) thermodn. stability, which is related to a low amt. of unfolded and partially-unfolded states in equil. with the native, functional protein, and (2) kinetic stability, which is related to a high free energy barrier "sepg." the native state from the non-functional forms (unfolded states, irreversibly-denatured protein). Such a barrier may guarantee that the biol. function of the protein is maintained, at least during a physiol. relevant time-scale, even if the native state is not thermodynamically stable with respect to non-functional forms. Kinetic stabilization is likely required in many cases, since proteins often work under conditions (harsh extracellular or crowded intracellular environments) in which deleterious alterations (proteolysis, aggregation, undesirable interactions with other macromol. components) are prone to occur. Also, kinetic stability may provide a mechanism for the evolution of optimal functional properties. Furthermore, enhancement of kinetic stability is essential for many biotechnol. applications of proteins. Despite all of this, many published studies focus on thermodn. stability, partly because it can be easily quantified in vitro for small model proteins and, also, because of the availability of computational algorithms to est. mutation effects on thermodn. stability. Here, the opposite bias is purposely adopted: the exptl. 
evidence supporting widespread kinetic stabilization of proteins is summarized, the role of natural selection in detg. this feature is discussed, possible mol. mechanisms responsible for kinetic stability are described, and the relation between kinetic destabilization and protein misfolding diseases is highlighted.
-
34Bommarius, A. S.; Paye, M. F. Stabilizing Biocatalysts. Chem. Soc. Rev. 2013, 42, 6534– 6565, DOI: 10.1039/c3cs60137d34https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3sXhtVKhtrzI&md5=5ba9406e8a7666b99704af985258606fStabilizing biocatalystsBommarius, Andreas S.; Paye, Marietou F.Chemical Society Reviews (2013), 42 (15), 6534-6565CODEN: CSRVBR; ISSN:0306-0012. (Royal Society of Chemistry)A review. The area of biocatalysis itself is in rapid development, fueled by both an enhanced repertoire of protein engineering tools and an increasing list of solved problems. Biocatalysts, however, are delicate materials that hover close to the thermodn. limit of stability. In many cases, they need to be stabilized to survive a range of challenges regarding temp., pH value, salt type and concn., co-solvents, as well as shear and surface forces. Biocatalysts may be delicate proteins, however, once stabilized, they are efficiently active enzymes. Kinetic stability must be achieved to a level satisfactory for large-scale process application. Kinetic stability evokes resistance to degrdn. and maintained or increased catalytic efficiency of the enzyme in which the desired reaction is accomplished at an increased rate. However, beyond these limitations, stable biocatalysts can be operated at higher temps. or co-solvent concns., with ensuing redn. in microbial contamination, better soly., as well as in many cases more favorable equil., and can serve as more effective templates for combinatorial and data-driven protein engineering. To increase thermodn. and kinetic stability, immobilization, protein engineering, and medium engineering of biocatalysts are available, the main focus of this work. 
In the case of protein engineering, there are three main approaches to enhancing the stability of protein biocatalysts: (i) rational design, based on knowledge of the 3D-structure and the catalytic mechanism, (ii) combinatorial design, requiring a protocol to generate diversity at the genetic level, a large, often high throughput, screening capacity to distinguish hits' from misses', and (iii) data-driven design, fueled by the increased availability of nucleotide and amino acid sequences of equiv. functionality.
-
35Goldenzweig, A.; Fleishman, S. J. Principles of Protein Stability and Their Application in Computational Design. Annu. Rev. Biochem. 2018, 87, 105– 129, DOI: 10.1146/annurev-biochem-062917-01210235https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1cXitFyqt7k%253D&md5=d5b820508142e79dafe127543f1ad6b7Principles of Protein Stability and Their Application in Computational DesignGoldenzweig, Adi; Fleishman, Sarel J.Annual Review of Biochemistry (2018), 87 (), 105-129CODEN: ARBOAW; ISSN:0066-4154. (Annual Reviews)A review. Proteins are increasingly used in basic and applied biomedical research. Many proteins, however, are only marginally stable and can be expressed in limited amts., thus hampering research and applications. Research has revealed the thermodn., cellular, and evolutionary principles and mechanisms that underlie marginal stability. With this growing understanding, computational stability design methods have advanced over the past two decades starting from methods that selectively addressed only some aspects of marginal stability. Current methods are more general and, by combining phylogenetic anal. with atomistic design, have shown drastic improvements in soly., thermal stability, and aggregation resistance while maintaining the protein's primary mol. activity. Stability design is opening the way to rational engineering of improved enzymes, therapeutics, and vaccines and to the application of protein design methodol. to large proteins and mol. activities that have proven challenging in the past.
-
36Hansen, N.; van Gunsteren, W. F. Practical Aspects of Free-Energy Calculations: A Review. J. Chem. Theory Comput. 2014, 10, 2632– 2647, DOI: 10.1021/ct500161f36https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2cXotlWjs7k%253D&md5=096fd11727692a87b0884e50bcb4a5e3Practical Aspects of Free-Energy Calculations: A ReviewHansen, Niels; van Gunsteren, Wilfred F.Journal of Chemical Theory and Computation (2014), 10 (7), 2632-2647CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)A review. Free-energy calcns. in the framework of classical mol. dynamics simulations are nowadays used in a wide range of research areas including solvation thermodn., mol. recognition, and protein folding. The basic components of a free-energy calcn., i.e., a suitable model Hamiltonian, a sampling protocol, and an estimator for the free energy, are independent of the specific application. However, the attention that one has to pay to these components depends considerably on the specific application. Here, we review six different areas of application and discuss the relative importance of the three main components to provide the reader with an organigram and to make nonexperts aware of the many pitfalls present in free energy calcns.
-
37Polizzi, K. M.; Bommarius, A. S.; Broering, J. M.; Chaparro-Riggers, J. F. Stability of Biocatalysts. Curr. Opin. Chem. Biol. 2007, 11, 220– 225, DOI: 10.1016/j.cbpa.2007.01.68537https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2sXjvFegurk%253D&md5=0b7ef267bbff8fcbdc29ca772079fc57Stability of biocatalystsPolizzi, Karen M.; Bommarius, Andreas S.; Broering, James M.; Chaparro-Riggers, Javier F.Current Opinion in Chemical Biology (2007), 11 (2), 220-225CODEN: COCBF4; ISSN:1367-5931. (Elsevier B.V.)A review. Here, the authors highlight recent research on the stabilization of enzymes using both chem. and biol. means to increase the lifetime of the biocatalyst. Despite their many favorable qualities, the marginal stability of biocatalysts in many types of reaction media often has prevented or delayed their implementation for industrial-scale synthesis of fine chems. and pharmaceuticals. Consequently, there is great interest in understanding the effects of soln. conditions on protein stability, as well as in developing strategies to improve protein stability in desired reaction media. Recent methods include novel chem. modifications of proteins, lyophilization in the presence of additives, and phys. immobilization on novel supports. Rational and combinatorial protein engineering techniques have been used to yield unmodified proteins with exceptionally improved stability. Both have been aided by the development of computational tools and structure-guided heuristics aimed at reducing library sizes that must be generated and screened to identify improved mutants. The no. of parameters used to indicate protein stability can complicate discussions and investigations, and care should be taken to identify whether thermodn. or kinetic stability limits the obsd. stability of proteins. 
Although the useful lifetime of a biocatalyst is dictated by its kinetic stability, only 6% of protein stability studies use kinetic stability measures. Clearly, more effort is needed to study how soln. conditions impact protein kinetic stability.
-
38Buck, P. M.; Kumar, S.; Wang, X.; Agrawal, N. J.; Trout, B. L.; Singh, S. K. Computational Methods To Predict Therapeutic Protein Aggregation. Methods Mol. Biol. 2012, 899, 425– 451, DOI: 10.1007/978-1-61779-921-1_2638https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3sXitFags7c%253D&md5=a7b334801df3f41c32bb81f50bf967b4Computational methods to predict therapeutic protein aggregationBuck, Patrick M.; Kumar, Sandeep; Wang, Xiaoling; Agrawal, Neeraj J.; Trout, Bernhardt L.; Singh, Satish K.Methods in Molecular Biology (New York, NY, United States) (2012), 899 (Therapeutic Proteins), 425-451CODEN: MMBIED; ISSN:1064-3745. (Springer)A review. Protein based biotherapeutics have emerged as a successful class of pharmaceuticals. However, these macromols. endure a variety of physicochem. degrdns. during manufg., shipping, and storage, which may adversely impact the drug product quality. Of these degrdns., the irreversible self-assocn. of therapeutic proteins to form aggregates is a major challenge in the formulation of these mols. Tools to predict and mitigate protein aggregation are, therefore, of great interest to biopharmaceutical research and development. In this chapter, a no. of such computational tools developed to understand and predict the various steps involved in protein aggregation are described. These tools can be grouped into three general classes: unfolding kinetics and native state thermal stability, colloidal stability, and sequence/structure based aggregation liabilities. Chapter sections introduce each class by discussing how these predictive tools provide insight into the mol. events leading to protein aggregation. The computational methods are then explained in detail along with their advantages and limitations.
-
39Jaswal, S. S.; Sohl, J. L.; Davis, J. H.; Agard, D. A. Energetic Landscape of α-Lytic Protease Optimizes Longevity through Kinetic Stability. Nature 2002, 415, 343– 346, DOI: 10.1038/415343a39https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD38Xptlaktg%253D%253D&md5=9d150430ebb485d561b9a8d71cab305bEnergetic landscape of α-lytic protease optimizes longevity through kinetic stabilityJaswal, Shella S.; Sohl, Julie L.; Davis, Jonathan H.; Agard, David A.Nature (London, United Kingdom) (2002), 415 (6869), 343-346CODEN: NATUAS; ISSN:0028-0836. (Nature Publishing Group)During the evolution of proteins the pressure to optimize biol. activity is moderated by a need for efficient folding. For most proteins, this is accomplished through spontaneous folding to a thermodynamically stable and active native state. However, in the extracellular bacterial α-lytic protease (αLP) these two processes have become decoupled. The native state of αLP is thermodynamically unstable, and when denatured, requires millennia (t1/2 ∼ 1800 yr) to refold. Folding is made possible by an attached folding catalyst, the pro-region, which is degraded on completion of folding, leaving αLP trapped in its native state by a large kinetic unfolding barrier (t1/2 ∼ 1.2 yr). αLP faces two very different folding landscapes: one in the presence of the pro-region controlling folding, and one in its absence restricting unfolding. Here we demonstrate that this sepn. of folding and unfolding pathways has removed constraints placed on the folding of thermodynamically stable proteins, and allowed the evolution of a native state having markedly reduced dynamic fluctuations. This, in turn, has led to a significant extension of the functional lifetime of αLP by the optimal suppression of proteolytic sensitivity.
-
40Young, T. A.; Skordalakes, E.; Marqusee, S. Comparison of Proteolytic Susceptibility in Phosphoglycerate Kinases from Yeast and E. coli: Modulation of Conformational Ensembles Without Altering Structure or Stability. J. Mol. Biol. 2007, 368, 1438– 1447, DOI: 10.1016/j.jmb.2007.02.07740https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2sXkslCmsLs%253D&md5=31e892b4872a16ea5507c80dfe4c63d4Comparison of Proteolytic Susceptibility in Phosphoglycerate Kinases from Yeast and E. coli: Modulation of Conformational Ensembles Without Altering Structure or StabilityYoung, Tracy A.; Skordalakes, Emmanuel; Marqusee, SusanJournal of Molecular Biology (2007), 368 (5), 1438-1447CODEN: JMOBAK; ISSN:0022-2836. (Elsevier Ltd.)Escherichia coli phosphoglycerate kinase (PGK) is resistant to proteolytic cleavage while the yeast homolog from Saccharomyces cerevisiae is not. We have explored the biophys. basis of this surprising difference. The sequences of these homologs are 39% identical and 56% similar. Detn. of the crystal structure for the E. coli protein and comparison to the previously solved yeast structure reveals that the two proteins have extremely similar tertiary structures, and their global stabilities detd. by equil. denaturation are also very similar. The extrapolated unfolding rate of E. coli PGK is, however, 105 slower than that of the yeast homolog. This surprisingly large difference in unfolding rates appears to arise from a divergence in the extent of cooperativity between the two structural domains (the N and C-domains) that make up these kinases. This is supported by: (1) the C-domain of E. coli PGK cannot be expressed or fold independently of the N-domain, while both domains of the yeast protein fold in isolation into stable structures and (2) the energetics and kinetics of the proteolytically sensitive state of E. coli PGK match those for global unfolding. 
This suggests that proteolysis occurs from the globally unfolded state of E. coli PGK, while the characteristics defining the yeast homolog suggest that proteolysis occurs upon unfolding of only the C-domain, with the N-domain remaining folded and consequently resistant to cleavage.
-
41Shirke, A. N.; Basore, D.; Butterfoss, G. L.; Bonneau, R.; Bystroff, C.; Gross, R. A. Toward Rational Thermostabilization of Aspergillus Oryzae Cutinase: Insights into Catalytic and Structural Stability. Proteins: Struct., Funct., Genet. 2016, 84, 60– 72, DOI: 10.1002/prot.24955There is no corresponding record for this reference.
-
42Liu, B.; Zhang, J.; Li, B.; Liao, X.; Du, G.; Chen, J. Expression and Characterization of Extreme Alkaline, Oxidation-Resistant Keratinase from Bacillus Licheniformis in Recombinant Bacillus Subtilis WB600 Expression System and Its Application in Wool Fiber Processing. World J. Microbiol. Biotechnol. 2013, 29, 825– 832, DOI: 10.1007/s11274-012-1237-542https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3sXlsFWnsb4%253D&md5=ff6b515ff39b7f5d5bb752ba1a1c1ce8Expression and characterization of extreme alkaline, oxidation-resistant keratinase from Bacillus licheniformis in recombinant Bacillus subtilis WB600 expression system and its application in wool fiber processingLiu, Baihong; Zhang, Juan; Li, Ben; Liao, Xiangru; Du, Guocheng; Chen, JianWorld Journal of Microbiology & Biotechnology (2013), 29 (5), 825-832CODEN: WJMBEY; ISSN:0959-3993. (Springer)A keratin-degrading bacterium of Bacillus licheniformis BBE11-1 was isolated and its ker gene encoding keratinase with native signal peptide was cloned and expressed in Bacillus subtilis WB600 under the strong PHpaII promoter of the pMA0911 vector. In the 3-L fermenter, the recombinant keratinase was secreted with 323 units/mL when non-induced after 24 h at 37 °C. And then, keratinase was concd. and purified by hydrophobic interaction chromatog. using HiTrap Phenyl-Sepharose Fast Flow. The recombinant keratinase had an optimal temp. and the pH at 40 °C and 10.5, resp., and was stable at 10-50 °C and pH 7-11.5. We found this enzyme can retained 80 % activity after treated 5 h with 1 M H2O2, it was activated by Mg2+, Co2+ and could degraded broad substrates such as degraded feather, bovine serum albumin, casein, gelatin, the keratinase was considered to be a serine protease. Coordinate with Savinase, the keratinase could efficient prevent shrinkage and eliminate fibers of wool, which showed its potential in textile industries and detergent industries.
-
43Nguyen, V.; Wilson, C.; Hoemberger, M.; Stiller, J. B.; Agafonov, R. V.; Kutter, S.; English, J.; Theobald, D. L.; Kern, D. Evolutionary Drivers of Thermoadaptation in Enzyme Catalysis. Science 2017, 355, 289– 294, DOI: 10.1126/science.aah371743https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXhtVehtbk%253D&md5=f2d5eebf186da3323268e68831d65e49Evolutionary drivers of thermoadaptation in enzyme catalysisNguyen, Vy; Wilson, Christopher; Hoemberger, Marc; Stiller, John B.; Agafonov, Roman V.; Kutter, Steffen; English, Justin; Theobald, Douglas L.; Kern, DorotheeScience (Washington, DC, United States) (2017), 355 (6322), 289-294CODEN: SCIEAS; ISSN:0036-8075. (American Association for the Advancement of Science)With early life likely to have existed in a hot environment, enzymes had to cope with an inherent drop in catalytic speed caused by lowered temp. Here we characterize the mol. mechanisms underlying thermoadaptation of enzyme catalysis in adenylate kinase using ancestral sequence reconstruction spanning 3 billion years of evolution. We show that evolution solved the enzyme's key kinetic obstacle - how to maintain catalytic speed on a cooler Earth - by exploiting transition-state heat capacity. Tracing the evolution of enzyme activity and stability from the hot-start toward modern hyperthermophilic, mesophilic, and psychrophilic organisms illustrates active pressure vs. passive drift in evolution on a mol. level, refutes the debated activity/stability trade-off, and suggests that the catalytic speed of adenylate kinase is an evolutionary driver for organismal fitness.
-
44Risso, V. A.; Gavira, J. A.; Gaucher, E. A.; Sanchez-Ruiz, J. M. Phenotypic Comparisons of Consensus Variants versus Laboratory Resurrections of Precambrian Proteins. Proteins: Struct., Funct., Genet. 2014, 82, 887– 896, DOI: 10.1002/prot.24575There is no corresponding record for this reference.
-
45Bednar, D.; Beerens, K.; Sebestova, E.; Bendl, J.; Khare, S.; Chaloupkova, R.; Prokop, Z.; Brezovsky, J.; Baker, D.; Damborsky, J. FireProt: Energy- and Evolution-Based Computational Design of Thermostable Multiple-Point Mutants. PLoS Comput. Biol. 2015, 11, e1004556, DOI: 10.1371/journal.pcbi.100455645https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC28XkvVKhtb4%253D&md5=82389328fe2da01f6f99eba4afe20f40FireProt: energy- and evolution-based computational design of thermostable multiple-point mutantsBednar, David; Beerens, Koen; Sebestova, Eva; Bendl, Jaroslav; Khare, Sagar; Chaloupkova, Radka; Prokop, Zbynek; Brezovsky, Jan; Baker, David; Damborsky, JiriPLoS Computational Biology (2015), 11 (11), e1004556/1-e1004556/20CODEN: PCBLBG; ISSN:1553-7358. (Public Library of Science)There is great interest in increasing proteins' stability to enhance their utility as biocatalysts, therapeutics, diagnostics and nanomaterials. Directed evolution is a powerful, but exptl. strenuous approach. Computational methods offer attractive alternatives. However, due to the limited reliability of predictions and potentially antagonistic effects of substitutions, only single-point mutations are usually predicted in silico, exptl. verified and then recombined in multiple-point mutants. Thus, substantial screening is still required. Here we present FireProt, a robust computational strategy for predicting highly stable multiple-point mutants that combines energy- and evolution-based approaches with smart filtering to identify additive stabilizing mutations. FireProt's reliability and applicability was demonstrated by validating its predictions against 656 mutations from the ProTherm database. 
We demonstrate that thermostability of the model enzymes haloalkane dehalogenase DhaA and γ-hexachlorocyclohexane dehydrochlorinase LinA can be substantially increased (ΔTm = 24°C and 21°C) by constructing and characterizing only a handful of multiple-point mutants. FireProt can be applied to any protein for which a tertiary structure and homologous sequences are available, and will facilitate the rapid development of robust proteins for biomedical and biotechnol. applications.
-
46Babkova, P.; Sebestova, E.; Brezovsky, J.; Chaloupkova, R.; Damborsky, J. Ancestral Haloalkane Dehalogenases Show Robustness and Unique Substrate Specificity. ChemBioChem 2017, 18, 1448– 1456, DOI: 10.1002/cbic.20170019746https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXpsFSqsLs%253D&md5=f732a67defa3a56ea468a91bc1c345ddAncestral Haloalkane Dehalogenases Show Robustness and Unique Substrate SpecificityBabkova, Petra; Sebestova, Eva; Brezovsky, Jan; Chaloupkova, Radka; Damborsky, JiriChemBioChem (2017), 18 (14), 1448-1456CODEN: CBCHFX; ISSN:1439-4227. (Wiley-VCH Verlag GmbH & Co. KGaA)Ancestral sequence reconstruction (ASR) represents a powerful approach for empirical testing structure-function relationships of diverse proteins. We employed ASR to predict sequences of five ancestral haloalkane dehalogenases (HLDs) from the HLD-II subfamily. Genes encoding the inferred ancestral sequences were synthesized and expressed in Escherichia coli, and the resurrected ancestral enzymes (AncHLD1-5) were exptl. characterized. Strikingly, the ancestral HLDs exhibited significantly enhanced thermodn. stability compared to extant enzymes (ΔTm up to 24 °C), as well as higher specific activities with preference for short multi-substituted halogenated substrates. Moreover, multivariate statistical anal. revealed a shift in the substrate specificity profiles of AncHLD1 and AncHLD2. This is extremely difficult to achieve by rational protein engineering. The study highlights that ASR is an efficient approach for the development of novel biocatalysts and robust templates for directed evolution.
-
-
71Mendes, J.; Guerois, R.; Serrano, L. Energy Estimation in Protein Design. Curr. Opin. Struct. Biol. 2002, 12, 441– 446, DOI: 10.1016/S0959-440X(02)00345-771https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD38XlvF2rtLc%253D&md5=d0d2d096d37ae267145a550325ba0cc0Energy estimation in protein designMendes, Joaquim; Guerois, Raphael; Serrano, LuisCurrent Opinion in Structural Biology (2002), 12 (4), 441-446CODEN: COSBEF; ISSN:0959-440X. (Elsevier Science Ltd.)A review. The progress achieved by several groups in the field of computational protein design shows that successful design methods include two major features: efficient algorithms to deal with the combinatorial exploration of sequence space and optimal energy functions to rank sequences according to their fitness for the given fold. The progress achieved by several groups in the field of computational protein design shows that successful design methods include two major features: efficient algorithms to deal with the combinatorial exploration of sequence space and optimal energy functions to rank sequences according to their fitness for the given fold.
-
72Dehouck, Y.; Gilis, D.; Rooman, M. A New Generation of Statistical Potentials for Proteins. Biophys. J. 2006, 90, 4010– 4017, DOI: 10.1529/biophysj.105.07943472https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD28XltF2mt7s%253D&md5=9e0b446c406d4d388bb2a4ad0ef271e4A new generation of statistical potentials for proteinsDehouck, Y.; Gilis, D.; Rooman, M.Biophysical Journal (2006), 90 (11), 4010-4017CODEN: BIOJAU; ISSN:0006-3495. (Biophysical Society)We propose a novel and flexible derivation scheme of statistical, database-derived, potentials, which allows one to take simultaneously into account specific correlations between several sequence and structure descriptors. This scheme leads to the decompn. of the total folding free energy of a protein into a sum of lower order terms, thereby giving the possibility to analyze independently each contribution and clarify its significance and importance, to avoid overcounting certain contributions, and to deal more efficiently with the limited size of the database. In addn., this derivation scheme appears as quite general, for many previously developed potentials can be expressed as particular cases of our formalism. We use this formalism as a framework to generate different residue-based energy functions, whose performances are assessed on the basis of their ability to discriminate genuine proteins from decoy models. The optimal potential is generated as a combination of several coupling terms, measuring correlations between residue types, backbone torsion angles, solvent accessibilities, relative positions along the sequence, and interresidue distances. This potential outperforms all tested residue-based potentials, and even several atom-based potentials. Its incorporation in algorithms aiming at predicting protein structure and stability should therefore substantially improve their performances.
-
73Dehouck, Y.; Kwasigroch, J. M.; Gilis, D.; Rooman, M. PoPMuSiC 2.1: A Web Server for the Estimation of Protein Stability Changes upon Mutation and Sequence Optimality. BMC Bioinf. 2011, 12, 151, DOI: 10.1186/1471-2105-12-15173https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BC3MngtVKktg%253D%253D&md5=b05d95255f2c9c47c88a3d96485e76cdPoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimalityDehouck Yves; Kwasigroch Jean Marc; Gilis Dimitri; Rooman MarianneBMC bioinformatics (2011), 12 (), 151 ISSN:.BACKGROUND: The rational design of modified proteins with controlled stability is of extreme importance in a whole range of applications, notably in the biotechnological and environmental areas, where proteins are used for their catalytic or other functional activities. Future breakthroughs in medical research may also be expected from an improved understanding of the effect of naturally occurring disease-causing mutations on the molecular level. RESULTS: PoPMuSiC-2.1 is a web server that predicts the thermodynamic stability changes caused by single site mutations in proteins, using a linear combination of statistical potentials whose coefficients depend on the solvent accessibility of the mutated residue. PoPMuSiC presents good prediction performances (correlation coefficient of 0.8 between predicted and measured stability changes, in cross validation, after exclusion of 10% outliers). It is moreover very fast, allowing the prediction of the stability changes resulting from all possible mutations in a medium size protein in less than a minute. This unique functionality is user-friendly implemented in PoPMuSiC and is particularly easy to exploit. Another new functionality of our server concerns the estimation of the optimality of each amino acid in the sequence, with respect to the stability of the structure. It may be used to detect structural weaknesses, i.e. 
clusters of non-optimal residues, which represent particularly interesting sites for introducing targeted mutations. This sequence optimality data is also expected to have significant implications in the prediction and the analysis of particular structural or functional protein regions. To illustrate the interest of this new functionality, we apply it to a dataset of known catalytic sites, and show that a much larger than average concentration of structural weaknesses is detected, quantifying how these sites have been optimized for function rather than stability. CONCLUSION: The freely available PoPMuSiC-2.1 web server is highly useful for identifying very rapidly a list of possibly relevant mutations with the desired stability properties, on which subsequent experimental studies can be focused. It can also be used to detect sequence regions corresponding to structural weaknesses, which could be functionally important or structurally delicate regions, with obvious applications in rational protein design.
-
74Liu, H. On Statistical Energy Functions for Biomolecular Modeling and Design. Quant. Biol. 2015, 3, 157– 167, DOI: 10.1007/s40484-015-0054-x74https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC28XjvFGrtw%253D%253D&md5=c09a8a84fa14a3c91a5ac461d6443ff9On statistical energy functions for biomolecular modeling and designLiu, HaiyanQuantitative Biology (2015), 3 (4), 157-167CODEN: QBUIA3; ISSN:2095-4697. (Springer GmbH)Statistical energy functions are general models about at. or residue-level interactions in biomols., derived from existing exptl. data. They provide quant. foundations for structural modeling as well as for structure-based protein sequence design. Statistical energy functions can be derived computationally either based on statistical distributions or based on variational assumptions. We present overviews on the theor. assumptions underlying the various types of approaches. Theor. considerations underlying important pragmatic choices are discussed. [Figure not available: see fulltext.].
-
75Kumar, M. D. S.; Bava, K. A.; Gromiha, M. M.; Prabakaran, P.; Kitajima, K.; Uedaira, H.; Sarai, A. ProTherm and ProNIT: Thermodynamic Databases for Proteins and Protein–Nucleic Acid Interactions. Nucleic Acids Res. 2006, 34, D204– 206, DOI: 10.1093/nar/gkj10375https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD28XisFyitA%253D%253D&md5=31a4c4d1ba1948a78963225177f1bcdfProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactionsKumar, M. D. Shaji; Bava, K. Abdulla; Gromiha, M. Michael; Prabakaran, Ponraj; Kitajima, Koji; Uedaira, Hatsuho; Sarai, AkinoriNucleic Acids Research (2006), 34 (Database), D204-D206CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)ProTherm and ProNIT are two thermodn. databases that contain exptl. detd. thermodn. parameters of protein stability and protein-nucleic acid interactions, resp. The current versions of both the databases have considerably increased the total no. of entries and enhanced search interface with added new fields, improved search, display and sorting options. As on Sept. 2005, ProTherm release 5.0 contains 17 113 entries from 771 proteins, retrieved from 1497 scientific articles (∼20% increase in data from the previous version). ProNIT release 2.0 contains 4900 entries from 273 research articles, representing 158 proteins. Both databases can be queried using WWW interfaces. Both quick search and advanced search are provided on this web page to facilitate easy retrieval and display of the data from these databases. ProTherm is freely available online at http://gibk26.bse.kyutech.ac.jp/jouhou/Protherm/protherm.html and ProNIT at http://gibk26.bse.kyutech.ac.jp.jouhou/pronit/pronit.html.
-
76Pucci, F.; Bourgeas, R.; Rooman, M. High-Quality Thermodynamic Data on the Stability Changes of Proteins Upon Single-Site Mutations. J. Phys. Chem. Ref. Data 2016, 45, 023104, DOI: 10.1063/1.494749376https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC28XpsF2kt70%253D&md5=054a88915e91acf65599527a74c7d0c6High-quality Thermodynamic Data on the Stability Changes of Proteins Upon Single-site MutationsPucci, Fabrizio; Bourgeas, Raphael; Rooman, MarianneJournal of Physical and Chemical Reference Data (2016), 45 (2), 023104/1-023104/53CODEN: JPCRBU; ISSN:0047-2689. (American Institute of Physics)We have set up and manually curated a dataset contg. exptl. information on the impact of amino acid substitutions in a protein on its thermal stability. It consists of a repository of exptl. measured melting temps. (Tm) and their changes upon point mutations (ΔTm) for proteins having a well-resolved x-ray structure. This high-quality dataset is designed for being used for the training or benchmarking of in silico thermal stability prediction methods. It also reports other exptl. measured thermodn. quantities when available, i.e., the folding enthalpy (ΔH) and heat capacity (ΔCP) of the wild type proteins and their changes upon mutations (ΔΔH and ΔΔCP), as well as the change in folding free energy (ΔΔG) at a ref. temp. These data are analyzed in view of improving our insights into the correlation between thermal and thermodn. stabilities, the asymmetry between the no. of stabilizing and destabilizing mutations, and the difference in stabilization potential of thermostable vs. mesostable proteins. (c) 2016 American Institute of Physics.
-
77Potapov, V.; Cohen, M.; Schreiber, G. Assessing Computational Methods for Predicting Protein Stability upon Mutation: Good on Average but Not in the Details. Protein Eng., Des. Sel. 2009, 22, 553– 560, DOI: 10.1093/protein/gzp03077https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD1MXhtVKns7zI&md5=428ef7793cd6062f3e4b05831742ce25Assessing computational methods for predicting protein stability upon mutation: good on average but not in the detailsPotapov, Vladimir; Cohen, Mati; Schreiber, GideonProtein Engineering, Design & Selection (2009), 22 (9), 553-560CODEN: PEDSBR; ISSN:1741-0126. (Oxford University Press)Methods for protein modeling and design advanced rapidly in recent years. At the heart of these computational methods is an energy function that calcs. the free energy of the system. Many of these functions were also developed to est. the consequence of mutation on protein stability or binding affinity. In the current study, the authors chose 6 different methods that were previously reported as being able to predict the change in protein stability (ΔΔG) upon mutation: CC/PBSA, EGAD, FoldX, I-Mutant2.0, Rosetta and Hunter. The authors evaluated their performance on a large set of 2156 single mutations, avoiding for each program the mutations used for training. The correlation coeffs. between exptl. and predicted ΔΔG values were in the range of 0.59 for the best and 0.26 for the worst performing method. All the tested computational methods showed a correct trend in their predictions, but failed in providing the precise values. This is not due to lack in precision of the exptl. data, which showed a correlation coeff. of 0.86 between different measurements. Combining the methods did not significantly improve prediction accuracy compared to a single method. These results suggest that there is still room for improvement, which is crucial if we want forcefields to perform better in their various tasks.
-
78Schymkowitz, J.; Borg, J.; Stricher, F.; Nys, R.; Rousseau, F.; Serrano, L. The FoldX Web Server: An Online Force Field. Nucleic Acids Res. 2005, 33, W382– 388, DOI: 10.1093/nar/gki38778https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2MXlslyrur4%253D&md5=1c3cd02dfeb8b5df1e1096939aa9cf03The FoldX web server: an online force fieldSchymkowitz, Joost; Borg, Jesper; Stricher, Francois; Nys, Robby; Rousseau, Frederic; Serrano, LuisNucleic Acids Research (2005), 33 (Web Server), W382-W388CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)FoldX is an empirical force field that was developed for the rapid evaluation of the effect of mutations on the stability, folding and dynamics of proteins and nucleic acids. The core functionality of FoldX, namely the calcn. of the free energy of a macromol. based on its high-resoln. 3D structure, is now publicly available through a web server at http://foldx.embl.de/. The current release allows the calcn. of the stability of a protein, calcn. of the positions of the protons and the prediction of water bridges, prediction of metal binding sites and the anal. of the free energy of complex formation. Alanine scanning, the systematic truncation of side chains to alanine, is also included. In addn., some reporting functions have been added, and it is now possible to print both the at. interaction networks that constitute the protein, print the structural and energetic details of the interactions per atom or per residue, as well as generate a general quality report of the pdb structure. This core functionality will be further extended as more FoldX applications are developed.
-
79Kepp, K. P. Towards a “Golden Standard” for Computing Globin Stability: Stability and Structure Sensitivity of Myoglobin Mutants. Biochim. Biophys. Acta, Proteins Proteomics 2015, 1854, 1239– 1248, DOI: 10.1016/j.bbapap.2015.06.00279https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2MXhtVWrtLjL&md5=92253dc29cffce1ed2835a1df377b9f6Towards a "Golden Standard" for computing globin stability: Stability and structure sensitivity of myoglobin mutantsKepp, Kasper P.Biochimica et Biophysica Acta, Proteins and Proteomics (2015), 1854 (10_Part_A), 1239-1248CODEN: BBAPBW; ISSN:1570-9639. (Elsevier B. V.)Fast and accurate computation of protein stability is increasingly important for e.g. protein engineering and protein misfolding diseases, but no consensus methods exist for important proteins such as globins, and performance may depend on the type of structural input given. This paper reports benchmarking of six protein stability calculators (POPMUSIC 2.1, I-Mutant 2.0, I-Mutant 3.0, CUPSAT, SDM, and mCSM) against 134 exptl. stability changes for mutations of sperm-whale myoglobin. Six different high-resoln. structures were used to test structure sensitivity that may impair protein calcns. The trend accuracy of the methods decreased as I-Mutant 2.0 (R = 0.64 - 0.65), SDM (R = 0.57 - 0.60), POPMUSIC2.1 (R = 0.54 - 0.57), I-Mutant 3.0 (R = 0.53 - 0.55), mCSM (R = 0.35 - 0.47), and CUPSAT (R = 0.25 - 0.48). The mean signed errors increased as SDM < CUPSAT < I-Mutant 2.0 < I-Mutant 3.0 < POPMUSIC 2.1 < mCSM. Mean abs. errors increased as I-Mutant 2.0 < I-Mutant 3.0 < POPMUSIC 2.1 < CUPSAT < SDM < mCSM. Structural sensitivity increased as I-Mutant 3.0 (0.05) < I-Mutant 2.0 (0.09) < POPMUSIC 2.1 (0.12) < SDM (0.18) < mCSM (0.27) < CUPSAT (0.58). Leaving out heterogeneous exptl. data did not change conclusions. 
The distinct performances reveal room for improvement, but I-Mutant 2.0 is proficient for this purpose, as further validated against a data set of related cytochrome c like proteins. The results also emphasize the importance of high-quality crystal structures and reveal structure-dependent effects even in the near-at. resoln. limit.
-
80Christensen, N. J.; Kepp, K. P. Accurate Stabilities of Laccase Mutants Predicted with a Modified FoldX Protocol. J. Chem. Inf. Model. 2012, 52, 3028– 3042, DOI: 10.1021/ci300398z80https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC38XhsFOlt77M&md5=967d831a16967900e285baf70988ad75Accurate Stabilities of Laccase Mutants Predicted with a Modified FoldX ProtocolChristensen, Niels J.; Kepp, Kasper P.Journal of Chemical Information and Modeling (2012), 52 (11), 3028-3042CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)Fungal laccases are multicopper enzymes of industrial importance due to their high stability, multifunctionality, and oxidizing power. This paper reports computational protocols that quantify the relative stability (ΔΔG of folding) of mutants of high-redox-potential laccases (TvLIIIb and PM1L) with up to 11 simultaneously mutated sites with good correlation against exptl. stability trends. Mol. dynamics simulations of the two laccases show that FoldX is very structure-sensitive, since all mutants and the wild type must share structural configuration to avoid artifacts of local sampling. However, using the av. of 50 MD snapshots of the equilibrated trajectories restores correlation (r ∼ 0.7-0.9, r2 ∼ 0.49-0.81) and provides a root-mean-square accuracy of ∼1.2 kcal/mol for ΔΔG or 3.5 °C for T50, suggesting that the time-av. of the crystal structure is recovered. MD-averaged input also reduces the spread in ΔΔG, suggesting that local FoldX sampling overestimates free energy changes because of neglected protein relaxation. FoldX can be viewed as a simple "linear interaction energy" method using sampling of the wild type and mutant and a parametrized relative free energy function: Thus, we show in this work that a substantial "hysteresis" of ∼1 kcal/mol applies to FoldX, and that an improved protocol that reverses calcns. and uses the av. obtained ΔΔG enhances correlation with the exptl. data. 
As glycosylation is ignored in FoldX, its effect on ΔΔG must be additive to the amino acid mutations. Quant. structure-property relationships of the FoldX energy components produced a substantially improved laccase stability predictor with errors of ∼1 °C for T50, vs 3-5 °C for a std. FoldX protocol. The developed model provides insight into the phys. forces governing the high stability of fungal laccases, most notably the hydrophobic and van der Waals interactions in the folded state, which provide most of the predictive power.
-
81MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiórkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102, 3586– 3616, DOI: 10.1021/jp973084f81https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADyaK1cXivVOlsb4%253D&md5=ebb5100dafd0daeee60ca2fa66c1324aAll-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of ProteinsMacKerell, A. D., Jr.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E., III; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M.Journal of Physical Chemistry B (1998), 102 (18), 3586-3616CODEN: JPCBFK; ISSN:1089-5647. (American Chemical Society)New protein parameters are reported for the all-atom empirical energy function in the CHARMM program. The parameter evaluation was based on a self-consistent approach designed to achieve a balance between the internal (bonding) and interaction (nonbonding) terms of the force field and among the solvent-solvent, solvent-solute, and solute-solute interactions. Optimization of the internal parameters used exptl. gas-phase geometries, vibrational spectra, and torsional energy surfaces supplemented with ab initio results. The peptide backbone bonding parameters were optimized with respect to data for N-methylacetamide and the alanine dipeptide. The interaction parameters, particularly the at. 
charges, were detd. by fitting ab initio interaction energies and geometries of complexes between water and model compds. that represented the backbone and the various side chains. In addn., dipole moments, exptl. heats and free energies of vaporization, solvation and sublimation, mol. vols., and crystal pressures and structures were used in the optimization. The resulting protein parameters were tested by applying them to noncyclic tripeptide crystals, cyclic peptide crystals, and the proteins crambin, bovine pancreatic trypsin inhibitor, and carbonmonoxy myoglobin in vacuo and in a crystal. A detailed anal. of the relationship between the alanine dipeptide potential energy surface and calcd. protein φ, χ angles was made and used in optimizing the peptide group torsional parameters. The results demonstrate that use of ab initio structural and energetic data by themselves are not sufficient to obtain an adequate backbone representation for peptides and proteins in soln. and in crystals. Extensive comparisons between mol. dynamics simulation and exptl. data for polypeptides and proteins were performed for both structural and dynamic properties. Calcd. data from energy minimization and dynamics simulations for crystals demonstrate that the latter are needed to obtain meaningful comparisons with exptl. crystal structures. The presented parameters, in combination with the previously published CHARMM all-atom parameters for nucleic acids and lipids, provide a consistent set for condensed-phase simulations of a wide variety of mols. of biol. interest.
-
82Oostenbrink, C.; Villa, A.; Mark, A. E.; van Gunsteren, W. F. A Biomolecular Force Field Based on the Free Enthalpy of Hydration and Solvation: The GROMOS Force-Field Parameter Sets 53A5 and 53A6. J. Comput. Chem. 2004, 25, 1656– 1676, DOI: 10.1002/jcc.2009082https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2cXmvVOhtr4%253D&md5=f2c0be6f44fe768128989c9031957e4eA biomolecular force field based on the free enthalpy of hydration and solvation: The GROMOS force-field parameter sets 53A5 and 53A6Oostenbrink, Chris; Villa, Alessandra; Mark, Alan E.; van Gunsteren, Wilfred F.Journal of Computational Chemistry (2004), 25 (13), 1656-1676CODEN: JCCHDD; ISSN:0192-8651. (John Wiley & Sons, Inc.)Successive parameterizations of the GROMOS force field have been used successfully to simulate biomol. systems over a long period of time. The continuing expansion of computational power with time makes it possible to compute ever more properties for an increasing variety of mol. systems with greater precision. This has led to recurrent parameterizations of the GROMOS force field all aimed at achieving better agreement with exptl. data. Here we report the results of the latest, extensive reparameterization of the GROMOS force field. In contrast to the parameterization of other biomol. force fields, this parameterization of the GROMOS force field is based primarily on reproducing the free enthalpies of hydration and apolar solvation for a range of compds. This approach was chosen because the relative free enthalpy of solvation between polar and apolar environments is a key property in many biomol. processes of interest, such as protein folding, biomol. assocn., membrane formation, and transport over membranes. The newest parameter sets, 53A5 and 53A6, were optimized by first fitting to reproduce the thermodn. properties of pure liqs. of a range of small polar mols. 
and the solvation free enthalpies of amino acid analogs in cyclohexane (53A5). The partial charges were then adjusted to reproduce the hydration free enthalpies in water (53A6). Both parameter sets are fully documented, and the differences between these and previous parameter sets are discussed.
-
83Alford, R. F.; Leaver-Fay, A.; Jeliazkov, J. R.; O’Meara, M. J.; DiMaio, F. P.; Park, H.; Shapovalov, M. V.; Renfrew, P. D.; Mulligan, V. K.; Kappel, K.; Labonte, J. W.; Pacella, M. S.; Bonneau, R.; Bradley, P.; Dunbrack, R. L.; Das, R.; Baker, D.; Kuhlman, B.; Kortemme, T.; Gray, J. J. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J. Chem. Theory Comput. 2017, 13, 3031– 3048, DOI: 10.1021/acs.jctc.7b0012583https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXmsFajtb0%253D&md5=7c50732bb0c8d060bbf13df04766ce39The Rosetta All-Atom Energy Function for Macromolecular Modeling and DesignAlford, Rebecca F.; Leaver-Fay, Andrew; Jeliazkov, Jeliazko R.; O'Meara, Matthew J.; DiMaio, Frank P.; Park, Hahnbeom; Shapovalov, Maxim V.; Renfrew, P. Douglas; Mulligan, Vikram K.; Kappel, Kalli; Labonte, Jason W.; Pacella, Michael S.; Bonneau, Richard; Bradley, Philip; Dunbrack, Roland L.; Das, Rhiju; Baker, David; Kuhlman, Brian; Kortemme, Tanja; Gray, Jeffrey J.Journal of Chemical Theory and Computation (2017), 13 (6), 3031-3048CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)A review. Over the past decade, the Rosetta biomol. modeling suite has informed diverse biol. questions and engineering challenges ranging from interpretation of low-resoln. structural data to design of nanomaterials, protein therapeutics, and vaccines. Central to Rosetta's success is the energy function: a model parameterized from small mol. and x-ray crystal structure data used to approx. the energy assocd. with each biomol. conformation. This paper describes the math. models and phys. concepts that underlie the latest Rosetta Energy Function, REF15. Applying these concepts, the authors explain how to use Rosetta energies to identify and analyze the features of biomol. models. Finally, the authors discuss the latest advances in the energy function that extend capabilities from sol. 
proteins to also include membrane proteins, peptides contg. noncanonical amino acids, small mols., carbohydrates, nucleic acids, and other macromols.
-
84Davey, J. A.; Damry, A. M.; Euler, C. K.; Goto, N. K.; Chica, R. A. Prediction of Stable Globular Proteins Using Negative Design with Non-Native Backbone Ensembles. Structure 2015, 23, 2011– 2021, DOI: 10.1016/j.str.2015.07.02184https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2MXhsFKqsbzF&md5=4c11259f13e4cd1a28c5b550631978eePrediction of Stable Globular Proteins Using Negative Design with Non-native Backbone EnsemblesDavey, James A.; Damry, Adam M.; Euler, Christian K.; Goto, Natalie K.; Chica, Roberto A.Structure (Oxford, United Kingdom) (2015), 23 (11), 2011-2021CODEN: STRUE6; ISSN:0969-2126. (Elsevier Ltd.)Accurate predictions of protein stability have great potential to accelerate progress in computational protein design, yet the correlation of predicted and exptl. detd. stabilities remains a significant challenge. To address this problem, we have developed a computational framework based on neg. multistate design in which sequence energy is evaluated in the context of both native and non-native backbone ensembles. This framework was validated exptl. with the design of ten variants of streptococcal protein G domain β1 that retained the wild-type fold, and showed a very strong correlation between predicted and exptl. stabilities (R2 = 0.86). When applied to four different proteins spanning a range of fold types, similarly strong correlations were also obtained. Overall, the enhanced prediction accuracies afforded by this method pave the way for new strategies to facilitate the generation of proteins with novel functions by computational protein design.
-
85Ó Conchúir, S.; Barlow, K. A.; Pache, R. A.; Ollikainen, N.; Kundert, K.; O’Meara, M. J.; Smith, C. A.; Kortemme, T. A Web Resource for Standardized Benchmark Datasets, Metrics, and Rosetta Protocols for Macromolecular Modeling and Design. PLoS One 2015, 10, e0130433, DOI: 10.1371/journal.pone.0130433There is no corresponding record for this reference.
-
86Trainor, K.; Broom, A.; Meiering, E. M. Exploring the Relationships between Protein Sequence, Structure and Solubility. Curr. Opin. Struct. Biol. 2017, 42, 136– 146, DOI: 10.1016/j.sbi.2017.01.00486https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXhsVeitbw%253D&md5=ecc332d41a33abbdd3a2ff614195d08fExploring the relationships between protein sequence, structure and solubilityTrainor, Kyle; Broom, Aron; Meiering, Elizabeth M.Current Opinion in Structural Biology (2017), 42 (), 136-146CODEN: COSBEF; ISSN:0959-440X. (Elsevier Ltd.)A review. Aggregation can be thought of as a form of protein folding in which intermol. assocns. lead to the formation of large, insol. assemblies. Various types of aggregates can be differentiated by their internal structures and gross morphologies (e.g., fibrillar or amorphous), and the ability to accurately predict the likelihood of their formation by a given polypeptide is of great practical utility in the fields of biol. (including the study of disease), biotechnol., and biomaterials research. Here we review aggregation/soly. prediction methods and selected applications thereof. The development of increasingly sophisticated methods that incorporate knowledge of conformations possibly adopted by aggregating polypeptide monomers and predict the internal structure of aggregates is improving the accuracy of the predictions and continually expanding the range of applications.
-
87. Das, R. Four Small Puzzles That Rosetta Doesn’t Solve. PLoS One 2011, 6, e20044, DOI: 10.1371/journal.pone.0020044
-
88. Kellogg, E. H.; Leaver-Fay, A.; Baker, D. Role of Conformational Sampling in Computing Mutation-Induced Changes in Protein Structure and Stability. Proteins: Struct., Funct., Bioinf. 2011, 79, 830–838, DOI: 10.1002/prot.22921
-
89. Musil, M.; Stourac, J.; Bendl, J.; Brezovsky, J.; Prokop, Z.; Zendulka, J.; Martinek, T.; Bednar, D.; Damborsky, J. FireProt: Web Server for Automated Design of Thermostable Proteins. Nucleic Acids Res. 2017, 45, W393–W399, DOI: 10.1093/nar/gkx285
-
90. Bush, J.; Makhatadze, G. I. Statistical Analysis of Protein Structures Suggests That Buried Ionizable Residues in Proteins Are Hydrogen Bonded or Form Salt Bridges. Proteins: Struct., Funct., Bioinf. 2011, 79, 2027–2032, DOI: 10.1002/prot.23067
-
91. Stranges, P. B.; Kuhlman, B. A Comparison of Successful and Failed Protein Interface Designs Highlights the Challenges of Designing Buried Hydrogen Bonds. Protein Sci. 2013, 22, 74–82, DOI: 10.1002/pro.2187
-
92. Beerens, K.; Mazurenko, S.; Kunka, A.; Marques, S. M.; Hansen, N.; Musil, M.; Chaloupkova, R.; Waterman, J.; Brezovsky, J.; Bednar, D.; Prokop, Z.; Damborsky, J. Evolutionary Analysis As a Powerful Complement to Energy Calculations for Protein Stabilization. ACS Catal. 2018, 8, 9420–9428, DOI: 10.1021/acscatal.8b01677
-
93. Wijma, H. J.; Floor, R. J.; Jekel, P. A.; Baker, D.; Marrink, S. J.; Janssen, D. B. Computationally Designed Libraries for Rapid Enzyme Stabilization. Protein Eng., Des. Sel. 2014, 27, 49–58, DOI: 10.1093/protein/gzt061
-
94. Thiltgen, G.; Goldstein, R. A. Assessing Predictors of Changes in Protein Stability upon Mutation Using Self-Consistency. PLoS One 2012, 7, e46084, DOI: 10.1371/journal.pone.0046084
-
95. Buß, O.; Rudat, J.; Ochsenreither, K. FoldX as Protein Engineering Tool: Better Than Random Based Approaches? Comput. Struct. Biotechnol. J. 2018, 16, 25–33, DOI: 10.1016/j.csbj.2018.01.002
-
96. Allen, B. D.; Nisthal, A.; Mayo, S. L. Experimental Library Screening Demonstrates the Successful Application of Computational Protein Design to Large Structural Ensembles. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 19838–19843, DOI: 10.1073/pnas.1012985107
-
97. Barlow, K. A.; Ó Conchúir, S.; Thompson, S.; Suresh, P.; Lucas, J. E.; Heinonen, M.; Kortemme, T. Flex ddG: Rosetta Ensemble-Based Estimation of Changes in Protein-Protein Binding Affinity upon Mutation. J. Phys. Chem. B 2018, 122, 5389–5399, DOI: 10.1021/acs.jpcb.7b11367
-
98. Ludwiczak, J.; Jarmula, A.; Dunin-Horkawicz, S. Combining Rosetta with Molecular Dynamics (MD): A Benchmark of the MD-Based Ensemble Protein Design. J. Struct. Biol. 2018, 203, 54–61, DOI: 10.1016/j.jsb.2018.02.004
-
99. Davis, I. W.; Arendall, W. B.; Richardson, D. C.; Richardson, J. S. The Backrub Motion: How Protein Backbone Shrugs When a Sidechain Dances. Structure 2006, 14, 265–274, DOI: 10.1016/j.str.2005.10.007
-
100. Wei, G.; Xi, W.; Nussinov, R.; Ma, B. Protein Ensembles: How Does Nature Harness Thermodynamic Fluctuations for Life? The Diverse Functional Roles of Conformational Ensembles in the Cell. Chem. Rev. 2016, 116, 6516–6551, DOI: 10.1021/acs.chemrev.5b00562
-
101. Fan, H.; Mark, A. E. Relative Stability of Protein Structures Determined by X-Ray Crystallography or NMR Spectroscopy: A Molecular Dynamics Simulation Study. Proteins: Struct., Funct., Genet. 2003, 53, 111–120, DOI: 10.1002/prot.10496
-
102. Kuzmanic, A.; Pannu, N. S.; Zagrovic, B. X-Ray Refinement Significantly Underestimates the Level of Microscopic Heterogeneity in Biomolecular Crystals. Nat. Commun. 2014, 5, 3220, DOI: 10.1038/ncomms4220
-
103. Karshikoff, A.; Nilsson, L.; Ladenstein, R. Rigidity versus Flexibility: The Dilemma of Understanding Protein Thermal Stability. FEBS J. 2015, 282, 3899–3917, DOI: 10.1111/febs.13343
-
104. Der, B. S.; Kluwe, C.; Miklos, A. E.; Jacak, R.; Lyskov, S.; Gray, J. J.; Georgiou, G.; Ellington, A. D.; Kuhlman, B. Alternative Computational Protocols for Supercharging Protein Surfaces for Reversible Unfolding and Retention of Stability. PLoS One 2013, 8, e64363, DOI: 10.1371/journal.pone.0064363
-
105. Chan, P.; Curtis, R. A.; Warwicker, J. Soluble Expression of Proteins Correlates with a Lack of Positively-Charged Surface. Sci. Rep. 2013, 3, 3333, DOI: 10.1038/srep03333
-
106. Rezaie, E.; Mohammadi, M.; Sakhteman, A.; Bemani, P.; Ahrari, S. Application of Molecular Dynamics Simulations To Design a Dual-Purpose Oligopeptide Linker Sequence for Fusion Proteins. J. Mol. Model. 2018, 24, 313, DOI: 10.1007/s00894-018-3846-x
-
107. Folkman, L.; Stantic, B.; Sattar, A.; Zhou, Y. EASE-MM: Sequence-Based Prediction of Mutation-Induced Stability Changes with Feature-Based Multiple Models. J. Mol. Biol. 2016, 428, 1394–1405, DOI: 10.1016/j.jmb.2016.01.012
-
108. Teng, S.; Srivastava, A. K.; Wang, L. Sequence Feature-Based Prediction of Protein Stability Changes upon Amino Acid Substitutions. BMC Genomics 2010, 11 (Suppl 2), S5, DOI: 10.1186/1471-2164-11-S2-S5
-
109. Huang, L.-T.; Gromiha, M. M.; Ho, S.-Y. IPTREE-STAB: Interpretable Decision Tree Based Method for Predicting Protein Stability Changes upon Mutations. Bioinformatics 2007, 23, 1292–1293, DOI: 10.1093/bioinformatics/btm100
-
110. Paladin, L.; Piovesan, D.; Tosatto, S. C. E. SODA: Prediction of Protein Solubility from Disorder and Aggregation Propensity. Nucleic Acids Res. 2017, 45, W236–W240, DOI: 10.1093/nar/gkx412
-
111. Liaw, A.; Wiener, M. Classification and Regression by RandomForest. R News 2002, 2, 18–22
-
112. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32, DOI: 10.1023/A:1010933404324
-
113. Boughorbel, S.; Jarray, F.; El-Anbari, M. Optimal Classifier for Imbalanced Data Using Matthews Correlation Coefficient Metric. PLoS One 2017, 12, e0177678, DOI: 10.1371/journal.pone.0177678
-
114. Ling, C. X.; Sheng, V. S. Cost-Sensitive Learning and the Class Imbalance Problem. In Encyclopedia of Machine Learning; Sammut, C., Ed.; Springer: New York, 2007.
-
115. Rao, R.; Fung, G.; Rosales, R. On the Dangers of Cross-Validation. An Experimental Evaluation. In Proceedings of the 2008 SIAM International Conference on Data Mining; Society for Industrial and Applied Mathematics: Philadelphia, PA, 2008; pp 588–596.
-
116. Stephens, Z. D.; Lee, S. Y.; Faghri, F.; Campbell, R. H.; Zhai, C.; Efron, M. J.; Iyer, R.; Schatz, M. C.; Sinha, S.; Robinson, G. E. Big Data: Astronomical or Genomical? PLoS Biol. 2015, 13, e1002195, DOI: 10.1371/journal.pbio.1002195
-
117. Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic Local Alignment Search Tool. J. Mol. Biol. 1990, 215, 403–410, DOI: 10.1016/S0022-2836(05)80360-2
-
118. Altschul, S. F.; Madden, T. L.; Schäffer, A. A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D. J. Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs. Nucleic Acids Res. 1997, 25, 3389–3402, DOI: 10.1093/nar/25.17.3389
-
119. Eddy, S. R. Profile Hidden Markov Models. Bioinformatics 1998, 14, 755–763, DOI: 10.1093/bioinformatics/14.9.755
-
120. Remmert, M.; Biegert, A.; Hauser, A.; Söding, J. HHblits: Lightning-Fast Iterative Protein Sequence Searching by HMM–HMM Alignment. Nat. Methods 2012, 9, 173–175, DOI: 10.1038/nmeth.1818
-
121. Pearson, W. R. An Introduction to Sequence Similarity (“Homology”) Searching. Curr. Protoc. Bioinf. 2013, 42, 3.1.1–3.1.8, DOI: 10.1002/0471250953.bi0301s42
-
122. Rost, B. Twilight Zone of Protein Sequence Alignments. Protein Eng., Des. Sel. 1999, 12, 85–94, DOI: 10.1093/protein/12.2.85
-
123. Fletcher, W.; Yang, Z. The Effect of Insertions, Deletions, and Alignment Errors on the Branch-Site Test of Positive Selection. Mol. Biol. Evol. 2010, 27, 2257–2267, DOI: 10.1093/molbev/msq115
-
124. Vialle, R. A.; Tamuri, A. U.; Goldman, N. Alignment Modulates Ancestral Sequence Reconstruction Accuracy. Mol. Biol. Evol. 2018, 35, 1783–1797, DOI: 10.1093/molbev/msy055
-
125. Chowdhury, B.; Garai, G. A Review on Multiple Sequence Alignment from the Perspective of Genetic Algorithm. Genomics 2017, 109, 419–431, DOI: 10.1016/j.ygeno.2017.06.007
-
126. Taly, J.-F.; Magis, C.; Bussotti, G.; Chang, J.-M.; Di Tommaso, P.; Erb, I.; Espinosa-Carrasco, J.; Kemena, C.; Notredame, C. Using the T-Coffee Package to Build Multiple Sequence Alignments of Protein, RNA, DNA Sequences and 3D Structures. Nat. Protoc. 2011, 6, 1669–1682, DOI: 10.1038/nprot.2011.393
-
127. Pei, J.; Grishin, N. V. PROMALS3D: Multiple Protein Sequence Alignment Enhanced with Evolutionary and Three-Dimensional Structural Information. Methods Mol. Biol. 2014, 1079, 263–271, DOI: 10.1007/978-1-62703-646-7_17
-
128. Steipe, B.; Schiller, B.; Plückthun, A.; Steinbacher, S. Sequence Statistics Reliably Predict Stabilizing Mutations in a Protein Domain. J. Mol. Biol. 1994, 240, 188–192, DOI: 10.1006/jmbi.1994.1434
-
129. Sullivan, B. J.; Nguyen, T.; Durani, V.; Mathur, D.; Rojas, S.; Thomas, M.; Syu, T.; Magliery, T. J. Stabilizing Proteins from Sequence Statistics: The Interplay of Conservation and Correlation in Triosephosphate Isomerase Stability. J. Mol. Biol. 2012, 420, 384–399, DOI: 10.1016/j.jmb.2012.04.025
-
130Lehmann, M.; Kostrewa, D.; Wyss, M.; Brugger, R.; D’Arcy, A.; Pasamontes, L.; van Loon, A. P. From DNA Sequence to Improved Functionality: Using Protein Sequence Comparisons to Rapidly Design a Thermostable Consensus Phytase. Protein Eng., Des. Sel. 2000, 13, 49– 57, DOI: 10.1093/protein/13.1.49
-
131Magliery, T. J. Protein Stability: Computation, Sequence Statistics, and New Experimental Methods. Curr. Opin. Struct. Biol. 2015, 33, 161– 168, DOI: 10.1016/j.sbi.2015.09.002131https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2MXhs1SltbvJ&md5=f60fb4ac5dc13566a98015944d24ae0bProtein stability: computation, sequence statistics, and new experimental methodsMagliery, Thomas J.Current Opinion in Structural Biology (2015), 33 (), 161-168CODEN: COSBEF; ISSN:0959-440X. (Elsevier Ltd.)A review. Calcg. protein stability and predicting stabilizing mutations remain exceedingly difficult tasks, largely due to the inadequacy of potential functions, the difficulty of modeling entropy and the unfolded state, and challenges of sampling, particularly of backbone conformations. Yet, computational design produced some remarkably stable proteins in recent years, apparently owing to near ideality in structure and sequence features. With caveats, computational prediction of stability can be used to guide mutation, and mutations derived from consensus sequence anal., esp. improved by recent co-variation filters, are very likely to stabilize without sacrificing function. The combination of computational and statistical approaches with library approaches, including new technologies such as deep sequencing and high throughput stability measurements, point to a very exciting near term future for stability engineering, even with difficult computational issues remaining.
-
132Porebski, B. T.; Buckle, A. M. Consensus Protein Design. Protein Eng., Des. Sel. 2016, 29, 245– 251, DOI: 10.1093/protein/gzw015132https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC28XhsF2jtr%252FO&md5=d96858b68df92bbbd0811bee8188b048Consensus protein designPorebski, Benjamin T.; Buckle, Ashley M.Protein Engineering, Design & Selection (2016), 29 (7), 245-251CODEN: PEDSBR; ISSN:1741-0126. (Oxford University Press)A popular and successful strategy in semi-rational design of protein stability is the use of evolutionary information encapsulated in homologous protein sequences. Consensus design is based on the hypothesis that at a given position, the resp. consensus amino acid contributes more than av. to the stability of the protein than non-conserved amino acids. Here, we review the consensus design approach, its theor. underpinnings, successes, limitations and challenges, as well as providing a detailed guide to its application in protein engineering.
-
133Jäckel, C.; Bloom, J. D.; Kast, P.; Arnold, F. H.; Hilvert, D. Consensus Protein Design without Phylogenetic Bias. J. Mol. Biol. 2010, 399, 541– 546, DOI: 10.1016/j.jmb.2010.04.039133https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A280%3ADC%252BC3cnotF2gsg%253D%253D&md5=7e4dc61c19f12f6625895e1e1c35093cConsensus protein design without phylogenetic biasJackel Christian; Bloom Jesse D; Kast Peter; Arnold Frances H; Hilvert DonaldJournal of molecular biology (2010), 399 (4), 541-6 ISSN:.Consensus design is an appealing strategy for the stabilization of proteins. It exploits amino acid conservation in sets of homologous proteins to identify likely beneficial mutations. Nevertheless, its success depends on the phylogenetic diversity of the sequence set available. Here, we show that randomization of a single protein represents a reliable alternative source of sequence diversity that is essentially free of phylogenetic bias. A small number of functional protein sequences selected from binary-patterned libraries suffice as input for the consensus design of active enzymes that are easier to produce and substantially more stable than individual members of the starting data set. Although catalytic activity correlates less consistently with sequence conservation in these extensively randomized proteins, less extreme mutagenesis strategies might be adopted in practice to augment stability while maintaining function.
-
134Goyal, V. D.; Magliery, T. J. Phylogenetic Spread of Sequence Data Affects Fitness of SOD1 Consensus Enzymes: Insights from Sequence Statistics and Structural Analyses. Proteins: Struct., Funct., Genet. 2018, 86, 609– 620, DOI: 10.1002/prot.25486
-
135Vázquez-Figueroa, E.; Chaparro-Riggers, J.; Bommarius, A. S. Development of a Thermostable Glucose Dehydrogenase by a Structure-Guided Consensus Concept. ChemBioChem 2007, 8, 2295– 2301, DOI: 10.1002/cbic.200700500 https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD1cXltlenur0%253D&md5=7157c350631e7102bc5fd0b0ef8a4a7c Development of a thermostable glucose dehydrogenase by a structure-guided consensus concept. Vazquez-Figueroa, Eduardo; Chaparro-Riggers, Javier; Bommarius, Andreas S. ChemBioChem (2007), 8 (18), 2295-2301. CODEN: CBCHFX; ISSN:1439-4227. (Wiley-VCH Verlag GmbH & Co. KGaA) Instability under non-native processing conditions, esp. at elevated temps., is a major factor preventing the widespread adoption of biocatalysts for industrial synthesis. A crucial distinction of many redox enzymes used to synthesize chiral compds. is the need for cofactors (e.g., NAD(P)(H)) for function. Because of the prohibitively high prices of nicotinamide cofactors, a robust cofactor-regenerating enzyme is required for the economical synthesis of fine chems. by biocatalysis. Here we test the structure-guided consensus for the generation of a thermostable glucose dehydrogenase (GDH). The consensus sequence in combination with addnl. knowledge-based criteria was used to select amino acids for substitutions. Using this approach we generated 24 variants, 11 of which showed higher thermal stability than the wild-type GDH, a success rate of 46%. Of the 24 variants, seven were located at the subunit interface, known to influence GDH stability, and six were more stable (86% success). The best variants feature a half-life of ∼3.5 days at 65 °C, in contrast to ∼20 min at 25 °C for the wild type, thus enhancing stability 10⁶-fold. In addn., the three most stabilizing single mutations were transferred to two GDH homologs from Bacillus thuringiensis and Bacillus licheniformis. The thermal stability as measured by half-life and CD at 222 nm of the GDH variants was increased, as expected. The resulting stability changes provide further support for the view that these residues are crit. for stability of GDHs and reinforce the success of the consensus approach for identifying stabilizing mutations.
-
136Parthasarathy, S.; Murthy, M. R. Protein Thermal Stability: Insights from Atomic Displacement Parameters (B Values). Protein Eng., Des. Sel. 2000, 13, 9– 13, DOI: 10.1093/protein/13.1.9
-
137Cole, M. F.; Gaucher, E. A. Exploiting Models of Molecular Evolution to Efficiently Direct Protein Engineering. J. Mol. Evol. 2011, 72, 193– 203, DOI: 10.1007/s00239-010-9415-2137https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3MXjvVSksb4%253D&md5=d54d91ea84b7e5660b5f2f72539c7d58Exploiting Models of Molecular Evolution to Efficiently Direct Protein EngineeringCole, Megan F.; Gaucher, Eric A.Journal of Molecular Evolution (2011), 72 (2), 193-203CODEN: JMEVAU; ISSN:0022-2844. (Springer)Directed evolution and protein engineering approaches used to generate novel or enhanced biomol. function often use the evolutionary sequence diversity of protein homologs to rationally guide library design. To fully capture this sequence diversity, however, libraries contg. millions of variants are often necessary. Screening libraries of this size is often undesirable due to inaccuracies of high-throughput assays, costs, and time constraints. The ability to effectively cull sequence diversity while still generating the functional diversity within a library thus holds considerable value. This is particularly relevant when high-throughput assays are not amenable to select/screen for certain biomol. properties. Here, we summarize our recent attempts to develop an evolution-guided approach, Reconstructing Evolutionary Adaptive Paths (REAP), for directed evolution and protein engineering that exploits phylogenetic and sequence analyses to identify amino acid substitutions that are likely to alter or enhance function of a protein. To demonstrate the utility of this technique, we highlight our previous work with DNA polymerases in which a REAP-designed small library was used to identify a DNA polymerase capable of accepting non-std. nucleosides. 
We anticipate that the REAP approach will be used in the future to facilitate the engineering of biopolymers with expanded functions and will thus have a significant impact on the developing field of 'evolutionary synthetic biol.'.
-
138Hochberg, G. K. A.; Thornton, J. W. Reconstructing Ancient Proteins to Understand the Causes of Structure and Function. Annu. Rev. Biophys. 2017, 46, 247– 269, DOI: 10.1146/annurev-biophys-070816-033631 https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC2sXksVCqs70%253D&md5=19552f9d9e82ad02000e1650203db066 Reconstructing Ancient Proteins to Understand the Causes of Structure and Function. Hochberg, Georg K. A.; Thornton, Joseph W. Annual Review of Biophysics (2017), 46, 247-269. CODEN: ARBNCV; ISSN:1936-122X. (Annual Reviews) A review. A central goal in biochem. is to explain the causes of protein sequence, structure, and function. Mainstream approaches seek to rationalize sequence and structure in terms of their effects on function and to identify function's underlying determinants by comparing related proteins to each other. Although productive, both strategies suffer from intrinsic limitations that have left important aspects of many proteins unexplained. These limits can be overcome by reconstructing ancient proteins, exptl. characterizing their properties, and retracing their evolution through time. This approach has proven to be a powerful means for discovering how historical changes in sequence produced the functions, structures, and other phys./chem. characteristics of modern proteins. It has also illuminated whether protein features evolved because of functional optimization, historical constraint, or blind chance. Here we review recent studies employing ancestral protein reconstruction and show how they have produced new knowledge not only of mol. evolutionary processes but also of the underlying determinants of modern proteins' phys., chem., and biol. properties.
-
139Aerts, D.; Verhaeghe, T.; Joosten, H.-J.; Vriend, G.; Soetaert, W.; Desmet, T. Consensus Engineering of Sucrose Phosphorylase: The Outcome Reflects the Sequence Input. Biotechnol. Bioeng. 2013, 110, 2563– 2572, DOI: 10.1002/bit.24940139https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC3sXntV2nsrY%253D&md5=0eb7d3666500fcc3c3e4b2c9bf0d2726Consensus Engineering of Sucrose Phosphorylase: The Outcome Reflects the Sequence InputAerts, Dirk; Verhaeghe, Tom; Joosten, Henk-Jan; Vriend, Gert; Soetaert, Wim; Desmet, TomBiotechnology and Bioengineering (2013), 110 (10), 2563-2572CODEN: BIBIAU; ISSN:0006-3592. (John Wiley & Sons, Inc.)Consensus engineering, which is replacing amino acids by the most frequently occurring one at their positions in a multiple sequence alignment (MSA), is a known strategy to increase the stability of a protein. The application of this concept to the entire sequence of an enzyme, however, has been tried only a few times mainly because of the problems detg. the consensus in highly variable regions. We show that this problem can be solved by replacing such problematic regions by the corresponding sequence of the natural homolog closest to the consensus. When one or a few sub-families are overrepresented in the MSA the consensus sequence is a biased representation of the sequence space. We examine the influence of this bias by constructing three consensus sequences using different MSAs of sucrose phosphorylase (SP). Each consensus enzyme contained about 70 mutations compared to its closest natural homolog and folded correctly and displayed activity on sucrose. Correlation anal. revealed that the family's co-evolution network was kept intact, which is one of the main advantages of full-length consensus design. The consensus enzymes displayed an "av." thermostability, i.e., one that is higher than some but not all known representatives. 
We cautiously present practical rules for the design of consensus sequences, but warn that the measure of success depends on which natural enzyme is used as point of comparison.
-
140Trudeau, D. L.; Kaltenbach, M.; Tawfik, D. S. On the Potential Origins of the High Stability of Reconstructed Ancestral Proteins. Mol. Biol. Evol. 2016, 33, 2633– 2641, DOI: 10.1093/molbev/msw138 https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC28XhvVKmsrrK&md5=b75ae4a443a42d2a9c427d4471386c0e On the potential origins of the high stability of reconstructed ancestral proteins. Trudeau, Devin L.; Kaltenbach, Miriam; Tawfik, Dan S. Molecular Biology and Evolution (2016), 33 (10), 2633-2641. CODEN: MBEVEO; ISSN:0737-4038. (Oxford University Press) Ancestral reconstruction provides instrumental insights regarding the biochem. and biophys. characteristics of past proteins. A striking observation relates to the remarkably high thermostability of reconstructed ancestors. The latter has been linked to high environmental temps. in the Precambrian era, the era relating to most reconstructed proteins. We found that inferred ancestors of the serum paraoxonase (PON) enzyme family, including the mammalian ancestor, exhibit dramatically increased thermostabilities compared with the extant, human enzyme (up to 30 °C higher melting temp.). However, the environmental temp. at the time of emergence of mammals is presumed to be similar to the present one. Addnl., the mammalian PON ancestor has superior folding properties (kinetic stability): unlike the extant mammalian PONs, it expresses in E. coli in a sol. and functional form, and at a high yield. We discuss two potential origins of this unexpectedly high stability. First, ancestral stability may be overestimated by a "consensus effect," whereby replacing amino acids that are rare in contemporary sequences with the amino acid most common in the family increases protein stability. Comparison to other reconstructed ancestors indicates that the consensus effect may bias some but not all reconstructions. Second, we note that high stability may relate to factors other than high environmental temp. such as oxidative stress or high radiation levels. Foremost, intrinsic factors such as high rates of genetic mutations and/or of transcriptional and translational errors, and less efficient protein quality control systems, may underlie the high kinetic and thermodn. stability of past proteins.
-
141Wheeler, L. C.; Lim, S. A.; Marqusee, S.; Harms, M. J. The Thermostability and Specificity of Ancient Proteins. Curr. Opin. Struct. Biol. 2016, 38, 37– 43, DOI: 10.1016/j.sbi.2016.05.015 https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC28XptVCkt7w%253D&md5=0c838f70ed03739135bd21b2da42976a The thermostability and specificity of ancient proteins. Wheeler, Lucas C.; Lim, Shion A.; Marqusee, Susan; Harms, Michael J. Current Opinion in Structural Biology (2016), 38, 37-43. CODEN: COSBEF; ISSN:0959-440X. (Elsevier Ltd.) A review. Were ancient proteins systematically different from modern proteins? The answer to this question is profoundly important, shaping how we understand the origins of protein biochem., biophys., and functional properties. Ancestral sequence reconstruction (ASR), a phylogenetic approach to infer the sequences of ancestral proteins, may reveal such trends. We discuss two proposed trends: a transition from higher to lower thermostability and a tendency for proteins to acquire higher specificity over time. We review the evidence for elevated ancestral thermostability and discuss its possible origins in a changing environmental temp. and/or reconstruction bias. We also conclude that there is, as yet, insufficient data to support a trend from promiscuity to specificity. Finally, we propose future work to understand these proposed evolutionary trends.
-
142Yang, Z. PAML: A Program Package for Phylogenetic Analysis by Maximum Likelihood. Bioinformatics 1997, 13, 555– 556, DOI: 10.1093/bioinformatics/13.5.555
-
143Stamatakis, A. RAxML-VI-HPC: Maximum Likelihood-Based Phylogenetic Analyses with Thousands of Taxa and Mixed Models. Bioinformatics 2006, 22, 2688– 2690, DOI: 10.1093/bioinformatics/btl446143https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD28XhtFKlsbfI&md5=7ace2669734254992f338db53aa64702RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed modelsStamatakis, AlexandrosBioinformatics (2006), 22 (21), 2688-2690CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)RAxML-VI-HPC (randomized accelerated max. likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with max. likelihood (ML). Low-level tech. optimizations, a modification of the search algorithm, and the use of the GTR + CAT approxn. as replacement for GTR + Γ yield a program that is between 2.7 and 52 times faster than the previous version of RAxML. A large-scale performance comparison with GARLI, PHYML, IQPNNI and MrBayes on real data contg. 1000 up to 6722 taxa shows that RAxML requires at least 5.6 times less main memory and yields better trees in similar times than the best competing program (GARLI) on datasets up to 2500 taxa. On datasets ≥4000 taxa it also runs 2-3 times faster than GARLI. RAxML has been parallelized with MPI to conduct parallel multiple bootstraps and inferences on distinct starting trees. The program has been used to compute ML trees on two of the largest alignments to date contg. 25 057 (1463 bp) and 2182 (51 089 bp) taxa, resp.
-
144Huelsenbeck, J. P.; Ronquist, F.; Nielsen, R.; Bollback, J. P. Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology. Science 2001, 294, 2310– 2314, DOI: 10.1126/science.1065889144https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD3MXptFGkt7k%253D&md5=e7a0aada901ae4a53ce15b47e043b436Evolution: Bayesian inference of phylogeny and its impact on evolutionary biologyHuelsenbeck, John P.; Ronquist, Fredrik; Nielsen, Rasmus; Bollback, Jonathan P.Science (Washington, DC, United States) (2001), 294 (5550), 2310-2314CODEN: SCIEAS; ISSN:0036-8075. (American Association for the Advancement of Science)A review. As a discipline, phylogenetics is becoming transformed by a flood of mol. data. These data allow broad questions to be asked about the history of life, but also present difficult statistical and computational problems. Bayesian inference of phylogeny brings a new perspective to a no. of outstanding issues in evolutionary biol., including the anal. of large phylogenetic trees and complex evolutionary models and the detection of the footprint of natural selection in DNA sequences.
-
145Goldstein, R. A.; Pollard, S. T.; Shah, S. D.; Pollock, D. D. Nonadaptive Amino Acid Convergence Rates Decrease over Time. Mol. Biol. Evol. 2015, 32, 1373– 1381, DOI: 10.1093/molbev/msv041145https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC28Xhs12qurfM&md5=1adafac45f4d245090310e484216fed6Nonadaptive amino acid convergence rates decrease over timeGoldstein, R. A.; Pollard, S. T.; Shah, S. D.; Pollock, D. D.Molecular Biology and Evolution (2015), 32 (6), 1373-1381CODEN: MBEVEO; ISSN:0737-4038. (Oxford University Press)Convergence is a central concept in evolutionary studies because it provides strong evidence for adaptation. It also provides information about the nature of the fitness landscape and the repeatability of evolution, and can mislead phylogenetic inference. To understand the role of adaptive convergence, we need to understand the patterns of nonadaptive convergence. Here, we consider the relationship between nonadaptive convergence and divergence in mitochondrial and model proteins. Surprisingly, nonadaptive convergence is much more common than expected in closely related organisms, falling off as organisms diverge. The extent of the convergent drop-off in mitochondrial proteins is well predicted by epistatic or coevolutionary effects in our "evolutionary Stokes shift" models and poorly predicted by conventional evolutionary models. Convergence probabilities decrease dramatically if the ancestral amino acids of branches being compared have diverged, but also drop slowly over evolutionary time even if the ancestral amino acids have not substituted. Convergence probabilities drop-off rapidly for quickly evolving sites, but much more slowly for slowly evolving sites. Furthermore, once sites have diverged their convergence probabilities are extremely low and indistinguishable from convergence levels at randomized sites. 
These results indicate that we cannot assume that excessive convergence early on is necessarily adaptive. This new understanding should help us to better discriminate adaptive from nonadaptive convergence and develop more relevant evolutionary models with improved validity for phylogenetic inference.
-
146Williams, P. D.; Pollock, D. D.; Blackburne, B. P.; Goldstein, R. A. Assessing the Accuracy of Ancestral Protein Reconstruction Methods. PLoS Comput. Biol. 2006, 2, e69, DOI: 10.1371/journal.pcbi.0020069
-
147Eick, G. N.; Bridgham, J. T.; Anderson, D. P.; Harms, M. J.; Thornton, J. W. Robustness of Reconstructed Ancestral Protein Functions to Statistical Uncertainty. Mol. Biol. Evol. 2016, 34, 247– 261, DOI: 10.1093/molbev/msw223
-
148Gaucher, E. A.; Govindarajan, S.; Ganesh, O. K. Palaeotemperature Trend for Precambrian Life Inferred from Resurrected Proteins. Nature 2008, 451, 704– 707, DOI: 10.1038/nature06510 https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD1cXhs1Kns7c%253D&md5=12ecd01c6a3fb6f85528bd2424518e85 Palaeotemperature trend for Precambrian life inferred from resurrected proteins. Gaucher, Eric A.; Govindarajan, Sridhar; Ganesh, Omjoy K. Nature (London, United Kingdom) (2008), 451 (7179), 704-707. CODEN: NATUAS; ISSN:0028-0836. (Nature Publishing Group) Biosignatures and structures in the geol. record indicate that microbial life has inhabited Earth for ∼3.5 × 10⁹ yr. Research in the phys. sciences has been able to generate statements about the ancient environment that hosted this life. These include the chem. compns. and temps. of the early ocean and atm. Only recently have the natural sciences been able to provide exptl. results describing the environments of ancient life. The authors' previous work with resurrected proteins indicated that ancient life lived in a hot environment. Here, the authors expand the timescale of resurrected proteins to provide a palaeotemp. trend of the environments that hosted life 3.5-0.5 × 10⁹ yr ago. The thermostability of >25 phylogenetically dispersed ancestral elongation factors suggests that the environment supporting ancient life cooled progressively by 30 °C during that period. Here, the authors show that their results are robust to potential statistical bias assocd. with the posterior distribution of inferred character states, phylogenetic ambiguity, and uncertainties in the amino acid equil. frequencies used by evolutionary models. The results are further supported by a nearly identical cooling trend for the ancient ocean as inferred from the deposition of O isotopes. The convergence of results from natural and phys. sciences suggests that ancient life has continually adapted to changes in environmental temps. throughout its evolutionary history.
-
149Akanuma, S. Characterization of Reconstructed Ancestral Proteins Suggests a Change in Temperature of the Ancient Biosphere. Life (Basel, Switz.) 2017, 7, 33, DOI: 10.3390/life7030033149https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXjvFKgur4%253D&md5=b10716c8df6fad176f11f86ba8344fc6Characterization of reconstructed ancestral proteins suggests a change in temperature of the ancient biosphereAkanuma, SatoshiLife (Basel, Switzerland) (2017), 7 (3), 33/1-33/14CODEN: LBSIB7; ISSN:2075-1729. (MDPI AG)Understanding the evolution of ancestral life, and esp. the ability of some organisms to flourish in the variable environments experienced in Earth's early biosphere, requires knowledge of the characteristics and the environment of these ancestral organisms. Information about early life and environmental conditions has been obtained from fossil records and geol. surveys. Recent advances in phylogenetic anal., and an increasing no. of protein sequences available in public databases, have made it possible to infer ancestral protein sequences possessed by ancient organisms. However, the in silico studies that assess the ancestral base content of rRNAs, the frequency of each amino acid in ancestral proteins, and est. the environmental temps. of ancient organisms, show conflicting results. The characterization of ancestral proteins reconstructed in vitro suggests that ancient organisms had very thermally stable proteins, and therefore were thermophilic or hyperthermophilic. Exptl. data supports the idea that only thermophilic ancestors survived the catastrophic increase in temp. of the biosphere that was likely assocd. with meteorite impacts during the early history of Earth. In addn., by expanding the timescale and including more ancestral proteins for reconstruction, it appears as though the Earth's surface temp. gradually decreased over time, from Archean to present.
-
150Gumulya, Y.; Baek, J.-M.; Wun, S.-J.; Thomson, R. E. S.; Harris, K. L.; Hunter, D. J. B.; Behrendorff, J. B. Y. H.; Kulig, J.; Zheng, S.; Wu, X.; Wu, B.; Stok, J. E.; De Voss, J. J.; Schenk, G.; Jurva, U.; Andersson, S.; Isin, E. M.; Bodén, M.; Guddat, L.; Gillam, E. M. J. Engineering Highly Functional Thermostable Proteins Using Ancestral Sequence Reconstruction. Nat. Catal. 2018, 1, 878, DOI: 10.1038/s41929-018-0159-5 https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BC1MXhtFGisL3E&md5=85eca5d2a0cb9a4d6dc5e8b6e790b718 Engineering highly functional thermostable proteins using ancestral sequence reconstruction. Gumulya, Yosephin; Baek, Jong-Min; Wun, Shun-Jie; Thomson, Raine E. S.; Harris, Kurt L.; Hunter, Dominic J. B.; Behrendorff, James B. Y. H.; Kulig, Justyna; Zheng, Shan; Wu, Xueming; Wu, Bin; Stok, Jeanette E.; De Voss, James J.; Schenk, Gerhard; Jurva, Ulrik; Andersson, Shalini; Isin, Emre M.; Boden, Mikael; Guddat, Luke; Gillam, Elizabeth M. J. Nature Catalysis (2018), 1 (11), 878-888. CODEN: NCAACP; ISSN:2520-1158. (Nature Research) Com. biocatalysis requires robust enzymes that can withstand elevated temps. and long incubations. Ancestral reconstruction has shown that pre-Cambrian enzymes were often much more thermostable than extant forms. Here, we resurrect ancestral enzymes that withstand ∼30 °C higher temps. and ≥100 times longer incubations than their extant forms. This is demonstrated on animal cytochromes P450 that stereo- and regioselectively functionalize unactivated C-H bonds for the synthesis of valuable chems., and bacterial ketol-acid reductoisomerases that are used to make butanol-based biofuels. The vertebrate CYP3 P450 ancestor showed a ⁶⁰T₅₀ of 66 °C and enhanced solvent tolerance compared with the human drug-metabolizing CYP3A4, yet comparable activity towards a similarly broad range of substrates. The ancestral ketol-acid reductoisomerase showed an eight-fold higher specific activity than the cognate Escherichia coli form at 25 °C, which increased 3.5-fold at 50 °C. Thus, thermostable proteins can be devised using sequence data alone from even recent ancestors.
-
151Dehouck, Y.; Grosfils, A.; Folch, B.; Gilis, D.; Bogaerts, P.; Rooman, M. Fast and Accurate Predictions of Protein Stability Changes upon Mutations Using Statistical Potentials and Neural Networks: PoPMuSiC-2.0. Bioinformatics 2009, 25, 2537– 2543, DOI: 10.1093/bioinformatics/btp445151https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD1MXhtFyhtbbF&md5=59f9acfcafd822a7f3a27eb3cf3538cdFast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0Dehouck, Yves; Grosfils, Aline; Folch, Benjamin; Gilis, Dimitri; Bogaerts, Philippe; Rooman, MarianneBioinformatics (2009), 25 (19), 2537-2543CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)The rational design of proteins with modified properties, through amino acid substitutions, is of crucial importance in a large variety of applications. Given the huge no. of possible substitutions, every protein engineering project would benefit strongly from the guidance of in silico methods able to predict rapidly, and with reasonable accuracy, the stability changes resulting from all possible mutations in a protein. The authors exploit newly developed statistical potentials, based on a formalism that highlights the coupling between 4 protein sequence and structure descriptors, and take into account the amino acid vol. variation upon mutation. The stability change is expressed as a linear combination of these energy functions, whose proportionality coeffs. vary with the solvent accessibility of the mutated residue and are identified with the help of a neural network. A correlation coeff. of R = 0.63 and a root mean square error of σc = 1.15 kcal/mol between measured and predicted stability changes are obtained upon cross-validation. These scores reach R = 0.79, and σc = 0.86 kcal/mol after exclusion of 10% outliers. 
The predictive power of the authors' method is shown to be significantly higher than that of other programs described in the literature.
-
152Khatun, J.; Khare, S. D.; Dokholyan, N. V. Can Contact Potentials Reliably Predict Stability of Proteins?. J. Mol. Biol. 2004, 336, 1223– 1238, DOI: 10.1016/j.jmb.2004.01.002152https://chemport.cas.org/services/resolver?origin=ACS&resolution=options&coi=1%3ACAS%3A528%3ADC%252BD2cXht1Kiu7s%253D&md5=1c0ddc0d286bbd29f6dbde6e0572af8fCan Contact Potentials Reliably Predict Stability of Proteins?Khatun, Jainab; Khare, Sagar D.; Dokholyan, Nikolay V.Journal of Molecular Biology (2004), 336 (5), 1223-1238CODEN: JMOBAK; ISSN:0022-2836. (Elsevier)The simplest approxn. of interaction potential between amino acid residues in proteins is the contact potential, which defines the effective free energy of a protein conformation by a set of amino acid contacts formed in this conformation. Finding a contact potential capable of predicting free energies of protein states across a variety of protein families will aid protein folding and engineering in silico on a computationally tractable time-scale. We test the ability of contact potentials to accurately and transferably (across various protein families) predict stability changes of proteins upon mutations. We develop a new methodol. to det. the contact potentials in proteins from exptl. measurements of changes in protein's thermodn. stabilities (ΔΔG) upon mutations. We apply our methodol. to derive sets of contact interaction parameters for a hierarchy of interaction models including solvation and multi-body contact parameters. We test how well our models reproduce exptl. measurements by statistical tests. We evaluate the max. accuracy of predictions obtained by using contact potentials and the correlation between parameters derived from different data-sets of exptl. (ΔΔG) values. We argue that it is impossible to reach exptl. accuracy and derive fully transferable contact parameters using the contact models of potentials. 
However, contact parameters may yield reliable predictions of ΔΔG for datasets of mutations confined to the same amino acid positions in the sequence of a single protein.
-
153 Pucci, F.; Bernaerts, K. V.; Kwasigroch, J. M.; Rooman, M. Quantification of Biases in Predictions of Protein Stability Changes upon Mutations. Bioinformatics 2018, 34, 3659–3665, DOI: 10.1093/bioinformatics/bty348. Motivation: Bioinformatics tools that predict protein stability changes upon point mutations have made a lot of progress in the last decades and have become accurate and fast enough to make computational mutagenesis expts. feasible, even on a proteome scale. One of these problems is their bias toward the learning datasets which, being dominated by destabilizing mutations, causes predictions to be better for destabilizing than for stabilizing mutations. Results: We thoroughly analyzed the biases in the prediction of folding free energy changes upon point mutations (ΔΔG°) and proposed some unbiased solns. We started by constructing a dataset Ssym of exptl. measured ΔΔG°s with an equal no. of stabilizing and destabilizing mutations, by collecting mutations for which the structure of both the wild-type and mutant protein is available. On this balanced dataset, we assessed the performances of 15 widely used ΔΔG° predictors. After the astonishing observation that almost all these methods are strongly biased toward destabilizing mutations, esp. those that use black-box machine learning, we proposed an elegant way to solve the bias issue by imposing phys. symmetries under inverse mutations on the model structure, which we implemented in PoPMuSiCsym. This new predictor constitutes an efficient trade-off between accuracy and absence of biases.
Some final considerations and suggestions for further improvement of the predictors are discussed.
-
154 Yin, S.; Ding, F.; Dokholyan, N. V. Eris: An Automated Estimator of Protein Stability. Nat. Methods 2007, 4, 466–467, DOI: 10.1038/nmeth0607-466
-
155 Benedix, A.; Becker, C. M.; de Groot, B. L.; Caflisch, A.; Böckmann, R. A. Predicting Free Energy Changes Using Structural Ensembles. Nat. Methods 2009, 6, 3–4, DOI: 10.1038/nmeth0109-3. Reliable and fast computation of protein free energy is crucial for protein-structure anal., structure-based protein design and protein docking. Rigorous treatments based on phys. effective energy functions involve computationally expensive methods such as free energy perturbation, which are time-consuming and are thus incompatible with the need to perform extensive scans. Commonly used fast methods, in turn, involve empirically derived scoring functions and usually do not include protein flexibility or are based on statistical potentials and are therefore highly dependent on the availability of case-dependent exptl. training data. Hence, such methods are inherently limited in accuracy and applicability. Here we propose a computational, structure-based method named Concoord/Poisson-Boltzmann surface area (CC/PBSA) for both fast and quant. estn. of the folding free energy of mutants, that is for measuring their conformational stability and for predicting the effect of mutations on protein-protein binding affinity. The first step is to rapidly generate alternative protein conformations via the program Concoord, which efficiently samples the available configurational space.
The crystal or NMR input structure is translated into a geometric description of the complex, and starting from random coordinates, 300-600 structures both of the mutant and the wild type are generated by iteratively correcting the coordinates until all geometric constraints are fulfilled. Then an energy function based on phys. chem. (force field) and an efficient continuum solvent approach, the soln. of the Poisson-Boltzmann equation and a term for nonpolar solvation, is averaged over the generated structural ensembles.
-
156 Pronk, S.; Páll, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M. R.; Smith, J. C.; Kasson, P. M.; van der Spoel, D.; Hess, B.; Lindahl, E. GROMACS 4.5: A High-Throughput and Highly Parallel Open Source Molecular Simulation Toolkit. Bioinformatics 2013, 29, 845–854, DOI: 10.1093/bioinformatics/btt055. Motivation: Mol. simulation has historically been a low-throughput technique, but faster computers and increasing amts. of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomols. with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomol. interaction and function in a manner directly testable by expt. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomols., such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these mols. built-in.
GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations.
-
157 de Groot, B. L.; van Aalten, D. M.; Scheek, R. M.; Amadei, A.; Vriend, G.; Berendsen, H. J. C. Prediction of Protein Conformational Freedom from Distance Constraints. Proteins: Struct., Funct., Genet. 1997, 29, 240–251, DOI: 10.1002/(SICI)1097-0134(199710)29:2<240::AID-PROT11>3.0.CO;2-O. A method is presented that generates random protein structures that fulfil a set of upper and lower interat. distance limits. These limits depend on distances measured in exptl. structures and the strength of the interat. interaction. Structural differences between generated structures are similar to those obtained from expt. and from MD simulation. Although detailed aspects of dynamical mechanisms are not covered and the extent of variations are only estd. in a relative sense, applications to an IgG-binding domain, an SH3 binding domain, HPr, calmodulin, and lysozyme are presented which illustrate the use of the method as a fast and simple way to predict structural variability in proteins. The method may be used to support the design of mutants, when structural fluctuations for a large no. of mutants are to be screened. The results suggest that motional freedom in proteins is ruled largely by a set of simple geometric constraints.
-
158 Hoppe, C.; Schomburg, D. Prediction of Protein Thermostability with a Direction- and Distance-Dependent Knowledge-Based Potential. Protein Sci. 2005, 14, 2682–2692, DOI: 10.1110/ps.04940705. The increasing use of enzymes in industrial processes and the importance of understanding protein folding and stability have led to several attempts to predict and quantify the effect of every possible amino acid exchange (mutation) on the thermostability of proteins. In this article the authors describe a knowledge-based discrimination function that acts as a fast and reliable guide in protein engineering and optimization. The function used consists of two parts, a pairwise energy function based on a distance- and direction-dependent at. description of the amino acid environment, and a torsion angle energy function. In a first step a training set of 11 proteins including 646 mutant proteins with exptl. detd. thermostability was used to optimize the knowledge-based energy functions. The resulting potential function was then tested using a test mutant database consisting of 918 various point mutations introduced in 27 proteins. The best correlation coeff. obtained for the exptl. data and the predicted thermostability for the training set is r = 0.81 (561 data points). A total of 76% of the mutations could be predicted correctly as being either stabilizing or destabilizing. The results for the test set are r = 0.74 (747 data points) and 72%, resp. The global correlation over the combined data (1308 mutants) obtained is 0.78.
-
159 Pucci, F.; Bourgeas, R.; Rooman, M. Predicting Protein Thermal Stability Changes upon Point Mutations Using Statistical Potentials: Introducing HoTMuSiC. Sci. Rep. 2016, 6, 23257, DOI: 10.1038/srep23257. The accurate prediction of the impact of an amino acid substitution on the thermal stability of a protein is a central issue in protein science, and is of key relevance for the rational optimization of various bioprocesses that use enzymes in unusual conditions. Here we present one of the first computational tools to predict the change in melting temp. ΔTm upon point mutations, given the protein structure and, when available, the melting temp. Tm of the wild-type protein. The key ingredients of our model structure are std. and temp.-dependent statistical potentials, which are combined with the help of an artificial neural network. The model structure was chosen on the basis of a detailed thermodn. anal. of the system. The parameters of the model were identified on a set of more than 1,600 mutations with exptl. measured ΔTm. The performance of our method was tested using a strict 5-fold cross-validation procedure, and was found to be significantly superior to that of competing methods. We obtained a root mean square deviation between predicted and exptl. ΔTm values of 4.2 °C that reduces to 2.9 °C when ten percent outliers are removed. A webserver-based tool is freely available for non-com. use at soft.dezyme.com.
-
160 Capriotti, E.; Fariselli, P.; Casadio, R. I-Mutant2.0: Predicting Stability Changes upon Mutation from the Protein Sequence or Structure. Nucleic Acids Res. 2005, 33, W306–W310, DOI: 10.1093/nar/gki375. I-Mutant2.0 is a support vector machine (SVM)-based tool for the automatic prediction of protein stability changes upon single point mutations. I-Mutant2.0 predictions are performed starting either from the protein structure or, more importantly, from the protein sequence. This latter task, to the best of our knowledge, is exploited for the first time. The method was trained and tested on a data set derived from ProTherm, which is presently the most comprehensive available database of thermodn. exptl. data of free energy changes of protein stability upon mutation under different conditions. I-Mutant2.0 can be used both as a classifier for predicting the sign of the protein stability change upon mutation and as a regression estimator for predicting the related ΔΔG values. Acting as a classifier, I-Mutant2.0 correctly predicts (with a cross-validation procedure) 80% or 77% of the data set, depending on the usage of structural or sequence information, resp. When predicting ΔΔG values assocd. with mutations, the correlation of predicted with expected/exptl. values is 0.71 (with a std. error of 1.30 kcal/mol) and 0.62 (with a std. error of 1.45 kcal/mol) when structural or sequence information are resp. adopted. Our web interface allows the selection of a predictive mode that depends on the availability of the protein structure and/or sequence.
In this latter case, the web server requires only pasting of a protein sequence in a raw format. We therefore introduce I-Mutant2.0 as a unique and valuable helper for protein design, even when the protein structure is not yet known with at. resoln. Availability: http://gpcr.biocomp.unibo.it/cgi/predictors/I-Mutant2.0/I-Mutant2.0.cgi.
-
161 Cheng, J.; Randall, A.; Baldi, P. Prediction of Protein Stability Changes for Single-Site Mutations Using Support Vector Machines. Proteins: Struct., Funct., Bioinf. 2006, 62, 1125–1132, DOI: 10.1002/prot.20810. Accurate prediction of protein stability changes resulting from single amino acid mutations is important for understanding protein structures and designing new proteins. The authors use support vector machines to predict protein stability changes for single amino acid mutations leveraging both sequence and structural information. The authors evaluate their approach using cross-validation methods on a large dataset of single amino acid mutations. When only the sign of the stability changes is considered, the predictive method achieves 84% accuracy, a significant improvement over previously published results. Moreover, the exptl. results show that the prediction accuracy obtained using sequence alone is close to the accuracy obtained using tertiary structure information. Because the authors' method can accurately predict protein stability changes using primary sequence information only, it is applicable to many situations where the tertiary structure is unknown, overcoming a major limitation of previous methods which require tertiary information. The web server for predictions of protein stability changes upon mutations (MU-pro), software, and datasets are available at http://www.igb.uci.edu/servers/servers.html.
-
162 Wainreb, G.; Wolf, L.; Ashkenazy, H.; Dehouck, Y.; Ben-Tal, N. Protein Stability: A Single Recorded Mutation Aids in Predicting the Effects of Other Mutations in the Same Amino Acid Site. Bioinformatics 2011, 27, 3286–3292, DOI: 10.1093/bioinformatics/btr576. Motivation: Accurate prediction of protein stability is important for understanding the mol. underpinnings of diseases and for the design of new proteins. We introduce a novel approach for the prediction of changes in protein stability that arise from a single-site amino acid substitution; the approach uses available data on mutations occurring in the same position and in other positions. Our algorithm, named Pro-Maya (Protein Mutant stAbilitY Analyzer), combines a collaborative filtering baseline model, Random Forests regression and a diverse set of features. Pro-Maya predicts the stability free energy difference of mutant vs. wild type, denoted as ΔΔG. Results: We evaluated our algorithm extensively using cross-validation on two previously utilized datasets of single amino acid mutations and a (third) validation set. The results indicate that using known ΔΔG values of mutations at the query position improves the accuracy of ΔΔG predictions for other mutations in that position. The accuracy of our predictions in such cases significantly surpasses that of similar methods, achieving, e.g. a Pearson's correlation coeff. of 0.79 and a root mean square error of 0.96 on the validation set.
Because Pro-Maya uses a diverse set of features, including predictions using two other methods, it also performs slightly better than other methods in the absence of addnl. exptl. data on the query positions. Availability: Pro-Maya is freely available via web server at http://bental.tau.ac.il/ProMaya. Contact: nirb@tauex.tau.ac.il; wolf@cs.tau.ac.il.
-
163 Li, Y.; Fang, J. PROTS-RF: A Robust Model for Predicting Mutation-Induced Protein Stability Changes. PLoS One 2012, 7, e47247, DOI: 10.1371/journal.pone.0047247. The ability to improve protein thermostability via protein engineering is of great scientific interest and also has significant practical value. In this report we present PROTS-RF, a robust model based on the Random Forest algorithm capable of predicting thermostability changes induced by not only single-, but also double- or multiple-point mutations. The model is built using 41 features including evolutionary information, secondary structure, solvent accessibility and a set of fragment-based features. It achieves accuracies of 0.799, 0.782, 0.787 and areas under receiver operating characteristic (ROC) curves of 0.873, 0.868 and 0.862 for single-, double- and multiple- point mutation datasets, resp. Contrary to previous suggestions, our results clearly demonstrate that a robust predictive model trained for predicting single point mutation induced thermostability changes can be capable of predicting double and multiple point mutations. It also shows high levels of robustness in the tests using hypothetical reverse mutations. We demonstrate that testing datasets created based on phys. principles can be highly useful for testing the robustness of predictive models.
-
164 Quang, D.; Chen, Y.; Xie, X. DANN: A Deep Learning Approach for Annotating the Pathogenicity of Genetic Variants. Bioinformatics 2015, 31, 761–763, DOI: 10.1093/bioinformatics/btu703. Annotating genetic variants, esp. non-coding variants, for the purpose of identifying pathogenic variants remains a challenge. Combined annotation-dependent depletion (CADD) is an algorithm designed to annotate both coding and non-coding variants, and has been shown to outperform other annotation algorithms. CADD trains a linear kernel support vector machine (SVM) to differentiate evolutionarily derived, likely benign, alleles from simulated, likely deleterious, variants. However, SVMs cannot capture non-linear relationships among the features, which can limit performance. To address this issue, we have developed DANN. DANN uses the same feature set and training data as CADD to train a deep neural network (DNN). DNNs can capture non-linear relationships among features and are better suited than SVMs for problems with a large no. of samples and features. We exploit Compute Unified Device Architecture-compatible graphics processing units and deep learning techniques such as dropout and momentum training to accelerate the DNN training. DANN achieves about a 19% relative redn. in the error rate and about a 14% relative increase in the area under the curve (AUC) metric over CADD's SVM methodol.
-
165 Wang, Y.; Mao, H.; Yi, Z. Protein Secondary Structure Prediction by Using Deep Learning Method. Knowl.-Based Syst. 2017, 118, 115–123, DOI: 10.1016/j.knosys.2016.11.015
-
166 Ivakhnenko, A. G. Polynomial Theory of Complex Systems. IEEE Trans. Syst., Man, Cybern. 1971, SMC-1, 364–378, DOI: 10.1109/TSMC.1971.4308320
-
167 Bengio, Y.; Boulanger-Lewandowski, N.; Pascanu, R. Advances in Optimizing Recurrent Networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing; IEEE: New York, 2013; pp 8624–8628.
-
168 Cang, Z.; Wei, G.-W. TopologyNet: Topology Based Deep Convolutional and Multi-Task Neural Networks for Biomolecular Property Predictions. PLoS Comput. Biol. 2017, 13, e1005690, DOI: 10.1371/journal.pcbi.1005690. Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomol. structural data sets have been hindered by the geometric and biol. complexity. To address this problem we introduce the element-specific persistent homol. (ESPH) method. ESPH represents 3D complex geometry by one-dimensional (1D) topol. invariants and retains important biol. information via a multichannel image-like representation. This representation reveals hidden structure-function relationships in biomols. We further integrate ESPH and deep convolutional neural networks to construct a multichannel topol. neural network (TopologyNet) for the predictions of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the deep learning limitations from small and noisy training sets, we propose a multi-task multichannel topol. convolutional neural network (MM-TCNN). We demonstrate that TopologyNet outperforms the latest methods in the prediction of protein-ligand binding affinities, mutation induced globular protein folding free energy changes, and mutation induced membrane protein folding free energy changes.
-
169 Laimer, J.; Hofer, H.; Fritz, M.; Wegenkittl, S.; Lackner, P. MAESTRO - Multi Agent Stability Prediction upon Point Mutations. BMC Bioinf. 2015, 16, 116, DOI: 10.1186/s12859-015-0548-6. BACKGROUND: Point mutations can have a strong impact on protein stability. A change in stability may subsequently lead to dysfunction and finally cause diseases. Moreover, protein engineering approaches aim to deliberately modify protein properties, where stability is a major constraint. In order to support basic research and protein design tasks, several computational tools for predicting the change in stability upon mutations have been developed. Comparative studies have shown the usefulness but also limitations of such programs. RESULTS: We aim to contribute a novel method for predicting changes in stability upon point mutation in proteins called MAESTRO. MAESTRO is structure based and distinguishes itself from similar approaches in the following points: (i) MAESTRO implements a multi-agent machine learning system. (ii) It also provides predicted free energy change (ΔΔG) values and a corresponding prediction confidence estimation. (iii) It provides high throughput scanning for multi-point mutations where sites and types of mutation can be comprehensively controlled. (iv) Finally, the software provides a specific mode for the prediction of stabilizing disulfide bonds. The predictive power of MAESTRO for single point mutations and stabilizing disulfide bonds is comparable to similar methods. CONCLUSIONS: MAESTRO is a versatile tool in the field of stability change prediction upon point mutations.
Executables for the Linux and Windows operating systems are freely available to non-commercial users from http://biwww.che.sbg.ac.at/MAESTRO.
-
170 Khan, S.; Vihinen, M. Performance of Protein Stability Predictors. Hum. Mutat. 2010, 31, 675–684, DOI: 10.1002/humu.21242. Stability is a fundamental property affecting function, activity, and regulation of biomols. Stability changes are often found for mutated proteins involved in diseases. Stability predictors computationally predict protein-stability changes caused by mutations. We performed a systematic anal. of 11 online stability predictors' performances. These predictors are CUPSAT, Dmutant, FoldX, I-Mutant2.0, two versions of I-Mutant3.0 (sequence and structure versions), MultiMutate, MUpro, SCide, Scpred, and SRide. As input, 1,784 single mutations found in 80 proteins were used, and these mutations did not include those used for training. The programs' performances were also assessed according to where the mutations were found in the proteins, i.e., in secondary structures and on the surface or in the core of a protein, and according to protein structure type. The extents to which the mutations altered the occupied vols. at the residue sites and the charge interactions were also characterized. The predictions of all programs were in line with the exptl. data. I-Mutant3.0 (utilizing structural information), Dmutant, and FoldX were the most reliable predictors. The stability-center predictors performed with similar accuracy. However, at best, the predictions were only moderately accurate (∼60%) and significantly better tools would be needed for routine anal. of mutation effects.
-
171 Usmanova, D. R.; Bogatyreva, N. S.; Ariño Bernad, J.; Eremina, A. A.; Gorshkova, A. A.; Kanevskiy, G. M.; Lonishin, L. R.; Meister, A. V.; Yakupova, A. G.; Kondrashov, F. A.; Ivankov, D. N. Self-Consistency Test Reveals Systematic Bias in Programs for Prediction Change of Stability upon Mutation. Bioinformatics 2018, 34, 3653–3658, DOI: 10.1093/bioinformatics/bty340. Motivation: Computational prediction of the effect of mutations on protein stability is used by researchers in many fields. The utility of the prediction methods is affected by their accuracy and bias. Bias, a systematic shift of the predicted change of stability, has been noted as an issue for several methods, but has not been investigated systematically. Presence of the bias may lead to misleading results esp. when exploring the effects of combination of different mutations. Results: Here we use a protocol to measure the bias as a function of the no. of introduced mutations. It is based on a self-consistency test of the reciprocity the effect of a mutation. An advantage of the used approach is that it relies solely on crystal structures without exptl. measured stability values. We applied the protocol to four popular algorithms predicting change of protein stability upon mutation, FoldX, Eris, Rosetta and I-Mutant, and found an inherent bias.
For one program, FoldX, we manage to substantially reduce the bias using addnl. relaxation by Modeller. Authors using algorithms for predicting effects of mutations should be aware of the bias described here.
-
172 Montanucci, L.; Martelli, P. L.; Ben-Tal, N.; Fariselli, P. A Natural Upper Bound to the Accuracy of Predicting Protein Stability Changes upon Mutations. 2018, arXiv:1809.10389 [q-bio.BM]. arXiv.org e-Print archive. https://arxiv.org/abs/1809.10389.
-
173 Rice, P.; Longden, I.; Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 2000, 16, 276–277, DOI: 10.1016/S0168-9525(00)02024-2
-
174. Lu, G.; Moriyama, E. N. Vector NTI, a Balanced All-in-One Sequence Analysis Suite. Briefings Bioinf. 2004, 5, 378–388, DOI: 10.1093/bib/5.4.378
A review. Vector NTI is a well-balanced desktop application integrated for mol. sequence anal. and biol. data management. It has a centralized database and five application modules: Vector NTI, AlignX, BioAnnotator, ContigExpress and GenomBench. The features and functions available in this software are examd. These include database management, primer design, virtual cloning, alignments, sequence assembly, 3D mol. viewer and Internet tools. Some problems encountered when using this software are also discussed. Vector NTI is a tool that can save time and enhance anal. but it requires some learning on the user's part and there are some issues that need to be addressed by the developer.
-
175. Bendl, J.; Stourac, J.; Sebestova, E.; Vavra, O.; Musil, M.; Brezovsky, J.; Damborsky, J. HotSpot Wizard 2.0: Automated Design of Site-Specific Mutations and Smart Libraries in Protein Engineering. Nucleic Acids Res. 2016, 44, W479–W487, DOI: 10.1093/nar/gkw416
HotSpot Wizard 2.0 is a web server for automated identification of hot spots and design of smart libraries for engineering proteins' stability, catalytic activity, substrate specificity and enantioselectivity. The server integrates sequence, structural and evolutionary information obtained from 3 databases and 20 computational tools. Users are guided through the processes of selecting hot spots using four different protein engineering strategies and optimizing the resulting library's size by narrowing down a set of substitutions at individual randomized positions. The only required input is a query protein structure. The results of the calcns. are mapped onto the protein's structure and visualized with a JSmol applet. HotSpot Wizard lists annotated residues suitable for mutagenesis and can automatically design appropriate codons for each implemented strategy. Overall, HotSpot Wizard provides comprehensive annotations of protein structures and assists protein engineers with the rational design of site-specific mutations and focused libraries.
-
176. Stamatakis, A. RAxML Version 8: A Tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies. Bioinformatics 2014, 30, 1312–1313, DOI: 10.1093/bioinformatics/btu033
Motivation: Phylogenies are increasingly used in all fields of medical and biol. research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under max. likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community. Results: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees. In addn., an up-to-date 50-page user manual covering all new RAxML options is available. Availability and implementation: The code is available under GNU GPL at https://github.com/stamatak/standard-RAxML. Supplementary information: Supplementary data are available at Bioinformatics online.
-
177. Ashkenazy, H.; Penn, O.; Doron-Faigenboim, A.; Cohen, O.; Cannarozzi, G.; Zomer, O.; Pupko, T. FastML: A Web Server for Probabilistic Reconstruction of Ancestral Sequences. Nucleic Acids Res. 2012, 40, W580–W584, DOI: 10.1093/nar/gks498
Ancestral sequence reconstruction is essential to a variety of evolutionary studies. Here, we present the FastML web server, a user-friendly tool for the reconstruction of ancestral sequences. FastML implements various novel features that differentiate it from existing tools: (i) FastML uses an indel-coding method, in which each gap, possibly spanning multiple sites, is coded as binary data. FastML then reconstructs ancestral indel states assuming a continuous time Markov process. FastML provides the most likely ancestral sequences, integrating both indels and characters; (ii) FastML accounts for uncertainty in ancestral states: it provides not only the posterior probabilities for each character and indel at each sequence position, but also a sample of ancestral sequences from this posterior distribution, and a list of the k-most likely ancestral sequences; (iii) FastML implements a large array of evolutionary models, which makes it generic and applicable for nucleotide, protein and codon sequences; and (iv) a graphical representation of the results is provided, including, for example, a graphical logo of the inferred ancestral sequences. The utility of FastML is demonstrated by reconstructing ancestral sequences of the Env protein from various HIV-1 subtypes. FastML is freely available for all academic users and is available online at http://fastml.tau.ac.il/.
-
178. Diallo, A. B.; Makarenkov, V.; Blanchette, M. Ancestors 1.0: A Web Server for Ancestral Sequence Reconstruction. Bioinformatics 2010, 26, 130–131, DOI: 10.1093/bioinformatics/btp600
Summary: The computational inference of ancestral genomes consists of five difficult steps: identifying syntenic regions, inferring ancestral arrangement of syntenic regions, aligning multiple sequences, reconstructing the insertion and deletion history and finally inferring substitutions. Each of these steps has received a lot of attention in the past years. However, there currently exists no framework that integrates all of the different steps in an easy workflow. Here, we introduce Ancestors 1.0, a web server allowing one to easily and quickly perform the last three steps of the ancestral genome reconstruction procedure. It implements several alignment algorithms, an indel max. likelihood solver and a context-dependent max. likelihood substitution inference algorithm. The results presented by the server include the posterior probabilities for the last two steps of the ancestral genome reconstruction and the expected error rate of each ancestral base prediction.
-
179. Westesson, O.; Barquist, L.; Holmes, I. HandAlign: Bayesian Multiple Sequence Alignment, Phylogeny and Ancestral Reconstruction. Bioinformatics 2012, 28, 1170–1171, DOI: 10.1093/bioinformatics/bts058
Summary: We describe HandAlign, a software package for Bayesian reconstruction of phylogenetic history. The underlying model of sequence evolution describes indels and substitutions. Alignments, trees and model parameters are all treated as jointly dependent random variables and sampled via Metropolis-Hastings Markov chain Monte Carlo (MCMC), enabling systematic statistical parameter inference and hypothesis testing. HandAlign implements several different MCMC proposal kernels, allows sampling from arbitrary target distributions via Hastings ratios, and uses std. file formats for trees, alignments and models. Availability and Implementation: Installation and usage instructions are at http://biowiki.org/HandAlign. Supplementary information: Supplementary material is available at Bioinformatics online.
-
180. Ronquist, F.; Teslenko, M.; van der Mark, P.; Ayres, D. L.; Darling, A.; Höhna, S.; Larget, B.; Liu, L.; Suchard, M. A.; Huelsenbeck, J. P. MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice across a Large Model Space. Syst. Biol. 2012, 61, 539–542, DOI: 10.1093/sysbio/sys029
Since its introduction in 2001, MrBayes has grown in popularity as a software package for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) methods. With this note, we announce the release of version 3.2, a major upgrade to the latest official release presented in 2003. The new version provides convergence diagnostics and allows multiple analyses to be run in parallel with convergence progress monitored on the fly. The introduction of new proposals and automatic optimization of tuning parameters has improved convergence for many problems. The new version also sports significantly faster likelihood calculations through streaming single-instruction-multiple-data extensions (SSE) and support of the BEAGLE library, allowing likelihood calculations to be delegated to graphics processing units (GPUs) on compatible hardware. Speedup factors range from around 2 with SSE code to more than 50 with BEAGLE for codon problems. Checkpointing across all models allows long runs to be completed even when an analysis is prematurely terminated. New models include relaxed clocks, dating, model averaging across time-reversible substitution models, and support for hard, negative, and partial (backbone) tree constraints. Inference of species trees from gene trees is supported by full incorporation of the Bayesian estimation of species trees (BEST) algorithms. Marginal model likelihoods for Bayes factor tests can be estimated accurately across the entire model space using the stepping stone method. The new version provides more output options than previously, including samples of ancestral states, site rates, site d(N)/d(S) ratios, branch rates, and node dates. A wide range of statistics on tree parameters can also be output for visualization in FigTree and compatible software.
-
181. Finn, R. D.; Clements, J.; Eddy, S. R. HMMER Web Server: Interactive Sequence Similarity Searching. Nucleic Acids Res. 2011, 39, W29–W37, DOI: 10.1093/nar/gkr367
HMMER is a software suite for protein sequence similarity searches using probabilistic methods. Previously, HMMER has mainly been available only as a computationally intensive UNIX command-line tool, restricting its use. Recent advances in the software, HMMER3, have resulted in a 100-fold speed gain relative to previous versions. It is now feasible to make efficient profile hidden Markov model (profile HMM) searches via the web. A HMMER web server (http://hmmer.janelia.org) has been designed and implemented such that most protein database searches return within a few seconds. Methods are available for searching either a single protein sequence, multiple protein sequence alignment or profile HMM against a target sequence database, and for searching a protein sequence against Pfam. The web server is designed to cater to a range of different user expertise and accepts batch uploading of multiple queries at once. All search methods are also available as RESTful web services, thereby allowing them to be readily integrated as remotely executed tasks in locally scripted work-flows. We have focused on minimizing search times and the ability to rapidly display tabular results, regardless of the no. of matches found, developing graphical summaries of the search results to provide quick, intuitive appraisement of them.
-
182. Altschul, S. F.; Gertz, E. M.; Agarwala, R.; Schäffer, A. A.; Yu, Y.-K. PSI-BLAST Pseudocounts and the Minimum Description Length Principle. Nucleic Acids Res. 2009, 37, 815–824, DOI: 10.1093/nar/gkn981
Position specific score matrixes (PSSMs) are derived from multiple sequence alignments to aid in the recognition of distant protein sequence relationships. The PSI-BLAST protein database search program derives the column scores of its PSSMs with the aid of pseudocounts, added to the obsd. amino acid counts in a multiple alignment column. In the absence of theory, the no. of pseudocounts used has been a completely empirical parameter. This article argues that the min. description length principle can motivate the choice of this parameter. Specifically, for realistic alignments, the principle supports the practice of using a no. of pseudocounts essentially independent of alignment size. However, it also implies that more highly conserved columns should use fewer pseudocounts, increasing the inter-column contrast of the implied PSSMs. A new method for calcg. pseudocounts that significantly improves PSI-BLAST's retrieval accuracy is now employed by default.
-
183. Whitehead, T. A.; Chevalier, A.; Song, Y.; Dreyfus, C.; Fleishman, S. J.; De Mattos, C.; Myers, C. A.; Kamisetty, H.; Blair, P.; Wilson, I. A.; Baker, D. Optimization of Affinity, Specificity and Function of Designed Influenza Inhibitors Using Deep Sequencing. Nat. Biotechnol. 2012, 30, 543–548, DOI: 10.1038/nbt.2214
We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followed by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.
-
184. Shimizu, Y.; Inoue, A.; Tomari, Y.; Suzuki, T.; Yokogawa, T.; Nishikawa, K.; Ueda, T. Cell-Free Translation Reconstituted with Purified Components. Nat. Biotechnol. 2001, 19, 751–755, DOI: 10.1038/90802
We have developed a protein-synthesizing system reconstituted from recombinant tagged protein factors purified to homogeneity. The system was able to produce protein at a rate of about 160 μg/mL/h in a batch mode without the need for any supplementary app. The protein products were easily purified within 1 h using affinity chromatog. to remove the tagged protein factors. Moreover, omission of a release factor allowed efficient incorporation of an unnatural amino acid using suppressor tRNA.
-
185. Niwa, T.; Kanamori, T.; Ueda, T.; Taguchi, H. Global Analysis of Chaperone Effects Using a Reconstituted Cell-Free Translation System. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 8937–8942, DOI: 10.1073/pnas.1201380109
Protein folding is often hampered by protein aggregation, which can be prevented by a variety of chaperones in the cell. A dataset that evaluates which chaperones are effective for aggregation-prone proteins would provide an invaluable resource not only for understanding the roles of chaperones, but also for broader applications in protein science and engineering. Therefore, we comprehensively evaluated the effects of the major Escherichia coli chaperones, trigger factor, DnaK/DnaJ/GrpE, and GroEL/GroES, on ∼800 aggregation-prone cytosolic E. coli proteins, using a reconstituted chaperone-free translation system. Statistical analyses revealed the robustness and the intriguing properties of chaperones. The DnaK and GroEL systems drastically increased the solubilities of hundreds of proteins with weak biases, whereas trigger factor had only a marginal effect on soly. The combined addn. of the chaperones was effective for a subset of proteins that were not rescued by any single chaperone system, supporting the synergistic effect of these chaperones. The resource, which is accessible via a public database, can be used to investigate the properties of proteins of interest in terms of their solubilities and chaperone effects.
-
186. Berman, H. M.; Gabanyi, M. J.; Kouranov, A.; Micallef, D. I.; Westbrook, J. Protein Structure Initiative - TargetTrack 2000–2017 - All Data Files. DOI: 10.5281/zenodo.821654
-
187. Price, W. N.; Handelman, S. K.; Everett, J. K.; Tong, S. N.; Bracic, A.; Luff, J. D.; Naumov, V.; Acton, T.; Manor, P.; Xiao, R.; Rost, B.; Montelione, G. T.; Hunt, J. F. Large-Scale Experimental Studies Show Unexpected Amino Acid Effects on Protein Expression and Solubility in Vivo in E. coli. Microb. Inf. Exp. 2011, 1, 6, DOI: 10.1186/2042-5783-1-6
The biochem. and phys. factors controlling protein expression level and soly. in vivo remain incompletely characterized. To gain insight into the primary sequence features influencing these outcomes, we performed statistical analyses of results from the high-throughput protein-prodn. pipeline of the Northeast Structural Genomics Consortium. Proteins expressed in E. coli and consistently purified were scored independently for expression and soly. levels. These parameters nonetheless show a very strong pos. correlation. We used logistic regressions to det. whether they are systematically influenced by fractional amino acid compn. or several bulk sequence parameters including hydrophobicity, sidechain entropy, electrostatic charge, and predicted backbone disorder. Decreasing hydrophobicity correlates with higher expression and soly. levels, but this correlation apparently derives solely from the beneficial effect of three charged amino acids, at least for bacterial proteins. In fact, the three most hydrophobic residues showed very different correlations with soly. level. Leu showed the strongest neg. correlation among amino acids, while Ile showed a slightly pos. correlation in most data segments. Several other amino acids also had unexpected effects. Notably, Arg correlated with decreased expression and, most surprisingly, soly. of bacterial proteins, an effect only partially attributable to rare codons. However, rare codons did significantly reduce expression despite use of a codon-enhanced strain. Addnl. analyses suggest that pos. but not neg. charged amino acids may reduce translation efficiency in E. coli irresp. of codon usage. While some obsd. effects may reflect indirect evolutionary correlations, others may reflect basic physicochem. phenomena. We used these results to construct and validate predictors of expression and soly. levels and overall protein usability, and we propose new strategies to be explored for engineering improved protein expression and soly.
-
188. Hirose, S.; Kawamura, Y.; Yokota, K.; Kuroita, T.; Natsume, T.; Komiya, K.; Tsutsumi, T.; Suwa, Y.; Isogai, T.; Goshima, N.; Noguchi, T. Statistical Analysis of Features Associated with Protein Expression/Solubility in an in Vivo Escherichia coli Expression System and a Wheat Germ Cell-Free Expression System. J. Biochem. 2011, 150, 73–81, DOI: 10.1093/jb/mvr042
Recombinant protein technol. is an important tool in many industrial and pharmacol. applications. Although the success rate of obtaining sol. proteins is relatively low, knowledge of protein expression/soly. under 'standard' conditions may increase the efficiency and reduce the cost of proteomics studies. In this study, we conducted a genome-scale expt. to assess the overexpression and the soly. of human full-length cDNA in an in vivo Escherichia coli expression system and a wheat germ cell-free expression system. We evaluated the influences of sequence and structural features on protein expression/soly. in each system and estd. a minimal set of features assocd. with them. A comparison of the feature sets related to protein expression/soly. in the in vivo Escherichia coli expression system revealed that the structural information was strongly assocd. with protein expression, rather than protein soly. Moreover, a significant difference was found in the no. of features assocd. with protein soly. in the two expression systems.
-
189. Pawlicki, S.; Le Béchec, A.; Delamarche, C. AMYPdb: A Database Dedicated to Amyloid Precursor Proteins. BMC Bioinf. 2008, 9, 273, DOI: 10.1186/1471-2105-9-273
BACKGROUND: Misfolding and aggregation of proteins into ordered fibrillar structures is associated with a number of severe pathologies, including Alzheimer's disease, prion diseases, and type II diabetes. The rapid accumulation of knowledge about the sequences and structures of these proteins allows the use of in silico methods to investigate the molecular mechanisms of their abnormal conformational changes and assembly. However, such an approach requires the collection of accurate data, which are inconveniently dispersed among several generalist databases. RESULTS: We therefore created a free online knowledge database (AMYPdb) dedicated to amyloid precursor proteins and we have performed large scale sequence analysis of the included data. Currently, AMYPdb integrates data on 31 families, including 1,705 proteins from nearly 600 organisms. It displays links to more than 2,300 bibliographic references and 1,200 3D-structures. A Wiki system is available to insert data into the database, providing a sharing and collaboration environment. We generated and analyzed 3,621 amino acid sequence patterns, reporting highly specific patterns for each amyloid family, along with patterns likely to be involved in protein misfolding and aggregation. CONCLUSION: AMYPdb is a comprehensive online database aiming at the centralization of bioinformatic data regarding all amyloid proteins and their precursors. Our sequence pattern discovery and analysis approach unveiled protein regions of significant interest. AMYPdb is freely accessible.
-
190. Thompson, M. J.; Sievers, S. A.; Karanicolas, J.; Ivanova, M. I.; Baker, D.; Eisenberg, D. The 3D Profile Method for Identifying Fibril-Forming Segments of Proteins. Proc. Natl. Acad. Sci. U. S. A. 2006, 103, 4074–4078, DOI: 10.1073/pnas.0511295103
Based on the crystal structure of the cross-β spine formed by the peptide NNQQNY, we have developed a computational approach for identifying those segments of amyloidogenic proteins that themselves can form amyloid-like fibrils. The approach builds on expts. showing that hexapeptides are sufficient for forming amyloid-like fibrils. Each six-residue peptide of a protein of interest is mapped onto an ensemble of templates, or 3D profile, generated from the crystal structure of the peptide NNQQNY by small displacements of one of the two intermeshed β-sheets relative to the other. The energy of each mapping of a sequence to the profile is evaluated by using ROSETTADESIGN, and the lowest energy match for a given peptide to the template library is taken as the putative prediction. If the energy of the putative prediction is lower than a threshold value, a prediction of fibril formation is made. This method can reach an accuracy of ≈80% with a P value of ≈10⁻¹² when a conservative energy threshold is used to sep. peptides that form fibrils from those that do not. We see enrichment for pos. predictions in a set of fibril-forming segments of amyloid proteins, and we illustrate the method with applications to proteins of interest in amyloid research.
-
191. Beerten, J.; Van Durme, J.; Gallardo, R.; Capriotti, E.; Serpell, L.; Rousseau, F.; Schymkowitz, J. WALTZ-DB: A Benchmark Database of Amyloidogenic Hexapeptides. Bioinformatics 2015, 31, 1698–1700, DOI: 10.1093/bioinformatics/btv027
Summary: Accurate prediction of amyloid-forming amino acid sequences remains an important challenge. We here present an online database that provides open access to the largest set of exptl. characterized amyloid forming hexapeptides. To this end, we expanded our previous set of 280 hexapeptides used to develop the Waltz algorithm with 89 peptides from literature review and by systematic exptl. characterization of the aggregation of 720 hexapeptides by transmission electron microscopy, dye binding and Fourier transform IR spectroscopy. This brings the total no. of exptl. characterized hexapeptides in the WALTZ-DB database to 1089, of which 244 are annotated as pos. for amyloid formation.
-
192. Wozniak, P. P.; Kotulska, M. AmyLoad: Website Dedicated to Amyloidogenic Protein Fragments. Bioinformatics 2015, 31, 3395–3397, DOI: 10.1093/bioinformatics/btv375
Analyses of amyloidogenic sequence fragments are essential in studies of neurodegenerative diseases. However, there is no one internet dataset that collects all the sequences that have been investigated for their amyloidogenicity. Therefore, we have created the AmyLoad website which collects the amyloidogenic sequences from all major sources. The website allows for filtration of the fragments and provides detailed information about each of them. Registered users can both personalize their work with the website and submit their own sequences into the database. To maintain database reliability, submitted sequences are reviewed before making them available to the public. Finally, we re-implemented several amyloidogenic sequence predictors, thus the AmyLoad website can be used as a sequence anal. tool. We encourage researchers working on amyloid proteins to contribute to our service.
-
193. Sastry, A.; Monk, J.; Tegel, H.; Uhlen, M.; Palsson, B. O.; Rockberg, J.; Brunk, E. Machine Learning in Computational Biology to Accelerate High-Throughput Protein Expression. Bioinformatics 2017, 33, 2487–2495, DOI: 10.1093/bioinformatics/btx207
Motivation: The Human Protein Atlas (HPA) enables the simultaneous characterization of thousands of proteins across various tissues to pinpoint their spatial location in the human body. This has been achieved through transcriptomics and high-throughput immunohistochemistry-based approaches, where over 40 000 unique human protein fragments have been expressed in E. coli. These datasets enable quantitative tracking of entire cellular proteomes and present new avenues for understanding molecular-level properties influencing expression and solubility. Results: Combining computational biology and machine learning identifies protein properties that hinder the HPA high-throughput antibody production pipeline. We predict protein expression and solubility with accuracies of 70% and 80%, respectively, based on a subset of key properties (aromaticity, hydropathy and isoelectric point). We guide the selection of protein fragments based on these characteristics to optimize high-throughput experimentation. Availability and implementation: We present the machine learning workflow as a series of IPython notebooks hosted on GitHub (https://github.com/SBRG/Protein_ML). The workflow can be used as a template for analysis of further expression and solubility datasets.
-
194. Thangakani, A. M.; Nagarajan, R.; Kumar, S.; Sakthivel, R.; Velmurugan, D.; Gromiha, M. M. CPAD, Curated Protein Aggregation Database: A Repository of Manually Curated Experimental Data on Protein and Peptide Aggregation. PLoS One 2016, 11, e0152949, DOI: 10.1371/journal.pone.0152949
Accurate distinction between peptide sequences that can form amyloid-fibrils or amorphous β-aggregates, identification of potential aggregation prone regions in proteins, and prediction of change in aggregation rate of a protein upon mutation(s) are crit. to research on protein misfolding diseases, such as Alzheimer's and Parkinson's, as well as biotechnol. prodn. of protein based therapeutics. We have developed a Curated Protein Aggregation Database (CPAD), which has collected results from exptl. studies performed by scientific community aimed at understanding protein/peptide aggregation. CPAD contains more than 2300 exptl. obsd. aggregation rates upon mutations in known amyloidogenic proteins. Each entry includes numerical values for the following parameters: change in rate of aggregation as measured by fluorescence intensity or turbidity, name and source of the protein, Uniprot and Protein Data Bank codes, single point as well as multiple mutations, and literature citation. The data in CPAD has been supplemented with five different types of addnl. information: (i) Amyloid fibril forming hexa-peptides, (ii) Amorphous β-aggregating hexa-peptides, (iii) Amyloid fibril forming peptides of different lengths, (iv) Amyloid fibril forming hexa-peptides whose crystal structures are available in the Protein Data Bank (PDB) and (v) Exptl. validated aggregation prone regions found in amyloidogenic proteins. Furthermore, CPAD is linked to other related databases and resources, such as Uniprot, Protein Data Bank, PUBMED, GAP, TANGO, WALTZ etc. We have set up a web interface with different search and display options so that users have the ability to get the data in multiple ways.
-
195. Tian, Y.; Deutsch, C.; Krishnamoorthy, B. Scoring Function To Predict Solubility Mutagenesis. Algorithms Mol. Biol. 2010, 5, 33, DOI: 10.1186/1748-7188-5-33
BACKGROUND: Mutagenesis is commonly used to engineer proteins with desirable properties not present in the wild type (WT) protein, such as increased or decreased stability, reactivity, or solubility. Experimentalists often have to choose a small subset of mutations from a large number of candidates to obtain the desired change, and computational techniques are invaluable to make the choices. While several such methods have been proposed to predict stability and reactivity mutagenesis, solubility has not received much attention. RESULTS: We use concepts from computational geometry to define a three body scoring function that predicts the change in protein solubility due to mutations. The scoring function captures both sequence and structure information. By exploring the literature, we have assembled a substantial database of 137 single- and multiple-point solubility mutations. Our database is the largest such collection with structural information known so far. We optimize the scoring function using linear programming (LP) methods to derive its weights based on training. Starting with default values of 1, we find weights in the range [0,2] so that predictions of increase or decrease in solubility are optimized. We compare the LP method to the standard machine learning techniques of support vector machines (SVM) and the Lasso.
Using statistics for leave-one-out (LOO), 10-fold, and 3-fold cross validations (CV) for training and prediction, we demonstrate that the LP method performs the best overall. For the LOOCV, the LP method has an overall accuracy of 81%. AVAILABILITY: Executables of programs, tables of weights, and datasets of mutants are available from the following web page: http://www.wsu.edu/~kbala/OptSolMut.html.
-
196. Wilkinson, D. L.; Harrison, R. G. Predicting the Solubility of Recombinant Proteins in Escherichia coli. Nat. Biotechnol. 1991, 9, 443–448, DOI: 10.1038/nbt0591-443
The cause of inclusion body formation in E. coli grown at 37° was studied using statistical anal. of the compn. of 81 proteins that do and do not form inclusion bodies. Six compn. derived parameters were used. In declining order of their correlation with inclusion body formation, the parameters are charge av., turn forming residue fraction, cysteine fraction, proline fraction, hydrophilicity, and total no. of residues. The correlation with inclusion body formation is strong for the 1st 2 parameters but weak for the last 4. This correlation can be used to predict the probability that a protein will form inclusion bodies using only the protein's amino acid compn. as the basis for the prediction.
-
197. Davis, G. D.; Elisee, C.; Newham, D. M.; Harrison, R. G. New Fusion Protein Systems Designed to Give Soluble Expression in Escherichia coli. Biotechnol. Bioeng. 1999, 65, 382–388, DOI: 10.1002/(SICI)1097-0290(19991120)65:4<382::AID-BIT2>3.0.CO;2-I
Three native E. coli proteins (NusA, GrpE, and bacterioferritin (BFR)) were studied in fusion proteins expressed in E. coli for their ability to confer soly. on a target insol. protein at the C-terminus of the fusion protein. These three proteins were chosen based on their favorable cytoplasmic soly. characteristics as predicted by a statistical soly. model for recombinant proteins in E. coli. Modeling predicted the probability of sol. fusion protein expression for the target insol. protein human interleukin-3 (hIL-3) in the following order: NusA (most sol.), GrpE, BFR, and thioredoxin (least sol.). Expression expts. at 37° showed that the NusA/hIL-3 fusion protein was expressed almost completely in the sol. fraction, while GrpE/hIL-3 and BFR/hIL-3 exhibited partial soly. at 37°. Thioredoxin/hIL-3 was expressed almost completely in the insol. fraction. Fusion proteins consisting of NusA and either bovine growth hormone or human interferon-γ were also expressed in E. coli at 37° and again showed that the fusion protein was almost completely sol. Starting with the NusA/hIL-3 fusion protein with an N-terminal histidine tag, purified hIL-3 with full biol. activity was obtained using immobilized metal affinity chromatog., factor Xa protease cleavage, and anion exchange chromatog.
-
198. Magnan, C. N.; Randall, A.; Baldi, P. SOLpro: Accurate Sequence-Based Prediction of Protein Solubility. Bioinformatics 2009, 25, 2200–2207, DOI: 10.1093/bioinformatics/btp386
Protein insoly. is a major obstacle for many exptl. studies. A sequence-based prediction method able to accurately predict the propensity of a protein to be sol. on overexpression could be used, for instance, to prioritize targets in large-scale proteomics projects and to identify mutations likely to increase the soly. of insol. proteins. Here, the authors first curate a large, non-redundant and balanced training set of more than 17 000 proteins. Next, the authors ext. and study 23 groups of features computed directly or predicted (e.g. secondary structure) from the primary sequence. The data and the features are used to train a two-stage support vector machine (SVM) architecture. The resulting predictor, SOLpro, is compared directly with existing methods and shows significant improvement according to std. evaluation metrics, with an overall accuracy of over 74% estd. using multiple runs of 10-fold cross-validation. SOLpro is integrated in the SCRATCH suite of predictors and is available for download as a standalone application and as a web server at: http://scratch.proteomics.ics.uci.edu.
-
199. Smialowski, P.; Doose, G.; Torkler, P.; Kaufmann, S.; Frishman, D. PROSO II – A New Method for Protein Solubility Prediction. FEBS J. 2012, 279, 2192–2200, DOI: 10.1111/j.1742-4658.2012.08603.x
Many fields of science and industry depend on efficient prodn. of active protein using heterologous expression in Escherichia coli. The soly. of proteins upon expression is dependent on their amino acid sequence. Prediction of soly. from sequence is therefore highly valuable. We present a novel machine-learning-based model called PROSO II which makes use of new classification methods and growth in exptl. data to improve coverage and accuracy of soly. predictions. The classification algorithm is organized as a two-layered structure in which the output of a primary Parzen window model for sequence similarity and a logistic regression classifier of amino acid k-mer compn. serve as input for a second-level logistic regression classifier. Compared with previously published research our model is trained on five times more data than used by any other method before (82,000 proteins). When tested on a sep. holdout set not used at any point of method development our server attained the best results in comparison with other currently available methods: accuracy 75.4%, Matthew's correlation coeff. 0.39, sensitivity 0.731, specificity 0.759, gain (sol.) 2.263. In summary, due to utilization of cutting edge machine learning technologies combined with the largest currently available exptl. data set the PROSO II server constitutes a substantial improvement in protein soly. predictions.
-
200. Agostini, F.; Cirillo, D.; Livi, C. M.; Delli Ponti, R.; Tartaglia, G. G. CcSOL Omics: A Webserver for Solubility Prediction of Endogenous and Heterologous Expression in Escherichia coli. Bioinformatics 2014, 30, 2975–2977, DOI: 10.1093/bioinformatics/btu420
Summary: Here we introduce ccSOL omics, a webserver for large-scale calcns. of protein soly. Our method allows (i) proteome-wide predictions; (ii) identification of sol. fragments within each sequence; (iii) exhaustive single-point mutation anal. Results: Using coil/disorder, hydrophobicity, hydrophilicity, β-sheet and α-helix propensities, we built a predictor of protein soly. Our approach shows an accuracy of 79% on the training set (36 990 Target Track entries). Validation on three independent sets indicates that ccSOL omics discriminates sol. and insol. proteins with an accuracy of 74% on 31 760 proteins sharing <30% sequence similarity.
-
201. Khurana, S.; Rawi, R.; Kunji, K.; Chuang, G.-Y.; Bensmail, H.; Mall, R. DeepSol: A Deep Learning Framework for Sequence-Based Protein Solubility Prediction. Bioinformatics 2018, 34, 2605–2613, DOI: 10.1093/bioinformatics/bty166
Motivation: Protein soly. plays a vital role in pharmaceutical research and prodn. yield. For a given protein, the extent of its soly. can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein soly. predictors. In this work we propose DeepSol, a novel Deep Learning-based protein soly. predictor. The backbone of our framework is a convolutional neural network that exploits k-mer structure and addnl. sequence and structural features extd. from the protein sequence. Results: DeepSol outperformed all known sequence-based state-of-the-art soly. prediction methods and attained an accuracy of 0.77 and Matthew's correlation coeff. of 0.55. The superior prediction accuracy of DeepSol allows screening for sequences with enhanced prodn. capacity and more reliable prediction of the soly. of novel proteins.
-
202. Chang, C. C. H.; Li, C.; Webb, G. I.; Tey, B.; Song, J.; Ramanan, R. N. Periscope: Quantitative Prediction of Soluble Protein Expression in the Periplasm of Escherichia coli. Sci. Rep. 2016, 6, 21844, DOI: 10.1038/srep21844
Periplasmic expression of sol. proteins in Escherichia coli not only offers a much-simplified downstream purifn. process, but also enhances the probability of obtaining correctly folded and biol. active proteins. Different combinations of signal peptides and target proteins lead to different sol. protein expression levels, ranging from negligible to several grams per L. Accurate algorithms for rational selection of promising candidates can serve as a powerful tool to complement with current trial-and-error approaches. Accordingly, proteomics studies can be conducted with greater efficiency and cost-effectiveness. Here, we developed a predictor with a two-stage architecture, to predict the real-valued expression level of target protein in the periplasm. The output of the first-stage support vector machine (SVM) classifier dets. which second-stage support vector regression (SVR) classifier to be used. When tested on an independent test dataset, the predictor achieved an overall prediction accuracy of 78% and a Pearson's correlation coeff. (PCC) of 0.77. We further illustrate the relative importance of various features with respect to different models.
The results indicate that the occurrence of dipeptide glutamine and aspartic acid is the most important feature for the classification model. Finally, we provide access to the implemented predictor through the Periscope webserver, freely accessible at http://lightning.med.monash.edu/periscope/.
-
203. Hirose, S.; Noguchi, T. ESPRESSO: A System for Estimating Protein Expression and Solubility in Protein Expression Systems. Proteomics 2013, 13, 1444–1456, DOI: 10.1002/pmic.201200175
Recombinant protein technol. is essential for conducting protein science and using proteins as materials in pharmaceutical or industrial applications. Although obtaining sol. proteins is still a major exptl. obstacle, knowledge about protein expression/soly. under std. conditions may increase the efficiency and reduce the cost of proteomics studies. In this study, we present a computational approach to est. the probability of protein expression and soly. for two different protein expression systems: in vivo Escherichia coli and wheat germ cell-free, from only the sequence information. It implements two kinds of methods: a sequence/predicted structural property-based method that uses both the sequence and predicted structural features, and a sequence pattern-based method that utilizes the occurrence frequencies of sequence patterns. In the benchmark test, the proposed methods obtained F-scores of around 70%, and outperformed publicly available servers. Applying the proposed methods to genomic data revealed that proteins assocd. with translation or transcription have a strong tendency to be expressed as sol. proteins by the in vivo E. coli expression system. The sequence pattern-based method also has the potential to indicate a candidate region for modification, to increase protein soly. All methods are available for free at the ESPRESSO server (http://mbs.cbrc.jp/ESPRESSO).
-
204. Hon, J.; Marusiak, M.; Martinek, T.; Zendulka, J.; Bednar, D.; Damborsky, J. SoluProt: Prediction of Protein Solubility. Nucleic Acids Res. 2018, in preparation.
-
205. DuBay, K. F.; Pawar, A. P.; Chiti, F.; Zurdo, J.; Dobson, C. M.; Vendruscolo, M. Prediction of the Absolute Aggregation Rates of Amyloidogenic Polypeptide Chains. J. Mol. Biol. 2004, 341, 1317–1326, DOI: 10.1016/j.jmb.2004.06.043
Protein aggregation is assocd. with a variety of pathol. conditions, including Alzheimer's and Creutzfeldt-Jakob diseases and type II diabetes. Such degenerative disorders result from the conversion of the normal sol. state of specific proteins into aggregated states that can ultimately form the characteristic amyloid fibrils found in diseased tissue. Under appropriate conditions it appears that many, perhaps all, proteins can be converted in vitro into amyloid fibrils. The aggregation propensities of different polypeptide chains have, however, been obsd. to vary substantially. Here, we describe an approach that uses the knowledge of the amino acid sequence and of the exptl. conditions to reproduce, with a correlation coeff. of 0.92 and over five orders of magnitude, the in vitro aggregation rates of a wide range of unstructured peptides and proteins. These results indicate that the formation of protein aggregates can be rationalized to a considerable extent in terms of simple physico-chem. parameters that describe the properties of polypeptide chains and their environment.
-
206. Tartaglia, G. G.; Pawar, A. P.; Campioni, S.; Dobson, C. M.; Chiti, F.; Vendruscolo, M. Prediction of Aggregation-Prone Regions in Structured Proteins. J. Mol. Biol. 2008, 380, 425–436, DOI: 10.1016/j.jmb.2008.05.013
We present a method for predicting the regions of the sequences of peptides and proteins that are most important in promoting their aggregation and amyloid formation. The method extends previous approaches by allowing such predictions to be carried out for conditions under which the mols. concerned can be folded or contain a significant degree of persistent structure. In order to achieve this result, the method uses only knowledge of the sequence of amino acids to est. simultaneously both the propensity for folding and aggregation and the way in which these two types of propensity compete. We illustrate the approach by its application to a set of peptides and proteins both assocd. and not assocd. with disease. Our results show not only that the regions of a protein with a high intrinsic aggregation propensity can be identified in a robust manner but also that the structural context of such regions in the monomeric form is crucial for detg. their actual role in the aggregation process.
-
207. Conchillo-Solé, O.; de Groot, N. S.; Avilés, F. X.; Vendrell, J.; Daura, X.; Ventura, S. AGGRESCAN: A Server for the Prediction and Evaluation of “Hot Spots” of Aggregation in Polypeptides. BMC Bioinf. 2007, 8, 65, DOI: 10.1186/1471-2105-8-65
BACKGROUND: Protein aggregation correlates with the development of several debilitating human disorders of growing incidence, such as Alzheimer's and Parkinson's diseases. On the biotechnological side, protein production is often hampered by the accumulation of recombinant proteins into aggregates. Thus, the development of methods to anticipate the aggregation properties of polypeptides is receiving increasing attention. AGGRESCAN is a web-based software for the prediction of aggregation-prone segments in protein sequences, the analysis of the effect of mutations on protein aggregation propensities and the comparison of the aggregation properties of different proteins or protein sets. RESULTS: AGGRESCAN is based on an aggregation-propensity scale for natural amino acids derived from in vivo experiments and on the assumption that short and specific sequence stretches modulate protein aggregation. The algorithm is shown to identify a series of protein fragments involved in the aggregation of disease-related proteins and to predict the effect of genetic mutations on their deposition propensities. It also provides new insights into the differential aggregation properties displayed by globular proteins, natively unfolded polypeptides, amyloidogenic proteins and proteins found in bacterial inclusion bodies.
CONCLUSION: By identifying aggregation-prone segments in proteins, AGGRESCAN (http://bioinf.uab.es/aggrescan/) shall facilitate (i) the identification of possible therapeutic targets for anti-depositional strategies in conformational diseases and (ii) the anticipation of aggregation phenomena during storage or recombinant production of bioactive polypeptides or polypeptide sets.
-
208. Fernandez-Escamilla, A.-M.; Rousseau, F.; Schymkowitz, J.; Serrano, L. Prediction of Sequence-Dependent and Mutational Effects on the Aggregation of Peptides and Proteins. Nat. Biotechnol. 2004, 22, 1302–1306, DOI: 10.1038/nbt1012
A statistical mechanics algorithm, TANGO, is developed to predict protein aggregation. TANGO is based on the physico-chem. principles of β-sheet formation, extended by the assumption that the core regions of an aggregate are fully buried. The algorithm accurately predicts the aggregation of a data set of 179 peptides compiled from the literature as well as of a new set of 71 peptides derived from human disease-related proteins, including prion protein, lysozyme and β2-microglobulin. TANGO also correctly predicts pathogenic as well as protective mutations of the Alzheimer β-peptide, human lysozyme and transthyretin, and discriminates between β-sheet propensity and aggregation. The results confirm the model of intermol. β-sheet formation as a widespread underlying mechanism of protein aggregation. Furthermore, the algorithm opens the door to a fully automated, sequence-based design strategy to improve the aggregation properties of proteins of scientific or industrial interest.
-
209. Maurer-Stroh, S.; Debulpaep, M.; Kuemmerer, N.; Lopez de la Paz, M.; Martins, I. C.; Reumers, J.; Morris, K. L.; Copland, A.; Serpell, L.; Serrano, L.; Schymkowitz, J. W. H.; Rousseau, F. Exploring the Sequence Determinants of Amyloid Structure Using Position-Specific Scoring Matrices. Nat. Methods 2010, 7, 237–242, DOI: 10.1038/nmeth.1432
Protein aggregation results in β-sheet-like assemblies that adopt either a variety of amorphous morphologies or ordered amyloid-like structures. These differences in structure also reflect biol. differences; amyloid and amorphous β-sheet aggregates have different chaperone affinities, accumulate in different cellular locations and are degraded by different mechanisms. Further, amyloid function depends entirely on a high intrinsic degree of order. Here we exptl. explored the sequence space of amyloid hexapeptides and used the derived data to build Waltz, a web-based tool that uses a position-specific scoring matrix to det. amyloid-forming sequences. Waltz allows users to identify and better distinguish between amyloid sequences and amorphous β-sheet aggregates and allowed us to identify amyloid-forming regions in functional amyloids.
-
210. Walsh, I.; Seno, F.; Tosatto, S. C. E.; Trovato, A. PASTA 2.0: An Improved Server for Protein Aggregation Prediction. Nucleic Acids Res. 2014, 42, W301–W307, DOI: 10.1093/nar/gku399
The formation of amyloid aggregates upon protein misfolding is related to several devastating degenerative diseases. The propensities of different protein sequences to aggregate into amyloids, how they are enhanced by pathogenic mutations, the presence of aggregation hot spots stabilizing pathol. interactions, the establishing of cross-amyloid interactions between co-aggregating proteins, all rely at the mol. level on the stability of the amyloid cross-beta structure. The authors' redesigned server, PASTA 2.0, provides a versatile platform where all of these different features can be easily predicted on a genomic scale given input sequences. The server provides other pieces of information, such as intrinsic disorder and secondary structure predictions, that complement the aggregation data. The PASTA 2.0 energy function evaluates the stability of putative cross-beta pairings between different sequence stretches. It was re-derived on a larger dataset of globular protein domains. The resulting algorithm was benchmarked on comprehensive peptide and protein test sets, leading to improved, state-of-the-art results with more amyloid forming regions correctly detected at high specificity. The PASTA 2.0 server can be accessed at http://protein.bio.unipd.it/pasta2/.
-
211. Bryan, A. W.; Menke, M.; Cowen, L. J.; Lindquist, S. L.; Berger, B. BETASCAN: Probable Beta-Amyloids Identified by Pairwise Probabilistic Analysis. PLoS Comput. Biol. 2009, 5, e1000333, DOI: 10.1371/journal.pcbi.1000333
212. Garbuzynskiy, S. O.; Lobanov, M. Y.; Galzitskaya, O. V. FoldAmyloid: A Method of Prediction of Amyloidogenic Regions from Protein Sequence. Bioinformatics 2010, 26, 326–332, DOI: 10.1093/bioinformatics/btp691
213. Goldschmidt, L.; Teng, P. K.; Riek, R.; Eisenberg, D. Identifying the Amylome, Proteins Capable of Forming Amyloid-like Fibrils. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 3487–3492, DOI: 10.1073/pnas.0915166107
214. Ahmed, A. B.; Znassi, N.; Château, M.-T.; Kajava, A. V. A Structure-Based Approach to Predict Predisposition to Amyloidosis. Alzheimer’s Dementia 2015, 11, 681–690, DOI: 10.1016/j.jalz.2014.06.007
215. Krogh, A.; Vedelsby, J. Neural Network Ensembles, Cross Validation and Active Learning. In Proceedings of the 7th International Conference on Neural Information Processing Systems (NIPS’94); MIT Press: Cambridge, MA, 1994; pp 231–238.
216. Maclin, R.; Opitz, D. Popular Ensemble Methods: An Empirical Study. J. Artif. Intell. Res. 1999, 11, 169–198, DOI: 10.1613/jair.614
217. Tsolis, A. C.; Papandreou, N. C.; Iconomidou, V. A.; Hamodrakas, S. J. A Consensus Method for the Prediction of “Aggregation-Prone” Peptides in Globular Proteins. PLoS One 2013, 8, e54175, DOI: 10.1371/journal.pone.0054175
218. Emily, M.; Talvas, A.; Delamarche, C. MetAmyl: A METa-Predictor for AMYLoid Proteins. PLoS One 2013, 8, e79722, DOI: 10.1371/journal.pone.0079722
219. Zambrano, R.; Jamroz, M.; Szczasiuk, A.; Pujols, J.; Kmiecik, S.; Ventura, S. AGGRESCAN3D (A3D): Server for Prediction of Aggregation Properties of Protein Structures. Nucleic Acids Res. 2015, 43, W306–W313, DOI: 10.1093/nar/gkv359
220. De Baets, G.; Van Durme, J.; van der Kant, R.; Schymkowitz, J.; Rousseau, F. Solubis: Optimize Your Protein. Bioinformatics 2015, 31, 2580–2582, DOI: 10.1093/bioinformatics/btv162
221. Van Durme, J.; De Baets, G.; Van Der Kant, R.; Ramakers, M.; Ganesan, A.; Wilkinson, H.; Gallardo, R.; Rousseau, F.; Schymkowitz, J. Solubis: A Webserver To Reduce Protein Aggregation through Mutation. Protein Eng., Des. Sel. 2016, 29, 285–289, DOI: 10.1093/protein/gzw019
Supporting Information
The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acscatal.8b03613.
Data sets for prediction of protein stability (Table S1); software tools for prediction of protein stability (Table S2); data sets for prediction of protein solubility (Table S3); software tools for prediction of protein solubility (Table S4); comparison of the existing tools with the S350 data set (Table S5) (PDF)