
Computational Design of Stable and Soluble Biocatalysts

Milos Musil


Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic

IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, 612 66 Brno, Czech Republic

International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic

Hannes Konegger


Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic

International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic

Jiri Hon


Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic

IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, 612 66 Brno, Czech Republic

International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic

David Bednar


Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic

International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic

Jiri Damborsky*


Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic

International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic

*E-mail: [email protected]

http://orcid.org/0000-0002-7848-8216

Cite this: ACS Catal. 2019, 9, 2, 1033–1054

Publication Date (Web):December 18, 2018

https://doi.org/10.1021/acscatal.8b03613


Abstract

Natural enzymes are delicate biomolecules possessing only marginal thermodynamic stability. Poorly stable, misfolded, and aggregated proteins lead to huge economic losses in the biotechnology and biopharmaceutical industries. Consequently, there is a need to design optimized protein sequences that maximize stability, solubility, and activity over a wide range of temperatures and pH values in buffers of different composition and in the presence of organic cosolvents. This has created great interest in using computational methods to enhance biocatalysts’ robustness and solubility. Suitable methods include (i) energy calculations, (ii) machine learning, (iii) phylogenetic analyses, and (iv) combinations of these approaches. We have witnessed impressive progress in the design of stable enzymes over the last two decades, but predictions of protein solubility and expressibility are scarce. Stabilizing mutations can be predicted accurately using available force fields, and the number of sequences available for phylogenetic analyses is growing. In addition, complex computational workflows are being implemented in intuitive web tools, enhancing the quality of protein stability predictions. Conversely, solubility predictors are limited by the lack of robust and balanced experimental data, an inadequate understanding of fundamental principles of protein aggregation, and a dearth of structural information on folding intermediates. Here we summarize recent progress in the development of computational tools for predicting protein stability and solubility, critically assess their strengths and weaknesses, and identify apparent gaps in data and knowledge. We also present perspectives on the computational design of stable and soluble biocatalysts.

1. Introduction

Nature has developed a remarkable diversity of biochemical reactions that are vital to the continuing evolution of living organisms and the preservation of life. Enzymes are the most prominent catalytic entities in living cells and are collectively capable of catalyzing a vast range of biochemical reactions. The advent of next-generation sequencing together with recent advances in bioinformatics and molecular and structural biology have granted ready access to these rich genetic resources, facilitating the identification of efficient biocatalysts for diverse applications. (1−4) Moreover, the field of protein engineering has matured to a level that allows tailoring of native enzymes for specific practical applications. (5) However, the redesign of an enzyme sequence often imposes unintended secondary effects, frequently reducing the solubility and stability of the target enzyme. (6−9) Strategies for mitigating or eliminating these negative effects include chaperone buffering, (10) chemical modification of the protein structure, (11,12) protein immobilization, (13) medium engineering, (13) the addition of fusion proteins, (14,15) and the introduction of stabilizing or solubilizing mutations by protein engineering. (16−18)

Of particular interest for a mutational strategy is “directed evolution”, which refers to experimental methods that emulate natural evolution by coupling molecular diversity generation to a selection or screening process. However, the immensity of an enzyme’s sequence space prohibits global evaluation of all possible mutational combinations, (19) frequently causing optimization trajectories to become stuck in evolutionary dead ends. (20,21) This restricts the scope for creating stable and soluble biocatalysts by directed evolution alone and calls for knowledge-guided approaches to navigate the mutational space. (22) Rational protein design strategies can dramatically reduce the experimental effort required for successful directed evolution by consolidating pre-existing information. (23) Semirational strategies that combine directed evolution with structural and sequence data to help identify mutational hotspots amenable to focused screening efforts have been particularly popular recently. (24−26)

This Perspective provides a thorough overview of contemporary data sets and computational protein redesign tools for enhancing enzyme stability or solubility. Preservation of enzymatic activity is of paramount importance in all protein engineering projects. (21,27) However, highly active and stable catalysts are evolutionarily disfavored because they could disrupt the host organism’s homeostatic balance (28) or interfere with the cell’s complicated metabolic regulatory networks. (29,30) Accordingly, several studies have indicated that most natural enzymes operate in a suboptimal regime, (21,28) leaving considerable room for further optimization (Table 1). Unfortunately, activity enhancements often come at the cost of reduced enzyme stability. The protein redesign tools presented here offer ways to avoid this trade-off and also to solubilize the polypeptides, facilitating the purposeful adaptation of natural enzymes. (31) Here we outline the theoretical frameworks of methods commonly used to analyze protein stability and solubility. We also critically review the data sets and software tools available for predictive purposes. This Perspective strives to evaluate the tools from the perspective of users, who are typically interested in accuracy, reliability, user-friendliness, and the strengths and weaknesses of the underlying methods (Table 2). We also present a personal perspective on existing gaps in knowledge and propose possible directions for future development.

Table 1. Selected Experimentally Validated Cases of Successful Computational Redesigns of Stable and Soluble Biocatalysts

Stable Biocatalysts
enzyme \| UniProt ID	substrate	method^j	mutant code	mutations^a	wild-type T_m [°C]	ΔT_m [°C]^b	t_1/2^c	specific activity^c	k_cat/K_m^c	ref
cutinase \| P52956	4-nitrophenyl butyrate	force field	variant 10	7 of 197	62.3	5.7	12.9× (60 °C)	0.64× (25 °C)	n.d.^d	(41)
keratinase \| Q1EM64	keratin	machine learning	quadruple mutant	4 of 379	n.d.	n.d.	8.6× (60 °C)	n.d.	4.11× (40 °C)	(42)
adenylate kinase \| P16304	Mg/ATP, AMP	phylogeny (ASR)	ANC1	66 of 218	53.6	35.4	n.d.	n.d.	1.79× (25 °C)	(43)
β-lactamase \| P62593	benzylpenicillin	phylogeny (CD)	ALL-CON	122 of 262	55.0	23.6	n.d.	n.d.	0.03× (25 °C)	(44)
kemp eliminase \| Q06121	5-nitrobenzisoxazole	phylogeny (CD)^e	R2–4/3D	9 of 247	72.0	10.0	n.d.	n.d.	11.46× (25 °C)	(31)
haloalkane dehalogenase \| P59336	1-iodohexane	hybrid^f	DhaA115	11 of 294	49.0	24.6	200× (60 °C)	0.31× (37 °C)	2.77× (37 °C)	(45)
halohydrin dehalogenase \| Q93D82	rac-p-nitro-2-bromo-1-phenylethanol	hybrid^g	HheC-H12	13 of 253	57.0	25.5	n.d.	n.d.	0.88× (30 °C)	(9)

Soluble Biocatalysts
enzyme \| UniProt ID	substrate	method^j	mutant code	mutations^a	wild-type T_m [°C]	ΔT_m [°C]^b	expr. yield^c	specific activity^c	expr. host	ref
haloalkane dehalogenase \| P59337	1,2-dibromoethane	phylogeny (ASR)	AncHLD2	69 of 317	53.6	21.9	4.8× (20 °C)	1.86× (37 °C)	E. coli	(46)
α-galactosidase \| P06280	α-d-galactose	hybrid^h	A348R/A368P/S405L	3 of 397	n.d.	n.d.	1.4× (37 °C)	2.00× (37 °C)	H. gartleri	(18)
acetylcholinesterase \| P22303	acetylcholine	hybrid^i	dAChE4	51 of 542	44.0	18.3	2000× (20 °C)	0.89× (25 °C)	E. coli	(47)

^a Number of introduced mutations and total number of residues.
^b ΔT_m value of the mutant with respect to the wild-type enzyme.
^c Fold change in the specified property of the mutant relative to the wild-type enzyme. The temperature at which the given property was measured is given in parentheses.
^d n.d.: not determined.
^e Spiked Consensus Design, Directed Evolution.
^f FireProt: Rosetta, FoldX, Consensus Design.
^g FRESCO: Rosetta, FoldX, Disulfide Bonds, MD.
^h SOLUBIS: TANGO, FoldX.
^i PROSS: Consensus Design, Rosetta.
^j CD - Consensus Design, ASR - Ancestral Sequence Reconstruction.

Table 2. Advantages and Disadvantages of Methods for the Computational Design of Stable and Soluble Biocatalysts

method	advantages	disadvantages
energy calculations	• granularity of predictions can be adjusted via different force fields	• high computational cost of accurate methods
	• web servers make predictions accessible to inexperienced users	• dependence on high-resolution structures
	• ever-growing structural databases together with advances in homology modeling and molecular threading	• trade-offs between stability and activity
	• high accuracy for the prediction of single-point mutations	• predicted stable mutants may not be expressible
		• epistatic effects are not well resolved
machine learning	• very rapid predictions	• lack of balanced high-quality experimental data
	• easy to implement and use	• limited accuracy of current models
	• wide applicability of features	• risk of overtraining
	• no need to understand all dependencies
	• previously unknown patterns can be discovered
phylogenetics^a	• rich abundance of sequence data	• selection of relevant sequences is nontrivial
	• structures not needed for predictions	• profound understanding of the gene family is required
	• web servers available for certain tasks	• CD: epistatic effects are not considered
	• CD: simple and fast	• ASR: small data set size due to computational costs
	• CD: several filters are available to enhance prediction accuracies	• ASR: requires technical skills and experience
	• ASR: prediction of highly thermostable variants is achievable
	• ASR: sequences of extremophilic proteins are not required
	• ASR: sequence context and epistasis are maintained

^a CD, consensus design; ASR, ancestral sequence reconstruction.

2. Experimental Framework To Determine Protein Stability and Solubility

2.1. Experimental Determination of Protein Stability

Globular proteins are known to be marginally stable, with free energy differences between the folded and unfolded states (Figure 1) being as low as 5 kcal/mol. (32) Two key concepts in the analysis of protein stability are thermodynamic and kinetic stability. (30,33−35) Thermodynamic stability can be defined on the basis of equilibrium thermodynamics as the Gibbs free energy difference of folding (ΔG). Exact quantification of absolute ΔG values is difficult, (36) so most stability predictors and experimental procedures determine the relative change in free energy (ΔΔG) upon mutation. A commonly used experimental quantity related to ΔΔG is the change in melting temperature (ΔT_m). The melting temperature, T_m, is defined as the temperature at which half of the sample is in the unfolded state, and it can be determined using biophysical techniques (Figure 2) such as circular dichroism spectroscopy (CD), fluorescence spectroscopy (FS), dynamic light scattering (DLS), differential scanning microcalorimetry (DSC), or differential scanning fluorimetry (DSF). (37) The chemical equivalent of T_m is the half-concentration (C_1/2), i.e., the concentration of denaturant at which half the sample exists in the unfolded state. Kinetic stability, on the other hand, is a time-dependent property that is quantified by the height of the free energy barrier of unfolding (ΔG^⧧) separating distinct folding states (Figure 1). Predicting kinetic stability is challenging, (38) and experimentally determined biological half-lives (t_1/2) are preferred to theoretical estimates (Figure 2). The kinetic stability is a key determinant of an enzyme’s functional competence (30) because it is related to the rate at which the protein’s structure is irreversibly altered by proteolysis or aggregation. (29,39,40)
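The link between marginal stability and the unfolded population can be illustrated with a short worked equation. This is only a sketch that assumes a simple two-state equilibrium between the native (N) and unfolded (U) states; the symbols K_u (unfolding equilibrium constant), f_u (fraction unfolded), and ΔG_unfold (the negative of the folding free energy) are introduced here for illustration and are not taken from the original text.

```latex
\begin{align}
K_u &= \frac{[U]}{[N]} = \exp\!\left(-\frac{\Delta G_{\mathrm{unfold}}}{RT}\right), \\
f_u &= \frac{[U]}{[N] + [U]} = \frac{K_u}{1 + K_u}, \\
T = T_m:\quad f_u &= \tfrac{1}{2} \;\Rightarrow\; K_u = 1 \;\Rightarrow\; \Delta G_{\mathrm{unfold}}(T_m) = 0 .
\end{align}
% Example: \Delta G_unfold = 5 kcal/mol at T = 298 K,
% R = 1.987e-3 kcal/(mol K), so RT \approx 0.59 kcal/mol and
% K_u \approx e^{-8.4} \approx 2 \times 10^{-4}.
```

Under these assumptions, a stability of 5 kcal/mol corresponds to roughly two unfolded molecules per ten thousand at room temperature, and a mutation that changes ΔΔG by about 1.4 kcal/mol shifts K_u by approximately one order of magnitude.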

Figure 1. Simplified energy landscape with characteristic conformational states accessible from the native-state ensemble of a folded enzyme. Each point on the plane defined by the X axis and Y axis represents a different conformation of the enzyme. The corresponding value on the Z axis is the free energy of folding, which has been color-coded to depict the spectrum from less probable high-energy states (red) to more probable low-energy states (blue). The catalytic state is readily accessible from the native-state ensemble but clearly separated by a free energy barrier. Catalysis based on a conformational selection model is assumed, which requires a distinct set of conformations prior to substrate binding and catalysis. (48) A reversible transition from the native state to a partially unfolded state via TS₁ is characterized by the free energy difference of folding ΔG₁ and its free energy barrier ΔG₁^⧧. The partially unfolded state can also constitute the starting point for an irreversible unfolding transition via TS₂, leading to the fully unfolded state. Another irreversible pathway emanating from the partially unfolded state leads to an aggregated state, which is often characterized by the interactions of several biomolecules. ΔG₁ and ΔG₂ relate to thermodynamic stability, while ΔG₁^⧧ and ΔG₂^⧧ relate to kinetic stability.

Figure 2. Representative experimental methods to quantify (a–d) protein stability and (e, f) solubility. Curves for a hypothetical wild-type enzyme (black) and an improved variant exhibiting higher stability or solubility (red) are shown. (a) Differential scanning calorimetry (DSC) curve. T_m is the midpoint of the transition, ΔC_p is the difference between the pre- and post-transition baselines, and ΔH is the area under the curve between the pre- and post-transition baselines. (b) Differential scanning fluorimetry (DSF) curve. Fluorescent dyes progressively bind to exposed hydrophobic regions of unfolding proteins, and the fluorescence signal is detected at different temperatures. T_m corresponds to the midpoint value of the stability curve. (c) Far-UV circular dichroism (CD) curve. Following the change of molar ellipticity at a specific wavelength over a wider temperature range monitors the change in secondary structure of an unfolding protein. The midpoint of the sigmoid curve is related to T_m of the protein. (d) Kinetic deactivation curve. For first-order deactivations, a plot of ln(activity) vs time yields a straight line with a slope of −k. The half-life can be calculated using the equation τ_1/2 = ln(2)/k and hence corresponds to the point (τ_1/2, −0.69) on the fitted line. (e) Protein precipitation experiment. The addition of a precipitant is negatively correlated with the solubility of the folded protein. The parameter β is protein-specific and characterizes the dependence of the solubility on the precipitant concentration. (f) Record from ultracentrifugation. In vitro translation followed by ultracentrifugation allows quantification of protein solubility independent of the proteostatic network of a living cell (the PURE system). The solubility percentage is calculated as the ratio of protein in the supernatant to the total protein measured by autoradiography. (60) Adapted with permission from ref (37). Copyright 2007 Elsevier.
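The first-order analysis described for the kinetic deactivation curve (Figure 2d) can be sketched in a few lines of Python. The snippet fits ln(activity) against time by ordinary least squares and converts the rate constant k into a half-life via ln(2)/k. The function names and the synthetic data are hypothetical and serve only as an illustration; real measurements would contain noise and may not follow first-order kinetics.

```python
import math


def fit_first_order(times, activities):
    """Least-squares fit of ln(activity) = ln(A0) - k * t.

    Returns the rate constant k (the negative slope) and the intercept ln(A0).
    """
    log_activities = [math.log(a) for a in activities]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(log_activities) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y)
              for t, y in zip(times, log_activities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    return -slope, intercept


def half_life(k):
    """Half-life of a first-order process: ln(2) / k."""
    return math.log(2) / k


# Hypothetical, noise-free residual activities generated with k = 0.05 per minute.
k_true = 0.05
times = [0, 10, 20, 30, 40, 60]
activities = [math.exp(-k_true * t) for t in times]

k_fit, ln_a0 = fit_first_order(times, activities)
t_half = half_life(k_fit)
print(f"k = {k_fit:.4f} per min, half-life = {t_half:.2f} min")
```

Comparing the fitted half-lives of a wild type and a variant measured at the same temperature gives the fold changes of the kind reported in Table 1.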

2.2. Experimental Determination of Protein Solubility

Protein solubility is a thermodynamic parameter defined as the concentration of folded protein in a saturated solution that is in equilibrium with a crystalline or amorphous solid phase under given conditions. (49) Two methods can be used to estimate protein solubility in aqueous solutions in vitro: (i) adding lyophilized protein to the solvent and (ii) concentrating a protein solution by ultrafiltration and then estimating the protein fractions in the supernatant and the pellet. Both methods require that the concentration of protein in solution is increased until saturation is reached, which can be difficult to achieve. (49) The difficulties of measuring protein solubility can be alleviated by adding an agent—a precipitant—to reduce the protein’s solubility. Precipitants may be salts, organic solvents, or long-chain polymers.

The term solubility can also be applied to the in vivo observable that describes protein expression quantitatively (expression yield) or qualitatively (soluble/insoluble). In addition to solubility as defined above, these two observables critically depend on the expressibility of a given enzyme inside the cell. (50,51) As a polypeptide is synthesized on the ribosome, the emerging chain enters the cell’s highly regulated proteostasis network, (29,35,52) which helps the enzyme attain its native-state structure. Protein folding does not rely on the random scanning of all accessible conformational states but follows a deterministic folding pathway (53,54) or multiple folding pathways. (55,56) Changes in the protein sequence can perturb such folding pathways, frequently diminishing the expressibility and solubility of an enzyme while increasing its aggregation propensity or promoting the formation of inclusion bodies. (8,9,57,58) One high-throughput in vivo experimental screening assay to test for properly folded enzyme variants is the Split-GFP system. (59) Alongside the determination of expression yields via the Bradford method and the quantification of cellular mRNA levels, the PURE system (60) might be a valuable experimental platform to investigate determinants of protein solubility and folding under in vitro conditions (Figure 2).

3. Theoretical Framework for the Design of Robust Proteins

3.1. Principles of Methods Based on Energy Calculations

In silico design of protein stability based on energy calculations has come a long way from fairly simple (61,62) to more accurate and versatile methods, facilitating reliable high-throughput predictions of thermodynamically and kinetically stable enzymes. (41,63) A force field is a collection of bonded and nonbonded interaction terms (64,65) that are related by a set of equations that can be used to estimate the potential energy of a molecular system. (66) For stability predictions, such potential energy functions can be applied to a protein’s structure to assess the energetic changes caused by the mutations. The most accurate but also the most computationally expensive methods are free energy methods, which rely on molecular dynamics (MD) or Metropolis Monte Carlo simulations. Free energy perturbation has proven to be a potent and rigorous alchemical approach that generates the most meaningful stability predictions, but only for a limited number of mutations. (67) Less accurate but considerably faster are end-point methods such as molecular mechanics generalized Born (68) or linear interaction energy. (69) These free energy methods require a high level of technical expertise and access to supercomputing facilities, which can be challenging for experimental groups. Over the last 20 years, simpler and simulation-independent stability predictors have been developed. A subdivision into three categories has been proposed, namely, (i) statistical effective energy functions (SEEFs), (ii) empirical effective energy functions (EEEFs), and (iii) physical effective energy functions (PEEFs). (70,71)

SEEFs are fast and can predict changes in stability over the entire sequence space of an average-sized enzyme in a matter of seconds. (72,73) They are derived from curated data sets of folded protein structures, which are projected into a number of stability descriptors. An effective potential can be extracted for every descriptor distribution, and these can be combined to create an overall energy function. (72,74) SEEFs do not explicitly model physical molecular interactions, and the exact physical nature of statistical potentials remains obscure. (71) Consequently, overlapping and double counting of terms relating to the same causative interactions should be avoided. (70) EEEFs include both physical and statistical terms, which are carefully weighted and parametrized to match experimental data. (70,71) The thermodynamic data used in their derivation typically originate from mutational experiments conducted under standard conditions, which can be obtained from databases such as ProTherm. (75−77) EEEFs provide a reasonable compromise between computational cost and accuracy of the free energy function. (78) A major drawback of EEEFs and SEEFs is that their applicability is restricted to the environmental conditions under which the experimental data used for parametrization were acquired. (79,80) PEEFs are closely related to classical molecular mechanics force fields (81,82) and allow a fundamental analysis of molecular interactions. (66) PEEFs have more complex mathematical formalisms (71) and higher computational costs than EEEFs. (70) However, they are versatile, accurate, and capable of predicting behavior of the enzymes under nonstandard conditions, for instance at elevated temperature, nonphysiological pH, or nonstandard salinity. (83)
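How an effective potential might be extracted from a descriptor distribution can be sketched with the inverse Boltzmann relation, E = −kT ln(P_obs/P_ref), which is one common way of building statistical potentials. It is an assumption that this particular relation applies to any specific tool discussed here. The function name, the pseudocount, the value of kT, and the counts below are all hypothetical.

```python
import math

KT_KCAL_PER_MOL = 0.593  # approximate kT (RT per mole) near 298 K; illustrative value


def inverse_boltzmann(observed, reference, kt=KT_KCAL_PER_MOL, pseudocount=1.0):
    """Convert per-bin counts into effective energies, E = -kT * ln(P_obs / P_ref).

    `observed` and `reference` hold counts for the same descriptor bins.
    The pseudocount guards against empty bins.
    """
    if len(observed) != len(reference):
        raise ValueError("observed and reference must have the same number of bins")
    obs = [count + pseudocount for count in observed]
    ref = [count + pseudocount for count in reference]
    obs_total = sum(obs)
    ref_total = sum(ref)
    return [
        -kt * math.log((o / obs_total) / (r / ref_total))
        for o, r in zip(obs, ref)
    ]


# Hypothetical counts of one residue pair in three distance bins,
# compared with counts from a reference (background) distribution.
observed_counts = [30, 50, 20]
reference_counts = [20, 40, 40]

energies = inverse_boltzmann(observed_counts, reference_counts, pseudocount=0.0)
for bin_index, energy in enumerate(energies):
    print(f"bin {bin_index}: {energy:+.3f} kcal/mol")
```

Bins that are over-represented relative to the reference receive negative (favorable) energies, and under-represented bins receive positive ones. Summing such terms over several descriptors is what makes double counting of correlated descriptors a concern.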

The accuracies of stability predictors based on such energy functions are still suboptimal (77,79,84−86) because of (i) imbalances in the force fields, (87,88) (ii) insufficient conformational sampling, (85,88) (iii) the occurrence of insoluble species, (8,9) and (iv) intrinsic problems with existing data sets (Table 2). The concept of free energy change upon mutation (ΔΔG) was introduced for a fundamental analysis of the causative factors leading to these deficits. The computation of ΔΔG is based on a thermodynamic cycle (Figure 3), which requires modeling of the folded states of both the wild type and the mutant as well as their unfolded states. (36,67) Contemporary force fields describe enthalpic interactions reasonably well, although they are known to overestimate hydrophobicity and tend to favor nonpolar substitutions. (6,9,89) EEEFs and PEEFs generally underestimate the stability of buried polar residues because they overestimate the energetic cost of unsatisfied salt bridges and hydrogen bonds in the protein core. (58,90,91) The estimation of both conformational and solvent-related entropy is imprecise (9,92) because of the necessity of using computationally less expensive terms. (83) The inability of force field methods to account for entropy-driven contributions can be mitigated by using hybrid methods that incorporate complementary evolution-based approaches. (45,47,92,93) Moreover, most stability predictors have been parametrized using single-point-mutation data sets, resulting in higher prediction errors upon application to multiple-point mutants. (69,94) Whenever epistatic effects (20) are present between two or more individual mutations, force field predictions deviate from experimental results.

Figure 3. Thermodynamic cycle used to compute the free energy change upon mutation (ΔΔG). ΔΔG is calculated according to the formula ΔΔG = ΔG_mut – ΔG_wt = ΔG_f – ΔG_u. For better illustration, the hypothetical folded and unfolded states of the wild type and a two-point mutant are shown. The respective substitution sites have been color-coded in black (wild type) and red (mutant). Adapted with permission from ref (69). Copyright 2012 Wiley.
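The identity given in the Figure 3 caption follows because free energy is a state function, so the two routes around the cycle must agree. The derivation below is a sketch: the labels G^F and G^U for the free energies of the folded and unfolded states are notation introduced here for illustration, and ΔG_wt and ΔG_mut are assumed to denote folding free energies.

```latex
\begin{align}
\Delta G_{\mathrm{wt}} &= G^{F}_{\mathrm{wt}} - G^{U}_{\mathrm{wt}}, &
\Delta G_{\mathrm{mut}} &= G^{F}_{\mathrm{mut}} - G^{U}_{\mathrm{mut}}, \\
\Delta G_{f} &= G^{F}_{\mathrm{mut}} - G^{F}_{\mathrm{wt}}, &
\Delta G_{u} &= G^{U}_{\mathrm{mut}} - G^{U}_{\mathrm{wt}},
\end{align}
\begin{align}
\Delta\Delta G &= \Delta G_{\mathrm{mut}} - \Delta G_{\mathrm{wt}} \\
 &= \left(G^{F}_{\mathrm{mut}} - G^{U}_{\mathrm{mut}}\right)
  - \left(G^{F}_{\mathrm{wt}} - G^{U}_{\mathrm{wt}}\right) \\
 &= \left(G^{F}_{\mathrm{mut}} - G^{F}_{\mathrm{wt}}\right)
  - \left(G^{U}_{\mathrm{mut}} - G^{U}_{\mathrm{wt}}\right)
  = \Delta G_{f} - \Delta G_{u}.
\end{align}
```

The practical benefit is that the horizontal legs (ΔG_f and ΔG_u, the mutation within one folding state) are usually easier to compute than the vertical folding legs. With this sign convention, a negative ΔΔG indicates a stabilizing mutation.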

This shortcoming can be attributed to insufficient conformational sampling of the mutant’s folded state, particularly when the introduced mutations induce large-scale backbone movements. (95) Tools based on EEEFs or PEEFs often apply rotamer libraries to fixed protein backbones, thereby reducing computational costs while providing comparable accuracies for the prediction of single-point mutations. (88) Multistate design (80,96) and flexible backbone sampling techniques (84,97−99) have partly alleviated the sampling problem for multiple-point substitutions by generating conformational ensembles and utilizing energetically more favorable conformations. Enzymes are intrinsically dynamic molecules and populate a large number of heterogeneous conformational substates (100) (Figure 1). Consequently, an adequate treatment of an enzyme’s conformational plasticity (96,97) in the folded states of the wild type and mutant may be crucial for further advances of these methods. Structures obtained by X-ray crystallography do not necessarily reflect the global energy minimum of the native state of an enzyme in its natural environment (101) and may therefore be nonideal starting points for stability predictions. (80,102) Besides the folded states, ΔΔG computations rely on sampling of the unfolded states of the wild type and the mutant. Simplified and less realistic models (random coil or tetrapeptide) are frequently employed for explicit computations of the unfolded-state energies. (68,69) Generally, it is assumed that the free energy of the unfolded state does not change much upon mutation. (68,84)

The aforementioned explanations primarily relate to the prediction of thermodynamic stability. Little work has been devoted to predicting kinetic stability, which can mostly be explained by the time-dependent nature (30) of this property and the limited time scales (103) accessible to energy-based methods. However, it is recognized that enhanced thermodynamic stability frequently goes hand in hand with enhanced kinetic stability. (41,45) One energy-based strategy to enhance the kinetic stability of an enzyme is to optimize solvent–solute interactions by introducing surface charges, (104) which can affect its expressibility. (105) The latter property may also be enhanced by computational linker design, (106) providing fusion enzymes with solubilizing protein tags.

3.2. Principles of Methods Based on Machine Learning

Machine learning is a field of computer science that allows computational systems to be constructed without being explicitly programmed. Statistical techniques are used to analyze training data sets and recognize patterns that might be difficult to detect given the limitations of human knowledge and cognitive abilities. Machine learning systems can be trained with or without supervision. In supervised approaches, the system is given a set of example inputs and the corresponding desired outputs in the form of labels indicating the correct classification of each input. Supervised approaches are suitable for training predictive systems, while unsupervised approaches are more suitable for tasks involving data clustering. In recent years, machine learning has become one of the most common approaches for predicting the effects of mutations on protein stability (107−109) and solubility. (57,110) Machine learning does not require full understanding of the mechanistic principles underpinning the target function because they are modeled during the learning process. An important advantage of machine learning methods is that they are very flexible because any characteristic extracted from the data can be used as a feature if it improves the prediction accuracy, i.e., minimizes the prediction error (Table 2). Consequently, machine learning methods can reveal previously unrecognized patterns, relationships, and dependencies that are not considered in knowledge-based models. Moreover, machine learning is much less time-intensive than other methods because once a model has been constructed using the available data, predictions can be obtained almost instantaneously.

The reliability of machine learning approaches depends on the size and quality of the training data set. The weights representing the relative importance of the individual features and the relationships between them are based on experimental observations. Consequently, it is essential to use high-quality experimental data with high consistency when training and testing machine learning methods. The size and balance of the training data set must also be considered carefully. A modest data set with only a few hundred or a few thousand cases might be too small to identify useful descriptors during the learning process. Additionally, lower diversity of the training data set leads to a greater risk that the prediction tool will lose its ability to generalize. In such cases, the weights assigned to individual descriptors might be influenced by over-representation of some descriptors in the training data, while other descriptors that might be very important for general predictive ability could be omitted. Unbalanced training data sets with large differences in the numbers of cases representing individual categories could also lead to erroneous overestimations. For example, a training data set in which 80% of the mutations are destabilizing would allow the predictor to classify most mutations as destabilizing because of the prevalence of such mutations during the learning process. Methods like support vector machines and random forests are known to be more resistant to overfitting caused by unbalanced data sets, (111−113) while standard neural networks and decision trees are particularly sensitive to them. If the data set is too small to be balanced, the problem can be partially addressed by using cost-sensitive matrices, (114) which penalize the predictor more strictly for misclassifying mutations that are sparsely represented in the training data.
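The effect of an unbalanced training set can be made concrete with a small sketch. It scores a trivial predictor that labels every mutation as destabilizing on a hypothetical 80/20 data set and contrasts plain accuracy with balanced accuracy and the Matthews correlation coefficient. It also computes inverse-frequency class weights, which are one simple heuristic in the spirit of cost-sensitive learning and not necessarily the scheme used in the cited work. All function names and data are illustrative assumptions.

```python
import math
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """Return (tp, fp, tn, fn) with `positive` treated as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    return (tp + tn) / (tp + fp + tn + fn)


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of sensitivity and specificity; an empty class contributes 0."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return (sensitivity + specificity) / 2


def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient; defined as 0 when the denominator is 0."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0


def inverse_frequency_weights(labels):
    """Per-class weights n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    return {cls: len(labels) / (len(counts) * n) for cls, n in counts.items()}


# Hypothetical data set: 80% destabilizing, 20% stabilizing mutations.
labels = ["destabilizing"] * 80 + ["stabilizing"] * 20
# Trivial predictor that always answers with the majority class.
predictions = ["destabilizing"] * 100

counts = confusion_counts(labels, predictions, positive="stabilizing")
acc = accuracy(*counts)
bal_acc = balanced_accuracy(*counts)
mcc = matthews_corrcoef(*counts)
weights = inverse_frequency_weights(labels)
print(acc, bal_acc, mcc, weights)
```

The trivial predictor reaches 80% accuracy while carrying no information about stabilizing mutations, which the balanced accuracy of 0.5 and the correlation coefficient of 0 make visible. The class weights show how misclassifying the rare class could be penalized more heavily during training.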

In parallel to the issue of the quality and availability of training data, one must address the problem of model validation. Ideally, the validation data set should be balanced and completely independent of the training set. In bioinformatics, it has become common to use k-fold cross-validation as a standard method for testing the performance of newly developed tools. This method entails randomly partitioning the original data set into k subsets. During the learning process, one of the k subsets is used for validation, while the remaining subsets are used as a training data set. This process is performed for each of the k subsets. The main reason for using cross-validation instead of splitting the data set into independent training and validation subsets is that the data set may be too small to support such splitting without harming the model’s ability to learn the important predictive patterns. However, the combination of unbalanced data sets with the random aspect of k-fold cross-validation increases the risk of serious overestimation. Therefore, cross-validation is not a reliable method for measuring model accuracy when lower-quality data sets are used. (115) In conclusion, machine learning is a powerful approach that can reveal unknown interactions that are poorly defined in current force fields (Table 2). However, great care must be taken when constructing the training data set and during validation to avoid overfitting and overestimation of the results.
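The partitioning step of k-fold cross-validation described above can be sketched as follows. This is an illustrative implementation using only the Python standard library; the function name, the fixed random seed, and the toy data set size are assumptions made for the example.

```python
import random

def k_fold_indices(n_items, k, seed=0):
    """Yield (train_indices, validation_indices) for each of k random folds."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)      # random partitioning of the data set
    folds = [indices[i::k] for i in range(k)]  # k subsets of near-equal size
    for i, validation in enumerate(folds):
        # All folds except the current one form the training set.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

splits = list(k_fold_indices(n_items=10, k=5))
```

Because the folds are drawn at random, an unbalanced data set can yield folds in which the minority class is nearly absent, and mutations of the same protein can end up in both the training and validation subsets. These are the two sources of overestimation that the text warns about.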

3.3. Principles of Methods Based on Phylogenetic Analysis

The two most widely used phylogeny-based approaches for stability engineering are consensus design (CD) and ancestral sequence reconstruction (ASR). Continuous cycles of variation and selection have created an enormous diversity of modern-day enzyme sequences that can be processed using phylogenetic techniques (Table 2). Over the last two decades, the advent of next-generation sequencing methods has revolutionized life science but has also introduced new challenges arising from the vast amounts of sequence data that are now available. (116) When phylogenetic analyses are performed, this results in a selection problem: one must carefully decide which sequences to include in any analysis. Identifying suitable homologous sequences to a given target can be particularly challenging. Local alignment algorithms such as the Basic Local Alignment Search Tool (BLAST) (117) offer reasonable accuracy at minimal computational cost. More complex and computationally demanding signature-based and profile-based search algorithms (118−120) have further extended the boundaries of homology detection (121) beyond the twilight zone. (122) The twilight zone is an alignment-length-dependent pairwise sequence identity range above which homologous sequences can reliably be distinguished. When pairwise sequence identities fall within or below this specific range, a large number of false positive sequences will get incorporated into multiple sequence alignments (MSAs). Great care is needed in the construction of biologically relevant MSAs from distantly related homologues. The treatment of nontrivial evolutionary artifacts such as indels, translocations, and inversions within the coding sequence can profoundly affect the quality of an MSA. (123,124) Progressive, iterative, and consistency-based alignment algorithms (125) exclusively consider sequence data and often introduce topological inconsistencies that require manual correction. (126) These deficiencies have been alleviated by incorporating complementary structural or evolutionary information, but such approaches can be computationally demanding. (25,126,127)

CD starts from a set of homologous protein sequences. A genuine MSA is generated using a small number (between a dozen and a few hundred) of homologous sequences, which permits the computation of the frequency distribution of every amino acid position in the alignment. (128) A user-specified conservation threshold is then used to distinguish between ambiguous and conserved “consensus” positions. The core assumption of this method is that the most frequent amino acid at a given position is more likely to be stabilizing. (128−133) It has been noted that high levels of sequence diversity in the MSA can interfere with the preservation of catalytic activity in consensus enzymes; this problem can be particularly acute when the MSA incorporates both prokaryotic and eukaryotic sequences. (129,134) However, the assumption of statistical independence is central to CD. Excessively homogeneous MSAs may violate this assumption, introducing phylogenetic bias that hinders the discovery of more thermostable proteins. (133) The proportions of neutral and destabilizing consensus mutations have been estimated to be 10 and 40%, respectively, among all characterized variants produced using consensus design to date, suggesting a need for a more focused selection of substitution sites. (128,132) To this end, Sullivan et al. (129) discarded mutations of residues with high statistical correlations to other positions in the MSA, thereby increasing the proportion of identified stabilizing mutations to 90%. Vazquez-Figueroa et al. (135) adopted a different approach, successfully using structural information (e.g., the distance between a possible mutation and the active site, secondary structure data, and the total number of intramolecular contacts) to complement traditional CD predictions. Another example of an effective structure-based CD approach involved the analysis of molecular fluctuations based on crystallographic B-factors. (136) Important drawbacks of CD are its inability to account for epistatic interactions (137,138) and an apparent phylogenetic bias in cases where the MSA is dominated by a few subfamilies. (130,139)
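The basic CD procedure (per-position frequencies, a conservation threshold, and back-to-consensus substitutions) can be sketched in a few lines. The function name, the toy alignment, and the threshold value are hypothetical. The sketch treats every column independently, so it deliberately inherits the limitations noted above: it ignores correlated positions and any subfamily bias in the MSA.

```python
from collections import Counter

def consensus_mutations(target, msa, threshold=0.5):
    """List back-to-consensus substitutions for an aligned target sequence.

    msa: aligned homologous sequences of equal length ('-' marks a gap).
    threshold: minimum frequency for a column to count as a consensus position.
    Positions are numbered along the alignment, starting from 1.
    """
    mutations = []
    for pos, target_res in enumerate(target):
        column = [seq[pos] for seq in msa if seq[pos] != "-"]
        if not column or target_res == "-":
            continue  # nothing to compare at gap-only columns or target gaps
        residue, count = Counter(column).most_common(1)[0]
        frequency = count / len(column)
        if frequency >= threshold and residue != target_res:
            mutations.append((f"{target_res}{pos + 1}{residue}", frequency))
    return mutations

# Toy alignment of five homologues and an aligned target sequence.
msa = ["MKLVA", "MKIVA", "MRLVA", "MKLVG", "MKLVA"]
target = "MRIVA"
suggested = consensus_mutations(target, msa, threshold=0.6)
```

For this toy input the sketch proposes R2K and I3L, each supported by 80% of the aligned sequences.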

ASR is a probabilistic method for inferring primordial enzymes and ancestral mutations, which have proven to be very effective for thermostability engineering. (43,44,46,140) ASR explores the deep evolutionary history of homologous sequences to reassemble a gene’s evolutionary trajectory. (138,141) As a starting point, a phylogenetic gene tree can be inferred from a manually curated MSA and a suitable evolutionary model using either the maximum-likelihood method (142,143) or Bayesian inference. (144) In the simplest case, such statistical inference methods derive parameters from the given MSA for the selected empirical evolutionary model, which defines the underlying amino acid substitution process. Once the gene phylogeny has been established, ancestral sequences corresponding to specific nodes of the tree can be computed, synthesized, overexpressed, and characterized in vitro. In addition to the difficulty of identifying and aligning legitimate sequences, (124) a major challenge encountered in ASR is the computation of a plausible phylogenetic tree that adequately explains the evolutionary relationships of the given sequences. Homogeneous evolutionary models assume that amino acid substitutions are homogeneously distributed over time and among sites and are therefore heavily oversimplified models of evolution. (145) Maximum-likelihood methods have been shown to systematically overestimate the thermodynamic stability of deeper ancestors, (140,146) so Bayesian inference methods have been recommended as alternatives to account for this bias. However, Bayesian inference computes ancestral sequences with considerably lower posterior probabilities, sometimes leading to the loss of the biological function. (147) It is not entirely clear why ASR is successful at identifying sequences with improved thermostability. (141) One hypothesis states that its success is an artifact of the ancestral inference methods and resembles a possible bias toward stabilizing consensus sequences. (140,146) Another plausible explanation is based on the thermophilic origin of primordial life. (148,149) Regardless of the reasons for its effectiveness, ASR is clearly a very robust and efficient method for identifying enzyme sequences with high thermodynamic stability and elevated expression yields (Table 2). Furthermore, increases in kinetic stability resulting in higher τ_1/2 have frequently been reported for ancestral enzymes in comparison with their extant forms. (140,150) The sequence context is maintained in the resurrected ancestral enzymes, enabling the conservation of historic mutations causing functionally important epistatic effects. (20,137,138) The fundamental drawbacks of ASR are that users must have considerable methodological skill and a good level of knowledge about the targeted gene family.

4. Data Sets and Software Tools for Designing Stable Proteins

4.1. Data Sets for Protein Stability

The accuracy and reliability of computational methods depend strongly on the size, structure, and quality of the chosen training and validation data sets. The primary source of validation data for protein stability is the ProTherm database. (75) ProTherm is the most extensive freely available database of thermodynamic parameters such as ΔΔG, ΔT_m, and ΔC_p. It currently contains almost 26 000 entries representing both single- and multiple-point mutants of 740 unique proteins. Although ProTherm is the most common source of stability data, it suffers from high redundancy and serious inconsistencies. Particularly troubling are differences in the pH values at which the thermodynamic parameters were determined, missing values, redundancies, and, strikingly, even disagreements about the signs of ΔΔG values. ProTherm also neglects the existence of intermediate states. (57,107) To overcome the problems of the ProTherm database, the data must be filtered and manually repaired to construct a reliable data set.

Several subsets of the ProTherm database have been developed (Table S1) and used widely to train and validate new prediction tools. The most popular is the freely available PopMuSiC data set, (151) which contains 2648 mutations extracted from the ProTherm database. The data set is unbalanced because only 568 of its mutations are classified as stabilizing or neutral, while 2080 are classified as destabilizing. Furthermore, 755 of its 2648 mutations have reported ΔΔG values in the interval ⟨−0.5, 0.5⟩. Mutations with such ΔΔG values cannot be considered either stabilizing or destabilizing because the average experimental error in ΔΔG measurements is 0.48 kcal/mol. (152) Additionally, the data extracted from ProTherm are insufficiently diverse: around 20% of the PopMuSiC data set comes from a single protein, and 10 proteins (of 131 represented in the data set) account for half of the available data. Inspection of the data reveals that mutations to more hydrophobic residues located on the surface of the protein tend to be stabilizing, whereas mutations that increase the hydrophilicity in the protein core are usually destabilizing. Consequently, most computational tools are likely to identify mutations that increase surface hydrophobicity as stabilizing even though such designs often fail because of poor protein solubility. (58)

Some predictive tools use alternative data sets derived from ProTherm or PopMuSiC for training and validation. The most common benchmarking data set utilized for independent tests is S350, (151) which contains 90 stabilizing and 260 destabilizing mutations in 67 unique proteins. However, this data set is still too small for comprehensive evaluation and remains unbalanced. The recently published PoPMuSiC^sym data set (153) tries to address these issues, containing 342 mutations inserted into 15 wild-type proteins and their inverse mutations inserted into the mutant proteins. A comparative study conducted using this data set showed a bias of the existing tools (Table S2) toward destabilizing mutations, as they performed significantly worse on the set of inverse mutations. Because of the overlaps of the mutations in training and validation data sets, the results of the individual tools can be overestimated. Even the new derivatives of the ProTherm database do not solve the problems arising from the size and structure of the available data. Therefore, there is an urgent need for new experimental data, particularly on the side of stabilizing mutations. Moreover, it would be of immense help for the future development of predictive tools to proceed with the standardization of the stability data, e.g., a unified definition of ΔΔG as a subtraction of the ΔG values for the mutant and the wild type. FireProt DB, a new publicly available database collecting carefully curated protein stability data, is being established at https://loschmidt.chemi.muni.cz/fireprotdb/.

Until the new unbiased data sets arise, a regular accuracy measure considering only the number of correctly predicted mutations from the testing set is not suitable for validation of the predictive tools. For binary classification, the Matthews correlation coefficient (MCC) can be utilized, as it was designed as a balanced measure that is usable even for data sets with a significant difference in the sizes of individual classes. (113) Similarly, when binary predictions are utilized as a filtration step in the hybrid approaches, metrics like sensitivity, specificity, and precision might be useful. When numerical measures are considered, the linear correlation between the predicted and experimental values can be estimated with the use of the Pearson correlation coefficient (PCC) and the average error established as the root-mean-square error (RMSE). Finally, the bias of the computational tools can be estimated as the sum of ΔΔG for the direct and inverse mutations according to Thiltgen and Goldstein. (94) Critical evaluation of the existing tools using the S350 data set revealed that the PCC ranges from 0.29 to 0.81 with an average RMSE of about 1.3 kcal/mol (Table S5).
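The evaluation measures named above can be written out explicitly. The following sketch uses only the standard library; the function names are illustrative, and averaging the direct and inverse ΔΔG sums over all mutation pairs is an assumption made here for convenience rather than a detail given in the text.

```python
import math

def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from a binary confusion matrix; returns 0.0 when it is undefined."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator if denominator else 0.0

def pearson(predicted, experimental):
    """Pearson correlation coefficient between predicted and measured values."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_e = sum(experimental) / n
    cov = sum((p - mean_p) * (e - mean_e) for p, e in zip(predicted, experimental))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_e = sum((e - mean_e) ** 2 for e in experimental)
    return cov / math.sqrt(var_p * var_e)

def rmse(predicted, experimental):
    """Root-mean-square error between predicted and measured values."""
    n = len(predicted)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(predicted, experimental)) / n)

def mean_bias(ddg_direct, ddg_inverse):
    """Average of ddG(direct) + ddG(inverse); zero for a perfectly symmetric tool."""
    pairs = list(zip(ddg_direct, ddg_inverse))
    return sum(d + i for d, i in pairs) / len(pairs)

# A predictor that labels every S350 mutation as destabilizing
# (positive class = stabilizing: 90 cases, negative class = destabilizing: 260 cases).
tp, fn = 0, 90
tn, fp = 260, 0
accuracy = (tp + tn) / (tp + tn + fp + fn)
balanced_score = matthews_corrcoef(tp, tn, fp, fn)
```

On the S350 class sizes, such a trivial predictor reaches an accuracy of about 0.74 while its MCC is 0, which illustrates why plain accuracy is misleading for unbalanced data sets.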

4.2. Software Tools for Predicting Protein Stability Based on Energy Calculations

Software tools relying on force field calculations are based on either modeling the physical bonds between atoms (PEEFs) or utilizing methods of mathematical statistics (SEEFs). Rosetta (88) is one of the most versatile software suites for macromolecular modeling and consists of several modules. Rosetta Design is a generally applicable module for protein design experiments that evaluates mutations and assigns them scores (in physically detached Rosetta energy units) reflecting their predicted stability. In its newest version, the Rosetta force field converts Rosetta energy units into well-interpretable ΔΔG values. (83) Furthermore, the stand-alone ddg_monomer module was built on top of Rosetta Design and is parametrized specifically for predicting ΔΔG values and protein stability. The Rosetta suite is also supplemented by a wide variety of usable force fields and protocols. The Eris software (154) is based on the Medusa force field and incorporates a side-chain packing algorithm and backbone relaxation method. A similar physical approach is adopted in the Concoord/Poisson–Boltzmann surface area (CC/PBSA) method, (155) which uses the GROMACS force field (156) to evaluate an ensemble of structures initially generated by the Concoord program. (157)

Unlike the previously mentioned methods, in which the values of the individual terms in the force field equation are evaluated by performing calculations based on Newtonian physics, some tools simply fit equations using values derived from the available data. One of the main representatives of this approach is PopMuSiC, (73) whose force field equation includes 13 physical and biochemical terms with values derived from databases of known protein structures. Similar approaches are used by other statistical and empirical tools, including FoldX (78) and Dmutant. (158) Another tool in this class is HotMuSiC, (159) which is based on PopMuSiC and was parametrized specifically for estimating ΔT_m, since the correlation coefficient between ΔΔG and ΔT_m is −0.7. (159) HotMuSiC makes predictions using five temperature-dependent potentials based exclusively on data extracted from mesostable and thermostable proteins.

While PEEFs provide generally more accurate predictions of the effect of mutations on protein stability, there is an apparent trade-off between predictive power and computational demands. In the majority of cases, SEEFs still perform fairly well compared with most machine learning methods and are orders of magnitude faster than PEEFs. Therefore, SEEFs seem to be an acceptable compromise between accuracy and time demands, especially when utilized as filters for prioritization of the mutations in hybrid workflows.

4.3. Software Tools for Predicting Protein Stability Based on Machine Learning

Machine learning methods do not require comprehensive knowledge of the physical forces governing protein structure; their predictions are based exclusively on the available data. The most popular machine learning tools are based on the support vector machines (e.g., EASE-MM, (107) MuStab, (108) I-Mutant, (160) and MuPro (161)) and random forest (e.g., ProMaya (162) and PROTS-RF (163)) methods, which are known to be comparatively resistant to overtraining even when used with unbalanced training data sets (Table S2). Neural networks are rarely used for protein stability engineering because of their high sensitivity to the quality and size of the training data set.

In recent years, several new machine learning approaches have been applied to diverse problems in the field of bioinformatics. Deep learning is used to predict the effects of mutations on human health in DANN (164) and to predict protein secondary structure in SSREDNs. (165) Unfortunately, like regular neural networks, deep learning methods are prone to overfitting because adding extra layers of abstraction increases their ability to model rare dependencies, resulting in a loss of generality. This shortcoming can be addressed by using regularization methods such as Ivakhnenko’s unit pruning. (166,167) However, this does not eliminate problems arising from inadequate training data sets because deep learning has very stringent data requirements. Consequently, deep-learning-based tools such as TopologyNet (168) still have very limited applicability in predicting protein stability.

The robustness and accuracy of computational tools can be increased by combining several machine learning approaches into a single multiagent system, as in the case of MAESTRO. (169) In MAESTRO, neural networks are combined with support vector machines, multiple linear regression, and statistical potentials. The outputs of the individual methods are then averaged to provide users with a single consensus prediction. In such tools, machine learning can be used to train the arbiter that decides how to combine the outputs of the individual methods and their weights, balancing the relative strengths of each method when applied to the type of mutation under consideration. This approach is widely used in metapredictors. (58)

It is difficult to compare individual tools on the basis of the results presented in the publications where they were first reported because most of them were validated using different data sets. This can bias a tool’s performance toward particular proteins or mutation types, causing its general prediction accuracy to be overestimated. Therefore, independent comparative studies are needed. The critical evaluations reported by Kellogg et al., (88) Potapov et al., (77) and Khan and Vihinen (170) revealed that methods based on PEEF calculations systematically outperform tools relying only on machine learning techniques or statistical potentials in independent tests. Furthermore, machine learning methods tend to be more biased, (153,171) and their reported accuracies are overestimated as a result of overtraining. The PCC upper bound for the most commonly used stabilization data sets is about 0.8, and the lower bound of the RMSE is 1 kcal/mol. (172) The applicability of machine learning methods will increase with the size and diversity of the available data in the future.

4.4. Software Tools for Predicting Protein Stability Based on Phylogenetics

Phylogeny-based methods do not require knowledge of high-resolution protein structures; they can be applied to any protein with a known amino acid sequence and a sufficiently high number of sequence homologues. However, although phylogeny-based methods often improve some protein characteristics, the influence of individual mutations manifested during evolution is uncertain. About 50% of all mutations identified by CD are stabilizing, but some may affect protein solubility rather than stability. (131) CD-based methods are therefore frequently utilized as filters during core calculations of hybrid workflows or as components of predictive tools for hotspot identification.

CD is available in several bioinformatics suites (e.g., EMBOSS, (173) 3DM, (25) VectorNTI, (174) and HotSpot Wizard (175)). Although there are no stand-alone tools for CD, there are several for ASR, some using maximum-likelihood methods (e.g., RAxML, (176) FastML, (177) and Ancestors (178)) and others using Bayesian inference (e.g., HandAlign (179) and MrBayes (180)). A major limitation of these methods is that most of the tools require users to upload their own MSA and phylogenetic tree. Constructing these input data is the most important and demanding step of the entire process. To obtain reliable predictions, the initial set of homologue sequences must be filtered to identify a reasonably sized subset of biologically relevant sequences. At present, sets of homologous sequences obtained using BLAST, (117) profile-based methods such as position-specific iterated BLAST, (118) or hidden Markov models (120,181) must be manually curated to ensure reliable ancestral reconstructions.

4.5. Software Tools for Predicting Protein Stability Based on Hybrid Approaches

Hybrid methods make predictions by combining information from several fundamentally different approaches. They offer greater robustness and reliability than individual tools, allowing multiple-point mutants to be designed while reducing the risk of combining mutations with antagonistic effects. Consequently, several research groups are focusing on hybrid methods in their efforts to improve the rational design of thermostable proteins.

The Framework for Rapid Enzyme Stabilization by Computational Libraries (FRESCO) (93) is available as a set of individual tools and scripts, and its use requires a good knowledge of bioinformatics. FRESCO initially selects a pool of potentially stabilizing mutations (FoldX or Rosetta energy cutoff of −5 kJ/mol) and also filters out all residues in close proximity (<10 Å) to active sites. Disulfide bridges are designed by dynamic disulfide discovery using snapshots from MD simulations and subsequently evaluated using a set of geometric criteria. An energy criterion for the maximal molecular mechanics energy of the disulfide bond was also adopted. Furthermore, very short MD simulations predict changes in backbone flexibility upon mutation to remove designs with unreasonable features that are expected to destabilize the protein. About a hundred of the single-point mutants are then subjected to experimental validation to select mutations to be included in the combined multiple-point mutant. Experimental validation of individual mutations greatly reduces the risk of false positives and maximizes the stabilization effect but requires a substantial investment of time and effort.

FireProt (45,89) combines energy- and evolution-based approaches in a fully automated process for designing thermostable multiple-point mutants (Figure 4). FireProt integrates 16 computational tools, utilizing both sequence and structural information in the prediction process. When the energy-based approach is applied, information extracted from the protein sequences (e.g., lists of conserved and correlated residues) is used to exclude potentially deleterious mutations, while structural information is used to obtain estimated ΔΔG values with both FoldX and Rosetta. The second approach is based on back-to-consensus analysis followed by energy filtration using FoldX. Finally, a distance-based graph algorithm is used to create a multiple-point mutant by selecting the most favorable mutually nonconflicting mutations from the pool of all potentially stabilizing mutations. A stand-alone version of FireProt (45) has been implemented as an intuitive web-based application, (89) making this complex modeling workflow accessible to a wide user community. The automation of the whole procedure eliminates the need to select, install, and evaluate tools, optimize their parameters, and interpret intermediate results.
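The final combination step can be pictured as selecting a set of mutually compatible nodes from a conflict graph. The sketch below is not FireProt's algorithm, whose details are not given here; it is a simple greedy heuristic that assumes two mutations conflict when their residues lie closer than a chosen distance cutoff. The mutation names, ΔΔG values, coordinates, and the 10 Å cutoff are all hypothetical.

```python
import math

def select_nonconflicting(mutations, min_distance=10.0):
    """Greedily pick the most favorable mutations that are spatially separated.

    mutations: list of (name, predicted_ddg, (x, y, z)) tuples, where a more
    negative predicted_ddg is assumed to mean a more stabilizing mutation.
    min_distance: assumed cutoff (in angstroms) below which two mutations conflict.
    """
    selected = []
    for name, ddg, coord in sorted(mutations, key=lambda m: m[1]):
        # Accept the candidate only if it has no conflict with earlier picks.
        if all(math.dist(coord, other[2]) >= min_distance for other in selected):
            selected.append((name, ddg, coord))
    return [name for name, _, _ in selected]

# Hypothetical candidates: the first two residues are only 3 angstroms apart.
candidates = [
    ("A12V", -2.0, (0.0, 0.0, 0.0)),
    ("L15F", -1.5, (3.0, 0.0, 0.0)),
    ("K80R", -1.0, (20.0, 0.0, 0.0)),
]
combined = select_nonconflicting(candidates)
```

With these inputs the heuristic keeps A12V and K80R and drops L15F, because L15F lies within the cutoff of a more favorable mutation that was already selected.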

Figure 4. Workflow of the protein thermostabilization platform FireProt. The hybrid method combines evolutionary- and energy-based approaches and designs stable multiple-point mutants by fundamentally different methods. (45) The user is offered three different designs, two based solely on the energy- and evolution-based approaches and a third combining all of the identified mutations. FireProt has been made available as a fully automated and user-friendly web application (89) and is free of charge for academic users at http://loschmidt.chemi.muni.cz/fireprot.

Protein Repair One-Stop Shop (PROSS) (47) is another automated web-based protein stabilization platform. The PROSS workflow begins with a Rosetta design calculation in which the amino acids constituting the protein’s active and binding sites are not eligible for mutation. A position-specific substitution matrix is analyzed to steer the design process away from amino acids that are rarely observed in the sequence homologues, (182) and Rosetta’s computational mutation scanning tool (183) is used to scan the remaining pool of potential amino acid mutations. Finally, Rosetta’s combinatorial sequence design tool is used to find an optimal combination of potentially stabilizing mutations, and an energy function is applied that favors amino acid identities on the basis of their frequency in the multiple-sequence alignment. This phylogeny-based biasing potential allows the designed variants to incorporate mutations found to be neutral or even slightly destabilizing in the Rosetta calculations, (35) which is desirable because some of these mutations might positively influence properties such as catalytic activity or protein solubility.

Hybrid methods represent a step forward in the prediction of protein stability because of their higher reliability at a decreased computational cost. These methods utilize evolution-based approaches as filters for removing potentially deleterious mutations in the conserved or correlated regions of the target protein. Furthermore, hybrid methods identify stabilizing mutations that would be missed by using only force field or phylogeny methods, as these two approaches are often complementary. (92) The increased robustness of the hybrid methods allows for a safer combination of single-point mutations into a multiple-point mutant. Hybrid methods can be further expanded by predictions of protein solubility or catalytic activity.

5. Data Sets and Software Tools for the Design of Soluble Proteins

5.1. Protein Solubility Data Sets

Protein solubility, aggregation propensity, and expressibility are complex properties governed by several distinct biophysical and biological mechanisms. Progress in understanding these mechanisms depends on the availability of large, high-quality, diverse experimental data sets. In addition, the performance of prediction methods must be assessed with respect to the data used during their training. It is therefore important to recognize the strengths and limitations of the available experimental data sets on protein solubility and expressibility. To this end, this section presents a comprehensive review of the data sets available at the time of writing (Table S3).

5.1.1. Protein Solubility Data Sets Based on Full-Length Proteins

Data sources of this type contain information on the solubility of entire proteins produced in a specific expression system, either in vitro using a cell-free expression system or in vivo. Solubility can be determined by separating the liquid component of a sample by centrifugation or filtration and measuring the protein content in a solution, which is normalized by the protein content in the unseparated sample. The normalization removes the relationship between the solubility value and varying protein expression level. Alternatively, proteins may be simply classified as soluble or insoluble.
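The normalization described above amounts to a simple ratio, sketched here for concreteness. The function names and the 0.5 cutoff used for the binary label are assumptions chosen for illustration; individual data sets define their own thresholds.

```python
def normalized_solubility(soluble_amount, total_amount):
    """Fraction of the expressed protein that remains in solution after separation."""
    if total_amount <= 0:
        raise ValueError("total protein content must be positive")
    return soluble_amount / total_amount

def solubility_label(solubility, threshold=0.5):
    """Binary label derived from the normalized value (threshold is an assumption)."""
    return "soluble" if solubility >= threshold else "insoluble"

# Two hypothetical proteins with very different expression levels
# but the same soluble fraction receive the same solubility value.
low_expression = normalized_solubility(soluble_amount=3.0, total_amount=12.0)
high_expression = normalized_solubility(soluble_amount=30.0, total_amount=120.0)
```

Both hypothetical proteins obtain a solubility of 0.25, which shows how the normalization decouples the solubility value from the expression level.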

The Solubility Database of E. coli Proteins (eSOL) (60) contains experimentally measured solubilities for over 4000 E. coli proteins. The solubilities were determined by expressing the proteins using the PURE cell-free expression system (184) and using ultracentrifugation to measure their solubility as the ratio of the protein content in the supernatant to the total protein content of the sample. The limitations of eSOL are that only a moderate number of proteins are represented and that all of them originate from E. coli. In addition, in vitro cell-free expression systems cannot reproduce the post-transcriptional molecular processes that occur during protein expression in vivo. Interestingly, adding the three main cytosolic E. coli chaperones (TF, DnaKJE, and GroEL/GroES) to the in vitro cell-free expression system reduced the number of insoluble proteins from 788 to 24. (185)

TargetTrack, (186) formerly PepcDB or TargetDB, integrates vast amounts of information from the Protein Structure Initiative, a large-scale structure determination project. It contains data from over 900 000 protein crystallization trials using almost 300 000 unique protein sequences, which are termed targets. The database is not focused on solubility, but target proteins can be considered soluble if they reached a particular state in the experimental trial. We note that strictly speaking, this parameter reflects both the expressibility and the solubility of the target proteins. The major drawback of this database is the low quality of the annotations. No reason for failure is recorded for most of the unsuccessful crystallization attempts. Moreover, the experimental protocols are described in free text with no common structure. Therefore, it is difficult to automatically extract information about the expression systems. As a result, the application of strict rules to the target annotations dramatically reduces the number of usable records.

The Northeast Structural Genomics Consortium (NESG) (187) database is a subset of TargetTrack containing data on 9644 targets analyzed between 2001 and 2008. The NESG database contains explicit data on protein expression and solubility levels based on uniform protein production in E. coli. Two integer scores are recorded for each target, indicating the protein’s level of expression and the recovery of the soluble fraction. The major drawback of this data set is that it was generated using outdated experimental methods; some of the targets could probably be solubilized using current techniques. Additionally, the database is too small to be used to train new machine learning algorithms. However, it can be used as a high-quality benchmark data set because its explicit experimental observations are more trustworthy than any other data in TargetTrack.

The Human Gene and Protein Database (HGPD) (188) contains expression and solubility measurements on over 9000 human proteins expressed in E. coli, a wheat-germ cell-free expression system, or Brevibacillus. The expression data were obtained using the Gateway system coupled with SDS-PAGE of C-terminal V5- or His-tagged proteins. Like the NESG data, these results originate from a uniform high-throughput protein production pipeline and thus constitute a consistent data set. Moreover, the HGPD provides information at the DNA level, so it includes codon composition data. Its major drawback is that it is focused exclusively on human proteins, so predictors constructed on the basis of its data will have an implicit bias toward human proteins.

AMYPdb (189) contains data on over 12 000 proteins belonging to amyloid precursor families as well as over 6000 generalized sequence patterns useful for assigning new sequences to poorly soluble amyloid precursor families. These data are derived from the literature and by UniProt and PROSITE mining, so they are useful only as training data and for concept verification; they are not suitable for performance validation. This database has not been updated since its release in 2008.

5.1.2. Protein Solubility Data Sets Based on Protein Fragments

Fragment databases often describe properties of short peptides and their tendency to aggregate when exposed to solvent. This tendency does not necessarily correlate with the peptide’s behavior when it is incorporated into a larger globular protein, in which case it may be protected by the formation of a hydrophobic core. Therefore, great care is necessary when using these databases as a basis for solubility prediction.

AmylHex and AmylFrag (190) are literature-based collections of nearly 200 short peptide sequences known to form amyloid fibrils. The major flaws of this database are its strong over-representation (51%) of point variants of a single amyloidogenic hexapeptide (STVIIE) and its low content of data on longer protein fragments.

WALTZ-DB (191) integrates data obtained from the literature and by in-house experimental verification on over 1000 hexapeptides tested for amyloidogenicity. As such, it is a unique resource containing primary experimental data. Of the peptides represented in the data set, 22% were found to be amyloidogenic and 78% were found to be non-amyloidogenic.

AmyLoad (192) combines data collected from WALTZ-DB, AmylHex, AmylFrag, the AGGRESCAN and TANGO validation data sets, and manual reviews of over 90 publications. The data set contains information on almost 1500 amyloidogenic and non-amyloidogenic protein fragments that have been characterized experimentally or computationally. About 30% of the fragments are considered amyloidogenic.

The Human Protein Atlas (HPA) (193) contains data on over 16 000 protein epitope signature tags (PrESTs) that were produced using a uniform E. coli production pipeline. PrESTs are substantial fragments of human proteins ranging from 20 to 150 amino acids. Their expression and solubility were measured and are quantified using integer scores ranging from 0 to 5.

The Curated Protein Aggregation Database (CPAD) (194) is an integrated database that includes data on almost 1700 amyloidogenic protein fragments and aggregation changes upon mutation. The fragments represented in the database include peptides with known and unknown structures, almost 100 verified aggregation-prone regions, and over 2300 aggregation rate changes upon mutation. The database represents a unique resource for validating the effect of mutations on protein aggregation. Unfortunately, it is poorly structured, and the data are not easily downloadable in a machine-friendly format.

5.1.3. Protein Solubility Data Sets Based on Mutants

The existing data sets containing information on protein variants with measured effects on protein solubility are very small and were constructed ad hoc by the authors of prediction software on the basis of literature data. Three representatives of this small group of solubility data sources are OptSolMut, (195) CamSol, (17) and PON-Sol. (57) OptSolMut contains binary solubility data on 137 protein variants, and the amounts of positive and negative samples are nearly balanced. CamSol contains data on 56 protein variants, of which only three are classified as reducing solubility. The PON-Sol data set contains 443 protein variants, of which 222 reportedly have no effect on protein solubility.
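The class balance of these data sets matters when reading accuracy figures. As a rough illustration, the sketch below computes the accuracy of a trivial predictor that always outputs the most frequent class, using only the counts quoted above. The function name is hypothetical, and for CamSol the 53 variants not classified as reducing solubility are assumed to form a single class.

```python
def majority_baseline(majority_count: int, total: int) -> float:
    """Accuracy of a predictor that always outputs the most frequent class."""
    if total <= 0 or not 0 <= majority_count <= total:
        raise ValueError("need total > 0 and 0 <= majority_count <= total")
    return majority_count / total


# Counts quoted in the text; the two-class grouping for CamSol is an assumption.
camsol_total, camsol_reducing = 56, 3
ponsol_total, ponsol_no_effect = 443, 222

camsol_baseline = majority_baseline(camsol_total - camsol_reducing, camsol_total)
ponsol_baseline = majority_baseline(ponsol_no_effect, ponsol_total)

print(f"CamSol majority-class baseline:  {camsol_baseline:.1%}")
print(f"PON-Sol majority-class baseline: {ponsol_baseline:.1%}")
```

Under these assumptions the baselines are about 94.6% for CamSol and 50.1% for PON-Sol, so a raw accuracy figure obtained on such data is informative only relative to the corresponding baseline.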

5.2. Software Tools for Predicting Protein Solubility

Unlike stability prediction tools, solubility prediction tools differ in their outputs rather than their fundamental operating principles. Almost all solubility prediction tools use some form of machine learning, ranging from simple statistical approaches to modern nonlinear methods such as support vector machines, random forests, or deep neural networks. The tools also use similar sets of features based on amino acid composition and physicochemical properties. Their outputs typically fall into one of three categories: (i) a single solubility score for the entire input sequence, (ii) a solubility profile with a unique score for each amino acid, or (iii) a score reflecting the effect of a specific mutation on solubility. All three outputs are expressed using arbitrary solubility scales with no physical meaning. The following section discusses the available predictive tools and their theoretical underpinnings and critically assesses their reliability (Table S4). Tools that predict single solubility scores for entire protein sequences are most useful for genomic projects because they can help prioritize protein sequences for laboratory production. Conversely, algorithms that provide quantitative scores over fixed-size sequence windows generate solubility profiles that can be used in the rational design of soluble proteins.

5.2.1. Software Tools for Protein Solubility Based on Primary Sequences

One of the first single-score solubility methods was the linear prediction model proposed by Wilkinson and Harrison, (196) which was later simplified by Davis and co-workers. (197) The revised model is surprisingly simple, using only two features (the approximate-charge average and turn-forming residue content) that both measure the relative abundance of specific amino acid types in the sequence. Despite its simplicity, the model can be useful for analyzing certain protein families. For example, it achieved a Spearman correlation coefficient of 0.54 and outperformed several newer tools in the same category (Table S4) when its predictions were compared to experimental data for 20 sequences closely related to a recently characterized haloalkane dehalogenase family. (4)
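To make the idea of composition-based features concrete, the following minimal sketch computes two quantities of the kind described above from a raw sequence. The residue groupings (N, G, P, S as turn-forming; R, K as positive; D, E as negative) are assumptions based on the commonly cited form of the model, and the published regression coefficients, offsets, and decision threshold are deliberately not reproduced here. The function name and the example sequence are hypothetical.

```python
from collections import Counter

TURN_FORMING = "NGPS"  # assumed residue set for the turn-forming fraction
POSITIVE = "RK"        # assumed positively charged residues
NEGATIVE = "DE"        # assumed negatively charged residues


def composition_features(sequence: str) -> tuple[float, float]:
    """Return (turn-forming fraction, net charge per residue) of a sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    counts = Counter(seq)
    n = len(seq)
    turn_fraction = sum(counts[aa] for aa in TURN_FORMING) / n
    net_charge = (
        sum(counts[aa] for aa in POSITIVE) - sum(counts[aa] for aa in NEGATIVE)
    ) / n
    return turn_fraction, net_charge


# Toy sequence for illustration only.
turn_fraction, net_charge = composition_features("MKRDEGSPNA")
print(turn_fraction, net_charge)
```

Both features depend only on amino acid counts, which is why a model of this kind is insensitive to residue order and to any structural context.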

SOLpro, (198) PROSO II, (199) ccSOL omics, (200) and DeepSol (201) use the TargetTrack database as the source of training data. Consequently, although they use different features and machine learning models, they are quite similar to one another and have many shared strengths and weaknesses. Their most significant drawback is that they do not focus on any one expression system because it is hard to automatically extract expression system data from TargetTrack. Therefore, when validating these tools on a set of proteins expressed in a single expression system (e.g., E. coli), the observed prediction performance might differ significantly from that reported by the tools’ creators. Published results suggest that DeepSol should have the highest prediction accuracy in general. However, this algorithm was created by using deep learning with a moderately sized training set and was validated against a data set representing protein families similar to those included in the training set. Moreover, although good performance is commonly claimed for tools based on TargetTrack, these claims have been strongly questioned. (199,201) In conclusion, the validation of these tools should be evaluated carefully, and further external validation using test sets independent of TargetTrack is needed. Unfortunately, the limitations of the TargetTrack database, from which solubility data can be extracted only via automated parsing, impose a strong performance limit on any tool that relies heavily on its data.

Periscope (202) attempts to predict soluble protein expression in the periplasm of E. coli rather than the cytosol. Although it was trained on a small data set, it was validated against an independent set of proteins and thus might be useful for predicting periplasmic expression in E. coli.

ESPRESSO (203) estimates protein expression and solubility in both cell-free (wheat germ) and in vivo (E. coli) expression systems. The system has three unique aspects. First, it is based on measured expression and solubility levels of human proteins from the HGPD and thus may be useful for production of human proteins in either of the two relevant expression systems. Second, it offers two types of prediction: property-based and motif-based. The former type resembles the predictions offered by the other machine learning tools in this category. In contrast, motif-based predictions identify positive and negative solubility motifs extracted from the training data. For each negative motif, ESPRESSO suggests point mutations that should turn the negative motif into a positive one, so the tool can be used for the rational design of soluble proteins. Third, ESPRESSO also uses DNA-level features in its property-based method. However, direct verification of its reported performance is currently complicated because the original training and testing data are unavailable.

SoluProt (204) is one of the latest additions to the family of solubility predictors. Its training set is based on the TargetTrack database, (186) which was carefully filtered to keep only targets expressed in E. coli. The negative and positive samples were balanced and equalized with respect to protein length. The independent validation set was derived from the NESG data set. (187) The current version of the tool uses a predictor based on a random forest regression model that employs 36 sequence-based features, including amino acid content, predicted disorder, α-helix and β-sheet content, sequence identity to the Protein Data Bank (PDB), and several aggregated physicochemical properties. SoluProt currently achieves a prediction accuracy of 58.2%, which exceeds that of other currently available tools, and is under active development. An intuitive web interface to the tool will soon be made available to the community at https://loschmidt.chemi.muni.cz/soluprot/.
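The balancing step mentioned above can be sketched as follows. This is not the SoluProt procedure, whose details are not given here; it is one simple way to equalize class counts within length bins by downsampling the larger class. The bin width, the random seed, and the function name are hypothetical choices.

```python
import random
from collections import defaultdict


def balance_by_length(records, bin_width=50, seed=0):
    """Downsample so each length bin holds equally many soluble and insoluble entries.

    records: iterable of (sequence, is_soluble) pairs.
    """
    rng = random.Random(seed)
    bins = defaultdict(lambda: {True: [], False: []})
    for sequence, is_soluble in records:
        bins[len(sequence) // bin_width][bool(is_soluble)].append((sequence, is_soluble))

    balanced = []
    for key in sorted(bins):
        positives, negatives = bins[key][True], bins[key][False]
        k = min(len(positives), len(negatives))  # size of the smaller class in this bin
        balanced.extend(rng.sample(positives, k))
        balanced.extend(rng.sample(negatives, k))
    return balanced


# Toy records: poly-alanine strings of different lengths with made-up labels.
records = [
    ("A" * 30, True), ("A" * 40, True), ("A" * 45, False),
    ("A" * 120, True), ("A" * 130, False), ("A" * 140, False),
]
balanced = balance_by_length(records)
print(len(balanced), sum(1 for _, soluble in balanced if soluble))
```

Balancing within length bins, rather than over the whole set, prevents a classifier from using sequence length alone as a proxy for the label.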

5.2.2. Software Tools for Predicting Protein Solubility Based on Sequence Profiles

A solubility profile is an abstract construct in which each residue of a given protein sequence is assigned a solubility score that contextually describes its relative contribution to the solubility of the protein as a whole. The solubility scores within a profile may represent aggregation rates or values on an arbitrary scale with no corresponding physical units. In either case, the highest scores represent solubility hotspots. Predictions based on such profiles must be interpreted with care because they rest on a hidden assumption: most profile-predicting methods are trained with data on short linear and unstructured peptides (Table S4), so there is an inherent assumption that the protein of interest is also at least partially unstructured. Therefore, these tools lack specificity when applied to natively folded globular proteins, in which many predicted low-solubility (or aggregation-prone) segments are stabilized by the interactions that maintain the protein’s secondary and tertiary structure. If the protein’s structure or a reasonable homology model is available, it is possible to compensate for these problems by applying structural corrections.
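The general shape of a profile calculation can be sketched as a sliding-window average of per-residue values. The example below uses the Kyte-Doolittle hydropathy scale purely as a stand-in; it is not the scale of any tool discussed in this section, and here high values mark hydrophobic segments rather than solubility. The window size, edge handling (truncated windows), and function name are assumptions made for illustration.

```python
# Kyte-Doolittle hydropathy values, used only as a stand-in per-residue scale.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def windowed_profile(sequence, scale, window=5):
    """Average a per-residue scale over a centered window; windows are truncated at the termini."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    values = [scale[aa] for aa in sequence.upper()]
    half = window // 2
    profile = []
    for i in range(len(values)):
        segment = values[max(0, i - half): min(len(values), i + half + 1)]
        profile.append(sum(segment) / len(segment))
    return profile


# Toy sequence: a hydrophobic stretch flanked by charged residues.
profile = windowed_profile("KKDEIVILVKKDE", KYTE_DOOLITTLE, window=5)
peak = profile.index(max(profile))
print(peak, round(profile[peak], 2))
```

A profile of this kind sees only local sequence context, which is exactly why buried segments of folded proteins can be flagged even though they are never exposed to solvent.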

There are several profile-based tools, most of which share at least some concepts and/or training data sets. Zyggregator (205,206) uses a model fitted to the measured aggregation rates of nearly 100 variants of 15 proteins mined from the scientific literature. AGGRESCAN (207) is based on data from a single-codon saturation mutagenesis study of amyloid β 42 protein, in which aggregation rates were measured for 20 protein variants. Because both methods are based on very small data sets, the authors took care to bolster their credibility by applying the models in several case studies.

TANGO, (208) WALTZ, (209) and PASTA (210) predict amyloid plaque formation propensity on the basis of data for short experimentally characterized peptides (mostly hexapeptides). TANGO is the most famous of these tools and has been cited hundreds of times. However, the models used by the newer tools WALTZ and PASTA were inferred from larger experimental data sets, so they are claimed to outperform TANGO. A common concern is that the data sets of amyloidogenic peptides are unbalanced, containing too few non-amyloidogenic fragments (Table S3), which limits the generalizability of predictions obtained with these tools.

BETASCAN, (211) FoldAmyloid, (212) ZipperDB, (213) and ArchCandy (214) learn from experimentally determined structures of amyloidogenic proteins and apply the discovered general concepts at the sequence level. BETASCAN calculates likelihood scores for potential β-strands and strand pairs in sequences based on correlations observed in parallel β-sheets of experimental structures. FoldAmyloid uses the number of contacts per residue and statistics on hydrogen bonds in nearly 4000 PDB structures. In ZipperDB, the input protein is threaded onto a template cross-β spine structure, and the relative threading energy is used to predict amyloidogenicity. ArchCandy evaluates whether a protein segment can fold into β-arcade structures, which are often disease-related, and uses an empirical scoring function to evaluate interactions that disrupt β-arcade formation. These structure-based tools are expected to be inherently more specific than sensitive because structure-derived criteria tend to be relatively strict. When a high sensitivity is required and a structure is available, methods based on short peptides are expected to be more sensitive than structure-based alternatives. It is possible to compensate for false positives by checking the tool’s output against known structures.

Because individual solubility prediction tools have different strengths and weaknesses, efforts have been made to create consensus-based methods that combine multiple tools to mitigate the weaknesses of individual tools while preserving their strengths. The advantages of consensus methods have been demonstrated both theoretically (215) and empirically. (216) Both AmylPred2 (217) and MetAmyl (218) implement 11 individual methods, including AGGRESCAN, TANGO, and WALTZ. Although the primary publication on AmylPred2 claims superior performance to all of the individual methods, these results should be treated with care because the consensus threshold was validated using the entire data set chosen by the developers. Consequently, there was no independent validation set, and the claimed performance is very likely to be overestimated. MetAmyl uses a specially developed peptide set derived from the WALTZ data set to establish a logistic regression model that integrates the outputs of the individual tools. An evaluation using the AmylPred2 data set indicated that MetAmyl outperformed AmylPred2 despite having been optimized with a different data set. (218) This strongly suggests that MetAmyl performs better than AmylPred2 in general.
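The simplest form of consensus is a per-residue vote with a fixed threshold, sketched below. This is an illustration of the general idea only; it is not the AmylPred2 or MetAmyl algorithm (MetAmyl, as noted above, integrates tool outputs through logistic regression). The tool names, the example calls, and the threshold are hypothetical.

```python
def consensus_calls(predictions: dict[str, list[bool]], min_votes: int) -> list[bool]:
    """Flag a residue when at least `min_votes` tools flag it."""
    if not predictions:
        raise ValueError("at least one tool output is required")
    if len({len(calls) for calls in predictions.values()}) != 1:
        raise ValueError("all tool outputs must cover the same residues")
    if not 1 <= min_votes <= len(predictions):
        raise ValueError("min_votes must lie between 1 and the number of tools")
    votes = [sum(column) for column in zip(*predictions.values())]
    return [v >= min_votes for v in votes]


# Hypothetical per-residue aggregation calls from three placeholder tools.
example = {
    "tool_a": [False, True, True, True, False, False],
    "tool_b": [False, False, True, True, True, False],
    "tool_c": [False, True, True, False, False, False],
}
calls = consensus_calls(example, min_votes=2)
print(calls)
```

The threshold is a tunable parameter, so it has to be chosen on data that are not reused for the final evaluation; otherwise the reported performance is optimistic, as discussed for AmylPred2.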

5.2.3. Software Tools for Protein Solubility Based on Mutations

While the profile-based tools discussed above can be used to design solubilizing mutations, the methods described in this section are tailored for this purpose and therefore are easier to use. Importantly, most of the methods discussed here require a protein structure as an additional input (Table S4).

OptSolMut (195) uses concepts from computational geometry to define a scoring function reflecting the changes in solubility due to mutations. The scoring function was optimized using linear programming on the basis of a set of protein variants extracted from the literature. The reported 81% overall accuracy should be treated with caution, as the training set was small and the model might not generalize well. In contrast to other tools in this section, OptSolMut is able to predict the effect of multiple-point mutations.

Several tools for predicting the effect of mutations on solubility have been developed from tools for predicting solubility profiles. For example, CamSol, (17) AGGRESCAN3D, (219) SolubiS, (220,221) and SODA (110) are based on the previously published profile-based methods Zyggregator, AGGRESCAN, TANGO, and PASTA, respectively. The workflows of these tools are all very similar: first a solubility profile is predicted, then a correction based on knowledge of the protein’s structure is applied, and finally solubility hotspots are identified and specific mutations targeting low-solubility regions are suggested. CamSol, AGGRESCAN3D, and SODA use structural corrections to refine the predicted solubility profiles by averaging physicochemical properties over residues proximal in three-dimensional space or on the basis of solvent exposure of individual residues. SolubiS uses free energy calculations based on the FoldX force field to avoid potentially destabilizing mutations in aggregation-prone regions and can thus be classified as a hybrid method (Figure 5). CamSol and SODA can make predictions even without structural data. However, this necessarily eliminates the potential to exploit structure-based corrections and thus tends to reduce the prediction accuracy. The main issue with all of these tools is in the difficulty of validating their output. The data sets available for both training and testing are small, and they have only been validated using data for a small number of experimentally characterized protein variants.
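One of the structural corrections described above, averaging over residues that are close in three-dimensional space, can be sketched as follows. This is a simplified illustration and not the formula used by CamSol, AGGRESCAN3D, or SODA: it uses an unweighted mean, a hypothetical 8 Å cutoff, one coordinate per residue, and no solvent-exposure term. The function name and the toy data are assumptions.

```python
import math


def spatially_averaged_scores(scores, coords, radius=8.0):
    """Replace each residue score by the mean over residues within `radius` (same units as coords)."""
    if len(scores) != len(coords):
        raise ValueError("scores and coords must have the same length")
    corrected = []
    for ci in coords:
        # The residue itself is always included (distance 0), so the list is never empty.
        neighbours = [s for s, cj in zip(scores, coords) if math.dist(ci, cj) <= radius]
        corrected.append(sum(neighbours) / len(neighbours))
    return corrected


# Toy input: three residues, two close together and one far away.
scores = [1.0, -1.0, 2.0]
coords = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (20.0, 0.0, 0.0)]
corrected = spatially_averaged_scores(scores, coords)
print(corrected)
```

Spatial averaging lets residues that are distant in sequence but adjacent in the fold influence each other, which a purely sequence-based window cannot capture.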

Figure 5. Workflow of the protein solubilization platform SolubiS. The platform uses free energy calculations performed with FoldX to avoid potentially destabilizing mutations in aggregation-prone regions identified by TANGO. The results are presented in the form of a mutant aggregation and stability spectrum plot. (220) The web server is free of charge for academic users at http://solubis.switchlab.org/.

PON-Sol (57) uses a machine learning algorithm designed from scratch for solubility prediction of protein variants from protein sequences without structure-based corrections. The reported accuracy of this three-class classification method is 43%. The training data set was rather limited, representing a few tens of proteins.

6. Perspectives

Protein Structures from Cryoelectron Microscopy and Hardware-Accelerated Calculations

Access to large and diverse data sets is a key factor in the development of new predictive methods and tools. Therefore, the applicability of force field methods to stability prediction is limited by the availability of relevant tertiary structures. At present, the PDB contains over 77 000 unique protein structures, and around 10 000 new structures are added each year. Advances in structural genomics will provide access to an additional large pool of protein structures, including previously unattainable structures of membrane-bound proteins that will be solved by cryogenic electron microscopy. A tertiary structure of a biomolecule of interest is typically required for predictions employing energy calculations. The general applicability of these methods is also hindered by their computational cost, which imposes a trade-off between accuracy and throughput. The most precise alchemical free energy calculations rely on MD simulations in which both the solute and solvent are modeled atomistically. Such calculations are too costly to be used in systematic mutagenesis campaigns with currently available computational resources. However, they could be selectively used to design mutations whose effects are poorly predicted by otherwise reliable Rosetta or FoldX calculations (e.g., substitutions that change the charge at the protein surface). Their high computational cost could be alleviated by running the calculations on graphics processing units (GPUs), support for which has not yet been implemented in a number of software tools. Wider use of GPUs will enable predictions of structures and complexes that are currently too large to process using computationally demanding physical force fields.

Consistent and Balanced Stability Data Sets Are Urgently Needed

Machine learning techniques are faster than force field methods and less dependent on the availability of tertiary structures because many features used in machine-learning-based predictors can be extracted from primary sequences. However, machine learning methods are very sensitive to the size and quality of the experimental data sets available for training and validation. At present, there is a serious lack of reliable experimental data suitable for use in protein stabilization efforts. The only available database—ProTherm—is burdened by errors and contains data on fewer than 2000 single-point mutations after rigorous filtering. This number is insufficient to train reliable machine learning systems without introducing a risk of overfitting. Moreover, the ProTherm database was most recently updated in February 2013, and several protein stabilization projects have been conducted since then. Systematic mining of the scientific literature to incorporate the stability data from these projects could provide valuable data resources for the training and validation of stability predictors. A new database, FireProt DB, is being established for this purpose at https://loschmidt.chemi.muni.cz/fireprotdb/. The research community should make an effort to establish validation procedures to assess the quality of predictions of protein stability and solubility. This could be done by releasing design challenges, but not experimental data, as in the well-known Critical Assessment of Protein Structure Prediction. Such a community-wide assessment is one of the most efficient ways to compare individual tools.

The Shift from Scores to Profiles and Specific Mutations in Solubility Predictions

The problem of unbalanced data sets also affects solubility predictors based on machine learning, especially those that use k-mer content and physicochemical properties as dominant features. The imbalance of the training data sets, which contain a larger number of negative samples, and the low diversity of protein structures limit the predictive performance and generalizability to unseen protein families. Over the short history of solubility prediction, there has been a significant and positive shift away from methods that provide single solubility scores toward alternatives that offer more detailed solubility profile predictions and even suggest mutations predicted to enhance protein solubility. However, this trend also poses problems because the quantity of relevant high-quality data decreases as the detail of the predictions increases. For single solubility score predictions, the TargetTrack database (which contains information on tens of thousands of samples) is large enough to support the development of machine learning models. For solubility profile predictions, the number of relevant samples decreases to hundreds or thousands, most of which are amyloidogenic peptides. Matters are worse still for attempts to predict the effect of mutations on protein solubility; in this case, the amount of relevant experimental data is arguably below the minimum needed to make adequate predictions. Therefore, mathematical models developed by machine learning frequently incorporate empirical components such as structure-based corrections. A mechanistic understanding of protein solubility justified by robust statistical analysis can only be expected once larger sets of experimental data become available.

High-Throughput Techniques for Highly Consistent Data Sets

We envisage that the lack of appropriate data for solubility prediction will be partially addressed by studies using novel high-throughput characterization techniques such as droplet microfluidics, fluorescence-activated cell sorting, fluorescence resonance energy transfer, deep sequencing, and deep mutational scanning. Experiments should be conducted under strictly controlled conditions to produce robust data and could employ one or more of the biomolecular and cellular systems that have recently been developed to monitor protein solubility and aggregation inside living cells. Additional high-quality data could be obtained from projects conducted by companies and other private organizations. The data generated under defined conditions need to be properly annotated, for example to report vectors, host organisms, buffers, laboratory conditions, and procedures used for protein expression, purification, and characterization. Proper controls should always be included and the statistics reported to allow a quantitative assessment of data variation. Collected data should be structured to allow automated processing, which is not the case, for example, for the largest database of protein solubility data, TargetTrack. The data should be curated and stored in publicly accessible databases following the FAIR principles: Findable, Accessible, Interoperable, and Reusable. New data sets will enable the use of more sophisticated and data-intensive methods such as deep learning and allow proper external validation to be performed. Moreover, because solubility depends largely on the properties of the protein’s surface, corrections based on protein structure and the inclusion of structural data in predictive tools could improve the prediction accuracy. Enhanced-sampling MD simulations of simplified molecular systems might reveal residue interactions that are important for protein folding, while advances in homology modeling and threading can complement sequence-based descriptors by providing structural information at a reasonable computational cost.

Robust Scaffolds for Directed Evolution by Phylogenetic Analyses

Whereas force field and machine learning methods are limited by a lack of data, the problem for phylogenetic approaches is different: high-throughput sequencing has made vast numbers of sequences available, allowing evolutionary analyses to be performed for the vast majority of protein families. The genomes of organisms living under extreme conditions are also becoming available, providing essential information for wider use of CD. This rapid expansion of the accessible sequence space has a downside for the ASR method, which can only use a limited number of homologous sequences for reconstruction. Therefore, large pools of potential homologues make sequence selection a challenging task. Homologue selection can be guided by annotation ontologies (e.g., molecular function, cellular component, and biological process) and other information from bioinformatics and biophysical databases. Furthermore, with increasing numbers of solved protein structures, structure-guided MSAs may displace sequence-based alternatives, and ASR may be more commonly used to generate robust scaffolds for directed evolution campaigns and de novo enzyme design. The degree of uncertainty in ASR increases the further back we go in evolutionary history. Therefore, the reliability of inference methods should be increased to more accurately predict folded, stable, and soluble ancestral proteins.

Addressing Stability–Activity Trade-Offs Using Metadata and Negative and Multistate Designs

The predictive power of computational methods has improved in recent years, with a positive impact mainly in the area of protein stabilization. A very challenging but important task is to predict thermodynamic as well as kinetic stability. There are several spectacular examples illustrating the improvement in kinetic stability by only a few mutations, but to the best of our knowledge, methods specifically targeting kinetic stability have not been developed. Connecting the design of kinetic stability with solubility within a single method could be particularly powerful. Stability–activity trade-offs are intrinsic to protein structures. Buried polar catalytic residues are suboptimal with respect to protein stability, and structural optimization of these functionally relevant regions is likely to also affect the biological activity. Mutations that stabilize regions whose conformational dynamics are important for enzyme activity can similarly be expected to negatively affect the catalytic performance. The incorporation of metadata and smart filters into engineering workflows will help preserve protein activity by enabling the identification of structurally and functionally important residues, which should be systematically excluded from mutagenesis. The incorporation of such negative designs will suppress misfolding and protein aggregation. Furthermore, prediction accuracy is sometimes compromised by using a single structure in calculations. Increasing computational power and the use of GPU hardware will allow the adoption of multistate designs. Extracting multiple representative conformations and averaging results over the ensemble will further improve the robustness and accuracy of predictions.

Enhancing Accuracy by Using Metapredictors, Consensual Force Fields, and Hybrid Methods

There is a clear trend toward combining multiple fundamentally different methods within single predictors, leading to the development of metapredictors, consensual force fields, and hybrid methods. Hybrid methods offer several advantages: (i) even a simple majority voting approach over several methods yields better results than any individual method, each of which has its own strengths and weaknesses; (ii) smart filtering out of “untouchable” residues reduces the time required for calculations to a degree that permits very thorough analysis of the designable residues; (iii) the phylogenetic components of hybrid methods can incorporate both positive and negative design elements; and (iv) the availability of reliable predictions will enable the combination of substitutions to create multiple-point mutants without risking the introduction of destabilizing or antagonistic effects. Hybrid methods represent a natural step forward in the rapidly evolving field of protein stability prediction because improvements in machine learning models are limited by the availability of adequate data sets, while the application of advanced force field methods is restrained by their computational cost. It was recently demonstrated that combining phylogenetic methods and atomistic force fields can effectively optimize stability–activity trade-offs. We also envisage the future enrichment of protein stabilization methods addressing both thermodynamic and kinetic stability with tools for predicting protein solubility, aggregation propensity, and expressibility, eventually yielding all-in-one software suites capable of designing “ideal” biocatalysts.

Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acscatal.8b03613.

Data sets for prediction of protein stability (Table S1); software tools for prediction of protein stability (Table S2); data sets for prediction of protein solubility (Table S3); software tools for prediction of protein solubility (Table S4); comparison of the existing tools with the S350 data set (Table S5) (PDF)

Author Information

Corresponding Author
- Jiri Damborsky - Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic; http://orcid.org/0000-0002-7848-8216; Email: [email protected]
Authors
- Milos Musil - Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic; IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, 612 66 Brno, Czech Republic; International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Hannes Konegger - Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- Jiri Hon - Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic; IT4Innovations Centre of Excellence, Faculty of Information Technology, Brno University of Technology, 612 66 Brno, Czech Republic; International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
- David Bednar - Loschmidt Laboratories, Centre for Toxic Compounds in the Environment (RECETOX), and Department of Experimental Biology, Faculty of Science, Masaryk University, 625 00 Brno, Czech Republic; International Clinical Research Center, St. Anne’s University Hospital, Pekarska 53, 656 91 Brno, Czech Republic
Notes
The authors declare no competing financial interest.

Acknowledgments

The authors thank the Czech Ministry of Education (LM2015051, LM2015047, LM2015055, CZ.02.1.01/0.0/0.0/16_013/0001761, CZ.02.1.01/0.0/0.0/16_019/0000868, and CZ.02.1.01/0.0/0.0/16_026/0008451) and the European Commission (720776 and 722610) for financial support. H.K. is the MSCA ITN ES-Cat Research Fellow supported by the European Commission (722610). The work of M.M. and J.H. was supported by the ICT Tools, Methods and Technologies for Smart Cities Project of the Brno University of Technology (FIT-S-17-3964).

References

This article references 221 other publications.

1
Choi, J.-M.; Han, S.-S.; Kim, H.-S. Industrial Applications of Enzyme Biocatalysis: Current Status and Future Aspects. Biotechnol. Adv. 2015, 33, 1443–1454, DOI: 10.1016/j.biotechadv.2015.02.014

2
Mitchell, A. C.; Briquez, P. S.; Hubbell, J. A.; Cochran, J. R. Engineering Growth Factors for Regenerative Medicine Applications. Acta Biomater. 2016, 30, 1–12, DOI: 10.1016/j.actbio.2015.11.007

3
Dvořák, P.; Nikel, P. I.; Damborský, J.; de Lorenzo, V. Bioremediation 3.0: Engineering Pollutant-Removing Bacteria in the Times of Systemic Biology. Biotechnol. Adv. 2017, 35, 845–866, DOI: 10.1016/j.biotechadv.2017.08.001

4
Vanacek, P.; Sebestova, E.; Babkova, P.; Bidmanova, S.; Daniel, L.; Dvorak, P.; Stepankova, V.; Chaloupkova, R.; Brezovsky, J.; Prokop, Z.; Damborsky, J. Exploration of Enzyme Diversity by Integrating Bioinformatics with Expression Analysis and Biochemical Characterization. ACS Catal. 2018, 8, 2402–2412, DOI: 10.1021/acscatal.7b03523

5
Bornscheuer, U. T.; Huisman, G. W.; Kazlauskas, R. J.; Lutz, S.; Moore, J. C.; Robins, K. Engineering the Third Wave of Biocatalysis. Nature 2012, 485, 185–194, DOI: 10.1038/nature11117

6
Tokuriki, N.; Stricher, F.; Serrano, L.; Tawfik, D. S. How Protein Stability and New Functions Trade Off. PLoS Comput. Biol. 2008, 4, e1000002, DOI: 10.1371/journal.pcbi.1000002

7
Dellus-Gur, E.; Toth-Petroczy, A.; Elias, M.; Tawfik, D. S. What Makes a Protein Fold Amenable to Functional Innovation? Fold Polarity and Stability Trade-Offs. J. Mol. Biol. 2013, 425, 2609–2621, DOI: 10.1016/j.jmb.2013.03.033

8
Johansson, K. E.; Johansen, N. T.; Christensen, S.; Horowitz, S.; Bardwell, J. C. A.; Olsen, J. G.; Willemoës, M.; Lindorff-Larsen, K.; Ferkinghoff-Borg, J.; Hamelryck, T.; Winther, J. R. Computational Redesign of Thioredoxin Is Hypersensitive toward Minor Conformational Changes in the Backbone Template. J. Mol. Biol. 2016, 428, 4361–4377, DOI: 10.1016/j.jmb.2016.09.013

9
Arabnejad, H.; Dal Lago, M.; Jekel, P. A.; Floor, R. J.; Thunnissen, A.-M. W. H.; Terwisscha van Scheltinga, A. C.; Wijma, H. J.; Janssen, D. B. A Robust Cosolvent-Compatible Halohydrin Dehalogenase by Computational Library Design. Protein Eng., Des. Sel. 2017, 30, 175–189, DOI: 10.1093/protein/gzw068

10
Wyganowski, K. T.; Kaltenbach, M.; Tokuriki, N. GroEL/ES Buffering and Compensatory Mutations Promote Protein Evolution by Stabilizing Folding Intermediates. J. Mol. Biol. 2013, 425, 3403–3414, DOI: 10.1016/j.jmb.2013.06.028

11
Lawrence, P. B.; Gavrilov, Y.; Matthews, S. S.; Langlois, M. I.; Shental-Bechor, D.; Greenblatt, H. M.; Pandey, B. K.; Smith, M. S.; Paxman, R.; Torgerson, C. D.; Merrell, J. P.; Ritz, C. C.; Prigozhin, M. B.; Levy, Y.; Price, J. L. Criteria for Selecting PEGylation Sites on Proteins for Higher Thermodynamic and Proteolytic Stability. J. Am. Chem. Soc. 2014, 136, 17547–17560, DOI: 10.1021/ja5095183

12
Rueda, N.; Dos Santos, J. C. S.; Ortiz, C.; Torres, R.; Barbosa, O.; Rodrigues, R. C.; Berenguer-Murcia, Á.; Fernandez-Lafuente, R. Chemical Modification in the Design of Immobilized Enzyme Biocatalysts: Drawbacks and Opportunities. Chem. Rec. 2016, 16, 1436–1455, DOI: 10.1002/tcr.201600007

13
Stepankova, V.; Bidmanova, S.; Koudelakova, T.; Prokop, Z.; Chaloupkova, R.; Damborsky, J. Strategies for Stabilization of Enzymes in Organic Solvents. ACS Catal. 2013, 3, 2823–2836, DOI: 10.1021/cs400684x

14
Butt, T. R.; Edavettal, S. C.; Hall, J. P.; Mattern, M. R. SUMO Fusion Technology for Difficult-to-Express Proteins. Protein Expression Purif. 2005, 43, 1–9, DOI: 10.1016/j.pep.2005.03.016

15
LaVallie, E. R.; DiBlasio, E. A.; Kovacic, S.; Grant, K. L.; Schendel, P. F.; McCoy, J. M. A Thioredoxin Gene Fusion Expression System That Circumvents Inclusion Body Formation in the E. coli Cytoplasm. Nat. Biotechnol. 1993, 11, 187–193, DOI: 10.1038/nbt0293-187

15
A thioredoxin gene fusion expression system that circumvents inclusion body formation in the E. coli cytoplasm

LaVallie, Edward R.; DiBlasio, Elizabeth A.; Kovacic, Sharlotte; Grant, Kathleen L.; Schendel, Paul F.; McCoy, John M.

Bio/Technology (1993), 11 (2), 187-93. CODEN: BTCHDA; ISSN: 0733-222X.

A versatile Escherichia coli expression system was developed based on the use of E. coli thioredoxin (trxA) as a gene fusion partner. The broad utility of the system is illustrated by the prodn. of a variety of mammalian cytokines and growth factors as thioredoxin fusion proteins. Although many of these cytokines previously have been produced in E. coli as insol. aggregates or inclusion bodies, as thioredoxin fusions they can be made in sol. forms that are biol. active. In general, linkage to thioredoxin dramatically increases the soly. of heterologous proteins synthesized in the E. coli cytoplasm, and thioredoxin fusion proteins usually accumulate to high levels. Two addnl. properties of E. coli thioredoxin, its ability to be specifically released from the E. coli cytoplasm by osmotic shock or freeze/thaw treatments and its intrinsic thermal stability, are retained by some fusions and provide convenient purifn. steps. The active-site loop of E. coli thioredoxin can be used as a general site for small peptide insertions, allowing for the high level prodn. of sol. peptides in the E. coli cytoplasm.

16
Bloom, J. D.; Labthavikul, S. T.; Otey, C. R.; Arnold, F. H. Protein Stability Promotes Evolvability. Proc. Natl. Acad. Sci. U. S. A. 2006, 103, 5869–5874, DOI: 10.1073/pnas.0510098103

16
Protein stability promotes evolvability

Bloom, Jesse D.; Labthavikul, Sy T.; Otey, Christopher R.; Arnold, Frances H.

Proceedings of the National Academy of Sciences of the United States of America (2006), 103 (15), 5869-5874. CODEN: PNASA6; ISSN: 0027-8424. (National Academy of Sciences)

The biophys. properties that enable proteins to so readily evolve to perform diverse biochem. tasks are largely unknown. Here, we show that a protein's capacity to evolve is enhanced by the mutational robustness conferred by extra stability. We use simulations with model lattice proteins to demonstrate how extra stability increases evolvability by allowing a protein to accept a wider range of beneficial mutations while still folding to its native structure. We confirm this view exptl. by mutating marginally stable and thermostable variants of cytochrome P450 BM3. Mutants of the stabilized parent were more likely to exhibit new or improved functions. Only the stabilized P450 parent could tolerate the highly destabilizing mutations needed to confer novel activities such as hydroxylating the antiinflammatory drug naproxen. Our work establishes a crucial link between protein stability and evolution. We show that we can exploit this link to discover protein functions, and we suggest how natural evolution might do the same.

17
Sormanni, P.; Aprile, F. A.; Vendruscolo, M. The CamSol Method of Rational Design of Protein Mutants with Enhanced Solubility. J. Mol. Biol. 2015, 427, 478–490, DOI: 10.1016/j.jmb.2014.09.026

17
The CamSol Method of Rational Design of Protein Mutants with Enhanced Solubility

Sormanni, Pietro; Aprile, Francesco A.; Vendruscolo, Michele

Journal of Molecular Biology (2015), 427 (2), 478-490. CODEN: JMOBAK; ISSN: 0022-2836. (Elsevier Ltd.)

Protein soly. is often an essential requirement in biotechnol. and biomedical applications. Great advances in understanding the principles that det. this specific property of proteins have been made during the past decade, in particular concerning the physicochem. characteristics of their constituent amino acids. By exploiting these advances, we present the CamSol method for the rational design of protein variants with enhanced soly. The method works by performing a rapid computational screening of tens of thousands of mutations to identify those with the greatest impact on the soly. of the target protein while maintaining its native state and biol. activity. The application to a single-domain antibody that targets the Alzheimer's Aβ peptide demonstrates that the method predicts with great accuracy soly. changes upon mutation, thus offering a cost-effective strategy to help the prodn. of sol. proteins for academic and industrial purposes.

18
Ganesan, A.; Siekierska, A.; Beerten, J.; Brams, M.; Van Durme, J.; De Baets, G.; Van der Kant, R.; Gallardo, R.; Ramakers, M.; Langenberg, T.; Wilkinson, H.; De Smet, F.; Ulens, C.; Rousseau, F.; Schymkowitz, J. Structural Hot Spots for the Solubility of Globular Proteins. Nat. Commun. 2016, 7, 10816, DOI: 10.1038/ncomms10816

18
Structural hot spots for the solubility of globular proteins

Ganesan, Ashok; Siekierska, Aleksandra; Beerten, Jacinte; Brams, Marijke; Van Durme, Joost; De Baets, Greet; Van der Kant, Rob; Gallardo, Rodrigo; Ramakers, Meine; Langenberg, Tobias; Wilkinson, Hannah; De Smet, Frederik; Ulens, Chris; Rousseau, Frederic; Schymkowitz, Joost

Nature Communications (2016), 7, 10816. CODEN: NCAOBW; ISSN: 2041-1723. (Nature Publishing Group)

Natural selection shapes protein soly. to physiol. requirements and recombinant applications that require higher protein concns. are often problematic. This raises the question whether the soly. of natural protein sequences can be improved. We here show an anti-correlation between the no. of aggregation prone regions (APRs) in a protein sequence and its soly., suggesting that mutational suppression of APRs provides a simple strategy to increase protein soly. We show that mutations at specific positions within a protein structure can act as APR suppressors without affecting protein stability. These hot spots for protein soly. are both structure and sequence dependent but can be computationally predicted. We demonstrate this by reducing the aggregation of human α-galactosidase and protective antigen of Bacillus anthracis through mutation. Our results indicate that many proteins possess hot spots allowing to adapt protein soly. independently of structure and function.

19
Zeymer, C.; Hilvert, D. Directed Evolution of Protein Catalysts. Annu. Rev. Biochem. 2018, 87, 131–157, DOI: 10.1146/annurev-biochem-062917-012034

19
Directed Evolution of Protein Catalysts

Zeymer, Cathleen; Hilvert, Donald

Annual Review of Biochemistry (2018), 87, 131-157. CODEN: ARBOAW; ISSN: 0066-4154. (Annual Reviews)

A review. Directed evolution is a powerful technique for generating tailor-made enzymes for a wide range of biocatalytic applications. Following the principles of natural evolution, iterative cycles of mutagenesis and screening or selection are applied to modify protein properties, enhance catalytic activities, or develop completely new protein catalysts for non-natural chem. transformations. This review briefly surveys the exptl. methods used to generate genetic diversity and screen or select for improved enzyme variants. Emphasis is placed on a key challenge, namely how to generate novel catalytic activities that expand the scope of natural reactions. Two particularly effective strategies, exploiting catalytic promiscuity and rational design, are illustrated by representative examples of successfully evolved enzymes. Opportunities for extending these approaches to more complex biocatalytic systems are also considered.

20
Starr, T. N.; Thornton, J. W. Epistasis in Protein Evolution. Protein Sci. 2016, 25, 1204–1218, DOI: 10.1002/pro.2897

20
Epistasis in protein evolution

Starr, Tyler N.; Thornton, Joseph W.

Protein Science (2016), 25 (7), 1204-1218. CODEN: PRCIEI; ISSN: 1469-896X. (Wiley-Blackwell)

The structure, function, and evolution of proteins depend on phys. and genetic interactions among amino acids. Recent studies have used new strategies to explore the prevalence, biochem. mechanisms, and evolutionary implications of these interactions (called epistasis) within proteins. Here we describe an emerging picture of pervasive epistasis in which the phys. and biol. effects of mutations change over the course of evolution in a lineage-specific fashion. Epistasis can restrict the trajectories available to an evolving protein or open new paths to sequences and functions that would otherwise have been inaccessible. We describe two broad classes of epistatic interactions, which arise from different phys. mechanisms and have different effects on evolutionary processes. Specific epistasis, in which one mutation influences the phenotypic effect of few other mutations, is caused by direct and indirect phys. interactions between mutations, which nonadditively change the protein's phys. properties, such as conformation, stability, or affinity for ligands. In contrast, nonspecific epistasis describes mutations that modify the effect of many others; these typically behave additively with respect to the phys. properties of a protein but exhibit epistasis because of a nonlinear relationship between the phys. properties and their biol. effects, such as function or fitness. Both types of interaction are rampant, but specific epistasis has stronger effects on the rate and outcomes of evolution, because it imposes stricter constraints and modulates evolutionary potential more dramatically; it therefore makes evolution more contingent on low-probability historical events and leaves stronger marks on the sequences, structures, and functions of protein families.

21
Goldsmith, M.; Tawfik, D. S. Enzyme Engineering: Reaching the Maximal Catalytic Efficiency Peak. Curr. Opin. Struct. Biol. 2017, 47, 140–150, DOI: 10.1016/j.sbi.2017.09.002

21
Enzyme engineering: reaching the maximal catalytic efficiency peak

Goldsmith, Moshe; Tawfik, Dan S.

Current Opinion in Structural Biology (2017), 47, 140-150. CODEN: COSBEF; ISSN: 0959-440X. (Elsevier Ltd.)

A review. The practical need for highly efficient enzymes presents new challenges in enzyme engineering, in particular, the need to improve catalytic turnover (kcat) or efficiency (kcat/KM) by several orders of magnitude. However, optimizing catalysis demands navigation through complex and rugged fitness landscapes, with optimization trajectories often leading to strong diminishing returns and dead-ends. When no further improvements are obsd. in library screens or selections, it remains unclear whether the maximal catalytic efficiency of the enzyme (the catalytic 'fitness peak') has been reached; or perhaps, an alternative combination of mutations exists that could yield addnl. improvements. Here, we discuss fundamental aspects of the process of catalytic optimization, and offer practical solns. with respect to overcoming optimization plateaus.

22
Currin, A.; Swainston, N.; Day, P. J.; Kell, D. B. Synthetic Biology for the Directed Evolution of Protein Biocatalysts: Navigating Sequence Space Intelligently. Chem. Soc. Rev. 2015, 44, 1172–1239, DOI: 10.1039/C4CS00351A

22
Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently

Currin, Andrew; Swainston, Neil; Day, Philip J.; Kell, Douglas B.

Chemical Society Reviews (2015), 44 (5), 1172-1239. CODEN: CSRVBR; ISSN: 0306-0012. (Royal Society of Chemistry)

The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biol., whereby increasingly large sequences of DNA can be synthesized de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the no. of possible proteins is far too large to test individually, so we need means for navigating the 'search space' of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modeling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), esp. with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the 'best' amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a no. of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modeling and exploring protein landscapes. 
Because known landscapes may be assessed and reasoned about as a whole, simultaneously, this offers opportunities for protein improvement not readily available to natural evolution on rapid timescales. Intelligent landscape navigation, informed by sequence-activity relationships and coupled to the emerging methods of synthetic biol., offers scope for the development of novel biocatalysts that are both highly active and robust.

23
Rocklin, G. J.; Chidyausiku, T. M.; Goreshnik, I.; Ford, A.; Houliston, S.; Lemak, A.; Carter, L.; Ravichandran, R.; Mulligan, V. K.; Chevalier, A.; Arrowsmith, C. H.; Baker, D. Global Analysis of Protein Folding Using Massively Parallel Design, Synthesis, and Testing. Science 2017, 357, 168–175, DOI: 10.1126/science.aan0693

23
Global analysis of protein folding using massively parallel design, synthesis, and testing

Rocklin, Gabriel J.; Chidyausiku, Tamuka M.; Goreshnik, Inna; Ford, Alex; Houliston, Scott; Lemak, Alexander; Carter, Lauren; Ravichandran, Rashmi; Mulligan, Vikram K.; Chevalier, Aaron; Arrowsmith, Cheryl H.; Baker, David

Science (Washington, DC, United States) (2017), 357 (6347), 168-175. CODEN: SCIEAS; ISSN: 0036-8075. (American Association for the Advancement of Science)

Proteins fold into unique native structures stabilized by thousands of weak interactions that collectively overcome the entropic cost of folding. Although these forces are "encoded" in the thousands of known protein structures, "decoding" them is challenging because of the complexity of natural proteins that have evolved for function, not stability. We combined computational protein design, next-generation gene synthesis, and a high-throughput protease susceptibility assay to measure folding and stability for more than 15,000 de novo designed miniproteins, 1000 natural proteins, 10,000 point mutants, and 30,000 neg. control sequences. This anal. identified more than 2500 stable designed proteins in four basic folds - a no. sufficient to enable us to systematically examine how sequence dets. folding and stability in uncharted protein space. Iteration between design and expt. increased the design success rate from 6% to 47%, produced stable proteins unlike those found in nature for topologies where design was initially unsuccessful, and revealed subtle contributions to stability as designs became increasingly optimized. Our approach achieves the long-standing goal of a tight feedback cycle between computation and expt. and has the potential to transform computational protein design into a data-driven science.

24
Sumbalova, L.; Stourac, J.; Martinek, T.; Bednar, D.; Damborsky, J. HotSpot Wizard 3.0: Web Server for Automated Design of Mutations and Smart Libraries Based on Sequence Input Information. Nucleic Acids Res. 2018, 46, W356–W362, DOI: 10.1093/nar/gky417

24
HotSpot Wizard 3.0: web server for automated design of mutations and smart libraries based on sequence input information

Sumbalova, Lenka; Stourac, Jan; Martinek, Tomas; Bednar, David; Damborsky, Jiri

Nucleic Acids Research (2018), 46 (W1), W356-W362. CODEN: NARHAD; ISSN: 1362-4962. (Oxford University Press)

HotSpot Wizard is a web server used for the automated identification of hotspots in semi-rational protein design to give improved protein stability, catalytic activity, substrate specificity and enantioselectivity. Since there are three orders of magnitude fewer protein structures than sequences in bioinformatic databases, the major limitation to the usability of previous versions was the requirement for the protein structure to be a compulsory input for the calcn. HotSpot Wizard 3.0 now accepts the protein sequence as input data. The protein structure for the query sequence is obtained either from eight repositories of homol. models or is modeled using Modeller and I-Tasser. The quality of the models is then evaluated using three quality assessment tools: WHAT CHECK, PROCHECK and MolProbity. During follow-up analyses, the system automatically warns the users whenever they attempt to redesign poorly predicted parts of their homol. models. The second main limitation of HotSpot Wizard's predictions is that it identifies suitable positions for mutagenesis, but does not provide any reliable advice on particular substitutions. A new module for the estn. of thermodn. stabilities using the Rosetta and FoldX suites has been introduced which prevents destabilizing mutations among pre-selected variants entering exptl. testing.

25
Kuipers, R. K.; Joosten, H.-J.; van Berkel, W. J. H.; Leferink, N. G. H.; Rooijen, E.; Ittmann, E.; van Zimmeren, F.; Jochens, H.; Bornscheuer, U.; Vriend, G.; Martins dos Santos, V. A. P.; Schaap, P. J. 3DM: Systematic Analysis of Heterogeneous Superfamily Data to Discover Protein Functionalities. Proteins: Struct., Funct., Bioinf. 2010, 78, 2101–2113, DOI: 10.1002/prot.22725

25
3DM: systematic analysis of heterogeneous superfamily data to discover protein functionalities

Kuipers, Remko K.; Joosten, Henk-Jan; van Berkel, Willem J. H.; Leferink, Nicole G. H.; Rooijen, Erik; Ittmann, Erik; van Zimmeren, Frank; Jochens, Helge; Bornscheuer, Uwe; Vriend, Gert; Martins dos Santos, Vitor A. P.; Schaap, Peter J.

Proteins: Structure, Function, and Bioinformatics (2010), 78 (9), 2101-2113. CODEN: PSFBAF. (Wiley-Liss, Inc.)

Ten years of experience with mol. class-specific information systems (MCSIS) such as with the hand-curated G protein-coupled receptor database (GPCRDB) or the semiautomatically generated nuclear receptor database has made clear that a wide variety of questions can be answered when protein-related data from many different origins can be flexibly combined. MCSISes revolve around a multiple sequence alignment (MSA) that includes "all" available sequences from the entire superfamily, and it has been shown at many occasions that the quality of these alignments is the most crucial aspect of the MCSIS approach. We describe here a system called 3DM that can automatically build an entire MCSIS. 3DM bases the MSA on a multiple structure alignment, which implies that the availability of a large no. of superfamily members with a known three-dimensional structure is a requirement for 3DM to succeed well. Thirteen MCSISes were constructed and placed on the Internet for examn. These systems have been instrumental in a large series of research projects related to enzyme activity or the understanding and engineering of specificity, protein stability engineering, DNA-diagnostics, drug design, and so forth.

26
Reetz, M. T.; Carballeira, J. D. Iterative Saturation Mutagenesis (ISM) for Rapid Directed Evolution of Functional Enzymes. Nat. Protoc. 2007, 2, 891–903, DOI: 10.1038/nprot.2007.72

26
Iterative saturation mutagenesis (ISM) for rapid directed evolution of functional enzymes

Reetz, Manfred T.; Carballeira, Jose Daniel

Nature Protocols (2007), 2 (4), 891-903. CODEN: NPARDW; ISSN: 1750-2799. (Nature Publishing Group)

Iterative satn. mutagenesis (ISM) is a new and efficient method for the directed evolution of functional enzymes. It reduces the necessary mol. biol. work and the screening effort drastically. It is based on a Cartesian view of the protein structure, performing iterative cycles of satn. mutagenesis at rationally chosen sites in an enzyme, a given site being composed of one, two or three amino acid positions. The basis for choosing these sites depends on the nature of the catalytic property to be improved, e.g., enantioselectivity, substrate acceptance or thermostability. In the case of thermostability, sites showing highest B-factors (available from x-ray data) are chosen. The pronounced increase in thermostability of the lipase from Bacillus subtilis (Lip A) as a result of applying ISM is illustrated here.

27
Liskova, V.; Stepankova, V.; Bednar, D.; Brezovsky, J.; Prokop, Z.; Chaloupkova, R.; Damborsky, J. Different Structural Origins of the Enantioselectivity of Haloalkane Dehalogenases toward Linear β-Haloalkanes: Open-Solvated versus Occluded-Desolvated Active Sites. Angew. Chem., Int. Ed. 2017, 56, 4719–4723, DOI: 10.1002/anie.201611193

27
Different Structural Origins of the Enantioselectivity of Haloalkane Dehalogenases toward Linear β-Haloalkanes: Open-Solvated versus Occluded-Desolvated Active Sites

Liskova, Veronika; Stepankova, Veronika; Bednar, David; Brezovsky, Jan; Prokop, Zbynek; Chaloupkova, Radka; Damborsky, Jiri

Angewandte Chemie, International Edition (2017), 56 (17), 4719-4723. CODEN: ACIEF5; ISSN: 1433-7851. (Wiley-VCH Verlag GmbH & Co. KGaA)

The enzymic enantiodiscrimination of linear β-haloalkanes is difficult because the simple structures of the substrates prevent directional interactions. Herein we describe two distinct mol. mechanisms for the enantiodiscrimination of the β-haloalkane 2-bromopentane by haloalkane dehalogenases. Highly enantioselective DbjA has an open, solvent-accessible active site, whereas the engineered enzyme DhaA31 has an occluded and less solvated cavity but shows similar enantioselectivity. The enantioselectivity of DhaA31 arises from steric hindrance imposed by two specific substitutions rather than hydration as in DbjA.

28
Bar-Even, A.; Noor, E.; Savir, Y.; Liebermeister, W.; Davidi, D.; Tawfik, D. S.; Milo, R. The Moderately Efficient Enzyme: Evolutionary and Physicochemical Trends Shaping Enzyme Parameters. Biochemistry 2011, 50, 4402–4410, DOI: 10.1021/bi2002289

28
The Moderately Efficient Enzyme: Evolutionary and Physicochemical Trends Shaping Enzyme Parameters

Bar-Even, Arren; Noor, Elad; Savir, Yonatan; Liebermeister, Wolfram; Davidi, Dan; Tawfik, Dan S.; Milo, Ron

Biochemistry (2011), 50 (21), 4402-4410. CODEN: BICHAW; ISSN: 0006-2960. (American Chemical Society)

The kinetic parameters of enzymes are key to understanding the rate and specificity of most biol. processes. Although specific trends are frequently studied for individual enzymes, global trends are rarely addressed. We performed an anal. of kcat and KM values of several thousand enzymes collected from the literature. We found that the "av. enzyme" exhibits a kcat of ∼10 s⁻¹ and a kcat/KM of ∼10⁵ s⁻¹ M⁻¹, much below the diffusion limit and the characteristic textbook portrayal of kinetically superior enzymes. Why do most enzymes exhibit moderate catalytic efficiencies? Maximal rates may not evolve in cases where weaker selection pressures are expected. We find, for example, that enzymes operating in secondary metab. are, on av., ∼30-fold slower than those of central metab. We also find indications that the physicochem. properties of substrates affect the kinetic parameters. Specifically, low mol. mass and hydrophobicity appear to limit KM optimization. In accordance, substitution with phosphate, CoA, or other large modifiers considerably lowers the KM values of enzymes utilizing the substituted substrates. It therefore appears that both evolutionary selection pressures and physicochem. constraints shape the kinetic parameters of enzymes. It also seems likely that the catalytic efficiency of some enzymes toward their natural substrates could be increased in many cases by natural or lab. evolution.

29
Balchin, D.; Hayer-Hartl, M.; Hartl, F. U. In Vivo Aspects of Protein Folding and Quality Control. Science 2016, 353, aac4354, DOI: 10.1126/science.aac4354

30
Colón, W.; Church, J.; Sen, J.; Thibeault, J.; Trasatti, H.; Xia, K. Biological Roles of Protein Kinetic Stability. Biochemistry 2017, 56, 6179–6186, DOI: 10.1021/acs.biochem.7b00942

30
Biological Roles of Protein Kinetic Stability

Colon, Wilfredo; Church, Jennifer; Sen, Jayeeta; Thibeault, Jane; Trasatti, Hannah; Xia, Ke

Biochemistry (2017), 56 (47), 6179-6186. CODEN: BICHAW; ISSN: 0006-2960. (American Chemical Society)

A review. A protein's stability may range from non-existent, as in the case of intrinsically disordered proteins, to very high, as indicated by a protein's resistance to degrdn., even under relatively harsh conditions. The stability of this latter group is usually under kinetic control due to a high activation energy for unfolding that virtually traps the protein in a specific conformation, thereby conferring resistance to proteolytic degrdn. and misfolding-aggregation. The usual outcome of kinetic stability is a longer protein half-life. Thus, the protective role of protein kinetic stability is often appreciated, but relatively little is known about the extent of biol. roles related to this property. Here, we discuss several known or putative biol. roles of protein kinetic stability, including protection from stressors to avoid aggregation or premature degrdn., achieving long-term phenotypic change, and regulating cellular processes by controlling the trigger and timing of mol. motion. The picture that emerges from this anal. is that protein kinetic stability is involved in a myriad of known and yet to be discovered biol. functions via its ability to resist degrdn. and control the timing, extent, and permanency of mol. motion.

31
Khersonsky, O.; Kiss, G.; Röthlisberger, D.; Dym, O.; Albeck, S.; Houk, K. N.; Baker, D.; Tawfik, D. S. Bridging the Gaps in Design Methodologies by Evolutionary Optimization of the Stability and Proficiency of Designed Kemp Eliminase KE59. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 10358–10363, DOI: 10.1073/pnas.1121063109

31
Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59

Khersonsky, Olga; Kiss, Gert; Rothlisberger, Daniela; Dym, Orly; Albeck, Shira; Houk, Kendall N.; Baker, David; Tawfik, Dan S.

Proceedings of the National Academy of Sciences of the United States of America (2012), 109 (26), 10358-10363, S10358/1-S10358/47. CODEN: PNASA6; ISSN: 0027-8424. (National Academy of Sciences)

Computational design is a test of our understanding of enzyme catalysis and a means of engineering novel, tailor-made enzymes. While the de novo computational design of catalytically efficient enzymes remains a challenge, designed enzymes may comprise unique starting points for further optimization by directed evolution. Directed evolution of two computationally designed Kemp eliminases, KE07 and KE70, led to low to moderately efficient enzymes (kcat/Km values of ≤5 × 10⁴ M⁻¹ s⁻¹). Here we describe the optimization of a third design, KE59. Although KE59 was the most catalytically efficient Kemp eliminase from this design series (by kcat/Km, and by catalyzing the elimination of nonactivated benzisoxazoles), its impaired stability prevented its evolutionary optimization. To boost KE59's evolvability, stabilizing consensus mutations were included in the libraries throughout the directed evolution process. The libraries were also screened with less activated substrates. Sixteen rounds of mutation and selection led to >2000-fold increase in catalytic efficiency, mainly via higher kcat values. The best KE59 variants exhibited kcat/Km values up to 0.6 × 10⁶ M⁻¹ s⁻¹, and kcat/kuncat values of ≤10⁷ almost regardless of substrate reactivity. Biochem., structural, and mol. dynamics (MD) simulation studies provided insights regarding the optimization of KE59. Overall, the directed evolution of three different designed Kemp eliminases, KE07, KE70, and KE59, demonstrates that computational designs are highly evolvable and can be optimized to high catalytic efficiencies.

32
Taverna, D. M.; Goldstein, R. A. Why Are Proteins Marginally Stable? Proteins: Struct., Funct., Genet. 2002, 46, 105–109, DOI: 10.1002/prot.10016

33
Sanchez-Ruiz, J. M. Protein Kinetic Stability. Biophys. Chem. 2010, 148, 1–15, DOI: 10.1016/j.bpc.2010.02.004

33
Protein kinetic stability

Sanchez-Ruiz, Jose M.

Biophysical Chemistry (2010), 148 (1-3), 1-15CODEN: BICIAZ; ISSN:0301-4622. (Elsevier B.V.)

A review. The relevance of protein stability for biol. function and mol. evolution is widely recognized. Protein stability, however, comes in 2 flavors: (1) thermodn. stability, which is related to a low amt. of unfolded and partially-unfolded states in equil. with the native, functional protein, and (2) kinetic stability, which is related to a high free energy barrier "sepg." the native state from the non-functional forms (unfolded states, irreversibly-denatured protein). Such a barrier may guarantee that the biol. function of the protein is maintained, at least during a physiol. relevant time-scale, even if the native state is not thermodynamically stable with respect to non-functional forms. Kinetic stabilization is likely required in many cases, since proteins often work under conditions (harsh extracellular or crowded intracellular environments) in which deleterious alterations (proteolysis, aggregation, undesirable interactions with other macromol. components) are prone to occur. Also, kinetic stability may provide a mechanism for the evolution of optimal functional properties. Furthermore, enhancement of kinetic stability is essential for many biotechnol. applications of proteins. Despite all of this, many published studies focus on thermodn. stability, partly because it can be easily quantified in vitro for small model proteins and, also, because of the availability of computational algorithms to est. mutation effects on thermodn. stability. Here, the opposite bias is purposely adopted: the exptl. evidence supporting widespread kinetic stabilization of proteins is summarized, the role of natural selection in detg. this feature is discussed, possible mol. mechanisms responsible for kinetic stability are described, and the relation between kinetic destabilization and protein misfolding diseases is highlighted.

34
Bommarius, A. S.; Paye, M. F. Stabilizing Biocatalysts. Chem. Soc. Rev. 2013, 42, 6534– 6565, DOI: 10.1039/c3cs60137d

34
Stabilizing biocatalysts

Bommarius, Andreas S.; Paye, Marietou F.

Chemical Society Reviews (2013), 42 (15), 6534-6565CODEN: CSRVBR; ISSN:0306-0012. (Royal Society of Chemistry)

A review. The area of biocatalysis itself is in rapid development, fueled by both an enhanced repertoire of protein engineering tools and an increasing list of solved problems. Biocatalysts, however, are delicate materials that hover close to the thermodn. limit of stability. In many cases, they need to be stabilized to survive a range of challenges regarding temp., pH value, salt type and concn., co-solvents, as well as shear and surface forces. Biocatalysts may be delicate proteins; however, once stabilized, they are efficiently active enzymes. Kinetic stability must be achieved to a level satisfactory for large-scale process application. Kinetic stability evokes resistance to degrdn. and maintained or increased catalytic efficiency of the enzyme in which the desired reaction is accomplished at an increased rate. However, beyond these limitations, stable biocatalysts can be operated at higher temps. or co-solvent concns., with ensuing redn. in microbial contamination, better soly., as well as in many cases more favorable equil., and can serve as more effective templates for combinatorial and data-driven protein engineering. To increase thermodn. and kinetic stability, immobilization, protein engineering, and medium engineering of biocatalysts are available, the main focus of this work. In the case of protein engineering, there are three main approaches to enhancing the stability of protein biocatalysts: (i) rational design, based on knowledge of the 3D-structure and the catalytic mechanism, (ii) combinatorial design, requiring a protocol to generate diversity at the genetic level, a large, often high throughput, screening capacity to distinguish 'hits' from 'misses', and (iii) data-driven design, fueled by the increased availability of nucleotide and amino acid sequences of equiv. functionality.

35
Goldenzweig, A.; Fleishman, S. J. Principles of Protein Stability and Their Application in Computational Design. Annu. Rev. Biochem. 2018, 87, 105– 129, DOI: 10.1146/annurev-biochem-062917-012102

35
Principles of Protein Stability and Their Application in Computational Design

Goldenzweig, Adi; Fleishman, Sarel J.

Annual Review of Biochemistry (2018), 87 (), 105-129CODEN: ARBOAW; ISSN:0066-4154. (Annual Reviews)

A review. Proteins are increasingly used in basic and applied biomedical research. Many proteins, however, are only marginally stable and can be expressed in limited amts., thus hampering research and applications. Research has revealed the thermodn., cellular, and evolutionary principles and mechanisms that underlie marginal stability. With this growing understanding, computational stability design methods have advanced over the past two decades starting from methods that selectively addressed only some aspects of marginal stability. Current methods are more general and, by combining phylogenetic anal. with atomistic design, have shown drastic improvements in soly., thermal stability, and aggregation resistance while maintaining the protein's primary mol. activity. Stability design is opening the way to rational engineering of improved enzymes, therapeutics, and vaccines and to the application of protein design methodol. to large proteins and mol. activities that have proven challenging in the past.

36
Hansen, N.; van Gunsteren, W. F. Practical Aspects of Free-Energy Calculations: A Review. J. Chem. Theory Comput. 2014, 10, 2632– 2647, DOI: 10.1021/ct500161f

36
Practical Aspects of Free-Energy Calculations: A Review

Hansen, Niels; van Gunsteren, Wilfred F.

Journal of Chemical Theory and Computation (2014), 10 (7), 2632-2647CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)

A review. Free-energy calcns. in the framework of classical mol. dynamics simulations are nowadays used in a wide range of research areas including solvation thermodn., mol. recognition, and protein folding. The basic components of a free-energy calcn., i.e., a suitable model Hamiltonian, a sampling protocol, and an estimator for the free energy, are independent of the specific application. However, the attention that one has to pay to these components depends considerably on the specific application. Here, we review six different areas of application and discuss the relative importance of the three main components to provide the reader with an organigram and to make nonexperts aware of the many pitfalls present in free energy calcns.

37
Polizzi, K. M.; Bommarius, A. S.; Broering, J. M.; Chaparro-Riggers, J. F. Stability of Biocatalysts. Curr. Opin. Chem. Biol. 2007, 11, 220– 225, DOI: 10.1016/j.cbpa.2007.01.685

37
Stability of biocatalysts

Polizzi, Karen M.; Bommarius, Andreas S.; Broering, James M.; Chaparro-Riggers, Javier F.

Current Opinion in Chemical Biology (2007), 11 (2), 220-225CODEN: COCBF4; ISSN:1367-5931. (Elsevier B.V.)

A review. Here, the authors highlight recent research on the stabilization of enzymes using both chem. and biol. means to increase the lifetime of the biocatalyst. Despite their many favorable qualities, the marginal stability of biocatalysts in many types of reaction media often has prevented or delayed their implementation for industrial-scale synthesis of fine chems. and pharmaceuticals. Consequently, there is great interest in understanding the effects of soln. conditions on protein stability, as well as in developing strategies to improve protein stability in desired reaction media. Recent methods include novel chem. modifications of proteins, lyophilization in the presence of additives, and phys. immobilization on novel supports. Rational and combinatorial protein engineering techniques have been used to yield unmodified proteins with exceptionally improved stability. Both have been aided by the development of computational tools and structure-guided heuristics aimed at reducing library sizes that must be generated and screened to identify improved mutants. The no. of parameters used to indicate protein stability can complicate discussions and investigations, and care should be taken to identify whether thermodn. or kinetic stability limits the obsd. stability of proteins. Although the useful lifetime of a biocatalyst is dictated by its kinetic stability, only 6% of protein stability studies use kinetic stability measures. Clearly, more effort is needed to study how soln. conditions impact protein kinetic stability.

38
Buck, P. M.; Kumar, S.; Wang, X.; Agrawal, N. J.; Trout, B. L.; Singh, S. K. Computational Methods To Predict Therapeutic Protein Aggregation. Methods Mol. Biol. 2012, 899, 425– 451, DOI: 10.1007/978-1-61779-921-1_26

38
Computational methods to predict therapeutic protein aggregation

Buck, Patrick M.; Kumar, Sandeep; Wang, Xiaoling; Agrawal, Neeraj J.; Trout, Bernhardt L.; Singh, Satish K.

Methods in Molecular Biology (New York, NY, United States) (2012), 899 (Therapeutic Proteins), 425-451CODEN: MMBIED; ISSN:1064-3745. (Springer)

A review. Protein based biotherapeutics have emerged as a successful class of pharmaceuticals. However, these macromols. endure a variety of physicochem. degrdns. during manufg., shipping, and storage, which may adversely impact the drug product quality. Of these degrdns., the irreversible self-assocn. of therapeutic proteins to form aggregates is a major challenge in the formulation of these mols. Tools to predict and mitigate protein aggregation are, therefore, of great interest to biopharmaceutical research and development. In this chapter, a no. of such computational tools developed to understand and predict the various steps involved in protein aggregation are described. These tools can be grouped into three general classes: unfolding kinetics and native state thermal stability, colloidal stability, and sequence/structure based aggregation liabilities. Chapter sections introduce each class by discussing how these predictive tools provide insight into the mol. events leading to protein aggregation. The computational methods are then explained in detail along with their advantages and limitations.

39
Jaswal, S. S.; Sohl, J. L.; Davis, J. H.; Agard, D. A. Energetic Landscape of α-Lytic Protease Optimizes Longevity through Kinetic Stability. Nature 2002, 415, 343– 346, DOI: 10.1038/415343a

39
Energetic landscape of α-lytic protease optimizes longevity through kinetic stability

Jaswal, Shella S.; Sohl, Julie L.; Davis, Jonathan H.; Agard, David A.

Nature (London, United Kingdom) (2002), 415 (6869), 343-346CODEN: NATUAS; ISSN:0028-0836. (Nature Publishing Group)

During the evolution of proteins the pressure to optimize biol. activity is moderated by a need for efficient folding. For most proteins, this is accomplished through spontaneous folding to a thermodynamically stable and active native state. However, in the extracellular bacterial α-lytic protease (αLP) these two processes have become decoupled. The native state of αLP is thermodynamically unstable, and when denatured, requires millennia (t1/2 ∼ 1800 yr) to refold. Folding is made possible by an attached folding catalyst, the pro-region, which is degraded on completion of folding, leaving αLP trapped in its native state by a large kinetic unfolding barrier (t1/2 ∼ 1.2 yr). αLP faces two very different folding landscapes: one in the presence of the pro-region controlling folding, and one in its absence restricting unfolding. Here we demonstrate that this sepn. of folding and unfolding pathways has removed constraints placed on the folding of thermodynamically stable proteins, and allowed the evolution of a native state having markedly reduced dynamic fluctuations. This, in turn, has led to a significant extension of the functional lifetime of αLP by the optimal suppression of proteolytic sensitivity.

40
Young, T. A.; Skordalakes, E.; Marqusee, S. Comparison of Proteolytic Susceptibility in Phosphoglycerate Kinases from Yeast and E. coli: Modulation of Conformational Ensembles Without Altering Structure or Stability. J. Mol. Biol. 2007, 368, 1438– 1447, DOI: 10.1016/j.jmb.2007.02.077

40
Comparison of Proteolytic Susceptibility in Phosphoglycerate Kinases from Yeast and E. coli: Modulation of Conformational Ensembles Without Altering Structure or Stability

Young, Tracy A.; Skordalakes, Emmanuel; Marqusee, Susan

Journal of Molecular Biology (2007), 368 (5), 1438-1447CODEN: JMOBAK; ISSN:0022-2836. (Elsevier Ltd.)

Escherichia coli phosphoglycerate kinase (PGK) is resistant to proteolytic cleavage while the yeast homolog from Saccharomyces cerevisiae is not. We have explored the biophys. basis of this surprising difference. The sequences of these homologs are 39% identical and 56% similar. Detn. of the crystal structure for the E. coli protein and comparison to the previously solved yeast structure reveals that the two proteins have extremely similar tertiary structures, and their global stabilities detd. by equil. denaturation are also very similar. The extrapolated unfolding rate of E. coli PGK is, however, 10⁵-fold slower than that of the yeast homolog. This surprisingly large difference in unfolding rates appears to arise from a divergence in the extent of cooperativity between the two structural domains (the N and C-domains) that make up these kinases. This is supported by: (1) the C-domain of E. coli PGK cannot be expressed or fold independently of the N-domain, while both domains of the yeast protein fold in isolation into stable structures and (2) the energetics and kinetics of the proteolytically sensitive state of E. coli PGK match those for global unfolding. This suggests that proteolysis occurs from the globally unfolded state of E. coli PGK, while the characteristics defining the yeast homolog suggest that proteolysis occurs upon unfolding of only the C-domain, with the N-domain remaining folded and consequently resistant to cleavage.

41
Shirke, A. N.; Basore, D.; Butterfoss, G. L.; Bonneau, R.; Bystroff, C.; Gross, R. A. Toward Rational Thermostabilization of Aspergillus Oryzae Cutinase: Insights into Catalytic and Structural Stability. Proteins: Struct., Funct., Bioinf. 2016, 84, 60– 72, DOI: 10.1002/prot.24955

42
Liu, B.; Zhang, J.; Li, B.; Liao, X.; Du, G.; Chen, J. Expression and Characterization of Extreme Alkaline, Oxidation-Resistant Keratinase from Bacillus Licheniformis in Recombinant Bacillus Subtilis WB600 Expression System and Its Application in Wool Fiber Processing. World J. Microbiol. Biotechnol. 2013, 29, 825– 832, DOI: 10.1007/s11274-012-1237-5

42
Expression and characterization of extreme alkaline, oxidation-resistant keratinase from Bacillus licheniformis in recombinant Bacillus subtilis WB600 expression system and its application in wool fiber processing

Liu, Baihong; Zhang, Juan; Li, Ben; Liao, Xiangru; Du, Guocheng; Chen, Jian

World Journal of Microbiology & Biotechnology (2013), 29 (5), 825-832CODEN: WJMBEY; ISSN:0959-3993. (Springer)

A keratin-degrading bacterium, Bacillus licheniformis BBE11-1, was isolated, and its ker gene encoding keratinase with its native signal peptide was cloned and expressed in Bacillus subtilis WB600 under the strong PHpaII promoter of the pMA0911 vector. In a 3-L fermenter, the recombinant keratinase was secreted at 323 units/mL without induction after 24 h at 37 °C. The keratinase was then concd. and purified by hydrophobic interaction chromatog. using HiTrap Phenyl-Sepharose Fast Flow. The recombinant keratinase had an optimal temp. and pH of 40 °C and 10.5, resp., and was stable at 10-50 °C and pH 7-11.5. The enzyme retained 80% activity after treatment for 5 h with 1 M H₂O₂, was activated by Mg²⁺ and Co²⁺, and degraded a broad range of substrates such as feather, bovine serum albumin, casein, and gelatin; the keratinase was considered to be a serine protease. In combination with Savinase, the keratinase could efficiently prevent shrinkage and eliminate fibers of wool, which showed its potential in the textile and detergent industries.

43
Nguyen, V.; Wilson, C.; Hoemberger, M.; Stiller, J. B.; Agafonov, R. V.; Kutter, S.; English, J.; Theobald, D. L.; Kern, D. Evolutionary Drivers of Thermoadaptation in Enzyme Catalysis. Science 2017, 355, 289– 294, DOI: 10.1126/science.aah3717

43
Evolutionary drivers of thermoadaptation in enzyme catalysis

Nguyen, Vy; Wilson, Christopher; Hoemberger, Marc; Stiller, John B.; Agafonov, Roman V.; Kutter, Steffen; English, Justin; Theobald, Douglas L.; Kern, Dorothee

Science (Washington, DC, United States) (2017), 355 (6322), 289-294CODEN: SCIEAS; ISSN:0036-8075. (American Association for the Advancement of Science)

With early life likely to have existed in a hot environment, enzymes had to cope with an inherent drop in catalytic speed caused by lowered temp. Here we characterize the mol. mechanisms underlying thermoadaptation of enzyme catalysis in adenylate kinase using ancestral sequence reconstruction spanning 3 billion years of evolution. We show that evolution solved the enzyme's key kinetic obstacle - how to maintain catalytic speed on a cooler Earth - by exploiting transition-state heat capacity. Tracing the evolution of enzyme activity and stability from the hot-start toward modern hyperthermophilic, mesophilic, and psychrophilic organisms illustrates active pressure vs. passive drift in evolution on a mol. level, refutes the debated activity/stability trade-off, and suggests that the catalytic speed of adenylate kinase is an evolutionary driver for organismal fitness.

44
Risso, V. A.; Gavira, J. A.; Gaucher, E. A.; Sanchez-Ruiz, J. M. Phenotypic Comparisons of Consensus Variants versus Laboratory Resurrections of Precambrian Proteins. Proteins: Struct., Funct., Bioinf. 2014, 82, 887– 896, DOI: 10.1002/prot.24575

45
Bednar, D.; Beerens, K.; Sebestova, E.; Bendl, J.; Khare, S.; Chaloupkova, R.; Prokop, Z.; Brezovsky, J.; Baker, D.; Damborsky, J. FireProt: Energy- and Evolution-Based Computational Design of Thermostable Multiple-Point Mutants. PLoS Comput. Biol. 2015, 11, e1004556, DOI: 10.1371/journal.pcbi.1004556

45
FireProt: energy- and evolution-based computational design of thermostable multiple-point mutants

Bednar, David; Beerens, Koen; Sebestova, Eva; Bendl, Jaroslav; Khare, Sagar; Chaloupkova, Radka; Prokop, Zbynek; Brezovsky, Jan; Baker, David; Damborsky, Jiri

PLoS Computational Biology (2015), 11 (11), e1004556/1-e1004556/20CODEN: PCBLBG; ISSN:1553-7358. (Public Library of Science)

There is great interest in increasing proteins' stability to enhance their utility as biocatalysts, therapeutics, diagnostics and nanomaterials. Directed evolution is a powerful, but exptl. strenuous approach. Computational methods offer attractive alternatives. However, due to the limited reliability of predictions and potentially antagonistic effects of substitutions, only single-point mutations are usually predicted in silico, exptl. verified and then recombined in multiple-point mutants. Thus, substantial screening is still required. Here we present FireProt, a robust computational strategy for predicting highly stable multiple-point mutants that combines energy- and evolution-based approaches with smart filtering to identify additive stabilizing mutations. FireProt's reliability and applicability was demonstrated by validating its predictions against 656 mutations from the ProTherm database. We demonstrate that thermostability of the model enzymes haloalkane dehalogenase DhaA and γ-hexachlorocyclohexane dehydrochlorinase LinA can be substantially increased (ΔTm = 24°C and 21°C) by constructing and characterizing only a handful of multiple-point mutants. FireProt can be applied to any protein for which a tertiary structure and homologous sequences are available, and will facilitate the rapid development of robust proteins for biomedical and biotechnol. applications.

46
Babkova, P.; Sebestova, E.; Brezovsky, J.; Chaloupkova, R.; Damborsky, J. Ancestral Haloalkane Dehalogenases Show Robustness and Unique Substrate Specificity. ChemBioChem 2017, 18, 1448– 1456, DOI: 10.1002/cbic.201700197

46
Ancestral Haloalkane Dehalogenases Show Robustness and Unique Substrate Specificity

Babkova, Petra; Sebestova, Eva; Brezovsky, Jan; Chaloupkova, Radka; Damborsky, Jiri

ChemBioChem (2017), 18 (14), 1448-1456CODEN: CBCHFX; ISSN:1439-4227. (Wiley-VCH Verlag GmbH & Co. KGaA)

Ancestral sequence reconstruction (ASR) represents a powerful approach for empirically testing structure-function relationships of diverse proteins. We employed ASR to predict sequences of five ancestral haloalkane dehalogenases (HLDs) from the HLD-II subfamily. Genes encoding the inferred ancestral sequences were synthesized and expressed in Escherichia coli, and the resurrected ancestral enzymes (AncHLD1-5) were exptl. characterized. Strikingly, the ancestral HLDs exhibited significantly enhanced thermodn. stability compared to extant enzymes (ΔTm up to 24 °C), as well as higher specific activities with preference for short multi-substituted halogenated substrates. Moreover, multivariate statistical anal. revealed a shift in the substrate specificity profiles of AncHLD1 and AncHLD2. This is extremely difficult to achieve by rational protein engineering. The study highlights that ASR is an efficient approach for the development of novel biocatalysts and robust templates for directed evolution.

47
Goldenzweig, A.; Goldsmith, M.; Hill, S. E.; Gertman, O.; Laurino, P.; Ashani, Y.; Dym, O.; Unger, T.; Albeck, S.; Prilusky, J.; Lieberman, R. L.; Aharoni, A.; Silman, I.; Sussman, J. L.; Tawfik, D. S.; Fleishman, S. J. Automated Structure- and Sequence-Based Design of Proteins for High Bacterial Expression and Stability. Mol. Cell 2016, 63, 337– 346, DOI: 10.1016/j.molcel.2016.06.012

47
Automated Structure- and Sequence-Based Design of Proteins for High Bacterial Expression and Stability

Goldenzweig, Adi; Goldsmith, Moshe; Hill, Shannon E.; Gertman, Or; Laurino, Paola; Ashani, Yacov; Dym, Orly; Unger, Tamar; Albeck, Shira; Prilusky, Jaime; Lieberman, Raquel L.; Aharoni, Amir; Silman, Israel; Sussman, Joel L.; Tawfik, Dan S.; Fleishman, Sarel J.

Molecular Cell (2016), 63 (2), 337-346CODEN: MOCEFL; ISSN:1097-2765. (Elsevier Inc.)

Upon heterologous overexpression, many proteins misfold or aggregate, thus resulting in low functional yields. Human acetylcholinesterase (hAChE), an enzyme mediating synaptic transmission, is a typical case of a human protein that necessitates mammalian systems to obtain functional expression. We developed a computational strategy and designed an AChE variant bearing 51 mutations that improved core packing, surface polarity, and backbone rigidity. This variant expressed at ∼2,000-fold higher levels in E. coli compared to wild-type hAChE and exhibited 20°C higher thermostability with no change in enzymic properties or in the active-site configuration as detd. by crystallog. To demonstrate broad utility, we similarly designed four other human and bacterial proteins. Testing at most three designs per protein, we obtained enhanced stability and/or higher yields of sol. and active protein in E. coli. Our algorithm requires only a 3D structure and several dozen sequences of naturally occurring homologs, and is available at http://pross.weizmann.ac.il.

48
Hammes, G. G.; Chang, Y.-C.; Oas, T. G. Conformational Selection or Induced Fit: A Flux Description of Reaction Mechanism. Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 13737– 13741, DOI: 10.1073/pnas.0907195106

48
Conformational selection or induced fit: a flux description of reaction mechanism

Hammes, Gordon G.; Chang, Yu-Chu; Oas, Terrence G.

Proceedings of the National Academy of Sciences of the United States of America (2009), 106 (33), 13737-13741CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)

The mechanism of ligand binding coupled to conformational changes in macromols. has recently attracted considerable interest. The 2 limiting cases are the "induced fit" mechanism (binding first) or "conformational selection" (conformational change first). Described here are the criteria by which the sequence of events can be detd. quant. The relative importance of the 2 pathways is detd. not by comparing rate consts. (a common misconception) but instead by comparing the flux through each pathway. The simple rules for calcg. flux in multistep mechanisms are described and then applied to 2 examples from the literature, neither of which has previously been analyzed using the concept of flux. The first example is the mechanism of conformational change in the binding of NADPH to dihydrofolate reductase (DHFR). The second example is the mechanism of flavodoxin folding coupled to binding of its cofactor, FMN. In both cases, the mechanism switches from being dominated by the conformational selection pathway at low ligand concn. to induced fit at high ligand concn. Over a wide range of conditions, a significant fraction of the flux occurs through both pathways. Such a mixed mechanism likely will be discovered for many cases of coupled conformational change and ligand binding when kinetic data are analyzed by using a flux-based approach.

49
Kramer, R. M.; Shende, V. R.; Motl, N.; Pace, C. N.; Scholtz, J. M. Toward a Molecular Understanding of Protein Solubility: Increased Negative Surface Charge Correlates with Increased Solubility. Biophys. J. 2012, 102, 1907– 1915, DOI: 10.1016/j.bpj.2012.01.060

49
Toward a Molecular Understanding of Protein Solubility: Increased Negative Surface Charge Correlates with Increased Solubility

Kramer, Ryan M.; Shende, Varad R.; Motl, Nicole; Pace, C. Nick; Scholtz, J. Martin

Biophysical Journal (2012), 102 (8), 1907-1915CODEN: BIOJAU; ISSN:0006-3495. (Cell Press)

Protein soly. is a problem for many protein chemists, including structural biologists and developers of protein pharmaceuticals. Knowledge about how intrinsic factors influence soly. is limited due to the difficulty of obtaining quant. soly. measurements. Soly. measurements in buffer alone are difficult to reproduce, because gels or supersatd. solns. often form, making it impossible to det. soly. values for many proteins. Protein precipitants can be used to obtain comparative soly. measurements and, in some cases, estns. of soly. in buffer alone. Protein precipitants fall into three broad classes: salts, long-chain polymers, and org. solvents. Here, we compare the use of representatives from two classes of precipitants, ammonium sulfate and polyethylene glycol 8000, by measuring the soly. of seven proteins. We find that increased neg. surface charge correlates strongly with increased protein soly. and may be due to strong binding of water by the acidic amino acids. We also find that the soly. results obtained for the two different precipitants agree closely with each other, suggesting that the two precipitants probe similar properties that are relevant to soly. in buffer alone.

50
Khow, O.; Suntrarachun, S. Strategies for Production of Active Eukaryotic Proteins in Bacterial Expression System. Asian Pac. J. Trop. Biomed. 2012, 2, 159– 162, DOI: 10.1016/S2221-1691(11)60213-X

50
Strategies for production of active eukaryotic proteins in bacterial expression system

Khow, Orawan; Suntrarachun, Sunutcha

Asian Pacific Journal of Tropical Biomedicine (2012), 2 (2), 159-162CODEN: APJTC7; ISSN:2221-1691. (Asian Pacific Tropical Medicine Press)

A review. Bacteria have long been the favorite expression system for recombinant protein prodn. However, the flaw of the system is that insol. and inactive proteins are co-produced due to codon bias, protein folding, phosphorylation, glycosylation, mRNA stability and promoter strength. Factors are cited and the methods to convert to sol. and active proteins are described, for example a tight control of Escherichia coli milieu, refolding from inclusion body and through fusion technol.

51
Sørensen, H. P.; Mortensen, K. K. Soluble Expression of Recombinant Proteins in the Cytoplasm of Escherichia coli. Microb. Cell Fact. 2005, 4, 1, DOI: 10.1186/1475-2859-4-1

51
Soluble expression of recombinant proteins in the cytoplasm of Escherichia coli

Sørensen, Hans Peter; Mortensen, Kim Kusk

Microbial Cell Factories (2005), 4 (1), 1

Pure, soluble and functional proteins are of high demand in modern biotechnology. Natural protein sources rarely meet the requirements for quantity, ease of isolation or price and hence recombinant technology is often the method of choice. Recombinant cell factories are constantly employed for the production of protein preparations bound for downstream purification and processing. Escherichia coli is a frequently used host, since it facilitates protein expression by its relative simplicity, its inexpensive and fast high density cultivation, the well known genetics and the large number of compatible molecular tools available. In spite of all these qualities, expression of recombinant proteins with E. coli as the host often results in insoluble and/or nonfunctional proteins. Here we review new approaches to overcome these obstacles by strategies that focus on either controlled expression of target protein in an unmodified form or by applying modifications using expressivity and solubility tags.

52
Hartl, F. U.; Bracher, A.; Hayer-Hartl, M. Molecular Chaperones in Protein Folding and Proteostasis. Nature 2011, 475, 324– 332, DOI: 10.1038/nature10317

Molecular chaperones in protein folding and proteostasis

Hartl, F. Ulrich; Bracher, Andreas; Hayer-Hartl, Manajit

Nature (London, United Kingdom) (2011), 475 (7356), 324-332CODEN: NATUAS; ISSN:0028-0836. (Nature Publishing Group)

A review. Most proteins must fold into defined 3-dimensional structures to gain functional activity. However, in the cellular environment, newly synthesized proteins are at great risk of aberrant folding and aggregation, potentially forming toxic species. To avoid these dangers, cells invest in a complex network of mol. chaperones, which use ingenious mechanisms to prevent aggregation and promote efficient folding. Because protein mols. are highly dynamic, const. chaperone surveillance is required to ensure protein homeostasis (proteostasis). Recent advances suggest that an age-related decline in proteostasis capacity allows the manifestation of various protein-aggregation diseases, including Alzheimer's disease and Parkinson's disease. Interventions in these and numerous other pathol. states may spring from a detailed understanding of the pathways underlying proteome maintenance.

53
Shaw, D. E.; Maragakis, P.; Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Eastwood, M. P.; Bank, J. A.; Jumper, J. M.; Salmon, J. K.; Shan, Y.; Wriggers, W. Atomic-Level Characterization of the Structural Dynamics of Proteins. Science 2010, 330, 341– 346, DOI: 10.1126/science.1187409

Atomic-Level Characterization of the Structural Dynamics of Proteins

Shaw, David E.; Maragakis, Paul; Lindorff-Larsen, Kresten; Piana, Stefano; Dror, Ron O.; Eastwood, Michael P.; Bank, Joseph A.; Jumper, John M.; Salmon, John K.; Shan, Yibing; Wriggers, Willy

Science (Washington, DC, United States) (2010), 330 (6002), 341-346CODEN: SCIEAS; ISSN:0036-8075. (American Association for the Advancement of Science)

Mol. dynamics (MD) simulations are widely used to study protein motions at an at. level of detail, but they have been limited to time scales shorter than those of many biol. crit. conformational changes. We examd. two fundamental processes in protein dynamics-protein folding and conformational change within the folded state-by means of extremely long all-atom MD simulations conducted on a special-purpose machine. Equil. simulations of a WW protein domain captured multiple folding and unfolding events that consistently follow a well-defined folding pathway; sep. simulations of the protein's constituent substructures shed light on possible determinants of this pathway. A 1-ms simulation of the folded protein BPTI reveals a small no. of structurally distinct conformational states whose reversible interconversion is slower than local relaxations within those states by a factor of more than 1000.

54
Englander, S. W.; Mayne, L. The Case for Defined Protein Folding Pathways. Proc. Natl. Acad. Sci. U. S. A. 2017, 114, 8253– 8258, DOI: 10.1073/pnas.1706196114

The case for defined protein folding pathways

Englander, S. Walter; Mayne, Leland

Proceedings of the National Academy of Sciences of the United States of America (2017), 114 (31), 8253-8258CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)

We consider the differences between the many-pathway protein folding model derived from theor. energy landscape considerations and the defined-pathway model derived from expt. A basic tenet of the energy landscape model is that proteins fold through many heterogeneous pathways by way of amino acid-level dynamics biased toward selecting native-like interactions. The many pathways imagined in the model are not obsd. in the structure-formation stage of folding by expts. that would have found them, but they have now been detected and characterized for one protein in the initial prenucleation stage. Anal. presented here shows that these many microscopic trajectories are not distinct in any functionally significant way, and they have neither the structural information nor the biased energetics needed to select native vs. non-native interactions during folding. The opposed defined-pathway model stems from exptl. results that show that proteins are assemblies of small cooperative units called foldons and that a no. of proteins fold in a reproducible pathway one foldon unit at a time. Thus, the same foldon interactions that encode the native structure of any given protein also naturally encode its particular foldon-based folding pathway, and they collectively sum to produce the energy bias toward native interactions that is necessary for efficient folding. Available information suggests that quantized native structure and stepwise folding coevolved in ancient repeat proteins and were retained as a functional pair due to their utility for solving the difficult protein folding problem.

55
Voelz, V. A.; Bowman, G. R.; Beauchamp, K.; Pande, V. S. Molecular Simulation of Ab Initio Protein Folding for a Millisecond Folder NTL9(1–39). J. Am. Chem. Soc. 2010, 132, 1526– 1528, DOI: 10.1021/ja9090353

Molecular Simulation of ab Initio Protein Folding for a Millisecond Folder NTL9(1-39)

Voelz, Vincent A.; Bowman, Gregory R.; Beauchamp, Kyle; Pande, Vijay S.

Journal of the American Chemical Society (2010), 132 (5), 1526-1528CODEN: JACSAT; ISSN:0002-7863. (American Chemical Society)

The results obtained suggest that existing force field models using implicit solvent are indeed accurate enough to fold proteins ab initio at long time scales (milliseconds), opening the door to simulating more structurally complex proteins. Moreover, our work demonstrates that there need not be a single pathway or single, dominant mechanism for the folding of a given protein: since the theories proposed for how proteins fold are based on broadly relevant phys. principles, it is natural to imagine that multiple mechanisms could be simultaneously present but that the sequence of the protein, coupled with the chem. environment, would control the balance to which each mechanistic pathway is seen.

56
Eaton, W. A.; Wolynes, P. G. Theory, Simulations, and Experiments Show That Proteins Fold by Multiple Pathways. Proc. Natl. Acad. Sci. U. S. A. 2017, 114, E9759– E9760, DOI: 10.1073/pnas.1716444114

Theory, simulations, and experiments show that proteins fold by multiple pathways

Eaton, William A.; Wolynes, Peter G.

Proceedings of the National Academy of Sciences of the United States of America (2017), 114 (46), E9759-E9760CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)

57
Yang, Y.; Niroula, A.; Shen, B.; Vihinen, M. PON-Sol: Prediction of Effects of Amino Acid Substitutions on Protein Solubility. Bioinformatics 2016, 32, 2032– 2034, DOI: 10.1093/bioinformatics/btw066

PON-Sol: prediction of effects of amino acid substitutions on protein solubility

Yang, Yang; Niroula, Abhishek; Shen, Bairong; Vihinen, Mauno

Bioinformatics (2016), 32 (13), 2032-2034CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)

Motivation: Soly. is one of the fundamental protein properties. It is of great interest because of its relevance to protein expression. Reduced soly. and protein aggregation are also assocd. with many diseases. Results: We collected from literature the largest exptl. verified soly. affecting amino acid substitution (AAS) dataset and used it to train a predictor called PON-Sol. The predictor can distinguish both soly. decreasing and increasing variants from those not affecting soly. PON-Sol has normalized correct prediction ratio of 0.491 on cross-validation and 0.432 for independent test set. The performance of the method was compared both to soly. and aggregation predictors and found to be superior. PON-Sol can be used for the prediction of effects of disease-related substitutions, effects on heterologous recombinant protein expression and enhanced crystallizability. One application is to investigate effects of all possible AASs in a protein to aid protein engineering.

58
Broom, A.; Jacobi, Z.; Trainor, K.; Meiering, E. M. Computational Tools Help Improve Protein Stability but with a Solubility Tradeoff. J. Biol. Chem. 2017, 292, 14349– 14361, DOI: 10.1074/jbc.M117.784165

Computational tools help improve protein stability but with a solubility tradeoff

Broom, Aron; Jacobi, Zachary; Trainor, Kyle; Meiering, Elizabeth M.

Journal of Biological Chemistry (2017), 292 (35), 14349-14361CODEN: JBCHA3; ISSN:0021-9258. (American Society for Biochemistry and Molecular Biology)

Accurately predicting changes in protein stability upon amino acid substitution is a much sought after goal. Destabilizing mutations are often implicated in disease, whereas stabilizing mutations are of great value for industrial and therapeutic biotechnol. Increasing protein stability is an esp. challenging task, with random substitution yielding stabilizing mutations in only ∼2% of cases. To overcome this bottleneck, computational tools that aim to predict the effect of mutations have been developed; however, achieving accuracy and consistency remains challenging. Here, we combined 11 freely available tools into a meta-predictor (meieringlab.uwaterloo.ca/stabilitypredict/). Validation against ∼600 exptl. mutations indicated that our meta-predictor has improved performance over any of the individual tools. The meta-predictor was then used to recommend 10 mutations in a previously designed protein of moderate thermodn. stability, ThreeFoil. Exptl. characterization showed that four mutations increased protein stability and could be amplified through ThreeFoil's structural symmetry to yield several multiple mutants with >2-kcal/mol stabilization. By avoiding residues within functional ties, we could maintain ThreeFoil's glycan-binding capacity. Despite successfully achieving substantial stabilization, however, almost all mutations decreased protein soly., the most common cause of protein design failure. Examn. of the 600-mutation data set revealed that stabilizing mutations on the protein surface tend to increase hydrophobicity and that the individual tools favor this approach to gain stability. Thus, whereas currently available tools can increase protein stability and combining them into a meta-predictor yields enhanced reliability, improvements to the potentials/force fields underlying these tools are needed to avoid gaining protein stability at the cost of soly.

59
Cabantous, S.; Waldo, G. S. In Vivo and in Vitro Protein Solubility Assays Using Split GFP. Nat. Methods 2006, 3, 845– 854, DOI: 10.1038/nmeth932

In vivo and in vitro protein solubility assays using split GFP

Cabantous, Stephanie; Waldo, Geoffrey S.

Nature Methods (2006), 3 (10), 845-854CODEN: NMAEA3; ISSN:1548-7091. (Nature Publishing Group)

The rapid assessment of protein soly. is essential for evaluating expressed proteins and protein variants for use as reagents for downstream studies. Soly. screens based on antibody blots are complex and have limited screening capacity. Protein soly. screens using split β-galactosidase in vivo and in vitro can perturb protein folding. Split GFP used for monitoring protein interactions folds poorly, and to overcome this limitation, we recently developed a protein-tagging system based on self-complementing split GFP derived from an exceptionally well folded variant of GFP termed 'superfolder GFP'. Here we present the step-by-step procedure of the soly. assay using split GFP. A 15-amino-acid GFP fragment, GFP 11, is fused to a test protein. The GFP 1-10 detector fragment is expressed sep. These fragments assoc. spontaneously to form fluorescent GFP. The fragments are sol., and the GFP 11 tag has minimal effect on protein soly. and folding. We describe high-throughput protein soly. screens amenable both for in vivo and in vitro formats. The split-GFP system is composed of two vectors used in the same strain: pTET GFP 11 and pET GFP 1-10. The gene encoding the protein of interest is cloned into the pTET GFP 11 vector (resulting in an N-terminal fusion) and transformed into Escherichia coli BL21 (DE3) cells contg. the pET GFP 1-10 plasmid. We also describe how this system can be used for selecting sol. proteins from a library of variants. The large screening power of the in vivo assay combined with the high accuracy of the in vitro assay point to the efficiency of this two-step split-GFP tool for identifying sol. clones suitable for purifn. and downstream applications.

60
Niwa, T.; Ying, B.-W.; Saito, K.; Jin, W.; Takada, S.; Ueda, T.; Taguchi, H. Bimodal Protein Solubility Distribution Revealed by an Aggregation Analysis of the Entire Ensemble of Escherichia coli Proteins. Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 4201– 4206, DOI: 10.1073/pnas.0811922106

Bimodal protein solubility distribution revealed by an aggregation analysis of the entire ensemble of Escherichia coli proteins

Niwa, Tatsuya; Ying, Bei-Wen; Saito, Katsu; Jin, Wen Zhen; Takada, Shoji; Ueda, Takuya; Taguchi, Hideki

Proceedings of the National Academy of Sciences of the United States of America (2009), 106 (11), 4201-4206CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)

Protein folding often competes with intermol. aggregation, which in most cases irreversibly impairs protein function, as exemplified by the formation of inclusion bodies. Although it has been empirically detd. that some proteins tend to aggregate, the relationship between the protein aggregation propensities and the primary sequences remains poorly understood. Here, the authors individually synthesized the entire ensemble of Escherichia coli proteins by using an in vitro reconstituted translation system and analyzed the aggregation propensities. Because the reconstituted translation system is chaperone-free, they could evaluate the inherent aggregation propensities of thousands of proteins in a translation-coupled manner. A histogram of the solubilities, based on data from 3,173 translated proteins, revealed a clear bimodal distribution, indicating that the aggregation propensities are not evenly distributed across a continuum. Instead, the proteins can be categorized into 2 groups, sol. and aggregation-prone proteins. The aggregation propensity is most prominently correlated with the structural classification of proteins, implying that the prediction of aggregation propensity requires structural information about the protein.

61
Eijsink, V. G.; Vriend, G.; van den Burg, B.; van der Zee, J. R.; Veltman, O. R.; Stulp, B. K.; Venema, G. Introduction of a Stabilizing 10 Residue Beta-Hairpin in Bacillus Subtilis Neutral Protease. Protein Eng., Des. Sel. 1992, 5, 157– 163, DOI: 10.1093/protein/5.2.157

62
Lee, C.; Levitt, M. Accurate Prediction of the Stability and Activity Effects of Site-Directed Mutagenesis on a Protein Core. Nature 1991, 352, 448– 451, DOI: 10.1038/352448a0

Accurate prediction of the stability and activity effects of site-directed mutagenesis on a protein core

Lee, Christopher; Levitt, Michael

Nature (London, United Kingdom) (1991), 352 (6334), 448-51CODEN: NATUAS; ISSN:0028-0836.

Theor. prediction of the structure, stability and activity of proteins, an important unsolved problem in mol. biol., would be of use for guiding site-directed mutagenesis and other protein-engineering techniques. X-ray diffraction studies have provided extensive structural information for many proteins, challenging theorists to develop reliable techniques able to use such knowledge as a base for prediction of mutants' characteristics. Here theor. calcn. of stabilization energies is reported for 78 triple-site sequence variants of λ repressor characterized exptl. The calcd. energies correlate with the mutants' measured activities; active and inactive mutations are discriminated with 92% reliability. They correlate even more directly with the mutant's thermostabilities, correctly identifying two of the mutants to be more stable than the wild type.

63
Buß, O.; Muller, D.; Jager, S.; Rudat, J.; Rabe, K. S. Improvement in the Thermostability of a β-Amino Acid Converting ω-Transaminase by Using FoldX. ChemBioChem 2018, 19, 379– 387, DOI: 10.1002/cbic.201700467

Improvement in the Thermostability of a β-Amino Acid Converting ω-Transaminase by Using FoldX

Buss, Oliver; Muller, Delphine; Jager, Sven; Rudat, Jens; Rabe, Kersten S.

ChemBioChem (2018), 19 (4), 379-387CODEN: CBCHFX; ISSN:1439-4227. (Wiley-VCH Verlag GmbH & Co. KGaA)

ω-Transaminases (ω-TAs) are important biocatalysts for the synthesis of active, chiral pharmaceutical ingredients contg. amino groups, such as β-amino acids, which are important in peptidomimetics and as building blocks for drugs. However, the application of ω-TAs is limited by the availability and stability of enzymes with high conversion rates. One strategy for the synthesis and optical resoln. of β-phenylalanine and other important arom. β-amino acids is biotransformation by utilizing an ω-transaminase from Variovorax paradoxus. We designed variants of this ω-TA to gain higher process stability on the basis of predictions calcd. by using the FoldX software. We herein report the first thermostabilization of a nonthermostable S-selective ω-TA by FoldX-guided site-directed mutagenesis. The m.p. (Tm) of our best-performing mutant was increased to 59.3 °C, an increase of 4.0 °C relative to the Tm value of the wild-type enzyme, whereas the mutant fully retained its specific activity.

64
Modarres, H. P.; Mofrad, M. R.; Sanati-Nezhad, A. Protein Thermostability Engineering. RSC Adv. 2016, 6, 115252– 115270, DOI: 10.1039/C6RA16992A

Protein thermostability engineering

Modarres, H. Pezeshgi; Mofrad, M. R.; Sanati-Nezhad, A.

RSC Advances (2016), 6 (116), 115252-115270CODEN: RSCACL; ISSN:2046-2069. (Royal Society of Chemistry)

The use of enzymes for industrial and biomedical applications is limited to their function at elevated temps. The principles of thermostability engineering need to be implemented for proteins with low thermal stability to broaden their applications. Therefore, understanding the thermal stability modulating factors of proteins is necessary for engineering their thermostability. In this review, first different thermostability enhancing strategies in both the sequence and structure levels, discovered by studying the natural proteins adapted to different conditions, are introduced. Next, the progress in the development of various computational methods to engineer thermostability of proteins by learning from nature and introducing several popular tools and algorithms for protein thermostability engineering is highlighted. Further discussion includes the challenges in the field of protein thermostability engineering such as the protein stability-activity trade-off. Finally, how thermostability engineering could be instrumental for the design of protein drugs for biomedical applications is demonstrated.

65
Pace, C. N.; Scholtz, J. M.; Grimsley, G. R. Forces Stabilizing Proteins. FEBS Lett. 2014, 588, 2177– 2184, DOI: 10.1016/j.febslet.2014.05.006

66
Lazaridis, T.; Karplus, M. Effective Energy Functions for Protein Structure Prediction. Curr. Opin. Struct. Biol. 2000, 10, 139– 145, DOI: 10.1016/S0959-440X(00)00063-4

Effective energy functions for protein structure prediction

Lazaridis, Themis; Karplus, Martin

Current Opinion in Structural Biology (2000), 10 (2), 139-145CODEN: COSBEF; ISSN:0959-440X. (Elsevier Science Ltd.)

A review, with 78 refs. Protein structure prediction, fold recognition, homol. modeling and design rely mainly on statistical effective energy functions. Although the theor. foundation of such functions is not clear, their usefulness has been demonstrated in many applications. Mol. mechanics force fields, particularly when augmented by implicit solvation models, provide phys. effective energy functions that are beginning to play a role in this area.

67
Seeliger, D.; de Groot, B. L. Protein Thermostability Calculations Using Alchemical Free Energy Simulations. Biophys. J. 2010, 98, 2309– 2316, DOI: 10.1016/j.bpj.2010.01.051

Protein thermostability calculations using alchemical free energy simulations

Seeliger, Daniel; de Groot, Bert L.

Biophysical Journal (2010), 98 (10), 2309-2316CODEN: BIOJAU; ISSN:0006-3495. (Cell Press)

Thermal stability of proteins is crucial for both biotechnol. and therapeutic applications. Rational protein engineering therefore frequently aims at increasing thermal stability by introducing stabilizing mutations. The accurate prediction of the thermodn. consequences caused by mutations, however, is highly challenging as thermal stability changes are caused by alterations in the free energy of folding. Growing computational power, however, increasingly allows us to use alchem. free energy simulations, such as free energy perturbation or thermodn. integration, to calc. free energy differences with relatively high accuracy. In this article, we present an automated protocol for setting up alchem. free energy calcns. for mutations of naturally occurring amino acids (except for proline) that allows an unprecedented, automated screening of large mutant libraries. To validate the developed protocol, we calcd. thermodn. stability differences for 109 mutations in the microbial RNase Barnase. The obtained quant. agreement with exptl. data illustrates the potential of the approach in protein engineering and design.

68
Zhang, Z.; Wang, L.; Gao, Y.; Zhang, J.; Zhenirovskyy, M.; Alexov, E. Predicting Folding Free Energy Changes upon Single Point Mutations. Bioinformatics 2012, 28, 664– 671, DOI: 10.1093/bioinformatics/bts005

Predicting folding free energy changes upon single point mutations

Zhang, Zhe; Wang, Lin; Gao, Yang; Zhang, Jie; Zhenirovskyy, Maxim; Alexov, Emil

Bioinformatics (2012), 28 (5), 664-671CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)

Motivation: The folding free energy is an important characteristic of protein stability and is directly related to a protein's wild-type function. Changes in a protein's stability due to naturally occurring missense mutations typically cause diseases. Single point mutations made in vitro are frequently used to assess the contribution of a given amino acid to the stability of the protein. In both cases, it is desirable to predict the change of the folding free energy upon single point mutations in order to either provide insights into the mol. mechanism of the change or to design new exptl. studies. Results: We report an approach that predicts the free energy change upon single point mutation by utilizing the 3D structure of the wild-type protein. It is based on a variation of the mol. mechanics Generalized Born (MMGB) method, scaled with optimized parameters (sMMGB) and utilizing a specific model of the unfolded state. The corresponding mutations are built in silico and the predictions are tested against a large dataset of 1109 mutations with exptl. measured changes of the folding free energy. Benchmarking resulted in a root mean square deviation of 1.78 kcal/mol, and the slope of the linear regression fit between the exptl. data and the calcns. was 1.04. The sMMGB is compared with other leading methods of predicting folding free energy changes upon single mutations and the results are discussed with respect to various parameters. Availability: All the pdb files the authors used in this article can be downloaded from http://compbio.clemson.edu/downloadDir/mentaldisorders/sMMGBpdb.rar.

69
Wickstrom, L.; Gallicchio, E.; Levy, R. M. The Linear Interaction Energy Method for the Prediction of Protein Stability Changes Upon Mutation. Proteins: Struct., Funct., Genet. 2012, 80, 111– 125, DOI: 10.1002/prot.23168

The linear interaction energy method for the prediction of protein stability changes upon mutation

Wickstrom, Lauren; Gallicchio, Emilio; Levy, Ronald M.

Proteins: Structure, Function, and Bioinformatics (2012), 80 (1), 111-125CODEN: PSFBAF. (Wiley-Liss, Inc.)

The coupling of protein energetics and sequence changes is a crit. aspect of computational protein design, as well as for the understanding of protein evolution, human disease, and drug resistance. To study the mol. basis for this coupling, computational tools must be sufficiently accurate and computationally inexpensive enough to handle large amts. of sequence data. We have developed a computational approach based on the linear interaction energy (LIE) approxn. to predict the changes in the free-energy of the native state induced by a single mutation. This approach was applied to a set of 822 mutations in 10 proteins which resulted in an av. unsigned error of 0.82 kcal/mol and a correlation coeff. of 0.72 between the calcd. and exptl. ΔΔG values. The method is able to accurately identify destabilizing hot spot mutations; however, it has difficulty in distinguishing between stabilizing and destabilizing mutations because of the distribution of stability changes for the set of mutations used to parameterize the model. In addn., the model also performs quite well in initial tests on a small set of double mutations. On the basis of these promising results, we can begin to examine the relationship between protein stability and fitness, correlated mutations, and drug resistance.

70
Guerois, R.; Nielsen, J. E.; Serrano, L. Predicting Changes in the Stability of Proteins and Protein Complexes: A Study of More than 1000 Mutations. J. Mol. Biol. 2002, 320, 369– 387, DOI: 10.1016/S0022-2836(02)00442-4

Predicting changes in the stability of proteins and protein complexes: A study of more than 1000 mutations

Guerois, Raphael; Nielsen, Jens Erik; Serrano, Luis

Journal of Molecular Biology (2002), 320 (2), 369-387CODEN: JMOBAK; ISSN:0022-2836. (Elsevier Science Ltd.)

We have developed a computer algorithm, FOLDEF (for FOLD-X energy function), to provide a fast and quant. estn. of the importance of the interactions contributing to the stability of proteins and protein complexes. The predictive power of FOLDEF was tested on a very large set of point mutants (1088 mutants) spanning most of the structural environments found in proteins. FOLDEF uses a full at. description of the structure of the proteins. The different energy terms taken into account in FOLDEF have been weighted using empirical data obtained from protein engineering expts. First, we considered a training database of 339 mutants in nine different proteins and optimized the set of parameters and weighting factors that best accounted for the changes in stability of the mutants. The predictive power of the method was then tested using a blind test mutant database of 667 mutants, as well as a database of 82 protein-protein complex mutants. The global correlation obtained for 95% of the entire mutant database (1030 mutants) is 0.83 with a std. deviation of 0.81 kcal mol⁻¹ and a slope of 0.76. The present energy function uses a min. of computational resources and can therefore easily be used in protein design algorithms, and in the field of protein structure and folding pathways prediction where one requires a fast and accurate energy function.

71
Mendes, J.; Guerois, R.; Serrano, L. Energy Estimation in Protein Design. Curr. Opin. Struct. Biol. 2002, 12, 441– 446, DOI: 10.1016/S0959-440X(02)00345-7

Energy estimation in protein design

Mendes, Joaquim; Guerois, Raphael; Serrano, Luis

Current Opinion in Structural Biology (2002), 12 (4), 441-446CODEN: COSBEF; ISSN:0959-440X. (Elsevier Science Ltd.)

A review. The progress achieved by several groups in the field of computational protein design shows that successful design methods include two major features: efficient algorithms to deal with the combinatorial exploration of sequence space and optimal energy functions to rank sequences according to their fitness for the given fold.

72
Dehouck, Y.; Gilis, D.; Rooman, M. A New Generation of Statistical Potentials for Proteins. Biophys. J. 2006, 90, 4010–4017, DOI: 10.1529/biophysj.105.079434

A new generation of statistical potentials for proteins

Dehouck, Y.; Gilis, D.; Rooman, M.

Biophysical Journal (2006), 90 (11), 4010-4017. CODEN: BIOJAU; ISSN: 0006-3495. (Biophysical Society)

We propose a novel and flexible derivation scheme of statistical, database-derived, potentials, which allows one to take simultaneously into account specific correlations between several sequence and structure descriptors. This scheme leads to the decompn. of the total folding free energy of a protein into a sum of lower order terms, thereby giving the possibility to analyze independently each contribution and clarify its significance and importance, to avoid overcounting certain contributions, and to deal more efficiently with the limited size of the database. In addn., this derivation scheme appears as quite general, for many previously developed potentials can be expressed as particular cases of our formalism. We use this formalism as a framework to generate different residue-based energy functions, whose performances are assessed on the basis of their ability to discriminate genuine proteins from decoy models. The optimal potential is generated as a combination of several coupling terms, measuring correlations between residue types, backbone torsion angles, solvent accessibilities, relative positions along the sequence, and interresidue distances. This potential outperforms all tested residue-based potentials, and even several atom-based potentials. Its incorporation in algorithms aiming at predicting protein structure and stability should therefore substantially improve their performances.

73
Dehouck, Y.; Kwasigroch, J. M.; Gilis, D.; Rooman, M. PoPMuSiC 2.1: A Web Server for the Estimation of Protein Stability Changes upon Mutation and Sequence Optimality. BMC Bioinf. 2011, 12, 151, DOI: 10.1186/1471-2105-12-151

PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality

Dehouck, Yves; Kwasigroch, Jean Marc; Gilis, Dimitri; Rooman, Marianne

BMC Bioinformatics (2011), 12, 151.

BACKGROUND: The rational design of modified proteins with controlled stability is of extreme importance in a whole range of applications, notably in the biotechnological and environmental areas, where proteins are used for their catalytic or other functional activities. Future breakthroughs in medical research may also be expected from an improved understanding of the effect of naturally occurring disease-causing mutations on the molecular level. RESULTS: PoPMuSiC-2.1 is a web server that predicts the thermodynamic stability changes caused by single site mutations in proteins, using a linear combination of statistical potentials whose coefficients depend on the solvent accessibility of the mutated residue. PoPMuSiC presents good prediction performances (correlation coefficient of 0.8 between predicted and measured stability changes, in cross validation, after exclusion of 10% outliers). It is moreover very fast, allowing the prediction of the stability changes resulting from all possible mutations in a medium size protein in less than a minute. This unique functionality is user-friendly implemented in PoPMuSiC and is particularly easy to exploit. Another new functionality of our server concerns the estimation of the optimality of each amino acid in the sequence, with respect to the stability of the structure. It may be used to detect structural weaknesses, i.e. clusters of non-optimal residues, which represent particularly interesting sites for introducing targeted mutations. This sequence optimality data is also expected to have significant implications in the prediction and the analysis of particular structural or functional protein regions. To illustrate the interest of this new functionality, we apply it to a dataset of known catalytic sites, and show that a much larger than average concentration of structural weaknesses is detected, quantifying how these sites have been optimized for function rather than stability. 
CONCLUSION: The freely available PoPMuSiC-2.1 web server is highly useful for identifying very rapidly a list of possibly relevant mutations with the desired stability properties, on which subsequent experimental studies can be focused. It can also be used to detect sequence regions corresponding to structural weaknesses, which could be functionally important or structurally delicate regions, with obvious applications in rational protein design.

74
Liu, H. On Statistical Energy Functions for Biomolecular Modeling and Design. Quant. Biol. 2015, 3, 157–167, DOI: 10.1007/s40484-015-0054-x

On statistical energy functions for biomolecular modeling and design

Liu, Haiyan

Quantitative Biology (2015), 3 (4), 157-167. CODEN: QBUIA3; ISSN: 2095-4697. (Springer GmbH)

Statistical energy functions are general models about at. or residue-level interactions in biomols., derived from existing exptl. data. They provide quant. foundations for structural modeling as well as for structure-based protein sequence design. Statistical energy functions can be derived computationally either based on statistical distributions or based on variational assumptions. We present overviews on the theor. assumptions underlying the various types of approaches. Theor. considerations underlying important pragmatic choices are discussed.

75
Kumar, M. D. S.; Bava, K. A.; Gromiha, M. M.; Prabakaran, P.; Kitajima, K.; Uedaira, H.; Sarai, A. ProTherm and ProNIT: Thermodynamic Databases for Proteins and Protein–Nucleic Acid Interactions. Nucleic Acids Res. 2006, 34, D204–206, DOI: 10.1093/nar/gkj103

ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions

Kumar, M. D. Shaji; Bava, K. Abdulla; Gromiha, M. Michael; Prabakaran, Ponraj; Kitajima, Koji; Uedaira, Hatsuho; Sarai, Akinori

Nucleic Acids Research (2006), 34 (Database), D204-D206. CODEN: NARHAD; ISSN: 0305-1048. (Oxford University Press)

ProTherm and ProNIT are two thermodn. databases that contain exptl. detd. thermodn. parameters of protein stability and protein-nucleic acid interactions, resp. The current versions of both the databases have considerably increased the total no. of entries and enhanced search interface with added new fields, improved search, display and sorting options. As of Sept. 2005, ProTherm release 5.0 contains 17 113 entries from 771 proteins, retrieved from 1497 scientific articles (∼20% increase in data from the previous version). ProNIT release 2.0 contains 4900 entries from 273 research articles, representing 158 proteins. Both databases can be queried using WWW interfaces. Both quick search and advanced search are provided on this web page to facilitate easy retrieval and display of the data from these databases. ProTherm is freely available online at http://gibk26.bse.kyutech.ac.jp/jouhou/Protherm/protherm.html and ProNIT at http://gibk26.bse.kyutech.ac.jp/jouhou/pronit/pronit.html.

76
Pucci, F.; Bourgeas, R.; Rooman, M. High-Quality Thermodynamic Data on the Stability Changes of Proteins Upon Single-Site Mutations. J. Phys. Chem. Ref. Data 2016, 45, 023104, DOI: 10.1063/1.4947493

High-quality Thermodynamic Data on the Stability Changes of Proteins Upon Single-site Mutations

Pucci, Fabrizio; Bourgeas, Raphael; Rooman, Marianne

Journal of Physical and Chemical Reference Data (2016), 45 (2), 023104/1-023104/53. CODEN: JPCRBU; ISSN: 0047-2689. (American Institute of Physics)

We have set up and manually curated a dataset contg. exptl. information on the impact of amino acid substitutions in a protein on its thermal stability. It consists of a repository of exptl. measured melting temps. (Tm) and their changes upon point mutations (ΔTm) for proteins having a well-resolved x-ray structure. This high-quality dataset is designed for being used for the training or benchmarking of in silico thermal stability prediction methods. It also reports other exptl. measured thermodn. quantities when available, i.e., the folding enthalpy (ΔH) and heat capacity (ΔCP) of the wild type proteins and their changes upon mutations (ΔΔH and ΔΔCP), as well as the change in folding free energy (ΔΔG) at a ref. temp. These data are analyzed in view of improving our insights into the correlation between thermal and thermodn. stabilities, the asymmetry between the no. of stabilizing and destabilizing mutations, and the difference in stabilization potential of thermostable vs. mesostable proteins. (c) 2016 American Institute of Physics.

77
Potapov, V.; Cohen, M.; Schreiber, G. Assessing Computational Methods for Predicting Protein Stability upon Mutation: Good on Average but Not in the Details. Protein Eng., Des. Sel. 2009, 22, 553–560, DOI: 10.1093/protein/gzp030

Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details

Potapov, Vladimir; Cohen, Mati; Schreiber, Gideon

Protein Engineering, Design & Selection (2009), 22 (9), 553-560. CODEN: PEDSBR; ISSN: 1741-0126. (Oxford University Press)

Methods for protein modeling and design advanced rapidly in recent years. At the heart of these computational methods is an energy function that calcs. the free energy of the system. Many of these functions were also developed to est. the consequence of mutation on protein stability or binding affinity. In the current study, the authors chose 6 different methods that were previously reported as being able to predict the change in protein stability (ΔΔG) upon mutation: CC/PBSA, EGAD, FoldX, I-Mutant2.0, Rosetta and Hunter. The authors evaluated their performance on a large set of 2156 single mutations, avoiding for each program the mutations used for training. The correlation coeffs. between exptl. and predicted ΔΔG values were in the range of 0.59 for the best and 0.26 for the worst performing method. All the tested computational methods showed a correct trend in their predictions, but failed in providing the precise values. This is not due to lack in precision of the exptl. data, which showed a correlation coeff. of 0.86 between different measurements. Combining the methods did not significantly improve prediction accuracy compared to a single method. These results suggest that there is still room for improvement, which is crucial if we want forcefields to perform better in their various tasks.

78
Schymkowitz, J.; Borg, J.; Stricher, F.; Nys, R.; Rousseau, F.; Serrano, L. The FoldX Web Server: An Online Force Field. Nucleic Acids Res. 2005, 33, W382–388, DOI: 10.1093/nar/gki387

The FoldX web server: an online force field

Schymkowitz, Joost; Borg, Jesper; Stricher, Francois; Nys, Robby; Rousseau, Frederic; Serrano, Luis

Nucleic Acids Research (2005), 33 (Web Server), W382-W388. CODEN: NARHAD; ISSN: 0305-1048. (Oxford University Press)

FoldX is an empirical force field that was developed for the rapid evaluation of the effect of mutations on the stability, folding and dynamics of proteins and nucleic acids. The core functionality of FoldX, namely the calcn. of the free energy of a macromol. based on its high-resoln. 3D structure, is now publicly available through a web server at http://foldx.embl.de/. The current release allows the calcn. of the stability of a protein, calcn. of the positions of the protons and the prediction of water bridges, prediction of metal binding sites and the anal. of the free energy of complex formation. Alanine scanning, the systematic truncation of side chains to alanine, is also included. In addn., some reporting functions have been added, and it is now possible to print both the at. interaction networks that constitute the protein, print the structural and energetic details of the interactions per atom or per residue, as well as generate a general quality report of the pdb structure. This core functionality will be further extended as more FoldX applications are developed.

79
Kepp, K. P. Towards a “Golden Standard” for Computing Globin Stability: Stability and Structure Sensitivity of Myoglobin Mutants. Biochim. Biophys. Acta, Proteins Proteomics 2015, 1854, 1239–1248, DOI: 10.1016/j.bbapap.2015.06.002

Towards a "Golden Standard" for computing globin stability: Stability and structure sensitivity of myoglobin mutants

Kepp, Kasper P.

Biochimica et Biophysica Acta, Proteins and Proteomics (2015), 1854 (10, Part A), 1239-1248. CODEN: BBAPBW; ISSN: 1570-9639. (Elsevier B. V.)

Fast and accurate computation of protein stability is increasingly important for e.g. protein engineering and protein misfolding diseases, but no consensus methods exist for important proteins such as globins, and performance may depend on the type of structural input given. This paper reports benchmarking of six protein stability calculators (POPMUSIC 2.1, I-Mutant 2.0, I-Mutant 3.0, CUPSAT, SDM, and mCSM) against 134 exptl. stability changes for mutations of sperm-whale myoglobin. Six different high-resoln. structures were used to test structure sensitivity that may impair protein calcns. The trend accuracy of the methods decreased as I-Mutant 2.0 (R = 0.64 - 0.65), SDM (R = 0.57 - 0.60), POPMUSIC2.1 (R = 0.54 - 0.57), I-Mutant 3.0 (R = 0.53 - 0.55), mCSM (R = 0.35 - 0.47), and CUPSAT (R = 0.25 - 0.48). The mean signed errors increased as SDM < CUPSAT < I-Mutant 2.0 < I-Mutant 3.0 < POPMUSIC 2.1 < mCSM. Mean abs. errors increased as I-Mutant 2.0 < I-Mutant 3.0 < POPMUSIC 2.1 < CUPSAT < SDM < mCSM. Structural sensitivity increased as I-Mutant 3.0 (0.05) < I-Mutant 2.0 (0.09) < POPMUSIC 2.1 (0.12) < SDM (0.18) < mCSM (0.27) < CUPSAT (0.58). Leaving out heterogeneous exptl. data did not change conclusions. The distinct performances reveal room for improvement, but I-Mutant 2.0 is proficient for this purpose, as further validated against a data set of related cytochrome c like proteins. The results also emphasize the importance of high-quality crystal structures and reveal structure-dependent effects even in the near-at. resoln. limit.

80
Christensen, N. J.; Kepp, K. P. Accurate Stabilities of Laccase Mutants Predicted with a Modified FoldX Protocol. J. Chem. Inf. Model. 2012, 52, 3028–3042, DOI: 10.1021/ci300398z

Accurate Stabilities of Laccase Mutants Predicted with a Modified FoldX Protocol

Christensen, Niels J.; Kepp, Kasper P.

Journal of Chemical Information and Modeling (2012), 52 (11), 3028-3042. CODEN: JCISD8; ISSN: 1549-9596. (American Chemical Society)

Fungal laccases are multicopper enzymes of industrial importance due to their high stability, multifunctionality, and oxidizing power. This paper reports computational protocols that quantify the relative stability (ΔΔG of folding) of mutants of high-redox-potential laccases (TvLIIIb and PM1L) with up to 11 simultaneously mutated sites with good correlation against exptl. stability trends. Mol. dynamics simulations of the two laccases show that FoldX is very structure-sensitive, since all mutants and the wild type must share structural configuration to avoid artifacts of local sampling. However, using the av. of 50 MD snapshots of the equilibrated trajectories restores correlation (r ∼ 0.7-0.9, r2 ∼ 0.49-0.81) and provides a root-mean-square accuracy of ∼1.2 kcal/mol for ΔΔG or 3.5 °C for T50, suggesting that the time-av. of the crystal structure is recovered. MD-averaged input also reduces the spread in ΔΔG, suggesting that local FoldX sampling overestimates free energy changes because of neglected protein relaxation. FoldX can be viewed as a simple "linear interaction energy" method using sampling of the wild type and mutant and a parametrized relative free energy function: Thus, we show in this work that a substantial "hysteresis" of ∼1 kcal/mol applies to FoldX, and that an improved protocol that reverses calcns. and uses the av. obtained ΔΔG enhances correlation with the exptl. data. As glycosylation is ignored in FoldX, its effect on ΔΔG must be additive to the amino acid mutations. Quant. structure-property relationships of the FoldX energy components produced a substantially improved laccase stability predictor with errors of ∼1 °C for T50, vs 3-5 °C for a std. FoldX protocol. The developed model provides insight into the phys. forces governing the high stability of fungal laccases, most notably the hydrophobic and van der Waals interactions in the folded state, which provide most of the predictive power.

81
MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiórkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102, 3586–3616, DOI: 10.1021/jp973084f

All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins

MacKerell, A. D., Jr.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E., III; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M.

Journal of Physical Chemistry B (1998), 102 (18), 3586-3616. CODEN: JPCBFK; ISSN: 1089-5647. (American Chemical Society)

New protein parameters are reported for the all-atom empirical energy function in the CHARMM program. The parameter evaluation was based on a self-consistent approach designed to achieve a balance between the internal (bonding) and interaction (nonbonding) terms of the force field and among the solvent-solvent, solvent-solute, and solute-solute interactions. Optimization of the internal parameters used exptl. gas-phase geometries, vibrational spectra, and torsional energy surfaces supplemented with ab initio results. The peptide backbone bonding parameters were optimized with respect to data for N-methylacetamide and the alanine dipeptide. The interaction parameters, particularly the at. charges, were detd. by fitting ab initio interaction energies and geometries of complexes between water and model compds. that represented the backbone and the various side chains. In addn., dipole moments, exptl. heats and free energies of vaporization, solvation and sublimation, mol. vols., and crystal pressures and structures were used in the optimization. The resulting protein parameters were tested by applying them to noncyclic tripeptide crystals, cyclic peptide crystals, and the proteins crambin, bovine pancreatic trypsin inhibitor, and carbonmonoxy myoglobin in vacuo and in a crystal. A detailed anal. of the relationship between the alanine dipeptide potential energy surface and calcd. protein φ, χ angles was made and used in optimizing the peptide group torsional parameters. The results demonstrate that use of ab initio structural and energetic data by themselves are not sufficient to obtain an adequate backbone representation for peptides and proteins in soln. and in crystals. Extensive comparisons between mol. dynamics simulation and exptl. data for polypeptides and proteins were performed for both structural and dynamic properties. Calcd. data from energy minimization and dynamics simulations for crystals demonstrate that the latter are needed to obtain meaningful comparisons with exptl. crystal structures. The presented parameters, in combination with the previously published CHARMM all-atom parameters for nucleic acids and lipids, provide a consistent set for condensed-phase simulations of a wide variety of mols. of biol. interest.

82
Oostenbrink, C.; Villa, A.; Mark, A. E.; van Gunsteren, W. F. A Biomolecular Force Field Based on the Free Enthalpy of Hydration and Solvation: The GROMOS Force-Field Parameter Sets 53A5 and 53A6. J. Comput. Chem. 2004, 25, 1656–1676, DOI: 10.1002/jcc.20090

A biomolecular force field based on the free enthalpy of hydration and solvation: The GROMOS force-field parameter sets 53A5 and 53A6

Oostenbrink, Chris; Villa, Alessandra; Mark, Alan E.; van Gunsteren, Wilfred F.

Journal of Computational Chemistry (2004), 25 (13), 1656-1676. CODEN: JCCHDD; ISSN: 0192-8651. (John Wiley & Sons, Inc.)

Successive parameterizations of the GROMOS force field have been used successfully to simulate biomol. systems over a long period of time. The continuing expansion of computational power with time makes it possible to compute ever more properties for an increasing variety of mol. systems with greater precision. This has led to recurrent parameterizations of the GROMOS force field all aimed at achieving better agreement with exptl. data. Here we report the results of the latest, extensive reparameterization of the GROMOS force field. In contrast to the parameterization of other biomol. force fields, this parameterization of the GROMOS force field is based primarily on reproducing the free enthalpies of hydration and apolar solvation for a range of compds. This approach was chosen because the relative free enthalpy of solvation between polar and apolar environments is a key property in many biomol. processes of interest, such as protein folding, biomol. assocn., membrane formation, and transport over membranes. The newest parameter sets, 53A5 and 53A6, were optimized by first fitting to reproduce the thermodn. properties of pure liqs. of a range of small polar mols. and the solvation free enthalpies of amino acid analogs in cyclohexane (53A5). The partial charges were then adjusted to reproduce the hydration free enthalpies in water (53A6). Both parameter sets are fully documented, and the differences between these and previous parameter sets are discussed.

83
Alford, R. F.; Leaver-Fay, A.; Jeliazkov, J. R.; O’Meara, M. J.; DiMaio, F. P.; Park, H.; Shapovalov, M. V.; Renfrew, P. D.; Mulligan, V. K.; Kappel, K.; Labonte, J. W.; Pacella, M. S.; Bonneau, R.; Bradley, P.; Dunbrack, R. L.; Das, R.; Baker, D.; Kuhlman, B.; Kortemme, T.; Gray, J. J. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J. Chem. Theory Comput. 2017, 13, 3031–3048, DOI: 10.1021/acs.jctc.7b00125

The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design

Alford, Rebecca F.; Leaver-Fay, Andrew; Jeliazkov, Jeliazko R.; O'Meara, Matthew J.; DiMaio, Frank P.; Park, Hahnbeom; Shapovalov, Maxim V.; Renfrew, P. Douglas; Mulligan, Vikram K.; Kappel, Kalli; Labonte, Jason W.; Pacella, Michael S.; Bonneau, Richard; Bradley, Philip; Dunbrack, Roland L.; Das, Rhiju; Baker, David; Kuhlman, Brian; Kortemme, Tanja; Gray, Jeffrey J.

Journal of Chemical Theory and Computation (2017), 13 (6), 3031-3048. CODEN: JCTCCE; ISSN: 1549-9618. (American Chemical Society)

A review. Over the past decade, the Rosetta biomol. modeling suite has informed diverse biol. questions and engineering challenges ranging from interpretation of low-resoln. structural data to design of nanomaterials, protein therapeutics, and vaccines. Central to Rosetta's success is the energy function: a model parameterized from small mol. and x-ray crystal structure data used to approx. the energy assocd. with each biomol. conformation. This paper describes the math. models and phys. concepts that underlie the latest Rosetta Energy Function, REF15. Applying these concepts, the authors explain how to use Rosetta energies to identify and analyze the features of biomol. models. Finally, the authors discuss the latest advances in the energy function that extend capabilities from sol. proteins to also include membrane proteins, peptides contg. noncanonical amino acids, small mols., carbohydrates, nucleic acids, and other macromols.

84
Davey, J. A.; Damry, A. M.; Euler, C. K.; Goto, N. K.; Chica, R. A. Prediction of Stable Globular Proteins Using Negative Design with Non-Native Backbone Ensembles. Structure 2015, 23, 2011–2021, DOI: 10.1016/j.str.2015.07.021

Prediction of Stable Globular Proteins Using Negative Design with Non-native Backbone Ensembles

Davey, James A.; Damry, Adam M.; Euler, Christian K.; Goto, Natalie K.; Chica, Roberto A.

Structure (Oxford, United Kingdom) (2015), 23 (11), 2011-2021. CODEN: STRUE6; ISSN: 0969-2126. (Elsevier Ltd.)

Accurate predictions of protein stability have great potential to accelerate progress in computational protein design, yet the correlation of predicted and exptl. detd. stabilities remains a significant challenge. To address this problem, we have developed a computational framework based on neg. multistate design in which sequence energy is evaluated in the context of both native and non-native backbone ensembles. This framework was validated exptl. with the design of ten variants of streptococcal protein G domain β1 that retained the wild-type fold, and showed a very strong correlation between predicted and exptl. stabilities (R2 = 0.86). When applied to four different proteins spanning a range of fold types, similarly strong correlations were also obtained. Overall, the enhanced prediction accuracies afforded by this method pave the way for new strategies to facilitate the generation of proteins with novel functions by computational protein design.

85
Ó Conchúir, S.; Barlow, K. A.; Pache, R. A.; Ollikainen, N.; Kundert, K.; O’Meara, M. J.; Smith, C. A.; Kortemme, T. A Web Resource for Standardized Benchmark Datasets, Metrics, and Rosetta Protocols for Macromolecular Modeling and Design. PLoS One 2015, 10, e0130433, DOI: 10.1371/journal.pone.0130433

86
Trainor, K.; Broom, A.; Meiering, E. M. Exploring the Relationships between Protein Sequence, Structure and Solubility. Curr. Opin. Struct. Biol. 2017, 42, 136–146, DOI: 10.1016/j.sbi.2017.01.004

Exploring the relationships between protein sequence, structure and solubility

Trainor, Kyle; Broom, Aron; Meiering, Elizabeth M.

Current Opinion in Structural Biology (2017), 42, 136-146. CODEN: COSBEF; ISSN: 0959-440X. (Elsevier Ltd.)

A review. Aggregation can be thought of as a form of protein folding in which intermol. assocns. lead to the formation of large, insol. assemblies. Various types of aggregates can be differentiated by their internal structures and gross morphologies (e.g., fibrillar or amorphous), and the ability to accurately predict the likelihood of their formation by a given polypeptide is of great practical utility in the fields of biol. (including the study of disease), biotechnol., and biomaterials research. Here we review aggregation/soly. prediction methods and selected applications thereof. The development of increasingly sophisticated methods that incorporate knowledge of conformations possibly adopted by aggregating polypeptide monomers and predict the internal structure of aggregates is improving the accuracy of the predictions and continually expanding the range of applications.

87
Das, R. Four Small Puzzles That Rosetta Doesn’t Solve. PLoS One 2011, 6, e20044, DOI: 10.1371/journal.pone.0020044

Four small puzzles that Rosetta doesn't solve

Das, Rhiju

PLoS One (2011), 6 (5), e20044. CODEN: POLNCL; ISSN: 1932-6203. (Public Library of Science)

A complete macromol. modeling package must be able to solve the simplest structure prediction problems. Despite recent successes in high resoln. structure modeling and design, the Rosetta software suite fares poorly on small protein and RNA puzzles, some as small as four residues. To illustrate these problems, this manuscript presents Rosetta results for four well-defined test cases: the 20-residue mini-protein Trp cage, an even smaller disulfide-stabilized conotoxin, the reactive loop of a serine protease inhibitor, and a UUCG RNA tetraloop. In contrast to previous Rosetta studies, several lines of evidence indicate that conformational sampling is not the major bottleneck in modeling these small systems. Instead, approxns. and omissions in the Rosetta all-atom energy function currently preclude discriminating exptl. obsd. conformations from de novo models at at. resoln. These mol. "puzzles" should serve as useful model systems for developers wishing to make foundational improvements to this powerful modeling suite.

88
Kellogg, E. H.; Leaver-Fay, A.; Baker, D. Role of Conformational Sampling in Computing Mutation-Induced Changes in Protein Structure and Stability. Proteins: Struct., Funct., Genet. 2011, 79, 830–838, DOI: 10.1002/prot.22921

Role of conformational sampling in computing mutation-induced changes in protein structure and stability

Kellogg, Elizabeth H.; Leaver-Fay, Andrew; Baker, David

Proteins: Structure, Function, and Bioinformatics (2011), 79 (3), 830-838. CODEN: PSFBAF. (Wiley-Liss, Inc.)

The prediction of changes in protein stability and structure resulting from single amino acid substitutions is both a fundamental test of macromol. modeling methodol. and an important current problem as high throughput sequencing reveals sequence polymorphisms at an increasing rate. In principle, given the structure of a wild-type protein and a point mutation whose effects are to be predicted, an accurate method should recapitulate both the structural changes and the change in the folding-free energy. Here, we explore the performance of protocols which sample an increasing diversity of conformations. We find that surprisingly similar performances in predicting changes in stability are achieved using protocols that involve very different amts. of conformational sampling, provided that the resoln. of the force field is matched to the resoln. of the sampling method. Methods involving backbone sampling can in some cases closely recapitulate the structural changes accompanying mutations but not surprisingly tend to do more harm than good in cases where structural changes are negligible. Anal. of the outliers in the stability change calcns. suggests areas needing particular improvement; these include the balance between desolvation and the formation of favorable buried polar interactions, and unfolded state modeling.

89
Musil, M.; Stourac, J.; Bendl, J.; Brezovsky, J.; Prokop, Z.; Zendulka, J.; Martinek, T.; Bednar, D.; Damborsky, J. FireProt: Web Server for Automated Design of Thermostable Proteins. Nucleic Acids Res. 2017, 45, W393– W399, DOI: 10.1093/nar/gkx285

FireProt: web server for automated design of thermostable proteins

Musil, Milos; Stourac, Jan; Bendl, Jaroslav; Brezovsky, Jan; Prokop, Zbynek; Zendulka, Jaroslav; Martinek, Tomas; Bednar, David; Damborsky, Jiri

Nucleic Acids Research (2017), 45 (W1), W393-W399CODEN: NARHAD; ISSN:1362-4962. (Oxford University Press)

There is a continuous interest in increasing proteins stability to enhance their usability in numerous biomedical and biotechnol. applications. A no. of in silico tools for the prediction of the effect of mutations on protein stability have been developed recently. However, only single-point mutations with a small effect on protein stability are typically predicted with the existing tools and have to be followed by laborious protein expression, purifn., and characterization. Here, the authors present FireProt, a web server for the automated design of multiple-point thermostable mutant proteins that combines structural and evolutionary information in its calcn. core. FireProt utilizes sixteen tools and three protein engineering strategies for making reliable protein designs. The server is complemented with interactive, easy-to-use interface that allows users to directly analyze and optionally modify designed thermostable mutants. FireProt is freely available at http://loschmidt.chemi.muni.cz/fireprot.

90
Bush, J.; Makhatadze, G. I. Statistical Analysis of Protein Structures Suggests That Buried Ionizable Residues in Proteins Are Hydrogen Bonded or Form Salt Bridges. Proteins: Struct., Funct., Bioinf. 2011, 79, 2027– 2032, DOI: 10.1002/prot.23067

Statistical analysis of protein structures suggests that buried ionizable residues in proteins are hydrogen bonded or form salt bridges

Bush, Jeffrey; Makhatadze, George I.

Proteins: Structure, Function, and Bioinformatics (2011), 79 (7), 2027-2032CODEN: PSFBAF ISSN:. (Wiley-Liss, Inc.)

It is well known that nonpolar residues are largely buried in the interior of proteins, whereas polar and ionizable residues tend to be more localized on the protein surface where they are solvent-exposed. Such a distribution of residues between surface and interior is well understood from a thermodn. point: nonpolar side-chains are excluded from contact with solvent water, whereas polar and ionizable groups have favorable interactions with water and thus are preferred at the protein surface. However, there is an increasing amt. of information suggesting that polar and ionizable residues do occur in the protein core, including at positions that have no known functional importance. This is inconsistent with the observations that dehydration of polar and in particular ionizable groups is very energetically unfavorable. To resolve this, the authors performed a detailed anal. of the distribution of fractional burial of polar and ionizable residues using a large set of ∼2600 non-homologous protein structures. The authors showed that when ionizable residues were fully buried, the vast majority of them formed H-bonds and/or salt bridges with other polar/ionizable groups. This observation resolved an apparent contradiction: the energetic penalty of dehydration of polar/ionizable groups is paid off by the favorable energy of H-bonding and/or salt bridge formation in the protein interior. This conclusion agrees well with previous findings based on continuum models for electrostatic interactions in proteins.

91
Stranges, P. B.; Kuhlman, B. A Comparison of Successful and Failed Protein Interface Designs Highlights the Challenges of Designing Buried Hydrogen Bonds. Protein Sci. 2013, 22, 74– 82, DOI: 10.1002/pro.2187

A comparison of successful and failed protein interface designs highlights the challenges of designing buried hydrogen bonds

Stranges, P. Benjamin; Kuhlman, Brian

Protein Science (2013), 22 (1), 74-82CODEN: PRCIEI; ISSN:1469-896X. (Wiley-Blackwell)

The accurate design of new protein-protein interactions is a longstanding goal of computational protein design. However, most computationally designed interfaces fail to form exptl. This investigation compares five previously described successful de novo interface designs with 158 failures. Both sets of proteins were designed with the mol. modeling program Rosetta. Designs were considered a success if a high-resoln. crystal structure of the complex closely matched the design model and the equil. dissocn. const. for binding was less than 10 μM. The successes and failures represent a wide variety of interface types and design goals including heterodimers, homodimers, peptide-protein interactions, one-sided designs (i.e., where only one of the proteins was mutated) and two-sided designs. The most striking feature of the successful designs is that they have fewer polar atoms at their interfaces than many of the failed designs. Designs that attempted to create extensive sets of interface-spanning hydrogen bonds resulted in no detectable binding. In contrast, polar atoms make up more than 40% of the interface area of many natural dimers, and native interfaces often contain extensive hydrogen bonding networks. These results suggest that Rosetta may not be accurately balancing hydrogen bonding and electrostatic energies against desolvation penalties and that design processes may not include sufficient sampling to identify side chains in preordered conformations that can fully satisfy the hydrogen bonding potential of the interface.

92
Beerens, K.; Mazurenko, S.; Kunka, A.; Marques, S. M.; Hansen, N.; Musil, M.; Chaloupkova, R.; Waterman, J.; Brezovsky, J.; Bednar, D.; Prokop, Z.; Damborsky, J. Evolutionary Analysis As a Powerful Complement to Energy Calculations for Protein Stabilization. ACS Catal. 2018, 8, 9420– 9428, DOI: 10.1021/acscatal.8b01677

Evolutionary Analysis As a Powerful Complement to Energy Calculations for Protein Stabilization

Beerens, Koen; Mazurenko, Stanislav; Kunka, Antonin; Marques, Sergio M.; Hansen, Niels; Musil, Milos; Chaloupkova, Radka; Waterman, Jitka; Brezovsky, Jan; Bednar, David; Prokop, Zbynek; Damborsky, Jiri

ACS Catalysis (2018), 8 (10), 9420-9428CODEN: ACCACS; ISSN:2155-5435. (American Chemical Society)

Stability is one of the most important characteristics of proteins employed as biocatalysts, biotherapeutics and biomaterials, and the role of computational approaches in modifying protein stability is rapidly expanding. We have recently identified stabilizing mutations in haloalkane dehalogenase DhaA using phylogenetic anal. but were not able to reproduce the effects of these mutations using force-field calcns. Here we tested four different hypotheses to explain the mol. basis of stabilization using structural, biochem., biophys. and computational analyses. We demonstrate that stabilization of DhaA by the mutations identified using the phylogenetic anal. is driven by both entropy and enthalpy-contributions, in contrast to primarily enthalpy-driven stabilization by mutations designed by the force-field calcns. Comprehensive bioinformatics anal. revealed that more than half (53%) of 1,099 evolution-based stabilizing mutations would be evaluated as de-stabilizing by force-field calcns. Thermodn. integration considers both folded and unfolded states and can describe the entropic component of stabilization, yet it is not suitable for predictive purposes due to computational demands. Altogether, our results strongly suggest that energetic calcns. should be complemented by a phylogenetic anal. in protein stabilization endeavors.

93
Wijma, H. J.; Floor, R. J.; Jekel, P. A.; Baker, D.; Marrink, S. J.; Janssen, D. B. Computationally Designed Libraries for Rapid Enzyme Stabilization. Protein Eng., Des. Sel. 2014, 27, 49– 58, DOI: 10.1093/protein/gzt061

Computationally designed libraries for rapid enzyme stabilization

Wijma, Hein J.; Floor, Robert J.; Jekel, Peter A.; Baker, David; Marrink, Siewert J.; Janssen, Dick B.

Protein Engineering, Design & Selection (2014), 27 (2), 49-58CODEN: PEDSBR; ISSN:1741-0126. (Oxford University Press)

The ability to engineer enzymes and other proteins to any desired stability would have wide-ranging applications. Here, we demonstrate that computational design of a library with chem. diverse stabilizing mutations allows the engineering of drastically stabilized and fully functional variants of the mesostable enzyme limonene epoxide hydrolase. First, point mutations were selected if they significantly improved the predicted free energy of protein folding. Disulfide bonds were designed using sampling of backbone conformational space, which tripled the no. of exptl. stabilizing disulfide bridges. Next, orthogonal in silico screening steps were used to remove chem. unreasonable mutations and mutations that are predicted to increase protein flexibility. The resulting library of 64 variants was exptl. screened, which revealed 21 (pairs of) stabilizing mutations located both in relatively rigid and in flexible areas of the enzyme. Finally, combining 10-12 of these confirmed mutations resulted in multi-site mutants with an increase in apparent melting temp. from 50 to 85°C, enhanced catalytic activity, preserved regioselectivity and a >250-fold longer half-life. The developed Framework for Rapid Enzyme Stabilization by Computational libraries (FRESCO) requires far less screening than conventional directed evolution.

94
Thiltgen, G.; Goldstein, R. A. Assessing Predictors of Changes in Protein Stability upon Mutation Using Self-Consistency. PLoS One 2012, 7, e46084, DOI: 10.1371/journal.pone.0046084

Assessing predictors of changes in protein stability upon mutation using self-consistency

Thiltgen, Grant; Goldstein, Richard A.

PLoS One (2012), 7 (10), e46084CODEN: POLNCL; ISSN:1932-6203. (Public Library of Science)

The ability to predict the effect of mutations on protein stability is important for a wide range of tasks, from protein engineering to assessing the impact of SNPs to understanding basic protein biophysics. A no. of methods have been developed that make these predictions, but assessing the accuracy of these tools is difficult given the limitations and inconsistencies of the exptl. data. We evaluate four different methods based on the ability of these methods to generate consistent results for forward and back mutations and examine how this ability varies with the nature and location of the mutation. We find that, while one method seems to outperform the others, the ability of these methods to make accurate predictions is limited.

95
Buß, O.; Rudat, J.; Ochsenreither, K. FoldX as Protein Engineering Tool: Better Than Random Based Approaches?. Comput. Struct. Biotechnol. J. 2018, 16, 25– 33, DOI: 10.1016/j.csbj.2018.01.002

FoldX as Protein Engineering Tool: Better Than Random Based Approaches?

Buss, Oliver; Rudat, Jens; Ochsenreither, Katrin

Computational and Structural Biotechnology Journal (2018), 16 (), 25-33CODEN: CSBJAC; ISSN:2001-0370. (Elsevier B.V.)

Improving protein stability is an important goal for basic research as well as for clin. and industrial applications but no commonly accepted and widely used strategy for efficient engineering is known. Beside random approaches like error prone PCR or phys. techniques to stabilize proteins, e.g. by immobilization, in silico approaches are gaining more attention to apply target-oriented mutagenesis. In this review different algorithms for the prediction of beneficial mutation sites to enhance protein stability are summarized and the advantages and disadvantages of FoldX are highlighted. The question whether the prediction of mutation sites by the algorithm FoldX is more accurate than random based approaches is addressed.

96
Allen, B. D.; Nisthal, A.; Mayo, S. L. Experimental Library Screening Demonstrates the Successful Application of Computational Protein Design to Large Structural Ensembles. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 19838– 19843, DOI: 10.1073/pnas.1012985107

Experimental library screening demonstrates the successful application of computational protein design to large structural ensembles

Allen, Benjamin D.; Nisthal, Alex; Mayo, Stephen L.

Proceedings of the National Academy of Sciences of the United States of America (2010), 107 (46), 19838-19843, S19838/1-S19838/8CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)

The stability, activity, and soly. of a protein sequence are detd. by a delicate balance of mol. interactions in a variety of conformational states. Even so, most computational protein design methods model sequences in the context of a single native conformation. Simulations that model the native state as an ensemble have been mostly neglected due to the lack of sufficiently powerful optimization algorithms for multistate design. Here, we have applied our multistate design algorithm to study the potential utility of various forms of input structural data for design. To facilitate more thorough anal., we developed new methods for the design and high-throughput stability detn. of combinatorial mutation libraries based on protein design calcns. The application of these methods to the core design of a small model system produced many variants with improved thermodn. stability and showed that multistate design methods can be readily applied to large structural ensembles. We found that exhaustive screening of our designed libraries helped to clarify several sources of simulation error that would have otherwise been difficult to ascertain. Interestingly, the lack of correlation between our simulated and exptl. measured stability values shows clearly that a design procedure need not reproduce exptl. data exactly to achieve success. This surprising result suggests potentially fruitful directions for the improvement of computational protein design technol.

97
Barlow, K. A.; Ó Conchúir, S.; Thompson, S.; Suresh, P.; Lucas, J. E.; Heinonen, M.; Kortemme, T. Flex ddG: Rosetta Ensemble-Based Estimation of Changes in Protein-Protein Binding Affinity upon Mutation. J. Phys. Chem. B 2018, 122, 5389– 5399, DOI: 10.1021/acs.jpcb.7b11367

Flex ddG: Rosetta Ensemble-Based Estimation of Changes in Protein-Protein Binding Affinity upon Mutation

Barlow, Kyle A.; Conchuir, Shane O.; Thompson, Samuel; Suresh, Pooja; Lucas, James E.; Heinonen, Markus; Kortemme, Tanja

Journal of Physical Chemistry B (2018), 122 (21), 5389-5399CODEN: JPCBFK; ISSN:1520-5207. (American Chemical Society)

Computationally modeling changes in binding free energies upon mutation (interface ΔΔG) allows large-scale prediction and perturbation of protein-protein interactions. Addnl., methods that consider and sample relevant conformational plasticity should be able to achieve higher prediction accuracy over methods that do not. To test this hypothesis, the authors developed a method within the Rosetta macromol. modeling suite (flex ddG) that samples conformational diversity using "backrub" to generate an ensemble of models and then applies torsion minimization, side chain repacking, and averaging across this ensemble to est. interface ΔΔG values. The authors tested the method on a curated benchmark set of 1240 mutants, and found the method outperformed existing methods that sampled conformational space to a lesser degree. The authors obsd. considerable improvements with flex ddG over existing methods on the subset of small side chain to large side chain mutations, as well as for multiple simultaneous nonalanine mutations, stabilizing mutations, and mutations in antibody-antigen interfaces. Finally, the authors applied a generalized additive model (GAM) approach to the Rosetta energy function; the resulting nonlinear reweighting model improved the agreement with exptl. detd. interface ΔΔG values but also highlighted the necessity of future energy function improvements.

98
Ludwiczak, J.; Jarmula, A.; Dunin-Horkawicz, S. Combining Rosetta with Molecular Dynamics (MD): A Benchmark of the MD-Based Ensemble Protein Design. J. Struct. Biol. 2018, 203, 54– 61, DOI: 10.1016/j.jsb.2018.02.004

Combining Rosetta with molecular dynamics (MD): A benchmark of the MD-based ensemble protein design

Ludwiczak, Jan; Jarmula, Adam; Dunin-Horkawicz, Stanislaw

Journal of Structural Biology (2018), 203 (1), 54-61CODEN: JSBIEM; ISSN:1047-8477. (Elsevier Inc.)

Computational protein design is a set of procedures for computing amino acid sequences that will fold into a specified structure. Rosetta Design, a commonly used software for protein design, allows for the effective identification of sequences compatible with a given backbone structure, while mol. dynamics (MD) simulations can thoroughly sample near-native conformations. We benchmarked a procedure in which Rosetta design is started on MD-derived structural ensembles and showed that such a combined approach generates 20-30% more diverse sequences than currently available methods with only a slight increase in computation time. Importantly, the increase in diversity is achieved without a loss in the quality of the designed sequences assessed by their resemblance to natural sequences. We demonstrate that the MD-based procedure is also applicable to de novo design tasks started from backbone structures without any sequence information. In addn., we implemented a protocol that can be used to assess the stability of designed models and to select the best candidates for exptl. validation. In sum our results demonstrate that the MD ensemble-based flexible backbone design can be a viable method for protein design, esp. for tasks that require a large pool of diverse sequences.

99
Davis, I. W.; Arendall, W. B.; Richardson, D. C.; Richardson, J. S. The Backrub Motion: How Protein Backbone Shrugs When a Sidechain Dances. Structure 2006, 14, 265– 274, DOI: 10.1016/j.str.2005.10.007

The Backrub Motion: How Protein Backbone Shrugs When a Sidechain Dances

Davis, Ian W.; Arendall, W. Bryan; Richardson, David C.; Richardson, Jane S.

Structure (Cambridge, MA, United States) (2006), 14 (2), 265-274CODEN: STRUE6; ISSN:0969-2126. (Cell Press)

Surprisingly, the frozen structures from ultra-high-resoln. protein crystallog. reveal a prevalent, but subtle, mode of local backbone motion coupled to much larger, two-state changes of sidechain conformation. This "backrub" motion provides an influential and common type of local plasticity in protein backbone. Concerted reorientation of two adjacent peptides swings the central sidechain perpendicular to the chain direction, changing accessible sidechain conformations while leaving flanking structure undisturbed. Alternate conformations in sub-1 Å crystal structures show backrub motions for two-thirds of the significant Cβ shifts and 3% of the total residues in these proteins (126/3882), accompanied by two-state changes in sidechain rotamer. The Backrub modeling tool is effective in crystallog. rebuilding. For homol. modeling or protein redesign, backrubs can provide realistic, small perturbations to rigid backbones. For large sidechain changes in protein dynamics or for single mutations, backrubs allow backbone accommodation while maintaining H bonds and ideal geometry.

100
Wei, G.; Xi, W.; Nussinov, R.; Ma, B. Protein Ensembles: How Does Nature Harness Thermodynamic Fluctuations for Life? The Diverse Functional Roles of Conformational Ensembles in the Cell. Chem. Rev. 2016, 116, 6516– 6551, DOI: 10.1021/acs.chemrev.5b00562

Protein Ensembles: How Does Nature Harness Thermodynamic Fluctuations for Life? The Diverse Functional Roles of Conformational Ensembles in the Cell

Wei, Guanghong; Xi, Wenhui; Nussinov, Ruth; Ma, Buyong

Chemical Reviews (Washington, DC, United States) (2016), 116 (11), 6516-6551CODEN: CHREAY; ISSN:0009-2665. (American Chemical Society)

All sol. proteins populate conformational ensembles that together constitute the native state. Their fluctuations in water are intrinsic thermodn. phenomena, and the distributions of the states on the energy landscape are detd. by statistical thermodn.; however, they are optimized to perform their biol. functions. In this review we briefly describe advances in free energy landscape studies of protein conformational ensembles. Exptl. (NMR, small-angle X-ray scattering, single-mol. spectroscopy, and cryo-electron microscopy) and computational (replica-exchange mol. dynamics, metadynamics, and Markov state models) approaches have made great progress in recent years. These address the challenging characterization of the highly flexible and heterogeneous protein ensembles. We focus on structural aspects of protein conformational distributions, from collective motions of single- and multi-domain proteins, intrinsically disordered proteins, to multiprotein complexes. Importantly, we highlight recent studies that illustrate functional adjustment of protein conformational ensembles in the crowded cellular environment. We center on the role of the ensemble in recognition of small- and macro-mols. (protein and RNA/DNA) and emphasize emerging concepts of protein dynamics in enzyme catalysis. Overall, protein ensembles link fundamental physicochem. principles and protein behavior and the cellular network and its regulation.

101
Fan, H.; Mark, A. E. Relative Stability of Protein Structures Determined by X-Ray Crystallography or NMR Spectroscopy: A Molecular Dynamics Simulation Study. Proteins: Struct., Funct., Genet. 2003, 53, 111– 120, DOI: 10.1002/prot.10496

Relative stability of protein structures determined by X-ray crystallography or NMR spectroscopy: A molecular dynamics simulation study

Fan, Hao; Mark, Alan E.

Proteins: Structure, Function, and Genetics (2003), 53 (1), 111-120CODEN: PSFGEY; ISSN:0887-3585. (Wiley-Liss, Inc.)

The relative stability of protein structures detd. by either x-ray crystallog. or NMR spectroscopy has been investigated by using mol. dynamics simulation techniques. Published structures of 34 proteins contg. between 50 and 100 residues have been evaluated. The proteins selected represent a mixt. of secondary structure types including all α, all β, and α/β. The proteins selected do not contain cysteine-cysteine bridges. In addn., any crystallog. waters, metal ions, cofactors, or bound ligands were removed before the systems were simulated. The stability of the structures was evaluated by simulating, under identical conditions, each of the proteins for at least 5 ns in explicit solvent. It is found that not only do NMR-derived structures have, on av., higher internal strain than structures detd. by x-ray crystallog. but that a significant proportion of the structures are unstable and rapidly diverge in simulations.

102
Kuzmanic, A.; Pannu, N. S.; Zagrovic, B. X-Ray Refinement Significantly Underestimates the Level of Microscopic Heterogeneity in Biomolecular Crystals. Nat. Commun. 2014, 5, 3220, DOI: 10.1038/ncomms4220

X-ray refinement significantly underestimates the level of microscopic heterogeneity in biomolecular crystals

Kuzmanic Antonija; Zagrovic Bojan; Pannu Navraj S

Nature communications (2014), 5 (), 3220 ISSN:.

Biomolecular X-ray structures typically provide a static, time- and ensemble-averaged view of molecular ensembles in crystals. In the absence of rigid-body motions and lattice defects, B-factors are thought to accurately reflect the structural heterogeneity of such ensembles. In order to study the effects of averaging on B-factors, we employ molecular dynamics simulations to controllably manipulate microscopic heterogeneity of a crystal containing 216 copies of villin headpiece. Using average structure factors derived from simulation, we analyse how well this heterogeneity is captured by high-resolution molecular-replacement-based model refinement. We find that both isotropic and anisotropic refined B-factors often significantly deviate from their actual values known from simulation: even at high 1.0 Å resolution and Rfree of 5.9%, B-factors of some well-resolved atoms underestimate their actual values even sixfold. Our results suggest that conformational averaging and inadequate treatment of correlated motion considerably influence estimation of microscopic heterogeneity via B-factors, and invite caution in their interpretation.

103
Karshikoff, A.; Nilsson, L.; Ladenstein, R. Rigidity versus Flexibility: The Dilemma of Understanding Protein Thermal Stability. FEBS J. 2015, 282, 3899– 3917, DOI: 10.1111/febs.13343

Rigidity versus flexibility: the dilemma of understanding protein thermal stability

Karshikoff, Andrey; Nilsson, Lennart; Ladenstein, Rudolf

FEBS Journal (2015), 282 (20), 3899-3917CODEN: FJEOAC; ISSN:1742-464X. (Wiley-Blackwell)

A review. The role of fluctuations in protein thermostability has recently received considerable attention. In the current literature a dualistic picture can be found as follows. On one hand, thermostability seems to be assocd. with enhanced rigidity of the protein scaffold in parallel with the redn. of flexible parts of the structure. However, in contrast with this argument it has been shown by exptl. studies and computer simulation that thermal tolerance of a protein is not necessarily correlated with the suppression of internal fluctuations and mobility. Both concepts - i.e., rigidity and flexibility - are derived from a mech. engineering perspective and represent temporally insensitive features describing static properties and neglect the notion that relative motion at certain time scales is possible in structurally stable regions of a protein. This suggests that a strict sepn. of rigid and flexible parts of a protein mol. does not correctly describe the reality of the situation. In this work the concepts of mobility/flexibility vs. rigidity will be critically reconsidered by taking into account mol. dynamics calcns. of heat capacity and conformational entropy, salt bridge networks, electrostatic interactions in folded and unfolded states, and the emerging picture of protein thermostability in view of recently developed network theories. Last, but not least, the influence of high temp. on the active site and activity of enzymes will be considered.

104
Der, B. S.; Kluwe, C.; Miklos, A. E.; Jacak, R.; Lyskov, S.; Gray, J. J.; Georgiou, G.; Ellington, A. D.; Kuhlman, B. Alternative Computational Protocols for Supercharging Protein Surfaces for Reversible Unfolding and Retention of Stability. PLoS One 2013, 8, e64363, DOI: 10.1371/journal.pone.0064363

Alternative computational protocols for supercharging protein surfaces for reversible unfolding and retention of stability

Der, Bryan S.; Kluwe, Christien; Miklos, Aleksandr E.; Jacak, Ron; Lyskov, Sergey; Gray, Jeffrey J.; Georgiou, George; Ellington, Andrew D.; Kuhlman, Brian

PLoS One (2013), 8 (5), e64363CODEN: POLNCL; ISSN:1932-6203. (Public Library of Science)

Reengineering protein surfaces to exhibit high net charge, referred to as "supercharging", can improve reversibility of unfolding by preventing aggregation of partially unfolded states. Incorporation of charged side chains should be optimized while considering structural and energetic consequences, as numerous mutations and accumulation of like-charges can also destabilize the native state. A previously demonstrated approach deterministically mutates flexible polar residues (amino acids DERKNQ) with the fewest av. neighboring atoms per side chain atom (AvNAPSA). Our approach uses Rosetta-based energy calcns. to choose the surface mutations. Both protocols are available for use through the ROSIE web server. The automated Rosetta and AvNAPSA approaches for supercharging choose dissimilar mutations, raising an interesting division in surface charging strategy. Rosetta-supercharged variants of GFP (RscG) ranging from -11 to -61 and +7 to +58 were exptl. tested, and for comparison, we re-tested the previously developed AvNAPSA-supercharged variants of GFP (AscG) with +36 and -30 net charge. Mid-charge variants demonstrated ∼3-fold improvement in refolding with retention of stability. However, as we pushed to higher net charges, expression and sol. yield decreased, indicating that net charge or mutational load may be limiting factors. Interestingly, the two different approaches resulted in GFP variants with similar refolding properties. Our results show that there are multiple sets of residues that can be mutated to successfully supercharge a protein, and combining alternative supercharge protocols with exptl. testing can be an effective approach for charge-based improvement to refolding.

105
Chan, P.; Curtis, R. A.; Warwicker, J. Soluble Expression of Proteins Correlates with a Lack of Positively-Charged Surface. Sci. Rep. 2013, 3, 3333, DOI: 10.1038/srep03333

Soluble expression of proteins correlates with a lack of positively-charged surface

Chan, Pedro; Curtis, Robin A.; Warwicker, Jim

Scientific Reports (2013), 3, 3333.

Prediction of protein solubility is gaining importance with the growing use of protein molecules as therapeutics, and ongoing requirements for high level expression. We have investigated protein surface features that correlate with insolubility. Non-polar surface patches associate to some degree with insolubility, but this is far exceeded by the association with positively-charged patches. Negatively-charged patches do not separate insoluble/soluble subsets. The separation of soluble and insoluble subsets by positive charge clustering (area under the curve for a ROC plot is 0.85) has a striking parallel with the separation that delineates nucleic acid-binding proteins, although most of the insoluble dataset are not known to bind nucleic acid. Additionally, these basic patches are enriched for arginine, relative to lysine. The results are discussed in the context of expression systems and downstream processing, contributing to a view of protein solubility in which the molecular interactions of charged groups are far from equivalent.

106
Rezaie, E.; Mohammadi, M.; Sakhteman, A.; Bemani, P.; Ahrari, S. Application of Molecular Dynamics Simulations To Design a Dual-Purpose Oligopeptide Linker Sequence for Fusion Proteins. J. Mol. Model. 2018, 24, 313, DOI: 10.1007/s00894-018-3846-x

Application of molecular dynamics simulations to design a dual-purpose oligopeptide linker sequence for fusion proteins

Rezaie, Ehsan; Mohammadi, Mozafar; Sakhteman, Amirhossein; Bemani, Peyman; Ahrari, Sajjad

Journal of Molecular Modeling (2018), 24 (11), 313.

Proteins are often monitored by combining a fluorescent polypeptide tag with the target protein. However, due to the high molecular weight and immunogenicity of such tags, they are not suitable choices for combining with fusion proteins such as immunotoxins. In this study, we designed a polypeptide sequence with a dual role (it acts as both a linker and a fluorescent probe) to use with fusion proteins. Two common fluorescent tag sequences based on tetracysteine were compared to a commonly used rigid linker as well as our proposed dual-purpose sequence. Computational investigations showed that the dual-purpose sequence was structurally stable and may be a good choice to use as both a linker and a fluorescence marker between two moieties in a fusion protein.

107
Folkman, L.; Stantic, B.; Sattar, A.; Zhou, Y. EASE-MM: Sequence-Based Prediction of Mutation-Induced Stability Changes with Feature-Based Multiple Models. J. Mol. Biol. 2016, 428, 1394–1405, DOI: 10.1016/j.jmb.2016.01.012

EASE-MM: Sequence-Based Prediction of Mutation-Induced Stability Changes with Feature-Based Multiple Models.

Folkman, Lukas; Stantic, Bela; Sattar, Abdul; Zhou, Yaoqi

Journal of Molecular Biology (2016), 428 (6), 1394-1405. CODEN: JMOBAK; ISSN: 0022-2836. (Elsevier Ltd.)

Protein engineering and characterization of non-synonymous single nucleotide variants (SNVs) require accurate prediction of protein stability changes (ΔΔGu) induced by single amino acid substitutions. Here, we have developed a new prediction method called Evolutionary, Amino acid, and Structural Encodings with Multiple Models (EASE-MM), which comprises five specialised support vector machine (SVM) models and makes the final prediction from a consensus of two models selected based on the predicted secondary structure and accessible surface area of the mutated residue. The new method is applicable to single-domain monomeric proteins and can predict ΔΔGu with a protein sequence and mutation as the only inputs. EASE-MM yielded a Pearson correlation coeff. of 0.53-0.59 in 10-fold cross-validation and independent testing and was able to outperform other sequence-based methods. When compared to structure-based energy functions, EASE-MM achieved a comparable or better performance. The application to a large dataset of human germline non-synonymous SNVs showed that the disease-causing variants tend to be assocd. with larger magnitudes of ΔΔGu predicted with EASE-MM. The EASE-MM web-server is available at http://sparks-lab.org/server/ease.

108
Teng, S.; Srivastava, A. K.; Wang, L. Sequence Feature-Based Prediction of Protein Stability Changes upon Amino Acid Substitutions. BMC Genomics 2010, 11 (Suppl 2), S5, DOI: 10.1186/1471-2164-11-S2-S5

109
Huang, L.-T.; Gromiha, M. M.; Ho, S.-Y. iPTREE-STAB: Interpretable Decision Tree Based Method for Predicting Protein Stability Changes upon Mutations. Bioinformatics 2007, 23, 1292–1293, DOI: 10.1093/bioinformatics/btm100

iPTREE-STAB: interpretable decision tree based method for predicting protein stability changes upon mutations

Huang, Liang-Tsung; Gromiha, M. Michael; Ho, Shinn-Ying

Bioinformatics (2007), 23 (10), 1292-1293. CODEN: BOINFP; ISSN: 1367-4803. (Oxford University Press)

A web server, iPTREE-STAB, is developed for discriminating the stability of proteins (stabilizing or destabilizing) and predicting their stability changes (ΔΔG) upon single amino acid substitutions from amino acid sequence. The discrimination and prediction are mainly based on a decision tree coupled with an adaptive boosting algorithm, and a classification and regression tree, resp., using three neighboring residues of the mutant site along the N- and C-terminals. Our method showed an accuracy of 82% for discriminating the stabilizing and destabilizing mutants, and a correlation of 0.70 for predicting protein stability changes upon mutations.

110
Paladin, L.; Piovesan, D.; Tosatto, S. C. E. SODA: Prediction of Protein Solubility from Disorder and Aggregation Propensity. Nucleic Acids Res. 2017, 45, W236–W240, DOI: 10.1093/nar/gkx412

SODA: prediction of protein solubility from disorder and aggregation propensity

Paladin, Lisanna; Piovesan, Damiano; Tosatto, Silvio C. E.

Nucleic Acids Research (2017), 45 (W1), W236-W240. CODEN: NARHAD; ISSN: 1362-4962. (Oxford University Press)

Soly. is an important, albeit not well understood, feature detg. protein behavior. It is of paramount importance in protein engineering, where similar folded proteins may behave in very different ways in soln. Here we present SODA, a novel method to predict the changes of protein soly. based on several physico-chem. properties of the protein. SODA uses the propensity of the protein sequence to aggregate as well as intrinsic disorder, plus hydrophobicity and secondary structure preferences to est. changes in soly. It has been trained and benchmarked on two different datasets. The comparison to other recently published methods shows that SODA has state-of-the-art performance and is particularly well suited to predict mutations decreasing soly. The method is fast, returning results for single mutations in seconds. A usage example estg. the full repertoire of mutations for a human germline antibody highlights several soly. hotspots on the surface.

111
Liaw, A.; Wiener, M. Classification and Regression by RandomForest. R News 2002, 2, 18–22
112
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32, DOI: 10.1023/A:1010933404324

113
Boughorbel, S.; Jarray, F.; El-Anbari, M. Optimal Classifier for Imbalanced Data Using Matthews Correlation Coefficient Metric. PLoS One 2017, 12, e0177678, DOI: 10.1371/journal.pone.0177678

Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric

Boughorbel, Sabri; Jarray, Fethi; El-Anbari, Mohammed

PLoS One (2017), 12 (6), e0177678/1-e0177678/17. CODEN: POLNCL; ISSN: 1932-6203. (Public Library of Science)

Data imbalance is frequently encountered in biomedical applications. Resampling techniques can be used in binary classification to tackle this issue. However, such solns. are not desired when the no. of samples in the small class is limited. Moreover, the use of inadequate performance metrics, such as accuracy, leads to poor generalization results because the classifiers tend to predict the largest size class. One of the good approaches to deal with this issue is to optimize performance metrics that are designed to handle data imbalance. Matthews Correlation Coeff. (MCC) is widely used in Bioinformatics as a performance metric. We are interested in developing a new classifier based on the MCC metric to handle imbalanced data. We derive an optimal Bayes classifier for the MCC metric using an approach based on Frechet deriv. We show that the proposed algorithm has the nice theor. property of consistency. Using simulated data, we verify the correctness of our optimality result by searching in the space of all possible binary classifiers. The proposed classifier is evaluated on 64 datasets with a wide range of data imbalance. We compare both classification performance and CPU efficiency for three classifiers: (1) the proposed algorithm (MCC-classifier), (2) the Bayes classifier with a default threshold (MCC-base), and (3) imbalanced SVM (SVM-imba). The exptl. evaluation shows that MCC-classifier has a close performance to SVM-imba while being simpler and more efficient.

114
Ling, C. X.; Sheng, V. S. Cost-Sensitive Learning and the Class Imbalance Problem. In Encyclopedia of Machine Learning; Sammut, C., Ed.; Springer: New York, 2007.
115
Rao, R.; Fung, G.; Rosales, R. On the Dangers of Cross-Validation. An Experimental Evaluation. In Proceedings of the 2008 SIAM International Conference on Data Mining; Society for Industrial and Applied Mathematics: Philadelphia, PA, 2008; pp 588–596.
116
Stephens, Z. D.; Lee, S. Y.; Faghri, F.; Campbell, R. H.; Zhai, C.; Efron, M. J.; Iyer, R.; Schatz, M. C.; Sinha, S.; Robinson, G. E. Big Data: Astronomical or Genomical? PLoS Biol. 2015, 13, e1002195, DOI: 10.1371/journal.pbio.1002195

Big data: astronomical or genomical?

Stephens, Zachary D.; Lee, Skylar Y.; Faghri, Faraz; Campbell, Roy H.; Zhai, Chengxiang; Efron, Miles J.; Iyer, Ravishankar; Schatz, Michael C.; Sinha, Saurabh; Robinson, Gene E.

PLoS Biology (2015), 13 (7), e1002195/1-e1002195/11. CODEN: PBLIBG; ISSN: 1545-7885. (Public Library of Science)

Genomics is a Big Data science and is going to get much bigger, very soon, but it is not known whether the needs of genomics will exceed other Big Data domains. Projecting to the year 2025, we compared genomics with three other major generators of Big Data: astronomy, YouTube, and Twitter. Our ests. show that genomics is a "four-headed beast"-it is either on par with or the most demanding of the domains analyzed here in terms of data acquisition, storage, distribution, and anal. We discuss aspects of new technologies that will need to be developed to rise up and meet the computational challenges that genomics poses for the near future. Now is the time for concerted, community-wide planning for the "genomical" challenges of the next decade.

117
Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic Local Alignment Search Tool. J. Mol. Biol. 1990, 215, 403–410, DOI: 10.1016/S0022-2836(05)80360-2

Basic local alignment search tool

Altschul, Stephen F.; Gish, Warren; Miller, Webb; Myers, Eugene W.; Lipman, David J.

Journal of Molecular Biology (1990), 215 (3), 403-10. CODEN: JMOBAK; ISSN: 0022-2836.

A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent math. results on the stochastic properties of MSP scores allow an anal. of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a no. of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the anal. of multiple regions of similarity in long DNA sequences. In addn. to its flexibility and tractability to math. anal., BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.

118
Altschul, S. F.; Madden, T. L.; Schäffer, A. A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D. J. Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs. Nucleic Acids Res. 1997, 25, 3389–3402, DOI: 10.1093/nar/25.17.3389

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs

Altschul, Stephen F.; Madden, Thomas L.; Schaffer, Alejandro A.; Zhang, Jinghui; Zhang, Zheng; Miller, Webb; Lipman, David J.

Nucleic Acids Research (1997), 25 (17), 3389-3402. CODEN: NARHAD; ISSN: 0305-1048. (Oxford University Press)

The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approx. three times the speed of the original. In addn., a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approx. the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biol. relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily. The source code for the new BLAST programs is available by anonymous ftp from the machine ncbi.nlm.nih.gov, within the directory 'blast', and the programs may be run from NCBI's web site at http://www.ncbi.nlm.nih.gov/.

119
Eddy, S. R. Profile Hidden Markov Models. Bioinformatics 1998, 14, 755–763, DOI: 10.1093/bioinformatics/14.9.755

Profile hidden Markov models

Eddy, Sean R.

Bioinformatics (1998), 14 (9), 755-763. CODEN: BOINFP; ISSN: 1367-4803. (Oxford University Press)

A review with many refs. The recent literature on profile hidden Markov model (profile HMM) methods and software is reviewed. Profile HMMs turn a multiple sequence alignment into a position-specific scoring system suitable for searching databases for remotely homologous sequences. Profile HMM analyses complement std. pairwise comparison methods for large-scale sequence anal. Several software implementations and two large libraries of profile HMMs of common protein domains are available. HMM methods performed comparably to threading methods in the CASP2 structure prediction exercise.

120
Remmert, M.; Biegert, A.; Hauser, A.; Söding, J. HHblits: Lightning-Fast Iterative Protein Sequence Searching by HMM–HMM Alignment. Nat. Methods 2012, 9, 173–175, DOI: 10.1038/nmeth.1818

HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment

Remmert, Michael; Biegert, Andreas; Hauser, Andreas; Soeding, Johannes

Nature Methods (2012), 9 (2), 173-175. CODEN: NMAEA3; ISSN: 1548-7091. (Nature Publishing Group)

Sequence-based protein function and structure prediction depends crucially on sequence-search sensitivity and accuracy of the resulting sequence alignments. We present an open-source, general-purpose tool that represents both query and database sequences by profile hidden Markov models (HMMs): 'HMM-HMM-based lightning-fast iterative sequence search' (HHblits; http://toolkit.genzentrum.lmu.de/hhblits/). Compared to the sequence-search tool PSI-BLAST, HHblits is faster owing to its discretized-profile prefilter, has 50-100% higher sensitivity and generates more accurate alignments.

121
Pearson, W. R. An Introduction to Sequence Similarity (“Homology”) Searching. Curr. Protoc. Bioinf. 2013, 42, 3.1.1–3.1.8, DOI: 10.1002/0471250953.bi0301s42

122
Rost, B. Twilight Zone of Protein Sequence Alignments. Protein Eng., Des. Sel. 1999, 12, 85–94, DOI: 10.1093/protein/12.2.85

123
Fletcher, W.; Yang, Z. The Effect of Insertions, Deletions, and Alignment Errors on the Branch-Site Test of Positive Selection. Mol. Biol. Evol. 2010, 27, 2257–2267, DOI: 10.1093/molbev/msq115

The Effect of Insertions, Deletions, and Alignment Errors on the Branch-Site Test of Positive Selection

Fletcher, William; Yang, Ziheng

Molecular Biology and Evolution (2010), 27 (10), 2257-2267. CODEN: MBEVEO; ISSN: 0737-4038. (Oxford University Press)

The detection of pos. Darwinian selection affecting protein-coding genes remains a topic of great interest and importance. The "branch-site" test is designed to detect localized episodic bouts of pos. selection that affect only a few amino acid residues on particular lineages and has been shown to have reasonable power and low false-pos. rates for a wide range of selection schemes. Previous simulations examg. the performance of the test, however, were conducted under idealized conditions without insertions, deletions, or alignment errors. As the test is sometimes used to analyze divergent sequences, the impact of indels and alignment errors is a major concern. Here, we used a recently developed indel-simulation program to examine the false-pos. rate and power of the branch-site test. We find that insertions and deletions do not cause excessive false positives if the alignment is correct, but alignment errors can lead to unacceptably high false positives. Of the alignment methods evaluated, PRANK consistently outperformed MUSCLE, MAFFT, and ClustalW, mostly because the latter programs tend to place nonhomologous codons (or amino acids) into the same column, producing shorter and less accurate alignments and giving the false impression that many amino acid substitutions have occurred at those sites. Our examn. of two previous studies suggests that alignment errors may impact the anal. of mammalian and vertebrate genes by the branch-site test, and it is important to use reliable alignment methods.

124
Vialle, R. A.; Tamuri, A. U.; Goldman, N. Alignment Modulates Ancestral Sequence Reconstruction Accuracy. Mol. Biol. Evol. 2018, 35, 1783–1797, DOI: 10.1093/molbev/msy055

Alignment modulates ancestral sequence reconstruction accuracy

Vialle, Ricardo Assuncao; Tamuri, Asif U.; Goldman, Nick

Molecular Biology and Evolution (2018), 35 (7), 1783-1797. CODEN: MBEVEO; ISSN: 1537-1719. (Oxford University Press)

Ancestral sequence reconstruction (ASR) relies on multiple sequence alignment (MSA), which may introduce biases, and it remains unknown how MSA methodol. approaches impact ASR. Here, we investigate how MSA methodol. modulates ASR using a simulation study of various evolutionary scenarios. We evaluate the accuracy of ancestral protein sequence reconstruction for simulated data and compare reconstruction outcomes using different alignment methods. Our results reveal biases introduced not only by aligner algorithms and assumptions, but also tree topol. and the rate of insertions and deletions. Under many conditions we find no substantial differences between the MSAs. However, increasing the difficulty for the aligners can significantly impact ASR. The MAFFT consistency aligners and PRANK variants exhibit the best performance, whereas FSA displays limited performance. We also discover a bias towards reconstructed sequences longer than the true ancestors, deriving from a preference for inferring insertions, in almost all MSA methodol. approaches. In addn., we find measures of MSA quality generally correlate highly with reconstruction accuracy. Thus, we show MSA methodol. differences can affect the quality of reconstructions and propose MSA methods should be selected with care to accurately det. ancestral states with confidence.

125
Chowdhury, B.; Garai, G. A Review on Multiple Sequence Alignment from the Perspective of Genetic Algorithm. Genomics 2017, 109, 419–431, DOI: 10.1016/j.ygeno.2017.06.007

A review on multiple sequence alignment from the perspective of genetic algorithm

Chowdhury, Biswanath; Garai, Gautam

Genomics (2017), 109 (5-6), 419-431. CODEN: GNMCEP; ISSN: 0888-7543. (Elsevier Inc.)

A review. Sequence alignment is an active research area in the field of bioinformatics. It is also a crucial task as it guides many other tasks like phylogenetic anal., function, and/or structure prediction of biol. macromols. like DNA, RNA, and Protein. Proteins are the building blocks of every living organism. Although protein alignment problem has been studied for several decades, unfortunately, every available method produces alignment results differently for a single alignment problem. Multiple sequence alignment is characterized as a very high computational complex problem. Many stochastic methods, therefore, are considered for improving the accuracy of alignment. Among them, many researchers frequently use Genetic Algorithm. In this study, we have shown different types of the method applied in alignment and the recent trends in the multiobjective genetic algorithm for solving multiple sequence alignment. Many recent studies have demonstrated considerable progress in finding the alignment accuracy.

126
Taly, J.-F.; Magis, C.; Bussotti, G.; Chang, J.-M.; Di Tommaso, P.; Erb, I.; Espinosa-Carrasco, J.; Kemena, C.; Notredame, C. Using the T-Coffee Package to Build Multiple Sequence Alignments of Protein, RNA, DNA Sequences and 3D Structures. Nat. Protoc. 2011, 6, 1669–1682, DOI: 10.1038/nprot.2011.393

Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures

Taly, Jean-Francois; Magis, Cedrik; Bussotti, Giovanni; Chang, Jia-Ming; Di Tommaso, Paolo; Erb, Ionas; Espinosa-Carrasco, Jose; Kemena, Carsten; Notredame, Cedric

Nature Protocols (2011), 6 (11), 1669-1682. CODEN: NPARDW; ISSN: 1750-2799. (Nature Publishing Group)

T-Coffee (Tree-based consistency objective function for alignment evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biol. sequences. The main strength of T-Coffee is its ability to combine third party aligners and to integrate structural (or homol.) information when building MSAs. The series of protocols presented here show how the package can be used to multiply align proteins, RNA and DNA sequences. The protein section shows how users can select the most suitable T-Coffee mode for their data set. Detailed protocols include T-Coffee, the default mode, M-Coffee, a meta version able to combine several third party aligners into one, PSI (position-specific iterated)-Coffee, the homol. extended mode suitable for remote homologs and Expresso, the structure-based multiple aligner. We then also show how the T-RMSD (tree based on root mean square deviation) option can be used to produce a functionally informative structure-based clustering. RNA alignment procedures are described for using R-Coffee, a mode able to use predicted RNA secondary structures when aligning RNA sequences. DNA alignments are illustrated with Pro-Coffee, a multiple aligner specific of promoter regions. We also present some of the many reformatting utilities bundled with T-Coffee. The package is an open-source freeware available from http://www.tcoffee.org/.

127
Pei, J.; Grishin, N. V. PROMALS3D: Multiple Protein Sequence Alignment Enhanced with Evolutionary and Three-Dimensional Structural Information. Methods Mol. Biol. 2014, 1079, 263–271, DOI: 10.1007/978-1-62703-646-7_17

PROMALS3D: multiple protein sequence alignment enhanced with evolutionary and three-dimensional structural information

Pei, Jimin; Grishin, Nick V.

Methods in Molecular Biology (Clifton, N.J.) (2014), 1079, 263-71.

Multiple sequence alignment (MSA) is an essential tool with many applications in bioinformatics and computational biology. Accurate MSA construction for divergent proteins remains a difficult computational task. The constantly increasing protein sequences and structures in public databases could be used to improve alignment quality. PROMALS3D is a tool for protein MSA construction enhanced with additional evolutionary and structural information from database searches. PROMALS3D automatically identifies homologs from sequence and structure databases for input proteins, derives structure-based constraints from alignments of three-dimensional structures, and combines them with sequence-based constraints of profile-profile alignments in a consistency-based framework to construct high-quality multiple sequence alignments. PROMALS3D output is a consensus alignment enriched with sequence and structural information about input proteins and their homologs. PROMALS3D Web server and package are available at http://prodata.swmed.edu/PROMALS3D.

128
Steipe, B.; Schiller, B.; Plückthun, A.; Steinbacher, S. Sequence Statistics Reliably Predict Stabilizing Mutations in a Protein Domain. J. Mol. Biol. 1994, 240, 188–192, DOI: 10.1006/jmbi.1994.1434

Sequence statistics reliably predict stabilizing mutations in a protein domain

Steipe, Boris; Schiller, Britta; Plueckthun, Andreas; Steinbacher, Stefan

Journal of Molecular Biology (1994), 240 (3), 188-92. CODEN: JMOBAK; ISSN: 0022-2836.

Ig variable domains are generally thought of as well conserved platforms providing the base for antigen binding loops of highly varying sequence and structure. However, domain evolution must ensure a balance between optimizing antigen affinity and the requirements of a stable, cooperatively folding domain. Since random mutations can carry a significant penalty for domain stability, constraints are imposed both on the repertoire of germline sequences and on somatic amino acid replacements during affinity maturation. Analyzing these constraints in the conceptual framework of statistical mech., the authors have been able to predict stabilizing mutations in the McPC603 VK domain from sequence information alone with better than 60% success rate. The validity of this concept not only has far reaching implications for antibody engineering but may also be generalized to engineer other proteins for high stability.

129
Sullivan, B. J.; Nguyen, T.; Durani, V.; Mathur, D.; Rojas, S.; Thomas, M.; Syu, T.; Magliery, T. J. Stabilizing Proteins from Sequence Statistics: The Interplay of Conservation and Correlation in Triosephosphate Isomerase Stability. J. Mol. Biol. 2012, 420, 384–399, DOI: 10.1016/j.jmb.2012.04.025

Stabilizing Proteins from Sequence Statistics: The Interplay of Conservation and Correlation in Triosephosphate Isomerase Stability

Sullivan, Brandon J.; Nguyen, Tran; Durani, Venuka; Mathur, Deepti; Rojas, Samantha; Thomas, Miriam; Syu, Trixy; Magliery, Thomas J.

Journal of Molecular Biology (2012), 420 (4-5), 384-399. CODEN: JMOBAK; ISSN: 0022-2836. (Elsevier Ltd.)

Understanding the determinants of protein stability remains one of protein science's greatest challenges. There are still no computational solns. that calc. the stability effects of even point mutations with sufficient reliability for practical use. Amino acid substitutions rarely increase the stability of native proteins; hence, large libraries and high-throughput screens or selections are needed to stabilize proteins using directed evolution. Consensus mutations have proven effective for increasing stability, but these mutations are successful only about half the time. We set out to understand why some consensus mutations fail to stabilize, and what criteria might be useful to predict stabilization more accurately. Overall, consensus mutations at more conserved positions were more likely to be stabilizing in our model, triosephosphate isomerase (TIM) from Saccharomyces cerevisiae. However, positions coupled to other sites were more likely not to stabilize upon mutation. Destabilizing mutations could be removed both by removing sites with high statistical correlations to other positions and by removing nearly invariant positions at which "hidden correlations" can occur. Application of these rules resulted in identification of stabilizing mutations in 9 out of 10 positions, and amalgamation of all predicted stabilizing positions resulted in the most stable yeast TIM variant we produced (+ 8 °C). In contrast, a multimutant with 14 mutations each found to stabilize TIM independently was destabilized by 2 °C. Our results are a practical extension to the consensus concept of protein stabilization, and they further suggest the importance of positional independence in the mechanism of consensus stabilization.

130
Lehmann, M.; Kostrewa, D.; Wyss, M.; Brugger, R.; D’Arcy, A.; Pasamontes, L.; van Loon, A. P. From DNA Sequence to Improved Functionality: Using Protein Sequence Comparisons to Rapidly Design a Thermostable Consensus Phytase. Protein Eng., Des. Sel. 2000, 13, 49–57, DOI: 10.1093/protein/13.1.49
131
Magliery, T. J. Protein Stability: Computation, Sequence Statistics, and New Experimental Methods. Curr. Opin. Struct. Biol. 2015, 33, 161–168, DOI: 10.1016/j.sbi.2015.09.002

131
Protein stability: computation, sequence statistics, and new experimental methods

Magliery, Thomas J.

Current Opinion in Structural Biology (2015), 33, 161-168CODEN: COSBEF; ISSN:0959-440X. (Elsevier Ltd.)

A review. Calcg. protein stability and predicting stabilizing mutations remain exceedingly difficult tasks, largely due to the inadequacy of potential functions, the difficulty of modeling entropy and the unfolded state, and challenges of sampling, particularly of backbone conformations. Yet, computational design produced some remarkably stable proteins in recent years, apparently owing to near ideality in structure and sequence features. With caveats, computational prediction of stability can be used to guide mutation, and mutations derived from consensus sequence anal., esp. improved by recent co-variation filters, are very likely to stabilize without sacrificing function. The combination of computational and statistical approaches with library approaches, including new technologies such as deep sequencing and high throughput stability measurements, point to a very exciting near term future for stability engineering, even with difficult computational issues remaining.

132
Porebski, B. T.; Buckle, A. M. Consensus Protein Design. Protein Eng., Des. Sel. 2016, 29, 245–251, DOI: 10.1093/protein/gzw015

132
Consensus protein design

Porebski, Benjamin T.; Buckle, Ashley M.

Protein Engineering, Design & Selection (2016), 29 (7), 245-251CODEN: PEDSBR; ISSN:1741-0126. (Oxford University Press)

A popular and successful strategy in semi-rational design of protein stability is the use of evolutionary information encapsulated in homologous protein sequences. Consensus design is based on the hypothesis that at a given position, the resp. consensus amino acid contributes more than av. to the stability of the protein than non-conserved amino acids. Here, we review the consensus design approach, its theor. underpinnings, successes, limitations and challenges, as well as providing a detailed guide to its application in protein engineering.

133
Jäckel, C.; Bloom, J. D.; Kast, P.; Arnold, F. H.; Hilvert, D. Consensus Protein Design without Phylogenetic Bias. J. Mol. Biol. 2010, 399, 541–546, DOI: 10.1016/j.jmb.2010.04.039

133
Consensus protein design without phylogenetic bias

Jackel, Christian; Bloom, Jesse D.; Kast, Peter; Arnold, Frances H.; Hilvert, Donald

Journal of Molecular Biology (2010), 399 (4), 541-546CODEN: JMOBAK; ISSN:0022-2836.

Consensus design is an appealing strategy for the stabilization of proteins. It exploits amino acid conservation in sets of homologous proteins to identify likely beneficial mutations. Nevertheless, its success depends on the phylogenetic diversity of the sequence set available. Here, we show that randomization of a single protein represents a reliable alternative source of sequence diversity that is essentially free of phylogenetic bias. A small number of functional protein sequences selected from binary-patterned libraries suffice as input for the consensus design of active enzymes that are easier to produce and substantially more stable than individual members of the starting data set. Although catalytic activity correlates less consistently with sequence conservation in these extensively randomized proteins, less extreme mutagenesis strategies might be adopted in practice to augment stability while maintaining function.

134
Goyal, V. D.; Magliery, T. J. Phylogenetic Spread of Sequence Data Affects Fitness of SOD1 Consensus Enzymes: Insights from Sequence Statistics and Structural Analyses. Proteins: Struct., Funct., Bioinf. 2018, 86, 609–620, DOI: 10.1002/prot.25486
135
Vázquez-Figueroa, E.; Chaparro-Riggers, J.; Bommarius, A. S. Development of a Thermostable Glucose Dehydrogenase by a Structure-Guided Consensus Concept. ChemBioChem 2007, 8, 2295–2301, DOI: 10.1002/cbic.200700500

135
Development of a thermostable glucose dehydrogenase by a structure-guided consensus concept

Vazquez-Figueroa, Eduardo; Chaparro-Riggers, Javier; Bommarius, Andreas S.

ChemBioChem (2007), 8 (18), 2295-2301CODEN: CBCHFX; ISSN:1439-4227. (Wiley-VCH Verlag GmbH & Co. KGaA)

Instability under non-native processing conditions, esp. at elevated temps., is a major factor preventing the widespread adoption of biocatalysts for industrial synthesis. A crucial distinction of many redox enzymes used to synthesize chiral compds. is the need for cofactors (e.g., NAD(P)(H)) for function. Because of the prohibitively high prices of nicotinamide cofactors, a robust cofactor-regenerating enzyme is required for the economical synthesis of fine chems. by biocatalysis. Here we test the structure-guided consensus for the generation of a thermostable glucose dehydrogenase (GDH). The consensus sequence in combination with addnl. knowledge-based criteria was used to select amino acids for substitutions. Using this approach we generated 24 variants, 11 of which showed higher thermal stability than the wild-type GDH, a success rate of 46%. Of the 24 variants, seven were located at the subunit interface (known to influence GDH stability), and six of these were more stable (86% success). The best variants feature a half-life of ∼3.5 days at 65 °C, in contrast to ∼20 min at 25 °C for the wild type, thus enhancing stability 10⁶-fold. In addn., the three most stabilizing single mutations were transferred to two GDH homologs from Bacillus thuringiensis and Bacillus licheniformis. The thermal stability of the GDH variants, as measured by half-life and CD at 222 nm, was increased, as expected. The resulting stability changes provide further support for the view that these residues are crit. for stability of GDHs and reinforce the success of the consensus approach for identifying stabilizing mutations.

136
Parthasarathy, S.; Murthy, M. R. Protein Thermal Stability: Insights from Atomic Displacement Parameters (B Values). Protein Eng., Des. Sel. 2000, 13, 9–13, DOI: 10.1093/protein/13.1.9
137
Cole, M. F.; Gaucher, E. A. Exploiting Models of Molecular Evolution to Efficiently Direct Protein Engineering. J. Mol. Evol. 2011, 72, 193–203, DOI: 10.1007/s00239-010-9415-2

137
Exploiting Models of Molecular Evolution to Efficiently Direct Protein Engineering

Cole, Megan F.; Gaucher, Eric A.

Journal of Molecular Evolution (2011), 72 (2), 193-203CODEN: JMEVAU; ISSN:0022-2844. (Springer)

Directed evolution and protein engineering approaches used to generate novel or enhanced biomol. function often use the evolutionary sequence diversity of protein homologs to rationally guide library design. To fully capture this sequence diversity, however, libraries contg. millions of variants are often necessary. Screening libraries of this size is often undesirable due to inaccuracies of high-throughput assays, costs, and time constraints. The ability to effectively cull sequence diversity while still generating the functional diversity within a library thus holds considerable value. This is particularly relevant when high-throughput assays are not amenable to select/screen for certain biomol. properties. Here, we summarize our recent attempts to develop an evolution-guided approach, Reconstructing Evolutionary Adaptive Paths (REAP), for directed evolution and protein engineering that exploits phylogenetic and sequence analyses to identify amino acid substitutions that are likely to alter or enhance function of a protein. To demonstrate the utility of this technique, we highlight our previous work with DNA polymerases in which a REAP-designed small library was used to identify a DNA polymerase capable of accepting non-std. nucleosides. We anticipate that the REAP approach will be used in the future to facilitate the engineering of biopolymers with expanded functions and will thus have a significant impact on the developing field of "evolutionary synthetic biol.".

138
Hochberg, G. K. A.; Thornton, J. W. Reconstructing Ancient Proteins to Understand the Causes of Structure and Function. Annu. Rev. Biophys. 2017, 46, 247–269, DOI: 10.1146/annurev-biophys-070816-033631

138
Reconstructing Ancient Proteins to Understand the Causes of Structure and Function

Hochberg, Georg K. A.; Thornton, Joseph W.

Annual Review of Biophysics (2017), 46, 247-269CODEN: ARBNCV; ISSN:1936-122X. (Annual Reviews)

A review. A central goal in biochem. is to explain the causes of protein sequence, structure, and function. Mainstream approaches seek to rationalize sequence and structure in terms of their effects on function and to identify function's underlying determinants by comparing related proteins to each other. Although productive, both strategies suffer from intrinsic limitations that have left important aspects of many proteins unexplained. These limits can be overcome by reconstructing ancient proteins, exptl. characterizing their properties, and retracing their evolution through time. This approach has proven to be a powerful means for discovering how historical changes in sequence produced the functions, structures, and other phys./chem. characteristics of modern proteins. It has also illuminated whether protein features evolved because of functional optimization, historical constraint, or blind chance. Here, we review recent studies employing ancestral protein reconstruction and show how they have produced new knowledge not only of mol. evolutionary processes but also of the underlying determinants of modern proteins' phys., chem., and biol. properties.

139
Aerts, D.; Verhaeghe, T.; Joosten, H.-J.; Vriend, G.; Soetaert, W.; Desmet, T. Consensus Engineering of Sucrose Phosphorylase: The Outcome Reflects the Sequence Input. Biotechnol. Bioeng. 2013, 110, 2563–2572, DOI: 10.1002/bit.24940

139
Consensus Engineering of Sucrose Phosphorylase: The Outcome Reflects the Sequence Input

Aerts, Dirk; Verhaeghe, Tom; Joosten, Henk-Jan; Vriend, Gert; Soetaert, Wim; Desmet, Tom

Biotechnology and Bioengineering (2013), 110 (10), 2563-2572CODEN: BIBIAU; ISSN:0006-3592. (John Wiley & Sons, Inc.)

Consensus engineering, which is replacing amino acids by the most frequently occurring one at their positions in a multiple sequence alignment (MSA), is a known strategy to increase the stability of a protein. The application of this concept to the entire sequence of an enzyme, however, has been tried only a few times mainly because of the problems detg. the consensus in highly variable regions. We show that this problem can be solved by replacing such problematic regions by the corresponding sequence of the natural homolog closest to the consensus. When one or a few sub-families are overrepresented in the MSA the consensus sequence is a biased representation of the sequence space. We examine the influence of this bias by constructing three consensus sequences using different MSAs of sucrose phosphorylase (SP). Each consensus enzyme contained about 70 mutations compared to its closest natural homolog and folded correctly and displayed activity on sucrose. Correlation anal. revealed that the family's co-evolution network was kept intact, which is one of the main advantages of full-length consensus design. The consensus enzymes displayed an "av." thermostability, i.e., one that is higher than some but not all known representatives. We cautiously present practical rules for the design of consensus sequences, but warn that the measure of success depends on which natural enzyme is used as point of comparison.

140
Trudeau, D. L.; Kaltenbach, M.; Tawfik, D. S. On the Potential Origins of the High Stability of Reconstructed Ancestral Proteins. Mol. Biol. Evol. 2016, 33, 2633–2641, DOI: 10.1093/molbev/msw138

140
On the potential origins of the high stability of reconstructed ancestral proteins

Trudeau, Devin L.; Kaltenbach, Miriam; Tawfik, Dan S.

Molecular Biology and Evolution (2016), 33 (10), 2633-2641CODEN: MBEVEO; ISSN:0737-4038. (Oxford University Press)

Ancestral reconstruction provides instrumental insights regarding the biochem. and biophys. characteristics of past proteins. A striking observation relates to the remarkably high thermostability of reconstructed ancestors. The latter has been linked to high environmental temps. in the Precambrian era, the era relating to most reconstructed proteins. We found that inferred ancestors of the serum paraoxonase (PON) enzyme family, including the mammalian ancestor, exhibit dramatically increased thermostabilities compared with the extant, human enzyme (up to 30 °C higher melting temp.). However, the environmental temp. at the time of emergence of mammals is presumed to be similar to the present one. Addnl., the mammalian PON ancestor has superior folding properties (kinetic stability): unlike the extant mammalian PONs, it expresses in E. coli in a sol. and functional form, and at a high yield. We discuss two potential origins of this unexpectedly high stability. First, ancestral stability may be overestimated by a "consensus effect," whereby replacing amino acids that are rare in contemporary sequences with the amino acid most common in the family increases protein stability. Comparison to other reconstructed ancestors indicates that the consensus effect may bias some but not all reconstructions. Second, we note that high stability may relate to factors other than high environmental temp. such as oxidative stress or high radiation levels. Foremost, intrinsic factors such as high rates of genetic mutations and/or of transcriptional and translational errors, and less efficient protein quality control systems, may underlie the high kinetic and thermodn. stability of past proteins.

141
Wheeler, L. C.; Lim, S. A.; Marqusee, S.; Harms, M. J. The Thermostability and Specificity of Ancient Proteins. Curr. Opin. Struct. Biol. 2016, 38, 37–43, DOI: 10.1016/j.sbi.2016.05.015

141
The thermostability and specificity of ancient proteins

Wheeler, Lucas C.; Lim, Shion A.; Marqusee, Susan; Harms, Michael J.

Current Opinion in Structural Biology (2016), 38, 37-43CODEN: COSBEF; ISSN:0959-440X. (Elsevier Ltd.)

A review. Were ancient proteins systematically different from modern proteins? The answer to this question is profoundly important, shaping how we understand the origins of protein biochem., biophys., and functional properties. Ancestral sequence reconstruction (ASR), a phylogenetic approach to infer the sequences of ancestral proteins, may reveal such trends. We discuss two proposed trends: a transition from higher to lower thermostability and a tendency for proteins to acquire higher specificity over time. We review the evidence for elevated ancestral thermostability and discuss its possible origins in a changing environmental temp. and/or reconstruction bias. We also conclude that there is, as yet, insufficient data to support a trend from promiscuity to specificity. Finally, we propose future work to understand these proposed evolutionary trends.

142
Yang, Z. PAML: A Program Package for Phylogenetic Analysis by Maximum Likelihood. Bioinformatics 1997, 13, 555–556, DOI: 10.1093/bioinformatics/13.5.555
143
Stamatakis, A. RAxML-VI-HPC: Maximum Likelihood-Based Phylogenetic Analyses with Thousands of Taxa and Mixed Models. Bioinformatics 2006, 22, 2688–2690, DOI: 10.1093/bioinformatics/btl446

143
RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models

Stamatakis, Alexandros

Bioinformatics (2006), 22 (21), 2688-2690CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)

RAxML-VI-HPC (randomized accelerated max. likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with max. likelihood (ML). Low-level tech. optimizations, a modification of the search algorithm, and the use of the GTR + CAT approxn. as replacement for GTR + Γ yield a program that is between 2.7 and 52 times faster than the previous version of RAxML. A large-scale performance comparison with GARLI, PHYML, IQPNNI and MrBayes on real data contg. 1000 up to 6722 taxa shows that RAxML requires at least 5.6 times less main memory and yields better trees in similar times than the best competing program (GARLI) on datasets up to 2500 taxa. On datasets ≥4000 taxa it also runs 2-3 times faster than GARLI. RAxML has been parallelized with MPI to conduct parallel multiple bootstraps and inferences on distinct starting trees. The program has been used to compute ML trees on two of the largest alignments to date contg. 25 057 (1463 bp) and 2182 (51 089 bp) taxa, resp.

144
Huelsenbeck, J. P.; Ronquist, F.; Nielsen, R.; Bollback, J. P. Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology. Science 2001, 294, 2310–2314, DOI: 10.1126/science.1065889

144
Evolution: Bayesian inference of phylogeny and its impact on evolutionary biology

Huelsenbeck, John P.; Ronquist, Fredrik; Nielsen, Rasmus; Bollback, Jonathan P.

Science (Washington, DC, United States) (2001), 294 (5550), 2310-2314CODEN: SCIEAS; ISSN:0036-8075. (American Association for the Advancement of Science)

A review. As a discipline, phylogenetics is becoming transformed by a flood of mol. data. These data allow broad questions to be asked about the history of life, but also present difficult statistical and computational problems. Bayesian inference of phylogeny brings a new perspective to a no. of outstanding issues in evolutionary biol., including the anal. of large phylogenetic trees and complex evolutionary models and the detection of the footprint of natural selection in DNA sequences.

145
Goldstein, R. A.; Pollard, S. T.; Shah, S. D.; Pollock, D. D. Nonadaptive Amino Acid Convergence Rates Decrease over Time. Mol. Biol. Evol. 2015, 32, 1373–1381, DOI: 10.1093/molbev/msv041

145
Nonadaptive amino acid convergence rates decrease over time

Goldstein, R. A.; Pollard, S. T.; Shah, S. D.; Pollock, D. D.

Molecular Biology and Evolution (2015), 32 (6), 1373-1381CODEN: MBEVEO; ISSN:0737-4038. (Oxford University Press)

Convergence is a central concept in evolutionary studies because it provides strong evidence for adaptation. It also provides information about the nature of the fitness landscape and the repeatability of evolution, and can mislead phylogenetic inference. To understand the role of adaptive convergence, we need to understand the patterns of nonadaptive convergence. Here, we consider the relationship between nonadaptive convergence and divergence in mitochondrial and model proteins. Surprisingly, nonadaptive convergence is much more common than expected in closely related organisms, falling off as organisms diverge. The extent of the convergent drop-off in mitochondrial proteins is well predicted by epistatic or coevolutionary effects in our "evolutionary Stokes shift" models and poorly predicted by conventional evolutionary models. Convergence probabilities decrease dramatically if the ancestral amino acids of branches being compared have diverged, but also drop slowly over evolutionary time even if the ancestral amino acids have not substituted. Convergence probabilities drop-off rapidly for quickly evolving sites, but much more slowly for slowly evolving sites. Furthermore, once sites have diverged their convergence probabilities are extremely low and indistinguishable from convergence levels at randomized sites. These results indicate that we cannot assume that excessive convergence early on is necessarily adaptive. This new understanding should help us to better discriminate adaptive from nonadaptive convergence and develop more relevant evolutionary models with improved validity for phylogenetic inference.

146
Williams, P. D.; Pollock, D. D.; Blackburne, B. P.; Goldstein, R. A. Assessing the Accuracy of Ancestral Protein Reconstruction Methods. PLoS Comput. Biol. 2006, 2, e69, DOI: 10.1371/journal.pcbi.0020069
147
Eick, G. N.; Bridgham, J. T.; Anderson, D. P.; Harms, M. J.; Thornton, J. W. Robustness of Reconstructed Ancestral Protein Functions to Statistical Uncertainty. Mol. Biol. Evol. 2016, 34, 247–261, DOI: 10.1093/molbev/msw223
148
Gaucher, E. A.; Govindarajan, S.; Ganesh, O. K. Palaeotemperature Trend for Precambrian Life Inferred from Resurrected Proteins. Nature 2008, 451, 704–707, DOI: 10.1038/nature06510

148
Palaeotemperature trend for Precambrian life inferred from resurrected proteins

Gaucher, Eric A.; Govindarajan, Sridhar; Ganesh, Omjoy K.

Nature (London, United Kingdom) (2008), 451 (7179), 704-707CODEN: NATUAS; ISSN:0028-0836. (Nature Publishing Group)

Biosignatures and structures in the geol. record indicate that microbial life has inhabited Earth for ∼3.5 × 10⁹ yr. Research in the phys. sciences has been able to generate statements about the ancient environment that hosted this life. These include the chem. compns. and temps. of the early ocean and atm. Only recently have the natural sciences been able to provide exptl. results describing the environments of ancient life. The authors' previous work with resurrected proteins indicated that ancient life lived in a hot environment. Here, the authors expand the timescale of resurrected proteins to provide a palaeotemp. trend of the environments that hosted life 3.5-0.5 × 10⁹ yr ago. The thermostability of >25 phylogenetically dispersed ancestral elongation factors suggests that the environment supporting ancient life cooled progressively by 30 °C during that period. Here, the authors show that their results are robust to potential statistical bias assocd. with the posterior distribution of inferred character states, phylogenetic ambiguity, and uncertainties in the amino acid equil. frequencies used by evolutionary models. The results are further supported by a nearly identical cooling trend for the ancient ocean as inferred from the deposition of O isotopes. The convergence of results from natural and phys. sciences suggests that ancient life has continually adapted to changes in environmental temps. throughout its evolutionary history.

149
Akanuma, S. Characterization of Reconstructed Ancestral Proteins Suggests a Change in Temperature of the Ancient Biosphere. Life (Basel, Switz.) 2017, 7, 33, DOI: 10.3390/life7030033

149
Characterization of reconstructed ancestral proteins suggests a change in temperature of the ancient biosphere

Akanuma, Satoshi

Life (Basel, Switzerland) (2017), 7 (3), 33/1-33/14CODEN: LBSIB7; ISSN:2075-1729. (MDPI AG)

Understanding the evolution of ancestral life, and esp. the ability of some organisms to flourish in the variable environments experienced in Earth's early biosphere, requires knowledge of the characteristics and the environment of these ancestral organisms. Information about early life and environmental conditions has been obtained from fossil records and geol. surveys. Recent advances in phylogenetic anal., and an increasing no. of protein sequences available in public databases, have made it possible to infer ancestral protein sequences possessed by ancient organisms. However, the in silico studies that assess the ancestral base content of rRNAs, the frequency of each amino acid in ancestral proteins, and est. the environmental temps. of ancient organisms, show conflicting results. The characterization of ancestral proteins reconstructed in vitro suggests that ancient organisms had very thermally stable proteins, and therefore were thermophilic or hyperthermophilic. Exptl. data supports the idea that only thermophilic ancestors survived the catastrophic increase in temp. of the biosphere that was likely assocd. with meteorite impacts during the early history of Earth. In addn., by expanding the timescale and including more ancestral proteins for reconstruction, it appears as though the Earth's surface temp. gradually decreased over time, from Archean to present.

150
Gumulya, Y.; Baek, J.-M.; Wun, S.-J.; Thomson, R. E. S.; Harris, K. L.; Hunter, D. J. B.; Behrendorff, J. B. Y. H.; Kulig, J.; Zheng, S.; Wu, X.; Wu, B.; Stok, J. E.; De Voss, J. J.; Schenk, G.; Jurva, U.; Andersson, S.; Isin, E. M.; Bodén, M.; Guddat, L.; Gillam, E. M. J. Engineering Highly Functional Thermostable Proteins Using Ancestral Sequence Reconstruction. Nat. Catal. 2018, 1, 878–888, DOI: 10.1038/s41929-018-0159-5

150
Engineering highly functional thermostable proteins using ancestral sequence reconstruction

Gumulya, Yosephin; Baek, Jong-Min; Wun, Shun-Jie; Thomson, Raine E. S.; Harris, Kurt L.; Hunter, Dominic J. B.; Behrendorff, James B. Y. H.; Kulig, Justyna; Zheng, Shan; Wu, Xueming; Wu, Bin; Stok, Jeanette E.; De Voss, James J.; Schenk, Gerhard; Jurva, Ulrik; Andersson, Shalini; Isin, Emre M.; Boden, Mikael; Guddat, Luke; Gillam, Elizabeth M. J.

Nature Catalysis (2018), 1 (11), 878-888CODEN: NCAACP; ISSN:2520-1158. (Nature Research)

Com. biocatalysis requires robust enzymes that can withstand elevated temps. and long incubations. Ancestral reconstruction has shown that pre-Cambrian enzymes were often much more thermostable than extant forms. Here, we resurrect ancestral enzymes that withstand ∼30 °C higher temps. and ≥100 times longer incubations than their extant forms. This is demonstrated on animal cytochromes P450 that stereo- and regioselectively functionalize unactivated C-H bonds for the synthesis of valuable chems., and bacterial ketol-acid reductoisomerases that are used to make butanol-based biofuels. The vertebrate CYP3 P450 ancestor showed a ⁶⁰T₅₀ of 66 °C and enhanced solvent tolerance compared with the human drug-metabolizing CYP3A4, yet comparable activity towards a similarly broad range of substrates. The ancestral ketol-acid reductoisomerase showed an eight-fold higher specific activity than the cognate Escherichia coli form at 25 °C, which increased 3.5-fold at 50 °C. Thus, thermostable proteins can be devised using sequence data alone from even recent ancestors.

151
Dehouck, Y.; Grosfils, A.; Folch, B.; Gilis, D.; Bogaerts, P.; Rooman, M. Fast and Accurate Predictions of Protein Stability Changes upon Mutations Using Statistical Potentials and Neural Networks: PoPMuSiC-2.0. Bioinformatics 2009, 25, 2537–2543, DOI: 10.1093/bioinformatics/btp445

151
Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0

Dehouck, Yves; Grosfils, Aline; Folch, Benjamin; Gilis, Dimitri; Bogaerts, Philippe; Rooman, Marianne

Bioinformatics (2009), 25 (19), 2537-2543CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)

The rational design of proteins with modified properties, through amino acid substitutions, is of crucial importance in a large variety of applications. Given the huge no. of possible substitutions, every protein engineering project would benefit strongly from the guidance of in silico methods able to predict rapidly, and with reasonable accuracy, the stability changes resulting from all possible mutations in a protein. The authors exploit newly developed statistical potentials, based on a formalism that highlights the coupling between 4 protein sequence and structure descriptors, and take into account the amino acid vol. variation upon mutation. The stability change is expressed as a linear combination of these energy functions, whose proportionality coeffs. vary with the solvent accessibility of the mutated residue and are identified with the help of a neural network. A correlation coeff. of R = 0.63 and a root mean square error of σc = 1.15 kcal/mol between measured and predicted stability changes are obtained upon cross-validation. These scores reach R = 0.79, and σc = 0.86 kcal/mol after exclusion of 10% outliers. The predictive power of the authors' method is shown to be significantly higher than that of other programs described in the literature.

152
Khatun, J.; Khare, S. D.; Dokholyan, N. V. Can Contact Potentials Reliably Predict Stability of Proteins?. J. Mol. Biol. 2004, 336, 1223– 1238, DOI: 10.1016/j.jmb.2004.01.002

Can Contact Potentials Reliably Predict Stability of Proteins?

Khatun, Jainab; Khare, Sagar D.; Dokholyan, Nikolay V.

Journal of Molecular Biology (2004), 336 (5), 1223-1238. CODEN: JMOBAK; ISSN:0022-2836. (Elsevier)

The simplest approxn. of interaction potential between amino acid residues in proteins is the contact potential, which defines the effective free energy of a protein conformation by a set of amino acid contacts formed in this conformation. Finding a contact potential capable of predicting free energies of protein states across a variety of protein families will aid protein folding and engineering in silico on a computationally tractable time-scale. We test the ability of contact potentials to accurately and transferably (across various protein families) predict stability changes of proteins upon mutations. We develop a new methodol. to det. the contact potentials in proteins from exptl. measurements of changes in protein's thermodn. stabilities (ΔΔG) upon mutations. We apply our methodol. to derive sets of contact interaction parameters for a hierarchy of interaction models including solvation and multi-body contact parameters. We test how well our models reproduce exptl. measurements by statistical tests. We evaluate the max. accuracy of predictions obtained by using contact potentials and the correlation between parameters derived from different data-sets of exptl. (ΔΔG) values. We argue that it is impossible to reach exptl. accuracy and derive fully transferable contact parameters using the contact models of potentials. However, contact parameters may yield reliable predictions of ΔΔG for datasets of mutations confined to the same amino acid positions in the sequence of a single protein.

153
Pucci, F.; Bernaerts, K. V.; Kwasigroch, J. M.; Rooman, M. Quantification of Biases in Predictions of Protein Stability Changes upon Mutations. Bioinformatics 2018, 34, 3659– 3665, DOI: 10.1093/bioinformatics/bty348

Quantification of biases in predictions of protein stability changes upon mutations

Pucci, Fabrizio; Bernaerts, Katrien V.; Kwasigroch, Jean Marc; Rooman, Marianne

Bioinformatics (2018), 34 (21), 3659-3665. CODEN: BOINFP; ISSN:1367-4811. (Oxford University Press)

Motivation: Bioinformatics tools that predict protein stability changes upon point mutations have made a lot of progress in the last decades and have become accurate and fast enough to make computational mutagenesis expts. feasible, even on a proteome scale. One remaining problem is their bias toward the learning datasets, which, being dominated by destabilizing mutations, causes predictions to be better for destabilizing than for stabilizing mutations. Results: We thoroughly analyzed the biases in the prediction of folding free energy changes upon point mutations (ΔΔG°) and proposed some unbiased solns. We started by constructing a dataset Ssym of exptl. measured ΔΔG°s with an equal no. of stabilizing and destabilizing mutations, by collecting mutations for which the structure of both the wild-type and mutant protein is available. On this balanced dataset, we assessed the performances of 15 widely used ΔΔG° predictors. After the astonishing observation that almost all these methods are strongly biased toward destabilizing mutations, esp. those that use black-box machine learning, we proposed an elegant way to solve the bias issue by imposing phys. symmetries under inverse mutations on the model structure, which we implemented in PoPMuSiCsym. This new predictor constitutes an efficient trade-off between accuracy and absence of biases. Some final considerations and suggestions for further improvement of the predictors are discussed.

154
Yin, S.; Ding, F.; Dokholyan, N. V. Eris: An Automated Estimator of Protein Stability. Nat. Methods 2007, 4, 466– 467, DOI: 10.1038/nmeth0607-466

Eris: an automated estimator of protein stability

Yin, Shuangye; Ding, Feng; Dokholyan, Nikolay V.

Nature Methods (2007), 4 (6), 466-467. CODEN: NMAEA3; ISSN:1548-7091. (Nature Publishing Group)

155
Benedix, A.; Becker, C. M.; de Groot, B. L.; Caflisch, A.; Böckmann, R. A. Predicting Free Energy Changes Using Structural Ensembles. Nat. Methods 2009, 6, 3– 4, DOI: 10.1038/nmeth0109-3

Predicting free energy changes using structural ensembles

Benedix, Alexander; Becker, Caroline M.; de Groot, Bert L.; Caflisch, Amedeo; Boeckmann, Rainer A.

Nature Methods (2009), 6 (1), 3-4. CODEN: NMAEA3; ISSN:1548-7091. (Nature Publishing Group)

Reliable and fast computation of protein free energy is crucial for protein-structure anal., structure-based protein design and protein docking. Rigorous treatments based on phys. effective energy functions involve computationally expensive methods such as free energy perturbation, which are time-consuming and are thus incompatible with the need to perform extensive scans. Commonly used fast methods, in turn, involve empirically derived scoring functions and usually do not include protein flexibility or are based on statistical potentials and are therefore highly dependent on the availability of case-dependent exptl. training data. Hence, such methods are inherently limited in accuracy and applicability. Here we propose a computational, structure-based method named Concoord/Poisson-Boltzmann surface area (CC/PBSA) for both fast and quant. estn. of the folding free energy of mutants, that is for measuring their conformational stability and for predicting the effect of mutations on protein-protein binding affinity. The first step is to rapidly generate alternative protein conformations via the program Concoord, which efficiently samples the available configurational space. The crystal or NMR input structure is translated into a geometric description of the complex, and starting from random coordinates, 300-600 structures both of the mutant and the wild type are generated by iteratively correcting the coordinates until all geometric constraints are fulfilled. Then an energy function based on phys. chem. (force field) and an efficient continuum solvent approach, the soln. of the Poisson-Boltzmann equation and a term for nonpolar solvation, is averaged over the generated structural ensembles.

156
Pronk, S.; Páll, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M. R.; Smith, J. C.; Kasson, P. M.; van der Spoel, D.; Hess, B.; Lindahl, E. GROMACS 4.5: A High-Throughput and Highly Parallel Open Source Molecular Simulation Toolkit. Bioinformatics 2013, 29, 845– 854, DOI: 10.1093/bioinformatics/btt055

GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit

Pronk, Sander; Pall, Szilard; Schulz, Roland; Larsson, Per; Bjelkmar, Paer; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik

Bioinformatics (2013), 29 (7), 845-854. CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)

Motivation: Mol. simulation has historically been a low-throughput technique, but faster computers and increasing amts. of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomols. with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomol. interaction and function in a manner directly testable by expt. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomols., such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these mols. built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations.

157
de Groot, B. L.; van Aalten, D. M.; Scheek, R. M.; Amadei, A.; Vriend, G.; Berendsen, H. J. C. Prediction of Protein Conformational Freedom from Distance Constraints. Proteins: Struct., Funct., Genet. 1997, 29, 240– 251, DOI: 10.1002/(SICI)1097-0134(199710)29:2<240::AID-PROT11>3.0.CO;2-O

Prediction of protein conformational freedom from distance constraints

de Groot, B. L.; van Aalten, D. M. F.; Scheek, R. M.; Amadei, A.; Vriend, G.; Berendsen, H. J. C.

Proteins: Structure, Function, and Genetics (1997), 29 (2), 240-251. CODEN: PSFGEY; ISSN:0887-3585. (Wiley-Liss)

A method is presented that generates random protein structures that fulfil a set of upper and lower interat. distance limits. These limits depend on distances measured in exptl. structures and the strength of the interat. interaction. Structural differences between generated structures are similar to those obtained from expt. and from MD simulation. Although detailed aspects of dynamical mechanisms are not covered and the extent of variations are only estd. in a relative sense, applications to an IgG-binding domain, an SH3 binding domain, HPr, calmodulin, and lysozyme are presented which illustrate the use of the method as a fast and simple way to predict structural variability in proteins. The method may be used to support the design of mutants, when structural fluctuations for a large no. of mutants are to be screened. The results suggest that motional freedom in proteins is ruled largely by a set of simple geometric constraints.

158
Hoppe, C.; Schomburg, D. Prediction of Protein Thermostability with a Direction- and Distance-Dependent Knowledge-Based Potential. Protein Sci. 2005, 14, 2682– 2692, DOI: 10.1110/ps.04940705

Prediction of protein thermostability with a direction- and distance-dependent knowledge-based potential

Hoppe, Christian; Schomburg, Dietmar

Protein Science (2005), 14 (10), 2682-2692. CODEN: PRCIEI; ISSN:0961-8368. (Cold Spring Harbor Laboratory Press)

The increasing use of enzymes in industrial processes and the importance of understanding protein folding and stability have led to several attempts to predict and quantify the effect of every possible amino acid exchange (mutation) on the thermostability of proteins. In this article the authors describe a knowledge-based discrimination function that acts as a fast and reliable guide in protein engineering and optimization. The function used consists of two parts, a pairwise energy function based on a distance- and direction-dependent at. description of the amino acid environment, and a torsion angle energy function. In a first step a training set of 11 proteins including 646 mutant proteins with exptl. detd. thermostability was used to optimize the knowledge-based energy functions. The resulting potential function was then tested using a test mutant database consisting of 918 various point mutations introduced in 27 proteins. The best correlation coeff. obtained for the exptl. data and the predicted thermostability for the training set is r = 0.81 (561 data points). A total of 76% of the mutations could be predicted correctly as being either stabilizing or destabilizing. The results for the test set are r = 0.74 (747 data points) and 72%, resp. The global correlation over the combined data (1308 mutants) obtained is 0.78.

159
Pucci, F.; Bourgeas, R.; Rooman, M. Predicting Protein Thermal Stability Changes upon Point Mutations Using Statistical Potentials: Introducing HoTMuSiC. Sci. Rep. 2016, 6, 23257, DOI: 10.1038/srep23257

Predicting protein thermal stability changes upon point mutations using statistical potentials: Introducing HoTMuSiC

Pucci, Fabrizio; Bourgeas, Raphael; Rooman, Marianne

Scientific Reports (2016), 6, 23257. CODEN: SRCEC3; ISSN:2045-2322. (Nature Publishing Group)

The accurate prediction of the impact of an amino acid substitution on the thermal stability of a protein is a central issue in protein science, and is of key relevance for the rational optimization of various bioprocesses that use enzymes in unusual conditions. Here we present one of the first computational tools to predict the change in melting temp. ΔTm upon point mutations, given the protein structure and, when available, the melting temp. Tm of the wild-type protein. The key ingredients of our model structure are std. and temp.-dependent statistical potentials, which are combined with the help of an artificial neural network. The model structure was chosen on the basis of a detailed thermodn. anal. of the system. The parameters of the model were identified on a set of more than 1,600 mutations with exptl. measured ΔTm. The performance of our method was tested using a strict 5-fold cross-validation procedure, and was found to be significantly superior to that of competing methods. We obtained a root mean square deviation between predicted and exptl. ΔTm values of 4.2 °C that reduces to 2.9 °C when ten percent outliers are removed. A webserver-based tool is freely available for non-com. use at soft.dezyme.com.

160
Capriotti, E.; Fariselli, P.; Casadio, R. I-Mutant2.0: Predicting Stability Changes upon Mutation from the Protein Sequence or Structure. Nucleic Acids Res. 2005, 33, W306– W310, DOI: 10.1093/nar/gki375

I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure

Capriotti, Emidio; Fariselli, Piero; Casadio, Rita

Nucleic Acids Research (2005), 33 (Web Server), W306-W310. CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)

I-Mutant2.0 is a support vector machine (SVM)-based tool for the automatic prediction of protein stability changes upon single point mutations. I-Mutant2.0 predictions are performed starting either from the protein structure or, more importantly, from the protein sequence. This latter task, to the best of our knowledge, is exploited for the first time. The method was trained and tested on a data set derived from ProTherm, which is presently the most comprehensive available database of thermodn. exptl. data of free energy changes of protein stability upon mutation under different conditions. I-Mutant2.0 can be used both as a classifier for predicting the sign of the protein stability change upon mutation and as a regression estimator for predicting the related ΔΔG values. Acting as a classifier, I-Mutant2.0 correctly predicts (with a cross-validation procedure) 80% or 77% of the data set, depending on the usage of structural or sequence information, resp. When predicting ΔΔG values assocd. with mutations, the correlation of predicted with expected/exptl. values is 0.71 (with a std. error of 1.30 kcal/mol) and 0.62 (with a std. error of 1.45 kcal/mol) when structural or sequence information are resp. adopted. Our web interface allows the selection of a predictive mode that depends on the availability of the protein structure and/or sequence. In this latter case, the web server requires only pasting of a protein sequence in a raw format. We therefore introduce I-Mutant2.0 as a unique and valuable helper for protein design, even when the protein structure is not yet known with at. resoln. Availability: http://gpcr.biocomp.unibo.it/cgi/predictors/I-Mutant2.0/I-Mutant2.0.cgi.

161
Cheng, J.; Randall, A.; Baldi, P. Prediction of Protein Stability Changes for Single-Site Mutations Using Support Vector Machines. Proteins: Struct., Funct., Bioinf. 2006, 62, 1125– 1132, DOI: 10.1002/prot.20810

Prediction of protein stability changes for single-site mutations using support vector machines

Cheng, Jianlin; Randall, Arlo; Baldi, Pierre

Proteins: Structure, Function, and Bioinformatics (2006), 62 (4), 1125-1132. CODEN: PSFBAF. (Wiley-Liss, Inc.)

Accurate prediction of protein stability changes resulting from single amino acid mutations is important for understanding protein structures and designing new proteins. The authors use support vector machines to predict protein stability changes for single amino acid mutations leveraging both sequence and structural information. The authors evaluate their approach using cross-validation methods on a large dataset of single amino acid mutations. When only the sign of the stability changes is considered, the predictive method achieves 84% accuracy - a significant improvement over previously published results. Moreover, the exptl. results show that the prediction accuracy obtained using sequence alone is close to the accuracy obtained using tertiary structure information. Because the authors' method can accurately predict protein stability changes using primary sequence information only, it is applicable to many situations where the tertiary structure is unknown, overcoming a major limitation of previous methods which require tertiary information. The web server for predictions of protein stability changes upon mutations (MU-pro), software, and datasets are available at http://www.igb.uci.edu/servers/servers.html.

162
Wainreb, G.; Wolf, L.; Ashkenazy, H.; Dehouck, Y.; Ben-Tal, N. Protein Stability: A Single Recorded Mutation Aids in Predicting the Effects of Other Mutations in the Same Amino Acid Site. Bioinformatics 2011, 27, 3286– 3292, DOI: 10.1093/bioinformatics/btr576

Protein stability: a single recorded mutation aids in predicting the effects of other mutations in the same amino acid site

Wainreb, Gilad; Wolf, Lior; Ashkenazy, Haim; Dehouck, Yves; Ben-Tal, Nir

Bioinformatics (2011), 27 (23), 3286-3292. CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)

Motivation: Accurate prediction of protein stability is important for understanding the mol. underpinnings of diseases and for the design of new proteins. We introduce a novel approach for the prediction of changes in protein stability that arise from a single-site amino acid substitution; the approach uses available data on mutations occurring in the same position and in other positions. Our algorithm, named Pro-Maya (Protein Mutant stAbilitY Analyzer), combines a collaborative filtering baseline model, Random Forests regression and a diverse set of features. Pro-Maya predicts the stability free energy difference of mutant vs. wild type, denoted as ΔΔG. Results: We evaluated our algorithm extensively using cross-validation on two previously utilized datasets of single amino acid mutations and a (third) validation set. The results indicate that using known ΔΔG values of mutations at the query position improves the accuracy of ΔΔG predictions for other mutations in that position. The accuracy of our predictions in such cases significantly surpasses that of similar methods, achieving, e.g. a Pearson's correlation coeff. of 0.79 and a root mean square error of 0.96 on the validation set. Because Pro-Maya uses a diverse set of features, including predictions using two other methods, it also performs slightly better than other methods in the absence of addnl. exptl. data on the query positions. Availability: Pro-Maya is freely available via web server at http://bental.tau.ac.il/ProMaya. Contact: nirb@tauex.tau.ac.il; wolf@cs.tau.ac.il.

163
Li, Y.; Fang, J. PROTS-RF: A Robust Model for Predicting Mutation-Induced Protein Stability Changes. PLoS One 2012, 7, e47247, DOI: 10.1371/journal.pone.0047247

PROTS-RF: a robust model for predicting mutation-induced protein stability changes

Li, Yunqi; Fang, Jianwen

PLoS One (2012), 7 (10), e47247. CODEN: POLNCL; ISSN:1932-6203. (Public Library of Science)

The ability to improve protein thermostability via protein engineering is of great scientific interest and also has significant practical value. In this report we present PROTS-RF, a robust model based on the Random Forest algorithm capable of predicting thermostability changes induced by not only single-, but also double- or multiple-point mutations. The model is built using 41 features including evolutionary information, secondary structure, solvent accessibility and a set of fragment-based features. It achieves accuracies of 0.799, 0.782, 0.787 and areas under receiver operating characteristic (ROC) curves of 0.873, 0.868 and 0.862 for single-, double- and multiple- point mutation datasets, resp. Contrary to previous suggestions, our results clearly demonstrate that a robust predictive model trained for predicting single point mutation induced thermostability changes can be capable of predicting double and multiple point mutations. It also shows high levels of robustness in the tests using hypothetical reverse mutations. We demonstrate that testing datasets created based on phys. principles can be highly useful for testing the robustness of predictive models.

164
Quang, D.; Chen, Y.; Xie, X. DANN: A Deep Learning Approach for Annotating the Pathogenicity of Genetic Variants. Bioinformatics 2015, 31, 761– 763, DOI: 10.1093/bioinformatics/btu703

DANN: a deep learning approach for annotating the pathogenicity of genetic variants

Quang, Daniel; Chen, Yifei; Xie, Xiaohui

Bioinformatics (2015), 31 (5), 761-763. CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)

Annotating genetic variants, esp. non-coding variants, for the purpose of identifying pathogenic variants remains a challenge. Combined annotation-dependent depletion (CADD) is an algorithm designed to annotate both coding and non-coding variants, and has been shown to outperform other annotation algorithms. CADD trains a linear kernel support vector machine (SVM) to differentiate evolutionarily derived, likely benign, alleles from simulated, likely deleterious, variants. However, SVMs cannot capture non-linear relationships among the features, which can limit performance. To address this issue, we have developed DANN. DANN uses the same feature set and training data as CADD to train a deep neural network (DNN). DNNs can capture non-linear relationships among features and are better suited than SVMs for problems with a large no. of samples and features. We exploit Compute Unified Device Architecture-compatible graphics processing units and deep learning techniques such as dropout and momentum training to accelerate the DNN training. DANN achieves about a 19% relative redn. in the error rate and about a 14% relative increase in the area under the curve (AUC) metric over CADD's SVM methodol.

165
Wang, Y.; Mao, H.; Yi, Z. Protein Secondary Structure Prediction by Using Deep Learning Method. Knowl.-Based Syst. 2017, 118, 115– 123, DOI: 10.1016/j.knosys.2016.11.015

166
Ivakhnenko, A. G. Polynomial Theory of Complex Systems. IEEE Trans. Syst., Man, Cybern. 1971, SMC-1, 364– 378, DOI: 10.1109/TSMC.1971.4308320

167
Bengio, Y.; Boulanger-Lewandowski, N.; Pascanu, R. Advances in Optimizing Recurrent Networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing; IEEE: New York, 2013; pp 8624– 8628.
168
Cang, Z.; Wei, G.-W. TopologyNet: Topology Based Deep Convolutional and Multi-Task Neural Networks for Biomolecular Property Predictions. PLoS Comput. Biol. 2017, 13, e1005690, DOI: 10.1371/journal.pcbi.1005690

TopologyNet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions

Cang, Zixuan; Wei, Guo-Wei

PLoS Computational Biology (2017), 13 (7), e1005690/1-e1005690/27. CODEN: PCBLBG; ISSN:1553-7358. (Public Library of Science)

Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomol. structural data sets have been hindered by the geometric and biol. complexity. To address this problem we introduce the element-specific persistent homol. (ESPH) method. ESPH represents 3D complex geometry by one-dimensional (1D) topol. invariants and retains important biol. information via a multichannel image-like representation. This representation reveals hidden structure-function relationships in biomols. We further integrate ESPH and deep convolutional neural networks to construct a multichannel topol. neural network (TopologyNet) for the predictions of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the deep learning limitations from small and noisy training sets, we propose a multi-task multichannel topol. convolutional neural network (MM-TCNN). We demonstrate that TopologyNet outperforms the latest methods in the prediction of protein-ligand binding affinities, mutation induced globular protein folding free energy changes, and mutation induced membrane protein folding free energy changes.

169
Laimer, J.; Hofer, H.; Fritz, M.; Wegenkittl, S.; Lackner, P. MAESTRO - Multi Agent Stability Prediction upon Point Mutations. BMC Bioinf. 2015, 16, 116, DOI: 10.1186/s12859-015-0548-6

MAESTRO--multi agent stability prediction upon point mutations

Laimer, Josef; Hofer, Heidi; Fritz, Marko; Wegenkittl, Stefan; Lackner, Peter

BMC Bioinformatics (2015), 16, 116.

BACKGROUND: Point mutations can have a strong impact on protein stability. A change in stability may subsequently lead to dysfunction and finally cause diseases. Moreover, protein engineering approaches aim to deliberately modify protein properties, where stability is a major constraint. In order to support basic research and protein design tasks, several computational tools for predicting the change in stability upon mutations have been developed. Comparative studies have shown the usefulness but also limitations of such programs. RESULTS: We aim to contribute a novel method for predicting changes in stability upon point mutation in proteins called MAESTRO. MAESTRO is structure based and distinguishes itself from similar approaches in the following points: (i) MAESTRO implements a multi-agent machine learning system. (ii) It also provides predicted free energy change (ΔΔG) values and a corresponding prediction confidence estimation. (iii) It provides high throughput scanning for multi-point mutations where sites and types of mutation can be comprehensively controlled. (iv) Finally, the software provides a specific mode for the prediction of stabilizing disulfide bonds. The predictive power of MAESTRO for single point mutations and stabilizing disulfide bonds is comparable to similar methods. CONCLUSIONS: MAESTRO is a versatile tool in the field of stability change prediction upon point mutations. Executables for the Linux and Windows operating systems are freely available to non-commercial users from http://biwww.che.sbg.ac.at/MAESTRO.

170
Khan, S.; Vihinen, M. Performance of Protein Stability Predictors. Hum. Mutat. 2010, 31, 675– 684, DOI: 10.1002/humu.21242

Performance of protein stability predictors

Khan, Sofia; Vihinen, Mauno

Human Mutation (2010), 31 (6), 675-684. CODEN: HUMUE3; ISSN:1059-7794. (Wiley-Liss, Inc.)

Stability is a fundamental property affecting function, activity, and regulation of biomols. Stability changes are often found for mutated proteins involved in diseases. Stability predictors computationally predict protein-stability changes caused by mutations. We performed a systematic anal. of 11 online stability predictors' performances. These predictors are CUPSAT, Dmutant, FoldX, I-Mutant2.0, two versions of I-Mutant3.0 (sequence and structure versions), MultiMutate, MUpro, SCide, Scpred, and SRide. As input, 1,784 single mutations found in 80 proteins were used, and these mutations did not include those used for training. The programs' performances were also assessed according to where the mutations were found in the proteins, i.e., in secondary structures and on the surface or in the core of a protein, and according to protein structure type. The extents to which the mutations altered the occupied vols. at the residue sites and the charge interactions were also characterized. The predictions of all programs were in line with the exptl. data. I-Mutant3.0 (utilizing structural information), Dmutant, and FoldX were the most reliable predictors. The stability-center predictors performed with similar accuracy. However, at best, the predictions were only moderately accurate (∼60%) and significantly better tools would be needed for routine anal. of mutation effects.

171
Usmanova, D. R.; Bogatyreva, N. S.; Ariño Bernad, J.; Eremina, A. A.; Gorshkova, A. A.; Kanevskiy, G. M.; Lonishin, L. R.; Meister, A. V.; Yakupova, A. G.; Kondrashov, F. A.; Ivankov, D. N. Self-Consistency Test Reveals Systematic Bias in Programs for Prediction Change of Stability upon Mutation. Bioinformatics 2018, 34, 3653– 3658, DOI: 10.1093/bioinformatics/bty340


171
Self-consistency test reveals systematic bias in programs for prediction change of stability upon mutation

Usmanova, Dinara R.; Bogatyreva, Natalya S.; Bernad, Joan Arino; Eremina, Aleksandra A.; Gorshkova, Anastasiya A.; Kanevskiy, German M.; Lonishin, Lyubov R.; Meister, Alexander V.; Yakupova, Alisa G.; Kondrashov, Fyodor A.; Ivankov, Dmitry N.

Bioinformatics (2018), 34 (21), 3653-3658CODEN: BOINFP; ISSN:1367-4811. (Oxford University Press)

Motivation: Computational prediction of the effect of mutations on protein stability is used by researchers in many fields. The utility of the prediction methods is affected by their accuracy and bias. Bias, a systematic shift of the predicted change of stability, has been noted as an issue for several methods, but has not been investigated systematically. Presence of the bias may lead to misleading results esp. when exploring the effects of combination of different mutations. Results: Here we use a protocol to measure the bias as a function of the no. of introduced mutations. It is based on a self-consistency test of the reciprocity of the effect of a mutation. An advantage of the used approach is that it relies solely on crystal structures without exptl. measured stability values. We applied the protocol to four popular algorithms predicting change of protein stability upon mutation, FoldX, Eris, Rosetta and I-Mutant, and found an inherent bias. For one program, FoldX, we manage to substantially reduce the bias using addnl. relaxation by Modeller. Authors using algorithms for predicting effects of mutations should be aware of the bias described here.

172
Montanucci, L.; Martelli, P. L.; Ben-Tal, N.; Fariselli, P. A Natural Upper Bound to the Accuracy of Predicting Protein Stability Changes upon Mutations. 2018, arXiv:1809.10389 [q-bio.BM]. arXiv.org e-Print archive. https://arxiv.org/abs/1809.10389.
173
Rice, P.; Longden, I.; Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 2000, 16, 276– 277, DOI: 10.1016/S0168-9525(00)02024-2


173
EMBOSS: the european molecular biology open software suite

Rice, Peter; Longden, Ian; Bleasby, Alan

Trends in Genetics (2000), 16 (6), 276-277CODEN: TRGEE2; ISSN:0168-9525. (Elsevier Science Ltd.)

174
Lu, G.; Moriyama, E. N. Vector NTI, a Balanced All-in-One Sequence Analysis Suite. Briefings Bioinf. 2004, 5, 378– 388, DOI: 10.1093/bib/5.4.378


174
Vector NTI, a balanced all-in-one sequence analysis suite

Lu, Guoqing; Moriyama, Etsuko N.

Briefings in Bioinformatics (2004), 5 (4), 378-388CODEN: BBIMFX; ISSN:1467-5463. (Henry Stewart Publications)

A review. Vector NTI is a well-balanced desktop application integrated for mol. sequence anal. and biol. data management. It has a centralized database and five application modules: Vector NTI, AlignX, BioAnnotator, ContigExpress and GenomBench. The features and functions available in this software are examd. These include database management, primer design, virtual cloning, alignments, sequence assembly, 3D mol. viewer and Internet tools. Some problems encountered when using this software are also discussed. Vector NTI is a tool that can save time and enhance anal. but it requires some learning on the user's part and there are some issues that need to be addressed by the developer.

175
Bendl, J.; Stourac, J.; Sebestova, E.; Vavra, O.; Musil, M.; Brezovsky, J.; Damborsky, J. HotSpot Wizard 2.0: Automated Design of Site-Specific Mutations and Smart Libraries in Protein Engineering. Nucleic Acids Res. 2016, 44, W479– 487, DOI: 10.1093/nar/gkw416


175
HotSpot Wizard 2.0: automated design of site-specific mutations and smart libraries in protein engineering

Bendl, Jaroslav; Stourac, Jan; Sebestova, Eva; Vavra, Ondrej; Musil, Milos; Brezovsky, Jan; Damborsky, Jiri

Nucleic Acids Research (2016), 44 (W1), W479-W487CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)

HotSpot Wizard 2.0 is a web server for automated identification of hot spots and design of smart libraries for engineering proteins' stability, catalytic activity, substrate specificity and enantioselectivity. The server integrates sequence, structural and evolutionary information obtained from 3 databases and 20 computational tools. Users are guided through the processes of selecting hot spots using four different protein engineering strategies and optimizing the resulting library's size by narrowing down a set of substitutions at individual randomized positions. The only required input is a query protein structure. The results of the calcns. are mapped onto the protein's structure and visualized with a JSmol applet. HotSpot Wizard lists annotated residues suitable for mutagenesis and can automatically design appropriate codons for each implemented strategy. Overall, HotSpot Wizard provides comprehensive annotations of protein structures and assists protein engineers with the rational design of site-specific mutations and focused libraries.

176
Stamatakis, A. RAxML Version 8: A Tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies. Bioinformatics 2014, 30, 1312– 1313, DOI: 10.1093/bioinformatics/btu033


176
RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies

Stamatakis, Alexandros

Bioinformatics (2014), 30 (9), 1312-1313CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)

Motivation: Phylogenies are increasingly used in all fields of medical and biol. research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under max. likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community. Results: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees. In addn., an up-to-date 50-page user manual covering all new RAxML options is available. Availability and implementation: The code is available under GNU GPL at https://github.com/stamatak/standard-RAxML. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.

177
Ashkenazy, H.; Penn, O.; Doron-Faigenboim, A.; Cohen, O.; Cannarozzi, G.; Zomer, O.; Pupko, T. FastML: A Web Server for Probabilistic Reconstruction of Ancestral Sequences. Nucleic Acids Res. 2012, 40, W580– 584, DOI: 10.1093/nar/gks498


177
FastML: a web server for probabilistic reconstruction of ancestral sequences

Ashkenazy, Haim; Penn, Osnat; Doron-Faigenboim, Adi; Cohen, Ofir; Cannarozzi, Gina; Zomer, Oren; Pupko, Tal

Nucleic Acids Research (2012), 40 (W1), W580-W584CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)

Ancestral sequence reconstruction is essential to a variety of evolutionary studies. Here, we present the FastML web server, a user-friendly tool for the reconstruction of ancestral sequences. FastML implements various novel features that differentiate it from existing tools: (i) FastML uses an indel-coding method, in which each gap, possibly spanning multiple sites, is coded as binary data. FastML then reconstructs ancestral indel states assuming a continuous time Markov process. FastML provides the most likely ancestral sequences, integrating both indels and characters; (ii) FastML accounts for uncertainty in ancestral states: it provides not only the posterior probabilities for each character and indel at each sequence position, but also a sample of ancestral sequences from this posterior distribution, and a list of the k-most likely ancestral sequences; (iii) FastML implements a large array of evolutionary models, which makes it generic and applicable for nucleotide, protein and codon sequences; and (iv) a graphical representation of the results is provided, including, for example, a graphical logo of the inferred ancestral sequences. The utility of FastML is demonstrated by reconstructing ancestral sequences of the Env protein from various HIV-1 subtypes. FastML is freely available for all academic users and is available online at http://fastml.tau.ac.il/.

178
Diallo, A. B.; Makarenkov, V.; Blanchette, M. Ancestors 1.0: A Web Server for Ancestral Sequence Reconstruction. Bioinformatics 2010, 26, 130– 131, DOI: 10.1093/bioinformatics/btp600


178
Ancestors 1.0: a web server for ancestral sequence reconstruction

Diallo, Abdoulaye Banire; Makarenkov, Vladimir; Blanchette, Mathieu

Bioinformatics (2010), 26 (1), 130-131CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)

Summary: The computational inference of ancestral genomes consists of five difficult steps: identifying syntenic regions, inferring ancestral arrangement of syntenic regions, aligning multiple sequences, reconstructing the insertion and deletion history and finally inferring substitutions. Each of these steps has received a lot of attention in the past years. However, there currently exists no framework that integrates all of the different steps in an easy workflow. Here, we introduce Ancestors 1.0, a web server allowing one to easily and quickly perform the last three steps of the ancestral genome reconstruction procedure. It implements several alignment algorithms, an indel max. likelihood solver and a context-dependent max. likelihood substitution inference algorithm. The results presented by the server include the posterior probabilities for the last two steps of the ancestral genome reconstruction and the expected error rate of each ancestral base prediction.

179
Westesson, O.; Barquist, L.; Holmes, I. HandAlign: Bayesian Multiple Sequence Alignment, Phylogeny and Ancestral Reconstruction. Bioinformatics 2012, 28, 1170– 1171, DOI: 10.1093/bioinformatics/bts058


179
HandAlign: Bayesian multiple sequence alignment, phylogeny and ancestral reconstruction

Westesson, Oscar; Barquist, Lars; Holmes, Ian

Bioinformatics (2012), 28 (8), 1170-1171CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)

Summary: We describe HandAlign, a software package for Bayesian reconstruction of phylogenetic history. The underlying model of sequence evolution describes indels and substitutions. Alignments, trees and model parameters are all treated as jointly dependent random variables and sampled via Metropolis-Hastings Markov chain Monte Carlo (MCMC), enabling systematic statistical parameter inference and hypothesis testing. HandAlign implements several different MCMC proposal kernels, allows sampling from arbitrary target distributions via Hastings ratios, and uses std. file formats for trees, alignments and models. Availability and Implementation: Installation and usage instructions are at http://biowiki.org/HandAlign Contact: [email protected] Supplementary information: Supplementary material is available at Bioinformatics online.

180
Ronquist, F.; Teslenko, M.; van der Mark, P.; Ayres, D. L.; Darling, A.; Höhna, S.; Larget, B.; Liu, L.; Suchard, M. A.; Huelsenbeck, J. P. MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice across a Large Model Space. Syst. Biol. 2012, 61, 539– 542, DOI: 10.1093/sysbio/sys029


180
MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space

Ronquist, Fredrik; Teslenko, Maxim; van der Mark, Paul; Ayres, Daniel L.; Darling, Aaron; Höhna, Sebastian; Larget, Bret; Liu, Liang; Suchard, Marc A.; Huelsenbeck, John P.

Systematic Biology (2012), 61 (3), 539-542.

Since its introduction in 2001, MrBayes has grown in popularity as a software package for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) methods. With this note, we announce the release of version 3.2, a major upgrade to the latest official release presented in 2003. The new version provides convergence diagnostics and allows multiple analyses to be run in parallel with convergence progress monitored on the fly. The introduction of new proposals and automatic optimization of tuning parameters has improved convergence for many problems. The new version also sports significantly faster likelihood calculations through streaming single-instruction-multiple-data extensions (SSE) and support of the BEAGLE library, allowing likelihood calculations to be delegated to graphics processing units (GPUs) on compatible hardware. Speedup factors range from around 2 with SSE code to more than 50 with BEAGLE for codon problems. Checkpointing across all models allows long runs to be completed even when an analysis is prematurely terminated. New models include relaxed clocks, dating, model averaging across time-reversible substitution models, and support for hard, negative, and partial (backbone) tree constraints. Inference of species trees from gene trees is supported by full incorporation of the Bayesian estimation of species trees (BEST) algorithms. Marginal model likelihoods for Bayes factor tests can be estimated accurately across the entire model space using the stepping stone method. The new version provides more output options than previously, including samples of ancestral states, site rates, site d(N)/d(S) ratios, branch rates, and node dates. A wide range of statistics on tree parameters can also be output for visualization in FigTree and compatible software.

181
Finn, R. D.; Clements, J.; Eddy, S. R. HMMER Web Server: Interactive Sequence Similarity Searching. Nucleic Acids Res. 2011, 39, W29– 37, DOI: 10.1093/nar/gkr367


181
HMMER web server: interactive sequence similarity searching

Finn, Robert D.; Clements, Jody; Eddy, Sean R.

Nucleic Acids Research (2011), 39 (Web Server), W29-W37CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)

HMMER is a software suite for protein sequence similarity searches using probabilistic methods. Previously, HMMER has mainly been available only as a computationally intensive UNIX command-line tool, restricting its use. Recent advances in the software, HMMER3, have resulted in a 100-fold speed gain relative to previous versions. It is now feasible to make efficient profile hidden Markov model (profile HMM) searches via the web. A HMMER web server (http://hmmer.janelia.org) has been designed and implemented such that most protein database searches return within a few seconds. Methods are available for searching either a single protein sequence, multiple protein sequence alignment or profile HMM against a target sequence database, and for searching a protein sequence against Pfam. The web server is designed to cater to a range of different user expertise and accepts batch uploading of multiple queries at once. All search methods are also available as RESTful web services, thereby allowing them to be readily integrated as remotely executed tasks in locally scripted work-flows. We have focused on minimizing search times and the ability to rapidly display tabular results, regardless of the no. of matches found, developing graphical summaries of the search results to provide quick, intuitive appraisement of them.

182
Altschul, S. F.; Gertz, E. M.; Agarwala, R.; Schäffer, A. A.; Yu, Y.-K. PSI-BLAST Pseudocounts and the Minimum Description Length Principle. Nucleic Acids Res. 2009, 37, 815– 824, DOI: 10.1093/nar/gkn981


182
PSI-BLAST pseudocounts and the minimum description length principle

Altschul, Stephen F.; Gertz, E. Michael; Agarwala, Richa; Schaeffer, Alejandro A.; Yu, Yi-Kuo

Nucleic Acids Research (2009), 37 (3), 815-824CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)

Position specific score matrixes (PSSMs) are derived from multiple sequence alignments to aid in the recognition of distant protein sequence relationships. The PSI-BLAST protein database search program derives the column scores of its PSSMs with the aid of pseudocounts, added to the obsd. amino acid counts in a multiple alignment column. In the absence of theory, the no. of pseudocounts used has been a completely empirical parameter. This article argues that the min. description length principle can motivate the choice of this parameter. Specifically, for realistic alignments, the principle supports the practice of using a no. of pseudocounts essentially independent of alignment size. However, it also implies that more highly conserved columns should use fewer pseudocounts, increasing the inter-column contrast of the implied PSSMs. A new method for calcg. pseudocounts that significantly improves PSI-BLAST's retrieval accuracy is now employed by default.

183
Whitehead, T. A.; Chevalier, A.; Song, Y.; Dreyfus, C.; Fleishman, S. J.; De Mattos, C.; Myers, C. A.; Kamisetty, H.; Blair, P.; Wilson, I. A.; Baker, D. Optimization of Affinity, Specificity and Function of Designed Influenza Inhibitors Using Deep Sequencing. Nat. Biotechnol. 2012, 30, 543– 548, DOI: 10.1038/nbt.2214


183
Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing

Whitehead, Timothy A.; Chevalier, Aaron; Song, Yifan; Dreyfus, Cyrille; Fleishman, Sarel J.; De Mattos, Cecilia; Myers, Chris A.; Kamisetty, Hetunandan; Blair, Patrick; Wilson, Ian A.; Baker, David

Nature Biotechnology (2012), 30 (6), 543-548CODEN: NABIF9; ISSN:1087-0156. (Nature Publishing Group)

We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followed by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.

184
Shimizu, Y.; Inoue, A.; Tomari, Y.; Suzuki, T.; Yokogawa, T.; Nishikawa, K.; Ueda, T. Cell-Free Translation Reconstituted with Purified Components. Nat. Biotechnol. 2001, 19, 751– 755, DOI: 10.1038/90802


184
Cell-free translation reconstituted with purified components

Shimizu, Yoshihiro; Inoue, Akio; Tomari, Yukihide; Suzuki, Tsutomu; Yokogawa, Takashi; Nishikawa, Kazuya; Ueda, Takuya

Nature Biotechnology (2001), 19 (8), 751-755CODEN: NABIF9; ISSN:1087-0156. (Nature America Inc.)

We have developed a protein-synthesizing system reconstituted from recombinant tagged protein factors purified to homogeneity. The system was able to produce protein at a rate of about 160 μg/mL/h in a batch mode without the need for any supplementary app. The protein products were easily purified within 1 h using affinity chromatog. to remove the tagged protein factors. Moreover, omission of a release factor allowed efficient incorporation of an unnatural amino acid using suppressor tRNA.

185
Niwa, T.; Kanamori, T.; Ueda, T.; Taguchi, H. Global Analysis of Chaperone Effects Using a Reconstituted Cell-Free Translation System. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 8937– 8942, DOI: 10.1073/pnas.1201380109


185
Global analysis of chaperone effects using a reconstituted cell-free translation system

Niwa, Tatsuya; Kanamori, Takashi; Ueda, Takuya; Taguchi, Hideki

Proceedings of the National Academy of Sciences of the United States of America (2012), 109 (23), 8937-8942, S8937/1-S8937/8CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)

Protein folding is often hampered by protein aggregation, which can be prevented by a variety of chaperones in the cell. A dataset that evaluates which chaperones are effective for aggregation-prone proteins would provide an invaluable resource not only for understanding the roles of chaperones, but also for broader applications in protein science and engineering. Therefore, we comprehensively evaluated the effects of the major Escherichia coli chaperones, trigger factor, DnaK/DnaJ/GrpE, and GroEL/GroES, on ∼800 aggregation-prone cytosolic E. coli proteins, using a reconstituted chaperone-free translation system. Statistical analyses revealed the robustness and the intriguing properties of chaperones. The DnaK and GroEL systems drastically increased the solubilities of hundreds of proteins with weak biases, whereas trigger factor had only a marginal effect on soly. The combined addn. of the chaperones was effective for a subset of proteins that were not rescued by any single chaperone system, supporting the synergistic effect of these chaperones. The resource, which is accessible via a public database, can be used to investigate the properties of proteins of interest in terms of their solubilities and chaperone effects.

186
Berman, H. M.; Gabanyi, M. J.; Kouranov, A.; Micallef, D. I.; Westbrook, J. Protein Structure Initiative - TargetTrack 2000–2017 - All Data Files. DOI: 10.5281/zenodo.821654.

187
Price, W. N.; Handelman, S. K.; Everett, J. K.; Tong, S. N.; Bracic, A.; Luff, J. D.; Naumov, V.; Acton, T.; Manor, P.; Xiao, R.; Rost, B.; Montelione, G. T.; Hunt, J. F. Large-Scale Experimental Studies Show Unexpected Amino Acid Effects on Protein Expression and Solubility in Vivo in E. coli. Microb. Inf. Exp. 2011, 1, 6, DOI: 10.1186/2042-5783-1-6


187
Large-scale experimental studies show unexpected amino acid effects on protein expression and solubility in vivo in E. coli

Price, W. Nicholson, II; Handelman, Samuel K.; Everett, John K.; Tong, Saichiu N.; Bracic, Ana; Luff, Jon D.; Naumov, Victor; Acton, Thomas; Manor, Philip; Xiao, Rong; Rost, Burkhard; Montelione, Gaetano T.; Hunt, John F.

Microbial Informatics and Experimentation (2011), 1 (), 6CODEN: MIEIBV; ISSN:2042-5783. (BioMed Central Ltd.)

The biochem. and phys. factors controlling protein expression level and soly. in vivo remain incompletely characterized. To gain insight into the primary sequence features influencing these outcomes, we performed statistical analyses of results from the high-throughput protein-prodn. pipeline of the Northeast Structural Genomics Consortium. Proteins expressed in E. coli and consistently purified were scored independently for expression and soly. levels. These parameters nonetheless show a very strong pos. correlation. We used logistic regressions to det. whether they are systematically influenced by fractional amino acid compn. or several bulk sequence parameters including hydrophobicity, sidechain entropy, electrostatic charge, and predicted backbone disorder. Decreasing hydrophobicity correlates with higher expression and soly. levels, but this correlation apparently derives solely from the beneficial effect of three charged amino acids, at least for bacterial proteins. In fact, the three most hydrophobic residues showed very different correlations with soly. level. Leu showed the strongest neg. correlation among amino acids, while Ile showed a slightly pos. correlation in most data segments. Several other amino acids also had unexpected effects. Notably, Arg correlated with decreased expression and, most surprisingly, soly. of bacterial proteins, an effect only partially attributable to rare codons. However, rare codons did significantly reduce expression despite use of a codon-enhanced strain. Addnl. analyses suggest that pos. but not neg. charged amino acids may reduce translation efficiency in E. coli irresp. of codon usage. While some obsd. effects may reflect indirect evolutionary correlations, others may reflect basic physicochem. phenomena. We used these results to construct and validate predictors of expression and soly. levels and overall protein usability, and we propose new strategies to be explored for engineering improved protein expression and soly.

188
Hirose, S.; Kawamura, Y.; Yokota, K.; Kuroita, T.; Natsume, T.; Komiya, K.; Tsutsumi, T.; Suwa, Y.; Isogai, T.; Goshima, N.; Noguchi, T. Statistical Analysis of Features Associated with Protein Expression/Solubility in an in Vivo Escherichia coli Expression System and a Wheat Germ Cell-Free Expression System. J. Biochem. 2011, 150, 73– 81, DOI: 10.1093/jb/mvr042


188
Statistical analysis of features associated with protein expression/solubility in an in vivo Escherichia coli expression system and a wheat germ cell-free expression system

Hirose, Shuichi; Kawamura, Yoshifumi; Yokota, Kiyonobu; Kuroita, Toshihiro; Natsume, Tohru; Komiya, Kazuo; Tsutsumi, Takeshi; Suwa, Yorimasa; Isogai, Takao; Goshima, Naoki; Noguchi, Tamotsu

Journal of Biochemistry (2011), 150 (1), 73-81CODEN: JOBIAO; ISSN:0021-924X. (Japanese Biochemical Society)

Recombinant protein technol. is an important tool in many industrial and pharmacol. applications. Although the success rate of obtaining sol. proteins is relatively low, knowledge of protein expression/soly. under 'std.' conditions may increase the efficiency and reduce the cost of proteomics studies. In this study, we conducted a genome-scale expt. to assess the overexpression and the soly. of human full-length cDNA in an in vivo Escherichia coli expression system and a wheat germ cell-free expression system. We evaluated the influences of sequence and structural features on protein expression/soly. in each system and estd. a minimal set of features assocd. with them. A comparison of the feature sets related to protein expression/soly. in the in vivo Escherichia coli expression system revealed that the structural information was strongly assocd. with protein expression, rather than protein soly. Moreover, a significant difference was found in the no. of features assocd. with protein soly. in the two expression systems.

189
Pawlicki, S.; Le Béchec, A.; Delamarche, C. AMYPdb: A Database Dedicated to Amyloid Precursor Proteins. BMC Bioinf. 2008, 9, 273, DOI: 10.1186/1471-2105-9-273


189
AMYPdb: a database dedicated to amyloid precursor proteins

Pawlicki, Sandrine; Le Béchec, Antony; Delamarche, Christian

BMC Bioinformatics (2008), 9, 273.

BACKGROUND: Misfolding and aggregation of proteins into ordered fibrillar structures is associated with a number of severe pathologies, including Alzheimer's disease, prion diseases, and type II diabetes. The rapid accumulation of knowledge about the sequences and structures of these proteins allows the use of in silico methods to investigate the molecular mechanisms of their abnormal conformational changes and assembly. However, such an approach requires the collection of accurate data, which are inconveniently dispersed among several generalist databases. RESULTS: We therefore created a free online knowledge database (AMYPdb) dedicated to amyloid precursor proteins and we have performed large scale sequence analysis of the included data. Currently, AMYPdb integrates data on 31 families, including 1,705 proteins from nearly 600 organisms. It displays links to more than 2,300 bibliographic references and 1,200 3D-structures. A Wiki system is available to insert data into the database, providing a sharing and collaboration environment. We generated and analyzed 3,621 amino acid sequence patterns, reporting highly specific patterns for each amyloid family, along with patterns likely to be involved in protein misfolding and aggregation. CONCLUSION: AMYPdb is a comprehensive online database aiming at the centralization of bioinformatic data regarding all amyloid proteins and their precursors. Our sequence pattern discovery and analysis approach unveiled protein regions of significant interest. AMYPdb is freely accessible.

190
Thompson, M. J.; Sievers, S. A.; Karanicolas, J.; Ivanova, M. I.; Baker, D.; Eisenberg, D. The 3D Profile Method for Identifying Fibril-Forming Segments of Proteins. Proc. Natl. Acad. Sci. U. S. A. 2006, 103, 4074– 4078, DOI: 10.1073/pnas.0511295103


190
The 3D profile method for identifying fibril-forming segments of proteins

Thompson, Michael J.; Sievers, Stuart A.; Karanicolas, John; Ivanova, Magdalena I.; Baker, David; Eisenberg, David

Proceedings of the National Academy of Sciences of the United States of America (2006), 103 (11), 4074-4078CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)

Based on the crystal structure of the cross-β spine formed by the peptide NNQQNY, we have developed a computational approach for identifying those segments of amyloidogenic proteins that themselves can form amyloid-like fibrils. The approach builds on expts. showing that hexapeptides are sufficient for forming amyloid-like fibrils. Each six-residue peptide of a protein of interest is mapped onto an ensemble of templates, or 3D profile, generated from the crystal structure of the peptide NNQQNY by small displacements of one of the two intermeshed β-sheets relative to the other. The energy of each mapping of a sequence to the profile is evaluated by using ROSETTADESIGN, and the lowest energy match for a given peptide to the template library is taken as the putative prediction. If the energy of the putative prediction is lower than a threshold value, a prediction of fibril formation is made. This method can reach an accuracy of ≈80% with a P value of ≈10⁻¹² when a conservative energy threshold is used to sep. peptides that form fibrils from those that do not. We see enrichment for pos. predictions in a set of fibril-forming segments of amyloid proteins, and we illustrate the method with applications to proteins of interest in amyloid research.

191
Beerten, J.; Van Durme, J.; Gallardo, R.; Capriotti, E.; Serpell, L.; Rousseau, F.; Schymkowitz, J. WALTZ-DB: A Benchmark Database of Amyloidogenic Hexapeptides. Bioinformatics 2015, 31, 1698– 1700, DOI: 10.1093/bioinformatics/btv027

WALTZ-DB: a benchmark database of amyloidogenic hexapeptides

Beerten, Jacinte; Van Durme, Joost; Gallardo, Rodrigo; Capriotti, Emidio; Serpell, Louise; Rousseau, Frederic; Schymkowitz, Joost

Bioinformatics (2015), 31 (10), 1698-1700CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)

Summary: Accurate prediction of amyloid-forming amino acid sequences remains an important challenge. We here present an online database that provides open access to the largest set of exptl. characterized amyloid forming hexapeptides. To this end, we expanded our previous set of 280 hexapeptides used to develop the Waltz algorithm with 89 peptides from literature review and by systematic exptl. characterization of the aggregation of 720 hexapeptides by transmission electron microscopy, dye binding and Fourier transform IR spectroscopy. This brings the total no. of exptl. characterized hexapeptides in the WALTZ-DB database to 1089, of which 244 are annotated as pos. for amyloid formation.

192
Wozniak, P. P.; Kotulska, M. AmyLoad: Website Dedicated to Amyloidogenic Protein Fragments. Bioinformatics 2015, 31, 3395– 3397, DOI: 10.1093/bioinformatics/btv375

AmyLoad: website dedicated to amyloidogenic protein fragments

Wozniak, Pawel P.; Kotulska, Malgorzata

Bioinformatics (2015), 31 (20), 3395-3397CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)

Analyses of amyloidogenic sequence fragments are essential in studies of neurodegenerative diseases. However, there is no one internet dataset that collects all the sequences that have been investigated for their amyloidogenicity. Therefore, we have created the AmyLoad website which collects the amyloidogenic sequences from all major sources. The website allows for filtration of the fragments and provides detailed information about each of them. Registered users can both personalize their work with the website and submit their own sequences into the database. To maintain database reliability, submitted sequences are reviewed before making them available to the public. Finally, we re-implemented several amyloidogenic sequence predictors, thus the AmyLoad website can be used as a sequence anal. tool. We encourage researchers working on amyloid proteins to contribute to our service.

193
Sastry, A.; Monk, J.; Tegel, H.; Uhlen, M.; Palsson, B. O.; Rockberg, J.; Brunk, E. Machine Learning in Computational Biology to Accelerate High-Throughput Protein Expression. Bioinformatics 2017, 33, 2487– 2495, DOI: 10.1093/bioinformatics/btx207

Machine learning in computational biology to accelerate high-throughput protein expression

Sastry, Anand; Monk, Jonathan; Tegel, Hanna; Uhlen, Mathias; Palsson, Bernhard O.; Rockberg, Johan; Brunk, Elizabeth

Bioinformatics (Oxford, England) (2017), 33 (16), 2487-2495.

Motivation: The Human Protein Atlas (HPA) enables the simultaneous characterization of thousands of proteins across various tissues to pinpoint their spatial location in the human body. This has been achieved through transcriptomics and high-throughput immunohistochemistry-based approaches, where over 40 000 unique human protein fragments have been expressed in E. coli. These datasets enable quantitative tracking of entire cellular proteomes and present new avenues for understanding molecular-level properties influencing expression and solubility. Results: Combining computational biology and machine learning identifies protein properties that hinder the HPA high-throughput antibody production pipeline. We predict protein expression and solubility with accuracies of 70% and 80%, respectively, based on a subset of key properties (aromaticity, hydropathy and isoelectric point). We guide the selection of protein fragments based on these characteristics to optimize high-throughput experimentation. Availability and implementation: We present the machine learning workflow as a series of IPython notebooks hosted on GitHub (https://github.com/SBRG/Protein_ML). The workflow can be used as a template for analysis of further expression and solubility datasets. Supplementary information: Supplementary data are available at Bioinformatics online.

194
Thangakani, A. M.; Nagarajan, R.; Kumar, S.; Sakthivel, R.; Velmurugan, D.; Gromiha, M. M. CPAD, Curated Protein Aggregation Database: A Repository of Manually Curated Experimental Data on Protein and Peptide Aggregation. PLoS One 2016, 11, e0152949, DOI: 10.1371/journal.pone.0152949

CPAD, curated protein aggregation database: a repository of manually curated experimental data on protein and peptide aggregation

Thangakani, A. Mary; Nagarajan, R.; Kumar, Sandeep; Sakthivel, R.; Velmurugan, D.; Gromiha, M. Michael

PLoS One (2016), 11 (4), e0152949/1-e0152949/7CODEN: POLNCL; ISSN:1932-6203. (Public Library of Science)

Accurate distinction between peptide sequences that can form amyloid-fibrils or amorphous β-aggregates, identification of potential aggregation prone regions in proteins, and prediction of change in aggregation rate of a protein upon mutation(s) are crit. to research on protein misfolding diseases, such as Alzheimer's and Parkinson's, as well as biotechnol. prodn. of protein based therapeutics. We have developed a Curated Protein Aggregation Database (CPAD), which has collected results from exptl. studies performed by scientific community aimed at understanding protein/peptide aggregation. CPAD contains more than 2300 exptl. obsd. aggregation rates upon mutations in known amyloidogenic proteins. Each entry includes numerical values for the following parameters: change in rate of aggregation as measured by fluorescence intensity or turbidity, name and source of the protein, Uniprot and Protein Data Bank codes, single point as well as multiple mutations, and literature citation. The data in CPAD has been supplemented with five different types of addnl. information: (i) Amyloid fibril forming hexa-peptides, (ii) Amorphous β-aggregating hexa-peptides, (iii) Amyloid fibril forming peptides of different lengths, (iv) Amyloid fibril forming hexa-peptides whose crystal structures are available in the Protein Data Bank (PDB) and (v) Exptl. validated aggregation prone regions found in amyloidogenic proteins. Furthermore, CPAD is linked to other related databases and resources, such as Uniprot, Protein Data Bank, PUBMED, GAP, TANGO, WALTZ etc. We have set up a web interface with different search and display options so that users have the ability to get the data in multiple ways.

195
Tian, Y.; Deutsch, C.; Krishnamoorthy, B. Scoring Function To Predict Solubility Mutagenesis. Algorithms Mol. Biol. 2010, 5, 33, DOI: 10.1186/1748-7188-5-33

Scoring function to predict solubility mutagenesis

Tian, Ye; Deutsch, Christopher; Krishnamoorthy, Bala

Algorithms for Molecular Biology (2010), 5, 33.

BACKGROUND: Mutagenesis is commonly used to engineer proteins with desirable properties not present in the wild type (WT) protein, such as increased or decreased stability, reactivity, or solubility. Experimentalists often have to choose a small subset of mutations from a large number of candidates to obtain the desired change, and computational techniques are invaluable to make the choices. While several such methods have been proposed to predict stability and reactivity mutagenesis, solubility has not received much attention. RESULTS: We use concepts from computational geometry to define a three body scoring function that predicts the change in protein solubility due to mutations. The scoring function captures both sequence and structure information. By exploring the literature, we have assembled a substantial database of 137 single- and multiple-point solubility mutations. Our database is the largest such collection with structural information known so far. We optimize the scoring function using linear programming (LP) methods to derive its weights based on training. Starting with default values of 1, we find weights in the range [0,2] so that predictions of increase or decrease in solubility are optimized. We compare the LP method to the standard machine learning techniques of support vector machines (SVM) and the Lasso. Using statistics for leave-one-out (LOO), 10-fold, and 3-fold cross validations (CV) for training and prediction, we demonstrate that the LP method performs the best overall. For the LOOCV, the LP method has an overall accuracy of 81%. AVAILABILITY: Executables of programs, tables of weights, and datasets of mutants are available from the following web page: http://www.wsu.edu/~kbala/OptSolMut.html.

196
Wilkinson, D. L.; Harrison, R. G. Predicting the Solubility of Recombinant Proteins in Escherichia coli. Nat. Biotechnol. 1991, 9, 443– 448, DOI: 10.1038/nbt0591-443

Predicting the solubility of recombinant proteins in Escherichia coli

Wilkinson, David L.; Harrison, Roger G.

Bio/Technology (1991), 9 (5), 443-8CODEN: BTCHDA; ISSN:0733-222X.

The cause of inclusion body formation in E. coli grown at 37° was studied using statistical anal. of the compn. of 81 proteins that do and do not form inclusion bodies. Six compn. derived parameters were used. In declining order of their correlation with inclusion body formation, the parameters are charge av., turn forming residue fraction, cysteine fraction, proline fraction, hydrophilicity, and total no. of residues. The correlation with inclusion body formation is strong for the 1st 2 parameters but weak for the last 4. This correlation can be used to predict the probability that a protein will form inclusion bodies using only the protein's amino acid compn. as the basis for the prediction.

197
Davis, G. D.; Elisee, C.; Newham, D. M.; Harrison, R. G. New Fusion Protein Systems Designed to Give Soluble Expression in Escherichia coli. Biotechnol. Bioeng. 1999, 65, 382– 388, DOI: 10.1002/(SICI)1097-0290(19991120)65:4<382::AID-BIT2>3.0.CO;2-I

New fusion protein systems designed to give soluble expression in Escherichia coli

Davis, Gregory D.; Elisee, Claude; Newham, Denton M.; Harrison, Roger G.

Biotechnology and Bioengineering (1999), 65 (4), 382-388CODEN: BIBIAU; ISSN:0006-3592. (John Wiley & Sons, Inc.)

Three native E. coli proteins-NusA, GrpE, and bacterioferritin (BFR)-were studied in fusion proteins expressed in E. coli for their ability to confer soly. on a target insol. protein at the C-terminus of the fusion protein. These three proteins were chosen based on their favorable cytoplasmic soly. characteristics as predicted by a statistical soly. model for recombinant proteins in E. coli. Modeling predicted the probability of sol. fusion protein expression for the target insol. protein human interleukin-3 (hIL-3) in the following order: NusA (most sol.), GrpE, BFR, and thioredoxin (least sol.). Expression expts. at 37° showed that the NusA/hIL-3 fusion protein was expressed almost completely in the sol. fraction, while GrpE/hIL-3 and BFR/hIL-3 exhibited partial soly. at 37°. Thioredoxin/hIL-3 was expressed almost completely in the insol. fraction. Fusion proteins consisting of NusA and either bovine growth hormone or human interferon-γ were also expressed in E. coli at 37° and again showed that the fusion protein was almost completely sol. Starting with the NusA/hIL-3 fusion protein with an N-terminal histidine tag, purified hIL-3 with full biol. activity was obtained using immobilized metal affinity chromatog., factor Xa protease cleavage, and anion exchange chromatog.

198
Magnan, C. N.; Randall, A.; Baldi, P. SOLpro: Accurate Sequence-Based Prediction of Protein Solubility. Bioinformatics 2009, 25, 2200– 2207, DOI: 10.1093/bioinformatics/btp386

SOLpro: accurate sequence-based prediction of protein solubility

Magnan, Christophe N.; Randall, Arlo; Baldi, Pierre

Bioinformatics (2009), 25 (17), 2200-2207CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)

Protein insoly. is a major obstacle for many exptl. studies. A sequence-based prediction method able to accurately predict the propensity of a protein to be sol. on overexpression could be used, for instance, to prioritize targets in large-scale proteomics projects and to identify mutations likely to increase the soly. of insol. proteins. Here, the authors first curate a large, non-redundant and balanced training set of more than 17 000 proteins. Next, the authors ext. and study 23 groups of features computed directly or predicted (e.g. secondary structure) from the primary sequence. The data and the features are used to train a two-stage support vector machine (SVM) architecture. The resulting predictor, SOLpro, is compared directly with existing methods and shows significant improvement according to std. evaluation metrics, with an overall accuracy of over 74% estd. using multiple runs of 10-fold cross-validation. SOLpro is integrated in the SCRATCH suite of predictors and is available for download as a standalone application and as a web server at: http://scratch.proteomics.ics.uci.edu.

199
Smialowski, P.; Doose, G.; Torkler, P.; Kaufmann, S.; Frishman, D. PROSO II—A New Method for Protein Solubility Prediction. FEBS J. 2012, 279, 2192– 2200, DOI: 10.1111/j.1742-4658.2012.08603.x

PROSO II - a new method for protein solubility prediction

Smialowski, Pawel; Doose, Gero; Torkler, Phillipp; Kaufmann, Stefanie; Frishman, Dmitrij

FEBS Journal (2012), 279 (12), 2192-2200CODEN: FJEOAC; ISSN:1742-464X. (Wiley-Blackwell)

Many fields of science and industry depend on efficient prodn. of active protein using heterologous expression in Escherichia coli. The soly. of proteins upon expression is dependent on their amino acid sequence. Prediction of soly. from sequence is therefore highly valuable. We present a novel machine-learning-based model called PROSO II which makes use of new classification methods and growth in exptl. data to improve coverage and accuracy of soly. predictions. The classification algorithm is organized as a two-layered structure in which the output of a primary Parzen window model for sequence similarity and a logistic regression classifier of amino acid k-mer compn. serve as input for a second-level logistic regression classifier. Compared with previously published research our model is trained on five times more data than used by any other method before (82,000 proteins). When tested on a sep. holdout set not used at any point of method development our server attained the best results in comparison with other currently available methods: accuracy 75.4%, Matthew's correlation coeff. 0.39, sensitivity 0.731, specificity 0.759, gain (sol.) 2.263. In summary, due to utilization of cutting edge machine learning technologies combined with the largest currently available exptl. data set the PROSO II server constitutes a substantial improvement in protein soly. predictions.

200
Agostini, F.; Cirillo, D.; Livi, C. M.; Delli Ponti, R.; Tartaglia, G. G. CcSOL Omics: A Webserver for Solubility Prediction of Endogenous and Heterologous Expression in Escherichia coli. Bioinformatics 2014, 30, 2975– 2977, DOI: 10.1093/bioinformatics/btu420

ccSOL omics: a webserver for solubility prediction of endogenous and heterologous expression in Escherichia coli

Agostini, Federico; Cirillo, Davide; Livi, Carmen Maria; Delli Ponti, Riccardo; Tartaglia, Gian Gaetano

Bioinformatics (2014), 30 (20), 2975-2977CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)

Summary: Here we introduce ccSOL omics, a webserver for large-scale calcns. of protein soly. Our method allows (i) proteome-wide predictions; (ii) identification of sol. fragments within each sequence; (iii) exhaustive single-point mutation anal. Results: Using coil/disorder, hydrophobicity, hydrophilicity, β-sheet and α-helix propensities, we built a predictor of protein soly. Our approach shows an accuracy of 79% on the training set (36 990 Target Track entries). Validation on three independent sets indicates that ccSOL omics discriminates sol. and insol. proteins with an accuracy of 74% on 31 760 proteins sharing <30% sequence similarity.

201
Khurana, S.; Rawi, R.; Kunji, K.; Chuang, G.-Y.; Bensmail, H.; Mall, R. DeepSol: A Deep Learning Framework for Sequence-Based Protein Solubility Prediction. Bioinformatics 2018, 34, 2605– 2613, DOI: 10.1093/bioinformatics/bty166

DeepSol: a deep learning framework for sequence-based protein solubility prediction

Khurana, Sameer; Rawi, Reda; Kunji, Khalid; Chuang, Gwo-Yu; Bensmail, Halima; Mall, Raghvendra

Bioinformatics (2018), 34 (15), 2605-2613CODEN: BOINFP; ISSN:1367-4811. (Oxford University Press)

Motivation: Protein soly. plays a vital role in pharmaceutical research and prodn. yield. For a given protein, the extent of its soly. can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein soly. predictors. In this work, we propose DeepSol, a novel Deep Learning-based protein soly. predictor. The backbone of our framework is a convolutional neural network that exploits k-mer structure and addnl. sequence and structural features extd. from the protein sequence. Results: DeepSol outperformed all known sequence-based state-of-the-art soly. prediction methods and attained an accuracy of 0.77 and Matthew's correlation coeff. of 0.55. The superior prediction accuracy of DeepSol allows screening for sequences with enhanced prodn. capacity and more reliable prediction of the soly. of novel proteins.

202
Chang, C. C. H.; Li, C.; Webb, G. I.; Tey, B.; Song, J.; Ramanan, R. N. Periscope: Quantitative Prediction of Soluble Protein Expression in the Periplasm of Escherichia coli. Sci. Rep. 2016, 6, 21844, DOI: 10.1038/srep21844

Periscope: quantitative prediction of soluble protein expression in the periplasm of Escherichia coli

Chang, Catherine Ching Han; Li, Chen; Webb, Geoffrey I.; Tey, Beng Ti; Song, Jiangning; Ramanan, Ramakrishnan Nagasundara

Scientific Reports (2016), 6 (), 21844CODEN: SRCEC3; ISSN:2045-2322. (Nature Publishing Group)

Periplasmic expression of sol. proteins in Escherichia coli not only offers a much-simplified downstream purifn. process, but also enhances the probability of obtaining correctly folded and biol. active proteins. Different combinations of signal peptides and target proteins lead to different sol. protein expression levels, ranging from negligible to several grams per L. Accurate algorithms for rational selection of promising candidates can serve as a powerful tool to complement with current trial-and-error approaches. Accordingly, proteomics studies can be conducted with greater efficiency and cost-effectiveness. Here, we developed a predictor with a two-stage architecture, to predict the real-valued expression level of target protein in the periplasm. The output of the first-stage support vector machine (SVM) classifier dets. which second-stage support vector regression (SVR) classifier to be used. When tested on an independent test dataset, the predictor achieved an overall prediction accuracy of 78% and a Pearson's correlation coeff. (PCC) of 0.77. We further illustrate the relative importance of various features with respect to different models. The results indicate that the occurrence of dipeptide glutamine and aspartic acid is the most important feature for the classification model. Finally, we provide access to the implemented predictor through the Periscope webserver, freely accessible at http://lightning.med.monash.edu/periscope/.

203
Hirose, S.; Noguchi, T. ESPRESSO: A System for Estimating Protein Expression and Solubility in Protein Expression Systems. Proteomics 2013, 13, 1444– 1456, DOI: 10.1002/pmic.201200175

ESPRESSO: A system for estimating protein expression and solubility in protein expression systems

Hirose, Shuichi; Noguchi, Tamotsu

Proteomics (2013), 13 (9), 1444-1456CODEN: PROTC7; ISSN:1615-9853. (Wiley-VCH Verlag GmbH & Co. KGaA)

Recombinant protein technol. is essential for conducting protein science and using proteins as materials in pharmaceutical or industrial applications. Although obtaining sol. proteins is still a major exptl. obstacle, knowledge about protein expression/soly. under std. conditions may increase the efficiency and reduce the cost of proteomics studies. In this study, we present a computational approach to est. the probability of protein expression and soly. for two different protein expression systems: in vivo Escherichia coli and wheat germ cell-free, from only the sequence information. It implements two kinds of methods: a sequence/predicted structural property-based method that uses both the sequence and predicted structural features, and a sequence pattern-based method that utilizes the occurrence frequencies of sequence patterns. In the benchmark test, the proposed methods obtained F-scores of around 70%, and outperformed publicly available servers. Applying the proposed methods to genomic data revealed that proteins assocd. with translation or transcription have a strong tendency to be expressed as sol. proteins by the in vivo E. coli expression system. The sequence pattern-based method also has the potential to indicate a candidate region for modification, to increase protein soly. All methods are available for free at the ESPRESSO server (http://mbs.cbrc.jp/ESPRESSO).

204
Hon, J.; Marusiak, M.; Martinek, T.; Zendulka, J.; Bednar, D.; Damborsky, J. SoluProt: Prediction of Protein Solubility. Nucleic Acids Res. 2018, in preparation
205
DuBay, K. F.; Pawar, A. P.; Chiti, F.; Zurdo, J.; Dobson, C. M.; Vendruscolo, M. Prediction of the Absolute Aggregation Rates of Amyloidogenic Polypeptide Chains. J. Mol. Biol. 2004, 341, 1317– 1326, DOI: 10.1016/j.jmb.2004.06.043

Prediction of the Absolute Aggregation Rates of Amyloidogenic Polypeptide Chains

DuBay, Kateri F.; Pawar, Amol P.; Chiti, Fabrizio; Zurdo, Jesus; Dobson, Christopher M.; Vendruscolo, Michele

Journal of Molecular Biology (2004), 341 (5), 1317-1326CODEN: JMOBAK; ISSN:0022-2836. (Elsevier)

Protein aggregation is assocd. with a variety of pathol. conditions, including Alzheimer's and Creutzfeldt-Jakob diseases and type II diabetes. Such degenerative disorders result from the conversion of the normal sol. state of specific proteins into aggregated states that can ultimately form the characteristic amyloid fibrils found in diseased tissue. Under appropriate conditions it appears that many, perhaps all, proteins can be converted in vitro into amyloid fibrils. The aggregation propensities of different polypeptide chains have, however, been obsd. to vary substantially. Here, we describe an approach that uses the knowledge of the amino acid sequence and of the exptl. conditions to reproduce, with a correlation coeff. of 0.92 and over five orders of magnitude, the in vitro aggregation rates of a wide range of unstructured peptides and proteins. These results indicate that the formation of protein aggregates can be rationalized to a considerable extent in terms of simple physico-chem. parameters that describe the properties of polypeptide chains and their environment.

206
Tartaglia, G. G.; Pawar, A. P.; Campioni, S.; Dobson, C. M.; Chiti, F.; Vendruscolo, M. Prediction of Aggregation-Prone Regions in Structured Proteins. J. Mol. Biol. 2008, 380, 425– 436, DOI: 10.1016/j.jmb.2008.05.013

Prediction of Aggregation-Prone Regions in Structured Proteins

Tartaglia, Gian Gaetano; Pawar, Amol P.; Campioni, Silvia; Dobson, Christopher M.; Chiti, Fabrizio; Vendruscolo, Michele

Journal of Molecular Biology (2008), 380 (2), 425-436CODEN: JMOBAK; ISSN:0022-2836. (Elsevier Ltd.)

We present a method for predicting the regions of the sequences of peptides and proteins that are most important in promoting their aggregation and amyloid formation. The method extends previous approaches by allowing such predictions to be carried out for conditions under which the mols. concerned can be folded or contain a significant degree of persistent structure. In order to achieve this result, the method uses only knowledge of the sequence of amino acids to est. simultaneously both the propensity for folding and aggregation and the way in which these two types of propensity compete. We illustrate the approach by its application to a set of peptides and proteins both assocd. and not assocd. with disease. Our results show not only that the regions of a protein with a high intrinsic aggregation propensity can be identified in a robust manner but also that the structural context of such regions in the monomeric form is crucial for detg. their actual role in the aggregation process.

207
Conchillo-Solé, O.; de Groot, N. S.; Avilés, F. X.; Vendrell, J.; Daura, X.; Ventura, S. AGGRESCAN: A Server for the Prediction and Evaluation of “Hot Spots” of Aggregation in Polypeptides. BMC Bioinf. 2007, 8, 65, DOI: 10.1186/1471-2105-8-65

AGGRESCAN: a server for the prediction and evaluation of "hot spots" of aggregation in polypeptides

Conchillo-Solé, Oscar; de Groot, Natalia S.; Avilés, Francesc X.; Vendrell, Josep; Daura, Xavier; Ventura, Salvador

BMC Bioinformatics (2007), 8, 65.

BACKGROUND: Protein aggregation correlates with the development of several debilitating human disorders of growing incidence, such as Alzheimer's and Parkinson's diseases. On the biotechnological side, protein production is often hampered by the accumulation of recombinant proteins into aggregates. Thus, the development of methods to anticipate the aggregation properties of polypeptides is receiving increasing attention. AGGRESCAN is a web-based software for the prediction of aggregation-prone segments in protein sequences, the analysis of the effect of mutations on protein aggregation propensities and the comparison of the aggregation properties of different proteins or protein sets. RESULTS: AGGRESCAN is based on an aggregation-propensity scale for natural amino acids derived from in vivo experiments and on the assumption that short and specific sequence stretches modulate protein aggregation. The algorithm is shown to identify a series of protein fragments involved in the aggregation of disease-related proteins and to predict the effect of genetic mutations on their deposition propensities. It also provides new insights into the differential aggregation properties displayed by globular proteins, natively unfolded polypeptides, amyloidogenic proteins and proteins found in bacterial inclusion bodies. CONCLUSION: By identifying aggregation-prone segments in proteins, AGGRESCAN http://bioinf.uab.es/aggrescan/ shall facilitate (i) the identification of possible therapeutic targets for anti-depositional strategies in conformational diseases and (ii) the anticipation of aggregation phenomena during storage or recombinant production of bioactive polypeptides or polypeptide sets.

208
Fernandez-Escamilla, A.-M.; Rousseau, F.; Schymkowitz, J.; Serrano, L. Prediction of Sequence-Dependent and Mutational Effects on the Aggregation of Peptides and Proteins. Nat. Biotechnol. 2004, 22, 1302– 1306, DOI: 10.1038/nbt1012

208
Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins

Fernandez-Escamilla, Ana-Maria; Rousseau, Frederic; Schymkowitz, Joost; Serrano, Luis

Nature Biotechnology (2004), 22 (10), 1302-1306 CODEN: NABIF9; ISSN:1087-0156. (Nature Publishing Group)

A statistical mechanics algorithm, TANGO, is developed to predict protein aggregation. TANGO is based on the physico-chem. principles of β-sheet formation, extended by the assumption that the core regions of an aggregate are fully buried. The algorithm accurately predicts the aggregation of a data set of 179 peptides compiled from the literature as well as of a new set of 71 peptides derived from human disease-related proteins, including prion protein, lysozyme and β2-microglobulin. TANGO also correctly predicts pathogenic as well as protective mutations of the Alzheimer β-peptide, human lysozyme and transthyretin, and discriminates between β-sheet propensity and aggregation. The results confirm the model of intermol. β-sheet formation as a widespread underlying mechanism of protein aggregation. Furthermore, the algorithm opens the door to a fully automated, sequence-based design strategy to improve the aggregation properties of proteins of scientific or industrial interest.

209
Maurer-Stroh, S.; Debulpaep, M.; Kuemmerer, N.; Lopez de la Paz, M.; Martins, I. C.; Reumers, J.; Morris, K. L.; Copland, A.; Serpell, L.; Serrano, L.; Schymkowitz, J. W. H.; Rousseau, F. Exploring the Sequence Determinants of Amyloid Structure Using Position-Specific Scoring Matrices. Nat. Methods 2010, 7, 237–242, DOI: 10.1038/nmeth.1432

209
Exploring the sequence determinants of amyloid structure using position-specific scoring matrices

Maurer-Stroh, Sebastian; Debulpaep, Maja; Kuemmerer, Nico; de la Paz, Manuela Lopez; Martins, Ivo Cristiano; Reumers, Joke; Morris, Kyle L.; Copland, Alastair; Serpell, Louise; Serrano, Luis; Schymkowitz, Joost W. H.; Rousseau, Frederic

Nature Methods (2010), 7 (3), 237-242 CODEN: NMAEA3; ISSN:1548-7091. (Nature Publishing Group)

Protein aggregation results in β-sheet-like assemblies that adopt either a variety of amorphous morphologies or ordered amyloid-like structures. These differences in structure also reflect biol. differences; amyloid and amorphous β-sheet aggregates have different chaperone affinities, accumulate in different cellular locations and are degraded by different mechanisms. Further, amyloid function depends entirely on a high intrinsic degree of order. Here we exptl. explored the sequence space of amyloid hexapeptides and used the derived data to build Waltz, a web-based tool that uses a position-specific scoring matrix to det. amyloid-forming sequences. Waltz allows users to identify and better distinguish between amyloid sequences and amorphous β-sheet aggregates and allowed us to identify amyloid-forming regions in functional amyloids.

210
Walsh, I.; Seno, F.; Tosatto, S. C. E.; Trovato, A. PASTA 2.0: An Improved Server for Protein Aggregation Prediction. Nucleic Acids Res. 2014, 42, W301–W307, DOI: 10.1093/nar/gku399

210
PASTA 2.0: an improved server for protein aggregation prediction

Walsh, Ian; Seno, Flavio; Tosatto, Silvio C. E.; Trovato, Antonio

Nucleic Acids Research (2014), 42 (W1), W301-W307 CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)

The formation of amyloid aggregates upon protein misfolding is related to several devastating degenerative diseases. The propensities of different protein sequences to aggregate into amyloids, how they are enhanced by pathogenic mutations, the presence of aggregation hot spots stabilizing pathol. interactions, the establishing of cross-amyloid interactions between co-aggregating proteins, all rely at the mol. level on the stability of the amyloid cross-beta structure. The authors' redesigned server, PASTA 2.0, provides a versatile platform where all of these different features can be easily predicted on a genomic scale given input sequences. The server provides other pieces of information, such as intrinsic disorder and secondary structure predictions, that complement the aggregation data. The PASTA 2.0 energy function evaluates the stability of putative cross-beta pairings between different sequence stretches. It was re-derived on a larger dataset of globular protein domains. The resulting algorithm was benchmarked on comprehensive peptide and protein test sets, leading to improved, state-of-the-art results with more amyloid forming regions correctly detected at high specificity. The PASTA 2.0 server can be accessed at http://protein.bio.unipd.it/pasta2/.

211
Bryan, A. W.; Menke, M.; Cowen, L. J.; Lindquist, S. L.; Berger, B. BETASCAN: Probable Beta-Amyloids Identified by Pairwise Probabilistic Analysis. PLoS Comput. Biol. 2009, 5, e1000333, DOI: 10.1371/journal.pcbi.1000333

211
BETASCAN: probable beta-amyloids identified by pairwise probabilistic analysis

Bryan Allen W Jr; Menke Matthew; Cowen Lenore J; Lindquist Susan L; Berger Bonnie

PLoS computational biology (2009), 5 (3), e1000333 ISSN:.

Amyloids and prion proteins are clinically and biologically important beta-structures, whose supersecondary structures are difficult to determine by standard experimental or computational means. In addition, significant conformational heterogeneity is known or suspected to exist in many amyloid fibrils. Recent work has indicated the utility of pairwise probabilistic statistics in beta-structure prediction. We develop here a new strategy for beta-structure prediction, emphasizing the determination of beta-strands and pairs of beta-strands as fundamental units of beta-structure. Our program, BETASCAN, calculates likelihood scores for potential beta-strands and strand-pairs based on correlations observed in parallel beta-sheets. The program then determines the strands and pairs with the greatest local likelihood for all of the sequence's potential beta-structures. BETASCAN suggests multiple alternate folding patterns and assigns relative a priori probabilities based solely on amino acid sequence, probability tables, and pre-chosen parameters. The algorithm compares favorably with the results of previous algorithms (BETAPRO, PASTA, SALSA, TANGO, and Zyggregator) in beta-structure prediction and amyloid propensity prediction. Accurate prediction is demonstrated for experimentally determined amyloid beta-structures, for a set of known beta-aggregates, and for the parallel beta-strands of beta-helices, amyloid-like globular proteins. BETASCAN is able both to detect beta-strands with higher sensitivity and to detect the edges of beta-strands in a richly beta-like sequence. For two proteins (Abeta and Het-s), there exist multiple sets of experimental data implying contradictory structures; BETASCAN is able to detect each competing structure as a potential structure variant. The ability to correlate multiple alternate beta-structures to experiment opens the possibility of computational investigation of prion strains and structural heterogeneity of amyloid. 
BETASCAN is publicly accessible on the Web at http://betascan.csail.mit.edu.

212
Garbuzynskiy, S. O.; Lobanov, M. Y.; Galzitskaya, O. V. FoldAmyloid: A Method of Prediction of Amyloidogenic Regions from Protein Sequence. Bioinformatics 2010, 26, 326–332, DOI: 10.1093/bioinformatics/btp691

212
FoldAmyloid: a method of prediction of amyloidogenic regions from protein sequence

Garbuzynskiy, Sergiy O.; Lobanov, Michail Yu.; Galzitskaya, Oxana V.

Bioinformatics (2010), 26 (3), 326-332 CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)

Motivation: Amyloidogenic regions in polypeptide chains are very important because such regions are responsible for amyloid formation and aggregation. It is useful to be able to predict positions of amyloidogenic regions in protein chains. Results: Two characteristics (expected probability of hydrogen bonds formation and expected packing d. of residues) have been introduced by us to detect amyloidogenic regions in a protein sequence. We demonstrate that regions with high expected probability of the formation of backbone-backbone hydrogen bonds as well as regions with high expected packing d. are mostly responsible for the formation of amyloid fibrils. Our method (FoldAmyloid) has been tested on a dataset of 407 peptides (144 amyloidogenic and 263 non-amyloidogenic peptides) and has shown good performance in predicting a peptide status: amyloidogenic or non-amyloidogenic. The prediction based on the expected packing d. classified correctly 75% of amyloidogenic peptides and 74% of non-amyloidogenic ones. Two variants (averaging by donors and by acceptors) of prediction based on the probability of formation of backbone-backbone hydrogen bonds gave a comparable efficiency. With a hybrid-scale constructed by merging the above three scales, our method is correct for 80% of amyloidogenic peptides and for 72% of non-amyloidogenic ones. Prediction of amyloidogenic regions in proteins where positions of amyloidogenic regions are known from exptl. data has also been done. In the proteins, our method correctly finds 10 out of 11 amyloidogenic regions.

213
Goldschmidt, L.; Teng, P. K.; Riek, R.; Eisenberg, D. Identifying the Amylome, Proteins Capable of Forming Amyloid-like Fibrils. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 3487–3492, DOI: 10.1073/pnas.0915166107

213
Identifying the amylome, proteins capable of forming amyloid-like fibrils

Goldschmidt, Lukasz; Teng, Poh K.; Riek, Roland; Eisenberg, David

Proceedings of the National Academy of Sciences of the United States of America (2010), 107 (8), 3487-3492, S3487/1-S3487/13 CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)

The amylome is the universe of proteins that are capable of forming amyloid-like fibrils. Here we investigate the factors that enable a protein to belong to the amylome. A major factor is the presence in the protein of a segment that can form a tightly complementary interface with an identical segment, which permits the formation of a steric zipper - two self-complementary beta sheets that form the spine of an amyloid fibril. Another factor is sufficient conformational freedom of the self-complementary segment to interact with other mols. Using RNase A as a model system, we validate our fibrillogenic predictions by the 3D profile method based on the crystal structure of NNQQNY and demonstrate that a specific residue order is required for fiber formation. Our genome-wide anal. revealed that self-complementary segments are found in almost all proteins, yet not all proteins form amyloids. The implication is that chaperoning effects have evolved to constrain self-complementary segments from interaction with each other.

214
Ahmed, A. B.; Znassi, N.; Château, M.-T.; Kajava, A. V. A Structure-Based Approach to Predict Predisposition to Amyloidosis. Alzheimer’s Dementia 2015, 11, 681–690, DOI: 10.1016/j.jalz.2014.06.007

214
A structure-based approach to predict predisposition to amyloidosis

Ahmed Abdullah B; Znassi Nadia; Chateau Marie-Therese; Kajava Andrey V

Alzheimer's & dementia : the journal of the Alzheimer's Association (2015), 11 (6), 681-90 ISSN:.

BACKGROUND: Neurodegenerative diseases and other amyloidoses are linked to the formation of amyloid fibrils. It has been shown that the ability to form these fibrils is coded by the amino acid sequence. Existing methods for the prediction of amyloidogenicity generate an unsatisfactory high number of false positives when tested against sequences of the disease-related proteins. METHODS: Recently, it has been shown that the three-dimensional structure of a majority of disease-related amyloid fibrils contains a β-strand-loop-β-strand motif called β-arch. Using this information, we have developed a novel bioinformatics approach for the prediction of amyloidogenicity. RESULTS: The benchmark results show the superior performance of our method over the existing programs. CONCLUSIONS: As genome sequencing becomes more affordable, our method provides an opportunity to create individual risk profiles for the neurodegenerative, age-related, and other diseases ushering in an era of personalized medicine. It will also be used in the large-scale analysis of proteomes to find new amyloidogenic proteins.

215
Krogh, A.; Vedelsby, J. Neural Network Ensembles, Cross Validation and Active Learning. In Proceedings of the 7th International Conference on Neural Information Processing Systems (NIPS’94); MIT Press: Cambridge, MA, 1994; pp 231–238.
216
Maclin, R.; Opitz, D. Popular Ensemble Methods: An Empirical Study. J. Artif. Intell. Res. 1999, 11, 169–198, DOI: 10.1613/jair.614

217
Tsolis, A. C.; Papandreou, N. C.; Iconomidou, V. A.; Hamodrakas, S. J. A Consensus Method for the Prediction of “Aggregation-Prone” Peptides in Globular Proteins. PLoS One 2013, 8, e54175, DOI: 10.1371/journal.pone.0054175

217
A consensus method for the prediction of 'aggregation-prone' peptides in globular proteins

Tsolis, Antonios C.; Papandreou, Nikos C.; Iconomidou, Vassiliki A.; Hamodrakas, Stavros J.

PLoS One (2013), 8 (1), e54175 CODEN: POLNCL; ISSN:1932-6203. (Public Library of Science)

The purpose of this work was to construct a consensus prediction algorithm of 'aggregation-prone' peptides in globular proteins, combining existing tools. This allows comparison of the different algorithms and the prodn. of more objective and accurate results. Eleven (11) individual methods are combined and produce AMYLPRED2, a publicly, freely available web tool to academic users, for the consensus prediction of amyloidogenic determinants/'aggregation-prone' peptides in proteins, from sequence alone. The performance of AMYLPRED2 indicates that it functions better than individual aggregation-prediction algorithms, as perhaps expected. AMYLPRED2 is a useful tool for identifying amyloid-forming regions in proteins that are assocd. with several conformational diseases, called amyloidoses, such as Alzheimer's, Parkinson's, prion diseases and type II diabetes. It may also be useful for understanding the properties of protein folding and misfolding and for helping to control protein aggregation/soly. in biotechnol. (recombinant proteins forming bacterial inclusion bodies) and biotherapeutics (monoclonal antibodies and biopharmaceutical proteins).

218
Emily, M.; Talvas, A.; Delamarche, C. MetAmyl: A METa-Predictor for AMYLoid Proteins. PLoS One 2013, 8, e79722, DOI: 10.1371/journal.pone.0079722

219
Zambrano, R.; Jamroz, M.; Szczasiuk, A.; Pujols, J.; Kmiecik, S.; Ventura, S. AGGRESCAN3D (A3D): Server for Prediction of Aggregation Properties of Protein Structures. Nucleic Acids Res. 2015, 43, W306–W313, DOI: 10.1093/nar/gkv359

219
AGGRESCAN3D (A3D): server for prediction of aggregation properties of protein structures

Zambrano, Rafael; Jamroz, Michal; Szczasiuk, Agata; Pujols, Jordi; Kmiecik, Sebastian; Ventura, Salvador

Nucleic Acids Research (2015), 43 (W1), W306-W313 CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)

Protein aggregation underlies an increasing no. of disorders and constitutes a major bottleneck in the development of therapeutic proteins. Our present understanding on the mol. determinants of protein aggregation has crystd. in a series of predictive algorithms to identify aggregation-prone sites. A majority of these methods rely only on sequence. Therefore, they find difficulties to predict the aggregation properties of folded globular proteins, where aggregation-prone sites are often not contiguous in sequence or buried inside the native structure. The AGGRESCAN3D (A3D) server overcomes these limitations by taking into account the protein structure and the exptl. aggregation propensity scale from the well-established AGGRESCAN method. Using the A3D server, the identified aggregation-prone residues can be virtually mutated to design variants with increased soly., or to test the impact of pathogenic mutations. Addnl., A3D server enables to take into account the dynamic fluctuations of protein structure in soln., which may influence aggregation propensity. This is possible in A3D Dynamic Mode that exploits the CABS-flex approach for the fast simulations of flexibility of globular proteins.

220
De Baets, G.; Van Durme, J.; van der Kant, R.; Schymkowitz, J.; Rousseau, F. Solubis: Optimize Your Protein. Bioinformatics 2015, 31, 2580–2582, DOI: 10.1093/bioinformatics/btv162

220
Solubis: optimize your protein

De Baets, Greet; Van Durme, Joost; van der Kant, Rob; Schymkowitz, Joost; Rousseau, Frederic

Bioinformatics (2015), 31 (15), 2580-2582 CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)

Motivation: Protein aggregation is assocd. with a no. of protein misfolding diseases and is a major concern for therapeutic proteins. Aggregation is caused by the presence of aggregation-prone regions (APRs) in the amino acid sequence of the protein. The lower the aggregation propensity of APRs and the better they are protected by native interactions within the folded structure of the protein, the more aggregation is prevented. Therefore, both the local thermodn. stability of APRs in the native structure and their intrinsic aggregation propensity are key parameters that need to be optimized to prevent protein aggregation. Results: The Solubis method presented here automates the process of carefully selecting point mutations that minimize the intrinsic aggregation propensity while improving local protein stability.

221
Van Durme, J.; De Baets, G.; Van Der Kant, R.; Ramakers, M.; Ganesan, A.; Wilkinson, H.; Gallardo, R.; Rousseau, F.; Schymkowitz, J. Solubis: A Webserver To Reduce Protein Aggregation through Mutation. Protein Eng., Des. Sel. 2016, 29, 285–289, DOI: 10.1093/protein/gzw019

221
Solubis: a webserver to reduce protein aggregation through mutation

Van Durme, Joost; De Baets, Greet; Van Der Kant, Rob; Ramakers, Meine; Ganesan, Ashok; Wilkinson, Hannah; Gallardo, Rodrigo; Rousseau, Frederic; Schymkowitz, Joost

Protein Engineering, Design & Selection (2016), 29 (8), 285-289 CODEN: PEDSBR; ISSN:1741-0126. (Oxford University Press)

Protein aggregation is a major factor limiting the biotechnol. and therapeutic application of many proteins, including enzymes and monoclonal antibodies. The mol. principles underlying aggregation are by now sufficiently understood to allow rational redesign of natural polypeptide sequences for decreased aggregation tendency, and hence potentially increased expression and soly. Given that aggregation-prone regions (APRs) tend to contribute to the stability of the hydrophobic core or to functional sites of the protein, mutations in these regions have to be carefully selected in order not to disrupt protein structure or function. Therefore, we here provide access to an automated pipeline to identify mutations that reduce protein aggregation by reducing the intrinsic aggregation propensity of the sequence (using the TANGO algorithm), while taking care not to disrupt the thermodn. stability of the native structure (using the empirical force-field FoldX). Moreover, by providing a plot of the intrinsic aggregation propensity score of APRs cor. by the local stability of that region in the folded structure, we allow users to prioritize those regions in the protein that are most in need of improvement through protein engineering.


Cited By

This article is cited by 86 publications.

Bingxin Zhou, Lirong Zheng, Banghao Wu, Yang Tan, Outongyi Lv, Kai Yi, Guisheng Fan, Liang Hong. Protein Engineering with Lightweight Graph Denoising Neural Networks. Journal of Chemical Information and Modeling 2024, Article ASAP.
Tong Zhu, Jinyuan Sun, Hua Pang, Bian Wu. Computational Enzyme Redesign Enhances Tolerance to Denaturants for Peptide C-Terminal Amidation. JACS Au 2024, 4 (2) , 788-797. https://doi.org/10.1021/jacsau.3c00792
Braun Markus, Gruber Christian C, Krassnigg Andreas, Kummer Arkadij, Lutz Stefan, Oberdorfer Gustav, Siirola Elina, Snajdrova Radka. Accelerating Biocatalysis Discovery with Machine Learning: A Paradigm Shift in Enzyme Engineering, Discovery, and Design. ACS Catalysis 2023, 13 (21) , 14454-14469. https://doi.org/10.1021/acscatal.3c03417
Antonin Kunka, Sérgio M. Marques, Martin Havlasek, Michal Vasina, Nikola Velatova, Lucia Cengelova, David Kovar, Jiri Damborsky, Martin Marek, David Bednar, Zbynek Prokop. Advancing Enzyme’s Stability and Catalytic Efficiency through Synergy of Force-Field Calculations, Evolutionary Analysis, and Machine Learning. ACS Catalysis 2023, 13 (19) , 12506-12518. https://doi.org/10.1021/acscatal.3c02575
Lixia Liu, Shenghu Zhou, Yu Deng. Rational Design of the Substrate Tunnel of β-Ketothiolase Reveals a Local Cationic Domain Modulated Rule that Improves the Efficiency of Claisen Condensation. ACS Catalysis 2023, 13 (12) , 8183-8194. https://doi.org/10.1021/acscatal.3c01426
Sergi Roda, Henrik Terholsen, Jule Ruth Heike Meyer, Albert Cañellas-Solé, Victor Guallar, Uwe Bornscheuer, Masoud Kazemi. AsiteDesign: a Semirational Algorithm for an Automated Enzyme Design. The Journal of Physical Chemistry B 2023, 127 (12) , 2661-2670. https://doi.org/10.1021/acs.jpcb.2c07091
Tianjin Yang, Alessia Villois, Antonín Kunka, Fulvio Grigolato, Paolo Arosio, Zbynek Prokop, Andrew deMello, Stavros Stavrakis. Droplet-Based Microfluidic Temperature-Jump Platform for the Rapid Assessment of Biomolecular Kinetics. Analytical Chemistry 2022, 94 (48) , 16675-16684. https://doi.org/10.1021/acs.analchem.2c03009
David A. Hueting, Sudarsana R. Vanga, Per-Olof Syrén. Thermoadaptation in an Ancestral Diterpene Cyclase by Altered Loop Stability. The Journal of Physical Chemistry B 2022, 126 (21) , 3809-3821. https://doi.org/10.1021/acs.jpcb.1c10605
Daniel Markthaler, Maximilian Fleck, Bartosz Stankiewicz, Niels Hansen. Exploring the Effect of Enhanced Sampling on Protein Stability Prediction. Journal of Chemical Theory and Computation 2022, 18 (4) , 2569-2583. https://doi.org/10.1021/acs.jctc.1c01012
Jiajun Chen, Ding Chen, Qiuming Chen, Wei Xu, Wenli Zhang, Wanmeng Mu. Computer-Aided Targeted Mutagenesis of Thermoclostridium caenicola d-Allulose 3-Epimerase for Improved Thermostability. Journal of Agricultural and Food Chemistry 2022, 70 (6) , 1943-1951. https://doi.org/10.1021/acs.jafc.1c07256
Jennifer L. Kennemur, Rajat Maji, Manuel J. Scharf, Benjamin List. Catalytic Asymmetric Hydroalkoxylation of C–C Multiple Bonds. Chemical Reviews 2021, 121 (24) , 14649-14681. https://doi.org/10.1021/acs.chemrev.1c00620
Ailan Huang, Chengcheng Chai, Jiayu Zhang, Lei Zhao, Fuping Lu, Fufeng Liu. Engineered N57P Variant of Ulvan Lyase with Improvement of Catalytic Efficiency and Thermostability via Reducing Loop Flexibility and Anchoring Substrate. ACS Sustainable Chemistry & Engineering 2021, 9 (48) , 16415-16423. https://doi.org/10.1021/acssuschemeng.1c06348
Klara Markova, Antonin Kunka, Klaudia Chmelova, Martin Havlasek, Petra Babkova, Sérgio M. Marques, Michal Vasina, Joan Planas-Iglesias, Radka Chaloupkova, David Bednar, Zbynek Prokop, Jiri Damborsky, Martin Marek. Computational Enzyme Stabilization Can Affect Folding Energy Landscapes and Lead to Catalytically Enhanced Domain-Swapped Dimers. ACS Catalysis 2021, 11 (21) , 12864-12885. https://doi.org/10.1021/acscatal.1c03343
Kohei Kozuka, Shogo Nakano, Yasuhisa Asano, Sohei Ito. Partial Consensus Design and Enhancement of Protein Function by Secondary-Structure-Guided Consensus Mutations. Biochemistry 2021, 60 (29) , 2309-2319. https://doi.org/10.1021/acs.biochem.1c00309
Rae A. Corrigan, Guowei Qi, Andrew C. Thiel, Jack R. Lynn, Brandon D. Walker, Thomas L. Casavant, Louis Lagardere, Jean-Philip Piquemal, Jay W. Ponder, Pengyu Ren, Michael J. Schnieders. Implicit Solvents for the Polarizable Atomic Multipole AMOEBA Force Field. Journal of Chemical Theory and Computation 2021, 17 (4) , 2323-2341. https://doi.org/10.1021/acs.jctc.0c01286
Megan V. Doble, Lorenz Obrecht, Henk-Jan Joosten, Misun Lee, Henriette J. Rozeboom, Emma Branigan, James. H. Naismith, Dick B. Janssen, Amanda G. Jarvis, Paul C. J. Kamer. Engineering Thermostability in Artificial Metalloenzymes to Increase Catalytic Activity. ACS Catalysis 2021, 11 (6) , 3620-3627. https://doi.org/10.1021/acscatal.0c05413
Yinglu Cui, Yanchun Chen, Xinyue Liu, Saijun Dong, Yu’e Tian, Yuxin Qiao, Ruchira Mitra, Jing Han, Chunli Li, Xu Han, Weidong Liu, Quan Chen, Wangqing Wei, Xin Wang, Wenbin Du, Shuangyan Tang, Hua Xiang, Haiyan Liu, Yong Liang, Kendall N. Houk, Bian Wu. Computational Redesign of a PETase for Plastic Biodegradation under Ambient Condition by the GRAPE Strategy. ACS Catalysis 2021, 11 (3) , 1340-1350. https://doi.org/10.1021/acscatal.0c05126
Christoph K. Winkler, Joerg H. Schrittwieser, Wolfgang Kroutil. Power of Biocatalysis for Organic Synthesis. ACS Central Science 2021, 7 (1) , 55-71. https://doi.org/10.1021/acscentsci.0c01496
Peishan Huang, Simon K. S. Chu, Henrique N. Frizzo, Morgan P. Connolly, Ryan W. Caster, Justin B. Siegel. Evaluating Protein Engineering Thermostability Prediction Tools Using an Independently Generated Dataset. ACS Omega 2020, 5 (12) , 6487-6493. https://doi.org/10.1021/acsomega.9b04105
Qinglong Meng, Nikolas Capra, Cyntia M. Palacio, Elisa Lanfranchi, Marleen Otzen, Luc Z. van Schie, Henriëtte J. Rozeboom, Andy-Mark W. H. Thunnissen, Hein J. Wijma, Dick B. Janssen. Robust ω-Transaminases by Computational Stabilization of the Subunit Interface. ACS Catalysis 2020, 10 (5) , 2915-2928. https://doi.org/10.1021/acscatal.9b05223
Stanislav Mazurenko, Zbynek Prokop, Jiri Damborsky. Machine Learning in Enzyme Engineering. ACS Catalysis 2020, 10 (2) , 1210-1223. https://doi.org/10.1021/acscatal.9b04321
Brianne R. King, Kiera H. Sumida, Jessica L. Caruso, David Baker, Jesse G. Zalatan. Computational stabilization of a non-heme iron enzyme enables efficient evolution of new function. 2024 https://doi.org/10.1101/2024.04.18.590141
Mohammad Reza Rahbar, Navid Nezafat, Mohammad Hossein Morowvat, Amir Savardashtaki, Mohammad Bagher Ghoshoon, Kamran Mehrabani-Zeinabad, Younes Ghasemi. Targeting Efficient Features of Urate Oxidase to Increase Its Solubility. Applied Biochemistry and Biotechnology 2024, 41 https://doi.org/10.1007/s12010-023-04819-w
Suhyeon Kim, Seongmin Ga, Hayeon Bae, Ronald Sluyter, Konstantin Konstantinov, Lok Kumar Shrestha, Yong Ho Kim, Jung Ho Kim, Katsuhiko Ariga. Multidisciplinary approaches for enzyme biocatalysis in pharmaceuticals: protein engineering, computational biology, and nanoarchitectonics. EES Catalysis 2024, 2 (1) , 14-48. https://doi.org/10.1039/D3EY00239J
Hiroki Ozawa, Ibuki Unno, Ryohei Sekine, Taichi Chisuga, Sohei Ito, Shogo Nakano. Development of evolutionary algorithm-based protein redesign method. Cell Reports Physical Science 2024, 5 (1) , 101758. https://doi.org/10.1016/j.xcrp.2023.101758
Lihang Xie. Biofoundries for plant-derived bioactive compounds. 2024, 257-283. https://doi.org/10.1016/B978-0-443-15558-1.00005-9
Carola Jerves, Rui P. P. Neves, Saulo L. da Silva, Maria J. Ramos, Pedro A. Fernandes. Rate-enhancing PETase mutations determined through DFT/MM molecular dynamics simulations. New Journal of Chemistry 2023, 48 (1) , 45-54. https://doi.org/10.1039/D3NJ04204A
Elena Tomarelli, Bruno Cerra, Francesco G. Mutti, Antimo Gioiello. Merging Continuous Flow Technology, Photochemistry and Biocatalysis to Streamline Steroid Synthesis. Advanced Synthesis & Catalysis 2023, 365 (23) , 4024-4048. https://doi.org/10.1002/adsc.202300305
Honglin Lu, Maoyuan Xue, Xinling Nie, Hongzheng Luo, Zhongbiao Tan, Xiao Yang, Hao Shi, Xun Li, Tao Wang. Glycoside hydrolases in the biodegradation of lignocellulosic biomass. 3 Biotech 2023, 13 (12) https://doi.org/10.1007/s13205-023-03819-1
Md Sakib Hossen, Md. Nazmul Hasan, Munima Haque, Tawsif Al Arian, Sajal Kumar Halder, Md. Jasim Uddin, M. Abdullah-Al-Mamun, Md Salman Shakil. Immunoinformatics-aided rational design of multiepitope-based peptide vaccine (MEBV) targeting human parainfluenza virus 3 (HPIV-3) stable proteins. Journal of Genetic Engineering and Biotechnology 2023, 21 (1) , 162. https://doi.org/10.1186/s43141-023-00623-5
Milos Musil, Andrej Jezik, Jana Horackova, Simeon Borko, Petr Kabourek, Jiri Damborsky, David Bednar. FireProt 2.0: web-based platform for the fully automated design of thermostable proteins. Briefings in Bioinformatics 2023, 25 (1) https://doi.org/10.1093/bib/bbad425
Mahrokh Dastmalchi, Mahdiyeh Alizadeh, Omid Jamshidi-Kandjan, Hassan Rezazadeh, Maryam Hamzeh-Mivehroud, Mohammad M Farajollahi, Siavoush Dastmalchi. Expression and Biological Evaluation of an Engineered Recombinant L-asparaginase Designed by In Silico Method Based on Sequence of the Enzyme from Escherichia coli. Advanced Pharmaceutical Bulletin 2023, 13 (4) , 827-836. https://doi.org/10.34172/apb.2023.085
Caroline Torres de Oliveira, Michelle Alexandrino de Assis, Marcio Antonio Mazutti, Gonçalo Amarante Guimarães Pereira, Débora de Oliveira. Production of recombinant cutinases and their potential applications in polymer hydrolysis: The current status. Process Biochemistry 2023, 134 , 30-46. https://doi.org/10.1016/j.procbio.2023.10.020
Jie Luo, Chenshuo Song, Wenjing Cui, Laichuang Han, Zhemin Zhou. Counteraction of stability-activity trade-off of Nattokinase through flexible region shifting. Food Chemistry 2023, 423 , 136241. https://doi.org/10.1016/j.foodchem.2023.136241
Liliana Mammino. Green chemistry and computational chemistry: A wealth of promising synergies. Sustainable Chemistry and Pharmacy 2023, 34 , 101151. https://doi.org/10.1016/j.scp.2023.101151
Qing Guo, Meiling Dan, Yuting Zheng, Ji Shen, Guohua Zhao, Damao Wang. Improving the thermostability of a novel PL-6 family alginate lyase by rational design engineering for industrial preparation of alginate oligosaccharides. International Journal of Biological Macromolecules 2023, 249 , 125998. https://doi.org/10.1016/j.ijbiomac.2023.125998
Zheng Wei, Tanja Knaus, Yuxin Liu, Ziran Zhai, Andrea F. G. Gargano, Gadi Rothenberg, Ning Yan, Francesco G. Mutti. A high-performance electrochemical biosensor using an engineered urate oxidase. Chemical Communications 2023, 59 (52) , 8071-8074. https://doi.org/10.1039/D3CC01869E
Anwesha Chatterjee, Sonakshi Puri, Pankaj Kumar Sharma, P. R. Deepa, Shibasish Chowdhury. Nature-inspired Enzyme engineering and sustainable catalysis: biochemical clues from the world of plants and extremophiles. Frontiers in Bioengineering and Biotechnology 2023, 11 https://doi.org/10.3389/fbioe.2023.1229300
Stefanie Hanreich, Elisa Bonandi, Ivana Drienovská. Design of Artificial Enzymes: Insights into Protein Scaffolds. ChemBioChem 2023, 24 (6) https://doi.org/10.1002/cbic.202200566
Jie Gu, Yan Xu, Yao Nie. Role of distal sites in enzyme engineering. Biotechnology Advances 2023, 63 , 108094. https://doi.org/10.1016/j.biotechadv.2023.108094
Hanbeen Kim, Jakyeom Seo. A Novel Strategy to Identify Endolysins with Lytic Activity against Methicillin-Resistant Staphylococcus aureus. International Journal of Molecular Sciences 2023, 24 (6) , 5772. https://doi.org/10.3390/ijms24065772
Zhixin Dou, Yuqing Sun, Xukai Jiang, Xiuyun Wu, Yingjie Li, Bin Gong, Lushan Wang. Data-driven strategies for the computational design of enzyme thermal stability: trends, perspectives, and prospects. Acta Biochimica et Biophysica Sinica 2023, 55 (3) , 343-355. https://doi.org/10.3724/abbs.2023033
Delaney M. Anderson, Lakshmi P. Jayanthi, Shachi Gosavi, Elizabeth M. Meiering. Engineering the kinetic stability of a β-trefoil protein by tuning its topological complexity. Frontiers in Molecular Biosciences 2023, 10 https://doi.org/10.3389/fmolb.2023.1021733
Tianhao Yu, Aashutosh Girish Boob, Michael J. Volk, Xuan Liu, Haiyang Cui, Huimin Zhao. Machine learning-enabled retrobiosynthesis of molecules. Nature Catalysis 2023, 6 (2) , 137-151. https://doi.org/10.1038/s41929-022-00909-w
María Laura Foresti, María Luján Ferreira. Enzyme immobilization for use in nonconventional media. 2023, 165-202. https://doi.org/10.1016/B978-0-323-91317-1.00008-6
Charu Tripathi, Twinkle Yadav. Recent approaches and innovations for enzyme engineering used in industrial biotechnology. 2023, 161-175. https://doi.org/10.1016/B978-0-323-95332-0.00017-X
Seyyed Soheil Rahmatabadi, Keivan Mobini, Soudabeh Askari, Javad Najafian, Keyvan Karami, Bijan Soleymani, Ali Mostafaie. In silico characterization of fructosyl peptide oxidase properties from Eupenicillium terrenum. Journal of Molecular Recognition 2022, 35 (11) https://doi.org/10.1002/jmr.2980
Zhuha Basit, Hira Akram, Muhammad Mudassir Iqbal, Gulzar Muhammad, Muhammad Shahbaz Aslam, Iram Gul, Muhammad Jamil, Mudassir Hussain Tahir. Protein Redesign and Engineering Using Machine Learning. 2022, 247-282. https://doi.org/10.1002/9781394167258.ch9
Muhammad Naveed, Jawad-ul Hassan, Muneeb Ahmad, Nida Naeem, Muhammad Saad Mughal, Ali A. Rabaan, Mohammed Aljeldah, Basim R. Al Shammari, Mohammed Alissa, Amal A. Sabour, Rana A. Alaeq, Maha A. Alshiekheid, Safaa A. Turkistani, Abdirahman Hussein Elmi, Naveed Ahmed. Designing mRNA- and Peptide-Based Vaccine Construct against Emerging Multidrug-Resistant Citrobacter freundii: A Computational-Based Subtractive Proteomics Approach. Medicina 2022, 58 (10) , 1356. https://doi.org/10.3390/medicina58101356
Michal Vasina, Pavel Vanacek, Jiri Hon, David Kovar, Hana Faldynova, Antonin Kunka, Tomas Buryska, Christoffel P.S. Badenhorst, Stanislav Mazurenko, David Bednar, Stavros Stavrakis, Uwe T. Bornscheuer, Andrew deMello, Jiri Damborsky, Zbynek Prokop. Advanced database mining of efficient haloalkane dehalogenases by sequence and structure bioinformatics and microfluidics. Chem Catalysis 2022, 2 (10) , 2704-2725. https://doi.org/10.1016/j.checat.2022.09.011
Aisaraphon Phintha, Pimchai Chaiyen. Rational and mechanistic approaches for improving biocatalyst performance. Chem Catalysis 2022, 2 (10) , 2614-2643. https://doi.org/10.1016/j.checat.2022.09.026
Yu-Jie Yang, Xiao-Qiong Pei, Yan Liu, Zhong-Liu Wu. Thermostabilizing ketoreductase ChKRED20 by consensus mutagenesis at dimeric interfaces. Enzyme and Microbial Technology 2022, 158 , 110052. https://doi.org/10.1016/j.enzmictec.2022.110052
Erich R Kuechler, Matthew Jacobson, Thibault Mayor, Jörg Gsponer. GraPES: The Granule Protein Enrichment Server for prediction of biological condensate constituents. Nucleic Acids Research 2022, 50 (W1) , W384-W391. https://doi.org/10.1093/nar/gkac279
Antonin Kunka, David Lacko, Jan Stourac, Jiri Damborsky, Zbynek Prokop, Stanislav Mazurenko. CalFitter 2.0: Leveraging the power of singular value decomposition to analyse protein thermostability. Nucleic Acids Research 2022, 50 (W1) , W145-W151. https://doi.org/10.1093/nar/gkac378
Yinglu Cui, Jinyuan Sun, Bian Wu. Computational enzyme redesign: large jumps in function. Trends in Chemistry 2022, 4 (5) , 409-419. https://doi.org/10.1016/j.trechm.2022.03.001
Yanxia Wang, Yao Chen, Ling Jiang, He Huang. Improvement of the enzymatic detoxification activity towards mycotoxins through structure-based engineering. Biotechnology Advances 2022, 56 , 107927. https://doi.org/10.1016/j.biotechadv.2022.107927
Ziyang Huang, Xueqin Lv, Guoyun Sun, Xinzhu Mao, Wei Lu, Yanfeng Liu, Jianghua Li, Guocheng Du, Long Liu. Chitin deacetylase: from molecular structure to practical applications. Systems Microbiology and Biomanufacturing 2022, 2 (2) , 271-284. https://doi.org/10.1007/s43393-022-00077-9
Michal Vasina, Jan Velecký, Joan Planas-Iglesias, Sergio M. Marques, Jana Skarupova, Jiri Damborsky, David Bednar, Stanislav Mazurenko, Zbynek Prokop. Tools for computational design and high-throughput screening of therapeutic enzymes. Advanced Drug Delivery Reviews 2022, 183 , 114143. https://doi.org/10.1016/j.addr.2022.114143
Petr Rozhin, Jada Abdel Monem Gamal, Silvia Giordani, Silvia Marchesan. Carbon Nanomaterials (CNMs) and Enzymes: From Nanozymes to CNM-Enzyme Conjugates and Biodegradation. Materials 2022, 15 (3) , 1037. https://doi.org/10.3390/ma15031037
Xavier F. Cadet, Jean Christophe Gelly, Aster van Noord, Frédéric Cadet, Carlos G. Acevedo-Rocha. Learning Strategies in Protein Directed Evolution. 2022, 225-275. https://doi.org/10.1007/978-1-0716-2152-3_15
Michal Vasina, Pavel Vanacek, Jiri Hon, David Kovar, Hana Faldynova, Antonin Kunka, Tomas Buryska, Christoffel P. S. Badenhorst, Stanislav Mazurenko, David Bednar, Stavros Stavrakis, Uwe T. Bornscheuer, Andrew deMello, Jiri Damborsky, Zbynek Prokop. Advanced Database Mining of Efficient Biocatalysts by Sequence and Structure Bioinformatics and Microfluidics. SSRN Electronic Journal 2022, 43 https://doi.org/10.2139/ssrn.4111603
Ziheng Cui, Shiding Zhang, Shengyu Zhang, Biqiang Chen, Yushan Zhu, Tianwei Tan. Green biomanufacturing promoted by automatic retrobiosynthesis planning and computational enzyme design. Chinese Journal of Chemical Engineering 2022, 41 , 6-21. https://doi.org/10.1016/j.cjche.2021.08.017
Kyle Trainor, Colleen M. Doyle, Avril Metcalfe-Roach, Julia Steckner, Daša Lipovšek, Heather Malakian, David Langley, Stanley R. Krystek Jr., Elizabeth M. Meiering. Design for Solubility May Reveal Induction of Amide Hydrogen/Deuterium Exchange by Protein Self-Association. Journal of Molecular Biology 2022, 434 (2) , 167398. https://doi.org/10.1016/j.jmb.2021.167398
Pritam Giri, Amol D. Pagar, Mahesh D. Patil, Hyungdon Yun. Chemical modification of enzymes to improve biocatalytic performance. Biotechnology Advances 2021, 53 , 107868. https://doi.org/10.1016/j.biotechadv.2021.107868
Jinling Xu, Haisheng Zhou, Haoran Yu, Tong Deng, Ziyuan Wang, Hongyu Zhang, Jianping Wu, Lirong Yang. Computational design of highly stable and soluble alcohol dehydrogenase for NADPH regeneration. Bioresources and Bioprocessing 2021, 8 (1) https://doi.org/10.1186/s40643-021-00362-w
Benjamin B. V. Louis, Luciano A. Abriata. Reviewing Challenges of Predicting Protein Melting Temperature Change Upon Mutation Through the Full Analysis of a Highly Detailed Dataset with High-Resolution Structures. Molecular Biotechnology 2021, 63 (10) , 863-884. https://doi.org/10.1007/s12033-021-00349-0
Sérgio M Marques, Joan Planas-Iglesias, Jiri Damborsky. Web-based tools for computational enzyme design. Current Opinion in Structural Biology 2021, 69 , 19-34. https://doi.org/10.1016/j.sbi.2021.01.010
Milos Musil, Rayyan Tariq Khan, Andy Beier, Jan Stourac, Hannes Konegger, Jiri Damborsky, David Bednar. FireProtASR: A Web Server for Fully Automated Ancestral Sequence Reconstruction. Briefings in Bioinformatics 2021, 22 (4) https://doi.org/10.1093/bib/bbaa337
Yameng Xu, Yaokang Wu, Xueqin Lv, Guoyun Sun, Hongzhi Zhang, Taichi Chen, Guocheng Du, Jianghua Li, Long Liu. Design and construction of novel biocatalyst for bioprocessing: Recent advances and future outlook. Bioresource Technology 2021, 332 , 125071. https://doi.org/10.1016/j.biortech.2021.125071
Carlos Eduardo Sequeiros-Borja, Bartłomiej Surpeta, Jan Brezovsky. Recent advances in user-friendly computational tools to engineer protein function. Briefings in Bioinformatics 2021, 22 (3) https://doi.org/10.1093/bib/bbaa150
Jiri Hon, Martin Marusiak, Tomas Martinek, Antonin Kunka, Jaroslav Zendulka, David Bednar, Jiri Damborsky. SoluProt: prediction of soluble protein expression in Escherichia coli. Bioinformatics 2021, 37 (1) , 23-28. https://doi.org/10.1093/bioinformatics/btaa1102
Yan Liu, Zi-Yi Li, Chao Guo, Can Cui, Hui Lin, Zhong-Liu Wu. Enhancing the thermal stability of ketoreductase ChKRED12 using the FireProt web server. Process Biochemistry 2021, 101 , 207-212. https://doi.org/10.1016/j.procbio.2020.11.018
Jan Stourac, Juraj Dubrava, Milos Musil, Jana Horackova, Jiri Damborsky, Stanislav Mazurenko, David Bednar. FireProtDB: database of manually curated protein stability data. Nucleic Acids Research 2021, 49 (D1) , D319-D324. https://doi.org/10.1093/nar/gkaa981
Tanatarov Dinmukhamed, Ziyang Huang, Yanfeng Liu, Xueqin Lv, Jianghua Li, Guocheng Du, Long Liu. Current advances in design and engineering strategies of industrial enzymes. Systems Microbiology and Biomanufacturing 2021, 1 (1) , 15-23. https://doi.org/10.1007/s43393-020-00005-9
Jiahua Bi, Xiaoran Jing, Lunjie Wu, Xia Zhou, Jie Gu, Yao Nie, Yan Xu. Computational design of noncanonical amino acid-based thioether staples at N/C-terminal domains of multi-modular pullulanase for thermostabilization in enzyme catalysis. Computational and Structural Biotechnology Journal 2021, 19 , 577-585. https://doi.org/10.1016/j.csbj.2020.12.043
Stanislav Mazurenko. Predicting protein stability and solubility changes upon mutations: data perspective. ChemCatChem 2020, 12 (22) , 5590-5598. https://doi.org/10.1002/cctc.202000933
Klara Markova, Klaudia Chmelova, Sérgio M. Marques, Philippe Carpentier, David Bednar, Jiri Damborsky, Martin Marek. Decoding the intricate network of molecular interactions of a hyperstable engineered biocatalyst. Chemical Science 2020, 11 (41) , 11162-11178. https://doi.org/10.1039/D0SC03367G
Qiuming Chen, Yaqin Xiao, Wenli Zhang, Wanmeng Mu. Current methods and applications in computational protein design for food industry. Critical Reviews in Food Science and Nutrition 2020, 60 (19) , 3259-3270. https://doi.org/10.1080/10408398.2019.1682513
Marian H. Hettiaratchi, Matthew J. O’Meara, Teresa R. O’Meara, Andrew J. Pickering, Nitzan Letko-Khait, Molly S. Shoichet. Reengineering biocatalysts: Computational redesign of chondroitinase ABC improves efficacy and stability. Science Advances 2020, 6 (34) https://doi.org/10.1126/sciadv.abc6378
Jiri Hon, Simeon Borko, Jan Stourac, Zbynek Prokop, Jaroslav Zendulka, David Bednar, Tomas Martinek, Jiri Damborsky. EnzymeMiner: automated mining of soluble enzymes with diverse structures, catalytic properties and stabilities. Nucleic Acids Research 2020, 48 (W1) , W104-W109. https://doi.org/10.1093/nar/gkaa372
Sara Arana-Peña, Diego Carballares, Ángel Berenguer-Murcia, Andrés Alcántara, Rafael Rodrigues, Roberto Fernandez-Lafuente. One Pot Use of Combilipases for Full Modification of Oils and Fats: Multifunctional and Heterogeneous Substrates. Catalysts 2020, 10 (6) , 605. https://doi.org/10.3390/catal10060605
Kulandai Arockia Rajesh Packiam, Ramakrishnan Nagasundara Ramanan, Chien Wei Ooi, Lakshminarasimhan Krishnaswamy, Beng Ti Tey. Stepwise optimization of recombinant protein production in Escherichia coli utilizing computational and experimental approaches. Applied Microbiology and Biotechnology 2020, 104 (8) , 3253-3266. https://doi.org/10.1007/s00253-020-10454-w
Yi Zhang, Alberta NA Aryee, Benjamin K Simpson. Current role of in silico approaches for food enzymes. Current Opinion in Food Science 2020, 31 , 63-70. https://doi.org/10.1016/j.cofs.2019.11.003
Pornkanok Pongpamorn, Pratchaya Watthaisong, Panu Pimviriyakul, Aritsara Jaruwat, Narin Lawan, Penchit Chitnumsub, Pimchai Chaiyen. Identification of a Hotspot Residue for Improving the Thermostability of a Flavin‐Dependent Monooxygenase. ChemBioChem 2019, 20 (24) , 3020-3031. https://doi.org/10.1002/cbic.201900413
Susanna Navarro, Salvador Ventura. Computational re-design of protein structures to improve solubility. Expert Opinion on Drug Discovery 2019, 14 (10) , 1077-1088. https://doi.org/10.1080/17460441.2019.1637413
Andy Beier, Jiri Damborsky, Zbynek Prokop. Transhalogenation Catalysed by Haloalkane Dehalogenases Engineered to Stop Natural Pathway at Intermediate. Advanced Synthesis & Catalysis 2019, 361 (11) , 2438-2442. https://doi.org/10.1002/adsc.201900132

Figure 1. Simplified energy landscape with characteristic conformational states accessible from the native-state ensemble of a folded enzyme. Each point on the plane defined by the X and Y axes represents a different conformation of the enzyme. The corresponding value on the Z axis is the free energy of folding, which has been color-coded to depict the spectrum from less probable high-energy states (red) to more probable low-energy states (blue). The catalytic state is readily accessible from the native-state ensemble but clearly separated by a free energy barrier. Catalysis based on a conformational selection model is assumed, which requires a distinct set of conformations prior to substrate binding and catalysis. (48) A reversible transition from the native state to a partially unfolded state via TS₁ is characterized by the free energy difference of folding ΔG₁ and its free energy barrier ΔG₁^⧧. The partially unfolded state can also constitute the starting point for an irreversible unfolding transition via TS₂, leading to the fully unfolded state. Another irreversible pathway emanating from the partially unfolded state leads to an aggregated state, which is often characterized by the interactions of several biomolecules. ΔG₁ and ΔG₂ relate to thermodynamic stability, while ΔG₁^⧧ and ΔG₂^⧧ relate to kinetic stability.
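
The distinction between thermodynamic and kinetic stability in Figure 1 can be made concrete with a small numerical sketch. It is an illustration only and is not taken from the article: it assumes a two-state equilibrium for the thermodynamic part (K = exp(−ΔG/RT)) and the Eyring equation with a transmission coefficient of 1 for the kinetic part. The function names `fraction_unfolded` and `eyring_rate` are hypothetical, and free energies are given in kJ/mol.

```python
import math

R = 8.314462618e-3   # molar gas constant, kJ/(mol*K)
K_B = 1.380649e-23   # Boltzmann constant, J/K
H = 6.62607015e-34   # Planck constant, J*s


def fraction_unfolded(dg_unfold_kj, temp_k=298.15):
    """Equilibrium population of the unfolded state in a two-state model.

    dg_unfold_kj is the free energy of unfolding (positive for a stable fold).
    This is an assumed simplification of the landscape in Figure 1.
    """
    k_eq = math.exp(-dg_unfold_kj / (R * temp_k))  # K = [U]/[N]
    return k_eq / (1.0 + k_eq)


def eyring_rate(dg_barrier_kj, temp_k=298.15):
    """Rate constant (1/s) for crossing a free energy barrier.

    Uses the Eyring equation with transmission coefficient 1 (an assumption).
    """
    return (K_B * temp_k / H) * math.exp(-dg_barrier_kj / (R * temp_k))
```

For example, `fraction_unfolded(0.0)` returns 0.5, because equal free energies give equal populations. Raising the unfolding free energy lowers the unfolded fraction (thermodynamic stability), whereas raising the barrier lowers the unfolding rate (kinetic stability) without changing the equilibrium populations.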

Figure 2. Representative experimental methods to quantify (a–d) protein stability and (e, f) solubility. Curves for a hypothetical wild-type enzyme (black) and an improved variant exhibiting higher stability or solubility (red) are shown. (a) Differential scanning calorimetry (DSC) curve. T_m is the midpoint of the transition, ΔC_p is the difference between the pre- and post-transition baselines, and ΔH is the area under the curve between the pre- and post-transition baselines. (b) Differential scanning fluorimetry (DSF) curve. Fluorescent dyes progressively bind to exposed hydrophobic regions of unfolding proteins, and the fluorescence signal is detected at different temperatures. T_m corresponds to the midpoint value of the stability curve. (c) Far-UV circular dichroism (CD) curve. Tracking the molar ellipticity at a specific wavelength over a wide temperature range reveals the change in secondary structure of an unfolding protein. The midpoint of the sigmoid curve is related to T_m of the protein. (d) Kinetic deactivation curve. For first-order deactivations, a plot of ln(activity) vs time yields a straight line with a slope of −k. The half-life can be calculated using the equation τ_1/2 = ln(2)/k and hence corresponds to the point (τ_1/2, −0.69) on the fitted line. (e) Protein precipitation experiment. The solubility of the folded protein decreases with increasing precipitant concentration. The parameter β is protein-specific and characterizes the dependence of the solubility on the precipitant concentration. (f) Record from ultracentrifugation. In vitro translation (the PURE system) followed by ultracentrifugation allows quantification of protein solubility independent of the proteostatic network of a living cell. The solubility percentage is calculated as the ratio of protein in the supernatant to the total protein measured by autoradiography. (60) Adapted with permission from ref (37). Copyright 2007 Elsevier.
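
The analysis described for panel (d) can be sketched in a few lines. The example below is illustrative and not part of the article: it fits a least-squares line through ln(activity) vs time, takes k as the negative slope, and computes τ_1/2 = ln(2)/k. The function name `fit_first_order_deactivation` and the synthetic data are assumptions. The activity is taken relative to the initial activity, which is what places the half-life at ln(activity) ≈ −0.69.

```python
import math


def fit_first_order_deactivation(times, activities):
    """Fit ln(activity) = intercept - k * t by ordinary least squares.

    Returns (k, half_life). Activities must be positive; they are assumed
    to be relative to the initial activity.
    """
    ys = [math.log(a) for a in activities]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
    slope = sxy / sxx        # slope of ln(activity) vs time equals -k
    k = -slope
    return k, math.log(2) / k


# Synthetic, noise-free data: relative activity decaying with k = 0.05 per minute.
times = [0, 10, 20, 30, 40]
activities = [math.exp(-0.05 * t) for t in times]
k, half_life = fit_first_order_deactivation(times, activities)
```

With real measurements the points scatter around the line, and a clear curvature in the ln(activity) plot would suggest that the first-order assumption does not hold.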

Figure 3. Thermodynamic cycle used to compute the free energy change upon mutation (ΔΔG). ΔΔG is calculated according to the formula ΔΔG = ΔG_mut – ΔG_wt = ΔG_f – ΔG_u. For better illustration, the hypothetical folded and unfolded states of the wild type and a two-point mutant are shown. The respective substitution sites have been color-coded in black (wild type) and red (mutant). Adapted with permission from ref (69). Copyright 2012 Wiley.
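
The identity in the caption follows from free energy being a state function, so the two paths around the cycle from the unfolded wild type to the folded mutant must give the same total. The short derivation below assumes that ΔG_wt and ΔG_mut denote the folding free energies of the wild type and the mutant, and that ΔG_u and ΔG_f denote the free energy changes of introducing the mutation in the unfolded and folded states. The caption does not spell out these definitions, so they are an interpretation.

```latex
\begin{align}
\Delta G_{\mathrm{wt}} + \Delta G_{\mathrm{f}}
  &= \Delta G_{\mathrm{u}} + \Delta G_{\mathrm{mut}} \\
\Delta\Delta G
  = \Delta G_{\mathrm{mut}} - \Delta G_{\mathrm{wt}}
  &= \Delta G_{\mathrm{f}} - \Delta G_{\mathrm{u}}
\end{align}
```

The first line equates the two paths and the second rearranges it. The right-hand side is the form usually evaluated in free energy calculations, because the mutation legs are generally easier to simulate than the folding process itself.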

Figure 4. Workflow of the protein thermostabilization platform FireProt. The hybrid method combines evolutionary- and energy-based approaches and designs stable multiple-point mutants by fundamentally different methods. (45) The user is offered three different designs, two based solely on the energy- and evolution-based approaches and a third combining all of the identified mutations. FireProt has been made available as a fully automated and user-friendly web application (89) and is free of charge for academic users at http://loschmidt.chemi.muni.cz/fireprot.

Figure 5. Workflow of the protein solubilization platform SolubiS. The platform uses free energy calculations performed with FoldX to avoid potentially destabilizing mutations in aggregation-prone regions identified by TANGO. The results are presented in the form of a mutant aggregation and stability spectrum plot. (220) The web server is free of charge for academic users at http://solubis.switchlab.org/.

References

This article references 221 other publications.
1.
  Choi, J.-M.; Han, S.-S.; Kim, H.-S. Industrial Applications of Enzyme Biocatalysis: Current Status and Future Aspects. Biotechnol. Adv. 2015, 33, 1443–1454, DOI: 10.1016/j.biotechadv.2015.02.014
  
  Industrial applications of enzyme biocatalysis: Current status and future aspects
  
  Choi, Jung-Min; Han, Sang-Soo; Kim, Hak-Sung
  
  Biotechnology Advances (2015), 33 (7), 1443-1454CODEN: BIADDD; ISSN:0734-9750. (Elsevier)
  
  A review. Enzymes are the most proficient catalysts, offering much more competitive processes compared to chem. catalysts. The no. of industrial applications for enzymes has exploded in recent years, mainly owing to advances in protein engineering technol. and environmental and economic necessities. Herein, we review recent progress in enzyme biocatalysis, and discuss the trends and strategies that are leading to broader industrial enzyme applications. The challenges and opportunities in developing biocatalytic processes are also discussed.
  
2.
  Mitchell, A. C.; Briquez, P. S.; Hubbell, J. A.; Cochran, J. R. Engineering Growth Factors for Regenerative Medicine Applications. Acta Biomater. 2016, 30, 1–12, DOI: 10.1016/j.actbio.2015.11.007
  
  Engineering growth factors for regenerative medicine applications
  
  Mitchell, Aaron C.; Briquez, Priscilla S.; Hubbell, Jeffrey A.; Cochran, Jennifer R.
  
  Acta Biomaterialia (2016), 30 (), 1-12CODEN: ABCICB; ISSN:1742-7061. (Elsevier Ltd.)
  
  Growth factors are important morphogenetic proteins that instruct cell behavior and guide tissue repair and renewal. Although their therapeutic potential holds great promise in regenerative medicine applications, translation of growth factors into clin. treatments has been hindered by limitations including poor protein stability, low recombinant expression yield, and suboptimal efficacy. This review highlights current tools, technologies, and approaches to design integrated and effective growth factor-based therapies for regenerative medicine applications. The first section describes rational and combinatorial protein engineering approaches that have been utilized to improve growth factor stability, expression yield, biodistribution, and serum half-life, or alter their cell trafficking behavior or receptor binding affinity. The second section highlights elegant biomaterial-based systems, inspired by the natural extracellular matrix milieu, that have been developed for effective spatial and temporal delivery of growth factors to cell surface receptors. Although appearing distinct, these two approaches are highly complementary and involve principles of mol. design and engineering to be considered in parallel when developing optimal materials for clin. applications. Growth factors are promising therapeutic proteins that have the ability to modulate morphogenetic behaviors, including cell survival, proliferation, migration and differentiation. However, the translation of growth factors into clin. therapies has been hindered by properties such as poor protein stability, low recombinant expression yield, and non-physiol. delivery, which lead to suboptimal efficacy and adverse side effects. To address these needs, researchers are employing clever mol. and material engineering and design strategies to both improve the intrinsic properties of growth factors and effectively control their delivery into tissue. 
  This review highlights examples of interdisciplinary tools and technologies used to augment the therapeutic potential of growth factors for clin. applications in regenerative medicine.
  
3.
  Dvořák, P.; Nikel, P. I.; Damborský, J.; de Lorenzo, V. Bioremediation 3.0: Engineering Pollutant-Removing Bacteria in the Times of Systemic Biology. Biotechnol. Adv. 2017, 35, 845–866, DOI: 10.1016/j.biotechadv.2017.08.001
  
  Bioremediation 3.0: Engineering pollutant-removing bacteria in the times of systemic biology
  
  Dvorak, Pavel; Nikel, Pablo I.; Damborsky, Jiri; de Lorenzo, Victor
  
  Biotechnology Advances (2017), 35 (7), 845-866CODEN: BIADDD; ISSN:0734-9750. (Elsevier)
  
  Elimination or mitigation of the toxic effects of chem. waste released to the environment by industrial and urban activities relies largely on the catalytic activities of microorganisms, specifically bacteria. Given their capacity to evolve rapidly, they have the biochem. power to tackle a large no. of mols. mobilized from their geol. repositories through human action (e.g., hydrocarbons, heavy metals) or generated through chem. synthesis (e.g., xenobiotic compds.). Whereas naturally occurring microbes already have considerable ability to remove many environmental pollutants with no external intervention, the onset of genetic engineering in the 1980s allowed the possibility of rational design of bacteria to catabolize specific compds., which could eventually be released into the environment as bioremediation agents. The complexity of this endeavour and the lack of fundamental knowledge nonetheless led to the virtual abandonment of such a recombinant DNA-based bioremediation only a decade later. In a twist of events, the last few years have witnessed the emergence of new systemic fields (including systems and synthetic biol., and metabolic engineering) that allow revisiting the same environmental pollution challenges through fresh and far more powerful approaches. The focus on contaminated sites and chems. has been broadened by the phenomenal problems of anthropogenic emissions of greenhouse gases and the accumulation of plastic waste on a global scale. In this article, we analyze how contemporary systemic biol. is helping to take the design of bioremediation agents back to the core of environmental biotechnol. We inspect a no. of recent strategies for catabolic pathway construction and optimization and we bring them together by proposing an engineering workflow.
  
4.
  Vanacek, P.; Sebestova, E.; Babkova, P.; Bidmanova, S.; Daniel, L.; Dvorak, P.; Stepankova, V.; Chaloupkova, R.; Brezovsky, J.; Prokop, Z.; Damborsky, J. Exploration of Enzyme Diversity by Integrating Bioinformatics with Expression Analysis and Biochemical Characterization. ACS Catal. 2018, 8, 2402–2412, DOI: 10.1021/acscatal.7b03523
  
  Exploration of enzyme diversity by integrating bioinformatics with expression analysis and biochemical characterization
  
  Vanacek, Pavel; Sebestova, Eva; Babkova, Petra; Bidmanova, Sarka; Daniel, Lukas; Dvorak, Pavel; Stepankova, Veronika; Chaloupkova, Radka; Brezovsky, Jan; Prokop, Zbynek; Damborsky, Jiri
  
  ACS Catalysis (2018), 8 (3), 2402-2412CODEN: ACCACS; ISSN:2155-5435. (American Chemical Society)
  
  Millions of protein sequences are being discovered at an incredible pace, representing an inexhaustible source of biocatalysts. Here, we describe an integrated system for automated in silico screening and systematic characterization of diverse family members. The workflow consists of (i) identification and computational characterization of relevant genes by sequence/structural bioinformatics, (ii) expression anal. and activity screening of selected proteins, and (iii) complete biochem./biophys. characterization and was validated against the haloalkane dehalogenase family. The sequence-based search identified 658 potential dehalogenases. The subsequent structural bioinformatics prioritized and selected 20 candidates for exploration of protein functional diversity. Out of these 20, the expression anal. and the robotic screening of enzymic activity provided 8 sol. proteins with dehalogenase activity. The enzymes discovered originated from genetically unrelated Bacteria, Eukaryota, and also Archaea. Overall, the integrated system provided biocatalysts with broad catalytic diversity showing unique substrate specificity profiles, covering a wide range of optimal operational temp. from 20 to 70 °C and an unusually broad pH range from 5.7 to 10. We obtained the most catalytically proficient native haloalkane dehalogenase enzyme to date (k_cat/K_0.5 = 96.8 mM⁻¹ s⁻¹), the most thermostable enzyme with melting temp. 71 °C, three different cold-adapted enzymes showing dehalogenase activity at near-to-zero temps., and a biocatalyst degrading the warfare chem. sulfur mustard. The established strategy can be adapted to other enzyme families for exploration of their biocatalytic diversity in a large sequence space continuously growing due to the use of next-generation sequencing technologies.
  
5.
  Bornscheuer, U. T.; Huisman, G. W.; Kazlauskas, R. J.; Lutz, S.; Moore, J. C.; Robins, K. Engineering the Third Wave of Biocatalysis. Nature 2012, 485, 185–194, DOI: 10.1038/nature11117
  
  Engineering the third wave of biocatalysis
  
  Bornscheuer, U. T.; Huisman, G. W.; Kazlauskas, R. J.; Lutz, S.; Moore, J. C.; Robins, K.
  
  Nature (London, United Kingdom) (2012), 485 (7397), 185-194CODEN: NATUAS; ISSN:0028-0836. (Nature Publishing Group)
  
  A review. Over the past ten years, scientific and technol. advances have established biocatalysis as a practical and environmentally friendly alternative to traditional metallo- and organocatalysis in chem. synthesis, both in the lab. and on an industrial scale. Key advances in DNA sequencing and gene synthesis are at the base of tremendous progress in tailoring biocatalysts by protein engineering and design, and the ability to reorganize enzymes into new biosynthetic pathways. To highlight these achievements, here we discuss applications of protein-engineered biocatalysts ranging from commodity chems. to advanced pharmaceutical intermediates that use enzyme catalysis as a key step.
  
6.
  Tokuriki, N.; Stricher, F.; Serrano, L.; Tawfik, D. S. How Protein Stability and New Functions Trade Off. PLoS Comput. Biol. 2008, 4, e1000002, DOI: 10.1371/journal.pcbi.1000002
  
  How protein stability and new functions trade off
  
  Tokuriki Nobuhiko; Stricher Francois; Serrano Luis; Tawfik Dan S
  
  PLoS computational biology (2008), 4 (2), e1000002 ISSN:.
  
  Numerous studies have noted that the evolution of new enzymatic specificities is accompanied by loss of the protein's thermodynamic stability (ΔΔG), thus suggesting a tradeoff between the acquisition of new enzymatic functions and stability. However, since most mutations are destabilizing (ΔΔG > 0), one should ask how destabilizing mutations that confer new or altered enzymatic functions relative to all other mutations are. We applied ΔΔG computations by FoldX to analyze the effects of 548 mutations that arose from the directed evolution of 22 different enzymes. The stability effects, location, and type of function-altering mutations were compared to ΔΔG changes arising from all possible point mutations in the same enzymes. We found that mutations that modulate enzymatic functions are mostly destabilizing (average ΔΔG = +0.9 kcal/mol), and are almost as destabilizing as the "average" mutation in these enzymes (+1.3 kcal/mol). Although their stability effects are not as dramatic as in key catalytic residues, mutations that modify the substrate binding pockets, and thus mediate new enzymatic specificities, place a larger stability burden than surface mutations that underline neutral, non-adaptive evolutionary changes. How are the destabilizing effects of functional mutations balanced to enable adaptation? Our analysis also indicated that many mutations that appear in directed evolution variants with no obvious role in the new function exert stabilizing effects that may compensate for the destabilizing effects of the crucial function-altering mutations. Thus, the evolution of new enzymatic activities, both in nature and in the laboratory, is dependent on the compensatory, stabilizing effect of apparently "silent" mutations in regions of the protein that are irrelevant to its function.
  
7. 7
  Dellus-Gur, E.; Toth-Petroczy, A.; Elias, M.; Tawfik, D. S. What Makes a Protein Fold Amenable to Functional Innovation? Fold Polarity and Stability Trade-Offs. J. Mol. Biol. 2013, 425, 2609– 2621, DOI: 10.1016/j.jmb.2013.03.033
  
  7
  What makes a protein fold amenable to functional innovation? Fold polarity and stability trade-offs
  
  Dellus-Gur, Eynat; Toth-Petroczy, Agnes; Elias, Mikael; Tawfik, Dan S.
  
  Journal of Molecular Biology (2013), 425 (14), 2609-2621CODEN: JMOBAK; ISSN:0022-2836. (Elsevier Ltd.)
  
  Protein evolvability includes 2 elements, robustness (or neutrality, mutations having no effect) and innovability (mutations readily inducing new functions). How are these 2 conflicting demands bridged? Does the ability to bridge them relate to the observation that certain folds, such as TIM barrels, accommodate numerous functions, whereas other folds support only one? Here, the authors hypothesized that the key to innovability is polarity, an active site composed of flexible, loosely packed loops alongside a well-sepd., highly ordered scaffold. The authors showed that highly stabilized variants of TEM-1 β-lactamase exhibited selective rigidification of the enzyme's scaffold while the active site loops maintained their conformational plasticity. Polarity therefore results in stabilizing, compensatory mutations not trading off, but instead promoting the acquisition of new activities. Indeed, computational anal. indicated that in folds that accommodate only one function throughout evolution, e.g., dihydrofolate reductase, ≥60% of the active site residues belonged to the scaffold. In contrast, folds assocd. with multiple functions such as the TIM barrel showed high scaffold-active site polarity (∼20% of the active site comprised scaffold residues) and >2-fold higher rates of sequence divergence at active site positions. Thus, this work suggests structural measures of fold polarity that appear to be correlated with innovability, thereby providing new insights regarding protein evolution, design, and engineering.
  
8. 8
  Johansson, K. E.; Johansen, N. T.; Christensen, S.; Horowitz, S.; Bardwell, J. C. A.; Olsen, J. G.; Willemoës, M.; Lindorff-Larsen, K.; Ferkinghoff-Borg, J.; Hamelryck, T.; Winther, J. R. Computational Redesign of Thioredoxin Is Hypersensitive toward Minor Conformational Changes in the Backbone Template. J. Mol. Biol. 2016, 428, 4361– 4377, DOI: 10.1016/j.jmb.2016.09.013
  
  8
  Computational Redesign of Thioredoxin Is Hypersensitive toward Minor Conformational Changes in the Backbone Template
  
  Johansson, Kristoffer E.; Johansen, Nicolai Tidemand; Christensen, Signe; Horowitz, Scott; Bardwell, James C. A.; Olsen, Johan G.; Willemoes, Martin; Lindorff-Larsen, Kresten; Ferkinghoff-Borg, Jesper; Hamelryck, Thomas; Winther, Jakob R.
  
  Journal of Molecular Biology (2016), 428 (21), 4361-4377CODEN: JMOBAK; ISSN:0022-2836. (Elsevier Ltd.)
  
  Despite the development of powerful computational tools, the full-sequence design of proteins still remains a challenging task. To investigate the limits and capabilities of computational tools, we conducted a study of the ability of the program Rosetta to predict sequences that recreate the authentic fold of thioredoxin. Focusing on the influence of conformational details in the template structures, we based our study on 8 exptl. detd. template structures and generated 120 designs from each. For exptl. evaluation, we chose six sequences from each of the eight templates by objective criteria. The 48 selected sequences were evaluated based on their progressive ability to (1) produce sol. protein in Escherichia coli and (2) yield stable monomeric protein, and (3) on the ability of the stable, sol. proteins to adopt the target fold. Of the 48 designs, we were able to synthesize 32, 20 of which resulted in sol. protein. Of these, only two were sufficiently stable to be purified. An X-ray crystal structure was solved for one of the designs, revealing a close resemblance to the target structure. We found a significant difference among the eight template structures to realize the above three criteria despite their high structural similarity. Thus, in order to improve the success rate of computational full-sequence design methods, we recommend that multiple template structures are used. Furthermore, this study shows that special care should be taken when optimizing the geometry of a structure prior to computational design when using a method that is based on rigid conformations.
  
9. 9
  Arabnejad, H.; Dal Lago, M.; Jekel, P. A.; Floor, R. J.; Thunnissen, A.-M. W. H.; Terwisscha van Scheltinga, A. C.; Wijma, H. J.; Janssen, D. B. A Robust Cosolvent-Compatible Halohydrin Dehalogenase by Computational Library Design. Protein Eng., Des. Sel. 2017, 30, 175– 189, DOI: 10.1093/protein/gzw068
  
  9
  A robust cosolvent-compatible halohydrin dehalogenase by computational library design
  
  Arabnejad, Hesam; Lago, Marco Dal; Jekel, Peter A.; Floor, Robert J.; Thunnissen, Andy-Mark W. H.; van Scheltinga, Anke C. Terwisscha; Wijma, Hein J.; Janssen, Dick B.
  
  Protein Engineering, Design & Selection (2017), 30 (3), 175-189CODEN: PEDSBR; ISSN:1741-0134. (Oxford University Press)
  
  To improve the applicability of halohydrin dehalogenase as a catalyst for reactions in the presence of org. cosolvents, we explored a computational library design strategy (Framework for Rapid Enzyme Stabilization by Computational libraries) that involves discovery and in silico evaluation of stabilizing mutations. Energy calcns., disulfide bond predictions and mol. dynamics simulations identified 218 point mutations and 35 disulfide bonds with predicted stabilizing effects. Expts. confirmed 29 stabilizing point mutations, most of which were located in two distinct regions, whereas introduction of disulfide bonds was not effective. Combining the best mutations resulted in a 12-fold mutant (HheC-H12) with a 28°C higher apparent melting temp. and a remarkable increase in resistance to cosolvents. This variant also showed a higher optimum temp. for catalysis while activity at low temp. was preserved. Mutant H12 was used as a template for the introduction of mutations that enhance enantioselectivity or activity. Crystal structures showed that the structural changes in the H12 mutant mostly agreed with the computational predictions and that the enhanced stability was mainly due to mutations that redistributed surface charges and improved interactions between subunits, the latter including better interactions of water mols. at the subunit interfaces.
  
10. 10
  Wyganowski, K. T.; Kaltenbach, M.; Tokuriki, N. GroEL/ES Buffering and Compensatory Mutations Promote Protein Evolution by Stabilizing Folding Intermediates. J. Mol. Biol. 2013, 425, 3403– 3414, DOI: 10.1016/j.jmb.2013.06.028
  
  10
  GroEL/ES Buffering and Compensatory Mutations Promote Protein Evolution by Stabilizing Folding Intermediates
  
  Wyganowski, Kirsten T.; Kaltenbach, Miriam; Tokuriki, Nobuhiko
  
  Journal of Molecular Biology (2013), 425 (18), 3403-3414CODEN: JMOBAK; ISSN:0022-2836. (Elsevier Ltd.)
  
  Maintaining stability is a major constraint in protein evolution because most mutations are destabilizing. Buffering and/or compensatory mechanisms that counteract this progressive destabilization during functional adaptation are pivotal for protein evolution as well as protein engineering. However, the interplay of these two mechanisms during a full evolutionary trajectory has never been explored. Here, we unravel such dynamics during the lab. evolution of a phosphotriesterase into an arylesterase. A controllable GroEL/ES chaperone co-expression system enabled us to vary the selection environment between buffering and compensatory, which smoothened the trajectory along the fitness landscape to achieve a >10⁴-fold increase in arylesterase activity. Biophys. characterization revealed that, in contrast to prevalent models of protein stability and evolution, the variants' sol. cellular expression did not correlate with in vitro stability, and compensatory mutations were linked to a stabilization of folding intermediates. Thus, folding kinetics in the cell are a key feature of protein evolvability.
  
11. 11
  Lawrence, P. B.; Gavrilov, Y.; Matthews, S. S.; Langlois, M. I.; Shental-Bechor, D.; Greenblatt, H. M.; Pandey, B. K.; Smith, M. S.; Paxman, R.; Torgerson, C. D.; Merrell, J. P.; Ritz, C. C.; Prigozhin, M. B.; Levy, Y.; Price, J. L. Criteria for Selecting PEGylation Sites on Proteins for Higher Thermodynamic and Proteolytic Stability. J. Am. Chem. Soc. 2014, 136, 17547– 17560, DOI: 10.1021/ja5095183
  
  11
  Criteria for Selecting PEGylation Sites on Proteins for Higher Thermodynamic and Proteolytic Stability
  
  Lawrence, Paul B.; Gavrilov, Yulian; Matthews, Sam S.; Langlois, Minnie I.; Shental-Bechor, Dalit; Greenblatt, Harry M.; Pandey, Brijesh K.; Smith, Mason S.; Paxman, Ryan; Torgerson, Chad D.; Merrell, Jacob P.; Ritz, Cameron C.; Prigozhin, Maxim B.; Levy, Yaakov; Price, Joshua L.
  
  Journal of the American Chemical Society (2014), 136 (50), 17547-17560CODEN: JACSAT; ISSN:0002-7863. (American Chemical Society)
  
  PEGylation of protein side chains has been used for >30 years to enhance the pharmacokinetic properties of protein drugs. However, there are no structure- or sequence-based guidelines for selecting sites that provide optimal PEG-based pharmacokinetic enhancement with minimal losses to biol. activity. The authors hypothesize that globally optimal PEGylation sites are characterized by the ability of the PEG oligomer to increase protein conformational stability; however, the current understanding of how PEG influences the conformational stability of proteins is incomplete. Here the authors use the WW domain of the human protein Pin 1 (WW) as a model system to probe the impact of PEG on protein conformational stability. Using a combination of exptl. and theor. approaches, the authors develop a structure-based method for predicting which sites within WW are most likely to experience PEG-based stabilization, and this method correctly predicts the location of a stabilizing PEGylation site within the chicken Src SH3 domain. PEG-based stabilization in WW is assocd. with enhanced resistance to proteolysis, is entropic in origin, and likely involves disruption by PEG of the network of hydrogen-bound solvent mols. that surround the protein. The authors' results highlight the possibility of using modern site-specific PEGylation techniques to install PEG oligomers at predetd. locations where PEG will provide optimal increases in conformational and proteolytic stability.
  
12. 12
  Rueda, N.; Dos Santos, J. C. S.; Ortiz, C.; Torres, R.; Barbosa, O.; Rodrigues, R. C.; Berenguer-Murcia, Á.; Fernandez-Lafuente, R. Chemical Modification in the Design of Immobilized Enzyme Biocatalysts: Drawbacks and Opportunities. Chem. Rec. 2016, 16, 1436– 1455, DOI: 10.1002/tcr.201600007
  
  12
  Chemical Modification in the Design of Immobilized Enzyme Biocatalysts: Drawbacks and Opportunities
  
  Rueda, Nazzoly; dos Santos, Jose C. S.; Ortiz, Claudia; Torres, Rodrigo; Barbosa, Oveimar; Rodrigues, Rafael C.; Berenguer-Murcia, Angel; Fernandez-Lafuente, Roberto
  
  Chemical Record (2016), 16 (3), 1436-1455CODEN: CRHEAK; ISSN:1528-0691. (Wiley-VCH Verlag GmbH & Co. KGaA)
  
  Chem. modification of enzymes and immobilization used to be considered as sep. ways to improve enzyme properties. This review shows how the coupled use of both tools may greatly improve the final biocatalyst performance. Chem. modification of a previously immobilized enzyme is far simpler and easier to control than the modification of the free enzyme. Moreover, if protein modification is performed to improve its immobilization (enriching the enzyme in reactive groups), the final features of the immobilized enzyme may be greatly improved. Chem. modification may be directed to improve enzyme stability, but also to improve selectivity, specificity, activity, and even cell penetrability. Coupling of immobilization and chem. modification with site-directed mutagenesis is a powerful instrument to obtain fully controlled modification. Some new ideas such as photoreceptive enzyme modifiers that change their phys. properties under UV exposure are discussed.
  
13. 13
  Stepankova, V.; Bidmanova, S.; Koudelakova, T.; Prokop, Z.; Chaloupkova, R.; Damborsky, J. Strategies for Stabilization of Enzymes in Organic Solvents. ACS Catal. 2013, 3, 2823– 2836, DOI: 10.1021/cs400684x
  
  13
  Strategies for Stabilization of Enzymes in Organic Solvents
  
  Stepankova, Veronika; Bidmanova, Sarka; Koudelakova, Tana; Prokop, Zbynek; Chaloupkova, Radka; Damborsky, Jiri
  
  ACS Catalysis (2013), 3 (12), 2823-2836CODEN: ACCACS; ISSN:2155-5435. (American Chemical Society)
  
  A review. One of the major barriers to the use of enzymes in industrial biotechnol. is their insufficient stability under processing conditions. The use of org. solvent systems instead of aq. media for enzymic reactions offers numerous advantages, such as increased soly. of hydrophobic substrates or suppression of water-dependent side reactions. For example, reverse hydrolysis reactions that form esters from acids and alcs. become thermodynamically favorable. However, org. solvents often inactivate enzymes. Industry and academia have devoted considerable effort into developing effective strategies to enhance the lifetime of enzymes in the presence of org. solvents. The strategies can be grouped into three main categories: (i) isolation of novel enzymes functioning under extreme conditions, (ii) modification of enzyme structures to increase their resistance toward nonconventional media, and (iii) modification of the solvent environment to decrease its denaturing effect on enzymes. Here we discuss successful examples representing each of these categories and summarize their advantages and disadvantages. Finally, we highlight some potential future research directions in the field, such as investigation of novel nanomaterials for immobilization, wider application of computational tools for semirational prediction of stabilizing mutations, knowledge-driven modification of key structural elements learned from successfully engineered proteins, and replacement of volatile org. solvents by ionic liqs. and deep eutectic solvents.
  
14. 14
  Butt, T. R.; Edavettal, S. C.; Hall, J. P.; Mattern, M. R. SUMO Fusion Technology for Difficult-to-Express Proteins. Protein Expression Purif. 2005, 43, 1– 9, DOI: 10.1016/j.pep.2005.03.016
  
  14
  SUMO fusion technology for difficult-to-express proteins
  
  Butt, Tauseef R.; Edavettal, Suzanne C.; Hall, John P.; Mattern, Michael R.
  
  Protein Expression and Purification (2005), 43 (1), 1-9CODEN: PEXPEJ; ISSN:1046-5928. (Elsevier)
  
  A review. The demands of structural and functional genomics for large quantities of sol., properly folded proteins in heterologous hosts have been aided by advancements in the field of protein prodn. and purifn. Escherichia coli, the preferred host for recombinant protein expression, presents many challenges which must be surmounted in order to over-express heterologous proteins. These challenges include the proteolytic degrdn. of target proteins, protein misfolding, poor soly., and the necessity for good purifn. methodologies. Gene fusion technologies have been able to improve heterologous expression by overcoming many of these challenges. The ability of gene fusions to improve expression, soly., purifn., and decrease proteolytic degrdn. will be discussed in this review. The main disadvantage, cleaving the protein fusion, will also be addressed. Focus will be given to the newly described SUMO fusion system and the improvements that this technol. has advanced over traditional gene fusion systems.
  
15. 15
  LaVallie, E. R.; DiBlasio, E. A.; Kovacic, S.; Grant, K. L.; Schendel, P. F.; McCoy, J. M. A Thioredoxin Gene Fusion Expression System That Circumvents Inclusion Body Formation in the E. coli Cytoplasm. Nat. Biotechnol. 1993, 11, 187– 193, DOI: 10.1038/nbt0293-187
  
  15
  A thioredoxin gene fusion expression system that circumvents inclusion body formation in the E. coli cytoplasm
  
  LaVallie, Edward R.; DiBlasio, Elizabeth A.; Kovacic, Sharlotte; Grant, Kathleen L.; Schendel, Paul F.; McCoy, John M.
  
  Bio/Technology (1993), 11 (2), 187-93CODEN: BTCHDA; ISSN:0733-222X.
  
  A versatile Escherichia coli expression system was developed based on the use of E. coli thioredoxin (trxA) as a gene fusion partner. The broad utility of the system is illustrated by the prodn. of a variety of mammalian cytokines and growth factors as thioredoxin fusion proteins. Although many of these cytokines previously have been produced in E. coli as insol. aggregates or inclusion bodies, as thioredoxin fusions they can be made in sol. forms that are biol. active. In general, linkage to thioredoxin dramatically increases the soly. of heterologous proteins synthesized in the E. coli cytoplasm, and thioredoxin fusion proteins usually accumulate to high levels. Two addnl. properties of E. coli thioredoxin, its ability to be specifically released from the E. coli cytoplasm by osmotic shock or freeze/thaw treatments and its intrinsic thermal stability, are retained by some fusions and provide convenient purifn. steps. The active-site loop of E. coli thioredoxin can be used as a general site for small peptide insertions, allowing for the high level prodn. of sol. peptides in the E. coli cytoplasm.
  
16. 16
  Bloom, J. D.; Labthavikul, S. T.; Otey, C. R.; Arnold, F. H. Protein Stability Promotes Evolvability. Proc. Natl. Acad. Sci. U. S. A. 2006, 103, 5869– 5874, DOI: 10.1073/pnas.0510098103
  
  16
  Protein stability promotes evolvability
  
  Bloom, Jesse D.; Labthavikul, Sy T.; Otey, Christopher R.; Arnold, Frances H.
  
  Proceedings of the National Academy of Sciences of the United States of America (2006), 103 (15), 5869-5874CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)
  
  The biophys. properties that enable proteins to so readily evolve to perform diverse biochem. tasks are largely unknown. Here, we show that a protein's capacity to evolve is enhanced by the mutational robustness conferred by extra stability. We use simulations with model lattice proteins to demonstrate how extra stability increases evolvability by allowing a protein to accept a wider range of beneficial mutations while still folding to its native structure. We confirm this view exptl. by mutating marginally stable and thermostable variants of cytochrome P450 BM3. Mutants of the stabilized parent were more likely to exhibit new or improved functions. Only the stabilized P450 parent could tolerate the highly destabilizing mutations needed to confer novel activities such as hydroxylating the antiinflammatory drug naproxen. Our work establishes a crucial link between protein stability and evolution. We show that we can exploit this link to discover protein functions, and we suggest how natural evolution might do the same.
  
17. 17
  Sormanni, P.; Aprile, F. A.; Vendruscolo, M. The CamSol Method of Rational Design of Protein Mutants with Enhanced Solubility. J. Mol. Biol. 2015, 427, 478– 490, DOI: 10.1016/j.jmb.2014.09.026
  
  17
  The CamSol Method of Rational Design of Protein Mutants with Enhanced Solubility
  
  Sormanni, Pietro; Aprile, Francesco A.; Vendruscolo, Michele
  
  Journal of Molecular Biology (2015), 427 (2), 478-490CODEN: JMOBAK; ISSN:0022-2836. (Elsevier Ltd.)
  
  Protein soly. is often an essential requirement in biotechnol. and biomedical applications. Great advances in understanding the principles that det. this specific property of proteins have been made during the past decade, in particular concerning the physicochem. characteristics of their constituent amino acids. By exploiting these advances, we present the CamSol method for the rational design of protein variants with enhanced soly. The method works by performing a rapid computational screening of tens of thousands of mutations to identify those with the greatest impact on the soly. of the target protein while maintaining its native state and biol. activity. The application to a single-domain antibody that targets the Alzheimer's Aβ peptide demonstrates that the method predicts with great accuracy soly. changes upon mutation, thus offering a cost-effective strategy to help the prodn. of sol. proteins for academic and industrial purposes.
  
18. 18
  Ganesan, A.; Siekierska, A.; Beerten, J.; Brams, M.; Van Durme, J.; De Baets, G.; Van der Kant, R.; Gallardo, R.; Ramakers, M.; Langenberg, T.; Wilkinson, H.; De Smet, F.; Ulens, C.; Rousseau, F.; Schymkowitz, J. Structural Hot Spots for the Solubility of Globular Proteins. Nat. Commun. 2016, 7, 10816, DOI: 10.1038/ncomms10816
  
  18
  Structural hot spots for the solubility of globular proteins
  
  Ganesan, Ashok; Siekierska, Aleksandra; Beerten, Jacinte; Brams, Marijke; Van Durme, Joost; De Baets, Greet; Van der Kant, Rob; Gallardo, Rodrigo; Ramakers, Meine; Langenberg, Tobias; Wilkinson, Hannah; De Smet, Frederik; Ulens, Chris; Rousseau, Frederic; Schymkowitz, Joost
  
  Nature Communications (2016), 7 (), 10816CODEN: NCAOBW; ISSN:2041-1723. (Nature Publishing Group)
  
  Natural selection shapes protein soly. to physiol. requirements, and recombinant applications that require higher protein concns. are often problematic. This raises the question of whether the soly. of natural protein sequences can be improved. We here show an anti-correlation between the no. of aggregation prone regions (APRs) in a protein sequence and its soly., suggesting that mutational suppression of APRs provides a simple strategy to increase protein soly. We show that mutations at specific positions within a protein structure can act as APR suppressors without affecting protein stability. These hot spots for protein soly. are both structure and sequence dependent but can be computationally predicted. We demonstrate this by reducing the aggregation of human α-galactosidase and protective antigen of Bacillus anthracis through mutation. Our results indicate that many proteins possess hot spots that allow protein soly. to be adapted independently of structure and function.
  
19. 19
  Zeymer, C.; Hilvert, D. Directed Evolution of Protein Catalysts. Annu. Rev. Biochem. 2018, 87, 131– 157, DOI: 10.1146/annurev-biochem-062917-012034
  
  19
  Directed Evolution of Protein Catalysts
  
  Zeymer, Cathleen; Hilvert, Donald
  
  Annual Review of Biochemistry (2018), 87 (), 131-157CODEN: ARBOAW; ISSN:0066-4154. (Annual Reviews)
  
  A review. Directed evolution is a powerful technique for generating tailor-made enzymes for a wide range of biocatalytic applications. Following the principles of natural evolution, iterative cycles of mutagenesis and screening or selection are applied to modify protein properties, enhance catalytic activities, or develop completely new protein catalysts for non-natural chem. transformations. This review briefly surveys the exptl. methods used to generate genetic diversity and screen or select for improved enzyme variants. Emphasis is placed on a key challenge, namely how to generate novel catalytic activities that expand the scope of natural reactions. Two particularly effective strategies, exploiting catalytic promiscuity and rational design, are illustrated by representative examples of successfully evolved enzymes. Opportunities for extending these approaches to more complex biocatalytic systems are also considered.
  
20. 20
  Starr, T. N.; Thornton, J. W. Epistasis in Protein Evolution. Protein Sci. 2016, 25, 1204– 1218, DOI: 10.1002/pro.2897
  
  20
  Epistasis in protein evolution
  
  Starr, Tyler N.; Thornton, Joseph W.
  
  Protein Science (2016), 25 (7), 1204-1218CODEN: PRCIEI; ISSN:1469-896X. (Wiley-Blackwell)
  
  The structure, function, and evolution of proteins depend on phys. and genetic interactions among amino acids. Recent studies have used new strategies to explore the prevalence, biochem. mechanisms, and evolutionary implications of these interactions, called epistasis, within proteins. Here we describe an emerging picture of pervasive epistasis in which the phys. and biol. effects of mutations change over the course of evolution in a lineage-specific fashion. Epistasis can restrict the trajectories available to an evolving protein or open new paths to sequences and functions that would otherwise have been inaccessible. We describe two broad classes of epistatic interactions, which arise from different phys. mechanisms and have different effects on evolutionary processes. Specific epistasis, in which one mutation influences the phenotypic effect of few other mutations, is caused by direct and indirect phys. interactions between mutations, which nonadditively change the protein's phys. properties, such as conformation, stability, or affinity for ligands. In contrast, nonspecific epistasis describes mutations that modify the effect of many others; these typically behave additively with respect to the phys. properties of a protein but exhibit epistasis because of a nonlinear relationship between the phys. properties and their biol. effects, such as function or fitness. Both types of interaction are rampant, but specific epistasis has stronger effects on the rate and outcomes of evolution, because it imposes stricter constraints and modulates evolutionary potential more dramatically; it therefore makes evolution more contingent on low-probability historical events and leaves stronger marks on the sequences, structures, and functions of protein families.
  
21. 21
  Goldsmith, M.; Tawfik, D. S. Enzyme Engineering: Reaching the Maximal Catalytic Efficiency Peak. Curr. Opin. Struct. Biol. 2017, 47, 140– 150, DOI: 10.1016/j.sbi.2017.09.002
  
  21
  Enzyme engineering: reaching the maximal catalytic efficiency peak
  
  Goldsmith, Moshe; Tawfik, Dan S.
  
  Current Opinion in Structural Biology (2017), 47 (), 140-150CODEN: COSBEF; ISSN:0959-440X. (Elsevier Ltd.)
  
  A review. The practical need for highly efficient enzymes presents new challenges in enzyme engineering, in particular, the need to improve catalytic turnover (kcat) or efficiency (kcat/KM) by several orders of magnitude. However, optimizing catalysis demands navigation through complex and rugged fitness landscapes, with optimization trajectories often leading to strong diminishing returns and dead-ends. When no further improvements are obsd. in library screens or selections, it remains unclear whether the maximal catalytic efficiency of the enzyme (the catalytic 'fitness peak') has been reached; or perhaps, an alternative combination of mutations exists that could yield addnl. improvements. Here, we discuss fundamental aspects of the process of catalytic optimization, and offer practical solns. with respect to overcoming optimization plateaus.
  
22. 22
  Currin, A.; Swainston, N.; Day, P. J.; Kell, D. B. Synthetic Biology for the Directed Evolution of Protein Biocatalysts: Navigating Sequence Space Intelligently. Chem. Soc. Rev. 2015, 44, 1172– 1239, DOI: 10.1039/C4CS00351A
  
  22
  Synthetic biology for the directed evolution of protein biocatalysts: navigating sequence space intelligently
  
  Currin, Andrew; Swainston, Neil; Day, Philip J.; Kell, Douglas B.
  
  Chemical Society Reviews (2015), 44 (5), 1172-1239CODEN: CSRVBR; ISSN:0306-0012. (Royal Society of Chemistry)
  
  The amino acid sequence of a protein affects both its structure and its function. Thus, the ability to modify the sequence, and hence the structure and activity, of individual proteins in a systematic way, opens up many opportunities, both scientifically and (as we focus on here) for exploitation in biocatalysis. Modern methods of synthetic biol., whereby increasingly large sequences of DNA can be synthesized de novo, allow an unprecedented ability to engineer proteins with novel functions. However, the no. of possible proteins is far too large to test individually, so we need means for navigating the 'search space' of possible protein sequences efficiently and reliably in order to find desirable activities and other properties. Enzymologists distinguish binding (Kd) and catalytic (kcat) steps. In a similar way, judicious strategies have blended design (for binding, specificity and active site modeling) with the more empirical methods of classical directed evolution (DE) for improving kcat (where natural evolution rarely seeks the highest values), esp. with regard to residues distant from the active site and where the functional linkages underpinning enzyme dynamics are both unknown and hard to predict. Epistasis (where the 'best' amino acid at one site depends on that or those at others) is a notable feature of directed evolution. The aim of this review is to highlight some of the approaches that are being developed to allow us to use directed evolution to improve enzyme properties, often dramatically. We note that directed evolution differs in a no. of ways from natural evolution, including in particular the available mechanisms and the likely selection pressures. Thus, we stress the opportunities afforded by techniques that enable one to map sequence to (structure and) activity in silico, as an effective means of modeling and exploring protein landscapes. 
Because known landscapes may be assessed and reasoned about as a whole, simultaneously, this offers opportunities for protein improvement not readily available to natural evolution on rapid timescales. Intelligent landscape navigation, informed by sequence-activity relationships and coupled to the emerging methods of synthetic biol., offers scope for the development of novel biocatalysts that are both highly active and robust.
  
23.
  Rocklin, G. J.; Chidyausiku, T. M.; Goreshnik, I.; Ford, A.; Houliston, S.; Lemak, A.; Carter, L.; Ravichandran, R.; Mulligan, V. K.; Chevalier, A.; Arrowsmith, C. H.; Baker, D. Global Analysis of Protein Folding Using Massively Parallel Design, Synthesis, and Testing. Science 2017, 357, 168– 175, DOI: 10.1126/science.aan0693
  
  Global analysis of protein folding using massively parallel design, synthesis, and testing
  
  Rocklin, Gabriel J.; Chidyausiku, Tamuka M.; Goreshnik, Inna; Ford, Alex; Houliston, Scott; Lemak, Alexander; Carter, Lauren; Ravichandran, Rashmi; Mulligan, Vikram K.; Chevalier, Aaron; Arrowsmith, Cheryl H.; Baker, David
  
  Science (Washington, DC, United States) (2017), 357 (6347), 168-175CODEN: SCIEAS; ISSN:0036-8075. (American Association for the Advancement of Science)
  
  Proteins fold into unique native structures stabilized by thousands of weak interactions that collectively overcome the entropic cost of folding. Although these forces are "encoded" in the thousands of known protein structures, "decoding" them is challenging because of the complexity of natural proteins that have evolved for function, not stability. We combined computational protein design, next-generation gene synthesis, and a high-throughput protease susceptibility assay to measure folding and stability for more than 15,000 de novo designed miniproteins, 1000 natural proteins, 10,000 point mutants, and 30,000 neg. control sequences. This anal. identified more than 2500 stable designed proteins in four basic folds - a no. sufficient to enable us to systematically examine how sequence dets. folding and stability in uncharted protein space. Iteration between design and expt. increased the design success rate from 6% to 47%, produced stable proteins unlike those found in nature for topologies where design was initially unsuccessful, and revealed subtle contributions to stability as designs became increasingly optimized. Our approach achieves the long-standing goal of a tight feedback cycle between computation and expt. and has the potential to transform computational protein design into a data-driven science.
  
24.
  Sumbalova, L.; Stourac, J.; Martinek, T.; Bednar, D.; Damborsky, J. HotSpot Wizard 3.0: Web Server for Automated Design of Mutations and Smart Libraries Based on Sequence Input Information. Nucleic Acids Res. 2018, 46, W356– W362, DOI: 10.1093/nar/gky417
  
  HotSpot Wizard 3.0: web server for automated design of mutations and smart libraries based on sequence input information
  
  Sumbalova, Lenka; Stourac, Jan; Martinek, Tomas; Bednar, David; Damborsky, Jiri
  
  Nucleic Acids Research (2018), 46 (W1), W356-W362CODEN: NARHAD; ISSN:1362-4962. (Oxford University Press)
  
  HotSpot Wizard is a web server used for the automated identification of hotspots in semi-rational protein design to give improved protein stability, catalytic activity, substrate specificity and enantioselectivity. Since there are three orders of magnitude fewer protein structures than sequences in bioinformatic databases, the major limitation to the usability of previous versions was the requirement for the protein structure to be a compulsory input for the calcn. HotSpot Wizard 3.0 now accepts the protein sequence as input data. The protein structure for the query sequence is obtained either from eight repositories of homol. models or is modeled using Modeller and I-Tasser. The quality of the models is then evaluated using three quality assessment tools: WHAT CHECK, PROCHECK and MolProbity. During follow-up analyses, the system automatically warns the users whenever they attempt to redesign poorly predicted parts of their homol. models. The second main limitation of HotSpot Wizard's predictions is that it identifies suitable positions for mutagenesis, but does not provide any reliable advice on particular substitutions. A new module for the estn. of thermodn. stabilities using the Rosetta and FoldX suites has been introduced which prevents destabilizing mutations among pre-selected variants entering exptl. testing.
  
25.
  Kuipers, R. K.; Joosten, H.-J.; van Berkel, W. J. H.; Leferink, N. G. H.; Rooijen, E.; Ittmann, E.; van Zimmeren, F.; Jochens, H.; Bornscheuer, U.; Vriend, G.; Martins dos Santos, V. A. P.; Schaap, P. J. 3DM: Systematic Analysis of Heterogeneous Superfamily Data to Discover Protein Functionalities. Proteins: Struct., Funct., Bioinf. 2010, 78, 2101– 2113, DOI: 10.1002/prot.22725
  
  3DM: systematic analysis of heterogeneous superfamily data to discover protein functionalities
  
  Kuipers, Remko K.; Joosten, Henk-Jan; van Berkel, Willem J. H.; Leferink, Nicole G. H.; Rooijen, Erik; Ittmann, Erik; van Zimmeren, Frank; Jochens, Helge; Bornscheuer, Uwe; Vriend, Gert; Martins dos Santos, Vitor A. P.; Schaap, Peter J.
  
  Proteins: Structure, Function, and Bioinformatics (2010), 78 (9), 2101-2113CODEN: PSFBAF ISSN:. (Wiley-Liss, Inc.)
  
  Ten years of experience with mol. class-specific information systems (MCSIS) such as with the hand-curated G protein-coupled receptor database (GPCRDB) or the semiautomatically generated nuclear receptor database has made clear that a wide variety of questions can be answered when protein-related data from many different origins can be flexibly combined. MCSISes revolve around a multiple sequence alignment (MSA) that includes "all" available sequences from the entire superfamily, and it has been shown at many occasions that the quality of these alignments is the most crucial aspect of the MCSIS approach. We describe here a system called 3DM that can automatically build an entire MCSIS. 3DM bases the MSA on a multiple structure alignment, which implies that the availability of a large no. of superfamily members with a known three-dimensional structure is a requirement for 3DM to succeed well. Thirteen MCSISes were constructed and placed on the Internet for examn. These systems have been instrumental in a large series of research projects related to enzyme activity or the understanding and engineering of specificity, protein stability engineering, DNA-diagnostics, drug design, and so forth.
  
26.
  Reetz, M. T.; Carballeira, J. D. Iterative Saturation Mutagenesis (ISM) for Rapid Directed Evolution of Functional Enzymes. Nat. Protoc. 2007, 2, 891– 903, DOI: 10.1038/nprot.2007.72
  
  Iterative saturation mutagenesis (ISM) for rapid directed evolution of functional enzymes
  
  Reetz, Manfred T.; Carballeira, Jose Daniel
  
  Nature Protocols (2007), 2 (4), 891-903CODEN: NPARDW; ISSN:1750-2799. (Nature Publishing Group)
  
  Iterative satn. mutagenesis (ISM) is a new and efficient method for the directed evolution of functional enzymes. It reduces the necessary mol. biol. work and the screening effort drastically. It is based on a Cartesian view of the protein structure, performing iterative cycles of satn. mutagenesis at rationally chosen sites in an enzyme, a given site being composed of one, two or three amino acid positions. The basis for choosing these sites depends on the nature of the catalytic property to be improved, e.g., enantioselectivity, substrate acceptance or thermostability. In the case of thermostability, sites showing highest B-factors (available from x-ray data) are chosen. The pronounced increase in thermostability of the lipase from Bacillus subtilis (Lip A) as a result of applying ISM is illustrated here.
  
27.
  Liskova, V.; Stepankova, V.; Bednar, D.; Brezovsky, J.; Prokop, Z.; Chaloupkova, R.; Damborsky, J. Different Structural Origins of the Enantioselectivity of Haloalkane Dehalogenases toward Linear β-Haloalkanes: Open-Solvated versus Occluded-Desolvated Active Sites. Angew. Chem., Int. Ed. 2017, 56, 4719– 4723, DOI: 10.1002/anie.201611193
  
  Different Structural Origins of the Enantioselectivity of Haloalkane Dehalogenases toward Linear β-Haloalkanes: Open-Solvated versus Occluded-Desolvated Active Sites
  
  Liskova, Veronika; Stepankova, Veronika; Bednar, David; Brezovsky, Jan; Prokop, Zbynek; Chaloupkova, Radka; Damborsky, Jiri
  
  Angewandte Chemie, International Edition (2017), 56 (17), 4719-4723CODEN: ACIEF5; ISSN:1433-7851. (Wiley-VCH Verlag GmbH & Co. KGaA)
  
  The enzymic enantiodiscrimination of linear β-haloalkanes is difficult because the simple structures of the substrates prevent directional interactions. Herein we describe two distinct mol. mechanisms for the enantiodiscrimination of the β-haloalkane 2-bromopentane by haloalkane dehalogenases. Highly enantioselective DbjA has an open, solvent-accessible active site, whereas the engineered enzyme DhaA31 has an occluded and less solvated cavity but shows similar enantioselectivity. The enantioselectivity of DhaA31 arises from steric hindrance imposed by two specific substitutions rather than hydration as in DbjA.
  
28.
  Bar-Even, A.; Noor, E.; Savir, Y.; Liebermeister, W.; Davidi, D.; Tawfik, D. S.; Milo, R. The Moderately Efficient Enzyme: Evolutionary and Physicochemical Trends Shaping Enzyme Parameters. Biochemistry 2011, 50, 4402– 4410, DOI: 10.1021/bi2002289
  
  The Moderately Efficient Enzyme: Evolutionary and Physicochemical Trends Shaping Enzyme Parameters
  
  Bar-Even, Arren; Noor, Elad; Savir, Yonatan; Liebermeister, Wolfram; Davidi, Dan; Tawfik, Dan S.; Milo, Ron
  
  Biochemistry (2011), 50 (21), 4402-4410CODEN: BICHAW; ISSN:0006-2960. (American Chemical Society)
  
  The kinetic parameters of enzymes are key to understanding the rate and specificity of most biol. processes. Although specific trends are frequently studied for individual enzymes, global trends are rarely addressed. We performed an anal. of kcat and KM values of several thousand enzymes collected from the literature. We found that the "av. enzyme" exhibits a kcat of ∼10 s⁻¹ and a kcat/KM of ∼10⁵ s⁻¹ M⁻¹, much below the diffusion limit and the characteristic textbook portrayal of kinetically superior enzymes. Why do most enzymes exhibit moderate catalytic efficiencies? Maximal rates may not evolve in cases where weaker selection pressures are expected. We find, for example, that enzymes operating in secondary metab. are, on av., ∼30-fold slower than those of central metab. We also find indications that the physicochem. properties of substrates affect the kinetic parameters. Specifically, low mol. mass and hydrophobicity appear to limit KM optimization. In accordance, substitution with phosphate, CoA, or other large modifiers considerably lowers the KM values of enzymes utilizing the substituted substrates. It therefore appears that both evolutionary selection pressures and physicochem. constraints shape the kinetic parameters of enzymes. It also seems likely that the catalytic efficiency of some enzymes toward their natural substrates could be increased in many cases by natural or lab. evolution.
  
29.
  Balchin, D.; Hayer-Hartl, M.; Hartl, F. U. In Vivo Aspects of Protein Folding and Quality Control. Science 2016, 353, aac4354, DOI: 10.1126/science.aac4354
  
30.
  Colón, W.; Church, J.; Sen, J.; Thibeault, J.; Trasatti, H.; Xia, K. Biological Roles of Protein Kinetic Stability. Biochemistry 2017, 56, 6179– 6186, DOI: 10.1021/acs.biochem.7b00942
  
  Biological Roles of Protein Kinetic Stability
  
  Colon, Wilfredo; Church, Jennifer; Sen, Jayeeta; Thibeault, Jane; Trasatti, Hannah; Xia, Ke
  
  Biochemistry (2017), 56 (47), 6179-6186CODEN: BICHAW; ISSN:0006-2960. (American Chemical Society)
  
  A review. A protein's stability may range from non-existent, as in the case of intrinsically disordered proteins, to very high, as indicated by a protein's resistance to degrdn., even under relatively harsh conditions. The stability of this latter group is usually under kinetic control due to a high activation energy for unfolding that virtually traps the protein in a specific conformation, thereby conferring resistance to proteolytic degrdn. and misfolding-aggregation. The usual outcome of kinetic stability is a longer protein half-life. Thus, the protective role of protein kinetic stability is often appreciated, but relatively little is known about the extent of biol. roles related to this property. Here, we discuss several known or putative biol. roles of protein kinetic stability, including protection from stressors to avoid aggregation or premature degrdn., achieving long-term phenotypic change, and regulating cellular processes by controlling the trigger and timing of mol. motion. The picture that emerges from this anal. is that protein kinetic stability is involved in a myriad of known and yet to be discovered biol. functions via its ability to resist degrdn. and control the timing, extent, and permanency of mol. motion.
  
31.
  Khersonsky, O.; Kiss, G.; Röthlisberger, D.; Dym, O.; Albeck, S.; Houk, K. N.; Baker, D.; Tawfik, D. S. Bridging the Gaps in Design Methodologies by Evolutionary Optimization of the Stability and Proficiency of Designed Kemp Eliminase KE59. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 10358– 10363, DOI: 10.1073/pnas.1121063109
  
  Bridging the gaps in design methodologies by evolutionary optimization of the stability and proficiency of designed Kemp eliminase KE59
  
  Khersonsky, Olga; Kiss, Gert; Rothlisberger, Daniela; Dym, Orly; Albeck, Shira; Houk, Kendall N.; Baker, David; Tawfik, Dan S.
  
  Proceedings of the National Academy of Sciences of the United States of America (2012), 109 (26), 10358-10363, S10358/1-S10358/47CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)
  
  Computational design is a test of our understanding of enzyme catalysis and a means of engineering novel, tailor-made enzymes. While the de novo computational design of catalytically efficient enzymes remains a challenge, designed enzymes may comprise unique starting points for further optimization by directed evolution. Directed evolution of two computationally designed Kemp eliminases, KE07 and KE70, led to low to moderately efficient enzymes (kcat/Km values of ≤5 × 10⁴ M⁻¹s⁻¹). Here we describe the optimization of a third design, KE59. Although KE59 was the most catalytically efficient Kemp eliminase from this design series (by kcat/Km, and by catalyzing the elimination of nonactivated benzisoxazoles), its impaired stability prevented its evolutionary optimization. To boost KE59's evolvability, stabilizing consensus mutations were included in the libraries throughout the directed evolution process. The libraries were also screened with less activated substrates. Sixteen rounds of mutation and selection led to >2000-fold increase in catalytic efficiency, mainly via higher kcat values. The best KE59 variants exhibited kcat/Km values up to 0.6 × 10⁶ M⁻¹s⁻¹, and kcat/kuncat values of ≤10⁷ almost regardless of substrate reactivity. Biochem., structural, and mol. dynamics (MD) simulation studies provided insights regarding the optimization of KE59. Overall, the directed evolution of three different designed Kemp eliminases, KE07, KE70, and KE59, demonstrates that computational designs are highly evolvable and can be optimized to high catalytic efficiencies.
  
32.
  Taverna, D. M.; Goldstein, R. A. Why Are Proteins Marginally Stable?. Proteins: Struct., Funct., Genet. 2002, 46, 105– 109, DOI: 10.1002/prot.10016
  
33.
  Sanchez-Ruiz, J. M. Protein Kinetic Stability. Biophys. Chem. 2010, 148, 1– 15, DOI: 10.1016/j.bpc.2010.02.004
  
  Protein kinetic stability
  
  Sanchez-Ruiz, Jose M.
  
  Biophysical Chemistry (2010), 148 (1-3), 1-15CODEN: BICIAZ; ISSN:0301-4622. (Elsevier B.V.)
  
  A review. The relevance of protein stability for biol. function and mol. evolution is widely recognized. Protein stability, however, comes in 2 flavors: (1) thermodn. stability, which is related to a low amt. of unfolded and partially-unfolded states in equil. with the native, functional protein, and (2) kinetic stability, which is related to a high free energy barrier "sepg." the native state from the non-functional forms (unfolded states, irreversibly-denatured protein). Such a barrier may guarantee that the biol. function of the protein is maintained, at least during a physiol. relevant time-scale, even if the native state is not thermodynamically stable with respect to non-functional forms. Kinetic stabilization is likely required in many cases, since proteins often work under conditions (harsh extracellular or crowded intracellular environments) in which deleterious alterations (proteolysis, aggregation, undesirable interactions with other macromol. components) are prone to occur. Also, kinetic stability may provide a mechanism for the evolution of optimal functional properties. Furthermore, enhancement of kinetic stability is essential for many biotechnol. applications of proteins. Despite all of this, many published studies focus on thermodn. stability, partly because it can be easily quantified in vitro for small model proteins and, also, because of the availability of computational algorithms to est. mutation effects on thermodn. stability. Here, the opposite bias is purposely adopted: the exptl. evidence supporting widespread kinetic stabilization of proteins is summarized, the role of natural selection in detg. this feature is discussed, possible mol. mechanisms responsible for kinetic stability are described, and the relation between kinetic destabilization and protein misfolding diseases is highlighted.
  
34.
  Bommarius, A. S.; Paye, M. F. Stabilizing Biocatalysts. Chem. Soc. Rev. 2013, 42, 6534– 6565, DOI: 10.1039/c3cs60137d
  
  Stabilizing biocatalysts
  
  Bommarius, Andreas S.; Paye, Marietou F.
  
  Chemical Society Reviews (2013), 42 (15), 6534-6565CODEN: CSRVBR; ISSN:0306-0012. (Royal Society of Chemistry)
  
  A review. The area of biocatalysis itself is in rapid development, fueled by both an enhanced repertoire of protein engineering tools and an increasing list of solved problems. Biocatalysts, however, are delicate materials that hover close to the thermodn. limit of stability. In many cases, they need to be stabilized to survive a range of challenges regarding temp., pH value, salt type and concn., co-solvents, as well as shear and surface forces. Biocatalysts may be delicate proteins, however, once stabilized, they are efficiently active enzymes. Kinetic stability must be achieved to a level satisfactory for large-scale process application. Kinetic stability evokes resistance to degrdn. and maintained or increased catalytic efficiency of the enzyme in which the desired reaction is accomplished at an increased rate. However, beyond these limitations, stable biocatalysts can be operated at higher temps. or co-solvent concns., with ensuing redn. in microbial contamination, better soly., as well as in many cases more favorable equil., and can serve as more effective templates for combinatorial and data-driven protein engineering. To increase thermodn. and kinetic stability, immobilization, protein engineering, and medium engineering of biocatalysts are available, the main focus of this work. In the case of protein engineering, there are three main approaches to enhancing the stability of protein biocatalysts: (i) rational design, based on knowledge of the 3D-structure and the catalytic mechanism, (ii) combinatorial design, requiring a protocol to generate diversity at the genetic level, a large, often high throughput, screening capacity to distinguish 'hits' from 'misses', and (iii) data-driven design, fueled by the increased availability of nucleotide and amino acid sequences of equiv. functionality.
  
35.
  Goldenzweig, A.; Fleishman, S. J. Principles of Protein Stability and Their Application in Computational Design. Annu. Rev. Biochem. 2018, 87, 105– 129, DOI: 10.1146/annurev-biochem-062917-012102
  
  Principles of Protein Stability and Their Application in Computational Design
  
  Goldenzweig, Adi; Fleishman, Sarel J.
  
  Annual Review of Biochemistry (2018), 87 (), 105-129CODEN: ARBOAW; ISSN:0066-4154. (Annual Reviews)
  
  A review. Proteins are increasingly used in basic and applied biomedical research. Many proteins, however, are only marginally stable and can be expressed in limited amts., thus hampering research and applications. Research has revealed the thermodn., cellular, and evolutionary principles and mechanisms that underlie marginal stability. With this growing understanding, computational stability design methods have advanced over the past two decades starting from methods that selectively addressed only some aspects of marginal stability. Current methods are more general and, by combining phylogenetic anal. with atomistic design, have shown drastic improvements in soly., thermal stability, and aggregation resistance while maintaining the protein's primary mol. activity. Stability design is opening the way to rational engineering of improved enzymes, therapeutics, and vaccines and to the application of protein design methodol. to large proteins and mol. activities that have proven challenging in the past.
  
36.
  Hansen, N.; van Gunsteren, W. F. Practical Aspects of Free-Energy Calculations: A Review. J. Chem. Theory Comput. 2014, 10, 2632– 2647, DOI: 10.1021/ct500161f
  
  Practical Aspects of Free-Energy Calculations: A Review
  
  Hansen, Niels; van Gunsteren, Wilfred F.
  
  Journal of Chemical Theory and Computation (2014), 10 (7), 2632-2647CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)
  
  A review. Free-energy calcns. in the framework of classical mol. dynamics simulations are nowadays used in a wide range of research areas including solvation thermodn., mol. recognition, and protein folding. The basic components of a free-energy calcn., i.e., a suitable model Hamiltonian, a sampling protocol, and an estimator for the free energy, are independent of the specific application. However, the attention that one has to pay to these components depends considerably on the specific application. Here, we review six different areas of application and discuss the relative importance of the three main components to provide the reader with an organigram and to make nonexperts aware of the many pitfalls present in free energy calcns.
  
37.
  Polizzi, K. M.; Bommarius, A. S.; Broering, J. M.; Chaparro-Riggers, J. F. Stability of Biocatalysts. Curr. Opin. Chem. Biol. 2007, 11, 220– 225, DOI: 10.1016/j.cbpa.2007.01.685
  
  Stability of biocatalysts
  
  Polizzi, Karen M.; Bommarius, Andreas S.; Broering, James M.; Chaparro-Riggers, Javier F.
  
  Current Opinion in Chemical Biology (2007), 11 (2), 220-225CODEN: COCBF4; ISSN:1367-5931. (Elsevier B.V.)
  
  A review. Here, the authors highlight recent research on the stabilization of enzymes using both chem. and biol. means to increase the lifetime of the biocatalyst. Despite their many favorable qualities, the marginal stability of biocatalysts in many types of reaction media often has prevented or delayed their implementation for industrial-scale synthesis of fine chems. and pharmaceuticals. Consequently, there is great interest in understanding the effects of soln. conditions on protein stability, as well as in developing strategies to improve protein stability in desired reaction media. Recent methods include novel chem. modifications of proteins, lyophilization in the presence of additives, and phys. immobilization on novel supports. Rational and combinatorial protein engineering techniques have been used to yield unmodified proteins with exceptionally improved stability. Both have been aided by the development of computational tools and structure-guided heuristics aimed at reducing library sizes that must be generated and screened to identify improved mutants. The no. of parameters used to indicate protein stability can complicate discussions and investigations, and care should be taken to identify whether thermodn. or kinetic stability limits the obsd. stability of proteins. Although the useful lifetime of a biocatalyst is dictated by its kinetic stability, only 6% of protein stability studies use kinetic stability measures. Clearly, more effort is needed to study how soln. conditions impact protein kinetic stability.
  
38.
  Buck, P. M.; Kumar, S.; Wang, X.; Agrawal, N. J.; Trout, B. L.; Singh, S. K. Computational Methods To Predict Therapeutic Protein Aggregation. Methods Mol. Biol. 2012, 899, 425– 451, DOI: 10.1007/978-1-61779-921-1_26
  
  Computational methods to predict therapeutic protein aggregation
  
  Buck, Patrick M.; Kumar, Sandeep; Wang, Xiaoling; Agrawal, Neeraj J.; Trout, Bernhardt L.; Singh, Satish K.
  
  Methods in Molecular Biology (New York, NY, United States) (2012), 899 (Therapeutic Proteins), 425-451CODEN: MMBIED; ISSN:1064-3745. (Springer)
  
  A review. Protein based biotherapeutics have emerged as a successful class of pharmaceuticals. However, these macromols. endure a variety of physicochem. degrdns. during manufg., shipping, and storage, which may adversely impact the drug product quality. Of these degrdns., the irreversible self-assocn. of therapeutic proteins to form aggregates is a major challenge in the formulation of these mols. Tools to predict and mitigate protein aggregation are, therefore, of great interest to biopharmaceutical research and development. In this chapter, a no. of such computational tools developed to understand and predict the various steps involved in protein aggregation are described. These tools can be grouped into three general classes: unfolding kinetics and native state thermal stability, colloidal stability, and sequence/structure based aggregation liabilities. Chapter sections introduce each class by discussing how these predictive tools provide insight into the mol. events leading to protein aggregation. The computational methods are then explained in detail along with their advantages and limitations.
  
39.
  Jaswal, S. S.; Sohl, J. L.; Davis, J. H.; Agard, D. A. Energetic Landscape of α-Lytic Protease Optimizes Longevity through Kinetic Stability. Nature 2002, 415, 343– 346, DOI: 10.1038/415343a
  
  Energetic landscape of α-lytic protease optimizes longevity through kinetic stability
  
  Jaswal, Shella S.; Sohl, Julie L.; Davis, Jonathan H.; Agard, David A.
  
  Nature (London, United Kingdom) (2002), 415 (6869), 343-346CODEN: NATUAS; ISSN:0028-0836. (Nature Publishing Group)
  
  During the evolution of proteins the pressure to optimize biol. activity is moderated by a need for efficient folding. For most proteins, this is accomplished through spontaneous folding to a thermodynamically stable and active native state. However, in the extracellular bacterial α-lytic protease (αLP) these two processes have become decoupled. The native state of αLP is thermodynamically unstable, and when denatured, requires millennia (t1/2 ∼ 1800 yr) to refold. Folding is made possible by an attached folding catalyst, the pro-region, which is degraded on completion of folding, leaving αLP trapped in its native state by a large kinetic unfolding barrier (t1/2 ∼ 1.2 yr). αLP faces two very different folding landscapes: one in the presence of the pro-region controlling folding, and one in its absence restricting unfolding. Here we demonstrate that this sepn. of folding and unfolding pathways has removed constraints placed on the folding of thermodynamically stable proteins, and allowed the evolution of a native state having markedly reduced dynamic fluctuations. This, in turn, has led to a significant extension of the functional lifetime of αLP by the optimal suppression of proteolytic sensitivity.
  
40.
  Young, T. A.; Skordalakes, E.; Marqusee, S. Comparison of Proteolytic Susceptibility in Phosphoglycerate Kinases from Yeast and E. coli: Modulation of Conformational Ensembles Without Altering Structure or Stability. J. Mol. Biol. 2007, 368, 1438– 1447, DOI: 10.1016/j.jmb.2007.02.077
  
  Comparison of Proteolytic Susceptibility in Phosphoglycerate Kinases from Yeast and E. coli: Modulation of Conformational Ensembles Without Altering Structure or Stability
  
  Young, Tracy A.; Skordalakes, Emmanuel; Marqusee, Susan
  
  Journal of Molecular Biology (2007), 368 (5), 1438-1447CODEN: JMOBAK; ISSN:0022-2836. (Elsevier Ltd.)
  
  Escherichia coli phosphoglycerate kinase (PGK) is resistant to proteolytic cleavage while the yeast homolog from Saccharomyces cerevisiae is not. We have explored the biophys. basis of this surprising difference. The sequences of these homologs are 39% identical and 56% similar. Detn. of the crystal structure for the E. coli protein and comparison to the previously solved yeast structure reveals that the two proteins have extremely similar tertiary structures, and their global stabilities detd. by equil. denaturation are also very similar. The extrapolated unfolding rate of E. coli PGK is, however, 10⁵-fold slower than that of the yeast homolog. This surprisingly large difference in unfolding rates appears to arise from a divergence in the extent of cooperativity between the two structural domains (the N and C-domains) that make up these kinases. This is supported by: (1) the C-domain of E. coli PGK cannot be expressed or fold independently of the N-domain, while both domains of the yeast protein fold in isolation into stable structures and (2) the energetics and kinetics of the proteolytically sensitive state of E. coli PGK match those for global unfolding. This suggests that proteolysis occurs from the globally unfolded state of E. coli PGK, while the characteristics defining the yeast homolog suggest that proteolysis occurs upon unfolding of only the C-domain, with the N-domain remaining folded and consequently resistant to cleavage.
  
41.
  Shirke, A. N.; Basore, D.; Butterfoss, G. L.; Bonneau, R.; Bystroff, C.; Gross, R. A. Toward Rational Thermostabilization of Aspergillus oryzae Cutinase: Insights into Catalytic and Structural Stability. Proteins: Struct., Funct., Genet. 2016, 84, 60–72, DOI: 10.1002/prot.24955
  
42.
  Liu, B.; Zhang, J.; Li, B.; Liao, X.; Du, G.; Chen, J. Expression and Characterization of Extreme Alkaline, Oxidation-Resistant Keratinase from Bacillus licheniformis in Recombinant Bacillus subtilis WB600 Expression System and Its Application in Wool Fiber Processing. World J. Microbiol. Biotechnol. 2013, 29, 825–832, DOI: 10.1007/s11274-012-1237-5
  
  42
  Expression and characterization of extreme alkaline, oxidation-resistant keratinase from Bacillus licheniformis in recombinant Bacillus subtilis WB600 expression system and its application in wool fiber processing
  
  Liu, Baihong; Zhang, Juan; Li, Ben; Liao, Xiangru; Du, Guocheng; Chen, Jian
  
  World Journal of Microbiology & Biotechnology (2013), 29 (5), 825-832CODEN: WJMBEY; ISSN:0959-3993. (Springer)
  
  A keratin-degrading bacterium, Bacillus licheniformis BBE11-1, was isolated, and its ker gene encoding keratinase with the native signal peptide was cloned and expressed in Bacillus subtilis WB600 under the strong PHpaII promoter of the pMA0911 vector. In a 3-L fermenter, the recombinant keratinase was secreted at 323 units/mL without induction after 24 h at 37 °C. The keratinase was then concd. and purified by hydrophobic interaction chromatog. using HiTrap Phenyl-Sepharose Fast Flow. The recombinant keratinase had an optimal temp. and pH of 40 °C and 10.5, resp., and was stable at 10-50 °C and pH 7-11.5. The enzyme retained 80% activity after 5 h of treatment with 1 M H2O2, was activated by Mg2+ and Co2+, and could degrade a broad range of substrates such as feather, bovine serum albumin, casein, and gelatin. The keratinase was considered to be a serine protease. In combination with Savinase, the keratinase could efficiently prevent shrinkage and eliminate fibers of wool, which showed its potential in the textile and detergent industries.
  
43.
  Nguyen, V.; Wilson, C.; Hoemberger, M.; Stiller, J. B.; Agafonov, R. V.; Kutter, S.; English, J.; Theobald, D. L.; Kern, D. Evolutionary Drivers of Thermoadaptation in Enzyme Catalysis. Science 2017, 355, 289–294, DOI: 10.1126/science.aah3717
  
  43
  Evolutionary drivers of thermoadaptation in enzyme catalysis
  
  Nguyen, Vy; Wilson, Christopher; Hoemberger, Marc; Stiller, John B.; Agafonov, Roman V.; Kutter, Steffen; English, Justin; Theobald, Douglas L.; Kern, Dorothee
  
  Science (Washington, DC, United States) (2017), 355 (6322), 289-294CODEN: SCIEAS; ISSN:0036-8075. (American Association for the Advancement of Science)
  
  With early life likely to have existed in a hot environment, enzymes had to cope with an inherent drop in catalytic speed caused by lowered temp. Here we characterize the mol. mechanisms underlying thermoadaptation of enzyme catalysis in adenylate kinase using ancestral sequence reconstruction spanning 3 billion years of evolution. We show that evolution solved the enzyme's key kinetic obstacle - how to maintain catalytic speed on a cooler Earth - by exploiting transition-state heat capacity. Tracing the evolution of enzyme activity and stability from the hot-start toward modern hyperthermophilic, mesophilic, and psychrophilic organisms illustrates active pressure vs. passive drift in evolution on a mol. level, refutes the debated activity/stability trade-off, and suggests that the catalytic speed of adenylate kinase is an evolutionary driver for organismal fitness.
  
44.
  Risso, V. A.; Gavira, J. A.; Gaucher, E. A.; Sanchez-Ruiz, J. M. Phenotypic Comparisons of Consensus Variants versus Laboratory Resurrections of Precambrian Proteins. Proteins: Struct., Funct., Genet. 2014, 82, 887–896, DOI: 10.1002/prot.24575
  
45.
  Bednar, D.; Beerens, K.; Sebestova, E.; Bendl, J.; Khare, S.; Chaloupkova, R.; Prokop, Z.; Brezovsky, J.; Baker, D.; Damborsky, J. FireProt: Energy- and Evolution-Based Computational Design of Thermostable Multiple-Point Mutants. PLoS Comput. Biol. 2015, 11, e1004556, DOI: 10.1371/journal.pcbi.1004556
  
  45
  FireProt: energy- and evolution-based computational design of thermostable multiple-point mutants
  
  Bednar, David; Beerens, Koen; Sebestova, Eva; Bendl, Jaroslav; Khare, Sagar; Chaloupkova, Radka; Prokop, Zbynek; Brezovsky, Jan; Baker, David; Damborsky, Jiri
  
  PLoS Computational Biology (2015), 11 (11), e1004556/1-e1004556/20CODEN: PCBLBG; ISSN:1553-7358. (Public Library of Science)
  
  There is great interest in increasing proteins' stability to enhance their utility as biocatalysts, therapeutics, diagnostics and nanomaterials. Directed evolution is a powerful, but exptl. strenuous approach. Computational methods offer attractive alternatives. However, due to the limited reliability of predictions and potentially antagonistic effects of substitutions, only single-point mutations are usually predicted in silico, exptl. verified and then recombined in multiple-point mutants. Thus, substantial screening is still required. Here we present FireProt, a robust computational strategy for predicting highly stable multiple-point mutants that combines energy- and evolution-based approaches with smart filtering to identify additive stabilizing mutations. FireProt's reliability and applicability was demonstrated by validating its predictions against 656 mutations from the ProTherm database. We demonstrate that thermostability of the model enzymes haloalkane dehalogenase DhaA and γ-hexachlorocyclohexane dehydrochlorinase LinA can be substantially increased (ΔTm = 24°C and 21°C) by constructing and characterizing only a handful of multiple-point mutants. FireProt can be applied to any protein for which a tertiary structure and homologous sequences are available, and will facilitate the rapid development of robust proteins for biomedical and biotechnol. applications.
  
46.
  Babkova, P.; Sebestova, E.; Brezovsky, J.; Chaloupkova, R.; Damborsky, J. Ancestral Haloalkane Dehalogenases Show Robustness and Unique Substrate Specificity. ChemBioChem 2017, 18, 1448–1456, DOI: 10.1002/cbic.201700197
  
  46
  Ancestral Haloalkane Dehalogenases Show Robustness and Unique Substrate Specificity
  
  Babkova, Petra; Sebestova, Eva; Brezovsky, Jan; Chaloupkova, Radka; Damborsky, Jiri
  
  ChemBioChem (2017), 18 (14), 1448-1456CODEN: CBCHFX; ISSN:1439-4227. (Wiley-VCH Verlag GmbH & Co. KGaA)
  
  Ancestral sequence reconstruction (ASR) represents a powerful approach for empirical testing of structure-function relationships of diverse proteins. We employed ASR to predict sequences of five ancestral haloalkane dehalogenases (HLDs) from the HLD-II subfamily. Genes encoding the inferred ancestral sequences were synthesized and expressed in Escherichia coli, and the resurrected ancestral enzymes (AncHLD1-5) were exptl. characterized. Strikingly, the ancestral HLDs exhibited significantly enhanced thermodn. stability compared to extant enzymes (ΔTm up to 24 °C), as well as higher specific activities with preference for short multi-substituted halogenated substrates. Moreover, multivariate statistical anal. revealed a shift in the substrate specificity profiles of AncHLD1 and AncHLD2. This is extremely difficult to achieve by rational protein engineering. The study highlights that ASR is an efficient approach for the development of novel biocatalysts and robust templates for directed evolution.
  
47.
  Goldenzweig, A.; Goldsmith, M.; Hill, S. E.; Gertman, O.; Laurino, P.; Ashani, Y.; Dym, O.; Unger, T.; Albeck, S.; Prilusky, J.; Lieberman, R. L.; Aharoni, A.; Silman, I.; Sussman, J. L.; Tawfik, D. S.; Fleishman, S. J. Automated Structure- and Sequence-Based Design of Proteins for High Bacterial Expression and Stability. Mol. Cell 2016, 63, 337–346, DOI: 10.1016/j.molcel.2016.06.012
  
  47
  Automated Structure- and Sequence-Based Design of Proteins for High Bacterial Expression and Stability
  
  Goldenzweig, Adi; Goldsmith, Moshe; Hill, Shannon E.; Gertman, Or; Laurino, Paola; Ashani, Yacov; Dym, Orly; Unger, Tamar; Albeck, Shira; Prilusky, Jaime; Lieberman, Raquel L.; Aharoni, Amir; Silman, Israel; Sussman, Joel L.; Tawfik, Dan S.; Fleishman, Sarel J.
  
  Molecular Cell (2016), 63 (2), 337-346CODEN: MOCEFL; ISSN:1097-2765. (Elsevier Inc.)
  
  Upon heterologous overexpression, many proteins misfold or aggregate, thus resulting in low functional yields. Human acetylcholinesterase (hAChE), an enzyme mediating synaptic transmission, is a typical case of a human protein that necessitates mammalian systems to obtain functional expression. We developed a computational strategy and designed an AChE variant bearing 51 mutations that improved core packing, surface polarity, and backbone rigidity. This variant expressed at ∼2,000-fold higher levels in E. coli compared to wild-type hAChE and exhibited 20°C higher thermostability with no change in enzymic properties or in the active-site configuration as detd. by crystallog. To demonstrate broad utility, we similarly designed four other human and bacterial proteins. Testing at most three designs per protein, we obtained enhanced stability and/or higher yields of sol. and active protein in E. coli. Our algorithm requires only a 3D structure and several dozen sequences of naturally occurring homologs, and is available at http://pross.weizmann.ac.il.
  
48.
  Hammes, G. G.; Chang, Y.-C.; Oas, T. G. Conformational Selection or Induced Fit: A Flux Description of Reaction Mechanism. Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 13737–13741, DOI: 10.1073/pnas.0907195106
  
  48
  Conformational selection or induced fit: a flux description of reaction mechanism
  
  Hammes, Gordon G.; Chang, Yu-Chu; Oas, Terrence G.
  
  Proceedings of the National Academy of Sciences of the United States of America (2009), 106 (33), 13737-13741CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)
  
  The mechanism of ligand binding coupled to conformational changes in macromols. has recently attracted considerable interest. The 2 limiting cases are the "induced fit" mechanism (binding first) or "conformational selection" (conformational change first). Described here are the criteria by which the sequence of events can be detd. quant. The relative importance of the 2 pathways is detd. not by comparing rate consts. (a common misconception) but instead by comparing the flux through each pathway. The simple rules for calcg. flux in multistep mechanisms are described and then applied to 2 examples from the literature, neither of which has previously been analyzed using the concept of flux. The first example is the mechanism of conformational change in the binding of NADPH to dihydrofolate reductase (DHFR). The second example is the mechanism of flavodoxin folding coupled to binding of its cofactor, FMN. In both cases, the mechanism switches from being dominated by the conformational selection pathway at low ligand concn. to induced fit at high ligand concn. Over a wide range of conditions, a significant fraction of the flux occurs through both pathways. Such a mixed mechanism likely will be discovered for many cases of coupled conformational change and ligand binding when kinetic data are analyzed by using a flux-based approach.
  
49.
  Kramer, R. M.; Shende, V. R.; Motl, N.; Pace, C. N.; Scholtz, J. M. Toward a Molecular Understanding of Protein Solubility: Increased Negative Surface Charge Correlates with Increased Solubility. Biophys. J. 2012, 102, 1907–1915, DOI: 10.1016/j.bpj.2012.01.060
  
  49
  Toward a Molecular Understanding of Protein Solubility: Increased Negative Surface Charge Correlates with Increased Solubility
  
  Kramer, Ryan M.; Shende, Varad R.; Motl, Nicole; Pace, C. Nick; Scholtz, J. Martin
  
  Biophysical Journal (2012), 102 (8), 1907-1915CODEN: BIOJAU; ISSN:0006-3495. (Cell Press)
  
  Protein soly. is a problem for many protein chemists, including structural biologists and developers of protein pharmaceuticals. Knowledge about how intrinsic factors influence soly. is limited due to the difficulty of obtaining quant. soly. measurements. Soly. measurements in buffer alone are difficult to reproduce, because gels or supersatd. solns. often form, making it impossible to det. soly. values for many proteins. Protein precipitants can be used to obtain comparative soly. measurements and, in some cases, estns. of soly. in buffer alone. Protein precipitants fall into three broad classes: salts, long-chain polymers, and org. solvents. Here, we compare the use of representatives from two classes of precipitants, ammonium sulfate and polyethylene glycol 8000, by measuring the soly. of seven proteins. We find that increased neg. surface charge correlates strongly with increased protein soly. and may be due to strong binding of water by the acidic amino acids. We also find that the soly. results obtained for the two different precipitants agree closely with each other, suggesting that the two precipitants probe similar properties that are relevant to soly. in buffer alone.
  
50.
  Khow, O.; Suntrarachun, S. Strategies for Production of Active Eukaryotic Proteins in Bacterial Expression System. Asian Pac. J. Trop. Biomed. 2012, 2, 159–162, DOI: 10.1016/S2221-1691(11)60213-X
  
  50
  Strategies for production of active eukaryotic proteins in bacterial expression system
  
  Khow, Orawan; Suntrarachun, Sunutcha
  
  Asian Pacific Journal of Tropical Biomedicine (2012), 2 (2), 159-162CODEN: APJTC7; ISSN:2221-1691. (Asian Pacific Tropical Medicine Press)
  
  A review. Bacteria have long been the favorite expression system for recombinant protein prodn. However, the flaw of the system is that insol. and inactive proteins are co-produced due to codon bias, protein folding, phosphorylation, glycosylation, mRNA stability and promoter strength. Factors are cited and the methods to convert to sol. and active proteins are described, for example a tight control of Escherichia coli milieu, refolding from inclusion body and through fusion technol.
  
51.
  Sørensen, H. P.; Mortensen, K. K. Soluble Expression of Recombinant Proteins in the Cytoplasm of Escherichia coli. Microb. Cell Fact. 2005, 4, 1, DOI: 10.1186/1475-2859-4-1
  
  51
  Soluble expression of recombinant proteins in the cytoplasm of Escherichia coli
  
  Sørensen, Hans Peter; Mortensen, Kim Kusk
  
  Microbial Cell Factories (2005), 4 (1), 1.
  
  Pure, soluble and functional proteins are of high demand in modern biotechnology. Natural protein sources rarely meet the requirements for quantity, ease of isolation or price and hence recombinant technology is often the method of choice. Recombinant cell factories are constantly employed for the production of protein preparations bound for downstream purification and processing. Escherichia coli is a frequently used host, since it facilitates protein expression by its relative simplicity, its inexpensive and fast high density cultivation, the well known genetics and the large number of compatible molecular tools available. In spite of all these qualities, expression of recombinant proteins with E. coli as the host often results in insoluble and/or nonfunctional proteins. Here we review new approaches to overcome these obstacles by strategies that focus on either controlled expression of target protein in an unmodified form or by applying modifications using expressivity and solubility tags.
  
52.
  Hartl, F. U.; Bracher, A.; Hayer-Hartl, M. Molecular Chaperones in Protein Folding and Proteostasis. Nature 2011, 475, 324–332, DOI: 10.1038/nature10317
  
  52
  Molecular chaperones in protein folding and proteostasis
  
  Hartl, F. Ulrich; Bracher, Andreas; Hayer-Hartl, Manajit
  
  Nature (London, United Kingdom) (2011), 475 (7356), 324-332CODEN: NATUAS; ISSN:0028-0836. (Nature Publishing Group)
  
  A review. Most proteins must fold into defined 3-dimensional structures to gain functional activity. However, in the cellular environment, newly synthesized proteins are at great risk of aberrant folding and aggregation, potentially forming toxic species. To avoid these dangers, cells invest in a complex network of mol. chaperones, which use ingenious mechanisms to prevent aggregation and promote efficient folding. Because protein mols. are highly dynamic, const. chaperone surveillance is required to ensure protein homeostasis (proteostasis). Recent advances suggest that an age-related decline in proteostasis capacity allows the manifestation of various protein-aggregation diseases, including Alzheimer's disease and Parkinson's disease. Interventions in these and numerous other pathol. states may spring from a detailed understanding of the pathways underlying proteome maintenance.
  
53.
  Shaw, D. E.; Maragakis, P.; Lindorff-Larsen, K.; Piana, S.; Dror, R. O.; Eastwood, M. P.; Bank, J. A.; Jumper, J. M.; Salmon, J. K.; Shan, Y.; Wriggers, W. Atomic-Level Characterization of the Structural Dynamics of Proteins. Science 2010, 330, 341–346, DOI: 10.1126/science.1187409
  
  53
  Atomic-Level Characterization of the Structural Dynamics of Proteins
  
  Shaw, David E.; Maragakis, Paul; Lindorff-Larsen, Kresten; Piana, Stefano; Dror, Ron O.; Eastwood, Michael P.; Bank, Joseph A.; Jumper, John M.; Salmon, John K.; Shan, Yibing; Wriggers, Willy
  
  Science (Washington, DC, United States) (2010), 330 (6002), 341-346CODEN: SCIEAS; ISSN:0036-8075. (American Association for the Advancement of Science)
  
  Mol. dynamics (MD) simulations are widely used to study protein motions at an at. level of detail, but they have been limited to time scales shorter than those of many biol. crit. conformational changes. We examd. two fundamental processes in protein dynamics-protein folding and conformational change within the folded state-by means of extremely long all-atom MD simulations conducted on a special-purpose machine. Equil. simulations of a WW protein domain captured multiple folding and unfolding events that consistently follow a well-defined folding pathway; sep. simulations of the protein's constituent substructures shed light on possible determinants of this pathway. A 1-ms simulation of the folded protein BPTI reveals a small no. of structurally distinct conformational states whose reversible interconversion is slower than local relaxations within those states by a factor of more than 1000.
  
54.
  Englander, S. W.; Mayne, L. The Case for Defined Protein Folding Pathways. Proc. Natl. Acad. Sci. U. S. A. 2017, 114, 8253–8258, DOI: 10.1073/pnas.1706196114
  
  54
  The case for defined protein folding pathways
  
  Englander, S. Walter; Mayne, Leland
  
  Proceedings of the National Academy of Sciences of the United States of America (2017), 114 (31), 8253-8258CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)
  
  We consider the differences between the many-pathway protein folding model derived from theor. energy landscape considerations and the defined-pathway model derived from expt. A basic tenet of the energy landscape model is that proteins fold through many heterogeneous pathways by way of amino acid-level dynamics biased toward selecting native-like interactions. The many pathways imagined in the model are not obsd. in the structure-formation stage of folding by expts. that would have found them, but they have now been detected and characterized for one protein in the initial prenucleation stage. Anal. presented here shows that these many microscopic trajectories are not distinct in any functionally significant way, and they have neither the structural information nor the biased energetics needed to select native vs. non-native interactions during folding. The opposed defined-pathway model stems from exptl. results that show that proteins are assemblies of small cooperative units called foldons and that a no. of proteins fold in a reproducible pathway one foldon unit at a time. Thus, the same foldon interactions that encode the native structure of any given protein also naturally encode its particular foldon-based folding pathway, and they collectively sum to produce the energy bias toward native interactions that is necessary for efficient folding. Available information suggests that quantized native structure and stepwise folding coevolved in ancient repeat proteins and were retained as a functional pair due to their utility for solving the difficult protein folding problem.
  
55.
  Voelz, V. A.; Bowman, G. R.; Beauchamp, K.; Pande, V. S. Molecular Simulation of Ab Initio Protein Folding for a Millisecond Folder NTL9(1–39). J. Am. Chem. Soc. 2010, 132, 1526–1528, DOI: 10.1021/ja9090353
  
  55
  Molecular Simulation of ab Initio Protein Folding for a Millisecond Folder NTL9(1-39)
  
  Voelz, Vincent A.; Bowman, Gregory R.; Beauchamp, Kyle; Pande, Vijay S.
  
  Journal of the American Chemical Society (2010), 132 (5), 1526-1528CODEN: JACSAT; ISSN:0002-7863. (American Chemical Society)
  
  The results obtained suggest that existing force field models using implicit solvent are indeed accurate enough to fold proteins ab initio at long time scales (milliseconds), opening the door to simulating more structurally complex proteins. Moreover, our work demonstrates that there need not be a single pathway or single, dominant mechanism for the folding of a given protein: since the theories proposed for how proteins fold are based on broadly relevant phys. principles, it is natural to imagine that multiple mechanisms could be simultaneously present but that the sequence of the protein, coupled with the chem. environment, would control the balance to which each mechanistic pathway is seen.
  
56.
  Eaton, W. A.; Wolynes, P. G. Theory, Simulations, and Experiments Show That Proteins Fold by Multiple Pathways. Proc. Natl. Acad. Sci. U. S. A. 2017, 114, E9759–E9760, DOI: 10.1073/pnas.1716444114
  
  56
  Theory, simulations, and experiments show that proteins fold by multiple pathways
  
  Eaton, William A.; Wolynes, Peter G.
  
  Proceedings of the National Academy of Sciences of the United States of America (2017), 114 (46), E9759-E9760CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)
  
57.
  Yang, Y.; Niroula, A.; Shen, B.; Vihinen, M. PON-Sol: Prediction of Effects of Amino Acid Substitutions on Protein Solubility. Bioinformatics 2016, 32, 2032–2034, DOI: 10.1093/bioinformatics/btw066
  
  57
  PON-Sol: prediction of effects of amino acid substitutions on protein solubility
  
  Yang, Yang; Niroula, Abhishek; Shen, Bairong; Vihinen, Mauno
  
  Bioinformatics (2016), 32 (13), 2032-2034CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)
  
  Motivation: Soly. is one of the fundamental protein properties. It is of great interest because of its relevance to protein expression. Reduced soly. and protein aggregation are also assocd. with many diseases. Results: We collected from literature the largest exptl. verified soly. affecting amino acid substitution (AAS) dataset and used it to train a predictor called PON-Sol. The predictor can distinguish both soly. decreasing and increasing variants from those not affecting soly. PON-Sol has normalized correct prediction ratio of 0.491 on cross-validation and 0.432 for independent test set. The performance of the method was compared both to soly. and aggregation predictors and found to be superior. PON-Sol can be used for the prediction of effects of disease-related substitutions, effects on heterologous recombinant protein expression and enhanced crystallizability. One application is to investigate effects of all possible AASs in a protein to aid protein engineering.
  
58.
  Broom, A.; Jacobi, Z.; Trainor, K.; Meiering, E. M. Computational Tools Help Improve Protein Stability but with a Solubility Tradeoff. J. Biol. Chem. 2017, 292, 14349–14361, DOI: 10.1074/jbc.M117.784165
  
  58
  Computational tools help improve protein stability but with a solubility tradeoff
  
  Broom, Aron; Jacobi, Zachary; Trainor, Kyle; Meiering, Elizabeth M.
  
  Journal of Biological Chemistry (2017), 292 (35), 14349-14361CODEN: JBCHA3; ISSN:0021-9258. (American Society for Biochemistry and Molecular Biology)
  
  Accurately predicting changes in protein stability upon amino acid substitution is a much sought after goal. Destabilizing mutations are often implicated in disease, whereas stabilizing mutations are of great value for industrial and therapeutic biotechnol. Increasing protein stability is an esp. challenging task, with random substitution yielding stabilizing mutations in only ∼2% of cases. To overcome this bottleneck, computational tools that aim to predict the effect of mutations have been developed; however, achieving accuracy and consistency remains challenging. Here, we combined 11 freely available tools into a meta-predictor (meieringlab.uwaterloo.ca/stabilitypredict/). Validation against ∼600 exptl. mutations indicated that our meta-predictor has improved performance over any of the individual tools. The meta-predictor was then used to recommend 10 mutations in a previously designed protein of moderate thermodn. stability, ThreeFoil. Exptl. characterization showed that four mutations increased protein stability and could be amplified through ThreeFoil's structural symmetry to yield several multiple mutants with >2-kcal/mol stabilization. By avoiding residues within functional ties, we could maintain ThreeFoil's glycan-binding capacity. Despite successfully achieving substantial stabilization, however, almost all mutations decreased protein soly., the most common cause of protein design failure. Examn. of the 600-mutation data set revealed that stabilizing mutations on the protein surface tend to increase hydrophobicity and that the individual tools favor this approach to gain stability. Thus, whereas currently available tools can increase protein stability and combining them into a meta-predictor yields enhanced reliability, improvements to the potentials/force fields underlying these tools are needed to avoid gaining protein stability at the cost of soly.
  
59.
  Cabantous, S.; Waldo, G. S. In Vivo and in Vitro Protein Solubility Assays Using Split GFP. Nat. Methods 2006, 3, 845–854, DOI: 10.1038/nmeth932
  
  59
  In vivo and in vitro protein solubility assays using split GFP
  
  Cabantous, Stephanie; Waldo, Geoffrey S.
  
  Nature Methods (2006), 3 (10), 845-854CODEN: NMAEA3; ISSN:1548-7091. (Nature Publishing Group)
  
  The rapid assessment of protein soly. is essential for evaluating expressed proteins and protein variants for use as reagents for downstream studies. Soly. screens based on antibody blots are complex and have limited screening capacity. Protein soly. screens using split β-galactosidase in vivo and in vitro can perturb protein folding. Split GFP used for monitoring protein interactions folds poorly, and to overcome this limitation, we recently developed a protein-tagging system based on self-complementing split GFP derived from an exceptionally well folded variant of GFP termed 'superfolder GFP'. Here we present the step-by-step procedure of the soly. assay using split GFP. A 15-amino-acid GFP fragment, GFP 11, is fused to a test protein. The GFP 1-10 detector fragment is expressed sep. These fragments assoc. spontaneously to form fluorescent GFP. The fragments are sol., and the GFP 11 tag has minimal effect on protein soly. and folding. We describe high-throughput protein soly. screens amenable both for in vivo and in vitro formats. The split-GFP system is composed of two vectors used in the same strain: pTET GFP 11 and pET GFP 1-10. The gene encoding the protein of interest is cloned into the pTET GFP 11 vector (resulting in an N-terminal fusion) and transformed into Escherichia coli BL21 (DE3) cells contg. the pET GFP 1-10 plasmid. We also describe how this system can be used for selecting sol. proteins from a library of variants. The large screening power of the in vivo assay combined with the high accuracy of the in vitro assay point to the efficiency of this two-step split-GFP tool for identifying sol. clones suitable for purifn. and downstream applications.
  
60.
  Niwa, T.; Ying, B.-W.; Saito, K.; Jin, W.; Takada, S.; Ueda, T.; Taguchi, H. Bimodal Protein Solubility Distribution Revealed by an Aggregation Analysis of the Entire Ensemble of Escherichia coli Proteins. Proc. Natl. Acad. Sci. U. S. A. 2009, 106, 4201–4206, DOI: 10.1073/pnas.0811922106
  
  60
  Bimodal protein solubility distribution revealed by an aggregation analysis of the entire ensemble of Escherichia coli proteins
  
  Niwa, Tatsuya; Ying, Bei-Wen; Saito, Katsu; Jin, Wen Zhen; Takada, Shoji; Ueda, Takuya; Taguchi, Hideki
  
  Proceedings of the National Academy of Sciences of the United States of America (2009), 106 (11), 4201-4206CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)
  
  Protein folding often competes with intermol. aggregation, which in most cases irreversibly impairs protein function, as exemplified by the formation of inclusion bodies. Although it has been empirically detd. that some proteins tend to aggregate, the relationship between the protein aggregation propensities and the primary sequences remains poorly understood. Here, the authors individually synthesized the entire ensemble of Escherichia coli proteins by using an in vitro reconstituted translation system and analyzed the aggregation propensities. Because the reconstituted translation system is chaperone-free, they could evaluate the inherent aggregation propensities of thousands of proteins in a translation-coupled manner. A histogram of the solubilities, based on data from 3,173 translated proteins, revealed a clear bimodal distribution, indicating that the aggregation propensities are not evenly distributed across a continuum. Instead, the proteins can be categorized into 2 groups, sol. and aggregation-prone proteins. The aggregation propensity is most prominently correlated with the structural classification of proteins, implying that the prediction of aggregation propensity requires structural information about the protein.
  
61.
  Eijsink, V. G.; Vriend, G.; van den Burg, B.; van der Zee, J. R.; Veltman, O. R.; Stulp, B. K.; Venema, G. Introduction of a Stabilizing 10 Residue Beta-Hairpin in Bacillus Subtilis Neutral Protease. Protein Eng., Des. Sel. 1992, 5, 157–163, DOI: 10.1093/protein/5.2.157
  
62.
  Lee, C.; Levitt, M. Accurate Prediction of the Stability and Activity Effects of Site-Directed Mutagenesis on a Protein Core. Nature 1991, 352, 448–451, DOI: 10.1038/352448a0
  
  Accurate prediction of the stability and activity effects of site-directed mutagenesis on a protein core
  
  Lee, Christopher; Levitt, Michael
  
  Nature (London, United Kingdom) (1991), 352 (6334), 448-51. CODEN: NATUAS; ISSN: 0028-0836.
  
  Theor. prediction of the structure, stability and activity of proteins, an important unsolved problem in mol. biol., would be of use for guiding site-directed mutagenesis and other protein-engineering techniques. X-ray diffraction studies have provided extensive structural information for many proteins, challenging theorists to develop reliable techniques able to use such knowledge as a base for prediction of mutants' characteristics. Here theor. calcn. of stabilization energies is reported for 78 triple-site sequence variants of λ repressor characterized exptl. The calcd. energies correlate with the mutants' measured activities; active and inactive mutations are discriminated with 92% reliability. They correlate even more directly with the mutant's thermostabilities, correctly identifying two of the mutants to be more stable than the wild type.
  
63.
  Buß, O.; Muller, D.; Jager, S.; Rudat, J.; Rabe, K. S. Improvement in the Thermostability of a β-Amino Acid Converting ω-Transaminase by Using FoldX. ChemBioChem 2018, 19, 379–387, DOI: 10.1002/cbic.201700467
  
  Improvement in the Thermostability of a β-Amino Acid Converting ω-Transaminase by Using FoldX
  
  Buss, Oliver; Muller, Delphine; Jager, Sven; Rudat, Jens; Rabe, Kersten S.
  
  ChemBioChem (2018), 19 (4), 379-387. CODEN: CBCHFX; ISSN: 1439-4227. (Wiley-VCH Verlag GmbH & Co. KGaA)
  
  ω-Transaminases (ω-TAs) are important biocatalysts for the synthesis of active, chiral pharmaceutical ingredients contg. amino groups, such as β-amino acids, which are important in peptidomimetics and as building blocks for drugs. However, the application of ω-TAs is limited by the availability and stability of enzymes with high conversion rates. One strategy for the synthesis and optical resoln. of β-phenylalanine and other important arom. β-amino acids is biotransformation by utilizing an ω-transaminase from Variovorax paradoxus. We designed variants of this ω-TA to gain higher process stability on the basis of predictions calcd. by using the FoldX software. We herein report the first thermostabilization of a nonthermostable S-selective ω-TA by FoldX-guided site-directed mutagenesis. The m.p. (Tm) of our best-performing mutant was increased to 59.3 °C, an increase of 4.0 °C relative to the Tm value of the wild-type enzyme, whereas the mutant fully retained its specific activity.
  
64.
  Modarres, H. P.; Mofrad, M. R.; Sanati-Nezhad, A. Protein Thermostability Engineering. RSC Adv. 2016, 6, 115252–115270, DOI: 10.1039/C6RA16992A
  
  Protein thermostability engineering
  
  Modarres, H. Pezeshgi; Mofrad, M. R.; Sanati-Nezhad, A.
  
  RSC Advances (2016), 6 (116), 115252-115270. CODEN: RSCACL; ISSN: 2046-2069. (Royal Society of Chemistry)
  
  The use of enzymes for industrial and biomedical applications is limited to their function at elevated temps. The principles of thermostability engineering need to be implemented for proteins with low thermal stability to broaden their applications. Therefore, understanding the thermal stability modulating factors of proteins is necessary for engineering their thermostability. In this review, first different thermostability enhancing strategies in both the sequence and structure levels, discovered by studying the natural proteins adapted to different conditions, are introduced. Next, the progress in the development of various computational methods to engineer thermostability of proteins by learning from nature and introducing several popular tools and algorithms for protein thermostability engineering is highlighted. Further discussion includes the challenges in the field of protein thermostability engineering such as the protein stability-activity trade-off. Finally, how thermostability engineering could be instrumental for the design of protein drugs for biomedical applications is demonstrated.
  
65.
  Pace, C. N.; Scholtz, J. M.; Grimsley, G. R. Forces Stabilizing Proteins. FEBS Lett. 2014, 588, 2177–2184, DOI: 10.1016/j.febslet.2014.05.006
  
66.
  Lazaridis, T.; Karplus, M. Effective Energy Functions for Protein Structure Prediction. Curr. Opin. Struct. Biol. 2000, 10, 139–145, DOI: 10.1016/S0959-440X(00)00063-4
  
  Effective energy functions for protein structure prediction
  
  Lazaridis, Themis; Karplus, Martin
  
  Current Opinion in Structural Biology (2000), 10 (2), 139-145. CODEN: COSBEF; ISSN: 0959-440X. (Elsevier Science Ltd.)
  
  A review, with 78 refs. Protein structure prediction, fold recognition, homol. modeling and design rely mainly on statistical effective energy functions. Although the theor. foundation of such functions is not clear, their usefulness has been demonstrated in many applications. Mol. mechanics force fields, particularly when augmented by implicit solvation models, provide phys. effective energy functions that are beginning to play a role in this area.
  
67.
  Seeliger, D.; de Groot, B. L. Protein Thermostability Calculations Using Alchemical Free Energy Simulations. Biophys. J. 2010, 98, 2309–2316, DOI: 10.1016/j.bpj.2010.01.051
  
  Protein thermostability calculations using alchemical free energy simulations
  
  Seeliger, Daniel; de Groot, Bert L.
  
  Biophysical Journal (2010), 98 (10), 2309-2316. CODEN: BIOJAU; ISSN: 0006-3495. (Cell Press)
  
  Thermal stability of proteins is crucial for both biotechnol. and therapeutic applications. Rational protein engineering therefore frequently aims at increasing thermal stability by introducing stabilizing mutations. The accurate prediction of the thermodn. consequences caused by mutations, however, is highly challenging as thermal stability changes are caused by alterations in the free energy of folding. Growing computational power, however, increasingly allows us to use alchem. free energy simulations, such as free energy perturbation or thermodn. integration, to calc. free energy differences with relatively high accuracy. In this article, we present an automated protocol for setting up alchem. free energy calcns. for mutations of naturally occurring amino acids (except for proline) that allows an unprecedented, automated screening of large mutant libraries. To validate the developed protocol, we calcd. thermodn. stability differences for 109 mutations in the microbial RNase Barnase. The obtained quant. agreement with exptl. data illustrates the potential of the approach in protein engineering and design.
  
68.
  Zhang, Z.; Wang, L.; Gao, Y.; Zhang, J.; Zhenirovskyy, M.; Alexov, E. Predicting Folding Free Energy Changes upon Single Point Mutations. Bioinformatics 2012, 28, 664–671, DOI: 10.1093/bioinformatics/bts005
  
  Predicting folding free energy changes upon single point mutations
  
  Zhang, Zhe; Wang, Lin; Gao, Yang; Zhang, Jie; Zhenirovskyy, Maxim; Alexov, Emil
  
  Bioinformatics (2012), 28 (5), 664-671. CODEN: BOINFP; ISSN: 1367-4803. (Oxford University Press)
  
  Motivation: The folding free energy is an important characteristic of proteins stability and is directly related to protein's wild-type function. The changes of protein's stability due to naturally occurring mutations, missense mutations, are typically causing diseases. Single point mutations made in vitro are frequently used to assess the contribution of given amino acid to the stability of the protein. In both cases, it is desirable to predict the change of the folding free energy upon single point mutations in order to either provide insights of the mol. mechanism of the change or to design new exptl. studies. Results: We report an approach that predicts the free energy change upon single point mutation by utilizing the 3D structure of the wild-type protein. It is based on variation of the mol. mechanics Generalized Born (MMGB) method, scaled with optimized parameters (sMMGB) and utilizing specific model of unfolded state. The corresponding mutations are built in silico and the predictions are tested against large dataset of 1109 mutations with exptl. measured changes of the folding free energy. Benchmarking resulted in root mean square deviation = 1.78 kcal/mol and slope of the linear regression fit between the exptl. data and the calcns. was 1.04. The sMMGB is compared with other leading methods of predicting folding free energy changes upon single mutations and results discussed with respect to various parameters. Availability: All the pdb files the authors used in this article can be downloaded from http://compbio.clemson.edu/downloadDir/mentaldisorders/sMMGBpdb.rar.
  
69.
  Wickstrom, L.; Gallicchio, E.; Levy, R. M. The Linear Interaction Energy Method for the Prediction of Protein Stability Changes Upon Mutation. Proteins: Struct., Funct., Genet. 2012, 80, 111–125, DOI: 10.1002/prot.23168
  
  The linear interaction energy method for the prediction of protein stability changes upon mutation
  
  Wickstrom, Lauren; Gallicchio, Emilio; Levy, Ronald M.
  
  Proteins: Structure, Function, and Bioinformatics (2012), 80 (1), 111-125. CODEN: PSFBAF. (Wiley-Liss, Inc.)
  
  The coupling of protein energetics and sequence changes is a crit. aspect of computational protein design, as well as for the understanding of protein evolution, human disease, and drug resistance. To study the mol. basis for this coupling, computational tools must be sufficiently accurate and computationally inexpensive enough to handle large amts. of sequence data. We have developed a computational approach based on the linear interaction energy (LIE) approxn. to predict the changes in the free-energy of the native state induced by a single mutation. This approach was applied to a set of 822 mutations in 10 proteins which resulted in an av. unsigned error of 0.82 kcal/mol and a correlation coeff. of 0.72 between the calcd. and exptl. ΔΔG values. The method is able to accurately identify destabilizing hot spot mutations; however, it has difficulty in distinguishing between stabilizing and destabilizing mutations because of the distribution of stability changes for the set of mutations used to parameterize the model. In addn., the model also performs quite well in initial tests on a small set of double mutations. On the basis of these promising results, we can begin to examine the relationship between protein stability and fitness, correlated mutations, and drug resistance.
  
70.
  Guerois, R.; Nielsen, J. E.; Serrano, L. Predicting Changes in the Stability of Proteins and Protein Complexes: A Study of More than 1000 Mutations. J. Mol. Biol. 2002, 320, 369–387, DOI: 10.1016/S0022-2836(02)00442-4
  
  Predicting changes in the stability of proteins and protein complexes: A study of more than 1000 mutations
  
  Guerois, Raphael; Nielsen, Jens Erik; Serrano, Luis
  
  Journal of Molecular Biology (2002), 320 (2), 369-387. CODEN: JMOBAK; ISSN: 0022-2836. (Elsevier Science Ltd.)
  
  We have developed a computer algorithm, FOLDEF (for FOLD-X energy function), to provide a fast and quant. estn. of the importance of the interactions contributing to the stability of proteins and protein complexes. The predictive power of FOLDEF was tested on a very large set of point mutants (1088 mutants) spanning most of the structural environments found in proteins. FOLDEF uses a full at. description of the structure of the proteins. The different energy terms taken into account in FOLDEF have been weighted using empirical data obtained from protein engineering expts. First, we considered a training database of 339 mutants in nine different proteins and optimized the set of parameters and weighting factors that best accounted for the changes in stability of the mutants. The predictive power of the method was then tested using a blind test mutant database of 667 mutants, as well as a database of 82 protein-protein complex mutants. The global correlation obtained for 95% of the entire mutant database (1030 mutants) is 0.83 with a std. deviation of 0.81 kcal mol⁻¹ and a slope of 0.76. The present energy function uses a min. of computational resources and can therefore easily be used in protein design algorithms, and in the field of protein structure and folding pathways prediction where one requires a fast and accurate energy function.
  
71.
  Mendes, J.; Guerois, R.; Serrano, L. Energy Estimation in Protein Design. Curr. Opin. Struct. Biol. 2002, 12, 441–446, DOI: 10.1016/S0959-440X(02)00345-7
  
  Energy estimation in protein design
  
  Mendes, Joaquim; Guerois, Raphael; Serrano, Luis
  
  Current Opinion in Structural Biology (2002), 12 (4), 441-446. CODEN: COSBEF; ISSN: 0959-440X. (Elsevier Science Ltd.)
  
  A review. The progress achieved by several groups in the field of computational protein design shows that successful design methods include two major features: efficient algorithms to deal with the combinatorial exploration of sequence space and optimal energy functions to rank sequences according to their fitness for the given fold.
  
72.
  Dehouck, Y.; Gilis, D.; Rooman, M. A New Generation of Statistical Potentials for Proteins. Biophys. J. 2006, 90, 4010–4017, DOI: 10.1529/biophysj.105.079434
  
  A new generation of statistical potentials for proteins
  
  Dehouck, Y.; Gilis, D.; Rooman, M.
  
  Biophysical Journal (2006), 90 (11), 4010-4017. CODEN: BIOJAU; ISSN: 0006-3495. (Biophysical Society)
  
  We propose a novel and flexible derivation scheme of statistical, database-derived, potentials, which allows one to take simultaneously into account specific correlations between several sequence and structure descriptors. This scheme leads to the decompn. of the total folding free energy of a protein into a sum of lower order terms, thereby giving the possibility to analyze independently each contribution and clarify its significance and importance, to avoid overcounting certain contributions, and to deal more efficiently with the limited size of the database. In addn., this derivation scheme appears as quite general, for many previously developed potentials can be expressed as particular cases of our formalism. We use this formalism as a framework to generate different residue-based energy functions, whose performances are assessed on the basis of their ability to discriminate genuine proteins from decoy models. The optimal potential is generated as a combination of several coupling terms, measuring correlations between residue types, backbone torsion angles, solvent accessibilities, relative positions along the sequence, and interresidue distances. This potential outperforms all tested residue-based potentials, and even several atom-based potentials. Its incorporation in algorithms aiming at predicting protein structure and stability should therefore substantially improve their performances.
  
73.
  Dehouck, Y.; Kwasigroch, J. M.; Gilis, D.; Rooman, M. PoPMuSiC 2.1: A Web Server for the Estimation of Protein Stability Changes upon Mutation and Sequence Optimality. BMC Bioinf. 2011, 12, 151, DOI: 10.1186/1471-2105-12-151
  
  PoPMuSiC 2.1: a web server for the estimation of protein stability changes upon mutation and sequence optimality
  
  Dehouck, Yves; Kwasigroch, Jean Marc; Gilis, Dimitri; Rooman, Marianne
  
  BMC Bioinformatics (2011), 12, 151.
  
  BACKGROUND: The rational design of modified proteins with controlled stability is of extreme importance in a whole range of applications, notably in the biotechnological and environmental areas, where proteins are used for their catalytic or other functional activities. Future breakthroughs in medical research may also be expected from an improved understanding of the effect of naturally occurring disease-causing mutations on the molecular level. RESULTS: PoPMuSiC-2.1 is a web server that predicts the thermodynamic stability changes caused by single site mutations in proteins, using a linear combination of statistical potentials whose coefficients depend on the solvent accessibility of the mutated residue. PoPMuSiC presents good prediction performances (correlation coefficient of 0.8 between predicted and measured stability changes, in cross validation, after exclusion of 10% outliers). It is moreover very fast, allowing the prediction of the stability changes resulting from all possible mutations in a medium size protein in less than a minute. This unique functionality is user-friendly implemented in PoPMuSiC and is particularly easy to exploit. Another new functionality of our server concerns the estimation of the optimality of each amino acid in the sequence, with respect to the stability of the structure. It may be used to detect structural weaknesses, i.e. clusters of non-optimal residues, which represent particularly interesting sites for introducing targeted mutations. This sequence optimality data is also expected to have significant implications in the prediction and the analysis of particular structural or functional protein regions. To illustrate the interest of this new functionality, we apply it to a dataset of known catalytic sites, and show that a much larger than average concentration of structural weaknesses is detected, quantifying how these sites have been optimized for function rather than stability. 
  CONCLUSION: The freely available PoPMuSiC-2.1 web server is highly useful for identifying very rapidly a list of possibly relevant mutations with the desired stability properties, on which subsequent experimental studies can be focused. It can also be used to detect sequence regions corresponding to structural weaknesses, which could be functionally important or structurally delicate regions, with obvious applications in rational protein design.
  
74.
  Liu, H. On Statistical Energy Functions for Biomolecular Modeling and Design. Quant. Biol. 2015, 3, 157–167, DOI: 10.1007/s40484-015-0054-x
  
  On statistical energy functions for biomolecular modeling and design
  
  Liu, Haiyan
  
  Quantitative Biology (2015), 3 (4), 157-167. CODEN: QBUIA3; ISSN: 2095-4697. (Springer GmbH)
  
  Statistical energy functions are general models about at. or residue-level interactions in biomols., derived from existing exptl. data. They provide quant. foundations for structural modeling as well as for structure-based protein sequence design. Statistical energy functions can be derived computationally either based on statistical distributions or based on variational assumptions. We present overviews on the theor. assumptions underlying the various types of approaches. Theor. considerations underlying important pragmatic choices are discussed.
  
75.
  Kumar, M. D. S.; Bava, K. A.; Gromiha, M. M.; Prabakaran, P.; Kitajima, K.; Uedaira, H.; Sarai, A. ProTherm and ProNIT: Thermodynamic Databases for Proteins and Protein–Nucleic Acid Interactions. Nucleic Acids Res. 2006, 34, D204–D206, DOI: 10.1093/nar/gkj103
  
  ProTherm and ProNIT: thermodynamic databases for proteins and protein-nucleic acid interactions
  
  Kumar, M. D. Shaji; Bava, K. Abdulla; Gromiha, M. Michael; Prabakaran, Ponraj; Kitajima, Koji; Uedaira, Hatsuho; Sarai, Akinori
  
  Nucleic Acids Research (2006), 34 (Database), D204-D206. CODEN: NARHAD; ISSN: 0305-1048. (Oxford University Press)
  
  ProTherm and ProNIT are two thermodn. databases that contain exptl. detd. thermodn. parameters of protein stability and protein-nucleic acid interactions, resp. The current versions of both the databases have considerably increased the total no. of entries and enhanced search interface with added new fields, improved search, display and sorting options. As on Sept. 2005, ProTherm release 5.0 contains 17 113 entries from 771 proteins, retrieved from 1497 scientific articles (∼20% increase in data from the previous version). ProNIT release 2.0 contains 4900 entries from 273 research articles, representing 158 proteins. Both databases can be queried using WWW interfaces. Both quick search and advanced search are provided on this web page to facilitate easy retrieval and display of the data from these databases. ProTherm is freely available online at http://gibk26.bse.kyutech.ac.jp/jouhou/Protherm/protherm.html and ProNIT at http://gibk26.bse.kyutech.ac.jp.jouhou/pronit/pronit.html.
  
76.
  Pucci, F.; Bourgeas, R.; Rooman, M. High-Quality Thermodynamic Data on the Stability Changes of Proteins Upon Single-Site Mutations. J. Phys. Chem. Ref. Data 2016, 45, 023104, DOI: 10.1063/1.4947493
  
  High-quality Thermodynamic Data on the Stability Changes of Proteins Upon Single-site Mutations
  
  Pucci, Fabrizio; Bourgeas, Raphael; Rooman, Marianne
  
  Journal of Physical and Chemical Reference Data (2016), 45 (2), 023104/1-023104/53. CODEN: JPCRBU; ISSN: 0047-2689. (American Institute of Physics)
  
  We have set up and manually curated a dataset contg. exptl. information on the impact of amino acid substitutions in a protein on its thermal stability. It consists of a repository of exptl. measured melting temps. (Tm) and their changes upon point mutations (ΔTm) for proteins having a well-resolved x-ray structure. This high-quality dataset is designed for being used for the training or benchmarking of in silico thermal stability prediction methods. It also reports other exptl. measured thermodn. quantities when available, i.e., the folding enthalpy (ΔH) and heat capacity (ΔCP) of the wild type proteins and their changes upon mutations (ΔΔH and ΔΔCP), as well as the change in folding free energy (ΔΔG) at a ref. temp. These data are analyzed in view of improving our insights into the correlation between thermal and thermodn. stabilities, the asymmetry between the no. of stabilizing and destabilizing mutations, and the difference in stabilization potential of thermostable vs. mesostable proteins. (c) 2016 American Institute of Physics.
  
77.
  Potapov, V.; Cohen, M.; Schreiber, G. Assessing Computational Methods for Predicting Protein Stability upon Mutation: Good on Average but Not in the Details. Protein Eng., Des. Sel. 2009, 22, 553–560, DOI: 10.1093/protein/gzp030
  
  Assessing computational methods for predicting protein stability upon mutation: good on average but not in the details
  
  Potapov, Vladimir; Cohen, Mati; Schreiber, Gideon
  
  Protein Engineering, Design & Selection (2009), 22 (9), 553-560. CODEN: PEDSBR; ISSN: 1741-0126. (Oxford University Press)
  
  Methods for protein modeling and design advanced rapidly in recent years. At the heart of these computational methods is an energy function that calcs. the free energy of the system. Many of these functions were also developed to est. the consequence of mutation on protein stability or binding affinity. In the current study, the authors chose 6 different methods that were previously reported as being able to predict the change in protein stability (ΔΔG) upon mutation: CC/PBSA, EGAD, FoldX, I-Mutant2.0, Rosetta and Hunter. The authors evaluated their performance on a large set of 2156 single mutations, avoiding for each program the mutations used for training. The correlation coeffs. between exptl. and predicted ΔΔG values were in the range of 0.59 for the best and 0.26 for the worst performing method. All the tested computational methods showed a correct trend in their predictions, but failed in providing the precise values. This is not due to lack in precision of the exptl. data, which showed a correlation coeff. of 0.86 between different measurements. Combining the methods did not significantly improve prediction accuracy compared to a single method. These results suggest that there is still room for improvement, which is crucial if we want forcefields to perform better in their various tasks.
  
78.
  Schymkowitz, J.; Borg, J.; Stricher, F.; Nys, R.; Rousseau, F.; Serrano, L. The FoldX Web Server: An Online Force Field. Nucleic Acids Res. 2005, 33, W382–W388, DOI: 10.1093/nar/gki387
  
  The FoldX web server: an online force field
  
  Schymkowitz, Joost; Borg, Jesper; Stricher, Francois; Nys, Robby; Rousseau, Frederic; Serrano, Luis
  
  Nucleic Acids Research (2005), 33 (Web Server), W382-W388. CODEN: NARHAD; ISSN: 0305-1048. (Oxford University Press)
  
  FoldX is an empirical force field that was developed for the rapid evaluation of the effect of mutations on the stability, folding and dynamics of proteins and nucleic acids. The core functionality of FoldX, namely the calcn. of the free energy of a macromol. based on its high-resoln. 3D structure, is now publicly available through a web server at http://foldx.embl.de/. The current release allows the calcn. of the stability of a protein, calcn. of the positions of the protons and the prediction of water bridges, prediction of metal binding sites and the anal. of the free energy of complex formation. Alanine scanning, the systematic truncation of side chains to alanine, is also included. In addn., some reporting functions have been added, and it is now possible to print both the at. interaction networks that constitute the protein, print the structural and energetic details of the interactions per atom or per residue, as well as generate a general quality report of the pdb structure. This core functionality will be further extended as more FoldX applications are developed.
  
79.
  Kepp, K. P. Towards a “Golden Standard” for Computing Globin Stability: Stability and Structure Sensitivity of Myoglobin Mutants. Biochim. Biophys. Acta, Proteins Proteomics 2015, 1854, 1239–1248, DOI: 10.1016/j.bbapap.2015.06.002
  
  Towards a "Golden Standard" for computing globin stability: Stability and structure sensitivity of myoglobin mutants
  
  Kepp, Kasper P.
  
  Biochimica et Biophysica Acta, Proteins and Proteomics (2015), 1854 (10, Part A), 1239-1248. CODEN: BBAPBW; ISSN: 1570-9639. (Elsevier B. V.)
  
  Fast and accurate computation of protein stability is increasingly important for e.g. protein engineering and protein misfolding diseases, but no consensus methods exist for important proteins such as globins, and performance may depend on the type of structural input given. This paper reports benchmarking of six protein stability calculators (POPMUSIC 2.1, I-Mutant 2.0, I-Mutant 3.0, CUPSAT, SDM, and mCSM) against 134 exptl. stability changes for mutations of sperm-whale myoglobin. Six different high-resoln. structures were used to test structure sensitivity that may impair protein calcns. The trend accuracy of the methods decreased as I-Mutant 2.0 (R = 0.64 - 0.65), SDM (R = 0.57 - 0.60), POPMUSIC2.1 (R = 0.54 - 0.57), I-Mutant 3.0 (R = 0.53 - 0.55), mCSM (R = 0.35 - 0.47), and CUPSAT (R = 0.25 - 0.48). The mean signed errors increased as SDM < CUPSAT < I-Mutant 2.0 < I-Mutant 3.0 < POPMUSIC 2.1 < mCSM. Mean abs. errors increased as I-Mutant 2.0 < I-Mutant 3.0 < POPMUSIC 2.1 < CUPSAT < SDM < mCSM. Structural sensitivity increased as I-Mutant 3.0 (0.05) < I-Mutant 2.0 (0.09) < POPMUSIC 2.1 (0.12) < SDM (0.18) < mCSM (0.27) < CUPSAT (0.58). Leaving out heterogeneous exptl. data did not change conclusions. The distinct performances reveal room for improvement, but I-Mutant 2.0 is proficient for this purpose, as further validated against a data set of related cytochrome c like proteins. The results also emphasize the importance of high-quality crystal structures and reveal structure-dependent effects even in the near-at. resoln. limit.
  
80. 80
  Christensen, N. J.; Kepp, K. P. Accurate Stabilities of Laccase Mutants Predicted with a Modified FoldX Protocol. J. Chem. Inf. Model. 2012, 52, 3028– 3042, DOI: 10.1021/ci300398z
  
  80
  Accurate Stabilities of Laccase Mutants Predicted with a Modified FoldX Protocol
  
  Christensen, Niels J.; Kepp, Kasper P.
  
  Journal of Chemical Information and Modeling (2012), 52 (11), 3028-3042CODEN: JCISD8; ISSN:1549-9596. (American Chemical Society)
  
  Fungal laccases are multicopper enzymes of industrial importance due to their high stability, multifunctionality, and oxidizing power. This paper reports computational protocols that quantify the relative stability (ΔΔG of folding) of mutants of high-redox-potential laccases (TvLIIIb and PM1L) with up to 11 simultaneously mutated sites with good correlation against exptl. stability trends. Mol. dynamics simulations of the two laccases show that FoldX is very structure-sensitive, since all mutants and the wild type must share structural configuration to avoid artifacts of local sampling. However, using the av. of 50 MD snapshots of the equilibrated trajectories restores correlation (r ∼ 0.7-0.9, r2 ∼ 0.49-0.81) and provides a root-mean-square accuracy of ∼1.2 kcal/mol for ΔΔG or 3.5 °C for T50, suggesting that the time-av. of the crystal structure is recovered. MD-averaged input also reduces the spread in ΔΔG, suggesting that local FoldX sampling overestimates free energy changes because of neglected protein relaxation. FoldX can be viewed as a simple "linear interaction energy" method using sampling of the wild type and mutant and a parametrized relative free energy function: Thus, we show in this work that a substantial "hysteresis" of ∼1 kcal/mol applies to FoldX, and that an improved protocol that reverses calcns. and uses the av. obtained ΔΔG enhances correlation with the exptl. data. As glycosylation is ignored in FoldX, its effect on ΔΔG must be additive to the amino acid mutations. Quant. structure-property relationships of the FoldX energy components produced a substantially improved laccase stability predictor with errors of ∼1 °C for T50, vs 3-5 °C for a std. FoldX protocol. The developed model provides insight into the phys. forces governing the high stability of fungal laccases, most notably the hydrophobic and van der Waals interactions in the folded state, which provide most of the predictive power.
  
81. 81
  MacKerell, A. D.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E.; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiórkiewicz-Kuczera, J.; Yin, D.; Karplus, M. All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins. J. Phys. Chem. B 1998, 102, 3586– 3616, DOI: 10.1021/jp973084f
  
  81
  All-Atom Empirical Potential for Molecular Modeling and Dynamics Studies of Proteins
  
  MacKerell, A. D., Jr.; Bashford, D.; Bellott, M.; Dunbrack, R. L.; Evanseck, J. D.; Field, M. J.; Fischer, S.; Gao, J.; Guo, H.; Ha, S.; Joseph-McCarthy, D.; Kuchnir, L.; Kuczera, K.; Lau, F. T. K.; Mattos, C.; Michnick, S.; Ngo, T.; Nguyen, D. T.; Prodhom, B.; Reiher, W. E., III; Roux, B.; Schlenkrich, M.; Smith, J. C.; Stote, R.; Straub, J.; Watanabe, M.; Wiorkiewicz-Kuczera, J.; Yin, D.; Karplus, M.
  
  Journal of Physical Chemistry B (1998), 102 (18), 3586-3616CODEN: JPCBFK; ISSN:1089-5647. (American Chemical Society)
  
  New protein parameters are reported for the all-atom empirical energy function in the CHARMM program. The parameter evaluation was based on a self-consistent approach designed to achieve a balance between the internal (bonding) and interaction (nonbonding) terms of the force field and among the solvent-solvent, solvent-solute, and solute-solute interactions. Optimization of the internal parameters used exptl. gas-phase geometries, vibrational spectra, and torsional energy surfaces supplemented with ab initio results. The peptide backbone bonding parameters were optimized with respect to data for N-methylacetamide and the alanine dipeptide. The interaction parameters, particularly the at. charges, were detd. by fitting ab initio interaction energies and geometries of complexes between water and model compds. that represented the backbone and the various side chains. In addn., dipole moments, exptl. heats and free energies of vaporization, solvation and sublimation, mol. vols., and crystal pressures and structures were used in the optimization. The resulting protein parameters were tested by applying them to noncyclic tripeptide crystals, cyclic peptide crystals, and the proteins crambin, bovine pancreatic trypsin inhibitor, and carbonmonoxy myoglobin in vacuo and in a crystal. A detailed anal. of the relationship between the alanine dipeptide potential energy surface and calcd. protein φ, χ angles was made and used in optimizing the peptide group torsional parameters. The results demonstrate that use of ab initio structural and energetic data by themselves are not sufficient to obtain an adequate backbone representation for peptides and proteins in soln. and in crystals. Extensive comparisons between mol. dynamics simulation and exptl. data for polypeptides and proteins were performed for both structural and dynamic properties. Calcd. data from energy minimization and dynamics simulations for crystals demonstrate that the latter are needed to obtain meaningful comparisons with exptl. crystal structures. The presented parameters, in combination with the previously published CHARMM all-atom parameters for nucleic acids and lipids, provide a consistent set for condensed-phase simulations of a wide variety of mols. of biol. interest.
  
82. 82
  Oostenbrink, C.; Villa, A.; Mark, A. E.; van Gunsteren, W. F. A Biomolecular Force Field Based on the Free Enthalpy of Hydration and Solvation: The GROMOS Force-Field Parameter Sets 53A5 and 53A6. J. Comput. Chem. 2004, 25, 1656– 1676, DOI: 10.1002/jcc.20090
  
  82
  A biomolecular force field based on the free enthalpy of hydration and solvation: The GROMOS force-field parameter sets 53A5 and 53A6
  
  Oostenbrink, Chris; Villa, Alessandra; Mark, Alan E.; van Gunsteren, Wilfred F.
  
  Journal of Computational Chemistry (2004), 25 (13), 1656-1676CODEN: JCCHDD; ISSN:0192-8651. (John Wiley & Sons, Inc.)
  
  Successive parameterizations of the GROMOS force field have been used successfully to simulate biomol. systems over a long period of time. The continuing expansion of computational power with time makes it possible to compute ever more properties for an increasing variety of mol. systems with greater precision. This has led to recurrent parameterizations of the GROMOS force field all aimed at achieving better agreement with exptl. data. Here we report the results of the latest, extensive reparameterization of the GROMOS force field. In contrast to the parameterization of other biomol. force fields, this parameterization of the GROMOS force field is based primarily on reproducing the free enthalpies of hydration and apolar solvation for a range of compds. This approach was chosen because the relative free enthalpy of solvation between polar and apolar environments is a key property in many biomol. processes of interest, such as protein folding, biomol. assocn., membrane formation, and transport over membranes. The newest parameter sets, 53A5 and 53A6, were optimized by first fitting to reproduce the thermodn. properties of pure liqs. of a range of small polar mols. and the solvation free enthalpies of amino acid analogs in cyclohexane (53A5). The partial charges were then adjusted to reproduce the hydration free enthalpies in water (53A6). Both parameter sets are fully documented, and the differences between these and previous parameter sets are discussed.
  
83. 83
  Alford, R. F.; Leaver-Fay, A.; Jeliazkov, J. R.; O’Meara, M. J.; DiMaio, F. P.; Park, H.; Shapovalov, M. V.; Renfrew, P. D.; Mulligan, V. K.; Kappel, K.; Labonte, J. W.; Pacella, M. S.; Bonneau, R.; Bradley, P.; Dunbrack, R. L.; Das, R.; Baker, D.; Kuhlman, B.; Kortemme, T.; Gray, J. J. The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design. J. Chem. Theory Comput. 2017, 13, 3031– 3048, DOI: 10.1021/acs.jctc.7b00125
  
  83
  The Rosetta All-Atom Energy Function for Macromolecular Modeling and Design
  
  Alford, Rebecca F.; Leaver-Fay, Andrew; Jeliazkov, Jeliazko R.; O'Meara, Matthew J.; DiMaio, Frank P.; Park, Hahnbeom; Shapovalov, Maxim V.; Renfrew, P. Douglas; Mulligan, Vikram K.; Kappel, Kalli; Labonte, Jason W.; Pacella, Michael S.; Bonneau, Richard; Bradley, Philip; Dunbrack, Roland L.; Das, Rhiju; Baker, David; Kuhlman, Brian; Kortemme, Tanja; Gray, Jeffrey J.
  
  Journal of Chemical Theory and Computation (2017), 13 (6), 3031-3048CODEN: JCTCCE; ISSN:1549-9618. (American Chemical Society)
  
  A review. Over the past decade, the Rosetta biomol. modeling suite has informed diverse biol. questions and engineering challenges ranging from interpretation of low-resoln. structural data to design of nanomaterials, protein therapeutics, and vaccines. Central to Rosetta's success is the energy function: a model parameterized from small mol. and x-ray crystal structure data used to approx. the energy assocd. with each biomol. conformation. This paper describes the math. models and phys. concepts that underlie the latest Rosetta Energy Function, REF15. Applying these concepts, the authors explain how to use Rosetta energies to identify and analyze the features of biomol. models. Finally, the authors discuss the latest advances in the energy function that extend capabilities from sol. proteins to also include membrane proteins, peptides contg. noncanonical amino acids, small mols., carbohydrates, nucleic acids, and other macromols.
  
84. 84
  Davey, J. A.; Damry, A. M.; Euler, C. K.; Goto, N. K.; Chica, R. A. Prediction of Stable Globular Proteins Using Negative Design with Non-Native Backbone Ensembles. Structure 2015, 23, 2011– 2021, DOI: 10.1016/j.str.2015.07.021
  
  84
  Prediction of Stable Globular Proteins Using Negative Design with Non-native Backbone Ensembles
  
  Davey, James A.; Damry, Adam M.; Euler, Christian K.; Goto, Natalie K.; Chica, Roberto A.
  
  Structure (Oxford, United Kingdom) (2015), 23 (11), 2011-2021CODEN: STRUE6; ISSN:0969-2126. (Elsevier Ltd.)
  
  Accurate predictions of protein stability have great potential to accelerate progress in computational protein design, yet the correlation of predicted and exptl. detd. stabilities remains a significant challenge. To address this problem, we have developed a computational framework based on neg. multistate design in which sequence energy is evaluated in the context of both native and non-native backbone ensembles. This framework was validated exptl. with the design of ten variants of streptococcal protein G domain β1 that retained the wild-type fold, and showed a very strong correlation between predicted and exptl. stabilities (R2 = 0.86). When applied to four different proteins spanning a range of fold types, similarly strong correlations were also obtained. Overall, the enhanced prediction accuracies afforded by this method pave the way for new strategies to facilitate the generation of proteins with novel functions by computational protein design.
  
85. 85
  Ó Conchúir, S.; Barlow, K. A.; Pache, R. A.; Ollikainen, N.; Kundert, K.; O’Meara, M. J.; Smith, C. A.; Kortemme, T. A Web Resource for Standardized Benchmark Datasets, Metrics, and Rosetta Protocols for Macromolecular Modeling and Design. PLoS One 2015, 10, e0130433, DOI: 10.1371/journal.pone.0130433
  
86. 86
  Trainor, K.; Broom, A.; Meiering, E. M. Exploring the Relationships between Protein Sequence, Structure and Solubility. Curr. Opin. Struct. Biol. 2017, 42, 136– 146, DOI: 10.1016/j.sbi.2017.01.004
  
  86
  Exploring the relationships between protein sequence, structure and solubility
  
  Trainor, Kyle; Broom, Aron; Meiering, Elizabeth M.
  
  Current Opinion in Structural Biology (2017), 42 (), 136-146CODEN: COSBEF; ISSN:0959-440X. (Elsevier Ltd.)
  
  A review. Aggregation can be thought of as a form of protein folding in which intermol. assocns. lead to the formation of large, insol. assemblies. Various types of aggregates can be differentiated by their internal structures and gross morphologies (e.g., fibrillar or amorphous), and the ability to accurately predict the likelihood of their formation by a given polypeptide is of great practical utility in the fields of biol. (including the study of disease), biotechnol., and biomaterials research. Here we review aggregation/soly. prediction methods and selected applications thereof. The development of increasingly sophisticated methods that incorporate knowledge of conformations possibly adopted by aggregating polypeptide monomers and predict the internal structure of aggregates is improving the accuracy of the predictions and continually expanding the range of applications.
  
87. 87
  Das, R. Four Small Puzzles That Rosetta Doesn’t Solve. PLoS One 2011, 6, e20044, DOI: 10.1371/journal.pone.0020044
  
  87
  Four small puzzles that Rosetta doesn't solve
  
  Das, Rhiju
  
  PLoS One (2011), 6 (5), e20044CODEN: POLNCL; ISSN:1932-6203. (Public Library of Science)
  
  A complete macromol. modeling package must be able to solve the simplest structure prediction problems. Despite recent successes in high resoln. structure modeling and design, the Rosetta software suite fares poorly on small protein and RNA puzzles, some as small as four residues. To illustrate these problems, this manuscript presents Rosetta results for four well-defined test cases: the 20-residue mini-protein Trp cage, an even smaller disulfide-stabilized conotoxin, the reactive loop of a serine protease inhibitor, and a UUCG RNA tetraloop. In contrast to previous Rosetta studies, several lines of evidence indicate that conformational sampling is not the major bottleneck in modeling these small systems. Instead, approxns. and omissions in the Rosetta all-atom energy function currently preclude discriminating exptl. obsd. conformations from de novo models at at. resoln. These mol. "puzzles" should serve as useful model systems for developers wishing to make foundational improvements to this powerful modeling suite.
  
88. 88
  Kellogg, E. H.; Leaver-Fay, A.; Baker, D. Role of Conformational Sampling in Computing Mutation-Induced Changes in Protein Structure and Stability. Proteins: Struct., Funct., Genet. 2011, 79, 830– 838, DOI: 10.1002/prot.22921
  
  88
  Role of conformational sampling in computing mutation-induced changes in protein structure and stability
  
  Kellogg, Elizabeth H.; Leaver-Fay, Andrew; Baker, David
  
  Proteins: Structure, Function, and Bioinformatics (2011), 79 (3), 830-838CODEN: PSFBAF. (Wiley-Liss, Inc.)
  
  The prediction of changes in protein stability and structure resulting from single amino acid substitutions is both a fundamental test of macromol. modeling methodol. and an important current problem as high throughput sequencing reveals sequence polymorphisms at an increasing rate. In principle, given the structure of a wild-type protein and a point mutation whose effects are to be predicted, an accurate method should recapitulate both the structural changes and the change in the folding-free energy. Here, we explore the performance of protocols which sample an increasing diversity of conformations. We find that surprisingly similar performances in predicting changes in stability are achieved using protocols that involve very different amts. of conformational sampling, provided that the resoln. of the force field is matched to the resoln. of the sampling method. Methods involving backbone sampling can in some cases closely recapitulate the structural changes accompanying mutations but not surprisingly tend to do more harm than good in cases where structural changes are negligible. Anal. of the outliers in the stability change calcns. suggests areas needing particular improvement; these include the balance between desolvation and the formation of favorable buried polar interactions, and unfolded state modeling.
  
89. 89
  Musil, M.; Stourac, J.; Bendl, J.; Brezovsky, J.; Prokop, Z.; Zendulka, J.; Martinek, T.; Bednar, D.; Damborsky, J. FireProt: Web Server for Automated Design of Thermostable Proteins. Nucleic Acids Res. 2017, 45, W393– W399, DOI: 10.1093/nar/gkx285
  
  89
  FireProt: web server for automated design of thermostable proteins
  
  Musil, Milos; Stourac, Jan; Bendl, Jaroslav; Brezovsky, Jan; Prokop, Zbynek; Zendulka, Jaroslav; Martinek, Tomas; Bednar, David; Damborsky, Jiri
  
  Nucleic Acids Research (2017), 45 (W1), W393-W399CODEN: NARHAD; ISSN:1362-4962. (Oxford University Press)
  
  There is a continuous interest in increasing proteins stability to enhance their usability in numerous biomedical and biotechnol. applications. A no. of in silico tools for the prediction of the effect of mutations on protein stability have been developed recently. However, only single-point mutations with a small effect on protein stability are typically predicted with the existing tools and have to be followed by laborious protein expression, purifn., and characterization. Here, the authors present FireProt, a web server for the automated design of multiple-point thermostable mutant proteins that combines structural and evolutionary information in its calcn. core. FireProt utilizes sixteen tools and three protein engineering strategies for making reliable protein designs. The server is complemented with interactive, easy-to-use interface that allows users to directly analyze and optionally modify designed thermostable mutants. FireProt is freely available at http://loschmidt.chemi.muni.cz/fireprot.
  
90. 90
  Bush, J.; Makhatadze, G. I. Statistical Analysis of Protein Structures Suggests That Buried Ionizable Residues in Proteins Are Hydrogen Bonded or Form Salt Bridges. Proteins: Struct., Funct., Genet. 2011, 79, 2027– 2032, DOI: 10.1002/prot.23067
  
  90
  Statistical analysis of protein structures suggests that buried ionizable residues in proteins are hydrogen bonded or form salt bridges
  
  Bush, Jeffrey; Makhatadze, George I.
  
  Proteins: Structure, Function, and Bioinformatics (2011), 79 (7), 2027-2032CODEN: PSFBAF. (Wiley-Liss, Inc.)
  
  It is well known that nonpolar residues are largely buried in the interior of proteins, whereas polar and ionizable residues tend to be more localized on the protein surface where they are solvent-exposed. Such a distribution of residues between surface and interior is well understood from a thermodn. point: nonpolar side-chains are excluded from contact with solvent water, whereas polar and ionizable groups have favorable interactions with water and thus are preferred at the protein surface. However, there is an increasing amt. of information suggesting that polar and ionizable residues do occur in the protein core, including at positions that have no known functional importance. This is inconsistent with the observations that dehydration of polar and in particular ionizable groups is very energetically unfavorable. To resolve this, the authors performed a detailed anal. of the distribution of fractional burial of polar and ionizable residues using a large set of ∼2600 non-homologous protein structures. The authors showed that when ionizable residues were fully buried, the vast majority of them formed H-bonds and/or salt bridges with other polar/ionizable groups. This observation resolved an apparent contradiction: the energetic penalty of dehydration of polar/ionizable groups is paid off by the favorable energy of H-bonding and/or salt bridge formation in the protein interior. This conclusion agrees well with previous findings based on continuum models for electrostatic interactions in proteins.
  
91. 91
  Stranges, P. B.; Kuhlman, B. A Comparison of Successful and Failed Protein Interface Designs Highlights the Challenges of Designing Buried Hydrogen Bonds. Protein Sci. 2013, 22, 74– 82, DOI: 10.1002/pro.2187
  
  91
  A comparison of successful and failed protein interface designs highlights the challenges of designing buried hydrogen bonds
  
  Stranges, P. Benjamin; Kuhlman, Brian
  
  Protein Science (2013), 22 (1), 74-82CODEN: PRCIEI; ISSN:1469-896X. (Wiley-Blackwell)
  
  The accurate design of new protein-protein interactions is a longstanding goal of computational protein design. However, most computationally designed interfaces fail to form exptl. This investigation compares five previously described successful de novo interface designs with 158 failures. Both sets of proteins were designed with the mol. modeling program Rosetta. Designs were considered a success if a high-resoln. crystal structure of the complex closely matched the design model and the equil. dissocn. const. for binding was less than 10 μM. The successes and failures represent a wide variety of interface types and design goals including heterodimers, homodimers, peptide-protein interactions, one-sided designs (i.e., where only one of the proteins was mutated) and two-sided designs. The most striking feature of the successful designs is that they have fewer polar atoms at their interfaces than many of the failed designs. Designs that attempted to create extensive sets of interface-spanning hydrogen bonds resulted in no detectable binding. In contrast, polar atoms make up more than 40% of the interface area of many natural dimers, and native interfaces often contain extensive hydrogen bonding networks. These results suggest that Rosetta may not be accurately balancing hydrogen bonding and electrostatic energies against desolvation penalties and that design processes may not include sufficient sampling to identify side chains in preordered conformations that can fully satisfy the hydrogen bonding potential of the interface.
  
92. 92
  Beerens, K.; Mazurenko, S.; Kunka, A.; Marques, S. M.; Hansen, N.; Musil, M.; Chaloupkova, R.; Waterman, J.; Brezovsky, J.; Bednar, D.; Prokop, Z.; Damborsky, J. Evolutionary Analysis As a Powerful Complement to Energy Calculations for Protein Stabilization. ACS Catal. 2018, 8, 9420– 9428, DOI: 10.1021/acscatal.8b01677
  
  92
  Evolutionary Analysis As a Powerful Complement to Energy Calculations for Protein Stabilization
  
  Beerens, Koen; Mazurenko, Stanislav; Kunka, Antonin; Marques, Sergio M.; Hansen, Niels; Musil, Milos; Chaloupkova, Radka; Waterman, Jitka; Brezovsky, Jan; Bednar, David; Prokop, Zbynek; Damborsky, Jiri
  
  ACS Catalysis (2018), 8 (10), 9420-9428CODEN: ACCACS; ISSN:2155-5435. (American Chemical Society)
  
  Stability is one of the most important characteristics of proteins employed as biocatalysts, biotherapeutics and biomaterials, and the role of computational approaches in modifying protein stability is rapidly expanding. We have recently identified stabilizing mutations in haloalkane dehalogenase DhaA using phylogenetic anal. but were not able to reproduce the effects of these mutations using force-field calcns. Here we tested four different hypotheses to explain the mol. basis of stabilization using structural, biochem., biophys. and computational analyses. We demonstrate that stabilization of DhaA by the mutations identified using the phylogenetic anal. is driven by both entropy and enthalpy-contributions, in contrast to primarily enthalpy-driven stabilization by mutations designed by the force-field calcns. Comprehensive bioinformatics anal. revealed that more than half (53%) of 1,099 evolution-based stabilizing mutations would be evaluated as de-stabilizing by force-field calcns. Thermodn. integration considers both folded and unfolded states and can describe the entropic component of stabilization, yet it is not suitable for predictive purposes due to computational demands. Altogether, our results strongly suggest that energetic calcns. should be complemented by a phylogenetic anal. in protein stabilization endeavors.
  
93. 93
  Wijma, H. J.; Floor, R. J.; Jekel, P. A.; Baker, D.; Marrink, S. J.; Janssen, D. B. Computationally Designed Libraries for Rapid Enzyme Stabilization. Protein Eng., Des. Sel. 2014, 27, 49– 58, DOI: 10.1093/protein/gzt061
  
  93
  Computationally designed libraries for rapid enzyme stabilization
  
  Wijma, Hein J.; Floor, Robert J.; Jekel, Peter A.; Baker, David; Marrink, Siewert J.; Janssen, Dick B.
  
  Protein Engineering, Design & Selection (2014), 27 (2), 49-58CODEN: PEDSBR; ISSN:1741-0126. (Oxford University Press)
  
  The ability to engineer enzymes and other proteins to any desired stability would have wide-ranging applications. Here, we demonstrate that computational design of a library with chem. diverse stabilizing mutations allows the engineering of drastically stabilized and fully functional variants of the mesostable enzyme limonene epoxide hydrolase. First, point mutations were selected if they significantly improved the predicted free energy of protein folding. Disulfide bonds were designed using sampling of backbone conformational space, which tripled the no. of exptl. stabilizing disulfide bridges. Next, orthogonal in silico screening steps were used to remove chem. unreasonable mutations and mutations that are predicted to increase protein flexibility. The resulting library of 64 variants was exptl. screened, which revealed 21 (pairs of) stabilizing mutations located both in relatively rigid and in flexible areas of the enzyme. Finally, combining 10-12 of these confirmed mutations resulted in multi-site mutants with an increase in apparent melting temp. from 50 to 85°C, enhanced catalytic activity, preserved regioselectivity and a >250-fold longer half-life. The developed Framework for Rapid Enzyme Stabilization by Computational libraries (FRESCO) requires far less screening than conventional directed evolution.
  
94. 94
  Thiltgen, G.; Goldstein, R. A. Assessing Predictors of Changes in Protein Stability upon Mutation Using Self-Consistency. PLoS One 2012, 7, e46084, DOI: 10.1371/journal.pone.0046084
  
  94
  Assessing predictors of changes in protein stability upon mutation using self-consistency
  
  Thiltgen, Grant; Goldstein, Richard A.
  
  PLoS One (2012), 7 (10), e46084CODEN: POLNCL; ISSN:1932-6203. (Public Library of Science)
  
  The ability to predict the effect of mutations on protein stability is important for a wide range of tasks, from protein engineering to assessing the impact of SNPs to understanding basic protein biophysics. A no. of methods have been developed that make these predictions, but assessing the accuracy of these tools is difficult given the limitations and inconsistencies of the exptl. data. We evaluate four different methods based on the ability of these methods to generate consistent results for forward and back mutations and examine how this ability varies with the nature and location of the mutation. We find that, while one method seems to outperform the others, the ability of these methods to make accurate predictions is limited.
  
95. 95
  Buß, O.; Rudat, J.; Ochsenreither, K. FoldX as Protein Engineering Tool: Better Than Random Based Approaches? Comput. Struct. Biotechnol. J. 2018, 16, 25– 33, DOI: 10.1016/j.csbj.2018.01.002
  
  95
  FoldX as Protein Engineering Tool: Better Than Random Based Approaches?
  
  Buss, Oliver; Rudat, Jens; Ochsenreither, Katrin
  
  Computational and Structural Biotechnology Journal (2018), 16 (), 25-33CODEN: CSBJAC; ISSN:2001-0370. (Elsevier B.V.)
  
  Improving protein stability is an important goal for basic research as well as for clin. and industrial applications but no commonly accepted and widely used strategy for efficient engineering is known. Beside random approaches like error prone PCR or phys. techniques to stabilize proteins, e.g. by immobilization, in silico approaches are gaining more attention to apply target-oriented mutagenesis. In this review different algorithms for the prediction of beneficial mutation sites to enhance protein stability are summarized and the advantages and disadvantages of FoldX are highlighted. The question whether the prediction of mutation sites by the algorithm FoldX is more accurate than random based approaches is addressed.
  
96. 96
  Allen, B. D.; Nisthal, A.; Mayo, S. L. Experimental Library Screening Demonstrates the Successful Application of Computational Protein Design to Large Structural Ensembles. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 19838– 19843, DOI: 10.1073/pnas.1012985107
  
  96
  Experimental library screening demonstrates the successful application of computational protein design to large structural ensembles
  
  Allen, Benjamin D.; Nisthal, Alex; Mayo, Stephen L.
  
  Proceedings of the National Academy of Sciences of the United States of America (2010), 107 (46), 19838-19843, S19838/1-S19838/8CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)
  
  The stability, activity, and soly. of a protein sequence are detd. by a delicate balance of mol. interactions in a variety of conformational states. Even so, most computational protein design methods model sequences in the context of a single native conformation. Simulations that model the native state as an ensemble have been mostly neglected due to the lack of sufficiently powerful optimization algorithms for multistate design. Here, we have applied our multistate design algorithm to study the potential utility of various forms of input structural data for design. To facilitate more thorough anal., we developed new methods for the design and high-throughput stability detn. of combinatorial mutation libraries based on protein design calcns. The application of these methods to the core design of a small model system produced many variants with improved thermodn. stability and showed that multistate design methods can be readily applied to large structural ensembles. We found that exhaustive screening of our designed libraries helped to clarify several sources of simulation error that would have otherwise been difficult to ascertain. Interestingly, the lack of correlation between our simulated and exptl. measured stability values shows clearly that a design procedure need not reproduce exptl. data exactly to achieve success. This surprising result suggests potentially fruitful directions for the improvement of computational protein design technol.
  
97.
  Barlow, K. A.; Ó Conchúir, S.; Thompson, S.; Suresh, P.; Lucas, J. E.; Heinonen, M.; Kortemme, T. Flex ddG: Rosetta Ensemble-Based Estimation of Changes in Protein-Protein Binding Affinity upon Mutation. J. Phys. Chem. B 2018, 122, 5389–5399, DOI: 10.1021/acs.jpcb.7b11367

  Flex ddG: Rosetta Ensemble-Based Estimation of Changes in Protein-Protein Binding Affinity upon Mutation

  Barlow, Kyle A.; Conchuir, Shane O.; Thompson, Samuel; Suresh, Pooja; Lucas, James E.; Heinonen, Markus; Kortemme, Tanja

  Journal of Physical Chemistry B (2018), 122 (21), 5389-5399. CODEN: JPCBFK; ISSN: 1520-5207. (American Chemical Society)
  
  Computationally modeling changes in binding free energies upon mutation (interface ΔΔG) allows large-scale prediction and perturbation of protein-protein interactions. Addnl., methods that consider and sample relevant conformational plasticity should be able to achieve higher prediction accuracy over methods that do not. To test this hypothesis, the authors developed a method within the Rosetta macromol. modeling suite (flex ddG) that samples conformational diversity using "backrub" to generate an ensemble of models and then applies torsion minimization, side chain repacking, and averaging across this ensemble to est. interface ΔΔG values. The authors tested the method on a curated benchmark set of 1240 mutants, and found the method outperformed existing methods that sampled conformational space to a lesser degree. The authors obsd. considerable improvements with flex ddG over existing methods on the subset of small side chain to large side chain mutations, as well as for multiple simultaneous nonalanine mutations, stabilizing mutations, and mutations in antibody-antigen interfaces. Finally, the authors applied a generalized additive model (GAM) approach to the Rosetta energy function; the resulting nonlinear reweighting model improved the agreement with exptl. detd. interface ΔΔG values but also highlighted the necessity of future energy function improvements.
  
98.
  Ludwiczak, J.; Jarmula, A.; Dunin-Horkawicz, S. Combining Rosetta with Molecular Dynamics (MD): A Benchmark of the MD-Based Ensemble Protein Design. J. Struct. Biol. 2018, 203, 54–61, DOI: 10.1016/j.jsb.2018.02.004

  Combining Rosetta with molecular dynamics (MD): A benchmark of the MD-based ensemble protein design

  Ludwiczak, Jan; Jarmula, Adam; Dunin-Horkawicz, Stanislaw

  Journal of Structural Biology (2018), 203 (1), 54-61. CODEN: JSBIEM; ISSN: 1047-8477. (Elsevier Inc.)
  
  Computational protein design is a set of procedures for computing amino acid sequences that will fold into a specified structure. Rosetta Design, a commonly used software for protein design, allows for the effective identification of sequences compatible with a given backbone structure, while mol. dynamics (MD) simulations can thoroughly sample near-native conformations. We benchmarked a procedure in which Rosetta design is started on MD-derived structural ensembles and showed that such a combined approach generates 20-30% more diverse sequences than currently available methods with only a slight increase in computation time. Importantly, the increase in diversity is achieved without a loss in the quality of the designed sequences assessed by their resemblance to natural sequences. We demonstrate that the MD-based procedure is also applicable to de novo design tasks started from backbone structures without any sequence information. In addn., we implemented a protocol that can be used to assess the stability of designed models and to select the best candidates for exptl. validation. In sum our results demonstrate that the MD ensemble-based flexible backbone design can be a viable method for protein design, esp. for tasks that require a large pool of diverse sequences.
  
99.
  Davis, I. W.; Arendall, W. B.; Richardson, D. C.; Richardson, J. S. The Backrub Motion: How Protein Backbone Shrugs When a Sidechain Dances. Structure 2006, 14, 265–274, DOI: 10.1016/j.str.2005.10.007

  The Backrub Motion: How Protein Backbone Shrugs When a Sidechain Dances

  Davis, Ian W.; Arendall, W. Bryan; Richardson, David C.; Richardson, Jane S.

  Structure (Cambridge, MA, United States) (2006), 14 (2), 265-274. CODEN: STRUE6; ISSN: 0969-2126. (Cell Press)
  
  Surprisingly, the frozen structures from ultra-high-resoln. protein crystallog. reveal a prevalent, but subtle, mode of local backbone motion coupled to much larger, two-state changes of sidechain conformation. This "backrub" motion provides an influential and common type of local plasticity in protein backbone. Concerted reorientation of two adjacent peptides swings the central sidechain perpendicular to the chain direction, changing accessible sidechain conformations while leaving flanking structure undisturbed. Alternate conformations in sub-1 Å crystal structures show backrub motions for two-thirds of the significant Cβ shifts and 3% of the total residues in these proteins (126/3882), accompanied by two-state changes in sidechain rotamer. The Backrub modeling tool is effective in crystallog. rebuilding. For homol. modeling or protein redesign, backrubs can provide realistic, small perturbations to rigid backbones. For large sidechain changes in protein dynamics or for single mutations, backrubs allow backbone accommodation while maintaining H bonds and ideal geometry.
  
100.
  Wei, G.; Xi, W.; Nussinov, R.; Ma, B. Protein Ensembles: How Does Nature Harness Thermodynamic Fluctuations for Life? The Diverse Functional Roles of Conformational Ensembles in the Cell. Chem. Rev. 2016, 116, 6516–6551, DOI: 10.1021/acs.chemrev.5b00562

  Protein Ensembles: How Does Nature Harness Thermodynamic Fluctuations for Life? The Diverse Functional Roles of Conformational Ensembles in the Cell

  Wei, Guanghong; Xi, Wenhui; Nussinov, Ruth; Ma, Buyong

  Chemical Reviews (Washington, DC, United States) (2016), 116 (11), 6516-6551. CODEN: CHREAY; ISSN: 0009-2665. (American Chemical Society)
  
  All sol. proteins populate conformational ensembles that together constitute the native state. Their fluctuations in water are intrinsic thermodn. phenomena, and the distributions of the states on the energy landscape are detd. by statistical thermodn.; however, they are optimized to perform their biol. functions. In this review we briefly describe advances in free energy landscape studies of protein conformational ensembles. Exptl. (NMR, small-angle X-ray scattering, single-mol. spectroscopy, and cryo-electron microscopy) and computational (replica-exchange mol. dynamics, metadynamics, and Markov state models) approaches have made great progress in recent years. These address the challenging characterization of the highly flexible and heterogeneous protein ensembles. We focus on structural aspects of protein conformational distributions, from collective motions of single- and multi-domain proteins, intrinsically disordered proteins, to multiprotein complexes. Importantly, we highlight recent studies that illustrate functional adjustment of protein conformational ensembles in the crowded cellular environment. We center on the role of the ensemble in recognition of small- and macro-mols. (protein and RNA/DNA) and emphasize emerging concepts of protein dynamics in enzyme catalysis. Overall, protein ensembles link fundamental physicochem. principles and protein behavior and the cellular network and its regulation.
  
101.
  Fan, H.; Mark, A. E. Relative Stability of Protein Structures Determined by X-Ray Crystallography or NMR Spectroscopy: A Molecular Dynamics Simulation Study. Proteins: Struct., Funct., Genet. 2003, 53, 111–120, DOI: 10.1002/prot.10496

  Relative stability of protein structures determined by X-ray crystallography or NMR spectroscopy: A molecular dynamics simulation study

  Fan, Hao; Mark, Alan E.

  Proteins: Structure, Function, and Genetics (2003), 53 (1), 111-120. CODEN: PSFGEY; ISSN: 0887-3585. (Wiley-Liss, Inc.)
  
  The relative stability of protein structures detd. by either x-ray crystallog. or NMR spectroscopy has been investigated by using mol. dynamics simulation techniques. Published structures of 34 proteins contg. between 50 and 100 residues have been evaluated. The proteins selected represent a mixt. of secondary structure types including all α, all β, and α/β. The proteins selected do not contain cysteine-cysteine bridges. In addn., any crystallog. waters, metal ions, cofactors, or bound ligands were removed before the systems were simulated. The stability of the structures was evaluated by simulating, under identical conditions, each of the proteins for at least 5 ns in explicit solvent. It is found that not only do NMR-derived structures have, on av., higher internal strain than structures detd. by x-ray crystallog. but that a significant proportion of the structures are unstable and rapidly diverge in simulations.
  
102.
  Kuzmanic, A.; Pannu, N. S.; Zagrovic, B. X-Ray Refinement Significantly Underestimates the Level of Microscopic Heterogeneity in Biomolecular Crystals. Nat. Commun. 2014, 5, 3220, DOI: 10.1038/ncomms4220

  X-ray refinement significantly underestimates the level of microscopic heterogeneity in biomolecular crystals

  Kuzmanic, Antonija; Zagrovic, Bojan; Pannu, Navraj S.

  Nature Communications (2014), 5, 3220.
  
  Biomolecular X-ray structures typically provide a static, time- and ensemble-averaged view of molecular ensembles in crystals. In the absence of rigid-body motions and lattice defects, B-factors are thought to accurately reflect the structural heterogeneity of such ensembles. In order to study the effects of averaging on B-factors, we employ molecular dynamics simulations to controllably manipulate microscopic heterogeneity of a crystal containing 216 copies of villin headpiece. Using average structure factors derived from simulation, we analyse how well this heterogeneity is captured by high-resolution molecular-replacement-based model refinement. We find that both isotropic and anisotropic refined B-factors often significantly deviate from their actual values known from simulation: even at high 1.0 Å resolution and Rfree of 5.9%, B-factors of some well-resolved atoms underestimate their actual values even sixfold. Our results suggest that conformational averaging and inadequate treatment of correlated motion considerably influence estimation of microscopic heterogeneity via B-factors, and invite caution in their interpretation.
  
103.
  Karshikoff, A.; Nilsson, L.; Ladenstein, R. Rigidity versus Flexibility: The Dilemma of Understanding Protein Thermal Stability. FEBS J. 2015, 282, 3899–3917, DOI: 10.1111/febs.13343

  Rigidity versus flexibility: the dilemma of understanding protein thermal stability

  Karshikoff, Andrey; Nilsson, Lennart; Ladenstein, Rudolf

  FEBS Journal (2015), 282 (20), 3899-3917. CODEN: FJEOAC; ISSN: 1742-464X. (Wiley-Blackwell)
  
  A review. The role of fluctuations in protein thermostability has recently received considerable attention. In the current literature a dualistic picture can be found as follows. On one hand, thermostability seems to be assocd. with enhanced rigidity of the protein scaffold in parallel with the redn. of flexible parts of the structure. However, in contrast with this argument it has been shown by exptl. studies and computer simulation that thermal tolerance of a protein is not necessarily correlated with the suppression of internal fluctuations and mobility. Both concepts - i.e., rigidity and flexibility - are derived from a mech. engineering perspective and represent temporally insensitive features describing static properties and neglect the notion that relative motion at certain time scales is possible in structurally stable regions of a protein. This suggests that a strict sepn. of rigid and flexible parts of a protein mol. does not correctly describe the reality of the situation. In this work the concepts of mobility/flexibility vs. rigidity will be critically reconsidered by taking into account mol. dynamics calcns. of heat capacity and conformational entropy, salt bridge networks, electrostatic interactions in folded and unfolded states, and the emerging picture of protein thermostability in view of recently developed network theories. Last, but not least, the influence of high temp. on the active site and activity of enzymes will be considered.
  
104.
  Der, B. S.; Kluwe, C.; Miklos, A. E.; Jacak, R.; Lyskov, S.; Gray, J. J.; Georgiou, G.; Ellington, A. D.; Kuhlman, B. Alternative Computational Protocols for Supercharging Protein Surfaces for Reversible Unfolding and Retention of Stability. PLoS One 2013, 8, e64363, DOI: 10.1371/journal.pone.0064363

  Alternative computational protocols for supercharging protein surfaces for reversible unfolding and retention of stability

  Der, Bryan S.; Kluwe, Christien; Miklos, Aleksandr E.; Jacak, Ron; Lyskov, Sergey; Gray, Jeffrey J.; Georgiou, George; Ellington, Andrew D.; Kuhlman, Brian

  PLoS One (2013), 8 (5), e64363. CODEN: POLNCL; ISSN: 1932-6203. (Public Library of Science)
  
  Reengineering protein surfaces to exhibit high net charge, referred to as "supercharging", can improve reversibility of unfolding by preventing aggregation of partially unfolded states. Incorporation of charged side chains should be optimized while considering structural and energetic consequences, as numerous mutations and accumulation of like-charges can also destabilize the native state. A previously demonstrated approach deterministically mutates flexible polar residues (amino acids DERKNQ) with the fewest av. neighboring atoms per side chain atom (AvNAPSA). Our approach uses Rosetta-based energy calcns. to choose the surface mutations. Both protocols are available for use through the ROSIE web server. The automated Rosetta and AvNAPSA approaches for supercharging choose dissimilar mutations, raising an interesting division in surface charging strategy. Rosetta-supercharged variants of GFP (RscG) ranging from -11 to -61 and +7 to +58 were exptl. tested, and for comparison, we re-tested the previously developed AvNAPSA-supercharged variants of GFP (AscG) with +36 and -30 net charge. Mid-charge variants demonstrated ∼3-fold improvement in refolding with retention of stability. However, as we pushed to higher net charges, expression and sol. yield decreased, indicating that net charge or mutational load may be limiting factors. Interestingly, the two different approaches resulted in GFP variants with similar refolding properties. Our results show that there are multiple sets of residues that can be mutated to successfully supercharge a protein, and combining alternative supercharge protocols with exptl. testing can be an effective approach for charge-based improvement to refolding.
  
105.
  Chan, P.; Curtis, R. A.; Warwicker, J. Soluble Expression of Proteins Correlates with a Lack of Positively-Charged Surface. Sci. Rep. 2013, 3, 3333, DOI: 10.1038/srep03333

  Soluble expression of proteins correlates with a lack of positively-charged surface

  Chan, Pedro; Curtis, Robin A.; Warwicker, Jim

  Scientific Reports (2013), 3, 3333.
  
  Prediction of protein solubility is gaining importance with the growing use of protein molecules as therapeutics, and ongoing requirements for high level expression. We have investigated protein surface features that correlate with insolubility. Non-polar surface patches associate to some degree with insolubility, but this is far exceeded by the association with positively-charged patches. Negatively-charged patches do not separate insoluble/soluble subsets. The separation of soluble and insoluble subsets by positive charge clustering (area under the curve for a ROC plot is 0.85) has a striking parallel with the separation that delineates nucleic acid-binding proteins, although most of the insoluble dataset are not known to bind nucleic acid. Additionally, these basic patches are enriched for arginine, relative to lysine. The results are discussed in the context of expression systems and downstream processing, contributing to a view of protein solubility in which the molecular interactions of charged groups are far from equivalent.
  
106.
  Rezaie, E.; Mohammadi, M.; Sakhteman, A.; Bemani, P.; Ahrari, S. Application of Molecular Dynamics Simulations To Design a Dual-Purpose Oligopeptide Linker Sequence for Fusion Proteins. J. Mol. Model. 2018, 24, 313, DOI: 10.1007/s00894-018-3846-x

  Application of molecular dynamics simulations to design a dual-purpose oligopeptide linker sequence for fusion proteins

  Rezaie, Ehsan; Mohammadi, Mozafar; Sakhteman, Amirhossein; Bemani, Peyman; Ahrari, Sajjad

  Journal of Molecular Modeling (2018), 24 (11), 313.
  
  Proteins are often monitored by combining a fluorescent polypeptide tag with the target protein. However, due to the high molecular weight and immunogenicity of such tags, they are not suitable choices for combining with fusion proteins such as immunotoxins. In this study, we designed a polypeptide sequence with a dual role (it acts as both a linker and a fluorescent probe) to use with fusion proteins. Two common fluorescent tag sequences based on tetracysteine were compared to a commonly used rigid linker as well as our proposed dual-purpose sequence. Computational investigations showed that the dual-purpose sequence was structurally stable and may be a good choice to use as both a linker and a fluorescence marker between two moieties in a fusion protein.
  
107.
  Folkman, L.; Stantic, B.; Sattar, A.; Zhou, Y. EASE-MM: Sequence-Based Prediction of Mutation-Induced Stability Changes with Feature-Based Multiple Models. J. Mol. Biol. 2016, 428, 1394–1405, DOI: 10.1016/j.jmb.2016.01.012

  EASE-MM: Sequence-Based Prediction of Mutation-Induced Stability Changes with Feature-Based Multiple Models

  Folkman, Lukas; Stantic, Bela; Sattar, Abdul; Zhou, Yaoqi

  Journal of Molecular Biology (2016), 428 (6), 1394-1405. CODEN: JMOBAK; ISSN: 0022-2836. (Elsevier Ltd.)
  
  Protein engineering and characterization of non-synonymous single nucleotide variants (SNVs) require accurate prediction of protein stability changes (ΔΔGu) induced by single amino acid substitutions. Here, we have developed a new prediction method called Evolutionary, Amino acid, and Structural Encodings with Multiple Models (EASE-MM), which comprises five specialised support vector machine (SVM) models and makes the final prediction from a consensus of two models selected based on the predicted secondary structure and accessible surface area of the mutated residue. The new method is applicable to single-domain monomeric proteins and can predict ΔΔGu with a protein sequence and mutation as the only inputs. EASE-MM yielded a Pearson correlation coeff. of 0.53-0.59 in 10-fold cross-validation and independent testing and was able to outperform other sequence-based methods. When compared to structure-based energy functions, EASE-MM achieved a comparable or better performance. The application to a large dataset of human germline non-synonymous SNVs showed that the disease-causing variants tend to be assocd. with larger magnitudes of ΔΔGu predicted with EASE-MM. The EASE-MM web-server is available at http://sparks-lab.org/server/ease.
  
108.
  Teng, S.; Srivastava, A. K.; Wang, L. Sequence Feature-Based Prediction of Protein Stability Changes upon Amino Acid Substitutions. BMC Genomics 2010, 11 (Suppl 2), S5, DOI: 10.1186/1471-2164-11-S2-S5
  
109.
  Huang, L.-T.; Gromiha, M. M.; Ho, S.-Y. iPTREE-STAB: Interpretable Decision Tree Based Method for Predicting Protein Stability Changes upon Mutations. Bioinformatics 2007, 23, 1292–1293, DOI: 10.1093/bioinformatics/btm100

  iPTREE-STAB: interpretable decision tree based method for predicting protein stability changes upon mutations

  Huang, Liang-Tsung; Gromiha, M. Michael; Ho, Shinn-Ying

  Bioinformatics (2007), 23 (10), 1292-1293. CODEN: BOINFP; ISSN: 1367-4803. (Oxford University Press)
  
  A web server, iPTREE-STAB, is developed for discriminating the stability of proteins (stabilizing or destabilizing) and predicting their stability changes (ΔΔG) upon single amino acid substitutions from amino acid sequence. The discrimination and prediction are mainly based on decision tree coupled with adaptive boosting algorithm, and classification and regression tree, resp., using three neighboring residues of the mutant site along N- and C-terminals. Our method showed an accuracy of 82% for discriminating the stabilizing and destabilizing mutants, and a correlation of 0.70 for predicting protein stability changes upon mutations.
  
110.
  Paladin, L.; Piovesan, D.; Tosatto, S. C. E. SODA: Prediction of Protein Solubility from Disorder and Aggregation Propensity. Nucleic Acids Res. 2017, 45, W236–W240, DOI: 10.1093/nar/gkx412

  SODA: prediction of protein solubility from disorder and aggregation propensity

  Paladin, Lisanna; Piovesan, Damiano; Tosatto, Silvio C. E.

  Nucleic Acids Research (2017), 45 (W1), W236-W240. CODEN: NARHAD; ISSN: 1362-4962. (Oxford University Press)
  
  Soly. is an important, albeit not well understood, feature detg. protein behavior. It is of paramount importance in protein engineering, where similar folded proteins may behave in very different ways in soln. Here we present SODA, a novel method to predict the changes of protein soly. based on several physico-chem. properties of the protein. SODA uses the propensity of the protein sequence to aggregate as well as intrinsic disorder, plus hydrophobicity and secondary structure preferences to est. changes in soly. It has been trained and benchmarked on two different datasets. The comparison to other recently published methods shows that SODA has state-of-the-art performance and is particularly well suited to predict mutations decreasing soly. The method is fast, returning results for single mutations in seconds. A usage example estg. the full repertoire of mutations for a human germline antibody highlights several soly. hotspots on the surface.
  
111.
  Liaw, A.; Wiener, M. Classification and Regression by randomForest. R News 2002, 2, 18–22
  
112.
  Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32, DOI: 10.1023/A:1010933404324
  
113.
  Boughorbel, S.; Jarray, F.; El-Anbari, M. Optimal Classifier for Imbalanced Data Using Matthews Correlation Coefficient Metric. PLoS One 2017, 12, e0177678, DOI: 10.1371/journal.pone.0177678

  Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric

  Boughorbel, Sabri; Jarray, Fethi; El-Anbari, Mohammed

  PLoS One (2017), 12 (6), e0177678/1-e0177678/17. CODEN: POLNCL; ISSN: 1932-6203. (Public Library of Science)
  
  Data imbalance is frequently encountered in biomedical applications. Resampling techniques can be used in binary classification to tackle this issue. However, such solns. are not desired when the no. of samples in the small class is limited. Moreover, the use of inadequate performance metrics, such as accuracy, leads to poor generalization results because the classifiers tend to predict the largest size class. One of the good approaches to deal with this issue is to optimize performance metrics that are designed to handle data imbalance. Matthews Correlation Coeff. (MCC) is widely used in Bioinformatics as a performance metric. We are interested in developing a new classifier based on the MCC metric to handle imbalanced data. We derive an optimal Bayes classifier for the MCC metric using an approach based on Frechet deriv. We show that the proposed algorithm has the nice theor. property of consistency. Using simulated data, we verify the correctness of our optimality result by searching in the space of all possible binary classifiers. The proposed classifier is evaluated on 64 datasets covering a wide range of data imbalance. We compare both classification performance and CPU efficiency for three classifiers: (1) the proposed algorithm (MCC-classifier), (2) the Bayes classifier with a default threshold (MCC-base), and (3) imbalanced SVM (SVM-imba). The exptl. evaluation shows that MCC-classifier has a close performance to SVM-imba while being simpler and more efficient.
  
114.
  Ling, C. X.; Sheng, V. S. Cost-Sensitive Learning and the Class Imbalance Problem. In Encyclopedia of Machine Learning; Sammut, C., Ed.; Springer: New York, 2007.
  
115.
  Rao, R.; Fung, G.; Rosales, R. On the Dangers of Cross-Validation. An Experimental Evaluation. In Proceedings of the 2008 SIAM International Conference on Data Mining; Society for Industrial and Applied Mathematics: Philadelphia, PA, 2008; pp 588–596.
  
116.
  Stephens, Z. D.; Lee, S. Y.; Faghri, F.; Campbell, R. H.; Zhai, C.; Efron, M. J.; Iyer, R.; Schatz, M. C.; Sinha, S.; Robinson, G. E. Big Data: Astronomical or Genomical? PLoS Biol. 2015, 13, e1002195, DOI: 10.1371/journal.pbio.1002195

  Big data: astronomical or genomical?

  Stephens, Zachary D.; Lee, Skylar Y.; Faghri, Faraz; Campbell, Roy H.; Zhai, Chengxiang; Efron, Miles J.; Iyer, Ravishankar; Schatz, Michael C.; Sinha, Saurabh; Robinson, Gene E.

  PLoS Biology (2015), 13 (7), e1002195/1-e1002195/11. CODEN: PBLIBG; ISSN: 1545-7885. (Public Library of Science)
  
  Genomics is a Big Data science and is going to get much bigger, very soon, but it is not known whether the needs of genomics will exceed other Big Data domains. Projecting to the year 2025, we compared genomics with three other major generators of Big Data: astronomy, YouTube, and Twitter. Our ests. show that genomics is a "four-headed beast"-it is either on par with or the most demanding of the domains analyzed here in terms of data acquisition, storage, distribution, and anal. We discuss aspects of new technologies that will need to be developed to rise up and meet the computational challenges that genomics poses for the near future. Now is the time for concerted, community-wide planning for the "genomical" challenges of the next decade.
  
117.
  Altschul, S. F.; Gish, W.; Miller, W.; Myers, E. W.; Lipman, D. J. Basic Local Alignment Search Tool. J. Mol. Biol. 1990, 215, 403–410, DOI: 10.1016/S0022-2836(05)80360-2

  Basic local alignment search tool

  Altschul, Stephen F.; Gish, Warren; Miller, Webb; Myers, Eugene W.; Lipman, David J.

  Journal of Molecular Biology (1990), 215 (3), 403-10. CODEN: JMOBAK; ISSN: 0022-2836.
  
  A new approach to rapid sequence comparison, basic local alignment search tool (BLAST), directly approximates alignments that optimize a measure of local similarity, the maximal segment pair (MSP) score. Recent math. results on the stochastic properties of MSP scores allow an anal. of the performance of this method as well as the statistical significance of alignments it generates. The basic algorithm is simple and robust; it can be implemented in a no. of ways and applied in a variety of contexts including straightforward DNA and protein sequence database searches, motif searches, gene identification searches, and in the anal. of multiple regions of similarity in long DNA sequences. In addn. to its flexibility and tractability to math. anal., BLAST is an order of magnitude faster than existing sequence comparison tools of comparable sensitivity.
  
118. 118
  Altschul, S. F.; Madden, T. L.; Schäffer, A. A.; Zhang, J.; Zhang, Z.; Miller, W.; Lipman, D. J. Gapped BLAST and PSI-BLAST: A New Generation of Protein Database Search Programs. Nucleic Acids Res. 1997, 25, 3389– 3402, DOI: 10.1093/nar/25.17.3389
  
  118
  Gapped BLAST and PSI-BLAST: a new generation of protein database search programs
  
  Altschul, Stephen F.; Madden, Thomas L.; Schaffer, Alejandro A.; Zhang, Jinghui; Zhang, Zheng; Miller, Webb; Lipman, David J.
  
  Nucleic Acids Research (1997), 25 (17), 3389-3402. CODEN: NARHAD; ISSN: 0305-1048. (Oxford University Press)
  
  The BLAST programs are widely used tools for searching protein and DNA databases for sequence similarities. For protein comparisons, a variety of definitional, algorithmic and statistical refinements described here permits the execution time of the BLAST programs to be decreased substantially while enhancing their sensitivity to weak similarities. A new criterion for triggering the extension of word hits, combined with a new heuristic for generating gapped alignments, yields a gapped BLAST program that runs at approx. three times the speed of the original. In addn., a method is introduced for automatically combining statistically significant alignments produced by BLAST into a position-specific score matrix, and searching the database using this matrix. The resulting Position-Specific Iterated BLAST (PSI-BLAST) program runs at approx. the same speed per iteration as gapped BLAST, but in many cases is much more sensitive to weak but biol. relevant sequence similarities. PSI-BLAST is used to uncover several new and interesting members of the BRCT superfamily. The source code for the new BLAST programs is available by anonymous ftp from the machine ncbi.nlm.nih.gov, within the directory 'blast', and the programs may be run from NCBI's web site at http://www.ncbi.nlm.nih.gov/.
  
119. 119
  Eddy, S. R. Profile Hidden Markov Models. Bioinformatics 1998, 14, 755– 763, DOI: 10.1093/bioinformatics/14.9.755
  
  119
  Profile hidden Markov models
  
  Eddy, Sean R.
  
  Bioinformatics (1998), 14 (9), 755-763. CODEN: BOINFP; ISSN: 1367-4803. (Oxford University Press)
  
  A review with many refs. The recent literature on profile hidden Markov model (profile HMM) methods and software is reviewed. Profile HMMs turn a multiple sequence alignment into a position-specific scoring system suitable for searching databases for remotely homologous sequences. Profile HMM analyses complement std. pairwise comparison methods for large-scale sequence anal. Several software implementations and two large libraries of profile HMMs of common protein domains are available. HMM methods performed comparably to threading methods in the CASP2 structure prediction exercise.
  
120. 120
  Remmert, M.; Biegert, A.; Hauser, A.; Söding, J. HHblits: Lightning-Fast Iterative Protein Sequence Searching by HMM–HMM Alignment. Nat. Methods 2012, 9, 173– 175, DOI: 10.1038/nmeth.1818
  
  120
  HHblits: lightning-fast iterative protein sequence searching by HMM-HMM alignment
  
  Remmert, Michael; Biegert, Andreas; Hauser, Andreas; Soeding, Johannes
  
  Nature Methods (2012), 9 (2), 173-175. CODEN: NMAEA3; ISSN: 1548-7091. (Nature Publishing Group)
  
  Sequence-based protein function and structure prediction depends crucially on sequence-search sensitivity and accuracy of the resulting sequence alignments. We present an open-source, general-purpose tool that represents both query and database sequences by profile hidden Markov models (HMMs): 'HMM-HMM-based lightning-fast iterative sequence search' (HHblits; http://toolkit.genzentrum.lmu.de/hhblits/). Compared to the sequence-search tool PSI-BLAST, HHblits is faster owing to its discretized-profile prefilter, has 50-100% higher sensitivity and generates more accurate alignments.
  
121. 121
  Pearson, W. R. An Introduction to Sequence Similarity (“Homology”) Searching. Curr. Protoc. Bioinf. 2013, 42, 3.1.1– 3.1.8, DOI: 10.1002/0471250953.bi0301s42
  
122. 122
  Rost, B. Twilight Zone of Protein Sequence Alignments. Protein Eng., Des. Sel. 1999, 12, 85– 94, DOI: 10.1093/protein/12.2.85
  
123. 123
  Fletcher, W.; Yang, Z. The Effect of Insertions, Deletions, and Alignment Errors on the Branch-Site Test of Positive Selection. Mol. Biol. Evol. 2010, 27, 2257– 2267, DOI: 10.1093/molbev/msq115
  
  123
  The Effect of Insertions, Deletions, and Alignment Errors on the Branch-Site Test of Positive Selection
  
  Fletcher, William; Yang, Ziheng
  
  Molecular Biology and Evolution (2010), 27 (10), 2257-2267. CODEN: MBEVEO; ISSN: 0737-4038. (Oxford University Press)
  
  The detection of pos. Darwinian selection affecting protein-coding genes remains a topic of great interest and importance. The "branch-site" test is designed to detect localized episodic bouts of pos. selection that affect only a few amino acid residues on particular lineages and has been shown to have reasonable power and low false-pos. rates for a wide range of selection schemes. Previous simulations examg. the performance of the test, however, were conducted under idealized conditions without insertions, deletions, or alignment errors. As the test is sometimes used to analyze divergent sequences, the impact of indels and alignment errors is a major concern. Here, we used a recently developed indel-simulation program to examine the false-pos. rate and power of the branch-site test. We find that insertions and deletions do not cause excessive false positives if the alignment is correct, but alignment errors can lead to unacceptably high false positives. Of the alignment methods evaluated, PRANK consistently outperformed MUSCLE, MAFFT, and ClustalW, mostly because the latter programs tend to place nonhomologous codons (or amino acids) into the same column, producing shorter and less accurate alignments and giving the false impression that many amino acid substitutions have occurred at those sites. Our examn. of two previous studies suggests that alignment errors may impact the anal. of mammalian and vertebrate genes by the branch-site test, and it is important to use reliable alignment methods.
  
124. 124
  Vialle, R. A.; Tamuri, A. U.; Goldman, N. Alignment Modulates Ancestral Sequence Reconstruction Accuracy. Mol. Biol. Evol. 2018, 35, 1783– 1797, DOI: 10.1093/molbev/msy055
  
  124
  Alignment modulates ancestral sequence reconstruction accuracy
  
  Vialle, Ricardo Assuncao; Tamuri, Asif U.; Goldman, Nick
  
  Molecular Biology and Evolution (2018), 35 (7), 1783-1797. CODEN: MBEVEO; ISSN: 1537-1719. (Oxford University Press)
  
  Ancestral sequence reconstruction (ASR) relies on multiple sequence alignment (MSA), which may introduce biases, and it remains unknown how MSA methodol. approaches impact ASR. Here, we investigate how MSA methodol. modulates ASR using a simulation study of various evolutionary scenarios. We evaluate the accuracy of ancestral protein sequence reconstruction for simulated data and compare reconstruction outcomes using different alignment methods. Our results reveal biases introduced not only by aligner algorithms and assumptions, but also tree topol. and the rate of insertions and deletions. Under many conditions we find no substantial differences between the MSAs. However, increasing the difficulty for the aligners can significantly impact ASR. The MAFFT consistency aligners and PRANK variants exhibit the best performance, whereas FSA displays limited performance. We also discover a bias towards reconstructed sequences longer than the true ancestors, deriving from a preference for inferring insertions, in almost all MSA methodol. approaches. In addn., we find measures of MSA quality generally correlate highly with reconstruction accuracy. Thus, we show MSA methodol. differences can affect the quality of reconstructions and propose MSA methods should be selected with care to accurately det. ancestral states with confidence.
  
125. 125
  Chowdhury, B.; Garai, G. A Review on Multiple Sequence Alignment from the Perspective of Genetic Algorithm. Genomics 2017, 109, 419– 431, DOI: 10.1016/j.ygeno.2017.06.007
  
  125
  A review on multiple sequence alignment from the perspective of genetic algorithm
  
  Chowdhury, Biswanath; Garai, Gautam
  
  Genomics (2017), 109 (5-6), 419-431. CODEN: GNMCEP; ISSN: 0888-7543. (Elsevier Inc.)
  
  A review. Sequence alignment is an active research area in the field of bioinformatics. It is also a crucial task as it guides many other tasks like phylogenetic anal., function, and/or structure prediction of biol. macromols. like DNA, RNA, and Protein. Proteins are the building blocks of every living organism. Although protein alignment problem has been studied for several decades, unfortunately, every available method produces alignment results differently for a single alignment problem. Multiple sequence alignment is characterized as a very high computational complex problem. Many stochastic methods, therefore, are considered for improving the accuracy of alignment. Among them, many researchers frequently use Genetic Algorithm. In this study, we have shown different types of the method applied in alignment and the recent trends in the multiobjective genetic algorithm for solving multiple sequence alignment. Many recent studies have demonstrated considerable progress in finding the alignment accuracy.
  
126. 126
  Taly, J.-F.; Magis, C.; Bussotti, G.; Chang, J.-M.; Di Tommaso, P.; Erb, I.; Espinosa-Carrasco, J.; Kemena, C.; Notredame, C. Using the T-Coffee Package to Build Multiple Sequence Alignments of Protein, RNA, DNA Sequences and 3D Structures. Nat. Protoc. 2011, 6, 1669– 1682, DOI: 10.1038/nprot.2011.393
  
  126
  Using the T-Coffee package to build multiple sequence alignments of protein, RNA, DNA sequences and 3D structures
  
  Taly, Jean-Francois; Magis, Cedrik; Bussotti, Giovanni; Chang, Jia-Ming; Di Tommaso, Paolo; Erb, Ionas; Espinosa-Carrasco, Jose; Kemena, Carsten; Notredame, Cedric
  
  Nature Protocols (2011), 6 (11), 1669-1682. CODEN: NPARDW; ISSN: 1750-2799. (Nature Publishing Group)
  
  T-Coffee (Tree-based consistency objective function for alignment evaluation) is a versatile multiple sequence alignment (MSA) method suitable for aligning most types of biol. sequences. The main strength of T-Coffee is its ability to combine third party aligners and to integrate structural (or homol.) information when building MSAs. The series of protocols presented here show how the package can be used to multiply align proteins, RNA and DNA sequences. The protein section shows how users can select the most suitable T-Coffee mode for their data set. Detailed protocols include T-Coffee, the default mode, M-Coffee, a meta version able to combine several third party aligners into one, PSI (position-specific iterated)-Coffee, the homol. extended mode suitable for remote homologs and Expresso, the structure-based multiple aligner. We then also show how the T-RMSD (tree based on root mean square deviation) option can be used to produce a functionally informative structure-based clustering. RNA alignment procedures are described for using R-Coffee, a mode able to use predicted RNA secondary structures when aligning RNA sequences. DNA alignments are illustrated with Pro-Coffee, a multiple aligner specific of promoter regions. We also present some of the many reformatting utilities bundled with T-Coffee. The package is an open-source freeware available from http://www.tcoffee.org/.
  
127. 127
  Pei, J.; Grishin, N. V. PROMALS3D: Multiple Protein Sequence Alignment Enhanced with Evolutionary and Three-Dimensional Structural Information. Methods Mol. Biol. 2014, 1079, 263– 271, DOI: 10.1007/978-1-62703-646-7_17
  
  127
  PROMALS3D: multiple protein sequence alignment enhanced with evolutionary and three-dimensional structural information
  
  Pei, Jimin; Grishin, Nick V.
  
  Methods in Molecular Biology (Clifton, N.J.) (2014), 1079, 263-71.
  
  Multiple sequence alignment (MSA) is an essential tool with many applications in bioinformatics and computational biology. Accurate MSA construction for divergent proteins remains a difficult computational task. The constantly increasing protein sequences and structures in public databases could be used to improve alignment quality. PROMALS3D is a tool for protein MSA construction enhanced with additional evolutionary and structural information from database searches. PROMALS3D automatically identifies homologs from sequence and structure databases for input proteins, derives structure-based constraints from alignments of three-dimensional structures, and combines them with sequence-based constraints of profile-profile alignments in a consistency-based framework to construct high-quality multiple sequence alignments. PROMALS3D output is a consensus alignment enriched with sequence and structural information about input proteins and their homologs. PROMALS3D Web server and package are available at http://prodata.swmed.edu/PROMALS3D.
  
128. 128
  Steipe, B.; Schiller, B.; Plückthun, A.; Steinbacher, S. Sequence Statistics Reliably Predict Stabilizing Mutations in a Protein Domain. J. Mol. Biol. 1994, 240, 188– 192, DOI: 10.1006/jmbi.1994.1434
  
  128
  Sequence statistics reliably predict stabilizing mutations in a protein domain
  
  Steipe, Boris; Schiller, Britta; Plueckthun, Andreas; Steinbacher, Stefan
  
  Journal of Molecular Biology (1994), 240 (3), 188-92. CODEN: JMOBAK; ISSN: 0022-2836.
  
  Ig variable domains are generally thought of as well conserved platforms providing the base for antigen binding loops of highly varying sequence and structure. However, domain evolution must ensure a balance between optimizing antigen affinity and the requirements of a stable, cooperatively folding domain. Since random mutations can carry a significant penalty for domain stability, constraints are imposed both on the repertoire of germline sequences and on somatic amino acid replacements during affinity maturation. Analyzing these constraints in the conceptual framework of statistical mech., the authors have been able to predict stabilizing mutations in the McPC603 VK domain from sequence information alone with better than 60% success rate. The validity of this concept not only has far reaching implications for antibody engineering but may also be generalized to engineer other proteins for high stability.
  
129. 129
  Sullivan, B. J.; Nguyen, T.; Durani, V.; Mathur, D.; Rojas, S.; Thomas, M.; Syu, T.; Magliery, T. J. Stabilizing Proteins from Sequence Statistics: The Interplay of Conservation and Correlation in Triosephosphate Isomerase Stability. J. Mol. Biol. 2012, 420, 384– 399, DOI: 10.1016/j.jmb.2012.04.025
  
  129
  Stabilizing Proteins from Sequence Statistics: The Interplay of Conservation and Correlation in Triosephosphate Isomerase Stability
  
  Sullivan, Brandon J.; Nguyen, Tran; Durani, Venuka; Mathur, Deepti; Rojas, Samantha; Thomas, Miriam; Syu, Trixy; Magliery, Thomas J.
  
  Journal of Molecular Biology (2012), 420 (4-5), 384-399. CODEN: JMOBAK; ISSN: 0022-2836. (Elsevier Ltd.)
  
  Understanding the determinants of protein stability remains one of protein science's greatest challenges. There are still no computational solns. that calc. the stability effects of even point mutations with sufficient reliability for practical use. Amino acid substitutions rarely increase the stability of native proteins; hence, large libraries and high-throughput screens or selections are needed to stabilize proteins using directed evolution. Consensus mutations have proven effective for increasing stability, but these mutations are successful only about half the time. We set out to understand why some consensus mutations fail to stabilize, and what criteria might be useful to predict stabilization more accurately. Overall, consensus mutations at more conserved positions were more likely to be stabilizing in our model, triosephosphate isomerase (TIM) from Saccharomyces cerevisiae. However, positions coupled to other sites were more likely not to stabilize upon mutation. Destabilizing mutations could be removed both by removing sites with high statistical correlations to other positions and by removing nearly invariant positions at which "hidden correlations" can occur. Application of these rules resulted in identification of stabilizing mutations in 9 out of 10 positions, and amalgamation of all predicted stabilizing positions resulted in the most stable yeast TIM variant we produced (+ 8 °C). In contrast, a multimutant with 14 mutations each found to stabilize TIM independently was destabilized by 2 °C. Our results are a practical extension to the consensus concept of protein stabilization, and they further suggest the importance of positional independence in the mechanism of consensus stabilization.
  
130. 130
  Lehmann, M.; Kostrewa, D.; Wyss, M.; Brugger, R.; D’Arcy, A.; Pasamontes, L.; van Loon, A. P. From DNA Sequence to Improved Functionality: Using Protein Sequence Comparisons to Rapidly Design a Thermostable Consensus Phytase. Protein Eng., Des. Sel. 2000, 13, 49– 57, DOI: 10.1093/protein/13.1.49
  
131. 131
  Magliery, T. J. Protein Stability: Computation, Sequence Statistics, and New Experimental Methods. Curr. Opin. Struct. Biol. 2015, 33, 161– 168, DOI: 10.1016/j.sbi.2015.09.002
  
  131
  Protein stability: computation, sequence statistics, and new experimental methods
  
  Magliery, Thomas J.
  
  Current Opinion in Structural Biology (2015), 33, 161-168. CODEN: COSBEF; ISSN: 0959-440X. (Elsevier Ltd.)
  
  A review. Calcg. protein stability and predicting stabilizing mutations remain exceedingly difficult tasks, largely due to the inadequacy of potential functions, the difficulty of modeling entropy and the unfolded state, and challenges of sampling, particularly of backbone conformations. Yet, computational design produced some remarkably stable proteins in recent years, apparently owing to near ideality in structure and sequence features. With caveats, computational prediction of stability can be used to guide mutation, and mutations derived from consensus sequence anal., esp. improved by recent co-variation filters, are very likely to stabilize without sacrificing function. The combination of computational and statistical approaches with library approaches, including new technologies such as deep sequencing and high throughput stability measurements, point to a very exciting near term future for stability engineering, even with difficult computational issues remaining.
  
132. 132
  Porebski, B. T.; Buckle, A. M. Consensus Protein Design. Protein Eng., Des. Sel. 2016, 29, 245– 251, DOI: 10.1093/protein/gzw015
  
  132
  Consensus protein design
  
  Porebski, Benjamin T.; Buckle, Ashley M.
  
  Protein Engineering, Design & Selection (2016), 29 (7), 245-251. CODEN: PEDSBR; ISSN: 1741-0126. (Oxford University Press)
  
  A popular and successful strategy in semi-rational design of protein stability is the use of evolutionary information encapsulated in homologous protein sequences. Consensus design is based on the hypothesis that at a given position, the resp. consensus amino acid contributes more than av. to the stability of the protein than non-conserved amino acids. Here, we review the consensus design approach, its theor. underpinnings, successes, limitations and challenges, as well as providing a detailed guide to its application in protein engineering.
  
133. 133
  Jäckel, C.; Bloom, J. D.; Kast, P.; Arnold, F. H.; Hilvert, D. Consensus Protein Design without Phylogenetic Bias. J. Mol. Biol. 2010, 399, 541– 546, DOI: 10.1016/j.jmb.2010.04.039
  
  133
  Consensus protein design without phylogenetic bias
  
  Jackel, Christian; Bloom, Jesse D.; Kast, Peter; Arnold, Frances H.; Hilvert, Donald
  
  Journal of Molecular Biology (2010), 399 (4), 541-6.
  
  Consensus design is an appealing strategy for the stabilization of proteins. It exploits amino acid conservation in sets of homologous proteins to identify likely beneficial mutations. Nevertheless, its success depends on the phylogenetic diversity of the sequence set available. Here, we show that randomization of a single protein represents a reliable alternative source of sequence diversity that is essentially free of phylogenetic bias. A small number of functional protein sequences selected from binary-patterned libraries suffice as input for the consensus design of active enzymes that are easier to produce and substantially more stable than individual members of the starting data set. Although catalytic activity correlates less consistently with sequence conservation in these extensively randomized proteins, less extreme mutagenesis strategies might be adopted in practice to augment stability while maintaining function.
  
134. 134
  Goyal, V. D.; Magliery, T. J. Phylogenetic Spread of Sequence Data Affects Fitness of SOD1 Consensus Enzymes: Insights from Sequence Statistics and Structural Analyses. Proteins: Struct., Funct., Genet. 2018, 86, 609– 620, DOI: 10.1002/prot.25486
  
135. 135
  Vázquez-Figueroa, E.; Chaparro-Riggers, J.; Bommarius, A. S. Development of a Thermostable Glucose Dehydrogenase by a Structure-Guided Consensus Concept. ChemBioChem 2007, 8, 2295– 2301, DOI: 10.1002/cbic.200700500
  
  135
  Development of a thermostable glucose dehydrogenase by a structure-guided consensus concept
  
  Vazquez-Figueroa, Eduardo; Chaparro-Riggers, Javier; Bommarius, Andreas S.
  
  ChemBioChem (2007), 8 (18), 2295-2301. CODEN: CBCHFX; ISSN: 1439-4227. (Wiley-VCH Verlag GmbH & Co. KGaA)
  
  Instability under non-native processing conditions, esp. at elevated temps., is a major factor preventing the widespread adoption of biocatalysts for industrial synthesis. A crucial distinction of many redox enzymes used to synthesize chiral compds. is the need for cofactors (e.g., NAD(P)(H)) for function. Because of the prohibitively high prices of nicotinamide cofactors, a robust cofactor-regenerating enzyme is required for the economical synthesis of fine chems. by biocatalysis. Here we test the structure-guided consensus for the generation of a thermostable glucose dehydrogenase (GDH). The consensus sequence in combination with addnl. knowledge-based criteria was used to select amino acids for substitutions. Using this approach we generated 24 variants, 11 of which showed higher thermal stability than the wild-type GDH, a success rate of 46%. Of the 24 variants, seven were located at the subunit interface, which is known to influence GDH stability, and six were more stable (86% success). The best variants feature a half-life of ∼3.5 days at 65°, in contrast to ∼20 min at 25° for the wild type, thus enhancing stability 10^6-fold. In addn., the three most stabilizing single mutations were transferred to two GDH homologs from Bacillus thuringiensis and Bacillus licheniformis. The thermal stability as measured by half-life and CD222 nm of the GDH variants was increased, as expected. The resulting stability changes provide further support for the view that these residues are crit. for stability of GDHs and reinforce the success of the consensus approach for identifying stabilizing mutations.
  
136. 136
  Parthasarathy, S.; Murthy, M. R. Protein Thermal Stability: Insights from Atomic Displacement Parameters (B Values). Protein Eng., Des. Sel. 2000, 13, 9– 13, DOI: 10.1093/protein/13.1.9
  
137. 137
  Cole, M. F.; Gaucher, E. A. Exploiting Models of Molecular Evolution to Efficiently Direct Protein Engineering. J. Mol. Evol. 2011, 72, 193– 203, DOI: 10.1007/s00239-010-9415-2
  
  137
  Exploiting Models of Molecular Evolution to Efficiently Direct Protein Engineering
  
  Cole, Megan F.; Gaucher, Eric A.
  
  Journal of Molecular Evolution (2011), 72 (2), 193-203. CODEN: JMEVAU; ISSN: 0022-2844. (Springer)
  
  Directed evolution and protein engineering approaches used to generate novel or enhanced biomol. function often use the evolutionary sequence diversity of protein homologs to rationally guide library design. To fully capture this sequence diversity, however, libraries contg. millions of variants are often necessary. Screening libraries of this size is often undesirable due to inaccuracies of high-throughput assays, costs, and time constraints. The ability to effectively cull sequence diversity while still generating the functional diversity within a library thus holds considerable value. This is particularly relevant when high-throughput assays are not amenable to select/screen for certain biomol. properties. Here, we summarize our recent attempts to develop an evolution-guided approach, Reconstructing Evolutionary Adaptive Paths (REAP), for directed evolution and protein engineering that exploits phylogenetic and sequence analyses to identify amino acid substitutions that are likely to alter or enhance function of a protein. To demonstrate the utility of this technique, we highlight our previous work with DNA polymerases in which a REAP-designed small library was used to identify a DNA polymerase capable of accepting non-std. nucleosides. We anticipate that the REAP approach will be used in the future to facilitate the engineering of biopolymers with expanded functions and will thus have a significant impact on the developing field of 'evolutionary synthetic biol.'.
  
138. 138
  Hochberg, G. K. A.; Thornton, J. W. Reconstructing Ancient Proteins to Understand the Causes of Structure and Function. Annu. Rev. Biophys. 2017, 46, 247– 269, DOI: 10.1146/annurev-biophys-070816-033631
  
  138
  Reconstructing Ancient Proteins to Understand the Causes of Structure and Function
  
  Hochberg, Georg K. A.; Thornton, Joseph W.
  
  Annual Review of Biophysics (2017), 46, 247-269. CODEN: ARBNCV; ISSN: 1936-122X. (Annual Reviews)
  
  A review. A central goal in biochem. is to explain the causes of protein sequence, structure, and function. Mainstream approaches seek to rationalize sequence and structure in terms of their effects on function and to identify function's underlying determinants by comparing related proteins to each other. Although productive, both strategies suffer from intrinsic limitations that have left important aspects of many proteins unexplained. These limits can be overcome by reconstructing ancient proteins, exptl. characterizing their properties, and retracing their evolution through time. This approach has proven to be a powerful means for discovering how historical changes in sequence produced the functions, structures, and other phys./chem. characteristics of modern proteins. It has also illuminated whether protein features evolved because of functional optimization, historical constraint, or blind chance. Here we review recent studies employing ancestral protein reconstruction and show how they have produced new knowledge not only of mol. evolutionary processes but also of the underlying determinants of modern proteins' phys., chem., and biol. properties.
  
139. 139
  Aerts, D.; Verhaeghe, T.; Joosten, H.-J.; Vriend, G.; Soetaert, W.; Desmet, T. Consensus Engineering of Sucrose Phosphorylase: The Outcome Reflects the Sequence Input. Biotechnol. Bioeng. 2013, 110, 2563– 2572, DOI: 10.1002/bit.24940
  
  139
  Consensus Engineering of Sucrose Phosphorylase: The Outcome Reflects the Sequence Input
  
  Aerts, Dirk; Verhaeghe, Tom; Joosten, Henk-Jan; Vriend, Gert; Soetaert, Wim; Desmet, Tom
  
  Biotechnology and Bioengineering (2013), 110 (10), 2563-2572. CODEN: BIBIAU; ISSN: 0006-3592. (John Wiley & Sons, Inc.)
  
  Consensus engineering, which is replacing amino acids by the most frequently occurring one at their positions in a multiple sequence alignment (MSA), is a known strategy to increase the stability of a protein. The application of this concept to the entire sequence of an enzyme, however, has been tried only a few times mainly because of the problems detg. the consensus in highly variable regions. We show that this problem can be solved by replacing such problematic regions by the corresponding sequence of the natural homolog closest to the consensus. When one or a few sub-families are overrepresented in the MSA the consensus sequence is a biased representation of the sequence space. We examine the influence of this bias by constructing three consensus sequences using different MSAs of sucrose phosphorylase (SP). Each consensus enzyme contained about 70 mutations compared to its closest natural homolog and folded correctly and displayed activity on sucrose. Correlation anal. revealed that the family's co-evolution network was kept intact, which is one of the main advantages of full-length consensus design. The consensus enzymes displayed an "av." thermostability, i.e., one that is higher than some but not all known representatives. We cautiously present practical rules for the design of consensus sequences, but warn that the measure of success depends on which natural enzyme is used as point of comparison.
  
140.
  Trudeau, D. L.; Kaltenbach, M.; Tawfik, D. S. On the Potential Origins of the High Stability of Reconstructed Ancestral Proteins. Mol. Biol. Evol. 2016, 33, 2633– 2641, DOI: 10.1093/molbev/msw138
  
  140
  On the potential origins of the high stability of reconstructed ancestral proteins
  
  Trudeau, Devin L.; Kaltenbach, Miriam; Tawfik, Dan S.
  
  Molecular Biology and Evolution (2016), 33 (10), 2633-2641CODEN: MBEVEO; ISSN:0737-4038. (Oxford University Press)
  
  Ancestral reconstruction provides instrumental insights regarding the biochem. and biophys. characteristics of past proteins. A striking observation relates to the remarkably high thermostability of reconstructed ancestors. The latter has been linked to high environmental temps. in the Precambrian era, the era relating to most reconstructed proteins. We found that inferred ancestors of the serum paraoxonase (PON) enzyme family, including the mammalian ancestor, exhibit dramatically increased thermostabilities compared with the extant, human enzyme (up to 30 °C higher melting temp.). However, the environmental temp. at the time of emergence of mammals is presumed to be similar to the present one. Addnl., the mammalian PON ancestor has superior folding properties (kinetic stability): unlike the extant mammalian PONs, it expresses in E. coli in a sol. and functional form, and at a high yield. We discuss two potential origins of this unexpectedly high stability. First, ancestral stability may be overestimated by a "consensus effect," whereby replacing amino acids that are rare in contemporary sequences with the amino acid most common in the family increases protein stability. Comparison to other reconstructed ancestors indicates that the consensus effect may bias some but not all reconstructions. Second, we note that high stability may relate to factors other than high environmental temp. such as oxidative stress or high radiation levels. Foremost, intrinsic factors such as high rates of genetic mutations and/or of transcriptional and translational errors, and less efficient protein quality control systems, may underlie the high kinetic and thermodn. stability of past proteins.
  
141.
  Wheeler, L. C.; Lim, S. A.; Marqusee, S.; Harms, M. J. The Thermostability and Specificity of Ancient Proteins. Curr. Opin. Struct. Biol. 2016, 38, 37– 43, DOI: 10.1016/j.sbi.2016.05.015
  
  141
  The thermostability and specificity of ancient proteins
  
  Wheeler, Lucas C.; Lim, Shion A.; Marqusee, Susan; Harms, Michael J.
  
  Current Opinion in Structural Biology (2016), 38 (), 37-43CODEN: COSBEF; ISSN:0959-440X. (Elsevier Ltd.)
  
  A review. Were ancient proteins systematically different than modern proteins? The answer to this question is profoundly important, shaping how we understand the origins of protein biochem., biophys., and functional properties. Ancestral sequence reconstruction (ASR), a phylogenetic approach to infer the sequences of ancestral proteins, may reveal such trends. We discuss two proposed trends: a transition from higher to lower thermostability and a tendency for proteins to acquire higher specificity over time. We review the evidence for elevated ancestral thermostability and discuss its possible origins in a changing environmental temp. and/or reconstruction bias. We also conclude that there is, as yet, insufficient data to support a trend from promiscuity to specificity. Finally, we propose future work to understand these proposed evolutionary trends.
  
142.
  Yang, Z. PAML: A Program Package for Phylogenetic Analysis by Maximum Likelihood. Bioinformatics 1997, 13, 555– 556, DOI: 10.1093/bioinformatics/13.5.555
  
143.
  Stamatakis, A. RAxML-VI-HPC: Maximum Likelihood-Based Phylogenetic Analyses with Thousands of Taxa and Mixed Models. Bioinformatics 2006, 22, 2688– 2690, DOI: 10.1093/bioinformatics/btl446
  
  143
  RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models
  
  Stamatakis, Alexandros
  
  Bioinformatics (2006), 22 (21), 2688-2690CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)
  
  RAxML-VI-HPC (randomized accelerated max. likelihood for high performance computing) is a sequential and parallel program for inference of large phylogenies with max. likelihood (ML). Low-level tech. optimizations, a modification of the search algorithm, and the use of the GTR + CAT approxn. as replacement for GTR + Γ yield a program that is between 2.7 and 52 times faster than the previous version of RAxML. A large-scale performance comparison with GARLI, PHYML, IQPNNI and MrBayes on real data contg. 1000 up to 6722 taxa shows that RAxML requires at least 5.6 times less main memory and yields better trees in similar times than the best competing program (GARLI) on datasets up to 2500 taxa. On datasets ≥4000 taxa it also runs 2-3 times faster than GARLI. RAxML has been parallelized with MPI to conduct parallel multiple bootstraps and inferences on distinct starting trees. The program has been used to compute ML trees on two of the largest alignments to date contg. 25 057 (1463 bp) and 2182 (51 089 bp) taxa, resp.
  
144.
  Huelsenbeck, J. P.; Ronquist, F.; Nielsen, R.; Bollback, J. P. Bayesian Inference of Phylogeny and Its Impact on Evolutionary Biology. Science 2001, 294, 2310– 2314, DOI: 10.1126/science.1065889
  
  144
  Evolution: Bayesian inference of phylogeny and its impact on evolutionary biology
  
  Huelsenbeck, John P.; Ronquist, Fredrik; Nielsen, Rasmus; Bollback, Jonathan P.
  
  Science (Washington, DC, United States) (2001), 294 (5550), 2310-2314CODEN: SCIEAS; ISSN:0036-8075. (American Association for the Advancement of Science)
  
  A review. As a discipline, phylogenetics is becoming transformed by a flood of mol. data. These data allow broad questions to be asked about the history of life, but also present difficult statistical and computational problems. Bayesian inference of phylogeny brings a new perspective to a no. of outstanding issues in evolutionary biol., including the anal. of large phylogenetic trees and complex evolutionary models and the detection of the footprint of natural selection in DNA sequences.
  
145.
  Goldstein, R. A.; Pollard, S. T.; Shah, S. D.; Pollock, D. D. Nonadaptive Amino Acid Convergence Rates Decrease over Time. Mol. Biol. Evol. 2015, 32, 1373– 1381, DOI: 10.1093/molbev/msv041
  
  145
  Nonadaptive amino acid convergence rates decrease over time
  
  Goldstein, R. A.; Pollard, S. T.; Shah, S. D.; Pollock, D. D.
  
  Molecular Biology and Evolution (2015), 32 (6), 1373-1381CODEN: MBEVEO; ISSN:0737-4038. (Oxford University Press)
  
  Convergence is a central concept in evolutionary studies because it provides strong evidence for adaptation. It also provides information about the nature of the fitness landscape and the repeatability of evolution, and can mislead phylogenetic inference. To understand the role of adaptive convergence, we need to understand the patterns of nonadaptive convergence. Here, we consider the relationship between nonadaptive convergence and divergence in mitochondrial and model proteins. Surprisingly, nonadaptive convergence is much more common than expected in closely related organisms, falling off as organisms diverge. The extent of the convergent drop-off in mitochondrial proteins is well predicted by epistatic or coevolutionary effects in our "evolutionary Stokes shift" models and poorly predicted by conventional evolutionary models. Convergence probabilities decrease dramatically if the ancestral amino acids of branches being compared have diverged, but also drop slowly over evolutionary time even if the ancestral amino acids have not substituted. Convergence probabilities drop-off rapidly for quickly evolving sites, but much more slowly for slowly evolving sites. Furthermore, once sites have diverged their convergence probabilities are extremely low and indistinguishable from convergence levels at randomized sites. These results indicate that we cannot assume that excessive convergence early on is necessarily adaptive. This new understanding should help us to better discriminate adaptive from nonadaptive convergence and develop more relevant evolutionary models with improved validity for phylogenetic inference.
  
146.
  Williams, P. D.; Pollock, D. D.; Blackburne, B. P.; Goldstein, R. A. Assessing the Accuracy of Ancestral Protein Reconstruction Methods. PLoS Comput. Biol. 2006, 2, e69, DOI: 10.1371/journal.pcbi.0020069
  
147.
  Eick, G. N.; Bridgham, J. T.; Anderson, D. P.; Harms, M. J.; Thornton, J. W. Robustness of Reconstructed Ancestral Protein Functions to Statistical Uncertainty. Mol. Biol. Evol. 2016, 34, 247– 261, DOI: 10.1093/molbev/msw223
  
148.
  Gaucher, E. A.; Govindarajan, S.; Ganesh, O. K. Palaeotemperature Trend for Precambrian Life Inferred from Resurrected Proteins. Nature 2008, 451, 704– 707, DOI: 10.1038/nature06510
  
  148
  Palaeotemperature trend for Precambrian life inferred from resurrected proteins
  
  Gaucher, Eric A.; Govindarajan, Sridhar; Ganesh, Omjoy K.
  
  Nature (London, United Kingdom) (2008), 451 (7179), 704-707CODEN: NATUAS; ISSN:0028-0836. (Nature Publishing Group)
  
  Biosignatures and structures in the geol. record indicate that microbial life has inhabited Earth for ∼3.5 × 10^9 yr. Research in the phys. sciences has been able to generate statements about the ancient environment that hosted this life. These include the chem. compns. and temps. of the early ocean and atm. Only recently have the natural sciences been able to provide exptl. results describing the environments of ancient life. The authors' previous work with resurrected proteins indicated that ancient life lived in a hot environment. Here, the authors expand the timescale of resurrected proteins to provide a palaeotemp. trend of the environments that hosted life 3.5-0.5 × 10^9 yr ago. The thermostability of >25 phylogenetically dispersed ancestral elongation factors suggests that the environment supporting ancient life cooled progressively by 30° during that period. Here, the authors show that their results are robust to potential statistical bias assocd. with the posterior distribution of inferred character states, phylogenetic ambiguity, and uncertainties in the amino acid equil. frequencies used by evolutionary models. The results are further supported by a nearly identical cooling trend for the ancient ocean as inferred from the deposition of O isotopes. The convergence of results from natural and phys. sciences suggests that ancient life has continually adapted to changes in environmental temps. throughout its evolutionary history.
  
149.
  Akanuma, S. Characterization of Reconstructed Ancestral Proteins Suggests a Change in Temperature of the Ancient Biosphere. Life (Basel, Switz.) 2017, 7, 33, DOI: 10.3390/life7030033
  
  149
  Characterization of reconstructed ancestral proteins suggests a change in temperature of the ancient biosphere
  
  Akanuma, Satoshi
  
  Life (Basel, Switzerland) (2017), 7 (3), 33/1-33/14CODEN: LBSIB7; ISSN:2075-1729. (MDPI AG)
  
  Understanding the evolution of ancestral life, and esp. the ability of some organisms to flourish in the variable environments experienced in Earth's early biosphere, requires knowledge of the characteristics and the environment of these ancestral organisms. Information about early life and environmental conditions has been obtained from fossil records and geol. surveys. Recent advances in phylogenetic anal., and an increasing no. of protein sequences available in public databases, have made it possible to infer ancestral protein sequences possessed by ancient organisms. However, the in silico studies that assess the ancestral base content of rRNAs, the frequency of each amino acid in ancestral proteins, and est. the environmental temps. of ancient organisms, show conflicting results. The characterization of ancestral proteins reconstructed in vitro suggests that ancient organisms had very thermally stable proteins, and therefore were thermophilic or hyperthermophilic. Exptl. data supports the idea that only thermophilic ancestors survived the catastrophic increase in temp. of the biosphere that was likely assocd. with meteorite impacts during the early history of Earth. In addn., by expanding the timescale and including more ancestral proteins for reconstruction, it appears as though the Earth's surface temp. gradually decreased over time, from Archean to present.
  
150.
  Gumulya, Y.; Baek, J.-M.; Wun, S.-J.; Thomson, R. E. S.; Harris, K. L.; Hunter, D. J. B.; Behrendorff, J. B. Y. H.; Kulig, J.; Zheng, S.; Wu, X.; Wu, B.; Stok, J. E.; De Voss, J. J.; Schenk, G.; Jurva, U.; Andersson, S.; Isin, E. M.; Bodén, M.; Guddat, L.; Gillam, E. M. J. Engineering Highly Functional Thermostable Proteins Using Ancestral Sequence Reconstruction. Nat. Catal. 2018, 1, 878, DOI: 10.1038/s41929-018-0159-5
  
  150
  Engineering highly functional thermostable proteins using ancestral sequence reconstruction
  
  Gumulya, Yosephin; Baek, Jong-Min; Wun, Shun-Jie; Thomson, Raine E. S.; Harris, Kurt L.; Hunter, Dominic J. B.; Behrendorff, James B. Y. H.; Kulig, Justyna; Zheng, Shan; Wu, Xueming; Wu, Bin; Stok, Jeanette E.; De Voss, James J.; Schenk, Gerhard; Jurva, Ulrik; Andersson, Shalini; Isin, Emre M.; Boden, Mikael; Guddat, Luke; Gillam, Elizabeth M. J.
  
  Nature Catalysis (2018), 1 (11), 878-888CODEN: NCAACP; ISSN:2520-1158. (Nature Research)
  
  Com. biocatalysis requires robust enzymes that can withstand elevated temps. and long incubations. Ancestral reconstruction has shown that pre-Cambrian enzymes were often much more thermostable than extant forms. Here, we resurrect ancestral enzymes that withstand ∼30 °C higher temps. and ≥100 times longer incubations than their extant forms. This is demonstrated on animal cytochromes P 450 that stereo- and regioselectively functionalize unactivated C-H bonds for the synthesis of valuable chems., and bacterial ketol-acid reductoisomerases that are used to make butanol-based biofuels. The vertebrate CYP3 P 450 ancestor showed a 60T50 of 66 °C and enhanced solvent tolerance compared with the human drug-metabolizing CYP3A4, yet comparable activity towards a similarly broad range of substrates. The ancestral ketol-acid reductoisomerase showed an eight-fold higher specific activity than the cognate Escherichia coli form at 25 °C, which increased 3.5-fold at 50 °C. Thus, thermostable proteins can be devised using sequence data alone from even recent ancestors.
  
151.
  Dehouck, Y.; Grosfils, A.; Folch, B.; Gilis, D.; Bogaerts, P.; Rooman, M. Fast and Accurate Predictions of Protein Stability Changes upon Mutations Using Statistical Potentials and Neural Networks: PoPMuSiC-2.0. Bioinformatics 2009, 25, 2537– 2543, DOI: 10.1093/bioinformatics/btp445
  
  151
  Fast and accurate predictions of protein stability changes upon mutations using statistical potentials and neural networks: PoPMuSiC-2.0
  
  Dehouck, Yves; Grosfils, Aline; Folch, Benjamin; Gilis, Dimitri; Bogaerts, Philippe; Rooman, Marianne
  
  Bioinformatics (2009), 25 (19), 2537-2543CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)
  
  The rational design of proteins with modified properties, through amino acid substitutions, is of crucial importance in a large variety of applications. Given the huge no. of possible substitutions, every protein engineering project would benefit strongly from the guidance of in silico methods able to predict rapidly, and with reasonable accuracy, the stability changes resulting from all possible mutations in a protein. The authors exploit newly developed statistical potentials, based on a formalism that highlights the coupling between 4 protein sequence and structure descriptors, and take into account the amino acid vol. variation upon mutation. The stability change is expressed as a linear combination of these energy functions, whose proportionality coeffs. vary with the solvent accessibility of the mutated residue and are identified with the help of a neural network. A correlation coeff. of R = 0.63 and a root mean square error of σc = 1.15 kcal/mol between measured and predicted stability changes are obtained upon cross-validation. These scores reach R = 0.79, and σc = 0.86 kcal/mol after exclusion of 10% outliers. The predictive power of the authors' method is shown to be significantly higher than that of other programs described in the literature.
  
152.
  Khatun, J.; Khare, S. D.; Dokholyan, N. V. Can Contact Potentials Reliably Predict Stability of Proteins?. J. Mol. Biol. 2004, 336, 1223– 1238, DOI: 10.1016/j.jmb.2004.01.002
  
  152
  Can Contact Potentials Reliably Predict Stability of Proteins?
  
  Khatun, Jainab; Khare, Sagar D.; Dokholyan, Nikolay V.
  
  Journal of Molecular Biology (2004), 336 (5), 1223-1238CODEN: JMOBAK; ISSN:0022-2836. (Elsevier)
  
  The simplest approxn. of interaction potential between amino acid residues in proteins is the contact potential, which defines the effective free energy of a protein conformation by a set of amino acid contacts formed in this conformation. Finding a contact potential capable of predicting free energies of protein states across a variety of protein families will aid protein folding and engineering in silico on a computationally tractable time-scale. We test the ability of contact potentials to accurately and transferably (across various protein families) predict stability changes of proteins upon mutations. We develop a new methodol. to det. the contact potentials in proteins from exptl. measurements of changes in protein's thermodn. stabilities (ΔΔG) upon mutations. We apply our methodol. to derive sets of contact interaction parameters for a hierarchy of interaction models including solvation and multi-body contact parameters. We test how well our models reproduce exptl. measurements by statistical tests. We evaluate the max. accuracy of predictions obtained by using contact potentials and the correlation between parameters derived from different data-sets of exptl. (ΔΔG) values. We argue that it is impossible to reach exptl. accuracy and derive fully transferable contact parameters using the contact models of potentials. However, contact parameters may yield reliable predictions of ΔΔG for datasets of mutations confined to the same amino acid positions in the sequence of a single protein.
  
153.
  Pucci, F.; Bernaerts, K. V.; Kwasigroch, J. M.; Rooman, M. Quantification of Biases in Predictions of Protein Stability Changes upon Mutations. Bioinformatics 2018, 34, 3659– 3665, DOI: 10.1093/bioinformatics/bty348
  
  153
  Quantification of biases in predictions of protein stability changes upon mutations
  
  Pucci, Fabrizio; Bernaerts, Katrien V.; Kwasigroch, Jean Marc; Rooman, Marianne
  
  Bioinformatics (2018), 34 (21), 3659-3665CODEN: BOINFP; ISSN:1367-4811. (Oxford University Press)
  
  Motivation: Bioinformatics tools that predict protein stability changes upon point mutations have made a lot of progress in the last decades and have become accurate and fast enough to make computational mutagenesis expts. feasible, even on a proteome scale. One problem with these tools is their bias toward the learning datasets which, being dominated by destabilizing mutations, causes predictions to be better for destabilizing than for stabilizing mutations. Results: We thoroughly analyzed the biases in the prediction of folding free energy changes upon point mutations (ΔΔG°) and proposed some unbiased solns. We started by constructing a dataset Ssym of exptl. measured ΔΔG°s with an equal no. of stabilizing and destabilizing mutations, by collecting mutations for which the structure of both the wild-type and mutant protein is available. On this balanced dataset, we assessed the performances of 15 widely used ΔΔG° predictors. After the astonishing observation that almost all these methods are strongly biased toward destabilizing mutations, esp. those that use black-box machine learning, we proposed an elegant way to solve the bias issue by imposing phys. symmetries under inverse mutations on the model structure, which we implemented in PoPMuSiCsym. This new predictor constitutes an efficient trade-off between accuracy and absence of biases. Some final considerations and suggestions for further improvement of the predictors are discussed.
  
154.
  Yin, S.; Ding, F.; Dokholyan, N. V. Eris: An Automated Estimator of Protein Stability. Nat. Methods 2007, 4, 466– 467, DOI: 10.1038/nmeth0607-466
  
  154
  Eris: an automated estimator of protein stability
  
  Yin, Shuangye; Ding, Feng; Dokholyan, Nikolay V.
  
  Nature Methods (2007), 4 (6), 466-467CODEN: NMAEA3; ISSN:1548-7091. (Nature Publishing Group)
  
155.
  Benedix, A.; Becker, C. M.; de Groot, B. L.; Caflisch, A.; Böckmann, R. A. Predicting Free Energy Changes Using Structural Ensembles. Nat. Methods 2009, 6, 3– 4, DOI: 10.1038/nmeth0109-3
  
  155
  Predicting free energy changes using structural ensembles
  
  Benedix, Alexander; Becker, Caroline M.; de Groot, Bert L.; Caflisch, Amedeo; Boeckmann, Rainer A.
  
  Nature Methods (2009), 6 (1), 3-4CODEN: NMAEA3; ISSN:1548-7091. (Nature Publishing Group)
  
  Reliable and fast computation of protein free energy is crucial for protein-structure anal., structure-based protein design and protein docking. Rigorous treatments based on phys. effective energy functions involve computationally expensive methods such as free energy perturbation, which are time-consuming and are thus incompatible with the need to perform extensive scans. Commonly used fast methods, in turn, involve empirically derived scoring functions and usually do not include protein flexibility or are based on statistical potentials and are therefore highly dependent on the availability of case-dependent exptl. training data. Hence, such methods are inherently limited in accuracy and applicability. Here we propose a computational, structure-based method named Concoord/Poisson-Boltzmann surface area (CC/PBSA) for both fast and quant. estn. of the folding free energy of mutants, that is for measuring their conformational stability and for predicting the effect of mutations on protein-protein binding affinity. The first step is to rapidly generate alternative protein conformations via the program Concoord, which efficiently samples the available configurational space. The crystal or NMR input structure is translated into a geometric description of the complex, and starting from random coordinates, 300-600 structures both of the mutant and the wild type are generated by iteratively correcting the coordinates until all geometric constraints are fulfilled. Then an energy function based on phys. chem. (force field) and an efficient continuum solvent approach, the soln. of the Poisson-Boltzmann equation and a term for nonpolar solvation, is averaged over the generated structural ensembles.
  
156.
  Pronk, S.; Páll, S.; Schulz, R.; Larsson, P.; Bjelkmar, P.; Apostolov, R.; Shirts, M. R.; Smith, J. C.; Kasson, P. M.; van der Spoel, D.; Hess, B.; Lindahl, E. GROMACS 4.5: A High-Throughput and Highly Parallel Open Source Molecular Simulation Toolkit. Bioinformatics 2013, 29, 845– 854, DOI: 10.1093/bioinformatics/btt055
  
  156
  GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit
  
  Pronk, Sander; Pall, Szilard; Schulz, Roland; Larsson, Per; Bjelkmar, Paer; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik
  
  Bioinformatics (2013), 29 (7), 845-854CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)
  
  Motivation: Mol. simulation has historically been a low-throughput technique, but faster computers and increasing amts. of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomols. with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomol. interaction and function in a manner directly testable by expt. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomols., such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these mols. built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations.
  
157.
  de Groot, B. L.; van Aalten, D. M.; Scheek, R. M.; Amadei, A.; Vriend, G.; Berendsen, H. J. C. Prediction of Protein Conformational Freedom from Distance Constraints. Proteins: Struct., Funct., Genet. 1997, 29, 240– 251, DOI: 10.1002/(SICI)1097-0134(199710)29:2<240::AID-PROT11>3.0.CO;2-O
  
  157
  Prediction of protein conformational freedom from distance constraints
  
  de Groot, B. L.; van Aalten, D. M. F.; Scheek, R. M.; Amadei, A.; Vriend, G.; Berendsen, H. J. C.
  
  Proteins: Structure, Function, and Genetics (1997), 29 (2), 240-251CODEN: PSFGEY; ISSN:0887-3585. (Wiley-Liss)
  
  A method is presented that generates random protein structures that fulfil a set of upper and lower interat. distance limits. These limits depend on distances measured in exptl. structures and the strength of the interat. interaction. Structural differences between generated structures are similar to those obtained from expt. and from MD simulation. Although detailed aspects of dynamical mechanisms are not covered and the extent of variations are only estd. in a relative sense, applications to an IgG-binding domain, an SH3 binding domain, HPr, calmodulin, and lysozyme are presented which illustrate the use of the method as a fast and simple way to predict structural variability in proteins. The method may be used to support the design of mutants, when structural fluctuations for a large no. of mutants are to be screened. The results suggest that motional freedom in proteins is ruled largely by a set of simple geometric constraints.
  
158.
  Hoppe, C.; Schomburg, D. Prediction of Protein Thermostability with a Direction- and Distance-Dependent Knowledge-Based Potential. Protein Sci. 2005, 14, 2682– 2692, DOI: 10.1110/ps.04940705
  
  158
  Prediction of protein thermostability with a direction- and distance-dependent knowledge-based potential
  
  Hoppe, Christian; Schomburg, Dietmar
  
  Protein Science (2005), 14 (10), 2682-2692CODEN: PRCIEI; ISSN:0961-8368. (Cold Spring Harbor Laboratory Press)
  
  The increasing use of enzymes in industrial processes and the importance of understanding protein folding and stability have led to several attempts to predict and quantify the effect of every possible amino acid exchange (mutation) on the thermostability of proteins. In this article the authors describe a knowledge-based discrimination function that acts as a fast and reliable guide in protein engineering and optimization. The function used consists of two parts, a pairwise energy function based on a distance- and direction-dependent at. description of the amino acid environment, and a torsion angle energy function. In a first step a training set of 11 proteins including 646 mutant proteins with exptl. detd. thermostability was used to optimize the knowledge-based energy functions. The resulting potential function was then tested using a test mutant database consisting of 918 various point mutations introduced in 27 proteins. The best correlation coeff. obtained for the exptl. data and the predicted thermostability for the training set is r = 0.81 (561 data points). A total of 76% of the mutations could be predicted correctly as being either stabilizing or destabilizing. The results for the test set are r = 0.74 (747 data points) and 72%, resp. The global correlation over the combined data (1308 mutants) obtained is 0.78.
  
159.
  Pucci, F.; Bourgeas, R.; Rooman, M. Predicting Protein Thermal Stability Changes upon Point Mutations Using Statistical Potentials: Introducing HoTMuSiC. Sci. Rep. 2016, 6, 23257, DOI: 10.1038/srep23257
  
  159
  Predicting protein thermal stability changes upon point mutations using statistical potentials: Introducing HoTMuSiC
  
  Pucci, Fabrizio; Bourgeas, Raphael; Rooman, Marianne
  
  Scientific Reports (2016), 6 (), 23257CODEN: SRCEC3; ISSN:2045-2322. (Nature Publishing Group)
  
  The accurate prediction of the impact of an amino acid substitution on the thermal stability of a protein is a central issue in protein science, and is of key relevance for the rational optimization of various bioprocesses that use enzymes in unusual conditions. Here we present one of the first computational tools to predict the change in melting temp. ΔTm upon point mutations, given the protein structure and, when available, the melting temp. Tm of the wild-type protein. The key ingredients of our model structure are std. and temp.-dependent statistical potentials, which are combined with the help of an artificial neural network. The model structure was chosen on the basis of a detailed thermodn. anal. of the system. The parameters of the model were identified on a set of more than 1,600 mutations with exptl. measured ΔTm. The performance of our method was tested using a strict 5-fold cross-validation procedure, and was found to be significantly superior to that of competing methods. We obtained a root mean square deviation between predicted and exptl. ΔTm values of 4.2 °C that reduces to 2.9 °C when ten percent outliers are removed. A webserver-based tool is freely available for non-com. use at soft.dezyme.com.
  
160.
  Capriotti, E.; Fariselli, P.; Casadio, R. I-Mutant2.0: Predicting Stability Changes upon Mutation from the Protein Sequence or Structure. Nucleic Acids Res. 2005, 33, W306– W310, DOI: 10.1093/nar/gki375
  
  160
  I-Mutant2.0: predicting stability changes upon mutation from the protein sequence or structure
  
  Capriotti, Emidio; Fariselli, Piero; Casadio, Rita
  
  Nucleic Acids Research (2005), 33 (Web Server), W306-W310CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)
  
  I-Mutant2.0 is a support vector machine (SVM)-based tool for the automatic prediction of protein stability changes upon single point mutations. I-Mutant2.0 predictions are performed starting either from the protein structure or, more importantly, from the protein sequence. This latter task, to the best of our knowledge, is exploited for the first time. The method was trained and tested on a data set derived from ProTherm, which is presently the most comprehensive available database of thermodn. exptl. data of free energy changes of protein stability upon mutation under different conditions. I-Mutant2.0 can be used both as a classifier for predicting the sign of the protein stability change upon mutation and as a regression estimator for predicting the related ΔΔG values. Acting as a classifier, I-Mutant2.0 correctly predicts (with a cross-validation procedure) 80% or 77% of the data set, depending on the usage of structural or sequence information, resp. When predicting ΔΔG values assocd. with mutations, the correlation of predicted with expected/exptl. values is 0.71 (with a std. error of 1.30 kcal/mol) and 0.62 (with a std. error of 1.45 kcal/mol) when structural or sequence information are resp. adopted. Our web interface allows the selection of a predictive mode that depends on the availability of the protein structure and/or sequence. In this latter case, the web server requires only pasting of a protein sequence in a raw format. We therefore introduce I-Mutant2.0 as a unique and valuable helper for protein design, even when the protein structure is not yet known with at. resoln. Availability: http://gpcr.biocomp.unibo.it/cgi/predictors/I-Mutant2.0/I-Mutant2.0.cgi.
  
161.
  Cheng, J.; Randall, A.; Baldi, P. Prediction of Protein Stability Changes for Single-Site Mutations Using Support Vector Machines. Proteins: Struct., Funct., Genet. 2006, 62, 1125– 1132, DOI: 10.1002/prot.20810
  
  161
  Prediction of protein stability changes for single-site mutations using support vector machines
  
  Cheng, Jianlin; Randall, Arlo; Baldi, Pierre
  
  Proteins: Structure, Function, and Bioinformatics (2006), 62 (4), 1125-1132CODEN: PSFBAF ISSN:. (Wiley-Liss, Inc.)
  
  Accurate prediction of protein stability changes resulting from single amino acid mutations is important for understanding protein structures and designing new proteins. The authors use support vector machines to predict protein stability changes for single amino acid mutations leveraging both sequence and structural information. The authors evaluate their approach using cross-validation methods on a large dataset of single amino acid mutations. When only the sign of the stability changes is considered, the predictive method achieves 84% accuracy - a significant improvement over previously published results. Moreover, the exptl. results show that the prediction accuracy obtained using sequence alone is close to the accuracy obtained using tertiary structure information. Because the authors' method can accurately predict protein stability changes using primary sequence information only, it is applicable to many situations where the tertiary structure is unknown, overcoming a major limitation of previous methods which require tertiary information. The web server for predictions of protein stability changes upon mutations (MU-pro), software, and datasets are available at http://www.igb.uci.edu/servers/servers.html.
  
162.
  Wainreb, G.; Wolf, L.; Ashkenazy, H.; Dehouck, Y.; Ben-Tal, N. Protein Stability: A Single Recorded Mutation Aids in Predicting the Effects of Other Mutations in the Same Amino Acid Site. Bioinformatics 2011, 27, 3286– 3292, DOI: 10.1093/bioinformatics/btr576
  
  162
  Protein stability: a single recorded mutation aids in predicting the effects of other mutations in the same amino acid site
  
  Wainreb, Gilad; Wolf, Lior; Ashkenazy, Haim; Dehouck, Yves; Ben-Tal, Nir
  
  Bioinformatics (2011), 27 (23), 3286-3292CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)
  
  Motivation: Accurate prediction of protein stability is important for understanding the mol. underpinnings of diseases and for the design of new proteins. We introduce a novel approach for the prediction of changes in protein stability that arise from a single-site amino acid substitution; the approach uses available data on mutations occurring in the same position and in other positions. Our algorithm, named Pro-Maya (Protein Mutant stAbilitY Analyzer), combines a collaborative filtering baseline model, Random Forests regression and a diverse set of features. Pro-Maya predicts the stability free energy difference of mutant vs. wild type, denoted as ΔΔG. Results: We evaluated our algorithm extensively using cross-validation on two previously utilized datasets of single amino acid mutations and a (third) validation set. The results indicate that using known ΔΔG values of mutations at the query position improves the accuracy of ΔΔG predictions for other mutations in that position. The accuracy of our predictions in such cases significantly surpasses that of similar methods, achieving, e.g. a Pearson's correlation coeff. of 0.79 and a root mean square error of 0.96 on the validation set. Because Pro-Maya uses a diverse set of features, including predictions using two other methods, it also performs slightly better than other methods in the absence of addnl. exptl. data on the query positions. Availability: Pro-Maya is freely available via web server at http://bental.tau.ac.il/ProMaya. Contact: nirb@tauex.tau.ac.il; wolf@cs.tau.ac.il.
  
163.
  Li, Y.; Fang, J. PROTS-RF: A Robust Model for Predicting Mutation-Induced Protein Stability Changes. PLoS One 2012, 7, e47247, DOI: 10.1371/journal.pone.0047247
  
  163
  PROTS-RF: a robust model for predicting mutation-induced protein stability changes
  
  Li, Yunqi; Fang, Jianwen
  
  PLoS One (2012), 7 (10), e47247CODEN: POLNCL; ISSN:1932-6203. (Public Library of Science)
  
  The ability to improve protein thermostability via protein engineering is of great scientific interest and also has significant practical value. In this report we present PROTS-RF, a robust model based on the Random Forest algorithm capable of predicting thermostability changes induced by not only single-, but also double- or multiple-point mutations. The model is built using 41 features including evolutionary information, secondary structure, solvent accessibility and a set of fragment-based features. It achieves accuracies of 0.799, 0.782, 0.787 and areas under receiver operating characteristic (ROC) curves of 0.873, 0.868 and 0.862 for single-, double- and multiple- point mutation datasets, resp. Contrary to previous suggestions, our results clearly demonstrate that a robust predictive model trained for predicting single point mutation induced thermostability changes can be capable of predicting double and multiple point mutations. It also shows high levels of robustness in the tests using hypothetical reverse mutations. We demonstrate that testing datasets created based on phys. principles can be highly useful for testing the robustness of predictive models.
  
164.
  Quang, D.; Chen, Y.; Xie, X. DANN: A Deep Learning Approach for Annotating the Pathogenicity of Genetic Variants. Bioinformatics 2015, 31, 761– 763, DOI: 10.1093/bioinformatics/btu703
  
  164
  DANN: a deep learning approach for annotating the pathogenicity of genetic variants
  
  Quang, Daniel; Chen, Yifei; Xie, Xiaohui
  
  Bioinformatics (2015), 31 (5), 761-763CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)
  
  Annotating genetic variants, esp. non-coding variants, for the purpose of identifying pathogenic variants remains a challenge. Combined annotation-dependent depletion (CADD) is an algorithm designed to annotate both coding and non-coding variants, and has been shown to outperform other annotation algorithms. CADD trains a linear kernel support vector machine (SVM) to differentiate evolutionarily derived, likely benign, alleles from simulated, likely deleterious, variants. However, SVMs cannot capture non-linear relationships among the features, which can limit performance. To address this issue, we have developed DANN. DANN uses the same feature set and training data as CADD to train a deep neural network (DNN). DNNs can capture non-linear relationships among features and are better suited than SVMs for problems with a large no. of samples and features. We exploit Compute Unified Device Architecture-compatible graphics processing units and deep learning techniques such as dropout and momentum training to accelerate the DNN training. DANN achieves about a 19% relative redn. in the error rate and about a 14% relative increase in the area under the curve (AUC) metric over CADD's SVM methodol.
  
165.
  Wang, Y.; Mao, H.; Yi, Z. Protein Secondary Structure Prediction by Using Deep Learning Method. Knowl.-Based Syst. 2017, 118, 115– 123, DOI: 10.1016/j.knosys.2016.11.015
  
166.
  Ivakhnenko, A. G. Polynomial Theory of Complex Systems. IEEE Trans. Syst., Man, Cybern. 1971, SMC-1, 364– 378, DOI: 10.1109/TSMC.1971.4308320
  
167.
  Bengio, Y.; Boulanger-Lewandowski, N.; Pascanu, R. Advances in Optimizing Recurrent Networks. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing; IEEE: New York, 2013; pp 8624– 8628.
  
168.
  Cang, Z.; Wei, G.-W. TopologyNet: Topology Based Deep Convolutional and Multi-Task Neural Networks for Biomolecular Property Predictions. PLoS Comput. Biol. 2017, 13, e1005690, DOI: 10.1371/journal.pcbi.1005690
  
  168
  TopologyNet: Topology based deep convolutional and multi-task neural networks for biomolecular property predictions
  
  Cang, Zixuan; Wei, Guo-Wei
  
  PLoS Computational Biology (2017), 13 (7), e1005690/1-e1005690/27CODEN: PCBLBG; ISSN:1553-7358. (Public Library of Science)
  
  Although deep learning approaches have had tremendous success in image, video and audio processing, computer vision, and speech recognition, their applications to three-dimensional (3D) biomol. structural data sets have been hindered by the geometric and biol. complexity. To address this problem we introduce the element-specific persistent homol. (ESPH) method. ESPH represents 3D complex geometry by one-dimensional (1D) topol. invariants and retains important biol. information via a multichannel image-like representation. This representation reveals hidden structure-function relationships in biomols. We further integrate ESPH and deep convolutional neural networks to construct a multichannel topol. neural network (TopologyNet) for the predictions of protein-ligand binding affinities and protein stability changes upon mutation. To overcome the deep learning limitations from small and noisy training sets, we propose a multi-task multichannel topol. convolutional neural network (MM-TCNN). We demonstrate that TopologyNet outperforms the latest methods in the prediction of protein-ligand binding affinities, mutation induced globular protein folding free energy changes, and mutation induced membrane protein folding free energy changes.
  
169.
  Laimer, J.; Hofer, H.; Fritz, M.; Wegenkittl, S.; Lackner, P. MAESTRO - Multi Agent Stability Prediction upon Point Mutations. BMC Bioinf. 2015, 16, 116, DOI: 10.1186/s12859-015-0548-6
  
  169
  MAESTRO--multi agent stability prediction upon point mutations
  
  Laimer, Josef; Hofer, Heidi; Fritz, Marko; Wegenkittl, Stefan; Lackner, Peter
  
  BMC bioinformatics (2015), 16 (), 116 ISSN:.
  
  BACKGROUND: Point mutations can have a strong impact on protein stability. A change in stability may subsequently lead to dysfunction and finally cause diseases. Moreover, protein engineering approaches aim to deliberately modify protein properties, where stability is a major constraint. In order to support basic research and protein design tasks, several computational tools for predicting the change in stability upon mutations have been developed. Comparative studies have shown the usefulness but also limitations of such programs. RESULTS: We aim to contribute a novel method for predicting changes in stability upon point mutation in proteins called MAESTRO. MAESTRO is structure based and distinguishes itself from similar approaches in the following points: (i) MAESTRO implements a multi-agent machine learning system. (ii) It also provides predicted free energy change (ΔΔG) values and a corresponding prediction confidence estimation. (iii) It provides high throughput scanning for multi-point mutations where sites and types of mutation can be comprehensively controlled. (iv) Finally, the software provides a specific mode for the prediction of stabilizing disulfide bonds. The predictive power of MAESTRO for single point mutations and stabilizing disulfide bonds is comparable to similar methods. CONCLUSIONS: MAESTRO is a versatile tool in the field of stability change prediction upon point mutations. Executables for the Linux and Windows operating systems are freely available to non-commercial users from http://biwww.che.sbg.ac.at/MAESTRO.
  
170.
  Khan, S.; Vihinen, M. Performance of Protein Stability Predictors. Hum. Mutat. 2010, 31, 675– 684, DOI: 10.1002/humu.21242
  
  170
  Performance of protein stability predictors
  
  Khan, Sofia; Vihinen, Mauno
  
  Human Mutation (2010), 31 (6), 675-684CODEN: HUMUE3; ISSN:1059-7794. (Wiley-Liss, Inc.)
  
  Stability is a fundamental property affecting function, activity, and regulation of biomols. Stability changes are often found for mutated proteins involved in diseases. Stability predictors computationally predict protein-stability changes caused by mutations. We performed a systematic anal. of 11 online stability predictors' performances. These predictors are CUPSAT, Dmutant, FoldX, I-Mutant2.0, two versions of I-Mutant3.0 (sequence and structure versions), MultiMutate, MUpro, SCide, Scpred, and SRide. As input, 1,784 single mutations found in 80 proteins were used, and these mutations did not include those used for training. The programs' performances were also assessed according to where the mutations were found in the proteins, i.e., in secondary structures and on the surface or in the core of a protein, and according to protein structure type. The extents to which the mutations altered the occupied vols. at the residue sites and the charge interactions were also characterized. The predictions of all programs were in line with the exptl. data. I-Mutant3.0 (utilizing structural information), Dmutant, and FoldX were the most reliable predictors. The stability-center predictors performed with similar accuracy. However, at best, the predictions were only moderately accurate (∼60%) and significantly better tools would be needed for routine anal. of mutation effects.
  
171.
  Usmanova, D. R.; Bogatyreva, N. S.; Ariño Bernad, J.; Eremina, A. A.; Gorshkova, A. A.; Kanevskiy, G. M.; Lonishin, L. R.; Meister, A. V.; Yakupova, A. G.; Kondrashov, F. A.; Ivankov, D. N. Self-Consistency Test Reveals Systematic Bias in Programs for Prediction Change of Stability upon Mutation. Bioinformatics 2018, 34, 3653– 3658, DOI: 10.1093/bioinformatics/bty340
  
  171
  Self-consistency test reveals systematic bias in programs for prediction change of stability upon mutation
  
  Usmanova, Dinara R.; Bogatyreva, Natalya S.; Bernad, Joan Arino; Eremina, Aleksandra A.; Gorshkova, Anastasiya A.; Kanevskiy, German M.; Lonishin, Lyubov R.; Meister, Alexander V.; Yakupova, Alisa G.; Kondrashov, Fyodor A.; Ivankov, Dmitry N.
  
  Bioinformatics (2018), 34 (21), 3653-3658CODEN: BOINFP; ISSN:1367-4811. (Oxford University Press)
  
  Motivation: Computational prediction of the effect of mutations on protein stability is used by researchers in many fields. The utility of the prediction methods is affected by their accuracy and bias. Bias, a systematic shift of the predicted change of stability, has been noted as an issue for several methods, but has not been investigated systematically. Presence of the bias may lead to misleading results esp. when exploring the effects of combination of different mutations. Results: Here we use a protocol to measure the bias as a function of the no. of introduced mutations. It is based on a self-consistency test of the reciprocity of the effect of a mutation. An advantage of the used approach is that it relies solely on crystal structures without exptl. measured stability values. We applied the protocol to four popular algorithms predicting change of protein stability upon mutation, FoldX, Eris, Rosetta and I-Mutant, and found an inherent bias. For one program, FoldX, we manage to substantially reduce the bias using addnl. relaxation by Modeller. Authors using algorithms for predicting effects of mutations should be aware of the bias described here.
  
172.
  Montanucci, L.; Martelli, P. L.; Ben-Tal, N.; Fariselli, P. A Natural Upper Bound to the Accuracy of Predicting Protein Stability Changes upon Mutations. 2018, arXiv:1809.10389 [q-bio.BM]. arXiv.org e-Print archive. https://arxiv.org/abs/1809.10389.
  
173.
  Rice, P.; Longden, I.; Bleasby, A. EMBOSS: The European Molecular Biology Open Software Suite. Trends Genet. 2000, 16, 276– 277, DOI: 10.1016/S0168-9525(00)02024-2
  
  173
  EMBOSS: the european molecular biology open software suite
  
  Rice, Peter; Longden, Ian; Bleasby, Alan
  
  Trends in Genetics (2000), 16 (6), 276-277CODEN: TRGEE2; ISSN:0168-9525. (Elsevier Science Ltd.)
  
174.
  Lu, G.; Moriyama, E. N. Vector NTI, a Balanced All-in-One Sequence Analysis Suite. Briefings Bioinf. 2004, 5, 378– 388, DOI: 10.1093/bib/5.4.378
  
  174
  Vector NTI, a balanced all-in-one sequence analysis suite
  
  Lu, Guoqing; Moriyama, Etsuko N.
  
  Briefings in Bioinformatics (2004), 5 (4), 378-388CODEN: BBIMFX; ISSN:1467-5463. (Henry Stewart Publications)
  
  A review. Vector NTI is a well-balanced desktop application integrated for mol. sequence anal. and biol. data management. It has a centralized database and five application modules: Vector NTI, AlignX, BioAnnotator, ContigExpress and GenomBench. The features and functions available in this software are examd. These include database management, primer design, virtual cloning, alignments, sequence assembly, 3D mol. viewer and Internet tools. Some problems encountered when using this software are also discussed. Vector NTI is a tool that can save time and enhance anal. but it requires some learning on the user's part and there are some issues that need to be addressed by the developer.
  
175.
  Bendl, J.; Stourac, J.; Sebestova, E.; Vavra, O.; Musil, M.; Brezovsky, J.; Damborsky, J. HotSpot Wizard 2.0: Automated Design of Site-Specific Mutations and Smart Libraries in Protein Engineering. Nucleic Acids Res. 2016, 44, W479– 487, DOI: 10.1093/nar/gkw416
  
  175
  HotSpot Wizard 2.0: automated design of site-specific mutations and smart libraries in protein engineering
  
  Bendl, Jaroslav; Stourac, Jan; Sebestova, Eva; Vavra, Ondrej; Musil, Milos; Brezovsky, Jan; Damborsky, Jiri
  
  Nucleic Acids Research (2016), 44 (W1), W479-W487CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)
  
  HotSpot Wizard 2.0 is a web server for automated identification of hot spots and design of smart libraries for engineering proteins' stability, catalytic activity, substrate specificity and enantioselectivity. The server integrates sequence, structural and evolutionary information obtained from 3 databases and 20 computational tools. Users are guided through the processes of selecting hot spots using four different protein engineering strategies and optimizing the resulting library's size by narrowing down a set of substitutions at individual randomized positions. The only required input is a query protein structure. The results of the calcns. are mapped onto the protein's structure and visualized with a JSmol applet. HotSpot Wizard lists annotated residues suitable for mutagenesis and can automatically design appropriate codons for each implemented strategy. Overall, HotSpot Wizard provides comprehensive annotations of protein structures and assists protein engineers with the rational design of site-specific mutations and focused libraries.
  
176.
  Stamatakis, A. RAxML Version 8: A Tool for Phylogenetic Analysis and Post-Analysis of Large Phylogenies. Bioinformatics 2014, 30, 1312– 1313, DOI: 10.1093/bioinformatics/btu033
  
  176
  RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies
  
  Stamatakis, Alexandros
  
  Bioinformatics (2014), 30 (9), 1312-1313CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)
  
  Motivation: Phylogenies are increasingly used in all fields of medical and biol. research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under max. likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community. Results: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees. In addn., an up-to-date 50-page user manual covering all new RAxML options is available. Availability and implementation: The code is available under GNU GPL at https://github.com/stamatak/standard-RAxML. Contact: [email protected] Supplementary information: Supplementary data are available at Bioinformatics online.
  
177.
  Ashkenazy, H.; Penn, O.; Doron-Faigenboim, A.; Cohen, O.; Cannarozzi, G.; Zomer, O.; Pupko, T. FastML: A Web Server for Probabilistic Reconstruction of Ancestral Sequences. Nucleic Acids Res. 2012, 40, W580– 584, DOI: 10.1093/nar/gks498
  
  177
  FastML: a web server for probabilistic reconstruction of ancestral sequences
  
  Ashkenazy, Haim; Penn, Osnat; Doron-Faigenboim, Adi; Cohen, Ofir; Cannarozzi, Gina; Zomer, Oren; Pupko, Tal
  
  Nucleic Acids Research (2012), 40 (W1), W580-W584CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)
  
  Ancestral sequence reconstruction is essential to a variety of evolutionary studies. Here, we present the FastML web server, a user-friendly tool for the reconstruction of ancestral sequences. FastML implements various novel features that differentiate it from existing tools: (i) FastML uses an indel-coding method, in which each gap, possibly spanning multiple sites, is coded as binary data. FastML then reconstructs ancestral indel states assuming a continuous time Markov process. FastML provides the most likely ancestral sequences, integrating both indels and characters; (ii) FastML accounts for uncertainty in ancestral states: it provides not only the posterior probabilities for each character and indel at each sequence position, but also a sample of ancestral sequences from this posterior distribution, and a list of the k-most likely ancestral sequences; (iii) FastML implements a large array of evolutionary models, which makes it generic and applicable for nucleotide, protein and codon sequences; and (iv) a graphical representation of the results is provided, including, for example, a graphical logo of the inferred ancestral sequences. The utility of FastML is demonstrated by reconstructing ancestral sequences of the Env protein from various HIV-1 subtypes. FastML is freely available for all academic users and is available online at http://fastml.tau.ac.il/.
  
178.
  Diallo, A. B.; Makarenkov, V.; Blanchette, M. Ancestors 1.0: A Web Server for Ancestral Sequence Reconstruction. Bioinformatics 2010, 26, 130– 131, DOI: 10.1093/bioinformatics/btp600
  
  178
  Ancestors 1.0: a web server for ancestral sequence reconstruction
  
  Diallo, Abdoulaye Banire; Makarenkov, Vladimir; Blanchette, Mathieu
  
  Bioinformatics (2010), 26 (1), 130-131CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)
  
  Summary: The computational inference of ancestral genomes consists of five difficult steps: identifying syntenic regions, inferring ancestral arrangement of syntenic regions, aligning multiple sequences, reconstructing the insertion and deletion history and finally inferring substitutions. Each of these steps has received a lot of attention in the past years. However, there currently exists no framework that integrates all of the different steps in an easy workflow. Here, we introduce Ancestors 1.0, a web server allowing one to easily and quickly perform the last three steps of the ancestral genome reconstruction procedure. It implements several alignment algorithms, an indel max. likelihood solver and a context-dependent max. likelihood substitution inference algorithm. The results presented by the server include the posterior probabilities for the last two steps of the ancestral genome reconstruction and the expected error rate of each ancestral base prediction.
  
179.
  Westesson, O.; Barquist, L.; Holmes, I. HandAlign: Bayesian Multiple Sequence Alignment, Phylogeny and Ancestral Reconstruction. Bioinformatics 2012, 28, 1170– 1171, DOI: 10.1093/bioinformatics/bts058
  
  179
  HandAlign: Bayesian multiple sequence alignment, phylogeny and ancestral reconstruction
  
  Westesson, Oscar; Barquist, Lars; Holmes, Ian
  
  Bioinformatics (2012), 28 (8), 1170-1171CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)
  
  Summary: We describe HandAlign, a software package for Bayesian reconstruction of phylogenetic history. The underlying model of sequence evolution describes indels and substitutions. Alignments, trees and model parameters are all treated as jointly dependent random variables and sampled via Metropolis-Hastings Markov chain Monte Carlo (MCMC), enabling systematic statistical parameter inference and hypothesis testing. HandAlign implements several different MCMC proposal kernels, allows sampling from arbitrary target distributions via Hastings ratios, and uses std. file formats for trees, alignments and models. Availability and Implementation: Installation and usage instructions are at http://biowiki.org/HandAlign Contact: [email protected] Supplementary information: Supplementary material is available at Bioinformatics online.
  
180.
  Ronquist, F.; Teslenko, M.; van der Mark, P.; Ayres, D. L.; Darling, A.; Höhna, S.; Larget, B.; Liu, L.; Suchard, M. A.; Huelsenbeck, J. P. MrBayes 3.2: Efficient Bayesian Phylogenetic Inference and Model Choice across a Large Model Space. Syst. Biol. 2012, 61, 539– 542, DOI: 10.1093/sysbio/sys029
  
  180
  MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space
  
  Ronquist Fredrik; Teslenko Maxim; van der Mark Paul; Ayres Daniel L; Darling Aaron; Hohna Sebastian; Larget Bret; Liu Liang; Suchard Marc A; Huelsenbeck John P
  
  Systematic biology (2012), 61 (3), 539-42 ISSN:.
  
  Since its introduction in 2001, MrBayes has grown in popularity as a software package for Bayesian phylogenetic inference using Markov chain Monte Carlo (MCMC) methods. With this note, we announce the release of version 3.2, a major upgrade to the latest official release presented in 2003. The new version provides convergence diagnostics and allows multiple analyses to be run in parallel with convergence progress monitored on the fly. The introduction of new proposals and automatic optimization of tuning parameters has improved convergence for many problems. The new version also sports significantly faster likelihood calculations through streaming single-instruction-multiple-data extensions (SSE) and support of the BEAGLE library, allowing likelihood calculations to be delegated to graphics processing units (GPUs) on compatible hardware. Speedup factors range from around 2 with SSE code to more than 50 with BEAGLE for codon problems. Checkpointing across all models allows long runs to be completed even when an analysis is prematurely terminated. New models include relaxed clocks, dating, model averaging across time-reversible substitution models, and support for hard, negative, and partial (backbone) tree constraints. Inference of species trees from gene trees is supported by full incorporation of the Bayesian estimation of species trees (BEST) algorithms. Marginal model likelihoods for Bayes factor tests can be estimated accurately across the entire model space using the stepping stone method. The new version provides more output options than previously, including samples of ancestral states, site rates, site d(N)/d(S) ratios, branch rates, and node dates. A wide range of statistics on tree parameters can also be output for visualization in FigTree and compatible software.
  
181.
  Finn, R. D.; Clements, J.; Eddy, S. R. HMMER Web Server: Interactive Sequence Similarity Searching. Nucleic Acids Res. 2011, 39, W29– 37, DOI: 10.1093/nar/gkr367
  
  181
  HMMER web server: interactive sequence similarity searching
  
  Finn, Robert D.; Clements, Jody; Eddy, Sean R.
  
  Nucleic Acids Research (2011), 39 (Web Server), W29-W37CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)
  
  HMMER is a software suite for protein sequence similarity searches using probabilistic methods. Previously, HMMER has mainly been available only as a computationally intensive UNIX command-line tool, restricting its use. Recent advances in the software, HMMER3, have resulted in a 100-fold speed gain relative to previous versions. It is now feasible to make efficient profile hidden Markov model (profile HMM) searches via the web. A HMMER web server (http://hmmer.janelia.org) has been designed and implemented such that most protein database searches return within a few seconds. Methods are available for searching either a single protein sequence, multiple protein sequence alignment or profile HMM against a target sequence database, and for searching a protein sequence against Pfam. The web server is designed to cater to a range of different user expertise and accepts batch uploading of multiple queries at once. All search methods are also available as RESTful web services, thereby allowing them to be readily integrated as remotely executed tasks in locally scripted work-flows. We have focused on minimizing search times and the ability to rapidly display tabular results, regardless of the no. of matches found, developing graphical summaries of the search results to provide quick, intuitive appraisement of them.
  
182.
  Altschul, S. F.; Gertz, E. M.; Agarwala, R.; Schäffer, A. A.; Yu, Y.-K. PSI-BLAST Pseudocounts and the Minimum Description Length Principle. Nucleic Acids Res. 2009, 37, 815– 824, DOI: 10.1093/nar/gkn981
  
  182
  PSI-BLAST pseudocounts and the minimum description length principle
  
  Altschul, Stephen F.; Gertz, E. Michael; Agarwala, Richa; Schaeffer, Alejandro A.; Yu, Yi-Kuo
  
  Nucleic Acids Research (2009), 37 (3), 815-824CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)
  
  Position specific score matrixes (PSSMs) are derived from multiple sequence alignments to aid in the recognition of distant protein sequence relationships. The PSI-BLAST protein database search program derives the column scores of its PSSMs with the aid of pseudocounts, added to the obsd. amino acid counts in a multiple alignment column. In the absence of theory, the no. of pseudocounts used has been a completely empirical parameter. This article argues that the min. description length principle can motivate the choice of this parameter. Specifically, for realistic alignments, the principle supports the practice of using a no. of pseudocounts essentially independent of alignment size. However, it also implies that more highly conserved columns should use fewer pseudocounts, increasing the inter-column contrast of the implied PSSMs. A new method for calcg. pseudocounts that significantly improves PSI-BLAST's retrieval accuracy is now employed by default.
  
183.
  Whitehead, T. A.; Chevalier, A.; Song, Y.; Dreyfus, C.; Fleishman, S. J.; De Mattos, C.; Myers, C. A.; Kamisetty, H.; Blair, P.; Wilson, I. A.; Baker, D. Optimization of Affinity, Specificity and Function of Designed Influenza Inhibitors Using Deep Sequencing. Nat. Biotechnol. 2012, 30, 543– 548, DOI: 10.1038/nbt.2214
  
  183
  Optimization of affinity, specificity and function of designed influenza inhibitors using deep sequencing
  
  Whitehead, Timothy A.; Chevalier, Aaron; Song, Yifan; Dreyfus, Cyrille; Fleishman, Sarel J.; De Mattos, Cecilia; Myers, Chris A.; Kamisetty, Hetunandan; Blair, Patrick; Wilson, Ian A.; Baker, David
  
  Nature Biotechnology (2012), 30 (6), 543-548CODEN: NABIF9; ISSN:1087-0156. (Nature Publishing Group)
  
  We show that comprehensive sequence-function maps obtained by deep sequencing can be used to reprogram interaction specificity and to leapfrog over bottlenecks in affinity maturation by combining many individually small contributions not detectable in conventional approaches. We use this approach to optimize two computationally designed inhibitors against H1N1 influenza hemagglutinin and, in both cases, obtain variants with subnanomolar binding affinity. The most potent of these, a 51-residue protein, is broadly cross-reactive against all influenza group 1 hemagglutinins, including human H2, and neutralizes H1N1 viruses with a potency that rivals that of several human monoclonal antibodies, demonstrating that computational design followed by comprehensive energy landscape mapping can generate proteins with potential therapeutic utility.
  
184.
  Shimizu, Y.; Inoue, A.; Tomari, Y.; Suzuki, T.; Yokogawa, T.; Nishikawa, K.; Ueda, T. Cell-Free Translation Reconstituted with Purified Components. Nat. Biotechnol. 2001, 19, 751– 755, DOI: 10.1038/90802
  
  184
  Cell-free translation reconstituted with purified components
  
  Shimizu, Yoshihiro; Inoue, Akio; Tomari, Yukihide; Suzuki, Tsutomu; Yokogawa, Takashi; Nishikawa, Kazuya; Ueda, Takuya
  
  Nature Biotechnology (2001), 19 (8), 751-755CODEN: NABIF9; ISSN:1087-0156. (Nature America Inc.)
  
  We have developed a protein-synthesizing system reconstituted from recombinant tagged protein factors purified to homogeneity. The system was able to produce protein at a rate of about 160 μg/mL/h in a batch mode without the need for any supplementary app. The protein products were easily purified within 1 h using affinity chromatog. to remove the tagged protein factors. Moreover, omission of a release factor allowed efficient incorporation of an unnatural amino acid using suppressor tRNA.
  
185.
  Niwa, T.; Kanamori, T.; Ueda, T.; Taguchi, H. Global Analysis of Chaperone Effects Using a Reconstituted Cell-Free Translation System. Proc. Natl. Acad. Sci. U. S. A. 2012, 109, 8937– 8942, DOI: 10.1073/pnas.1201380109
  
  185
  Global analysis of chaperone effects using a reconstituted cell-free translation system
  
  Niwa, Tatsuya; Kanamori, Takashi; Ueda, Takuya; Taguchi, Hideki
  
  Proceedings of the National Academy of Sciences of the United States of America (2012), 109 (23), 8937-8942, S8937/1-S8937/8CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)
  
  Protein folding is often hampered by protein aggregation, which can be prevented by a variety of chaperones in the cell. A dataset that evaluates which chaperones are effective for aggregation-prone proteins would provide an invaluable resource not only for understanding the roles of chaperones, but also for broader applications in protein science and engineering. Therefore, we comprehensively evaluated the effects of the major Escherichia coli chaperones, trigger factor, DnaK/DnaJ/GrpE, and GroEL/GroES, on ∼800 aggregation-prone cytosolic E. coli proteins, using a reconstituted chaperone-free translation system. Statistical analyses revealed the robustness and the intriguing properties of chaperones. The DnaK and GroEL systems drastically increased the solubilities of hundreds of proteins with weak biases, whereas trigger factor had only a marginal effect on soly. The combined addn. of the chaperones was effective for a subset of proteins that were not rescued by any single chaperone system, supporting the synergistic effect of these chaperones. The resource, which is accessible via a public database, can be used to investigate the properties of proteins of interest in terms of their solubilities and chaperone effects.
  
186.
  Berman, H. M.; Gabanyi, M. J.; Kouranov, A.; Micallef, D. I.; Westbrook, J. Protein Structure Initiative - TargetTrack 2000–2017 - All Data Files. DOI: 10.5281/zenodo.821654 .
  
187.
  Price, W. N.; Handelman, S. K.; Everett, J. K.; Tong, S. N.; Bracic, A.; Luff, J. D.; Naumov, V.; Acton, T.; Manor, P.; Xiao, R.; Rost, B.; Montelione, G. T.; Hunt, J. F. Large-Scale Experimental Studies Show Unexpected Amino Acid Effects on Protein Expression and Solubility in Vivo in E. coli. Microb. Inf. Exp. 2011, 1, 6, DOI: 10.1186/2042-5783-1-6
  
  187
  Large-scale experimental studies show unexpected amino acid effects on protein expression and solubility in vivo in E. coli
  
  Price, W. Nicholson, II; Handelman, Samuel K.; Everett, John K.; Tong, Saichiu N.; Bracic, Ana; Luff, Jon D.; Naumov, Victor; Acton, Thomas; Manor, Philip; Xiao, Rong; Rost, Burkhard; Montelione, Gaetano T.; Hunt, John F.
  
  Microbial Informatics and Experimentation (2011), 1 (), 6CODEN: MIEIBV; ISSN:2042-5783. (BioMed Central Ltd.)
  
  The biochem. and phys. factors controlling protein expression level and soly. in vivo remain incompletely characterized. To gain insight into the primary sequence features influencing these outcomes, we performed statistical analyses of results from the high-throughput protein-prodn. pipeline of the Northeast Structural Genomics Consortium. Proteins expressed in E. coli and consistently purified were scored independently for expression and soly. levels. These parameters nonetheless show a very strong pos. correlation. We used logistic regressions to det. whether they are systematically influenced by fractional amino acid compn. or several bulk sequence parameters including hydrophobicity, sidechain entropy, electrostatic charge, and predicted backbone disorder. Decreasing hydrophobicity correlates with higher expression and soly. levels, but this correlation apparently derives solely from the beneficial effect of three charged amino acids, at least for bacterial proteins. In fact, the three most hydrophobic residues showed very different correlations with soly. level. Leu showed the strongest neg. correlation among amino acids, while Ile showed a slightly pos. correlation in most data segments. Several other amino acids also had unexpected effects. Notably, Arg correlated with decreased expression and, most surprisingly, soly. of bacterial proteins, an effect only partially attributable to rare codons. However, rare codons did significantly reduce expression despite use of a codon-enhanced strain. Addnl. analyses suggest that pos. but not neg. charged amino acids may reduce translation efficiency in E. coli irresp. of codon usage. While some obsd. effects may reflect indirect evolutionary correlations, others may reflect basic physicochem. phenomena. We used these results to construct and validate predictors of expression and soly. levels and overall protein usability, and we propose new strategies to be explored for engineering improved protein expression and soly.
  
188.
  Hirose, S.; Kawamura, Y.; Yokota, K.; Kuroita, T.; Natsume, T.; Komiya, K.; Tsutsumi, T.; Suwa, Y.; Isogai, T.; Goshima, N.; Noguchi, T. Statistical Analysis of Features Associated with Protein Expression/Solubility in an in Vivo Escherichia coli Expression System and a Wheat Germ Cell-Free Expression System. J. Biochem. 2011, 150, 73– 81, DOI: 10.1093/jb/mvr042
  
  188
  Statistical analysis of features associated with protein expression/solubility in an in vivo Escherichia coli expression system and a wheat germ cell-free expression system
  
  Hirose, Shuichi; Kawamura, Yoshifumi; Yokota, Kiyonobu; Kuroita, Toshihiro; Natsume, Tohru; Komiya, Kazuo; Tsutsumi, Takeshi; Suwa, Yorimasa; Isogai, Takao; Goshima, Naoki; Noguchi, Tamotsu
  
  Journal of Biochemistry (2011), 150 (1), 73-81CODEN: JOBIAO; ISSN:0021-924X. (Japanese Biochemical Society)
  
  Recombinant protein technol. is an important tool in many industrial and pharmacol. applications. Although the success rate of obtaining sol. proteins is relatively low, knowledge of protein expression/soly. under 'std.' conditions may increase the efficiency and reduce the cost of proteomics studies. In this study, we conducted a genome-scale expt. to assess the overexpression and the soly. of human full-length cDNA in an in vivo Escherichia coli expression system and a wheat germ cell-free expression system. We evaluated the influences of sequence and structural features on protein expression/soly. in each system and estd. a minimal set of features assocd. with them. A comparison of the feature sets related to protein expression/soly. in the in vivo Escherichia coli expression system revealed that the structural information was strongly assocd. with protein expression, rather than protein soly. Moreover, a significant difference was found in the no. of features assocd. with protein soly. in the two expression systems.
  
189.
  Pawlicki, S.; Le Béchec, A.; Delamarche, C. AMYPdb: A Database Dedicated to Amyloid Precursor Proteins. BMC Bioinf. 2008, 9, 273, DOI: 10.1186/1471-2105-9-273
  
  189
  AMYPdb: a database dedicated to amyloid precursor proteins
  
  Pawlicki Sandrine; Le Bechec Antony; Delamarche Christian
  
  BMC bioinformatics (2008), 9 (), 273 ISSN:.
  
  BACKGROUND: Misfolding and aggregation of proteins into ordered fibrillar structures is associated with a number of severe pathologies, including Alzheimer's disease, prion diseases, and type II diabetes. The rapid accumulation of knowledge about the sequences and structures of these proteins allows the use of in silico methods to investigate the molecular mechanisms of their abnormal conformational changes and assembly. However, such an approach requires the collection of accurate data, which are inconveniently dispersed among several generalist databases. RESULTS: We therefore created a free online knowledge database (AMYPdb) dedicated to amyloid precursor proteins and we have performed large scale sequence analysis of the included data. Currently, AMYPdb integrates data on 31 families, including 1,705 proteins from nearly 600 organisms. It displays links to more than 2,300 bibliographic references and 1,200 3D-structures. A Wiki system is available to insert data into the database, providing a sharing and collaboration environment. We generated and analyzed 3,621 amino acid sequence patterns, reporting highly specific patterns for each amyloid family, along with patterns likely to be involved in protein misfolding and aggregation. CONCLUSION: AMYPdb is a comprehensive online database aiming at the centralization of bioinformatic data regarding all amyloid proteins and their precursors. Our sequence pattern discovery and analysis approach unveiled protein regions of significant interest. AMYPdb is freely accessible.
  
190.
  Thompson, M. J.; Sievers, S. A.; Karanicolas, J.; Ivanova, M. I.; Baker, D.; Eisenberg, D. The 3D Profile Method for Identifying Fibril-Forming Segments of Proteins. Proc. Natl. Acad. Sci. U. S. A. 2006, 103, 4074– 4078, DOI: 10.1073/pnas.0511295103
  
  190
  The 3D profile method for identifying fibril-forming segments of proteins
  
  Thompson, Michael J.; Sievers, Stuart A.; Karanicolas, John; Ivanova, Magdalena I.; Baker, David; Eisenberg, David
  
  Proceedings of the National Academy of Sciences of the United States of America (2006), 103 (11), 4074-4078CODEN: PNASA6; ISSN:0027-8424. (National Academy of Sciences)
  
  Based on the crystal structure of the cross-β spine formed by the peptide NNQQNY, we have developed a computational approach for identifying those segments of amyloidogenic proteins that themselves can form amyloid-like fibrils. The approach builds on expts. showing that hexapeptides are sufficient for forming amyloid-like fibrils. Each six-residue peptide of a protein of interest is mapped onto an ensemble of templates, or 3D profile, generated from the crystal structure of the peptide NNQQNY by small displacements of one of the two intermeshed β-sheets relative to the other. The energy of each mapping of a sequence to the profile is evaluated by using ROSETTADESIGN, and the lowest energy match for a given peptide to the template library is taken as the putative prediction. If the energy of the putative prediction is lower than a threshold value, a prediction of fibril formation is made. This method can reach an accuracy of ≈80% with a P value of ≈10⁻¹² when a conservative energy threshold is used to sep. peptides that form fibrils from those that do not. We see enrichment for pos. predictions in a set of fibril-forming segments of amyloid proteins, and we illustrate the method with applications to proteins of interest in amyloid research.
  
191.
  Beerten, J.; Van Durme, J.; Gallardo, R.; Capriotti, E.; Serpell, L.; Rousseau, F.; Schymkowitz, J. WALTZ-DB: A Benchmark Database of Amyloidogenic Hexapeptides. Bioinformatics 2015, 31, 1698– 1700, DOI: 10.1093/bioinformatics/btv027
  
  191
  WALTZ-DB: a benchmark database of amyloidogenic hexapeptides
  
  Beerten, Jacinte; Van Durme, Joost; Gallardo, Rodrigo; Capriotti, Emidio; Serpell, Louise; Rousseau, Frederic; Schymkowitz, Joost
  
  Bioinformatics (2015), 31 (10), 1698-1700CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)
  
  Summary: Accurate prediction of amyloid-forming amino acid sequences remains an important challenge. We here present an online database that provides open access to the largest set of exptl. characterized amyloid forming hexapeptides. To this end, we expanded our previous set of 280 hexapeptides used to develop the Waltz algorithm with 89 peptides from literature review and by systematic exptl. characterization of the aggregation of 720 hexapeptides by transmission electron microscopy, dye binding and Fourier transform IR spectroscopy. This brings the total no. of exptl. characterized hexapeptides in the WALTZ-DB database to 1089, of which 244 are annotated as pos. for amyloid formation.
  
192.
  Wozniak, P. P.; Kotulska, M. AmyLoad: Website Dedicated to Amyloidogenic Protein Fragments. Bioinformatics 2015, 31, 3395– 3397, DOI: 10.1093/bioinformatics/btv375
  
  192
  AmyLoad: website dedicated to amyloidogenic protein fragments
  
  Wozniak, Pawel P.; Kotulska, Malgorzata
  
  Bioinformatics (2015), 31 (20), 3395-3397CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)
  
  Analyses of amyloidogenic sequence fragments are essential in studies of neurodegenerative diseases. However, there is no one internet dataset that collects all the sequences that have been investigated for their amyloidogenicity. Therefore, we have created the AmyLoad website which collects the amyloidogenic sequences from all major sources. The website allows for filtration of the fragments and provides detailed information about each of them. Registered users can both personalize their work with the website and submit their own sequences into the database. To maintain database reliability, submitted sequences are reviewed before making them available to the public. Finally, we re-implemented several amyloidogenic sequence predictors, thus the AmyLoad website can be used as a sequence anal. tool. We encourage researchers working on amyloid proteins to contribute to our service.
  
193.
  Sastry, A.; Monk, J.; Tegel, H.; Uhlen, M.; Palsson, B. O.; Rockberg, J.; Brunk, E. Machine Learning in Computational Biology to Accelerate High-Throughput Protein Expression. Bioinformatics 2017, 33, 2487– 2495, DOI: 10.1093/bioinformatics/btx207
  
  193
  Machine learning in computational biology to accelerate high-throughput protein expression
  
  Sastry Anand; Monk Jonathan; Tegel Hanna; Uhlen Mathias; Palsson Bernhard O; Rockberg Johan; Brunk Elizabeth
  
  Bioinformatics (Oxford, England) (2017), 33 (16), 2487-2495 ISSN:.
  
  Motivation: The Human Protein Atlas (HPA) enables the simultaneous characterization of thousands of proteins across various tissues to pinpoint their spatial location in the human body. This has been achieved through transcriptomics and high-throughput immunohistochemistry-based approaches, where over 40 000 unique human protein fragments have been expressed in E. coli. These datasets enable quantitative tracking of entire cellular proteomes and present new avenues for understanding molecular-level properties influencing expression and solubility. Results: Combining computational biology and machine learning identifies protein properties that hinder the HPA high-throughput antibody production pipeline. We predict protein expression and solubility with accuracies of 70% and 80%, respectively, based on a subset of key properties (aromaticity, hydropathy and isoelectric point). We guide the selection of protein fragments based on these characteristics to optimize high-throughput experimentation. Availability and implementation: We present the machine learning workflow as a series of IPython notebooks hosted on GitHub (https://github.com/SBRG/Protein_ML). The workflow can be used as a template for analysis of further expression and solubility datasets. Contact: [email protected] or [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online.
  
194.
  Thangakani, A. M.; Nagarajan, R.; Kumar, S.; Sakthivel, R.; Velmurugan, D.; Gromiha, M. M. CPAD, Curated Protein Aggregation Database: A Repository of Manually Curated Experimental Data on Protein and Peptide Aggregation. PLoS One 2016, 11, e0152949, DOI: 10.1371/journal.pone.0152949
  
  194
  CPAD, curated protein aggregation database: a repository of manually curated experimental data on protein and peptide aggregation
  
  Thangakani, A. Mary; Nagarajan, R.; Kumar, Sandeep; Sakthivel, R.; Velmurugan, D.; Gromiha, M. Michael
  
  PLoS One (2016), 11 (4), e0152949/1-e0152949/7CODEN: POLNCL; ISSN:1932-6203. (Public Library of Science)
  
  Accurate distinction between peptide sequences that can form amyloid-fibrils or amorphous β-aggregates, identification of potential aggregation prone regions in proteins, and prediction of change in aggregation rate of a protein upon mutation(s) are crit. to research on protein misfolding diseases, such as Alzheimer's and Parkinson's, as well as biotechnol. prodn. of protein based therapeutics. We have developed a Curated Protein Aggregation Database (CPAD), which has collected results from exptl. studies performed by the scientific community aimed at understanding protein/peptide aggregation. CPAD contains more than 2300 exptl. obsd. aggregation rates upon mutations in known amyloidogenic proteins. Each entry includes numerical values for the following parameters: change in rate of aggregation as measured by fluorescence intensity or turbidity, name and source of the protein, Uniprot and Protein Data Bank codes, single point as well as multiple mutations, and literature citation. The data in CPAD has been supplemented with five different types of addnl. information: (i) Amyloid fibril forming hexa-peptides, (ii) Amorphous β-aggregating hexa-peptides, (iii) Amyloid fibril forming peptides of different lengths, (iv) Amyloid fibril forming hexa-peptides whose crystal structures are available in the Protein Data Bank (PDB) and (v) Exptl. validated aggregation prone regions found in amyloidogenic proteins. Furthermore, CPAD is linked to other related databases and resources, such as Uniprot, Protein Data Bank, PUBMED, GAP, TANGO, WALTZ etc. We have set up a web interface with different search and display options so that users have the ability to get the data in multiple ways.
  
195.
  Tian, Y.; Deutsch, C.; Krishnamoorthy, B. Scoring Function To Predict Solubility Mutagenesis. Algorithms Mol. Biol. 2010, 5, 33, DOI: 10.1186/1748-7188-5-33
  
  195
  Scoring function to predict solubility mutagenesis
  
  Tian Ye; Deutsch Christopher; Krishnamoorthy Bala
  
  Algorithms for molecular biology : AMB (2010), 5 (), 33 ISSN:.
  
  BACKGROUND: Mutagenesis is commonly used to engineer proteins with desirable properties not present in the wild type (WT) protein, such as increased or decreased stability, reactivity, or solubility. Experimentalists often have to choose a small subset of mutations from a large number of candidates to obtain the desired change, and computational techniques are invaluable to make the choices. While several such methods have been proposed to predict stability and reactivity mutagenesis, solubility has not received much attention. RESULTS: We use concepts from computational geometry to define a three body scoring function that predicts the change in protein solubility due to mutations. The scoring function captures both sequence and structure information. By exploring the literature, we have assembled a substantial database of 137 single- and multiple-point solubility mutations. Our database is the largest such collection with structural information known so far. We optimize the scoring function using linear programming (LP) methods to derive its weights based on training. Starting with default values of 1, we find weights in the range [0,2] so that predictions of increase or decrease in solubility are optimized. We compare the LP method to the standard machine learning techniques of support vector machines (SVM) and the Lasso. Using statistics for leave-one-out (LOO), 10-fold, and 3-fold cross validations (CV) for training and prediction, we demonstrate that the LP method performs the best overall. For the LOOCV, the LP method has an overall accuracy of 81%. AVAILABILITY: Executables of programs, tables of weights, and datasets of mutants are available from the following web page: http://www.wsu.edu/~kbala/OptSolMut.html.
  
196.
  Wilkinson, D. L.; Harrison, R. G. Predicting the Solubility of Recombinant Proteins in Escherichia coli. Nat. Biotechnol. 1991, 9, 443– 448, DOI: 10.1038/nbt0591-443
  
  196
  Predicting the solubility of recombinant proteins in Escherichia coli
  
  Wilkinson, David L.; Harrison, Roger G.
  
  Bio/Technology (1991), 9 (5), 443-8CODEN: BTCHDA; ISSN:0733-222X.
  
  The cause of inclusion body formation in E. coli grown at 37° was studied using statistical anal. of the compn. of 81 proteins that do and do not form inclusion bodies. Six compn. derived parameters were used. In declining order of their correlation with inclusion body formation, the parameters are charge av., turn forming residue fraction, cysteine fraction, proline fraction, hydrophilicity, and total no. of residues. The correlation with inclusion body formation is strong for the 1st 2 parameters but weak for the last 4. This correlation can be used to predict the probability that a protein will form inclusion bodies using only the protein's amino acid compn. as the basis for the prediction.
  
197.
  Davis, G. D.; Elisee, C.; Newham, D. M.; Harrison, R. G. New Fusion Protein Systems Designed to Give Soluble Expression in Escherichia coli. Biotechnol. Bioeng. 1999, 65, 382– 388, DOI: 10.1002/(SICI)1097-0290(19991120)65:4<382::AID-BIT2>3.0.CO;2-I
  
  New fusion protein systems designed to give soluble expression in Escherichia coli
  
  Davis, Gregory D.; Elisee, Claude; Newham, Denton M.; Harrison, Roger G.
  
  Biotechnology and Bioengineering (1999), 65 (4), 382-388. CODEN: BIBIAU; ISSN: 0006-3592. (John Wiley & Sons, Inc.)
  
  Three native E. coli proteins-NusA, GrpE, and bacterioferritin (BFR)-were studied in fusion proteins expressed in E. coli for their ability to confer soly. on a target insol. protein at the C-terminus of the fusion protein. These three proteins were chosen based on their favorable cytoplasmic soly. characteristics as predicted by a statistical soly. model for recombinant proteins in E. coli. Modeling predicted the probability of sol. fusion protein expression for the target insol. protein human interleukin-3 (hIL-3) in the following order: NusA (most sol.), GrpE, BFR, and thioredoxin (least sol.). Expression expts. at 37° showed that the NusA/hIL-3 fusion protein was expressed almost completely in the sol. fraction, while GrpE/hIL-3 and BFR/hIL-3 exhibited partial soly. at 37°. Thioredoxin/hIL-3 was expressed almost completely in the insol. fraction. Fusion proteins consisting of NusA and either bovine growth hormone or human interferon-γ were also expressed in E. coli at 37° and again showed that the fusion protein was almost completely sol. Starting with the NusA/hIL-3 fusion protein with an N-terminal histidine tag, purified hIL-3 with full biol. activity was obtained using immobilized metal affinity chromatog., factor Xa protease cleavage, and anion exchange chromatog.
  
198.
  Magnan, C. N.; Randall, A.; Baldi, P. SOLpro: Accurate Sequence-Based Prediction of Protein Solubility. Bioinformatics 2009, 25, 2200– 2207, DOI: 10.1093/bioinformatics/btp386
  
  SOLpro: accurate sequence-based prediction of protein solubility
  
  Magnan, Christophe N.; Randall, Arlo; Baldi, Pierre
  
  Bioinformatics (2009), 25 (17), 2200-2207. CODEN: BOINFP; ISSN: 1367-4803. (Oxford University Press)
  
  Protein insoly. is a major obstacle for many exptl. studies. A sequence-based prediction method able to accurately predict the propensity of a protein to be sol. on overexpression could be used, for instance, to prioritize targets in large-scale proteomics projects and to identify mutations likely to increase the soly. of insol. proteins. Here, the authors first curate a large, non-redundant and balanced training set of more than 17 000 proteins. Next, the authors ext. and study 23 groups of features computed directly or predicted (e.g. secondary structure) from the primary sequence. The data and the features are used to train a two-stage support vector machine (SVM) architecture. The resulting predictor, SOLpro, is compared directly with existing methods and shows significant improvement according to std. evaluation metrics, with an overall accuracy of over 74% estd. using multiple runs of 10-fold cross-validation. SOLpro is integrated in the SCRATCH suite of predictors and is available for download as a standalone application and as a web server at: http://scratch.proteomics.ics.uci.edu.
  
199.
  Smialowski, P.; Doose, G.; Torkler, P.; Kaufmann, S.; Frishman, D. PROSO II—A New Method for Protein Solubility Prediction. FEBS J. 2012, 279, 2192– 2200, DOI: 10.1111/j.1742-4658.2012.08603.x
  
  PROSO II - a new method for protein solubility prediction
  
  Smialowski, Pawel; Doose, Gero; Torkler, Phillipp; Kaufmann, Stefanie; Frishman, Dmitrij
  
  FEBS Journal (2012), 279 (12), 2192-2200. CODEN: FJEOAC; ISSN: 1742-464X. (Wiley-Blackwell)
  
  Many fields of science and industry depend on efficient prodn. of active protein using heterologous expression in Escherichia coli. The soly. of proteins upon expression is dependent on their amino acid sequence. Prediction of soly. from sequence is therefore highly valuable. We present a novel machine-learning-based model called PROSO II which makes use of new classification methods and growth in exptl. data to improve coverage and accuracy of soly. predictions. The classification algorithm is organized as a two-layered structure in which the output of a primary Parzen window model for sequence similarity and a logistic regression classifier of amino acid k-mer compn. serve as input for a second-level logistic regression classifier. Compared with previously published research our model is trained on five times more data than used by any other method before (82,000 proteins). When tested on a sep. holdout set not used at any point of method development our server attained the best results in comparison with other currently available methods: accuracy 75.4%, Matthew's correlation coeff. 0.39, sensitivity 0.731, specificity 0.759, gain (sol.) 2.263. In summary, due to utilization of cutting edge machine learning technologies combined with the largest currently available exptl. data set the PROSO II server constitutes a substantial improvement in protein soly. predictions.
  
200.
  Agostini, F.; Cirillo, D.; Livi, C. M.; Delli Ponti, R.; Tartaglia, G. G. CcSOL Omics: A Webserver for Solubility Prediction of Endogenous and Heterologous Expression in Escherichia coli. Bioinformatics 2014, 30, 2975– 2977, DOI: 10.1093/bioinformatics/btu420
  
  ccSOL omics: a webserver for solubility prediction of endogenous and heterologous expression in Escherichia coli
  
  Agostini, Federico; Cirillo, Davide; Livi, Carmen Maria; Delli Ponti, Riccardo; Tartaglia, Gian Gaetano
  
  Bioinformatics (2014), 30 (20), 2975-2977. CODEN: BOINFP; ISSN: 1367-4803. (Oxford University Press)
  
  Summary: Here we introduce ccSOL omics, a webserver for large-scale calcns. of protein soly. Our method allows (i) proteome-wide predictions; (ii) identification of sol. fragments within each sequence; (iii) exhaustive single-point mutation anal. Results: Using coil/disorder, hydrophobicity, hydrophilicity, β-sheet and α-helix propensities, we built a predictor of protein soly. Our approach shows an accuracy of 79% on the training set (36 990 Target Track entries). Validation on three independent sets indicates that ccSOL omics discriminates sol. and insol. proteins with an accuracy of 74% on 31 760 proteins sharing <30% sequence similarity.
  
201.
  Khurana, S.; Rawi, R.; Kunji, K.; Chuang, G.-Y.; Bensmail, H.; Mall, R. DeepSol: A Deep Learning Framework for Sequence-Based Protein Solubility Prediction. Bioinformatics 2018, 34, 2605– 2613, DOI: 10.1093/bioinformatics/bty166
  
  DeepSol: a deep learning framework for sequence-based protein solubility prediction
  
  Khurana, Sameer; Rawi, Reda; Kunji, Khalid; Chuang, Gwo-Yu; Bensmail, Halima; Mall, Raghvendra
  
  Bioinformatics (2018), 34 (15), 2605-2613. CODEN: BOINFP; ISSN: 1367-4811. (Oxford University Press)
  
  Motivation: Protein soly. plays a vital role in pharmaceutical research and prodn. yield. For a given protein, the extent of its soly. can represent the quality of its function, and is ultimately defined by its sequence. Thus, it is imperative to develop novel, highly accurate in silico sequence-based protein soly. predictors. In this work we propose, DeepSol, a novel Deep Learning-based protein soly. predictor. The backbone of our framework is a convolutional neural network that exploits k-mer structure and addnl. sequence and structural features extd. from the protein sequence. Results: DeepSol outperformed all known sequence-based state-of-the-art soly. prediction methods and attained an accuracy of 0.77 and Matthew's correlation coeff. of 0.55. The superior prediction accuracy of DeepSol allows to screen for sequences with enhanced prodn. capacity and can more reliably predict soly. of novel proteins.
  
202.
  Chang, C. C. H.; Li, C.; Webb, G. I.; Tey, B.; Song, J.; Ramanan, R. N. Periscope: Quantitative Prediction of Soluble Protein Expression in the Periplasm of Escherichia coli. Sci. Rep. 2016, 6, 21844, DOI: 10.1038/srep21844
  
  Periscope: quantitative prediction of soluble protein expression in the periplasm of Escherichia coli
  
  Chang, Catherine Ching Han; Li, Chen; Webb, Geoffrey I.; Tey, Beng Ti; Song, Jiangning; Ramanan, Ramakrishnan Nagasundara
  
  Scientific Reports (2016), 6, 21844. CODEN: SRCEC3; ISSN: 2045-2322. (Nature Publishing Group)
  
  Periplasmic expression of sol. proteins in Escherichia coli not only offers a much-simplified downstream purifn. process, but also enhances the probability of obtaining correctly folded and biol. active proteins. Different combinations of signal peptides and target proteins lead to different sol. protein expression levels, ranging from negligible to several grams per L. Accurate algorithms for rational selection of promising candidates can serve as a powerful tool to complement with current trial-and-error approaches. Accordingly, proteomics studies can be conducted with greater efficiency and cost-effectiveness. Here, we developed a predictor with a two-stage architecture, to predict the real-valued expression level of target protein in the periplasm. The output of the first-stage support vector machine (SVM) classifier dets. which second-stage support vector regression (SVR) classifier to be used. When tested on an independent test dataset, the predictor achieved an overall prediction accuracy of 78% and a Pearson's correlation coeff. (PCC) of 0.77. We further illustrate the relative importance of various features with respect to different models. The results indicate that the occurrence of dipeptide glutamine and aspartic acid is the most important feature for the classification model. Finally, we provide access to the implemented predictor through the Periscope webserver, freely accessible at http://lightning.med.monash.edu/periscope/.
  
203.
  Hirose, S.; Noguchi, T. ESPRESSO: A System for Estimating Protein Expression and Solubility in Protein Expression Systems. Proteomics 2013, 13, 1444– 1456, DOI: 10.1002/pmic.201200175
  
  ESPRESSO: A system for estimating protein expression and solubility in protein expression systems
  
  Hirose, Shuichi; Noguchi, Tamotsu
  
  Proteomics (2013), 13 (9), 1444-1456. CODEN: PROTC7; ISSN: 1615-9853. (Wiley-VCH Verlag GmbH & Co. KGaA)
  
  Recombinant protein technol. is essential for conducting protein science and using proteins as materials in pharmaceutical or industrial applications. Although obtaining sol. proteins is still a major exptl. obstacle, knowledge about protein expression/soly. under std. conditions may increase the efficiency and reduce the cost of proteomics studies. In this study, we present a computational approach to est. the probability of protein expression and soly. for two different protein expression systems: in vivo Escherichia coli and wheat germ cell-free, from only the sequence information. It implements two kinds of methods: a sequence/predicted structural property-based method that uses both the sequence and predicted structural features, and a sequence pattern-based method that utilizes the occurrence frequencies of sequence patterns. In the benchmark test, the proposed methods obtained F-scores of around 70%, and outperformed publicly available servers. Applying the proposed methods to genomic data revealed that proteins assocd. with translation or transcription have a strong tendency to be expressed as sol. proteins by the in vivo E. coli expression system. The sequence pattern-based method also has the potential to indicate a candidate region for modification, to increase protein soly. All methods are available for free at the ESPRESSO server (http://mbs.cbrc.jp/ESPRESSO).
  
204.
  Hon, J.; Marusiak, M.; Martinek, T.; Zendulka, J.; Bednar, D.; Damborsky, J. SoluProt: Prediction of Protein Solubility. Nucleic Acids Res. 2018, in preparation
  
205.
  DuBay, K. F.; Pawar, A. P.; Chiti, F.; Zurdo, J.; Dobson, C. M.; Vendruscolo, M. Prediction of the Absolute Aggregation Rates of Amyloidogenic Polypeptide Chains. J. Mol. Biol. 2004, 341, 1317– 1326, DOI: 10.1016/j.jmb.2004.06.043
  
  Prediction of the Absolute Aggregation Rates of Amyloidogenic Polypeptide Chains
  
  DuBay, Kateri F.; Pawar, Amol P.; Chiti, Fabrizio; Zurdo, Jesus; Dobson, Christopher M.; Vendruscolo, Michele
  
  Journal of Molecular Biology (2004), 341 (5), 1317-1326. CODEN: JMOBAK; ISSN: 0022-2836. (Elsevier)
  
  Protein aggregation is assocd. with a variety of pathol. conditions, including Alzheimer's and Creutzfeldt-Jakob diseases and type II diabetes. Such degenerative disorders result from the conversion of the normal sol. state of specific proteins into aggregated states that can ultimately form the characteristic amyloid fibrils found in diseased tissue. Under appropriate conditions it appears that many, perhaps all, proteins can be converted in vitro into amyloid fibrils. The aggregation propensities of different polypeptide chains have, however, been obsd. to vary substantially. Here, we describe an approach that uses the knowledge of the amino acid sequence and of the exptl. conditions to reproduce, with a correlation coeff. of 0.92 and over five orders of magnitude, the in vitro aggregation rates of a wide range of unstructured peptides and proteins. These results indicate that the formation of protein aggregates can be rationalized to a considerable extent in terms of simple physico-chem. parameters that describe the properties of polypeptide chains and their environment.
  
206.
  Tartaglia, G. G.; Pawar, A. P.; Campioni, S.; Dobson, C. M.; Chiti, F.; Vendruscolo, M. Prediction of Aggregation-Prone Regions in Structured Proteins. J. Mol. Biol. 2008, 380, 425– 436, DOI: 10.1016/j.jmb.2008.05.013
  
  Prediction of Aggregation-Prone Regions in Structured Proteins
  
  Tartaglia, Gian Gaetano; Pawar, Amol P.; Campioni, Silvia; Dobson, Christopher M.; Chiti, Fabrizio; Vendruscolo, Michele
  
  Journal of Molecular Biology (2008), 380 (2), 425-436. CODEN: JMOBAK; ISSN: 0022-2836. (Elsevier Ltd.)
  
  We present a method for predicting the regions of the sequences of peptides and proteins that are most important in promoting their aggregation and amyloid formation. The method extends previous approaches by allowing such predictions to be carried out for conditions under which the mols. concerned can be folded or contain a significant degree of persistent structure. In order to achieve this result, the method uses only knowledge of the sequence of amino acids to est. simultaneously both the propensity for folding and aggregation and the way in which these two types of propensity compete. We illustrate the approach by its application to a set of peptides and proteins both assocd. and not assocd. with disease. Our results show not only that the regions of a protein with a high intrinsic aggregation propensity can be identified in a robust manner but also that the structural context of such regions in the monomeric form is crucial for detg. their actual role in the aggregation process.
  
207.
  Conchillo-Solé, O.; de Groot, N. S.; Avilés, F. X.; Vendrell, J.; Daura, X.; Ventura, S. AGGRESCAN: A Server for the Prediction and Evaluation of “Hot Spots” of Aggregation in Polypeptides. BMC Bioinf. 2007, 8, 65, DOI: 10.1186/1471-2105-8-65
  
  AGGRESCAN: a server for the prediction and evaluation of "hot spots" of aggregation in polypeptides
  
  Conchillo-Solé, Oscar; de Groot, Natalia S.; Avilés, Francesc X.; Vendrell, Josep; Daura, Xavier; Ventura, Salvador
  
  BMC Bioinformatics (2007), 8, 65.
  
  BACKGROUND: Protein aggregation correlates with the development of several debilitating human disorders of growing incidence, such as Alzheimer's and Parkinson's diseases. On the biotechnological side, protein production is often hampered by the accumulation of recombinant proteins into aggregates. Thus, the development of methods to anticipate the aggregation properties of polypeptides is receiving increasing attention. AGGRESCAN is a web-based software for the prediction of aggregation-prone segments in protein sequences, the analysis of the effect of mutations on protein aggregation propensities and the comparison of the aggregation properties of different proteins or protein sets. RESULTS: AGGRESCAN is based on an aggregation-propensity scale for natural amino acids derived from in vivo experiments and on the assumption that short and specific sequence stretches modulate protein aggregation. The algorithm is shown to identify a series of protein fragments involved in the aggregation of disease-related proteins and to predict the effect of genetic mutations on their deposition propensities. It also provides new insights into the differential aggregation properties displayed by globular proteins, natively unfolded polypeptides, amyloidogenic proteins and proteins found in bacterial inclusion bodies. CONCLUSION: By identifying aggregation-prone segments in proteins, AGGRESCAN http://bioinf.uab.es/aggrescan/ shall facilitate (i) the identification of possible therapeutic targets for anti-depositional strategies in conformational diseases and (ii) the anticipation of aggregation phenomena during storage or recombinant production of bioactive polypeptides or polypeptide sets.
  
208.
  Fernandez-Escamilla, A.-M.; Rousseau, F.; Schymkowitz, J.; Serrano, L. Prediction of Sequence-Dependent and Mutational Effects on the Aggregation of Peptides and Proteins. Nat. Biotechnol. 2004, 22, 1302– 1306, DOI: 10.1038/nbt1012
  
  Prediction of sequence-dependent and mutational effects on the aggregation of peptides and proteins
  
  Fernandez-Escamilla, Ana-Maria; Rousseau, Frederic; Schymkowitz, Joost; Serrano, Luis
  
  Nature Biotechnology (2004), 22 (10), 1302-1306. CODEN: NABIF9; ISSN: 1087-0156. (Nature Publishing Group)
  
  A statistical mechanics algorithm, TANGO, is developed to predict protein aggregation. TANGO is based on the physico-chem. principles of β-sheet formation, extended by the assumption that the core regions of an aggregate are fully buried. The algorithm accurately predicts the aggregation of a data set of 179 peptides compiled from the literature as well as of a new set of 71 peptides derived from human disease-related proteins, including prion protein, lysozyme and β2-microglobulin. TANGO also correctly predicts pathogenic as well as protective mutations of the Alzheimer β-peptide, human lysozyme and transthyretin, and discriminates between β-sheet propensity and aggregation. The results confirm the model of intermol. β-sheet formation as a widespread underlying mechanism of protein aggregation. Furthermore, the algorithm opens the door to a fully automated, sequence-based design strategy to improve the aggregation properties of proteins of scientific or industrial interest.
  
209.
  Maurer-Stroh, S.; Debulpaep, M.; Kuemmerer, N.; Lopez de la Paz, M.; Martins, I. C.; Reumers, J.; Morris, K. L.; Copland, A.; Serpell, L.; Serrano, L.; Schymkowitz, J. W. H.; Rousseau, F. Exploring the Sequence Determinants of Amyloid Structure Using Position-Specific Scoring Matrices. Nat. Methods 2010, 7, 237– 242, DOI: 10.1038/nmeth.1432
  
  Exploring the sequence determinants of amyloid structure using position-specific scoring matrices
  
  Maurer-Stroh, Sebastian; Debulpaep, Maja; Kuemmerer, Nico; de la Paz, Manuela Lopez; Martins, Ivo Cristiano; Reumers, Joke; Morris, Kyle L.; Copland, Alastair; Serpell, Louise; Serrano, Luis; Schymkowitz, Joost W. H.; Rousseau, Frederic
  
  Nature Methods (2010), 7 (3), 237-242. CODEN: NMAEA3; ISSN: 1548-7091. (Nature Publishing Group)
  
  Protein aggregation results in β-sheet-like assemblies that adopt either a variety of amorphous morphologies or ordered amyloid-like structures. These differences in structure also reflect biol. differences; amyloid and amorphous β-sheet aggregates have different chaperone affinities, accumulate in different cellular locations and are degraded by different mechanisms. Further, amyloid function depends entirely on a high intrinsic degree of order. Here we exptl. explored the sequence space of amyloid hexapeptides and used the derived data to build Waltz, a web-based tool that uses a position-specific scoring matrix to det. amyloid-forming sequences. Waltz allows users to identify and better distinguish between amyloid sequences and amorphous β-sheet aggregates and allowed us to identify amyloid-forming regions in functional amyloids.
  
210.
  Walsh, I.; Seno, F.; Tosatto, S. C. E.; Trovato, A. PASTA 2.0: An Improved Server for Protein Aggregation Prediction. Nucleic Acids Res. 2014, 42, W301– 307, DOI: 10.1093/nar/gku399
  
  PASTA 2.0: an improved server for protein aggregation prediction
  
  Walsh, Ian; Seno, Flavio; Tosatto, Silvio C. E.; Trovato, Antonio
  
  Nucleic Acids Research (2014), 42 (W1), W301-W307. CODEN: NARHAD; ISSN: 0305-1048. (Oxford University Press)
  
  The formation of amyloid aggregates upon protein misfolding is related to several devastating degenerative diseases. The propensities of different protein sequences to aggregate into amyloids, how they are enhanced by pathogenic mutations, the presence of aggregation hot spots stabilizing pathol. interactions, the establishing of cross-amyloid interactions between co-aggregating proteins, all rely at the mol. level on the stability of the amyloid cross-beta structure. The authors' redesigned server, PASTA 2.0, provides a versatile platform where all of these different features can be easily predicted on a genomic scale given input sequences. The server provides other pieces of information, such as intrinsic disorder and secondary structure predictions, that complement the aggregation data. The PASTA 2.0 energy function evaluates the stability of putative cross-beta pairings between different sequence stretches. It was re-derived on a larger dataset of globular protein domains. The resulting algorithm was benchmarked on comprehensive peptide and protein test sets, leading to improved, state-of-the-art results with more amyloid forming regions correctly detected at high specificity. The PASTA 2.0 server can be accessed at http://protein.bio.unipd.it/pasta2/.
  
211.
  Bryan, A. W.; Menke, M.; Cowen, L. J.; Lindquist, S. L.; Berger, B. BETASCAN: Probable Beta-Amyloids Identified by Pairwise Probabilistic Analysis. PLoS Comput. Biol. 2009, 5, e1000333, DOI: 10.1371/journal.pcbi.1000333
  
  BETASCAN: probable beta-amyloids identified by pairwise probabilistic analysis
  
  Bryan, Allen W., Jr.; Menke, Matthew; Cowen, Lenore J.; Lindquist, Susan L.; Berger, Bonnie
  
  PLoS Computational Biology (2009), 5 (3), e1000333.
  
  Amyloids and prion proteins are clinically and biologically important beta-structures, whose supersecondary structures are difficult to determine by standard experimental or computational means. In addition, significant conformational heterogeneity is known or suspected to exist in many amyloid fibrils. Recent work has indicated the utility of pairwise probabilistic statistics in beta-structure prediction. We develop here a new strategy for beta-structure prediction, emphasizing the determination of beta-strands and pairs of beta-strands as fundamental units of beta-structure. Our program, BETASCAN, calculates likelihood scores for potential beta-strands and strand-pairs based on correlations observed in parallel beta-sheets. The program then determines the strands and pairs with the greatest local likelihood for all of the sequence's potential beta-structures. BETASCAN suggests multiple alternate folding patterns and assigns relative a priori probabilities based solely on amino acid sequence, probability tables, and pre-chosen parameters. The algorithm compares favorably with the results of previous algorithms (BETAPRO, PASTA, SALSA, TANGO, and Zyggregator) in beta-structure prediction and amyloid propensity prediction. Accurate prediction is demonstrated for experimentally determined amyloid beta-structures, for a set of known beta-aggregates, and for the parallel beta-strands of beta-helices, amyloid-like globular proteins. BETASCAN is able both to detect beta-strands with higher sensitivity and to detect the edges of beta-strands in a richly beta-like sequence. For two proteins (Abeta and Het-s), there exist multiple sets of experimental data implying contradictory structures; BETASCAN is able to detect each competing structure as a potential structure variant. The ability to correlate multiple alternate beta-structures to experiment opens the possibility of computational investigation of prion strains and structural heterogeneity of amyloid. 
BETASCAN is publicly accessible on the Web at http://betascan.csail.mit.edu.
  
212.
  Garbuzynskiy, S. O.; Lobanov, M. Y.; Galzitskaya, O. V. FoldAmyloid: A Method of Prediction of Amyloidogenic Regions from Protein Sequence. Bioinformatics 2010, 26, 326– 332, DOI: 10.1093/bioinformatics/btp691
  
  FoldAmyloid: a method of prediction of amyloidogenic regions from protein sequence
  
  Garbuzynskiy, Sergiy O.; Lobanov, Michail Yu.; Galzitskaya, Oxana V.
  
  Bioinformatics (2010), 26 (3), 326-332. CODEN: BOINFP; ISSN: 1367-4803. (Oxford University Press)
  
  Motivation: Amyloidogenic regions in polypeptide chains are very important because such regions are responsible for amyloid formation and aggregation. It is useful to be able to predict positions of amyloidogenic regions in protein chains. Results: Two characteristics (expected probability of hydrogen bonds formation and expected packing d. of residues) have been introduced by us to detect amyloidogenic regions in a protein sequence. We demonstrate that regions with high expected probability of the formation of backbone-backbone hydrogen bonds as well as regions with high expected packing d. are mostly responsible for the formation of amyloid fibrils. Our method (FoldAmyloid) has been tested on a dataset of 407 peptides (144 amyloidogenic and 263 non-amyloidogenic peptides) and has shown good performance in predicting a peptide status: amyloidogenic or non-amyloidogenic. The prediction based on the expected packing d. classified correctly 75% of amyloidogenic peptides and 74% of non-amyloidogenic ones. Two variants (averaging by donors and by acceptors) of prediction based on the probability of formation of backbone-backbone hydrogen bonds gave a comparable efficiency. With a hybrid-scale constructed by merging the above three scales, our method is correct for 80% of amyloidogenic peptides and for 72% of non-amyloidogenic ones. Prediction of amyloidogenic regions in proteins where positions of amyloidogenic regions are known from exptl. data has also been done. In the proteins, our method correctly finds 10 out of 11 amyloidogenic regions.
  
213.
  Goldschmidt, L.; Teng, P. K.; Riek, R.; Eisenberg, D. Identifying the Amylome, Proteins Capable of Forming Amyloid-like Fibrils. Proc. Natl. Acad. Sci. U. S. A. 2010, 107, 3487– 3492, DOI: 10.1073/pnas.0915166107
  
  Identifying the amylome, proteins capable of forming amyloid-like fibrils
  
  Goldschmidt, Lukasz; Teng, Poh K.; Riek, Roland; Eisenberg, David
  
  Proceedings of the National Academy of Sciences of the United States of America (2010), 107 (8), 3487-3492, S3487/1-S3487/13. CODEN: PNASA6; ISSN: 0027-8424. (National Academy of Sciences)
  
  The amylome is the universe of proteins that are capable of forming amyloid-like fibrils. Here we investigate the factors that enable a protein to belong to the amylome. A major factor is the presence in the protein of a segment that can form a tightly complementary interface with an identical segment, which permits the formation of a steric zipper - two self-complementary beta sheets that form the spine of an amyloid fibril. Another factor is sufficient conformational freedom of the self-complementary segment to interact with other mols. Using RNase A as a model system, we validate our fibrillogenic predictions by the 3D profile method based on the crystal structure of NNQQNY and demonstrate that a specific residue order is required for fiber formation. Our genome-wide anal. revealed that self-complementary segments are found in almost all proteins, yet not all proteins form amyloids. The implication is that chaperoning effects have evolved to constrain self-complementary segments from interaction with each other.
  
214.
  Ahmed, A. B.; Znassi, N.; Château, M.-T.; Kajava, A. V. A Structure-Based Approach to Predict Predisposition to Amyloidosis. Alzheimer’s Dementia 2015, 11, 681– 690, DOI: 10.1016/j.jalz.2014.06.007
  
  A structure-based approach to predict predisposition to amyloidosis
  
  Ahmed, Abdullah B.; Znassi, Nadia; Château, Marie-Therese; Kajava, Andrey V.
  
  Alzheimer's & Dementia: The Journal of the Alzheimer's Association (2015), 11 (6), 681-690.
  
  BACKGROUND: Neurodegenerative diseases and other amyloidoses are linked to the formation of amyloid fibrils. It has been shown that the ability to form these fibrils is coded by the amino acid sequence. Existing methods for the prediction of amyloidogenicity generate an unsatisfactorily high number of false positives when tested against sequences of the disease-related proteins. METHODS: Recently, it has been shown that the three-dimensional structure of a majority of disease-related amyloid fibrils contains a β-strand-loop-β-strand motif called β-arch. Using this information, we have developed a novel bioinformatics approach for the prediction of amyloidogenicity. RESULTS: The benchmark results show the superior performance of our method over the existing programs. CONCLUSIONS: As genome sequencing becomes more affordable, our method provides an opportunity to create individual risk profiles for neurodegenerative, age-related, and other diseases, ushering in an era of personalized medicine. It will also be used in the large-scale analysis of proteomes to find new amyloidogenic proteins.
  
215.
  Krogh, A.; Vedelsby, J. Neural Network Ensembles, Cross Validation and Active Learning. In Proceedings of the 7th International Conference on Neural Information Processing Systems (NIPS’94); MIT Press: Cambridge, MA, 1994; pp 231–238.
  
216.
  Maclin, R.; Opitz, D. Popular Ensemble Methods: An Empirical Study. J. Artif. Intell. Res. 1999, 11, 169–198, DOI: 10.1613/jair.614
  
217.
  Tsolis, A. C.; Papandreou, N. C.; Iconomidou, V. A.; Hamodrakas, S. J. A Consensus Method for the Prediction of “Aggregation-Prone” Peptides in Globular Proteins. PLoS One 2013, 8, e54175, DOI: 10.1371/journal.pone.0054175
  
  A consensus method for the prediction of 'aggregation-prone' peptides in globular proteins
  
  Tsolis, Antonios C.; Papandreou, Nikos C.; Iconomidou, Vassiliki A.; Hamodrakas, Stavros J.
  
  PLoS One (2013), 8 (1), e54175 CODEN: POLNCL; ISSN:1932-6203. (Public Library of Science)
  
  The purpose of this work was to construct a consensus prediction algorithm of 'aggregation-prone' peptides in globular proteins, combining existing tools. This allows comparison of the different algorithms and the production of more objective and accurate results. Eleven (11) individual methods are combined and produce AMYLPRED2, a publicly and freely available web tool for academic users, for the consensus prediction of amyloidogenic determinants/'aggregation-prone' peptides in proteins, from sequence alone. The performance of AMYLPRED2 indicates that it functions better than individual aggregation-prediction algorithms, as perhaps expected. AMYLPRED2 is a useful tool for identifying amyloid-forming regions in proteins that are associated with several conformational diseases, called amyloidoses, such as Alzheimer's, Parkinson's, prion diseases and type II diabetes. It may also be useful for understanding the properties of protein folding and misfolding and for helping to control protein aggregation/solubility in biotechnology (recombinant proteins forming bacterial inclusion bodies) and biotherapeutics (monoclonal antibodies and biopharmaceutical proteins).
  
218.
  Emily, M.; Talvas, A.; Delamarche, C. MetAmyl: A METa-Predictor for AMYLoid Proteins. PLoS One 2013, 8, e79722, DOI: 10.1371/journal.pone.0079722
  
219.
  Zambrano, R.; Jamroz, M.; Szczasiuk, A.; Pujols, J.; Kmiecik, S.; Ventura, S. AGGRESCAN3D (A3D): Server for Prediction of Aggregation Properties of Protein Structures. Nucleic Acids Res. 2015, 43, W306–W313, DOI: 10.1093/nar/gkv359
  
  AGGRESCAN3D (A3D): server for prediction of aggregation properties of protein structures
  
  Zambrano, Rafael; Jamroz, Michal; Szczasiuk, Agata; Pujols, Jordi; Kmiecik, Sebastian; Ventura, Salvador
  
  Nucleic Acids Research (2015), 43 (W1), W306-W313 CODEN: NARHAD; ISSN:0305-1048. (Oxford University Press)
  
  Protein aggregation underlies an increasing number of disorders and constitutes a major bottleneck in the development of therapeutic proteins. Our present understanding of the molecular determinants of protein aggregation has crystallized in a series of predictive algorithms to identify aggregation-prone sites. A majority of these methods rely only on sequence. Therefore, they have difficulty predicting the aggregation properties of folded globular proteins, where aggregation-prone sites are often not contiguous in sequence or buried inside the native structure. The AGGRESCAN3D (A3D) server overcomes these limitations by taking into account the protein structure and the experimental aggregation propensity scale from the well-established AGGRESCAN method. Using the A3D server, the identified aggregation-prone residues can be virtually mutated to design variants with increased solubility, or to test the impact of pathogenic mutations. Additionally, the A3D server can take into account the dynamic fluctuations of protein structure in solution, which may influence aggregation propensity. This is possible in A3D Dynamic Mode, which exploits the CABS-flex approach for fast simulations of the flexibility of globular proteins.
  
220.
  De Baets, G.; Van Durme, J.; van der Kant, R.; Schymkowitz, J.; Rousseau, F. Solubis: Optimize Your Protein. Bioinformatics 2015, 31, 2580–2582, DOI: 10.1093/bioinformatics/btv162
  
  Solubis: optimize your protein
  
  De Baets, Greet; Van Durme, Joost; van der Kant, Rob; Schymkowitz, Joost; Rousseau, Frederic
  
  Bioinformatics (2015), 31 (15), 2580-2582 CODEN: BOINFP; ISSN:1367-4803. (Oxford University Press)
  
  Motivation: Protein aggregation is associated with a number of protein misfolding diseases and is a major concern for therapeutic proteins. Aggregation is caused by the presence of aggregation-prone regions (APRs) in the amino acid sequence of the protein. The lower the aggregation propensity of APRs and the better they are protected by native interactions within the folded structure of the protein, the more aggregation is prevented. Therefore, both the local thermodynamic stability of APRs in the native structure and their intrinsic aggregation propensity are key parameters that need to be optimized to prevent protein aggregation. Results: The Solubis method presented here automates the process of carefully selecting point mutations that minimize the intrinsic aggregation propensity while improving local protein stability.
  
221.
  Van Durme, J.; De Baets, G.; Van Der Kant, R.; Ramakers, M.; Ganesan, A.; Wilkinson, H.; Gallardo, R.; Rousseau, F.; Schymkowitz, J. Solubis: A Webserver To Reduce Protein Aggregation through Mutation. Protein Eng., Des. Sel. 2016, 29, 285–289, DOI: 10.1093/protein/gzw019
  
  Solubis: a webserver to reduce protein aggregation through mutation
  
  Van Durme, Joost; De Baets, Greet; Van Der Kant, Rob; Ramakers, Meine; Ganesan, Ashok; Wilkinson, Hannah; Gallardo, Rodrigo; Rousseau, Frederic; Schymkowitz, Joost
  
  Protein Engineering, Design & Selection (2016), 29 (8), 285-289 CODEN: PEDSBR; ISSN:1741-0126. (Oxford University Press)
  
  Protein aggregation is a major factor limiting the biotechnological and therapeutic application of many proteins, including enzymes and monoclonal antibodies. The molecular principles underlying aggregation are by now sufficiently understood to allow rational redesign of natural polypeptide sequences for decreased aggregation tendency, and hence potentially increased expression and solubility. Given that aggregation-prone regions (APRs) tend to contribute to the stability of the hydrophobic core or to functional sites of the protein, mutations in these regions have to be carefully selected in order not to disrupt protein structure or function. Therefore, we here provide access to an automated pipeline to identify mutations that reduce protein aggregation by reducing the intrinsic aggregation propensity of the sequence (using the TANGO algorithm), while taking care not to disrupt the thermodynamic stability of the native structure (using the empirical force field FoldX). Moreover, by providing a plot of the intrinsic aggregation propensity score of APRs corrected by the local stability of that region in the folded structure, we allow users to prioritize those regions in the protein that are most in need of improvement through protein engineering.
  
Supporting Information

The Supporting Information is available free of charge on the ACS Publications website at DOI: 10.1021/acscatal.8b03613.
- Data sets for prediction of protein stability (Table S1); software tools for prediction of protein stability (Table S2); data sets for prediction of protein solubility (Table S3); software tools for prediction of protein solubility (Table S4); comparison of the existing tools with the S350 data set (Table S5) (PDF)
- cs8b03613_si_001.pdf (712.72 kb)
Terms & Conditions

Most electronic Supporting Information files are available without a subscription to ACS Web Editions. Such files may be downloaded by article for research use (if there is a public use license linked to the relevant article, that license may permit other uses). Permission may be obtained from ACS for other uses through requests via the RightsLink permission system: http://pubs.acs.org/page/copyright/permissions.html.
