Volume 60, Issue 12 p. 769-774
Critical Review

Unraveling the mysteries of protein folding and misfolding

Heath Ecroyd

Corresponding Author

School of Chemistry and Physics, University of Adelaide, Adelaide, SA, Australia

Tel: +61-8-8303-5505

School of Chemistry & Physics, University of Adelaide, Adelaide, SA 5005, Australia
John A. Carver

School of Chemistry and Physics, University of Adelaide, Adelaide, SA, Australia

First published: 02 September 2008

Abstract

This mini-review focuses on the processes and consequences of protein folding and misfolding. The latter process often leads to protein aggregation and precipitation with the aggregates adopting either highly ordered (amyloid fibril) or disordered (amorphous) forms. In particular, the amyloid fibril is discussed because this form has gained considerable notoriety due to its close links to a variety of debilitating diseases including Alzheimer's, Parkinson's, Huntington's, and Creutzfeldt-Jakob diseases, and type-II diabetes. In each of these diseases a different protein forms fibrils, yet the fibrils formed have a very similar structure. The mechanism by which fibrils form, fibril structure, and the cytotoxicity associated with fibril formation are discussed. The generic nature of amyloid fibril structure suggests that a common target may be accessible to treat amyloid fibril-associated diseases. As such, the ability of some molecules, for example, the small heat-shock family of molecular chaperone proteins, to inhibit fibril formation is of interest due to their therapeutic potential. © 2008 IUBMB IUBMB Life, 60(12): 769–774, 2008

INTRODUCTION

The dogma of protein folding, based largely on the work of Christian Anfinsen some 50 years ago (1), is that all the information required for a protein to fold into its proper three-dimensional structure (and hence functional form) is contained within its amino acid sequence. However, even if, following translation, a protein successfully attains its biologically active state, this often does not herald the end-point of its folding/unfolding life. Many proteins go through cycles of unfolding and refolding due to a variety of factors that include transport across a membrane, cellular secretion, or exposure to stress conditions (e.g. changes in pH, temperature). As a result, the chance for a protein to misfold is relatively high and so the whole process of protein folding must be tightly regulated to ensure that it proceeds smoothly. The failure of a protein to fold correctly can have serious consequences: it is now recognized that protein misfolding lies at the very heart of a variety of our most debilitating diseases (see Table 1).

Table 1. Some of the diseases associated with amyloid fibril formation and the main protein component of the aggregates formed
Disease   Main component of aggregates associated with disease
Alzheimer's disease   Aβ peptides, Tau
Frontal-temporal dementias   Tau
Parkinson's disease   α-Synuclein
Dementia with Lewy bodies   α-Synuclein
Transmissible spongiform encephalopathies (e.g. Creutzfeldt-Jakob disease and Mad Cow)   Prion
Huntington's disease   Huntingtin
Type II diabetes   Amylin
Senile systemic amyloidosis   Transthyretin
Familial amyloid polyneuropathy I   Transthyretin
Familial amyloid polyneuropathy III   Apolipoprotein AI
Haemodialysis-related amyloidosis   β2-Microglobulin
Injection-localized amyloidosis   Insulin
Hereditary nonneuropathic systemic amyloidosis   Lysozyme
Spinocerebellar ataxias   Ataxins
Spinocerebellar ataxia 17   TATA-box binding protein
Primary systemic amyloidosis   Ig light chains
Secondary systemic amyloidosis   Serum amyloid A
Amyotrophic lateral sclerosis   Superoxide dismutase I
Medullary carcinoma of the thyroid   Calcitonin

Abbreviations

EGCG, epigallocatechin gallate; sHsp, small heat-shock protein.

PROTEINS CAN AGGREGATE THROUGH TWO DISTINCT MECHANISMS

During and immediately following its translation on the ribosome, the newly formed protein meets the first major hurdle of its life: to fold into the conformation it requires in order to fulfil its raison d'être. This, in itself, is not a trivial task because the number of theoretical interactions between each of its amino acid side chains far exceeds the total number of protein molecules within the cell and establishing the correct interactions is vital if the protein is to fold correctly. In addition, the protein must fold within the crowded environment of the cell, in which the intracellular concentration of proteins can be as high as 350 mg/mL (2), and so the chance of it making inappropriate contacts with other proteins is very high. Yet, the driving force that pushes the protein to attain its lowest free energy state (i.e. its native conformation in the majority of cases) ensures that most proteins fold spontaneously and rapidly (in the order of micro- to milliseconds) and, more often than not, folding occurs without problems (3). Interestingly, many proteins never attain a defined conformation, and instead, in their biologically active state, remain intrinsically disordered, that is, they have ill-defined secondary and tertiary structures in their native state.
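As a rough illustration of why a modest free-energy difference is enough to keep most molecules in the native state, the sketch below assumes a simple two-state equilibrium between unfolded and native forms. This is a simplification (real folding proceeds via intermediates), and the stability values used are purely illustrative assumptions, not data from this review.

```python
import math

R_KJ_PER_MOL_K = 8.314e-3  # gas constant in kJ/(mol K)


def fraction_folded(dg_unfold_kj_per_mol, temperature_k=298.15):
    """Fraction of molecules in the native state for a two-state model.

    dg_unfold_kj_per_mol is the free energy of unfolding (native -> unfolded);
    positive values mean the native state is the more stable one.
    """
    k_unfold = math.exp(-dg_unfold_kj_per_mol / (R_KJ_PER_MOL_K * temperature_k))
    return 1.0 / (1.0 + k_unfold)


# Illustrative (assumed) stabilities, not measured values
for dg in (0.0, 10.0, 20.0, 40.0):
    print(f"dG_unfold = {dg:5.1f} kJ/mol -> fraction folded = {fraction_folded(dg):.4f}")
```

Because the dependence is exponential in this model, a relatively small loss of stability (for example through mutation or stress) produces a disproportionately large increase in the non-native population.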

For other proteins, folding does not occur unassisted and instead the folding process is overseen by a number of auxiliary proteins, such as catalysts (e.g. enzymes that catalyse the correct disulfide bond combination and the formation of trans-proline isomers) and molecular chaperones, which ensure a high degree of folding fidelity. Molecular chaperones that are involved in ensuring correct protein folding include the well-characterized Hsp60, Hsp70, and Hsp90 families, in which the chaperone action is coupled to ATP hydrolysis. Although occurring quickly, the folding pathway of a protein typically does not occur in one step but instead proceeds through a number of intermediately folded states (each with lower energy than the unfolded protein) in which a few key initial contacts are established that are crucial in directing the correct protein structure (4-6). Subsequent hydrogen bonding and hydrophobic interactions enable the protein to attain its fully folded form (Fig. 1). The folding pathway is reversible. The folded protein, when required to or when subjected to stress (that causes disruptions to hydrogen bonding and hydrophobic interactions between some side chains), partially unfolds to its intermediately folded state(s).

Figure 1. The protein on-folding pathway and the off-folding pathways that lead to protein aggregation. An unfolded protein folds to its native state via the formation of partially folded intermediates. This process is fast and reversible. However, under conditions in which the partially folded intermediates persist (e.g. during times of cellular stress or due to mutation), they can mutually associate via exposed hydrophobic regions that are normally buried in the core of the protein in its native state. When this occurs, the intermediates aggregate via either a disordered or ordered mechanism, leading to the formation of amorphous (disordered) precipitates or ordered amyloid fibrils, respectively.

Despite the number of checkpoints that exist to ensure proper folding of proteins, problems can arise due to undesirable interactions during folding. The main cause of this is the persistence of intermediately folded states of the protein on the folding pathway, a process that can be exacerbated by mutation and/or cellular stress. These intermediate states, which expose increased hydrophobicity to solution, are prone to self-association and subsequent aggregation and precipitation. When this occurs, the protein leaves the folding pathway and enters the protein off-folding pathway, which is relatively slow (in the order of seconds) and driven primarily by the hydrophobic interactions between intermediately folded states (Fig. 1). The off-folding pathway comprises two distinct routes by which aggregation of the protein may proceed (i.e. the formation of disordered, amorphous aggregates or ordered amyloid fibrils); which off-folding pathway predominates is thought to be governed by the rate at which a protein unfolds and aggregates, its amino acid sequence, and the nature of the intermediates that are formed (3, 5, 6).

A disordered aggregation mechanism results from the rapid unfolding and subsequent aggregation of intermediately folded proteins, in which individual monomers add to the growing clump of aggregated protein through a random process. This leads to the formation of amorphous aggregates which eventually become so large that they form an insoluble precipitate. This type of aggregation is most often the bane of protein researchers as it is the underlying mechanism behind inclusion body formation in bacterial cells during recombinant protein expression and is also responsible for proteins “falling out” of solution when changing buffer conditions. With regard to inclusion body formation, the huge amount of protein formed overwhelms the cell's ability to properly fold the newly expressed protein and so the misfolded protein aggregates and precipitates. However, under normal circumstances in the cell, amorphous aggregation is often not of major concern because the cell has “machinery” (such as tagging of the protein with ubiquitin) that is well equipped to detect the formation of such aggregates and dispose of them into the proteasomal “dustbin” before they precipitate.

AMYLOID FIBRILS ARE FORMED THROUGH AN ORDERED AGGREGATION MECHANISM

In contrast to the formation of amorphous (disordered) clumps of protein, aggregation may occur more slowly through a highly ordered, nucleation-dependent mechanism in which partially folded forms of the protein associate together to form a stable nucleus (the rate-determining step). This nucleus then acts as a template to sequester other intermediates to add to the growing thread of aggregated protein (protofibril). The sequential addition of partially folded intermediates to the ends of the chain leads to the formation of a highly structured, insoluble form of protein known as an amyloid fibril (Fig. 2). Such a mechanism explains the observed kinetics of fibril formation as monitored using amyloidogenic dyes such as thioflavin T (ThT) or Congo red (Fig. 2). Both the length of the lag phase (i.e. the time taken to form a stable nucleus) and the rate of elongation are highly dependent on the protein concentration through their reliance on the concentration of partially folded intermediates present at any given time (7). Whilst this nucleation-dependent mechanism holds for most in vitro amyloid fibril forming species studied to date, alternative mechanisms do exist [e.g. in which the rate-limiting step is the dissociation of the amyloidogenic species from a binding partner or oligomeric state (8)].
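The sigmoidal kinetics described above (lag phase, elongation, plateau) can be reproduced with a minimal two-step scheme: slow nucleation (A → B, rate constant k1) followed by autocatalytic growth (A + B → 2B, rate constant k2), commonly known as the Finke-Watzky model. This is a deliberately simple, swapped-in description and not the model used in refs (7) or (8); the rate constants, concentrations, and the choice of "time to 10% conversion" as a proxy for the lag phase are all hypothetical.

```python
import math


def fibril_fraction(t, a0, k1, k2):
    """Fraction of protein converted to fibrillar form at time t.

    Closed-form solution of the two-step scheme
        A -> B        (nucleation, k1)
        A + B -> 2B   (autocatalytic growth, k2)
    with all protein initially in the soluble form A at concentration a0.
    """
    c = k1 + k2 * a0
    soluble = (c / k2) / (1.0 + (k1 / (k2 * a0)) * math.exp(c * t))
    return 1.0 - soluble / a0


def time_to_fraction(f, a0, k1, k2):
    """Time at which a fraction f (0 < f < 1) of the protein has converted."""
    c = k1 + k2 * a0
    return math.log((k1 + f * k2 * a0) / ((1.0 - f) * k1)) / c


# Hypothetical parameters: k1 in 1/h, k2 in 1/(uM h), concentrations in uM
K1, K2 = 1e-4, 0.01
for a0 in (25.0, 50.0, 100.0):
    lag = time_to_fraction(0.1, a0, K1, K2)   # proxy for the lag phase
    half = time_to_fraction(0.5, a0, K1, K2)  # midpoint of the growth phase
    print(f"{a0:6.1f} uM: 10% converted after {lag:5.1f} h, 50% after {half:5.1f} h")
```

In this sketch, raising the starting concentration shortens both the time to 10% conversion and the time to the midpoint, which mirrors the concentration dependence of the lag phase and elongation rate noted in the text.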

Figure 2. Monitoring the formation of amyloid fibrils and their generic core architecture. (A) The typical structure of amyloid fibrils as viewed by transmission electron microscopy showing them as long, unbranched, rope-like fibers. Scale bar is 1 μm. (B) In the left panel, a magnified view of an α-synuclein fibril highlighting its internal protofilament substructure (scale bar is 200 nm). In the right panel, a schematic view of an amyloid fibril formed from insulin [reproduced from (9)]. This model shows the core structure of each filament, that is, the typical cross β-sheet array formed from sheets of β-strands lying perpendicular to the axis of the fibril and the aligning of these β-sheets into individual filaments. (C) Monitoring amyloid fibril formation via the change in fluorescence of the amyloidogenic dye thioflavin T upon its binding to the fibril. The kinetics of fibril formation include a lag phase, elongation phase, and plateau phase. Typically, as the concentration of protein increases, the lag phase of the reaction decreases and the rate of fibril elongation increases. (D) X-ray fiber diffraction of amyloid fibrils showing the diagnostic meridional and equatorial reflections which form the “cross β-sheet” pattern. (E) The standard nucleation-dependent model of amyloid fibril formation. Fibril formation commences with the unfolding of a native protein, forming a pool of partially folded intermediates, a process that is reversible. The partially folded intermediates are able to associate with each other until they reach a critical size/mass at which a stable nucleus is formed. The formation of this nucleus from the partially folded intermediates is slow and rate-limiting in the overall process of fibril formation (lag phase). Fibril elongation then proceeds via the addition of intermediates to the growing nucleus. The mechanism also explains how seeding the reaction increases the reaction rate and decreases the lag phase because addition of preformed fibrils overcomes the time required to form nuclei [adapted from (8)].

Amyloid fibril formation is associated with a wide range of diseases and is believed to be causative of, or at least linked to, the onset and progression of these diseases (see later) (4, 6, 10). However, the disease-related proteins found as fibrillar aggregates in vivo share no obvious sequence or structural similarities in their native state. Moreover, the amyloid fibril conformation has been found to be accessible to a diverse range of proteins, such that it is now thought to be a generic structural form that all proteins can adopt given appropriate conditions (11). In some cases, in particular those associated with protein deposition diseases, unstructured or intrinsically disordered peptides or proteins, such as the amyloid-β peptides, islet amyloid polypeptide, tau and α-synuclein, assemble into fibrils directly from their native state, without the requirement for an initial partial unfolding step.

THE GENERIC STRUCTURE OF AMYLOID FIBRILS

The characterization of amyloid fibril formation by proteins in vitro has, to date, largely focused on biophysical studies to determine the structure of the fibril, and biochemical studies into the mechanism and kinetics of the process. Through techniques such as X-ray fiber diffraction, cryo-electron microscopy, and solid-state NMR spectroscopy, we now have a good understanding of the core architecture of individual fibrils. All fibrils share a characteristic “cross β-sheet array,” so called because individual fibrils are made up of sheets of β-strands which lie perpendicular to the core axis of the fibril and which stack together to form an individual filament (Fig. 2B). This results in a characteristic cross formed by the meridional and equatorial reflections in X-ray diffraction studies (the former of ∼4.5 Å and the latter of ∼9–11 Å) (Fig. 2D), which represent the hydrogen bonding distance between adjacent β-strands that make up a β-sheet and the distance between β-sheets, respectively. The presence of this “cross β-sheet array” as the underlying architecture of fibrils observed by techniques such as transmission electron microscopy and atomic force microscopy is now seen as the diagnostic test for the presence of amyloid fibrils. Mature fibrils are commonly composed of 2–6 protofilaments that plait together into rope-like fibers, 5–10 nm in diameter and up to a few microns in length. The fibrils formed are often unbranched, extremely stable, and resistant to degradation by proteases and denaturants. These properties are thought to be responsible for the difficulty the cell has in eliminating fibrils once they have been formed.
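To connect the repeat distances quoted above with where the corresponding reflections appear in a diffraction experiment, the sketch below applies Bragg's law (nλ = 2d sin θ). The X-ray wavelength is an assumption (Cu Kα, about 1.5418 Å, a common laboratory source); the review does not state which source was used, and the function name is hypothetical.

```python
import math

CU_K_ALPHA_ANGSTROM = 1.5418  # assumed wavelength; not specified in the review


def bragg_two_theta_deg(d_angstrom, wavelength_angstrom=CU_K_ALPHA_ANGSTROM, order=1):
    """Scattering angle 2-theta (degrees) for a repeat distance d, from Bragg's law."""
    s = order * wavelength_angstrom / (2.0 * d_angstrom)
    if not 0.0 < s <= 1.0:
        raise ValueError("no reflection possible for this spacing and wavelength")
    return 2.0 * math.degrees(math.asin(s))


# Spacings taken from the text: ~4.5 A (between strands), ~9-11 A (between sheets)
for label, d in (("inter-strand", 4.5), ("inter-sheet", 9.0), ("inter-sheet", 11.0)):
    print(f"{label:12s} d = {d:4.1f} A -> 2-theta = {bragg_two_theta_deg(d):5.2f} deg")
```

Smaller repeat distances scatter to wider angles, so the inter-strand reflection lies further from the beam centre than the inter-sheet reflection.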

The overall stability of the fibril is achieved by intermolecular hydrogen bonds between the amide and carbonyl groups of the polypeptide main chain; the peptide backbone being common to all proteins is therefore thought to dictate why all fibrils share a common morphology. It also explains why very structurally diverse proteins are able to adopt the generic amyloid-fibril conformation, including those which are predominantly α-helical in their native state, for example, myoglobin (12-14). However, the propensity for a given peptide or protein to form fibrils varies dramatically with its amino acid sequence and some regions of a protein are more aggregation prone than others. Thus, whilst it may be true that all proteins are capable of forming fibrils, the composition and amino acid sequence of a protein profoundly affect its propensity to adopt such structures. Moreover, the tendency of a region of the polypeptide chain to form fibrils, rather than fold correctly, depends on a number of intrinsic factors including the propensity for it to form β-strands, its hydrophobicity and its overall net charge (15, 16). The specific link between these physicochemical properties of constituent amino acid residues and their aggregation propensities has led to the development of predictive algorithms for amyloidogenic regions of proteins that are based solely on their amino acid sequences (17-19).
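To give a flavour of how sequence alone can flag candidate aggregation-prone stretches, the sketch below scans a sequence with a sliding window and reports the most hydrophobic segment together with a crude net charge. It is not a reimplementation of the published algorithms in refs (17-19): it uses only the Kyte-Doolittle hydropathy scale, ignores β-strand propensity, counts only Lys/Arg and Asp/Glu towards charge, and the example sequence and window size are hypothetical.

```python
# Kyte-Doolittle hydropathy values (higher = more hydrophobic)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(seq, window=5):
    """Mean hydropathy of every window of the given length along the sequence."""
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def most_hydrophobic_window(seq, window=5):
    """Return (start index, segment, mean hydropathy) of the top-scoring window."""
    scores = window_hydropathy(seq, window)
    start = max(range(len(scores)), key=scores.__getitem__)
    return start, seq[start:start + window], scores[start]


def net_charge(seq):
    """Crude net charge: +1 per Lys/Arg, -1 per Asp/Glu (His and termini ignored)."""
    return sum(aa in "KR" for aa in seq) - sum(aa in "DE" for aa in seq)


toy_sequence = "DEKRGSVVIVLIAGDEKS"  # made-up peptide, not a real protein
start, segment, score = most_hydrophobic_window(toy_sequence)
print(f"most hydrophobic window: {segment} at position {start + 1}, mean score {score:.2f}")
print(f"net charge (simplified): {net_charge(toy_sequence)}")
```

A realistic predictor would combine several such terms and calibrate them against experimental aggregation data; here the point is only that the inputs are computable directly from the amino acid sequence.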

TOXIC PROTEIN AGGREGATION AND ITS PREVENTION

In each of the amyloid diseases, of which at least 20 have now been identified, the fibrils that are formed are primarily associated with one protein or protein fragment, for example, the amyloid-β peptides in Alzheimer's disease, α-synuclein in Parkinson's disease, and the prion protein in the transmissible spongiform encephalopathies such as Creutzfeldt-Jakob disease (see Table 1). In many of these diseases, the fibrils then self-assemble into tangled plaques, the hallmark of most neurodegenerative conditions and the site at which the toxic effect of fibril formation is most evident. However, the cytotoxicity associated with amyloid formation is not restricted to disease-related proteins: fibrils and their precursors formed from non–disease-related proteins, such as the SH3 domain from bovine phosphatidylinositol 3′ kinase and the N-terminal domain of E. coli HypF protein (HypF-N), show similar levels of cytotoxicity (20). There remains considerable debate as to the species responsible for the cytotoxicity of amyloid fibrils. Although there are obvious negative effects of extracellular amyloid plaque deposition, recent studies have suggested that it is primarily the soluble, pre-fibrillar dimers, trimers, or other such oligomers, which are formed during the early stages of fibril formation, that are responsible for cell toxicity in neurodegenerative diseases such as Alzheimer's and Parkinson's disease (4, 6). Others have indicated that the mature fibril can also be toxic (21, 22), and in fact, the cytotoxic species may vary depending on the fibril-forming protein.

A wide variety of biochemical changes has been reported following exposure of neuronal cells in culture to amyloid fibrils or their precursors. It is not clear how the species formed during amyloid fibril assembly cause cell death, and indeed whether the mechanism behind the toxicity is the same for all amyloid fibril-forming proteins. A number of hypotheses have been proposed, for example, that a soluble precursor forms a pore-like structure in cell membranes (the amyloid channel or amyloid pore) which culminates in neuronal death by unregulated membrane permeabilization (23, 24). Others have suggested that the toxicity of pre-fibrillar amyloid species is due to the production of reactive oxygen species (e.g. hydrogen peroxide) by the aggregating target protein itself, which are generated as a consequence of the fibril-forming process (25). In support of this, cells can be protected against amyloid aggregate toxicity by treatment with anti-oxidants such as tocopherol, lipoic acid and reduced glutathione. An additional advantage of some of these compounds is that they are also able to inhibit the process of fibril formation (26), most likely due to a direct effect on the hydrophobic association steps required for nuclei formation. Moreover, a recent study found that the polyphenol epigallocatechin gallate (EGCG) redirects the aggregation of α-synuclein from the ordered fibril forming pathway over to the amorphous (disordered) pathway resulting in the formation of nontoxic protein aggregates (27). Thus, these compounds or derivatives thereof are promising therapeutics due to their combined anti-oxidant and anti-amyloidogenic activities.

Other well-described inhibitors of protein aggregation, including amyloid fibril formation, are molecular chaperone proteins, in particular, intracellular small heat-shock proteins (sHsps) and extracellular clusterin (28). The sHsps are a ubiquitous group of proteins that are the cell's first line of defence against physiological stress conditions that promote protein aggregation. For example, their expression is elevated significantly under conditions of elevated temperature. The chaperone action of sHsps does not require ATP hydrolysis and therefore they can be utilized by the cell under conditions in which energy levels are low, for example, during cellular stress. The sHsps seem to employ a number of distinct mechanisms to prevent protein aggregation. In some instances, they bind to long-lived, partially structured intermediates, primarily through hydrophobic interactions, to form a stable, soluble, chaperone-target protein complex (i.e. a “reservoir of intermediates”) (29, 30). Neither sHsps nor clusterin has the capability of refolding the target protein; instead, they act to maintain its solubility until cellular conditions allow it to be picked up and acted upon by other chaperones (such as Hsp70 and Hsp60 that use ATP hydrolysis to refold the protein). In other cases, sHsps may only transiently interact with the target protein to stabilize it and allow it to refold back to its native state upon release. For example, the latter mechanism is utilized by the sHsp, α-crystallin, against fibril formation by apolipoprotein C-II (31). We have also shown that αB-crystallin can redirect the aggregation of α-synuclein from the ordered fibril forming off-folding pathway over to the amorphous aggregation pathway (30) in a similar manner as described for EGCG (27) (see earlier).

No matter which mechanism is utilized, sHsps and clusterin are ideally suited to prevent amyloid fibril formation because they act very efficiently against slowly aggregating target proteins, a process that governs the ordered aggregation mechanism that leads to fibril formation. Interestingly, the expression of sHsps and clusterin is upregulated in many amyloid neurodegenerative diseases and they are found in high amounts in amyloid plaques (32), presumably as a result of their attempts to prevent protein aggregation. Studies in which these chaperone proteins are over-expressed in cellular models of amyloidoses will enable an assessment of their therapeutic potential in the treatment of such diseases.

CONCLUSIONS

Although often overlooked, the folding and unfolding processes undertaken by proteins during their life cycle are not trivial. Like all biological processes, the folding pathway is tightly regulated to ensure proteins reach their correct, functional form. However, problems do occur and the number of protein conformational diseases that are now recognized is an indication of the importance of proteins achieving and maintaining their correct fold. That the amyloid fibril conformation is potentially accessible to all proteins, no matter what their native state, indicates that for many this form of toxic protein aggregation is a constant possibility in vivo. Significantly, diseases associated with amyloid fibril formation represent some of the western world's most debilitating conditions and, because many are associated with old age, will become more prevalent over the coming decades as the population ages. As such, a greater understanding is required of the mechanism by which fibrils are formed and of strategies to prevent their formation.

WORTH ANOTHER LOOK

From time to time we republish review articles from the Australian Biochemist, the magazine of the Australian Society for Biochemistry and Molecular Biology Inc. This exposes these excellent reviews to a much wider and different readership. Here we republish a review in a slightly modified form on protein folding and misfolding, which originally appeared in the Australian Biochemist, Vol. 39, April 2008.

We are most grateful for the permission of Heath Ecroyd, and of Rebecca Lew, the Editor of the Australian Biochemist, to republish this review.

Acknowledgements

H.E. is supported by a National Health and Medical Research Council (NHMRC) Peter Doherty Fellowship and J.A.C.'s research is supported by grants from the NHMRC and Australian Research Council.