Structure determination
Wild‐type Lbpro crystals could not be obtained. However, substitution of the active site nucleophile Cys51 (Piccione et al., 1995a; Roberts and Belsham, 1995; Ziegler et al., 1995) with Ala allowed crystals to be grown as reported (Guarné et al., 1996). These contained eight Lbpro molecules in the asymmetric unit but were difficult both to reproduce and to manipulate. Thus, although a native diffraction data set from these crystals was obtained up to 3.0 Å resolution, the corresponding phases could not be determined. We speculated that the properties of these crystals resulted from interactions between the C‐terminus of one molecule and the active site of a neighbour, and therefore prepared a truncated form of the inactive Lbpro lacking six C‐terminal amino acids (termed sLbpro). New crystals were obtained containing two sLbpro molecules in the asymmetric unit and the structure was solved by a combination of isomorphous replacement and density modification techniques. The structure of the Lbpro Cys51Ala mutant was then determined by molecular replacement using the sLbpro structure as a search model (Table I; see Materials and methods).
Description of the overall Lbpro structure
The Lbpro structure (Figure 2A–C) presents a compact, globular region, ranging from Met29 to Tyr183 with an overall cubic shape of approximate edge dimensions of 30 Å, from which a flexible C‐terminal extension (CTE) ranging from Asp184 to Lys201 extrudes. The globular region is divided into two subdomains, with the catalytically essential residues Cys51 (replaced by Ala in the Lbpro structure) and His148 located at the interface. The first, N‐terminal subdomain contains four α‐helices (α1, α2, α3 and α4) and two short antiparallel β‐strands (β1 and β2) comprising only residues Glu30–Thr32 and Lys38–Thr40, respectively. The longest α‐helices α1 and α3 (comprising residues Asn50–Glu64 and Leu78–Gly91, respectively) run perpendicular to each other, with the catalytic Cys51 being located towards the N‐terminal side of helix α1. The shortest helix α2 spans only six residues (Phe68–Ser73) and runs almost parallel to α3. The second Lbpro subdomain displays a fold belonging to the all β‐family of proteins, as the only regular secondary structure elements are contained in a mixed β‐sheet formed by one parallel (β3 with β4) and six antiparallel (β4–β9) β‐strands (Figure 2A–C). The essential His148 is located on the turn connecting the longest strands β5 and β6 (residues Phe137–Leu143 and Ala149–Thr155) which occupy a central position in the sheet.
The main difference between the structures of Lbpro and sLbpro is that, in the latter, the 12 residues remaining in the CTE are disordered in both molecules present in the asymmetric unit (from residue Asp184 onwards). The superimposition of the globular regions of Lbpro and sLbpro models gives an r.m.s. deviation of 0.4 Å (performed using the program SHP; Stuart et al., 1979), which can be considered as an upper limit for the coordinate errors in the compact region of the two structures.
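The r.m.s. deviation quoted above can be illustrated with a minimal sketch. It assumes two coordinate sets that are already superimposed and listed in the same equivalent-atom order; it does not perform the fitting step and is not the SHP procedure. The function name `rmsd` and the coordinates are hypothetical and serve only as an illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the two sets are already superimposed and paired atom by atom;
    no least-squares fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between one pair of equivalent atoms.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Made-up coordinates (in Å) for three equivalent atoms; not taken from the structures.
model_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
model_b = [(0.0, 0.3, 0.0), (1.5, 0.0, 0.4), (3.0, 0.0, 0.0)]
print(round(rmsd(model_a, model_b), 3))
```

Because the deviation is averaged over all paired atoms, a few poorly matching atoms can be masked by many well-matching ones, so the value is meaningful only together with the number of equivalent residues.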
Several groups proposed a papain‐like fold for the picornaviral Lpro structure (Gorbalenya et al., 1991; Piccione et al., 1995a; Skern et al., 1998). The papain‐fold consists of left‐ (L) and right‐hand (R) domains [standard papain‐fold view (Kamphuis et al., 1985)] which are structurally equivalent to the two subdomains described for Lbpro. Superimposition of main‐chain Cα atoms from Lbpro with papain (Figure 2A and D) gives an averaged r.m.s. deviation of 1.3 Å for 76 equivalent residues. As the globular region of Lbpro represents the smallest polypeptide fragment with a papain‐like topology, it lacks most of the decoration found in papain, including the prosegment‐binding loop (PBL; Coulombe et al., 1996; Figure 2D). Regions most conserved between the papain‐like proteases and Lbpro structures are located around the active centre, especially secondary elements α1 and β5–β6, containing the catalytic cysteine and histidine residues (Figure 2B and C). Papain‐like proteases have, however, no equivalent to the Lbpro CTE.
The extended conformations of the Lbpro CTEs are stabilized by a network of intermolecular interactions originating from the exchange of the CTEs between neighbouring molecules. In four out of the eight Lbpro molecules in the asymmetric unit, all residues in the CTE are visible (Figures 2C and 3) while, in the other four molecules, there is some disorder, and residues Glu186–Glu191 have not been traced (Figures 2C and 3). In the four molecules containing the disorder, the conformation of the visible residues is closely related to, though subtly different from, that found in the other four (Figures 2B, C and 3). The 10 C‐terminal amino acids, residues Trp192–Lys201, are nevertheless well defined in all eight crystallographically independent subunits and present identical interactions with the substrate‐binding pocket of adjacent molecules in the crystal.
Besides the exchange of the CTEs between neighbouring molecules, the crystal packing of both Lbpro and sLbpro forms shows a polymeric character brought about by two noteworthy peculiarities. First, there is a covalent intermolecular disulfide bridge between adjacent molecules (Figure 3) in both the sLbpro and Lbpro forms, although the spatial relationship between disulfide‐linked molecules differs in the two crystal forms. Secondly, there is a large contact area of 1088 Å² between the helical domains of the neighbouring molecules which are related by a local 2‐fold axis. Again, the dimer defined by the two subunits in contact is present in both the sLbpro and the Lbpro crystal forms. However, as Lbpro functions enzymatically as a monomer, the biological relevance of both possible dimers is not clear.
The active site cleft
The active site containing the catalytic residues Cys51 and His148 is located on top of a deep cleft in the interdomain region, as observed for other members of the papain superfamily. Both the location of the active site and the spatial arrangement of the catalytic residues are well preserved (Figures 2B–D and 4A). In papain superfamily members, the active site histidine (P‐159; papain numbering is used throughout when describing papain‐like proteases) is maintained in the correct orientation with respect to the nucleophilic cysteine (P‐25) by a hydrogen bond to the side‐chain oxygen of a conserved asparagine residue (P‐175). Asp163 carries out this task in Lbpro (Figure 4A and B), and this residue is strictly conserved in leader proteases (Figure 2A). In all members of the papain superfamily examined so far, a tryptophan residue (P‐177) covers the hydrogen bond formed between the Asn–His pair (Figure 4C); substitution of Trp P‐177 reduces papain activity (Berti and Storer, 1995). Neither this aromatic residue nor the 11 residue loop (P‐175–P‐185) that anchors it in the papain‐like enzymes is found in Lbpro. In its place is a β‐turn containing a cluster of four acidic residues (Asp163, Asp164, Glu165 and Asp166; Figure 4B); these residues confer a strong local negative charge, so that the environment is quite different from those in most papain‐like enzymes (Figure 5). In the absence of the tryptophan residue, the carboxylate group of Asp163 may be required to form a stronger hydrogen bond with His148. Despite these differences, Lbpro represents a papain‐like enzyme without this fully conserved tryptophan residue.
Another important catalytic residue in papain superfamily members is a conserved glutamine (P‐19) whose side‐chain amide, together with the main‐chain nitrogen of the catalytic cysteine (P‐25), stabilizes the negative charge developing on the scissile carbonyl oxygen during nucleophilic attack. This structural feature, termed the oxyanion hole, is also present in Lbpro. However, Asn46 replaces the conserved glutamine residue (Figure 4B and C). In Lbpro, the arrangement of the turn positioning Asn46 and the orientation of its side‐chain amide [flipped by 180° compared with the equivalent glutamine (P‐19) in all other papain‐like proteases] differ from those of other members of the papain superfamily (compare Figure 4B and C). This is probably due to the fact that, in Lbpro, only four residues separate Asn46 from Cys51, whereas five residues are found in other papain‐like proteases. The shorter side chain of Asn46 and its orientation are required because the tighter turn of Lbpro brings the main chain closer to the catalytic residues than in other papain‐like enzymes. The orientation of the side chain of Asn46 is fixed by hydrogen bonds to the main‐chain nitrogen of Asp49, forming an Asn‐pseudoturn, and the side‐chain carboxylate of Asp164 (Figure 4B). This aspartate residue, conserved in all Lbpro sequences analysed so far (Figure 2A), is located immediately after the catalytic Asp163 in the acidic loop described above, and participates in an intricate network of hydrogen bonds involving residues Asn46, Asp49, Asn54 and Asp164 (Figure 4B). This network might contribute to the stability of the active site structure and catalytic activity.
The interaction between Lpro and its C‐terminus
FMDV Lpro frees itself from the growing polypeptide chain by specific cleavage at its own C‐terminus (Figures 1A, B and 2A). Thus, the presence of CTE residues inside the substrate‐binding pockets of adjacent molecules illustrates substrate recognition during self‐processing and represents, in fact, the P side of the substrate in the self‐processing reaction. The peptide backbone of the final residues of the CTE is in an extended conformation (Figure 3) similar to that observed in complexes of enzymes of the papain superfamily with peptide‐like inhibitors (Yamamoto et al., 1991, 1992).
The main interactions between the CTE and the substrate‐binding site (Figure 6) are provided by Lys201′ and Leu200′, with minor contributions from residues Lys199′–Val196′ (residues in the CTE of a symmetry‐related molecule are labelled with primed numbers). The final CTE residue, Lys201′, is positioned close to the active site Cys51 (replaced by Ala in this structure) as if catalysis had been completed. One of its carboxylate oxygens, located in the oxyanion hole, is hydrogen bonded to the side chain of Asn46 and the main‐chain nitrogen of Cys51, whereas the second carboxylate oxygen accepts a hydrogen bond from the imidazole ring of the catalytic His148 (Figure 6), expected to be protonated as described for papain (Yamamoto et al., 1991; Brocklehurst et al., 1998). The CTE establishes additional interactions with the substrate‐binding site through hydrogen bonds between main‐chain atoms. Thus, the main‐chain nitrogen of Lys201′ forms a hydrogen bond with the main‐chain carbonyl of Glu147; the main‐chain oxygen and nitrogen atoms of Leu200′ are hydrogen bonded to the main‐chain nitrogen and oxygen, respectively, of Gly98, building a short antiparallel β‐sheet. These hydrogen bond interactions are a conserved feature in substrate binding by papain superfamily members.
The S1 subsite in Lbpro is a narrow cleft bounded by the loop preceding the central helix α1 on one side, the β‐turn connecting strands β5 and β6 on the other side and the active site at the bottom (Figures 2B and 6A, B). In the Lbpro structure, the aliphatic portion of the side chain of Lys201′, which occupies the S1 subsite, is sandwiched between the main chain of residues His95–Glu96 and the side chain of Glu147, while its amino group establishes electrostatic interactions with the carboxylates of Glu96 and Glu147 (Figure 6A). The S1 subsite in papain and other family members is a wide, unrestricted pocket which exerts relatively little influence on the substrate specificity (Figure 5). In Lbpro, which clearly prefers lysine at P1 in the self‐processing reaction (Figure 2A), this subsite has become narrower and deeper due to a rearrangement of the loop connecting strands β5 and β6 on the R domain. Amino acid sequence alignments and modelling of the ERV1 Lpro imply a correlation between the side chain at P1 and that of residue 147. Thus, FMDV enzymes, with lysine in P1, have a negatively charged glutamate at position 147; in contrast, the corresponding residue in ERV1, with serine at P1, is Gly149, shorter and not charged (Figure 2A).
The side chain of Leu200′ (P2) is completely buried in a hydrophobic S2 pocket formed by Trp52, Gly97–Pro100, Leu143, Glu147–Ala149 and Leu178 (Figure 6A and C). The architecture of the Lbpro S2 subsite is very similar to that of other papain superfamily proteases; in fact, the residues defining the pocket in papain are identical to those in Lbpro, with the exception of Leu143 which is equivalent to Val P‐133.
CTE residues Lys199′ (P3) and Arg198′ (P4) occupy loose pockets on opposite faces of the cleft, in the S3 and S4 subsites, respectively. The aliphatic portion of the side chain of Lys199′ makes van der Waals contacts with main‐chain atoms of residues Gly97–Gly98, and its amino group interacts through a hydrogen bond with the main‐chain carbonyl group of Glu93 and through weak ionic interactions with the side‐chain carboxylates of Glu93 and Glu96. Arg198′ makes van der Waals contacts with Gly98, Pro99, Leu143 and extensively with Gln146. Its guanidinium group also hydrogen‐bonds to the amide side chain of Gln146. Residue Gln197′ (P5) has its side chain exposed to the solvent, but its main chain still contacts Pro99, so that subsite S5 is very open. Finally, Val196′ is buried in a hydrophobic cavity (subsite S6 formed by residues Pro99, Ala101, Val127 and Leu178), located on the interdomain cleft, just underneath subsite S2.
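The assignment of CTE residues to positions P1–P6 follows from counting back from the C‐terminal residue Lys201′. The sketch below makes that bookkeeping explicit. The helper name `p_position` and the dictionary are hypothetical and serve only as an illustration; the residue identities are those given in the text.

```python
# Residue number of the C-terminal CTE residue (Lys201'), which occupies P1.
C_TERMINAL_RESIDUE = 201

def p_position(residue_number, p1_residue=C_TERMINAL_RESIDUE):
    """Return n for the Pn position of a CTE residue, counting back from P1."""
    if residue_number > p1_residue:
        raise ValueError("residue lies beyond the P1 residue")
    return p1_residue - residue_number + 1

# CTE residues described in the text as occupying subsites S1-S6.
CTE_RESIDUES = {201: "Lys", 200: "Leu", 199: "Lys", 198: "Arg", 197: "Gln", 196: "Val"}

for number in sorted(CTE_RESIDUES, reverse=True):
    print(f"P{p_position(number)}: {CTE_RESIDUES[number]}{number}'")
```

Each Pn residue binds in the correspondingly numbered Sn subsite, so the same function gives the subsite index for any CTE residue on the P side.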
Biological implications of the structure
Self‐cleavage at the C‐terminus. The presence of the CTE in the active site of adjacent molecules argues for intermolecular self‐processing. However, although in the crystal structure of Lbpro the CTE projects away towards neighbouring molecules, instead of folding back into its own substrate‐binding cleft, several structural features suggest that self‐processing in cis is possible and might even be favoured. First, the interface between the globular domains exchanging their CTEs is composed of weak interactions, indicating that this region is not designed to promote an intermolecular reaction. Secondly, residues located immediately after Tyr183, at which point the polypeptide chain leaves the globular region to begin the CTE, favour a turn. Notably, Asp184 and Glu186 are conserved in all serotypes of FMDV and in ERV1, enabling the placement of a conserved hydrophobic residue (Leu188) into a shallow hydrophobic pocket formed by residues Ala118, Pro121, Thr130, Met132 and Cβ of Asp136. This interaction leads the CTE polypeptide in a direction compatible with both cis and trans self‐processing. A polar, highly flexible stretch (residues Asn189–Glu191; modelled in Figure 7) should overcome the distance between this pocket and the above‐described subsites in the substrate‐binding cleft. A tryptophan residue at position 192 (or the equivalent aromatic residue in ERV1, Tyr201) would enhance self‐processing in cis by stacking its aromatic ring with the exposed and conserved Trp105 (ERV1 Tyr103) of the globular domain. Thus, the CTE would reach subsite S6 (interaction with Val196) with minor rearrangements of the main chain (Figure 7); the electrostatic and van der Waals interactions of the CTE with the substrate‐binding cleft would maintain the correct orientation for self‐processing in cis.
Why are intramolecular CTE interactions not observed in the Lbpro crystal structure? First, the path of the CTE to its ‘own’ active site is partially blocked by the intermolecular disulfide bridge between neighbouring molecules. This bond is not believed to be present in the reducing environment inside the cell. Secondly, crystal packing requirements may have also favoured the observed CTE interactions. Finally, the ability to cleave eIF4G requires the CTE to be flexible and not remain in the active site. Evidence for the flexible nature of the CTE is provided by the lack of density in the sLbpro form and the differences in the positions of certain residues in the disordered form of the CTE. The disorder in the CTE also appears to be energetically favoured, as freezing the polar CTE in a fixed conformation would impose a high entropic penalty.
Cleavage of eIF4G. The tendency of the CTE to leave the active site of the same polypeptide chain after cis cleavage is of significance for the recognition and cleavage of eIF4G by Lbpro. Thus, the enzyme cannot be inhibited by binding to its own C‐terminus, which served as the recognition site for the cleavage on the viral polyprotein. However, as indicated above, this implies that the S site does not provide sufficient interactions to maintain the substrate in the active site; indeed, the absence of significant intramolecular product inhibition is probably a direct result of this inability. Thus, to bind to its cleavage site on eIF4G, which lacks a basic P1 residue, the enzyme appears to employ the acidic patch of the S′ site (Figure 5) to provide an ionic interaction with the P′1 Arg residue of the cleavage site on eIF4G. Indeed, it is noteworthy that all intermolecular substrates of Lbpro identified so far in vitro contain basic residues at P′1 or P′2 or both, even when the P1 residue is basic (data not shown). Taken together, these observations suggest that a substrate containing basic residues at both P1 and P′1 should be an optimized substrate for Lbpro. As, however, no data are available on the sequence preference of Lbpro on peptide substrates, experiments are underway to investigate this notion.
The acidic patch at the S′ site of Lbpro and the narrow cleft traversing the active site (Figures 5 and 6A) also appear to be the reasons why Lbpro is clearly much more specific than papain, although the two enzymes possess an S2 pocket almost identical in composition and topology. Thus, Lbpro does not cleave an immunoglobulin molecule, a classic substrate of papain. Furthermore, although eIF4G is an efficient substrate for papain, the cleavage products are not the same as those of Lbpro (B.Hampoelz and T.Skern, unpublished).