Journal of Molecular Biology
A New Clustering of Antibody CDR Loop Conformations
Research Highlights
► Clustering of CDR conformations from over 300 antibody structures.
► Modern clustering method using affinity propagation in dihedral angle space.
► Most of Chothia's 25 canonical conformations confirmed.
► A total of 72 clusters identified for use in antibody structure prediction and design.
Introduction
Prediction of the three-dimensional structure of antibodies is an important step in improving their affinity, stability, and suitability as therapeutics. Given the conserved structure of the frameworks of the heavy-chain variable (VH) domain and the light-chain variable (VL) domain, much of the attention in structural bioinformatics has focused on the complementarity-determining regions (CDRs) involved in binding antigens. Studies by Chothia, Lesk, Thornton, and others in the 1980s and 1990s centered on the idea of identifying a small number of “canonical structures” for the six CDR loops [H1, H2, and H3 of the VH domain; L1, L2, and L3 of the VL domain] of various lengths.1, 2, 3, 4 The central hypothesis, first stated in 1987,1 was that “most of the hypervariable regions in immunoglobulins have one of a small discrete set of main-chain conformations that we call ‘canonical structures,’” and that a small number of key residues could be used to predict to which conformational class a new CDR sequence might belong. In further studies, Al-Lazikani et al.,2 Martin and Thornton,3 Oliva et al.,5 Wilmot and Thornton,6 Shirai et al.,7 and Kuroda et al.8 defined canonical structures based on loop length and, in some cases, different conformations for certain loop lengths. Residues at some positions (in particular glycine, proline, aromatic residues, and hydrogen-bond donors and acceptors) were proposed to be responsible for differences in conformation. In their 1997 study, with a larger number of structures available, Chothia and coworkers identified a total of 25 canonical classes.2
Chothia et al. used a manual clustering of antibody loops and sequences to define their canonical classes. Martin and Thornton in 1996 used a quantitative clustering approach for an automated classification scheme.3 They performed a cluster analysis in internal coordinate space, followed by a postcluster merging of groups of structures in Cartesian coordinate space [using root-mean-square deviation (RMSD)] to classify the observed CDRs. In some instances, they observed that although a loop might be closer in sequence to one of the Chothia canonical classes, it structurally belonged to another. They noted this as a limitation of the more sequence-based analyses of previous studies.
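The second stage described above, merging clusters in Cartesian space by RMSD, can be sketched roughly as follows. This is only an illustration and not Martin and Thornton's actual procedure: it assumes the loops have already been superimposed on a common frame, that each cluster is summarized by a single representative set of coordinates, and that clusters are merged by single linkage. The function names and the 1.0 Å cutoff are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) atom positions.

    Assumes the two loops are already superimposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and equal in length")
    total = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(total / len(coords_a))


def merge_by_rmsd(representatives, cutoff=1.0):
    """Merge clusters whose representatives lie within `cutoff` of each other.

    `representatives` maps a cluster name to its representative coordinates.
    `cutoff` is a hypothetical value in angstroms.
    Returns a list of sets of cluster names (single-linkage merging).
    """
    groups = []
    for name, coords in representatives.items():
        # Collect every existing group holding a representative close to this one.
        close = [
            g for g in groups
            if any(rmsd(coords, representatives[m]) <= cutoff for m in g)
        ]
        merged = {name}
        for g in close:
            merged |= g
            groups.remove(g)
        groups.append(merged)
    return groups
```

With three toy representatives, two of which differ by 0.5 Å and a third that is several angstroms away, `merge_by_rmsd` would return two groups: one containing the two similar clusters and one containing the outlier.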
A number of studies have focused specifically on the structural motifs found in the structurally diverse heavy-chain H3 CDR.3, 9, 10, 11, 12, 13 Morea et al. divided the H3 hypervariable region into a “torso” region and the “head” of the loop.4 They found that the torso typically takes on one of two conformations, either bulged or extended β-sheet, and the possible conformations of the head region are then limited by the structure of the torso residues. Oliva et al. also divided H3 loops into groups based on structure.5 They defined loop conformations using a geometric alphabet, as described by Wilmot and Thornton.6 Shirai et al. identified, through inspection, a series of sequence–structure relationships that they then transformed into a set of rules to classify H3 structures.7 In particular, they proposed that the presence or the absence of salt bridges in the “torso” region, as defined by Morea et al., leads to either bulged or extended conformations in that region.4 Kuroda et al. later revised their list of H3 sequence–structure rules with the availability of more H3 structures.8
For non-H3 loops, the most recent comprehensive analyses of their conformations were performed in 1996–1998. With the large increase in the number of available antibody structures, we decided to revisit the analysis of the conformations of antibody CDRs to see whether the canonical classes based on 17 structures2 or fewer than 60 structures3 have held up and whether new ones may be identified. In this article, we update the classification of all six CDR regions based on the current Protein Data Bank (PDB). We filtered out low-resolution structures, loops with high B-factors or high conformational energies, and redundant sequences. A total of 337 unique heavy chains and 311 unique light chains were used to construct a structural database of antibody loops. In contrast to Chothia's analysis, we found it most intuitive to group CDRs by CDR type (L1, L2, etc.) and loop length. We refer to these as “CDR–length combinations” or simply “CDR-lengths” for short. For instance, a common loop length for CDR L1 is 11, and we designate this as “L1–11.” We then applied clustering to the conformations of all loops of a particular CDR–length combination using an affinity propagation clustering method9 with a dihedral-angle distance function. We found that most of the canonical conformations identified by Chothia et al. occur in many of the 300+ antibody structures now available. We have identified a total of 72 clusters of conformations, most of which are observed in two or more antibody structures. We provide a detailed comparison of our results to previous antibody loop classifications based on smaller data sets.
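The grouping and distance steps described above can be illustrated with a minimal sketch. The exact distance function used in this work is not given in this excerpt, so the code assumes one common periodic choice, 2(1 − cos Δθ) averaged over the φ and ψ angles of each residue. All function names are hypothetical, a plain hyphen stands in for the dash in labels such as “L1–11,” and the clustering step itself is not implemented here.

```python
import math
from collections import defaultdict


def cdr_length_label(cdr, loop_length):
    """Build a CDR-length label such as 'L1-11' from a CDR name and a loop length."""
    return f"{cdr}-{loop_length}"


def group_by_cdr_length(loops):
    """Group loops by CDR-length.

    Each loop is a (cdr, dihedrals) pair, where dihedrals is a list of
    (phi, psi) tuples in degrees, one tuple per residue.
    """
    groups = defaultdict(list)
    for cdr, dihedrals in loops:
        groups[cdr_length_label(cdr, len(dihedrals))].append(dihedrals)
    return dict(groups)


def angle_distance(a_deg, b_deg):
    """Periodic distance between two angles: 2 * (1 - cos(a - b)), in [0, 4].

    Assumed form; it treats -180 and 180 degrees as the same angle.
    """
    return 2.0 * (1.0 - math.cos(math.radians(a_deg - b_deg)))


def loop_distance(loop_a, loop_b):
    """Mean per-angle distance over phi and psi for two equal-length loops."""
    if len(loop_a) != len(loop_b) or not loop_a:
        raise ValueError("loops must be non-empty and equal in length")
    total = 0.0
    for (phi_a, psi_a), (phi_b, psi_b) in zip(loop_a, loop_b):
        total += angle_distance(phi_a, phi_b) + angle_distance(psi_a, psi_b)
    return total / (2 * len(loop_a))


def similarity_matrix(loops):
    """Negative pairwise distances for the loops of one CDR-length."""
    return [[-loop_distance(a, b) for b in loops] for a in loops]
```

A matrix of negative distances is the usual input form for affinity propagation, so the output of `similarity_matrix` for one CDR-length could, for example, be passed to an implementation that accepts precomputed similarities, such as scikit-learn's `AffinityPropagation` with `affinity="precomputed"`. Working in dihedral space with a periodic distance avoids the need to superimpose loops before comparing them.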
Data set
As described in Materials and Methods, we used manually curated multiple-sequence alignments to construct hidden Markov models (HMMs) of the VH and VL domains. We used these models to search the entire set of PDB sequences to identify all PDB chains with antibody variable domains. There were a total of 923 antibody PDB entries that contain at least one hypervariable loop with all backbone atom positions defined. Since the asymmetric units of many PDB entries contain more than one copy of the
Discussion
In this work, we have revisited the problem of clustering the structures of the six CDR loops of antibodies. A thorough analysis such as this has not been accomplished since the work of Chothia et al. and Martin and Thornton in 1996–1997. The number of antibody structures is at least 5-fold larger now than it was then (and 15-fold larger than the set used by Chothia). Because of this, we have been able to remove questionable structures (those of low resolution or high-energy backbone
Hidden Markov models of the V domains of heavy and light chains
PSI-BLAST32 was used to search a database of all sequences in the PDB (the nonredundant sequence file pdbaanr available on our PISCES Web site),33, 34 using the variable domain regions of the antibody structure in PDB entry 1Q9R.14 Only sequences with greater than 35% identity and E-values better than 1.0 × 10⁻²⁰ were kept, such that only antibody domains remained (e.g., excluding T-cell receptors and other Ig sequences). The resulting heavy-chain and light-chain sequences were culled at 90% identity using
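The identity and E-value thresholds applied to the search hits can be expressed as a simple filter. The sketch below is only an illustration of those two cutoffs: the `Hit` record, its field names, and the chain identifiers are hypothetical, and it does not parse real PSI-BLAST output or perform the subsequent culling step.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    chain_id: str            # hypothetical identifier for a PDB chain
    percent_identity: float  # sequence identity to the query, in percent
    e_value: float           # smaller values indicate better hits


def keep_antibody_hits(hits, min_identity=35.0, max_e_value=1.0e-20):
    """Keep hits with identity above the threshold and E-value below the cutoff."""
    return [
        h for h in hits
        if h.percent_identity > min_identity and h.e_value < max_e_value
    ]
```

A hit at 80% identity with an E-value of 1e-50 would pass, while a hit at 30% identity or one with an E-value of 1e-5 would be dropped.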
Acknowledgements
This work was supported by National Institutes of Health grants P20 GM76222 and R01 GM84453 (R.L.D., principal investigator) and National Institutes of Health training grant T32 CA009035.
References
- et al. Canonical structures for the hypervariable regions of immunoglobulins. J. Mol. Biol. (1987)
- et al. Standard conformations for the canonical structures of immunoglobulins. J. Mol. Biol. (1997)
- et al. Structural families in loops of homologous proteins: automatic classification, modeling, and application to antibodies. J. Mol. Biol. (1996)
- et al. Conformations of the third hypervariable region in the VH domain of immunoglobulins. J. Mol. Biol. (1998)
- et al. Automated classification of antibody complementarity determining region 3 of the heavy chain (H3) loops into canonical forms and its application to protein structure prediction. J. Mol. Biol. (1998)
- et al. H3-rules: identification of CDR-H3 structures in antibodies. FEBS Lett. (1999)
- et al. Yet another numbering scheme for immunoglobulin variable domains: an automatic modeling and analysis tool. J. Mol. Biol. (2001)
- et al. Structural families in loops of homologous proteins: automatic classification, modelling and application to antibodies. J. Mol. Biol. (1996)
- et al. High-resolution crystal structure of the Fab-fragments of a family of mouse catalytic antibodies with esterase activity. J. Mol. Biol. (2003)
- et al. The role of hydrogen bonding via interfacial water molecules in antigen–antibody complexation. The HyHEL-10–HEL interaction. J. Biol. Chem. (2003)