Introduction
Proteins can recognize specific nucleotide sequences within DNA molecules by the combined effects of different types of interactions. These include specific patterns of hydrogen bonding, hydrophobic interactions and electrostatic interactions of amino acid residues with the bases and the phosphate backbone of the DNA. Water molecules and the flexibility of proteins and DNA can also aid the recognition process (Rhodes et al., 1996). Specificity can be achieved using α‐helices, β‐strands or peptide loops to form a backbone structure for the presentation of specific amino acids to the DNA target sequence. Protein‐DNA interfaces have been identified in some complexes, and in the case of a phage repressor, in which a helix is used for recognition, a rational change of amino acids on one side of the helix has led to a repressor with a new, but predicted, target sequence (Wharton and Ptashne, 1985). No general recognition code by which amino acids recognize bases exists, but the deduction of one may be possible in the favourable case of the α‐helix used by zinc fingers (reviewed in Choo and Klug, 1997). Such a code may be very difficult or impossible to find for other proteins that use more flexible β‐strands or polypeptide loops to recognize their DNA targets.
Type I restriction and modification (R‐M) systems are complex enzymes that recognize a specific nucleotide sequence in DNA by unknown mechanisms. A type I R‐M enzyme is made up of three subunits encoded by the genes hsdR, hsdM and hsdS. HsdM and HsdS comprise a methyltransferase (M2S1), which on association with HsdR forms an endonuclease (R2M2S1). The methyltransferase modifies hemimethylated target sequences produced after each round of DNA replication, while the endonuclease cleaves unmethylated DNA from foreign sources such as viruses (for review see Wilson and Murray, 1991; Redaschi and Bickle, 1996).
The HsdS subunit of type I systems confers sequence specificity. It recognizes an asymmetric bipartite nucleotide sequence; EcoKI, for example, recognizes the sequence AAC‐N6‐GTGC. HsdS contains two target recognition domains (TRDs), each of ∼160 amino acids; the N‐terminal TRD interacts with the trinucleotide (5′) component of its target sequence and the C‐terminal TRD with the tetranucleotide (3′) component. Each TRD functions independently of the other and they can be interchanged to generate novel hybrid specificities (Fuller‐Pace et al., 1984; Nagaraja et al., 1985; Cowan et al., 1989). However, a single TRD is not sufficient for DNA recognition, as a truncated HsdS subunit with only one TRD dimerizes to produce an enzyme which recognizes a symmetric bipartite nucleotide sequence (Abadjieva et al., 1993; Meister et al., 1993).
No type I R‐M enzyme has been crystallized and there is negligible information to imply which amino acids within a TRD interact with DNA. Random mutations were introduced into the 5′ region of hsdS coding for the 160 amino acids that include the N‐terminal TRD, with the aim of identifying those amino acids that confer specificity. In such experiments, it is important that the changes should be made randomly throughout the coding sequence of the TRD in order to avoid any bias imposed by the experimenters. It was anticipated that those amino acids that could be changed without altering the R‐M activity would be unlikely to be involved in target recognition. Amino acids for which substitutions resulted in the loss of both restriction and modification activity (i.e. an r−m− phenotype) would include those involved in a specific interaction with DNA.
As the result of random mutations, 79 of the 160 residues within the TRD were altered, and 94 of the 101 substitutions failed to lead to the loss of both restriction and modification activities, although some impaired restriction. Most of those rare mutations that resulted in the abolition of R‐M activity were within the interval between residues 80 and 110. On the basis of a model derived from sequence alignments, this same interval has been predicted to form the protein‐DNA interface comprising two loops flanking a β‐strand (Sturrock and Dryden, 1997). The analysis of the region identified by random mutagenesis was extended by using site‐directed mutagenesis to change some residues that the model predicts to be of significance. Enzymes from a selection of mutants were purified, and their properties, particularly DNA binding, examined. A strong correlation was found between phenotype and DNA‐binding affinity as determined by fluorescence anisotropy.
Discussion
Restriction enzymes must recognize their target sequences with great precision, otherwise the cutting of modified DNA as the result of recognition errors could have severe consequences. Nevertheless, no structural motif characteristic of target recognition has been identified. For type I R‐M systems, the most definitive experiments merely correlate entire TRDs of 150‐180 amino acids with the recognition of the tri‐ or tetra‐nucleotide component sequences (Cowan et al., 1989). We therefore applied random mutagenesis to the segment of the specificity genes of EcoKI that is responsible for the recognition of the 5′‐trinucleotide component of the target sequence. Random mutagenesis was chosen with the aim of identifying those amino acid substitutions that were innocuous as well as those that impaired specificity. In this context, it is noteworthy that, for the majority of amino acid substitutions, the genetic tests failed to detect any effect on the restriction and modification phenotypes. Unexpectedly common, but not obviously localized, were substitutions in HsdS that only impaired restriction but not modification. Such substitutions could affect the interaction of HsdS with HsdR (Zinkevich et al., 1992; Weiserova and Firman, 1998) or they could influence the interaction of the complex with its DNA target.
The structure of a complex of M.HhaI with its target sequence shows that two loops, and the β‐strand between them, interact with DNA (Klimasauskas et al., 1994). This region includes several very short conserved amino acid sequences which are found in the TRDs of C5‐methyltransferases (Cheng and Blumenthal, 1996; Lange et al., 1996). Comparisons of the sequences of 51 TRDs of type I R‐M systems have resulted in a multiple sequence alignment that supports a distant relationship between the structure of the type‐I TRDs and the known structure of the TRD of M.HhaI, a monomeric DNA methyltransferase (Sturrock and Dryden, 1997). Of the 51 TRD sequences examined, the N‐terminal TRD of EcoKI shows the closest similarity to the TRD of M.HhaI, sufficient to suggest that EcoKI might, like M.HhaI, interact with DNA via two loops and a β‐strand. On the basis of the alignment, these structures are predicted to occur between amino acids 84 and 121 of the N‐terminal TRD of EcoKI. The initial aim of the random mutagenesis was to identify amino acids close to the DNA, but the data are also relevant as a test of the predictive value of the model.
The majority of the mutations that conferred an r−m− phenotype, the phenotype anticipated for mutations leading to loss of specificity, were loosely clustered. The significance of this finding is enhanced when the substitutions are examined in the light of the structural model of Sturrock and Dryden (1997). In the context of this structural model, the loosely clustered mutations that confer an r−m− phenotype are preferentially located at the protein‐DNA interface (Figure 5); amino acid residues 91, 92, 103 and 107 appear close to the interface and our data have shown them to be important for DNA binding.
Chen et al. (1995) found that tyrosine 27 in the HsdS subunit of EcoKI could be crosslinked to DNA, implying its close proximity to DNA. However, replacing tyrosine with either cysteine or phenylalanine does not confer an r−m− phenotype (M.O'Neill, unpublished data). Y27 is not within the part of the TRD included in the model of Sturrock and Dryden (1997), and the phenotype of mutants with the changes Y27F and Y27C indicates that the tyrosine residue is not relevant to DNA specificity.
Taylor et al. (1996) have reported experiments in which the exposed lysine residues in EcoR124I were identified by chemical modification. Lysine residues 261, 297 and 327 in the carboxy‐TRD were susceptible to modification especially in the absence of bound DNA. K297 is the most strongly modified residue and it lies within the second of the two proposed loops (Sturrock and Dryden, 1997). The three lysine residues identified by chemical modification are conserved in the amino TRD of StySKI (Thorpe et al., 1997), which recognizes the same target sequence as the carboxy‐TRD of EcoR124I. The circumstantial evidence that is available is therefore consistent with the present data that implicate interaction of the putative loop‐β‐strand‐loop with the target sequence.
The T57P and the G141A substitutions are outside the confines of the protein‐DNA interface predicted by the model; nevertheless, they confer an r−m− phenotype. This may indicate that the model has incorrectly identified the interface or that the model is incomplete. However, the protein containing T57P may have a catalytic defect since the protein has the same subunit structure as the wild‐type enzyme and can still bind DNA. In addition, it was found that the substitution T57A did not affect modification, implying that DNA specificity was maintained. Therefore, there is no reason to implicate residue T57 in DNA specificity.
The effect of the substitution of glycine 141 with alanine is more difficult to explain. The protein with this substitution was purified largely in the M1S1 form, but this does not explain the defect in DNA binding. The inactivity may be attributed to a severe defect in the structure of the dimer, an indirect conformational change affecting the protein‐DNA interface or a direct effect at the interface. The last of these three alternatives would imply that the model, based on the relationship with M.HhaI, incompletely defines the protein‐DNA interface.
Comparing the normal EcoKI methyltransferase with proteins that contain amino acid substitutions but can still bind to their target sequence, one can calculate the free energy difference for DNA binding between the two proteins. The free energy difference is rather small, equivalent to the loss of at most one hydrogen bond in the protein‐DNA complex. This suggests that any structural perturbation due to the amino acid change is small. The free energy changes, and by implication any structural perturbation at the protein‐DNA interface, are much larger for amino acid changes that abolish DNA binding.
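The free energy comparison described above follows from the standard relation ΔΔG = RT ln(Kd,mutant/Kd,wild‐type). The sketch below is a minimal illustration of that arithmetic, not the authors' own calculation; the function name and the two Kd values are hypothetical placeholders, and the temperature of 25°C is taken from the anisotropy measurements described in Materials and methods.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def ddg_binding(kd_mutant, kd_wild_type, temp_c=25.0):
    """Return the binding free energy difference, RT*ln(Kd_mut/Kd_wt), in kcal/mol.

    A positive value means the mutant binds more weakly than the wild type.
    """
    if kd_mutant <= 0 or kd_wild_type <= 0:
        raise ValueError("dissociation constants must be positive")
    temp_k = temp_c + 273.15
    return R_KCAL * temp_k * math.log(kd_mutant / kd_wild_type)


# Hypothetical example values (not measurements from this study):
# a mutant that binds 10-fold more weakly than the wild-type enzyme.
ddg = ddg_binding(kd_mutant=10e-9, kd_wild_type=1e-9)
print(f"ddG = {ddg:.2f} kcal/mol")
```

Because the relation is logarithmic, each 10‐fold change in Kd at 25°C corresponds to roughly 1.4 kcal/mol, which is why modest changes in affinity translate into small free energy differences.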
We conclude that the loss of DNA binding in some of our mutants is due to the disruption of the protein‐DNA interface by a combination of steric hindrance, electrostatic interactions and loss of hydrogen bonding. The results of the random mutagenesis procedure on the amino‐TRD of EcoKI show that most of the mutations leading to an r−m− phenotype were confined to a limited region of the TRD which had been predicted to interact with DNA (Sturrock and Dryden, 1997). Mutations that did not alter the phenotype were spread throughout the TRD. Further site‐directed mutagenesis of the putative DNA‐binding region produced changes in the phenotype consistent with the structural model. Amino acid changes that had little effect on the modification phenotype, r+m+ or r−m+, had little effect on DNA binding. Four amino acid changes in the predicted DNA‐binding region that abolished modification activity in vivo prevented DNA binding in vitro. Most of the amino acid substitutions support the structural model and the methods used to derive it. Our study opens the way for further analysis of DNA sequence recognition by type I R‐M enzymes and identifies other amino acids potentially involved in the formation of an active protein‐DNA complex.
Materials and methods
Bacterial strains
All the bacterial strains used were derivatives of Escherichia coli K‐12. C600 (Appleyard, 1954) was used as a restriction‐proficient (r+m+), λ‐sensitive strain. Two r−m− derivatives of 71‐18 (Messing et al., 1977) were used for complementation tests. In the first, NM522 (Gough and Murray, 1983), hsdM and S are deleted; in the second, NM521kan (this work), the kanr 'Genblock' of Pharmacia is substituted for a segment of hsdM and S. The kanr marker facilitated monitoring the transfer of hsdS mutations to the hsd operon in the bacterial chromosome. A sup° Δ(hsd) strain, NM679 (King and Murray, 1995), was used to select λhsd phages lacking amber mutations (Webb et al., 1996), and as the host for propagating hsdM+S− plasmids for the production of mutant methyltransferase for purification. The hsdR endA recA strain, DH5α (Grant et al., 1990), was used as a competent host for the recovery and amplification of plasmid DNA for sequence analysis following random mutagenesis; XL1‐Blue was provided for use with the QuikChange™ mutagenesis kit of Stratagene.
Plasmids
The plasmid used for analysing random mutations in segments of hsdS included both hsdM and S, to enable tests for K‐specific modification, but it required new restriction sites to permit substitution of mutated DNA segments for the wild‐type sequences. New targets were incorporated, none of which altered an amino acid sequence. To ensure that all members of each random library included the mutagenized DNA, the rationale of displacing a readily identifiable 'stuffer' fragment, rather than the wild‐type sequence, was adopted.
The hsdM and S genes were subcloned from pJFMS (Dryden et al., 1993) using the PCR method of overlap extension (Ho et al., 1989). A BamHI‐SalI fragment and a SalI‐EcoRI fragment (see Figure 6A) were amplified by the PCR, and the two fragments were joined in a second PCR. The resulting BamHI‐EcoRI fragment was cloned in pUC18. The BamHI site is a natural sequence within hsdM, SalI provides a new target within the centre of hsdS and EcoRI is a new target that separates the hsd genes from irrelevant bacterial DNA present in pJFMS. The SmaI‐BamHI fragment including hsdM was isolated from pJFMS and added to reconstruct the hsdMS sequence in pUES6 (Figure 6A).
A BamHI‐SalI fragment including kanr replaced the BamHI‐SalI fragment of pUES6. The kanr gene (Pharmacia) from pGEM3kan (a gift from F.Fuller‐Pace) was first subcloned as a PstI fragment in pBluescript to provide the flanking BamHI and SalI targets. Mutagenesis of the sequence coding for the entire amino‐TRD commonly resulted in multiple amino acid changes. The target for mutagenesis was therefore split into two segments of equal size in order to maximize single mutations. To permit this, a unique BspEI site was introduced (pBSH1 in Figure 6B). pBSH1 was made by recloning the hsdMS genes in a derivative of pUC18 that lacked the HindIII target in the polylinker. An 18 bp stuffer fragment including diagnostic targets then replaced the BspEI‐HindIII fragment of hsdS.
In pXB1 (Figure 6C), a unique XhoI site was generated within a sequence only 30 bp from the 3′ end of hsdM of pBSH1. This permitted mutagenesis of the 5′ end of hsdS in the presence of very little hsdM sequence. The XhoI‐BspEI fragment in pXB1 was replaced with an 18 bp stuffer fragment.
pJES23 has the SmaI‐EcoRI fragment from pUES6 cloned in the expression vector pJF118 (Furste et al., 1986).
DNA manipulations
Klenow enzyme, T4 DNA ligase and all the restriction enzymes were supplied by NEB Biologicals, alkaline phosphatase by Epicentre Technologies and the Taq polymerase ('Red Hot' polymerase) by Advanced Biotechnologies.
Plasmid DNA was isolated from DH5α or XL1‐Blue strains using Qiagen Mini‐prep kits or Nucleon Mini‐prep kits (Amersham), and DNA sequences were determined using a Perkin‐Elmer ABI Prism 377 DNA Sequencer.
Site‐specific mutations were made, unless otherwise stated, using the 'QuikChange' site‐directed mutagenesis kit (Stratagene).
Random mutagenesis using PCR
The frequency with which mutations are found in a segment of DNA is a function of the polymerase error rate and the number of cycles in the PCR (Eckert and Kunkel, 1991). Therefore, the frequency of mutation can be manipulated experimentally by increasing the number of cycles and/or increasing the error rate of the polymerase. This latter factor can be adjusted by altering parameters such as the length of time at the extension temperature (72°C) (Zhou et al., 1991), the addition of transition metal ions, e.g. Mn2+ (Leung et al., 1989), and the dNTP composition and concentration (Spee et al., 1993). Two methods were used to generate random mutations: the first required the addition of MnCl2 to the reaction and was adapted from Leung et al. (1989). A PCR comprising 10 ng of template, 200 μM dNTPs and 200 nM primers was supplemented with 0.5 mM MnCl2. Reaction conditions were 35 cycles at 95°C for 1 min, 58°C for 1 min, 72°C for 3.5 min. The second method, using limiting dNTPs and a dITP supplement, was carried out as described in Spee et al. (1993), although the PCR conditions were as above.
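The dependence of mutation frequency on error rate, number of duplications and fragment length can be illustrated with a simple Poisson sketch. This is a deliberately simplified model, not the calculation used in this work: it assumes every base is equally mutable and that mutations accumulate linearly with the number of template duplications. The error rate and the number of duplications below are hypothetical; only the fragment lengths (766 and 233 bp) come from this paper.

```python
import math


def expected_mutations(error_rate, length_bp, doublings):
    """Mean mutations per molecule under the simplified linear model.

    error_rate: errors per base per duplication (hypothetical input)
    length_bp:  length of the mutagenized fragment
    doublings:  number of template duplications (hypothetical input)
    """
    return error_rate * length_bp * doublings


def poisson_pmf(k, lam):
    """Probability of exactly k mutations when the mean is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def single_fraction_among_mutants(lam):
    """Share of mutated clones that carry exactly one mutation."""
    if lam <= 0:
        raise ValueError("mean number of mutations must be positive")
    return poisson_pmf(1, lam) / (1.0 - poisson_pmf(0, lam))


# Hypothetical error rate and duplication count; fragment lengths from the text.
for length in (766, 233):
    lam = expected_mutations(error_rate=1e-4, length_bp=length, doublings=20)
    share = single_fraction_among_mutants(lam)
    print(f"{length} bp: mean {lam:.2f} mutations, "
          f"{share:.0%} of mutants are single")
```

Under these assumptions, a shorter target lowers the mean number of mutations per clone and so raises the proportion of mutants with a single change, which is the rationale for subdividing a mutagenesis target.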
Generating random mutations within the TRD
All the mutants are listed in Table I. The first target for mutagenesis, the 766 bp BamHI‐SalI fragment, generated mutants 1‐100. One PCR used a supplement of MnCl2 and the second was prepared using dITP and limiting dNTPs. DNA obtained from the mutagenic PCRs was digested with BamHI and SalI and ligated in pUEKanR. Ampicillin‐resistant transformants of DH5α that had lost resistance to kanamycin were presumed to have the correct fragment. Analysis of the DNA of the transformants showed that in 99% of them the kan gene had been replaced with a BamHI‐SalI fragment corresponding to the 766 bp fragment of the TRD.
A second library of mutants (those with numbers between 101 and 200) was obtained by random mutagenesis of the 233 bp BspEI‐HindIII fragment. The mutagenized DNA was digested with BspEI and HindIII, and cloned in pBCBH1. Transformants of DH5α were screened for the loss of the stuffer fragment; most had replaced the stuffer fragment with mutagenized hsdS DNA. Both methods of mutagenesis were used to generate mutations; the limiting‐dNTP/dITP method gave a lower yield of DNA and therefore the majority of mutants came from the PCR with MnCl2.
A third library of mutants (with numbers between 201 and 275) was obtained from random mutagenesis of the XhoI‐BspEI fragment of pXB1 using MnCl2. The mutagenized DNA was cloned in pXSC1 and treated in the same way as pBSH1 and pBCBH1.
Transfer of mutations to the chromosome
Mutations were transferred to the chromosome from a λhsd phage by homologous recombination (Gough and Murray, 1983). The hsdS gene tagged with kanr was transferred to the chromosome of the hsd+ strain 71‐18 to generate the r−m− derivative NM521kan. The kan marker was then replaced by hsd genes including point mutations in hsdS; any KanS strains should have acquired the hsdS region that had been subjected to mutagenesis.
Test for restriction and modification
The hsdR+M−S− strain NM522 transformed with hsdM+S− plasmids, and hsdR+M+S− strains in which the mutation was on the chromosome, were tested for their ability to restrict and modify unmodified λ phages (λvir.0), as described by Fuller‐Pace et al. (1985).
Protein and substrate preparation
Methyltransferases were purified from 500 ml cultures of NM679 carrying derivatives of pJES23. The protein concentrations were determined by UV spectroscopy as described previously (Dryden et al., 1993, 1997).
Synthetic, 21 bp unmethylated DNA oligonucleotide duplexes containing a hexachlorofluorescein label on the top strand were prepared as described previously (Powell et al., 1998a) and used to compare DNA binding by the methyltransferases via observation of the change in fluorescence anisotropy of the label which occurs upon protein binding. The DNA sequence of the top strand was 5′‐hexachlorofluorescein‐GCCTAACCACGTGGTGCGTAC‐3′. The 21 bp fluorescent duplex containing the EcoKI target has the same sequence as the central portion of the 45 bp duplex used in gel retardation (Powell et al., 1993).
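The position of the EcoKI target within a sequence such as this top strand can be checked with a short pattern search. The sketch below is illustrative only; the function and pattern names are hypothetical. It encodes the bipartite target AAC‐N6‐GTGC and its reverse complement (GCAC‐N6‐GTT) so that sites on either strand are reported.

```python
import re

# EcoKI target AAC-N6-GTGC on the given strand, and its reverse complement
# GCAC-N6-GTT for sites lying on the opposite strand. Lookaheads are used so
# that overlapping sites would also be reported.
ECOKI_FORWARD = re.compile(r"(?=AAC[ACGT]{6}GTGC)")
ECOKI_REVERSE = re.compile(r"(?=GCAC[ACGT]{6}GTT)")


def find_ecoki_sites(seq):
    """Return sorted (0-based start, strand) tuples for EcoKI targets in seq."""
    seq = seq.upper()
    hits = [(m.start(), "+") for m in ECOKI_FORWARD.finditer(seq)]
    hits += [(m.start(), "-") for m in ECOKI_REVERSE.finditer(seq)]
    return sorted(hits)


# Top strand of the 21 bp duplex described in the text.
top_strand = "GCCTAACCACGTGGTGCGTAC"
print(find_ecoki_sites(top_strand))  # [(4, '+')]
```

The single hit at position 4 corresponds to AAC, followed by the six‐base spacer CACGTG and then GTGC.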
Measurement of molecular weight and AdoMet binding
Protein was applied to an FPLC Superose12 gel filtration column (Pharmacia) which had been calibrated with proteins of known molecular weight. The buffer used was 20 mM Tris‐HCl, 0.1 M NaCl, 6 mM MgCl2, 7 mM β‐mercaptoethanol pH 8; the column flow rate was 1 ml/min and the elution profile was monitored at 280 nm. To determine the extent of AdoMet binding, protein and [3H‐methyl]AdoMet (Amersham) were mixed together at a concentration of 3 μM in 20 mM Tris, 20 mM MES, 0.2 M NaCl, 10 mM MgCl2, 7 mM β‐mercaptoethanol, 0.1 mM EDTA pH 8. Samples were then exposed to UV radiation for 20 min to crosslink the protein to the AdoMet. The samples were run on a 10% SDS‐polyacrylamide gel and transferred by electroblotting to a PVDF membrane. The dried membrane was then coated with enhanced autoradiography scintillating wax (EABiotech) according to the manufacturer's instructions and exposed to preflashed X‐ray film at −70°C. After development of the film, the PVDF blot was stripped of the EA‐wax by washing it with toluene and the proteins stained with Coomassie Blue. This allowed a direct matching of the cross‐linked proteins with the X‐ray film image.
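For the gel filtration step, a column calibrated with proteins of known molecular weight is commonly used by fitting log10(molecular weight) against elution volume and interpolating the unknown. The sketch below shows that general approach under stated assumptions; it is not a description of the calibration actually performed here, and the standards, volumes and function names are hypothetical placeholders chosen only to make the example run.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(standards):
    """Fit log10(MW) against elution volume.

    standards: list of (elution_volume_ml, molecular_weight_kda) pairs.
    """
    volumes = [v for v, _ in standards]
    log_mw = [math.log10(mw) for _, mw in standards]
    return fit_line(volumes, log_mw)


def estimate_mw(volume_ml, slope, intercept):
    """Interpolate a molecular weight (kDa) from an elution volume."""
    return 10 ** (slope * volume_ml + intercept)


# Hypothetical calibration standards (volume in ml, MW in kDa); placeholders only.
standards = [(10.0, 1000.0), (12.0, 100.0), (14.0, 10.0)]
slope, intercept = calibrate(standards)
print(f"estimated MW at 11.0 ml: {estimate_mw(11.0, slope, intercept):.0f} kDa")
```

The estimate is only as good as the assumption that the unknown has a shape comparable to the standards, since elution depends on hydrodynamic size rather than mass alone.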
Measurement of DNA binding with fluorescent oligonucleotides
The buffer used for fluorescence anisotropy experiments was 20 mM Tris‐HCl, 0.1 M NaCl, 6 mM MgCl2, 7 mM β‐mercaptoethanol pH 8. AdoMet (New England Biolabs) was present at 100 μM to ensure saturation of binding sites (Powell et al., 1993). Anisotropy measurements were performed at 25°C on 400 μl samples in an Edinburgh Instruments FS900T fluorimeter for DNA concentrations of 5 nM. The excitation wavelength was 530 nm, and the emission wavelength 570 nm with bandwidths of 3.6 and 10 nm, respectively. The excitation pathlength was 10 mm, the emission pathlength 2 mm. Small amounts of protein were added to the DNA solution in the cuvette using a microlitre syringe, and gently stirred. The cuvette was not removed from the instrument for these additions. The fluorescence intensity was measured with crossed and parallel polarizer orientations and the anisotropy calculated. The broad emission spectrum of the probe allowed the instrumental G factor to be set to 1 ± 0.01 by small adjustments of the emission slitwidths. Each protein titration was repeated at least in duplicate. Each titration took ∼45 min to complete. Data were fitted to a single‐site binding equation which accounted for significant concentrations of complex when the DNA concentration was similar to the Kd (Heyduk and Lee, 1990). The binding of protein to hexachlorofluorescein‐labelled 21 bp duplexes caused changes of <10% in the fluorescence emission intensity of the fluorescent probe, thereby nearly satisfying the assumption in the binding equation (Heyduk and Lee, 1990) that the quantum yield was invariant (Hill and Royer, 1997). The change in anisotropy reflected the increase in size and change of shape of the DNA duplex upon protein binding (Jameson and Sawyer, 1995; Hill and Royer, 1997).
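The two calculations described above can be sketched as follows. The anisotropy is computed from the parallel and crossed (perpendicular) intensities with the standard G‐factor correction, and the binding curve uses the usual quadratic single‐site expression, which allows for depletion of free protein when the DNA concentration is comparable to the Kd. This is a generic illustration of an equation of that kind, not necessarily the exact form given by Heyduk and Lee (1990). It assumes, as stated in the text, that the quantum yield does not change on binding, and all function and parameter names are hypothetical.

```python
import math


def anisotropy(i_parallel, i_perpendicular, g_factor=1.0):
    """Fluorescence anisotropy r = (Ipar - G*Iperp) / (Ipar + 2*G*Iperp)."""
    return (i_parallel - g_factor * i_perpendicular) / (
        i_parallel + 2.0 * g_factor * i_perpendicular
    )


def fraction_bound(protein_total, dna_total, kd):
    """Fraction of DNA in complex for 1:1 binding, with ligand depletion.

    Solves Kd = [P][D]/[PD] using total concentrations (all in the same units).
    """
    b = protein_total + dna_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * protein_total * dna_total)) / 2.0
    return complex_conc / dna_total


def observed_anisotropy(protein_total, dna_total, kd, r_free, r_bound):
    """Anisotropy as a weighted average of free and bound DNA.

    Valid only if the quantum yield of the probe is the same in both states.
    """
    f = fraction_bound(protein_total, dna_total, kd)
    return r_free + (r_bound - r_free) * f


# Example with 5 nM DNA (as in the text) and hypothetical Kd and anisotropy limits.
for protein_nm in (0.0, 5.0, 50.0):
    r = observed_anisotropy(protein_nm * 1e-9, 5e-9, kd=5e-9,
                            r_free=0.10, r_bound=0.20)
    print(f"{protein_nm:5.1f} nM protein: r = {r:.3f}")
```

In a fit, Kd and the two limiting anisotropies would be adjusted to minimize the difference between this model and the measured titration.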