DTD 5

ARTICLE IN PRESS

doi:10.1016/j.jmb.2005.10.065

J. Mol. Biol. (2005) xx, 1–16

Engineering of Large Numbers of Highly Specific Homing Endonucleases that Induce Recombination on Novel DNA Targets

Sylvain Arnould1†, Patrick Chames1†, Christophe Perez1, Emmanuel Lacroix1, Aymeric Duclert1, Jean-Charles Epinat1, François Stricher3, Anne-Sophie Petit1, Amélie Patin1, Sophie Guillier1, Sandra Rolland1, Jesús Prieto2, Francisco J. Blanco2, Jerónimo Bravo2, Guillermo Montoya2, Luis Serrano3, Philippe Duchateau1 and Frédéric Pâques1*

1CELLECTIS S.A., 102 route de Noisy, 93235 Romainville, France

2Structural Biology and Biocomputing Programme, Centro Nacional de Investigaciones Oncológicas, C/ Melchor Fdez Almagro, 28029 Madrid, Spain

3European Molecular Biology Laboratory, Meyerhofstrasse, D-62117 Heidelberg, Germany

The last decade has seen the emergence of a universal method for precise and efficient genome engineering. This method relies on the use of sequence-specific endonucleases such as homing endonucleases. The structures of several of these proteins are known, allowing for site-directed mutagenesis of residues essential for DNA binding. Here, we show that a semi-rational approach can be used to derive hundreds of novel proteins from I-CreI, a homing endonuclease from the LAGLIDADG family. First, these novel endonucleases display a wide range of cleavage patterns in yeast and mammalian cells that, in most cases, are highly specific and distinct from that of I-CreI. Second, rules for protein/DNA interaction can be inferred from statistical analysis. Third, novel endonucleases can be combined to create heterodimeric protein species, thereby greatly increasing the number of potential targets. These results describe a straightforward approach for engineering novel endonucleases with tailored specificities while preserving the activity and specificity of natural homing endonucleases, and thereby deliver new tools for genome engineering. © 2005 Published by Elsevier Ltd.

*Corresponding author

Keywords: I-CreI; homing endonucleases; protein engineering; specificity; recombination

† S.A. & P.C. contributed equally to this work.
Abbreviations used: wt, wild-type; ORF, open reading frame; CHO, Chinese hamster ovary; HE, homing endonuclease.

Introduction

Meganucleases are by definition sequence-specific endonucleases with large (>12 bp) cleavage sites, and they can be used to achieve very high gene targeting efficiencies in mammalian cells and plants,1–6 making meganuclease-induced recombination an efficient and robust method for genome engineering. The major limitation of the current technology is the requirement for the prior introduction of a meganuclease cleavage site into the locus of interest. Thus, the generation of novel meganucleases with tailored specificities is under intense investigation. Such proteins could be used to cleave endogenous chromosomal sequences, opening a wide range of applications, including the correction of mutations responsible for inherited monogenic diseases. Recently, fusions of Cys2-His2 zinc-finger proteins (ZFPs) with the catalytic domain of the FokI nuclease were used to make functional sequence-specific endonucleases.7,8 The binding specificity of ZFPs is relatively easy to manipulate, and a repertoire of novel artificial ZFPs able to bind many (G/A)NN(G/A)NN(G/A)NN sequences is now available.9–11 Nevertheless, preserving a very narrow specificity is one of the major issues for genome engineering applications, and it is presently unclear whether ZFPs would fulfil the very strict requirements of therapeutic applications.


Homing endonucleases (HEs) are a widespread family of natural meganucleases comprising hundreds of proteins.12 These proteins are encoded by mobile genetic elements, which propagate by a process called "homing": the endonuclease cleaves a cognate allele from which the mobile element is absent, thereby stimulating a homologous recombination event that duplicates the mobile DNA into the recipient locus.13,14 Given their natural function and their exceptional specificity, HEs provide ideal scaffolds from which to derive novel endonucleases for genome engineering. Over the last decade, data have accumulated characterizing the LAGLIDADG family, the largest of the four HE families.12 LAGLIDADG refers to the only sequence actually conserved throughout the family, which is found in one or (more often15) two copies in the protein. Proteins with a single motif, such as I-CreI, form homodimers and cleave palindromic or pseudo-palindromic DNA sequences, whereas the larger, double-motif proteins, such as I-SceI, are monomers and cleave non-palindromic targets. Seven different LAGLIDADG proteins have been crystallized, and they exhibit a striking conservation of the core structure that contrasts with the lack of similarity at the primary sequence level.16–24 In this core structure, two characteristic αββαββα folds, contributed by two monomers or by two domains in double-LAGLIDADG proteins, face each other with 2-fold symmetry. DNA binding depends on the four β strands from each domain, folded into an antiparallel β-sheet that forms a saddle on the major groove of the DNA helix. The catalytic core is central, with contributions from both symmetric monomers/domains.
In addition to this core structure, other domains can be found: for example, PI-SceI, an intein, has a protein-splicing domain and an additional DNA-binding domain.20,25 These structural similarities prompted the construction of chimeric and single-chain artificial HEs,26–28 showing that these proteins are robust enough to withstand extensive modifications. Recently, Seligman and co-workers used a rational approach to substitute specific individual residues of the I-CreI αββαββα fold, and observed substantial cleavage of novel targets.29,30 Similarly, Gimble et al. modified the additional DNA-binding domain of PI-SceI and obtained variant proteins with altered binding specificity.31 Here, we used a rational approach together with a cell-based high-throughput assay that directly monitors endonuclease-induced recombination in living cells to identify functional variants of the homodimeric I-CreI protein. Hundreds of homing endonucleases with new specificities were identified, cleaving a total of 38 DNA targets. First, many of the new proteins retained a very narrow specificity and cleaved their new targets efficiently in living cells, while showing no detectable activity on the wild-type I-CreI cleavage site, indicating that engineering can preserve both efficacy and specificity. Second, statistical analysis revealed rules for protein/DNA interaction. Finally, novel proteins could be combined into functional heterodimers to cleave chimeric targets. Our results describe a rational approach to rapidly evolve the specificity of homing endonucleases by collecting large samples of novel endonucleases and combining them to achieve tailored specificities.

Results

Screening for new functional endonucleases

I-CreI is a dimeric homing endonuclease that cleaves a 22 bp pseudo-palindromic target. Analysis of the I-CreI structure bound to its natural target shows that, in each monomer, eight residues establish direct interactions with seven bases.16 Residues Q44, R68 and R70 contact three consecutive base-pairs at positions +3 to +5 (and −3 to −5; Figure 1). An exhaustive protein library versus target library approach was undertaken to locally engineer this part of the DNA-binding interface. First, residue D75 of the I-CreI scaffold was mutated to N. In the I-CreI structure, the basic residues R68 and R70 satisfy the hydrogen-acceptor potential of the buried D75; the D75N mutation was intended to relieve the energetic strain likely to arise when R68 and R70 are replaced in the library. The D75N mutation did not affect the protein structure, but decreased the toxicity of I-CreI in overexpression experiments (data not shown). Since I-CreI and its N75 derivative display similar in vitro activities (data not shown) and levels of specificity (see below), further studies will be necessary to determine the origin of I-CreI toxicity. Next, positions 44, 68 and 70 were randomized, and 64 palindromic targets were generated by substituting positions ±3, ±4 and ±5 of a palindromic target cleaved by I-CreI,18 as described in the legend to Figure 1. Variant proteins, which all carry the D75N mutation, are named after the residues found at positions 44, 68 and 70 (for example, the I-CreI N75 scaffold is QRR, and KTG is I-CreI K44 T68 G70 N75). Palindromic targets are named after the three bases found on the top strand at positions −5, −4 and −3 (for example, the CAT target has C at −5, A at −4 and T at −3).
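The naming scheme and the 64 palindromic targets can be generated programmatically. The sketch below is a minimal illustration, assuming that every target shares the 24 bp backbone visible in the GTT and CCT sequences of Figure 5(a): TCAAAAC, the −5 to −3 triplet, the central GTAC, the reverse complement of the triplet, then GTTTTGA. The helper names (`palindromic_target`, `variant_name`, `TARGETS`) are hypothetical.

```python
from itertools import product

# Backbone read off the GTT and CCT targets in Figure 5(a); assuming it is
# shared by all 64 targets is an assumption of this sketch.
LEFT_FLANK = "TCAAAAC"   # positions -12 to -6
CENTER = "GTAC"          # positions -2 to +2
RIGHT_FLANK = "GTTTTGA"  # positions +6 to +12
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def palindromic_target(triplet):
    """24 bp palindrome named after the top-strand bases at -5, -4 and -3."""
    return LEFT_FLANK + triplet + CENTER + reverse_complement(triplet) + RIGHT_FLANK


def variant_name(code):
    """Expand a three-letter variant name (residues 44, 68, 70); all carry N75."""
    r44, r68, r70 = code
    return f"I-CreI {r44}44 {r68}68 {r70}70 N75"


# All 64 palindromic targets, keyed by their three-letter name.
TARGETS = {"".join(t): palindromic_target("".join(t)) for t in product("ACGT", repeat=3)}

if __name__ == "__main__":
    print(variant_name("KTG"))  # I-CreI K44 T68 G70 N75
    print(TARGETS["GTT"])       # TCAAAACGTTGTACAACGTTTTGA
    print(len(TARGETS))         # 64
```

Because the flanks are themselves reverse complements of each other, every generated target reads the same on both strands, which is what allows a homodimer to bind it.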
We have previously described an assay to monitor homing endonuclease-induced recombination in yeast cells.26 A modified assay, whose principle is described in the legend to Figure 2(a), includes a mating step and allows the same meganuclease to be screened against several different targets. We used a robot-assisted mating protocol to screen a large number of endonucleases from our library; the general screening strategy is described in the legend to Figure 2(b). A total of 13,824 homing endonuclease-expressing clones (about 2.3-fold the theoretical diversity) were spotted at high density (20 spots/cm²) on nylon filters and individually tested against each of the 64 target strains (884,608 spots). A total of 2100 clones showing activity against at least one target were isolated (Figure 2(b)), and the open reading frame (ORF) encoding the homing endonuclease was amplified by PCR and sequenced. A total of 410 different sequences were identified, and a similar number of corresponding clones were chosen for further analysis. The spotting density was reduced to four spots/cm², and each clone was tested against the 64 reporter strains in quadruplicate, thereby creating complete profiles (as in Figure 3(a)). A total of 350 positives were confirmed. Next, to rule out strains containing more than one clone, mutant ORFs were amplified by PCR and recloned into the yeast vector. The resulting plasmids were individually transformed back into yeast. A total of 294 such clones were obtained and tested at low density (four spots/cm²). Differences from the primary screening were observed mostly for weak signals, with 28 weak cleavers now appearing as negatives. Only one positive clone displayed a pattern different from that observed in the primary profiling.

The 350 validated clones showed very diverse patterns. Some of these new profiles shared some similarity with that of the wild-type scaffold, whereas many others were totally different. Various examples are shown in Figure 3(a). Homing endonucleases can usually accommodate some degeneracy in their target sequences, and one of our first findings was that the original I-CreI protein itself cleaves seven different targets in yeast. In a former study, Monnat and co-workers used an in vitro selection experiment to assess the degeneracy of I-CreI cleavage, and had already shown that GCC, TCC and ATC were cleavable, although the impact of mutations outside positions −5 to −3 was unclear in that study.32 Other sequences found by Monnat and co-workers, such as ATT and GGC, were not detected here, reflecting either the sensitivity of our yeast assay or the impact of the adjacent mutations.
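The "about 2.3-fold the theoretical diversity" figure is consistent with counting diversity at the codon level. The sketch below assumes that each randomized position used a VVK degenerate codon (V = A/C/G, K = G/T). This is an assumption: VVK encodes exactly the 12 residues ADEGHKNPQRST listed in the Figure 1 legend, but the codon is not named in this section. Under that assumption, 18 codons per position give 18³ = 5,832 DNA-level variants (12³ = 1,728 protein variants), and 13,824/5,832 ≈ 2.37. The function names are hypothetical, and the coverage estimate assumes uniform, independent sampling of clones.

```python
from collections import Counter
from itertools import product
from math import prod

# Assumption: each randomized codon is VVK (V = A/C/G, K = G/T). This encodes
# exactly the 12 residues ADEGHKNPQRST, but the source does not name the codon.
GENETIC_CODE_VVK = {
    "AAG": "K", "AAT": "N", "ACG": "T", "ACT": "T", "AGG": "R", "AGT": "S",
    "CAG": "Q", "CAT": "H", "CCG": "P", "CCT": "P", "CGG": "R", "CGT": "R",
    "GAG": "E", "GAT": "D", "GCG": "A", "GCT": "A", "GGG": "G", "GGT": "G",
}
VVK_CODONS = ["".join(c) for c in product("ACG", "ACG", "GT")]
CODONS_PER_RESIDUE = Counter(GENETIC_CODE_VVK[c] for c in VVK_CODONS)

POSITIONS = 3  # residues 44, 68 and 70
DNA_DIVERSITY = len(VVK_CODONS) ** POSITIONS              # 18**3 = 5832
PROTEIN_DIVERSITY = len(CODONS_PER_RESIDUE) ** POSITIONS  # 12**3 = 1728


def fold_coverage(n_clones):
    """Clones screened divided by the DNA-level (codon) diversity."""
    return n_clones / DNA_DIVERSITY


def expected_protein_coverage(n_clones):
    """Expected fraction of protein variants sampled at least once, assuming
    clones are drawn uniformly and independently from the codon library."""
    total = 0.0
    for multiplicities in product(CODONS_PER_RESIDUE.values(), repeat=POSITIONS):
        p = prod(multiplicities) / DNA_DIVERSITY  # chance one clone encodes this variant
        total += 1 - (1 - p) ** n_clones
    return total / PROTEIN_DIVERSITY


if __name__ == "__main__":
    print(DNA_DIVERSITY, PROTEIN_DIVERSITY)  # 5832 1728
    print(round(fold_coverage(13_824), 2))   # 2.37
    print(round(expected_protein_coverage(13_824), 3))
```

Residues encoded by several codons (R by three; A, G, P and T by two) are over-represented, so protein-level coverage is higher than the simple Poisson estimate 1 − e^(−2.37) for a single codon combination.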
Figure 1. Rationale of the experiment. (a) Localization of the area of the binding interface (bottom view) chosen for randomization (green) and interacting base-pairs (−3 orange, −4 pink, −5 magenta). (b) Zoom showing residues 44, 68 and 70 chosen for randomization, D75 and interacting base-pairs. R68 and R70 make classic interactions with two guanine bases, in which the NH1 and NH2 groups of the arginine side-chain hydrogen bond with the N7 and O6 groups of the guanine ring. The Q44 side-chain makes a hydrogen bond to the N7 group of an adenine. These residues are held in place by a hydrogen-bond network involving the side-chain OG group of T46 and the side-chain OD1 and OD2 groups of D75. (c) Design of the library and targets. The interactions of I-CreI residues Q44, R68 and R70 with DNA targets are indicated (top). The target described here is a palindrome derived from the I-CreI natural target, and cleaved by I-CreI.18 Cleavage positions are indicated by arrowheads. In the library, residues 44, 68 and 70 are each replaced with one of the 12 amino acids A, D, E, G, H, K, N, P, Q, R, S and T. Since I-CreI is a homodimer, the library was screened with palindromic targets. A total of 64 palindromic targets resulting from substitutions at positions ±3, ±4 and ±5 were generated. A few examples of such targets are shown (bottom).

Figure 2. Screening. (a) Principle of the yeast screening assay. A strain harbouring the expression vector encoding the variants is mated with a strain harbouring a reporter plasmid. In the reporter plasmid, a LacZ reporter gene is interrupted by an insert containing the site of interest, flanked by two direct repeats. Upon mating, the endonuclease (grey oval) generates a double-strand break at the site of interest, allowing restoration of a functional LacZ gene (white oval) by single-strand annealing (SSA) between the two flanking direct repeats. (b) Scheme of the experiment. A library of I-CreI variants is built using PCR, cloned into a replicative yeast expression vector and transformed into S. cerevisiae strain FYC2-6A (MATα, trp1Δ63, leu2Δ1, his3Δ200). The 64 palindromic targets are cloned into the LacZ-based yeast reporter vector, and the resulting clones are transformed into strain FYBL2-7B (MATa, ura3Δ851, trp1Δ63, leu2Δ1, lys2Δ202). Robot-assisted gridding on filter membranes is used to perform mating between individual clones expressing homing endonuclease variants and individual clones harbouring a reporter plasmid. After primary high-throughput screening, the ORFs of positive clones are amplified by PCR and sequenced. A total of 410 different variants were identified among the 2100 positives and tested at low density to establish complete patterns, and 350 clones were validated. In addition, 294 mutants were recloned into yeast vectors and tested in a secondary screen, which confirmed the results obtained without recloning. Chosen clones are then assayed for cleavage activity in a similar CHO-based assay and eventually in vitro.

Many of our mutants displayed cleavage degeneracy as well, with the number of cleaved sequences ranging from one to 21 and an average of 5.0 sequences cleaved (standard deviation = 3.6). Interestingly, in 50 mutants (14%), specificity was altered so that they cleaved exactly one target; 37 (11%) cleaved two targets, 61 (17%) cleaved three targets and 58 (17%) cleaved four targets. For five targets and above, percentages were lower than 10%. Altogether, 38 targets were cleaved by the mutants (Figure 4(a)). Notably, GTT has been identified in the cleavage site of two I-CreI homologues isolated from green algal chloroplast introns.15 Cleavage was barely observed on targets with A at position ±3, and never on targets with TGN or CGN at positions ±5, ±4 and ±3.
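The degeneracy statistics above are simple counts over the 350 validated clones (for example, 50/350 ≈ 14% cleave exactly one target). A minimal sketch of how such a summary can be computed from a map of variant to cleaved targets; the input format and the function name are hypothetical, the example data are toy values rather than measurements, and because the text does not say whether the reported standard deviation is the sample or population form, the sample form is assumed.

```python
from collections import Counter
from statistics import mean, stdev


def specificity_summary(profiles):
    """Summarize cleavage degeneracy from per-variant sets of cleaved targets.

    profiles: mapping variant name -> set of cleaved target names.
    Variants that cleave nothing are ignored (every validated clone in the
    screen cleaved at least one target).
    """
    sizes = [len(targets) for targets in profiles.values() if targets]
    histogram = Counter(sizes)
    return {
        "n_variants": len(sizes),
        "mean": mean(sizes),
        "sd": stdev(sizes),  # sample standard deviation (assumption)
        "fraction_by_size": {k: histogram[k] / len(sizes) for k in sorted(histogram)},
        "targets_cleaved": set().union(*profiles.values()),
    }


if __name__ == "__main__":
    # Toy profiles for three hypothetical variants, not data from the screen.
    toy = {"V1": {"GTT"}, "V2": {"GTT", "GTC"}, "V3": {"CCT", "TCT", "TCC"}}
    summary = specificity_summary(toy)
    print(summary["fraction_by_size"])
    print(len(summary["targets_cleaved"]))  # 5
```

Applied to all 350 validated profiles, `fraction_by_size` would give the percentages quoted above, and the size of `targets_cleaved` the total number of cleaved targets.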

Novel homing endonucleases can cleave novel targets while keeping high activity and narrow specificity

Eight representative mutants (belonging to six different clusters, see below) were chosen for further characterization (Figure 3). We first determined whether the yeast data could be confirmed in mammalian cells, using an assay based on the transient cotransfection of a homing endonuclease expression vector and a target vector, as described in previous reports.26,33 The eight mutant ORFs and the 64 targets were cloned into appropriate vectors, and we used a robot-assisted, microtitre-plate-based protocol to co-transfect Chinese hamster ovary (CHO) cells with each selected variant and each of the 64 reporter plasmids. Homing endonuclease-induced recombination was measured by a standard quantitative ONPG assay that monitors the restoration of a functional β-galactosidase gene. Profiles were qualitatively and quantitatively reproducible in five independent experiments (not shown). As shown in Figure 3(a), strong and medium signals were nearly always observed in both yeast and CHO cells (with the exception of ADK), thereby validating the relevance of our yeast high-throughput screening (HTS) process. However, weak signals observed in yeast were often not detected in CHO cells, likely owing to a difference in detection threshold (see QRR and targets GTG, GCT and TTC). Four mutants were also produced in Escherichia coli and purified by metal affinity chromatography. Their relative in vitro cleavage efficiencies against the wild-type site and their cognate sites were determined. The extent of cleavage under standardized conditions was assessed across a broad range of protein concentrations for the mutants (Figure 3(b)). Similarly, we analysed the activity of wild-type I-CreI on these targets. In many cases, 100% cleavage of the substrate could not be achieved, likely reflecting the fact that these proteins may have little or no turnover.34,35 In general, the in vitro assays confirmed the data obtained in yeast and CHO cells. Surprisingly, the GTT target was efficiently cleaved by I-CreI in vitro, whereas no cleavage was observed in yeast in this study, nor in bacteria in a previous report from Seligman and co-workers.30
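The concentration giving half-maximal cleavage can be read off a titration such as that in Figure 3(b). The text does not state how these values were derived, so the sketch below is only one simple option: linear interpolation between the two measurements that bracket half of the highest observed cleavage, with the plateau taken as the observed maximum because cleavage often stays below 100%. The function name and the example titration are hypothetical.

```python
def half_max_concentration(concs_nM, cleavage_pct):
    """Concentration (nM) at which cleavage reaches half of its observed maximum.

    Uses linear interpolation between the two bracketing measurements; the
    plateau is the highest observed value, since cleavage may stay below 100%
    when the enzyme shows little or no turnover.
    """
    points = sorted(zip(concs_nM, cleavage_pct))
    half = max(y for _, y in points) / 2
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 < half <= y1:
            return x0 + (half - y0) * (x1 - x0) / (y1 - y0)
    raise ValueError("half-maximal cleavage is not bracketed by the data")


if __name__ == "__main__":
    # Hypothetical titration, not data from the paper.
    concs = [0, 10, 20, 40, 80]
    cleaved = [0, 10, 30, 50, 60]
    print(half_max_concentration(concs, cleaved))  # 20.0
```

Fitting a saturation curve would be more robust to noise; interpolation is shown only because it needs no fitting library.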

[Figure 3(b) plots: % cleavage versus protein concentration (0 to 84 nM) for I-CreI and the QAN, DRK, RAT and KTG mutants on GTC, GTT, CCT and GGG targets; raw gels for I-CreI versus GTC and GTT.]

Figure 3. Cleavage patterns. Mutants are identified by three letters, corresponding to the residues at positions 44, 68 and 70. Each mutant is tested against the 64 targets derived from the I-CreI natural target and a series of control targets. The target map is indicated in the top right panel. (a) Cleavage patterns in yeast (left) and mammalian cells (right) for the I-CreI protein and eight derivatives. For yeast, the initial raw data (filter) are shown. For CHO cells, quantitative raw data (ONPG measurement) are shown; values above 0.25 are highlighted in light blue, values above 0.5 in medium blue and values above 1 in dark blue. LacZ, positive control; 0, no target; U1, U2 and U3, three different uncleaved controls. (b) Cleavage in vitro. I-CreI and four mutants are tested against a set of two or four targets, including the target giving the strongest signal in yeast and CHO cells. Digests are performed at 37 °C for 1 h with 2 nM linearized substrate, as described in Materials and Methods. Raw data are shown for I-CreI with two different targets. Cleavage of the GGG and CCT targets by I-CreI is not detected.

First, specificity shifts were obvious from the profiles obtained in yeast and CHO cells: the GTC target preferred by I-CreI was not cleaved, or barely cleaved, while signals were observed with new targets. This switch of specificity was confirmed in vitro for QAN, DRK, RAT and KTG (Figure 3(b)). Second, these four mutants, which display various levels of activity in yeast and CHO cells (Figure 3(a)), cleaved 17–60% of their favourite target in vitro (Figure 3(b)), with kinetics similar to those of I-CreI (half-maximal cleavage at 13–25 nM). This preservation of activity may partly reflect the sensitivity of our yeast assay: proteins with lower activity may have been present in the library but not identified in the primary screen. Nevertheless, among the identified variants, activity was largely preserved by engineering. Third, the number of cleaved targets varied among the mutants: strong cleavers such as QRR, QAN, ARL and KTG have a cleavage spectrum in the range of that observed with I-CreI (five to eight detectable signals in yeast, three to six in CHO cells). Specificity is more difficult to compare for mutants that cleave weakly. For example, the single weak signal observed with DRK might be the only detectable remnant of a more complex, attenuated pattern. Nevertheless, the behaviour of the strong cleavers shows that engineering can preserve a very narrow specificity.

Hierarchical clustering defines seven I-CreI variant families

Next, hierarchical clustering was used to determine whether families could be identified among the numerous and diverse cleavage patterns of the variants. Since primary and secondary screening gave congruent results, we used quantitative data from the first round of low-density yeast screening, which provided a larger sample. Both variants and targets were clustered using standard hierarchical clustering with Euclidean distance and Ward's method,36 and seven clusters were defined (Figure 4(b)). Detailed analysis is shown for three of them (Figure 4(c)), and the results are summarized in Table 1. For each cluster, a set of preferred targets could be identified on the basis of the frequency and intensity of the signal (Figure 4(c)). The three preferred targets for each cluster are indicated in Table 1, together with their cleavage frequencies. The sum of these three frequencies measures the specificity of the cluster. For example, in cluster 1, the three preferred targets (GTT, GTC and GTG) account for 78.1% of the observed cleavage, with 46.2% for GTT alone, revealing a very narrow specificity. Indeed, this cluster includes several proteins that, like QAN, cleave mostly GTT (Figure 3(a)). In contrast, the three preferred targets in cluster 2 represent only 36.6% of all observed signals. In accordance with the relatively broad and diverse patterns observed in this cluster, QRR cleaves five targets (Figure 3(a)), while the activity of other cluster members is not restricted to these five targets.
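The specificity measure used for each cluster (the summed share of its three preferred targets) can be computed from per-variant signal intensities. A minimal sketch, assuming that the cleavage index behind Table 1 is each target's share of the cluster's summed signal intensity (the source defines it only by reference to the Figure 4(c) legend); the function name and the toy data are hypothetical.

```python
from collections import defaultdict


def preferred_targets(signals, top=3):
    """Rank targets for one cluster by their share of the cluster's total signal.

    signals: iterable of (variant, target, intensity) tuples for cluster members.
    Returns ([(target, percent), ...], S), where S is the summed share of the
    top targets: higher S means a narrower cluster specificity.
    """
    totals = defaultdict(float)
    for _variant, target, intensity in signals:
        totals[target] += intensity
    grand_total = sum(totals.values())
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]
    shares = [(target, 100 * value / grand_total) for target, value in ranked]
    return shares, sum(percent for _, percent in shares)


if __name__ == "__main__":
    # Toy cluster of three hypothetical variants, not data from the screen.
    toy = [("v1", "GTT", 3.0), ("v1", "GTC", 2.0), ("v2", "GTT", 2.0),
           ("v2", "GTC", 1.0), ("v2", "GTG", 1.5), ("v3", "TCT", 0.5)]
    shares, s = preferred_targets(toy)
    print(shares, s)
```

For cluster 1 in Table 1, the same sum is 46.2 + 18.3 + 13.6 = 78.1%.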

Figure 4. Statistical analysis. (a) Cleaved targets. Targets cleaved by I-CreI variants are coloured in grey. The number of proteins cleaving each target is shown below, and the level of grey is proportional to the average signal intensity obtained with these cutters in yeast. (b) Hierarchical clustering of mutant and target data in yeast. Both mutants and targets were clustered using hierarchical clustering with Euclidean distance and Ward's method.36 Clustering was done with the hclust function of R. Mutant and target dendrograms were reordered to optimize the positions of the clusters, and the mutant dendrogram was cut at a height of 8, with the resulting clusters alternately coloured in blue and green. The QRR mutant and the GTC target are indicated by an arrow. Grey levels reflect the intensity of the signal. (c) Analysis of three of the seven clusters. For each mutant cluster (clusters 1, 3 and 7), the cumulative intensity for each target was computed, and a bar plot (left column) shows the normalized intensities in decreasing order. For each cluster, the number of residues of each amino acid type at each position (44, 68 and 70) is shown as a colour-coded histogram (right column). The amino acid colour code is given at the bottom of the Figure.
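The clustering in Figure 4(b) was done with R's hclust. The sketch below is not that call: it is a minimal, dependency-free Python illustration of agglomerative clustering with Ward's criterion on squared Euclidean distances, using the Lance–Williams update. For simplicity it stops at a fixed number of clusters (seven in this study) instead of cutting the dendrogram at a height. The function name and input format are hypothetical; each variant would be represented by its signal intensities on the same ordered list of targets.

```python
from itertools import combinations


def ward_clusters(profiles, n_clusters):
    """Agglomerative clustering with Ward's criterion (Lance-Williams update).

    profiles: variant name -> list of signal intensities on the same ordered targets.
    Merging stops when n_clusters clusters remain; returns a list of name sets.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    names = list(profiles)
    members = {i: {name} for i, name in enumerate(names)}
    size = {i: 1 for i in members}
    # Squared Euclidean distances between singleton clusters, keyed (low, high).
    dist = {
        (i, j): sum((a - b) ** 2 for a, b in zip(profiles[names[i]], profiles[names[j]]))
        for i, j in combinations(members, 2)
    }
    next_id = len(names)
    while len(members) > n_clusters:
        # Merge the closest pair of clusters.
        (i, j), d_ij = min(dist.items(), key=lambda item: item[1])
        new = next_id
        next_id += 1
        for k in members:
            if k in (i, j):
                continue
            d_ik = dist[(min(i, k), max(i, k))]
            d_jk = dist[(min(j, k), max(j, k))]
            n_i, n_j, n_k = size[i], size[j], size[k]
            # Lance-Williams update for Ward's method.
            dist[(k, new)] = (
                (n_i + n_k) * d_ik + (n_j + n_k) * d_jk - n_k * d_ij
            ) / (n_i + n_j + n_k)
        # Drop every distance involving the two merged clusters.
        dist = {key: d for key, d in dist.items() if i not in key and j not in key}
        members[new] = members.pop(i) | members.pop(j)
        size[new] = size.pop(i) + size.pop(j)
    return list(members.values())


if __name__ == "__main__":
    # Toy two-target profiles for five hypothetical variants.
    toy = {"a": [0, 0], "b": [0, 1], "c": [10, 10], "d": [10, 11], "e": [20, 0]}
    print(sorted(sorted(c) for c in ward_clusters(toy, 3)))
```

The naive search for the closest pair is cubic in the number of variants, which remains workable for a few hundred profiles; a library implementation would be preferable for larger data sets.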


Table 1. Cluster analysis

Cluster | Proteins | Three preferred targets (% cleavage)ᵃ | Sum S (%) | Nucleotide at position 4, % G / A / T / Cᵃ | Preferred residue at 44ᵇ | Preferred residues at 68/70ᵇ
1 | 77 | GTT 46.2, GTC 18.3, GTG 13.6 | 78.1 | 0.5 / 2.0 / 82.4 / 15.1 | Q 80.5% (62/77) | –
2 | 8 | GTT 13.4, GTC 11.8, TCT 11.4 | 36.6 | 0 / 4.9 / 56.9 / 38.2 | Q 100.0% (8/8) | R 100.0% (8/8)
3 | 65 | GAT 27.9, TAT 23.2, GAG 15.7 | 66.8 | 2.4 / 88.9 / 5.7 / 3.0 | A 63.0% (41/65) | R 33.8% (22/65)
4 | 31 | GAC 22.7, TAC 14.5, GAT 13.4 | 50.6 | 0.3 / 91.9 / 6.6 / 1.2 | A 51.6% (16/31), N 35.4% (11/31) | R 48.4% (15/31); R 67.7% (21/31)
5 | 81 | GAT 29.2, TAT 15.4, GAC 11.4 | 56.0 | 1.6 / 73.8 / 13.4 / 11.2 | – | –
6 | 51 | CCT 30.1, TCT 19.6, TCC 13.9 | 63.6 | 0 / 4.0 / 6.3 / 89.7 | K 62.7% (32/51) | –
7 | 37 | CCT 20.8, TCT 19.6, TCC 15.3 | 55.7 | 0 / 0.2 / 14.4 / 85.4 | K 91.9% (34/37) | –

Examples (Figure 3(a)): QAN for cluster 1 and QRR for cluster 2; ARL, AGR, ADK, DRK, KTG and RAT for the other clusters.
ᵃ Frequencies according to the cleavage index, as described in the legend to Figure 4(c).
ᵇ In each position, residues present in more than one-third of the cluster are indicated.

Analysis of the residues found in each cluster revealed strong biases at position 44: Q is overwhelmingly represented in clusters 1 and 2, whereas A and N are more frequent in clusters 3 and 4, and K in clusters 6 and 7. These biases correlated with strong base preferences at DNA positions ±4, with a large majority of T:A base-pairs in clusters 1 and 2, A:T in clusters 3, 4 and 5, and C:G in clusters 6 and 7 (Table 1). The structure of I-CreI bound to its target shows that residue Q44 interacts with the bottom strand at position −4 (and the top strand at position +4; see Figure 1(b) and (c)). Our results suggest that this interaction is largely conserved in our mutants, and reveal a "code" wherein Q44 would contact adenine, A44 (or, less frequently, N44) thymine, and K44 guanine. No such correlation was observed for positions 68 and 70.

Structural perspective

To understand the change in specificity of the new variants, it is important to consider the structure of the complex between I-CreI and its DNA target. DNA/protein interactions are summarized in Figure 1. Using a new version of the automatic protein design algorithm FOLD-X,37–39 we systematically mutated the base-pairs at positions ±3, ±4 and ±5 (a total of 64 combinations) and calculated the interaction energy of each new protein/DNA complex after moving and relaxing residues Q44, R68 and R70 and their neighbours (Table 2). The interaction of I-CreI with the 64 targets could be predicted relatively well (compare the lowest values in Table 2 with the I-CreI yeast pattern in Figure 3(a)). The first obvious conclusion from the FOLD-X analysis and the 64 modelled complexes is that an A:T base-pair at ±3 is not tolerated in any DNA combination, owing to a steric clash between the methyl group of the thymine and T46. This result likely explains why no mutant was found that recognizes a target of the type NNA (compare Table 2 and Figure 4(a)). In addition, any mutation at ±3 automatically reduces activity because, to make a hydrogen bond with a base other than G, R70 must break its double ion pair with D75, leaving a buried, unsatisfied charged group (Figure 1(b)). The D75N mutation gives R70 more freedom (it is then held by only one bond), so that it can make a single hydrogen bond to the base complementary to T at ±3, explaining why QRR in the context of the N75 variant recognizes the sequence GTT. For a T:A base-pair at ±4, the NH2 group of the Q44 side-chain makes a hydrogen bond to the N7 group of the A base (Figure 1(b)). In principle, a C:G pair should also be accepted, since the same bond can be made with the equivalent group of the C base. This explains why Q44 strongly favours a T:A or C:G pair at ±4, but not an A:T or G:C pair. Finally, the guanine at position ±5 that interacts with R68 cannot easily be replaced by an A, not only because this would break the interaction with R68, but also because it would introduce van der Waals clashes with the side-chain of I24. This explains why few proteins cleave ANN targets (see Figure 4(a)). A C:G pair would break the two hydrogen bonds with R68, whereas a T:A pair could in principle be tolerated, since one hydrogen bond could still be made (data not shown). Thus, simple modelling and energy calculations readily explain the experimental preferences of the wild-type protein, as well as the general trends observed for the mutants.

Table 2. Interaction energy for I-CreI in complex with the 64 variant targets: difference in free energy of interaction between I-CreI and the DNA with respect to the GTC sequence (kcal/mol)ᵃ

GGG 3.9   GAG 4.2   GTG 3.8   GCG 4.5   TGG 3.6   TAG 4.9   TTG 3.5   TCG 3.5
GGA 8.3   GAA 6.8   GTA 6.2   GCA 6.1   TGA 8.6   TAA 8.4   TTA 7.2   TCA 6.9
GGT 4.3   GAT 4.4   GTT 4.3   GCT 5.2   TGT 4.1   TAT 5.3   TTT 4.1   TCT 4.6
GGC 1.6   GAC 2.4   GTC 0.0*  GCC 1.1*  TGC 3.5   TAC 2.8   TTC 0.7*  TCC 2.6
AGG 4.4   AAG 5.7   ATG 2.9   ACG 4.7   CGG 4.7   CAG 6.2   CTG 4.8   CCG 4.7
AGA 8.6   AAA 9.2   ATA 5.6   ACA 7.9   CGA 7.9   CAA 9.2   CTA 6.5   CCA 8.2
AGT 4.7   AAT 6.0   ATT 3.3   ACT 4.7   CGT 5.3   CAT 6.6   CTT 5.3   CCT 5.0
AGC 3.5   AAC 3.3   ATC 1.3*  ACC 1.9   CGC 4.0   CAC 3.4   CTC 1.0*  CCC 3.7

ᵃ Triplets predicted to bind with a difference of less than 1.5 kcal/mol relative to the I-CreI/GTC complex are marked with an asterisk.

Variants can be assembled into functional heterodimers to cleave new DNA target sequences

All selected variants are homodimers capable of cleaving palindromic sites. To test whether the list of cleavable targets could be extended by creating heterodimers that would cleave hybrid cleavage sites (as described in Figure 5(a)), we chose a subset of mutants with distinct profiles and cloned them into two different yeast vectors marked by LEU2 or KAN genes. We then co-expressed combinations of mutants in yeast with a set of palindromic and non-palindromic chimeric DNA targets. An example is shown in Figure 5(b): co-expression of the KTG and QAN mutants resulted in the cleavage of two chimeric targets, GTT/GCC and GTT/CCT, that were not cleaved by either mutant alone. The palindromic GTT, CCT and GCC targets (and other targets of KTG and QAN) were also cleaved, likely owing to the formation of homodimeric species, but unrelated targets were not.
In addition, a single GTT, CCT or GCC half-site was not sufficient for cleavage, since such targets were fully resistant (see GGG/GCC, GAT/GCC, GCC/TAC and many others in Figure 5(b)). Unexpected cleavage was observed only for GTC/CCT, by KTG homodimers, and the signal remained very weak. Thus, efficient cleavage requires the cooperative binding of two mutant monomers. These results demonstrate a good level of specificity for heterodimeric species. Altogether, 112 combinations of 14 different proteins were tested in yeast (not shown), and 37.5% of the combinations (42/112) gave a positive signal on their predicted chimeric target. Quantitative data are shown for six examples in Figure 5(c), and for the same six combinations, results were confirmed in CHO cells by transient cotransfection with a subset of relevant targets (Figure 5(d)). As a general rule, functional heterodimers were always obtained when one of the two expressed proteins gave a strong signal as a homodimer. In these cases, unexpected cleavage of an irrelevant target was rare and always faint, as for KTG (see Figure 5(b)). For example, DRN and RRN, two low-activity mutants, form functional heterodimers with strong cutters such as KTG or QRR (Figure 5(c) and (d)), whereas no cleavage of chimeric targets could be detected upon co-expression of these weak mutants (not shown).
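Chimeric targets such as GTT/CCT combine one half-site from each of two palindromes. A minimal sketch, assuming the 24 bp backbone shown in Figure 5(a) and a simplified model in which each monomer reads one half-site, so that co-expressing two variants can yield two homodimers and one heterodimer. The helper names are hypothetical.

```python
LEFT_FLANK, CENTER, RIGHT_FLANK = "TCAAAAC", "GTAC", "GTTTTGA"  # from Figure 5(a)
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def chimeric_target(left, right):
    """Target LEFT/RIGHT: bases -5..-3 come from palindrome LEFT and
    bases +3..+5 from palindrome RIGHT (as in GTT/CCT)."""
    return LEFT_FLANK + left + CENTER + reverse_complement(right) + RIGHT_FLANK


def coexpression_targets(a, b):
    """Targets expected (under the one-monomer-per-half-site model) when
    variants preferring palindromes a and b are co-expressed."""
    return {
        f"{a}/{a}": chimeric_target(a, a),  # homodimer of the first variant
        f"{b}/{b}": chimeric_target(b, b),  # homodimer of the second variant
        f"{a}/{b}": chimeric_target(a, b),  # heterodimer
    }


if __name__ == "__main__":
    print(chimeric_target("GTT", "CCT"))  # TCAAAACGTTGTACAGGGTTTTGA
```

Because the flanks are reverse complements of each other, X/Y and Y/X describe the same duplex read from opposite strands, so each pair of half-sites defines a single chimeric target.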

[Figure 5 image. (a) Sequences of the GTT palindrome (5′-TCAAAACGTTGTACAACGTTTTGA-3′), the CCT palindrome (5′-TCAAAACCCTGTACAGGGTTTTGA-3′) and the GTT/CCT hybrid site (5′-TCAAAACGTTGTACAGGGTTTTGA-3′). (b) Yeast screening panels for KTG, QAN and KTG×QAN. (c) Yeast and (d) CHO cleavage signals for KTG×QAN, KTG×DRN, KTG×ATS, KTG×AGR, QRR×RRN and RAT×ASR.]

Figure 5. Examples of heterodimeric activities. (a) Example of a hybrid or chimeric site. GTT and CCT are two palindromic sites derived from the I-CreI site as described in the legend to Figure 1(c). The GTT/CCT hybrid site displays the GTT sequence on the top strand at −5, −4, −3 and the CCT sequence on the bottom strand at +5, +4, +3. (b) Co-transformation of KTG and QAN in yeast. Target organization is shown in the top panel: targets with a single GTT, CCT or GCC half-site are in red; targets with two such half-sites, which are expected to be cleaved by homo- and/or heterodimers, are in red and in grey boxes; 0: no target. Results are shown in the three panels below. An unexpected faint signal is observed for GTC/CCT, cleaved by KTG. Cleavage of GTT/GTC is likely a consequence of the faint cleavage of GTC by QAN, observed in vitro (Figure 3(b)) and sometimes faintly in yeast (see Figure 3(a)). (c) Co-transformation of selected mutants in yeast: quantitative data. For clarity, only results on relevant hybrid targets are shown. The AAC/ACC target is always shown as an example of an unrelated target. Note that for the KTG×AGR couple, the palindromic TAC and TCT targets, cleaved by AGR and KTG respectively, were not assayed, but can be visualized in Figure 3(a). Cleavage of the CAT target by the RRN mutant is very low and could not be quantified in yeast. (d) Transient co-transfection in CHO cells. For (c) and (d): black bars, signal for the first mutant alone; grey bars, signal for the second mutant alone; hatched bars, signal obtained by co-expression or co-transfection.

Discussion

We set out to determine whether a large number of novel endonucleases could be derived from HEs while keeping both high activity and high specificity. The creation of a large number of mutants with new specificities was clearly demonstrated: we found a large diversity of novel profiles, characterized by the cleavage of novel targets. Based on the mutation of single I-CreI residues, Seligman and co-workers29,30 found that a few I-CreI variants with altered specificity could be obtained. Here, we show that global alteration of a whole subdomain can yield hundreds of novel endonucleases.

In our study, a very narrow specificity was also preserved. In a former study, Gimble et al.31 derived several proteins from PI-SceI by mutating the DNA-binding residues of the intein domain. However, most of these proteins had similar affinities for the wild-type target and for the new targets in an in vitro binding test. It is difficult to compare results obtained with different assays and scaffolds. Nevertheless, many of our mutants cleaving a novel target had lost most or all activity on any of the seven targets cleaved by I-CreI (Figure 3(a)). This loss of activity on the initial I-CreI target was confirmed in vitro (Figure 3(b)). Moreover, the representative mutants shown in Figure 3(a) are at least as selective as the I-CreI wt protein, and engineering was not associated with a relaxation of specificity. It has been proposed that HE degeneracy is maintained because it allows homing to novel genes and genomes.12 This model suggests that proteins with similar or even higher specificity might be derived from HEs. It might also explain why a large panel of our proteins had preserved the essential qualities of natural homing endonucleases.

The creation and characterization of a large number of novel proteins allowed for a structural and statistical analysis of base and amino acid distributions. Interestingly, the "rules" inferred for the interactions between residue 44 and DNA position ±4 fit with what is often observed for zinc-finger proteins.9 These rules pair glutamine with adenine, alanine or asparagine with thymine, and lysine with guanine, suggesting a general code for protein/DNA interaction. Similar behaviour is found for the wild-type protein at DNA positions ±3 and ±5. However, no such trends could be observed at DNA positions ±3 and ±5 for the mutants. Our current thinking is that the interaction with these bases does not depend on a single given residue. In such cases, which might actually be the majority, statistical analysis might require very large samples of novel variant proteins. In contrast, FOLD-X analysis does not have such requirements. Homology modeling of the DNA variants and energy calculation with FOLD-X37–39 reproduce the experimental behaviour found for the I-CreI protein to a good extent. This offers hope that a combination of protein design and screening will allow the generation of new, very specific homing endonucleases.

Recent work has shown that new homing endonucleases could be engineered by domain swapping of different monomers.26–28 These chimeric proteins were able to cleave the hybrid target corresponding to the fusion of the two parent half-site sequences. Here, we applied the same strategy to the I-CreI mutants, by simple co-expression. The combinations were functional, and their DNA specificity was strictly predictable, as long as the homodimeric activity was strong for at least one partner. This condition likely reflects the sensitivity of our assay. These results show that functional heterodimers can be formed. Therefore, alteration of a precise area of the DNA-binding interface (positions 44, 68 and 70) has a limited influence on the protein/protein interface constituted by the LAGLIDADG domains. However, co-expression leads to a mixture of homodimers and heterodimers, and cleavage of palindromic sites by the homodimers lowers the overall specificity. We have demonstrated that a functional single-chain protein can be created from I-CreI.26 The same strategy can be used here to obtain a fusion protein that will cleave only non-palindromic chimeric sites.

The generation of collections of novel homing endonucleases, and the ability to combine them, considerably enlarges the number of DNA sequences that can be targeted, but does not yet saturate all potential sequences. However, there are many ways to increase the number of target sequences. First, by choosing the degenerate VVK codon to create diversity, we excluded eight amino acids from positions 44, 68 and 70, and these residues may be involved in new specificities.
More importantly, we limited the scope of this study to base-contacting residues. Surrounding residues very probably exert an indirect but determinant influence on the interaction with DNA and/or with the DNA-binding residues. In the future, to target as many sequences as possible, the binding interface will have to be engineered more globally. Second, other areas of the DNA-binding domain need to be altered to create new binding domains, which can ultimately be combined with the mutants described here. This would expand the range of potential target sequences further. A combinatorial approach might, however, be more difficult to apply within a single monomer or domain than between monomers. Finally, we and others have shown that LAGLIDADG domains from different proteins can be assembled into functional chimeric molecules,26–28 so it should be possible to expand the number of combinations further. Thus, combining LAGLIDADG domains from a growing collection of novel proteins should eventually yield dedicated homing endonucleases able to cleave sequences from many genes of interest. Given the specificity of our proteins, potential applications include the cleavage of viral genomes and the correction of genetic defects via double-strand break-induced recombination, both of which will lead to future therapeutics.
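The gain from combining monomers can be put into numbers. The sketch below assumes, as for the targets used in this study, that only positions ±3 to ±5 vary and that the rest of the 24 bp site is that of C1221. Under that assumption, homodimers can address at most the 64 palindromes. Pairing any two half-site triplets gives 64 × 65 / 2 = 2080 distinct duplexes, because X/Y and Y/X are the same molecule read from opposite strands. The function names are illustrative.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def target(left: str, right: str) -> str:
    """Top strand of the 24 bp site left/right, C1221 backbone assumed:
    `left` at -5..-3 on the top strand, `right` at +5..+3 on the bottom strand."""
    return "TCAAAAC" + left + "GTAC" + revcomp(right) + "GTTTTGA"


def canonical(seq: str) -> str:
    """A duplex is the same molecule whichever strand is read,
    so represent it by the lexicographically smaller strand."""
    return min(seq, revcomp(seq))


triplets = ["".join(t) for t in product("ACGT", repeat=3)]
palindromic = {canonical(target(t, t)) for t in triplets}
all_pairs = {canonical(target(x, y)) for x, y in product(triplets, repeat=2)}
print(len(triplets), len(palindromic), len(all_pairs))  # 64 64 2080
```

The identity `revcomp(target(x, y)) == target(y, x)` is what collapses the 4096 ordered pairs to 2080 distinct sites.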

Materials and Methods

Structure analyses

All analyses of protein structures were performed using PyMOL. The I-CreI structure corresponds to Protein Data Bank accession number 1G9Y. Residue numbering in the text always refers to this structure, except for residues in the second I-CreI protein domain of the homodimer, where residue numbers were set as for the first domain. Energies (Table 2) were calculated for one half of the dimeric I-CreI complex. The FOLD-X algorithm,37–39 modified to mutate DNA, was used with a softer version of the van der Waals' clash term. Importantly, FOLD-X recognizes water molecules making two or more hydrogen bonds to the protein (either directly or through another water molecule). FOLD-X can also dress a protein or DNA with predicted water bridges.39 Thus, crystallographic and predicted water bridges were consistent in the modeling.

Construction of mutant libraries

I-CreI wt and I-CreI D75N ORFs were synthesized as described.26 Mutation D75N was introduced by replacing codon 75 with AAC. The diversity of the homing endonuclease library was generated by PCR on the I-CreI gene as DNA template. Degenerate primers (Sigma) harbouring codon VVK (18 codons, amino acids ADEGHKNPQRST) at positions 44, 68 and 70 were used. The final PCR product was digested with specific restriction enzymes and cloned back into the I-CreI ORF digested with the same enzymes, in pCLS542 (detailed protocol available upon request). In this 2μ-based replicative vector marked with the LEU2 gene, I-CreI variants are under the control of a galactose-inducible promoter.26 After electroporation in E. coli, we obtained 7 × 10^4 clones, representing 12 times the theoretical diversity at the DNA level (18^3 = 5832). DNA was extracted and transformed into Saccharomyces cerevisiae strain FYC2-6A (MATα, trp1Δ63, leu2Δ1, his3Δ200). A total of 13,824 colonies were picked using a colony picker (QPix II, Genetix) and grown in 144 microtitre plates.
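The codon and diversity figures quoted above can be checked with a few lines of code. The sketch below expands the VVK degenerate codon (IUPAC V = A/C/G, K = G/T) against the standard genetic code. The variable names are illustrative. Following the text, the comparison of clone numbers with diversity counts each of the 18^3 codon combinations as one DNA-level variant. The 96 colonies per plate is only arithmetic on the quoted numbers.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for (i, a), (j, b), (k, c) in product(enumerate(BASES), repeat=3)}

# IUPAC degenerate bases used in the primers: V = A/C/G, K = G/T.
IUPAC = {"V": "ACG", "K": "GT"}


def expand(degenerate: str) -> list[str]:
    """All concrete codons encoded by a degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC.get(b, b) for b in degenerate))]


vvk = expand("VVK")
residues = {CODE[c] for c in vvk}
excluded = sorted(set(CODE.values()) - residues - {"*"})
print(len(vvk), "".join(sorted(residues)))  # 18 ADEGHKNPQRST
print(excluded)  # ['C', 'F', 'I', 'L', 'M', 'V', 'W', 'Y']

positions = 3  # residues 44, 68 and 70
codon_diversity = len(vvk) ** positions          # DNA-level variants
protein_diversity = len(residues) ** positions   # protein-level variants
clones = 7 * 10**4
print(codon_diversity, protein_diversity, round(clones / codon_diversity, 1))  # 5832 1728 12.0
print(13_824 // 144)  # 96 colonies per microtitre plate
```

VVK also excludes all three stop codons, which is one reason such a degenerate codon is attractive for saturation libraries.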
Construction of target clones

The C1221 24 bp palindrome (5′-TCAAAACGTCGTACGACGTTTTGA-3′) is a repeat of the half-site of the nearly palindromic natural I-CreI target (5′-TCAAAACGTCGTGAGACAGTTTGG-3′). C1221 is cleaved as efficiently as the natural I-CreI target in vitro and ex vivo, in both yeast and mammalian cells. The 64 palindromic targets were derived as follows. Sixty-four pairs of oligonucleotides (5′-GGCATACAAGTTTCAAAACNNNGTACNNNGTTTTGACAATCGTCTGTCA-3′ and reverse complementary sequences) were ordered from Sigma, annealed and cloned into pGEM-T Easy (Promega). Next, a 400 bp PvuII fragment was excised and cloned into the yeast vector pFL39-ADH-LACURAZ and the mammalian vector pcDNA3.1-LACURAZ-ΔURA, both described previously.26 The 75 hybrid target sequences were cloned as follows. Oligonucleotides containing two different half-sites of each mutant palindrome were designed (Proligo). Double-stranded target DNA was generated by PCR amplification of the single-stranded oligonucleotides and cloned into yeast and mammalian reporter vectors using the Gateway protocol (Invitrogen). Yeast reporter vectors were transformed into S. cerevisiae strain FYBL2-7B (MATa, ura3Δ851, trp1Δ63, leu2Δ1, lys2Δ202).
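The NNN/NNN oligonucleotide design can be sketched as follows. The helper names are hypothetical. The sketch assumes that the first NNN slot holds the top-strand triplet at −5 to −3. It also assumes that the second slot holds the reverse complement of the triplet read on the bottom strand at +5 to +3. This is how the palindromic C1221 site and the GTT/CCT hybrid of Figure 5(a) are built.

```python
from itertools import product

# Oligonucleotide template from the text; NNN marks the two variable triplets.
TEMPLATE = "GGCATACAAGTTTCAAAACNNNGTACNNNGTTTTGACAATCGTCTGTCA"
SITE_START = TEMPLATE.index("TCAAAAC")  # first base (-12) of the 24 bp site
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def top_oligo(left: str, right: str) -> str:
    """Top-strand oligo for target left/right (hypothetical helper).

    `left` is the top-strand triplet at -5..-3; `right` is the triplet read
    on the bottom strand at +5..+3, so the top strand carries revcomp(right).
    """
    first, second = TEMPLATE.index("NNN"), TEMPLATE.rindex("NNN")
    return (TEMPLATE[:first] + left + TEMPLATE[first + 3:second]
            + revcomp(right) + TEMPLATE[second + 3:])


def site(oligo: str) -> str:
    """Extract the 24 bp target site from a filled oligo."""
    return oligo[SITE_START:SITE_START + 24]


triplets = ["".join(t) for t in product("ACGT", repeat=3)]
# The 64 palindromic targets, each as (top oligo, reverse-complement oligo).
palindromic = {t: (top_oligo(t, t), revcomp(top_oligo(t, t))) for t in triplets}

print(len(palindromic))                # 64
print(site(palindromic["GTC"][0]))     # TCAAAACGTCGTACGACGTTTTGA (C1221)
print(site(top_oligo("GTT", "CCT")))   # TCAAAACGTTGTACAGGGTTTTGA (GTT/CCT)
```

Each palindromic site equals its own reverse complement, whereas a hybrid such as GTT/CCT does not. Hybrids therefore need the two different half-sites to be written explicitly into one oligonucleotide.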

Mating of homing endonuclease-expressing clones and screening in yeast

Mating was performed using a colony gridder (QPix II, Genetix). Mutants were gridded on nylon filters covering YPD plates, using a high gridding density (about 20 spots/cm2). A second gridding process was performed on the same filters to spot a second layer, consisting of 64 or 75 different reporter-harbouring yeast strains for each variant. Membranes were placed on solid agar YPD-rich medium and incubated at 30 °C for one night to allow mating. Next, filters were transferred to synthetic medium lacking leucine and tryptophan, with galactose (1%) as a carbon source (and with G418 for co-expression experiments). They were incubated for five days at 37 °C to select for diploids carrying the expression and target vectors. After five days, filters were placed on solid agarose medium and incubated at 37 °C to monitor β-galactosidase activity. This medium contained 0.02% X-Gal in 0.5 M sodium phosphate buffer (pH 7.0), 0.1% (w/v) SDS, 6% dimethyl formamide (DMF), 7 mM β-mercaptoethanol and 1% (w/v) agarose. Results were analyzed by scanning, and quantification was performed using proprietary software.

Sequence and re-cloning of primary hits

The ORFs of positive clones identified during the primary screening in yeast were amplified by PCR and sequenced. ORFs were then re-cloned using the Gateway protocol (Invitrogen). They were amplified by PCR on yeast colonies,40 using primers 5′-GGGGACAAGTTTGTACAAAAAAGCAGGCTTCGAAGGAGATAGAACCATGGCCAATACCAAATATAACAAAGAGTTCC-3′ and 5′-GGGGACCACTTTGTACAAGAAAGCTGGGTTTAGTCGGCCGCCGGGGAGGATTTCTTCTTCTCGC-3′ (Proligo). PCR products were cloned into one of three vectors: (i) yeast Gateway expression vectors harbouring a galactose-inducible promoter, LEU2 or KanR as selectable marker and a 2μ origin of replication; (ii) the CHO Gateway expression vector pCDNA6.2 (Invitrogen); or (iii) the pET-24d(+) vector (Novagen). Resulting clones were verified by sequencing (Millegen).

Mammalian cell assays

A total of 10^4 CHO cells were plated in each well of 96-well microplates. At 24 h after seeding, cells were transfected with Polyfect transfection reagent according to the supplier's (Qiagen) protocol (175 ng of total DNA mix, 1 μl of Polyfect and 150 μl of F12K medium per well). Cells were incubated at 37 °C, 5% CO2. At 72 h after transfection, the culture medium was removed and 150 μl of lysis/revelation buffer was added for the β-galactosidase liquid assay. Typically, 1 l of this buffer contained the following:
- 100 ml of lysis buffer (10 mM Tris–HCl (pH 7.5), 150 mM NaCl, 0.1% Triton X-100, 0.1 mg/ml of BSA, protease inhibitors);
- 10 ml of Mg 100× buffer (100 mM MgCl2, 35% β-mercaptoethanol);
- 110 ml of ONPG (8 mg/ml);
- 780 ml of 0.1 M sodium phosphate (pH 7.5).
After 3 h incubation at 37 °C, absorbance was measured at 420 nm. The entire process was performed on an automated Velocity11 BioCel platform.

Protein expression and purification

His-tagged proteins were over-expressed in E. coli BL21(DE3)pLysS cells using pET-24d(+) vectors (Novagen). Induction with IPTG (0.3 mM) was performed at 25 °C. Cells were sonicated in a solution of 50 mM sodium phosphate (pH 8), 300 mM sodium chloride containing protease inhibitors (Complete EDTA-free tablets, Roche) and 5% (v/v) glycerol. Cell lysates were centrifuged at 100,000g for 60 min. His-tagged proteins were then affinity-purified using 5 ml HiTrap Chelating HP columns (Amersham Biosciences) loaded with cobalt. Several fractions were collected during elution with a linear gradient of imidazole, up to 0.25 M imidazole. This was followed by a plateau at 0.5 M imidazole, 0.3 M NaCl and 50 mM sodium phosphate (pH 8). Protein-rich fractions (determined by SDS-PAGE) were applied to a second column. These crude purified samples were taken to pH 6 and applied to a 5 ml HiTrap Heparin HP column (Amersham Biosciences) equilibrated with 20 mM sodium phosphate (pH 6.0). Bound proteins were eluted with a continuous sodium chloride gradient up to 1 M sodium chloride in 20 mM sodium phosphate. The purified fractions were analyzed by SDS-PAGE and concentrated (10 kDa cut-off Centriprep Amicon Ultra system). They were then frozen in liquid nitrogen and stored at −80 °C. Purified proteins were desalted using PD10 columns (Sephadex G-25M, Amersham Biosciences) in PBS or 10 mM Tris–HCl (pH 8) buffer.

In vitro cleavage assays

pGEM plasmids carrying a single homing endonuclease DNA target site were first linearized with XmnI. Cleavage assays were performed at 37 °C in 10 mM Tris–HCl (pH 8), 50 mM NaCl, 10 mM MgCl2, 1 mM DTT and 50 μg/ml of BSA. The target substrate concentration was 2 nM. A dilution range between 0 nM and 85 nM was used for each protein, in a 25 μl final reaction volume. Reactions were stopped after 1 h by addition of 5 μl of 6× Buffer Stop and incubated at 37 °C for 30 min. The 6× Buffer Stop contained 45% glycerol, 95 mM EDTA (pH 8), 1.5% (w/v) SDS, 1.5 mg/ml of proteinase K and 0.048% (w/v) bromophenol blue. Digests were run on agarose gels, and fragments were quantified after ethidium bromide staining to calculate the percentage of cleavage.

Hierarchical clustering

Clustering was performed using the hclust function of R. We used quantitative data from the primary, low-density screening. Both variants and targets were clustered using standard hierarchical clustering with Euclidean distance and Ward's method.36 Mutant and target dendrograms were re-ordered to optimize the positions of the clusters. The mutant dendrogram was cut at a height of 8 to define the clusters.
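The sketch below re-implements Ward's agglomerative clustering in plain Python, for readers without R at hand. It runs on a hypothetical toy matrix (rows are variants, columns are target signals). It is not the hclust code used in this study. It applies the Lance–Williams update to squared Euclidean distances and reports merge heights as their square root, which is the convention of R's method = "ward.D2" and SciPy's "ward". The height scale depends on the Ward variant and on the input data. A cut at 8 in this sketch is therefore not guaranteed to reproduce the clusters reported here.

```python
import math
from itertools import combinations


def ward_linkage(rows):
    """Agglomerative clustering with Ward's criterion.

    Returns a list of (cluster_a, cluster_b, height). Singletons are
    numbered 0..n-1 and the k-th merge creates cluster n + k.
    """
    n = len(rows)
    # Squared Euclidean distances between singletons, keyed (i, j) with i < j.
    d2 = {(i, j): sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
          for i, j in combinations(range(n), 2)}
    size = {i: 1 for i in range(n)}
    active = list(range(n))  # stays sorted: new ids are always the largest
    merges = []
    for new in range(n, 2 * n - 1):
        a, b = min(combinations(active, 2), key=lambda p: d2[p])
        active.remove(a)
        active.remove(b)
        for k in active:
            dak = d2[(min(a, k), max(a, k))]
            dbk = d2[(min(b, k), max(b, k))]
            na, nb, nk = size[a], size[b], size[k]
            # Lance-Williams update for Ward's method on squared distances.
            d2[(k, new)] = ((na + nk) * dak + (nb + nk) * dbk
                            - nk * d2[(a, b)]) / (na + nb + nk)
        size[new] = size[a] + size[b]
        merges.append((a, b, math.sqrt(d2[(a, b)])))
        active.append(new)
    return merges


def cut(merges, n, height):
    """Flat clusters obtained by cutting the dendrogram at `height`."""
    members = {i: [i] for i in range(n)}
    for step, (a, b, h) in enumerate(merges):
        if h > height:
            break  # Ward merge heights never decrease, so later merges are higher
        members[n + step] = members.pop(a) + members.pop(b)
    return sorted(sorted(m) for m in members.values())


# Hypothetical signal profiles (rows = variants, columns = targets).
profiles = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
merges = ward_linkage(profiles)
print([round(h, 6) for _, _, h in merges])  # [1.0, 1.0, 20.0]
print(cut(merges, len(profiles), 8))        # [[0, 1], [2, 3]]
```

Searching all active pairs at every step makes this sketch cubic in the number of variants. That is acceptable for a toy example, but a library implementation would be preferable for a full screen.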


Acknowledgements

We thank Rodney J. Rothstein, James E. Haber and Julianne Smith for critical reading of the manuscript, Daniel Padró for NMR analysis of the mutants, and Pilar Redondo for assistance with protein purification and in vitro cleavage assays. This work was supported, in part, by grant 0212508Q from the French Agency for Innovation (ANVAR), and by grant 04W107 from the French Ministry of Research, under label EUREKA Σ!3294.

References

1. Rouet, P., Smih, F. & Jasin, M. (1994). Introduction of double-strand breaks into the genome of mouse cells by expression of a rare-cutting endonuclease. Mol. Cell. Biol. 14, 8096–8106.
2. Choulika, A., Perrin, A., Dujon, B. & Nicolas, J. F. (1995). Induction of homologous recombination in mammalian chromosomes by using the I-SceI system of Saccharomyces cerevisiae. Mol. Cell. Biol. 15, 1968–1973.
3. Donoho, G., Jasin, M. & Berg, P. (1998). Analysis of gene targeting and intrachromosomal homologous recombination stimulated by genomic double-strand breaks in mouse embryonic stem cells. Mol. Cell. Biol. 18, 4070–4078.
4. Elliott, B., Richardson, C., Winderbaum, J., Nickoloff, J. A. & Jasin, M. (1998). Gene conversion tracts from double-strand break repair in mammalian cells. Mol. Cell. Biol. 18, 93–101.
5. Sargent, R. G., Brenneman, M. A. & Wilson, J. H. (1997). Repair of site-specific double-strand breaks in a mammalian chromosome by homologous and illegitimate recombination. Mol. Cell. Biol. 17, 267–277.
6. Puchta, H., Dujon, B. & Hohn, B. (1996). Two different but related mechanisms are used in plants for the repair of genomic double-strand breaks by homologous recombination. Proc. Natl Acad. Sci. USA, 93, 5055–5060.
7. Smith, J., Berg, J. M. & Chandrasegaran, S. (1999). A detailed study of the substrate specificity of a chimeric restriction enzyme. Nucl. Acids Res. 27, 674–681.
8. Urnov, F. D., Miller, J. C., Lee, Y. L., Beausejour, C. M., Rock, J. M., Augustus, S. et al. (2005). Highly efficient endogenous human gene correction using designed zinc-finger nucleases. Nature, 435, 646–651.
9. Pabo, C. O., Peisach, E. & Grant, R. A. (2001). Design and selection of novel Cys2His2 zinc finger proteins. Annu. Rev. Biochem. 70, 313–340.
10. Segal, D. J. & Barbas, C. F., 3rd (2001). Custom DNA-binding proteins come of age: polydactyl zinc-finger proteins. Curr. Opin. Biotechnol. 12, 632–637.
11. Isalan, M., Klug, A. & Choo, Y. (2001). A rapid, generally applicable method to engineer zinc fingers illustrated by targeting the HIV-1 promoter. Nature Biotechnol. 19, 656–660.
12. Chevalier, B. S. & Stoddard, B. L. (2001). Homing endonucleases: structural and functional insight into the catalysts of intron/intein mobility. Nucl. Acids Res. 29, 3757–3774.
13. Kostriken, R., Strathern, J. N., Klar, A. J., Hicks, J. B. & Heffron, F. (1983). A site-specific endonuclease essential for mating-type switching in Saccharomyces cerevisiae. Cell, 35, 167–174.
14. Jacquier, A. & Dujon, B. (1985). An intron-encoded protein is active in a gene conversion process that spreads an intron into a mitochondrial gene. Cell, 41, 383–394.
15. Lucas, P., Otis, C., Mercier, J. P., Turmel, M. & Lemieux, C. (2001). Rapid evolution of the DNA-binding site in LAGLIDADG homing endonucleases. Nucl. Acids Res. 29, 960–969.
16. Jurica, M. S., Monnat, R. J., Jr & Stoddard, B. L. (1998). DNA recognition and cleavage by the LAGLIDADG homing endonuclease I-CreI. Mol. Cell, 2, 469–476.
17. Chevalier, B. S., Monnat, R. J., Jr & Stoddard, B. L. (2001). The homing endonuclease I-CreI uses three metals, one of which is shared between the two active sites. Nature Struct. Biol. 8, 312–316.
18. Chevalier, B., Turmel, M., Lemieux, C., Monnat, R. J., Jr & Stoddard, B. L. (2003). Flexible DNA target site recognition by divergent homing endonuclease isoschizomers I-CreI and I-MsoI. J. Mol. Biol. 329, 253–269.
19. Moure, C. M., Gimble, F. S. & Quiocho, F. A. (2003). The crystal structure of the gene targeting homing endonuclease I-SceI reveals the origins of its target site specificity. J. Mol. Biol. 334, 685–695.
20. Moure, C. M., Gimble, F. S. & Quiocho, F. A. (2002). Crystal structure of the intein homing endonuclease PI-SceI bound to its recognition sequence. Nature Struct. Biol. 9, 764–770.
21. Ichiyanagi, K., Ishino, Y., Ariyoshi, M., Komori, K. & Morikawa, K. (2000). Crystal structure of an archaeal intein-encoded homing endonuclease PI-PfuI. J. Mol. Biol. 300, 889–901.
22. Duan, X., Gimble, F. S. & Quiocho, F. A. (1997). Crystal structure of PI-SceI, a homing endonuclease with protein splicing activity. Cell, 89, 555–564.
23. Bolduc, J. M., Spiegel, P. C., Chatterjee, P., Brady, K. L., Downing, M. E., Caprara, M. G. et al. (2003). Structural and biochemical analyses of DNA and RNA binding by a bifunctional homing endonuclease and group I intron splicing factor. Genes Dev. 17, 2875–2888.
24. Silva, G. H., Dalgaard, J. Z., Belfort, M. & Van Roey, P. (1999). Crystal structure of the thermostable archaeal intron-encoded endonuclease I-DmoI. J. Mol. Biol. 286, 1123–1136.
25. Grindl, W., Wende, W., Pingoud, V. & Pingoud, A. (1998). The protein splicing domain of the homing endonuclease PI-sceI is responsible for specific DNA binding. Nucl. Acids Res. 26, 1857–1862.
26. Epinat, J. C., Arnould, S., Chames, P., Rochaix, P., Desfontaines, D., Puzin, C. et al. (2003). A novel engineered meganuclease induces homologous recombination in yeast and mammalian cells. Nucl. Acids Res. 31, 2952–2962.
27. Chevalier, B. S., Kortemme, T., Chadsey, M. S., Baker, D., Monnat, R. J. & Stoddard, B. L. (2002). Design, activity, and structure of a highly specific artificial endonuclease. Mol. Cell, 10, 895–905.
28. Steuer, S., Pingoud, V., Pingoud, A. & Wende, W. (2004). Chimeras of the homing endonuclease PI-SceI and the homologous Candida tropicalis intein: a study to explore the possibility of exchanging DNA-binding modules to obtain highly specific endonucleases with altered specificity. ChemBioChem, 5, 206–213.
29. Sussman, D., Chadsey, M., Fauce, S., Engel, A., Bruett, A., Monnat, R., Jr et al. (2004). Isolation and characterization of new homing endonuclease specificities at individual target site positions. J. Mol. Biol. 342, 31–41.
30. Seligman, L. M., Chisholm, K. M., Chevalier, B. S., Chadsey, M. S., Edwards, S. T., Savage, J. H. & Veillet, A. L. (2002). Mutations altering the cleavage specificity of a homing endonuclease. Nucl. Acids Res. 30, 3870–3879.
31. Gimble, F. S., Moure, C. M. & Posey, K. L. (2003). Assessing the plasticity of DNA target site recognition of the PI-SceI homing endonuclease using a bacterial two-hybrid selection system. J. Mol. Biol. 334, 993–1008.
32. Argast, G. M., Stephens, K. M., Emond, M. J. & Monnat, R. J., Jr (1998). I-PpoI and I-CreI homing site sequence degeneracy determined by random mutagenesis and sequential in vitro enrichment. J. Mol. Biol. 280, 345–353.
33. Perez, C., Guyot, V., Cabaniols, J. P., Gouble, A., Micheaux, B., Smith, J. et al. (2005). Factors affecting double-strand break-induced homologous recombination in mammalian cells. Biotechniques, 39, 109–115.
34. Perrin, A., Buckle, M. & Dujon, B. (1993). Asymmetrical recognition and activity of the I-SceI endonuclease on its site and on intron-exon junctions. EMBO J. 12, 2939–2947.
35. Wang, J., Kim, H. H., Yuan, X. & Herrin, D. L. (1997). Purification, biochemical characterization and protein-DNA interactions of the I-CreI endonuclease produced in Escherichia coli. Nucl. Acids Res. 25, 3767–3776.
36. Ward, J. H. (1963). Hierarchical grouping to optimize an objective function. J. Am. Statist. Assoc. 58, 236–244.
37. Kiel, C., Wohlgemuth, S., Rousseau, F., Schymkowitz, J., Ferkinghoff-Borg, J., Wittinghofer, F. & Serrano, L. (2005). Recognizing and defining true Ras binding domains II: in silico prediction based on homology modelling and energy calculations. J. Mol. Biol. 348, 759–775.
38. Schymkowitz, J., Borg, J., Stricher, F., Nys, R., Rousseau, F. & Serrano, L. (2005). The FoldX web server: an online force field. Nucl. Acids Res. 33, W382–W388.
39. Schymkowitz, J. W., Rousseau, F., Martins, I. C., Ferkinghoff-Borg, J., Stricher, F. & Serrano, L. (2005). Prediction of water and metal binding sites and their affinities by using the Fold-X force field. Proc. Natl Acad. Sci. USA, 102, 10147–10152.
40. Akada, R., Murakane, T. & Nishizawa, Y. (2000). DNA extraction method for screening yeast clones by PCR. Biotechniques, 28, 668–670, 672, 674.

Edited by M. Belfort (Received 26 August 2005; received in revised form 19 October 2005; accepted 24 October 2005)