THE JOURNAL OF BIOLOGICAL CHEMISTRY © 1984 by The American Society of Biological Chemists, Inc.
Vol. 259, No. 16, Issue of August 25, pp. 10606-10613, 1984 Printed in U.S.A.
Sequences of the malE Gene and of Its Product, the Maltose-binding Protein of Escherichia coli K12*
(Received for publication, February 21, 1984)
Pascale Duplay‡§, Hugues Bedouelle‡¶, Audree Fowler‖**, Irving Zabin‖**, William Saurin‡, and Maurice Hofnung‡
From the ‡Programmation Moléculaire et Toxicologie Génétique (Centre National de la Recherche Scientifique LA 271 - Institut National de la Santé et de la Recherche Médicale U 163), Institut Pasteur, 75015 Paris, France and the ‖Department of Biological Chemistry, School of Medicine and Molecular Biology Institute, University of California, Los Angeles, California 90024
The sequences of the malE gene and of its mature product, the maltose-binding protein, have been determined and are in good agreement. The malE gene encodes the pre-protein (396 amino acid residues) which yields, upon cleavage of the NH2-terminal extension (26 amino acid residues), the mature maltose-binding protein (370 amino acid residues). The malE mRNA could form stable stem and loop structures, some of which may account for translational pauses observed by Randall et al. (Randall, L., Josefsson, L. G. & Hardy, S. J. S. (1980) Eur. J. Biochem. 107, 375-379). The sequence change due to an in-frame nonpolar deletion of 765 nucleotides in malE is also presented as well as homologies between the maltose-binding protein and other sugar-binding proteins.

The malB region of the Escherichia coli chromosome is composed of two operons, malE-malF-malG and malK-lamB, transcribed divergently from a control region located between malE and malK (Fig. 1) (Hofnung, 1974; Silhavy et al., 1979; Raibaud et al., 1979; Bedouelle et al., 1982). malB encodes the components of the system that transports maltose and maltodextrins across the bacterial envelope (reviewed in Shuman, 1982a; Hengge and Boos, 1983). The LamB protein is located in the outer membrane (Randall-Hazelbauer and Schwartz, 1973); the MalE protein (or maltose-binding protein) is in the periplasmic region (Kellermann and Szmelcman, 1974); the MalF, MalG, and MalK proteins are associated with the cytoplasmic membrane (Shuman et al., 1980; Shuman and Silhavy, 1981; Shuman, 1982b). The nucleotide sequences of the entire malK-lamB operon (Bedouelle and Hofnung, 1982; Gilson et al., 1982; Clément and Hofnung, 1981) and of the malF gene¹ have been determined.

* The costs of publication of this article were defrayed in part by the payment of page charges. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. Section 1734 solely to indicate this fact.
§ Supported by grants from the Direction Generale a la Recherche Scientifique et Technique, the Foundation pour la Recherche Medicale, the Ligue Nationale Francaise contre le Cancer, the Association pour le Developpement de la Recherche sur le Cancer, Grant CP.960002-ATP.956144 from the Centre National de la Recherche Scientifique, and Grant 1297 from the North Atlantic Treaty Organization.
¶ Recipient of a special stipend from the Institut Francais du Petrole. Present address, Medical Research Council, Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, England.
** Supported by Grant PCM-8118/112 from the National Science Foundation and Grant AI-04181 from the United States Public Health Service.
¹ S. Froshauer and J. Beckwith, personal communication.
MBP² is a binding protein specific for maltose and maltodextrins with a KD around 1 µM; the affinity is maximum for maltotriose (Schwartz et al., 1976; Szmelcman et al., 1976; Wandersman et al., 1979). Studies on the binding specificity have suggested that the binding site recognizes the glycosidic bond linking the glucose moieties of maltose (Kellermann and Szmelcman, 1974; Ferenci, 1980). MBP can be purified in dimeric form from bacteria that are constitutive for the expression of the maltose system and are grown in the absence of maltose. Maltose induces the conversion of the protein dimers to monomers (Richarme, 1982). There is one binding site for maltose per monomer (Schwartz et al., 1976), and upon binding of the substrate, the protein undergoes a conformational change that can be monitored by fluorescence techniques (Szmelcman et al., 1976; Zukin, 1979). MBP is a multisite protein which, in addition to binding its substrates, interacts with at least two proteins located in different layers of the cell envelope: with LamB (KD = 0.15 µM) to specifically facilitate the diffusion of maltose and maltodextrins through the outer membrane (Wandersman et al., 1979; Heuzenroeder and Reeves, 1980; Boos and Staehelin, 1981; Bavoil and Nikaido, 1981; Bavoil et al., 1983; Neuhaus et al., 1983) and with the inner membrane methyl-accepting protein MCP II (KD around 1 µM) to induce the chemotactic response (Koiwai and Hayashi, 1979; Hayashi and Ohba, 1982; Richarme, 1982). In addition, MBP might interact with the MalG or MalF proteins to allow translocation of the substrate through the cytoplasmic membrane.³ The protein has been crystallized, and the elucidation of its three-dimensional structure is in progress (Quiocho et al., 1979). MBP is synthesized in large amounts. A fully induced cell may contain up to 40,000 monomers, i.e. about one MBP monomer per LamB trimer (Kellermann and Szmelcman, 1974; Dietzel et al., 1978).
Expression of the malE gene is activated by the MalT protein in the presence of maltose or maltodextrins and by the cyclic AMP receptor protein; a detailed genetic analysis of the malE promoter has revealed potential interaction sites for MalT and the cyclic AMP receptor protein and a long-range interaction with the diverging malK promoter (Bedouelle, 1983).

Like most outer membrane and periplasmic proteins, MBP is synthesized initially with an amino-terminal signal peptide which is necessary for the initiation of export through the cytoplasmic membrane. It interacts with the cell secretory apparatus and is cleaved during the export process or shortly after (Bedouelle et al., 1980; reviewed in Bassford et al., 1984). The MBP precursor is active in binding maltose (Ferenci and Randall, 1979).

MBP is essential for the energy-dependent translocation of maltose and maltodextrins through the cytoplasmic membrane. This property was demonstrated using a deletion, ΔmalE444, that is internal to malE, is nonpolar on malF and malG, and abolishes transport (Shuman, 1982b; Brass et al., 1981).

In this paper, we present the amino acid sequence of the MalE protein, the nucleotide sequence of the malE gene, and the location in this sequence of the ΔmalE444 deletion.

MATERIALS AND METHODS

Media and General Techniques-The growth media, genetic techniques (Bedouelle, 1983), and standard DNA technology (Maniatis et al., 1982) were as described.

Bacterial Strains and Phages-The bacterial strains are listed in Table I. Strains harboring a plasmid are noted with their name followed by the name of the plasmid between parentheses, for example, HB101 (pPD1). The Δ(srlR-recA)306::Tn10 deletion was introduced into various strains by transduction to tetracycline resistance using a P1v grown on strain JC10289. The particular clone of phage λaph80malB13 (Raibaud et al., 1979) used in this work was labeled λaph80malB130. It carries a wild-type malB region, except for the 3'-terminal end of the lamB gene which is deleted. It transduced strain pop1741, which harbors a complete deletion of the malE gene, to Mal+. The ΔmalE444 deletion was transferred from strain HS2019 onto phage λaph80malB130 by the integration-excision technique, as described (Bedouelle, 1983). The resulting phage was labeled λaph80malB130ΔmalE444. ΔmalE444 was transferred from the latter phage to the chromosome of strain pop3971 by the same technique. The resulting strain is PD1.

TABLE I
Bacterial strains

Strain | Genotype | Reference
pop1741 | HfrG6 malBΔ114 his | Hofnung et al. (1974)
JC10289 | F- thr1 leuB6 proA2 his4 argE3 thi1 mtl1 gatC? ara14 lacY1 galK2 xyl5 rpsL31 tsx33 supE44 Δ(srlR-recA)306::Tn10 | Csonka and Clark (1979)
HB101 | F- proA2 ara14 lacY1 galK2 xyl5 mtl1 rpsL20 supE44 hsdS20 (rB-, mB-) recA13 | Boyer and Roulland-Dussoix (1969)
MC4100 | F- thiA relA araD139 ΔlacU169 rpsL | Casadaban (1976)
HS2019 | MC4100 ΔmalE444 | Shuman (1982b)
pop3971 | MC4100 malTp1 malTp7 | Chapon (1982)
PD1 | pop3971 ΔmalE444 | This study
PD2 | MC4100 Δ(srlR-recA)306::Tn10 | This study
PD3 | HS2019 Δ(srlR-recA)306::Tn10 | This study
PD4 | pop3971 Δ(srlR-recA)306::Tn10 | This study
PD5 | PD1 Δ(srlR-recA)306::Tn10 | This study

Restriction Mapping and DNA Sequencing-The restriction fragments were labeled at their 5' ends with [γ-32P]ATP and T4 polynucleotide kinase by the exchange reaction (Maniatis et al., 1982). The two labeled ends were segregated by secondary restriction cuts. The labeled fragments were separated by electrophoresis through thin (0.35 mm) non-denaturing 5% polyacrylamide gels and eluted off the gel pieces by overnight diffusion at 37 °C in 0.1% SDS, 0.3 M sodium acetate, pH 8. Fine restriction maps of the purified fragments were obtained by the method of partial digests and their nucleotide sequences by the Maxam and Gilbert technique as described (Maniatis et al., 1982).

Protein Sequencing-The corresponding materials and methods are given in Miniprint⁴ and include Fig. 1a and Tables Ia-IIIa.

² The abbreviations used are: MBP, maltose-binding protein; SDS, sodium dodecyl sulfate; HPLC, high-pressure liquid chromatography; PTH, phenylthiohydantoin; dansyl, 5-dimethylaminonaphthalene-1-sulfonyl.
³ H. Shuman, personal communication.
⁴ Portions of this paper (including part of "Materials and Methods," part of "Results," part of "Discussion," Fig. 1a, Tables Ia-IIIa, and additional references) are presented in miniprint at the end of this paper. Miniprint is easily read with the aid of a standard magnifying glass. Full size photocopies are available from the Journal of Biological Chemistry, 9650 Rockville Pike, Bethesda, MD 20814. Request Document No. 84-M526, cite the authors, and include a check or money order for $5.20 per set of photocopies. Full size photocopies are also included in the microfilm edition of the Journal that is available from Waverly Press.
⁵ S. Froshauer, personal communication.
⁶ H. Bedouelle, unpublished observations.

RESULTS

Cloning of the malE Gene-The transducing phage λaph80malB130 carries most of the wild-type malB region (Raibaud et al., 1979). The ΔmalE444 deletion was transferred from the chromosome of the bacterial strain, HS2019, to the chromosome of phage λaph80malB130 by in vivo recombination ("Materials and Methods"). The resulting phage was labeled λaph80malB130ΔmalE444.

The malB region contains a unique EcoRI restriction site at the beginning of the malK gene (Bedouelle et al., 1982) and a StuI site upstream of malF⁵ (Fig. 1). Analysis and comparison of phage λaph80malB130 and of its ΔmalE444 derivative with endonucleases revealed that the ΔmalE444 mutation is contained within an EcoRI-StuI fragment and deletes about 765 base pairs. The length of the wild-type fragment, about 1712 base pairs, was a priori enough to contain the malB control region (Bedouelle and Hofnung, 1982) and the entire malE coding sequence (Fowler and Zabin, 1982). The wild-type and deleted EcoRI-StuI fragments were purified from the transducing phages and inserted between the EcoRI and PvuII sites of plasmid vector pBR322. The recombinant plasmid that carries the wild-type malE gene was labeled pPD1, and the one that contains ΔmalE444 was labeled pPD2.

Plasmid pPD1 Carries a Functional malE Gene-The parental plasmid pBR322 and the recombinant plasmids pPD1 and pPD2 were introduced into strain HB101, which is wild type for the maltose system. The transformed strain HB101 (pPD1) synthesized a polypeptide that was predominant, was inducible with maltose, and co-migrated with purified MalE protein when total cell extracts were analyzed by polyacrylamide gel electrophoresis (Fig. 2). In contrast, strains HB101 (pPD2) and HB101 (pBR322) synthesized much lower amounts of the same polypeptide. These results showed that pPD1 directs the synthesis of a full-length MBP; they confirmed that the malE promoter is functional and under maltose control when carried by a multicopy plasmid (Bedouelle et al., 1982).

A Mal+ strain generally becomes partially Mal- when transformed with a multicopy plasmid that carries the malB control region. This effect has been attributed to a titration of the transcription activator MalT by the multiple copies of the malE and malK promoters; under these conditions, the intracellular concentration of MalT would become limiting for the expression of the maltose operons.⁶ Accordingly, strain PD2
FIG. 1. Restriction map and sequencing strategy for the malE gene. The upper section of the figure shows the genetic structure of the malB region. The middle section is an enlargement of the EcoRI-StuI restriction fragment that contains the malE gene. The large arrow shows the position of the sequence encoding the preMalE protein in this fragment. The restriction sites that we used to determine the DNA sequence are indicated with the TaqI sites drawn under the line. We also used a Tth111I site which is located 123 base pairs (bp) downstream of the StuI site and is unique in plasmid pPD1. The horizontal arrows in the lower section represent the sequenced DNA fragments. The tails of the arrows indicate the labeled 5'-ends of the fragments. The length of the arrows corresponds to the extent of the sequences read off the gels. The complete nucleotide sequence of the malE gene was obtained a first time from the AvaII, BgnI and HinfI sites and then verified from the BssHII, DdeI, NcoI, and one of the TaqI sites. 81% of the malE gene was sequenced on both strands. This calculation takes into account the fact that the sequence of the first 600 base pairs from the EcoRI site was determined previously (Bedouelle and Hofnung, 1982).
FIG. 2. Expression of the malE gene on plasmid pPD1. The cells were grown in maltose (+) or in glycerol (-) minimal medium supplemented with 0.2% casamino acids. Whole cell extracts were prepared and analyzed by electrophoresis through 10% polyacrylamide gels in SDS as described (Laemmli, 1970). The gels were stained with Coomassie Brilliant Blue. About 10^8 cells were loaded per well. Lane 1, HB101 (pBR322); lanes 2 and 3, HB101 (pPD2); lanes 4 and 5, HB101 (pPD1); lane 6, 4 µg of purified E. coli MalE protein.

became partially Mal- when transformed with plasmids pPD1 and pPD2, whereas it remained Mal+ when transformed with pBR322 (Table II); strain PD4, a malTp1 malTp7 double mutant that overproduces MalT 30-fold (Chapon, 1982), remained Mal+ when transformed with any of the three plasmids. Strain PD5, a ΔmalE444 malTp1 malTp7 mutant, became Mal+ when transformed with plasmid pPD1 but remained Mal- when transformed with pPD2. This last result demonstrated that pPD1 can complement the ΔmalE444 mutation for maltose fermentation and therefore that it carries a functional malE gene.

Nucleotide Sequence of the malE Gene-Comparison of the DNA fragments obtained after digestion of plasmids pPD1 and pPD2 with various restriction endonucleases allowed the construction of a preliminary restriction map of the EcoRI-StuI fragment that contains the malE gene and the approximate localization of the ΔmalE444 deletion in this fragment. Fig. 1 shows this restriction map and the strategy used to determine both the nucleotide sequence of the malE gene and the position in this sequence of the ΔmalE444 deletion using the Maxam and Gilbert technique. Fig. 3 shows the nucleotide sequence data. The ΔmalE444 mutation removes 765 base pairs, i.e. exactly 255 amino acids, and preserves the reading frame.

Protein Sequence Strategy-Primary cleavages of the maltose-binding protein were carried out with cyanogen bromide, trypsin, and the glutamic acid-specific Staphylococcus aureus V-8 protease. In the case of the trypsin digestion, the lysines were blocked with citraconyl groups. The only methods needed to separate the peptides were size separation on Sephadex and reverse-phase chromatography by high-pressure liquid chromatography. The protein contains 6 methionine and 6 arginine residues so that seven peptides would be expected from both the cyanogen bromide (Table Ia) and tryptic (Table IIa) digests. In both cases, all seven peptides were isolated. Cleavage of Met-Ser and Met-Thr bonds by cyanogen bromide was quite high. If anything, cleavage of Met-Pro was poor since the yield of CB6 was somewhat low. Since the protein contains no cysteine, incorporation of [35S]sulfate into the cells yielded maltose-binding protein labeled only in methionine. Therefore, the methionine-containing peptides in the tryptic and Staphylococcus protease digests were easily identified. These labeled peptides were sequenced and used to place the cyanogen bromide peptides in order.
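The peptide counts quoted above follow from a simple rule: n internal cleavage sites give n + 1 fragments. The sketch below applies that rule to a short made-up sequence (not the MBP sequence), cleaving after Met as cyanogen bromide does and after Arg as trypsin does once the lysines are blocked. The function name `cleave_after` and the toy sequence are illustrative assumptions, and cleavage is treated as complete, which the reported low yield of CB6 shows is an idealization.

```python
def cleave_after(sequence: str, residues: str) -> list[str]:
    """Split a one-letter protein sequence on the C-terminal side of each
    residue listed in `residues` (idealized, complete cleavage)."""
    fragments, current = [], ""
    for aa in sequence:
        current += aa
        if aa in residues:
            fragments.append(current)
            current = ""
    if current:  # C-terminal fragment that does not end in a cleavage residue
        fragments.append(current)
    return fragments


# Hypothetical toy sequence with 2 Met and 2 Arg, none at the C terminus.
toy = "AKMGSRLLMPTRQK"

cnbr = cleave_after(toy, "M")     # cyanogen bromide: cleaves after Met
tryptic = cleave_after(toy, "R")  # trypsin with blocked Lys: cleaves after Arg only

print(cnbr)     # 2 Met give 3 fragments
print(tryptic)  # 2 Arg give 3 fragments
```

By the same rule, 6 methionine and 6 arginine residues, none of them C-terminal, give 6 + 1 = 7 peptides for each digest, which matches the seven peptides isolated in both cases.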
FIG. 3. Sequences of the malE gene and of its product. The nucleotide sequence of the malE gene is numbered from 5' to 3' with the MalE protein sequence shown below. The vertical arrow indicates the site at which the MalE precursor is cleaved to yield the mature protein. The amino acid residues in the signal peptide have negative numbers. The underlined amino acids correspond to residues that were not characterized by protein sequencing. Two horizontal arrows, at nucleotide positions 28 and 792, bracket the 765 base pairs deleted by the ΔmalE444 mutation.
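The coordinates given in the legend to Fig. 3 can be checked against the statement that ΔmalE444 removes exactly 255 codons and preserves the reading frame. The sketch below is only an arithmetic check, assuming that nucleotide 1 is the first base of the initiation codon, that positions 28 and 792 are both inside the deletion, and that the signal peptide is numbered -26 to -1 with no residue 0. The helper name `mature_number` is hypothetical.

```python
PRECURSOR_RESIDUES = 396      # pre-protein length, from the text
SIGNAL_RESIDUES = 26          # signal peptide length, from the text
DEL_START, DEL_END = 28, 792  # first and last deleted nucleotides (Fig. 3 legend)


def mature_number(precursor_index: int) -> int:
    """Convert a 1-based precursor residue index to the Fig. 3 numbering
    (signal peptide -26..-1, mature protein 1..370, no residue 0)."""
    shifted = precursor_index - SIGNAL_RESIDUES
    return shifted if shifted > 0 else shifted - 1


coding_nt = 3 * PRECURSOR_RESIDUES               # 1188 nt, stop codon excluded
deleted_nt = DEL_END - DEL_START + 1             # inclusive span
preserves_frame = deleted_nt % 3 == 0            # length is a multiple of 3
deleted_codons = deleted_nt // 3
first_codon = (DEL_START - 1) // 3 + 1           # precursor codon containing nt 28
last_codon = (DEL_END - 1) // 3 + 1              # precursor codon containing nt 792
remaining = PRECURSOR_RESIDUES - deleted_codons  # residues left in the deleted precursor

print(coding_nt, deleted_nt, preserves_frame, deleted_codons)
print(first_codon, last_codon, mature_number(first_codon), mature_number(last_codon))
print(remaining)
```

Under these assumptions the deletion spans precursor codons 10 to 264, that is residue -17 of the signal peptide through residue 238 of the mature protein, and leaves a 141-residue product.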
FIG. 5. Sequence homologies between sugar-binding proteins. The sequences of the maltose (MBP; this work)-, ribose (RBP; Groarke et al., 1983)-, galactose (GBP; Mahoney et al., 1981)-, and arabinose (ABP; Hogg and Hermodson, 1977)-binding proteins were compared using (i) dot matrices (Gibbs and McIntyre, 1970)⁷ and (ii) an adaptation of the program described by Korn et al. (1977) to protein sequence (Brutlag et al., 1982). This analysis revealed two main regions of homology noted A and B and allowed the alignment of the four protein sequences. This alignment is shown schematically in the upper part of the figure. The A and B regions are represented by solid lines, with their coordinates in the sequences (roman numbers) and the lengths of the polypeptide segments involved (italicized numbers, in numbers of residues). Notice that the distances between A and B are smaller in RBP (19 residues less) and in ABP (4 residues less) than in GBP or MBP. The C-terminal end of MBP would be shorter than those of the three other proteins, while the NH2-terminal end of MBP would be longer by over 80 amino acid residues. The lower part of the figure compares the sequences of the A and B regions. The amino acids occurring at the same position in at least three of the four proteins are written as capitals in the one-letter code. The boxed amino acids are either identical or belong to the same class of conservative replacement: A/P; D/E/N; F/Y/W; I/L/M/V; K/R; S/T (Dayhoff, 1972). More than 45% of the residues are boxed in region A and more than 39% in B. This comparison extends to MBP the alignment between ABP, GBP, and RBP previously published (Argos et al., 1981).
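The replacement classes listed in the legend to Fig. 5 lend themselves to a small scoring routine. The sketch below is a minimal illustration, assuming that a column counts as "boxed" only when every aligned residue is identical or all residues fall in a single class; the legend does not say whether a subset of the four proteins suffices. The four aligned strings are invented for the example and are not the A or B regions of the binding proteins, and the function names are hypothetical.

```python
# Conservative replacement classes quoted in the legend (Dayhoff, 1972).
CLASSES = ["AP", "DEN", "FYW", "ILMV", "KR", "ST"]
CLASS_OF = {aa: index for index, group in enumerate(CLASSES) for aa in group}


def is_boxed(column: str) -> bool:
    """True if all residues in an alignment column are identical
    or all belong to the same replacement class."""
    residues = set(column)
    if "-" in residues:                 # gaps are never boxed in this sketch
        return False
    if len(residues) == 1:
        return True
    classes = {CLASS_OF.get(aa) for aa in residues}
    return None not in classes and len(classes) == 1


def percent_boxed(aligned: list[str]) -> float:
    """Percentage of boxed columns in a set of equal-length aligned sequences."""
    columns = ["".join(col) for col in zip(*aligned)]
    return 100.0 * sum(is_boxed(col) for col in columns) / len(columns)


# Hypothetical 5-column alignment of four sequences (illustration only).
toy_alignment = ["KDLAV", "RELPI", "KNMAG", "RDVPL"]
print(percent_boxed(toy_alignment))
```

In the toy alignment the first four columns fall within single classes (K/R, D/E/N, I/L/M/V, A/P) and the last does not, because glycine belongs to none of the listed classes.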
TABLE II
Complementation and titration by plasmid pPD1
The indicated recA host strains were transformed with plasmids pPD1, pPD2, and pBR322 and streaked on MacConkey maltose indicator agar containing 100 µg/ml of ampicillin. The coloration of the colonies is symbolized by + and -, going from intense red (+++), indicating strong fermentation of maltose, to red (++), pink (+), and white (-), indicating no fermentation of maltose. Similar results were observed by looking at growth on minimal maltose plates; however, on this medium, strain PD5 (pPD1) presented irregular colonies possibly indicating some growth inhibition.

Host strain | Relevant marker | pPD1 | pPD2 | pBR322
PD2 | Wild type | + | + | +++
PD3 | ΔmalE444 | + | - | -
PD4 | malTp1 Tp7 | +++ | +++ | +++
PD5 | malTp1 Tp7 ΔmalE444 | ++ | - | -
FIG. 4. Secondary structure of the malE mRNA. The numbers above the line correspond to the nucleotide sequence of the malE gene; the numbers below the line to the amino acid sequence of the MalE mature protein. The stem and loop structures that could form in the malE mRNA having calculated energy of formation ΔG < -10 kcal/mol (Tinoco et al., 1973) have been numbered from I to VII. The two palindromic sequences that form the stem of each structure are represented by two black boxes; their coordinates in the nucleotide sequence and the ΔG for the structures are as follows: I, 231-251/278-301, ΔG = -17.1; II, 320-343/351-371, ΔG = -14.4; III, 644-666/683-705, ΔG = -17.0; IV, 773-787/804-825, ΔG = -17.5; V, 1046-1071/1079-1106, ΔG = -17.9; VI, 1024-1069/1080-1126, ΔG = -28.3; VII, 1090-1119/1128-1157, ΔG = -15.1.

⁷ W. Saurin, unpublished programs.

The fact that 3 of the methionine residues as well as 4 of the arginine residues are clustered in the last 55 residues (of 370) of the protein made the sequence determination more difficult. Three large peptides (CB1, 148 residues; CB4, 97 residues; and CT3, 218 residues) were obtained from the cyanogen bromide and tryptic digests. The Staphylococcus
protease digest yielded on the average much smaller peptides since the protein contains 27 glutamic acid residues. These peptides were therefore used to complete the sequence of the large peptides, and in particular, the SP peptides isolated from the 35S-protein were used to order the CB peptides. Some subfragmentations were necessary to complete the sequence determinations of the large peptides.

Cleavage with Staphylococcus protease yielded a complex mixture of peptides (Table IIIa), which might be due to some residual structure of the protein. Actual splitting would be expected in only 19 of the 27 glutamic acids since there are three Glu-X-Pro sequences, one Glu-Pro, two Glu-Glu, and one Glu-Glu-Glu sequence in the protein. Staphylococcus protease does not split the two types of proline sequences and is not an exopeptidase, so it does not release free glutamic acid. Peptides isolated from the Staphylococcus protease digests were frequently sequenced to the carboxyl terminal.

The residues that were not identified by protein sequence analysis are underlined in Fig. 3. The peptides used to deduce the sequence are indicated in Fig. 1a of the Miniprint. Most of the residues were sequenced at least three times from different peptides.

DISCUSSION

The complete nucleotide sequence of the malE gene was determined. It encodes a precursory polypeptide of 396 amino acids; the 26 NH2-terminal amino acids correspond to the signal peptide (Bedouelle et al., 1980), and the 370 remaining amino acids correspond to the mature MBP. The amino acid sequence was obtained both by direct sequencing of most of the protein and by DNA sequencing. The molecular weight of the mature protein, 40,661, predicted from the sequence is in good agreement with the experimental values, 44,000 and 37,000, obtained by equilibrium centrifugation and polyacrylamide gel electrophoresis in the presence of SDS (Kellermann and Szmelcman, 1974). MBP contains 45% nonpolar, 29% uncharged polar, 14% acidic, and 12% basic amino acids. This distribution is similar to that found in other binding proteins (see Groarke et al., 1983, and references therein).

Intermediate polypeptides transiently accumulate during the synthesis of MBP due to drastic reductions in the rate of elongation at specific sites of the mRNA. These pauses occur about 25 and 50 amino acids upstream of the C-terminal end of the protein (Randall et al., 1980). We did not find any accumulation of rare codons in the malE gene that could account for these pauses. A search for mRNA secondary structures using the SEQ program (Brutlag et al., 1982) revealed seven stable stem and loop structures (ΔG < -10 kcal/mol) along the malE message. Three of these structures, including the most stable one (ΔG = -28 kcal/mol), are clustered at the end of the gene. They could transiently arrest the ribosome at the observed sites (Fig. 4) and thus be responsible for at least some of the observed pauses.

Comparison of the primary sequence of the maltose-binding protein with that of other bacterial periplasmic binding proteins revealed two main regions of homologies (Fig. 5, A and B). Alignment of the sequences based on these two regions may correspond to structural similarities (Argos et al., 1981; Vyas et al., 1983) and suggests in particular that MBP may have an extra segment of over 80 amino acid residues at its NH2 terminus compared to the three other proteins (legend to Fig. 5). The amino acid sequence of MBP should help elucidate the three-dimensional structure of this protein. The nucleotide sequence of the gene will allow a detailed genetic analysis of the structure, functions, and interactions in particular with its substrates and with the other cellular components that are involved in the transport of maltose and maltodextrins and in the chemotaxis towards these sugars.

Acknowledgments-We thank S. Froshauer, C. Lee, and J. Beckwith for communication of unpublished data and C. Chapon, J. Clark, and H. Shuman for bacterial strains. We thank W. Boos for the maltose-binding protein and for his continued interest in this project.

REFERENCES

Argos, P., Mahoney, W. C., Hermodson, M. A. & Hanei, M. (1981) J. Biol. Chem. 256, 4357-4361
Bavoil, P. & Nikaido, H. (1981) J. Biol. Chem. 256, 11385-11388
Bavoil, P., Wandersman, C., Schwartz, M. & Nikaido, H. (1983) J. Bacteriol. 155, 919-921
Bassford, P. J., Jr., Bankaitis, V. A., Rasmussen, B. A. & Ryan, J. P. (1984) Microbiology (Wash. D.C.) 8-12
Bedouelle, H. (1983) J. Mol. Biol. 170, 861-882
Bedouelle, H. & Hofnung, M. (1982) Mol. Gen. Genet. 185, 82-87
Bedouelle, H., Bassford, P. J., Jr., Fowler, A. V., Zabin, I., Beckwith, J. & Hofnung, M. (1980) Nature (Lond.) 285, 78-81
Bedouelle, H., Schmeissner, U., Hofnung, M. & Rosenberg, M. (1982) J. Mol. Biol. 161, 519-531
Boos, W. & Staehelin, L. (1981) Arch. Microbiol. 129, 240-246
Boyer, H. W. & Roulland-Dussoix, D. (1969) J. Mol. Biol. 41, 459-472
Brass, J. M., Boos, W. & Hengge, R. (1981) J. Bacteriol. 146, 10-17
Brutlag, D. L., Clayton, J., Friedland, P. & Kedes, L. H. (1982) Nucleic Acids Res. 10, 279-294
Casadaban, M. (1976) J. Mol. Biol. 104, 541-555
Chapon, C. (1982) EMBO J. 1, 369-374
Clément, J. M. & Hofnung, M. (1981) Cell 27, 507-514
Csonka, L. N. & Clark, A. J. (1979) Genetics 93, 321-343
Dayhoff, M. O. (1972) Atlas of Protein Sequence and Structure, Vol. 5, p. 96, National Biomedical Research Foundation, Washington, D. C.
Dietzel, I., Kolb, V. & Boos, W. (1978) Arch. Microbiol. 179, 921-934
Ferenci, T. (1980) Eur. J. Biochem. 108, 631-636
Ferenci, T. & Randall, L. L. (1979) J. Biol. Chem. 254, 9979-9981
Fowler, A. V. & Zabin, I. (1982) Ann. Microbiol. (Paris) 133A, 49-53
Gibbs, A. J. & McIntyre, G. A. (1970) Eur. J. Biochem. 16, 1-11
Gilson, E., Nikaido, H. & Hofnung, M. (1982) Nucleic Acids Res. 10, 7449-7458
Groarke, J. M., Mahoney, W. C., Hope, J. N., Furlong, C. E., Robb, F. T., Zalkin, H. & Hermodson, M. A. (1983) J. Biol. Chem. 258, 12952-12956
Hayashi, H. & Ohba, M. (1982) Ann. Microbiol. (Paris) 133A, 195-197
Hengge, R. & Boos, W. (1983) Biochim. Biophys. Acta 737, 443-478
Heuzenroeder, M. W. & Reeves, P. (1980) J. Bacteriol. 141, 431-435
Hofnung, M. (1974) Genetics 76, 169-184
Hofnung, M., Hatfield, D. & Schwartz, M. (1974) J. Bacteriol. 117, 40-47
Hogg, R. W. & Hermodson, M. A. (1977) J. Biol. Chem. 252, 5135-5141
Kellermann, O. & Szmelcman, S. (1974) Eur. J. Biochem. 47, 139-149
Koiwai, O. & Hayashi, H. (1979) J. Biochem. (Tokyo) 86, 27-34
Korn, L. J., Queen, C. L. & Wegman, M. N. (1977) Proc. Natl. Acad. Sci. U. S. A. 74, 4401-4405
Laemmli, U. K. (1970) Nature (Lond.) 227, 680-685
Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular Cloning, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY
Neuhaus, J. M., Schindler, H. & Rosenbusch, J. P. (1983) EMBO J. 2, 1987-1991
Quiocho, F. A., Meador, W. E. & Pflugrath, J. W. (1979) J. Mol. Biol. 133, 181-184
Raibaud, O., Clément, J. M. & Hofnung, M. (1979) Mol. Gen. Genet. 174, 261-267
Randall, L., Josefsson, L. G. & Hardy, S. J. S. (1980) Eur. J. Biochem. 107, 375-379
Randall-Hazelbauer, L. & Schwartz, M. (1973) J. Bacteriol. 116, 1436-1446
Richarme, G. (1982) Biochem. Biophys. Res. Commun. 105, 476-481
Schwartz, M., Kellermann, O., Szmelcman, S. & Hazelbauer, G. (1976) Eur. J. Biochem. 71, 167-170
Shuman, H. A. (1982a) Ann. Microbiol. (Paris) 133A, 153-159
Shuman, H. A. (1982b) J. Biol. Chem. 257, 5455-5461
Shuman, H. A. & Silhavy, T. J. (1981) J. Biol. Chem. 256, 560-562
Shuman, H. A., Silhavy, T. J. & Beckwith, J. R. (1980) J. Biol. Chem. 255, 168-174
Silhavy, T. J., Brickman, E., Bassford, P. J., Jr., Casadaban, M. J., Shuman, H. A., Schwartz, V., Guarente, L., Schwartz, M. & Beckwith, J. (1979) Mol. Gen. Genet. 174, 249-259
Szmelcman, S., Schwartz, M., Silhavy, T. & Boos, W. (1976) Eur. J. Biochem. 65, 13-19
Tinoco, I., Jr., Borer, P. N., Dengler, B., Levine, M., Uhlenbeck, O. C., Crothers, D. M. & Gralla, J. (1973) Nature New Biol. 246, 40-41
Wandersman, C., Schwartz, M. & Ferenci, T. (1979) J. Bacteriol. 140, 1-13
Zukin, S. (1979) Biochemistry 18, 2139-2145
Vyas, N. K., Vyas, M. N. & Quiocho, F. A. (1983) Proc. Natl. Acad. Sci. U. S. A. 80, 1792-1796
Additional references are found below.
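The pause-site argument made in the Discussion can be checked against the stem coordinates listed in the legend to Fig. 4. The sketch below converts the first and last nucleotide of each structure into a distance, in residues, from the C-terminal end of the mature protein. It assumes that nucleotide 1 is the first base of the initiation codon and, as an arbitrary working definition, takes "the end of the gene" to mean the last 180 nucleotides of the coding sequence. The names `STEMS`, `END_WINDOW_NT`, and `residues_from_c_terminus` are illustrative only.

```python
SIGNAL_RESIDUES = 26   # signal peptide length (from the text)
MATURE_RESIDUES = 370  # mature MBP length (from the text)
CODING_NT = 3 * (SIGNAL_RESIDUES + MATURE_RESIDUES)  # 1188 nt, stop codon excluded

# name: (first nucleotide, last nucleotide, dG in kcal/mol), from the Fig. 4 legend
STEMS = {
    "I": (231, 301, -17.1),
    "II": (320, 371, -14.4),
    "III": (644, 705, -17.0),
    "IV": (773, 825, -17.5),
    "V": (1046, 1106, -17.9),
    "VI": (1024, 1126, -28.3),
    "VII": (1090, 1157, -15.1),
}


def residues_from_c_terminus(nt: int) -> int:
    """Residues separating the codon that contains `nt` from the last residue."""
    codon = (nt - 1) // 3 + 1                  # 1-based precursor codon index
    mature_position = codon - SIGNAL_RESIDUES  # position in the mature protein
    return MATURE_RESIDUES - mature_position


most_stable = min(STEMS, key=lambda name: STEMS[name][2])

# Assumption: a structure is "at the end of the gene" if it starts in the last 180 nt.
END_WINDOW_NT = 180
near_end = [name for name, (first, _last, _dg) in STEMS.items()
            if first > CODING_NT - END_WINDOW_NT]

print("most stable:", most_stable)
for name in near_end:
    first, last, dg = STEMS[name]
    print(name, dg, residues_from_c_terminus(first), residues_from_c_terminus(last))
```

Under these assumptions structures V, VI, and VII are selected, and together they cover a region running from about 54 to about 10 residues before the C terminus, which includes the pause sites reported about 50 and 25 residues from the end.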
TABLE Ia
Amino acid composition of CB peptides (CB1 to CB7)

TABLE IIa
Amino acid compositions of tryptic peptides
Residues (number of residues): 1-66 (66), 67-98 (32), 99-316 (218), 317-344 (28), 345-354 (10), 355-367 (13), 368-370 (3)

TABLE IIIa
Amino acid composition of SP peptides
Residues (number of residues): 23-78 (56), 112-138 (27), 173-214 (42), 215-274 (60), 282-310 (29), 323-370 (48)

ND, not determined. The numbers in parentheses are from the sequence.