DOI: 10.1002/cbic.201000667

Coupling Proteomics and Transcriptomics for the Identification of Novel and Variant Forms of Mollusk Shell Proteins: A Study with P. margaritifera

Sophie Berland,*[a] Arul Marie,[b] Denis Duplat,[a] Christian Milet,[a] Jean Yves Sire,[c] and Laurent Bédouet[a]

Shell matrix proteins from Pinctada margaritifera were characterized by combining proteomics analysis of shell organic extracts with transcript sequences obtained both from the shell-forming cells by using the suppression subtractive hybridization method (SSH) and from an expressed sequence tag (EST) database available from Pinctada maxima mantle tissue. Some of the identified proteins were homologous to proteins reported in other mollusk shells, namely lysine-rich matrix proteins (KRMPs), shematrins and molluscan prismatic and nacreous layer 88 kDa (MPN88). Sequence comparison within and among Pinctada species pointed to intra- and interspecies variations relevant to polymorphism and to evolutionary distance, respectively. In addition, a novel shell matrix protein, linkine, was identified. BLAST analysis of the peptide sequences obtained from the shell of P. margaritifera against the EST database revealed the presence of additional proteins: two proteins similar to the Pif97 protein that was identified in the shell of P. fucata, a chitinase-like protein previously identified in Crassostrea gigas, two chitin-binding proteins, and two incomplete sequences of proteins unknown so far in mollusk shells. By combining proteomics and transcriptomics analysis, we demonstrate that all these proteins, including linkine, are targeted to the shell. Retrieval of motif-forming sequences, such as chitin-binding domains, with functional annotation from several peptides nested in the shell could indicate protein involvement in shell patterning.

Introduction

Mollusk shell is a composite material made of crystallized calcium carbonate and an organic matrix that patterns the mineralized network, crystal habit and microarchitecture. In Pinctada margaritifera, a bivalve, the shell is made of two well-separated, superimposed layers composed of distinct crystallographic polymorphs, namely, prismatic calcite and aragonite sheets of nacre. The organic matrix is located both between and inside the mineral crystals in the two layers, and incorporates different proteins, including glycoproteins, and polysaccharides.[1, 2] These proteins are synthesized by the pallial cells and undergo maturation in the shell matrix by a mechanism such as supramolecular arrangement.[3] Later, the shell matrix formation might involve solidification by dehydration.[4] In order to understand the role of proteins in the biomineralization processes, an important step is to gain knowledge at the molecular level about the proteins present in the shell matrix. However, the mature organic framework of the shell is resistant to solubilization, which makes efficient extraction of the incorporated proteins considerably challenging.[5, 6] Although the interaction between the organic matrix and mineral crystals was described some decades ago,[7] characterization of the macromolecules present in the shell matrix was only undertaken in the mid-1990s.[8] During the last fifteen years various strategies have been developed in order to characterize the organic components of the mollusk shell matrix, but to date only a few proteins and peptides have been characterized in bivalves, gastropods and cephalopods.[9–12]


Until now, sequences of shell matrix proteins have been obtained by using various approaches, such as amino acid sequencing of proteins isolated from the shell,[13] DNA cloning strategies based on screening of cDNA libraries by using antibodies,[14] oligonucleotide probes,[15] search for glycine-rich sequences within a cDNA library,[16] RACE (rapid amplification of cDNA ends) experiments from mantle cell poly-A mRNA[17] and PCR amplification by using degenerate oligonucleotide primers deduced from N-terminal peptide sequences.[18–20] These sequences have provided basic information on the primary structure of shell matrix proteins, mainly in gastropods and bivalves, and these findings have indicated that the protein edifice is generally made of a few prevailing amino acids, often organized into repeats.[14, 15, 18, 21, 22]

[a] Dr. S. Berland, Dr. D. Duplat, Dr. C. Milet, Dr. L. Bédouet
UMR BOREA (Biologie des Organismes et Ecosystèmes Aquatiques), MNHN/CNRS 7208/IRD 207, CP 26
43 rue Cuvier, 75231 Paris Cedex 05 (France)
Fax: (+33) 1-40-79-37-71
E-mail: [email protected]

[b] Dr. A. Marie
Département RDDM, FRE 3206 CNRS, Plateforme de Spectrométrie de Masse et de Protéomique
Muséum National d’Histoire Naturelle, 57 rue Cuvier, 75005 Paris (France)

[c] Dr. J. Y. Sire
UMR 7138, Equipe “Evolution et Développement du Squelette”
Université Pierre & Marie Curie, 7 quai St. Bernard, 75005 Paris (France)

© 2011 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim

ChemBioChem 2011, 12, 950 – 961

Recently, several cDNA libraries have been created from the mantle-specific cells involved in the expression of shell matrix proteins, and their sequencing (transcriptomics) has allowed the identification of putative shell matrix protein genes.[16, 23–25] Adapting the above-mentioned approach, Jackson and co-workers[25] initiated a comparative genomic analysis of proteins involved in shell biomineralization in bivalves and gastropods. Expressed sequence tags (ESTs) obtained from cDNA libraries constructed from the mantle tissue cells of Pinctada maxima, Pinctada margaritifera and Haliotis asinina theoretically allow the identification of shell matrix proteins in a few steps.[25] Putative shell matrix proteins were identified after sequencing and translation of EST clones based on the identification of the signal peptides. However, this approach might lead to proteins that are merely exported towards the pallial space being identified as shell matrix proteins. Likewise, comparison of the sequences deduced from the EST clones with the shell proteins that are annotated in databases did not allow the discovery of novel proteins. An efficient approach to identifying novel shell matrix proteins from the nucleotide databases is to couple proteomics studies of the shell organic matrix extracts with transcriptomics experiments. Proteomics approaches rely mainly on the identification of proteins with peptide fragments generated in the gas phase in a mass spectrometer to interrogate the annotated protein or nucleotide databases by using bioinformatics tools. For example, this strategy was applied to identify nacrein, MSI60, and perline in the acid-insoluble organic matrix of the nacre in P. margaritifera.[26] An advantage of this approach is that it permits the generation of peptides with the optimal number of amino acids from the acid-insoluble shell organic matrix obtained after mild decalcification of the shell powder.
However, interrogation of databases by using proteomics data does not always lead to the identification of proteins because shell matrix proteins predominantly consist of motifs that are unique and are not found in other cellular systems. When database searches fail, de novo sequencing of peptides generated from shell organic matrix is used to obtain new sequences, as reported in the proteomics analyses of the nacre in Unio pictorum and Nautilus macromphalus.[11, 12]
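The idea of interrogating a nucleotide database with a peptide sequence can be illustrated with a minimal Python sketch. It translates a nucleotide sequence in all six reading frames and looks for an exact peptide match. This is only an illustration under stated assumptions: the study itself relied on BLAST and MASCOT, not on exact substring matching; the function names and the toy EST below are hypothetical (the toy sequence is made up and is not the real linkine cDNA); and Leu and Ile are collapsed because they have the same mass and are therefore not distinguished by standard MS/MS.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case A/C/G/T string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate from position 0; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frames(dna):
    """Return {frame label: protein} for frames +1..+3 and -1..-3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def normalize(seq):
    """Collapse Ile into Leu, since the two residues are isobaric."""
    return seq.upper().replace("I", "L")


def match_peptide(peptide, dna):
    """Return (frame, position) of the first exact hit in each frame."""
    query = normalize(peptide)
    hits = []
    for frame, protein in six_frames(dna).items():
        pos = normalize(protein).find(query)
        if pos != -1:
            hits.append((frame, pos))
    return hits


# Hypothetical toy EST: one leading base, then codons spelling KPCTVTDPK.
toy_est = "G" + "AAACCTTGTACTGTTACTGATCCTAAA"
print(match_peptide("KPCTVTDPK", toy_est))
```

Exact matching is deliberately strict; real searches tolerate substitutions between species, which is why a similarity search such as BLAST is the appropriate tool when the peptide and the database come from different Pinctada species.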

In order to identify the proteins involved in shell formation of P. margaritifera, we constructed a specific transcript library from shell-forming cells using suppression subtractive hybridization (SSH). This method allows the amplification of transcripts specifically produced by the tissue of interest, that is, the shell-forming cells in the present study, and a restricted cDNA probe library can be built.[27] Our library is composed of 72 differentially expressed cDNA fragments[23] that were used in this study to identify proteins involved in shell formation. Protein homologues of shematrins, KRMPs and MPN88 were identified and also a novel putative shell protein was found, which was named linkine. Proteomics analysis of the organic matrix of the P. margaritifera shell confirmed that these proteins are located in the shell, and can be considered as shell matrix components. In addition, using the EST library constructed from the mantle tissue of P. maxima,[25] we identified seven additional proteins from P. margaritifera shell organic matrix, including two novel shell proteins without any homology with known proteins, a chitinase-like protein and two chitin-binding proteins.

Results

Identification of shell matrix proteins of P. margaritifera by coupling transcriptomics and proteomics

Lysine-rich matrix protein: Interrogation of the SSH library gave three sequences that share similarity with the KRMP protein sequences of Pinctada fucata (Figure 1). These full-length sequences are considered as polymorphic variants of P. margaritifera KRMP and were named KRMP-4, -5 and -6 (GenBank accession numbers: EF183517, EF183518 and EF183519, respectively). The sequence KRMP-4 has an open reading frame (ORF) that encodes a protein of 136 amino acids (aa) with an estimated molecular weight (MW) of 12.6 kDa and an isoelectric point (pI) of 9.3. Sequence KRMP-5 encodes a protein of 144 aa with a MW of 13.2 kDa and a pI of 9.1. Sequence KRMP-6 encodes a 133 aa protein with a MW of 12.2 kDa and a pI of 9.2. The three KRMP variants possess an identical signal peptide

Figure 1. Amino acid sequence alignment of the three KRMP variants (KRMP4–6) identified from the in-house SSH library of P. margaritifera (P.mar) mantle tissue cDNAs with that of P. fucata (P.fuc; KRMP1, -2, -3, -4; accession numbers: AB507416, AB507420, AB507423, AB507428, respectively). The shaded region represents the signal peptide. The peptides detected during proteomics analyses of the shell organic matrix are shown in boxes; (·): residue identical to the top sequence; (-): insertion or deletion.


www.chembiochem.org


containing 20 residues, and they differ from each other by a few amino acid substitutions and in some repeats. More than 60 % of the P. margaritifera KRMP sequence is composed of only three residues (Gly: 39–42 %, Tyr: 18–19 %, and Trp: 7 %) and its primary sequence contains three regions: a tryptophan-rich region (aa 21–61 in our alignment), a glycine/tyrosine-rich region (aa 66–142) that includes several GnY repeats (n = 2, 3 or 4 along the sequence), and a specific C-terminal tail formed by the last five residues, RPKKY (Figure 1). The hexapeptide YWWCLK, which was detected only in the trypsin digest of the whole-shell organic extracts (suggesting a prismatic location), was identified in the KRMP sequences of P. margaritifera from the SSH library (Figure 1). BLAST analysis confirmed that this peptide was specific to P. margaritifera KRMP (Table 1).

Shematrins: The amino acid sequences deduced from the sequenced clones of our SSH library showed similarity with the shematrin (shem) sequences of P. fucata, particularly shem1 and shem2 (GenBank accession numbers: BAE93433 and BAE93434, respectively[16]), and also with the sequences of P. maxima shematrins, particularly shem1 and shem3 (accession numbers: AB429365 and AB429367, respectively; Figure 2). We named the shematrins observed in P. margaritifera shem8 and shem9. As the shem8 sequence lacked the 5’ region, including the start codon, the sequence was completed with RACE PCR by using a reverse primer designed from the partial cDNA sequence (Table 2). A full-length mRNA sequence of 1417 bp was obtained (accession number: EF160119) encoding a protein of 416 residues (MW = 35.6 kDa; pI = 9.1), in which the first 19 amino acids constitute the signal peptide (Figure 2). The shem8 sequence is rich in Gly (46 %), Leu/Ile (12 %) and Tyr (10 %) residues. This glycine-rich protein possesses several GnY motifs (n = 2 or 3 along the sequence; aa 167–310) and an isoleucine-rich domain with a GnI motif (n = 1, 2, 3 or 4; aa 367–409). The C-terminal region ends with a typical basic RKKKY amino acid sequence. Alignment with other molluscan shematrins indicated that shem8 is close to shem2 of P. fucata and to shem1 of P. maxima (Figure 2).
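The residue percentages and GnY repeats quoted for the KRMPs and shematrins are simple sequence statistics. The following Python sketch shows one way such figures could be computed; the function names and the toy sequence are hypothetical, and this is not the software used in the study.

```python
import re
from collections import Counter


def composition(seq):
    """Return {residue: percentage of the sequence}, rounded to 0.1 %."""
    counts = Counter(seq.upper())
    total = sum(counts.values())
    return {aa: round(100 * n / total, 1) for aa, n in counts.most_common()}


def find_gny(seq, n_values=(2, 3, 4)):
    """Return (start index, n) for each run of exactly n Gly followed by Tyr.

    The regex matches the whole Gly run that precedes a Tyr, so a run that
    is longer or shorter than the requested n values is filtered out.
    """
    hits = []
    for m in re.finditer(r"(G+)Y", seq.upper()):
        n = len(m.group(1))
        if n in n_values:
            hits.append((m.start(), n))
    return hits


# Hypothetical toy sequence (not a real KRMP or shematrin).
toy = "WGGYAGGGYKGGGGYGY"
print(composition(toy))
print(find_gny(toy))  # the trailing GY (n = 1) is not reported
```

For the GnI motif described in shem8, the same function could be adapted by replacing the final `Y` in the pattern with `I`.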

Similarly, the shem9 cDNA sequence was completed by RACE PCR by using a reverse primer (Table 2). We obtained a full-length mRNA sequence of 1142 bp (accession number: EF160120) encoding a protein with 332 aa. The signal peptide was found to contain 16 aa and the mature protein was basic (MW = 30.5 kDa; pI = 9.0). The shem9 protein is mainly composed of Gly (37 %), Tyr (18 %) and Leu/Ile (8.2 %) residues, and has a tyrosine-rich domain with a GnY motif (n = 1, 2 or 3; aa 159–282). The C-terminal region displays a basic RRKKY amino acid sequence. Alignment with other shematrins indicated that shem9 is close to shem1 of P. fucata and to shem3 of P. maxima (Figure 2). Three peptides found in the whole-shell extracts of P. margaritifera were attributed to the shem8 sequence, which represents 5 % of the protein sequence (Figure 2, Table 1). Two other peptides identified as belonging to the shem9 sequence represented 9 % of the protein sequence (Figure 2, Table 1). These shematrin-derived peptides were not detected in the nacre organic extracts; they are probably located in the prismatic layer.

Molluscan prismatic and nacreous layer protein (MPN): The putative orthologue of MPN88 described in P. fucata (accession number: BAH05007) was not found in the P. margaritifera SSH library. The full-length sequence resulted from 5’ and 3’ RACE reactions (Table 2). The N-terminal region of MPN protein was obtained after 5’ RACE by using the reverse primer 5’-GGG TCG AAC TCC AAG GTC ACC AG-3’, which led to a product of 631 bp containing an ORF of 154 aa with a signal peptide of 21 residues (Figure 3). A 3’ RACE procedure was performed by using the forward specific primer indicated in Table 2, and yielded a cDNA product of 2257 bp (Figure 3) for a continuous ORF of 131 aa in the C-terminal region with a stop codon located 100 bp upstream of the poly-A tail. The sequence located between the 5’ and 3’ ends was found by using specific primers designed from the two sequences obtained by RACE PCR (Table 2); a cDNA product of 2100 bp was amplified and the full-length sequence completed. The P. margaritifera MPN encodes a basic protein with 695 aa (MW = 64.24 kDa; pI = 11.10). This sequence was further validated as P. margaritifera MPN by means

Table 1. List of MS/MS peptides obtained after trypsin digestion of P. margaritifera acid-insoluble organic matrix from the whole shell or nacre layer. They were used to identify the shell matrix protein sequences found in the SSH library or by RACE PCR amplification. Peptides matching the P. margaritifera linkine sequence were identified from the whole shell and nacre organic matrix extracts, while peptides matching KRMP, shematrin and MPN sequences were not found in the nacre layer extracts; this suggests a prismatic location.

Protein   Peptide sequence            Markers (100 % identities)   Charge   Experimental peptide mass
KRMPs     YWWCLKR                     KRMP4, -5, -6                2+       954.48
shem8     AFGLGGLSPATR                shematrin-8                  2+       1145.60
shem8     GAAQGAATLSALGVASGRPSR       shematrin-2 (P.fuc)          3+       1896.99
shem8     VSGVSVGTGGGR                Gly-rich protein (P.fuc)     2+       1031.53
shem9     GAAQGAATLSALGVASGVPSR       shematrin-9                  2+       1839.99
shem9     VSGSSIGIGGGR                shematrin (P.max)            2+       1045.56
MPN       VPLLPLGSLGGFGQGQTGQQTGAGR   MPN88                        3+       2395.36
linkine   KPCTVTDPK                   linkine                      2+       1044.52
linkine   TCPEGCFCPNIPR               linkine                      2+       1606.70
linkine   IPGVFGMDAYYCCK              linkine                      2+       1679.74
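The experimental peptide masses in Table 1 can be compared with theoretical monoisotopic masses computed from the peptide sequences. The Python sketch below sums standard monoisotopic residue masses and adds one water molecule. Two points are assumptions rather than statements from this section: that the listed values are neutral monoisotopic masses, and that cysteines were carbamidomethylated (+57.02146 Da) before digestion. The function names are hypothetical.

```python
# Standard monoisotopic residue masses (Da).
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056            # H2O added for the free N and C termini
PROTON = 1.00728            # mass of a proton
CARBAMIDOMETHYL = 57.02146  # assumed Cys alkylation, not stated in the text


def peptide_mass(peptide, alkylated_cys=False):
    """Neutral monoisotopic mass of a linear peptide."""
    peptide = peptide.upper()
    mass = WATER + sum(MONO[aa] for aa in peptide)
    if alkylated_cys:
        mass += CARBAMIDOMETHYL * peptide.count("C")
    return mass


def mz(mass, charge):
    """m/z of the [M + zH]z+ ion for a neutral mass and charge state z."""
    return (mass + charge * PROTON) / charge


print(round(peptide_mass("AFGLGGLSPATR"), 2))                  # Table 1: 1145.60
print(round(peptide_mass("KPCTVTDPK", alkylated_cys=True), 2)) # Table 1: 1044.52
print(round(mz(peptide_mass("AFGLGGLSPATR"), 2), 2))           # 2+ precursor
```

Under these assumptions the shem8 peptide AFGLGGLSPATR comes out at about 1145.62 Da and the linkine peptide KPCTVTDPK at about 1044.53 Da, both within a few hundredths of a dalton of the values in Table 1.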

Identification of Mollusk Shell Proteins

Figure 2. Amino acid sequence alignment of the two shematrins (shem8 and -9) identified from the in-house SSH library of P. margaritifera (P.mar), with the homologous shematrins from P. fucata (P.fuc) and P. maxima (P.max). The signal peptides are shaded. The peptides from proteomics analyses of the shell organic matrix are boxed (Table 1). Symbols are the same as described for Figure 1.

Table 2. Primer sequences designed for the amplification of cDNA sequences of the shell matrix protein genes of P. margaritifera. Either 5’ RACE PCR alone or 5’ and 3’ RACE PCRs were used for the elongation of incomplete sequences identified in the in-house SSH library. For MPN transcripts, RACE PCRs were performed in order to obtain the 5’ and 3’ ends of the mRNA, then nested forward and reverse primers were designed to amplify a large PCR product by using primer walking sequencing.

Protein   Method                               End to be amplified   Forward primer           Reverse primer
shem8     RACE                                 5’                    GR5’ (manufacturer)      CGCCAATTGGTCCTGCACCGCC
shem9     RACE                                 5’                    GR 5’nested              CGTATCCGTAGCCATATGAAGGAACAC
MPN       RACE                                 5’                    GR 5’nested              GGGTCGAACTCCAAGGTCACCA
MPN       RACE                                 3’                    GATGCGACCACCAGGACTAGG    GR3’ (manufacturer)
MPN       PCR for primer walking sequencing    –                     GTGGCCGCCGAAAATTTCCAC    CGCCGGCTCCTTGTGTAGAAC
linkine   RACE                                 5’                    GR 5’nested              CCCGTCTGACAGCCATCCACAG

of alignment with P. fucata MPN88 (Figure 3) and was deposited in GenBank (accession number: HQ259055). Four residues (Gly: 30 %, Gln: 11 %, Pro: 9.5 %, and Ser: 9.5 %) represent 60 % of the protein sequence of P. margaritifera MPN. The signal peptide is putatively cleaved after Ala21, and the mature protein that is secreted in the extracellular space starts with four glutamines followed by five glycines. The sequence contains an unusual methionine-rich region with


S. Berland et al.

Figure 3. Amino acid sequence alignment of MPN from P. margaritifera (P.mar) obtained after 5’ and 3’ RACE PCR with total mRNA from mantle tissue, and of MPN88 (BAH05007) of P. fucata (P.fuc). The signal peptide of MPN is shaded. The MS/MS peptide obtained during proteomics analyses of the shell organic matrix is boxed. Symbols are as described for Figure 1.

several repeats of Gly-Met-Pro residues (aa 648–703 in our alignment), and several repeats of GnSnP/QF (n = 2 or 3) are observed within the C-terminal region (aa 448–695). A peptide containing 25 aa identified in the whole-shell matrix of P. margaritifera corresponded to an MPN region covering 3 % of the protein (Figure 3, Table 1). BLASTp analysis of this peptide showed its similarity with P. fucata MPN88. However, no other peptide matched the P. margaritifera MPN sequence. This can be explained by the low frequency of Lys and Arg. Indeed, P. margaritifera MPN contains only seven sites available for trypsin digestion (positions 32, 44, 92, 104, 136, 243 and 467, in our alignment); some of them generate large peptides that could not be analyzed by the mass spectrometer used in our experiments. Only the trypsin-derived peptide starting at position 44 was identified in the trypsin digest of the shell extract. The MPN protein orthologous to MPN88 was found within the whole-shell digest but not in the organic fraction of nacre; this suggests a prismatic location for this protein.

Linkine, a novel mollusk shell protein: The linkine sequence was only partially included in the SSH library constructed from P. margaritifera mantle tissue. The full-length sequence was obtained after 5’ RACE PCR by using a reverse specific primer designed from the partial sequence (Table 2). A 797 bp product was obtained. Its translation revealed a continuous ORF of 111 aa, with a typical signal peptide composed of 25 residues, that ends with a stop codon located 101 bp upstream of the reverse primer, the latter being located in the 3’-UTR region (data not shown). BLASTp analysis against nonredundant protein databases revealed that this short protein does not display consistent homology with other proteins. It was named linkine (accession number: ABO87300). The mature protein (MW = 9.5 kDa; pI = 9.4) is composed of 86 residues and is basic due to the high frequency of Lys (17.4 %), which is unusual for a shell matrix protein. Pro (9.3 %) and Cys (9.3 %) are the two other main amino acid residues. The central region of the protein is characterized by the presence of several Thr residues, which could be sites for O-glycosylation. Three peptides covering 32 % of the protein sequence allowed the easy identification of linkine in both the whole shell and the nacreous organic matrix extracts of P. margaritifera (Figure 4, Table 1). This clearly indicates that linkine is a true shell protein.
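The two quantities used in the paragraphs above, the number of tryptic cleavage sites and the percentage of a protein covered by identified peptides, can be sketched in Python as follows. The cleavage rule used here (cut after Lys or Arg, but not when the next residue is Pro) is the common convention and is an assumption; the digestion settings used in the study are not given in this section. The function names and the toy protein are hypothetical.

```python
import re


def tryptic_sites(seq):
    """1-based positions of K/R after which trypsin is expected to cut.

    Assumes the common rule: no cleavage when the next residue is Pro.
    A K/R at the very C terminus is not counted as a cleavage site.
    """
    seq = seq.upper()
    return [
        m.start() + 1
        for m in re.finditer(r"[KR](?!P)", seq)
        if m.start() + 1 < len(seq)
    ]


def digest(seq):
    """Return the peptides of a complete in-silico tryptic digest."""
    seq = seq.upper()
    pieces, start = [], 0
    for pos in tryptic_sites(seq):
        pieces.append(seq[start:pos])
        start = pos
    if start < len(seq):
        pieces.append(seq[start:])
    return pieces


def coverage(protein, peptides):
    """Percentage of protein residues covered by at least one peptide."""
    protein = protein.upper()
    covered = [False] * len(protein)
    for pep in peptides:
        pep = pep.upper()
        i = protein.find(pep)
        while i != -1:
            covered[i:i + len(pep)] = [True] * len(pep)
            i = protein.find(pep, i + 1)
    return 100 * sum(covered) / len(protein)


# Hypothetical toy protein (not a real shell protein).
toy = "MKAAKPCRGGK"
print(tryptic_sites(toy))  # the K before P is skipped
print(digest(toy))
print(round(coverage(toy, ["AAKPCR"]), 1))
```

As a rough consistency check on the text, the three linkine peptides in Table 1 contain 9, 13 and 14 residues; 36 of 111 residues is about 32 %, which would match the reported coverage if it was calculated on the 111-residue precursor rather than on the 86-residue mature protein.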


Figure 4. Amino acid sequence of linkine, a novel lysine-rich protein located in the shell of P. margaritifera. The signal peptide is shaded. The MS/MS peptides obtained during proteomics analyses of the shell organic matrix are boxed.

Identification of shell matrix proteins from P. margaritifera by proteomics and screening of P. maxima mantle-derived EST library

In a BLAST analysis of the invertebrate EST database, the peptides obtained from whole-shell matrix matched numerous clones of the P. maxima mantle tissue EST library that was sequenced by Jackson and co-workers.[25] From these, we identified shematrin and linkine sequences (data not shown). In addition to these shell matrix proteins already identified in P. margaritifera as reported above, the peptide sequences from proteomics data matched other cloned sequences that were not found in our SSH library (Table 3).

Pif protein: Seven MS/MS peptides (28, 20, 17, 10, 9, 8 and 7 aa) obtained in the nacre layer extract of P. margaritifera matched two translated EST clones of P. maxima (GT283629 and GT278653; Table 3). The homologous peptides were located in the N-terminal region of these sequences, which displayed high sequence homology with the N-terminal region of the Pif protein of P. fucata (accession number: BAH97338;[3] Figure 5). The protein sequences deduced from the translated clones were similar, except for Gly166 and Glu173 in clone GT278653, which were substituted with Arg and Gln in clone

Table 3. List of the MS/MS peptides generated by trypsin digestion of the acid-insoluble organic fraction of the nacre layer of P. margaritifera that matched translated EST clone sequences from P. maxima mantle tissue and were then identified by using the MASCOT program.

P. maxima clones and protein annotation: GT283629 and GT278653, Pif97 protein; GT279532 and GT278440[a], chitin-binding domain of peritrophin-A (CBM-14); GT280780, CLP1; GT279894 and GT278041, novel proteins.

Peptide sequence                Charge   Experimental peptide mass
AMLMLVR                         2+       832.48
GLSIDDSQIR                      2+       1102.58
FNVPHVTLNLGGDIVDSEVR            3+       2180.17
AMLMLVR                         2+       832.46
FNVPHVTLNLGGDIVDSEVR            3+       2180.22
GTSPASSPSGVSPLATALVKPAPK        3+       2219.27
ECPLGMFWDQPLLLCR                2+       2033.99
LPGDVPCFDDR                     2+       1289.59
CIEGISVPACCPK                   2+       1489.70
YGYIVR                          2+       769.42
ECPLGMFWDQPLLLCR                2+       2049.98
ECPLGMFWDQPLLLCR                3+       2049.99
IYVPEVFEDMHLFER                 3+       1922.96
INIGIPFFGR                      2+       1132.66
NPLLLQQILSQSK                   2+       1480.89
ASAGSGNLMK                      2+       934.47
NPLLMMYLMK                      2+       1252.66
GGSGMGGILPLLMGEDAMK             2+       1832.91
VGLSMDAPDSLFPIK                 2+       1604.80
DGASSNVQATTQSAPISLLFSPQNIPDVK   3+       2984.56
MQAFIIFENEPILFNYR               3+       2160.26

[a] The MS/MS peptide that matched clone GT278440 was identified twice during RP-HPLC fractionation.

Figure 5. Amino acid sequence alignment of the two Pif variants deduced from clones GT283629 and GT278653, which were identified in P. maxima (P.max) EST library and in P. fucata (P.fuc) Pif sequence (accession number: BAH97338). The two cloned sequences are only partial sequences (N-terminal region) of P. maxima, and correspond to the Pif97 sequence of P. fucata. The five peptides that were generated during trypsin digestion of P. margaritifera nacre extracts, which allowed identification in P. maxima, are boxed. The signal peptide is shaded. For convenience, the complete sequence of P. fucata Pif, including residues 221–1000, is not shown. Symbols are as described for Figure 1.


GT278653, respectively. Both possessed the identical signal peptide of 22 aa. Alignment with the P. fucata Pif amino acid sequence indicated that the two peptides correspond to two polymorphic variants of incomplete P. maxima Pif sequences with 209 and 188 residues, versus 1009 aa in the coding sequence of P. fucata Pif protein (Figure 5). In addition, these peptides belong to the protein region housing the large Von Willebrand factor type A domain (VWFA) of Pif protein (aa 29–202), which could bind collagen in mammals.[28] The Pif protein also possesses two chitin-binding domains in its central region. The peptides from P. margaritifera cover 34.4 % of the largest P. maxima sequence.

Shell matrix peptides containing chitin-binding domains: Four peptides (27, 16, 11 and 6 aa) obtained from the nacre layer extract of P. margaritifera matched two translated clones (GT279532 and GT278440) found in the P. maxima EST library (Figure 6). These nucleotide sequences encode two peptides of 228 and 220 residues that are polymorphic variants of a gene encoding a single, larger but incomplete peptide of 289 aa (pI = 6.5). This protein region contains two domains similar to the chitin-binding peritrophin-A domain (Pfam: CBM-14 superfamily; aa 266–319 and 324–379 in our alignment; Figure 6, Table 3). This domain is found in chitin-binding proteins, and particularly peritrophic matrix proteins of chitinases.[29] Alignment of the two peptides with the P. fucata Pif97 amino acid sequence revealed similarities within the chitin-binding domains. In particular, most of the cysteines, which are characteristic of such domains and form disulfide bonds, were conserved (Figure 6). However, the large number of sequence variations between these peptides and the P. fucata Pif97 sequence did not allow us to conclude whether they belong to P. maxima Pif97 or to another closely related protein. The peptides from P. margaritifera covered 15.2 % of the sequence from P. maxima.

CLP1: Two peptides (15 and 10 aa) from the nacre layer of P. margaritifera matched the P. maxima EST clone GT280780, which encodes a partial amino acid sequence of 219 residues. This peptide exhibited high sequence similarity with chitinases (GH18 chitinase-like superfamily), particularly the chitinase-like protein 1 (CLP1) of Crassostrea gigas (accession number: CAI96028;[30] Figure 7). The peptides from P. margaritifera covered 11.4 % of the sequence of P. maxima (Table 3). Sequence alignment with C. gigas CLP1 indicated that the GT280780 sequence is homologous to the central region of this protein containing 472 residues (Figure 7).

New shell matrix proteins: The peptides from the whole-shell extract of P. margaritifera matched two P. maxima EST clones, GT279894 and GT278041, both encoding incomplete protein sequences, which neither exhibited sequence similarity with any other proteins in the databases nor possessed a known domain (Figure 8, Table 3). The first sequence (clone GT279894) leads to a basic peptide with 244 residues (pI = 9.3). The four P. margaritifera peptides (19, 13, 10 and 10 aa) covered 21.3 % of the P. maxima sequence. This peptide is mainly composed of four residues: Ala (17.2 %), Leu (16.8 %), Ser (16 %) and Gly (15.2 %). The sequence contains four Ala-rich regions (AXAXAX; X is either S, N, K or G) and repeats of seven amino acids, MKNPLLL (Figure 8). The second sequence (clone GT278041) leads to a peptide with 224 residues (pI = 6.7). The three P. margaritifera peptides (29, 17 and 15 aa) covered 27.2 % of the P. maxima sequence

Figure 6. Amino acid sequence alignment of the two putative chitin-binding peptides deduced from clones GT279532 and GT278440 identified in the P. maxima (P.max) EST library by proteomics and of P. fucata (P.fuc) Pif protein. The peptides generated during trypsin digestion of the nacre organic extract of P. margaritifera are boxed. The two chitin-binding domains are indicated in bold text. The cysteines putatively involved in disulfide bond formation are shaded. For convenience, the complete sequence of P. fucata Pif, including residues 521–1000, is not shown. Symbols are as described for Figure 1.


Figure 7. Amino acid sequence alignment of clone GT280780 obtained from P. maxima (P.max) EST library and CLP1 of C. gigas (C.gig). The two peptides identified in the nacre layer of P. margaritifera are boxed. Sequence alignment between the translated clone GT280780 from the P. maxima EST library.

Figure 8. The two novel, incomplete protein sequences deduced from clones GT279894 and GT278041 from the P. maxima (P.max) EST library and identified with the peptides (boxed) obtained from the P. margaritifera nacre layer extracts. The clone sequences lacked both a signal peptide and a stop codon. Note, the Ala-rich regions (italics) and the presence of some amino acid repeats (bold).

and Ser (9.4 %), Ala (8 %) and Val (6.7 %) are the three major residues represented.

Discussion

The present work brings into focus how combining proteomics and transcriptomics data helps to identify the proteins located within the shell matrix of the pearl oyster P. margaritifera. Indeed, to date, accurately characterizing the proteins incorporated within the shell matrix is a major challenge for understanding how these molecules are processed once they are secreted into the extracellular matrix and their function during shell formation and mineralization.

Proteomics analysis reveals protein targeting to the shell

Obtaining protein sequences after translation of a cDNA library constructed from mollusk mantle cells, and showing that they possess a signal peptide allowing extracellular secretion, are two important steps for understanding shell growth; nevertheless, this alone is not sufficient. In fact, are all these proteins really incorporated into the shell matrix or are they only secreted in the pallial space? In this study we show that combining proteomics and transcriptomics is a step towards answering this question clearly. In addition, this proteomics approach allowed the annotation of sequenced clones from a P. maxima EST library obtained from the mantle tissue. For instance, the RYWWCLKR peptide found within the shell matrix extracts favors the argument that KRMP, first identified in the cDNA library (in-house SSH library), was really exported to the shell matrix. KRMPs were identified first in P. fucata after cDNA sequencing[17] but no further study indicated that they were present within the shell matrix. Similarly, other peptides show that shematrin and MPN88 orthologues of P. fucata proteins are also exported to the P. margaritifera shell matrix. The same finding applies to P. margaritifera linkine: given that its sequence lacked the residues and/or repeat motifs that are characteristic of shell matrix proteins, its location within the shell matrix was doubtful. However, the MS/MS peptides generated from the nacre layer matrix of P. margaritifera shell revealed that linkine is a shell matrix protein. After interrogation of the NCBI EST database, we found that some peptides generated from P. margaritifera shell organic matrix matched various translated EST clones that were generated from P. maxima mantle tissue.[25] This demonstrated that the deduced proteins, or only partial sequences, should be considered as shell matrix components in the two species, P. maxima and P. margaritifera. Two P. maxima EST clones matched the N-terminal region of the Pif protein identified in P. fucata.[3] More precisely,

Ð 2011 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim

www.chembiochem.org

957

S. Berland et al. this region belonged to Pif97, a large peptide resulting from post-translational cleavage of Pif177 (Pif177 is cleaved in Pif97 and Pif80), the full-length protein deduced from the cDNA sequence. Pif was shown to bind aragonite crystals and the down-regulation of its gene arrested the growth of the nacreous layer; this suggests that Pif is a key molecule in the induction of aragonite crystal formation. Using immunolocalization, Suzuki and co-workers[3] demonstrated that Pif80 was located in the nacreous layer. Our findings, from proteomics and P. maxima EST clone screening, suggest not only the presence of Pif97 orthologues in the three Pinctada species but also they have well-conserved sequences. The Pif97 sequence also houses, in its central region, a chitin-binding domain. During our annotation of P. maxima EST library using the P. margaritifera peptides, three of them matched two clones the translated peptides of which were also putative chitin-binding sequences. These sequences showed some similarity with the homologous sequence of Pif97, but in view of our findings, there is no reason to consider these to be Pif97 sequences. However, we believed they could belong to a closely related protein with similar chitinbinding function. These results suggest that these proteins could be involved in the interaction with the chitin within the Pinctada shell. Chitin is a polysaccharide widely distributed in living organisms and is considered to be a fundamental template in biomineralization, as an alternative to collagen.[31] In bivalves, chitin is involved in shell matrix scaffolding and especially for nacre, in which it was proposed to make the core of the organic framework surrounding the aragonite tablets.[32] Recently, Jackson et al.[25] reported that the P. maxima EST library contained gene products associated with the production, modification or interaction with chitin. 
For example, CL355Contig1 has similarity with chitinase, and its level of expression is low (297 transcripts per million) when compared to shematrin (80 748 transcripts per million). In addition, we identified an EST clone homologous to a region of the CLP1 of C. gigas identified by Badariotti et al.[30] These authors showed that this protein was expressed in the mantle edge of adult oysters. It possesses a strong chitin-binding activity but does not hydrolyze it. Interestingly, CLP1 activates the proliferation of rabbit chondrocytes and stimulates the synthesis of glycosaminoglycans; this suggests that its functions are similar to a growth factor. Also, the authors supposed that CLP1 could be a shell protein involved in the control of shell formation. Our data confirm, by the proteomics annotation of P. maxima EST library, that P. maxima and P. margaritifera CLP1 orthologues are present within the shell. The P. margaritifera peptide sequences also allowed the identification of two new, incomplete proteins from the P. maxima EST library. These do not show sequence similarity with known proteins but their high content in Ala, Gly and Asp residues is in agreement with the general amino acid composition of mollusk shell proteins. Further 5’ and 3’ RACE PCR experiments will be performed in order to obtain the fulllength mRNA sequences, which will highlight the nature of these new shell matrix proteins.
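As a rough illustration of the composition argument for the two incomplete proteins, the following Python sketch computes residue percentages for a one-letter protein sequence and the combined share of Ala, Gly and Asp. The function names and the toy sequence are hypothetical and are not taken from this study; the composition analysis reported here was done with ProtParam (see the Experimental Section).

```python
from collections import Counter


def composition_percent(sequence: str) -> dict[str, float]:
    """Return the percentage of each residue in a one-letter protein sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {aa: 100.0 * n / len(seq) for aa, n in counts.items()}


def combined_percent(sequence: str, residues: str = "AGD") -> float:
    """Sum the percentages of the selected residues (default: Ala, Gly, Asp)."""
    comp = composition_percent(sequence)
    return sum(comp.get(aa, 0.0) for aa in residues)


# Hypothetical toy fragment, NOT a sequence from this study.
toy_fragment = "GAGDGAGDAAGGDDKLGAGA"

for residue, percent in sorted(composition_percent(toy_fragment).items()):
    print(f"{residue}: {percent:.1f} %")
print(f"Ala + Gly + Asp: {combined_percent(toy_fragment):.1f} %")
```

A high combined value for a translated EST clone would be consistent with, but would not by itself demonstrate, a shell matrix origin; the peptide evidence from the shell extracts remains the decisive criterion.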


Accurate identification of protein sequences led to the identification of polymorphic variants and orthologues of shell matrix proteins

Specific transcripts of the shell-forming cells of P. margaritifera allowed the characterization of proteins previously described in the literature as shell matrix components of bivalves: KRMP,[17] shematrins[16] and MPN88. However, several KRMP and shematrin sequences were obtained from the P. margaritifera shell matrix extracts, and they exhibited variable percentages of amino acid identity. This raises the question of whether the different KRMPs or shematrins should be considered as orthologues (proteins in different species deriving from a common ancestral gene), as paralogues (proteins deriving from the duplication of a gene within a genome), as isoforms (proteins deriving from a single gene through alternative splicing), or as polymorphic variants (the same protein exhibiting sequence variations from one individual to another). The latter possibility requires that RNA extracts from several specimens of the same species were mixed, which is the case in the present study. Answering this question is important for phylogenetic applications.[33] However, when genomic data are lacking, as for the mollusk species investigated here and in previous studies, interspecies comparisons of these protein sequences are necessary. This reasoning is well illustrated herein by the KRMPs and shematrins obtained from two or three Pinctada species.

Considering KRMPs, our data show first that the P. margaritifera and P. fucata sequences (-4, -5, -6 and -1, -2, -3, -4, respectively) are similar, that is, they are certainly related proteins. Second, they each group into a cluster, that is, they are more similar within each species than between species; this pattern is expected for polymorphic variants of a single protein. Thus, we can conclude that all these KRMPs belong to the same protein, KRMP, and that in each species the different KRMPs are simply polymorphic variants resulting from mixing RNA extracts from several individuals. They should be named KRMP variant -1, -2, -3, etc., in each Pinctada species. The presence of intra- and interspecies variable sequences indicates that the regions subject to variation are not important for the correct functioning of the protein, as shown, for instance, by the number of GY repeats in P. fucata (low number) and in P. margaritifera (high number).

In this regard, shematrins contrast with KRMPs. In P. margaritifera, shem8 and shem9 displayed a low percentage of sequence identity (53 %), but each of them has a high percentage of sequence identity with other shematrins from P. fucata and P. maxima. For instance, shem8 shows a high sequence identity (78 %) with P. fucata shem2, whereas shem9 is similar to P. fucata shem1 (71 %). These data indicate that there are two shematrins, one represented by shem8 and shem2, and the other by shem9 and shem1.

Amino acid prevalence in the proteins and relevance to shell matrix architecture

A large number of glycines is characteristic of shematrins, KRMPs and MPN, in which this residue represents 30 to 40 % of the amino acid sequences. In living systems, a solid architecture usually relies on glycine-rich protein scaffolding, as exemplified by the cell wall in plants, and by collagens, keratins and silk in animals.[34] In addition, shematrins and KRMPs are enriched in tyrosine, the second most frequent residue, which possesses a chemically active side chain. The presence of numerous tyrosines suggests that they are putative substrates for the oxidative activity of tyrosinases (e.g., the Pfty1 and Pfty2 proteins found in the prismatic shell layer of P. fucata[35]). Oxidation of tyrosine side chains into 3,4-dihydroxyphenylalanine can lead to the formation of cross-links, as discussed by Zhang et al.[17] regarding the role of tyrosine residues in P. fucata KRMP proteins. Lysine, which is the most abundant amino acid in linkine (17.4 %), could be involved in protein cross-linking through the chemical reactivity of its side chain, as observed for elastin, a major Lys-rich mammalian extracellular protein.[36] In elastin, the formation of protein cross-links is catalyzed by lysyl oxidases, which deaminate some lysine residues into reactive semialdehydes (allysine). The latter can react with corresponding aldehydes on adjacent polypeptide chains to form condensation products, or with native lysine residues to form bifunctional cross-links, such as lysinonorleucine and hydroxylysinonorleucine.[37] We assume that the organic matrix of the Pinctada shell is probably built with such protein cross-links, but experimental evidence describing the structure of such features is not yet available.

It also appears that KRMP and shematrins could be related proteins, as indicated by regions with similar signatures (i.e., GnY and GnI repeats). The mature protein is organized into regions with variable sequences separated by regions with GnY/I repeats, which represent two-thirds of the full-length sequence, and ends with the R(K/P/R)KKY motif as a C-terminal signature. The KRMP variable sequence is balanced by evenly spaced tryptophans.

Limits of the proteomics approach for identifying shell matrix protein sequences in nucleotide databases

In order to identify novel matrix proteins in the shell of P. margaritifera, the peptides generated from the acid-insoluble organic matrix of whole shell or nacre layer were used to interrogate a specific SSH library composed of 72 cDNA sequences, completed with RACE PCR. This approach led to the discovery of a novel intracellular protein, calconectin,[23] and also of five full-length sequences of KRMPs and shematrins already known in P. fucata. The other 66 clones did not match any MS/MS peptides; this led to poor annotation of the SSH library. When using the P. maxima EST library derived from mantle tissue, only seven clone sequences (out of 2850) were annotated by using the peptides derived from the P. margaritifera shell organic matrix; this led to a few incomplete amino acid sequences, some of which probably belong to new shell matrix proteins.

Several reasons could explain these unexpected results. The first, and probably the major one, is related to the number of lysine and arginine residues available for trypsin cleavage in the protein sequences present in the shell extracts. For example, mature linkine (86 aa) possesses nine trypsin sites with a cleavage probability of 100 %, which would result in seven small peptides (< 15 aa). In contrast, mature MPN (674 aa) contains only seven trypsin cleavage sites, which generate four small peptides (< 30 aa) and three long peptides (> 100 residues); the latter are not fragmented under the low-energy collisions of the mass spectrometers generally used in proteomics. We can suppose that the SSH and EST libraries we screened during this study contained matrix proteins or peptides with few trypsin digestion sites, which did not allow their annotation from the small peptides generated by trypsin digestion of the shell organic matrix. We propose that other protein cleavage agents should be investigated with the aim of generating peptides of optimal length. Another factor to be considered is post-translational modifications that could protect the proteins from degradation, and/or in vivo cleavage of the mature protein during the mineralization and maturation process within the shell matrix, either of which could affect their detection.
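The cleavage-site argument can be illustrated with a minimal in-silico digestion. The Python sketch below assumes the simplified rule that trypsin cuts C-terminal to Lys or Arg unless the next residue is Pro; PeptideCutter, which was used in this study, applies a more detailed probabilistic model. The function names, the length window and the toy sequences are hypothetical.

```python
def tryptic_digest(sequence: str, missed_cleavages: int = 0) -> list[str]:
    """Digest a protein in silico with a simplified trypsin rule."""
    seq = sequence.upper()
    if not seq:
        return []
    # Cut positions: after K or R, unless the following residue is P (assumed rule).
    cuts = [0]
    cuts += [i + 1 for i in range(len(seq) - 1)
             if seq[i] in "KR" and seq[i + 1] != "P"]
    cuts.append(len(seq))
    peptides = []
    for start in range(len(cuts) - 1):
        # Join up to `missed_cleavages` additional neighbouring fragments.
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end >= len(cuts):
                break
            peptides.append(seq[cuts[start]:cuts[end]])
    return peptides


def in_length_window(peptides: list[str], min_len: int, max_len: int) -> list[str]:
    """Keep peptides whose length falls inside an assumed analysable window."""
    return [p for p in peptides if min_len <= len(p) <= max_len]


# Hypothetical toy sequences, NOT sequences from this study.
lysine_rich = "GKAKLKYKWKGG"
site_poor = "GAGYGAGYGAGYGAGYGAGYGAGYGAGYGAGYK"

print(tryptic_digest(lysine_rich))
print(in_length_window(tryptic_digest(lysine_rich), 6, 30))
print(in_length_window(tryptic_digest(site_poor), 6, 30))
```

With these toy inputs, the lysine-rich sequence yields only dipeptides and the site-poor sequence a single 33-residue peptide, so neither contributes a peptide to the assumed window of 6 to 30 residues. Swapping the cut rule for that of another protease would be the natural way to explore the alternative cleavage agents proposed above.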

Conclusions

The identification of shell matrix proteins of P. margaritifera was improved by combining proteomics with the annotation of a P. margaritifera SSH library and of an EST database created from mantle tissue of P. maxima. Coupling proteomics and transcriptomics approaches can narrow the gap between gene expression in the mantle epithelium and the final addressing of proteins to mineralized tissues. This strategy will speed up the identification of true shell proteins and will provide information about their potential function in shell patterning; it could also be used for other mineralizing systems, for example, in organisms whose protein components are scarcely documented. In the near future, a better understanding of the processes directing shell formation and mineralization should be possible through such comparative analysis; however, more genomics and transcriptomics data and improvements in proteomics approaches are required.

Experimental Section

cDNA library construction: The SSH library was constructed from adult P. margaritifera specimens from French Polynesia. The SSH technique was adapted from Diatchenko et al.[27] Shell-forming cells from the mantle edge and muscle tissues were dissected from fresh specimens and immediately stored in liquid nitrogen until RNA isolation. Experimental procedures for RNA extraction, obtaining differential cDNA inserts, sequencing and comparison of the SSH clones are described in detail elsewhere.[23]

Rapid amplification of cDNA ends: Total RNA was extracted from mantle tissue by using the TriZOL reagent (Sigma). RACE PCR was performed by using the GeneRacer Kit (Invitrogen) according to the manufacturer's instructions. The 5' and 3' RACE products were resolved by agarose gel electrophoresis, purified, and cloned into a pCR4-TOPO cloning vector by using the TOPO T/A cloning kit (Invitrogen). The recombinant plasmids were sequenced by the Qiagen sequencing service. Sequencing of the PCR products encoding MPN was performed by GATC Biotech (Marseille, France). The primers are indicated in Table 2.

Sequence analysis: Protein identification was performed by using the BLASTp tool (http://blast.ncbi.nlm.nih.gov/) against the database of nonredundant protein sequences. Signal peptide prediction was carried out by using the SIG-Pred program (http://bmbpcu36.leeds.ac.uk). Alignments were performed by using the Se-Al v2.0a11 software (http://tree.bio.ed.ac.uk/software/seal)[38] and were verified manually. Amino acid composition, molecular weight and theoretical isoelectric points were determined by using the ProtParam tool available on the ExPASy proteomics server (http://www.expasy.ch/). The protein sequences were scanned for known motifs by using the MotifScan tool with pfam_ls (global models) and the motif database (http://www.expasy.ch). The RADAR program was used for rapid automatic detection and alignment of repeats in protein sequences (http://www.expasy.ch). Prediction of trypsin cleavage was carried out by using the PeptideCutter program (http://www.expasy.ch).

Extraction of acid-insoluble shell organic matrix: The acid-insoluble shell organic matrix was isolated as previously described.[12, 26] Briefly, shells were cleaned of residual periostracum and external contaminants by scraping. Whole shell (calcite and aragonite layers; 10 g) or the nacre layer only (50 g) was mechanically crushed and decalcified in acetic acid (6.5 M) for 6 h at room temperature under continuous stirring. After centrifugation (22 000 g; 4 °C; 20 min) the supernatant was discarded, and the acid-insoluble material was thoroughly washed with water before lyophilization.

Trypsin digestion of acid-insoluble shell organic matrix: The lyophilized insoluble material (200 mg) was resuspended in ammonium bicarbonate (5 mL; 100 mM), and reduced with DTT (5 mM) at 60 °C while being stirred. After cooling to room temperature, alkylation was performed in the dark with iodoacetamide (20 mM) while being stirred. Pellets obtained after centrifugation were washed five times with ammonium bicarbonate (1 mL) and lyophilized. The residue (ca. 2 mg) was suspended in ammonium bicarbonate (50 mM) containing acetonitrile (5 %) and digested with trypsin (20 µg; Sigma) at 37 °C for 18 h. After centrifugation, the clear supernatants were lyophilized. The dry digests were dissolved in water (100 µL), centrifuged, and 10 µL was used for the proteomics analyses.[12, 26]

Peptide fractionation: RP-HPLC of tryptic peptides was carried out on a C18 column (150 × 0.5 mm, Cluzeau, France) at a flow rate of 20 µL min⁻¹ with formic acid (0.1 %) in water (eluent A), and acetonitrile (eluent B). The gradient was 3 to 70 % of B in 60 min. The eluted peptides were analyzed in an ESI-QqTOF hybrid mass spectrometer (Pulsar i, Applied Biosystems) by using the information-dependent acquisition (IDA) mode, which maximizes the number of peptides fragmented in LC-MS experiments by using parameters such as ion intensity and exclusion time (see below). The system allows switching between MS and MS/MS experiments. After a 2 s MS spectrum, the two most intense multiply charged precursor ions (+2 to +4) could be selected for 2 s MS/MS spectral acquisitions. In order to avoid reanalysis, the mass-to-charge ratios of the fragmented precursor ions were excluded from further analysis for 60 s (exclusion time). The data were acquired and analyzed with the Analyst QS software (version 1.1). The minimum threshold intensity of the ion was set to ten counts. The ion spray and declustering potentials were 5200 and 50 V, respectively. The collision energy for the gas-phase fragmentation of the precursor ions was determined automatically by the IDA based on their m/z values.

Data analysis: Database searches were carried out by using the in-house MASCOT search engine (Matrix Science, London, UK; version 2.1). Fragment ions obtained from MS/MS experiments were searched against both the protein sequences of P. margaritifera deduced from the in-house SSH library after translation in the six reading frames, and the invertebrate EST database downloaded (June 2010) from the NCBI server (http://www.ncbi.nlm.nih.gov). The parameters used for the database search were: mass tolerance 0.5 Da for MS and MS/MS experiments, one missed cleavage, carbamidomethylation as fixed modification and methionine oxidation as variable modification. At least two distinct peptides were required for a protein identification to be valid. Proteins identified with only one peptide were considered valid when the peptide ion score was higher than 50 and/or the peptide was analyzed more than once. The quality of MS/MS spectra was checked manually to confirm the accuracy of the protein identification.
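Two steps of the data analysis lend themselves to a short illustration: translating a nucleotide sequence in all six reading frames, and applying the peptide-count rule for accepting a protein identification. The Python sketch below is only an approximation under stated assumptions. It uses the standard genetic code, it is not a description of how MASCOT works internally, and all function names and example inputs are hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna: str) -> str:
    """Translate one frame; stop codons become '*', unknown codons 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Return the translations of the three forward and three reverse frames."""
    forward = dna.upper()
    reverse = forward.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(forward[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


def identification_is_valid(hits: list[tuple[str, float]]) -> bool:
    """Apply the acceptance rule stated in the text.

    `hits` holds one (peptide sequence, ion score) pair per MS/MS spectrum
    assigned to the protein (a hypothetical input format).
    """
    distinct = {peptide for peptide, _ in hits}
    if len(distinct) >= 2:
        return True
    if len(distinct) == 1:
        best_score = max(score for _, score in hits)
        return best_score > 50 or len(hits) > 1
    return False


# Hypothetical toy inputs, NOT data from this study.
print(six_frame_translation("ATGGCCAAATAA"))
print(identification_is_valid([("PEPTIDEA", 30.0), ("PEPTIDEB", 25.0)]))
print(identification_is_valid([("PEPTIDEA", 30.0)]))
```

Keeping the acceptance rule as a separate function makes the threshold of 50 and the "analyzed more than once" condition easy to audit against the criteria stated above.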

Keywords: linkine · Pinctada margaritifera · proteins · proteomics · shell · transcriptomics

[1] N. Watabe, J. Ultrastruct. Res. 1965, 12, 351 – 370.
[2] F. Nudelman, B. A. Gotliv, L. Addadi, S. Weiner, J. Struct. Biol. 2006, 153, 176 – 187.
[3] M. Suzuki, K. Saruwatari, T. Kogure, Y. Yamamoto, T. Nishimura, T. Kato, H. Nagasawa, Science 2009, 325, 1388 – 1390.
[4] J. C. Marxen, M. Nimtz, W. Becker, K. Mann, Biochim. Biophys. Acta 2003, 1650, 92 – 98.
[5] L. Pereira-Mouries, M.-J. Almeida, C. Ribeiro, J. Peduzzi, M. Barthélemy, C. Milet, E. Lopez, Eur. J. Biochem. 2002, 269, 4994 – 5003.
[6] B.-A. Gotliv, L. Addadi, S. Weiner, ChemBioChem 2003, 4, 522 – 529.
[7] K. Simkiss, Comp. Biochem. Physiol. 1965, 16, 427 – 435.
[8] H. Miyamoto, T. Miyashita, M. Okushima, S. Nakano, T. Morita, A. Matshushiro, Proc. Natl. Acad. Sci. USA 1996, 93, 9657 – 9660.
[9] C. Zhang, R. Zhang, Mar. Biotechnol. 2006, 8, 572 – 586.
[10] F. Marin, G. Luquet, B. Marie, D. Medakovic, Curr. Top. Dev. Biol. 2008, 80, 209 – 276.
[11] B. Marie, G. Luquet, L. Bédouet, C. Milet, N. Guichard, D. Medakovic, F. Marin, ChemBioChem 2008, 9, 2515 – 2523.
[12] B. Marie, F. Marin, A. Marie, L. Bédouet, L. Dubost, G. Alcaraz, C. Milet, G. Luquet, ChemBioChem 2009, 10, 1495 – 1506.
[13] K. Mann, I. M. Weiss, S. André, H.-J. Gabius, M. Fritz, Eur. J. Biochem. 2000, 267, 5257 – 5264.
[14] F. Marin, P. Corstjens, B. De Gaulejac, E. de Vrind-De Jong, P. Westbroek, J. Biol. Chem. 2000, 275, 20667 – 20675.
[15] M. Kono, N. Hayashi, T. Samata, Biochem. Biophys. Res. Commun. 2000, 269, 213 – 218.
[16] M. Yano, K. Nagai, K. Morimoto, H. Miyamoto, Comp. Biochem. Physiol. Part B 2000, 144, 254 – 262.
[17] C. Zhang, L. Xie, J. Huang, X. Liu, R. Zhang, Biochem. Biophys. Res. Commun. 2000, 344, 735 – 740.
[18] X. Shen, A. Belcher, P. Hansma, G. Stucky, D. Morse, J. Biol. Chem. 1997, 272, 32472 – 32481.
[19] S. Sudo, T. Fujikawa, T. Nagakura, T. Ohkubo, K. Sakaguchi, M. Tanaka, K. Nakashima, Nature 1997, 387, 563 – 564.
[20] M. Michenfelder, G. Fu, C. Lawrence, J. C. Weaver, B. A. Wustman, L. Taranto, J. S. Evans, D. E. Morse, Biopolymers 2003, 70, 522 – 533.
[21] B. A. Gotliv, N. Kessler, J. L. Sumerel, D. E. Morse, N. Tuross, L. Addadi, S. Weiner, ChemBioChem 2005, 6, 304 – 314.
[22] H. Miyamoto, M. Yano, T. Miyashita, J. Moll. Stud. 2003, 69, 87 – 89.
[23] D. Duplat, M. Puisségur, L. Bédouet, M. Rousseau, H. Boulzaguet, C. Milet, D. Sellos, A. Van Womhoudt, E. Lopez, FEBS Lett. 2006, 580, 2435 – 2441.
[24] D. Jackson, C. McDougall, K. Green, F. Simpson, G. Wörheide, B. Degnan, BMC Biol. 2006, 22, 4 – 40.
[25] D. Jackson, C. McDougall, B. Woodcroft, P. Moase, R. A. Rose, M. Kube, R. Reinhardt, D. S. Rokhsar, C. Montagnani, C. Joubert, D. Piquemal, B. Degnan, Mol. Biol. Evol. 2010, 27, 591 – 608.
[26] L. Bédouet, A. Marie, L. Dubost, J. Péduzzi, D. Duplat, S. Berland, M. Puisségur, H. Boulzaguet, M. Rousseau, C. Milet, E. Lopez, Mar. Biotechnol. 2007, 9, 638 – 649.
[27] L. Diatchenko, A. P. Campbell, A. Chenchik, F. Moqadam, B. Huang, S. Lukyanov, K. Lukyanov, N. Gurskaya, E. Sverdlov, P. Siebert, Proc. Natl. Acad. Sci. USA 1996, 93, 6025 – 6030.
[28] E. G. Huizinga, R. Martijn van der Plas, J. Kroon, J. J. Sixma, P. Gros, Structure 1997, 5, 1147 – 1156.
[29] Z. Shen, M. Jacobs-Lorena, J. Biol. Chem. 1998, 273, 17665 – 17670.
[30] F. Badariotti, M. Kypriotou, C. Lelong, M.-P. Dubos, E. Renard, P. Galera, P. Favrel, J. Biol. Chem. 2006, 281, 29583 – 29596.
[31] H. Ehrlich, Int. Geol. Rev. 2010, 52, 661 – 699.
[32] M. Suzuki, S. Sakuda, H. Nagasawa, Biosci. Biotechnol. Biochem. 2007, 71, 1735 – 1744.
[33] N. Al-Hashimi, J.-Y. Sire, S. Delgado, J. Mol. Evol. 2009, 69, 635 – 656.
[34] G. Sachetto-Martins, L. O. Franco, D. E. de Oliveira, Biochim. Biophys. Acta 2000, 1492, 1 – 14.
[35] K. Nagai, M. Yano, K. Morimoto, H. Miyamoto, Comp. Biochem. Physiol. Part B 2007, 146, 207 – 214.
[36] K. Reiner, J. McCormick, R. Rucker, FASEB J. 1992, 6, 2439 – 2449.
[37] P. Brown-Ausburger, C. Tisdale, T. Broekelmann, C. Sloan, R. P. Mecham, J. Biol. Chem. 1995, 270, 17 778 – 17 783.
[38] A. Rambaut, Se-Al Sequence Alignment Editor v2.0, Department of Zoology, Oxford, 1996.

Received: November 5, 2010 Published online on March 14, 2011

ChemBioChem 2011, 12, 950 – 961 · © 2011 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim · www.chembiochem.org