Molecular Ecology Resources (2014)

doi: 10.1111/1755-0998.12334

Bushmeat genetics: setting up a reference framework for the DNA typing of African forest bushmeat

PHILIPPE GAUBERT,* FLOBERT NJIOKOU,† AYODEJI OLAYEMI,‡ PAOLO PAGANI,§ SYLVAIN DUFOUR,¶ EMMANUEL DANQUAH,** MAC ELIKEM K. NUTSUAKOR,** GABRIEL NGUA,†† ALAIN-DIDIER MISSOUP,‡‡ PABLO A. TEDESCO,§§ RÉMY DERNAT¶¶ and AGOSTINHO ANTUNES***†††

*Institut des Sciences de l’Évolution de Montpellier – UM2-CNRS-IRD, Université Montpellier 2, Place Eugène Bataillon – CC 64, 34095 Montpellier Cedex 05, France, †Laboratoire de Parasitologie et d’Écologie, Faculté des Sciences, Université de Yaoundé I, BP 812 Yaoundé, Cameroon, ‡Natural History Museum, Obafemi Awolowo University, Ho 220005 Ile-Ife, Osun State, Nigeria, §Dutch Wildlife Health Centre, Faculty of Veterinary Medicine, Yalelaan 1, 3584 CL Utrecht, The Netherlands, ¶SYLVATROP, 26 route de Vannes, Nantes, France, **Department of Wildlife and Range Management, Faculty of Renewable Natural Resources, Kwame Nkrumah University of Science and Technology, University Post Office, Kumasi, Ghana, ††Amigos de la Naturaleza y del Desarrollo de Guinea Ecuatorial (ANDEGE), Barrio Ukomba, S/N, Bata, Equatorial Guinea, ‡‡Biologie de l’Évolution Mammalogie, Département de Biologie des Organismes Animaux, Faculté des Sciences, Université de Douala, BP 24157 Douala, Cameroon, §§Département Milieux et Peuplements Aquatiques, Muséum National d’Histoire Naturelle, UMR Biologie des ORganismes et des Écosystèmes Aquatiques (UMR BOREA IRD 207-CNRS 7208-UPMC-MNHN), 43 rue Cuvier, FR-75231 Paris Cedex, France, ¶¶Institut des Sciences de l’Évolution – CNRS UMR 5554, Plateforme Bioinformatique LabEx, Université Montpellier 2, Place Eugène Bataillon, 34095 Montpellier Cedex 05, France, ***CIMAR/CIIMAR, Centro Interdisciplinar de Investigação Marinha e Ambiental, Universidade do Porto, Rua dos Bragas, 177, 4050-123 Porto, Portugal, †††Departamento de Biologia, Faculdade de Ciências, Universidade do Porto, Rua do Campo Alegre, 4169-007 Porto, Portugal

Abstract

The bushmeat trade in tropical Africa represents illegal, unsustainable off-takes of millions of tons of wild game – mostly mammals – per year. We sequenced four mitochondrial gene fragments (cyt b, COI, 12S, 16S) in >300 bushmeat items representing nine mammalian orders and 59 morphological species from five western and central African countries (Guinea, Ghana, Nigeria, Cameroon and Equatorial Guinea). Our objectives were to assess the efficiency of cross-species PCR amplification and to evaluate the usefulness of our multilocus approach for reliable bushmeat species identification. We provide a straightforward amplification protocol using a single ‘universal’ primer pair per gene that generally yielded >90% PCR success rates across orders and was robust to different types of meat preprocessing and DNA extraction protocols. For taxonomic identification, we set up a decision pipeline combining similarity- and tree-based approaches with an assessment of taxonomic expertise and of the coverage of the GENBANK database. Our multilocus approach allowed us to: (i) adjust for existing taxonomic gaps in GENBANK, (ii) assign 67% of the morphological species hypotheses to the species level and (iii) successfully identify samples with uncertain taxonomic attribution (preprocessed carcasses and cryptic lineages). High levels of genetic polymorphism across genes and taxa, together with the excellent resolution observed among species-level clusters (neighbour-joining trees and Klee diagrams), support the usefulness of our markers for bushmeat DNA typing. We formalize our DNA typing decision pipeline through an expert-curated query database – DNABUSHMEAT – that should permit the automated identification of African forest bushmeat items.

Keywords: Africa, bushmeat, decision pipeline, DNA typing, mammals, mtDNA

Received 12 March 2014; revision received 17 September 2014; accepted 19 September 2014

Correspondence: Philippe Gaubert, Fax: +3 346 714 3622

© 2014 John Wiley & Sons Ltd

Introduction

Bushmeat is wild game (mostly mammals) hunted by local communities for subsistence and trade. Although illegal in many countries, the bushmeat market is a flourishing economic activity that supports a multimillion-dollar worldwide economy (Nasi et al. 2008). Following recent socioeconomic transformations, including growing pressure from burgeoning human populations and commercial logging, as well as the widespread use of firearms, the volume of bushmeat hunting has reached unsustainable levels (Fa et al. 2005; Nasi et al. 2008; Jenkins et al. 2011). This ‘bushmeat crisis’ is particularly visible in western and central Africa, where bushmeat has traditionally been the main source of animal protein and revenue for rural populations (Asibey 1977). Off-takes exceed several million tons each year (Davies 2002; Brown & Davies 2007), and in Central Africa, 100% of the targeted mammalian species were considered to be hunted at unsustainable levels (Nasi et al. 2008). The challenges of addressing the bushmeat crisis on the African continent are numerous, including the need to secure sustainable access to natural protein resources while reducing the occurrence and mitigating the effects of zoonotic pandemics that can spread through the bushmeat market (Kilonzo et al. 2013). Understanding and regulating the bushmeat market relies first on the accurate identification of the species being traded, notably by conservationists and national wildlife corps engaged in the control of bushmeat activities (Ogden et al. 2009). A large percentage of the bushmeat sold in markets consists of smoked and processed meat (Willcox & Nambu 2007), which is difficult or impossible to identify accurately. As a consequence, distinguishing between legal and illegal trade is often impossible, and surveys that estimate the impact of bushmeat activities using standard protocols (i.e. phenotypic recognition of carcasses) likely underestimate the number of individuals and species involved (Olayemi et al. 2011).
Forensic science techniques, including DNA typing methods such as ‘forensically informative nucleotide sequencing’ (FINS), have been applied successfully to the species-level identification of illegally hunted wildlife (Thommasen et al. 1989; Bartlett & Davidson 1992; Verma & Singh 2003; Baker 2008). DNA typing (using a short series of informative mitochondrial gene fragments) and barcoding (targeting the standard barcode fragment of cytochrome c oxidase I, ‘COI’; Hebert et al. 2003) have shown potential for the DNA-based identification of African bushmeat species. However, these studies have been taxonomically limited to a few mammalian orders and/or geographically restricted to a given region (Malisa et al. 2006; Eaton et al. 2010; Ghobrial et al. 2010; Ntie et al. 2010; Bitanyi et al. 2011; Olayemi et al. 2011; Minhós et al. 2013). Most importantly, the heterogeneity of the molecular markers and PCR techniques used in these studies may hinder standardized species identification across the African continent.

Here, we propose a reference framework for the DNA typing of African forest bushmeat (i.e. mammals) that would facilitate the implementation of a standardized DNA-based species identification tool. To circumvent the potential caveats of DNA typing nonvouchered specimens (for ethical reasons, no carcasses were bought from bushmeat markets), we relied on the production of multiple FINS and on a taxonomic expert ‘loop’ to validate the genetic identification of the sampled animals. Through a collaborative regional framework across five African countries (Guinea, Ghana, Nigeria, Cameroon and Equatorial Guinea), we sequenced four mitochondrial gene fragments in >300 bushmeat items representing nine mammalian orders. Our first objective was to assess the PCR amplification efficiency of our mitochondrial DNA (mtDNA) markers on bushmeat samples (i.e. their ability to amplify a wide range of species from potentially poor-quality and degraded samples) across different DNA extraction methods. Second, we evaluated the efficacy of our mtDNA sequences in distinguishing and identifying species (i.e. we assessed their status as FINS) through an original decision pipeline. Finally, we produced a web-assisted query database – DNABUSHMEAT – that can serve as a reference framework for the DNA-based identification of African forest bushmeat species.

Materials and methods

Sampling data

We collected 302 samples of African mammalian species from bushmeat markets and other sources in Guinea, Ghana, Nigeria, Cameroon and Equatorial Guinea (Fig. 1; Table S1, Supporting Information). The sample of Orycteropus afer (our sole representative of Tubulidentata) came from South Africa. Our sample set covered eight orders of African mammals commonly found in bushmeat markets: Artiodactyla (14 species; n = 53), Carnivora (17 species; n = 88), Pholidota (three species; n = 27), Primates (15 species; n = 25), Rodentia (seven species; n = 70), Erinaceomorpha (one species; n = 1), Lagomorpha (one species; n = 1) and Tubulidentata (one species; n = 1). A ninth mammalian order (Hyracoidea) was subsequently identified in a set of taxonomically unidentified samples from Guinea (see below). We followed an opportunistic sampling strategy, collecting small pieces of tissue (generally ear or tongue) from the carcasses available at the time of our surveys. To assess the impact of sample quality on our DNA typing approach, both freshly killed and smoked specimens were sampled. Whenever possible, the animals were photographed to confirm species identification (Fig. 1). The preliminary morphological identification of species was based on the field guide of Kingdon (1997). In some cases, we relied on local knowledge for identification (e.g. attributing local names to the animals). Taxonomy was further refined and updated following the recent edition of the Mammals of Africa (Kingdon et al. 2013), and following Colyn et al. (2010) for the new species of blue duiker (Philantomba walteri). We also included 36 samples from Guinea that did not have any species attribution, to test the usefulness of our approach in identifying unrecognizable species representatives. Two nonmammalian species, a hooded vulture (Necrosyrtes monachus) and a Nile monitor (Varanus niloticus), were also sampled and included in the analyses.

[Figure 1: map of the five sampling countries within the tropical rainforest zone (scale bar: 1000 km), with sample sizes Guinea n = 108, Ghana n = 39, Nigeria n = 72, Cameroon n = 60 and Equatorial Guinea n = 22; photographs in panels a–l.]

Fig. 1 Geographic coverage of our African bushmeat study. Samples (n) were collected from various bushmeat markets and game-selling places in five countries of tropical Africa (Orycteropus afer from South Africa is not shown). Photographs of sampled individuals used for species identification: a – Civettictis civetta; b – processed Manis gigantea; c – Manis tricuspis; d – Thryonomys swinderianus; e – beheaded Cephalophus ogilbyi (Cameroon); f – Lepus victoriae; g – anus of Civettictis civetta (Ghana); h – stall of smoked meat; i – Xerus erythropus (Nigeria); j – processed Hylochoerus meinertzhageni (Guinea); k – smoked Mandrillus sphinx; l – processed Gorilla gorilla (Equatorial Guinea).

DNA extraction, amplification and sequencing

We extracted genomic DNA using either an ABI PRISM 6100 Nucleic Acid PrepStation (Applied Biosystems, Carlsbad, CA, USA), following the manufacturer’s recommendations, or a standard CTAB procedure (Rogers & Bendich 1988). We systematically amplified four mitochondrial genes to ensure broad nucleotide sequence coverage and maximize our chances of reaching a DNA-based taxonomic identification. We used the ‘universal’ primer pair L14724-H15149 (following Olayemi et al. 2011) to PCR-amplify the first 402 bp of cytochrome b (cyt b). We aligned a series of GENBANK sequences representative of the mammalian orders under study (data not shown) to design single, mammalian-universal primer pairs amplifying 384–658 bp fragments of cytochrome c oxidase I (COI) and of the 12S and 16S ribosomal RNA genes (Table 1). Conserved primer pairs were designed from consensus sequences for each gene using the Primer3 web platform (http://primer3.ut.ee/). Our targeted COI fragment corresponds to the ‘standard barcode’ region developed for animals (Hebert et al. 2003). PCRs were carried out in a 20-µL final volume containing ~50 ng of template DNA, 0.1 mg/mL BSA, 0.25 × 4 mM dNTPs, 0.2 × 2 µM primers, 5× PCR Direct Loading Buffer with MgCl2 and 0.5–1.5 U Taq DNA polymerase (Q-BIOgene, Illkirch, France). PCR cycling conditions included an initial denaturation step (94 °C, 2 min), followed by 35 cycles of denaturation (92 °C, 30 s), annealing (30 s; see Table 1 for temperatures) and extension (72 °C, 30 s), and a final extension step (72 °C, 15 min). PCR products were sequenced directly in both directions on 3730xl DNA Analyzer 96-capillary sequencers (Applied Biosystems, Foster City, CA, USA) at Genoscope, Evry, France. All sequences were deposited in GENBANK under accession nos KJ192435–KJ193529.
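Two of the newly designed primers (bush-COIF and bush-COIR in Table 1) contain IUPAC ambiguity codes (Y = C/T, R = A/G), so each is in practice a mixture of several oligonucleotides. The short Python sketch below, which is an illustration and not part of the authors' workflow, expands a degenerate primer into its concrete variants and reports its degeneracy. The primer sequences (5′→3′) are copied from Table 1; the function names are hypothetical.

```python
from __future__ import annotations

from itertools import product

# IUPAC nucleotide codes: each symbol maps to the bases it can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Primer sequences (5'->3') as listed in Table 1.
PRIMERS = {
    "GVL14724": "GATATGAAAAACCATCGTTG",
    "H15149": "CTCAGAATGATATTTGTCCTCA",
    "bush-COIF": "CACAAACCACAAAGAYATYGG",
    "bush-COIR": "TCAGGGTGTCCAAARAAYCA",
    "bush-12SF": "GGGATTAGATACCCCACTATGC",
    "bush-12SR": "GTGACGGGCGGTGTGT",
    "bush-16SF": "CGCCTGTTTACCAAAAACATC",
    "bush-16SR": "AATCGTTGAACAAACGAACC",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides encoded by a (possibly degenerate) primer."""
    count = 1
    for base in primer.upper():
        count *= len(IUPAC[base])
    return count


def expand(primer: str) -> list[str]:
    """All concrete A/C/G/T sequences encoded by the primer."""
    options = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*options)]


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name:10s} {len(seq):2d} nt  degeneracy {degeneracy(seq)}")
    for variant in expand(PRIMERS["bush-COIF"]):
        print(variant)
```

Each COI primer carries two ambiguous positions and is therefore a fourfold mix, while the cyt b, 12S and 16S primers are nondegenerate.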

DNA typing analytical procedures

Validation of nucleotide sequences. To detect putative pseudogenes in the protein-coding fragments (cyt b and COI), we translated nucleotide sequence alignments into amino acids with MEGA 5.2.1 (Tamura et al. 2011) and checked for stop codons and indels. We also checked for heterozygosity, atypical branch lengths and dubious phylogenetic placement of the sequences (see ‘Taxonomic assignment procedure’ below), a method also applicable to genes unconstrained by open reading frames, such as rRNAs (Triant & DeWoody 2007). We checked for potential contamination by exogenous DNA by assessing the congruence of taxonomic assignments (i) between sequences and morphological identification and (ii) among the sequences of a single individual. A taxonomic conflict among sequences was considered to have occurred when a DNA-based identification disagreed with another at the same taxonomic level [e.g. ‘Dendrohyrax dorsalis’ (Hyracoidea) vs. ‘Perodicticus potto’ (Primates)]. By contrast, two assignments such as ‘D. dorsalis’ and ‘Dendrohyrax’ were not considered to be in conflict. The morphological identification of the species and/or the taxonomic identification supported by the majority of the genes was then used to identify and remove exogenous DNA fragments (cross-species contamination).

Taxonomic assignment procedure. Morphological species identification was reached through consensus among the co-authors, based on direct observations (joint presence in the field) or shared photographic material (‘primary morphological species hypothesis’). If cross-validation could not be carried out, the identification was considered secondary (‘secondary morphological species hypothesis’), except when the sequences derived from such representatives grouped within the same genetic clusters as other sequences representing primary morphological species hypotheses. The taxonomic identification of the sample set from Guinea was considered ‘unknown’.

The accuracy of DNA-based species assignment methods depends on the level of taxonomic representation and on the amount of within-species genetic diversity represented in nucleotide databases. Commonly used tools such as BLAST (Ye et al. 2006) are known to yield false positives when taxa are underrepresented (Ross et al. 2008). To follow a rigorous and conservative approach, we set up a decision pipeline (Fig. 2) combining (i) similarity-based taxonomic assignments obtained by querying the GENBANK database (http://www.ncbi.nlm.nih.gov, accessed November 2013) with MEGABLAST (Ng & Peng Pang 2010) using default parameters; (ii) a tree-based approach determining the most inclusive phylogenetic attribution of a given sequence, using the BLAST Treeview widget (Neighbour Joining, with the Reroot option whenever necessary); (iii) an assessment of the accuracy of the taxonomic identification of the GENBANK sequences used in the assignment procedure (i.e. whether the sequence labels had been validated by expert taxonomic studies; see Appendix S2, Supporting Information, and below for further details); and (iv) an assessment of taxonomic coverage (i.e. whether or not the species sequenced was already represented in GENBANK). Studies used as ‘expert’ reference sources were restricted to those that included voucher specimens or morphological descriptions of species, or that focused on a single species (or a few congeneric species). These steps resulted in DNA-based assignments to taxonomic categories of decreasing accuracy, ranging from ‘hard’ and ‘soft’ species (species identified from expert- and nonexpert-generated sequences, respectively) to genus and, in some cases, the most inclusive phylogenetic level above the genus (named according to Wilson & Reeder 2005). For example, a sequence was assigned to a hard species if it grouped within, or as the sister group of, a monophyletic species-level cluster, provided that GENBANK taxonomic coverage was appropriate and that the taxonomic identity of the sequence(s) with which it shared ≥95% maximum identity (per cent similarity between the query and subject sequences over the length of the coverage area) had been validated by experts. At the opposite extreme, a sequence that did not group within, or as the sister group of, a monophyletic species-level cluster was assigned to its most inclusive phylogenetic level (genus, subfamily, etc.), independently of its maximum identity value. The final classification (DNA-based taxonomic assignment) was determined by choosing the lowest phylogenetic level provided by the four genes, as long as none of the assignments conflicted with each other. For instance, a sample classified as ‘Neotragus pygmaeus’ with two genes, and as Bovidae and Bovinae with the other two, would be assigned to the species N. pygmaeus. Alternatively, a sample that best matched ‘Manis’ and ‘Mammalia’, each with two of the four genes, would be assigned to the genus Manis.

Table 1 Details of the single primer pairs used to amplify the four mitochondrial DNA fragments across mammals
cyt b: GVL14724 5′-GATATGAAAAACCATCGTTG-3′ and H15149 5′-CTCAGAATGATATTTGTCCTCA-3′; source: modified from Kocher et al. (1989), Irwin et al. (1991); annealing temperature 50 °C; product size 402 bp.
COI: bush-COIF 5′-CACAAACCACAAAGAYATYGG-3′ and bush-COIR 5′-TCAGGGTGTCCAAARAAYCA-3′; source: this study; annealing temperature 50 °C; product size 658 bp.
12S: bush-12SF 5′-GGGATTAGATACCCCACTATGC-3′ and bush-12SR 5′-GTGACGGGCGGTGTGT-3′; source: this study; annealing temperature 52 °C; product size 384–430 bp.
16S: bush-16SF 5′-CGCCTGTTTACCAAAAACATC-3′ and bush-16SR 5′-AATCGTTGAACAAACGAACC-3′; source: this study; annealing temperature 52 °C; product size 510–527 bp.

Fig. 2 Decision pipeline used to taxonomically assign the nucleotide sequences generated from bushmeat animals.
[Flowchart: nucleotide sequences (cyt b, 12S, COI, 16S) are analysed with MEGABLAST in GENBANK and BLAST Treeview; decision nodes: ‘Within or sister group of a monophyletic species-level cluster’, ‘Appropriate GENBANK taxonomic coverage + expert validation of the taxonomic identity of the sequence’ and ‘≥95% maximum identity’; outcomes: hard species, soft species, genus, most inclusive phylogenetic level.]
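The per-gene decision rules and the multigene consensus described above can be sketched in Python. This is a hypothetical re-expression, not the authors’ code: the `GeneResult` fields, the branch that returns ‘genus’ when maximum identity falls below 95%, and the representation of each assignment as a root-to-tip lineage tuple are our assumptions, based on our reading of Fig. 2 and the text. Treating two assignments as non-conflicting when one lineage is a prefix of the other is one possible way to operationalize ‘no conflict’; the example lineages are simplified.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical representation: taxon names from the root down to the assigned level.
Lineage = Tuple[str, ...]


@dataclass
class GeneResult:
    """Hypothetical per-gene summary of the MEGABLAST + BLAST Treeview step."""
    in_species_cluster: bool  # within, or sister to, a monophyletic species-level cluster
    expert_reference: bool    # adequate GENBANK coverage and expert-validated reference(s)
    max_identity: float       # per cent identity with the best-matching reference


def gene_category(r: GeneResult, threshold: float = 95.0) -> str:
    """Assignment category for one gene (assumed reading of Fig. 2)."""
    if not r.in_species_cluster:
        return "most inclusive phylogenetic level"
    if r.max_identity < threshold:
        return "genus"
    return "hard species" if r.expert_reference else "soft species"


def compatible(a: Lineage, b: Lineage) -> bool:
    """True if one lineage equals, or is an ancestor (prefix) of, the other."""
    shorter, longer = (a, b) if len(a) <= len(b) else (b, a)
    return longer[: len(shorter)] == shorter


def consensus(assignments: List[Lineage]) -> Optional[Lineage]:
    """Lowest (deepest) assignment across genes, or None if any two genes conflict."""
    for i, a in enumerate(assignments):
        for b in assignments[i + 1:]:
            if not compatible(a, b):
                return None
    return max(assignments, key=len)


if __name__ == "__main__":
    manis = ("Mammalia", "Pholidota", "Manidae", "Manis")
    # 'Manis' with two genes and 'Mammalia' with the other two -> genus Manis
    print(consensus([manis, manis, ("Mammalia",), ("Mammalia",)]))
    hyrax = ("Mammalia", "Hyracoidea", "Procaviidae", "Dendrohyrax", "Dendrohyrax dorsalis")
    potto = ("Mammalia", "Primates", "Lorisidae", "Perodicticus", "Perodicticus potto")
    print(consensus([hyrax, potto]))  # conflicting assignments -> None
    print(gene_category(GeneResult(True, True, 98.7)))
```

Because pairwise compatibility forces all assignments onto a single lineage, the longest one is the ‘lowest phylogenetic level provided by the four genes’.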

Utility and diagnostic level of the four mitochondrial genes for DNA typing African forest bushmeat. Associations between gene amplification success and taxon, sample quality and extraction method were tested using χ² tests (for groups with n > 5). We further evaluated the effect of these variables on PCR success using generalized linear models (GLMs) with binomial errors, fitting success or failure (as the response variable) for each mitochondrial gene fragment. Models were fitted with the ‘glm’ function in R (R Development Core Team 2013). We also assessed the success of species-level assignment per mammalian order and gene, as well as the reasons why species-level assignment was not reached using GENBANK as a reference database. Cyt b and COI sequences were aligned by eye with BIOEDIT 7.1.3 (Hall 1999). We used the MUSCLE web platform (http://www.ebi.ac.uk/Tools/msa/muscle/; Edgar 2004) with default settings to align the 12S and 16S fragments. Regions with indels were removed before analysis. DNA polymorphism estimates (polymorphic sites and parsimony-informative sites) were calculated for each mammalian order and gene fragment in DNASP 5.10.01 (Librado & Rozas 2009). We used the ‘sliding window’ option to visualize the distribution of polymorphic sites (S) along genes (window length: 20; step size: 10). As an alternative way of estimating the usefulness of our sequences for DNA typing bushmeat, we assessed the level of taxonomic clustering in our data sets across mammalian orders using neighbour-joining (NJ) trees built in MEGA with the K2P distance model (Kimura 1980) and 1000 bootstrap replications (Felsenstein 1985). We also used the model selection option in MEGA to select the best-fit model (Bayesian information criterion, BIC; Keane et al. 2006) per gene partition per mammalian order, and ran maximum-likelihood (ML) tree searches with five discrete categories for the gamma distribution (whenever applicable). Node support was estimated through 500 bootstrap replications. Genetic distances among and within mammalian orders were calculated in MEGA using K2P distance, with 1000 bootstrap replications for standard error estimates. The choice of the K2P model allowed us to compare our results with genetic distance estimates from previous bushmeat studies in the framework of the genetic species concept in mammals (Bradley & Baker 2001; Baker & Bradley 2006). In addition, we used an approach based on the vectorization of nucleotide sequences that extracts DNA diagnostic patterns in the form of indicator vectors and represents them visually as ‘Klee diagrams’ (Sirovich et al. 2009, 2010). Klee diagrams are heat maps of the indicator-vector correlation matrices and constitute an alternative to phylogenetic trees for representing nucleotide sequence clustering. For this purpose, distance trees were calculated using p-distance (Nei & Kumar 2000) in MEGA. Output files were generated via the web-based program TREEPARSER and visualized using the Indicator Vector program (http://phe.rockefeller.edu/barcode/klee.php).
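The K2P distance (Kimura 1980) separates the proportion of transitions (P) from that of transversions (Q) between two aligned sequences: d = −½ ln(1 − 2P − Q) − ¼ ln(1 − 2Q). The Python sketch below computes it for one sequence pair, together with a sliding-window count of polymorphic sites S using the window length (20) and step size (10) given above. It is an illustration only, not a reimplementation of MEGA or DNASP; ignoring gaps and ambiguous bases site by site, and the function names, are our own assumptions.

```python
import math

PURINES = set("AG")
PYRIMIDINES = set("CT")
NUCLEOTIDES = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura (1980) two-parameter distance between two aligned sequences.

    Sites with a gap or ambiguity code in either sequence are skipped
    (pairwise deletion, an assumption of this sketch)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (equal length)")
    n = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        n += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / n, transversions / n
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("K2P distance undefined: sequences too divergent")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def sliding_window_s(alignment, window=20, step=10):
    """Polymorphic (segregating) sites S per window, as (0-based start, S) pairs.

    A column counts as polymorphic if it carries more than one of A/C/G/T."""
    length = len(alignment[0])
    polymorphic = []
    for i in range(length):
        column = {s[i].upper() for s in alignment} & NUCLEOTIDES
        polymorphic.append(len(column) > 1)
    return [(start, sum(polymorphic[start:start + window]))
            for start in range(0, length - window + 1, step)]
```

For example, two 10-bp sequences differing by a single A/G transition give P = 0.1 and Q = 0, hence d = −½ ln 0.8.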

Results

We obtained 1157 of the 1208 possible nucleotide sequences from the 302 mammalian samples (i.e. 51 PCR amplifications failed). Only 69 sequences required a second round of PCR, and the success rate of first-round PCR amplification across mammalian orders was generally >90% (Fig. S3, Supporting Information). Lower PCR success rates occurred with COI in Erinaceomorpha, Pholidota and Carnivora. Amplification of pseudogenes (NUMTs) was most common for COI in Pholidota (26%) and Carnivora (20%), and was also observed when amplifying cyt b in Primates (10%). Although COI amplification success differed most from that of the other three genes, the difference was not significant. Among orders, patterns of PCR amplification success across the four genes differed significantly between Pholidota and both Artiodactyla and Primates. Binomial GLMs highlighted the significantly lower PCR success of COI in Pholidota and Carnivora (Table S4, Supporting Information). Smoked samples were not particularly prone to failed amplification, although they accounted for most of the failed PCRs in Artiodactyla and Carnivora (Fig. S5, Supporting Information). There was no significant difference in PCR success rates between CTAB and robot-based extractions: PCR success rates from CTAB extractions ranged from 96% (COI) to 100% (cyt b, 12S, 16S), whereas robot-based extractions yielded rates from 88% (COI) to 96–98% (cyt b, 12S, 16S). The nonmammalian species Necrosyrtes monachus and Varanus niloticus yielded amplification products for 12S and 16S, and for COI, 12S and 16S, respectively. Primary morphological species hypotheses represented the majority of our data set (53 of 59 species hypotheses; Table 2). The success rate of DNA-based species-level assignment across mammals was ≥50% for each of the four genes (Fig. 3), with cyt b and 16S having the highest (73%) and lowest (50%) rates, respectively.
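As an arithmetic check and illustration, the Python sketch below recomputes the overall amplification success from the counts reported above (302 samples × 4 genes = 1208 possible sequences, of which 1157 were obtained). It also shows a 2 × 2 Pearson χ² test of independence of the kind described in the Methods. The test is written in plain Python rather than R and omits Yates’ continuity correction, and the counts passed to it are hypothetical placeholders, not data from this study.

```python
import math


def chi2_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction).

    table = [[a, b], [c, d]], e.g. rows = two groups, columns = PCR success / failure.
    Returns (statistic, p-value) for 1 degree of freedom."""
    (a, b), (c, d) = table
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denominator
    # With 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Counts reported in the Results: 302 samples x 4 mitochondrial genes.
possible = 302 * 4
obtained = 1157
failed = possible - obtained
success_rate = obtained / possible

if __name__ == "__main__":
    print(f"{obtained}/{possible} sequences ({success_rate:.1%}); {failed} failed PCRs")
    # Hypothetical counts for illustration only (not data from this study).
    stat, p = chi2_2x2([[90, 10], [80, 20]])
    print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

The χ² approximation is only reliable when expected cell counts are not too small, which is consistent with restricting the tests to groups with n > 5.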
The main reason species-level assignments failed was incomplete taxonomic representation in GENBANK (reaching 86% for COI) and, to a lesser extent, the nonmonophyly of species-level reference sequences (14–28%). This trend was generally similar within each taxonomic group, with cyt b performing as well as or better than the other three mtDNA genes in Artiodactyla, Carnivora, Pholidota and Rodentia. COI was generally

© 2014 John Wiley & Sons Ltd

© 2014 John Wiley & Sons Ltd 9 2

Genetta pardina

Genetta maculata

Carnivora

Carnivora

10 5 4 5 21 1 2 1 1 1 10 3 11 1 2 2 3

Carnivora Carnivora Carnivora Carnivora Carnivora Carnivora

Artiodactyla Artiodactyla Artiodactyla Carnivora Carnivora Carnivora Carnivora Carnivora Carnivora Carnivora Carnivora

2 2 2 3 2 2 4 6 2 3 2 2 1

n

Syncerus caffer* Neotragus batesi* Neotragus pygmaeus Tragelaphus scriptus Tragelaphus spekii Cephalophus dorsalis Cephalophus ogilbyi Philantomba maxwelli Philantomba monticola Philantomba walteri Unidentified duiker 1* Unidentified duiker 2* Hylochoerus meinertzhageni Phacochoerus africanus Potamochoerus porcus 1* Potamochoerus porcus 2* Canis adustus Nandinia binotata Profelis aurata Caracal caracal Proailurus serval Felis silvestris Panthera pardus Crossarchus platycephalus Herpestes ichneumon Civettictis civetta Genetta cf thierryi Genetta thierryi Genetta servalina Genetta cf pardina

Morphological species hypothesis

Artiodactyla Artiodactyla Artiodactyla Artiodactyla Artiodactyla Artiodactyla Artiodactyla Artiodactyla Artiodactyla Artiodactyla Artiodactyla Artiodactyla Artiodactyla

Order

Phacochoerus africanus Potamochoerus porcus Sus scrofa Canis Feliformia Felidae Felidae Felidae Felis silvestris Panthera pardus Herpestidae

Phacochoerus africanus Potamochoerus porcus Sus scrofa Canis adustus Nandinia binotata Felidae Felidae Felidae Felis silvestris NA Crossarchus platycephalus Herpestes ichneumon Civettictis civetta Genetta thierryi Genetta thierryi Genetta servalina Large-spotted genet complex Large-spotted genet complex Large-spotted genet complex Feliformia

Feliformia

Herpestidae Viverrinae Feliformia Feliformia Feliformia Feliformia

Syncerus caffer Neotragus batesi Neotragus pygmaeus Tragelaphus scriptus Tragelaphus spekii Cephalophus Cephalophus Philantomba maxwelli Philantomba Philantomba Philantomba Sus scrofa Phacochoerus africanus

12S

Syncerus caffer Neotragus batesi Neotragus pygmaeus Tragelaphus scriptus Tragelaphus spekii Cephalophus dorsalis Cephalophus ogilbyi Philantomba maxwelli Philantomba monticola Philantomba walteri Philantomba walteri Sus scrofa Phacochoerus africanus

cyt b

DNA-based taxonomic assignment

Feliformia

Feliformia

Herpestidae Civettictis civetta NA NA Feliformia Feliformia

Phacochoerus africanus Potamochoerus porcus Sus scrofa Canis Nandinia binotata Felidae Caracal caracal Felidae Felis silvestris Panthera pardus Crossarchus

Syncerus caffer Neotragus batesi Bovidae Tragelaphus scriptus Tragelaphus spekii Cephalophus dorsalis Cephalophus Philantomba maxwelli Philantomba monticola Philantomba walteri Philantomba walteri Sus scrofa Phacochoerus africanus

COI

Viverrinae

Viverrinae

Herpestidae Viverrinae Viverrinae Viverrinae Viverrinae Viverrinae

Phacochoerus africanus Potamochoerus porcus Sus scrofa Canis Feliformia Felidae Felidae Felidae Felis silvestris Panthera pardus Herpestidae

Syncerus caffer Neotragus batesi Bovinae Tragelaphus scriptus Tragelaphus spekii Cephalophus dorsalis Cephalophus Philantomba maxwelli Philantomba monticola Philantomba Philantomba Sus scrofa Phacochoerus africanus

16S

Herpestes ichneumon Civettictis civetta Genetta thierryi Genetta thierryi Genetta servalina Large-spotted genet complex Large-spotted genet complex Large-spotted genet complex

Phacochoerus africanus Potamochoerus porcus Sus scrofa Canis adustus Nandinia binotata Felidae Caracal caracal Felidae Felis silvestris Panthera pardus Crossarchus platycephalus

Syncerus caffer Neotragus batesi Neotragus pygmaeus Tragelaphus scriptus Tragelaphus spekii Cephalophus dorsalis Cephalophus ogilbyi Philantomba maxwelli Philantomba monticola Philantomba walteri Philantomba walteri Sus scrofa Phacochoerus africanus

DNA-based taxonomic consensus

Table 2 DNA-based taxonomic assignment of bushmeat samples using our decision pipeline (Fig. 2) and GENBANK as a reference database. Grey cells indicate that taxonomic assignment did not reach species level

DNA TYPING OF AFRICAN FOREST BUSHMEAT 7

2 5 5 17 1 1 1 1 1 5 2 2 1 2 1 1 1 1 2 1 1 1 10 2 1 2 1 1 5

Genetta sp. 2 Manis gigantea Manis tetradactyla Manis tricuspis Cercocebus torquatus Cercopithecus erythrogaster Cercopithecus mona Cercopithecus sp. 1*

Cercopithecus sp. 2

Table 2 (Continued)

Order | Morphological species hypothesis | cyt b assignment | COI assignment | 12S assignment | 16S assignment | DNA-based taxonomic consensus
Carnivora | Genetta sp. 1 | Large-spotted genet complex | Feliformia | Feliformia | Viverrinae | Large-spotted genet complex
Carnivora |  | Genetta thierryi | NA | Feliformia | Viverrinae | Genetta thierryi
Pholidota |  | Mammalia | Mammalia | Manis | Manis | Manis
Pholidota |  | Mammalia | Manis | Manis | Manis | Manis
Pholidota |  | Manis tricuspis | Manis tricuspis | Manis tricuspis | Manis tricuspis | Manis tricuspis
Primates |  | Cercocebus torquatus | Cercopithecinae | Lophocebus albigena | Lophocebus albigena | Cercocebus torquatus
Primates |  | Cercopithecus erythrogaster | Cercopithecus erythrogaster | Cercopithecus erythrogaster | Cercopithecus | Cercopithecus erythrogaster
Primates |  | Cercopithecus mona | Cercopithecus mona | Cercopithecus mona | Cercopithecus | Cercopithecus mona
Primates |  | Cercopithecus petaurista | Cercopithecus petaurista | Cercopithecus petaurista | Cercopithecus | Cercopithecus petaurista
Primates |  | Cercopithecus erythrogaster | Cercopithecus erythrogaster | Cercopithecus erythrogaster | Cercopithecus | Cercopithecus erythrogaster
Primates | Chlorocebus sabaeus | Chlorocebus sabaeus | Chlorocebus sabaeus | Chlorocebus sabaeus | Chlorocebus sabaeus | Chlorocebus sabaeus
Primates | Colobus satanus | Colobus | Colobus | Colobus | Colobus | Colobus
Primates | Erythrocebus patas | Erythrocebus patas† | Erythrocebus patas | Erythrocebus patas | Erythrocebus patas | Erythrocebus patas
Primates | Lophocebus albigena* | Lophocebus | Lophocebus albigena | Lophocebus albigena | NA | Lophocebus albigena
Primates | Mandrillus sphinx | Mandrillus sphinx | Mandrillus sphinx | Mandrillus sphinx | Mandrillus sphinx | Mandrillus sphinx
Primates | Galago senegalensis* | Galagidae | Galago | Galago senegalensis (isolates 1,2) | Galago senegalensis | Galago senegalensis (isolate)
Primates | Galagoides demidovii | Galagidae | Amniota | Galagoides demidovii (isolates 2,3,6,7) | Galagidae | Galagoides demidovii (isolate)
Primates | Gorilla gorilla 1 | Gorilla gorilla | Gorilla gorilla | Gorilla gorilla | Gorilla gorilla | Gorilla gorilla
Primates | Gorilla gorilla 2* | Pan troglodytes | Pan troglodytes | Pan troglodytes | Pan troglodytes | Pan troglodytes
Primates | Pan troglodytes | Pan troglodytes | Pan troglodytes | Pan troglodytes | Pan troglodytes | Pan troglodytes
Primates | Perodicticus potto* | Perodicticus potto | Perodicticus potto | Perodicticus | Perodicticus potto | Perodicticus potto
Primates | Unidentified primate | Cercopithecus erythrogaster | Cercopithecus erythrogaster | Cercopithecus erythrogaster | Cercopithecus | Cercopithecus erythrogaster
Rodentia | Anomalurus pelii | Anomalurus | Anomalurus | Anomalurus | Anomalurus | Anomalurus
Rodentia | Atherurus africanus 1 | Atherurus africanus | Amniota | Atherurus africanus | Hystricidae | Atherurus africanus
Rodentia | Atherurus africanus 2 | Hystricidae | Amniota | Atherurus africanus | Hystricidae | Atherurus africanus
Rodentia | Atherurus africanus 3* | Sus scrofa | Sus scrofa | Sus scrofa | Sus scrofa | Sus scrofa
Rodentia | Cricetomys emini* | Cricetomys sp. 3 | Cricetomys | Cricetomys | Muroidea | Cricetomys sp. 3
Rodentia | Cricetomys sp. 1 | Cricetomys gambianus | Cricetomys | Cricetomys | Muroidea | Cricetomys gambianus
Rodentia | Cricetomys sp. 2 | Cricetomys sp. 1 | Cricetomys | Cricetomys | Muroidea | Cricetomys sp. 1
Rodentia | Heliosciurus rufobrachium | Heliosciurus rufobrachium | Sciuridae | Heliosciurus | Sciuridae | Heliosciurus rufobrachium
Rodentia | Xerus erythropus | Xerus erythropus | Mammalia | Xerinae | Xerini | Xerus erythropus

8 P. GAUBERT ET AL.

© 2014 John Wiley & Sons Ltd


Table 2 (Continued)

Order | Morphological species hypothesis | n | cyt b assignment | COI assignment | 12S assignment | 16S assignment | DNA-based taxonomic consensus
Rodentia | Thryonomys swinderianus 1 | 18 | Thryonomys swinderianus | Heliosciurus gambianus | Thryonomys swinderianus | Thryonomys swinderianus | Thryonomys swinderianus
Rodentia | Thryonomys swinderianus 2* | 25 | Sus scrofa | Sus scrofa | Sus scrofa | Sus scrofa | Sus scrofa
Primates‡ | Unknown 1 | 3 | Cercopithecinae | Cercopithecinae | Cercocebus atys | Cercocebus atys | Cercocebus atys
Primates‡ | Unknown 2 | 1 | Chlorocebus sabaeus | Chlorocebus sabaeus | Chlorocebus sabaeus | Chlorocebus sabaeus | Chlorocebus sabaeus
Primates‡ | Unknown 3 | 11 | Cercopithecus campbelli† | Cercopithecus campbelli | Cercopithecus | Cercopithecus | Cercopithecus campbelli
Primates‡ | Unknown 4 | 1 | Cercopithecus petaurista | Cercopithecus petaurista | Cercopithecus petaurista | Cercopithecus | Cercopithecus petaurista
Hyracoidea‡ | Unknown 5 | 1 | Dendrohyrax dorsalis | Dendrohyrax dorsalis | Dendrohyrax dorsalis | Dendrohyrax dorsalis | Dendrohyrax dorsalis
Artiodactyla‡ | Unknown 6 | 5 | Cephalophus dorsalis | Cephalophus dorsalis | Cephalophus | Cephalophus dorsalis | Cephalophus dorsalis
Artiodactyla‡ | Unknown 7 | 1 | Cephalophus silvicultor | Cephalophus silvicultor | Cephalophus | Cephalophus silvicultor | Cephalophus silvicultor
Artiodactyla‡ | Unknown 8 | 1 | Cephalophus | Cephalophus | Cephalophus | Cephalophus | Cephalophus
Artiodactyla‡ | Unknown 9 | 2 | Philantomba maxwelli | Philantomba maxwelli | Philantomba maxwelli | Philantomba maxwelli | Philantomba maxwelli
Artiodactyla‡ | Unknown 10 | 1 | Tragelaphus scriptus | Tragelaphus scriptus | Tragelaphus scriptus | Tragelaphus scriptus | Tragelaphus scriptus
Carnivora‡ | Unknown 11 | 4 | Civettictis civetta | Civettictis civetta | Viverrinae | Viverrinae | Civettictis civetta
Carnivora‡ | Unknown 12 | 3 | Large-spotted genet complex | Feliformia | Feliformia | Viverrinae | Large-spotted genet complex
Pholidota‡ | Unknown 13 | 2 | Manis tricuspis | Manis tricuspis | Manis tricuspis | Manis tricuspis | Manis tricuspis
Tubulidentata | Orycteropus afer | 1 | Orycteropus afer | Orycteropus afer | Orycteropus afer | Orycteropus afer | Orycteropus afer
Erinaceomorpha | Atelerix albiventris | 1 | Metazoa | NA | Atelerix albiventris | Erinaceinae | Atelerix albiventris
Lagomorpha | Lepus victoriae | 1 | Lepus | Lepus | Lepus | Lepus | Lepus

*Secondary morphological hypothesis of species. †Soft species assignment. ‡Order attribution after DNA-based taxonomic assignment.

DNA TYPING OF AFRICAN FOREST BUSHMEAT 9

[Fig. 3 panels: Complete data set; Artiodactyla; Carnivora; Primates; Rodentia; Pholidota; Unknown taxa and other mammalian orders. Each panel shows counts per gene (cyt b, 12S, COI, 16S) for categories 1–4 defined in the caption.]

Fig. 3 Success of species-level assignment per taxonomic group and gene, together with the factors responsible for failure to reach species-level assignment. Assignments are calculated per morphological species hypothesis. 1 – number of species-level assignments reached; number of species-level assignments not reached because of: 2 – incomplete taxonomic representation in GENBANK; 3 – limited expert knowledge or inappropriate taxonomic labelling of sequences (e.g. ‘Anomalurus sp.’) in GENBANK; 4 – nonmonophyly of the sequences belonging to the same species in GENBANK.


ranked second or third in species-level assignment success. 12S performed better than any other gene in Primates. Artiodactyla and Primates were the orders with the highest number of successful species-level assignments per gene. There were more failed species-level assignments than successes in Carnivora and Pholidota, and equal numbers of successes and failures in Rodentia. The nonmonophyly of species-level GENBANK sequences (especially for 16S) was the main reason species-level assignments failed in Primates. Sixty-nine per cent of the samples of unknown taxonomic origin were identified to the species level. Overall, our decision pipeline assigned 45 of the 69 taxonomic hypotheses to the species level (Table 2). Cyt b sequences provided only ‘soft species’ support for Erythrocebus patas and Cercopithecus campbelli (Primates), which were instead recovered as ‘hard species’ (i.e. more accurately distinguished) with the other genes.

Seven of the morphological species identifications conflicted with their DNA-based assignments. Series of smoked duikers, Potamochoerus porcus (Artiodactyla), Atherurus africanus and Thryonomys swinderianus (Rodentia) were assigned to Sus scrofa. H. meinertzhageni (Artiodactyla) was assigned to Phacochoerus africanus, one individual of Gorilla gorilla (Primates) was classified as Pan troglodytes, and two specimens of Cricetomys emini (Rodentia) were assigned to a different species-level lineage of Cricetomys. Conflicting assignments among genes were observed in one primate (Cercocebus torquatus) and one rodent (T. swinderianus). Some representatives of A. africanus could not be classified to the species level, whereas others from different geographic origins were correctly classified.

DNA polymorphism levels were high in each of the four genes, especially in COI and cyt b (Table S6, Supporting Information). The mean number of polymorphic sites (S) in the total data sets was 8.4 for 16S, 9.4 for 12S, 9.6 for COI and 11.8 for cyt b. The distribution of S along the genes was more regular in COI and cyt b than in 12S and 16S (Fig. 4). These trends were also consistent within mammalian orders (data not shown). Average interspecies genetic distances within mammalian orders were highest for cyt b (min–max: 23.3–39.5%) and decreased from COI (20.8–27.8%) to 12S (11.6–22.4%) and 16S (9–19.6%). Cyt b had the widest range of genetic distance values (Fig. 5).

NJ and ML trees yielded very similar patterns of sequence clustering, with NJ trees performing slightly better (i.e. in some cases having better supported clusters; Figs S7 and S8, Supporting Information). Trees provided a very good level of resolution among the five mammalian orders represented by more than one species (Artiodactyla, Carnivora, Pholidota, Primates and Rodentia). Cyt b trees clustered into 55 well-supported, distinct monophyletic groups corresponding to species-level lineages. In Artiodactyla, Cephalophus dorsalis was not monophyletic in the 12S and 16S trees, nor was Cephalophus ogilbyi in the 12S tree. In Carnivora, Genetta pardina and Genetta maculata clustered in a single group for each of the four genes. In Primates, the two specimens of Cercopithecus petaurista were either not monophyletic or clustered with weak support in the 12S, COI and 16S analyses. The distribution of intra- versus interspecies genetic distances was nonoverlapping in Carnivora, Pholidota and Rodentia for all genes, and in Artiodactyla and Primates for cyt b, COI and 16S (Fig. 6). The strength of the tree-based segregation among and within mammalian orders was also apparent in the heat maps of indicator-vector correlation matrices (Figs 7 and S9, Supporting Information).

[Fig. 4 panels: cyt b (n = 295), COI (n = 265), 12S (n = 298) and 16S (n = 295); x-axis, position along the gene (bp); y-axis, mean number of polymorphic sites.]

Fig. 4 Sliding window view of the distribution of polymorphic sites along the four mitochondrial genes within African mammals. The curve describes the distribution of the mean number of polymorphic sites across all taxa in each 20 bp segment along the genes (sliding window = 10 bp).

[Fig. 5 panels: box plots for cyt b (n = 295), COI (n = 265), 12S (n = 298) and 16S (n = 294); y-axis, K2P distances.]

Fig. 5 Box plots summarizing the distribution of average interspecific genetic distances within each of the four mitochondrial genes among African mammals.

Discussion

Success in PCR amplification of the four mtDNA markers across taxonomic and sample quality ranges

The application of forensic science and conservation genetics techniques to the study and regulation of wildlife trade requires genetic markers that can be amplified across a wide taxonomic range to be broadly useful (Verma & Singh 2003). However, investigations of the African bushmeat trade have focused on a limited taxonomic and/or geographic range of species and on a limited number of gene fragments. Although a variety of DNA typing techniques have been proposed, most bushmeat studies have so far relied on the direct sequencing of a heterogeneous series of ‘barcoding’ fragments, including: (i) COI and 12S for Primates in Guinea-Bissau (Minhós et al. 2013), (ii) cyt b and control region for Artiodactyla (Bovinae) in Central Africa (Ntie et al. 2010), and (iii) COI for Primates and Artiodactyla in the Republic of Congo (Eaton et al. 2010) and for Artiodactyla in Tanzania (Bitanyi et al. 2011, 2012). These studies used taxon-specific series or cocktails of primer pairs that cannot be extended to investigations across a wide range of mammalian orders.

[Fig. 6 panels: Artiodactyla (15 species), Carnivora (12 species), Pholidota (3 species), Primates (16 species) and Rodentia (9 species); bars for cyt b, COI, 12S and 16S per K2P distance class.]

Fig. 6 Distribution of genetic distances per mitochondrial gene among species within five mammalian orders. The x-axis represents K2P distances in per cent (1 = 0–0.99%; 2 = 1–1.99%; and so on). Dashed lines delimit intraspecific genetic distance distributions. Categories where intra- and interspecies genetic distances overlap are marked with an asterisk.

[Fig. 7 panels: cyt b and 12S. Species numbering: Cercocebus torquatus (1), Cercopithecus erythrogaster (2), Cercopithecus mona (3), Cercopithecus petaurista (4), Chlorocebus sabaeus (5), Colobus satanus (6), Erythrocebus patas (7), Lophocebus albigena (8), Mandrillus sphinx (9), Galago senegalensis (10), Galagoides demidovii (11), Gorilla gorilla (12), Pan troglodytes (13), Perodicticus potto (14), Cercopithecus campbelli (15), Cercocebus atys (16).]

Fig. 7 Neighbour-joining trees (left) and Klee diagrams (right) showing species-level genetic clusters and the correlation of indicator vectors within Primates for cyt b and 12S. All collapsed clusters (NJ trees) are supported by bootstrap values > 75%. The scale bar (NJ trees) corresponds to % divergence. The vertical scale bar (Klee diagrams) expresses the level of similarity among sequences. Numbers in the Klee diagrams correspond to the species in the NJ trees. Gs. demidovii = Galagoides demidovii; P. potto = Perodicticus potto. For 12S, the correlation between the two sequences of C. petaurista was less ‘hot’ than for cyt b, mirroring their nonmonophyly in the 12S NJ tree.

Building on the preliminary work of Olayemi et al. (2011), we provide a single primer pair per gene for the amplification of the cyt b, 12S, COI and 16S fragments across species from nine mammalian orders commonly involved in the African bushmeat trade (Artiodactyla, Carnivora, Pholidota, Primates, Rodentia, Erinaceomorpha, Lagomorpha, Tubulidentata and Hyracoidea). In our study, PCR success rates across orders exceeded 90%, and a second round of PCR was rarely necessary. These rates are remarkably high given the variety of factors that can lower amplification rates with cross-species primers (Housley et al. 2006). The only significant exception was the lower success of COI amplification in Carnivora and Pholidota, mostly attributable to the amplification of pseudogenes. Nuclear integration of mtDNA, which produces nuclear mitochondrial pseudogenes (NUMTs), is a relatively common problem in COI amplification (Song et al. 2008; Buhay 2009). Indeed, the regular distribution of polymorphic sites along COI (Fig. 4) made it more difficult to define conserved internal primers. This is a well-known issue affecting vertebrate taxa (Vences et al. 2005), notably reflected in the multiprimer-pair approach of previous bushmeat genetic studies (see above). The degenerate 3′ end of our COI primer pair probably explains the higher level of pseudogene amplification observed in some mammalian orders (Dawnay et al. 2007; Ivanova et al. 2007). Nevertheless, given that we used ‘universal’ primers to amplify four mitochondrial genes, the overall level of pseudogene amplification was low in our study (see Zhang & Hewitt 1996).

PCR amplification was not affected by the type of sample (fresh, smoked or dry) or by the DNA extraction method (robot versus CTAB). Although PCR amplification failed for a few degraded (smoked) samples, the heating techniques used to smoke and dry meat generally did not degrade DNA to the point of preventing amplification, and sequences of c. 650 bp could be routinely amplified, as previously observed for smoked fish (Smith et al. 2008). This is an encouraging result for our ability to amplify middle-sized mtDNA fragments regardless of how bushmeat items were processed (see Eaton et al. 2010). We expect that the design of ‘universal’ mtDNA markers, combined with a multilocus approach and a decision pipeline, will be a useful contribution to the implementation of standardized FINS-based DNA typing procedures for the study and monitoring of African bushmeat markets. The utility of our straightforward PCR protocol may be even broader, as it might successfully amplify additional taxa involved in the bushmeat trade, such as Chiroptera (data not shown) and at least some reptiles and birds (this study). However, further analyses involving additional mammalian species and orders will be needed to accurately assess the taxonomic amplification range of our mtDNA markers.
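The primer-design difficulty above rests on the sliding-window profile of polymorphic sites in Fig. 4. Below is a minimal Python sketch of that kind of profile. It assumes an already aligned set of equal-length sequences and a 20 bp window advanced in 10 bp steps, which is our reading of the Fig. 4 caption. It also assumes that gaps, N and ? are ignored when counting nucleotide states. Fig. 4 averages across taxa, whereas this sketch profiles a single alignment. The function names are illustrative and are not the authors' code. Windows with low counts would be candidate regions for conserved primers.

```python
from typing import List

# Characters treated as carrying no nucleotide information (an assumption of this sketch).
MISSING = {"-", "N", "?"}


def polymorphic_columns(alignment: List[str]) -> List[bool]:
    """Flag each alignment column that carries more than one nucleotide state."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("sequences must be aligned to the same length")
    flags = []
    for i in range(length):
        states = {seq[i].upper() for seq in alignment} - MISSING
        flags.append(len(states) > 1)
    return flags


def sliding_window_s(alignment: List[str], window: int = 20, step: int = 10) -> List[int]:
    """Count polymorphic sites (S) in windows of `window` bp whose starts are `step` bp apart."""
    flags = polymorphic_columns(alignment)
    return [sum(flags[start:start + window])
            for start in range(0, len(flags) - window + 1, step)]


def mutate(seq: str, pos: int, base: str) -> str:
    """Return a copy of `seq` with the base at `pos` replaced (used to build a toy alignment)."""
    return seq[:pos] + base + seq[pos + 1:]


# Toy 40 bp alignment whose variable columns are at positions 23, 26 and 39 (0-based).
reference = "ACGT" * 10
toy = [
    reference,
    mutate(mutate(reference, 23, "A"), 39, "A"),
    mutate(reference, 26, "C"),
]
print(sliding_window_s(toy))  # prints [0, 2, 3]
```

With real data, the same call would be run once per gene (and per taxon set) and the resulting counts plotted against window position.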

Usefulness of our approach in generating FINS applicable to bushmeat items

Our approach is original and broadly useful because our protocol includes a decision pipeline that provides a level of confidence in species assignments. Most wildlife forensic approaches have focused on validating their laboratory protocols (Dawnay et al. 2007). To our knowledge, however, they have generally not included a rigorous decision pipeline for the taxonomic assignment of FINS. To avoid nonreproducible protocols of DNA-based species identification (Ogden et al. 2009), we provide a step-by-step decision pipeline to validate sequence identities (Fig. 2). Our approach explicitly relies on expert validation of GENBANK sequences, so that doubtful or mislabelled reference sequences that may blur the assignment process (Bridge et al. 2003) can be filtered out. We also took into account taxonomic coverage and the representation of intraspecific diversity in GENBANK, because gaps in either can cause inaccurate sequence assignment (Munch et al. 2008; see below). Nevertheless, we acknowledge that the expert validation of a given sequence may be subjective, notably when the available literature does not fully meet our criteria (e.g. when there is no reference to vouchers or to the expert taxonomic knowledge of the authors). This potential drawback will become less of a burden as new expert-generated sequences are added to GENBANK, a task to which our study contributes. Our decision pipeline may also benefit from an empirical assessment of the conservative 95% similarity threshold that we used to minimize false-positive species-level matches, given the potential arbitrariness of this value (Meier et al. 2006).

Our study confirmed the usefulness of a multilocus typing approach (Elias et al. 2007; Frezal & Leblois 2008; Kim et al. 2014). The independent amplification of four mtDNA genes facilitated accurate species-level assignment by overcoming random, gene-dependent gaps in GENBANK taxonomic coverage. It also permitted the assessment of taxonomic assignment inconsistencies among amplifications for a given sample (notably due to pseudogene amplification or cross-species contamination). The success in reaching species-level identification from our mtDNA sequences when querying GENBANK was fair to good, ranging from 50% (16S) to 73% (cyt b). In total, 67% of the morphological species hypotheses could be DNA-assigned to the species level (i.e. ‘hard species’ assignments).
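To make the multilocus logic more concrete, here is a hypothetical Python sketch of how per-gene calls could be combined. Each gene's BLAST hits are filtered at the 95% similarity threshold, and hits that disagree are collapsed to their deepest shared rank. The per-gene calls are then reconciled into one consensus. The small LINEAGE table, the function names and the hit values are illustrative assumptions. This is not the authors' decision pipeline (Fig. 2), which also involves expert validation of reference sequences and the hard/soft species distinction. The example reproduces the Table 2 consensus for ‘Unknown 1’, where cyt b and COI stop at Cercopithecinae and 12S and 16S reach Cercocebus atys.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical, hand-written lineages (highest rank first). A real tool would draw
# these from a curated taxonomy rather than from a hard-coded dictionary.
LINEAGE: Dict[str, Tuple[str, ...]] = {
    "Cercopithecinae": ("Primates", "Cercopithecidae", "Cercopithecinae"),
    "Cercocebus atys": ("Primates", "Cercopithecidae", "Cercopithecinae",
                        "Cercocebus", "Cercocebus atys"),
    "Cephalophus": ("Artiodactyla", "Bovidae", "Cephalophinae", "Cephalophus"),
    "Cephalophus dorsalis": ("Artiodactyla", "Bovidae", "Cephalophinae",
                             "Cephalophus", "Cephalophus dorsalis"),
    "Cephalophus silvicultor": ("Artiodactyla", "Bovidae", "Cephalophinae",
                                "Cephalophus", "Cephalophus silvicultor"),
}


def deepest_shared_rank(taxa: List[str]) -> Optional[str]:
    """Return the most specific rank common to all lineages (their lowest common ancestor)."""
    shared: Optional[str] = None
    for ranks in zip(*(LINEAGE[t] for t in taxa)):
        if any(r != ranks[0] for r in ranks):
            break
        shared = ranks[0]
    return shared


def gene_call(hits: List[Tuple[str, float]], threshold: float = 95.0) -> Optional[str]:
    """Per-gene call from (taxon, % identity) hits: keep hits >= threshold, then collapse them."""
    kept = [taxon for taxon, identity in hits if identity >= threshold]
    return deepest_shared_rank(kept) if kept else None


def multilocus_consensus(calls: Dict[str, Optional[str]]) -> Optional[str]:
    """Combine per-gene calls. None marks a gene without a usable call (NA)."""
    usable = [c for c in calls.values() if c is not None]
    if not usable:
        return None
    deepest = max(usable, key=lambda t: len(LINEAGE[t]))
    # If every call is an ancestor of (or equal to) the deepest call, the genes agree.
    if all(LINEAGE[deepest][:len(LINEAGE[c])] == LINEAGE[c] for c in usable):
        return deepest
    # Otherwise the genes conflict, so fall back to the rank they all share.
    return deepest_shared_rank(usable)


# 'Unknown 1' in Table 2.
print(multilocus_consensus({"cyt b": "Cercopithecinae", "COI": "Cercopithecinae",
                            "12S": "Cercocebus atys", "16S": "Cercocebus atys"}))
# prints: Cercocebus atys
```

The conflict fallback mirrors the idea that a sample with conflicting species-level calls can still be placed at a higher rank, as for ‘Unknown 8’ (genus Cephalophus).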
In addition, the distribution of genetic distances, together with the good level of resolution of the NJ/ML trees and the Klee diagrams, showed that in most cases we obtained satisfactory results in distinguishing among recently diverged species (e.g. Philantomba spp.). We therefore hypothesize that our rate of species-level assignment would be higher with a larger taxonomic representation in the queried database. The great majority of interspecific distances were higher than intraspecific distances (Fig. 6), further confirming the utility of the four mtDNA genes. COI distances among congeneric species averaged 11.4% (range 4.3–20.7%), in line with previously published values (e.g. Hebert et al. 2003: 9.6%; Eaton et al. 2010: 9.8%). For cyt b, the trend was similar (mean = 11%; range = 4.4–22.7%), and a large proportion of pairwise genetic distances were above the 11% threshold that can be indicative of species recognition among mammals (Bradley & Baker 2001; Baker & Bradley 2006). Although our sample set included only a small proportion of sister species, the high pairwise distances indicated that cyt b and COI are useful for distinguishing between closely related species (cyt b and COI distances, respectively: Philantomba walteri and P. maxwelli, 8.2 and 4.9%; C. caracal and P. aurata, 13.6 and 8.4%; M. tetradactyla and M. tricuspis, 18.3 and 14.5%). The only cases of overlap between intra- and interspecific distances concerned phylogenetically close species groups (Philantomba spp., Cercopithecus spp.) with 12S and 16S, but these groups were generally distinguishable with clustering methods (NJ/ML trees and Klee diagrams). Such clustering methods are probably the best complement to similarity-based approaches when DNA typing biodiversity with reference databases of incomplete taxonomic coverage (Puillandre et al. 2009). However, we acknowledge that denser taxonomic sampling, including more sister species among mammalian orders, would improve the accuracy of inter- and intraspecific genetic distance estimates.

DNA typing proved useful in (i) resolving the taxonomic identity of smoked/processed carcasses and cryptic species and (ii) correcting misidentifications in the field. We reached species-level assignment in 69% of the Guinean samples of unknown taxonomic attribution. In addition, a pool of smoked items sold in southwestern Nigeria under different labels (‘duiker’, Potamochoerus porcus, P. africanus or Thryonomys swinderianus) was DNA-assigned to S. scrofa. This finding is not trivial for the Muslim populations of the area, and we suspect that traders may increase their margins by selling domestic animals (pigs) at the price of wild game. Other conflicts between morphological species hypotheses and DNA assignment occurred with processed carcasses (only parts of the body were available) that could not be distinguished by eye from closely related species (e.g. ‘H. meinertzhageni’ and ‘Gorilla gorilla’ actually represented P.
africanus and P. troglodytes, respectively). However, in the case of H. meinertzhageni, the assignment to P. africanus may reflect natural interspecies hybridization, incomplete lineage sorting or poorly delineated species boundaries (see Gongora et al. 2011). FINS also proved useful in assigning phenotypically indistinguishable (cryptic) species of Cricetomys to three distinct cyt b species-level lineages (Olayemi et al. 2012). Similarly, our specimens representing G. senegalensis and Galagoides demidovii were assigned to specific 12S lineages (neither species was monophyletic). Here, FINS provided a glimpse of the complex systematics of Galagidae (Bearder & Masters 2013). Another advantage of our decision pipeline was that it could provide a minimum phylogenetic level of assignment for problematic taxa that could not be identified to the species level (see the concern raised by Nilsson et al. 2005). For instance, an unknown sample from Guinea was assigned to the genus Cephalophus, although it was subject to conflicting species-level assignments (‘Unknown 8’; Table 2).

As in most DNA typing studies, the main reason for not reaching species-level identification was incomplete taxonomic representation in GENBANK (see Puillandre et al. 2009). Our best performing gene was cyt b (73%), supporting the idea that cyt b is, to date, the best marker to identify mammalian species (Parson et al. 2000; Bradley & Baker 2001; Eaton et al. 2010; Olayemi et al. 2011; Naidu et al. 2012). In contrast, COI reached 64% of species-level identification and was poorly represented mainly in Carnivora, Primates and Rodentia. The four mtDNA genes were generally complementary in achieving species-level identification across different taxa (e.g. 12S performed better in Primates than any other gene). Weak representation of intraspecific diversity in GENBANK also led to less accurate species assignment (Munch et al. 2008), as with the ‘soft species’ assignments for E. patas and C. campbelli (cyt b). Clearly defined species assignment hypotheses, such as ‘soft species’, will help focus future investigations on the status of lineages with uncertain taxonomic attributions. The nonmonophyly of the species-level sequences present in GENBANK was the second leading cause of failed species assignment. This was mostly the case for 12S and 16S (especially in Primates), suggesting that their lower rates of evolution were responsible for incomplete lineage sorting among recently diverged species (Simon et al. 1994). The failure to obtain monophyletic species-level sequences may also originate from an incongruence between morphological species hypotheses and gene trees, as in large-spotted genets, here involving Genetta pardina and Genetta maculata (Gaubert 2003; Olayemi et al. 2011).
Incorrect labelling and poor quality control of GENBANK sequences may also cause inconclusive nonmonophyly among retrieved species-level sequences (Nilsson et al. 2006; Bertheau et al. 2011). In addition, mislabelling can lead to a wrong species assignment when a single sequence is the only GENBANK representative of a species. Here, we suggest that (i) the mitogenome of ‘Manis tetradactyla’ AJ421454/NC004027 actually represents the species M. tricuspis (also see Olayemi et al. 2011), and (ii) the COI sequences of ‘Heliosciurus gambianus’ JX426127-8 belong to the species T. swinderianus (Table 2). Some specimens had the primary morphological hypothesis C. torquatus but were assigned to L. albigena with 12S and 16S. This apparent conflict also originated from outdated labelling of GENBANK sequences (see Guschanski et al. 2013).
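The distance comparisons in this section (Fig. 6 and the 11% cyt b benchmark) use Kimura two-parameter (K2P) distances. Below is a minimal Python sketch of the K2P formula, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q). P and Q are the proportions of transitions and transversions over the compared sites. The sketch also includes a simple check of whether the largest intraspecific distance stays below the smallest interspecific one, which is the nonoverlap criterion summarized in Fig. 6. Gaps and ambiguity codes are dropped pair by pair, which is our assumption. The function names and toy sequences are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # pairwise deletion of gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    w1, w2 = 1 - 2 * p - q, 1 - 2 * q
    if w1 <= 0 or w2 <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(w1) - 0.25 * math.log(w2)


def intra_inter_extremes(samples: List[Tuple[str, str]]) -> Tuple[float, float]:
    """Return (largest intraspecific, smallest interspecific) K2P distance for (species, sequence) pairs."""
    intra, inter = [], []
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        (intra if sp1 == sp2 else inter).append(k2p(s1, s2))
    return max(intra, default=0.0), min(inter, default=math.inf)


samples = [
    ("species A", "AAAAAAAAAA"),
    ("species A", "AAAAAAAAAG"),  # one transition from the first sequence
    ("species B", "CCCAAAAAAA"),  # three transversions from the first sequence
]
max_intra, min_inter = intra_inter_extremes(samples)
print(f"max intra = {max_intra:.3f}, min inter = {min_inter:.3f}, gap = {max_intra < min_inter}")
# prints: max intra = 0.112, min inter = 0.407, gap = True
```

Applied per order and per gene, an overlap (gap = False) would correspond to the asterisked categories in Fig. 6.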


DNABUSHMEAT: an expert knowledge, web-assisted query database for the identification of African forest bushmeat

DNA typing (including barcoding) is an efficient tool to trace the dynamics of hidden or hardly accessible animal trades (Baker 2008; Baker et al. 2010). It is also a useful tool to reassess the taxonomic identity of mislabelled, traded species (‘market substitution’; Wong & Hanner 2008), which can lead to the identification of protected species (Milius 1998). FINS has become a widely accepted approach in the forensic genetics community (Carracedo et al. 2000). More than 50% of African forest mammals are considered game species (Fa et al. 2002), and misidentifications of carcasses may reach 59% in some cases (Minhós et al. 2013). For these reasons, we propose a web-accessible tool for the DNA-based identification of African forest bushmeat. Our mtDNA data sets were released in DNABUSHMEAT (http://mbb.univ-montp2.fr/MBB/DNAbushmeat), an expert-curated query database that will provide a reference framework for the DNA typing of African forest bushmeat. DNABUSHMEAT uses the cluster computing capacities of the Montpellier Bioinformatics Biodiversity platform (http://mbb.univ-montp2.fr/MBB/). At present, DNABUSHMEAT includes 60 species represented by 110–150 sequences for each mtDNA alignment (cyt b, COI, 12S, 16S). Where possible, we covered intraspecific variability comprehensively by including several different haplotypes for each species in the reference data sets. Users can locally blast the four databases by pasting or attaching sequences in FASTA format, using BLASTn and ‘discontiguous MEGABLAST’ to allow the return of best hits with similarity values