Supervised machine learning outperforms taxonomy-based environmental DNA metabarcoding applied to biomonitoring

Jul 17, 2018 | Methods

Cordier T, Forster D, Dufresne Y, Martins CIM, Stoeck T, Pawlowski J
Mol Ecol Resour. 2018 Jul 17. doi: 10.1111/1755-0998.12926.

ABSTRACT

Biodiversity monitoring is the standard for environmental impact assessment of anthropogenic activities. Several recent studies have shown that high-throughput amplicon sequencing of environmental DNA (eDNA metabarcoding) could overcome many limitations of traditional morphotaxonomy-based bioassessment. We recently demonstrated that supervised machine learning (SML) can be used to predict accurate biotic index values from eDNA metabarcoding data, regardless of the taxonomic affiliation of the sequences. However, it is unknown to what extent the accuracy of such models depends on the taxonomic resolution of the molecular markers, or how SML compares with metabarcoding approaches that target well-established bioindicator species. In this study, we address these issues by training predictive models on five different ribosomal bacterial and eukaryotic markers and measuring their performance in assessing the environmental impact of marine aquaculture on independent datasets. Our results show that all tested markers yield accurate predictive models, and that they all outperform assessment relying solely on taxonomically assigned sequences. Remarkably, we did not find any significant difference in the performance of models built using universal eukaryotic or prokaryotic markers. With any molecular marker whose taxonomic range is broad enough to comprise different potential bioindicator taxa, the SML approach can overcome the limits of taxonomy-based eDNA bioassessment.
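
The general workflow described in the abstract (train a model on sequence abundance profiles without taxonomic assignment, then evaluate it on data it has not seen) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the learning algorithm, so a simple k-nearest-neighbours regressor on Bray-Curtis dissimilarity stands in for it; the data are synthetic; the biotic index is a hypothetical score on an arbitrary 0 to 6 scale; and all function names are invented here. Holding out one farm at a time is used as a rough analogue of testing on independent datasets.

```python
import random
from statistics import mean


def relative_abundance(counts):
    """Convert raw read counts for one sample into proportions."""
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance profiles."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(a) + sum(b)
    return num / den if den else 0.0


def knn_predict(train_X, train_y, x, k=3):
    """Predict an index value as the mean of the k most similar training samples."""
    nearest = sorted(zip(train_X, train_y), key=lambda p: bray_curtis(p[0], x))[:k]
    return mean(value for _, value in nearest)


def leave_one_group_out(X, y, groups, k=3):
    """Predict every sample using only samples from the other groups (farms)."""
    preds = [None] * len(X)
    for held_out in sorted(set(groups)):
        train = [i for i, g in enumerate(groups) if g != held_out]
        test = [i for i, g in enumerate(groups) if g == held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for i in test:
            preds[i] = knn_predict(train_X, train_y, X[i], k)
    return preds


def make_synthetic(n_farms=4, per_farm=10, n_otus=20, seed=0):
    """Synthetic data (not from the study): half of the sequence clusters
    increase with impact and half decrease; the index falls as impact rises."""
    rng = random.Random(seed)
    X, y, groups = [], [], []
    for farm in range(n_farms):
        for _ in range(per_farm):
            impact = rng.random()  # 0 = unimpacted, 1 = heavily impacted
            counts = []
            for j in range(n_otus):
                level = impact if j < n_otus // 2 else 1 - impact
                counts.append(max(0, round(rng.gauss(5 + 200 * level, 10))))
            X.append(relative_abundance(counts))
            y.append(round(6 * (1 - impact), 2))  # hypothetical 0-6 index
            groups.append(farm)
    return X, y, groups


X, y, groups = make_synthetic()
preds = leave_one_group_out(X, y, groups)
mae = mean(abs(p - t) for p, t in zip(preds, y))
print(f"leave-one-farm-out mean absolute error: {mae:.2f}")
```

Note that the model never sees a taxonomic label: it works directly on abundance profiles, which is the property the abstract contrasts with taxonomy-based assessment. Grouping the validation by farm avoids scoring the model on samples from a site it was trained on.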
