Addressing Misclassification in the Microbiome: A Data-Scientific Approach to Propagating Uncertainty in Microbial Community Composition

Misclassification of taxa is one of the biggest barriers in sequence-based microbiome studies. This project addresses that barrier by developing algorithms that account for uncertainty in taxonomic classification and by incorporating them into microbiome analysis pipelines.

A common first step in microbiome analyses is to take a census of the microbes present, which for bacteria is typically done using the 16S rRNA gene as a universal biomarker. Current approaches classify a given sequence by matching it to its closest taxonomic neighbor in the dataset or in a reference database. This falls short of accurate classification because the sequence is often not a 100 percent match to any sequence in the dataset or reference database.

This project will overcome this limitation by defining taxonomic membership as a probability, rather than relying on the all-or-none assignment currently employed. Carrying these probabilities through the analysis addresses concerns over incorrect assessment of diversity and abundance. Solving this problem has implications for research in human health, agriculture and the environment, as well as for any system that produces discrete data with some degree of misclassification.
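The contrast between all-or-none assignment and probabilistic membership can be illustrated with a small sketch. Everything in it is an assumption made for illustration and not the project's algorithm: the taxon names, the identity values, the `sharpness` parameter, and the choice of a softmax over sequence identities as a stand-in probability model are all hypothetical.

```python
import math
from collections import Counter


def membership_probabilities(identities, sharpness=50.0):
    """Convert per-taxon sequence identities (0..1) into probabilities.

    Assumption: a softmax over identities stands in for whatever
    probability model the project actually develops.
    """
    best = max(identities.values())
    # Subtract the best score before exponentiating for numerical stability.
    weights = {t: math.exp(sharpness * (s - best)) for t, s in identities.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def hard_counts(reads):
    """All-or-none: each read counts once, for its closest taxon only."""
    counts = Counter()
    for identities in reads:
        counts[max(identities, key=identities.get)] += 1
    return dict(counts)


def expected_counts(reads, sharpness=50.0):
    """Probabilistic: each read contributes fractionally to every taxon."""
    totals = {}
    for identities in reads:
        probs = membership_probabilities(identities, sharpness)
        for taxon, p in probs.items():
            totals[taxon] = totals.get(taxon, 0.0) + p
    return totals


# Hypothetical reads: identity of each read to two reference taxa.
reads = [
    {"TaxonA": 0.99, "TaxonB": 0.90},  # clearly TaxonA
    {"TaxonA": 0.96, "TaxonB": 0.95},  # ambiguous
    {"TaxonA": 0.94, "TaxonB": 0.95},  # ambiguous
]

print(hard_counts(reads))
print(expected_counts(reads))
```

In this sketch the hard counts treat the two ambiguous reads as if they were as certain as the first one, whereas the expected counts keep the ambiguity as fractional contributions that downstream abundance or diversity estimates could, in principle, carry forward.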

PRINCIPAL INVESTIGATOR:

Katherine McMahon, Professor of Civil and Environmental Engineering and of Bacteriology

CO-PRINCIPAL INVESTIGATOR:

Daniel Noguera, Professor of Civil and Environmental Engineering

COLLABORATORS:

Daniel Amador-Noguez, Assistant Professor of Bacteriology

Mike Jetten, Professor and Chair of Microbiology, Radboud University, The Netherlands

Sebastian Lücker, Assistant Professor of Microbiology, Radboud University, The Netherlands