Addressing Misclassification in the Microbiome: A Data-Scientific Approach to Propagating Uncertainty in Microbial Community Composition
Misclassification of taxa is one of the biggest barriers in sequence-based microbiome studies. This project addresses that barrier by developing algorithms that account for uncertainty in taxonomic classification and by incorporating those algorithms into microbiome analysis pipelines.
A common first step in microbiome analyses is to obtain a census of the microbes present, typically done for bacteria using the 16S rRNA gene, a universal biomarker. Current approaches classify a given sequence by matching it to its closest taxonomic neighbor in the dataset or in a reference database. This falls short of accurate classification because the sequence is often not a 100 percent match with any sequence in the dataset or reference database.
This project will overcome this limitation by defining taxonomic membership as a probability, rather than relying on the all-or-none assignment that is currently employed. With this method, concerns over incorrect assessment of diversity and abundance can be addressed. Fixing this problem has implications for research in human health, agriculture and understanding of the environment, as well as for any system that produces discrete data with some degree of misclassification.
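To make the contrast between all-or-none and probabilistic membership concrete, here is a minimal Python sketch. It is an illustration only, not the project's algorithm: the summary does not say how membership probabilities are computed, so the softmax over similarity scores, the `sharpness` parameter, the function names, and the toy similarity values are all assumptions.

```python
import math
from collections import defaultdict


def membership_probabilities(similarities, sharpness=50.0):
    """Convert per-taxon similarity scores (0..1) into probabilities.

    Assumption: a softmax over similarity, scaled by a hypothetical
    `sharpness` parameter. The project summary does not specify a model.
    """
    top = max(similarities.values())
    # Subtract the maximum before exponentiating for numerical stability.
    weights = {t: math.exp(sharpness * (s - top)) for t, s in similarities.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}


def hard_counts(reads):
    """All-or-none: each read counts fully toward its closest taxon."""
    counts = defaultdict(float)
    for sims in reads:
        best = max(sims, key=sims.get)
        counts[best] += 1.0
    return dict(counts)


def expected_counts(reads, sharpness=50.0):
    """Probabilistic: each read contributes fractionally to every taxon."""
    counts = defaultdict(float)
    for sims in reads:
        for taxon, p in membership_probabilities(sims, sharpness).items():
            counts[taxon] += p
    return dict(counts)


# Toy data (invented): similarity of three reads to two reference taxa.
reads = [
    {"Taxon_A": 0.99, "Taxon_B": 0.90},  # clearly closer to A
    {"Taxon_A": 0.97, "Taxon_B": 0.96},  # nearly a tie
    {"Taxon_A": 0.91, "Taxon_B": 0.98},  # clearly closer to B
]

print("all-or-none :", hard_counts(reads))
print("probabilistic:", expected_counts(reads))
```

In this toy case the nearly tied second read is counted entirely as Taxon_A under all-or-none assignment, whereas the probabilistic version splits it between both taxa, so the ambiguity carries through to the abundance estimates instead of being discarded.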
PRINCIPAL INVESTIGATOR:
Katherine McMahon, Professor of Civil and Environmental Engineering, and Bacteriology
CO-PRINCIPAL INVESTIGATOR:
Daniel Noguera, Professor of Civil and Environmental Engineering
COLLABORATORS:
Daniel Amador-Noguez, Assistant Professor of Bacteriology
Mike Jetten, Professor and Chair of Microbiology, Radboud University, The Netherlands
Sebastian Lücker, Assistant Professor of Microbiology, Radboud University, The Netherlands