Addressing Misclassification in the Microbiome: A Data-Scientific Approach to Propagating Uncertainty in Microbial Community Composition

Misclassification of taxa is one of the biggest barriers in sequence-based microbiome studies. This project addresses that barrier by developing algorithms that account for uncertainty in taxonomic classification and by incorporating them into microbiome analysis pipelines.

A common first step in microbiome analyses is to obtain a census of the microbes that are present, typically done for bacteria using the 16S rRNA gene, a universal biomarker. Current approaches classify a given sequence by matching it to its closest taxonomic neighbor in the dataset or in a reference database. This falls short of accurate classification because the sequence is often not a 100 percent match with any sequence in the dataset or reference database.

This project will overcome this limitation by defining membership as a probability, rather than relying on the all-or-none approach that is currently employed. This method can address concerns over incorrect assessment of diversity and abundance. Fixing this problem has implications for research in human health, agriculture and understanding of the environment, as well as for any system that produces discrete data with some degree of misclassification.
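
The contrast between all-or-none assignment and probabilistic membership can be illustrated with a minimal Python sketch. It is not the project's algorithm: the similarity scores, the taxon names, and the softmax-style weighting with its `sharpness` parameter are all assumptions chosen only to show how uncertainty could carry through to abundance estimates.

```python
import math

# Hypothetical data: similarity (fraction identity) of three reads
# to three reference taxa. None of these values come from the project.
TAXA = ["taxon_A", "taxon_B", "taxon_C"]
SCORES = [
    [0.99, 0.90, 0.80],  # read 1: clear match to taxon_A
    [0.95, 0.94, 0.70],  # read 2: nearly tied between taxon_A and taxon_B
    [0.85, 0.97, 0.96],  # read 3: nearly tied between taxon_B and taxon_C
]


def hard_counts(scores):
    """All-or-none: each read counts fully toward its closest taxon."""
    counts = [0.0] * len(scores[0])
    for row in scores:
        counts[row.index(max(row))] += 1.0
    return counts


def membership_probs(row, sharpness=50.0):
    """Turn one read's similarity scores into membership probabilities.

    Assumption: a softmax over similarity, with `sharpness` controlling
    how strongly the best match is favored. This is an illustrative
    choice, not a calibrated statistical model.
    """
    top = max(row)
    weights = [math.exp(sharpness * (s - top)) for s in row]
    total = sum(weights)
    return [w / total for w in weights]


def expected_counts(scores, sharpness=50.0):
    """Probabilistic: each read contributes fractionally to every taxon."""
    counts = [0.0] * len(scores[0])
    for row in scores:
        for j, p in enumerate(membership_probs(row, sharpness)):
            counts[j] += p
    return counts


if __name__ == "__main__":
    for name, hard, soft in zip(TAXA, hard_counts(SCORES), expected_counts(SCORES)):
        print(f"{name}: hard={hard:.0f} expected={soft:.2f}")
```

Under the all-or-none rule the third taxon receives no reads at all in this toy data, even though one read matches it almost as well as its best match. The probabilistic version assigns it a fractional expected count, which is the kind of difference that would affect downstream diversity and abundance estimates.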

PRINCIPAL INVESTIGATOR:

Thea Whitman, Assistant Professor of Soil Science

CO-PRINCIPAL INVESTIGATOR:

Amy Willis, Assistant Professor of Biostatistics at the University of Washington School of Public Health

CO-INVESTIGATOR:

Karl Broman, Professor of Biostatistics and Medical Informatics