      Decision Tree Ensembles Utilizing Multivariate Splits Are Effective at Investigating Beta Diversity in Medically Relevant 16S Amplicon Sequencing Data

      research-article
      Microbiology Spectrum
      American Society for Microbiology
      16S rRNA, metric learning, amplicon sequencing, biomarker discovery, machine learning, metabarcoding, ordination


          ABSTRACT

          Developing an understanding of how microbial communities vary across conditions is an important analytical step. We used 16S rRNA data isolated from human stool samples to investigate whether learned dissimilarities, such as those produced using unsupervised decision tree ensembles, can be used to improve the analysis of the composition of bacterial communities in patients suffering from Crohn’s disease and adenomas/colorectal cancers. We also introduce a workflow capable of learning dissimilarities, projecting them into a lower dimensional space, and identifying features that impact the location of samples in the projections. For example, when used with the centered log ratio transformation, our new workflow (TreeOrdination) could identify differences in the microbial communities of Crohn’s disease patients and healthy controls. Further investigation of our models elucidated the global impact amplicon sequence variants (ASVs) had on the locations of samples in the projected space and how each ASV impacted individual samples in this space. Furthermore, this approach can be used to integrate patient data easily into the model and results in models that generalize well to unseen data. Models employing multivariate splits can improve the analysis of complex high-throughput sequencing data sets because they are better able to learn about the underlying structure of the data set.

          IMPORTANCE There is an ever-increasing level of interest in accurately modeling and understanding the roles that commensal organisms play in human health and disease. We show that learned representations can be used to create informative ordinations. We also demonstrate that the application of modern model introspection algorithms can be used to investigate and quantify the impacts of taxa in these ordinations, and that the taxa identified by these approaches have been associated with immune-mediated inflammatory diseases and colorectal cancer.
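The first two steps the abstract describes (a centered log ratio transformation followed by a dissimilarity learned from an unsupervised tree ensemble) can be illustrated with a minimal sketch. This is not the TreeOrdination implementation: the function names, the pseudocount of 0.5, the tree settings, and the toy count table are all assumptions made for illustration, and the random axis-aligned cuts used here are a simple stand-in for the multivariate splits the article evaluates.

```python
import math
import random


def clr(counts, pseudocount=0.5):
    """Centered log ratio transform of one sample's ASV counts.

    The pseudocount (an assumed value) keeps zero counts finite under the log.
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def _grow(rows, idx, depth, rng, leaves):
    """Partition sample indices with random axis-aligned cuts; collect leaves."""
    if depth == 0 or len(idx) < 2:
        leaves.append(idx)
        return
    feature = rng.randrange(len(rows[0]))
    values = [rows[i][feature] for i in idx]
    cut = rng.uniform(min(values), max(values))
    left = [i for i in idx if rows[i][feature] <= cut]
    right = [i for i in idx if rows[i][feature] > cut]
    if not left or not right:  # constant feature or cut on the boundary
        leaves.append(idx)
        return
    _grow(rows, left, depth - 1, rng, leaves)
    _grow(rows, right, depth - 1, rng, leaves)


def tree_dissimilarity(rows, n_trees=500, max_depth=2, seed=0):
    """1 minus the fraction of random trees in which two samples share a leaf."""
    rng = random.Random(seed)
    n = len(rows)
    shared = [[0] * n for _ in range(n)]
    for _ in range(n_trees):
        leaves = []
        _grow(rows, list(range(n)), max_depth, rng, leaves)
        for leaf in leaves:
            for i in leaf:
                for j in leaf:
                    shared[i][j] += 1
    return [[1.0 - shared[i][j] / n_trees for j in range(n)] for i in range(n)]


# Toy ASV count table (made-up numbers): rows are samples, columns are ASVs.
counts = [
    [120, 30, 0, 5],
    [110, 35, 1, 4],
    [130, 28, 2, 6],
    [3, 2, 90, 60],
    [5, 0, 85, 70],
    [4, 1, 95, 65],
]
transformed = [clr(row) for row in counts]
dissimilarity = tree_dissimilarity(transformed)
```

The resulting matrix is symmetric with a zero diagonal, so it could in principle be handed to an ordination method for the projection step. Samples that rarely share a leaf receive values near 1, and samples that usually share one receive values near 0.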


                Author and article information

                Contributors
                Role: Editor
                Journal
                Microbiol Spectr
                Microbiology Spectrum
                American Society for Microbiology (1752 N St., N.W., Washington, DC)
                2165-0497
                6 March 2023
                Mar-Apr 2023
                Volume: 11
                Issue: 2
                Article number: e02065-22
                Affiliations
                [a] Department of Integrative Biology & Centre for Biodiversity Genomics, University of Guelph, Guelph, Ontario, Canada
                [b] Department of Biology, McMaster University, Hamilton, Ontario, Canada
                [c] School of Computer Science, University of Guelph, Guelph, Ontario, Canada
                Lerner Research Institute
                Author notes

                The authors declare no conflict of interest.

                Author information
                https://orcid.org/0000-0003-0484-8028
                Article
                02065-22 spectrum.02065-22
                10.1128/spectrum.02065-22
                10100742
                36877086
                49c45edc-88ef-46b2-b56b-b716ee206e6e
                Copyright © 2023 Rudar et al.

                This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license.

                History
                Received: 7 June 2022
                Accepted: 11 February 2023
                Page count
                supplementary-material: 0, Figures: 13, Tables: 2, Equations: 5, References: 70, Pages: 19, Words: 10695
                Funding
                Funded by: Canada First Research Excellence Fund (CFREF), FundRef https://doi.org/10.13039/501100010785
                Award ID: Food from Thought
                Funded by: Genome Canada (GC), FundRef https://doi.org/10.13039/100008762
                Funded by: Ontario Genomics (OG), FundRef https://doi.org/10.13039/100013061
                Funded by: Gouvernement du Canada | Natural Sciences and Engineering Research Council of Canada (NSERC), FundRef https://doi.org/10.13039/501100000038
                Award ID: RGPIN-2020-05733
                Categories
                Research Article
                Open Peer Review
                Computational Biology
