      Comprehensive benchmark and architectural analysis of deep learning models for nanopore sequencing basecalling

      research-article
      Genome Biology
      BioMed Central
      Nanopore, Basecalling, Benchmark, Deep learning


          Abstract

          Background

          Nanopore-based DNA sequencing relies on basecalling the electric current signal. Basecalling requires neural networks to achieve competitive accuracies. To improve sequencing accuracy further, new models with new architectures are continuously proposed. However, benchmarking is currently not standardized, and the evaluation metrics and datasets used are defined on a per-publication basis, impeding progress in the field. This makes it impossible to distinguish data-driven from model-driven improvements.

          Results

          To standardize the process of benchmarking, we unified existing benchmarking datasets and defined a rigorous set of evaluation metrics. We benchmarked the latest seven basecaller models by recreating and analyzing their neural network architectures. Our results show that, overall, Bonito’s architecture is the best for basecalling. We find, however, that species bias in training can have a large impact on performance. Our comprehensive evaluation of 90 novel architectures demonstrates that different models excel at reducing different types of errors, and that recurrent neural networks (long short-term memory) and a conditional random field decoder are the main drivers of high-performing models.

          Conclusions

          We believe that our work can facilitate the benchmarking of new basecaller tools and that the community can further expand on this work.

          Supplementary Information

          The online version contains supplementary material available at 10.1186/s13059-023-02903-2.

                Author and article information

                Contributors
                m.pagesgallego@umcutrecht.nl
                j.deridder-4@umcutrecht.nl
                Journal
                Genome Biol
                Genome Biology
                BioMed Central (London)
                ISSN: 1474-7596, 1474-760X
                Published: 11 April 2023
                Volume: 24
                Article number: 71
                Affiliations
                [1] Center for Molecular Medicine, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG Utrecht, The Netherlands (GRID grid.7692.a; ISNI 0000000090126352)
                [2] Oncode Institute, Utrecht, The Netherlands (GRID grid.499559.d)
                Author information
                http://orcid.org/0000-0001-8888-5699
                Article
                2903
                10.1186/s13059-023-02903-2
                10088207
                37041647
                2bb5aa2a-8514-4286-96cd-d84a3cfc9639
                © The Author(s) 2023

                Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

                History
                Received: 1 August 2022
                Accepted: 20 March 2023
                Funding
                Funded by: FundRef http://dx.doi.org/10.13039/100016036, Health~Holland;
                Award ID: LSHM19029
                Categories
                Research
                Genetics
                nanopore, basecalling, benchmark, deep learning
