
      Rapid development of cloud-native intelligent data pipelines for scientific data streams using the HASTE Toolkit


          Abstract

          Background

          Large streamed datasets, characteristic of life science applications, are often resource-intensive to process, transport and store. We propose a pipeline model, a design pattern for scientific pipelines, where an incoming stream of scientific data is organized into a tiered or ordered “data hierarchy”. We introduce the HASTE Toolkit, a proof-of-concept cloud-native software toolkit based on this pipeline model, to partition and prioritize data streams to optimize use of limited computing resources.

          Findings

          In our pipeline model, an “interestingness function” assigns an interestingness score to data objects in the stream, inducing a data hierarchy. From this score, a “policy” guides decisions on how to prioritize computational resource use for a given object. The HASTE Toolkit is a collection of tools for adopting this approach. We evaluate the approach with 2 microscopy imaging case studies. The first is a high content screening experiment, where images are analyzed in an on-premises container cloud to prioritize storage and subsequent computation. The second considers edge processing of images for upload into the public cloud for real-time control of a transmission electron microscope.
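          The two-stage mechanism described above (an interestingness function scores each object, then a policy maps the score to a resource decision) can be illustrated with a minimal Python sketch. It is not the HASTE Toolkit's API: the names `DataObject`, `variance_interestingness`, `threshold_policy` and `route_stream`, the use of pixel variance as a score, the tier names and the thresholds are all hypothetical choices made only to show how the score induces a tiered data hierarchy.

```python
from dataclasses import dataclass
from statistics import pvariance
from typing import Callable, Dict, Iterable, List, Sequence


@dataclass
class DataObject:
    """Hypothetical stream item: an ID plus flattened image pixel values."""
    object_id: str
    pixels: Sequence[float]


def variance_interestingness(obj: DataObject) -> float:
    """Hypothetical interestingness function: pixel variance as a crude
    proxy for how much structure an image contains."""
    return pvariance(obj.pixels)


def threshold_policy(score: float) -> str:
    """Hypothetical policy: map a score to a storage tier.
    The thresholds are illustrative, not taken from the paper."""
    if score >= 100.0:
        return "tier-1-fast"
    if score >= 10.0:
        return "tier-2-standard"
    return "tier-3-archive"


def route_stream(
    stream: Iterable[DataObject],
    interestingness: Callable[[DataObject], float],
    policy: Callable[[float], str],
) -> Dict[str, List[str]]:
    """Score each object, apply the policy, and group object IDs by tier."""
    tiers: Dict[str, List[str]] = {}
    for obj in stream:
        score = interestingness(obj)
        tier = policy(score)
        tiers.setdefault(tier, []).append(obj.object_id)
    return tiers


if __name__ == "__main__":
    stream = [
        DataObject("img-a", [0.0, 50.0, 0.0, 50.0]),  # variance 625
        DataObject("img-b", [0.0, 10.0, 0.0, 10.0]),  # variance 25
        DataObject("img-c", [5.0, 5.0, 5.0, 5.0]),    # variance 0
    ]
    print(route_stream(stream, variance_interestingness, threshold_policy))
    # {'tier-1-fast': ['img-a'], 'tier-2-standard': ['img-b'], 'tier-3-archive': ['img-c']}
```

          Keeping the interestingness function and the policy as separate callables mirrors the separation the authors note between the scientific question of data priority and how that priority is acted on for a given resource: the same score could drive a different policy (for example, upload order at the edge) without changing the scoring code.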

          Conclusions

          Through our evaluation, we created smart data pipelines capable of effective use of storage, compute, and network resources, enabling more efficient data-intensive experiments. We note a beneficial separation between scientific concerns of data priority, and the implementation of this behaviour for different resources in different deployment contexts. The toolkit allows intelligent prioritization to be “bolted on” to new and existing systems, and is intended for use with a range of technologies in different deployment scenarios.


                Author and article information

                Journal: GigaScience, Oxford University Press (ISSN 2047-217X)
                Published: 19 March 2021
                Volume 10, Issue 3, giab018
                Affiliations
                Department of Information Technology, Uppsala University, Lägerhyddsvägen 2, 75237 Uppsala, Sweden
                Department of Pharmaceutical Biosciences, Uppsala University, Husargatan 3, 75237 Uppsala, Sweden
                Science for Life Laboratory, Uppsala University, Husargatan 3, 75237 Uppsala, Sweden
                Vironova AB, Gävlegatan 22, 11330 Stockholm, Sweden
                Advanced Drug Delivery, Pharmaceutical Sciences, R&D, AstraZeneca, Pepparedsleden 1, 43183 Mölndal, Sweden
                Author notes
                Correspondence address: Ben Blamey, Department of Information Technology, Uppsala University, Box 337, 75105 Uppsala, Sweden. E-mail: ben.blamey@it.uu.se

                Co-senior authors.

                Author information
                https://orcid.org/0000-0003-1206-1428
                https://orcid.org/0000-0003-0302-6276
                https://orcid.org/0000-0001-5447-9465
                https://orcid.org/0000-0002-6289-7285
                https://orcid.org/0000-0003-4046-9017
                https://orcid.org/0000-0002-8307-7411
                https://orcid.org/0000-0001-5310-0281
                https://orcid.org/0000-0002-4139-7003
                https://orcid.org/0000-0002-8083-2864
                https://orcid.org/0000-0001-7273-7923
                Article: giab018
                DOI: 10.1093/gigascience/giab018
                PMCID: PMC7976223
                PMID: 33739401
                © The Author(s) 2021. Published by Oxford University Press GigaScience.

                This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

                History: 14 September 2020; 26 January 2021; 23 February 2021
                Page count: 14
                Funding
                Funded by: Sjögren’s Syndrome Foundation, DOI 10.13039/100003392;
                Award ID: BD15-0008
                Categories
                Technical Note
                AcademicSubjects/SCI00960
                AcademicSubjects/SCI02254

                Keywords: stream processing, interestingness functions, HASTE, tiered storage, image analysis
