      Is Open Access

      qNABpredict: Quick, accurate, and taxonomy‐aware sequence‐based prediction of content of nucleic acid binding amino acids

      research-article


          Abstract

          Protein sequence‐based predictors of nucleic acid (NA)‐binding include methods that predict NA‐binding proteins and NA‐binding residues. The residue‐level tools produce more details but suffer high computational cost since they must predict every amino acid in the input sequence and rely on multiple sequence alignments. We propose an alternative approach that predicts content (fraction) of the NA‐binding residues, offering more information than the protein‐level prediction and much shorter runtime than the residue‐level tools. Our first‐of‐its‐kind content predictor, qNABpredict, relies on a small, rationally designed and fast‐to‐compute feature set that represents relevant characteristics extracted from the input sequence and a well‐parametrized support vector regression model. We provide two versions of qNABpredict, a taxonomy‐agnostic model that can be used for proteins of unknown taxonomic origin and more accurate taxonomy‐aware models that are tailored to specific taxonomic kingdoms: archaea, bacteria, eukaryota, and viruses. Empirical tests on a low‐similarity test dataset show that qNABpredict is 100 times faster and generates statistically more accurate content predictions when compared to the content extracted from results produced by the residue‐level predictors. We also show that qNABpredict's content predictions can be used to improve results generated by the residue‐level predictors. We release qNABpredict as a convenient webserver and source code at http://biomine.cs.vcu.edu/servers/qNABpredict/. This new tool should be particularly useful to predict details of protein–NA interactions for large protein families and proteomes.
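
          The abstract compares qNABpredict against content "extracted from results produced by the residue‐level predictors". As a minimal sketch of what that extraction could look like, the Python snippet below computes content as the fraction of residues flagged as NA‐binding. The function names, the 0/1 label encoding, and the 0.5 score cutoff are illustrative assumptions and are not taken from the article or the qNABpredict source code.

```python
from typing import Sequence


def binding_content(labels: Sequence[int]) -> float:
    """Return the fraction of residues labeled 1 (assumed to mean NA-binding)."""
    if len(labels) == 0:
        raise ValueError("labels must cover at least one residue")
    # Count residues flagged as binding and normalize by sequence length.
    return sum(1 for label in labels if label == 1) / len(labels)


def content_from_propensities(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Binarize per-residue scores with an assumed cutoff, then compute content."""
    # The 0.5 default is a placeholder; real residue-level predictors
    # each define their own score scale and cutoff.
    binary = [1 if score >= threshold else 0 for score in scores]
    return binding_content(binary)


if __name__ == "__main__":
    # Hypothetical 10-residue protein with two residues annotated as binding.
    print(binding_content([0, 1, 1, 0, 0, 0, 0, 0, 0, 0]))
    # Hypothetical per-residue scores from a residue-level predictor.
    print(content_from_propensities([0.1, 0.9, 0.6, 0.2]))
```

          A content value obtained this way requires a full residue‐level prediction first, which is the runtime cost that a direct content predictor is meant to avoid.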


                Author and article information

                Contributors
                lkurgan@vcu.edu
                Journal
                Protein Sci
                10.1002/(ISSN)1469-896X
                PRO
                Protein Science : A Publication of the Protein Society
                John Wiley & Sons, Inc. (Hoboken, USA )
                0961-8368
                1469-896X
                January 2023
                01 January 2023
                Volume: 32
                Issue: 1 (doiID: 10.1002/pro.v32.1)
                Article: e4544
                Affiliations
                [ 1 ] School of Mathematical Sciences and LPMC Nankai University Tianjin China
                [ 2 ] Department of Computer Science Virginia Commonwealth University Richmond Virginia USA
                Author notes
                [*] Correspondence

                Lukasz Kurgan, Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.

                Email: lkurgan@vcu.edu

                Author information
                https://orcid.org/0000-0002-4191-0241
                https://orcid.org/0000-0002-3694-2011
                https://orcid.org/0000-0002-7749-0314
                Article
                PRO4544
                10.1002/pro.4544
                9798252
                36519304
                18df82b4-954b-423d-a7d5-f96598d4a568
                © 2022 The Authors. Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society.

                This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.

                History
                : 07 December 2022
                : 16 September 2022
                : 08 December 2022
                Page count
                Figures: 6, Tables: 2, Pages: 15, Words: 9891
                Categories
                Methods and Applications

                Biochemistry
                prediction, protein function, protein–nucleic acids interactions, protein sequence
