qNABpredict: Quick, accurate, and taxonomy‐aware sequence‐based prediction of content of nucleic acid binding amino acids

Abstract

Protein sequence‐based predictors of nucleic acid (NA) binding include methods that predict NA‐binding proteins and methods that predict NA‐binding residues. The residue‐level tools produce more detailed results but suffer from high computational cost, since they must make a prediction for every amino acid in the input sequence and rely on multiple sequence alignments. We propose an alternative approach that predicts the content (fraction) of NA‐binding residues, offering more information than protein‐level prediction and a much shorter runtime than the residue‐level tools. Our first‐of‐its‐kind content predictor, qNABpredict, relies on a small, rationally designed, and fast‐to‐compute feature set that represents relevant characteristics extracted from the input sequence, combined with a well‐parametrized support vector regression model. We provide two versions of qNABpredict: a taxonomy‐agnostic model that can be used for proteins of unknown taxonomic origin, and more accurate taxonomy‐aware models that are tailored to specific taxonomic kingdoms (archaea, bacteria, eukaryota, and viruses). Empirical tests on a low‐similarity test dataset show that qNABpredict is 100 times faster and generates statistically more accurate content predictions than the content extracted from results produced by the residue‐level predictors. We also show that qNABpredict's content predictions can be used to improve results generated by the residue‐level predictors. We release qNABpredict as a convenient webserver and source code at http://biomine.cs.vcu.edu/servers/qNABpredict/. This new tool should be particularly useful for predicting details of protein–NA interactions for large protein families and proteomes.
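To make the predicted quantity concrete, the following minimal Python sketch computes the content (fraction) of NA‐binding residues from per‐residue annotations, and shows how a content value might be extracted from the output of a residue‐level predictor. It is only an illustration of the definition and is not qNABpredict's model, which predicts content directly from sequence‐derived features. The function names, the 0/1 label encoding, and the 0.5 binarization threshold are assumptions made for this example.

```python
from typing import Sequence


def binding_content(residue_labels: Sequence[int]) -> float:
    """Return the fraction of residues labelled as NA-binding.

    Assumed encoding (hypothetical): 1 = NA-binding residue, 0 = non-binding.
    """
    if len(residue_labels) == 0:
        raise ValueError("residue_labels must not be empty")
    n_binding = sum(1 for label in residue_labels if label == 1)
    return n_binding / len(residue_labels)


def content_from_scores(scores: Sequence[float], threshold: float = 0.5) -> float:
    """Derive content from per-residue propensity scores.

    The scores are binarized at `threshold` (an assumed cut-off; real
    residue-level predictors define their own), then the fraction of
    residues predicted as binding is returned.
    """
    binary = [1 if score >= threshold else 0 for score in scores]
    return binding_content(binary)


if __name__ == "__main__":
    # A 10-residue toy protein with two annotated NA-binding residues.
    print(binding_content([0, 1, 1, 0, 0, 0, 0, 0, 0, 0]))  # 0.2
    # Toy propensities from a hypothetical residue-level predictor.
    print(content_from_scores([0.9, 0.1, 0.6, 0.4]))  # 0.5
```

Deriving content this way requires running a residue‐level predictor on every residue first, which is the costly step that a direct content predictor is meant to avoid.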

Author and article information

Contributors

Lukasz Kurgan:

ORCID: https://orcid.org/0000-0002-7749-0314

lkurgan@vcu.edu

Journal

Journal ID (nlm-ta): Protein Sci

Journal ID (iso-abbrev): Protein Sci

Journal ID (doi): 10.1002/(ISSN)1469-896X

Journal ID (publisher-id): PRO

Title: Protein Science: A Publication of the Protein Society

Publisher: John Wiley & Sons, Inc. (Hoboken, USA)

ISSN (Print): 0961-8368

ISSN (Electronic): 1469-896X

Publication date Collection: January 2023

Publication date (Electronic): 01 January 2023

Publication date PMC-release: 01 January 2023

Volume: 32

Issue: 1 (doiID: 10.1002/pro.v32.1)

Electronic Location Identifier: e4544

Affiliations

[¹] School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China

[²] Department of Computer Science, Virginia Commonwealth University, Richmond, Virginia, USA

Author notes

[*] Correspondence

Lukasz Kurgan, Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.

Email: lkurgan@vcu.edu

Author information

Zhonghua Wu https://orcid.org/0000-0002-4191-0241

Xuantai Wu https://orcid.org/0000-0002-3694-2011

Lukasz Kurgan https://orcid.org/0000-0002-7749-0314

Article

Publisher ID: PRO4544

DOI: 10.1002/pro.4544

PMC ID: 9798252

PubMed ID: 36519304

SO-VID: 18df82b4-954b-423d-a7d5-f96598d4a568

License:

This is an open access article under the terms of the http://creativecommons.org/licenses/by-nc/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited and is not used for commercial purposes.

History

Date received: 16 September 2022

Date revision received: 07 December 2022

Date accepted: 08 December 2022

Page count

Figures: 6, Tables: 2, Pages: 15, Words: 9891

ScienceOpen disciplines: Biochemistry

Keywords: prediction, protein function, protein–nucleic acids interactions, protein sequence
