Support vector machine prediction of enzyme function with conjoint triad feature and hierarchical context

Abstract

Background

Enzymes are known as the largest class of proteins, and their functions are usually annotated with Enzyme Commission (EC) numbers, which use a hierarchical structure, i.e., four numbers separated by periods, to classify enzyme function. Automatically assigning an enzyme to its place in the EC hierarchy is crucial to understanding its specific molecular mechanism.
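To make the four-level hierarchy concrete, here is a minimal sketch of how an EC number can be split into its levels and how the depth shared by two EC numbers can be measured. The function names (`parse_ec`, `ec_ancestors`, `shared_depth`) are hypothetical helpers chosen for illustration and do not come from the paper.

```python
from typing import List, Tuple


def parse_ec(ec: str) -> Tuple[str, ...]:
    """Split an EC number such as '3.4.21.4' into its four levels."""
    parts = tuple(ec.strip().split("."))
    if len(parts) != 4:
        raise ValueError(f"expected four period-separated fields, got {ec!r}")
    return parts


def ec_ancestors(ec: str) -> List[str]:
    """Return the path from the top-level class down to the full EC number."""
    parts = parse_ec(ec)
    return [".".join(parts[:i]) for i in range(1, 5)]


def shared_depth(a: str, b: str) -> int:
    """Count the leading EC levels that two EC numbers have in common (0 to 4)."""
    depth = 0
    for x, y in zip(parse_ec(a), parse_ec(b)):
        if x != y:
            break
        depth += 1
    return depth


if __name__ == "__main__":
    print(ec_ancestors("3.4.21.4"))
    print(shared_depth("3.4.21.4", "3.4.24.1"))
```

Two enzymes that agree on more leading digits sit closer together in the hierarchy, which is the kind of relationship a hierarchy-aware classifier can exploit.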

Results

In this paper, we introduce two key improvements to enzyme function prediction within the machine learning framework. The first is an efficient sequence encoding method for representing proteins. The second is a structure-based prediction method with low computational complexity. Specifically, we use the conjoint triad feature (CTF) to represent a protein sequence, which captures not only the amino acid composition but also the neighbor relationships within the sequence. We then develop a support vector machine (SVM)-based method, named SVMHL (SVM for hierarchy labels), which predicts enzyme function by fully exploiting the hierarchical structure of the EC. Experimental results show that SVMHL with the CTF outperforms SVMHL with the amino acid composition (AAC) feature in both predictive accuracy and Matthews correlation coefficient (MCC). In addition, SVMHL with the CTF achieves accuracy ranging from 81% to 98% and MCC ranging from 0.82 to 0.98 when predicting the first three EC digits on a low-homology enzyme dataset. We further demonstrate that our method outperforms both methods that ignore the hierarchical relationships among enzyme categories and alternative methods that incorporate prior knowledge about inter-class relationships.
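The difference between the AAC and CTF encodings can be illustrated with a short sketch. The abstract does not give the amino acid grouping or the normalization, so the code below makes two assumptions: it uses the seven-class grouping commonly associated with the conjoint triad method, and it normalizes triad counts to frequencies. The names `GROUPS`, `conjoint_triad`, and `amino_acid_composition` are illustrative, not the authors' implementation.

```python
from itertools import product
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Assumption: seven residue classes as commonly used with the conjoint triad
# method; the abstract itself does not state the grouping.
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS: Dict[str, int] = {aa: i for i, g in enumerate(GROUPS) for aa in g}

# All 7 * 7 * 7 = 343 ordered triads of classes, each mapped to a vector index.
TRIADS = list(product(range(len(GROUPS)), repeat=3))
TRIAD_INDEX = {triad: i for i, triad in enumerate(TRIADS)}


def amino_acid_composition(seq: str) -> List[float]:
    """20-dimensional AAC vector: fraction of each standard residue."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(aa) / n if n else 0.0 for aa in AMINO_ACIDS]


def conjoint_triad(seq: str) -> List[float]:
    """343-dimensional CTF vector: frequency of each class triad.

    Assumption: counts are divided by the number of valid triads; other
    normalizations are possible.
    """
    classes = [AA_TO_CLASS.get(aa) for aa in seq.upper()]
    counts = [0] * len(TRIADS)
    total = 0
    for window in zip(classes, classes[1:], classes[2:]):
        if None in window:
            continue  # skip windows containing non-standard residues
        counts[TRIAD_INDEX[window]] += 1
        total += 1
    if total == 0:
        return [0.0] * len(TRIADS)
    return [c / total for c in counts]


if __name__ == "__main__":
    # Reversing a sequence leaves AAC unchanged but changes the CTF vector.
    print(amino_acid_composition("ARD") == amino_acid_composition("DRA"))
    print(conjoint_triad("ARD") == conjoint_triad("DRA"))
```

Because AAC only counts residues, it cannot distinguish a sequence from any reordering of it, whereas the triad counts depend on which residue classes are adjacent. Either vector could then be passed to an SVM implementation of the reader's choice.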

Conclusions

Our structure-based prediction model, SVMHL with the CTF, reduces computational complexity and outperforms the alternative approaches in enzyme function prediction. Our new method will therefore be a useful tool for the enzyme function prediction community.

Most cited references 14

Comparison of the predicted and observed secondary structure of T4 phage lysozyme.

B W Matthews (1975)

Predictions of the secondary structure of T4 phage lysozyme, made by a number of investigators on the basis of the amino acid sequence, are compared with the structure of the protein determined experimentally by X-ray crystallography. Within the amino terminal half of the molecule the locations of helices predicted by a number of methods agree moderately well with the observed structure, however within the carboxyl half of the molecule the overall agreement is poor. For eleven different helix predictions, the coefficients giving the correlation between prediction and observation range from 0.14 to 0.42. The accuracy of the predictions for both beta-sheet regions and for turns are generally lower than for the helices, and in a number of instances the agreement between prediction and observation is no better than would be expected for a random selection of residues. The structural predictions for T4 phage lysozyme are much less successful than was the case for adenylate kinase (Schulz et al. (1974) Nature 250, 140-142). No one method of prediction is clearly superior to all others, and although empirical predictions based on larger numbers of known protein structure tend to be more accurate than those based on a limited sample, the improvement in accuracy is not dramatic, suggesting that the accuracy of current empirical predictive methods will not be substantially increased simply by the inclusion of more data from additional protein structure determinations.
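The correlation coefficient used in this comparison is the quantity now commonly called the Matthews correlation coefficient (MCC), the second evaluation metric reported in the Results above. As a minimal sketch for the binary case, it can be computed from the four confusion-matrix counts. The function name and the convention of returning 0 when the denominator vanishes are assumptions made for illustration.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Binary Matthews correlation coefficient from confusion-matrix counts.

    tp, tn, fp, fn are the numbers of true positives, true negatives,
    false positives, and false negatives.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: report 0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    print(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10))
```

The value lies between -1 and 1, with 0 corresponding to agreement no better than chance, which is why it complements plain accuracy on imbalanced classes.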

The ENZYME database in 2000.

A Bairoch (2000)

The ENZYME database is a repository of information related to the nomenclature of enzymes. In recent years it has become an indispensable resource for the development of metabolic databases. The current version contains information on 3705 enzymes. It is available through the ExPASy WWW server (http://www.expasy.ch/enzyme/).

Twin Support Vector Machines for pattern classification.

Jayadeva, R Khemchandani, Suresh Chandra (2007)

We propose Twin SVM, a binary SVM classifier that determines two nonparallel planes by solving two related SVM-type problems, each of which is smaller than in a conventional SVM. The Twin SVM formulation is in the spirit of proximal SVMs via generalized eigenvalues. On several benchmark data sets, Twin SVM is not only fast, but shows good generalization. Twin SVM is also useful for automatically discovering two-dimensional projections of the data.

Author and article information

Conference

Journal ID (nlm-ta): BMC Syst Biol

Title: BMC Systems Biology

Publisher: BioMed Central

ISSN (Electronic): 1752-0509

Publication date Collection: 2011

Publication date (Electronic): 20 June 2011

Volume: 5

Issue: Suppl 1

Page: S6

Affiliations

[1] College of Science, China Agricultural University, Beijing, China, 100083

[2] Key Laboratory of Adaptation and Evolution of Plateau Biota, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, China, 810001

[3] Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China, 100190

[4] College of Mathematics and System Science, Xinjiang University, Urumuchi, China, 830046

Article

Publisher ID: 1752-0509-5-S1-S6

DOI: 10.1186/1752-0509-5-S1-S6

PMC ID: 3121122

PubMed ID: 21689481

SO-VID: 397e7914-1c3c-4003-8b32-b0dd33ad2fa3

License:

This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Conference name: The 4th International Conference on Computational Systems Biology (ISB 2010)

Conference location: Suzhou, P. R. China

Conference date: 9-11 September 2010
