TY - JOUR
T1 - Hierarchical multi-label prediction of gene function
AU - Barutcuoglu, Zafer
AU - Schapire, Robert E.
AU - Troyanskaya, Olga G.
N1 - Funding Information: The authors would like to thank Camelia Chiriac and Rajiv Ayyangar for laboratory work, Kara Dolinski and members of the Functional Genomics Laboratory for their contributions, and the anonymous referees for improvements in this manuscript. O.G.T. is an Alfred P. Sloan Fellow. This research was supported by NSF grant IIS-0513552 and partially supported by NIH grant R01 GM071966 to O.G.T.
PY - 2006/4/1
Y1 - 2006/4/1
AB - Motivation: Assigning functions for unknown genes based on diverse large-scale data is a key task in functional genomics. Previous work on gene function prediction has addressed this problem using independent classifiers for each function. However, such an approach ignores the structure of functional class taxonomies, such as the Gene Ontology (GO). Over a hierarchy of functional classes, a group of independent classifiers where each one predicts gene membership to a particular class can produce a hierarchically inconsistent set of predictions, where for a given gene a specific class may be predicted positive while its inclusive parent class is predicted negative. Taking the hierarchical structure into account resolves such inconsistencies and provides an opportunity for leveraging all classifiers in the hierarchy to achieve higher specificity of predictions. Results: We developed a Bayesian framework for combining multiple classifiers based on the functional taxonomy constraints. Using a hierarchy of support vector machine (SVM) classifiers trained on multiple data types, we combined predictions in our Bayesian framework to obtain the most probable consistent set of predictions. Experiments show that over a 105-node subhierarchy of the GO, our Bayesian framework improves predictions for 93 nodes. As an additional benefit, our method also provides implicit calibration of SVM margin outputs to probabilities. Using this method, we make function predictions for multiple proteins, and experimentally confirm predictions for proteins involved in mitosis.
UR - http://www.scopus.com/inward/record.url?scp=33645323768&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=33645323768&partnerID=8YFLogxK
U2 - 10.1093/bioinformatics/btk048
DO - 10.1093/bioinformatics/btk048
M3 - Article
C2 - 16410319
AN - SCOPUS:33645323768
SN - 1367-4803
VL - 22
SP - 830
EP - 836
JO - Bioinformatics
JF - Bioinformatics
IS - 7
ER -