TY - GEN
T1 - A mutual information based approach for evaluating the quality of clustering
AU - Fattah, S. A.
AU - Lin, Chia Chun
AU - Kung, Sun Yuan
N1 - Copyright 2011 Elsevier B.V., All rights reserved.
PY - 2011
Y1 - 2011
AB - In this paper, a new method for evaluating the quality of gene clustering is proposed based on a mutual information criterion. Instead of using the conventional histogram-based modeling method to assess clustering performance, we derive a normalized mutual information criterion using the Gaussian kernel density estimator. In computing the mutual information, we propose to use only the cluster centroids instead of all cluster members, which offers substantial computational savings. The proposed algorithm considers not only the cluster size but also the homogeneity within a cluster. One major advantage of the proposed algorithm is that it is capable of estimating an appropriate number of clusters. Extensive experiments have been carried out on synthetic data as well as on the widely used yeast cell cycle gene expression data. Under various clustering conditions, the proposed method is found to perform well in measuring cluster quality and identifying the true number of clusters.
KW - Mutual information
KW - clustering
KW - gene classification
KW - kernel density estimator
KW - microarray gene expression data
KW - probability density function
UR - http://www.scopus.com/inward/record.url?scp=80051606719&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=80051606719&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2011.5946475
DO - 10.1109/ICASSP.2011.5946475
M3 - Conference contribution
AN - SCOPUS:80051606719
SN - 9781457705397
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 601
EP - 604
BT - 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011 - Proceedings
T2 - 36th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011
Y2 - 22 May 2011 through 27 May 2011
ER -