TY - JOUR
T1 - Discriminatory Mining of Gene Expression Microarray Data
AU - Wang, Zuyi
AU - Wang, Yue
AU - Lu, Jianping
AU - Kung, Sun Yuan
AU - Zhang, Junying
AU - Lee, Richard
AU - Xuan, Jianhua
AU - Khan, Javed
AU - Clarke, Robert
N1 - Funding Information:
This work was supported in part by the National Institutes of Health under Grant 5R21CA83231. Present address: Center for Genetic Research, Children’s National Medical Center, Washington, DC 20010, USA.
PY - 2003/11
Y1 - 2003/11
AB - Recent advances in machine learning and pattern recognition provide new analytical tools for exploring high-dimensional gene expression microarray data. Our data mining software, VISual Data Analyzer for cluster discovery (VISDA), reveals many distinguishing patterns among gene expression profiles, which are responsible for the cell's phenotypes. The model-supported exploration of the high-dimensional data space is achieved through two complementary schemes: dimensionality reduction by discriminatory data projection and cluster decomposition by soft data clustering. Dimensionality reduction produces a visualization of the complete data set at the top level. The data set is then partitioned into subclusters, which can in turn be visualized at lower levels and, if necessary, partitioned again. In this paper, three algorithms are evaluated for their ability to reduce dimensionality and to visualize data sets: Principal Component Analysis (PCA), Discriminatory Component Analysis (DCA), and Projection Pursuit Method (PPM). The partitioning into subclusters uses the Expectation-Maximization (EM) algorithm and a hierarchical normal mixture model that is selected by the user and verified "optimally" by the Minimum Description Length (MDL) criterion. These approaches produce different visualizations, which are compared against known phenotypes from the microarray experiments. Overall, these algorithms and user-selected models make it possible to explore high-dimensional data where standard analyses may not be sufficient.
KW - Cluster visualization and selection
KW - Computational bioinformatics
KW - Finite normal mixture
KW - Gene microarrays
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=0141990708&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0141990708&partnerID=8YFLogxK
U2 - 10.1023/B:VLSI.0000003024.13494.40
DO - 10.1023/B:VLSI.0000003024.13494.40
M3 - Article
AN - SCOPUS:0141990708
SN - 1939-8018
VL - 35
SP - 255
EP - 272
JO - Journal of Signal Processing Systems
JF - Journal of Signal Processing Systems
IS - 3
ER -