Outlier Removal for Enhancing Kernel-Based Classifier Via the Discriminant Information

Thee Chanyaswad, Mert Al, S. Y. Kung

Research output: Chapter in Book/Report/Conference proceedingConference contribution

2 Scopus citations

Abstract

Pattern recognition on big data can be challenging for kernel machines as the complexity grows with the squared number of training samples. In this work, we overcome this hurdle via the outlying data sample removal pre-processing step. This approach removes less-informative data samples and trains the kernel machines only with the remaining data, and hence, directly reduces the complexity by reducing the number of training samples. To enhance the classification performance, the outlier removal process is done such that the discriminant information of the data is mostly intact. This is achieved via the novel Outlier-Removal Discriminant Information (ORDI) metric, which measures the contribution of each sample toward the discriminant information of the dataset. Hence, the ORDI metric can be used together with the simple filter method to effectively remove insignificant outliers to both reduce the computational cost and enhance the classification performance. We experimentally show on two real-world datasets at the sample removal ratio of 0.2 that, with outlier removal via ORDI, we can simultaneously (1) improve the accuracy of the classifier by 1 %, and (2) provide significant saving on the total running time by 1.5x and 2x on the two datasets. Hence, ORDI can provide a win-win situation in this performance-complexity tradeoff of the kernel machines for big data analysis.

Original languageEnglish (US)
Title of host publication2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages2266-2270
Number of pages5
ISBN (Print)9781538646588
DOIs
StatePublished - Sep 10 2018
Event2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Calgary, Canada
Duration: Apr 15 2018Apr 20 2018

Publication series

NameICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume2018-April
ISSN (Print)1520-6149

Other

Other2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Country/TerritoryCanada
CityCalgary
Period4/15/184/20/18

All Science Journal Classification (ASJC) codes

  • Software
  • Signal Processing
  • Electrical and Electronic Engineering

Keywords

  • Big data
  • Classification
  • Discriminant information
  • Kernel machines
  • Outlier removal

Fingerprint

Dive into the research topics of 'Outlier Removal for Enhancing Kernel-Based Classifier Via the Discriminant Information'. Together they form a unique fingerprint.

Cite this