An automatic data cleaning procedure for electron cyclotron emission imaging on EAST tokamak using machine learning algorithm

  • C. Li
  • T. Lan
  • Y. Wang
  • J. Liu
  • J. Xie
  • T. Lan
  • H. Li
  • H. Qin

Research output: Contribution to journal › Article › peer-review

9 Scopus citations

Abstract

A new data cleaning procedure for electron cyclotron emission imaging (ECEI) on the EAST tokamak is developed. Machine learning techniques, including support vector machines (SVM) and decision trees, are applied to identify saturated, zero, and weak signals in the raw ECEI data. As a result, the burden of data analysis is reduced and the classification accuracy is improved. Training sets are sampled from the massive raw ECEI data of the EAST tokamak. The optimal window size of the temporal signals, the kernel function, and other model parameters are obtained through model training. Five-fold cross-validation (CV) is applied during modeling, and an external test set is used to validate the prediction performance of the models. The average recall rates on the CV sets for saturated, zero, and weak signals are 95.9%, 96.72%, and 100%, respectively, which demonstrates the accuracy of the procedure. Random Forest is also applied to the same data sets as a comparative method; its average recall rates on the CV sets for saturated, zero, and weak signals are 95.9%, 96.72%, and 95.88%, respectively. The proposed method is shown to outperform Random Forest on small data sets.
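
The evaluation protocol described in the abstract (windowed temporal signals, five-fold cross-validation, per-class recall) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic, the signal levels in `LEVELS` are hypothetical, the two features (window mean and standard deviation) are an assumption, and a simple nearest-centroid classifier stands in for the SVM so that the example needs only the Python standard library.

```python
import random
import statistics

# Hypothetical signal levels (mean, sigma) in arbitrary units.
# These are assumptions for illustration; none of them come from the paper.
LEVELS = {
    "saturated": (5.0, 0.01),  # pinned near an assumed digitizer rail
    "zero": (0.0, 0.01),       # dead channel, noise only
    "weak": (0.3, 0.05),       # low-amplitude signal
    "normal": (2.0, 0.5),      # usable signal
}

def make_window(kind, window, rng):
    """Generate one synthetic temporal window of the given class."""
    mean, sigma = LEVELS[kind]
    return [rng.gauss(mean, sigma) for _ in range(window)]

def features(samples):
    """Reduce a window to two assumed features: mean and standard deviation."""
    return (statistics.mean(samples), statistics.pstdev(samples))

def stratified_folds(labels, k, rng):
    """Deal the shuffled indices of each class round-robin into k folds."""
    by_label = {}
    for i, label in enumerate(labels):
        by_label.setdefault(label, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_label.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % k].append(i)
    return folds

def fit_centroids(X, y):
    """Stand-in classifier (not an SVM): one mean feature vector per class."""
    grouped = {}
    for x, label in zip(X, y):
        grouped.setdefault(label, []).append(x)
    return {
        label: tuple(statistics.mean(col) for col in zip(*rows))
        for label, rows in grouped.items()
    }

def predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared distance."""
    return min(
        centroids,
        key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
    )

def recall(y_true, y_pred, label):
    """Recall = TP / (TP + FN); assumes label occurs at least once in y_true."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in relevant) / len(relevant)

def run_demo(window=50, per_class=40, k=5, seed=0):
    """Return the recall of each faulty class, averaged over k CV folds."""
    rng = random.Random(seed)
    y = [kind for kind in LEVELS for _ in range(per_class)]
    X = [features(make_window(kind, window, rng)) for kind in y]
    per_fold = {label: [] for label in ("saturated", "zero", "weak")}
    for held_out in stratified_folds(y, k, rng):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        centroids = fit_centroids([X[i] for i in train], [y[i] for i in train])
        y_true = [y[i] for i in held_out]
        y_pred = [predict(centroids, X[i]) for i in held_out]
        for label in per_fold:
            per_fold[label].append(recall(y_true, y_pred, label))
    return {label: statistics.mean(v) for label, v in per_fold.items()}

if __name__ == "__main__":
    for label, score in run_demo().items():
        print(f"{label}: mean recall over folds = {score:.3f}")
```

Because the synthetic classes are well separated, this toy setup says nothing about the recall rates reported above; it only shows how a per-class recall averaged over five folds is computed. Swapping `fit_centroids` and `predict` for a kernel SVM, and scanning `window` as a tunable parameter, would bring the sketch closer to the procedure the abstract describes.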

Original language: English (US)
Article number: P10029
Journal: Journal of Instrumentation
Volume: 13
Issue number: 10
State: Published - Oct 24 2018

All Science Journal Classification (ASJC) codes

  • Instrumentation
  • Mathematical Physics

Keywords

  • Analysis and statistical methods
  • Data processing methods
