TY - GEN
T1 - Analyzing large data sets from XGC1 magnetic fusion simulations using Apache Spark
AU - Churchill, R. Michael
N1 - Publisher Copyright: © 2016 IEEE.
PY - 2016/11/17
Y1 - 2016/11/17
N2 - Apache Spark is explored as a tool for analyzing large data sets from the magnetic fusion simulation code XGC1. Implementation details of Apache Spark on the NERSC Edison supercomputer are discussed, including binary file reading and parameter setup. An unsupervised machine learning algorithm, k-means clustering, is applied to XGC1 particle distribution function data, showing that highly turbulent spatial regions do not have common coherent structures, but rather broad, ringlike structures in velocity space.
AB - Apache Spark is explored as a tool for analyzing large data sets from the magnetic fusion simulation code XGC1. Implementation details of Apache Spark on the NERSC Edison supercomputer are discussed, including binary file reading and parameter setup. An unsupervised machine learning algorithm, k-means clustering, is applied to XGC1 particle distribution function data, showing that highly turbulent spatial regions do not have common coherent structures, but rather broad, ringlike structures in velocity space.
KW - distributed computing
KW - k-means clustering
KW - machine-learning
KW - magnetic fusion
KW - simulation
KW - spark
UR - https://www.scopus.com/pages/publications/85006955775
UR - https://www.scopus.com/inward/citedby.url?scp=85006955775&partnerID=8YFLogxK
U2 - 10.1109/NYSDS.2016.7747820
DO - 10.1109/NYSDS.2016.7747820
M3 - Conference contribution
AN - SCOPUS:85006955775
T3 - 2016 New York Scientific Data Summit, NYSDS 2016 - Proceedings
BT - 2016 New York Scientific Data Summit, NYSDS 2016 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2016 New York Scientific Data Summit, NYSDS 2016
Y2 - 14 August 2016 through 17 August 2016
ER -