Analyzing large data sets from XGC1 magnetic fusion simulations using Apache Spark

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Scopus citation

Abstract

Apache Spark is explored as a tool for analyzing large data sets from the magnetic fusion simulation code XGC1. Implementation details of Apache Spark on the NERSC Edison supercomputer are discussed, including binary file reading and parameter setup. An unsupervised machine learning algorithm, k-means clustering, is applied to XGC1 particle distribution function data, showing that highly turbulent spatial regions do not exhibit common coherent structures, but rather broad, ring-like structures in velocity space.
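As a rough illustration of the workflow the abstract describes (not the authors' actual code), the sketch below clusters binary distribution-function slices with Spark's RDD-based MLlib k-means API. The file names, on-disk layout (raw little-endian doubles), and the choice of k=8 are placeholder assumptions; XGC1's real output format and the paper's parameter choices may differ.

```python
# Hypothetical sketch: k-means clustering of XGC1 velocity-space
# distribution function slices using Spark's RDD-based MLlib API.
import numpy as np
from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans

sc = SparkContext(appName="xgc1-kmeans")

def read_slice(path):
    """Read one binary slice as a flat float64 vector.
    (Assumed raw little-endian doubles on a shared filesystem;
    the actual XGC1 file layout is not specified here.)"""
    return np.fromfile(path, dtype=np.float64)

# One velocity-space slice per file; paths are placeholders.
paths = sc.parallelize(["f_slice_%04d.bin" % i for i in range(1000)])
vectors = paths.map(read_slice).cache()

# Cluster slices into an assumed k=8 groups of similar structure.
model = KMeans.train(vectors, k=8, maxIterations=50,
                     initializationMode="k-means||")

# Assign each slice to its nearest cluster centre.
labels = model.predict(vectors).collect()
print(labels[:20])

sc.stop()
```

On a machine like Edison, reading the binary files inside the map step relies on every worker seeing the same shared filesystem; the clustering itself is independent of how the vectors are loaded.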

Original language: English (US)
Title of host publication: 2016 New York Scientific Data Summit, NYSDS 2016 - Proceedings
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781467390514
DOIs
State: Published - Nov 17 2016
Event: 2016 New York Scientific Data Summit, NYSDS 2016 - New York, United States
Duration: Aug 14 2016 - Aug 17 2016

Publication series

Name: 2016 New York Scientific Data Summit, NYSDS 2016 - Proceedings

Conference

Conference: 2016 New York Scientific Data Summit, NYSDS 2016
Country/Territory: United States
City: New York
Period: 8/14/16 - 8/17/16

All Science Journal Classification (ASJC) codes

  • Information Systems
  • Computer Networks and Communications
  • Computer Science Applications

Keywords

  • distributed computing
  • k-means clustering
  • machine-learning
  • magnetic fusion
  • simulation
  • spark
