TY - GEN
T1 - Maintaining Trust in Reduction
T2 - 21st Smoky Mountains Computational Sciences and Engineering Conference, SMC 2021
AU - Gong, Qian
AU - Liang, Xin
AU - Whitney, Ben
AU - Choi, Jong Youl
AU - Chen, Jieyang
AU - Wan, Lipeng
AU - Ethier, Stéphane
AU - Ku, Seung Hoe
AU - Churchill, R. Michael
AU - Chang, C. S.
AU - Ainsworth, Mark
AU - Tugluk, Ozan
AU - Munson, Todd
AU - Pugmire, David
AU - Archibald, Richard
AU - Klasky, Scott
N1 - Publisher Copyright: © 2022, Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
AB - As the growth of data sizes continues to outpace computational resources, there is a pressing need for data reduction techniques that can significantly reduce the amount of data and quantify the error incurred in compression. Compressing scientific data presents many challenges for reduction techniques since it is often on non-uniform or unstructured meshes, is from a high-dimensional space, and has many Quantities of Interest (QoIs) that need to be preserved. To illustrate these challenges, we focus on data from a large-scale fusion code, XGC. XGC uses a Particle-In-Cell (PIC) technique, which generates hundreds of petabytes (PB) of data a day from thousands of timesteps. XGC uses an unstructured mesh and needs to compute many QoIs from the raw data, f. One critical aspect of the reduction is that we need to ensure that QoIs derived from the data (density, temperature, flux-surface-averaged momenta, etc.) maintain relatively high accuracy. We show that by compressing XGC data on the high-dimensional, non-uniform grid on which the data is defined, and adaptively quantizing the decomposed coefficients based on the characteristics of the QoIs, the compression ratios at various error tolerances obtained using a multilevel compressor (MGARD) increase by more than a factor of ten. We then present how to mathematically guarantee that the accuracy of the QoIs computed from the reduced f is preserved during the compression. We show that the error in the XGC density can be kept under a user-specified tolerance over 1000 timesteps of simulation using the mathematical QoI error control theory of MGARD, whereas traditional error control on the data to be reduced does not guarantee the accuracy of the QoIs.
KW - Error control
KW - Lossy compression
KW - Quantities of interest
KW - XGC simulation data
UR - https://www.scopus.com/pages/publications/85127037644
U2 - 10.1007/978-3-030-96498-6_2
DO - 10.1007/978-3-030-96498-6_2
M3 - Conference contribution
AN - SCOPUS:85127037644
SN - 9783030964979
T3 - Communications in Computer and Information Science
SP - 22
EP - 39
BT - Driving Scientific and Engineering Discoveries Through the Integration of Experiment, Big Data, and Modeling and Simulation - 21st Smoky Mountains Computational Sciences and Engineering, SMC 2021, Revised Selected Papers
A2 - Nichols, Jeffrey
A2 - Maccabe, Arthur ‘Barney’
A2 - Nutaro, James
A2 - Pophale, Swaroop
A2 - Devineni, Pravallika
A2 - Ahearn, Theresa
A2 - Verastegui, Becky
PB - Springer Science and Business Media Deutschland GmbH
Y2 - 18 October 2021 through 20 October 2021
ER -