The CESM Workflow Re-Engineering Project
The Community Earth System Model (CESM) Workflow Re-Engineering Project is a collaboration between the CESM Software Engineering Group (CSEG) and the NCAR Computational and Information Systems Lab (CISL) Application Scalability and Performance (ASAP) Group to revamp how CESM saves its output. The experience of submitting CESM data to CMIP3, and particularly to CMIP5, revealed that CESM's output format is poorly suited to the data requirements common to model intercomparison projects (MIPs). For efficiency, CESM writes output files that contain all fields for each model time sample, but MIPs require a separate file for each field spanning all model time samples. This transposition of model output can be very time-consuming: depending on the volume of data a simulation writes, re-orienting the data can take time comparable to running the simulation itself. Previous strategies relied on serial tools to perform this transposition, but these are now far too inefficient to handle the many terabytes of output a single simulation can generate. A new set of Python tools that exploits data parallelism has been written to perform this re-orientation and has achieved markedly improved I/O performance. The perspective of a data manager and data producer using these new tools is presented, and likely future work on their development and use is outlined. These tools are a critical part of the NCAR CESM submission to the upcoming CMIP6 and are intended to make submitting the expected petabytes of data far more timely and efficient, within the required time frame.
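To make the time-slice to time-series transposition concrete, here is a minimal Python sketch. It is not the project's tools: in-memory dictionaries stand in for netCDF files, the function names and the field names `T` and `PS` are hypothetical, and a thread pool stands in for the distributed (for example, MPI) workers a production tool would likely use. Because each output field can be assembled independently of the others, the work is divided across workers by field.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# A "time-slice" file (CESM-style): every field for a single model time sample.
TimeSlice = Dict[str, float]
# A "time-series" file (MIP-style): one field across every time sample.
TimeSeries = List[float]


def extract_field(slices: List[TimeSlice], field: str) -> TimeSeries:
    """Collect one field from every time-slice file, preserving time order."""
    return [s[field] for s in slices]


def slice_to_series(slices: List[TimeSlice], max_workers: int = 4) -> Dict[str, TimeSeries]:
    """Transpose time-slice data into one time series per field.

    Fields are independent of each other, so each worker builds whole
    time series for a subset of fields. The thread pool is a hypothetical
    stand-in for distributing variables across parallel processes.
    """
    if not slices:
        return {}
    # Assumption: every time-slice file holds the same set of fields.
    fields = sorted(slices[0])
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        series = pool.map(lambda f: extract_field(slices, f), fields)
        return dict(zip(fields, series))
```

For example, `slice_to_series([{"T": 280.0, "PS": 1000.0}, {"T": 281.0, "PS": 999.0}])` returns `{"PS": [1000.0, 999.0], "T": [280.0, 281.0]}`. Splitting by field rather than by time sample means no two workers ever write to the same output series, which is what makes the transposition easy to parallelize.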