The Climate & Forecast (CF) Metadata Convention for Quantization
An increasingly important and straightforward way to tame the geophysical data deluge is to quantize the random or otherwise unwanted trailing bits of IEEE floating-point numbers, and then to losslessly compress and store only the remaining, scientifically meaningful bits. Operational centers and other research data infrastructures (RDIs) face two main problems when employing quantization: deciding how many significant bits (NSB) or significant base-10 digits (NSD) to retain, and conveying the quantization method and precision to downstream data users. The new Climate & Forecast (CF) convention for encoding quantization metadata solves the latter problem. This presentation describes the new CF convention and illustrates its use and appearance in a reference implementation in the netCDF Operators (NCO).
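As a rough illustration of what quantizing trailing bits means, the following minimal Python sketch rounds a single-precision value to a chosen number of explicit mantissa bits and zeroes the rest. It is written in the spirit of BitRound but is not taken from the netCDF library or NCO; the function names (`float32_bits`, `bits_float32`, `bitround`) and the restriction to 32-bit floats are assumptions made for this example.

```python
import math
import struct

def float32_bits(value: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of value."""
    return struct.unpack("<I", struct.pack("<f", value))[0]

def bits_float32(bits: int) -> float:
    """Return the float whose single-precision bit pattern is bits."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def bitround(value: float, nsb: int) -> float:
    """Keep nsb explicit mantissa bits of a float32, rounding to nearest.

    Illustrative sketch only: adds half of the discarded range to the bit
    pattern, then zeroes the discarded trailing bits.
    """
    if not 1 <= nsb <= 22:
        raise ValueError("nsb must be between 1 and 22 for float32")
    if not math.isfinite(value):
        return value                      # leave NaN and infinities untouched
    drop = 23 - nsb                       # trailing mantissa bits to discard
    half = 1 << (drop - 1)                # half of the discarded range
    mask = (0xFFFFFFFF << drop) & 0xFFFFFFFF
    return bits_float32((float32_bits(value) + half) & mask)

print(bitround(3.14159265, 9))            # 3.140625
```

The zeroed trailing bits form long runs that a lossless compressor can exploit. For normal (non-denormal) values, keeping `nsb` explicit mantissa bits in this way bounds the relative rounding error by 2^-(nsb+1).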
The new CF metadata convention can be implemented in any of the language bindings (C, Fortran, Java, Python, etc.) that already support the quantization algorithms provided by the netCDF library for multiple backend storage formats (e.g., HDF5, Zarr). CF-compliant files include provenance information about the exact quantization method (e.g., BitRound, DigitRound, Granular BitRound) in a container variable of negligible size, and convey variable-specific precision (NSD or NSB) as attributes of each quantized field. Optional variable-level attributes can provide metrics such as the maximum relative error due to quantization. As motivation to adopt the new CF convention, we show that RDIs such as the Coupled Model Intercomparison Project (e.g., CMIP7) could increase their overall compression ratio by a factor of about two relative to CMIP6 while still preserving all scientifically meaningful data.
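The layout described above, with one small container variable for provenance plus per-variable precision attributes, can be sketched with plain Python dictionaries rendered as CDL-like text. The attribute names used here (`algorithm`, `implementation`, `quantization`, `quantization_nsb`, `maximum_relative_error`), the container name `quantization_info`, and the implementation string are assumptions for illustration and should be checked against the published CF document; the error value assumes BitRound-style rounding, for which the bound is 2^-(NSB+1).

```python
def quantization_metadata(var_names, algorithm, implementation, nsb,
                          container_name="quantization_info"):
    """Build container-level and variable-level quantization attributes.

    Attribute names are assumed for this sketch, not quoted from the
    convention text.
    """
    container = {"algorithm": algorithm, "implementation": implementation}
    per_var = {
        name: {
            "quantization": container_name,       # points at the container variable
            "quantization_nsb": nsb,              # retained significant bits
            "maximum_relative_error": 2.0 ** -(nsb + 1),
        }
        for name in var_names
    }
    return container, per_var

def to_cdl(container_name, container, per_var):
    """Render the attributes as CDL-like lines for inspection."""
    def fmt(v):
        return f'"{v}"' if isinstance(v, str) else repr(v)
    lines = [f"char {container_name} ;"]
    lines += [f"  {container_name}:{k} = {fmt(v)} ;" for k, v in container.items()]
    for name, attrs in per_var.items():
        lines += [f"  {name}:{k} = {fmt(v)} ;" for k, v in attrs.items()]
    return "\n".join(lines)

# Hypothetical variable name and implementation string.
container, per_var = quantization_metadata(
    ["ps"], "bitround", "example-writer 0.0", nsb=9)
print(to_cdl("quantization_info", container, per_var))
```

Keeping the method in a single container variable means the provenance is stored once per file, while each quantized field carries only its own precision and optional error metric.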