xCDAT (Xarray Climate Data Analysis Tools): A Python Package for Simple and Robust Analysis of Climate Data
xCDAT is a Python package that combines the power of Xarray, a mature and widely adopted scientific Python package, with geospatial analysis features inspired by the Community Data Analysis Tools (CDAT) library. Its scope covers routine climate research analysis operations such as loading, averaging, and regridding data on structured grids (e.g., rectilinear, curvilinear). Key features include spatial averaging, temporal averaging, horizontal regridding, vertical regridding, and coordinate bounds generation. xCDAT works across model and observational datasets that follow the CF Metadata Conventions by interpreting CF metadata through the cf_xarray package.
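To make two of the features above concrete, the following is a minimal, standard-library-only sketch of what coordinate bounds generation and area-weighted spatial averaging involve on a rectilinear grid. It is not xCDAT's implementation or API: the function names (`midpoint_bounds`, `latitude_weights`, `spatial_average`) are hypothetical, and the sketch assumes evenly spaced longitudes, latitude centers in degrees, and bounds placed at midpoints between neighboring centers.

```python
import math


def midpoint_bounds(centers):
    """Generate (lower, upper) cell bounds from 1-D coordinate centers.

    Assumption: interior edges sit halfway between neighboring centers and
    the two outer edges are extrapolated by half a cell width.
    """
    edges = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    first = centers[0] - (centers[1] - centers[0]) / 2
    last = centers[-1] + (centers[-1] - centers[-2]) / 2
    edges = [first] + edges + [last]
    return list(zip(edges, edges[1:]))


def latitude_weights(lat_bounds):
    """Weight each latitude band by the area it covers on a sphere.

    Band area is proportional to sin(upper) - sin(lower); bounds are
    clipped to the poles so extrapolated edges cannot exceed +/-90 degrees.
    """
    weights = []
    for lower, upper in lat_bounds:
        lower, upper = sorted((lower, upper))  # tolerate descending latitudes
        lower, upper = max(lower, -90.0), min(upper, 90.0)
        weights.append(
            math.sin(math.radians(upper)) - math.sin(math.radians(lower))
        )
    return weights


def spatial_average(field, lat_centers):
    """Area-weighted mean of a field given as a list of latitude rows.

    Assumption: longitudes are evenly spaced, so each row is reduced with a
    plain mean before the latitude weights are applied.
    """
    weights = latitude_weights(midpoint_bounds(lat_centers))
    zonal_means = [sum(row) / len(row) for row in field]
    weighted_sum = sum(w * z for w, z in zip(weights, zonal_means))
    return weighted_sum / sum(weights)


# Three latitude bands centered at 60S, the equator, and 60N.
lats = [-60.0, 0.0, 60.0]
field = [[0.0, 0.0], [4.0, 4.0], [0.0, 0.0]]
average = spatial_average(field, lats)
```

In this example the equatorial band spans 30S to 30N and covers half of the sphere, so the weighted mean is 2.0, whereas an unweighted mean over the three rows would give about 1.33. This difference is why area weighting, and therefore reliable coordinate bounds, matters for global averages.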
Since its inception in early 2021, xCDAT has gained widespread adoption throughout the open-source community. As of May 2024, xCDAT has accumulated over 14,000 total downloads on Anaconda and nearly 100 stars on GitHub. Its users come from projects and organizations across the globe, including the Energy Exascale Earth System Model (E3SM), the Program for Climate Model Diagnosis and Intercomparison (PCMDI), the National Aeronautics and Space Administration (NASA), and the Institut Pierre-Simon Laplace (IPSL). At Lawrence Livermore National Laboratory (LLNL), xCDAT and Xarray are becoming staple tools for routine climate research. xCDAT is currently being integrated as a data processing engine within the PCMDI Metrics Package and the E3SM Diagnostics Package. It is also included in the E3SM Unified Environment as a tool for post-processing and analyzing E3SM data.
At a broader engineering level, xCDAT’s intentional design encourages software sustainability and reproducible science. xCDAT aims to contribute to Pangeo’s effort to foster an ecosystem of mutually compatible geoscience Python packages by following the best practices for Pangeo projects. xCDAT’s well-documented and configurable features allow scientists to rapidly develop robust, reusable, and maintainable code (see the API Reference page). The xCDAT documentation includes a gallery of Jupyter Notebooks that demonstrate how to use key features in scientific workflows. xCDAT inherits support for parallel computing through Xarray and Dask, enabling scientists to speed up their workflows by using their compute resources efficiently. xCDAT uses modern software engineering practices such as continuous integration and continuous deployment to ensure that high-quality software is released to the open-source community. xCDAT is rigorously tested using real-world data and has a 100% code coverage rate at the time of this abstract.
xCDAT is a community-driven, open-source project that encourages discussion and contributions from everyone. The main goals for the remainder of FY24 include:
- Scope out interoperability with UXarray to support analysis on unstructured grids using xCDAT features (e.g., unstructured to structured regridding)
- Explore the addition of functionality to handle hybrid grids (e.g., pressure)
- Improve documentation and guidance for leveraging Dask with xCDAT to parallelize workflows with large datasets
- Continue to assist PMP, E3SM Diags, and potentially other DOE-funded analysis capabilities, such as ARM Diags and ILAMB, in migrating to xCDAT