Increasing the Reproducibility and Replicability of Supervised AI/ML in the Earth Systems Science by Leveraging Social Science Methods
Supervised machine learning requires labeled data for training. Sometimes these labels can be defined mathematically (e.g., from indices or anomalies); other times they are created by manually categorizing samples (i.e., hand labeling). However, when a supervised machine learning approach requires hand labeling, there is often no thorough documentation of the process used to assign these labels, nor any measure of how consistent the labelers are across samples. This lack of documentation can cause difficulty when research teams attempt to add more labeled samples or to reproduce results. To address these issues, this work introduces and demonstrates a method called Quantitative Content Analysis (QCA). QCA is a method from the social sciences that aims to categorize data objectively by documenting, in a codebook, the decision-making process used to classify samples, and by assessing reliability, that is, the consistency among labelers who use the codebook. The codebook is updated iteratively to address inconsistencies until the reliability tests are successful. This thorough, documented process provides an opportunity for more methodologically transparent supervised machine learning applications.
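As an illustration of what a reliability assessment between labelers might look like, the following minimal sketch computes Cohen's kappa for two labelers who categorized the same samples. The text above does not name a specific reliability statistic, so the choice of Cohen's kappa, the 0.8 acceptance threshold, and the example road-condition labels are all assumptions made only for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two labelers on the same samples."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(labels_a)
    # Fraction of samples on which the two labelers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance, from each labeler's category frequencies.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both labelers used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels from two people applying the same codebook.
labeler_1 = ["dry", "wet", "snow", "snow", "dry", "wet", "dry", "snow"]
labeler_2 = ["dry", "wet", "snow", "wet", "dry", "wet", "dry", "snow"]

kappa = cohens_kappa(labeler_1, labeler_2)

# Assumed acceptance threshold; a real study would justify its own value.
THRESHOLD = 0.8
needs_codebook_revision = kappa < THRESHOLD
print(f"kappa = {kappa:.3f}, revise codebook: {needs_codebook_revision}")
```

In an iterative workflow of the kind described above, a score below the chosen threshold would send the team back to revise the codebook and relabel a fresh set of samples before testing reliability again.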
This work demonstrates how supervised machine learning applications in the Earth System Sciences can leverage social science methods, such as QCA, to improve the reproducibility and replicability of hand labeling tasks (e.g., labeling atmospheric rivers or fronts). As machine learning continues to grow in popularity and is applied to a variety of data sets (e.g., observations, CESM, and E3SM), it is important to use such methods to ensure that results can be reproduced and replicated.
We introduce a social science method known as Quantitative Content Analysis (QCA) to the Earth System Sciences to help improve the reproducibility and replicability of hand-labeled data for supervised machine learning tasks. We further provide a case study of its successful application to labeling weather-related road conditions.