Publication Date
22 October 2024

MSD-LIVE: Enabling Open MultiSector Dynamics Science

Image caption: MSD-LIVE is a collaborative data and computational platform for the MultiSector Dynamics community.
Image credit: EESM/MSD

Image caption: MSD-LIVE Principal Investigator Casey Burleyson.
Image credit: Andrea Starr, Pacific Northwest National Laboratory

Description

An Innovative Approach to Collaboration and Data Management

Just a few years ago, members of the Department of Energy’s (DOE) MultiSector Dynamics (MSD) research community had a problem: The ever-growing scale, multidisciplinary nature, and societal importance of their work demanded new and innovative strategies.

“Successful MSD research projects integrate large, interdisciplinary, multi-institutional teams,” says Casey Burleyson, an earth scientist at Pacific Northwest National Laboratory (PNNL) in Washington state.

“Like other research communities, MSD struggled with data and code management—from managing storage to making it easy for others to find, share, and re-use data and code. However, the extraordinary diversity of MSD research exacerbated these challenges.”

Open science, he says, was the answer. “In theory, all science should be open, but in practice, it often is not. This is due to the technical obstacles of sharing code and data and managing it in practical ways. Most researchers want to do the right thing, but they don’t always have the tools.”

Burleyson and a small team from PNNL set out to address the challenge by creating the needed tools—a platform that MSD researchers could use to document, archive, and share data, software, and multi-model workflows across a large, collaborative scientific community. 

After reviewing several open-source digital repository systems, the team selected the InvenioRDM framework for its flexibility and functionality. The result? MSD-LIVE, short for MultiSector Dynamics – Living, Intuitive, Value-adding, Environment. 

The cloud-based platform is designed to meet the growing needs of the MSD Community of Practice, which includes hundreds of scientists from national laboratories and universities. 

Burleyson emphasizes the innovative nature of the technology behind MSD-LIVE. “It’s not just our use of the cloud, but the ability to extend the open-source InvenioRDM framework that underpins our data repository and our approach to on-demand computing for training new researchers to use MSD models. We’re doing something new and unique.”

Burleyson says that MSD-LIVE, launched in August 2022 with funding from DOE Program Manager Bob Vallario, is already transforming how MSD researchers work together. The platform has fostered an open, collaborative community of, at last count, 219 active users across 10 projects, hosting 112 datasets, nearly 165,000 files, and more than 185 TB of data.

“MSD-LIVE provides researchers the tools they need to break down barriers and do open MSD science,” says Burleyson. 

During the project’s first phase, the team developed a cloud-based core data repository to store, share, and document MSD data. With the capabilities now available, MSD researchers can:

  • quickly and easily find datasets produced by other users and projects
  • archive final-form datasets and generate Digital Object Identifiers (DOIs) for them to meet journal requirements for data sharing
  • use an intuitive web-based user interface to document and share versioned datasets and associate data with the code used to produce it
  • train new team members on MSD projects to effectively manage their data and code.

MSD-LIVE is now focused on interactive computing so users can run lightweight analysis and data processing codes (e.g., re-gridding, scaling, and visualization) on any publicly available dataset.

The MSD community is leveraging the cloud-computing capabilities in MSD-LIVE to train new people and teams to understand and extend MSD models. For example, at the 2024 Global Change Analysis Model (GCAM) annual meeting, hundreds of attendees were able to run and analyze the output of seven GCAM ecosystem models in real time on the Amazon Web Services cloud.

The MSD-LIVE team is composed of MSD researchers, data scientists, and software engineers. Burleyson is quick to give “all credit” to the team. “They have been phenomenal to work with,” he says. “The best part of the project has been watching them grow as technical experts and professionals. I see how they work together to solve problems, and it’s inspiring.”

The core team includes:

  • Casey Burleyson, Principal Investigator (PNNL)
  • Zoë Guillen, Data Repository Lead (PNNL)
  • Carina Lansing, Lead Software Architect (PNNL)
  • Matthew Macduff, Software Engineer (PNNL)
  • Devin McAllester, Lead Developer (PNNL)
  • Jon Weers, Senior Cloud Advisor (National Renewable Energy Laboratory – NREL)