Materials Cloud, AiiDA, cornerstones of MARVEL open science strategy, feature in Scientific Data
by Carey Sargent, NCCR MARVEL, EPFL
The principle that open access to data, software and, eventually, infrastructure, leads to scientific results that can be assessed, verified and reproduced is core to the mission of open computational science. Today’s information technology enables the design of open-science platforms that let scientists use existing data, submit new content and launch new simulations with minimal technical expertise.
Materials Cloud aims to be such a platform—“Materials Cloud, a platform for open computational science” is the first paper describing the non-profit service developed and supported by NCCR MARVEL, the European H2020 MaX Centre of Excellence, as well as a number of other partners.
“Materials Cloud has come some way since it first went online,” said Leopold Talirz, an NCCR MARVEL scientist working jointly at the Laboratory of Molecular Simulation (LSMO) and THEOS, and developer of Materials Cloud. “It contains a number of different sections and this paper explains the sections, how they work and what our plans for the future are. It’s a summary of what we have done so far and where we want to go.”
Materials Cloud emerged within the context of computational materials science, a field that offered advantages as well as challenges in terms of the platform’s development. While research data in the field is produced in digital form by default and existing data repositories already centralize large numbers of individual materials science calculations in one place, materials simulations often rely on complex workflows and require substantial computational power. Such considerations led developers to pursue a flexible approach to designing such workflows, as well as to automatically record their many steps and interconnected results in detail.
Since Materials Cloud first went online in 2018, developers have integrated numerous new computational tools and Discover sections with curated data sets. The Materials Cloud Archive has been recommended as a repository by Nature Research’s Scientific Data, is indexed by Google Dataset Search and has become an active implementation network of the GO FAIR initiative. The number of submissions to the archive has roughly tripled every year, reflecting a real need for a data repository in materials science. To accommodate the growing number of submissions, the team recently unveiled a major reengineering of the Materials Cloud Archive. The new archive improves the user experience, permits the needed scaling and helps moderators and developers focus on the core work of the archive: enabling the seamless sharing and dissemination of resources in computational materials science.
Plans for the future include reducing the workload of platform administrators by moving towards a platform-as-a-service architecture and adapting the governance model of Materials Cloud to reflect its increasing role within the MARVEL and MaX scientific centers and the computational materials science community at large.
“We are looking to diversify the range of voices among moderators, starting with researchers who are associated with MARVEL or the Materials Cloud partners,” Talirz said. “Pioneers are very welcome!”
Meanwhile, the paper “AiiDA 1.0, a scalable computational infrastructure for automated reproducible workflows and data provenance” describes how the infrastructure has evolved over the last several years. The choice to publish the paper in Scientific Data, a cross-disciplinary open access journal focused on the publication of peer-reviewed research data in an accessible way in order to facilitate interpretation and reuse, reflects how AiiDA has evolved from a useful tool for materials science calculations into an interesting alternative for the management of scientific data in general.
“The publication is a reflection of how AiiDA has grown and diversified and become more general,” said Sebastiaan Huber, a scientist at THEOS and one of the lead developers of AiiDA. “It’s serving more people and has become even more useful for scientists, making research more reproducible and automated. The publication is a summary of the technical accomplishments and the culmination of four years of hard work.”
The paper largely addresses a technical audience, covering advances such as overall scalability and significant improvements to the workflow engine, which had reached its limits in the older version, Huber says. AiiDA now supports throughputs of tens of thousands of processes per hour, while automatically preserving and storing the full data provenance in a relational database. The data is queryable and traversable, enabling high-performance data analytics. Despite substantial changes to the core engine, developers have made sure that all existing data can be automatically migrated to be fully compatible with the latest version, as required by AiiDA's goal of making data more reproducible and reusable.
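To make the idea of "queryable and traversable" provenance concrete, here is a minimal sketch in plain Python. It is not AiiDA's actual API or data model: the `ProvenanceGraph` class, its methods and the node names ("relax", "bands" and so on) are hypothetical and chosen only for illustration. It shows how recording the inputs and outputs of each calculation lets one trace any result back to everything it depends on.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy directed graph linking data to the calculations that used or created it.

    Hypothetical illustration only; this is not how AiiDA is implemented.
    """

    def __init__(self):
        # Maps each node to the set of nodes it was directly derived from.
        self._parents = defaultdict(set)

    def record(self, calculation, inputs, outputs):
        """Store one step: inputs -> calculation -> outputs."""
        for item in inputs:
            self._parents[calculation].add(item)
        for item in outputs:
            self._parents[item].add(calculation)

    def ancestors(self, node):
        """Return every node that `node` ultimately depends on."""
        seen = set()
        stack = [node]
        while stack:
            current = stack.pop()
            for parent in self._parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


# Two chained calculations with made-up names.
graph = ProvenanceGraph()
graph.record("relax", inputs=["structure", "parameters"], outputs=["relaxed_structure"])
graph.record("bands", inputs=["relaxed_structure"], outputs=["band_structure"])

# Everything the final result depends on, directly or indirectly.
history = graph.ancestors("band_structure")
print(sorted(history))
```

In this toy version the graph lives in memory; the paper describes AiiDA storing the equivalent information in a relational database so that it can be queried at scale.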
AiiDA’s workflow language provides advanced automation and error handling features, as well as a flexible plugin model that allows scientists to use AiiDA together with almost any simulation software. Users can share extensions on the dedicated plugin registry, where a vibrant developer community has already published 53 plugin packages supporting more than 100 simulation codes.
“This has been a big community effort,” Huber said. “The majority have not been implemented by AiiDA developers; these are people using AiiDA and developing plugins in the community.”
The AiiDA team has been active in community outreach, hosting more than 10 tutorials over the last two years, in Europe as well as in India, China and Japan. More recently, they have also started building momentum in the U.S., taking part in SciPy 2020, a conference that aims to advance scientific computing through open source Python software for mathematics, science, and engineering. Joining the NumFOCUS organization has enabled AiiDA’s participation in the 2020 Google Summer of Code, a program that aims to bring more student developers into open source software development. The results of this active outreach can be seen in the annual survey of research projects: in April 2020 there were 69 research projects from several European countries, Japan, India and China. Most were from academia, with some 16% from research institutes and industry.
AiiDA 1.0 marks the arrival of a stable interface after the first years of continuous development. In the foreseeable future, AiiDA development will focus on further improving the performance and stability of the infrastructure, enabling the community to share their scientific workflows and data faster and more easily, while continuing to rely on the growing ecosystem of plugins.
Talirz, L., Kumbhar, S., Passaro, E. et al. Materials Cloud, a platform for open computational science. Sci Data 7, 299 (2020). https://doi.org/10.1038/s41597-020-00637-5
Huber, S.P., Zoupanos, S., Uhrin, M. et al. AiiDA 1.0, a scalable computational infrastructure for automated reproducible workflows and data provenance. Sci Data 7, 300 (2020). https://doi.org/10.1038/s41597-020-00638-4