The Materials Cloud Archive has published its 1000th record

This was published on April 3, 2024

Launched in 2017, this open repository for materials science data has grown steadily ever since and is now one of the repositories of choice recommended by the European Commission, the Swiss National Science Foundation and Nature Scientific Data. A new version with improved features will be launched in the second half of 2024, while the BIG-MAP project will apply the same technology to create a restricted-access archive for sharing data within collaborations.

by Nicola Nosengo, NCCR MARVEL

Seven years after receiving its very first submission, the Materials Cloud Archive reached an important milestone on 3 April with the publication of its 1000th record. Titled "Phonon-limited mobility for electrons and holes in highly-strained silicon", the latest addition to the Archive contains data from first-principles calculations exploring how the mobility of electrons and holes in silicon varies under uniaxial strain, and comes from Samuel Poncé's group at the Université catholique de Louvain in Belgium.

The Materials Cloud Archive lets scientists upload the data connected to a research article in a format that facilitates reuse by other researchers. The Archive itself and the datasets published on it are indexed by leading registries such as re3data, FAIRsharing.org, Google Dataset Search and EUDAT's B2FIND.

Materials Cloud Archive: number of records published per year.

Materials Cloud Archive: total uploaded data per year.

The Archive is part of the larger Materials Cloud ecosystem, which also includes educational materials and tools to run electronic structure simulations on the cloud. The Materials Cloud is supported by a large consortium of partners that includes NCCR MARVEL, EPFL, the Paul Scherrer Institute, swissuniversities, the supercomputing centers CSCS, CINECA and Forschungszentrum Jülich, the European Centre of Excellence MaX and many others.

“The Archive is a core component of the Materials Cloud, and is dedicated to preserving curated computational data that underpin reproducibility and allow for data mining and machine learning”, says MARVEL director Nicola Marzari. “More broadly, it’s one of the pillars of our MARVEL digital infrastructure, where we explore how a digital facility - rather than a brick-and-mortar one - can empower future scientific discovery and technological innovation”.

One of the Archive’s strong assets is the efficiency of its moderation - not a peer review of the data, but a quality check that ensures submissions comply with open data standards and guidelines to facilitate data reuse. “Over 75% of moderation feedback is provided within one day, and almost two thirds of records are published within three days of submission”, says MARVEL program manager Patrick Mayor. “Moderation is key to guaranteeing the quality and interoperability of data, but we make sure that it does not introduce unnecessary delays in publication”.

Moderation statistics for the Materials Cloud Archive, 2020-2024.

The team behind the Archive is currently working on a new version based on a new release of Invenio, the software platform on which the Archive runs. Invenio was originally developed at CERN, where it powers the widely used Zenodo repository.

“The new version includes a number of new features to improve the scalability and interoperability of the data, and to support the use of the repository during the editorial process of journals”, says Valeria Granata, the senior researcher at EPFL who oversees the infrastructure. “For example, it allows users to share a version of the data in read-only mode before it is published, which is useful for the referees of a paper”. The migration should happen in the second half of 2024, and the website’s front-end will also be improved.

Within the Battery Interface Genome – Materials Acceleration Platform (BIG-MAP) project, one of the projects that make up the Battery2030+ European initiative, the technology behind the Materials Cloud Archive has been used to create a restricted-access archive. It is meant for researchers who are part of a project with a clear data management plan, are bound by a data agreement and cannot yet publish their data, but already want to share it with the members of their collaboration. “The idea is that, in addition to supporting this relevant use case, by using the same technology and metadata format the data will already receive the curation that would be needed for later publication, and in the end a simple click will allow them to be published openly on the Materials Cloud”, says Giovanni Pizzi, group leader of the Materials Software and Data Group at the Paul Scherrer Institute and leader of the Open Digital Infrastructure project in MARVEL. “The plan is to make the platform easy to redeploy and extend the same concept to the other projects in Battery2030+”.

Other developments are around the corner thanks to the collaboration with the MaX (MAterials design at the eXascale) Centre of Excellence, an international consortium led by Italy’s National Research Council. To increase data safety, all the data published in the Materials Cloud Archive will soon be duplicated in two mirror databases hosted at CINECA in Bologna (Italy) and Forschungszentrum Jülich (Germany). The two centers will also provide a live backup of the Archive, so that the data remain available even in the rare cases in which the main website is offline, e.g. due to maintenance.

The Materials Cloud Archive is also becoming better integrated into the Swiss research ecosystem. The software behind the Archive currently runs at the Swiss National Supercomputing Centre (CSCS), and a plan is being developed to use the long-term storage service at this facility to reinforce the long-term preservation of the data. Additionally, Pizzi explains, a new project funded by the ETH Board’s Open Research Data program will soon start to support the integration of the Materials Cloud Archive with the institutional repository DORA.

For any questions, please consult the Materials Cloud Archive’s FAQs or contact the Materials Cloud team at archive@materialscloud.org
