AiiDA manages, preserves and disseminates the simulations, data and workflows of modern computational science
by Carey Sargent, EPFL, NCCR MARVEL
While researchers could only perform a small number of expensive simulations in the past, they can now use computational engines to run thousands of simultaneous simulations to understand, predict or design the properties of complex materials and devices—massive amounts of data and calculations are being produced.
Though this represents a huge opportunity, the processes and resulting data need to be handled carefully. An ideal tool would ease data management, storage and retrieval, help ensure that simulations are reproducible and encourage the sharing and cross-validation of results by the larger research community. It would be flexible enough to support different tasks, but also easy to use. All these ideas have been integrated into AiiDA (www.aiida.net), an automated interactive infrastructure and database built on the four “ADES” pillars of automation, data, environment and sharing (see paper 1, cited below).
“When you have 100s or maybe even 1000s of calculations you can keep track of what you’re doing,” said Spyros Zoupanos, an NCCR MARVEL scientist at EPFL and AiiDA developer. “But when you have 100'000 or millions of calculations and you want to understand the difference between very similar approaches and you have to think and remember where you stored everything or how you organized your files or even what you did…you can’t focus on everything.”
“The first two pillars of AiiDA, automation and data, allow users to automate critical lower-level concerns such as automation and efficient data management, which are nonetheless complex, and focus on the actual research,” said Giovanni Pizzi, an NCCR MARVEL scientist at EPFL who has been developing AiiDA since the beginning and coordinates the project. “The two other pillars, environment and sharing, couple a high-level environment with a social ecosystem to stimulate the sharing of codes, data and workflows.”
The user-friendly platform allows researchers to access local and remote computer resources while a tight coupling of storage and workflow automation addresses the goal of full reproducibility of calculations and the resulting data chain. A database design that’s based on directed acyclic graphs and tailored to data management for high-throughput simulations allows for efficient data analysis and investigation of varying results and of how calculations are related. The generated knowledge can be easily shared thanks to the easy set-up and management of local repositories and tools that help share not only the data itself, but also the full scientific workflows used to generate the results.
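To make the idea of a graph-based record concrete, here is a minimal sketch in plain Python of how calculations and data can be stored as a directed acyclic graph and then queried for the full history of a result. It is an illustration only and does not use AiiDA's actual database schema or API; the class `ProvenanceGraph`, its methods and the node names are all hypothetical.

```python
class ProvenanceGraph:
    """Toy provenance store (hypothetical, not AiiDA's API).

    Nodes are either data or calculations; edges point from each node
    to the nodes it was derived from.
    """

    def __init__(self):
        self.kind = {}     # node id -> "data" or "calculation"
        self.parents = {}  # node id -> set of ids it was derived from

    def add_data(self, node_id):
        """Register a piece of input data that has no recorded history."""
        if node_id in self.kind:
            raise ValueError(f"{node_id} already exists")
        self.kind[node_id] = "data"
        self.parents[node_id] = set()

    def add_calculation(self, calc_id, inputs, outputs):
        """Record a calculation that consumed `inputs` and produced `outputs`."""
        if calc_id in self.kind:
            raise ValueError(f"{calc_id} already exists")
        for node_id in inputs:
            if node_id not in self.kind:
                raise KeyError(f"unknown input: {node_id}")
        self.kind[calc_id] = "calculation"
        self.parents[calc_id] = set(inputs)
        for node_id in outputs:
            # Outputs must be new nodes, so an edge can never point
            # back to an existing ancestor and the graph stays acyclic.
            if node_id in self.kind:
                raise ValueError(f"{node_id} already exists")
            self.kind[node_id] = "data"
            self.parents[node_id] = {calc_id}

    def ancestors(self, node_id):
        """Return every node (data or calculation) that contributed to `node_id`."""
        seen = set()
        stack = [node_id]
        while stack:
            current = stack.pop()
            for parent in self.parents[current]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = ProvenanceGraph()
graph.add_data("structure")
graph.add_data("parameters")
graph.add_calculation("relax", inputs=["structure", "parameters"],
                      outputs=["relaxed_structure"])
graph.add_calculation("bands", inputs=["relaxed_structure"],
                      outputs=["band_structure"])

history = sorted(graph.ancestors("band_structure"))
print(history)
```

Because every result is linked to the calculation that produced it, and every calculation to its inputs, asking how a result was obtained reduces to walking the graph backwards.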
“In the end, people should be using AiiDA because it allows you to do things faster,” Zoupanos said. “It’s a big tool that allows you to express your ideas in an abstract way without getting your hands too dirty. It does the difficult job of implementing and materializing your ideas and allows you to sit at a higher level and understand what you did, what you expressed and analyze your results.”
While AiiDA has been created and developed within the domain of materials science, it is general enough to be used in other fields. It is not tightly bound to a specific code: users who wish to use the platform in other applications need only write a plug-in that tells AiiDA how to execute a given code on a supercomputer and how to parse the results that the code returns. In effect, the plug-in encodes what a user would otherwise do manually when connecting to a supercomputer.
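The two responsibilities of a plug-in described above, preparing the input for a code and interpreting its output, can be sketched as follows. This is a simplified illustration under stated assumptions, not AiiDA's real plug-in interface: the base class `CodePlugin`, the `ToyEnergyPlugin` example and the input and output formats of the imaginary simulation code are all invented for this sketch.

```python
import re
from abc import ABC, abstractmethod


class CodePlugin(ABC):
    """Hypothetical plug-in interface (not AiiDA's actual API)."""

    @abstractmethod
    def write_input(self, parameters: dict) -> str:
        """Turn a dictionary of parameters into the text of an input file."""

    @abstractmethod
    def parse_output(self, raw_output: str) -> dict:
        """Extract structured results from the raw output of the code."""


class ToyEnergyPlugin(CodePlugin):
    """Plug-in for an imaginary code that reads `key = value` input files
    and prints a line such as `total energy = -12.5 eV`."""

    def write_input(self, parameters):
        # Sort the keys so the same parameters always give the same file.
        lines = [f"{key} = {value}" for key, value in sorted(parameters.items())]
        return "\n".join(lines) + "\n"

    def parse_output(self, raw_output):
        match = re.search(r"total energy\s*=\s*(-?\d+(?:\.\d+)?)\s*eV", raw_output)
        if match is None:
            raise ValueError("no total energy found in output")
        return {"total_energy_eV": float(match.group(1))}


plugin = ToyEnergyPlugin()
input_text = plugin.write_input({"kpoints": 4, "cutoff": 30})
results = plugin.parse_output("iteration 7 converged\ntotal energy = -12.5 eV\n")
print(input_text)
print(results)
```

Keeping the code-specific logic behind a small interface like this is what allows the rest of the infrastructure (submission, storage, querying) to stay independent of any particular simulation code.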
“There’s a lot of effort, even from outside the project team, to write plug-ins for various codes,” Zoupanos said. “AiiDA supports many of the codes used by the materials science community, but there’s nothing that stops anyone from writing a new plug-in to make what they have compatible too.”
The team has developed a tool that allows developers to test such plug-ins, one of the ways they’re promoting wider use of AiiDA. Another important approach is the organization of tutorials meant to introduce prospective users to the platform and how it works. In the tutorials, they use simple scenarios to show how researchers can run new calculations and simulations, how they can retrieve and query results, how to write workflows, etc. They have hosted two to three per year since 2014. The next one will be hosted from 30 May to 1 June at the Italian Supercomputer Center CINECA in Bologna. It will be preceded by an optional two-day introduction to Python.
It was at one such tutorial that a company, Constellium, became interested in using AiiDA for its own research. The company has signed a contract with the AiiDA team, which has provided it with the code, additional features tailored to its work and the necessary support. Companies could gain from using AiiDA in various ways, said Nicolas Mounet, an NCCR MARVEL scientist at EPFL who develops AiiDA workflows to manage his research. A company may want, for example, to screen many materials to find one with a particularly disruptive property.
“If a company needs to screen a lot of materials for a given property and doesn’t have the expertise to compute it from first principles, then AiiDA is definitely a good idea for them,” he said. “It will allow them to rationalize their process a lot.”
Others, like Constellium, may just wish to keep track of very complex manipulations, giving scientists a better overview of what they did a year earlier or making it easier to communicate a workflow to someone else. AiiDA lets them see which calculations were carried out on a piece of data, what the results were and what sequence of steps followed, something that is unique to AiiDA in the world of simulations, even in other fields. A third option could involve extending AiiDA beyond materials science to additional domains, but this would entail extensive collaboration between a given company and the AiiDA development team.
“We’ve spent the past three or four years really building up the software, making it robust, and now we have a stable code already being used in publications such as the paper in Nature Nanotechnology,” Mounet said (see paper 2, cited below). “We haven’t been too active in trying to find industrial partners yet, but this is something I think is coming in the next year or so.”
Paper 1
G. Pizzi, A. Cepellotti, R. Sabatini, N. Marzari, B. Kozinsky, AiiDA: automated interactive infrastructure and database for computational science, Computational Materials Science 111, 218-230 (2016).
https://arxiv.org/abs/1504.01163
Paper 2
N. Mounet, M. Gibertini, P. Schwaller, D. Campi, A. Merkys, A. Marrazzo, T. Sohier, I. E. Castelli, A. Cepellotti, G. Pizzi & N. Marzari, Two-dimensional materials from high-throughput computational exfoliation of experimentally known compounds, Nature Nanotechnology (2018), doi:10.1038/s41565-017-0035-5