On the trail of the missing hydrogen atoms
Artificial intelligence is often used to generate images. In research, specialised AI models are used for scientific applications – for example, to predict the positions of atoms in materials. The MatterGen model developed by Microsoft can generate complex crystal structures from just a few pieces of information – which atoms should be present and in what proportions – and researchers can then use these structures for computer simulations of new materials.
Now a scientific team led by Giovanni Pizzi from NCCR MARVEL and from the PSI Center for Scientific Computing, Theory and Data, together with researchers from the universities of Parma and Modena in Italy, has found a way to use AI to solve a practical problem in materials science: locating missing atomic positions in otherwise known structures. As they report in the journal npj Computational Materials, the materials scientists used an approach normally employed in image processing or machine vision, that is, recognition and interpretation of visual information by means of AI.
This makes it possible to simulate materials that are known experimentally but have so far been inaccessible to theory, either for the first time or significantly better than before. The researchers are thus contributing to the exploration of new materials with special properties, for example for hydrogen storage, or potentially for the development of new superconductors.
“Invisible” hydrogen atoms
“For our simulations of material properties, we rely on information in databases telling us where each atom is located in a crystal structure,” says Timo Reents, a doctoral candidate in Giovanni Pizzi’s group. However, the element hydrogen presents a challenge. It is often part of the crystal lattice, but it is difficult to detect experimentally using traditional methods that measure the arrangement of atoms through X-ray scattering. Consequently, the positions of hydrogen atoms in crystal representations are often inaccurate, or they are missing altogether from the visualisations.
Precise knowledge of the atomic positions is essential for computer simulations that researchers use to predict specific material properties, such as electrical or thermal conductivity. “If the information about the hydrogen atoms is missing, that’s a problem,” says Giovanni Pizzi. “Often, we can’t use several thousand potentially interesting materials for our simulations precisely for this reason.” This is where AI should be able to help.
When a paw is missing from a photo of a dog
Machine vision makes use of so-called diffusion models. When such a model is used to fill in missing image information, the process is called inpainting. For example, a paw that was hidden in a photo of a dog can be added.
Earlier approaches to machine vision would first add “noise” to the entire image of the dog, intentionally overlaying it with random image information, in order to then reconstruct the photo with all four paws in a second step. Now, however, it is standard practice to vary the strength of the noise depending on the image area: Noise would be added heavily only to the unknown regions where the paw should be.
While this is already established in the field of machine vision, it was previously unavailable for the reconstruction of atomic positions. Now Giovanni Pizzi’s team has developed an adapted open-source model called XtalPaint, based on Microsoft’s MatterGen. “This combines the advantages of modern machine vision and crystal reconstruction: Noise is added only to the unknown positions within the crystal – the known positions remain largely unchanged during the process,” Timo Reents explains.
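The idea of adding noise only to the unknown positions can be illustrated with a minimal Python sketch. It is not XtalPaint's or MatterGen's actual code: the function name `add_masked_noise`, the toy cell and the noise strength `sigma` are assumptions made purely for illustration, and the known sites are kept exactly fixed here, whereas the article says they remain "largely unchanged".

```python
import random


def add_masked_noise(frac_coords, is_known, sigma, rng):
    """Perturb only the unknown sites; known sites are returned unchanged.

    Hypothetical helper for illustration, not part of any real library.
    """
    noisy = []
    for coords, known in zip(frac_coords, is_known):
        if known:
            # Known atoms act as fixed context for the reconstruction.
            noisy.append(tuple(coords))
        else:
            # Gaussian displacement, wrapped back into the unit cell.
            noisy.append(tuple((x + rng.gauss(0.0, sigma)) % 1.0 for x in coords))
    return noisy


# Toy cell in fractional coordinates: two known atoms and one atom
# (standing in for hydrogen) whose position is treated as unknown.
frac_coords = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (0.25, 0.25, 0.25)]
is_known = [True, True, False]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
noisy = add_masked_noise(frac_coords, is_known, sigma=0.1, rng=rng)

for before, after, known in zip(frac_coords, noisy, is_known):
    print("known  " if known else "unknown", before, "->", after)
```

In a real diffusion model, a trained network would then remove this noise step by step; the sketch covers only the masking, which is the part the article describes.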
This offers greater efficiency, just as it does in modern inpainting approaches in machine vision: “With step-by-step reconstruction, XtalPaint can orient itself to the existing crystal from the very beginning,” Reents says. “This increases the success rate and also conserves computing power.”
Timo Reents (left) and Giovanni Pizzi
Also applicable to lithium and sodium
To test their method, the researchers removed the hydrogen atom positions from known crystal structures and then used XtalPaint to reconstruct them. In 87 percent of cases, they found the known positions – and in another ten percent, configurations that were even more stable energetically. “Overall, this means a success rate of 97 percent for XtalPaint,” Reents says.
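The test procedure and the arithmetic behind the 97 percent can be sketched in a few lines of Python. The helper names, the two-atom toy structure and the list of 100 outcomes are hypothetical: the outcome counts are chosen only to mirror the percentages quoted above and are not the study's actual data set.

```python
from collections import Counter


def hide_element(sites, element="H"):
    """Split a structure into sites that stay known and removed reference sites."""
    known = [site for site in sites if site[0] != element]
    hidden = [site for site in sites if site[0] == element]
    return known, hidden


def success_rate(outcomes):
    """Share of cases in which the reference or a lower-energy structure was found."""
    counts = Counter(outcomes)
    return (counts["matched"] + counts["lower_energy"]) / len(outcomes)


# Toy structure: (element symbol, fractional coordinates).
sites = [("Li", (0.0, 0.0, 0.0)), ("H", (0.5, 0.5, 0.5))]
known, hidden = hide_element(sites)
print(known)   # what the model would be given
print(hidden)  # reference positions kept aside for comparison

# Illustrative outcome labels only, mirroring the reported percentages.
outcomes = ["matched"] * 87 + ["lower_energy"] * 10 + ["failed"] * 3
print(f"{success_rate(outcomes):.0%}")  # 97%
```

Counting the lower-energy configurations as successes follows the article's own tally: they differ from the database entry but are energetically more favourable.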
“We can now use our method, for example, to complete structures in databases with the missing hydrogen positions,” says Pizzi. Also, he and his colleagues have already detected errors in databases that can arise through data transfer from original scientific publications. Furthermore, they can apply the method not only to hydrogen atoms, but also to lithium and sodium – two elements that are important for the development of new batteries.
Reference
Reents, T., Cantarella, A., Bercx, M. et al. Score-based diffusion models for accurate crystal-structure inpainting and reconstruction of hydrogen positions. npj Comput Mater 12, 203 (2026). https://doi.org/10.1038/s41524-026-02090-1