Materials follow the 'Rule of Four', but scientists don’t know why yet

This was published on April 17, 2024

A new study by MARVEL researchers describes an unexpected "rule" followed by about 60 per cent of structures included in large databases of computational and experimental materials: their primitive unit cells are made out of multiples of four atoms. The scientists tried many different explanations, considering the role of specific chemical elements as well as formation energy and symmetry, but a convincing explanation is yet to be found. Still, the scientists could use an algorithm to predict with high accuracy whether a given compound will follow the Rule of Four or not.

By Nicola Nosengo - NCCR MARVEL

Scientists are normally happy to find regularities and correlations in their data – but only if they can explain them. Otherwise, they worry that those patterns might just be revealing some flaw in the data itself, so-called experimental artifacts.

That’s what scientists in Nicola Marzari’s group at EPFL worried about when they noticed an unexpected pattern in two widely used databases of electronic structures, the Materials Project (MP) database and the Materials Cloud 3-dimensional crystal structures ‘source’ database (MC3Dsource).

The two collections include over 80,000 electronic structures of experimental as well as predicted materials, and in principle all types of structures should be equally represented. But scientists noticed that around 60 per cent of structures in both databases have primitive unit cells (the smallest possible cell in a crystal structure) made out of a multiple of 4 atoms. The scientists named this recurrence the “Rule of Four” and started looking for an explanation.
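As a rough illustration of the check behind this observation, the sketch below counts how many primitive cells in a list contain a multiple of four atoms. The atom counts are made-up placeholders, not values from either database, and the function names are hypothetical. The one-in-four baseline is only a naive assumption: it is what one would expect if atom counts were spread evenly over the remainders 0, 1, 2 and 3.

```python
def follows_rule_of_four(n_atoms: int) -> bool:
    """Return True if a primitive cell with n_atoms atoms holds a multiple of four."""
    return n_atoms > 0 and n_atoms % 4 == 0


def rule_of_four_fraction(atom_counts) -> float:
    """Fraction of structures whose primitive-cell atom count is a multiple of four."""
    counts = list(atom_counts)
    if not counts:
        raise ValueError("need at least one structure")
    return sum(follows_rule_of_four(n) for n in counts) / len(counts)


# Hypothetical primitive-cell atom counts, invented for illustration only.
example_counts = [4, 8, 12, 5, 20, 2, 16, 7, 24, 10]

# Naive baseline (an assumption): one remainder out of four possible ones.
NAIVE_BASELINE = 0.25

fraction = rule_of_four_fraction(example_counts)
print(f"multiples of four: {fraction:.0%} (naive baseline {NAIVE_BASELINE:.0%})")
```

In a real analysis the counts would come from the primitive cells stored in the databases, not from a hand-written list.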

The two datasets contain a disproportionate number of compounds whose primitive unit cell contains a multiple of 4 atoms. Gazzarrini et al., npj Comput Mater (2024)

“A first intuitive reason could come from the fact that when a conventional unit cell (a larger cell than the primitive one, representing the full symmetry of the crystal) is transformed into a primitive cell, the number of atoms is typically reduced by a factor of four”, says Elena Gazzarrini, a former INSPIRE Potentials fellow in the Laboratory of Theory and Simulation of Materials (THEOS) at EPFL and now at CERN in Geneva. “The first question we asked was whether the software used to ‘primitivize’ the unit cell had done it correctly, and the answer was yes”.
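The arithmetic behind this reduction can be sketched as follows. A conventional cell contains a fixed number of lattice points that depends on its centering, and the primitive cell contains exactly one, so the atom count shrinks by that number. The factor of four corresponds to face-centred (F) cells; other centerings give other factors. The function name is hypothetical and this is not the software the group used, only a minimal illustration of the bookkeeping.

```python
# Lattice points per conventional cell for each centering symbol.
# "R" refers to a rhombohedral lattice described in hexagonal axes.
LATTICE_POINTS_PER_CONVENTIONAL_CELL = {
    "P": 1,  # primitive
    "A": 2,  # base-centred
    "B": 2,  # base-centred
    "C": 2,  # base-centred
    "I": 2,  # body-centred
    "F": 4,  # face-centred
    "R": 3,  # rhombohedral, hexagonal setting
}


def primitive_atom_count(conventional_atoms: int, centering: str) -> int:
    """Atoms in the primitive cell, given the conventional cell's atom count."""
    points = LATTICE_POINTS_PER_CONVENTIONAL_CELL[centering]
    if conventional_atoms % points != 0:
        raise ValueError("atom count is not compatible with this centering")
    return conventional_atoms // points


# Rock salt (NaCl): 8 atoms in the face-centred conventional cell.
print(primitive_atom_count(8, "F"))
# Body-centred cubic iron: 2 atoms in the conventional cell.
print(primitive_atom_count(2, "I"))
```

Note that a reduction by four does not by itself favour multiples of four: the rock-salt example ends up with two atoms in its primitive cell.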

From a chemical point of view, another possible suspect was the coordination number of silicon (the number of atoms that can bind to a silicon atom), which is four. “We could expect to find that all the materials following this rule of four included silicon”, says Gazzarrini. “But again, they did not”.

The Rule of Four could not be explained by the formation energies of the compounds either. “The materials that are most abundant in nature should be the most energetically favoured, which means the most stable ones, those with negative formation energy”, says Gazzarrini. “But what we saw with classic computational methods was that there was no correlation between the rule of four and negative formation energies”.

Because the materials space covered by the two databases is huge, ranging from small unit cells to very large ones with dozens of different chemical species, there was still a chance that a more refined analysis, looking for a correlation between formation energies and chemical properties, might provide an explanation. So the team brought in Rose Cersonsky, a machine-learning expert at the University of Wisconsin, who developed an algorithm to group structures according to their atomic properties and look at formation energies within classes of materials sharing some chemical similarities. But again, this method did not provide a way to distinguish the rule-of-four compliant materials from the non-compliant ones.

Similarly, the abundance of multiples of four does not correlate with highly symmetric structures, but rather with low symmetries and loosely packed arrangements.

In the end, the resulting article in npj Computational Materials is a rare example of a scientific paper describing a negative result: the researchers could only describe the phenomenon and rule out several possible causes, without finding one. But negative results can be just as important as positive ones for scientific advancement, because they point to difficult problems, which is why scientists often argue that journals should publish more such studies.

The failure to find a compelling explanation did not prevent the group from using a Random Forest algorithm to predict, with an accuracy of 87%, whether a given compound will follow the Rule of Four or not. “This is interesting because the algorithm uses only local rather than global symmetry descriptors, which suggests that there may be small chemical groups in the cells (still to be found) that may explain the rule of four”, says Gazzarrini.


Gazzarrini, E., Cersonsky, R.K., Bercx, M. et al. The rule of four: anomalous distributions in the stoichiometries of inorganic compounds. npj Comput Mater 10, 73 (2024).
