sparks-baird / matbench-genmetrics

Generative materials benchmarking metrics, inspired by guacamol and CDVAE.

Home Page: https://matbench-genmetrics.readthedocs.io/

On the validity score (improving understanding and/or modifications)

sgbaird opened this issue · comments

https://twitter.com/keeeto2000/status/1555143104650428419 by @keeeto (@keeeto2000 on Twitter)

Very timely idea! I am not sure I totally follow the validity score - it sounds a bit like an FID score. If this runs on top of xtal2png, would it be possible to just implement an FID score? Obvs it might be a bit slower because Inception...

The section "As a distance between probability distributions (the FID score)" has a "see also" to "Wasserstein metric § Normal distributions".

And in the Fréchet inception distance wiki article:

In other words, it is the 2-Wasserstein distance on $\mathbb{R}^n$.
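For reference, the closed form usually quoted for this case can be written out as below. The symbols are notation introduced here, not taken from the thread: $\mu_1, \Sigma_1$ and $\mu_2, \Sigma_2$ are the means and covariances of the two Gaussians being compared.

$$
d_F^2 = \lVert \mu_1 - \mu_2 \rVert_2^2 + \operatorname{Tr}\left(\Sigma_1 + \Sigma_2 - 2\left(\Sigma_1 \Sigma_2\right)^{1/2}\right)
$$

In the FID setting, the two Gaussians are fitted to Inception features of real and generated images, which is why the Gaussian assumption matters.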

Wondering whether this needs to be renamed, explained differently, or if it needs to be changed to a different calculation. The intention behind the validity score is to ensure that the generated structures are "reasonable" and "valid" (i.e. realistic), and a set of structures with a similar space group number distribution to known structures from train+test seemed like a good way to tell, especially with the difficulty some models have had with generating structures other than P1 symmetry. Using e_above_hull would be another option, but this requires a high-fidelity property predictor and could lead to bias depending on the model used.
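To make the space group comparison concrete, here is a minimal sketch of a 1-Wasserstein distance between two lists of space group numbers, using only the standard library. The function name `wasserstein_1d` and the toy lists are illustrative assumptions and are not taken from the matbench-genmetrics code. How the distance is normalized into a score is a separate choice that this sketch does not cover.

```python
from bisect import bisect_right


def wasserstein_1d(u, v):
    """1-Wasserstein distance between two empirical 1-D samples.

    Computed as the area between the two empirical CDFs.
    """
    u_sorted, v_sorted = sorted(u), sorted(v)
    points = sorted(set(u_sorted) | set(v_sorted))
    total = 0.0
    for left, right in zip(points, points[1:]):
        # Empirical CDF values on the interval [left, right)
        cdf_u = bisect_right(u_sorted, left) / len(u_sorted)
        cdf_v = bisect_right(v_sorted, left) / len(v_sorted)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


# Toy data (hypothetical): space group numbers of "known" structures
# versus a generator that only ever produces P1 (space group 1).
known_sg = [1, 2, 14, 62, 225]
all_p1_sg = [1, 1, 1, 1, 1]

d_same = wasserstein_1d(known_sg, known_sg)
d_p1 = wasserstein_1d(known_sg, all_p1_sg)
```

A generator stuck on P1 ends up far from the known distribution, while identical distributions give a distance of zero. `scipy.stats.wasserstein_distance` computes the same quantity for 1-D samples and is the more practical choice when SciPy is available.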

Related:

(1) Dimitrakopoulos, P.; Sfikas, G.; Nikou, C. Wind: Wasserstein Inception Distance For Evaluating Generative Adversarial Network Performance. In ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2020; pp 3182–3186. https://doi.org/10.1109/ICASSP40776.2020.9053325.

I.e. relaxing the Gaussian assumption (Fréchet → Wasserstein)

(from abstract)

... We extend FID by relaxing the Gaussian hypothesis of the related inception features and extend it for non-Gaussian, multimodal distributions. ...

I think that the score is a good one. I didn't think of the point regarding the difficulty in generating anything but P1, so the bit of text that you added there certainly helps to motivate the score. I might call it something like crystallographic validity or similar, as it is based on the symmetry rather than the chemistry. You could have a load of crazy chemical bonds, for example, and yet have a reasonable space group distribution.

If you wanted to check for chemical bonding validity, an interesting approach might be to calculate the elemental embeddings using our recent SkipAtom https://github.com/lantunes/skipatom https://www.nature.com/articles/s41524-022-00729-3 . Essentially, SkipAtom generates an embedding based on observed chemical environments; it is based on word embeddings from NLP. If your SkipAtom embedding is similar to the training set, then you should have similar chemistry - this would be a kind of "crystal chemistry validity".
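A rough sketch of what such an embedding-based check could look like is below. Everything in it is an assumption made for illustration: the 2-D element vectors are invented toy values (not real SkipAtom embeddings), the fraction-weighted mean pooling and nearest-neighbour cosine similarity are just one possible choice, and none of the names come from the SkipAtom package.

```python
import math

# Toy 2-D element vectors, invented for illustration only. Real SkipAtom
# embeddings are learned from data and have many more dimensions.
TOY_EMBEDDINGS = {
    "Na": (1.0, 0.0),
    "K": (0.9, 0.1),
    "Cl": (0.0, 1.0),
    "Br": (0.1, 0.9),
}


def pool(composition):
    """Fraction-weighted mean of element vectors for {element: amount}."""
    total = sum(composition.values())
    dim = len(next(iter(TOY_EMBEDDINGS.values())))
    vec = [0.0] * dim
    for element, amount in composition.items():
        for i, x in enumerate(TOY_EMBEDDINGS[element]):
            vec[i] += x * amount / total
    return vec


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest_train_similarity(generated, train):
    """Highest cosine similarity between one generated composition
    and any composition in the training set."""
    g = pool(generated)
    return max(cosine(g, pool(t)) for t in train)


train = [{"Na": 1, "Cl": 1}, {"K": 1, "Br": 1}]
sim_like = nearest_train_similarity({"K": 1, "Cl": 1}, train)    # salt-like
sim_unlike = nearest_train_similarity({"Na": 1, "K": 1}, train)  # two cations
```

With these toy vectors, the salt-like composition scores closer to the training set than the cation-only one. A real check would need actual learned embeddings and a considered choice of pooling and distance.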

@keeeto I like the idea of considering both structural and chemical validity, which is in line with some other recent changes #39 (comment). I've been hoping to use SkipAtom at some point, so I'm glad you're bringing it up in this context. I've also been interested in using it as the elemental featurizer for CrabNet in a Matbench submission for matbench_expt_gap. I already incorporated a SkipAtom CSV file into CrabNet lantunes/skipatom#6, so it's just a matter of fleshing out a short script and preparing a benchmark submission.

Oh - that sounds cool. I will be really interested to hear how SkipAtom + CrabNet works :). BTW we have been using a slight mod of CrabNet in an upcoming piece of work - nice to see some good cross-pollination; Open Source win!!!