google-deepmind / materials_discovery

Inconsistencies in Dataset Counts Across Different Attributes

HarshaSatyavardhan opened this issue

I have downloaded the dataset, and the number of data points changes depending on which folder I count:

  • by conductivity - 377223
  • by id - 384939
  • by reduced_formula - 377184

Why is there such a large difference in the number of CIFs across these folders?
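For anyone who wants to reproduce counts like these, here is a minimal sketch. It assumes the archives have been unpacked into one folder per attribute under a common root; the folder names in the usage note and the helper names `count_cifs` and `count_per_folder` are hypothetical, not part of the repository.

```python
from pathlib import Path


def count_cifs(folder):
    """Count files ending in .cif under `folder`, including nested subfolders."""
    return sum(1 for p in Path(folder).rglob("*.cif") if p.is_file())


def count_per_folder(root):
    """Map each immediate subfolder of `root` to its .cif count."""
    return {
        d.name: count_cifs(d)
        for d in sorted(Path(root).iterdir())
        if d.is_dir()
    }
```

Calling `count_per_folder("gnome_data")` on a hypothetical root containing, say, `by_id` and `by_reduced_formula` would return one count per folder. Note that the match is case-sensitive on most systems, so files named `.CIF` would not be counted.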

I understand that they are screening out some erroneous structures from the initial dataset. I archived the initially published version as an OPTIMADE API at https://optimade-gnome.odbx.science/v1/structures.
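As a rough sketch of how the archived count could be checked against the downloaded folders: the OPTIMADE specification defines a `page_limit` query parameter and a `meta.data_returned` field in responses. Whether this particular server is still reachable and reports that field is an assumption, and the helper names below are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://optimade-gnome.odbx.science/v1/structures"


def build_query_url(base_url=BASE_URL, page_limit=1):
    """Build a structures query that asks for a single entry per page."""
    return f"{base_url}?{urlencode({'page_limit': page_limit})}"


def total_from_response(payload):
    """Return meta.data_returned from a parsed response, or None if absent."""
    return payload.get("meta", {}).get("data_returned")


def fetch_total(base_url=BASE_URL, timeout=30):
    """Query the server (network access required) and return the reported total."""
    with urlopen(build_query_url(base_url), timeout=timeout) as resp:
        return total_from_response(json.load(resp))
```

`fetch_total()` only asks for one entry because the total is reported in the response metadata independently of pagination. It returns `None` if the server omits the field.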