hackingmaterials / matminer

Data mining for materials science

Home Page: https://hackingmaterials.github.io/matminer/

F. Ricci et al. Electronic Transport Properties available through load_dataset()?

janosh opened this issue

Is the MPContrib Electronic Transport dataset available via matminer?

This

from matminer.datasets import get_available_datasets

get_available_datasets()

prints

['boltztrap_mp',
 'brgoch_superhard_training',
 'castelli_perovskites',
 'citrine_thermal_conductivity',
 'dielectric_constant',
 'double_perovskites_gap',
 'double_perovskites_gap_lumo',
 'elastic_tensor_2015',
 'expt_formation_enthalpy',
 'expt_gap',
 'flla',
 'glass_binary',
 'glass_binary_v2',
 'glass_ternary_hipt',
 'glass_ternary_landolt',
 'heusler_magnetic',
 'jarvis_dft_2d',
 'jarvis_dft_3d',
 'jarvis_ml_dft_training',
 'm2ax',
 'matbench_dielectric',
 'matbench_expt_gap',
 'matbench_expt_is_metal',
 'matbench_glass',
 'matbench_jdft2d',
 'matbench_log_gvrh',
 'matbench_log_kvrh',
 'matbench_mp_e_form',
 'matbench_mp_gap',
 'matbench_mp_is_metal',
 'matbench_perovskites',
 'matbench_phonons',
 'matbench_steels',
 'mp_all_20181018',
 'mp_nostruct_20181018',
 'phonon_dielectric_mp',
 'piezoelectric_tensor',
 'steel_strength',
 'wolverton_oxides']

So I'm guessing not? If so, curious to know why.

Also, I'd like to suggest adding a short code block to each MPContrib detail page showing how to download it. E.g.

Use matminer (pip install matminer) to download this dataset programmatically:

from matminer.datasets import load_dataset

df = load_dataset("matbench_phonons")

Hey @janosh

Currently the full data is not available through matminer, though if @tschaume wants to make a matminer-loadable static .json.gz of it available, I'd be glad to add it to matminer.

There is an abbreviated version of it, boltztrap_mp, available in matminer (see https://hackingmaterials.lbl.gov/matminer/dataset_summary.html). The following columns are available:

[image: screenshot of the boltztrap_mp columns, not preserved]

@ardunn Thanks for the quick reply! Do you have any information on how the 8,924 entries were selected from the 44,333 listed in the full dataset at https://contribs.materialsproject.org/projects/carrier_transport?

@janosh @ardunn I do have different versions of potential .json.gz files we could use to link the full dataset up to matminer. I'll make them available at a persistent link in MPContribs and report back here by Monday (hopefully).

@janosh @ardunn There's a JSON file for download now at https://contribs.materialsproject.org/projects/carrier_transport.json.gz (12.5MB). It reflects the format of the contributions as they go into the MPContribs API and does not include the temperature- and doping-level dependent tables. Happy to iterate if it isn't a suitable format to link up to matminer. FYI @fraricci
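For anyone who wants to peek at a file like this before it lands in matminer, a gzipped JSON file can be read with the standard library alone. This is only a minimal sketch: the helper names `read_json_gz` and `write_json_gz` and the toy record are made up for illustration, and nothing here assumes the actual layout of carrier_transport.json.gz.

```python
import gzip
import json
import os
import tempfile


def read_json_gz(path):
    """Decompress a .json.gz file and parse its contents as JSON."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


def write_json_gz(obj, path):
    """Serialize obj as JSON and gzip it (used here only to build a demo file)."""
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(obj, fh)


# Round-trip a toy payload; the keys are placeholders, not the real schema.
demo = [{"identifier": "mp-0000", "value": "1.5"}]
demo_path = os.path.join(tempfile.mkdtemp(), "demo.json.gz")
write_json_gz(demo, demo_path)
loaded = read_json_gz(demo_path)
```

To read a downloaded copy, point `read_json_gz` at the local path of that file instead of the demo file.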

Thanks a lot @tschaume! 👍

I'm guessing that for addition to matminer it should be in a format ready for data mining. So the target columns should probably be floats rather than dtype object (i.e. strings).
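A rough sketch of that kind of cleanup, in plain Python so it runs without extra dependencies. The function names, column names, and values are invented for illustration and are not taken from the dataset. With a pandas DataFrame, `pandas.to_numeric` would be the more usual tool for the same job.

```python
def to_float(value):
    """Return value as a float, or None if it cannot be parsed as a number."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def cast_columns(records, columns):
    """Return copies of the records with the named columns cast to float."""
    return [
        {key: to_float(val) if key in columns else val for key, val in rec.items()}
        for rec in records
    ]


# Toy rows; the column names and values are hypothetical.
rows = [
    {"formula": "AB", "target": "12.5"},
    {"formula": "CD", "target": "n/a"},
]
clean = cast_columns(rows, {"target"})
```

Unparseable entries become `None` here rather than raising, so they can be counted or dropped afterwards.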

Here's a version of the dataset as we would use it with models like CGCNN: https://github.com/janosh/matbench/commit/df3831319599b9aa3768dd5f97fdac5ab94bdc37.

What's the meaning of .v in these columns?

Sᵉ.p.v [µV/K]
Sᵉ.n.v [µV/K]
σᵉ.p.v [1/Ω/m/s]
σᵉ.n.v [1/Ω/m/s]
PFᵉ.p.v [µW/cm/K²/s]
PFᵉ.n.v [µW/cm/K²/s]
κₑᵉ.p.v [W/K/m/s]
κₑᵉ.n.v [W/K/m/s]

Ah. From here:

Value (v), temperature (T), and doping level (c) at the maximum of the average eigenvalue of the Seebeck coefficient
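Given that description, the column names can be split into their parts mechanically. A small sketch, assuming the pattern `<property>.<p|n>.<v|T|c> [<unit>]` seen in the columns above. Reading p and n as p-type and n-type doping is an assumption on my part, and `parse_column` is a made-up helper, not part of matminer or MPContribs.

```python
import re

# v, T and c follow the quoted description; the p/n reading is an assumption.
FIELD_MEANING = {"v": "value", "T": "temperature", "c": "doping level"}
DOPING_MEANING = {"p": "p-type", "n": "n-type"}

COLUMN_RE = re.compile(
    r"^(?P<prop>[^.]+)\.(?P<doping>[pn])\.(?P<field>[vTc])\s*\[(?P<unit>.*)\]$"
)


def parse_column(name):
    """Split a column label into property, doping, field meaning, and unit."""
    match = COLUMN_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized column name: {name!r}")
    return {
        "property": match.group("prop"),
        "doping": DOPING_MEANING[match.group("doping")],
        "field": FIELD_MEANING[match.group("field")],
        "unit": match.group("unit"),
    }


seebeck_p = parse_column("Sᵉ.p.v [µV/K]")
kappa_n = parse_column("κₑᵉ.n.v [W/K/m/s]")
```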

Thanks @janosh and @tschaume. I will add these to the metadata at the same time that I add Ryan Kingsbury's updated expt_gaps and _formation_enthalpy datasets. The columns will be cast to the correct dtypes before uploading as well.

@janosh @tschaume I wound up using the carrier_transport_with_strucs.json.gz that @janosh referenced earlier. Unfortunately the file currently hosted on mpcontribs has a pesky data column which is not super easy to use, so the raw json.gz has been uploaded to figshare (https://figshare.com/articles/dataset/ricci_boltztrap_mp_tabular/14701110) in the meantime.

Notes for @janosh

The *_strucs.json.gz needed some minor adjustments.

  • "type" column name was changed to "functional", as "type" is ambiguous
  • all carrier concentrations at optimal values of S, kappa, PF, and conductivity were mis-parsed (e.g., 1e20 --> 120.0). They were easily corrected. Thought it might be important for you to know, in case you were doing calculations with these values, that the carrier concentrations read as, say, 115 cm^-3 instead of 1x10^15 cm^-3
  • mpid label was added to index
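For readers holding an older copy of the file, the mis-parse described above can be undone along these lines. The thread does not say how the correction was actually made, so this is only a sketch. It assumes the "e" was dropped from a string with a one-digit mantissa (1e20 read as 120.0, 1e15 read as 115.0), and `repair_concentration` is a hypothetical helper.

```python
def repair_concentration(misparsed):
    """Rebuild a value such as 1e20 from its mis-parsed form 120.0.

    Assumption: the original string had a one-digit mantissa and lost its
    'e', so the first digit is the mantissa and the rest is the exponent.
    """
    if misparsed <= 0:
        raise ValueError(f"expected a positive value, got {misparsed!r}")
    digits = str(int(misparsed))
    if len(digits) < 2:
        raise ValueError(f"no exponent digits in {misparsed!r}")
    mantissa, exponent = digits[0], digits[1:]
    return float(f"{mantissa}e{exponent}")


fixed_a = repair_concentration(120.0)  # the example given in the thread
fixed_b = repair_concentration(115.0)
```

Values that were parsed correctly in the first place must not go through this function, so it is worth checking the magnitude of each entry before applying it.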

Notable additions to metadata beyond what was in MPContribs:

  • description in metadata was expanded to provide more details
  • description of each column was expanded to comprehensively explain each one, as otherwise it can be kind of confusing if a user doesn't know exactly what the data is

Notes for @tschaume

If there are any major problems with hosting this data temporarily on figshare lmk and it will be removed immediately. Obviously the best scenario is if the matminer-compatible .json.gz is hosted on MPContribs. If there is no major problem keeping this file on Figshare in the interim it will remain there until MPContribs has a serviceable link to the matminer-compatible .json.gz. Let me know if/when that is done and I will update the matminer link.

all carrier concentrations at optimal values of S, kappa, PF, and conductivity were mis-parsed (e.g., 1e20 --> 120.0).

@ardunn Oops! I wasn't using those columns but very good thing you noticed. Thanks for making the data easily available through matminer! 😅