F. Ricci et al. Electronic Transport Properties available through load_dataset()?
janosh opened this issue
Is the MPContribs Electronic Transport dataset available via matminer?
This

```python
from matminer.datasets import get_available_datasets

get_available_datasets()
```

prints

```
['boltztrap_mp',
'brgoch_superhard_training',
'castelli_perovskites',
'citrine_thermal_conductivity',
'dielectric_constant',
'double_perovskites_gap',
'double_perovskites_gap_lumo',
'elastic_tensor_2015',
'expt_formation_enthalpy',
'expt_gap',
'flla',
'glass_binary',
'glass_binary_v2',
'glass_ternary_hipt',
'glass_ternary_landolt',
'heusler_magnetic',
'jarvis_dft_2d',
'jarvis_dft_3d',
'jarvis_ml_dft_training',
'm2ax',
'matbench_dielectric',
'matbench_expt_gap',
'matbench_expt_is_metal',
'matbench_glass',
'matbench_jdft2d',
'matbench_log_gvrh',
'matbench_log_kvrh',
'matbench_mp_e_form',
'matbench_mp_gap',
'matbench_mp_is_metal',
'matbench_perovskites',
'matbench_phonons',
'matbench_steels',
'mp_all_20181018',
'mp_nostruct_20181018',
'phonon_dielectric_mp',
'piezoelectric_tensor',
'steel_strength',
'wolverton_oxides']
```
So I'm guessing not? If so, I'm curious to know why.
Also, I'd like to suggest adding a short code block to each MPContribs detail page showing how to download the dataset. E.g.

Use `matminer` (`pip install matminer`) to download this dataset programmatically:

```python
from matminer.datasets import load_dataset

df = load_dataset("matbench_phonons")
```
Hey @janosh
Currently the full data is not available through matminer, though if @tschaume wants to make a matminer-loadable static .json.gz of it available, I'd be glad to add it to matminer.
There is an abbreviated version of it, `boltztrap_mp`, available in matminer (see https://hackingmaterials.lbl.gov/matminer/dataset_summary.html). The following columns are available:
@ardunn Thanks for the quick reply! Do you have any information on how the 8,924 entries were selected from the 44,333 listed in the full dataset at https://contribs.materialsproject.org/projects/carrier_transport?
@janosh @ardunn There's a JSON file for download now at https://contribs.materialsproject.org/projects/carrier_transport.json.gz (12.5MB). It reflects the format of the contributions as they go into the MPContribs API and does not include the temperature- and doping-level dependent tables. Happy to iterate if it isn't a suitable format to link up to matminer. FYI @fraricci
Thanks a lot @tschaume! 👍
I'm guessing that for addition to matminer, it should be in a format ready for data mining, so the target columns should probably be floats rather than dtype `object` (i.e. strings).
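A minimal pandas sketch of that conversion, assuming (hypothetically) that the exported target columns hold strings such as "123.4 µV/K", i.e. a number followed by a unit. The column name `S.p.v` and the values are made up for illustration; unparseable entries are turned into NaN rather than raising.

```python
import pandas as pd

# Hypothetical export: target values stored as strings (dtype object) with units.
df = pd.DataFrame({"S.p.v": ["123.4 µV/K", "98.7 µV/K", "n/a"]})


def to_float_column(col: pd.Series) -> pd.Series:
    """Keep the leading number of each string and convert it to float.

    Entries that cannot be parsed (e.g. "n/a") become NaN instead of raising.
    """
    numeric_part = col.astype(str).str.split().str[0]
    return pd.to_numeric(numeric_part, errors="coerce")


df["S.p.v"] = to_float_column(df["S.p.v"])
print(df.dtypes)  # S.p.v is now float64
```

Using `errors="coerce"` keeps the conversion from failing on a single bad entry; the resulting NaNs can then be counted or dropped explicitly.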
Here's a version of the dataset as we would use it with models like CGCNN: https://github.com/janosh/matbench/commit/df3831319599b9aa3768dd5f97fdac5ab94bdc37.
What's the meaning of `.v` in these columns?
Sᵉ.p.v [µV/K]
Sᵉ.n.v [µV/K]
σᵉ.p.v [1/Ω/m/s]
σᵉ.n.v [1/Ω/m/s]
PFᵉ.p.v [µW/cm/K²/s]
PFᵉ.n.v [µW/cm/K²/s]
κₑᵉ.p.v [W/K/m/s]
κₑᵉ.n.v [W/K/m/s]
Ah. From here:
Value (v), temperature (T), and doping level (c) at the maximum of the average eigenvalue of the Seebeck coefficient
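To make that definition concrete: the average eigenvalue of a 3×3 Seebeck tensor equals one third of its trace, and `v`, `T`, and `c` then record the value of that average at its maximum over the temperature/doping grid, plus where the maximum occurs. The NumPy sketch below uses made-up diagonal tensors on a hypothetical grid (`temperatures`, `dopings`, and `seebeck` are illustrative, not taken from the dataset).

```python
import numpy as np

# Hypothetical grid: 3 temperatures (K) x 2 doping levels (cm^-3).
temperatures = np.array([300.0, 600.0, 900.0])
dopings = np.array([1e18, 1e20])

# Made-up 3x3 Seebeck tensors (µV/K), one per (T, doping) point;
# here simply diagonal, with values that vary over the grid.
rng = np.random.default_rng(0)
diag = rng.uniform(50.0, 300.0, size=(len(temperatures), len(dopings), 3))
seebeck = np.zeros((len(temperatures), len(dopings), 3, 3))
for i in range(3):
    seebeck[..., i, i] = diag[..., i]

# Average eigenvalue at every grid point (real part; it equals trace / 3).
avg_eig = np.linalg.eigvals(seebeck).mean(axis=-1).real

# Locate the maximum and report value (v), temperature (T), doping (c).
i_T, i_c = np.unravel_index(np.argmax(avg_eig), avg_eig.shape)
v, T, c = avg_eig[i_T, i_c], temperatures[i_T], dopings[i_c]
print(f"v = {v:.1f} µV/K at T = {T:.0f} K, c = {c:.0e} cm^-3")
```

Averaging eigenvalues rather than picking one tensor component makes the figure of merit independent of the crystal axes the tensor is expressed in.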
@janosh @tschaume I wound up using the `carrier_transport_with_strucs.json.gz` that @janosh referenced earlier. Unfortunately, the file currently hosted on MPContribs has a pesky `data` column which is not super easy to use, so the raw json.gz has been uploaded to figshare (https://figshare.com/articles/dataset/ricci_boltztrap_mp_tabular/14701110) in the meantime.
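One way to tame a nested `data` column like that is `pandas.json_normalize`, which flattens nested dicts into dotted column names (similar in spirit to the `.p.v` naming above). A minimal sketch, assuming (hypothetically) that each contribution in the `.json.gz` is a record with an `identifier`, a `formula`, and a nested `data` dict; the field names and numbers are illustrative, not the real MPContribs schema.

```python
import gzip
import json
import os
import tempfile

import pandas as pd

# Illustrative records shaped like (hypothetical) MPContribs contributions.
records = [
    {"identifier": "mp-1", "formula": "Si",
     "data": {"S": {"p": {"v": 250.0}, "n": {"v": -240.0}}}},
    {"identifier": "mp-2", "formula": "GaAs",
     "data": {"S": {"p": {"v": 310.0}, "n": {"v": -295.0}}}},
]

# Write them to a temporary .json.gz so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "contributions.json.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    json.dump(records, f)

# Read the gzipped JSON back and flatten the nested "data" dicts
# into columns such as "data.S.p.v", indexed by the material id.
with gzip.open(path, "rt", encoding="utf-8") as f:
    raw = json.load(f)
df = pd.json_normalize(raw).set_index("identifier")
print(df.columns.tolist())
```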
Notes for @janosh
The `*_strucs.json.gz` needed some minor adjustments:
- "type" column name was changed to "functional", as "type" is ambiguous
- all carrier concentrations at optimal values of S, kappa, PF, and conductivity were mis-parsed (e.g., 1e20 --> 120.0). They were easily corrected. I thought it might be important for you to know in case you were doing calculations with these values: the carrier concentrations would have been 115 cm^-3 instead of, say, 1x10^15 cm^-3.
- the mpid label was added to the index
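The adjustments above can be sketched in pandas as follows. This is a minimal sketch under stated assumptions, not the code actually used: the column names (`identifier`, `type`, `S.p.c`) are hypothetical, and the concentration fix assumes each mis-parsed value came from a single-digit mantissa, so that the dropped `e` turned e.g. `1e20` into `120.0`. Values like `1.5e20` could not be recovered this way.

```python
import pandas as pd


def fix_misparsed_concentration(x: float) -> float:
    """Undo a parse that dropped the 'e', e.g. 120.0 -> 1e20, 115.0 -> 1e15.

    Assumes the original value was <digit>e<exponent> (single-digit mantissa).
    NaN and values with fewer than two digits are returned unchanged.
    """
    if pd.isna(x):
        return x
    sign = "-" if x < 0 else ""
    digits = str(int(abs(x)))  # e.g. 120.0 -> "120"
    if len(digits) < 2:
        return x
    mantissa, exponent = digits[0], digits[1:]
    return float(f"{sign}{mantissa}e{exponent}")


def clean_transport_df(df: pd.DataFrame, conc_cols: list[str]) -> pd.DataFrame:
    out = df.rename(columns={"type": "functional"})  # "type" is ambiguous
    for col in conc_cols:
        out[col] = out[col].map(fix_misparsed_concentration)
    return out.set_index("identifier")  # mp-id label as the index


# Hypothetical raw rows with mis-parsed concentrations.
raw = pd.DataFrame({
    "identifier": ["mp-1", "mp-2"],
    "type": ["GGA", "GGA+U"],
    "S.p.c": [120.0, 115.0],
})
clean = clean_transport_df(raw, conc_cols=["S.p.c"])
print(clean)
```

Rebuilding the number from a string (`float("1e20")`) rather than multiplying by a power of ten keeps the result identical to the value a correct parser would have produced.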
Notable additions to metadata beyond what was in MPContribs:
- description in metadata was expanded to provide more details
- description of each column was expanded to comprehensively explain each one, as otherwise it can be kind of confusing if a user doesn't know exactly what the data is
Notes for @tschaume
If there are any major problems with hosting this data temporarily on figshare, let me know and it will be removed immediately. Obviously the best scenario is for the matminer-compatible .json.gz to be hosted on MPContribs. If there is no major problem with keeping the file on figshare in the interim, it will remain there until MPContribs has a serviceable link to the matminer-compatible .json.gz. Let me know if/when that is done and I will update the matminer link.
> all carrier concentrations at optimal values of S, kappa, PF, and conductivity were mis-parsed (e.g., 1e20 --> 120.0).
@ardunn Oops! I wasn't using those columns, but it's a very good thing you noticed. Thanks for making the data easily available through matminer! 😅