materialsproject / pymatgen

Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project.

Home Page: https://pymatgen.org


Changing MP Input Sets

matthewkuner opened this issue

There have been internal talks with @munrojm @janosh @esoteric-ephemera, among others, about changing the MP Input Sets. We want to open the discussion up before moving forward with such changes.

Currently, we are in favor of the following changes:

For all MP sets:

  • Set LMAXMIX = 6 for all structures. This is based on a benchmarking study @esoteric-ephemera performed (attached below), which showed that the VASP manual's recommendations for setting LMAXMIX based on the element block are not sufficient in all cases.
  • Set LREAL = False, because the current LREAL = Auto seems to be less reliable for structures further from equilibrium (regardless of the number of sites). Data supporting this are also in the attached benchmark by @esoteric-ephemera.
  • Replace EDIFF_PER_ATOM with a flat EDIFF value. EDIFF = 1e-5 is a reasonable starting point for discussion.
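
For concreteness, the three overrides above can be written down as a plain dictionary and rendered as INCAR lines. This is only a sketch in standard-library Python, not pymatgen code; the names `PROPOSED_OVERRIDES`, `format_incar_value`, and `render_incar` are hypothetical and exist only for illustration.

```python
# Proposed overrides for all MP sets, taken from the list above.
PROPOSED_OVERRIDES = {
    "LMAXMIX": 6,
    "LREAL": False,
    "EDIFF": 1e-5,
}


def format_incar_value(value):
    """Render a Python value in the form VASP expects in an INCAR file."""
    # bool must be checked first because bool is a subclass of int.
    if isinstance(value, bool):
        return ".TRUE." if value else ".FALSE."
    return str(value)


def render_incar(settings):
    """Return INCAR text with one 'TAG = value' line per setting."""
    return "\n".join(
        f"{tag} = {format_incar_value(value)}" for tag, value in settings.items()
    )


incar_text = render_incar(PROPOSED_OVERRIDES)
print(incar_text)
```

With pymatgen installed, the same dictionary could presumably be tried out today by passing it as `user_incar_settings` when constructing an input set such as `MPRelaxSet`, without waiting for the defaults to change.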

For MPRelaxSet (and the MPMetalRelaxSet):

  • Use a force-based convergence criterion. EDIFFG = -0.05 is standard. We could also use -0.02 to match the value from the MPScanRelaxSet.
  • Include additional functionality similar to the MPScanRelaxSet, wherein certain parameters are set according to the bandgap, if known (see the "if self.bandgap == 0:" branch in that set). This would likely involve merging the MPMetalRelaxSet and MPRelaxSet.
  • Change the ENCUT and ENAUG of MPRelaxSet to match the r2SCAN values of 680 and 1360, respectively.

Here is the current draft of @esoteric-ephemera's benchmarking study: bench_vasp_pars.docx.

@shyuep see the final section of the attached doc regarding LREAL (which we discussed earlier today)

@computron @Andrew-S-Rosen @JaGeo @utf would love to hear your thoughts as well!

Thanks. I have no problems with most of this, but I would point out a few issues.

The MPRelaxSet is generally used as the first calculation for a new structure. So the assumptions are (a) we do not know whether it is a metal or not, and (b) overly strict criteria result in extremely long ionic convergence. Hence, I would still argue for keeping EDIFF_PER_ATOM, no EDIFFG, and an assumption of an insulator (unless someone wants to code a rough estimator, e.g., any compound containing only metallic elements is considered a metal and everything else is assumed to be an insulator. Alternatively, we can use one of the ML band gap models to make an initial estimate of the band gap). I have done relaxations before, and my tests show that a loose first relaxation followed by a strict second relaxation is much faster than, and yields results just as accurate as, immediately doing a strict relaxation.
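
The "rough estimator" mentioned in the parenthesis could look something like the sketch below. It is only an illustration: `METALLIC_ELEMENTS` is a deliberately incomplete, hand-picked set, the formula parser handles only simple formulas, and neither `elements_in_formula` nor `guess_is_metal` is an existing pymatgen function.

```python
import re

# Illustrative and deliberately incomplete set of metallic elements (assumption).
METALLIC_ELEMENTS = {
    "Li", "Na", "K", "Mg", "Ca", "Al", "Ti", "Fe",
    "Co", "Ni", "Cu", "Zn", "Ag", "Au", "Pt",
}


def elements_in_formula(formula):
    """Extract element symbols from a simple formula such as 'Fe2O3'."""
    return set(re.findall(r"[A-Z][a-z]?", formula))


def guess_is_metal(formula):
    """Guess 'metal' only if every element in the formula is metallic."""
    elements = elements_in_formula(formula)
    return bool(elements) and elements <= METALLIC_ELEMENTS


for formula in ("Cu3Au", "Fe2O3", "NaCl"):
    print(formula, "->", "metal" if guess_is_metal(formula) else "insulator (assumed)")
```

A rule this crude will misclassify some compounds, which is presumably why the ML band gap models are offered as an alternative.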

For second relaxations, I believe the practice is actually to use EDIFF (no per atom), set EDIFFG and also determine ISMEAR based on band gap.

One possible compromise is to have a "strict" mode for MPRelaxSet, which defaults to False. People who want to immediately start with stricter criteria can just set it to True and EDIFF, EDIFFG will automatically be set.
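
A minimal sketch of what such a "strict" switch might do is given below. The function name and signature are hypothetical (this is not the MPRelaxSet API). The strict values EDIFF = 1e-5 and EDIFFG = -0.05 are the ones floated in the opening post, and EDIFF_PER_ATOM = 5e-5 is an assumed current default.

```python
def relax_convergence_settings(
    nsites, strict=False, ediff_per_atom=5e-5, ediff=1e-5, ediffg=-0.05
):
    """Return convergence-related INCAR settings for a relaxation.

    strict=False: loose first relaxation, EDIFF scales with the number of sites
                  and EDIFFG is left at the VASP default.
    strict=True:  flat EDIFF plus an explicit force-based EDIFFG.
    """
    if strict:
        return {"EDIFF": ediff, "EDIFFG": ediffg}
    return {"EDIFF": nsites * ediff_per_atom}


print(relax_convergence_settings(nsites=20))
print(relax_convergence_settings(nsites=20, strict=True))
```

Keeping the default at strict=False would leave existing behaviour unchanged for anyone who does not opt in.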

@JiQi535 @SophiaRuan I have changed LREAL to False in MatPESStaticSet.

> So the assumptions are (a) we do not know whether it is a metal or not

Couldn't we just use ISMEAR = 0, SIGMA = 0.05, and a higher KPOINTS in such cases? The first two could probably be done regardless of whether or not the "strict mode" stuff you are talking about is implemented, as they are the recommended settings by VASP when the bandgap is unknown.

> I have done relaxations before, and my tests show that a loose first relaxation followed by a strict second relaxation is much faster than, and yields results just as accurate as, immediately doing a strict relaxation.

Perhaps this should be built into an Atomate2 workflow rather than the pymatgen sets? I expect most users of pymatgen do not use a package like Atomate2, so they are probably more likely to just use an input set for a single calc (rather than doing a "loose" calc --> "strict" calc type of workflow, which I expect is more tedious to do without workflow orchestration).

Just from the times when I ran such computations by hand: I have always done a loose and then strict calculation. I therefore think it is a good idea. I mostly, however, only adapted kpoints and not EDIFF.

@matthewkuner Also, re the recommendations on ISMEAR, I think it should be clarified that you are recommending -5 based mostly on the discrepancy in band gaps? I completely agree that band gaps are better with ISMEAR=-5. But for energies, the only result shown is that ISMEAR=0 and ISMEAR=-5 differ in a small number of cases, and I have no idea which is the correct one.

And yes, the design of MPRelaxSet is for the first of the two relaxations as implemented in Custodian and Atomate2. The second relaxation overrides EDIFF. EDIFF_PER_ATOM has no real effect in small systems (which I would guess are the majority of structures out there).

I would also add that strict vs. loose has a smaller effect for PBE, of course. But in HSE calculations, for instance, a loose vs. strict relaxation can make a huge difference in terms of convergence (in some cases, whether a calculation even succeeds or not).

> @matthewkuner Also, re the recommendations on ISMEAR, I think it should be clarified that you are recommending -5 based mostly on the discrepancy in band gaps? I completely agree that band gaps are better with ISMEAR=-5. But for energies, the only result shown is that ISMEAR=0 and ISMEAR=-5 differ in a small number of cases, and I have no idea which is the correct one.

I'm advocating for using ISMEAR = 0 when we do not know the bandgap of a material (except for static calcs, where we always use ISMEAR=-5 for good energies/DOS). In subsequent relaxation calculations, ISMEAR could be set to -5 for non-metals, and to 1 or 2 for metals. The logic I'm advocating for already exists in the Atomate2 r2SCANRelaxSet that is going to be merged into pmg (see https://github.com/materialsproject/atomate2/blob/ecf80af781ee754f0f918a5ed90cf7d1247f62a5/src/atomate2/vasp/sets/base.py#L669)
*Note that I disagree with the choice of SIGMA used in the linked code for the case where the bandgap is unknown. I think it should be set to 0.05. I've been dragging my feet on doing the benchmarking, though.
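
The selection logic being advocated here can be sketched roughly as follows. This is not the Atomate2 or pymatgen implementation; `choose_smearing` is a hypothetical helper. The unknown-bandgap SIGMA of 0.05 is the value preferred in this comment, and the metal values (ISMEAR = 2, SIGMA = 0.2) are the ones suggested later in this thread.

```python
def choose_smearing(bandgap=None, unknown_sigma=0.05):
    """Pick ISMEAR (and SIGMA where relevant) for a relaxation.

    bandgap=None means the bandgap is not known yet (e.g. first relaxation).
    """
    if bandgap is None:
        # Unknown bandgap: Gaussian smearing with a small width.
        return {"ISMEAR": 0, "SIGMA": unknown_sigma}
    if bandgap == 0:
        # Metal: Methfessel-Paxton smearing.
        return {"ISMEAR": 2, "SIGMA": 0.2}
    # Non-metal: tetrahedron method with Bloechl corrections.
    return {"ISMEAR": -5}


print(choose_smearing())             # bandgap unknown
print(choose_smearing(bandgap=0.0))  # metal
print(choose_smearing(bandgap=1.2))  # non-metal
```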

OK. But I just want to note that ISMEAR=0 does not affect insulators negatively at all. So for MatPES (where we don't really care about DOS, bandgap, etc.), we are going to do ISMEAR=0 for everything.

> Hence, I would still argue for keeping EDIFF_PER_ATOM

This doesn't seem to matter too much if the r2SCAN workflow is used since the EDIFFG in the second step of the workflow will fix things up even for larger systems. But for a GGA workflow, is it indeed true that EDIFFG is always set explicitly (and EDIFF without per atom)? Just want to confirm that's indeed the case. As long as the per-atom business isn't being used in the final relax, I personally don't care what happens in terms of the first relax. (EDIFF_PER_ATOM is a horrible choice for the systems I typically study, which often have hundreds of atoms... but I am not MP 😅 ).

@esoteric-ephemera, could you confirm? Thanks!

@Andrew-S-Rosen for the GGA workflow, EDIFF_PER_ATOM is always set in each step of the WF (initial relax, second relax, static), and EDIFFG is never set explicitly (VASP defaults to EDIFFG = 10*EDIFF).
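
To see what this implies for larger cells, the effective tolerances can be worked out directly. The sketch below assumes that EDIFF is computed as NSITES * EDIFF_PER_ATOM and that EDIFF_PER_ATOM = 5e-5; both are assumptions about the current GGA sets rather than something stated in this thread, and `effective_tolerances` is a hypothetical helper.

```python
def effective_tolerances(nsites, ediff_per_atom=5e-5):
    """Return (EDIFF, implied EDIFFG) in eV for a cell with `nsites` sites."""
    ediff = nsites * ediff_per_atom
    # With EDIFFG unset, VASP uses EDIFFG = 10 * EDIFF, a positive value,
    # i.e. an energy-based (not force-based) ionic stopping criterion.
    ediffg = 10 * ediff
    return ediff, ediffg


for nsites in (2, 20, 200):
    ediff, ediffg = effective_tolerances(nsites)
    print(f"{nsites:4d} sites: EDIFF = {ediff:.1e} eV, default EDIFFG = {ediffg:.1e} eV")
```

Under these assumptions both tolerances grow linearly with cell size, which is the concern raised above for systems with hundreds of atoms.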

Per @shyuep's and @Andrew-S-Rosen's comments on EDIFF: we could do an initial coarse relax with EDIFF_PER_ATOM set, and a final tighter relax with EDIFF set. To balance this for larger structures, setting EDIFF = min( 1.e-4, NSITES * EDIFF_PER_ATOM) might be a better choice

I agree with @matthewkuner that setting ISMEAR = 0 with an appropriate SIGMA (we're discussing how to proceed with this) is best for the initial relax. If that returns a metal, we switch to ISMEAR = 2, SIGMA = 0.2, and if not, we stick with ISMEAR = 0

To see which value of ISMEAR is most correct, I can try to run some tests like those in the Jorgensen and Hart paper recommended by @computron. It will just be costly to do such a high density of k-points.

I agree with setting EDIFF = min(1e-4, NSITES * EDIFF_PER_ATOM)
(EDIT: I am now unsure as of Oct 26, 2023)

I think we need to be deliberate in the amount of benchmarking we do to avoid it sapping too much time and effort. At least for the pre-relax step, maybe we can pick a setting that everyone agrees is good enough without more benchmarking?

@rkingsbury regarding EDIFFG in the R2SCAN input set-- how was the value of 0.02 chosen? In your paper "Performance comparison of r2SCAN and SCAN metaGGA density functionals for solid materials via an automated, high-throughput computational workflow", I can't find justification for that specific value being used

Hi @matthewkuner I dug back through old notes and determined that we initially used EDIFFG=-0.05, and later decided to tighten the convergence to -0.02 for the 2nd relaxation (the r2SCAN one) in the workflow. My recollection at the time was that there was some precedent for these values, but I can't find what now. We tested different workflows with and without force-convergence (see Table A.1 in the SI of the paper), but we didn't do a systematic evaluation of different EDIFFG values. I know from some of the old emails I dug up that we did look at the forces on selected structures to sanity check our value, but that's all I can remember. @mkhorton do you recall any additional context about EDIFFG=-0.02?

@mkhorton just a follow-up ping for the above comment

> Per @shyuep's and @Andrew-S-Rosen's comments on EDIFF: we could do an initial coarse relax with EDIFF_PER_ATOM set, and a final tighter relax with EDIFF set. To balance this for larger structures, setting EDIFF = min( 1.e-4, NSITES * EDIFF_PER_ATOM) might be a better choice

@esoteric-ephemera but wouldn't min( 1.e-4, NSITES * EDIFF_PER_ATOM) just set EDIFF=1e-4 for all structures with 2 or more atoms? At that point, it would be better to just set a static EDIFF for all calculations.
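
A quick check of this point, assuming EDIFF_PER_ATOM = 5e-5 (an assumption, though it is the value for which the "2 or more atoms" remark holds); `capped_ediff` is a hypothetical helper, not pymatgen code:

```python
def capped_ediff(nsites, ediff_per_atom=5e-5, cap=1e-4):
    """EDIFF = min(cap, NSITES * EDIFF_PER_ATOM), as proposed in the thread."""
    return min(cap, nsites * ediff_per_atom)


for nsites in (1, 2, 5, 100):
    print(f"{nsites:3d} sites -> EDIFF = {capped_ediff(nsites):.1e}")
```

Only a single-site cell falls below the cap; every other cell gets EDIFF = 1e-4, so with these numbers the formula is effectively a flat EDIFF.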

Thanks for initiating/sharing the benchmarking document. I’m in favor of most of this but want to query the ENCUT — is this not very large for a GGA?

Re. EDIFFG, I do not recall. I think perhaps 0.01 was tried initially but was too difficult to converge.

I’d add the standard caveat that the older input set yaml should be retained somewhere for historical reference.

For EDIFF, if we’re adding force convergence, my hunch is that a decent EDIFF will also be required. I would abandon the EDIFF_PER_ATOM. However, this opinion should certainly be overruled by anyone who has data to support an alternative view.

> Thanks for initiating/sharing the benchmarking document. I’m in favor of most of this but want to query the ENCUT: is this not very large for a GGA?

The idea for changing ENCUT/ENAUG to match the r2SCAN set was floated offline (I forget by whom). We haven't conducted any benchmarking for this, though, so we will probably forgo this change unless anyone objects (or unless anyone performs this benchmarking).

Thanks for the info regarding EDIFFG! @mkhorton