openmm / spice-dataset

A collection of QM data for training potential functions

What quantities to compute

peastman opened this issue

What quantities do we want to compute and include in the dataset? Energies and forces are of course essential, but there are other things we could also include. A good principle is that if it's cheap to compute something, and if it might potentially be useful to someone, we might as well include it. Here is a list of quantities that Psi4 can compute: https://psicode.org/psi4manual/master/oeprop.html. Here are some to consider.

  • Partial charges. Psi4 can compute a few different types of partial charges. Would this be useful for people who want to train models to predict partial charges?
  • Dipoles. In the PhysNet paper, they predict molecular dipoles from their model during training and include them in the loss function. You could easily do the same thing with other models.
  • Electrostatic potential and/or field. I'm not sure if this would be useful to anyone, but it's a possibility.
  • Anything else?
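
To make the dipole bullet concrete: a model that predicts per-atom partial charges can produce a molecular dipole through the point-charge sum mu = sum_i q_i r_i, and that dipole can be compared with a reference value in the loss. The following is only a minimal NumPy sketch under that assumption. The function names, the weight, and the three-charge toy system are hypothetical and are not taken from the PhysNet code or from this dataset.

```python
import numpy as np

def point_charge_dipole(charges, positions):
    """Dipole of a set of point charges: mu = sum_i q_i * r_i."""
    charges = np.asarray(charges, dtype=float)
    positions = np.asarray(positions, dtype=float)
    return charges @ positions  # shape (3,)

def dipole_loss(pred_charges, positions, ref_dipole, weight=1.0):
    """Weighted squared error between predicted and reference dipole."""
    diff = point_charge_dipole(pred_charges, positions) - np.asarray(ref_dipole, dtype=float)
    return weight * float(diff @ diff)

# Toy, made-up example: three point charges that sum to zero.
positions = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0]])
charges = np.array([-0.8, 0.4, 0.4])
mu = point_charge_dipole(charges, positions)
```

In a real training loop this term would be added to the energy and force terms with its own weight; having reference dipoles in the dataset is what makes that possible.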

We should save the converged wavefunction.

  • If we have the wavefunction, we can relatively cheaply compute any additional electronic properties.
  • If we decide to recompute the dataset with a higher-accuracy method, the current wavefunction could be used as an initial guess to reduce the computational cost of the higher-accuracy method.

In the past there were problems saving the wavefunction with Psi4, but hopefully they are fixed in the latest release.

I computed benzene with wB97X-D/def2-TZVPPD and saved the wavefunction:

import psi4

psi4.set_memory('32 GB')

benzene = psi4.geometry("""
  H      1.2194     -0.1652      2.1600
  C      0.6825     -0.0924      1.2087
  C     -0.7075     -0.0352      1.1973
  H     -1.2644     -0.0630      2.1393
  C     -1.3898      0.0572     -0.0114
  H     -2.4836      0.1021     -0.0204
  C     -0.6824      0.0925     -1.2088
  H     -1.2194      0.1652     -2.1599
  C      0.7075      0.0352     -1.1973
  H      1.2641      0.0628     -2.1395
  C      1.3899     -0.0572      0.0114
  H      2.4836     -0.1022      0.0205
""")

energy, wfn = psi4.energy('wB97X-D/def2-TZVPPD', molecule=benzene, return_wfn=True)

wfn.to_file('benzene')

The wavefunction size is 8.1 MB.

I don't know for sure what QCArchive can handle, but I suspect that won't be practical. For a molecule that size, the coordinates and forces together take 288 bytes. Adding in a few other values and some metadata brings it up to around 1 KB. Storing the wavefunction increases the storage requirements by 3-4 orders of magnitude!
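
The arithmetic behind those numbers can be checked in a few lines. This is a rough sketch, assuming 32-bit floats (the assumption that reproduces the 288-byte figure) and decimal megabytes for the 8.1 MB file; the variable names are mine.

```python
import math

n_atoms = 12            # benzene, C6H6
bytes_per_value = 4     # assumption: 32-bit floats
# 3 Cartesian components per atom, for both coordinates and forces
coords_and_forces = n_atoms * 3 * 2 * bytes_per_value

record_bytes = 1_000        # "around 1 KB" per conformation, as estimated above
wavefunction_bytes = 8.1e6  # 8.1 MB file reported above (decimal MB assumed)

ratio = wavefunction_bytes / record_bytes
orders_of_magnitude = math.log10(ratio)
print(coords_and_forces, round(ratio), round(orders_of_magnitude, 1))
```

Under these assumptions the wavefunction is several thousand times larger than the rest of the record, which is where the "3-4 orders of magnitude" comes from.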

@jthorton and @pavankum will have to chime in with which properties are supported by QCEngine/QCFractal/QCArchive and can reasonably be captured.

Instead of the full wavefunction file, we can save just the orbital coefficients and eigenvalues, which are enough to compute most properties and to reconstruct the wavefunction. Here is a crude example of restarting from the orbital coefficients:

import psi4
import numpy as np

psi4.set_memory('32 GB')

benzene = psi4.geometry("""
  H      1.2194     -0.1652      2.1600
  C      0.6825     -0.0924      1.2087
  C     -0.7075     -0.0352      1.1973
  H     -1.2644     -0.0630      2.1393
  C     -1.3898      0.0572     -0.0114
  H     -2.4836      0.1021     -0.0204
  C     -0.6824      0.0925     -1.2088
  H     -1.2194      0.1652     -2.1599
  C      0.7075      0.0352     -1.1973
  H      1.2641      0.0628     -2.1395
  C      1.3899     -0.0572      0.0114
  H      2.4836     -0.1022      0.0205
""")

energy, wfn = psi4.energy('wB97X-D/def2-TZVPPD', molecule=benzene, return_wfn=True)

alpha_orb_coeffs = wfn.Ca().np
eigen_vals = wfn.epsilon_a().np
nalpha = wfn.nalpha()

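print("a and b densities same: ", wfn.same_a_b_dens())
print("a and b orbs same: ", wfn.same_a_b_orbs())

# Alpha density from the occupied orbitals: D = C_occ C_occ^T
Density = np.dot(alpha_orb_coeffs[:, :nalpha], alpha_orb_coeffs[:, :nalpha].T)
# Compare with a tolerance; == on floats gives an elementwise boolean matrix
print(np.allclose(Density, wfn.Da().np))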

# Changing orbitals to orbitals read from file (here, stored in variables)
psi4.core.clean()

new_scf, new_wfn = psi4.energy('hf/def2-tzvppd', molecule=benzene, return_wfn=True)
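# Check whether the HF orbitals match the wB97X-D ones before overwriting them
print(np.allclose(new_wfn.Ca().np, wfn.Ca().np))

# alpha and beta orbitals are the same for this closed-shell molecule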
new_wfn.Ca().np[:] = alpha_orb_coeffs
new_wfn.epsilon_a().np[:] = eigen_vals

new_wfn.Cb().np[:] = alpha_orb_coeffs
new_wfn.epsilon_b().np[:] = eigen_vals

# writing to the scratch file that psi4 reads if scf_guess was set to READ
my_file = new_wfn.get_scratch_filename(180) + '.npy'
new_wfn.to_file(my_file)

psi4.set_options({'guess': 'read'})
energy = psi4.energy('wb97x-d/def2-TZVPPD', molecule=benzene)

Maybe @jthorton has a more polished way to construct a new wfn object instead of replacing the orbital coefficients of another energy calculation. In any case, those orbitals and eigenvalues would be on the order of tens of kilobytes.
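
The density step in the script above (D = C_occ C_occ^T over the occupied alpha orbitals) can be sanity-checked without Psi4 on a toy problem. The sketch below assumes an orthonormal basis, i.e. an overlap matrix equal to the identity, which is a simplification: in a real atomic-orbital basis the idempotency condition is D S D = D. The matrix sizes and the random symmetric matrix standing in for a Fock matrix are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n_basis, n_occ = 6, 2  # hypothetical sizes

# Made-up symmetric matrix standing in for a Fock matrix in an orthonormal basis.
a = rng.standard_normal((n_basis, n_basis))
fock = 0.5 * (a + a.T)

# Eigenvalues play the role of orbital energies, columns of `coeffs` the orbitals.
eigenvalues, coeffs = np.linalg.eigh(fock)

# Density from the occupied block, as in the Psi4 script: D = C_occ C_occ^T.
c_occ = coeffs[:, :n_occ]
density = c_occ @ c_occ.T

# Compare floating-point matrices with a tolerance, not with ==.
is_idempotent = np.allclose(density @ density, density)
occupied_count = np.trace(density)
```

The trace of this density equals the number of occupied orbitals, so the occupied coefficients alone are enough to rebuild it; that is the sense in which the coefficients can stand in for the stored wavefunction.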

Some properties we would be interested in are Wiberg and Mayer bond indices, along with the dipole and quadrupole moments (dipoles are already listed above). ESPs can be built from the orbital coefficients after we reconstruct the wavefunction.

Saving the coefficients isn't a substitute for also computing and storing useful quantities. Even if it only took 1 second to recompute them for each conformation, it would still take weeks for the entire dataset. How about including the following?

DIPOLE
QUADRUPOLE
WIBERG_LOWDIN_INDICES
MAYER_INDICES
MBIS_CHARGES
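
For reference, here is a hedged sketch of how that list might be requested from Psi4 in one call. It assumes a working Psi4 installation and a converged wavefunction from `psi4.energy(..., return_wfn=True)`. The helper name `compute_properties` is mine, and the names under which results are stored vary between Psi4 versions, so the sketch returns all wavefunction variables rather than picking specific keys.

```python
# Properties proposed in the list above, as Psi4 oeprop keywords.
PROPERTIES = (
    "DIPOLE",
    "QUADRUPOLE",
    "WIBERG_LOWDIN_INDICES",
    "MAYER_INDICES",
    "MBIS_CHARGES",
)

def compute_properties(wfn, properties=PROPERTIES):
    """Run oeprop on a converged wavefunction and return its stored variables.

    Psi4 is imported lazily so the property list can be inspected without it.
    """
    import psi4
    psi4.oeprop(wfn, *properties)
    # Results are attached to the wavefunction; return everything and let
    # the caller select the entries it needs.
    return wfn.variables()
```

Usage would look like `energy, wfn = psi4.energy('wB97X-D/def2-TZVPPD', molecule=benzene, return_wfn=True)` followed by `props = compute_properties(wfn)`.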

Psi4 also supports Distributed Multipole Analysis, which is another way of computing atomic charges and multipoles. I don't know how it compares to MBIS.

Closing since version 1 is now released.