Less-standard molecular modelling methods, combining rules, and OpenMM nonbonded forces

Almost every molecular modelling software will have some API to support harmonic bonds, harmonic angles, Lennard-Jones nonbonded interactions, and Coulombic nonbonded interactions. Pretty widely implemented, but maybe not as commonly implemented, periodic (CHARMM) dihedrals and Ryckaert-Bellemans/OPLS dihedrals. In a highly developmental field like molecular modelling, scientists will often come up with unique functional forms to express some sort of interaction. These unique functional forms could be hard to incorporate into traditional molecular simulation software; how do you incorporate them into middle-men molecular modelling API in a manner that is quick (quick to merge a PR) and reliable (just because the intermediate API can interpret a particular type of molecular model, how can you verify it converts well to another engine)? How do we make molecular modelling packages "future-proof" in this regard?

I am going to present the Lennard-Jones potential to introduce some nuances in molecular simulation. This is a pair potential, an interaction between two particles. For every pair, there is a $\sigma$ and $\epsilon$. In a molecular model, an atom type will prescribe a particular $\sigma$ and $\epsilon$ combination. In a system with a single atom type (thus only one $\sigma$ and $\epsilon$)

$ V_{LJ} = 4 \epsilon [ (\frac{\sigma}{r})^{12} - (\frac{\sigma}{r})^6]$

Combining (mixing) rules refer to computing unknown Lennard-Jones (LJ) sigma and epsilon values from LJ sigma and epsilon values you already know. If you imagine a system with N different atomtypes, that's N (sigmas, epsilons) for each atomtype. Now, those sigmas and epsilons will tell you how atomtype "I" interacts with another atomtype "I", but what about how atomtype "I" interacts with different atomtype "J"? If you imagine a matrix, these cross-interactions are also known as off-diagonal elements, if you imagine sigmas/epsilons as an NxN matrix, where the diagonal elements are the self-LJ interactions.

UNLIKE harmonic bonds, where you write k (maybe account for a factor of two) and r_eq down in some way, shape, or form across nearly all molecular simulation engines, combining rules are handled vastly differently. In gromacs, this is a an integer corresponding to the comb-rule flag in the topology file. In parmed, this is a string in structure.combining_rule. In lammps, this is an input line in your run file. In openmm, this is a CustomNonbondedForce, with modifications to the NonbondedForce to prevent "double accounting". In hoomd, I haven't even figured out the API/line to implement mixing rules, I've had to unwrap, calculate, and set every single cross interaction

Lorentz-Berthelot mixing rules (generally AMBER and CHARMM force fields):

$\sigma_{ij} = \frac{\sigma_i + \sigma_j}{2}$

$\epsilon_{ij} = \sqrt{\epsilon_i * \epsilon_j}$

Geometric mixing rules (generally OPLS):

$\sigma_{ij} = \sqrt{\sigma_i * \sigma_j}$

$\epsilon_{ij} = \sqrt{\epsilon_i * \epsilon_j}$

There are other mixing rules, but these are two of the most common. These result in small numerical differences, but depending on your field within molecular simulation, this could be a big deal. For estimating bulk properties like diffusion or density, they probably won't make a huge difference, but something as sensitive as free energy calculations may be impacted by the slight difference in mixing rules.

import parmed as pmd
import simtk.openmm as openmm

This is a gromacs file for a box of Lennard-Jones particles, with comb-rule=3 (geometric mixing rules)

!head lj3_bulk.top
[ defaults ]
; nbfunc	comb-rule	gen-pairs	fudgeLJ	fudgeQQ
1		3		yes		0.5	0.5

[ atomtypes ]
LJP   LJP  18   10.0000     0.000       A    4.00000e-01  1.0000 ; 

[ moleculetype ]
; Name            nrexcl
Lennard-Jones                 1

We turn to our handy-dandy ParmEd package to turn this file into a data object

structure = pmd.load_file('lj3_bulk.top', xyz='lj3_bulk.gro')
structure
<GromacsTopologyFile 400 atoms; 400 residues; 0 bonds; PBC (orthogonal); NOT parametrized>

Every atom in this system has the same atom type, this corresponds to the [atomtypes] directive in the gromacs file (the last two floats are the sigma (nm) and epsilon (kJ/mol)). ParmEd processed the sigmas and epsilons of the atomtypes just fine, and properly converted them to Angstrom and kcal/mol

structure.atoms[0].sigma, structure.atoms[0].epsilon
(4.0, 0.2390057361376673)

Combining rule was processed, here it is as a string

structure.combining_rule
'geometric'

Let's see how well this translates to OpenMM

system = structure.createSystem()
system
<simtk.openmm.openmm.System; proxy of <Swig Object of type 'OpenMM::System *' at 0x105f07c30> >

Identifying the forces within the openmm.System, we see there is the openmm.NonbondedForce, which is to be expected if we have a system of particles whose sole interaction is via Lennard-Jones. The openmm.CMMotionRemover is not going to impact the energy evaluation of the model, but just used to prevent flying ice cube effects in a molecular simulation.

But what about the openmm.CustomNonbondedForce? In general, openmm.CustomXYZForce objects are openmm's way of accounting for any new potential interactions (at least, for bonds, angles, generalized Born, hydrogen bond, torsion, and nonbonded forces). There's some incredible work done to make sure these custom force objects are easy-to-create for users, but still fast to compute on GPUs.

system.getForces()
[<simtk.openmm.openmm.NonbondedForce; proxy of <Swig Object of type 'OpenMM::NonbondedForce *' at 0x1070d8cf0> >,
 <simtk.openmm.openmm.CustomNonbondedForce; proxy of <Swig Object of type 'OpenMM::CustomNonbondedForce *' at 0x1070d8f30> >,
 <simtk.openmm.openmm.CMMotionRemover; proxy of <Swig Object of type 'OpenMM::CMMotionRemover *' at 0x1070d8db0> >]

OpenMM NonbondedForces

Before we look at the openmm.CustomNonbondedForce, let's look at the openmm.NonbondedForce. The openmm.NonbondedForce encompasses Lennard-Jones and Coulombic interactions. Not only that, it handles the treatment of these interactions (Long range treatments: cutoff, switching functions, PME, reaction field, and LJPME; Bonded exceptions: 1-2 exceptions, 1-3 exceptions, 1-4 exceptions and their scale factors). Note: if I refer to a "nonbonded interaction", I am referring to the Coulombic (QQ) + Lennard-Jones (LJ) interactions.

I will avoid talking about long range treatments, but that is a very "sensitive" subject - molecular simulations can be very touchy to your long range treatment, every engine might do it differently, and many force fields prescribe different long range treatments, do your reading!

We will cover bonded exceptions as it pertains to building a molecular model. 1-2 interactions (nonbonded interactions between two particles that are directly bonded) are generally ignored by most force fields, and the only interaction is the bond interaction. The 1-2 exceptions mean the nonbonded interaction is neglected between two bonded atoms. 1-3 exceptions (two particles separated by a middle particle, think the H-H nonbonded interaction in H-O-H) are also generally ignored by most force fields. 1-4 exceptions are different; the Coulombic interactions can be scaled by an amount [0.0, 1.0), while the Lennard-Jones interactions can be scaled by another amount [0.0, 1.0). These scalings can be called fudge factors, 14 scalings, or something else, but be sure to follow the correct 14 scalings as prescribed by your force field. More on that later

Returning to the openmm.NonbondedForce, I have links to their documentation below. With the NonbondedForce, you create the Force object, but then you have to start specifying the entities for which the force will apply in addition to the parameters by which the force will behave. An example: you have to call NonbondedForce.addParticle(charge, sigma, epsilon) to specify the QQ and LJ parameters. If you call addParticle for the i-th time, you will be specifying nonbonded parameters for the i-th particle. So you will have to add nonbonded parameters sequentially, in order with how your chemical system is ordered. Later, you can call Nonbondedforce.setParticleParameters(index, charge, sigma, epsilon) to specify parameters for the i-th particle. Cross interactions are never specified as they are assumed to follow Lorentz-Berthelot (we will see how to get around this later).

There are functions for Nonbondedforce.addException(int particle1, int particle2, double chargeProd, double sigma, double epsilon, replace=False) to specify particle pairs who are exceptions to the rules - this will add an exception for a specific particle pair. To generally/globally add exceptions across your whole molecular model, you can call Nonbondedforce.createExceptsionFromBonds((int,int) bonds, coulomb14scale, lj14scale). This is more convenient as the 1,4 exceptions are universal for a given molecular model under a particular force field.

To see how parmed translates parmed nonbonded parameters into openmm nonbonded parameters go to their createSystem method. Note that their exceptions will specify a sigma of 0.5, but that doesn't matter since the chargeproduct and epsilon are 0, thereby nullifying both QQ and LJ interactions.

In our LJ particle system, we can get the particle parameters of the openmm.NonbondedForce. We are looking at the charge, sigma, and epsilon.

But wait - why are the charge and epsilon 0? The parmed.Stucture clearly had atomtypes specified, as did the gromacs files.

Are there 1,4 exceptions? Technically, there shouldn't be any because this LJ particle system has no bonds, so there are no 1,4 (or 1,2 or 1,3) interactions.

system.getForce(0).getParticleParameters(1)
[Quantity(value=0.0, unit=elementary charge),
 Quantity(value=0.5, unit=nanometer),
 Quantity(value=0.0, unit=kilojoule/mole)]

Hm, no 1,4 exceptions here...

system.getForce(0).getNumExceptions()
0

Let's look at the openmm.CustomNonbondedForce. There are 400 particles in our system, and this force has 400 particles, so every particle is interacting via this nonbonded force, and there are 2 parameters for each particle.

print(system.getForce(1).getNumParticles())
print(system.getForce(1).getNumPerParticleParameters())
400
2

We can look at the actual energy function. This gives some hint into how a user could encode multiple-functions into a single string. In this string, the first function looks like a LJ potential, the second function explains what sigr6 is, the third function explains what sigr2 is, and the last function explains what sigc is - this last function is the geometric mixing rule for sigma! This is our Lennard Jones nonbonded force, but now following geometric mixing rules!

system.getForce(1).getEnergyFunction()
'epsilon1*epsilon2*(sigr6^2-sigr6); sigr6=sigr2*sigr2*sigr2; sigr2=(sigc/r)^2; sigc=sigma1*sigma2'

Quoting openmm documentation, "The names of per-particle parameters have the suffix “1” or “2” appended to them to indicate the values for the two interacting particles. As seen in the above example, the expression may also involve intermediate quantities that are defined following the main expression, using “;” as a separator."

So for these two-particle interactions, we can intelligently parse epsilon1 vs epsilon2 and sigma1 vs sigma2 as parameters for different particles, and also intelligently "join" multiple functions. I will observe that this is a combination of openmm's API and how parmed implemented geometric mixing rules for openmm.

In broad strokes of how parmed does it, the openmm.CustomNonbondedForce object is created, whose only argument is the string that is the multiple functions being employed. You specify the "perParticleParameters" that every particle should know. I'll ignore the nonbonded methods for now. Just like openmm.NonbondedForce, you add each particle sequentially (i.e. you add particle 1's parameters first, particle 3's parameters third, etc.). The parameters to add to each particle should be in the order you initially added them to the Force object - we call force.addPerParticleParameter('epsilon') and then force.addPerParticleParameter('sigma'), this means we should add particles' parameters as epsilon, then sigma. We iterate through each atom in our parmed.Structure, do a unit conversion on epsilon and sigma, then call force.addParticle((eps,sig)). Again, we have to add each particle in order, but we also have to add each parameter in order. Note: we did not have to do any manual code-calculation of the mixed-sigma or mixed-epsilon, this is handled in the openmm.CustomNonbondedForce object. At this point, we've specified everything we need for a LJ-geometric-mixing-rule-force.

For cleanup, we have to post-process and back-track through the original LJ force object (the openmm.NonbondedForce). This means keeping the original charges, but setting the LJ interactions to 0. The LJ (geometric-mixing) interactions are handled by the CustomNonbondedForce, the LJ (lorentz-mixing) interactions are set to 0 via epsilon in the NonbondedForce, and the QQ interactions are un-touched in the NonbondedForce. Lastly, any 1,4 exclusions in the NonbondedForce need to get replicated and carried over as 1,4 exclusions in the CustomNonbondedForce.

This looks like a lot of work for something fairly simple

Valid point. However, if you wanted to make a truly flexible engine that could accommodate some new, arbitrary molecular model feature, you'll have to do a lot of infrastructure/backend work to make sure this is done properly, reproducibly, cleanly, and done well. If you were just trying to make an engine that accommodated only the simplest molecular models, then that would be fairly simple to implement and the resultant API would be pretty simple, but you're no longer accounting for these "edge case" models.

Maybe this applies in general to software and data engineering, but flexibility will require complexity

Using this notebook

All notebooks within this webiste/repo can be found here