Molecular Modeling Software: ParmEd

17 minute read

Published:

ParmEd

ParmEd is an open-source python package for molecular modeling applications. In the description: “Cross-program parameter and topology file editor and molecular mechanical simulator engine.”

The description doesn’t quite do this package justice - ParmEd provides a data structure that represents a molecular model. Atomic names, coordinates, bonding neighbors, and ALL the interaction parameters you would use to model a chemical system and its underlying physics (at a molecular mechanics level, not quantum).

For a molecular modeler, this extremely important if we want to consistently and reliably build a model for a chemical system. Building these models ends up being a lot of bookkeeping (okay we’ll use this value for the A-B bond but this other value for the A-B non-bonded interaction), which becomes exponentially laborious as the sorts of chemical systems diversify from hundreds of copies of the same type of atom and parameters, to thousands of copies of a dozen different types and hundreds of different parameters, to beyond.

For a scientific software developer, this is a great package in its object-oriented design, extensive documentation, use of unit tests, open-source nature, and active user community; active user communities are hard in academia when everyone does everything slightly differently - if you can find a common ground that everyone will use and know, then you might hit a “critical threshold” for userbase that the open-source community may yield some incredible contributions.

For both groups, this is a great package because it facilitates interoperability between simulation engines/applications. The world of molecular modeling has developed many simulation packages designed for a particular purpose, focused more on depth than breadth of application. The algorithms and simulation implementations may differ, but the fundamental molecular model is the same.

The interconversion and interoperability issue

Here’s a gripe within the simulation community: everyone has a different file format

One simulation engine might write bond parameters as (bondtype, r0 in Angstrom, k in kcal/mol/A^2)

1, 10, 200

Another simulation engine might go with (atomtype1, atomtype2, k in kJ/mol/nm^2, r0 in nm)

1, 2, 41840, 1

For thousands of parameter lines similar to this, keeping track of units and proper syntax and overall formatting is a huge headache, and you’d think this sort of thing should be automated.

Another example, to express a bond as a harmonic potential, there are 2 ways to write the function

$V(r) = k * (r-r_0)^2$

$V(r) = (1/2) * k * (r-r_0)^2$

So your force constant, $k$, can be off by a factor 2 if you’re not bookkeeping properly.

In the grand scheme of interoperability, ParmEd processes these nuances between implementations upon translating to or from the parmed.Structure object

Let’s say there are N simulation engines. Converting between each engine could be as expensive as O($N^2$) convertes. You would have to read in engine A’s file format and convert it to the proper format for engine B. And then you’d need one for A-C and B-C, etc.

What if there were a middleman? Converting between each engine would be O($2N$). This middleman is parmed.Structure, this massive class that contains all these parameters and coordinates and everything you need in a molecular model. The conversion procedure would be A-Structure-B or C-Structure-B.

If you can store your model information into the parmed.Structure, you just need to write a routine that parses through the Structure and spits the information out according to the engine. If you have the information stored in an engine’s file format, you just need to write a routine that processes that information into parmed.Structure

Examining pmd.Structure

One popular use-case of parmed involves reading files to get a pmd.Structure. It’s fairly simple to go from some GROMACS files to pmd.Structure. In this case, we’ll be looking at ethane (C2H8)

import parmed as pmd
my_struc = pmd.load_file('ethane_periodic.top', xyz='ethane_periodic.gro')
my_struc
<GromacsTopologyFile 8 atoms; 1 residues; 7 bonds; PBC (orthogonal); parametrized>

In this one line, all the molecular model information about bond parameters, angle parameters, dihedral parameters, non-bonded parameters, etc. are all loaded in to my_struc.

Now we can start digging through the various properties within my_struc

my_struc is just a massive container for smaller containers that eventually hold numerical values. We can examine the bonds within my_struc

print(my_struc.bonds)
TrackedList([
	<Bond <Atom C [0]; In RES 0>--<Atom H [3]; In RES 0>; type=<BondType; k=340.000, req=1.090>>
	<Bond <Atom C [0]; In RES 0>--<Atom H [1]; In RES 0>; type=<BondType; k=340.000, req=1.090>>
	<Bond <Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>; type=<BondType; k=268.000, req=1.529>>
	<Bond <Atom H [2]; In RES 0>--<Atom C [0]; In RES 0>; type=<BondType; k=340.000, req=1.090>>
	<Bond <Atom C [4]; In RES 0>--<Atom H [5]; In RES 0>; type=<BondType; k=340.000, req=1.090>>
	<Bond <Atom H [6]; In RES 0>--<Atom C [4]; In RES 0>; type=<BondType; k=340.000, req=1.090>>
	<Bond <Atom H [7]; In RES 0>--<Atom C [4]; In RES 0>; type=<BondType; k=340.000, req=1.090>>
])

The string representations of these bonds is pretty helpful at a glance. We can see the two atoms participating in the bond, and the nature of their interactions (bondtypes) There are two different bondtypes, one for C-C and another for C-H.

Next, we can select out an individual bond and sift through the various information inside it

print(my_struc.bonds[0])
print(my_struc.bonds[0].type)
print(my_struc.bonds[0].type.k)
<Bond <Atom C [0]; In RES 0>--<Atom H [3]; In RES 0>; type=<BondType; k=340.000, req=1.090>>
<BondType; k=340.000, req=1.090>
339.99999999999994

We can look at the atoms similarly - we’ll get the coordinates of the first atom

print(my_struc.atoms[0])
print(my_struc.atoms[0].xx, my_struc.atoms[0].xy, my_struc.atoms[0].xz)
<Atom C [0]; In RES 0>
0.0 -1.4000000000000001 -0.0

Before going forward, I want to introduce the term atom type, which describes the identity of an atom.

If you had a system of just argon atoms, they are all chemically identical. There might be a bunch of argon atoms, but there would only be one atom type

In a more complicated example, let’s say you had a box of propane propane

Looking at the carbons, there might be 3 carbons, but 2 different atom types. The “outside” carbons (each with 3 hydrogens) moieties, are chemically different from the “middle” carbon (with 2 hydrogens).

When you look at bonds, the external C-H bond might be chemically different from the inside C-H bond (they probably aren’t very chemically different, but let’s assume they are). As a result, while they are both C-H bonds, they will have different bond types. This sort of delination appears for angles (3 atoms bonded linearly) and dihedrals (4 atoms bonded together either linearly) and improper dihedrals (3 atoms bonded to a central atom). You might have a lot of bonds, but many fewer bond types

Let’s look deeper at the atoms within my_struc, the atom type

print(my_struc.atoms[0].type)
opls_135

Not the most interesting or intuitive name, but it’s still unique enough for differenting atom types

Before we go further, I have to introduce another equation that models the non-bonded interactions in our system. For $N$ atoms, there are $N^2$ of these. And usually non-bonded interactions are broken into coulombic (think physics 2) interactions, but also your dispersion forces (which arise from temporary dipole interactions that may arise between two atoms). I will focus on the dispersion forces - most models for this force utilize the Lennard-Jones potential (though some people use Buckingham, Morse, Mie, etc).

$V(r) = 4\epsilon[(\frac{\sigma}{r})^{12} - (\frac{\sigma}{r})^6]$

$\sigma$ and $\epsilon$ are parameters of the model, $r$ is the interatomic distance between the two atoms in question

For the atom type opls_135, we can examine the associated sigma and epsilon parameters

print(my_struc.atoms[0].sigma, my_struc.atoms[0].epsilon)
35.0 0.066

For fun, we can visualize this potential over distance

import matplotlib
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
r = np.linspace(0.00001, 120, num=1000)
v = 4 * my_struc.atoms[0].epsilon * ((my_struc.atoms[0].sigma/r)**12 - (my_struc.atoms[0].sigma/r)**6)
fig, ax = plt.subplots(1,1)
ax.plot(r, v)
ax.set_ylim([-0.1,0.1])
ax.set_ylabel("V(r), kcal/mol")
ax.set_xlabel("r, $\AA$")
Text(0.5, 0, 'r, $\\AA$')

png

That is an introduction to the pmd.Structure - it is a MASSIVE container for all things related to a molecular model. This works well for molecular models with a lot of things in common. Now we will tackle the components of a molecular model are different.

Dihedrals

A dihedral refers to the interaction between 4 linearly connected atoms. This isn’t exactly chemistry, but rather an approximation to chemistry and things like steric interactions (think Newman projections back from organic chemistry).

There a variety of ways to model this interaction. For example:

  • Periodic torsions periodic

  • Ryckaert-Bellemans (RB) torsions rbtorsion

They are both trigonometric series, but there’s not much commonality between them - there’s not a simple underlying set of parameters/function between the two that you can easily relate. You’d have to generate some energies for one function, and then numerically fit the other. Since there’s no easy way for one function/set of parameters to express both of these functions, we have to handle them separately within pmd.Structure

periodic_torsions = pmd.load_file('ethane_periodic.top', xyz='ethane_periodic.gro')
rb = pmd.load_file('ethane_rbtorsion.top', xyz='ethane_rbtorsion.gro')

periodic_torsions is a molecular model for ethane using periodic torsions. rb is a molecular model for ethane using RB torsions.

All the different interaction models/classes are found in parmed/topologyobjects.py

But some dihedral bookkeeping in ParmEd: periodic torsions are stored in structure.dihedrals while RBtorsions are stored in structure.rb_torsions. Handling improper dihedrals is a bigger mess, but I want to emphasize how ParmEd handles these different functional forms: they are kept as separate objects and handled separately.

The next two snippets will look at periodic torsions within rb and periodic_torsions (or lack thereof)

periodic_torsions.dihedrals
TrackedList([
	<Dihedral; <Atom H [1]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [5]; In RES 0>; type=<DihedralType; phi_k=0.741, per=1, phase=179.909,  scee=2.000, scnb=1.000>>
	<Dihedral; <Atom H [1]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [6]; In RES 0>; type=<DihedralType; phi_k=0.741, per=1, phase=179.909,  scee=2.000, scnb=1.000>>
	<Dihedral; <Atom H [1]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [7]; In RES 0>; type=<DihedralType; phi_k=0.741, per=1, phase=179.909,  scee=2.000, scnb=1.000>>
	<Dihedral; <Atom H [2]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [5]; In RES 0>; type=<DihedralType; phi_k=0.741, per=1, phase=179.909,  scee=2.000, scnb=1.000>>
	<Dihedral; <Atom H [2]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [6]; In RES 0>; type=<DihedralType; phi_k=0.741, per=1, phase=179.909,  scee=2.000, scnb=1.000>>
	<Dihedral; <Atom H [2]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [7]; In RES 0>; type=<DihedralType; phi_k=0.741, per=1, phase=179.909,  scee=2.000, scnb=1.000>>
	<Dihedral; <Atom H [3]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [5]; In RES 0>; type=<DihedralType; phi_k=0.741, per=1, phase=179.909,  scee=2.000, scnb=1.000>>
	<Dihedral; <Atom H [3]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [6]; In RES 0>; type=<DihedralType; phi_k=0.741, per=1, phase=179.909,  scee=2.000, scnb=1.000>>
	<Dihedral; <Atom H [3]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [7]; In RES 0>; type=<DihedralType; phi_k=0.741, per=1, phase=179.909,  scee=2.000, scnb=1.000>>
])
rb.dihedrals
TrackedList([
])

The next two snippets will look at RB torsions within rb and periodic_torsions (or lack thereof)

periodic_torsions.rb_torsions
TrackedList([
])
rb.rb_torsions
TrackedList([
	<Dihedral; <Atom H [1]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [5]; In RES 0>; type=<RBTorsionType; c0=0.150; c1=0.450; c2=0.000; c3=-0.600; c4=0.000; c5=0.000; scee=2.0; scnb=1.0>>
	<Dihedral; <Atom H [1]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [6]; In RES 0>; type=<RBTorsionType; c0=0.150; c1=0.450; c2=0.000; c3=-0.600; c4=0.000; c5=0.000; scee=2.0; scnb=1.0>>
	<Dihedral; <Atom H [1]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [7]; In RES 0>; type=<RBTorsionType; c0=0.150; c1=0.450; c2=0.000; c3=-0.600; c4=0.000; c5=0.000; scee=2.0; scnb=1.0>>
	<Dihedral; <Atom H [2]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [5]; In RES 0>; type=<RBTorsionType; c0=0.150; c1=0.450; c2=0.000; c3=-0.600; c4=0.000; c5=0.000; scee=2.0; scnb=1.0>>
	<Dihedral; <Atom H [2]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [6]; In RES 0>; type=<RBTorsionType; c0=0.150; c1=0.450; c2=0.000; c3=-0.600; c4=0.000; c5=0.000; scee=2.0; scnb=1.0>>
	<Dihedral; <Atom H [2]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [7]; In RES 0>; type=<RBTorsionType; c0=0.150; c1=0.450; c2=0.000; c3=-0.600; c4=0.000; c5=0.000; scee=2.0; scnb=1.0>>
	<Dihedral; <Atom H [3]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [5]; In RES 0>; type=<RBTorsionType; c0=0.150; c1=0.450; c2=0.000; c3=-0.600; c4=0.000; c5=0.000; scee=2.0; scnb=1.0>>
	<Dihedral; <Atom H [3]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [6]; In RES 0>; type=<RBTorsionType; c0=0.150; c1=0.450; c2=0.000; c3=-0.600; c4=0.000; c5=0.000; scee=2.0; scnb=1.0>>
	<Dihedral; <Atom H [3]; In RES 0>--<Atom C [0]; In RES 0>--<Atom C [4]; In RES 0>--<Atom H [7]; In RES 0>; type=<RBTorsionType; c0=0.150; c1=0.450; c2=0.000; c3=-0.600; c4=0.000; c5=0.000; scee=2.0; scnb=1.0>>
])

In the parmed.Structure object, the different dihedral forms are held in different containers. Regardless, the parameters relevant to each dihedral object are still tracked

parmed.Structure conversion

Many like to use parmed as the middleman to convert from one engine’s file format to another engine’s file format. In this notebook, we’ve been using GROMACS files to model ethane, and now we’ll mess around with converting to files suitable for other engines.

my_struc = pmd.load_file('ethane_periodic.top', xyz='ethane_periodic.gro')
my_struc.save('ethane_periodic.prmtop')
my_struc.save('ethane_periodic.crd')

Another molecular simulation engine, AMBER, specifies the molecular model in prmtop (parametrized topology) and crd (coordinate) files. I confess I haven’t actually used AMBER (it’s very popular), but parmed has made it incredibly easy for a GROMACS-user to switch over (and use functionality that AMBER might provide)

!head ethane_periodic.prmtop
%VERSION  VERSION_STAMP = V0001.000  DATE = 08/01/19  17:37:44
%FLAG TITLE
%FORMAT(20a4)

%FLAG POINTERS
%FORMAT(10I8)
       8       2       6       1      12       0       9       0       0       0
      29       1       1       0       0       2       2       9       1       0
       0       0       0       0       0       0       0       1       8       0
       0
!head ethane_periodic.crd
* GENERATED BY PARMED (HTTPS://GITHUB.COM/PARMED/PARMED)
*
         8  EXT
         1         1  RES       C               0.0000000000       -1.4000000000       -0.0000000000  SYS       0               0.0000000000
         2         1  RES       H              -1.0700000000       -1.4000000000       -0.0000000000  SYS       0               0.0000000000
         3         1  RES       H               0.3600000000       -2.1700000000        0.6500000000  SYS       0               0.0000000000
         4         1  RES       H               0.3600000000       -1.5800000000       -0.9900000000  SYS       0               0.0000000000
         5         1  RES       C               0.0000000000        0.0000000000        0.0000000000  SYS       0               0.0000000000
         6         1  RES       H               1.0700000000        0.0000000000        0.0000000000  SYS       0               0.0000000000
         7         1  RES       H              -0.3600000000        0.7700000000        0.6500000000  SYS       0               0.0000000000

What we did with GROMACS files to pmd.Structure to AMBER files involed an IOstep to read GROMACS files into an in-memory representation, then an IO step to write AMBER files from the in-memory representation. These 2 simulation engines require these files to perform a simulation - what about a simulation engine that doesn’t require this IO step, and can we purely handle all this in-memory?

OpenMM is another simulation engine rooted in C++ with wrappers for python implementations and GPU support (CUDA and OpenCL). We can “interface” more tightly with the OpenMM codebase by doing all these molecular model conversions within python

my_omm_system = my_struc.createSystem()
print(type(my_omm_system))
my_omm_system
<class 'simtk.openmm.openmm.System'>

<simtk.openmm.openmm.System; proxy of <Swig Object of type 'OpenMM::System *' at 0x110c0ef00> >

In that one line, we’ve converted pmd.Structure into openmm.System, which gets us very close to performing a simulation within OpenMM without writing out an input file.

In my opinion, this adds some elegance to performing molecular simulations in that they can be well-encompassed in a python script (and not a python script that just wraps a bunch of shell commands)

Does this relate to other fields of technology?

Are there other applications of STEM where objects/entities need to be able to convert, communicate, or inter-operate between one another? ParmEd is a nice intermediate that allows different simulation engines to communicate information.

I recently set up a Google Home Mini in my apartment - how does the Google product know how to turn off all the different smart bulbs out there (Philips, Element, Sengled, Wyze, FluxSmart)? I’m sure there are engineers who have to work on this GoogleHome-bulb interoperability to make sure a command to Google Home will perform an action to another device.

What about parallel computing? The MPI standard defines routines and protocols to which any MPI library must adhere (OpenMPI, MPICH, etc). This defines a common ground so that an external program can call the same MPI functions and expect the same operation, but under the hood it’s a different MPI library performing the gather/recieve commands.

Summary

ParmEd is a digital, in-memory data structure that represents a molecular model. It often serves an intermediary and great bookkeeper to facilitate translating molecular model (parameters, topology, coordinates) information from one simulation’s file format (or data structure) to another simulation’s file format (or data structure). It supports a variety of functional forms for the molecular modelers out there, and it is well-designed for the scientific software developers out there. Any molecular modeler knows how difficult and laborious this sort of package is, so the contributors to ParmEd deserve a big thank you.