GROMACS

13 minute read

Published:

Notes on GROMACS molecular dynamics code

GROMACS Documentation Page

Beginner tutorials

Justin A. Lemkul, Ph.D., has produced a repository of beginner tutorials for GROMACS which can be accessed through mdtutorials.com/gmx/ or GitHub, with an open-access article explaining each tutorial in more depth

Dr. Lemkul describes the tutorials as

  1. Lysozyme in Water: The intent of this tutorial is to give new users a basic introduction into the tools used to prepare, run, and perform simple analysis on a “typical” system with GROMACS.
  2. KALP15 in DPPC: This tutorial is more advanced, and is designed for more experienced users who want to simulate membrane proteins and understand force field structure and modification.
  3. Umbrella Sampling: Also somewhat advanced, this tutorial is intended for users who wish to learn to use umbrella sampling to calculate the potential of mean force (PMF) along a single, linear degree of freedom.
  4. Biphasic Systems: The construction of a biphasic cyclohexane-water system.
  5. Protein-Ligand Complex: The fifth tutorial instructs the user on how to deal with a protein-ligand system, with a focus on proper ligand parametrization and topology handling.
  6. Free Energy of Solvation: This tutorial describes the procedure for carrying out a simple free energy calculation, the elimination of van der Waals interactions between a simple molecule (methane) and water. More complicated systems are discussed.
  7. Virtual Sites: This tutorial guides the user through manual construction of virtual sites for a very simple linear, triatomic molecule (CO2).

Topology .top and .itp files

Documentation and sample topology

The topology file includes the parameters defining the system, the content within the system, and additional restraints, descriptions, etc. which a user may wish to include. Molecular topologies are written in a .top file or can be referenced by an .itp within the .top

There is considerable nuance in how these files are made, so consult the documentation and read the error messages carefully

Structure files

Documentation and sample structure

Gist of the file, top to bottom:

  • Simulation title - String
  • Number of atoms - Integer
  • One line per atom in format: “%5d%-5s%5s%5d%8.3f%8.3f%8.3f%8.4f%8.4f%8.4f”
  • Box dimensions - Floats describing X, Y, and Z box dimensions, separated by spaces

From left to right, the format to describe each atom is (exactly quoting from the documentation)

  • residue number (5 positions, integer)
  • residue name (5 characters)
  • atom name (5 characters)
  • atom number (5 positions, integer)
  • position (in nm, x y z in 3 columns, each 8 positions with 3 decimal places)
  • velocity (in nm/ps (or km/s), x y z in 3 columns, each 8 positions with 4 decimal places)

OPLS force field

The Optimized Potential for Liquid Simulations (OPLS) is a set of parameters for describing proteins and organic molecules

Within the GROMACS installation, there is a folder oplsaa.ff, located at /share/gromacs/top/oplsaa.ff, which contains several useful files for simulations

  • spc, spce, tip3p, tip4pew, tip4p, tip5pe, tip5p .itp files - Several water models that can be used for solvating systems. Read more about these within the watermodels.dat file or in the GROMACS documentation
  • ions.itp - Topology of common ions (e.g., Na+, Cl-)
  • Organic solvent topology files - 1propanol.itp, ethanol.itp, methanol.itp
  • forcefield.doc - Citation for using the OPLS-AA topology files

Automatically generating a topology file

gmx pdb2gmx

Adds hydrogens and creates a topology file (.top) for a structure file (.gro or .pdb). Outputs are a GROMACS topology file and format structure file.

The forcefield and water model to be used can be defined interactively or as options with the command.

ex. gmx pdb2gmx -f protein.pdb -ff oplsaa -p topol.top -water spce -o protein_treated.gro

will add hydrogens to the protein and create a .gro structure file and a .top topology file, including the topology/forcefield to define the protein and water model (SPC/E in this example).

For pdb2gmx to correctly generate the topology file, the molecule called for in the structure file must be within the aminoacids.rtp file. More on that below.

pdb2gmx can also produce an index file with the -n flag.

Adding custom topologies to aminoacids.rtp

rtp file format

United-Atom (UA) vs. All-Atom (AA) simulation

Molecular topologies can described using all-atom (AA), united-atom (UA), or a combination of the two. All-atom topologies describe every atom individually (including hydrogens). United-atom will collapse the structural and chemical descriptions of some combination of atoms into a single point, typically near the original center of mass of the grouping

The OPLS force field makes working between AA, UA, and mixed fairly simple. All that is necessary to produce a UA topology for a functional group is to find the OPLS index in the oplsaa.ff folder

LigParGen topology generator

LigParGen is used to produce OPLS force field parameters for organic molecules (proteins, polymers, etc.)

The software may be used in a browser or by command-line on a local LigParGen server

Polymer simulation

General polymer topology description

Polyethylene example

Adding water to simulation

gmx solvate

ex. gmx solvate -cp lysozyme.gro -cs spc216.gro -p topol.top -o lysozyme_treated.gro -water spce -ff oplsaa

calls for the solvation of the “lysozyme.gro” simulation file with as many SPC/E (spc216.gro) water model molecules as will fit in the free space of the simulation box while using the OPLS-AA forcefield. It will additionally edit the “topol.top” topology file to add the optional restraints for the water model and define the number of molecules of that type of water in the system.

Convert trajectory files

gmx trjconv

From the user manual:

`gmx` `trjconv` can convert trajectory files in many ways:

* from one format to another
* select a subset of atoms
* change the periodicity representation
* keep multimeric molecules together
* center atoms in the box
* fit atoms to reference structure
* reduce the number of frames
* change the timestamps of the frames (-t0 and -timestep)
* cut the trajectory in small subtrajectories according to information in an index file. This allows subsequent analysis of the subtrajectories that could, for example, be the result of a cluster analysis. Use option -sub. This assumes that the entries in the index file are frame numbers and dumps each group in the index file to a separate trajectory file.
* select frames within a certain range of a quantity given in an .xvg file.

ex. gmx trjconv -f npt.trr -s npt.gro -o npt_trj.gro

will convert the trajectory described in the binary file “npt.trr” with the structure and masses defined in “npt.gro” to the output file “npt_trj.gro” which will contain the structure and masses at every timestep of the simulation. There is an option “-sep” which will separate each timestep into a unique file if set to “yes”.

Extract energy components from energy file

gmx energy

Mean-square displacement

gmx msd

Compute the mean-square displacement (\(MSD\)) of atoms or molecules from a set of initial positions

$$MSD\equiv<(\vec{x}(t)-\vec{x_0})^2>=\frac{1}{N}\sum_{i=1}^{N} |\vec{x}^{(i)}(t)-\vec{x}^{(i)}(0)|^2$$

where \(N\) is the total number of particles (atoms/molecules) in the selection, vectors \(\vec{x}^{i}(t)\) and \(\vec{x}^{(i)}(0)\) are the position of the \(i\)-th particle at time \(t\) and the reference position of the \(i\)-th particle.

The command will calculate a diffusion constant, \(D\), from \(MSD\) according to the Einstein relation

$$MSD=2Dt$$

where \(t\) is the simulation time used to calculate the \(MSD\) (GROMACS defaults to the middle 80% of the simulation time).

Velocity autocorrelation function

gmx velacc

gmx analyze

The Velocity Autocorrelation Function

Compute the velocity autocorrelation function of a set of atoms or molecules. This may be used with gmx analyze with the -integrate option selected to calculate the diffusion constant. This function automatically normalizes the autocorrelation, but this can be turned off with -normalize no

The velocity autocorrelation function, \(C_V(t)\), is

$$C_V(t)=\frac{1}{N}\sum_{i=1}^{N}(\vec{v_i}(0)\cdot\vec{v_i}(t))=<\vec{v_i}(0)\cdot\vec{v_i}(t)>$$

where \(N\) is the total number of particles (atoms/molecules in the selection) and \(\vec{v_i}\) is a vector storing the three components of the velocity (\(v_x\), \(v_y\), and \(v_z\)) for the \(i\)-th particle.

\(\vec{v_i}(0)=\vec{v_i}(t=t_0)\) and \(\vec{v_i}(t)=\vec{v_i}(t=t_0+n \Delta t)\), where \(n\) is the timestep and \(\Delta t\) is the timestep size

Given this function decays to zero at long time, the diffusion constant \(D\) may be found from the integral of \(C_V(T)\) as

$$D=\frac{1}{3}\int_{t=0}^{t=\infty}<\vec{v_i}(0)\cdot\vec{v_i}(t)>dt$$

Position restraints

GROMACS Restraints

‘Plus-Minus’ method for polymerization and bonding

Quoting the GROMACS documentation

GROMACS is well-suited for running simulations of polymeric materials.  One of the questions frequently asked on the gmx-users list is how to create a topology for a polymer containing many repeat units.  Probably the simplest way is to create .rtp entries for the desired force field, and use pdb2gmx to create the topology.  To do so requires (at least) three types of residues to be defined:

A "starting" residue, defining the "beginning" of the chain
The repeat (internal) residues
An "ending" residue, defining the "end" of the chain

with example rtp entry for the terminal and internal residues of polyethylene

; Polyethylene - this is an internal residue
[ Eth ]
  [ atoms ]
    C1    opls_136    -0.120    1
    H11   opls_140     0.060    1
    H12   opls_140     0.060    1
    C2    opls_136    -0.120    2
    H21   opls_140     0.060    2
    H22   opls_140     0.060    2
  [ bonds ]
    C1    -C2
    C1    H11
    C1    H12
    C1    C2
    C2    H21
    C2    H22
    C2    +C1

; Terminal PE residue ("beginning" of chain)
; designation arbitrary, C1 is -CH3
[ EthB ]
  [ atoms ]
    C1    opls_135    -0.180    1
    H11   opls_140     0.060    1
    H12   opls_140     0.060    1
    H13   opls_140     0.060    1
    C2    opls_136    -0.120    2
    H21   opls_140     0.060    2
    H22   opls_140     0.060    2
  [ bonds ]
    C1    H11
    C1    H12
    C1    H13
    C1    C2
    C2    H21
    C2    H22
    C2    +C1

; Terminal PE residue ("end" of chain)
; designation arbitrary, C2 is -CH3
[ EthE ]
  [ atoms ]
    C1    opls_136    -0.120    1
    H11   opls_140     0.060    1
    H12   opls_140     0.060    1
    C2    opls_135    -0.180    2
    H21   opls_140     0.060    2
    H22   opls_140     0.060    2
    H23   opls_140     0.060    2
  [ bonds ]
    C1    -C2
    C1    H11
    C1    H12
    C1    C2
    C2    H21
    C2    H22
    C2    H23

With these +/- definitions in the .rtp file, pdb2gmx is able to create a topology for the “bonded” molecule. If there are empty entries (bonds, angles, dihedrals) within the resulting topology, verify these are defined in the .rtp entries.

If attempting to use this method for a homogeneous material (e.g., a homopolymer), only one entry is necessary in the .rtp file. This entry should describe all the necessary parameters for the resulting bonds.

This method becomes complicated for materials where the bonding sites along the “backbone” are of different chemistries. For example, I attempted to construct a heteropolymer of carboxybetaine acrylamide with sulfobetaine methacrylate. To do so requires custom bonding definitions in the entries of the .rtp. With only two monomer kinds, this is annoying, but doable. Adding additional monomer kinds made the number of potential .rtp entries grow exponentially and became a nightmare of a programming problem where it was necessary to create new .rtp entires on the fly with python

Stick to homogeneous materials.

Create index

Custom index groups

gmx make_ndx

Create default index file

printf \“q\” | gmx make_ndx -f .gro -o index.ndx

Distance distribution against time

gmx distance

Use the gmx distance function to calculate the distance distribution against time between two selections in a system. This utility can calculate the average distance as a function of time (-oav), all distances as a function of time (-oall), the x-, y-, and z- components of the distance norm as a function of time (-oxyz), and more.

Before gmx distance can be executed, it could be useful to create the indices of the atom-pairs in an index.ndx file. The utility works by calculating the distance between two exact atoms (e.g., the oxygen of waters), so think about how the pairings should be made.

Root mean square deviation (RMSD)

gmx rms

gmx rmsdist

gmx rms and rmsdist will compute the root mean square deviation (RMSD) between two structures (can be the same molecule) over time. This will require structure and trajectory files to execute properly. rms performs a fitting of the RMSD between successive simulation timesteps, whereas rmsdist does not.

This can be useful for recognizing when a system is appropriately equilibrated during simulation. It is insufficient to evaluate equilibration by system potential energy and temperature/pressure, exclusively. However, some have argued this is also an insufficient method due to biases (see Knapp et al. 2011, DOI: 10.1089/cmb.2010.0237), so be sure to consider several metrics when determining equilibration status.

Extend simulation

Managing long simulations

gmx convert-tpr

Simulation time can be edited or extended with gmx convert-tpr. For example, to extend a simulation (e.g., NPT) by 1 ns, use the following command:

gmx convert-tpr -s NPT.tpr -extend 1000 -o NPT_extend_1ns.tpr

where NPT.tpr is the previous simulation run input file, 1000 is the length of time in ps to extend for, and NPT_extend_1ns.tpr is the new run input file. To use this new run input file, it is necessary to have the previous .log, .cpt, .trr, and .edr files within the same directory. Also, it is necessary to use the same naming convention as the previous files, so the -deffnm argument must be used. With these files and arguments, the simulation can be started with:

gmx mdrun -s NPT_extend_1ns.tpr -cpi NPT.cpt -deffnm NPT

References

GROMACS Reference Manual

GROMACS Forum & Mailing List

GROMACS Tutorials by Dr. Justin A. Lemkul