GROMACS
Published:
Notes on GROMACS molecular dynamics code
Beginner tutorials
Justin A. Lemkul, Ph.D., has produced a repository of beginner tutorials for GROMACS which can be accessed through mdtutorials.com/gmx/ or GitHub, with an open-access article explaining each tutorial in more depth
Dr. Lemkul describes the tutorials as
- Lysozyme in Water: The intent of this tutorial is to give new users a basic introduction into the tools used to prepare, run, and perform simple analysis on a “typical” system with GROMACS.
- KALP15 in DPPC: This tutorial is more advanced, and is designed for more experienced users who want to simulate membrane proteins and understand force field structure and modification.
- Umbrella Sampling: Also somewhat advanced, this tutorial is intended for users who wish to learn to use umbrella sampling to calculate the potential of mean force (PMF) along a single, linear degree of freedom.
- Biphasic Systems: The construction of a biphasic cyclohexane-water system.
- Protein-Ligand Complex: The fifth tutorial instructs the user on how to deal with a protein-ligand system, with a focus on proper ligand parametrization and topology handling.
- Free Energy of Solvation: This tutorial describes the procedure for carrying out a simple free energy calculation, the elimination of van der Waals interactions between a simple molecule (methane) and water. More complicated systems are discussed.
- Virtual Sites: This tutorial guides the user through manual construction of virtual sites for a very simple linear, triatomic molecule (CO2).
Topology .top
and .itp
files
Documentation and sample topology
The topology file includes the parameters defining the system, the content within the system, and additional restraints, descriptions, etc. which a user may wish to include. Molecular topologies are written in a .top
file or can be referenced by an .itp
within the .top
There is considerable nuance in how these files are made, so consult the documentation and read the error messages carefully
Structure files
Documentation and sample structure
Gist of the file, top to bottom:
- Simulation title - String
- Number of atoms - Integer
- One line per atom in format: “%5d%-5s%5s%5d%8.3f%8.3f%8.3f%8.4f%8.4f%8.4f”
- Box dimensions - Floats describing X, Y, and Z box dimensions, separated by spaces
From left to right, the format to describe each atom is (exactly quoting from the documentation)
- residue number (5 positions, integer)
- residue name (5 characters)
- atom name (5 characters)
- atom number (5 positions, integer)
- position (in nm, x y z in 3 columns, each 8 positions with 3 decimal places)
- velocity (in nm/ps (or km/s), x y z in 3 columns, each 8 positions with 4 decimal places)
OPLS force field
The Optimized Potential for Liquid Simulations (OPLS) is a set of parameters for describing proteins and organic molecules
Within the GROMACS installation, there is a folder oplsaa.ff
, located at /share/gromacs/top/oplsaa.ff
, which contains several useful files for simulations
- spc, spce, tip3p, tip4pew, tip4p, tip5pe, tip5p
.itp
files - Several water models that can be used for solvating systems. Read more about these within thewatermodels.dat
file or in the GROMACS documentation - ions.itp - Topology of common ions (e.g., Na+, Cl-)
- Organic solvent topology files - 1propanol.itp, ethanol.itp, methanol.itp
- forcefield.doc - Citation for using the OPLS-AA topology files
Automatically generating a topology file
Adds hydrogens and creates a topology file (.top) for a structure file (.gro or .pdb). Outputs are a GROMACS topology file and format structure file.
The forcefield and water model to be used can be defined interactively or as options with the command.
ex. gmx
pdb2gmx
-f protein.pdb -ff oplsaa -p topol.top -water spce -o protein_treated.gro
will add hydrogens to the protein and create a .gro structure file and a .top topology file, including the topology/forcefield to define the protein and water model (SPC/E in this example).
For pdb2gmx
to correctly generate the topology file, the molecule called for in the structure file must be within the aminoacids.rtp
file. More on that below.
pdb2gmx
can also produce an index file with the -n
flag.
Adding custom topologies to aminoacids.rtp
United-Atom (UA) vs. All-Atom (AA) simulation
Molecular topologies can described using all-atom (AA), united-atom (UA), or a combination of the two. All-atom topologies describe every atom individually (including hydrogens). United-atom will collapse the structural and chemical descriptions of some combination of atoms into a single point, typically near the original center of mass of the grouping
The OPLS force field makes working between AA, UA, and mixed fairly simple. All that is necessary to produce a UA topology for a functional group is to find the OPLS index in the oplsaa.ff
folder
LigParGen topology generator
LigParGen is used to produce OPLS force field parameters for organic molecules (proteins, polymers, etc.)
The software may be used in a browser or by command-line on a local LigParGen server
Polymer simulation
General polymer topology description
Adding water to simulation
ex. gmx
solvate
-cp lysozyme.gro -cs spc216.gro -p topol.top -o lysozyme_treated.gro -water spce -ff oplsaa
calls for the solvation of the “lysozyme.gro” simulation file with as many SPC/E (spc216.gro) water model molecules as will fit in the free space of the simulation box while using the OPLS-AA forcefield. It will additionally edit the “topol.top” topology file to add the optional restraints for the water model and define the number of molecules of that type of water in the system.
Convert trajectory files
From the user manual:
`gmx` `trjconv` can convert trajectory files in many ways:
* from one format to another
* select a subset of atoms
* change the periodicity representation
* keep multimeric molecules together
* center atoms in the box
* fit atoms to reference structure
* reduce the number of frames
* change the timestamps of the frames (-t0 and -timestep)
* cut the trajectory in small subtrajectories according to information in an index file. This allows subsequent analysis of the subtrajectories that could, for example, be the result of a cluster analysis. Use option -sub. This assumes that the entries in the index file are frame numbers and dumps each group in the index file to a separate trajectory file.
* select frames within a certain range of a quantity given in an .xvg file.
ex. gmx
trjconv
-f npt.trr -s npt.gro -o npt_trj.gro
will convert the trajectory described in the binary file “npt.trr” with the structure and masses defined in “npt.gro” to the output file “npt_trj.gro” which will contain the structure and masses at every timestep of the simulation. There is an option “-sep” which will separate each timestep into a unique file if set to “yes”.
Extract energy components from energy file
Mean-square displacement
Compute the mean-square displacement (\(MSD\)) of atoms or molecules from a set of initial positions
$$MSD\equiv<(\vec{x}(t)-\vec{x_0})^2>=\frac{1}{N}\sum_{i=1}^{N} |\vec{x}^{(i)}(t)-\vec{x}^{(i)}(0)|^2$$
where \(N\) is the total number of particles (atoms/molecules) in the selection, vectors \(\vec{x}^{i}(t)\) and \(\vec{x}^{(i)}(0)\) are the position of the \(i\)-th particle at time \(t\) and the reference position of the \(i\)-th particle.
The command will calculate a diffusion constant, \(D\), from \(MSD\) according to the Einstein relation
$$MSD=2Dt$$
where \(t\) is the simulation time used to calculate the \(MSD\) (GROMACS defaults to the middle 80% of the simulation time).
Velocity autocorrelation function
The Velocity Autocorrelation Function
Compute the velocity autocorrelation function of a set of atoms or molecules. This may be used with gmx
analyze
with the -integrate
option selected to calculate the diffusion constant. This function automatically normalizes the autocorrelation, but this can be turned off with -normalize no
The velocity autocorrelation function, \(C_V(t)\), is
$$C_V(t)=\frac{1}{N}\sum_{i=1}^{N}(\vec{v_i}(0)\cdot\vec{v_i}(t))=<\vec{v_i}(0)\cdot\vec{v_i}(t)>$$
where \(N\) is the total number of particles (atoms/molecules in the selection) and \(\vec{v_i}\) is a vector storing the three components of the velocity (\(v_x\), \(v_y\), and \(v_z\)) for the \(i\)-th particle.
\(\vec{v_i}(0)=\vec{v_i}(t=t_0)\) and \(\vec{v_i}(t)=\vec{v_i}(t=t_0+n \Delta t)\), where \(n\) is the timestep and \(\Delta t\) is the timestep size
Given this function decays to zero at long time, the diffusion constant \(D\) may be found from the integral of \(C_V(T)\) as
$$D=\frac{1}{3}\int_{t=0}^{t=\infty}<\vec{v_i}(0)\cdot\vec{v_i}(t)>dt$$
Position restraints
‘Plus-Minus’ method for polymerization and bonding
Quoting the GROMACS documentation
GROMACS is well-suited for running simulations of polymeric materials. One of the questions frequently asked on the gmx-users list is how to create a topology for a polymer containing many repeat units. Probably the simplest way is to create .rtp entries for the desired force field, and use pdb2gmx to create the topology. To do so requires (at least) three types of residues to be defined:
A "starting" residue, defining the "beginning" of the chain
The repeat (internal) residues
An "ending" residue, defining the "end" of the chain
with example rtp
entry for the terminal and internal residues of polyethylene
; Polyethylene - this is an internal residue
[ Eth ]
[ atoms ]
C1 opls_136 -0.120 1
H11 opls_140 0.060 1
H12 opls_140 0.060 1
C2 opls_136 -0.120 2
H21 opls_140 0.060 2
H22 opls_140 0.060 2
[ bonds ]
C1 -C2
C1 H11
C1 H12
C1 C2
C2 H21
C2 H22
C2 +C1
; Terminal PE residue ("beginning" of chain)
; designation arbitrary, C1 is -CH3
[ EthB ]
[ atoms ]
C1 opls_135 -0.180 1
H11 opls_140 0.060 1
H12 opls_140 0.060 1
H13 opls_140 0.060 1
C2 opls_136 -0.120 2
H21 opls_140 0.060 2
H22 opls_140 0.060 2
[ bonds ]
C1 H11
C1 H12
C1 H13
C1 C2
C2 H21
C2 H22
C2 +C1
; Terminal PE residue ("end" of chain)
; designation arbitrary, C2 is -CH3
[ EthE ]
[ atoms ]
C1 opls_136 -0.120 1
H11 opls_140 0.060 1
H12 opls_140 0.060 1
C2 opls_135 -0.180 2
H21 opls_140 0.060 2
H22 opls_140 0.060 2
H23 opls_140 0.060 2
[ bonds ]
C1 -C2
C1 H11
C1 H12
C1 C2
C2 H21
C2 H22
C2 H23
With these +/- definitions in the .rtp
file, pdb2gmx
is able to create a topology for the “bonded” molecule. If there are empty entries (bonds, angles, dihedrals) within the resulting topology, verify these are defined in the .rtp
entries.
If attempting to use this method for a homogeneous material (e.g., a homopolymer), only one entry is necessary in the .rtp
file. This entry should describe all the necessary parameters for the resulting bonds.
This method becomes complicated for materials where the bonding sites along the “backbone” are of different chemistries. For example, I attempted to construct a heteropolymer of carboxybetaine acrylamide with sulfobetaine methacrylate. To do so requires custom bonding definitions in the entries of the .rtp
. With only two monomer kinds, this is annoying, but doable. Adding additional monomer kinds made the number of potential .rtp
entries grow exponentially and became a nightmare of a programming problem where it was necessary to create new .rtp
entires on the fly with python
Stick to homogeneous materials.
Create index
Custom index groups
Create default index file
printf \“q\” | gmx
make_ndx
-f
Distance distribution against time
Use the gmx
distance
function to calculate the distance distribution against time between two selections in a system. This utility can calculate the average distance as a function of time (-oav
), all distances as a function of time (-oall
), the x-, y-, and z- components of the distance norm as a function of time (-oxyz
), and more.
Before gmx
distance
can be executed, it could be useful to create the indices of the atom-pairs in an index.ndx
file. The utility works by calculating the distance between two exact atoms (e.g., the oxygen of waters), so think about how the pairings should be made.
Root mean square deviation (RMSD)
gmx
rms
and rmsdist
will compute the root mean square deviation (RMSD) between two structures (can be the same molecule) over time. This will require structure and trajectory files to execute properly. rms
performs a fitting of the RMSD between successive simulation timesteps, whereas rmsdist
does not.
This can be useful for recognizing when a system is appropriately equilibrated during simulation. It is insufficient to evaluate equilibration by system potential energy and temperature/pressure, exclusively. However, some have argued this is also an insufficient method due to biases (see Knapp et al. 2011, DOI: 10.1089/cmb.2010.0237), so be sure to consider several metrics when determining equilibration status.
Extend simulation
Simulation time can be edited or extended with gmx
convert-tpr
. For example, to extend a simulation (e.g., NPT) by 1 ns, use the following command:
gmx
convert-tpr
-s NPT.tpr -extend 1000 -o NPT_extend_1ns.tpr
where NPT.tpr is the previous simulation run input file, 1000 is the length of time in ps to extend for, and NPT_extend_1ns.tpr is the new run input file. To use this new run input file, it is necessary to have the previous .log, .cpt, .trr, and .edr files within the same directory. Also, it is necessary to use the same naming convention as the previous files, so the -deffnm
argument must be used. With these files and arguments, the simulation can be started with:
gmx
mdrun
-s NPT_extend_1ns.tpr -cpi NPT.cpt -deffnm NPT