In this exercise, you will calculate the protein folding free energy using thermodynamic integration, a method based on molecular dynamics (MD). The protein will be described by the empirical force field, CHARMM22, http://mackerell.umaryland.edu/charmm_ff.shtml
A model protein you will have to deal with is the alanine decapeptide. The folding/unfolding will be achieved by stretching/compressing the chain. This in practice can be achieved by constraining the distance between the end carbon atoms in the protein and performing different simulations for each value of the distance. The atoms are the number 11 and 91, which are the carbon atoms of the carbonyl groups at the edges of the protein. The distance between these atoms is the collective variable chosen for the system. At each distance one runs a constrained MD to extract the time-averaged forces acting on the collective variable, $F(x)$. Then, a free energy difference can be calculated via thermodynamic integration (TI):
\begin{equation} \Delta A = -\int_a^b F(x)\, dx \end{equation}
Here $a$ and $b$ are the initial and the final values of the collective variable. TI is a general method, which can be applied to a variety of processes, e.g. phase transitions, electron transfer etc.
Download the files: deca_ala_ex3.tar.gz
in the directory deca_ala
you will find
deca_ala.pdb
(protein data base) file contains the coordinates
deca_ala.psf
(protein structure file) file contains connectivity data
par_all27_prot_lipid.inp
contains the force field parameters. You will be using the CHARMM v.22, a popular force field for biologically relevant systems.
md_std.inp
is the CP2K input file
Open the deca_ala.pdb
protein data bank format file with vmd. Create a new representation for the protein, e.g. of type Ribbon to observe the alpha-helix.
Although the image below shows the deca-alanine in water, it is expensive to run thermodynamic integration for a solvated protein with many values of the constraints on small laptops. So we will run TI for the protein in the gas-phase.
Here you are asked to run several MD simulations for different values of the distance between atoms 11 and 91, in each run it will be constrained. In the original file md_std.inp
the distance is set to $14.37$ Å as is in the deca_ala.pdb
file. This is the first step to carry out the termodynamic integration, as described in the equation above.
We have made the script run_ti_jobs.sh
to run these simulations, which you can find inside the compressed file deca_ala.tar.gz
. Take a look at the script and familiarize yourself with it. At which values are we constraining the distances between the carbon atoms? In this case we are performing 5 different simulations, each with a different value of the constraint. You can edit this script to use a larger or smaller number of constraints and to increase or reduce the upper and/or lower bound of integration. Can you guess where in the script we are specifying the values of the constraints?
#!/bin/bash -l for d in $(seq 16 1 20); do mkdir $d cp deca_ala/* $d/. cd $d sed -e "s|14.37|${d}|" md_std.inp > d_${d}.inp cp2k.sopt -i d_${d}.inp -o d_${d}.out cd .. done
Look into the main input file of cp2k md_std.inp
, and try to understand the keywords used as much as possible, by now you should be able to understand most of it, and you can experiment changing some of the keywords to see what happens. Look in particular at the definition of the section CONSTRAINT
where the target value of the distance between the two carbon atoms at the edges of the proteins are constrained for instance to 14.37, and at the COLVAR
section where the the collective variable for the distance between the two C atoms is defined.
&CONSTRAINT &COLLECTIVE COLVAR 1 INTERMOLECULAR TARGET [angstrom] 14.37 &END COLLECTIVE &LAGRANGE_MULTIPLIERS COMMON_ITERATION_LEVELS 1 &END &END CONSTRAINT
&COLVAR &DISTANCE ATOMS 11 91 &END DISTANCE &END
⇒ Each constrained MD will produce a .LagrangeMultLog
-files, which look like this:
Shake Lagrangian Multipliers: -63.547262596 Rattle Lagrangian Multipliers: 63.240598387 Shake Lagrangian Multipliers: -0.326901815 Rattle Lagrangian Multipliers: -0.318145579
grep Shake yourprojectname.LagrangeMultLog | awk '{c++ ; s=s+$4; sq=sq+$4*$4 }END{print s/c, sqrt(sq/c - s*s/(c*c))/(sqrt(c)) }'
\begin{equation} \Delta A = -\int_a^b F(x)\, dx \end{equation}
\begin{equation} \Delta A(x^\prime) = -\int_a^{x^\prime} F(x)\, dx \end{equation}
.xyz
files for some of the constrained MD trajectories and think about what are the fundamental interactions between the constituents of the protein that we are taking into account with the CHARMM force field (e.g. electrostatic, van-der Waals, covalent bonds) and how these may contribute to the stabilization of the protein in a given state.generate_plots.sh
that extracts the average force and the standard error for each constrained MD simulation (see the grep
command line above), and it prints out the file av_force_vs_x.dat
containing the force as a function of the collective variable, and the error on the force (third column). Take a look at the script and modify it if necessary, e.g. if you have changed the lower and upper bound for the constraint or if you have changed the number of constraints.