To come back to the tutorial index, click here.
wat.pdb
- the PDB structure of a water moleculeprotein_pms.pdb
- the structure of the BPTI in PDB formatThe protein has already been protonated and had it's atoms in the ProtoMS naming scheme. Note that there are 3 disulfied bonds in the protein, which are labelled as CYX. The water molecule is the Tip4p model, and is already located in the BPTI cavity. You can read more about setting up structures here.
We will be performing GCMC on the single water molecule, wat.pdb, at a range of different chemical potentials. During these GCMC simulations, the water will be instantaneously switched "on" and "off", and from the average occupancy of the water as a function of chemical potential, we will estimate it's binding free energy. As you'll see below, we actually use something called the "Adams" value rather than the chemical potential. This is for technical reasons, and the two are related to each other by multiplicativen and additive constants.
python2.7 $PROTOMSHOME/tools/make_gcmcbox.py -s wat.pdb
This has created a file called gcmc_box.pdb. Next, we'll use the automatic capabilities of protoms.py to do the rest of the set-up.
python2.7 $PROTOMSHOME/protoms.py -sc protein_pms.pdb -s gcmc --gcmcwater wat.pdb --gcmcbox gcmc_box.pdb --adams -11 -12 -13 -14 -15 -16 -17 -18 -19 -20 -21 -22 -23 -24 -25 -26 --capradius 26
We've input which Adams values (i.e. chemical potentials) we will use with --adams
. These can be chosen after some preliminary analysis that we'll come to later. As the protein was already in the ProtoMS format, we gave it the flag -sc
(for scoop). We also specified the radius of the water droplet with --capradius
; the default radius in protoms.py is 30 Angstroms, which is unnecessary for such a small protein.
Have a look at the simulation system we've created with your favourite molecular viewer. For instance, with vmd:
vmd -m protein_pms.pdb water.pdb wat.pdb gcmc_box.pdb
You should see something like this
While your visualising the system, have a look at the cystein bridges. These need to be restrained in ProtoMS which protoms.py will do automatically when preparing proteins. To see that protein_pms.pdb
specifies which residues to restrain, open it up and see
REMARK chunk fixresidues 1 5 14 30 38 51 55 REMARK chunk fixbackbone 1 4 6 13 15 29 31 37 39 50 52 54 56This fixes the cystein residues, as well as the neighbouring residues. If the above isn't included, the cysteine bonds will break due to a quirk in the backbone sampling of ProtoMS.
run_gcmc.cmd
file there is the line
multigcmc -11.000 -12.000 -13.000 -14.000 -15.000 -16.000 -17.000 -18.000 -19.000 -20.000 -21.000 -22.000 -23.000 -24.000 -25.000 -26.000These sixteen Adams values mean that sixteen cores will be needed to run these simulations. ProtoMS is designed to run mpi, so to execute, enter
mpirun -np 16 $PROTOMSHOME/protoms3 run_gcmc.cmdThis will take approximately five to six hours to run. The length of time and number of jobs means that it is more convenient to run on a computer cluster than your work-station.
out
, try
grep -v WA1 all.pdb > all_nowat.pdb vmd all_nowat.pdband make sure the simulation looks okay.
We can look at the occupancy of the water molecule in each of the simulations. At less negative Adams values, the water has a greater probability to be inserted for the majority of the simulation. In simulations with low Adams values, the water may completely vacant from the cavity. Intermediate Adams values will produce a large number of insertions and deletions. Lets have a look at one such intermediate value. For instance, type
python2.7 $PROTOMSHOME/tools/calc_series.py -f out_gcmc/b_-19.000/results -s solventsonYou may see something like this
As demonstrated in G. A. Ross et. al, Journal of the American Chemical Society, 2015, the relationship between the Adams value and the average occupancy of the GCMC region can be used to calculate the free energy to transfer the water molecule from ideal gas to the GCMC region via
‹N› = 1/(1 + exp(βΔFtrans - B)),
where ‹N› is the average number of water molecules, B is the Adams value, ΔFtrans is the free energy to transfer the water molecule from ideal gas, and β is 1/kBT, with kB denoting Boltzmann's constant, and T the temperature. This equation - a logistic equation - takes the form of a smooth step function, with the point of inflection given by ΔFtrans. While the transfer free energy can be estimated by eye, the most accurate way to calculate ΔFtrans is to fit the logistic equation to the titration data, as ΔFtrans is the only unknown in the equation. The transfer free energy can be calculated by typing
python2.7 $PROTOMSHOME/tools/calc_gcsingle.py -d out_gcmc/b_-* -pwhere the
-p
flag specifies that titration data and fitted logistic curve will be plotted. The script calc_gcsingle.py
estimates the error in the calculated free energy by performing 1000 bootstrap samples of the titration data and refitting the logistic equation each time. The printed results should look something like:
FREE ENERGY ESTIMATES: Least squares estimate = -10.57 kcal/mol Bootstrap estimate = -10.57 +/- 0.19 kcal/molThe first result,
Least squares estimate
, is simply the least squares fit of the transfer free energy, and has no error associated with it. The second result, Bootstrap estimate
is the estimate that has been calculated via bootstrap sampling. The value quoted is the mean transfer free energy from all bootstrap samples, and the error is the standard deviation from those samples. To get the binding free energy of water for the GCMC region, simply take the difference of the transfer free energy and hydration free energy of water. ProtoMS calculates the hydration free energy of water to be about -6.2 kcal/mol, so the binding energy is roughly -10.6 + 6.2 = -4.4 kcal/mol.
Two plots will have been produced when calling calc_gcsingle.py
. Like the results, one shows the least squares fit of the logistic function, and the second shows the results of the bootstrap sampling and fitting, and is probably more informative. For the latter, you may see something like
The red line marks the median of the all the bootstrap fits, the orange and grey areas contain 50% and 90% of all the fits respectively. To be clear, these errors illustrate the uncertainty of the fit, not the GCMC titration data. The error produced from the bootstrap sampling can be reduced with data more GCMC simulations, particularly at Adams values near the point of inflection.
B = μex + ln ‹N›,
where B is the Adams value, ‹N› is the average number of water molecules, and μex is the excess chemical potential. As described in J. Chem. Inf. Model., Vol. 54, pg. 1623, it is possible to use an estimate of μex to approximate ΔFtrans. This approximation is only valid when ‹N› approches zero. Unfortunately, at low Adams values - when the occupancy of the water is very low - the data becomes noisy, so that the recomended way to estimate ΔFtrans is via fitting the logistic function as described above. We can start estimating the excess chemical potential at each Adams value by typing
python2.7 $PROTOMSHOME/tools/calc_gcsingle.py -d out_gcmc/b_-* --excesswhich brings up a plot for the excess chemical potential. As you can see, the line appears to become flatter but noiser at low Adams values. It is in this regime that one can approximate ΔFtrans. You will then be prompted to input the range of Adams values from which the excess chemical potential will be estimated. You should chose the range over which the estimates for excess chemical appears constant. For instance, for the plot
-17 -24as the line seems flatest between these values. The excess chemical potential is averaged over this range, which produces our estimate for the coupling free energy. You should predict ΔFtrans to be about -10.5 kcal/mol.
--adams
- input the Adam(s) value(s) you want to simulate at.python2.7 $PROTOMSHOME/protoms.py -sc protein_pms.pdb -s gcmc --gcmcwater wat.pdb --gcmcbox gcmc_box.pdb --adams -11the command file
run_gcmc.cmd
will have the line
potential -11.000On the other hand, if one were to enter multiple Adams values, for instance
python2.7 $PROTOMSHOME/protoms.py -sc protein_pms.pdb -s gcmc --gcmcwater wat.pdb --gcmcbox gcmc_box.pdb --adams -11 -12one would find this line in
run_gcmc.cmd
:
multigcmc -11.000 -12.000The difference between these two commands determines whether one needs to run ProtoMS with MPI. For one Adams value, ProtoMS is executed with
$PROTOMSHOME/protoms3 run_gcmc.cmdWith more than one value, ProtoMS is executed with
mpirun -np 16 $PROTOMSHOME/protoms3 run_gcmc.cmd
were 16 would typically be the number of b values in the simulation. When choosing which Adams values to use in ones simulations, note βΔFtrans is the point of half maximum of the logistic curve. Reliable transfer free energy estimates can be achieved with the Adams values around this point.
start by simulating with values between -16 and 0 and analysing the results. You can choose more Adams values to simulate with if you didn't get reliable free energy estimates from the runs you already have.
python2.7 $PROTOMSHOME/protoms.py -sc protein_pms.pdb -s gcmc --gcmcwater wat.pdb --gcmcbox gcmc_box.pdb --adams -19 -20 -21 -r 3will create 3 command files called
run1_bnd.cmd
, run2_bnd.cmd
, and run3_bnd.cmd
, which, when executed, create the output folders out_gcmc1
, out_gcmc2
, and out_gcmc3
. Analysing the results from multiple directories is trivial; all you need to do is enter the directories that contain the ProtoMS results
files. For instance, one can type
python2.7 $PROTOMSHOME/tools/calc_gcsingle.py -d out_gcmc1/b_-* out_gcmc2/b_-* out_gcmc3/b_-*to analyse all the simulation data in one go.
protoms.py
when assigning move proportions is to dedicate half of all trial moves to grand canonical solute moves. A typical run_gcmc.cmd
created by protoms.py
for BPTI will contains the lines
chunk equilibrate 5000000 solvent=0 protein=0 solute=0 insertion=333 deletion=333 gcsolute=333 chunk equilibrate 5000000 solvent=440 protein=60 solute=0 insertion=167 deletion=167 gcsolute=167 chunk simulate 40000000 solvent=440 protein=60 solute=0 insertion=167 deletion=167 gcsolute=167Note how the numbers to the right of each "=" sign roughly add up to 1000.The terms
insertion
, deletion
, and gcsolute
are specific to GCMC. Let's look at the last two chunks. As 167+167+167≈500, the proportion of moves dedicated to GCMC is 500/1000=1/2. This default behaviour was designed for cases when the GCMC region would contain tens of water molecules, and not just one water molecule as we have with BPTI. By trailing too many moves on the single water molecule in the BPTI case, we run the risk of undersampling the rest of the system with respect to the water molecule. While that isn't the case with BPTI, it may be for other systems, so it's worth experimenting with different move proportions. Altering the last two chunks to read
chunk equilibrate 5000000 solvent=440 protein=60 solute=0 insertion=50 deletion=50 gcsolute=50 chunk simulate 40000000 solvent=440 protein=60 solute=0 insertion=50 deletion=50 gcsolute=50means that one fifth of all moves will be dedicated to sampling the grand canonical water molecule. The free energy calculated from such simulations is not significantly different to what we calculated before for BPTI, which lends confidense in our estimate.
make_gcmcbox.py
and protoms.py
created the following files:
gcmc_box.pdb
= the box that marks out the volume where GCMC will be carried out; created by mark_gcmcbox.pygcmc_wat.pdb
= a water molecule that GCMC moves will be performed onwater.pdb
= the droplet of solvent water that surrounds the protein run_gcmc.cmd
= the command file that tells ProtoMS what to simulate and howwater_clr.pdb
will be created as well.
protoms.py
. Doing so will grant us some extra flexiblity.
make_gcmcbox.py
to delineate the volume around the structure wat.pdb
. If you don't have such a structure to draw a box around, you can also specify the size and centre of the box instead. For instance,
python2.7 $PROTOMSHOME/tools/make_gcmcbox.py -b 32.67 4.32 10.34 -p 2 -o gcmc_box.pdb
will create gcmc_box.pdb
centered around x=32.67, y=4.32 and x=10.34, with 2 Angstroms either side of the centre such that the box has a volume 4×4×4 Angstroms cubed. Alternatively, one can create the same box by specifying
python2.7 $PROTOMSHOME/tools/make_gcmcbox.py -b 32.67 4.32 10.34 4 4 4 -o gcmc_box.pdb
Here, we've explicitly stated the size of the box in each direction, meaning we could create a cuboid by giving different box lengths in each direction, as opposed to the cube with the -p
flag.
python2.7 $PROTOMSHOME/tools/solvate.py -pr protein_pms.pdb -g droplet -r 26 -b $PROTOMSHOME/data/wbox_tip4p.pdb -o water.pdb
This makes water.pdb
.
We need to specify the water molecules that we'll be sampling with GCMC. We'll use solvate.py
again to fill the volume specified by gcmc_box.pdb
.
python2.7 $PROTOMSHOME/tools/solvate.py -pr gcmc_box.pdb -g flood -b $PROTOMSHOME/data/wbox_tip4p.pdb -o gcmc_wat.pdb
with output gcmc_wat.pdb
.
When running Monte Carlo, ProtoMS will not allow any bulk water molecules (specified in water.pdb
) to enter or leave the dimentions specified in gcmc_box.pdb
. So that we don't block the movement of the grand canonical water molcules (specified in gcmc_wat.pdb
), we need to remove any block water molecules that are with the box:
python2.7 $PROTOMSHOME/tools/clear_gcmcbox.py -b gcmc_box.pdb -s water.pdb -o water_clr.pdb
In the BPTI test case, no water molcules in water.pdb
are within the GCMC region, so no waters are cleared and, in this case, we don't need the created file water_clr.pdb
.
python2.7 $PROTOMSHOME/tools/generate_input.py -p protein_pms.pdb -pw water.pdb --gcmcwater gcmc_wat.pdb --gcmcbox gcmc_box.pdb --adams -11 -12 -13 -14 -15 -16 -17 -18 -19 -20 -21 -22 -23 -24 -25 -26 -o run
creating run_gcmc.cmd
.