wat.pdb
- the PDB structure of a water molecule
protein.pdb
- the structure of BPTI in PDB format
The protein has already been protonated and its atoms renamed to the ProtoMS naming scheme. Note that there are 3 disulfide bonds in the protein, and the cysteines involved are labelled as CYX. The water molecule is the TIP4P model and is already located in the BPTI cavity. You can read more about setting up structures here.
We will be performing GCMC on the single water molecule, wat.pdb, at a range of different chemical potentials. From the average occupancy of the water as a function of chemical potential, we will estimate its binding free energy. As you'll see below, we actually use something called the "Adams" value rather than the chemical potential. This is for technical reasons; the two are related to each other by an additive constant.
python2.7 $PROTOMSHOME/tools/make_gcmcbox.py -s wat.pdb
This has created a file called gcmc_box.pdb. Next, we'll use the automatic capabilities of protoms.py to do the rest of the set-up.
python2.7 $PROTOMSHOME/protoms.py -sc protein.pdb -s gcmc --gcmcwater wat.pdb --gcmcbox gcmc_box.pdb --adams -11 -12 -13 -14 -15 -16 -17 -18 -19 -20 -21 -22 -23 -24 -25 -26 --capradius 26
We've specified which Adams values (i.e. chemical potentials) to use with --adams. These can be chosen after some preliminary analysis that we'll come to later. As the protein was already in the ProtoMS format, we gave it the flag -sc (for scoop). We also specified the radius of the water droplet with --capradius; the default radius in protoms.py is 30 Angstroms, which is unnecessarily large for such a small protein.
Have a look at the simulation system we've created with your favourite molecular viewer. For instance, with vmd:
vmd -m protein.pdb water.pdb wat.pdb gcmc_box.pdb
You should see something like this
While you're visualising the system, have a look at the cysteine bridges. These need to be restrained in ProtoMS. Open up the command file run_bnd.cmd, and add
chunk fixresidues 1 5 14 30 38 51 55
chunk fixbackbone 1 2 4 6 13 15 29 31 37 39 50 52 54 56
to the top of the list of "chunks". This fixes the cysteine residues, as well as the neighbouring residues. If you don't include the above, the cysteine bonds will break due to a quirk in ProtoMS.
In the run_bnd.cmd file there is the line
multigcmc -11.000 -12.000 -13.000 -14.000 -15.000 -16.000 -17.000 -18.000 -19.000 -20.000 -21.000 -22.000 -23.000 -24.000 -25.000 -26.000
These sixteen Adams values mean that sixteen cores will be needed to run these simulations. ProtoMS is designed to run with MPI, so to execute, enter
mpirun -np 16 $PROTOMSHOME/protoms3 run_bnd.cmd
This will take approximately five to six hours to run. Given the run time and the number of jobs, it is more convenient to run on a computer cluster than on your workstation.
In one of the output directories inside out, try
grep -v WAT all.pdb > all_nowat.pdb
vmd all_nowat.pdb
and make sure the simulation looks okay.
We can look at the occupancy of the water molecule in each of the simulations. At less negative Adams values, the water is more likely to be inserted for the majority of the simulation. In simulations with low (more negative) Adams values, the water may be completely absent from the cavity. Intermediate Adams values will produce a large number of insertions and deletions. Let's have a look at one such intermediate value. For instance, type
python2.7 $PROTOMSHOME/tools/calc_series.py -f out3/b_-19.000/results -s solventson
You may see something like this
Each simulation at a different Adams value can be used to estimate the excess chemical potential of the water molecule in the protein cavity via the equation
B = μex + ln ‹N›,
where B is the Adams value, ‹N› is the average number of water molecules, and μex is the excess chemical potential in units of kT. We will use μex to approximate the coupling free energy of the water. This approximation is more accurate the lower the Adams value is. However, at low Adams values, when the occupancy of the water is very low, the data becomes noisier. Just as with previous studies, we will use human judgement to balance these two effects. We can do this interactively by typing
python2.7 $PROTOMSHOME/tools/calc_gcsingle.py -d out/b_-*
which brings up a plot of the estimates of the excess chemical potential. You will then be prompted to input the range of Adams values from which the excess chemical potential will be estimated. You should choose the range over which the estimates of the excess chemical potential appear constant. For instance, for a plot that levels off between Adams values of -17 and -24, one would enter
-17 -24
as the line seems flattest between these values. The excess chemical potential is averaged over this range, which produces our estimate for the coupling free energy. You should then predict the coupling free energy to be about -12 kcal/mol.
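The relation B = μex + ln ‹N› and the averaging step can be illustrated with a short Python sketch. This is not how calc_gcsingle.py is implemented; the function names, the temperature of 298.15 K and the occupancies in the example are all assumptions made for illustration.

```python
import math

K_B_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol K)


def excess_chemical_potential(adams, mean_n, temperature=298.15):
    """Return mu_ex in kcal/mol from B = mu_ex/kT + ln<N>.

    The temperature default is an assumption, not a ProtoMS setting.
    """
    if mean_n <= 0.0:
        raise ValueError("mean occupancy must be positive to take its logarithm")
    kt = K_B_KCAL * temperature
    return kt * (adams - math.log(mean_n))


def mean_over_range(estimates, low, high):
    """Average the estimates whose Adams value lies within [low, high]."""
    selected = [mu for b, mu in estimates.items() if low <= b <= high]
    if not selected:
        raise ValueError("no Adams values inside the chosen range")
    return sum(selected) / len(selected)


if __name__ == "__main__":
    # Hypothetical occupancies, made up for illustration (not simulation results).
    occupancies = {-22.0: 0.119, -23.0: 0.047, -24.0: 0.018}
    estimates = dict(
        (b, excess_chemical_potential(b, n)) for b, n in occupancies.items()
    )
    print(mean_over_range(estimates, -24.0, -17.0))
```

Simulations in which the water is never inserted have ‹N› = 0 and cannot contribute an estimate, which is one reason the data at very low Adams values is less useful.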
--adams
- input the Adams value(s) you want to simulate at. If one were to enter a single Adams value, for instance
python2.7 ~/ProtoMS3/protoms.py -sc protein.pdb -s gcmc --gcmcwater wat.pdb --gcmcbox gcmc_box.pdb --adams -11
the command file run_bnd.cmd will have the line
potential -11.000
On the other hand, if one were to enter multiple Adams values, for instance
python2.7 ~/ProtoMS3/protoms.py -sc protein.pdb -s gcmc --gcmcwater wat.pdb --gcmcbox gcmc_box.pdb --adams -11 -12
one would find this line in run_bnd.cmd:
multigcmc -11.000 -12.000
The difference between these two commands determines whether one needs to run ProtoMS with MPI. For one Adams value, ProtoMS is executed with
$PROTOMSHOME/protoms3 run_bnd.cmd
With more than one value, ProtoMS is executed with MPI, using one process per Adams value. For the sixteen values used in this tutorial, that is
mpirun -np 16 $PROTOMSHOME/protoms3 run_bnd.cmd
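The choice between the serial and MPI launch can be sketched in Python as follows. This is a hypothetical helper, not a ProtoMS tool: it assumes that the keyword (potential or multigcmc) is the first word on its line and that one MPI process is needed per Adams value, as in the sixteen-value example above.

```python
def adams_values(cmd_text):
    """Return the Adams values from the first 'potential' or 'multigcmc' line."""
    for line in cmd_text.splitlines():
        fields = line.split()
        if fields and fields[0] in ("potential", "multigcmc"):
            return [float(value) for value in fields[1:]]
    return []


def launch_command(cmd_file, cmd_text, executable="$PROTOMSHOME/protoms3"):
    """Build the command line: serial for one Adams value, mpirun otherwise."""
    values = adams_values(cmd_text)
    if not values:
        raise ValueError("no Adams values found in the command file text")
    if len(values) == 1:
        return "{0} {1}".format(executable, cmd_file)
    # Assumption: one MPI process per Adams value.
    return "mpirun -np {0} {1} {2}".format(len(values), executable, cmd_file)


if __name__ == "__main__":
    print(launch_command("run_bnd.cmd", "potential -11.000"))
    print(launch_command("run_bnd.cmd", "multigcmc -11.000 -12.000"))
```

For the two-value example the sketch requests two processes rather than sixteen.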
To choose which Adams values to simulate at, start by simulating with values between -16 and 0 and analysing the results. You can choose more Adams values to simulate with if you didn't get reliable free energy estimates from the runs you already have.
Repeats of the simulations can be set up with the -r flag. For instance,
python2.7 ~/ProtoMS3/protoms.py -sc protein.pdb -s gcmc --gcmcwater wat.pdb --gcmcbox gcmc_box.pdb --adams -19 -20 -21 -r 3
will create 3 command files called run1_bnd.cmd, run2_bnd.cmd, and run3_bnd.cmd, which, when executed, create the output folders out1, out2, and out3. Analysing the results from multiple directories is straightforward; all you need to do is enter the directories that contain the ProtoMS results files. For instance, one can type
python2.7 $PROTOMSHOME/tools/calc_gcsingle.py -d out1/b_-* out2/b_-* out3/b_-*
to analyse all the simulation data in one go.
The default behaviour of protoms.py when assigning move proportions is to dedicate half of all trial moves to grand canonical solute moves. A typical run_bnd.cmd created by protoms.py for BPTI will contain the lines
chunk equilibrate 5000000 solvent=440 protein=60 solute=0 insertion=167 deletion=167 gcsolute=167
chunk simulate 40000000 solvent=440 protein=60 solute=0 insertion=167 deletion=167 gcsolute=167
Note how the numbers to the right of each "=" sign roughly add up to 1000. The terms insertion, deletion, and gcsolute are specific to GCMC. As 167+167+167≈500, the proportion of moves dedicated to GCMC is 500/1000=1/2. This default behaviour was designed for cases when the GCMC region would contain tens of water molecules, and not just one water molecule as we have with BPTI. By trialling too many moves on the single water molecule in the BPTI case, we run the risk of undersampling the rest of the system relative to the water molecule. While that isn't the case with BPTI, it may be for other systems, so it's worth experimenting with different move proportions. Altering the chunks to read
chunk equilibrate 5000000 solvent=440 protein=60 solute=0 insertion=50 deletion=50 gcsolute=50
chunk simulate 40000000 solvent=440 protein=60 solute=0 insertion=50 deletion=50 gcsolute=50
means that 150 out of every 650 moves (roughly 23%) will be dedicated to sampling the grand canonical water molecule. The free energy calculated from such simulations is not significantly different from what we calculated before for BPTI, which lends confidence to our estimate.
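The move proportions in a chunk line can be checked with a small Python sketch. The helper names are hypothetical and the parsing simply assumes that every name=value token on the line is a move weight.

```python
GCMC_MOVES = ("insertion", "deletion", "gcsolute")


def move_weights(chunk_line):
    """Collect the name=value move weights from a chunk line."""
    weights = {}
    for token in chunk_line.split():
        if "=" in token:
            name, value = token.split("=", 1)
            weights[name] = int(value)
    return weights


def gcmc_fraction(chunk_line):
    """Fraction of trial moves assigned to insertion, deletion and gcsolute."""
    weights = move_weights(chunk_line)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no move weights found on this line")
    gcmc = sum(weights.get(name, 0) for name in GCMC_MOVES)
    return gcmc / float(total)


if __name__ == "__main__":
    default = ("chunk simulate 40000000 solvent=440 protein=60 solute=0 "
               "insertion=167 deletion=167 gcsolute=167")
    reduced = ("chunk simulate 40000000 solvent=440 protein=60 solute=0 "
               "insertion=50 deletion=50 gcsolute=50")
    print(gcmc_fraction(default))
    print(gcmc_fraction(reduced))
```

For the two chunk lines above this gives fractions of roughly 0.50 and 0.23.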
Together, make_gcmcbox.py and protoms.py created the following files:
gcmc_box.pdb = the box that marks out the volume where GCMC will be carried out; created by make_gcmcbox.py
gcmc_wat.pdb = a water molecule that GCMC moves will be performed on
water.pdb = the droplet of solvent water that surrounds the protein
run_bnd.cmd = the command file that tells ProtoMS what to simulate and how
The file water_clr.pdb will be created as well.
The same set-up can be carried out step by step with the individual tools, without protoms.py. Doing so will grant us some extra flexibility.
We used make_gcmcbox.py to delineate the volume around the structure wat.pdb. If you don't have such a structure to draw a box around, you can also specify the size and centre of the box instead. For instance,
python2.7 $PROTOMSHOME/tools/make_gcmcbox.py -b 32.67 4.32 10.34 -p 2 -o gcmc_box.pdb
will create gcmc_box.pdb centred on x=32.67, y=4.32 and z=10.34, with 2 Angstroms either side of the centre, such that the box measures 4×4×4 Angstroms. Alternatively, one can create the same box by specifying
python2.7 $PROTOMSHOME/tools/make_gcmcbox.py -b 32.67 4.32 10.34 4 4 4 -o gcmc_box.pdb
Here, we've explicitly stated the size of the box in each direction, meaning we can create a cuboid, as opposed to the cube created with the -p flag.
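The equivalence of the two ways of specifying the box can be illustrated with a short Python sketch. It only reproduces the geometry described above and is not how make_gcmcbox.py is implemented; the function names are hypothetical.

```python
def box_from_lengths(centre, lengths):
    """Return (lower, upper) corners of a box with the given centre and side lengths."""
    lower = tuple(c - length / 2.0 for c, length in zip(centre, lengths))
    upper = tuple(c + length / 2.0 for c, length in zip(centre, lengths))
    return lower, upper


def box_from_padding(centre, padding):
    """Return (lower, upper) corners of a cube extending 'padding' either side of the centre."""
    return box_from_lengths(centre, (2.0 * padding,) * 3)


def box_volume(lower, upper):
    """Volume of the box in cubic Angstroms."""
    volume = 1.0
    for lo, hi in zip(lower, upper):
        volume *= hi - lo
    return volume


if __name__ == "__main__":
    centre = (32.67, 4.32, 10.34)
    padded = box_from_padding(centre, 2)
    explicit = box_from_lengths(centre, (4, 4, 4))
    print(padded == explicit)
    print(box_volume(*padded))
```

Passing three different lengths to box_from_lengths gives a cuboid, which the padding form cannot describe.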
Next, solvate the protein in a droplet of water with a radius of 26 Angstroms:
python2.7 $PROTOMSHOME/tools/solvate.py -pr protein.pdb -g droplet -r 26 -b $PROTOMSHOME/data/wbox_tip4p.pdb -o water.pdb
This makes water.pdb.
We need to specify the water molecules that we'll be sampling with GCMC. We'll use solvate.py again to fill the volume specified by gcmc_box.pdb.
python2.7 $PROTOMSHOME/tools/solvate.py -pr gcmc_box.pdb -g flood -b $PROTOMSHOME/data/wbox_tip4p.pdb -o gcmc_wat.pdb
with output gcmc_wat.pdb.
When running Monte Carlo, ProtoMS will not allow any bulk water molecules (specified in water.pdb) to enter or leave the dimensions specified in gcmc_box.pdb. So that we don't block the movement of the grand canonical water molecules (specified in gcmc_wat.pdb), we need to remove any bulk water molecules that are within the box:
python2.7 $PROTOMSHOME/tools/clear_gcmcbox.py -b gcmc_box.pdb -s water.pdb -o water_clr.pdb
In the BPTI test case, no water molecules in water.pdb are within the GCMC region, so no waters are cleared and, in this case, we don't need the created file water_clr.pdb.
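The idea behind clearing the box can be sketched in Python as a simple point-in-box test. This is an illustration only, not the implementation of clear_gcmcbox.py: it assumes each water is represented by a single coordinate (for example its oxygen atom), and the coordinates in the example are made up.

```python
def inside_box(point, lower, upper):
    """True if the point lies within the axis-aligned box, edges included."""
    return all(lo <= x <= hi for x, lo, hi in zip(point, lower, upper))


def clear_box(water_positions, lower, upper):
    """Keep only the waters whose reference coordinate lies outside the box."""
    return [p for p in water_positions if not inside_box(p, lower, upper)]


if __name__ == "__main__":
    # Hypothetical box corners and water positions, for illustration only.
    lower = (30.67, 2.32, 8.34)
    upper = (34.67, 6.32, 12.34)
    waters = [(32.0, 4.0, 10.0), (40.0, 4.0, 10.0), (32.0, 4.0, 20.0)]
    print(clear_box(waters, lower, upper))
```

If no position falls inside the box, the list is returned unchanged, which corresponds to the BPTI case where no waters are cleared.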
python2.7 $PROTOMSHOME/tools/generate_input.py -p protein.pdb -pw water.pdb --gcmcwater gcmc_wat.pdb --gcmcbox gcmc_box.pdb --adams -11 -12 -13 -14 -15 -16 -17 -18 -19 -20 -21 -22 -23 -24 -25 -26 -o run
creating run_bnd.cmd. Finally, we need to add the constraints to the cysteine bridges:
chunk fixresidues 1 5 14 30 38 51 55
chunk fixbackbone 1 2 4 6 13 15 29 31 37 39 50 52 54 56