Voronota version 1.9

About Voronota

The analysis of macromolecular structures often requires a comprehensive definition of atomic neighborhoods. Such a definition can be based on the Voronoi diagram of balls, where each ball represents an atom of some van der Waals radius. Voronota is a software tool for finding all the vertices of the Voronoi diagram of balls. Such vertices correspond to the centers of the empty tangent spheres defined by quadruples of balls. Voronota is especially suitable for processing three-dimensional structures of biological macromolecules such as proteins and RNA.

Since version 1.2, Voronota also uses the Voronoi vertices to construct inter-atom contact surfaces and solvent-accessible surfaces. Voronota provides tools to query contacts, generate contact graphics, compare contacts, and evaluate the quality of protein structural models using contacts.

Voronota is developed by Kliment Olechnovic (kliment@ibt.lt).

Getting the latest version

Download the latest archive from the official downloads page: https://bitbucket.org/kliment/voronota/downloads.

The archive contains a ready-to-use, statically compiled 'voronota' program for 64-bit Linux systems. This executable can be rebuilt from the provided source code to work on any modern Linux, Mac OS X or Windows operating system.

Packages in .deb or .rpm formats are currently not available. However, installing Voronota on Linux or Mac OS X is easy: just copy the Voronota executable files (the 'voronota' program and, if needed, the wrapper scripts) to one of the directories listed in the $PATH variable.

Building from source code

Using C++ compiler directly

Voronota has no required external dependencies; only a standard-compliant C++ compiler is needed to build it.

For example, the "voronota" executable can be built from the sources in the "src" directory using the GNU C++ compiler:

g++ -O3 -o voronota src/*.cpp

Using CMake

You can also build using CMake for makefile generation. Starting in the directory containing the "CMakeLists.txt" file, run the following sequence of commands:

mkdir build ; cd build ; cmake ../ ; make ; cd ../ ; mv build/voronota voronota

Enabling OpenMP

To enable the usage of OpenMP for parallel processing when building with a C++ compiler directly, add the "-fopenmp" option:

g++ -O3 -fopenmp -o voronota src/*.cpp

When using CMake, OpenMP usage is enabled automatically if it is possible.

Enabling MPI

To enable the usage of MPI for parallel processing, you can use the mpic++ compiler wrapper. You also need to define the "ENABLE_MPI" macro when building:

mpic++ -O3 -DENABLE_MPI -o voronota ./src/*.cpp

Basic usage example

Computing Voronoi vertices

Here is a basic example of computing Voronoi vertices for a structure in a PDB file:

./voronota get-balls-from-atoms-file < input.pdb > balls.txt
./voronota calculate-vertices < balls.txt > vertices.txt

The first command reads a PDB file "input.pdb" and outputs a file "balls.txt" that contains balls corresponding to the atoms in "input.pdb". By default, Voronota ignores all heteroatoms and all hydrogen atoms when reading PDB files; this behavior can be altered using command-line options. The second command reads "balls.txt" and outputs a file "vertices.txt" that contains the quadruples and empty tangent spheres that correspond to the vertices of the Voronoi diagram of the input balls. The formats of "balls.txt" and "vertices.txt" are described below.

In "balls.txt" the line format is "x y z r # comments". The first four values (x, y, z, r) are the coordinates and the radius of an atomic ball. Comments are not needed for further calculations; they are there to assist human readers. For example, below is a part of some possible "balls.txt":

28.888 9.409 52.301 1.7 # 1 A 2 SER N
27.638 10.125 52.516 1.9 # 2 A 2 SER CA
26.499 9.639 51.644 1.75 # 3 A 2 SER C
26.606 8.656 50.915 1.49 # 4 A 2 SER O
27.783 11.635 52.378 1.91 # 5 A 2 SER CB
27.69 12.033 51.012 1.54 # 6 A 2 SER OG
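
As an illustration of the line format above, here is a minimal Python sketch for reading such lines. The `Ball` class and the `parse_ball_line` function are hypothetical names and not part of Voronota. The sketch assumes whitespace-separated fields and treats everything after "#" as a free-form comment.

```python
from typing import NamedTuple


class Ball(NamedTuple):
    """One atomic ball: center coordinates, radius, and the optional comment."""
    x: float
    y: float
    z: float
    r: float
    comment: str


def parse_ball_line(line: str) -> Ball:
    """Parse one 'x y z r # comments' line (hypothetical helper)."""
    # The numeric part comes before the first '#'; the rest is the comment.
    numbers, _, comment = line.partition("#")
    x, y, z, r = (float(value) for value in numbers.split()[:4])
    return Ball(x, y, z, r, comment.strip())


sample = """\
28.888 9.409 52.301 1.7 # 1 A 2 SER N
27.638 10.125 52.516 1.9 # 2 A 2 SER CA
"""
balls = [parse_ball_line(line) for line in sample.splitlines() if line.strip()]
print(balls[1].r, balls[1].comment)
```

Since the comment is kept only as text, downstream calculations in this sketch depend on the four numeric values alone, which mirrors the statement that comments are not needed for further calculations.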

In "vertices.txt" the line format is "q1 q2 q3 q4 x y z r". The first four numbers (q1, q2, q3, q4) are numbers of atomic records in "balls.txt", starting from 0. The remaining four values (x, y, z, r) are the coordinates and the radius of an empty tangent sphere of the quadruple of atoms. For example, below is a part of some possible "vertices.txt":

0 1 2 3 27.761 8.691 51.553 -0.169
0 1 2 23 28.275 9.804 50.131 0.588
0 1 3 1438 24.793 -3.225 60.761 14.047
0 1 4 5 28.785 10.604 50.721 0.283
0 1 4 1453 30.018 10.901 55.386 1.908
0 1 5 23 28.544 10.254 50.194 0.595
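
To make the relation between the two files concrete, the following Python sketch checks the tangency property for one of the example lines: for every ball of the quadruple, the distance from the vertex center to the ball center minus the ball radius should be close to the r value of the vertex. The function names are hypothetical, the ball list is copied from the "balls.txt" example above, and the tolerance of 0.01 is an assumption chosen to absorb the rounding of the printed values.

```python
import math

# Balls (x, y, z, r) copied from the "balls.txt" example, indexed from 0.
balls = [
    (28.888, 9.409, 52.301, 1.7),
    (27.638, 10.125, 52.516, 1.9),
    (26.499, 9.639, 51.644, 1.75),
    (26.606, 8.656, 50.915, 1.49),
    (27.783, 11.635, 52.378, 1.91),
    (27.69, 12.033, 51.012, 1.54),
]


def parse_vertex_line(line):
    """Parse 'q1 q2 q3 q4 x y z r' into (indices, center, radius)."""
    fields = line.split()
    indices = tuple(int(f) for f in fields[:4])
    x, y, z, r = (float(f) for f in fields[4:8])
    return indices, (x, y, z), r


def tangency_gaps(line, balls):
    """For each ball of the quadruple: distance to the vertex center minus ball radius."""
    indices, center, _ = parse_vertex_line(line)
    return [math.dist(center, balls[i][:3]) - balls[i][3] for i in indices]


line = "0 1 4 5 28.785 10.604 50.721 0.283"
_, _, radius = parse_vertex_line(line)
gaps = tangency_gaps(line, balls)
# Each gap should match the tangent sphere radius up to rounding of the printed values.
print(all(abs(gap - radius) < 0.01 for gap in gaps))
```

Under this reading, a negative r (as in the first example line) corresponds to a point that lies inside all four balls of the quadruple at the same depth.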

Computing inter-atom contacts

Taking the "balls.txt" file described in the previous section, here is a basic example of computing inter-atom contacts:

./voronota calculate-contacts < balls.txt > contacts.txt

In the "contacts.txt" file, the line format is "b1 b2 area". The first two numbers (b1 and b2) are the numbers of atomic records in "balls.txt", starting from 0. If b1 does not equal b2, then the 'area' value is the area of contact between atoms b1 and b2. If b1 equals b2, then the 'area' value is the solvent-accessible area of atom b1. For example, below is a part of some possible "contacts.txt":

0 0 35.440
0 1 15.908
0 2 0.167
0 3 7.025
0 4 7.021
0 5 0.624
0 23 2.849
0 25 0.008
0 26 11.323
0 1454 0.021
1 1 16.448
1 2 11.608
1 3 0.327
1 4 14.170
1 5 0.820
1 6 3.902
1 23 0.081
2 2 3.591
2 3 11.714
2 4 0.305
2 5 2.019
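
The b1/b2 convention above can be sketched in Python as follows. The `summarize_contacts` function is a hypothetical helper, not a Voronota command. It assumes that each pair of atoms is listed only once, as in the example; if mirrored contacts were added to the output, the per-atom sums below would count each contact twice.

```python
from collections import defaultdict

# A few lines copied from the "contacts.txt" example.
sample = """\
0 0 35.440
0 1 15.908
0 2 0.167
1 1 16.448
1 2 11.608
"""


def summarize_contacts(text):
    """Return (solvent_area, contact_area) dicts keyed by ball index."""
    solvent_area = {}
    contact_area = defaultdict(float)
    for line in text.splitlines():
        if not line.strip():
            continue
        b1, b2, area = line.split()
        b1, b2, area = int(b1), int(b2), float(area)
        if b1 == b2:
            # Equal indices: solvent-accessible area of the atom.
            solvent_area[b1] = area
        else:
            # Different indices: the contact area counts for both atoms.
            contact_area[b1] += area
            contact_area[b2] += area
    return solvent_area, dict(contact_area)


solvent, contact = summarize_contacts(sample)
print(solvent[0], round(contact[0], 3))
```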

Computing annotated inter-atom contacts

Here is a basic example of computing annotated inter-atom contacts:

./voronota get-balls-from-atoms-file --annotated < input.pdb > annotated_balls.txt
./voronota calculate-contacts --annotated < annotated_balls.txt > annotated_contacts.txt

In "annotated_contacts.txt" the line format is "annotation1 annotation2 area distance tags adjuncts [graphics]". The strings 'annotation1' and 'annotation2' describe the contacting atoms, the 'area' value is the area of contact between the two atoms, and the 'distance' value is the distance between the centers of the contacting atoms. If 'annotation2' contains the string "solvent", then the 'area' value is the solvent-accessible area of the atom described by 'annotation1'. The remaining part of the line is used by Voronota querying and drawing commands that are not covered in this section. Below is a part of some possible "annotated_contacts.txt":

c<A>r<2>a<1>R<SER>A<N> c<A>r<2>a<2>R<SER>A<CA> 15.908 1.456 . .
c<A>r<2>a<1>R<SER>A<N> c<A>r<2>a<3>R<SER>A<C> 0.167 2.488 . .
c<A>r<2>a<1>R<SER>A<N> c<A>r<2>a<4>R<SER>A<O> 7.025 2.774 . .
c<A>r<2>a<1>R<SER>A<N> c<A>r<2>a<5>R<SER>A<CB> 7.021 2.486 . .
c<A>r<2>a<1>R<SER>A<N> c<A>r<2>a<6>R<SER>A<OG> 0.624 3.159 . .
c<A>r<2>a<1>R<SER>A<N> c<A>r<5>a<24>R<GLU>A<CB> 2.849 4.628 . .
c<A>r<2>a<1>R<SER>A<N> c<A>r<5>a<26>R<GLU>A<CD> 0.008 4.792 . .
c<A>r<2>a<1>R<SER>A<N> c<A>r<5>a<27>R<GLU>A<OE1> 11.323 3.932 . .
c<A>r<2>a<1>R<SER>A<N> c<A>r<194>a<1501>R<LEU>A<CD2> 0.021 5.465 . .
c<A>r<2>a<1>R<SER>A<N> c<solvent> 35.440 5.9 . .
c<A>r<2>a<2>R<SER>A<CA> c<A>r<2>a<3>R<SER>A<C> 11.608 1.514 . .
c<A>r<2>a<2>R<SER>A<CA> c<A>r<2>a<4>R<SER>A<O> 0.327 2.405 . .
c<A>r<2>a<2>R<SER>A<CA> c<A>r<2>a<5>R<SER>A<CB> 14.170 1.523 . .
c<A>r<2>a<2>R<SER>A<CA> c<A>r<2>a<6>R<SER>A<OG> 0.820 2.430 . .
c<A>r<2>a<2>R<SER>A<CA> c<A>r<3>a<7>R<LYS>A<N> 3.902 2.371 . .
c<A>r<2>a<2>R<SER>A<CA> c<A>r<5>a<24>R<GLU>A<CB> 0.081 4.954 . .
c<A>r<2>a<2>R<SER>A<CA> c<solvent> 16.448 6.1 . .
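
The annotation strings in the example can be split into their fields with a short Python sketch like the one below. The helper names are hypothetical. The field meanings are inferred from the example lines rather than stated in this section: 'c' appears to hold the chain, 'r' the residue number, 'a' the atom serial number, 'R' the residue name and 'A' the atom name. Only the first four columns of each line are interpreted.

```python
import re

# Matches one 'key<value>' field, e.g. 'c<A>' or 'R<SER>'.
FIELD_PATTERN = re.compile(r"([A-Za-z])<([^<>]*)>")


def parse_annotation(annotation):
    """Split an annotation such as 'c<A>r<2>a<1>R<SER>A<N>' into a dict."""
    return dict(FIELD_PATTERN.findall(annotation))


def parse_annotated_contact(line):
    """Parse the first four fields of an annotated contact line."""
    first, second, area, distance = line.split()[:4]
    return {
        "first": parse_annotation(first),
        "second": parse_annotation(second),
        "area": float(area),
        "distance": float(distance),
        # Per the description above, "solvent" in annotation2 marks a solvent-accessible area.
        "is_solvent": "solvent" in second,
    }


contact = parse_annotated_contact(
    "c<A>r<2>a<1>R<SER>A<N> c<A>r<5>a<27>R<GLU>A<OE1> 11.323 3.932 . ."
)
print(contact["first"]["R"], contact["second"]["A"], contact["area"])
```

Field values are kept as strings here because the sketch makes no assumption about which of them are always numeric.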

Getting help in command line

The list of all available Voronota commands is displayed when executing Voronota without any parameters.

Command help is shown when the "--help" command line option is present, for example:

./voronota calculate-vertices --help

Using the "--help" option without a specific command prints help for all commands:

./voronota --help

Command reference

List of all commands

In the argument tables below, the type of an argument is 'string', 'number' or 'flag' (a switch that takes no value). An asterisk (*) after the type marks a required argument.

Command 'get-balls-from-atoms-file'

Command line arguments:

Name Type Description
--annotated flag to enable annotated mode
--include-heteroatoms flag to include heteroatoms
--include-hydrogens flag to include hydrogen atoms
--multimodel-chains flag to read multiple models in PDB format and rename chains accordingly
--mmcif flag to input in mmCIF format
--radii-file string path to radii configuration file
--default-radius number default atomic radius
--only-default-radius flag to make all radii equal to the default radius
--hull-offset number positive offset distance enables adding artificial hull balls
--help flag to print usage help to stdout and exit

Input stream:

file in PDB or mmCIF format

Output stream:

list of balls

Command 'calculate-vertices'

Command line arguments:

Name Type Description
--print-log flag to print log of calculations
--exclude-hidden-balls flag to exclude hidden input balls
--include-surplus-quadruples flag to include surplus quadruples
--link flag to output links between vertices
--init-radius-for-BSH number initial radius for bounding sphere hierarchy
--check flag to slowly check the resulting vertices (used only for testing)
--help flag to print usage help to stdout and exit

Input stream:

list of balls (line format: 'x y z r')

Output stream:

list of Voronoi vertices, i.e. quadruples with tangent spheres (line format: 'q1 q2 q3 q4 x y z r')

Command 'calculate-vertices-in-parallel'

Command line arguments:

Name Type Description
--method string * parallelization method name, variants are: 'simulated'
--parts number * number of parts for splitting, must be power of 2
--print-log flag to print log of calculations
--include-surplus-quadruples flag to include surplus quadruples
--link flag to output links between vertices
--init-radius-for-BSH number initial radius for bounding sphere hierarchy
--help flag to print usage help to stdout and exit

Input stream:

list of balls (line format: 'x y z r')

Output stream:

list of Voronoi vertices, i.e. quadruples with tangent spheres (line format: 'q1 q2 q3 q4 x y z r')
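
Because '--parts' must be a power of 2, a calling script may want to validate the value before launching the command. The Python sketch below shows one way to do that; both function names are hypothetical, and only options listed in the table above are used. Whether Voronota itself accepts 1 as a valid number of parts is not stated here, so treating 1 as valid is an assumption of this sketch.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ...; False for zero, negatives, and other values."""
    # A positive power of two has exactly one bit set, so n & (n - 1) is 0.
    return n > 0 and (n & (n - 1)) == 0


def build_parallel_argv(parts: int, method: str = "simulated"):
    """Assemble the argument list for calculate-vertices-in-parallel."""
    if not is_power_of_two(parts):
        raise ValueError(f"--parts must be a power of 2, got {parts}")
    return ["./voronota", "calculate-vertices-in-parallel",
            "--method", method, "--parts", str(parts)]


print(build_parallel_argv(8))
```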

Command 'calculate-contacts'

Command line arguments:

Name Type Description
--annotated flag to enable annotated mode
--probe number probe radius
--exclude-hidden-balls flag to exclude hidden input balls
--step number curve step length
--projections number curve optimization depth
--sih-depth number spherical surface optimization depth
--add-mirrored flag to add mirrored contacts to non-annotated output
--draw flag to output graphics for annotated contacts
--tag-centrality flag to tag contacts centrality
--help flag to print usage help to stdout and exit

Input stream:

list of balls

Output stream:

list of contacts

Command 'calculate-mock-solvent'

Command line arguments:

Name Type Description
--solvent-radius number solvent atom radius
--solvent-distance number min distance from non-solvent atoms to solvent atoms
--sih-depth number spherical surface optimization depth
--help flag to print usage help to stdout and exit

Input stream:

list of balls (line format: 'annotation x y z r tags adjuncts')

Output stream:

list of balls (line format: 'annotation x y z r tags adjuncts')

Command 'query-balls'

Command line arguments:

Name Type Description
--match string selection
--match-not string negative selection
--match-tags string tags to match
--match-tags-not string tags to not match
--match-adjuncts string adjuncts intervals to match
--match-adjuncts-not string adjuncts intervals to not match
--match-external-annotations string file path to input matchable annotations
--invert flag to invert selection
--whole-residues flag to select whole residues
--drop-atom-serials flag to drop atom serial numbers from input
--drop-altloc-indicators flag to drop alternate location indicators from input
--drop-tags flag to drop all tags from input
--drop-adjuncts flag to drop all adjuncts from input
--set-tags string set tags instead of filtering
--set-dssp-info string file path to input DSSP file
--set-adjuncts string set adjuncts instead of filtering
--set-external-adjuncts string file path to input external adjuncts
--set-external-adjuncts-name string name for external adjuncts
--rename-chains flag to rename input chains to be in interval from 'A' to 'Z'
--renumber-from-adjunct string adjunct name to use for input residue renumbering
--renumber-positively flag to increment residue numbers to make them positive
--reset-serials flag to reset atom serial numbers
--set-seq-pos-adjunct flag to set normalized sequence position adjunct
--set-ref-seq-num-adjunct string file path to input reference sequence
--ref-seq-alignment string file path to output alignment with reference
--seq-output string file path to output query result sequence string
--chains-summary-output string file path to output chains summary
--chains-seq-identity number sequence identity threshold for chains summary
--help flag to print usage help to stdout and exit

Input stream:

list of balls (line format: 'annotation x y z r tags adjuncts')

Output stream:

list of balls (line format: 'annotation x y z r tags adjuncts')

Command 'query-balls-sequences-pairings-stats'

Command line arguments:

Name Type Description
--help flag to print usage help to stdout and exit

Input stream:

list of balls files

Output stream:

list of sequences pairings stats

Command 'write-balls-to-atoms-file'

Command line arguments:

Name Type Description
--pdb-output string file path to output query result in PDB format
--pdb-output-b-factor string name of adjunct to output as B-factor in PDB format
--pdb-output-template string file path to input template for B-factor insertions
--help flag to print usage help to stdout and exit

Input stream:

list of balls (line format: 'annotation x y z r tags adjuncts')

Output stream:

list of balls (line format: 'annotation x y z r tags adjuncts')

Command 'draw-balls'

Command line arguments:

Name Type Description
--representation string representation name: 'vdw', 'sticks', 'trace' or 'cartoon'
--drawing-for-pymol string file path to output drawing as pymol script
--drawing-for-scenejs string file path to output drawing as scenejs script
--drawing-name string graphics object name for drawing output
--default-color string default color for drawing output, in hex format, white is 0xFFFFFF
--adjunct-gradient string adjunct name to use for gradient-based coloring
--adjunct-gradient-blue number blue adjunct gradient value
--adjunct-gradient-red number red adjunct gradient value
--rainbow-gradient flag to use rainbow color gradient
--adjuncts-rgb flag to use RGB color values from adjuncts
--random-colors flag to use random color for each drawn ball
--random-colors-by-chain flag to use random color for each drawn chain
--use-labels flag to use labels in drawing if possible
--help flag to print usage help to stdout and exit

Input stream:

list of balls (line format: 'annotation x y z r tags adjuncts')

Output stream:

list of balls (line format: 'annotation x y z r tags adjuncts')

Command 'query-contacts'

Command line arguments:

Name Type Description
--match-first string selection for first contacting group
--match-first-not string negative selection for first contacting group
--match-second string selection for second contacting group
--match-second-not string negative selection for second contacting group
--match-min-seq-sep number minimum residue sequence separation
--match-max-seq-sep number maximum residue sequence separation
--match-min-area number minimum contact area
--match-max-area number maximum contact area
--match-min-dist number minimum distance
--match-max-dist number maximum distance
--match-tags string tags to match
--match-tags-not string tags to not match
--match-adjuncts string adjuncts intervals to match
--match-adjuncts-not string adjuncts intervals to not match
--match-external-first string file path to input matchable annotations
--match-external-second string file path to input matchable annotations
--match-external-pairs string file path to input matchable annotation pairs
--no-solvent flag to not include solvent accessible areas
--no-same-chain flag to not include same chain contacts
--invert flag to invert selection
--drop-tags flag to drop all tags from input
--drop-adjuncts flag to drop all adjuncts from input
--set-tags string set tags instead of filtering
--set-hbplus-tags string file path to input HBPLUS file
--set-distance-bins-tags string list of distance thresholds
--inter-residue-hbplus-tags flag to set inter-residue H-bond tags
--set-adjuncts string set adjuncts instead of filtering
--set-external-adjuncts string file path to input external adjuncts
--set-external-adjuncts-name string name for external adjuncts
--inter-residue flag to convert input to inter-residue contacts
--summarize flag to output only summary of contacts
--preserve-graphics flag to preserve graphics in output
--help flag to print usage help to stdout and exit

Input stream:

list of contacts (line format: 'annotation1 annotation2 area distance tags adjuncts [graphics]')

Output stream:

list of contacts (line format: 'annotation1 annotation2 area distance tags adjuncts [graphics]')
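
Since 'query-contacts' reads contacts from the input stream and writes the selected contacts to the output stream, it can be driven from a script. The Python sketch below assembles a command line from three of the options in the table above and pipes text through the program. The helper names are hypothetical, the "./voronota" path is an assumption about where the executable lives, and passing each value as a separate argument after its option name is assumed to match how Voronota parses its command line.

```python
import subprocess


def build_query_contacts_argv(min_area=None, min_seq_sep=None, no_solvent=False,
                              executable="./voronota"):
    """Assemble a query-contacts command line from a few options in the table."""
    argv = [executable, "query-contacts"]
    if min_area is not None:
        argv += ["--match-min-area", str(min_area)]
    if min_seq_sep is not None:
        argv += ["--match-min-seq-sep", str(min_seq_sep)]
    if no_solvent:
        argv.append("--no-solvent")
    return argv


def run_query_contacts(contacts_text, **options):
    """Feed annotated contacts to stdin and return what the command writes to stdout."""
    argv = build_query_contacts_argv(**options)
    result = subprocess.run(argv, input=contacts_text, capture_output=True,
                            text=True, check=True)
    return result.stdout


# Only the command line is built here; nothing is executed.
print(build_query_contacts_argv(min_area=1.0, no_solvent=True))
```

Keeping the argument assembly separate from the execution makes it possible to inspect or log the exact command before running it.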

Command 'query-contacts-depth-values'

Command line arguments:

Name Type Description
--help flag to print usage help to stdout and exit

Input stream:

list of contacts (line format: 'annotation1 annotation2')

Output stream:

list of depth values (line format: 'annotation depth')

Command 'query-contacts-simulating-unfolding'

Command line arguments:

Name Type Description
--max-seq-sep number * maximum untouchable residue sequence separation
--help flag to print usage help to stdout and exit

Input stream:

list of contacts (line format: 'annotation1 annotation2 area distance tags adjuncts [graphics]')

Output stream:

list of contacts (line format: 'annotation1 annotation2 area distance tags adjuncts')

Command 'draw-contacts'

Command line arguments:

Name Type Description
--drawing-for-pymol string file path to output drawing as pymol script
--drawing-for-jmol string file path to output drawing as jmol script
--drawing-for-scenejs string file path to output drawing as scenejs script
--drawing-name string graphics object name for drawing output
--default-color string default color for drawing output, in hex format, white is 0xFFFFFF
--adjunct-gradient string adjunct name to use for gradient-based coloring
--adjunct-gradient-blue number blue adjunct gradient value
--adjunct-gradient-red number red adjunct gradient value
--adjuncts-rgb flag to use RGB color values from adjuncts
--random-colors flag to use random color for each drawn contact
--alpha number alpha opacity value for drawing output
--use-labels flag to use labels in drawing if possible
--help flag to print usage help to stdout and exit

Input stream:

list of contacts (line format: 'annotation1 annotation2 area distance tags adjuncts graphics')

Output stream:

list of contacts (line format: 'annotation1 annotation2 area distance tags adjuncts graphics')

Command 'plot-contacts'

Command line arguments:

Name Type Description
--background-color string color string in SVG-acceptable format
--default-color string color string in SVG-acceptable format
--adjuncts-rgb flag to use RGB color values from adjuncts
--help flag to print usage help to stdout and exit

Input stream:

list of contacts (line format: 'annotation1 annotation2 area distance tags adjuncts')

Output stream:

plot of contacts in SVG format

Command 'score-contacts-potential'

Command line arguments:

Name Type Description
--input-file-list flag to read file list from stdin
--input-contributions string file path to input contact types contributions
--input-fixed-types string file path to input fixed types
--input-seq-pairs-stats string file path to input sequence pairings statistics
--potential-file string file path to output potential values
--probabilities-file string file path to output observed and expected probabilities
--single-areas-file string file path to output single type total areas
--contributions-file string file path to output contact types contributions
--multiply-areas number coefficient to multiply output areas
--toggling-list string list of toggling subtags
--help flag to print usage help to stdout and exit

Input stream:

list of contacts (line format: 'annotation1 annotation2 conditions area')

Output stream:

list of contact type area summaries (line format: 'annotation1 annotation2 conditions area')

Command 'score-contacts-potentials-stats'

Command line arguments:

Name Type Description
--help flag to print usage help to stdout and exit

Input stream:

list of potential files

Output stream:

list of normalized energy mean and sd values per interaction type

Command 'score-contacts-energy'

Command line arguments:

Name Type Description
--potential-file string * file path to input potential values
--ignorable-max-seq-sep number maximum residue sequence separation for ignorable contacts
--inter-atom-scores-file string file path to output inter-atom scores
--atom-scores-file string file path to output atom scores
--depth number neighborhood normalization depth
--help flag to print usage help to stdout and exit

Input stream:

list of contacts (line format: 'annotation1 annotation2 conditions area')

Output stream:

global scores

Command 'score-contacts-energy-stats'

Command line arguments:

Name Type Description
--help flag to print usage help to stdout and exit

Input stream:

list of atom energy descriptors

Output stream:

list of normalized energy mean and sd values per atom type

Command 'score-contacts-quality'

Command line arguments:

Name Type Description
--default-mean number default mean parameter
--default-sd number default standard deviation parameter
--means-and-sds-file string file path to input atomic mean and sd parameters
--mean-shift number mean shift in standard deviations
--external-weights-file string file path to input external weights for global scoring
--smoothing-window number window to smooth residue quality scores along sequence
--atom-scores-file string file path to output atom scores
--residue-scores-file string file path to output residue scores
--help flag to print usage help to stdout and exit

Input stream:

list of atom energy descriptors

Output stream:

weighted average local score

Command 'compare-contacts'

Command line arguments:

Name Type Description
--target-contacts-file string * file path to input target contacts
--inter-atom-scores-file string file path to output inter-atom scores
--inter-residue-scores-file string file path to output inter-residue scores
--atom-scores-file string file path to output atom scores
--residue-scores-file string file path to output residue scores
--depth number local neighborhood depth
--smoothing-window number window to smooth residue scores along sequence
--smoothed-scores-file string file path to output smoothed residue scores
--detailed-output flag to enable detailed output
--help flag to print usage help to stdout and exit

Input stream:

list of model contacts (line format: 'annotation1 annotation2 area')

Output stream:

global scores (atom-level and residue-level)

Command 'score-scores'

Command line arguments:

Name Type Description
--reference-threshold number reference scores classification threshold
--testable-step number testable scores threshold step
--outcomes-file string file path to output lines of 'threshold TP TN FP FN'
--ROC-curve-file string file path to output ROC curve
--PR-curve-file string file path to output PR curve
--help flag to print usage help to stdout and exit

Input stream:

pairs of reference and testable scores files

Output stream:

global results
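
The outcomes file holds lines of 'threshold TP TN FP FN', and ROC points can be derived from such counts using the standard definitions: true positive rate TP/(TP+FN) and false positive rate FP/(FP+TN). The Python sketch below applies these definitions; the function names are hypothetical, the sample counts are made up for illustration, and it is an assumption that the ROC curve file written by Voronota follows the same definitions.

```python
def roc_point(tp, tn, fp, fn):
    """Return (false positive rate, true positive rate) for one threshold."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return fpr, tpr


def roc_from_outcomes(lines):
    """Convert 'threshold TP TN FP FN' lines into sorted (FPR, TPR) points."""
    points = []
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        tp, tn, fp, fn = (float(value) for value in fields[1:])
        points.append(roc_point(tp, tn, fp, fn))
    return sorted(points)


# Made-up counts, for illustration only.
outcomes = ["0.2 9 2 8 1", "0.5 6 7 3 4", "0.8 2 10 0 8"]
points = roc_from_outcomes(outcomes)
print(points)
```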

Wrapper scripts

CAD-score method wrapper script

The 'voronota-cadscore' script is an implementation of the CAD-score (Contact Area Difference score) method using Voronota. The command line arguments of the script are:

-t input_target_file.pdb
-m input_model_file.pdb
[-T output_residue_scores_on_target_file.pdb]
[-M output_residue_scores_on_model_file.pdb]
[-s residue_scores_smoothing_window_size]

VoroMQA method wrapper script

The 'voronota-voromqa' script is an implementation of the VoroMQA (Voronoi diagram-based Model Quality Assessment) method using Voronota. The command line arguments of the script are:

-i input_file.pdb
[-a output_atom_scores_file.pdb]
[-r output_residue_scores_file.pdb]
[-s residue_scores_smoothing_window_size]