Post-translational modification databases

PhosFox can optionally detect novel acetylations and phosphorylations. "Novel" means that the corresponding modification and site have not been reported in the scientific literature according to the given database files. These database files are specific to PhosFox, and they must be specified in the job file. See the user manual for instructions on job file configuration.

Two database files, swissprot.tsv and phosphosite.tsv, are available for download from the PhosFox web site.

Unzip the downloaded files first, for example into the same directory as PhosFox.

Both files use UniProt accession numbers, so the input peptides must also be identified by UniProt accession numbers whenever these files are used.

Important: The source databases (SwissProt and PhosphoSitePlus) are copyrighted, and their use is restricted by licenses. These restrictions apply to the TSV files above as well. See "Licensing and attributions" below for details.

Other PTM databases can be added as well, provided that they can be converted into a format understood by PhosFox. See "Database file format" below for details.

Source databases

At the moment, PTMs from UniProt/SwissProt and PhosphoSitePlus have been extracted and converted to a PhosFox-compliant format. We additionally provide a script to download and convert Phosida human PTMs. Phosida must be processed manually because we do not have permission to redistribute it.

SwissProt

We have downloaded the complete SwissProt knowledge base in flat file format from ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/ and extracted acetylations and phosphorylations for all proteins from that file. Modifications that have been annotated as "potential", "probable", or "by similarity" in SwissProt are left out of the database file, so PhosFox considers them novel.
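The filtering rule above can be sketched in Python as follows. This is only an illustration, not the actual conversion script: the function name classify_mod_res and the keyword matching on the annotation text are assumptions, and a real converter would need stricter matching (for example, the "phospho" keyword would also match unrelated descriptions containing "phosphoryl").

```python
from typing import Optional

# Qualifiers that mark an annotation as not experimentally confirmed.
# Annotations carrying any of them are skipped, so PhosFox later
# treats the corresponding sites as novel.
NON_EXPERIMENTAL = ("potential", "probable", "by similarity")


def classify_mod_res(description: str) -> Optional[str]:
    """Map a modification description to 'phospho' or 'acetyl'.

    Returns None when the annotation should not be written to the
    database file (non-experimental qualifier or unrelated PTM type).
    Keyword matching is a simplifying assumption for this sketch.
    """
    text = description.lower()
    if any(qualifier in text for qualifier in NON_EXPERIMENTAL):
        return None
    if "phospho" in text:
        return "phospho"
    if "acetyl" in text:
        return "acetyl"
    return None
```

For example, classify_mod_res("Phosphoserine") yields "phospho", while classify_mod_res("Phosphoserine (By similarity)") yields None and the site is therefore not written to the database file.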

PhosphoSitePlus

We have downloaded acetylation and phosphorylation site datasets from http://www.phosphosite.org/staticDownloads.do. From these files we have extracted PTMs for proteins that have UniProt accession numbers. PTMs for proteins without a UniProt accession number have been discarded.

Phosida

We provide a script for downloading Phosida Homo sapiens datasets (phosphoproteome and acetylome) from http://www.phosida.de and converting those sets into a PhosFox-compatible format. This script, fetch-phosida, is available in the scripts subdirectory. The script requires wget, gunzip and Perl, so a Unix system is recommended (although Cygwin can be used on Windows).

First the script downloads all required files (Phosida CSVs, and IPI and UniProt human FASTAs). Next it converts IPI accession numbers to UniProt accession numbers by consulting mappings in the IPI FASTA file. Since Phosida's PTM sites are not always in one-to-one correspondence with SwissProt sites, the script re-aligns the modifications against UniProt protein sequences.

Unfortunately these identifier mappings and alignments are not always possible. In case of any discrepancies or ambiguities the script discards the problematic PTM. These events are reported in a log file produced by the script.
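The idea of re-aligning a site and discarding it when the result is missing or ambiguous can be sketched in Python as follows. This is not the fetch-phosida script itself (which is written in Perl), and the matching strategy is an assumption: the sketch locates a sequence window around the modified residue by exact substring search. The function name realign_site and its parameters are hypothetical.

```python
from typing import Optional


def realign_site(protein_seq: str, window: str, offset: int) -> Optional[int]:
    """Return the 1-based position of a modified residue in protein_seq.

    window -- sequence surrounding the modified residue
    offset -- 0-based index of the modified residue inside window

    Returns None when the window is not found or occurs more than once,
    in which case the PTM would be discarded and logged.
    """
    if not window or not 0 <= offset < len(window):
        return None
    first = protein_seq.find(window)
    if first == -1:
        return None  # discrepancy: window absent from this sequence
    if protein_seq.find(window, first + 1) != -1:
        return None  # ambiguity: window matches at several positions
    return first + offset + 1  # convert to 1-based numbering


# Toy sequences, made up for illustration only.
print(realign_site("MSTAPKSLRRASVGE", "RRASVGE", 3))  # unique match
print(realign_site("AASAAAASAA", "AS", 1))            # ambiguous match
```

Returning None instead of guessing mirrors the behaviour described above: a site is kept only when its position is unambiguous.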

Licensing and attributions

UniProt (SwissProt) PTM data has been compiled by the UniProt Consortium. We redistribute it with explicit permission for non-profit, academic research use only. Any other use must be separately agreed with the UniProt Consortium.

Please use the following citation when you use swissprot.tsv:

The UniProt Consortium. Update on activities at the Universal Protein Resource (UniProt) in 2013. Nucleic Acids Research 41:D43-D47 (2013).

PhosphoSitePlus has been provided by Cell Signaling Technology, Inc. for non-commercial use under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.

Please use the following citation when you use phosphosite.tsv:

Hornbeck PV. et al. PhosphoSitePlus: a comprehensive resource for investigating the structure and function of experimentally determined post-translational modifications in man and mouse. Nucleic Acids Research 40:D261-70 (2012).

Also include "PhosphoSitePlus®, www.phosphosite.org" in appropriate place(s) in your manuscripts or presentations. See http://www.phosphosite.org/staticContact.do for detailed conditions.

Phosida data is provided by the Department of Proteomics and Signal Transduction at the Max Planck Institute of Biochemistry. We are not aware of the specific licensing conditions regarding Phosida, so we recommend that you contact the Max Planck Institute for terms and conditions, especially for anything other than non-profit academic research.

Please use the following citation for acknowledging Phosida:

Florian Gnad, Jeremy Gunawardena, and Matthias Mann. PHOSIDA 2011: the posttranslational modification database. Nucleic Acids Research 39:D253-D260 (2011).

Database file format

PhosFox database files are simple plain text files. Here is a snippet from one such file:

P42643      phospho        267
Q945L2      phospho        238
P46077      phospho        248
E1P616      acetyl         2
P31946      acetyl         70

The file has one line for each modification. Each line has three columns: accession, modification type and site. The database files that are distributed with PhosFox always have UniProt accession numbers, but this is not required. Modification type is either acetyl (acetylation) or phospho (phosphorylation). Modification sites are assumed to be relative to the beginning of the corresponding UniProt protein sequence, with the first amino acid having position 1.
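As an illustration of the format, the following Python sketch parses and validates such lines. It is not part of PhosFox: the names PtmRecord and parse_ptm_lines are hypothetical, and the sketch assumes that columns are separated by arbitrary whitespace (tabs or spaces).

```python
from typing import Iterable, List, NamedTuple


class PtmRecord(NamedTuple):
    accession: str
    mod_type: str
    site: int  # 1-based position in the protein sequence


VALID_TYPES = {"acetyl", "phospho"}


def parse_ptm_lines(lines: Iterable[str]) -> List[PtmRecord]:
    """Parse database lines into records, raising on malformed input."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()  # assumption: whitespace-separated columns
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 columns")
        accession, mod_type, site_text = fields
        if mod_type not in VALID_TYPES:
            raise ValueError(f"line {line_no}: unknown type {mod_type!r}")
        site = int(site_text)
        if site < 1:
            raise ValueError(f"line {line_no}: site must be >= 1")
        records.append(PtmRecord(accession, mod_type, site))
    return records


# The snippet shown above, used as input.
snippet = """\
P42643      phospho        267
Q945L2      phospho        238
P46077      phospho        248
E1P616      acetyl         2
P31946      acetyl         70
"""
records = parse_ptm_lines(snippet.splitlines())
```

A check like this can be useful before pointing PhosFox at a hand-made database file, since it catches misspelled modification types and zero-based site numbers early.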

PhosFox can accept an arbitrary number of database files. For example, if you have your own in-house PTM database you can use it with PhosFox by converting the database into the format above. You can also use accession numbers other than UniProt in your files; just make sure that you use the same identifier scheme in all files specified in your job file (databases, FASTA files, and peptide lists).
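Conceptually, combining several database files amounts to taking the union of their entries and reporting a modification as novel when it is absent from that union. The Python sketch below illustrates this idea only; how PhosFox implements the lookup internally is not described here, and the names load_known_ptms and is_novel, as well as the in-house entry, are hypothetical.

```python
from typing import Iterable, Set, Tuple

PtmKey = Tuple[str, str, int]  # (accession, modification type, site)


def load_known_ptms(databases: Iterable[Iterable[str]]) -> Set[PtmKey]:
    """Merge the lines of several database files into one lookup set."""
    known: Set[PtmKey] = set()
    for lines in databases:
        for line in lines:
            fields = line.split()
            if len(fields) != 3:
                continue  # ignore blank or malformed lines in this sketch
            accession, mod_type, site = fields
            known.add((accession, mod_type, int(site)))
    return known


def is_novel(known: Set[PtmKey], accession: str, mod_type: str, site: int) -> bool:
    """A PTM is novel if no database lists this exact type and site."""
    return (accession, mod_type, site) not in known


public_db = ["P31946 acetyl 70", "P42643 phospho 267"]
in_house_db = ["P31946 phospho 60"]  # made-up entry for illustration
known = load_known_ptms([public_db, in_house_db])

print(is_novel(known, "P31946", "acetyl", 70))  # listed in public_db
print(is_novel(known, "P31946", "acetyl", 71))  # not listed anywhere
```

Because the lookup key includes the accession string, the identifier scheme must be the same across all files, as noted above; otherwise known sites would be reported as novel.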

Conversion scripts

We provide conversion scripts for SwissProt, PhosphoSitePlus and Phosida to facilitate updating the database files. These scripts are located in the scripts subdirectory. Each script has short usage instructions at the top. You can use these scripts to update the database files whenever new versions of the source databases are released.