We have performed a statistical analysis of the chemical shifts in the PACSY database [1], which contains >3000 proteins with 3D structures. After removal of misreferenced and misassigned data, we have determined refined (multidimensional) chemical shift ranges for intra-residue correlations (13C-13C, 15N-13C, etc.). These chemical shift ranges can be used to gain amino-acid type-assignment and/or secondary-structure information from experimental NMR spectra [2].
This page provides access to the Python tools we built for analyzing the PACSY database, as well as some of the more useful derived data in tabulated form. Based on the purged data, we also provide two command-line programs, PLUQin and SQAT. PLUQin helps type-assign experimental data. SQAT enables a quick assessment of the quality of assigned chemical shift data when secondary-structure information is available.
PLUQ-PLUQin-SQAT-PIQC Python 2.7 code. Instructions for use are below.
CSV formatted data tables with all chemical shift statistics from PIQC.
The code has non-standard Python dependencies. If you are on a Mac or Linux system with Python 2.x installed, install with:
cd pluq
pip install -r requirements.txt
python setup.py install
It is a bad idea to use your default system Python. Please see `Pluqin_Install_Directions.txt` for an explanation of how to properly install Python.
The functionality of the code and provided scripts will be maintained and extended, but the API may change as needed. If you find bugs or would like to offer improvements, please contact Keith (kfritzsc@brandeis.edu).
PLUQin is a program to help assign protein chemical-shift peaks. It is especially helpful for assigning 2D 13C-13C chemical shift correlations, and it also provides secondary-structure information.
Use -p for each peak you want to add. For a 1D experiment, give one number after -p; for a 2D experiment, give two. You can enter as many peaks as you like. Set the experiment with the option -e; the default is c (1D carbon). Set the joint probability cut-off with the option -c. You can enter a negative value to see all options before taking the joint probability.
$ pluqin.py -p 55.2 -p 18 -c 0
input: [55.2], [18.0]
experiment: c
AA p1 p2 p1 p2 Joint H C E
A CA CB 25.8 44.8 91.4 99.2 0.8 -
M CA CE 3.3 8.0 8.6 7.7 42.4 49.9
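The option handling described above can be illustrated with a small argument parser. This is only a sketch of how repeated -p options with one or two numbers might be collected and checked. It is not the code in pluqin.py. The names `PEAK_DIMS`, `build_parser` and `parse_peaks` are hypothetical, the table of experiment codes covers only the codes shown on this page, and the default for -c is a placeholder because the real default is not stated here.

```python
import argparse

# Experiment codes shown on this page and the number of shifts each peak needs.
# Assumption: the real program may accept more codes (see pluqin.py -h).
PEAK_DIMS = {"c": 1, "cc": 2, "cn": 2}


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of PLUQin-style options")
    # action="append" collects one list per -p; nargs="+" allows 1 or 2 numbers.
    parser.add_argument("-p", dest="peaks", action="append", nargs="+",
                        type=float, required=True, metavar="PPM")
    parser.add_argument("-e", dest="experiment", default="c",
                        choices=sorted(PEAK_DIMS))
    # Placeholder default; the real default cut-off is not stated on this page.
    parser.add_argument("-c", dest="cutoff", type=float, default=0.0)
    return parser


def parse_peaks(argv):
    """Parse argv and check that every peak matches the experiment's dimension."""
    parser = build_parser()
    args = parser.parse_args(argv)
    expected = PEAK_DIMS[args.experiment]
    for peak in args.peaks:
        if len(peak) != expected:
            parser.error("experiment %r needs %d number(s) per -p, got %d"
                         % (args.experiment, expected, len(peak)))
    return args


if __name__ == "__main__":
    args = parse_peaks(["-p", "55.2", "-p", "18", "-c", "0"])
    print(args.peaks, args.experiment, args.cutoff)
```

Collecting each -p into its own list keeps 1D and 2D input in the same shape, so the dimension check is a single length comparison per peak.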
Peak positions from a 2D C-C experiment can be entered like:
$ pluqin.py -p 55.2 18 -c 0 -e cc
input: [55.2, 18.0]
experiment: cc
AA p1 p1 Joint H C E
A ('CA', 'CB') 100.0 100.0 92.9 6.9 0.1
Sequence information can be given with the option -s.
$ pluqin.py -p 55.2 -p 18 -s MLFAMM -c 0 -e c
input: [55.2], [18.0]
experiment: c
AA p1 p2 p1 p2 Joint H C E
M CA CE 48.0 68.8 53.8 7.7 42.4 49.9
A CA CB 30.3 31.2 46.2 99.2 0.8 -
Peaks from a 2D C-N experiment can be entered like:
# Remember: only intra-residue peaks will work.
$ pluqin.py -p 45 103 -e cn
input: [45.0, 103.0]
experiment: cn
AA p1 p1 Joint H C E
G ('CA', 'N') 100.0 100.0 14.6 65.3 20.1
Sometimes PLUQin cannot make a definitive type assignment but can still provide secondary-structure information. For example, every candidate below is predominantly sheet (E):
$ pluqin.py -p 175 55 -p 55 35 -e cc
input: [175.0, 55.0], [55.0, 35.0]
experiment: cc
AA p1 p2 p1 p2 Joint H C E
K ('C', 'CA') ('CA', 'CB') 19.0 58.6 66.3 - 3.0 97.0
R ('C', 'CA') ('CA', 'CB') 11.4 13.8 16.0 - 1.6 98.4
E ('C', 'CA') ('CA', 'CB') 14.1 12.3 12.8 - 0.8 99.2
H ('C', 'CA') ('CA', 'CB') 7.4 4.4 4.9 - 3.6 96.4
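When several residue types remain possible, one way to condense the H/C/E columns into a single estimate is to average them, weighting each candidate by its joint probability. The sketch below does this for the four rows above. The weighting scheme and the function name `summarize_secondary_structure` are assumptions made for illustration and are not part of PLUQin. Reading "-" as zero is also an assumption about the output format.

```python
def summarize_secondary_structure(rows):
    """Joint-probability-weighted average of the H/C/E percentages.

    rows: iterable of (residue, joint, h, c, e). A "-" entry is read as 0,
    which is an assumption about what the dash means in the output.
    """
    def to_float(value):
        return 0.0 if value == "-" else float(value)

    totals = {"H": 0.0, "C": 0.0, "E": 0.0}
    weight_sum = 0.0
    for _residue, joint, h, c, e in rows:
        weight = to_float(joint)
        weight_sum += weight
        for key, value in zip("HCE", (h, c, e)):
            totals[key] += weight * to_float(value)
    if weight_sum == 0:
        raise ValueError("no candidates with non-zero joint probability")
    return {key: total / weight_sum for key, total in totals.items()}


# (residue, Joint, H, C, E) copied from the 2D C-C example output above.
rows = [
    ("K", 66.3, "-", 3.0, 97.0),
    ("R", 16.0, "-", 1.6, 98.4),
    ("E", 12.8, "-", 0.8, 99.2),
    ("H", 4.9, "-", 3.6, 96.4),
]
summary = summarize_secondary_structure(rows)
best = max(summary, key=summary.get)  # label with the largest weighted share
print(best, summary)
```

For these rows the weighted sheet (E) share stays above 97%, which matches the reading that the peaks come from a sheet residue even though the residue type is ambiguous.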
For a full list of options, use: pluqin.py -h
PIQC (Purging by Intrinsic Quality Criteria) is used to identify misreferenced and otherwise compromised protein chemical shift data sets from the PACSY database. The results of running PIQC are downloadable above. In addition, the maintainers of the PACSY database will run the analysis monthly and provide the output within the PACSY database. Nevertheless, the programs are included in the scripts/build_pacsy directory. Please follow the directions in the readme.txt file. A graphical view of PIQC's output for proteins with 13C chemical shifts is below:
Coming Soon (by Feb. 12)