This is a guide to setting up, customizing and running CP-predict. For an overview of what it is and what it does, read CP-predict: a two-phase algorithm for rRNA structure prediction.
To use CP-predict, you must have the following installed:
If you want to use CP-predict with the default infrastructure, simply run:
rPredictorDB_web.install
This installs the prediction script and, optionally, the MATLAB runtime that CP-predict needs to run.
If you wish to customize your installation and/or to understand what CP-predict is doing behind the scenes, read on.
The conversion from the 3D-structure measurements recorded in PDB files to secondary structure is a non-trivial task. It is handled by the pdb2dp.py script. Most of the “heavy lifting” is done by x3dna-dssr, which is the program that determines which residues form base pairs based on the positions of their atoms.
However, there are additional concerns which DSSR does not address. Some residues are not measured in the PDB files. While unmeasured residues would only impede template selection for rather closely related templates, they might distort conservation statistics. More generally, if the sequence is known, we are discarding information by not taking it into account.
Re-inserting unmeasured residues into the structure predicted by DSSR is implemented in pdb2dp.py: the script is essentially a wrapper for DSSR that additionally performs this re-insertion. It uses the *.ct output that DSSR provides and scans the input PDB file for REMARK 465 and REMARK 470 records, which mark residues that were not measured at all (REMARK 465) or that are missing some atoms (REMARK 470).
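The scan for unmeasured residues can be illustrated with a minimal Python sketch. It is not the code used by pdb2dp.py: the function names, the returned tuple layout and the choice to mark re-inserted residues as unpaired dots are assumptions made for this example, and only REMARK 465 rows (entirely missing residues) are handled.

```python
import re

# One REMARK 465 data row: optional model number, residue name,
# chain identifier, sequence number, optional insertion code.
# Header and free-text REMARK 465 rows do not match this pattern.
_MISSING_ROW = re.compile(
    r"^REMARK 465\s+(?:(\d+)\s+)?(\S{1,3})\s+(\S)\s+(-?\d+)([A-Za-z]?)\s*$"
)


def parse_missing_residues(pdb_text):
    """Return (chain, seq_number, insertion_code, residue_name) tuples."""
    missing = []
    for line in pdb_text.splitlines():
        match = _MISSING_ROW.match(line.rstrip())
        if match:
            _model, name, chain, number, icode = match.groups()
            missing.append((chain, int(number), icode, name))
    return missing


def reinsert_unmeasured(measured, missing_numbers, filler="."):
    """Merge measured symbols with unmeasured positions.

    measured: dict mapping sequence number -> dot-bracket symbol.
    missing_numbers: sequence numbers of unmeasured residues.
    Unmeasured residues get `filler` (assumed unpaired in this sketch).
    """
    symbols = dict.fromkeys(missing_numbers, filler)
    symbols.update(measured)  # measured symbols win over the filler
    return "".join(symbols[number] for number in sorted(symbols))
```

For a single chain, the sequence numbers returned by parse_missing_residues can be passed directly to reinsert_unmeasured together with the symbols derived from the DSSR output.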
The other critical functionality that pdb2dp.py provides during conversion is untangling. The 3D structures were measured not on individual molecules but on the entire ribosome. In this setting, some residues form base pairs with residues from another rRNA molecule in the ribosomal subunit. These inter-molecular base pairs must be filtered out before working with individual molecules.
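The idea behind untangling can be sketched in a few lines of Python. The representation of a base pair as a (chain, index, chain, index) tuple and both function names are hypothetical and chosen only for illustration. The sketch ignores pseudoknots and does not reflect how pdb2dp.py stores its data.

```python
def untangle(pairs, chain):
    """Keep only base pairs whose two partners both lie on `chain`.

    pairs: iterable of (chain_i, index_i, chain_j, index_j) tuples,
    with 1-based residue indices within each chain.
    """
    return [
        (index_i, index_j)
        for chain_i, index_i, chain_j, index_j in pairs
        if chain_i == chain and chain_j == chain
    ]


def to_dot_bracket(length, pairs):
    """Render intra-chain pairs as a dot-bracket string (no pseudoknots)."""
    symbols = ["."] * length
    for first, second in pairs:
        left, right = sorted((first, second))
        symbols[left - 1] = "("
        symbols[right - 1] = ")"
    return "".join(symbols)
```

A pair such as ("A", 2, "B", 7) links two different molecules, so it is dropped when either chain is processed on its own.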
The pdb2dp.py script is the primary way of acquiring template structures. The setup_cp_predict.py script uses it internally. To run pdb2dp.py separately, use:
pdb2dp.py -r $CP_ROOT --standard_with_dssr ABCD
to extract secondary structures and unmeasured and tangled region information.
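When several PDB entries need to be converted, the command above can be wrapped in a small Python helper. This is only a sketch: it assumes that pdb2dp.py is on the PATH and that ABCD in the command stands for a PDB identifier. The helper names are made up for this example, and only the options shown above are used.

```python
import subprocess


def build_pdb2dp_command(cp_root, pdb_id):
    """Build the argument list for one pdb2dp.py call (flags as shown above)."""
    return ["pdb2dp.py", "-r", cp_root, "--standard_with_dssr", pdb_id]


def extract_templates(cp_root, pdb_ids):
    """Run pdb2dp.py once per PDB identifier, stopping at the first failure."""
    for pdb_id in pdb_ids:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(build_pdb2dp_command(cp_root, pdb_id), check=True)
```

Passing the arguments as a list avoids shell quoting problems if the CP-predict root directory contains spaces.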
Note
Running x3dna-dssr may take up to several minutes.
Note
pdb2dp.py is also very useful when you want to obtain reference structures for evaluating CP-predict performance.
Running CP-predict yourself is not necessary for rPredictorDB operation. However, it may be useful to test that everything was installed correctly. Run:
./cp_predict.sh /usr/local/MATLAB/MATLAB_Runtime/v901/ -sqs=test.fasta -str=template.br \
-ALM=clustalw2 -EXTEND_MECHANICALLY_LONELY_PAIRS=0 -BOOTSTRAP=1
for a test run that exercises every part of the prediction algorithm. Here, /usr/local/MATLAB/MATLAB_Runtime/v901/ is the location of the MATLAB runtime, test.fasta is a FASTA file with the RNA sequence whose structure should be predicted, and template.br is the template in the dot-paren (dot-bracket) format.
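Before a test run, it can help to check that a dot-paren string is well formed. The following Python sketch does that for plain parentheses and dots only. The function name is hypothetical, and the sketch makes no assumption about the exact layout of a .br file, so the structure string has to be extracted from the file by hand.

```python
def check_dot_paren(structure):
    """Return the sorted 1-based base pairs of a dot-paren string.

    Raises ValueError for unbalanced parentheses or unexpected symbols.
    """
    open_positions = []
    pairs = []
    for position, symbol in enumerate(structure, start=1):
        if symbol == "(":
            open_positions.append(position)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {position}")
            pairs.append((open_positions.pop(), position))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {position}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)
```

A structure that passes this check has every opening parenthesis matched by a closing one, which is the minimum a template needs in order to describe a consistent set of base pairs.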