3. The rPredictorDB Toolkit

As you may know from the introduction (rPredictorDB User Documentation), rPredictorDB tools fall into two distinct categories:

  • Search
  • Secondary structure prediction

This document should help you choose the appropriate tools for your work.

The following tools are available in the rPredictorDB 1.0 toolkit:

3.1. Search tools

There are two approaches to searching rPredictorDB’s database: exact search and similarity search. Exact search tools help when you are looking for well-defined criteria such as sequence length, molecule type, or accession number (if you know which specific molecule you’re looking for). Use similarity search tools when you are looking for a group of molecules that you expect to behave similarly to a molecule of your choice. You can also combine both approaches (“give me all molecules similar to this one that come from Arthropoda”).

Different search criteria are always combined using the AND operator: when multiple input fields and/or multiple tools are in use, only those results that satisfy all search criteria are returned.

Individual search criteria may be modified or multiplied. Modifiers are most often > and/or < signs applied to numerical criteria (such as sequence length). Multipliers are the small + signs next to the input fields that let you repeat a search criterion. Each multiplier is either of the AND type or of the OR type (the type is given in the multiplier’s alt-text). An example of an OR-combined search criterion is the Accession number search field in the Database search tool; an example of an AND-combined criterion (including a modifier) is the Sequence length search field in the same tool.
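The combination rules above can be sketched in a few lines of Python. This is only an illustration of the logic, not rPredictorDB code: the Record class, the matches function, its parameter names, and the accession numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical stand-in for one database entry.
    accession: str
    length: int
    taxonomy: str


def matches(record, accessions=None, min_length=None, max_length=None, taxon=None):
    """Return True if the record satisfies every criterion that is in use."""
    checks = []
    if accessions:
        # OR-type multiplier: any one of the listed accession numbers is enough.
        checks.append(record.accession in accessions)
    if min_length is not None:
        # AND-type multiplier with the > modifier.
        checks.append(record.length > min_length)
    if max_length is not None:
        # AND-type multiplier with the < modifier.
        checks.append(record.length < max_length)
    if taxon is not None:
        checks.append(taxon in record.taxonomy)
    # Different criteria are always combined with AND.
    return all(checks)


records = [
    Record("ACC001", 1500, "Eukaryota;Metazoa;Arthropoda"),
    Record("ACC002", 900, "Eukaryota;Metazoa;Arthropoda"),
    Record("ACC003", 1600, "Bacteria;Proteobacteria"),
]
hits = [r.accession for r in records
        if matches(r, accessions={"ACC001", "ACC003"}, min_length=1000, taxon="Arthropoda")]
```

Here hits contains only ACC001: ACC003 passes the accession and length criteria but fails the taxon criterion, and ACC002 is not among the requested accession numbers.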

3.2. Prediction tool

The prediction tool outputs a secondary structure for a given input sequence. The structure is given in dot-paren notation. The tool predicts secondary structure using the custom CP-predict2 algorithm, developed specifically for ribosomal RNA.
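In dot-paren notation, each character corresponds to one nucleotide: a dot marks an unpaired position, and each opening parenthesis pairs with its matching closing parenthesis. The following minimal sketch shows one way to turn such a string into a list of base pairs. The function name parse_dot_paren is hypothetical, and the sketch assumes that only the characters ".", "(" and ")" appear in the string.

```python
def parse_dot_paren(structure):
    """Return the base pairs of a dot-paren string as sorted (i, j) tuples, 0-based."""
    stack = []   # positions of "(" still waiting for a partner
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(parse_dot_paren("((..))"))  # prints [(0, 5), (1, 4)]
```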

3.2.1. CP-predict2

CP-predict2 selects a template from a set of measured secondary structures and then estimates which regions of the target structure can be reliably copied from the template and which should be predicted “from scratch”. This simple algorithm exploits the fact that rRNA structure is very well conserved across multiple taxa.
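The general idea of template-based copying can be illustrated with a toy sketch. This is not the CP-predict2 implementation: the similarity measure (Python's difflib), the function names, and the "?" placeholder for positions left for de novo prediction are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def pair_table(structure):
    """Map each paired position of a dot-paren string to its partner."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def pick_template(target, templates):
    """Pick the (sequence, structure) template most similar to the target sequence."""
    return max(templates,
               key=lambda t: SequenceMatcher(None, target, t[0], autojunk=False).ratio())


def copy_from_template(target, template_seq, template_struct):
    """Copy structure over identical aligned stretches; mark everything else '?'."""
    matcher = SequenceMatcher(None, template_seq, target, autojunk=False)
    mapping = {}  # template position -> target position
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            mapping[block.a + k] = block.b + k
    partner = pair_table(template_struct)
    draft = ["?"] * len(target)
    for t_pos, q_pos in mapping.items():
        p = partner.get(t_pos)
        if p is None:
            draft[q_pos] = "."                       # unpaired in the template
        elif p in mapping:
            draft[q_pos] = "(" if t_pos < p else ")"  # both partners were aligned
    return "".join(draft)


templates = [("GGGAAACCC", "(((...)))"), ("AUAUAUAUA", ".........")]
seq, struct = pick_template("GGGAAAACCC", templates)
draft = copy_from_template("GGGAAAACCC", seq, struct)
```

In this sketch a base pair is copied only when both of its positions fall inside aligned stretches, so the copied parentheses stay balanced. Positions still marked "?" in draft correspond to the regions that a real predictor would have to fill in from scratch.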

A detailed description of how CP-predict works can be found here: CP-predict: a two-phase algorithm for rRNA structure prediction