.. _User-toolkit:

The rPredictorDB Toolkit
************************


As you may know from the introduction (:ref:`User-index`), rPredictorDB tools fall into two distinct categories:

* Search
 
* Secondary structure prediction

This document should help you choose the appropriate tools for your work.

The following tools are available in the rPredictorDB |version| toolkit:

* :ref:`User-toolkit-search-tools`

  * :ref:`User-toolkit-search-taxonomy`

  * :ref:`User-toolkit-search-annotation`

  * :ref:`User-toolkit-search-sequence`

* :ref:`User-toolkit-prediction-tools`

  * :ref:`User-toolkit-prediction-cppredict`


.. _User-toolkit-search-tools:

Search tools
============

There are two approaches to searching rPredictorDB's database: *exact* search and *similarity* search. Exact search tools help when you are filtering by well-defined criteria such as sequence length, molecule type, or accession number (if you know which specific molecule you're looking for). Use similarity search tools if you are looking for a group of molecules that you expect to behave similarly to a molecule of your choice. You can also combine both approaches ("give me all molecules similar to this one that come from Arthropoda").

.. note::

  More on how similarity search works can be found here: :ref:`User-similarity-search`

Different search criteria are always combined using the **AND** operator: when multiple input fields and/or multiple tools are in use, only those results that satisfy all search criteria are returned.

Individual search criteria may be *modified* or *multiplied*. Modifiers are most often ``>`` and/or ``<`` signs applied to numerical criteria (such as sequence length). Multipliers are the small ``+`` signs next to input fields that let you enter the same search criterion more than once. Each multiplier is either of the ``AND`` type or of the ``OR`` type (the type is given in the multiplier's alt-text). For example, the Accession number field in the Database search tool is an ``OR``-combined criterion, while the Sequence length field in the same tool is an ``AND``-combined criterion that also has a modifier.
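
As a rough illustration of the combination rules above, the following Python sketch applies an ``OR``-combined accession criterion and an ``AND``-combined length criterion to a few made-up records. The ``Record`` class, the ``matches`` function and the accession numbers are hypothetical and are not part of rPredictorDB; the sketch only mirrors the logic described here.

.. code-block:: python

  import operator
  from dataclasses import dataclass

  # Modifiers applied to numerical criteria.
  MODIFIERS = {">": operator.gt, "<": operator.lt}


  @dataclass
  class Record:
      accession: str  # hypothetical accession number
      length: int     # sequence length


  def matches(record, accessions, length_constraints):
      # OR-type multiplier: any one of the entered accession numbers may match.
      accession_ok = any(record.accession == a for a in accessions)
      # AND-type multiplier with modifiers: every length constraint must hold.
      length_ok = all(
          MODIFIERS[sign](record.length, bound)
          for sign, bound in length_constraints
      )
      # Different search criteria are always combined with AND.
      return accession_ok and length_ok


  records = [Record("A1", 1500), Record("B2", 900), Record("C3", 1800)]
  hits = [
      r.accession
      for r in records
      if matches(r, ["A1", "B2"], [(">", 1000), ("<", 2000)])
  ]
  print(hits)  # ['A1']

Record ``B2`` is rejected because it fails the length constraints, and ``C3`` because its accession number is not among the requested ones.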


.. _User-toolkit-search-taxonomy:

Taxonomy search
---------------

Taxonomy search is an exact search. This tool lets you filter records by organism name and taxonomy.

A detailed description of available input fields for the Taxonomy search tool is given in :ref:`User-toolkit-reference-taxonomy`.


.. _User-toolkit-search-annotation:

Annotations search
------------------

Annotations search is an exact search. The available search criteria fall into two broad categories:

* Source descriptors: accession number, type of molecule, publication date and description;

* Sequence information: length and sequencing quality.

A detailed description of available input fields for the Annotation search tool is given in :ref:`User-toolkit-reference-annotation`.


.. _User-toolkit-search-sequence:

Sequence search
---------------

Sequence search is a **similarity search** tool. It outputs all records whose sequences are similar to the input sequence(s) above a given cutoff. The output is sorted from most to least similar.

This tool uses BLAST to find regions of local similarity between sequences. It compares nucleotide sequences to sequence databases and calculates the statistical significance of matches. The algorithm is described in detail at its `NCBI pages <http://blast.ncbi.nlm.nih.gov/Blast.cgi>`_.
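
The filtering and ordering of results can be pictured with the following Python sketch. It assumes each record already carries a numeric similarity score where higher means more similar; the scores, the record names, the ``select_hits`` function and the choice of an inclusive cutoff are illustrative assumptions, and the sketch does not reproduce BLAST scoring.

.. code-block:: python

  def select_hits(scored_records, cutoff):
      """Keep records scoring at least `cutoff`, most similar first."""
      hits = [(name, score) for name, score in scored_records if score >= cutoff]
      # Sort by score, highest (most similar) first.
      return sorted(hits, key=lambda hit: hit[1], reverse=True)


  # Hypothetical (record name, similarity score) pairs.
  scored = [("rec_a", 0.72), ("rec_b", 0.95), ("rec_c", 0.40)]
  print(select_hits(scored, cutoff=0.5))  # [('rec_b', 0.95), ('rec_a', 0.72)]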

A detailed description of available input fields for the Sequence search tool is given in :ref:`User-toolkit-reference-sequence`.


.. _User-toolkit-prediction-tools:

Prediction tool
===============

The prediction tool outputs a secondary structure for a given input sequence. The structure is given in dot-paren notation, in which matching parentheses mark paired bases and dots mark unpaired ones. The tool predicts the structure using the custom CP-predict2 algorithm, developed specifically for ribosomal RNA.
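
To make the dot-paren output easier to work with, the following Python sketch converts a dot-paren string into a list of paired positions. The ``parse_dot_paren`` function is a hypothetical helper, not part of rPredictorDB, and it assumes the structure uses only ``(``, ``)`` and ``.``; whether the tool's output can contain other symbols (for example, extra bracket types for pseudoknots) is not covered here.

.. code-block:: python

  def parse_dot_paren(structure):
      """Return sorted 0-based (i, j) pairs of paired positions."""
      stack = []  # positions of opening parentheses still waiting for a partner
      pairs = []
      for i, symbol in enumerate(structure):
          if symbol == "(":
              stack.append(i)
          elif symbol == ")":
              if not stack:
                  raise ValueError(f"unmatched ')' at position {i}")
              pairs.append((stack.pop(), i))
          elif symbol != ".":
              raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
      if stack:
          raise ValueError(f"unmatched '(' at position {stack[-1]}")
      return sorted(pairs)


  print(parse_dot_paren("((..))."))  # [(0, 5), (1, 4)]

In this example, positions 0 and 5 form one pair, positions 1 and 4 form another, and positions 2, 3 and 6 are unpaired.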


.. _User-toolkit-prediction-cppredict:

CP-predict2
-----------

CP-predict2 selects a *template* from a set of measured secondary structures and then estimates which regions of the target structure can be reliably copied from the template and which should be predicted "from scratch". This simple algorithm exploits the fact that rRNA structure is very well conserved across multiple taxa.

A detailed description of how CP-predict2 works can be found here: :ref:`User-cp-predict`