.. _User-toolkit-reference:

The rPredictorDB Toolkit Reference
**********************************

This section gives a complete description of the input fields for each tool
in the rPredictorDB toolkit (rTools).

* :ref:`User-toolkit-reference-taxonomy`
* :ref:`User-toolkit-reference-annotation`
* :ref:`User-toolkit-reference-sequence`
* :ref:`User-toolkit-reference-cppredict`

Each input field is described with a short table:

=========== =============================================================
Input type  A description of what input the field expects
Modifiers   A list of possible modifiers (see below)
Multipliers The type and cardinality of possible multipliers (see below)
=========== =============================================================

**Modifiers** are extra options like "greater than" or "before" that modify
the meaning of the input to the given field.

**Multipliers** enable multiple inputs to be given to the field, for example
to search for sequences matching any one of a group of accession numbers.

The **type** of a multiplier is either ``OR``, which says "Find me records
that match at least one of my criteria", or ``AND``, which says "Find me only
records that match all of my criteria". For example, the
:ref:`User-toolkit-reference-annotation-length` field has an ``AND``
multiplier: the sequence must fulfill all length constraints placed upon it.

The **cardinality** of a multiplier is the maximum number of inputs you can
give for the multiplied field. For instance, the
:ref:`User-toolkit-reference-annotation-accession` field has a cardinality of
10, so up to ten different accession numbers can be searched for.

Modifiers and multipliers can be combined. For instance, an interval can be
specified by combining a two-valued "before" and "after" modifier with an
``AND`` type multiplier (this is the case in
:ref:`User-toolkit-reference-annotation-firstpublished`).

.. _User-toolkit-reference-taxonomy:

Taxonomy search
===============

The description of what the Taxonomy search tool does is here:
:ref:`User-toolkit-search-taxonomy`. The individual input fields:

.. _User-toolkit-reference-taxonomy-organismnamecontains:

Organism name contains
----------------------

=========== =========================================
Input type  String (case-insensitive)
Modifiers
Multipliers
=========== =========================================

The **Organism name contains** field expects an organism name or a part of
one. It is case-insensitive.

Combining the :ref:`User-toolkit-reference-annotation-moleculetype` input
field and organism name can yield RNA of the desired type for the given
organism.

.. _User-toolkit-reference-taxonomy-taxonomy:

Taxonomy
----------

=========== ==============================================
Input type  Select (nested): Archaea, Bacteria, Eukaryota
Modifiers
Multipliers
=========== ==============================================

The **Taxonomy** field expects a selection from the given taxa. Inner nodes
are expandable, i.e. the next level of the phylogenetic tree becomes available
as another select-box. This way, any number of selections from the top to the
bottom of the phylogenetic tree can be made. The number after a taxon name
indicates how many records belong to the taxon (independently of other input
fields).

.. _User-toolkit-reference-annotation:

Annotations search
==================

The description of what the Annotations search tool does is here:
:ref:`User-toolkit-search-annotation`. The individual input fields:

.. _User-toolkit-reference-annotation-moleculetype:

Molecule type
----------------

=========== ===================================================================
Input type  Select (any, 18S ribosomal RNA, Lysine riboswitch, MicF RNA, etc.)
Modifiers
Multipliers AND, max. 5
=========== ===================================================================

The **Molecule type** field specifies what kind of molecules should be
retrieved. The exact list of possible values is extracted when the webpage is
loaded, according to the sequences currently in the rData.

Combining molecule type and
:ref:`User-toolkit-reference-taxonomy-organismnamecontains` can yield the
desired RNA for the given organism.

.. _User-toolkit-reference-annotation-descriptioncontains:

Description contains
----------------------

=========== =========================================
Input type  String (case-insensitive)
Modifiers
Multipliers
=========== =========================================

The **Description contains** field expects any string. The "Description"
field contents are not constrained; usually it will be something along the
lines of::

    Cerasicoccus maritimus gene for 16S rRNA, partial sequence, strain: YM31-114.

.. _User-toolkit-reference-annotation-accession:

Accession number
----------------

============ =================================
Input type   String (a code like DQ815398)
Modifiers
Multipliers  OR, max. 10
============ =================================

The **Accession number** is a code that uniquely identifies a sequenced
molecule. Note that the molecule may be either RNA or DNA, so several rRNA
sequences derived from the same DNA may share the same accession number.

.. _User-toolkit-reference-annotation-firstpublished:

First published
----------------

=========== =========================================
Input type  Date
Modifiers   before; after
Multipliers AND, max. 2
=========== =========================================

The **First published** field limits the date of the first publication of
retrieved sequences. An interval can be specified by using the AND
multiplier. A calendar will pop up on selecting the input field; select the
date there.

.. _User-toolkit-reference-annotation-minimalsequencequality:

Minimal sequence quality
-------------------------

=========== =========================================
Input type  Select (0, 50, 90, 95, 99)
Modifiers
Multipliers
=========== =========================================

The **Minimal sequence quality** field restricts the search to records that
have been sequenced with a high degree of precision.

.. warning::

    This constraint is available only for records originating from the SILVA
    database (i.e. 18S ribosomal RNA).

Taken from the SILVA FAQs: "The sequence quality score is a combination of
the percentages of ambiguities, homopolymers longer 4 bases and possible
vector contaminations. The overall score was normalized to fit into our
unified scoring system ranging between 0 and 100 such as 100 is the best."

The formula for computing sequence quality is available in the latest SILVA
publication.

.. _User-toolkit-reference-annotation-length:

Length
--------

=========== =========================================
Input type  Number
Modifiers   <, >
Multipliers AND, max. 2
=========== =========================================

The **Length** field specifies lower and upper bounds on the length of
retrieved sequences.

.. _User-toolkit-reference-sequence:

Sequence search
===============

The description of what the Sequence search tool does is here:
:ref:`User-toolkit-search-sequence`. The individual input fields:

.. _User-toolkit-reference-sequence-mincoverage:

Min. coverage
-------------

=========== ==============================================
Input type  Select: 40, 65, 80, 85, 90, 95, 98, 99, 100
Modifiers
Multipliers
=========== ==============================================

The **Min. coverage** field defines the minimum relative length of the
aligned region in the query sequence above which results should be reported.

.. _User-toolkit-reference-sequence-minidentity:

Min. identity
-------------

=========== ==============================================
Input type  Select: 40, 65, 80, 85, 90, 95, 98, 99, 100
Modifiers
Multipliers
=========== ==============================================

The **Min. identity** field defines the minimum identity in the alignment
above which results should be reported.

.. _User-toolkit-reference-sequence-sequence:

Sequence
--------

=========== ==========================================
Input type  String (allowed characters: A, C, G, T, U)
Modifiers
Multipliers OR, max. 10
=========== ==========================================

The **Sequence** input field expects an RNA sequence to which the similarity
of rPredictorDB records should be computed. If several sequences are given,
rPredictorDB will output all records whose similarity to at least one of the
given sequences exceeds the given minimum.

.. _User-toolkit-reference-sequence-FASTAfile:

Fasta file
-----------

=========== =====================================
Input type  File (will open an upload dialogue)
Modifiers
Multipliers
=========== =====================================

Instead of copying the sequence and header, it is possible to upload a FASTA
file with query sequences.

.. _User-toolkit-reference-cppredict:

CP-predict2
===========

A short description of what CP-predict2 does is here:
:ref:`User-toolkit-prediction-cppredict`.

A detailed description is here: :ref:`User-cp-predict`.

.. _User-toolkit-reference-cppredict-FASTAheader:

FASTA header
------------

=========== =====================================
Input type  String
Modifiers
Multipliers
=========== =====================================

The FASTA header contains the name under which the sequence should be kept
during prediction. Currently, CP-predict2 uses the header only in the output.

.. _User-toolkit-reference-cppredict-sequence:

Sequence
--------

=========== ============================================================================
Input type  String (allowed characters: A, C, G, T, U, M, R, W, S, Y, K, V, H, D, B, N)
Modifiers
Multipliers
=========== ============================================================================

The **Sequence** input field expects a DNA or RNA sequence. Wildcards (M, R,
W, S, Y, K, V, H, D, B, N) will be replaced with the most probable
possibility and the resulting sequence will be passed to the prediction
algorithm.

.. note::

    Up to 10 sequences can be predicted at one time.

.. _User-toolkit-reference-cppredict-template:

Template
--------

=========== ==========================================
Input type  Select (automatic, 1C2X, 4V6X - B2, etc.)
Modifiers
Multipliers
=========== ==========================================

A template that should be used for the prediction (see
:ref:`User-cp-predict`). The exact list of possible values is extracted when
the webpage is loaded, according to the templates currently in the rData.
With the 'automatic' option, each query sequence is globally aligned with all
templates and the template from the most probable alignment is used for the
prediction.

.. _User-toolkit-reference-cppredict-zscore:

z-score
-------

=========== =====================================
Input type  Checkbox
Modifiers
Multipliers
=========== =====================================

Whether to compute the probability of the predicted structure.

.. note::

    Computing the z-score will noticeably slow down the prediction, because
    it requires generating random sequences and predicting their structures.
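
The character set accepted by the CP-predict2 **Sequence** field and the limit of 10 sequences per prediction can be checked locally before submitting a query. The following is only a minimal sketch of such a check and is not part of rPredictorDB: the names ``invalid_characters`` and ``check_batch`` are hypothetical, and treating lower-case input as equivalent to upper-case is an assumption, not documented behaviour.

```python
# Hypothetical pre-submission check; not part of rPredictorDB itself.
BASES = set("ACGTU")            # plain nucleotides listed for the Sequence field
WILDCARDS = set("MRWSYKVHDBN")  # wildcard codes listed for the Sequence field
MAX_SEQUENCES = 10              # limit stated in the note for CP-predict2


def invalid_characters(sequence):
    """Return the sorted list of characters the Sequence field does not list.

    Assumption: input is upper-cased first, i.e. case is ignored.
    """
    return sorted(set(sequence.upper()) - BASES - WILDCARDS)


def check_batch(sequences):
    """Return True if the batch respects both limits, else raise ValueError."""
    if len(sequences) > MAX_SEQUENCES:
        raise ValueError(
            "%d sequences given, at most %d allowed"
            % (len(sequences), MAX_SEQUENCES)
        )
    for index, sequence in enumerate(sequences, start=1):
        bad = invalid_characters(sequence)
        if bad:
            raise ValueError(
                "sequence %d contains unsupported characters: %s"
                % (index, ", ".join(bad))
            )
    return True


if __name__ == "__main__":
    print(check_batch(["ACGUACGU", "ACGTNNRY"]))
    print(invalid_characters("ACGX-"))
```

Such a check only mirrors the input constraints described above; it does not reproduce how CP-predict2 resolves wildcards to their most probable possibility.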