4. The rPredictorDB Toolkit Reference

This chapter gives a complete description of the input fields of each tool in the rPredictorDB toolkit (rTools).

Taxonomy search
Annotations search
Sequence search
CP-predict2

Each input field is described with a short table:

Input type	A description of what input the field expects
Modifiers	A list of possible modifiers (see below)
Multipliers	The type and cardinality of possible multipliers (see below)

Modifiers are extra options like “greater than” or “before” that modify the meaning of the input to the given field.

Multipliers allow several inputs to be given to one field, for example to search for sequences with any of several accession numbers. The type of a multiplier is either OR, which says “Find me records that match at least one of my criteria”, or AND, which says “Find me only records that match all of my criteria”. For example, the Length field has an AND multiplier: the sequence must fulfill all length constraints placed upon it.

The cardinality of a multiplier is the maximum number of inputs you can give for the multiplied field. For instance, the Accession number field has a cardinality of 10, so up to ten different accession numbers can be searched for at once.

Modifiers and multipliers can be combined. For instance, an interval can be specified by combining the two-valued “before”/“after” modifier with an AND multiplier (this is the case in First published).
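The OR and AND semantics described above can be sketched in a few lines of Python. This is only an illustration of the logic, not rPredictorDB code: the record layout, the field names, the lengths, and every accession number other than DQ815398 are made up for the example.

```python
from typing import Callable, Iterable

Predicate = Callable[[dict], bool]


def combine(predicates: Iterable[Predicate], kind: str) -> Predicate:
    """Combine several single-input predicates the way a multiplier would."""
    preds = list(predicates)
    if kind == "OR":
        return lambda record: any(p(record) for p in preds)
    if kind == "AND":
        return lambda record: all(p(record) for p in preds)
    raise ValueError(f"unknown multiplier type: {kind}")


# Hypothetical records; field names and values are illustrative only.
records = [
    {"accession": "DQ815398", "length": 1450},
    {"accession": "AB000001", "length": 900},
    {"accession": "XY123456", "length": 1800},
]

# OR multiplier: a record matches if it has any of the listed accession numbers.
by_accession = combine(
    [lambda r, a=a: r["accession"] == a for a in ("DQ815398", "AB000001")],
    "OR",
)

# AND multiplier with the ">" and "<" modifiers: 1000 < length < 2000.
by_length = combine(
    [lambda r: r["length"] > 1000, lambda r: r["length"] < 2000],
    "AND",
)

or_hits = [r["accession"] for r in records if by_accession(r)]
and_hits = [r["accession"] for r in records if by_length(r)]
```

Here `or_hits` contains the two listed accession numbers, while `and_hits` contains only the records whose length satisfies both bounds.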

4.1. Taxonomy search

The description of what the Taxonomy search tool does is here: Taxonomy search.

The individual input fields:

4.1.1. Organism name contains

Input type	String (case-insensitive)
Modifiers
Multipliers

The Organism name contains field expects an organism name or a part of one. Matching is case-insensitive.

Combining the Molecule type input field and organism name can yield RNA of the desired type for the given organism.
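A case-insensitive “contains” match of the kind this field performs could look as follows. This is a minimal sketch; how rPredictorDB implements the match internally is not documented here, and the function name is hypothetical.

```python
def name_contains(organism_name: str, query: str) -> bool:
    """Return True if `query` occurs in `organism_name`, ignoring case."""
    # casefold() is a more aggressive lower() intended for caseless matching.
    return query.casefold() in organism_name.casefold()


match = name_contains("Cerasicoccus maritimus", "MARIT")
no_match = name_contains("Cerasicoccus maritimus", "coli")
```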

4.1.2. Taxonomy

Input type	Select (nested): Archaea, Bacteria, Eukaryota
Modifiers
Multipliers

The Taxonomy field expects a selection from the given taxa. Inner nodes are expandable, i.e. the next level of the phylogenetic tree becomes available as another select-box. This way, any number of selections from the top to the bottom of the phylogenetic tree can be made.

The number after a taxon name indicates how many records belong to that taxon (independently of other input fields).
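The nested select-boxes and their record counts can be pictured as a count kept for every prefix of each record's lineage. The sketch below assumes each record carries its lineage as a tuple of taxon names; the lineages and the resulting counts are illustrative and do not reflect the contents of rPredictorDB.

```python
from collections import Counter

# Illustrative lineages, one per record (top of the tree first).
lineages = [
    ("Bacteria", "Proteobacteria", "Gammaproteobacteria"),
    ("Bacteria", "Proteobacteria", "Alphaproteobacteria"),
    ("Bacteria", "Firmicutes"),
    ("Archaea", "Euryarchaeota"),
]

# Count every record once for each taxon on its path from the root.
counts = Counter()
for lineage in lineages:
    for depth in range(1, len(lineage) + 1):
        counts[lineage[:depth]] += 1


def children(prefix: tuple) -> dict:
    """Return the taxa one level below `prefix`, with their record counts."""
    n = len(prefix)
    return {
        path[-1]: count
        for path, count in counts.items()
        if len(path) == n + 1 and path[:n] == prefix
    }


top_level = children(())
below_bacteria = children(("Bacteria",))
```

Calling `children` with a longer prefix corresponds to expanding one more select-box.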

4.2. Annotations search

The description of what the Annotations search tool does is here: Annotations search.

The individual input fields:

4.2.1. Molecule type

Input type	Select (any, 18S ribosomal RNA, Lysine riboswitch, MicF RNA, etc.)
Modifiers
Multipliers	AND, max. 5

The Molecule type field specifies what kind of molecules should be retrieved. The exact list of possible values is extracted at the time of loading the webpage according to the current sequences in the rData.

Combining molecule type and Organism name contains can yield the desired RNA for the given organism.

4.2.2. Description contains

Input type	String (case-insensitive)
Modifiers
Multipliers

The Description contains field expects any string. The “Description” field contents are not constrained; usually it will be something along the lines of:

Cerasicoccus maritimus gene for 16S rRNA, partial sequence, strain: YM31-114.

4.2.3. Accession number

Input type	String (a code like DQ815398)
Modifiers
Multipliers	OR, max. 10

The Accession number is a code that uniquely identifies a sequenced molecule. Note that this molecule may be either an RNA or a DNA, so several rRNA sequences derived from the same DNA may share the same accession number.

4.2.4. First published

Input type	Date
Modifiers	before; after
Multipliers	AND, max. 2

The First published field limits the date of the first publication of retrieved sequences. An interval can be specified by using the AND multiplier. A calendar will pop up on selecting the input field; select the date there.
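The interval semantics can be written down as a small check. In this sketch both bounds are treated as strict, which is an assumption: whether rPredictorDB includes the boundary dates themselves is not stated here. The function name and the dates are illustrative.

```python
from datetime import date
from typing import Optional


def in_interval(first_published: date,
                after: Optional[date] = None,
                before: Optional[date] = None) -> bool:
    """Apply the "after" and "before" modifiers, combined with AND."""
    if after is not None and not first_published > after:
        return False
    if before is not None and not first_published < before:
        return False
    return True


inside = in_interval(date(2010, 6, 15), after=date(2010, 1, 1), before=date(2011, 1, 1))
on_boundary = in_interval(date(2011, 1, 1), after=date(2010, 1, 1), before=date(2011, 1, 1))
```

Leaving one of the two bounds unset corresponds to using only a single “before” or “after” input.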

4.2.5. Minimal sequence quality

Input type	Select (0, 50, 90, 95, 99)
Modifiers
Multipliers

The Minimal sequence quality field restricts the search to records that have been sequenced with a high degree of precision.

Warning

This constraint is available only for records originating from the SILVA database (i.e. 18S ribosomal RNA).

Taken from SILVA FAQs: “The sequence quality score is a combination of the percentages of ambiguities, homopolymers longer 4 bases and possible vector contaminations. The overall score was normalized to fit into our unified scoring system ranging between 0 and 100 such as 100 is the best.” The formula for computing sequence quality is available in the latest SILVA publication.

4.2.6. Length

Input type	Number
Modifiers	<, >
Multipliers	AND, max. 2

The Length field specifies a lower and/or an upper bound on the length of retrieved sequences.

4.3. Sequence search

The description of what the Sequence search tool does is here: Sequence search.

The individual input fields:

4.3.1. Min. coverage

Input type	Select: 40, 65, 80, 85, 90, 95, 98, 99, 100
Modifiers
Multipliers

The Min. coverage field defines the minimum relative length of the aligned region in the query sequence (in percent) above which results should be reported.

4.3.2. Min. identity

Input type	Select: 40, 65, 80, 85, 90, 95, 98, 99, 100
Modifiers
Multipliers

The Min. identity field defines the minimum identity in the alignment above which results should be reported.
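To make the two thresholds concrete, the sketch below computes coverage and identity from a gapped pairwise alignment. The definitions used (coverage as aligned query residues over query length, identity as identical columns over alignment columns) are common conventions and are assumptions here; the exact formulas used by the rPredictorDB search backend are not given in this reference.

```python
from typing import Tuple


def coverage_and_identity(aligned_query: str, aligned_hit: str,
                          query_length: int) -> Tuple[float, float]:
    """Return (coverage %, identity %) for one gapped pairwise alignment.

    Both strings are rows of the same alignment, with "-" marking gaps.
    """
    if len(aligned_query) != len(aligned_hit):
        raise ValueError("aligned strings must have equal length")
    columns = len(aligned_query)
    if columns == 0 or query_length <= 0:
        raise ValueError("empty alignment or query")
    # Query residues that take part in the aligned region.
    query_residues = sum(1 for c in aligned_query if c != "-")
    # Columns where both rows carry the same residue.
    matches = sum(
        1 for q, h in zip(aligned_query, aligned_hit) if q == h and q != "-"
    )
    coverage = 100.0 * query_residues / query_length
    identity = 100.0 * matches / columns
    return coverage, identity


# A 10-nt query of which 8 residues are aligned (toy data).
coverage, identity = coverage_and_identity("ACGU-ACGU", "ACGUAACCU", 10)
passes = coverage >= 80 and identity >= 65
```

With thresholds of 80 for Min. coverage and 65 for Min. identity, this toy hit would be reported under the assumed definitions.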

4.3.3. Sequence

Input type	String (allowed characters: A, C, G, T, U)
Modifiers
Multipliers	OR, max. 10

The Sequence input field expects an RNA sequence to which the similarity of rPredictorDB records should be computed. If more than one sequence is given, rPredictorDB will output all records whose similarity to at least one of the given sequences exceeds the given minimum.
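A query can be checked against the allowed alphabet before it is submitted. This is a minimal client-side sketch; treating lower-case letters as valid is an assumption, since this reference does not say how the form handles case.

```python
ALLOWED = set("ACGTU")


def invalid_characters(sequence: str) -> list:
    """Return the sorted set of characters not allowed in a query sequence."""
    # Assumption: lower-case input is accepted and treated as upper-case.
    return sorted(set(sequence.upper()) - ALLOWED)


bad = invalid_characters("ACGUXACGN")
ok = invalid_characters("acgtu")
```

An empty list means the sequence uses only the allowed characters.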

4.3.4. Fasta file

Input type	File (will open an upload dialogue)
Modifiers
Multipliers

Instead of copying sequence and header, it is possible to upload a FASTA file with query sequences.
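For reference, a FASTA file consists of records that each start with a header line beginning with ">" followed by one or more sequence lines. The sketch below reads such text into (header, sequence) pairs; it is a minimal parser for illustration, not the one used by rPredictorDB, and the headers in the example are made up.

```python
def parse_fasta(text: str) -> list:
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = ">query1 toy sequence\nACGU\nACGU\n\n>query2\nGGCC\n"
parsed = parse_fasta(example)
```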

4.4. CP-predict2

A short description of what CP-predict2 does is here: CP-predict2.

A detailed description is here: CP-predict: a two-phase algorithm for rRNA structure prediction

4.4.1. FASTA header

Input type	String
Modifiers
Multipliers

The FASTA header field contains the name under which the sequence should be kept during prediction. Currently, CP-predict uses the header only in the output.

4.4.2. Sequence

Input type	String (allowed characters: A, C, G, T, U, M, R, W, S, Y, K, V, H, D, B, N)
Modifiers
Multipliers

The Sequence input field expects a DNA or RNA sequence. Wildcards (M, R, W, S, Y, K, V, H, D, B, N) will be replaced with the most probable possibility and the resulting sequence will be passed to the prediction algorithm.

Note

Up to 10 sequences can be predicted at one time.
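The wildcard letters accepted by the Sequence field are the standard IUPAC ambiguity codes, each standing for a set of possible bases. The sketch below shows one way such codes could be resolved to a single base. How CP-predict2 decides which base is “most probable” is not described in this reference, so the weights used here are purely hypothetical, as is the function name; the sketch also uses the DNA alphabet (T rather than U) for the candidates.

```python
# Standard IUPAC nucleotide ambiguity codes (DNA alphabet).
IUPAC = {
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def resolve_wildcards(sequence: str, base_weight: dict) -> str:
    """Replace each wildcard by its candidate base with the highest weight."""
    resolved = []
    for symbol in sequence.upper():
        candidates = IUPAC.get(symbol)
        if candidates is None:
            resolved.append(symbol)  # plain base: keep as is
        else:
            resolved.append(
                max(candidates, key=lambda b: base_weight.get(b, 0.0))
            )
    return "".join(resolved)


# Hypothetical weights; the real criterion used by CP-predict2 may differ.
weights = {"A": 0.25, "C": 0.22, "G": 0.32, "T": 0.21}
resolved = resolve_wildcards("ACRNT", weights)
```

With these weights, R (A or G) and N (any base) both resolve to G, the candidate with the largest weight.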

4.4.3. Template

Input type	Select (automatic, 1C2X, 4V6X - B2, etc.)
Modifiers
Multipliers

The template that should be used for the prediction (see CP-predict: a two-phase algorithm for rRNA structure prediction). The exact list of possible values is extracted at the time of loading the webpage according to the current templates in the rData. With the ‘automatic’ option, each query sequence is globally aligned with all templates, and the template from the most probable alignment is used for the prediction.
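The idea behind the ‘automatic’ option can be sketched as scoring a global alignment of the query against every template and keeping the best one. The scoring scheme below (a plain Needleman-Wunsch score with fixed match, mismatch and gap values) is a stand-in chosen for the example; the actual alignment method and the notion of “most probable” used by CP-predict2 are not specified here. Template names and sequences are hypothetical.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1,
                           gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    # prev[j] holds the best score for aligning a[:i-1] with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * gap]
        for j, cb in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if ca == cb else mismatch)
            curr.append(max(diagonal, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def pick_template(query: str, templates: dict) -> str:
    """Return the name of the template with the highest alignment score."""
    return max(
        templates,
        key=lambda name: global_alignment_score(query, templates[name]),
    )


# Hypothetical templates; real template sequences are far longer.
templates = {"template_a": "ACGUACGU", "template_b": "GGGGCCCC"}
chosen = pick_template("ACGUACGA", templates)
```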

4.4.4. z-score

Input type	Checkbox
Modifiers
Multipliers

Whether to compute the probability of the predicted structure.

Note

Computing the z-score will noticeably slow down the prediction, as it requires generating random sequences and predicting their structures.
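The cost comes from the general shape of a z-score computation: the score of the real sequence is compared with the scores of many randomized sequences, each of which must be processed in full. The sketch below shows that pattern with shuffled copies of the input and a toy scoring function. Both the randomization scheme and `toy_score` are assumptions made for illustration; they are not the procedure CP-predict2 uses.

```python
import random
import statistics
from typing import Callable

PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A")}


def toy_score(seq: str) -> int:
    """Toy stand-in for a structure score: complementary mirrored positions."""
    return sum((a, b) in PAIRS for a, b in zip(seq, reversed(seq)))


def z_score(sequence: str, score: Callable[[str], float],
            n_random: int = 100, seed: int = 0) -> float:
    """Compare the score of `sequence` with scores of shuffled copies."""
    rng = random.Random(seed)
    observed = score(sequence)
    background = []
    for _ in range(n_random):
        letters = list(sequence)
        rng.shuffle(letters)  # keeps base composition, destroys order
        background.append(score("".join(letters)))
    spread = statistics.stdev(background)
    if spread == 0:
        raise ValueError("background scores do not vary; z-score undefined")
    return (observed - statistics.mean(background)) / spread


z = z_score("GGGGGCCCCC", toy_score)
```

A large positive value indicates that the real sequence scores well above its shuffled counterparts. The loop over `n_random` sequences is what multiplies the running time.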
