This page gives a complete description of the input fields for each tool in the rPredictorDB toolkit (rTools).
Each input field is described with a short table:
Input type | A description of what input the field expects |
Modifiers | A list of possible modifiers (see below) |
Multipliers | The type and cardinality of possible multipliers (see below) |
Modifiers are extra options like “greater than” or “before” that modify the meaning of the input to the given field.
Multipliers allow several inputs to be given to one field, for example to search for sequences whose accession number is one of a given set. The type of a multiplier is either OR, which says “Find me records that match at least one of my criteria”, or AND, which says “Find me only records that match all of my criteria”. For example, the Length field has an AND multiplier: the sequence must fulfill all length constraints placed upon it.
The cardinality of a multiplier is the maximum number of inputs you can give for the multiplied field. For instance, the Accession number field has a cardinality of 10, so up to ten different accession numbers can be searched for.
Modifiers and multipliers can be combined - for instance, an interval can be specified by combining a two-valued “before” and “after” modifier with an AND type multiplier (this is the case in First published).
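The OR and AND semantics described above can be illustrated with a minimal sketch. It is not rPredictorDB code: the `combine` helper, the record dictionaries and their field names are hypothetical, and only the cardinalities (10 for Accession number, 2 for Length) are taken from this page.

```python
from typing import Callable

Predicate = Callable[[dict], bool]


def combine(predicates: list[Predicate], kind: str, max_cardinality: int) -> Predicate:
    """Combine per-input predicates the way an OR or AND multiplier would."""
    if len(predicates) > max_cardinality:
        raise ValueError(f"at most {max_cardinality} inputs are allowed")
    if kind == "OR":
        # A record matches if at least one criterion holds.
        return lambda record: any(p(record) for p in predicates)
    if kind == "AND":
        # A record matches only if every criterion holds.
        return lambda record: all(p(record) for p in predicates)
    raise ValueError(f"unknown multiplier type: {kind}")


# Hypothetical records; accession numbers other than DQ815398 are made up.
records = [
    {"accession": "DQ815398", "length": 1450},
    {"accession": "AB000001", "length": 900},
    {"accession": "XY123456", "length": 1800},
]

# OR multiplier: match any of the given accession numbers.
by_accession = combine(
    [lambda r: r["accession"] == "DQ815398", lambda r: r["accession"] == "AB000001"],
    kind="OR",
    max_cardinality=10,
)

# AND multiplier with ">" and "<" modifiers: an interval of lengths.
by_length = combine(
    [lambda r: r["length"] > 1000, lambda r: r["length"] < 1600],
    kind="AND",
    max_cardinality=2,
)

or_hits = [r["accession"] for r in records if by_accession(r)]
and_hits = [r["accession"] for r in records if by_length(r)]
```

Here `or_hits` contains the two listed accession numbers, while `and_hits` contains only the record whose length lies inside the interval.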
The description of what the Taxonomy search tool does is here: Taxonomy search.
The individual input fields:
Input type | String (case-insensitive) |
Modifiers | |
Multipliers |
The Organism name contains field expects an organism name (or a part of it). It is case-insensitive.
Combining the Molecule type input field and organism name can yield RNA of the desired type for the given organism.
Input type | Select (nested): Archaea, Bacteria, Eukaryota |
Modifiers | |
Multipliers |
The Taxonomy field expects a selection from the given taxa. Inner nodes are expandable, i.e. the next level of the phylogenetic tree becomes available as another select-box. This way, any number of selections from the top to the bottom of the phylogenetic tree can be made.
A number behind a taxon name indicates how many records belong to the taxon (independently of other input fields).
The description of what the Annotations search tool does is here: Annotations search.
The individual input fields:
Input type | Select (any, 18S ribosomal RNA, Lysine riboswitch, MicF RNA, etc.) |
Modifiers | |
Multipliers | AND, max. 5 |
The Molecule type field specifies what kind of molecules should be retrieved. The exact list of possible values is extracted at the time of loading the webpage according to the current sequences in the rData.
Combining molecule type and Organism name contains can yield the desired RNA for the given organism.
Input type | String (case-insensitive) |
Modifiers | |
Multipliers |
The Description contains field expects any string. The “Description” field contents are not constrained; usually it will be something along the lines of:
Cerasicoccus maritimus gene for 16S rRNA, partial sequence, strain: YM31-114.
Input type | String (a code like DQ815398) |
Modifiers | |
Multipliers | OR, max. 10 |
The Accession number is a code that uniquely identifies a sequenced molecule. Note that the molecule may be either RNA or DNA, so several rRNA sequences derived from the same DNA may share the same accession number.
Input type | Date |
Modifiers | before; after |
Multipliers | AND, max. 2 |
The First published field limits the date of the first publication of retrieved sequences. An interval can be specified by using the AND multiplier. A calendar will pop up on selecting the input field; select the date there.
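As a rough illustration of how the “after” and “before” modifiers define an interval, consider the sketch below. The function name is hypothetical, and treating both bounds as exclusive is an assumption; this page does not state whether the boundary dates themselves are included.

```python
from datetime import date
from typing import Optional


def first_published_matches(
    first_published: date,
    after: Optional[date] = None,
    before: Optional[date] = None,
) -> bool:
    """Check a publication date against optional 'after' and 'before' bounds.

    Assumption: both bounds are exclusive.
    """
    if after is not None and not first_published > after:
        return False
    if before is not None and not first_published < before:
        return False
    return True


# Both constraints together (AND multiplier) describe an interval.
in_2010 = first_published_matches(
    date(2010, 6, 15), after=date(2009, 12, 31), before=date(2011, 1, 1)
)
```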
Input type | Select (0, 50, 90, 95, 99) |
Modifiers | |
Multipliers |
The Minimal sequence quality field restricts the search to records that have been sequenced with a high degree of precision.
Warning
This constraint is available only for records originating from the SILVA database (i.e. 18S ribosomal RNA).
Taken from SILVA FAQs: “The sequence quality score is a combination of the percentages of ambiguities, homopolymers longer 4 bases and possible vector contaminations. The overall score was normalized to fit into our unified scoring system ranging between 0 and 100 such as 100 is the best.” The formula for computing sequence quality is available in the latest SILVA publication.
Input type | Number |
Modifiers | <, > |
Multipliers | AND, max. 2 |
The Length field sets lower and upper bounds on the length of retrieved sequences.
The description of what the Sequence search tool does is here: Sequence search.
The individual input fields:
Input type | Select: 40, 65, 80, 85, 90, 95, 98, 99, 100 |
Modifiers | |
Multipliers |
The Min. coverage field defines the minimum relative length of the aligned region in the query sequence above which results should be reported.
Input type | Select: 40, 65, 80, 85, 90, 95, 98, 99, 100 |
Modifiers | |
Multipliers |
The Min. identity field defines the minimum identity in the alignment above which results should be reported.
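To make the two thresholds concrete, the following sketch computes coverage and identity for a single pairwise alignment. The formulas are common definitions and are an assumption here; the exact formulas used by the Sequence search tool are not given on this page, and all function names are hypothetical. Treating the thresholds as inclusive is also an assumption.

```python
def coverage_percent(query_length: int, aligned_start: int, aligned_end: int) -> float:
    """Percentage of the query covered by the aligned region (end exclusive)."""
    return 100.0 * (aligned_end - aligned_start) / query_length


def identity_percent(aligned_query: str, aligned_subject: str) -> float:
    """Percentage of alignment columns in which both rows hold the same residue."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned rows must have the same length")
    matches = sum(
        1 for q, s in zip(aligned_query, aligned_subject) if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


def is_reported(coverage: float, identity: float, min_coverage: int, min_identity: int) -> bool:
    """Keep a hit only if it reaches both minimum values (assumed inclusive)."""
    return coverage >= min_coverage and identity >= min_identity


# A 20-base query aligned over positions 2..20, with one mismatch in 10 columns.
cov = coverage_percent(query_length=20, aligned_start=2, aligned_end=20)
ident = identity_percent("ACGUACGUAC", "ACGUACGAAC")
```

With these inputs, the hit would pass a Min. coverage of 85 and a Min. identity of 90, but not a Min. coverage of 95.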
Input type | String (allowed characters: A, C, G, T, U) |
Modifiers | |
Multipliers | OR, max. 10 |
The Sequence input field expects an RNA sequence to which the similarity of rPredictorDB records should be computed. If several sequences are given, rPredictorDB will output all records whose similarity to at least one of the given sequences exceeds the given minimum.
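Before submitting, query sequences can be checked locally against the constraints listed for this field. The sketch below is only an illustration: the function name is hypothetical, and normalizing input to upper case is an assumption, since this page does not say whether lower-case letters are accepted.

```python
ALLOWED_CHARACTERS = set("ACGTU")  # allowed characters listed for this field
MAX_SEQUENCES = 10                 # cardinality of the OR multiplier


def validate_queries(sequences: list[str]) -> list[str]:
    """Return cleaned query sequences or raise ValueError on invalid input."""
    if len(sequences) > MAX_SEQUENCES:
        raise ValueError(f"at most {MAX_SEQUENCES} sequences can be given")
    cleaned = []
    for sequence in sequences:
        # Assumption: surrounding whitespace and letter case are not significant.
        normalized = sequence.strip().upper()
        unexpected = sorted(set(normalized) - ALLOWED_CHARACTERS)
        if not normalized:
            raise ValueError("empty sequence")
        if unexpected:
            raise ValueError(f"characters not allowed: {''.join(unexpected)}")
        cleaned.append(normalized)
    return cleaned


queries = validate_queries(["acgu", "GGCCTTAA"])
```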
Input type | File (will open an upload dialogue) |
Modifiers | |
Multipliers |
Instead of pasting a sequence and header, it is possible to upload a FASTA file with query sequences.
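For readers unfamiliar with the format, a FASTA file consists of header lines starting with `>` followed by one or more lines of sequence. A minimal reader, given here only as a sketch with a hypothetical function name, could look like this:

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: list[tuple[str, str]] = []
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        elif header is None:
            raise ValueError("sequence data found before the first header")
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


example = """>query1 first example
ACGUACGU
ACGU
>query2
GGCC
"""
parsed = parse_fasta(example)
```

Sequence lines belonging to one record are joined, so `query1` above yields a single 12-base sequence.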
A short description of what CP-predict2 does is here: CP-predict2
A detailed description is here: CP-predict: a two-phase algorithm for rRNA structure prediction
Input type | String |
Modifiers | |
Multipliers |
The FASTA header field contains the name under which the sequence should be kept during prediction. Currently, CP-predict uses the header only in the output.
Input type | String (allowed characters: A, C, G, T, U, M, R, W, S, Y, K, V, H, D, B, N) |
Modifiers | |
Multipliers |
The Sequence input field expects a DNA or RNA sequence. Wildcards (M, R, W, S, Y, K, V, H, D, B, N) will be replaced with the most probable possibility and the resulting sequence will be passed to the prediction algorithm.
Note
Up to 10 sequences can be predicted at one time.
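The wildcard letters follow the IUPAC nucleotide codes, where each letter stands for a set of possible bases. How rPredictorDB decides which base is “most probable” is not described on this page; the sketch below uses caller-supplied base probabilities purely as a stand-in for that step, and the function name is hypothetical.

```python
# IUPAC ambiguity codes and the bases each one may represent.
IUPAC_WILDCARDS = {
    "M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT",
    "V": "ACG", "H": "ACT", "D": "AGT", "B": "CGT", "N": "ACGT",
}


def resolve_wildcards(sequence: str, base_probability: dict[str, float]) -> str:
    """Replace each wildcard with its most probable base.

    Assumption: 'most probable' is decided by the given per-base
    probabilities; the real tool may use a different criterion.
    """
    resolved = []
    for symbol in sequence.upper():
        if symbol in IUPAC_WILDCARDS:
            candidates = IUPAC_WILDCARDS[symbol]
            resolved.append(max(candidates, key=lambda base: base_probability[base]))
        elif symbol in "ACGTU":
            resolved.append(symbol)
        else:
            raise ValueError(f"character not allowed: {symbol}")
    return "".join(resolved)


# Hypothetical base probabilities, for illustration only.
probabilities = {"A": 0.2, "C": 0.3, "G": 0.4, "T": 0.1}
resolved_sequence = resolve_wildcards("ACNRY", probabilities)
```

In this example, N and R both resolve to G and Y resolves to C, because those are the most probable bases among the candidates each code allows.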
Input type | Select (automatic, 1C2X, 4V6X - B2, etc.) |
Modifiers | |
Multipliers |
The template that should be used for the prediction (see CP-predict: a two-phase algorithm for rRNA structure prediction). The exact list of possible values is extracted at the time of loading the webpage according to the current templates in the rData. With the ‘automatic’ option, each query sequence is globally aligned with all templates and the template from the most probable alignment is used for the prediction.
Input type | Checkbox |
Modifiers | |
Multipliers |
Whether to compute the probability of the predicted structure.
Note
Computing the z-score will noticeably slow down the prediction, as it requires generating random sequences and predicting their structures.
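The cost comes from repeating the prediction many times. A z-score compares the score of the real sequence with the mean and standard deviation of scores obtained for random sequences. The sketch below shows the general idea only: generating random sequences by shuffling the query, the `score` callback and the toy scoring function are all assumptions and do not reflect how CP-predict scores structures.

```python
import random
import statistics
from typing import Callable


def z_score(
    sequence: str,
    score: Callable[[str], float],
    n_random: int = 100,
    seed: int = 0,
) -> float:
    """Z-score of score(sequence) against scores of shuffled copies."""
    if n_random < 2:
        raise ValueError("at least two random sequences are needed")
    rng = random.Random(seed)
    observed = score(sequence)
    background = []
    for _ in range(n_random):
        letters = list(sequence)
        rng.shuffle(letters)  # assumption: random sequences are shuffles of the query
        background.append(score("".join(letters)))  # one extra scoring run per sample
    mean = statistics.mean(background)
    spread = statistics.stdev(background)
    if spread == 0:
        raise ValueError("background scores have no variance")
    return (observed - mean) / spread


def toy_score(sequence: str) -> float:
    """Stand-in for a structure score: the number of 'GC' dinucleotides."""
    return float(sum(1 for i in range(len(sequence) - 1) if sequence[i:i + 2] == "GC"))


z = z_score("GCGCGCAUAUAUGCGC", toy_score, n_random=50, seed=1)
```

Every random sequence requires its own scoring run, so with `n_random=50` the scoring function is called 51 times instead of once.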