This chapter explains the meaning of the individual fields in the record detail returned by a successful search.
Individual tools may add tool-specific fields to the detail. These are described separately in the section Tool-specific sections.
5.2. Primary structure
This section describes the sequence: its length, quality-related fields, date of publication, further description (including EMBL classification), taxonomic information and the sequence itself.
5.2.1. Start position
The position in the molecule identified by the Accession number at which the rPredictorDB record sequence starts.
An rPredictorDB record is uniquely identified by the triplet accession, start, stop.
5.2.2. Stop position
The position in the molecule identified by the Accession number at which the rPredictorDB record sequence ends.
An rPredictorDB record is uniquely identified by the triplet accession, start, stop.
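The identifying triplet can be pictured as a small immutable key. The sketch below is illustrative only: the class name RecordKey, its field names and the accession value are hypothetical and are not part of rPredictorDB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordKey:
    """Hypothetical key for a record: (accession, start, stop)."""
    accession: str
    start: int
    stop: int


# Two records cut from the same molecule stay distinct as long as
# their start or stop positions differ. The accession is made up.
first = RecordKey("ACC0001", 1, 1500)
second = RecordKey("ACC0001", 1600, 3100)

# frozen=True makes the key hashable, so it can index a dictionary.
records = {first: "first region", second: "second region"}
```

Because the key compares by value, building the same triplet again finds the same record.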
5.2.3. Region length
The number of bases participating in the rPredictorDB entry.
This field is directly searchable by the Annotations search, see Length.
5.2.5. Updated (ENA rel.)
The date when the record was last updated in the source database (ENA) and the corresponding ENA release version.
5.2.6. Annotation source
Sources of sequence annotations are listed here (EMBL, RNAmmer, etc.).
5.2.7. Sequence quality
Sequence quality is a percentage expressing the confidence that the sequence was sequenced correctly. It is based on ambiguities, homopolymers and vector contamination. For the exact formula, see the section on Quality control in the latest SILVA article.
Note
The information is available for records from the SILVA database only.
This field is directly searchable by the Annotations search, see Minimal sequence quality.
5.2.8. Pintail quality
Pintail quality indicates the probability that the sequence is not an anomalous product of the sequencing procedure. The Pintail application was used to determine this number. For more details, read the original article on Pintail.
Note
The information is available for records from the SILVA database only.
5.2.9. Alignment quality
Alignment quality is an indicator of how consistent the given sequence is with other sequences of the same type. For computing this value we use the Rfam seed alignment.
Note
The information is available for records from the Rfam database only.
5.2.10. Sequence
The sequence itself. Note that aside from A, C, G and T/U, it may contain codes for ambiguous residues. Here is a complete list of ambiguous nucleotide IUPAC codes.
Note
The sequence display window can be resized, and the sequence can be shown in a separate window for easier copy-pasting.
This field is directly searchable by the Sequence search, see Sequence.
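For orientation, the standard IUPAC ambiguity codes for nucleotides can be written as a lookup table. The following is a minimal sketch; the names IUPAC_AMBIGUOUS and ambiguous_positions are hypothetical helpers and are not provided by rPredictorDB.

```python
# Standard IUPAC ambiguity codes; each maps to the bases it may stand for.
# T is used here; in RNA sequences read U in place of T.
IUPAC_AMBIGUOUS = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def ambiguous_positions(sequence):
    """Return (zero-based index, code) pairs for every ambiguous residue."""
    return [
        (index, code)
        for index, code in enumerate(sequence.upper())
        if code in IUPAC_AMBIGUOUS
    ]


# Example with a made-up sequence: N and R are the ambiguous residues.
found = ambiguous_positions("ACGUNRA")
```

A check like this can be useful before passing a copied sequence to a tool that accepts only unambiguous bases.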
5.3. Secondary structure
This section contains information about the secondary structure predicted for the sequence. The text under the section heading identifies the version of the tool used to obtain the prediction and the template used. Measures of the quality of the prediction are also shown.
Aside from the predicted structure and its visualization, a list of Structural features is returned.
The number of occurrences in the predicted structure is given for all of the structural feature entities described here. The regions where the individual structural features occur can be viewed in detail by clicking on Show next to the number of their occurrences.
Some features consist of more than one region. Regions belonging to the same feature are always ordered from the 5’-end of the molecule. Region indexing starts at zero (the 5’-overhang always starts at position 0).
5.3.1. Template similarity
The length of the alignment between the record (query) sequence and the template sequence, divided by the average of the query and template lengths.
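The definition above amounts to a single division. The sketch below only restates it; the function name and parameter names are hypothetical, and how the alignment length itself is measured (for example, whether gap columns are counted) is not specified here.

```python
def template_similarity(alignment_length, query_length, template_length):
    """Alignment length divided by the mean of the two sequence lengths."""
    mean_length = (query_length + template_length) / 2
    return alignment_length / mean_length


# Made-up numbers: a 1200-column alignment between a 1500-base query
# and a 1300-base template gives 1200 / 1400.
similarity = template_similarity(1200, 1500, 1300)
```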
5.3.2. z-score
A measure of the quality of the prediction (see the Wikipedia article on the z-score).
5.3.3. 5’-overhangs (Foverhangs)
For a description of the 5’-overhang, see 5’-overhang. There is always only one overhang and it consists of one region, starting at the first position (0) in the structure.
5.3.4. 3’-overhangs (Toverhangs)
For a description of the 3’-overhang, see 3’-overhang. There is always only one overhang and it consists of one region, ending at the last position in the structure.
5.3.5. Helices
For a description of what helices are, see Helix. Helices always consist of two regions.
5.3.6. Hairpins
For a description of what hairpins are, see Hairpin loop. Hairpins always consist of one region.
5.3.7. Bulges
For a description of what bulges are, see Bulge. Bulges always consist of two regions.
5.3.8. Loops (internal loops)
For a description of what internal loops are, see Internal loop. Internal loops always consist of two regions.
5.3.9. Junctions
For a description of what junctions are, see Junction. Junctions consist of a variable number of regions, but always at least three. (A two-region junction is called an Internal loop.)
5.3.10. Structure
The predicted secondary structure in dot-paren representation. The structure display window can be resized, and the structure can be shown in a separate window for easier copy-pasting.
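In dot-paren notation a dot marks an unpaired base and each matching pair of parentheses marks a base pair. A minimal sketch of reading such a string with a stack follows. The function name base_pairs is hypothetical, and the sketch assumes the string contains only '.', '(' and ')'; whether rPredictorDB ever emits other bracket types is not stated here.

```python
def base_pairs(structure):
    """Return zero-based (i, j) index pairs for matching parentheses."""
    stack, pairs = [], []
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)  # remember the opening position
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))  # pair with the latest opener
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


# A made-up hairpin: two stacked base pairs enclosing a three-base loop.
pairs = base_pairs("..((...))..")
```

The returned indices are zero-based, in line with the region indexing used for the structural features.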
5.3.11. RNAplot visualization
A visualization of the predicted secondary structure generated using the RNAplot tool from the Vienna RNA package. Clicking the thumbnail (or the text beneath it) generates a full-size image on the fly in a new tab/window.
5.4. References
This section describes the scientific literature where the sequence was published. There may be (and most often is) more than one reference for an rPredictorDB record.
The title of the reference is used as an underlined header for the section of the detail with the reference entity fields. Each reference contains the following fields (which we deem rather self-explanatory):
- Consortium
- Submission date
- Journal
- Year
- Volume
- Issue
- First page
- Last page
- Comment
- Reference location
- Type
- Number
- Location
- Authors
- Applicants
Any of them may be null, except for Type (which describes what kind of publication the reference was: article, submission, patent, etc.), Number (the order of the reference entity among others for the same record) and Authors.
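The nullability rule above can be pictured as a record type in which only three fields are mandatory. This is a sketch under assumptions: the class name Reference, the snake_case field names and all field types are hypothetical, and the example values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reference:
    # Fields that are never null according to the text above.
    ref_type: str   # kind of publication: article, submission, patent, ...
    number: int     # order of this reference among those of the record
    authors: str
    # All remaining fields may be null (types are assumed, not documented).
    consortium: Optional[str] = None
    submission_date: Optional[str] = None
    journal: Optional[str] = None
    year: Optional[int] = None
    volume: Optional[str] = None
    issue: Optional[str] = None
    first_page: Optional[str] = None
    last_page: Optional[str] = None
    comment: Optional[str] = None
    reference_location: Optional[str] = None
    location: Optional[str] = None
    applicants: Optional[str] = None


# Only the three mandatory fields are supplied; the rest default to None.
example = Reference(ref_type="article", number=1, authors="Doe J.")
```

Code that consumes reference fields should therefore handle missing values for everything except these three.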
5.5. Specimen
This block contains additional information added to the ENA source database at the discretion of the submission authors. It cannot be relied on to contain any specific fields. However, it often identifies the specimen used for sequencing and carries other information important enough that we decided to retain it in rPredictorDB.
5.6. Xrefs
The Xrefs section describes the sources of the information in rPredictorDB: other bioinformatics databases. There may be (and often is) more than one Xref entity for an rPredictorDB record. Each of them consists of the following fields.
5.6.1. Db
The ID of a database from which information about the rPredictorDB record was pulled.
5.6.2. Secondary id
If given, a database subset from which the rPredictorDB entry was constructed.
5.6.3. Db id
The ID under which the rPredictorDB record (or its part) can be found in the given database.