5. rPredictorDB record detail¶

This chapter explains the meaning of the individual fields in the record detail returned by a successful search.

The record detail is divided into the following sections:

General information
Primary structure
Secondary structure
References
Specimen
Xrefs

Individual tools may add tool-specific fields to the detail. These are described separately in the section Tool-specific sections.

5.1. General information¶

This section contains general information about the rPredictorDB record: its unique identification, taxonomy and description.

5.1.1. Accession number¶

The accession number uniquely identifies a molecule. This is not the same as a rPredictorDB record: multiple transcription sites coding various RNA types (5S, 16S, 23S...) may be sequenced from the same DNA molecule and processed as rPredictorDB records, so the accession number will be the same for all these records.

A rPredictorDB record is uniquely identified by the triplet accession, start, stop.
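
The identification scheme described above can be sketched as a composite key. This is only an illustration: the class name `RecordKey`, its field names and the sample values are hypothetical and are not part of any rPredictorDB interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordKey:
    """Hypothetical composite key: (accession, start, stop)."""
    accession: str
    start: int
    stop: int


# Two records cut from the same molecule share the accession number
# but differ in their start/stop positions (made-up values).
rrna_16s = RecordKey("XX000000", 100, 1600)
rrna_23s = RecordKey("XX000000", 2000, 4900)

# Because the dataclass is frozen, the key is hashable and can index a dict.
records = {rrna_16s: "16S", rrna_23s: "23S"}
```

The accession number alone would not distinguish the two entries; the full triplet does.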

This field is directly searchable by the Annotations search, see Accession number.

5.1.2. Full organism name¶

The scientific name of the source organism.

This field is directly searchable by the Taxonomy search, see Organism name contains.

5.1.3. Path name¶

The NCBI taxonomic division to which the rPredictorDB record belongs.

This field is directly searchable by the Taxonomy search, see Taxonomy.

5.1.4. Full dataset description¶

The type of the sequence: ‘18S ribosomal RNA’ for sequences from the SILVA database, or a description of the corresponding family for sequences from the Rfam database. See rPredictorDB data and database.

This field is directly searchable by the Annotations search, see Molecule type.

5.1.5. Description¶

Additional description of the record.

This field is directly searchable by the Annotations search, see Description contains.

5.2. Primary structure¶

This section contains information about the sequence itself: its length, fields related to sequence quality, the date of publication, further description including EMBL classification, taxonomic information, and the sequence itself.

5.2.1. Start position¶

The position in the molecule identified by the Accession number at which the rPredictorDB record sequence starts.

A rPredictorDB record is uniquely identified by the triplet accession, start, stop.

5.2.2. Stop position¶

The position in the molecule identified by the Accession number at which the rPredictorDB record sequence ends.

A rPredictorDB record is uniquely identified by the triplet accession, start, stop.

5.2.3. Region length¶

The number of bases participating in the rPredictorDB entry.

This field is directly searchable by the Annotations search, see Length.

5.2.4. First published¶

The date on which the given sequence was first published.

This field is directly searchable by the Annotations search, see First published.

5.2.5. Updated (ENA rel.)¶

The date when the record was last updated in the source database (ENA) and the corresponding ENA release version.

5.2.6. Annotation source¶

Sources of sequence annotations are listed here (EMBL, RNAmmer, etc.).

5.2.7. Sequence quality¶

Sequence quality is a percentage that indicates the level of confidence that the sequence was sequenced correctly. It is based on ambiguities, homopolymers and vector contamination. For the exact formula, see the section on Quality control in the latest SILVA article.

Note

This information is available only for records from the SILVA database.

This field is directly searchable by the Annotations search, see Minimal sequence quality.

5.2.8. Pintail quality¶

Pintail quality indicates the probability that the sequence is not an anomalous product of the sequencing procedure. The Pintail application was used to determine this number. For more details, read the original article on Pintail.

Note

This information is available only for records from the SILVA database.

5.2.9. Alignment quality¶

Alignment quality indicates how consistent the given sequence is with other sequences of the same type. The value is computed using the Rfam seed alignment.

Note

This information is available only for records from the Rfam database.

5.2.10. Sequence¶

The sequence itself. Note that aside from A, C, G and T/U, it may contain codes for ambiguous residues. Here is a complete list of ambiguous nucleotide IUPAC codes.
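
As a rough illustration of what the ambiguity codes mean, the following sketch lists the standard IUPAC nucleotide codes and locates ambiguous residues in a sequence. The names `IUPAC_CODES` and `ambiguous_positions` are hypothetical helpers, not something rPredictorDB provides, and the zero-based indices follow Python convention only.

```python
from typing import List, Tuple

# Standard IUPAC nucleotide codes mapped to the bases they may stand for
# (T is written for T/U in the ambiguous codes).
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}

UNAMBIGUOUS = set("ACGTU")


def ambiguous_positions(sequence: str) -> List[Tuple[int, str]]:
    """Return (zero-based index, code) for every ambiguous residue."""
    result = []
    for i, code in enumerate(sequence.upper()):
        if code not in IUPAC_CODES:
            raise ValueError(f"not an IUPAC nucleotide code: {code!r} at {i}")
        if code not in UNAMBIGUOUS:
            result.append((i, code))
    return result
```

For example, `ambiguous_positions("ACGNTRY")` reports `N`, `R` and `Y` at indices 3, 5 and 6.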

Note

The sequence display window can be re-sized and the sequence shown in a separate window for easier copy-pasting.

This field is directly searchable by the Sequence search, see Sequence.

5.3. Secondary structure¶

This section contains information about the predicted secondary structure for the sequence. The text under the section heading identifies the version of the tool used to obtain the prediction and the template used. Measures of the quality of the prediction are also shown.

Aside from the predicted structure and its visualization, a list of Structural features is returned.

The number of occurrences in the predicted structure is given for each of the structural feature entities described here. The regions where the individual structural features occur can be viewed in detail by clicking on Show next to the number of their occurrences.

Some features consist of more than one region. Regions belonging to the same feature are always ordered from the 5’-end of the molecule. Region indexing starts at zero (the 5’-overhang always starts at position 0).

5.3.1. Template similarity¶

Length of the alignment between sequences of the record and the template divided by the average of lengths of query and template.
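
The definition above can be written out as a short calculation. The function and parameter names are hypothetical, and the sketch returns a plain ratio; whether rPredictorDB displays the value as a fraction or a percentage is not specified here.

```python
def template_similarity(alignment_length: int,
                        query_length: int,
                        template_length: int) -> float:
    """Alignment length divided by the mean of the two sequence lengths."""
    mean_length = (query_length + template_length) / 2
    if mean_length == 0:
        raise ValueError("both sequences are empty")
    return alignment_length / mean_length
```

For a query of length 100, a template of length 80 and an alignment of length 45, the mean length is 90 and the result is 0.5.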

5.3.2. z-score¶

A measure of the quality of the prediction (see Wikipedia).

5.3.3. 5’-overhangs (Foverhangs)¶

For a description of the 5’-overhang, see 5’-overhang. There is always only one overhang and it consists of one region, starting at the first position (0) in the structure.

5.3.4. 3’-overhangs (Toverhangs)¶

For a description of the 3’-overhang, see 3’-overhang. There is always only one overhang and it consists of one region, ending at the last position in the structure.
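
To make the zero-based region indexing concrete, the sketch below derives both overhang regions from a structure in dot-paren notation (see Structure). It is an assumption-laden illustration: the function name `overhangs` is hypothetical, regions are represented as inclusive `(start, stop)` pairs, and a structure without unpaired ends yields `None`; how rPredictorDB itself stores regions is not described here.

```python
from typing import Optional, Tuple

# Zero-based, inclusive (start, stop); an assumed convention for this sketch.
Region = Tuple[int, int]


def overhangs(structure: str) -> Tuple[Optional[Region], Optional[Region]]:
    """Return the 5'- and 3'-overhang regions of a dot-paren string."""
    first_pair = structure.find("(")
    last_pair = structure.rfind(")")
    if first_pair == -1 or last_pair == -1:
        # No base pairs at all: this sketch reports no overhangs.
        return None, None
    last_index = len(structure) - 1
    # The 5'-overhang, if present, always starts at position 0.
    five = (0, first_pair - 1) if first_pair > 0 else None
    # The 3'-overhang, if present, always ends at the last position.
    three = (last_pair + 1, last_index) if last_pair < last_index else None
    return five, three
```

For the structure `..((...))...`, the 5’-overhang is `(0, 1)` and the 3’-overhang is `(9, 11)`.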

5.3.5. Helices¶

For a description of what helices are, see Helix. Helices always consist of two regions.

5.3.6. Hairpins¶

For a description of what hairpins are, see Hairpin loop. Hairpins always consist of one region.

5.3.7. Bulges¶

For a description of what bulges are, see Bulge. Bulges always consist of two regions.

5.3.8. Loops (internal loops)¶

For a description of what internal loops are, see Internal loop. Internal loops always consist of two regions.

5.3.9. Junctions¶

For a description of what junctions are, see Junction. Junctions consist of a variable number of regions, but always at least three. (A two-region junction is called an Internal loop.)

5.3.10. Structure¶

The predicted secondary structure in dot-paren representation. As with the sequence, the display window can be resized and its content shown in a separate window for easier copy-pasting.
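
In dot-paren notation, a dot marks an unpaired base and a matching pair of parentheses marks two bases that pair with each other. The following sketch recovers the base pairs from such a string. The function name `base_pairs` is hypothetical, and the sketch handles only `.`, `(` and `)`; it assumes no other bracket types occur in the string.

```python
from typing import List, Tuple


def base_pairs(structure: str) -> List[Tuple[int, int]]:
    """Return zero-based (i, j) index pairs for matching parentheses."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)  # remember the opening position
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {j}")
            pairs.append((stack.pop(), j))  # pair with the latest opener
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {j}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For `((..))` this yields the pairs `(0, 5)` and `(1, 4)`.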

5.3.11. RNAplot visualization¶

A visualization of the predicted secondary structure generated using the RNAplot tool from the Vienna RNA package. The thumbnail (or the text beneath it) can be clicked and a full-size image will be generated on the fly in a new tab/window.

5.4. References¶

This section describes the scientific literature where the sequence was published. There may be (and most often is) more than one reference for a rPredictorDB record.

The title of the reference is used as an underlined header for the section of the detail with the reference entity fields. Each reference contains the following fields (which we deem rather self-explanatory):

Consortium
Submission date
Journal
Year
Volume
Issue
First page
Last page
Comment
Reference location
Type
Number
Location
Authors
Applicants

Any of them may be null, except for Type (which describes what kind of publication the reference was: article, submission, patent, etc.), Number (the order of the reference entity among others for the same record) and Authors.

5.5. Specimen¶

This block contains additional information added to the ENA source database at the discretion of the submission authors. It cannot be relied on to contain any specific fields. However, it often contains identification of the specimen used for sequencing and other information important enough that we decided to retain it in rPredictorDB.

5.6. Xrefs¶

The Xrefs section describes the sources of the information in rPredictorDB: other bioinformatics databases. There may be (and often is) more than one Xref entity for a rPredictorDB record. Each of them consists of the following fields.

5.6.1. Db¶

The ID of a database from which information about the rPredictorDB record was pulled.

5.6.2. Secondary id¶

If given, a database subset from which the rPredictorDB entry was constructed.

5.6.3. Db id¶

The ID under which the rPredictorDB record (or its part) can be found in the given database.

5.7. Tool-specific sections¶

Some tools add specific fields to the sequence detail.

When searching with the Sequence search, the following section is available:

5.7.1. Matching hits¶

This section contains information about the alignment with the query sequence.

5.7.1.1. Alignment tool¶

The application used for the similarity search. Currently, this is always BLAST.

5.7.1.2. E-value¶

The significance of the hit (see the BLAST FAQ for more details).

5.7.1.3. Bitscore¶

A normalized score of the alignment.

5.7.1.4. Score¶

A raw score of the alignment.
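
The raw score, bit score and E-value are related through the standard BLAST (Karlin-Altschul) statistics. The sketch below shows that relationship only; the function names are hypothetical, and the parameters `lam` and `k` depend on the scoring system BLAST was run with, which is not stated here. BLAST itself also uses effective (length-corrected) query and database sizes, so reported values will generally differ from this simplified calculation.

```python
import math


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalize a raw score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits: float, query_length: int, database_length: int) -> float:
    """Expected number of chance hits: E = m * n * 2^(-S')."""
    return query_length * database_length * 2.0 ** (-bits)
```

A higher bit score gives a smaller E-value; each additional bit halves the expected number of chance hits for a search space of the same size.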

5.7.1.5. Coverage¶

The length of the aligned region relative to the length of the query sequence.

5.7.1.6. Identity¶

The identity in the aligned region.
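
Coverage and identity can be illustrated with a small calculation. This is a sketch under stated assumptions: the function names are hypothetical, positions are taken to be inclusive, identity is counted over all alignment columns with gaps written as `-`, and both values are returned as fractions rather than percentages. The exact conventions used by rPredictorDB may differ.

```python
def coverage(query_start: int, query_stop: int, query_length: int) -> float:
    """Fraction of the query covered by the aligned region."""
    aligned = query_stop - query_start + 1  # assumes inclusive positions
    return aligned / query_length


def identity(aligned_query: str, aligned_hit: str) -> float:
    """Fraction of alignment columns in which both residues are identical."""
    if len(aligned_query) != len(aligned_hit) or not aligned_query:
        raise ValueError("aligned strings must be non-empty and of equal length")
    matches = sum(
        1 for a, b in zip(aligned_query, aligned_hit) if a == b and a != "-"
    )
    return matches / len(aligned_query)
```

For example, an alignment spanning query positions 11 to 60 of a 100-base query has a coverage of 0.5.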

5.7.1.7. Matching start position¶

Start position of the alignment. The field is present twice: once for the query sequence and once for the returned sequence.

5.7.1.8. Matching stop position¶

Stop position of the alignment. The field is present twice: once for the query sequence and once for the returned sequence.

5.7.1.9. Matching subsequence¶

The part of the sequence that was matched by BLAST to the query sequence.
