5. rPredictorDB record detail

This chapter describes what the individual fields in the record detail returned by a successful search mean.

The record detail is divided into the following sections:

  • General information
  • Primary structure
  • Secondary structure
  • References
  • Specimen
  • Xrefs

Individual tools may add tool-specific fields to the detail. These are described separately in the section Tool-specific sections.

5.1. General information

This section contains general information about the rPredictorDB record: its unique identification, taxonomy and description.

5.1.1. Accession number

The accession number uniquely identifies a molecule. This is not the same as a rPredictorDB record: multiple transcription sites coding various RNA types (5S, 16S, 23S...) may be sequenced from the same DNA molecule and processed as rPredictorDB records, so the accession number will be the same for all these records.

A rPredictorDB record is uniquely identified by the triplet accession, start, stop.

This field is directly searchable by the Annotations search, see Accession number.
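The relationship between accession numbers and records can be illustrated with a small Python sketch. It is not rPredictorDB code: the `RecordKey` type, the helper function, and the accession and coordinates are all made up for illustration. It only shows that several records may share one accession while the triplet stays unique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordKey:
    """Hypothetical key type: (accession, start, stop) identifies one record."""
    accession: str
    start: int
    stop: int


# Two made-up records cut from the same made-up molecule "AB000001".
records = {
    RecordKey("AB000001", 100, 1641): "16S",
    RecordKey("AB000001", 2000, 4903): "23S",
}


def records_for_accession(index, accession):
    """Return all keys that share the given accession, ordered by start."""
    return sorted(
        (key for key in index if key.accession == accession),
        key=lambda key: key.start,
    )
```

Looking up a record therefore needs all three values, whereas a search by accession number alone may return several records.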

5.1.2. Full organism name

The scientific name of the source organism.

This field is directly searchable by the Taxonomy search, see Organism name contains.

5.1.3. Path name

The NCBI taxonomic division to which the rPredictorDB record belongs.

This field is directly searchable by the Taxonomy search, see Taxonomy.

5.1.4. Full dataset description

The type of the sequence: ‘18S ribosomal RNA’ for sequences from the SILVA database, or a description of the corresponding family for sequences from the Rfam database. See rPredictorDB data and database.

This field is directly searchable by the Annotations search, see Molecule type.

5.1.5. Description

Additional description of the record.

This field is directly searchable by the Annotations search, see Description contains.

5.2. Primary structure

This section contains information about the sequence itself: length, fields related to sequence quality, date of publication, further description incl. EMBL classification, taxonomic information and the sequence itself.

5.2.1. Start position

The position in the molecule identified by the Accession number at which the rPredictorDB record sequence starts.

A rPredictorDB record is uniquely identified by the triplet accession, start, stop.

5.2.2. Stop position

The position in the molecule identified by the Accession number at which the rPredictorDB record sequence ends.

A rPredictorDB record is uniquely identified by the triplet accession, start, stop.

5.2.3. Region length

The number of bases participating in the rPredictorDB entry.

This field is directly searchable by the Annotations search, see Length.

5.2.4. First published

The date of first publication of the given sequence.

This field is directly searchable by the Annotations search, see First published.

5.2.5. Updated (ENA rel.)

The date when the record was last updated in the source database (ENA) and the corresponding ENA release version.

5.2.6. Annotation source

Sources of sequence annotations are listed here (EMBL, RNAmmer, etc.).

5.2.7. Sequence quality

Sequence quality is a percentage that indicates the level of confidence that the sequence was sequenced correctly. It is based on ambiguities, homopolymers and vector contamination. For the exact formula, see the section on Quality control in the latest SILVA article.

Note

The information is available for records from the SILVA database only.

This field is directly searchable by the Annotations search, see Minimal sequence quality.

5.2.8. Pintail quality

Pintail quality indicates the probability that the sequence is not an anomalous product of the sequencing procedure. The Pintail application was used to determine this number. For more details, read the original article on Pintail.

Note

The information is available for records from the SILVA database only.

5.2.9. Alignment quality

Alignment quality is an indicator of how consistent the given sequence is with other sequences of the same type. For computing this value we use the Rfam seed alignment.

Note

The information is available for records from the Rfam database only.

5.2.10. Sequence

The sequence itself. Note that aside from A, C, G and T/U, it may contain codes for ambiguous residues. Here is a complete list of ambiguous nucleotide IUPAC codes.

Note

The sequence display window can be re-sized and the sequence shown in a separate window for easier copy-pasting.

This field is directly searchable by the Sequence search, see Sequence.
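When processing a downloaded sequence, it can be useful to locate the ambiguous residues. The following Python sketch is only an illustration; the function name is hypothetical, and it assumes the sequence contains only nucleotide letters (no gaps or whitespace). The code table itself is the standard IUPAC nucleotide ambiguity table, with T standing in for U.

```python
# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC_AMBIGUOUS = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
UNAMBIGUOUS = set("ACGTU")


def ambiguous_positions(sequence):
    """Return (zero-based index, code) for every ambiguous residue."""
    hits = []
    for index, residue in enumerate(sequence.upper()):
        if residue in IUPAC_AMBIGUOUS:
            hits.append((index, residue))
        elif residue not in UNAMBIGUOUS:
            # Assumption: anything else (gaps, digits, ...) is an input error.
            raise ValueError(f"unexpected character {residue!r} at {index}")
    return hits
```

For example, `ambiguous_positions("ACGUNRY")` reports the residues at indices 4, 5 and 6.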

5.3. Secondary structure

This section contains information about the predicted secondary structure for the sequence. The text under the section heading identifies the version of the tool used to obtain the prediction and the template used. Measures of prediction quality are shown there as well.

Aside from the predicted structure and its visualization, a list of Structural features is returned.

The number of occurrences in the predicted structure is given for all of the structural feature entities described here. The regions where the individual structural features occur can be viewed in detail by clicking on Show next to the number of their occurrences.

Some features consist of more than one region. Regions belonging to the same feature are always ordered from the 5’-end of the molecule. Region indexing starts at zero (the 5’-overhang always starts at position 0).

5.3.1. Template similarity

The length of the alignment between the record sequence (the query) and the template sequence, divided by the average of the query and template lengths.
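The definition above can be written out as a short Python sketch. The function and parameter names are hypothetical, and how rPredictorDB measures the alignment length (for instance, whether gap columns are counted) is not stated here, so the three lengths are simply taken as given.

```python
def template_similarity(alignment_length, query_length, template_length):
    """Alignment length divided by the mean of the two sequence lengths."""
    average_length = (query_length + template_length) / 2
    if average_length <= 0:
        raise ValueError("sequence lengths must be positive")
    return alignment_length / average_length
```

With a query of length 100, a template of length 80 and an alignment of length 45, the average length is 90 and the similarity is 0.5.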

5.3.2. z-score

A measure of the quality of the prediction (see Wikipedia).
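For orientation, the general definition of a z-score (standard score) is the distance of a value from the mean of a reference distribution, expressed in standard deviations. The Python sketch below shows only this general definition. Which score and which reference distribution rPredictorDB uses is not described here, so the `background` values are a pure assumption.

```python
import statistics


def z_score(value, background):
    """Standard score of value relative to a list of background values."""
    mean = statistics.mean(background)
    # Population standard deviation; using the sample version instead
    # would be an equally valid assumption.
    deviation = statistics.pstdev(background)
    if deviation == 0:
        raise ValueError("background values have zero spread")
    return (value - mean) / deviation
```

A value two standard deviations above the background mean yields a z-score of 2, for example `z_score(14, [8, 12])`.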

5.3.3. 5’-overhangs (Foverhangs)

For a description of the 5’-overhang, see 5’-overhang. There is always only one overhang and it consists of one region, starting at the first position (0) in the structure.

5.3.4. 3’-overhangs (Toverhangs)

For a description of the 3’-overhang, see 3’-overhang. There is always only one overhang and it consists of one region, ending at the last position in the structure.
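The two overhang regions can be derived from a dot-paren structure string, as the following Python sketch shows. It is an illustration under assumptions, not the rPredictorDB implementation: regions are returned as half-open (start, end) pairs with zero-based indices, and an overhang with no bases is reported as an empty region.

```python
def overhang_regions(structure):
    """Return the 5' and 3' overhangs as half-open (start, end) index pairs."""
    paired = [i for i, symbol in enumerate(structure) if symbol in "()"]
    if not paired:
        # Assumption: with no base pairs at all, the whole molecule is
        # reported as the 5' overhang and the 3' overhang is empty.
        return (0, len(structure)), (len(structure), len(structure))
    five_prime = (0, paired[0])
    three_prime = (paired[-1] + 1, len(structure))
    return five_prime, three_prime
```

For the structure `..((...)).`, the 5’-overhang covers positions 0 and 1 and the 3’-overhang covers position 9.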

5.3.5. Helices

For a description of what helices are, see Helix. Helices always consist of two regions.

5.3.6. Hairpins

For a description of what hairpins are, see Hairpin loop. Hairpins always consist of one region.

5.3.7. Bulges

For a description of what bulges are, see Bulge. Bulges always consist of two regions.

5.3.8. Loops (internal loops)

For a description of what internal loops are, see Internal loop. Internal loops always consist of two regions.

5.3.9. Junctions

For a description of what junctions are, see Junction. Junctions consist of a variable number of regions, but always at least three. (A two-region junction is called an Internal loop.)

5.3.10. Structure

The predicted secondary structure in dot-paren representation. The structure display window can be re-sized and the structure shown in a separate window for easier copy-pasting.
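In the dot-paren representation, a dot marks an unpaired base and a matching pair of parentheses marks two bases that pair with each other. A copied structure string can be turned into a list of base pairs with a simple stack, as in the Python sketch below. The function name is hypothetical, and the sketch assumes the string contains only the characters `.`, `(` and `)` (no additional bracket types for pseudoknots).

```python
def base_pairs(structure):
    """Return sorted (i, j) zero-based index pairs of paired bases."""
    stack = []
    pairs = []
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((stack.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected character {symbol!r} at {index}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)
```

For `((...))`, the outer bases 0 and 6 pair with each other, as do the bases 1 and 5.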

5.3.11. RNAplot visualization

A visualization of the predicted secondary structure generated using the RNAplot tool from the Vienna RNA package. The thumbnail (or the text beneath it) can be clicked and a full-size image will be generated on the fly in a new tab/window.

5.4. References

This section describes the scientific literature where the sequence was published. There may be (and most often is) more than one reference for a rPredictorDB record.

The title of the reference is used as an underlined header for the section of the detail with the reference entity fields. Each reference contains the following fields (which we deem rather self-explanatory):

  • Consortium
  • Submission date
  • Journal
  • Year
  • Volume
  • Issue
  • First page
  • Last page
  • Comment
  • Reference location
  • Type
  • Number
  • Location
  • Authors
  • Applicants

Any of them may be null, except for Type (which describes what kind of publication the reference was: article, submission, patent, etc.), Number (the order of the reference entity among others for the same record) and Authors.

5.5. Specimen

This block contains additional information added to the ENA source database at the discretion of the submission authors. It cannot be relied on to contain any specific fields. However, it often contains identification of the specimen used for sequencing and other information important enough that we decided to retain it in rPredictorDB.

5.6. Xrefs

The Xrefs section describes the sources of the information in rPredictorDB: other bioinformatics databases. There may be (and often is) more than one Xref entity for a rPredictorDB record. Each of them consists of the following fields.

5.6.1. Db

The ID of a database from which information about the rPredictorDB record was pulled.

5.6.2. Secondary id

If given, a database subset from which the rPredictorDB entry was constructed.

5.6.3. Db id

The ID under which the rPredictorDB record (or its part) can be found in the given database.

5.7. Tool-specific sections

Some tools add specific fields to the sequence detail.

When searching with the Sequence search tool, the following sections will be available:

5.7.1. Matching hits

This section contains information about the alignment with the query sequence.

5.7.1.1. Alignment tool

The application used for the similarity search. Currently, only BLAST is used.

5.7.1.2. E-value

The significance of the hit (see the BLAST FAQ for more details).

5.7.1.3. Bitscore

A normalized score of the alignment.

5.7.1.4. Score

A raw score of the alignment.
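The relationship between the raw score, the bitscore and the E-value follows the standard Karlin-Altschul statistics used by BLAST. The Python sketch below shows these textbook formulas only. The statistical parameters `lam` and `k` depend on the scoring system and are supplied by BLAST itself, and the function names and the use of plain (rather than effective) lengths are simplifying assumptions.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score: S' = (lambda * S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_length, database_length):
    """Expected number of chance hits: E = m * n * 2 ** (-S')."""
    return query_length * database_length * 2.0 ** (-bits)
```

Because the bitscore is normalized, hits from searches with different scoring systems can be compared, and each additional bit halves the E-value for a fixed search space.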

5.7.1.5. Coverage

The length of the aligned region relative to the length of the query sequence.

5.7.1.6. Identity

The identity in the aligned region.
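Coverage and identity can be sketched in Python as follows. This is an illustration under assumptions rather than the exact rPredictorDB computation: positions are taken to be 1-based and inclusive (as in BLAST output), gaps are written as `-`, and identity is counted over all alignment columns, including gap columns. All names are hypothetical.

```python
def coverage(query_start, query_stop, query_length):
    """Fraction of the query sequence covered by the aligned region."""
    if query_length <= 0:
        raise ValueError("query length must be positive")
    return (query_stop - query_start + 1) / query_length


def identity(aligned_query, aligned_subject):
    """Fraction of alignment columns in which both residues are identical."""
    if len(aligned_query) != len(aligned_subject) or not aligned_query:
        raise ValueError("aligned sequences must be non-empty and equally long")
    matches = sum(
        1
        for q, s in zip(aligned_query, aligned_subject)
        if q == s and q != "-"
    )
    return matches / len(aligned_query)
```

An alignment spanning query positions 11 to 60 of a 100-base query has a coverage of 0.5.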

5.7.1.7. Matching start position

Start position of the alignment. The field is present twice: once for the query sequence and once for the returned sequence.

5.7.1.8. Matching stop position

Stop position of the alignment. The field is present twice: once for the query sequence and once for the returned sequence.

5.7.1.9. Matching subsequence

The part of the sequence that was matched by BLAST to the query sequence.