.. _User-detail:

rPredictorDB record detail
**************************

This chapter describes what the individual fields in the record detail returned by a successful search mean. 

The record detail is divided into the following sections:

* :ref:`User-detail-record`

* :ref:`User-detail-primary`

* :ref:`User-detail-predictions`

* :ref:`User-detail-references`

* :ref:`User-detail-features`

* :ref:`User-detail-xrefs`

Individual tools may add tool-specific fields to the detail. These are described separately in the section :ref:`User-detail-toolspecific`.


.. _User-detail-record:

General information
===================

This section contains general information about the rPredictorDB record: its unique identification, taxonomy and description.


.. _User-detail-record-accessionnumber:

Accession number
----------------

The accession number uniquely identifies a molecule. This is **not** the same as a rPredictorDB record: multiple transcription sites coding various RNA types (5S, 16S, 23S...) may be sequenced from the same DNA molecule and processed as rPredictorDB records, so the accession number will be the same for all these records. 

A rPredictorDB record is uniquely identified by the triplet ``accession, start, stop``.
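
The following minimal Python sketch illustrates why the whole triplet is needed as a key. The ``RecordKey`` name, the accession value and the coordinates are all hypothetical and serve only as an illustration; they are not part of rPredictorDB.

.. code-block:: python

   from typing import NamedTuple


   class RecordKey(NamedTuple):
       """Hypothetical composite key: accession number plus start and stop."""
       accession: str
       start: int
       stop: int


   # Made-up example values: two different regions of the same molecule,
   # plus one exact duplicate of the first region.
   records = [
       RecordKey("AB000001", 10, 1500),
       RecordKey("AB000001", 2000, 4900),
       RecordKey("AB000001", 10, 1500),
   ]

   # The accession alone cannot tell the two regions apart ...
   unique_accessions = {key.accession for key in records}
   # ... while the full triplet can.
   unique_records = set(records)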

This field is directly searchable by the :ref:`User-toolkit-reference-annotation`, see :ref:`User-toolkit-reference-annotation-accession`. 


.. _User-detail-record-organismname:

Full organism name
------------------

The scientific name of the source organism.

This field is directly searchable by the :ref:`User-toolkit-reference-taxonomy`, see :ref:`User-toolkit-reference-taxonomy-organismnamecontains`. 


.. _User-detail-record-taxonomicdivision:

Path name
---------

The NCBI taxonomic division to which the rPredictorDB record belongs.

This field is directly searchable by the :ref:`User-toolkit-reference-taxonomy`, see :ref:`User-toolkit-reference-taxonomy-taxonomy`.


.. _User-detail-primary-moleculetype:

Full dataset description
------------------------

The type of the sequence: '18S ribosomal RNA' for sequences from the SILVA database, or a description of the corresponding family for sequences from the Rfam database. See :ref:`User-rData`.

This field is directly searchable by the :ref:`User-toolkit-reference-annotation`, see :ref:`User-toolkit-reference-annotation-moleculetype`.


.. _User-detail-record-description:

Description
-------------

Additional description of the record.

This field is directly searchable by the :ref:`User-toolkit-reference-annotation`, see :ref:`User-toolkit-reference-annotation-descriptioncontains`. 


.. _User-detail-primary:

Primary structure
=================

This section contains information about the sequence itself: length, fields related to sequence quality, date of publication, further description including EMBL classification, taxonomic information and the sequence itself.


.. _User-detail-primary-startposition:

Start position
--------------

The position in the molecule identified by the :ref:`User-detail-record-accessionnumber` at which the rPredictorDB record sequence starts.

A rPredictorDB record is uniquely identified by the triplet ``accession, start, stop``.


.. _User-detail-primary-stopposition:

Stop position
-------------

The position in the molecule identified by the :ref:`User-detail-record-accessionnumber` at which the rPredictorDB record sequence ends.

A rPredictorDB record is uniquely identified by the triplet ``accession, start, stop``.


.. _User-detail-primary-regionlength:

Region length
-------------

The number of bases participating in the rPredictorDB entry.

This field is directly searchable by the :ref:`User-toolkit-reference-annotation`, see :ref:`User-toolkit-reference-annotation-length`. 



.. _User-detail-primary-firstpublished:

First published
----------------

The date of first publication of the given sequence.

This field is directly searchable by the :ref:`User-toolkit-reference-annotation`, see :ref:`User-toolkit-reference-annotation-firstpublished`. 



.. _User-detail-primary-updated:

Updated (ENA rel.)
--------------------

The date when the record was last updated in the source database (ENA) and the corresponding ENA release version.


.. _User-detail-primary-annotationsource:

Annotation source
------------------

Sources of sequence annotations are listed here (EMBL, RNAmmer, etc.).


.. _User-detail-primary-sequencequality: 

Sequence quality
----------------

Sequence quality is a percentage that indicates the level of confidence that the sequence was sequenced correctly. It is based on ambiguities, homopolymers and vector contamination. For the exact formula, see the section on quality control `in the latest SILVA article <http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3531112/>`_.

.. note:: The information is available for records from the SILVA database only.

This field is directly searchable by the :ref:`User-toolkit-reference-annotation`, see :ref:`User-toolkit-reference-annotation-minimalsequencequality`. 


.. _User-detail-primary-pintailquality:

Pintail quality
---------------

Pintail quality indicates the probability that the sequence is not an anomalous product of the sequencing procedure. The Pintail application was used to determine this number. For more details, read the `original article on Pintail <http://www.ncbi.nlm.nih.gov/pubmed/16332745>`_.

.. note:: The information is available for records from the SILVA database only.


.. _User-detail-primary-alignmentquality:

Alignment quality
-----------------

Alignment quality indicates how consistent the given sequence is with other sequences of the same type. This value is computed using the `Rfam seed alignment <http://rfam.xfam.org/help#tabview=tab3>`_.

.. note:: The information is available for records from the Rfam database only.


.. _User-detail-primary-sequence:

Sequence
---------

The sequence itself. Note that aside from A, C, G and T/U, it may contain codes for ambiguous residues. Here is `a complete list of ambiguous nucleotide IUPAC codes <http://www.bioinformatics.org/sms/iupac.html>`_.

.. note:: The sequence display window can be resized and the sequence shown in a separate window for easier copy-pasting.

This field is directly searchable by the :ref:`User-toolkit-reference-sequence`, see :ref:`User-toolkit-reference-sequence-sequence`. 
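
After copying a sequence from the detail, it can be useful to check where the ambiguous residues are. The sketch below is only an illustration and not part of rPredictorDB: the function name ``ambiguous_positions`` is hypothetical, the table lists the standard IUPAC ambiguity codes in DNA notation (read ``T`` as ``U`` for RNA), and the reported positions are zero-based.

.. code-block:: python

   # Standard IUPAC nucleotide ambiguity codes and the bases they stand for.
   IUPAC_AMBIGUOUS = {
       "R": "AG", "Y": "CT", "S": "GC", "W": "AT", "K": "GT", "M": "AC",
       "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
   }


   def ambiguous_positions(sequence):
       """Return (zero-based index, code) pairs for every ambiguous residue."""
       return [
           (index, base)
           for index, base in enumerate(sequence.upper())
           if base in IUPAC_AMBIGUOUS
       ]


   # Made-up sequence: N, R and Y are ambiguous, the other residues are not.
   print(ambiguous_positions("ACGUNRYA"))  # [(4, 'N'), (5, 'R'), (6, 'Y')]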


.. _User-detail-predictions:

Secondary structure
===================

This section contains information about the predicted secondary structure for the sequence. The text under the section heading identifies the version of the tool used to obtain the prediction and the template that was used. Measures of the quality of the prediction are shown as well.

Aside from the predicted structure and its visualization, a list of :ref:`User-structural-features` is returned.

The number of occurrences in the predicted structure is given for all of the structural feature entities described here. The regions where the individual structural features occur can be viewed in detail by clicking on ``Show`` next to the number of their occurrences.

Some features consist of more than one region. Regions belonging to the same feature are always ordered from the 5'-end of the molecule. Region indexing starts at zero (the 5'-overhang always starts at position ``0``). 


.. _User-detail-predictions-similarity:

Template similarity
-------------------

The length of the alignment between the sequence of the record (the query) and the sequence of the template, divided by the average of the query and template lengths.
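
The definition above can be sketched as follows. This is only an illustration of the arithmetic, not the code used by rPredictorDB: the function name and the example lengths are hypothetical, and how the alignment length itself is measured (for example, whether gap columns are counted) is not specified in this chapter.

.. code-block:: python

   def template_similarity(alignment_length, query_length, template_length):
       """Alignment length divided by the average of the two sequence lengths."""
       average_length = (query_length + template_length) / 2
       if average_length == 0:
           raise ValueError("query and template must not both be empty")
       return alignment_length / average_length


   # Made-up lengths: the average of 1000 and 800 is 900, and 720 / 900 = 0.8.
   similarity = template_similarity(720, 1000, 800)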


.. _User-detail-predictions-zscore:

z-score
-------

A measure of the quality of the prediction (see the `Wikipedia article on the standard score <https://en.wikipedia.org/wiki/Standard_score>`_).
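
As a reminder of the general definition only (this chapter does not specify which raw score or which reference distribution rPredictorDB uses), the standard score of a raw value :math:`x` with respect to a distribution with mean :math:`\mu` and standard deviation :math:`\sigma` is:

.. math::

   z = \frac{x - \mu}{\sigma}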


.. _User-detail-predictions-Foverhangs:

5'-overhangs (Foverhangs)
-------------------------

For a description of the 5'-overhang, see :ref:`User-structural-features-foverhang`. There is always only one overhang and it consists of one region, starting at the first position (``0``) in the structure.


.. _User-detail-predictions-toverhangs:

3'-overhangs (Toverhangs) 
-------------------------

For a description of the 3'-overhang, see :ref:`User-structural-features-toverhang`. There is always only one overhang and it consists of one region, ending at the last position in the structure.


.. _User-detail-predictions-helices:

Helices
-------

For a description of what helices are, see :ref:`User-structural-features-helix`. Helices always consist of two regions.


.. _User-detail-predictions-hairpins:

Hairpins
--------

For a description of what hairpins are, see :ref:`User-structural-features-hairpin`. Hairpins always consist of one region.


.. _User-detail-predictions-bulges:

Bulges
------

For a description of what bulges are, see :ref:`User-structural-features-bulge`. Bulges always consist of two regions.


.. _User-detail-predictions-loops:

Loops (internal loops)
----------------------

For a description of what internal loops are, see :ref:`User-structural-features-internal`. Internal loops always consist of two regions.


.. _User-detail-predictions-junctions:

Junctions
---------

For a description of what junctions are, see :ref:`User-structural-features-junction`. Junctions consist of a variable number of regions, but always at least three. (A two-region junction is called an :ref:`User-structural-features-internal`.)


.. _User-detail-predictions-structure:

Structure
---------

The predicted secondary structure in dot-paren representation. The structure display window can be resized and the structure shown in a separate window for easier copy-pasting.
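
A copied dot-paren string can be turned into a list of base pairs with a simple stack. The sketch below is an illustration only and not part of rPredictorDB: the function name ``parse_dot_paren`` and the example structure are hypothetical, positions are zero-based, and the sketch assumes that only the symbols ``.``, ``(`` and ``)`` occur (other bracket types are rejected).

.. code-block:: python

   def parse_dot_paren(structure):
       """Return a sorted list of zero-based (i, j) base-pair positions."""
       stack = []  # positions of '(' that have not been closed yet
       pairs = []
       for index, symbol in enumerate(structure):
           if symbol == "(":
               stack.append(index)
           elif symbol == ")":
               if not stack:
                   raise ValueError(f"unmatched ')' at position {index}")
               pairs.append((stack.pop(), index))
           elif symbol != ".":
               raise ValueError(f"unexpected symbol {symbol!r} at position {index}")
       if stack:
           raise ValueError(f"unmatched '(' at position {stack[-1]}")
       return sorted(pairs)


   # Made-up structure: a two-base-pair helix closing a three-base hairpin loop.
   print(parse_dot_paren("..((...))."))  # [(2, 8), (3, 7)]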


.. _User-detail-prediction-rnaplotvisualization:

RNAplot visualization
----------------------

A visualization of the predicted secondary structure generated using the ``RNAplot`` tool from the `Vienna RNA package <http://www.tbi.univie.ac.at/RNA/>`_. Clicking the thumbnail (or the text beneath it) generates a full-size image on the fly in a new tab/window.


.. _User-detail-references:

References
==========

This section describes the scientific literature where the sequence was published. There may be (and most often is) more than one reference for a rPredictorDB record. 

The title of the reference is used as an underlined header for the section of the detail with the reference entity fields. Each reference contains the following fields (which we deem rather self-explanatory):

* Consortium
* Submission date
* Journal
* Year
* Volume
* Issue
* First page
* Last page
* Comment
* Reference location
* Type
* Number
* Location
* Authors
* Applicants

Any of them may be ``null``, except for ``Type`` (which describes what kind of publication the reference was: article, submission, patent, etc.), ``Number`` (the order of the reference entity among others for the same record) and ``Authors``.
 

.. _User-detail-features:

Specimen
========

This block contains additional information added to the ENA source database at the discretion of the submission authors. It cannot be relied on to contain any specific fields. However, it often contains identification of the specimen used for sequencing and other information that we considered important enough to retain in rPredictorDB.
 


.. _User-detail-xrefs:

Xrefs
=====

The Xrefs section describes the sources of the information in rPredictorDB: other bioinformatics databases. There may be (and often is) more than one Xref entity for a rPredictorDB record. Each of them consists of the following fields.

.. _User-detail-xrefs-db:

Db
----

The ID of a database from which information about the rPredictorDB record was pulled. 


.. _User-detail-xrefs-secondary-id:

Secondary id
--------------

If given, a database subset from which the rPredictorDB entry was constructed.

.. _User-detail-xrefs-db-id:

Db id
-----------

The ID under which the rPredictorDB record (or its part) can be found in the given database.


.. _User-detail-toolspecific:

Tool-specific sections
======================

Some tools add specific fields to the sequence detail.

When searching with the Sequence tool, the following sections will be available:


.. _User-detail-toolspecific-sequence:

Matching hits
-------------

This section contains information about the alignment with the query sequence.


.. _User-detail-toolspecific-sequence-tool:

Alignment tool
^^^^^^^^^^^^^^

The application used for the similarity search. Currently, only BLAST is used.


.. _User-detail-toolspecific-sequence-evalue:

E-value
^^^^^^^

The significance of the hit (see the `BLAST FAQ <https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=FAQ#expect>`_ for more details).


.. _User-detail-toolspecific-sequence-bitscore:

Bitscore
^^^^^^^^

A `normalized score <https://www.ncbi.nlm.nih.gov/books/NBK62051/>`_ of the alignment.


.. _User-detail-toolspecific-sequence-score:

Score
^^^^^

A `raw score <https://www.ncbi.nlm.nih.gov/books/NBK62051/>`_ of the alignment.


.. _User-detail-toolspecific-sequence-coverage:


Coverage
^^^^^^^^

The length of the aligned region relative to the length of the query sequence.


.. _User-detail-toolspecific-sequence-identity:

Identity
^^^^^^^^

The identity in the aligned region.


.. _User-detail-toolspecific-sequence-startposition:

Matching start position
^^^^^^^^^^^^^^^^^^^^^^^

Start position of the alignment. The field is present twice: once for the query sequence and once for the returned sequence.


.. _User-detail-toolspecific-sequence-stopposition:

Matching stop position
^^^^^^^^^^^^^^^^^^^^^^

Stop position of the alignment. The field is present twice: once for the query sequence and once for the returned sequence.


.. _User-detail-toolspecific-sequence-matchingsequence:

Matching subsequence
^^^^^^^^^^^^^^^^^^^^

The part of the sequence that was matched by BLAST to the query sequence.