.. _Technical-architecture:

Architecture
*************

The rPredictorDB infrastructure has several major components:

* a database of RNA sequences and secondary structures (*rData*) and
  Extraction-Transformation-Load mechanisms to build rData (*rETL*),
* a set of tools that perform standard tasks on the data, such as similarity
  search or secondary structure prediction (*rTools*),
* an implementation of a new algorithm dedicated to RNA secondary structure
  prediction (*CP-predict*),
* an internet portal and back-end that make the rData and rTools components
  accessible to the research community and the general public (*rWeb*),
* and finally documentation, split into User, Technical and API Reference
  parts (*rDoc*).

.. figure:: architecture.svg
   :align: center
   :figwidth: 650
   :width: 650

   A high-level overview of the rPredictorDB architecture

We will now describe the individual components of rPredictorDB.

.. _Technical-architecture-rData:

rData
=============

The **rData** component is further divided into parts. The major part is the
rPredictorDB PostgreSQL database, **rDB**. The rDB holds the rPredictorDB
*dataset*: all the information about RNA that is available for searching in
rPredictorDB.

Databases for individual tools are also grouped under the label of rData.
These databases do not offer any extra information; they are merely extracted
from rData (or directly from its sources) and re-formatted for efficient use
by individual tools. The tool that currently needs this kind of re-formatting
of the whole rPredictorDB dataset is :ref:`User-toolkit-search-sequence`.
The :ref:`User-toolkit-search-taxonomy` and
:ref:`User-toolkit-search-annotation` tools, on the other hand, query the
database directly. (More on this in the section on
:ref:`Technical-architecture-rWeb`.)

The dataset is generated by combining information from external sources (the
SILVA, Rfam, ENA and Taxonomy-NCBI databases). This process is handled by the
:ref:`Technical-architecture-rETL` component.
A full description of rDB itself is found in :ref:`Technical-rData`. The
process of generating tool-specific representations of the rPredictorDB
dataset is described in :ref:`Technical-setup`. A higher-level description of
the dataset is available in the User documentation, in :ref:`User-rData`.

.. _Technical-architecture-rETL:

rETL
=============

The *rETL* (Extraction - Transformation - Load) component of rPredictorDB
handles downloading and processing data from various sources in order to
populate rDB with the rPredictorDB dataset. The process has many steps, from
automated queries to parallelized processing of secondary structure
predictions.

A detailed description of this component can be found in the section
:ref:`Technical-rDataETL`.

.. _Technical-architecture-rTools:

rTools
==============

Numerous external tools are integrated within rPredictorDB. ("External" here
means "not a part of rWeb".) They provide the functionality that users come
for, such as various methods of similarity search or secondary structure
prediction, as well as auxiliary functions like gluing various input/output
formats together. The **rTools** component is a label under which this
collection of external (and partially internal) tools is kept.

.. warning::

   The :ref:`Technical-architecture-rWeb` component takes a different view of
   what a "tool" is. In rWeb, a tool is a PHP class that integrates some
   search or prediction functionality into the rPredictorDB website. rTools,
   on the other hand, is a collection of programs that stand *outside* the
   rWeb component.

Not all tools are third-party: rTools also includes
:ref:`Technical-setup-cppredict`, a Matlab program that implements
:ref:`User-cp-predict` (and some more utilities).
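
The distinction between an rWeb tool class and an rTools program can be
illustrated with a minimal sketch. It is written in Python purely for
illustration (the actual rWeb tool classes are PHP), and the names
``ExternalTool``, ``build_command``, ``run`` and ``reverse_tool`` are
hypothetical, not part of rPredictorDB. The wrapper class plays the role of a
tool class; the program it launches stands in for an external tool.

.. code-block:: python

    import subprocess
    import sys
    from dataclasses import dataclass
    from typing import List


    @dataclass
    class ExternalTool:
        """Hypothetical stand-in for a tool class wrapping an external program."""

        executable: str        # path to the external program
        fixed_args: List[str]  # arguments that do not change between queries

        def build_command(self, query: str) -> List[str]:
            # The query is passed as its own argument and no shell is used,
            # so the query text is never interpreted as shell syntax.
            return [self.executable, *self.fixed_args, query]

        def run(self, query: str, timeout: float = 30.0) -> str:
            # Execute the external program and collect its standard output.
            completed = subprocess.run(
                self.build_command(query),
                capture_output=True,
                text=True,
                timeout=timeout,
                check=True,  # raise CalledProcessError on a non-zero exit code
            )
            return completed.stdout.strip()


    # Stand-in "external tool": the Python interpreter, reversing a sequence.
    # A real deployment would point at a search or prediction program instead.
    reverse_tool = ExternalTool(
        executable=sys.executable,
        fixed_args=["-c", "import sys; print(sys.argv[1][::-1])"],
    )

Calling ``reverse_tool.run("ACGU")`` launches the stand-in program and
returns ``UGCA``. Keeping the wrapper this thin means the external program
can be replaced or upgraded without changing the code that calls ``run``.
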

The connections between rTools and the other components (rData, rETL and the
rWeb tool classes) merit further explanation:

* The rETL component uses the secondary structure prediction and analysis
  capabilities in rTools to obtain structural information about the
  rPredictorDB dataset. (This information is not present in the source
  databases.)
* The rData component contains tool-specific exports of the dataset that the
  tools use as they run. This includes special databases for similarity
  search tools and the infrastructure used by CP-predict.
* Finally, the tool classes of rWeb execute the external tools based on the
  user's search/prediction query and collect their results.

An overview of rTools is practically synonymous with the list of requirements
of the type "install a library/tool/package" in :ref:`Technical-setup`.
Furthermore, as new functionality is made available through rWeb, the rTools
component will grow accordingly.

.. _Technical-architecture-rWeb:

rWeb
============

The **rWeb** component serves as a presentation layer for collected and
generated data. The web application is written in PHP with the Nette
Framework.

The rWeb component is the most complex component of rPredictorDB. It is
organized into a layered architecture, with client-side scripts on the user
end and a pipeline that runs a user's query through presenters, parsers and
finally tools, and then returns the results to the user.

The detailed description of the rWeb design is available in the section
:ref:`Technical-rWeb`.

.. _Technical-architecture-rDoc:

rDoc
============

The **rDoc** component, the rPredictorDB documentation, is split into three
major groups: User, Technical and API references. The User and Technical
documentation are generated using the Sphinx library from a central
repository.
The API reference documentation is further split into documentation for
individual components, as each component (and, in the case of rTools, each
sub-component) has its own API reference, often in incompatible formats. The
User and Technical documentation, generated in HTML form, is integrated
directly into rWeb; the API reference for rWeb is available as a part of the
rPredictorDB site as well.