1. Architecture

The rPredictorDB infrastructure has several major components:

  • a database of RNA sequences and secondary structures (rData) and Extraction-Transformation-Load mechanisms to build rData (rETL),
  • a set of tools that perform standard tasks on the data like similarity search or secondary structure prediction (rTools),
  • an implementation of a new algorithm dedicated to RNA secondary structure prediction (CP-predict),
  • an internet portal and back-end that makes the rData and rTools components accessible to the research community and general public (rWeb),
  • and finally documentation, split into User, Technical and API Reference parts (rDoc).

Figure: A high-level overview of the rPredictorDB architecture.

We will now describe the individual components of rPredictorDB.

1.1. rData

The rData component is further divided into parts. The major part is the rPredictorDB PostgreSQL database, rDB. The rDB holds the rPredictorDB dataset: all the information about RNA that is available for searching in rPredictorDB.

Databases for individual tools are also grouped under the rData label. These databases do not offer any extra information; they are merely extracted from rData (or directly from its sources) and re-formatted for efficient use by individual tools. Currently, the tool that needs this kind of re-formatting of the whole rPredictorDB dataset is Sequence search. The Taxonomy search and Annotations search, on the other hand, query the database directly. (More on this in the section on rWeb.)

The dataset is generated by combining information from external sources (SILVA, Rfam, ENA and Taxonomy-NCBI databases). This process is handled by the rETL component.

A full description of rDB itself is found in The Data of rPredictorDB.

The process of generating tool-specific representations of the rPredictorDB dataset is described in the rPredictorDB setup.
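As an illustration of what a tool-specific export can look like, the sketch below renders (identifier, sequence) pairs as FASTA text. It is a minimal sketch only: the use of FASTA, the function name to_fasta and the sample rows are assumptions made for this example, not a description of the actual export used by Sequence search.

```python
from typing import Iterable, Tuple


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Render (identifier, sequence) pairs as FASTA text."""
    lines = []
    for identifier, sequence in records:
        lines.append(f">{identifier}")
        # Wrap each sequence at a fixed line width.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    # Terminate every line with a newline; no records yields an empty string.
    return "".join(line + "\n" for line in lines)


# Hypothetical rows, standing in for the result of a query against rDB.
rows = [("seq1", "ACGUACGUAC"), ("seq2", "GGCC")]
fasta_text = to_fasta(rows, width=4)
```

The export adds no information of its own; it only changes the representation, which matches the role described for the tool databases above.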

A more high-level description of the dataset is available in the User documentation, in rPredictorDB data and database.

1.2. rETL

The rETL (Extraction - Transformation - Load) component of rPredictorDB handles downloading and processing data from various sources in order to populate rDB with the rPredictorDB dataset. The process has many steps, from automated queries to parallelized processing of secondary structure predictions.
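The overall shape of such a process can be sketched as follows. This is an illustrative outline under stated assumptions, not the rETL implementation: all function names, the in-memory dictionary standing in for rDB, the sample records and the placeholder prediction step are hypothetical. A thread pool is used here only to show where the parallelized prediction step would sit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def extract() -> List[Dict[str, str]]:
    # Stand-in for downloading from a source database; records are invented.
    return [
        {"id": "demo1", "sequence": "acgtacgt"},
        {"id": "demo2", "sequence": "GGCCTTAA"},
    ]


def transform(record: Dict[str, str]) -> Dict[str, str]:
    # Normalize to the upper-case RNA alphabet (T becomes U).
    sequence = record["sequence"].upper().replace("T", "U")
    return {"id": record["id"], "sequence": sequence}


def predict_structure(record: Dict[str, str]) -> Dict[str, str]:
    # Placeholder for a call into a prediction tool: returns an
    # all-unpaired dot-bracket string of the same length.
    return {**record, "structure": "." * len(record["sequence"])}


def load(records: List[Dict[str, str]], database: Dict[str, Dict[str, str]]) -> None:
    # Stand-in for inserting rows into the database.
    for record in records:
        database[record["id"]] = record


def run_etl(database: Dict[str, Dict[str, str]], workers: int = 2) -> int:
    transformed = [transform(record) for record in extract()]
    # The prediction step is independent per record, so it can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        predicted = list(pool.map(predict_structure, transformed))
    load(predicted, database)
    return len(predicted)


database: Dict[str, Dict[str, str]] = {}
loaded_count = run_etl(database)
```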

A detailed description of this component can be found in the section The ETL layer of rPredictorDB.

1.3. rTools

Numerous external tools are integrated within rPredictorDB. (“External” here means “not a part of rWeb”.) They provide the useful functionality, such as various methods of similarity search or secondary structure prediction, as well as auxiliary functions such as gluing various input/output formats together. The rTools component is the label under which this collection of external (and partially internal) tools is kept.

Warning

Note that there is a different perspective on what a “tool” is from the point of view of the rWeb component. In rWeb, a tool is a PHP class that integrates some search or prediction functionality into the rPredictorDB website. rTools, on the other hand, is a collection of programs that stand outside the rWeb component.

Not all tools are third-party: rTools also includes Cppredict, a Matlab program that implements CP-predict, a two-phase algorithm for rRNA structure prediction, along with some additional utilities.

The connections between rTools and the other components (rData, rETL and the rWeb tool classes) merit further explanation:

  • The rETL component utilizes secondary structure prediction and analysis capability in rTools to obtain structural information about the rPredictorDB dataset. (This information is not present in the source databases.)
  • The rData component contains tool-specific exports of the dataset that the tools utilize as they run. This includes special databases for similarity search tools or the infrastructure used by CP-predict.
  • Finally, the tool classes of rWeb execute the external tools based on the user’s search/prediction query and collect their results.
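The last relationship, a tool class running an external program and collecting its results, can be sketched as below. The rWeb tool classes themselves are PHP classes; this Python sketch only illustrates the pattern. The function name run_external_tool and the stand-in command (a Python one-liner that echoes its input in upper case) are assumptions made for the example and do not correspond to any actual rTools program.

```python
import subprocess
import sys
from typing import List


def run_external_tool(command: List[str], query: str, timeout: float = 30.0) -> List[str]:
    """Run a command, pass the query on stdin, return non-empty output lines."""
    completed = subprocess.run(
        command,
        input=query,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return [line for line in completed.stdout.splitlines() if line.strip()]


# Stand-in "tool": echoes its standard input in upper case.
demo_command = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]
demo_hits = run_external_tool(demo_command, "acgu")
```

Passing the command as a list rather than a shell string avoids shell interpretation of user-supplied input, which matters when the query originates from a web form.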

An overview of rTools is practically synonymous with the list of requirements of the type “install a library/tool/package” in rPredictorDB setup. Furthermore, as new functionality is made available through rWeb, the rTools component will grow accordingly.

1.4. rWeb

The rWeb component serves as a presentation layer for collected and generated data. The web application is written in PHP with the Nette Framework.

The rWeb component is the most complex component of rPredictorDB. It is organized into a layered architecture, with client-side scripts on the user end and a pipeline that runs a user’s query through presenters, parsers and finally tools, and then returns the results to the user.
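The layering of presenter, parser and tool can be illustrated with a small sketch. rWeb itself is written in PHP with Nette, so the Python code below is not its implementation; the names Query, parse_query, substring_search_tool and present, as well as the in-memory dataset, are hypothetical and serve only to show how responsibilities might be separated.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Query:
    sequence: str


def parse_query(raw: str) -> Query:
    # Parser layer: normalize and validate raw user input.
    sequence = "".join(raw.split()).upper().replace("T", "U")
    if not sequence or set(sequence) - set("ACGU"):
        raise ValueError("query must be a non-empty nucleotide sequence")
    return Query(sequence)


def substring_search_tool(query: Query, dataset: Dict[str, str]) -> List[str]:
    # Tool layer: naive substring match, standing in for a real search tool.
    return sorted(name for name, seq in dataset.items() if query.sequence in seq)


def present(raw: str, dataset: Dict[str, str]) -> dict:
    # Presenter layer: drives the pipeline and shapes the response.
    try:
        query = parse_query(raw)
    except ValueError as error:
        return {"ok": False, "error": str(error), "hits": []}
    return {"ok": True, "error": None, "hits": substring_search_tool(query, dataset)}


demo_dataset = {"demo1": "ACGUACGU", "demo2": "GGCCUUAA"}
good_response = present("acgt", demo_dataset)
bad_response = present("xyz", demo_dataset)
```

Keeping validation in the parser means a tool only ever receives well-formed queries, and the presenter is the single place where errors are turned into user-facing responses.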

A detailed description of the rWeb design is available in the section rWeb: the rPredictorDB website.

1.5. rDoc

The rDoc component, the rPredictorDB documentation, is split into three major groups: User documentation, Technical documentation and API references. The User and Technical documentation are generated using the Sphinx library from a central repository. The reference documentation is further split into documentation for individual components, as each component (and in the case of rTools, each sub-component) has its own API reference, often in incompatible formats.

The User and Technical documentation, generated in HTML form, is integrated directly into rWeb; the API reference for rWeb is available as a part of the rPredictorDB site as well.
