.. _Technical-setup:

rPredictorDB setup
******************

First, a disclaimer: rPredictorDB is a somewhat complex [#]_ infrastructure. Setting it up from scratch will require considerable time and energy; it uses numerous programming languages, third-party libraries and applications.

Moreover, you will not get anything beyond what is already available at `the rPredictorDB website <http://rpredictordb.elixir-czech.cz>`_. The rPredictorDB infrastructure there is being actively developed. Instead of setting up your own rPredictorDB clone, you may want to consider helping develop the rPredictorDB that is already there. See :ref:`User-FAQ-can-I-help`.

If you still really wish to set it up yourself, then this guide should get you going (but consider yourself warned).

.. note:: 

  If you find this guide unclear or incomplete, do not hesitate to `contact us </contact>`_.
  

.. _User-setup-before-starting:

Before starting
===============

Before you embark on the setup process, you will need to get access to the rPredictorDB repository and check it out on your system. To get repository access, `contact us </contact>`_.


.. _User-setup-platform:

Platform
========

The rPredictorDB infrastructure is known to run on Debian 3.2.54-2 and 8.6 x86_64 GNU/Linux. The infrastructure is untested on other platforms. 


.. _User-setup-languages:

Programming languages
=====================

If you wish to set up rPredictorDB, you will need to be able to compile or run source files written in the following languages:

* PHP
 
* Matlab 9.1

* C#

* Java

* C, C++

* PL/pgSQL

PHP is used as the main programming language for the website.

Matlab 9.1 is the core language of cp-predict.

C# and Java are used for :ref:`Technical-rDataETL`.

C and C++ are used for :ref:`Technical-rDataETL` and by various third-party bioinformatics tools that need to be compiled.

PL/pgSQL is used for building the database schema and for importing data into it.


.. _User-setup-process:

Setup process overview
======================

The whole setup process has four distinct parts, each of which consists of multiple steps:

#. Installing rWeb and rData requirements

   #. :ref:`Technical-setup-httpserver`
   
   #. :ref:`Technical-setup-nette`
   
   #. :ref:`Technical-setup-postgresql`

#. Setting up rWeb

   #. :ref:`Technical-setup-rweb`
   
   #. :ref:`Technical-setup-rweb-configuration`
 
#. Setting up rData

   #. :ref:`Technical-setup-ETL`

#. Installing individual tools: for each tool -

   #. Installing tool & requirements,
  
   #. Configuring the tool for rPredictorDB    


.. _Technical-setup-httpserver:

HTTP server
===========

Because the rPredictorDB application is developed in PHP, the first important step is installing an HTTP server which will run it. To use the application on a single machine only, either a development server or a public web server can be used.

Although several HTTP servers exist, we strongly recommend the Apache server, which can be installed through the default package distribution channel on most GNU/Linux systems.

Several bundles also include PHP, for example:

 * `LAMP <http://lamphowto.com/>`_

 * `Zend Server <http://www.zend.com/en/downloads/>`_

Another option is installing the bare Apache HTTP server. The binaries can be downloaded on the `official Apache webpages <http://httpd.apache.org/download.cgi>`_.

After installing the Apache HTTP server, PHP must be installed as well. This can also be done through a Linux package distribution channel or from the binaries available on the `official PHP webpages <http://php.net/>`_.


.. _Technical-setup-nette:

Nette
=====

The application uses the open-source framework Nette, version 2.0. It is necessary to verify the ``php.ini`` and ``.htaccess`` settings against the Nette requirements. This should not be a problem in most cases, barring minor corrections. The verification can be done `directly from the current rPredictorDB pages <http://rpredictordb.elixir-czech.cz/requirements-checker/checker.php>`_.

 
.. _Technical-setup-postgresql:

PostgreSQL
==========

rPredictorDB uses the PostgreSQL database, which can be downloaded `from its website <http://www.postgresql.org/download/>`_. It can also be installed through the standard Linux package distribution channel. For database administration, we recommend the `pgAdmin application <http://www.pgadmin.org>`_ or `phpPgAdmin <http://phppgadmin.sourceforge.net>`_ (modeled on the well-known phpMyAdmin).

In order to access the database from rPredictorDB, access credentials need to be set in the ``app/BaseModule/config.neon`` configuration file in the ``www`` branch.

.. note:: It is necessary to set the database datestyle to "European" (ISO DMY). Otherwise, search by publication date will not work. See `PostgreSQL documentation for details <http://www.postgresql.org/docs/9.1/static/runtime-config-client.html#GUC-DATESTYLE>`_.
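
The datestyle can be set per database so that every new connection picks it up. The following is a minimal shell sketch, not the project's own setup script: the helper name ``datestyle_sql`` is made up for illustration, and the database name ``rpredictordb`` and the role ``postgres`` in the usage comment are assumptions.

.. code-block:: bash

   # Print the SQL statement that sets the European (day-month-year)
   # datestyle for one database. $1 is the database name.
   datestyle_sql() {
     printf "ALTER DATABASE \"%s\" SET datestyle TO 'ISO, DMY';\n" "$1"
   }

   # Usage (assumed database and role names):
   #   datestyle_sql rpredictordb | psql -U postgres

The setting applies to sessions opened after the statement has run; ``SHOW datestyle;`` in a new ``psql`` session shows the value in effect.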


.. _Technical-setup-ETL:

rETL
====

See :ref:`Technical-rDataETL` for an overview of what needs to be done to run rETL and populate the rPredictorDB database.

.. note:: Running rETL to populate the database will take a *long* time (days), since secondary structure predictions need to be computed, structural features extracted and visualization thumbnails created.


.. _Technical-setup-rweb:

rWeb
============

To set up the website itself, simply copy the ``www`` branch of the repository so that the ``index.php`` file in the ``www`` folder of the branch is accessible at the URL where you want the website published.

Make sure owners, groups and permissions are set correctly so that all executable files can actually be executed by the server, temporary directories (``www/www/files/``, ``temp/`` and subdirectories) are writeable, etc.
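
As an illustration of the permission step, the sketch below makes a list of directories writable for their owner and group. The helper name ``make_writable`` is hypothetical, and which user or group the web server runs as depends on your system (on Debian, Apache usually runs as ``www-data``), so treat the usage comment as an assumption.

.. code-block:: bash

   # Make each given directory (and everything below it) readable and
   # writable for owner and group. Fails on the first missing directory.
   make_writable() {
     local dir
     for dir in "$@"; do
       if [ ! -d "$dir" ]; then
         echo "missing directory: $dir" >&2
         return 1
       fi
       # Capital X adds execute permission only to directories and to
       # files that are already executable for someone.
       chmod -R u+rwX,g+rwX "$dir"
     done
   }

   # Usage, run from the directory that contains the checkout
   # (ownership change assumes Debian's www-data account):
   #   sudo chown -R www-data:www-data www/www/files temp
   #   make_writable www/www/files temp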


.. _Technical-setup-rweb-configuration:

Configuring rWeb
-------------------

The configuration file ``www/config.php`` contains variables that will be accessible to all classes in the ``app/`` infrastructure. **This is the preferred way of setting environment variables for rWeb tools.**

The most important variables in the configuration file that need to be set are
``TOOL_DIR`` and ``TMP_DIR``. See the tool installation instructions for details.

To properly set up the Nette config file (``app/BaseModule/config.neon``), see the `Nette configuration manual <http://doc.nette.org/en/2.0/configuring#toc-framework-configuration>`_. The most important directives are the database DSN string, the correct timezone and the variables in common/parameters.


.. _Technical-setup-blast:

Blast
=====

The following paths to executable binaries for Blast and for the source database are currently set in the ``config.php`` file:: 

  
  BLAST_DATABASE = /var/data/blast/data/database 
  BLAST_PATH = /var/rtools/blast/bin 


Requirements
------------

* The Blast package, version 2.2.28+. The package can be downloaded `from the NCBI FTP server <ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.2.28/>`_. Follow the Blast installation instructions `at this NCBI webpage <http://www.ncbi.nlm.nih.gov/books/NBK52640/>`_, sections Installation and Configuration (we'll set up our own database later).

  .. note:: This is *not* the latest version: during rPredictorDB development, version 2.2.29+ was released, with some minor changes. We have NOT tested whether the rPredictorDB Blast tool will run with the new version. 


Installation
------------

After the Blast package has been successfully installed and configured (you can verify by running ``which blastn`` and getting non-blank output), we'll need to finish the procedure of installing Blast *for rPredictorDB*, so that it can be used for searching the rPredictorDB database. To this end, we will need to set up a *Blast database* over the correct dataset.

Blast Data
^^^^^^^^^^

Source data for Blast (and generally for all similarity search tools) can be found `here </download>`__: a dump of all sequences in rData (and therefore all sequences available for searching in rPredictorDB).

Blast database setup
^^^^^^^^^^^^^^^^^^^^

The database is created and filled with data using a utility from the Blast package called ``makeblastdb``::

  makeblastdb -dbtype nucl -title newDB -in input.fasta

where:

* ``-dbtype nucl`` says that the database will contain sequences of nucleotides (Blast can also work with amino-acids, for databases of proteins), 

* ``-title newDB`` sets the name of the newly created database,

* ``-in input.fasta`` says that the database will be filled with data from the file ``input.fasta``.  This is a file containing all the sequences among which we will be searching when the Blast tool is deployed; we created this file in the previous step from the SILVA database exports.
   
This command produces several files: one with the extension ``nhr``, one with ``nin`` and one with ``nsq``. If the input file was named ``input.fasta``, files ``input.fasta.nhr``, ``input.fasta.nin`` and ``input.fasta.nsq`` will be created.
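
A quick way to confirm that ``makeblastdb`` finished is to check that all three files exist. The helper below is a small sketch with a hypothetical name (``blast_db_complete``); it only covers the single-volume layout described above.

.. code-block:: bash

   # Succeed only if the .nhr, .nin and .nsq files exist for the given
   # database prefix (the name of the input file passed to makeblastdb).
   blast_db_complete() {
     local prefix="$1" ext
     for ext in nhr nin nsq; do
       if [ ! -f "$prefix.$ext" ]; then
         echo "missing: $prefix.$ext" >&2
         return 1
       fi
     done
   }

   # Usage:
   #   makeblastdb -dbtype nucl -title newDB -in input.fasta
   #   blast_db_complete input.fasta && echo "database files present"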


Running Blast
^^^^^^^^^^^^^

.. note:: This is not a part of the installation process itself, but it is useful to test whether the Blast setup went correctly.

From the several scripts provided in the Blast package, our Blast search will use ``blastn`` (the "n" stands for *nucleotide*), which is intended to work on databases of nucleic acid sequences with nucleic acid queries.

After the database is prepared, ``blastn`` is ready to process queries.  A query is run with the command::

  blastn -db database_to_be_searched -query query_file -outfmt 5 -out output_file.xml

where 

* ``-db database_to_be_searched`` names a database previously created by the ``makeblastdb`` utility. Blast expects the common base name of the database files, without the ``.nin`` extension (in our previous example, ``input.fasta``),
 
* ``-query query_file`` specifies the file with the query sequence (not a FASTA file, only the sequence!). Optionally, multiple query sequences can be given; each on its own line. 

* ``-outfmt 5`` specifies that Blast should return its results as an XML file,

* ``-out output_file.xml`` specifies the output file name. The search results will be stored to this file. The appropriate suffix depends on the ``-outfmt`` argument.
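
For repeated test runs, the command can be wrapped in a small shell function. This is only a sketch: the function name ``run_blastn`` is made up, and it assumes that the shell variable ``BLAST_PATH`` points at the directory holding the Blast binaries, falling back to the value shown for ``config.php`` above.

.. code-block:: bash

   # Run one blastn query with XML output.
   # $1 = database, $2 = query file, $3 = output file.
   run_blastn() {
     local db="$1" query="$2" out="$3"
     # BLAST_PATH is assumed to be the directory containing blastn.
     "${BLAST_PATH:-/var/rtools/blast/bin}/blastn" \
       -db "$db" -query "$query" -outfmt 5 -out "$out"
   }

   # Usage:
   #   run_blastn database_to_be_searched query_file output_file.xml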


Blast-related rWeb files
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. note:: If something doesn't work, this is where to go looking.

The Blast tool class: ``DispatchModule\searchTools\BlastTool``

These classes are present in the ``DispatchModule\Models`` namespace, in the ``app/DispatchModule/models`` subdirectory of branch ``www`` of the rPredictorDB repository. 

* ``BlastModel`` is used to execute the blast commands from the previous paragraph and generates the output (the XML file).

* ``BlastXMLParser`` parses the XML output file produced by ``BlastModel`` and builds the result set from the hits that Blast reported.

* ``FileModel`` is used to work with files (main tasks are to create, delete or append content to the file). Mostly used for temporary files.

.. note:: 

  In case of trouble, make sure that the command executed in the ``BlastTool`` class's ``execute`` method corresponds to how the Blast database has been set up, respects environment variables, etc. (Note that the command is actually set in ``BlastModel``.) 


.. _Technical-setup-cppredict:

Cppredict
=========

CP-predict is the custom rRNA secondary structure prediction algorithm. For a description of what it does, see :ref:`User-cp-predict`.

It uses the following config variables:

* ``CPPREDICT2_PATH`` a path to the rPredictorDB program

* ``WILDCARDS_PATH`` a path to the program that replaces wildcards in the query sequence

* ``CPPREDICT2_TEMPLATES`` a path to templates used for the prediction


Requirements
------------

In order to run Cppredict on your own rPredictorDB infrastructure, you will need:

* Matlab 9.1

* the `Vienna RNA package <http://www.tbi.univie.ac.at/RNA>`__
 
* ``clustalw2``, which can be found somewhat unorthodoxly at the `Help page of its website <http://www.ebi.ac.uk/Tools/msa/clustalw2/help/>`_ 

* ``ggsearch``, a part of `FASTA package <http://fasta.bioch.virginia.edu/fasta_www2/fasta_list2.shtml>`__

* Optionally, if you want to set up your own templates from structures in the PDB, you will need `x3dna-dssr <http://x3dna.org>`_ 
 
* ``replacement``, the C++ program, in ``branches/predict/packages/replacement``.
 
* Finally, the rPredictorDB Matlab program, in ``branches/predict/packages/rPredictorDB``.
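
Before moving on to the installation, it can help to check which of the external tools are already on the ``PATH``. The helper below is a generic sketch with a hypothetical name (``missing_commands``). The binary names in the usage comment are assumptions: ``RNAfold`` is one of the programs shipped with the Vienna RNA package, and the FASTA package typically names its binary after the release (for example ``ggsearch36``), so adjust them to your installation.

.. code-block:: bash

   # Print the name of every given command that cannot be found on PATH.
   # Prints nothing when all commands are available.
   missing_commands() {
     local cmd
     for cmd in "$@"; do
       command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
     done
   }

   # Usage (binary names are assumptions, see above):
   #   missing_commands clustalw2 RNAfold ggsearch36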


.. _Technical-setup-cppredict-installation:

Installation
------------ 

After installing all prerequisites, install the rPredictorDB program itself. In the package directory (``branches/predict/packages/rPredictorDB`` in the repository), run::

  ./rPredictorDB_web.install

and follow the instructions. The installer also installs the required Matlab runtime if it is not installed yet.
The predictor is then called as::

  path/to/run_run_pactool_pairwise_f.sh /usr/local/MATLAB/MATLAB_Runtime/v901/ -sqs=query.fasta -str=template.br -ALM=clustalw2 -EXTEND_MECHANICALLY_LONELY_PAIRS=0 -BOOTSTRAP=0

where ``path/to/run_run_pactool_pairwise_f.sh`` is the prediction tool, ``/usr/local/MATLAB/MATLAB_Runtime/v901/`` is the path to the Matlab runtime, ``-sqs`` sets the path to the query sequence whose structure is to be predicted, ``-str`` sets the path to the template, ``-ALM`` defines the tool used for mapping between the query and the template, ``-EXTEND_MECHANICALLY_LONELY_PAIRS`` sets TODO and ``-BOOTSTRAP`` denotes whether the z-score should be calculated.
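
To test the installation by hand, the call can be wrapped in a function that first checks that both input files are readable. This is a sketch under stated assumptions: the function name ``run_cppredict`` and the shell variables ``PREDICT_SCRIPT`` (full path to ``run_run_pactool_pairwise_f.sh``) and ``MATLAB_RUNTIME`` are hypothetical and are not read from ``config.php``. The option values are copied unchanged from the command above.

.. code-block:: bash

   # Run the predictor for one query/template pair.
   # $1 = query FASTA file, $2 = template file.
   run_cppredict() {
     local query="$1" template="$2"
     if [ ! -r "$query" ]; then
       echo "cannot read query: $query" >&2
       return 1
     fi
     if [ ! -r "$template" ]; then
       echo "cannot read template: $template" >&2
       return 1
     fi
     # PREDICT_SCRIPT and MATLAB_RUNTIME are assumed variable names.
     "$PREDICT_SCRIPT" "${MATLAB_RUNTIME:-/usr/local/MATLAB/MATLAB_Runtime/v901/}" \
       -sqs="$query" -str="$template" -ALM=clustalw2 \
       -EXTEND_MECHANICALLY_LONELY_PAIRS=0 -BOOTSTRAP=0
   }

   # Usage:
   #   PREDICT_SCRIPT=path/to/run_run_pactool_pairwise_f.sh
   #   run_cppredict query.fasta template.br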


.. rubric:: Footnotes


.. [#] Brutally complicated.