.. _Technical-dbtool:

Database querying
*********************

DbTool is the part of rWeb that communicates with the rPredictorDB database, rDB. It is by far the most complex tool in rWeb and merits a detailed explanation.

.. _Technical-dbtool-access:

Access to database
==================

As noted in :ref:`Technical-rWeb-implementation`, rWeb uses the repository pattern to access/manipulate data. 
 
The database queries rely on a specific Repository, which in turn relies on a `Nette Database layer <http://doc.nette.org/cs/2.0/database>`_. The ``BaseRepository`` contains several default methods for data querying and manipulating, which may be used in every repository. Other repositories extend this ``BaseRepository`` and contain methods specific for their purposes. Each repository has its own table which serves as main table while querying; however, this table can be changed via the ``getTable($tableName)`` method.

The Nette Database layer is extended with new reflection class ``DiscoveredViewReflection`` located in ``BaseModule\Database``. Reason for this is that the standard ``DiscoveredReflection`` provided by Nette cannot handle views, which are used in rDB. Therefore, ``DiscoveredViewReflection`` provides some extra functionality to make views work with Nette Database layer.

The appropriate connection must be configured in the Nette config file (``app/BaseModule/config.neon``); see the `Nette configuration manual <http://doc.nette.org/en/2.0/configuring#toc-framework-configuration>`_. The pertinent directive is the database dsn string.


.. _Technical-dbtool-work:

How does DbTool work?
======================

A core idea is to put all parameters (defined by ``wantedParameters``) in "buckets" defined by database criteria - e.g. equals, greater than, like. These criteria are defined at the top of the class and their names start with the prefix ``RULE_`` (all of them are also stored in a ``protected`` variable ``criteria``). When the tool has user values and DB keys in proper values, it is then easy to construct the resulting SQL query. 

This should make deriving new database search forms like structural feature search easier.

.. _Technical-dbtool-work-inputs:

Specifying inputs
-----------------

The first important part of this task is to define ``wantedParameters`` correctly. The important ones from the list in the ``DbTool`` class are:

* ``limit``, which defines the number of results per query; when ``limit`` is too big, queries take too long to execute. This number determines how many records retrieved from the database will appear together when the user scrolls down the results.

* ``offset`` works together with ``limit`` and by default it always has to be 0 - otherwise the query would not perform search on a whole result set. ``Offset`` grows when user scrolls down the results (size of the step is defined by ``limit``).

* ``startPosition``, ``stopPosition`` are present as a possibility to search sequences with precise start and stop positions, and thus identify it uniquely - this is used for inner mechanism like merging sequences from multiple tools (because a sequence with the same accession number can have multiple records with different start and stop in database).

* ``aggregateByAccession`` has similar usage as previous ``startPosition`` and ``stopPosition`` - it enforces aggregation by accession number, so for each accession number, only the longest sequence is displayed

The next important thing is the constructor (method ``__construct``), which accepts an object of type ``\Nette\DI\Container`` (the standard Nette class). This object has the method ``getHttpRequest()``, which contains information about the current HTTP request. The constructor stores this information for later use.

.. _Technical-dbtool-work-parsing:

Parsing inputs
----------------

The method ``addCriteria`` accepts key-value pairs - it first checks, whether key is one of the special reserved words in method ``checkReservedWord``. This method stores ``limit``, ``offset`` and aggregate properties for later usage. If the key is not in the reserved words, it checks if it is in the wanted parameters. If so, it calls ``getCriteriaParam`` on the pair.

The method ``getCriteriaParam`` handles each input key individually. The main idea is to return a pair of the database key (attribute name) and criteria which should be applied for this key (one of the "buckets" we talked about earlier). It also allows change of the value. So, for simple types, their corresponding part of the switch looks like this::

    case 'name':
        $key = '%BASE_VIEW%.name';
        $criteria = self::RULE_LIKE;
        break;

This code states that for ``name``, the database key is ``name`` in table ``%BASE_VIEW`` - that is a magic constant that will be explained later in this section. The criteria is ``RULE_LIKE`` and the value does not changes. In SQL it means that statement::

 ``WHERE %BASE_VIEW%.name LIKE '%$user_value%'`` 

will be a part of the query.

Then there are more complicated parameters - those that also have modifiers and multipliers. The tool needs to manually load this extra data for specified parameter from HTTP request stored earlier. Consider following code::

	case'firstpublished':
        $key = '%BASE_VIEW%.first_public';
        if ($this->httpRequest->getPost('db_firstpublished_direction') == 'gt') {
            $criteria = self::RULE_GT;
        } else {
            $criteria = self::RULE_LT;
        }
        $this->defineMultipliedRules($key, 
              $this->httpRequest->getPost('db_firstpublished_array'), 
              $this->httpRequest->getPost('db_firstpublished_direction_array')
                                     );
        break;

It first states that the database key for ``firstpublished`` is in the table ``%BASE_VIEW%`` and its name is ``first_public``. Then there is an if-condition checking direction (modifier) of the parameter - whether it's "less than" or "greater than". That is enough for adding SQL statement like ``WHERE %BASE_VIEW% < '$user_value'``. But since this input can be multiplied, it additionally calls a helper method ``defineMultipliedRules``.

The method ``defineMultipliedRules`` accept key and two arrays as parameters. The first array (called ``$allValues``) holds values itself and the second array (called ``$allDirections``) holds appropriate modificators. Therefore input arrays from example above might look like this::

    $array1 = array(0 => '10.10.2010');
    $array2 = array(0 => 'gt');

This means that the user wants publication date to be greater than 10. 10. 2010. The method itself then loops over all elements of those arrays (the amount of multiplied items may be unlimited) and adds the values to proper buckets::

    if ($allDirections[0] == 'gt') {
        $this->criteria[self::RULE_GT][] = array($key => $allLengths[0]);
    } else {
        $this->criteria[self::RULE_LT][] = array($key => $allLengths[0]);
    }

The ``addCriteria`` method adds values from ``getCriteriaParam`` in the very same way.

.. _Technical-dbtool-works-execution:

Execution
-----------

The main task of the ``execute`` method is to construct the SQL query, loop over its results and add them to the ``DispatchModule\ResultSet`` that will be returned. 

First, it chooses proper base table in a method ``chooseBaseTable``. The table is actually view joining several tables together and there are multiple views with the same structure but slightly different data. The difference is always in how the data are grouped (e.g. by accession number, organism name...). Therefore the ``chooseBaseTable`` method *decides which view to use* depending on the aggregate properties (previously stored in ``checkReserverdWords``).

After the base table is chosen, the ``prepareCriteria`` method is called. This method creates the query (using the repository ``getTable`` method) and loops over data of all buckets (SQL rules) and adds the proper SQL rule for each key, replacing the ``%BASE_VIEW%`` magic constant by the name of the actual view.

At this moment, the ``execute`` method already has completed the SQL query and can loop over its results. Each result is converted to a ``DispatchModule\Sequence`` object and additional data are loaded - references, features, xrefs predicted features. The first three are loaded in very similar way - a new SQL query is performed, loading additional data from another table (the table for these properties is in a 1:N relationship). There is a nested loop where needed, like in case of ``authors``::

    // load references with authors
    $references = $this->repository->getTable('references')->select('id, ...,
                          ..., location')->where('fk_annotation', 
                                                 $res['fk_annotation']);
    $referenceArray = array();
    $i = 0;
    foreach ($references as $reference) {
        $referenceArray[$i] = $reference->toArray();
        $authors = $reference->related('authors');
        foreach ($authors as $author) {
            $referenceArray[$i]['authors'][] = $author->author['name'];
        }

        $applicants = $reference->related('applicants');
        foreach ($applicants as $applicant) {
            $referenceArray[$i]['applicants'][] = $applicant->applicant['name'];
        }
        $i++;
    }
    $sequence->loadData(array('references' => $referenceArray));

Since every reference has 0 - N authors, the nested loop is necessary. The same is applied to ``applicants``. The method ``loadData`` adds newly gained data to the ``DispatchModule\Sequence`` object.

For predicted features, the situation is a bit more complex, since every sequence can have 0 - N predictions and each prediction has 0 - N predicted features in several tables and each predicted feature has 1 - N positions. For predictions there is similar loop over all predictions. For each prediction all six types of features are loaded::

    // load predictions
    $predictions = $this->repository->getTable('predictions')->select('*')->where('fk_accession', 
      $res['fk_accession']);
    $predictionArray = array();
    foreach ($predictions as $prediction) {
        $structure = $this->repository->getTable('structures')->where(
                      'fk_prediction', $prediction->id)->fetch()->toArray();
        $wholePrediction['structure'] = $structure['structure'];
        $wholePrediction['algorithm'] = $prediction->algorithm;
        $wholePrediction['structure_id'] = $prediction->id;
        $wholePrediction['bulges'] = $this->getPredictionFeatures(
                                                $prediction->id, 'bulges');
        $wholePrediction['foverhangs'] = $this->getPredictionFeatures(
                                                $prediction->id, 'foverhangs');
        $wholePrediction['hairpins'] = $this->getPredictionFeatures(
                                                $prediction->id, 'hairpins');
        $wholePrediction['helices'] = $this->getPredictionFeatures(
                                                $prediction->id, 'helices');
        $wholePrediction['junctions'] = $this->getPredictionFeatures(
                                                $prediction->id, 'junctions');
        $wholePrediction['loops'] = $this->getPredictionFeatures(
                                                $prediction->id, 'loops');
        $wholePrediction['toverhangs'] = $this->getPredictionFeatures(
                                                $prediction->id, 'toverhangs');
        $predictionArray[] = $wholePrediction;
    }

The method ``getPredictionFeatures`` simply loads all structural features of a given prediction together with their positions type (using ``LEFT JOIN`` in SQL) and stores them into an array which then looks like the following::

    array(0 => (0, 20, 111),
          1 => (35, 45, 111),
          2 => (65, 85, 121),
          3 => (90, 95, 121)
    )

This means there are two predicted features (with id 111 and 121) at two positions each (first one is on 0-20 and 35-45, the other one is on 65-85 and 90-95).

All that remains is to return the result.

.. _Technical-dbtool-misc:

Miscellaneous methods
======================

There are some other methods in ``DbTool`` like ``getCount`` which works in similar way as ``execute``, except it's simpler because it does not return actual data but rather a number of records matching selected criteria. All other methods are just helpers (like ``lastTaxonomySelected``, which picks last taxonomy from all of it's values, or ``getTaxonomyPath``, which returns taxonomy path for specified id).