6. Creating your own tool

One of the main design properties of rPredictorDB is extensibility. Bioinformatics is a fast-moving field and the rPredictorDB team cannot presume to know what functionality the bioinformatics community will require in a few years’ time. Therefore, rPredictorDB strives to make it easy to add further tools to its toolbox.

This tutorial describes how to create a new tool (at least the PHP part). It is recommended to read the previous section on tool design (Tools) first.

Make sure you read the tutorial through to the end, including the Tool development best practices section!

The tutorial also assumes you have access to the rPredictorDB repository. That can be arranged by contacting us.

6.1. The tool class

The first step is to create a new class and place it in the DispatchModule\Tools namespace. This class has to implement the ToolInterface interface.

It is recommended not to create a new class from scratch, but rather to copy an existing tool. For a start, let’s copy BlastTool and change its name to ExampleTool.

Warning

The new tool file name must be the same as the tool class name!

Make sure you place the tool in the appropriate directory: if it is a search tool, put it in the DispatchModule/searchTools/ directory; if it is a prediction tool, use the DispatchModule/predictTools/ directory.

6.1.1. Naming the tool

The tool class has to implement the getLabel() method, which returns the tool name that will be displayed on the rPredictorDB Search or Predict page.

It is also in your very best interest to provide a short description of the tool in the $description variable of the tool class, which will be displayed as alt-text for the tool button on the webpage.
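
As a rough sketch of the naming pieces described above, the class below shows only getLabel() and $description. The ToolInterface declared here is a minimal stand-in so that the snippet runs on its own; the real interface in the repository declares more methods. The label text, the description text and the public visibility of $description are assumptions made for illustration, so copy the actual conventions from an existing tool.

```php
<?php
// Stand-in for the real DispatchModule\Tools\ToolInterface (assumption:
// the real interface declares more methods than shown here).
interface ToolInterface
{
    public function getLabel();
}

class ExampleTool implements ToolInterface
{
    // Short description, shown as alt-text for the tool button.
    // (Visibility is an assumption; copy it from an existing tool.)
    public $description = 'Finds sequences similar to a given fragment.';

    // Name displayed on the Search or Predict page.
    public function getLabel()
    {
        return 'Example tool';
    }
}

$tool = new ExampleTool();
echo $tool->getLabel(), "\n";
```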

6.1.2. Specifying tool inputs

Next, we need to make clear what inputs the new tool requires. Let’s create a wantedParameters array like this:

  array(
      'kingdom' => array('select', 'Eukaryote kingdom', array(
          'multiplicators' => array('or' => 7),
          'items' => array('all' => 'All',
                           'animalia' => 'Animalia',
                           'fungi' => 'Fungi',
                           'amoebae' => 'Amoebae',
                           'plantae' => 'Plantae',
                           'chromalveolata' => 'Chromalveolata',
                           'rhizaria' => 'Rhizaria',
                           'excavata' => 'Excavata')
      )),
      'quality' => array('select', 'Which quality search should be run', array(
          'items' => array('1' => 'High',
                           '0' => 'Low')
      )),
      'similar_to' => array('text', 'Similarity', array(
          'modifiers' => array(
              'similar_to_how' => array('select', '', array(
                  'items' => array(
                      'exact' => 'Exact match',
                      'vague' => 'Vague is OK'
                  )
              ))
          ),
      )),
  );

So what have we done? First, we specified the kingdom input: the user can define in which kingdoms of Eukaryota the search is performed. Because the multiplier cardinality is set to 7, the user can choose any combination of the kingdoms (up to 7, from a set of 7). Next, the user can specify the quality parameter, which can be set to either high or low. Finally, the user can enter part of a sequence as text input and specify whether an exact match is required or a similar match is enough.

The similar_to field looks intimidating, but in fact isn’t that hard to grasp: we’re only saying that the input field similar_to_how will be available to the user as a modifier select-box in the form next to the “Similarity” input field, with two possible values for the user to choose from: “Exact match” and “Vague is OK”.
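
To make the nesting easier to see, here is a small sketch that walks a wantedParameters-style array and collects every input name, including modifiers. The listInputNames function is a hypothetical helper written for this tutorial, not part of rPredictorDB; it assumes the options (with the optional modifiers key) sit in the third element of each definition, as in the array above.

```php
<?php
// Collect the names of all inputs that a wantedParameters-style array
// defines, including modifiers nested in an input's options (third element).
function listInputNames(array $wantedParameters)
{
    $names = array();
    foreach ($wantedParameters as $name => $definition) {
        $names[] = $name;
        $options = isset($definition[2]) ? $definition[2] : array();
        if (isset($options['modifiers'])) {
            foreach (array_keys($options['modifiers']) as $modifierName) {
                $names[] = $modifierName;
            }
        }
    }
    return $names;
}

// Trimmed-down version of the definition above.
$wantedParameters = array(
    'quality' => array('select', 'Which quality search should be run', array(
        'items' => array('1' => 'High', '0' => 'Low'),
    )),
    'similar_to' => array('text', 'Similarity', array(
        'modifiers' => array(
            'similar_to_how' => array('select', '', array(
                'items' => array('exact' => 'Exact match', 'vague' => 'Vague is OK'),
            )),
        ),
    )),
);

print_r(listInputNames($wantedParameters));
```

The modifier similar_to_how shows up as a name of its own next to similar_to, which is why it needs handling of its own when the inputs are processed.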

Now SearchParser would already correctly render the form with all of our newly defined inputs. Let’s prepare the tool for performing the search.

6.1.3. Processing inputs

We need to program the body of the addCriteria method. It is called separately for each input defined in the wantedParameters array. We need to handle all three parameters that we accept. Let’s write a switch for it:

switch($name) {
    case 'kingdom':
        // save kingdom settings
        break;
    case 'quality':
        // save quality settings
        break;
    case 'similar_to':
        // save similar_to sequence
        break;
    case 'similar_to_how':
        // save similarity settings
        break;
}

Note

There’s one more case in the switch than there are inputs. That’s because modifiers are passed to the addCriteria method as regular inputs (but they can also be processed directly together with their primary input, and sometimes have to be when combined with multipliers).

See also

The requireParameter method of BaseTool, which discovers modifiers

This switch is pretty straightforward, and saving the values for quality, similar_to and similar_to_how is very easy as well: we simply store each value in a newly created member variable. Let’s add those variables to the top of the class:

private $quality;
private $similar_to;
private $similar_to_how;

Now, in the corresponding places in the switch in addCriteria, we simply set those values:

...
$this->quality = $value;
...
$this->similar_to = $value;
...
$this->similar_to_how = $value;
...
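
Put together, the switch with the assignments filled in might look like the sketch below. The ($name, $value) signature of addCriteria is inferred from the snippets above and is an assumption; the class name ExampleToolSketch and the dump() helper exist only to make the example runnable on its own.

```php
<?php
class ExampleToolSketch
{
    private $quality;
    private $similar_to;
    private $similar_to_how;

    // Assumed signature: called once per input with its name and value.
    public function addCriteria($name, $value)
    {
        switch ($name) {
            case 'kingdom':
                // multiplied input, handled separately
                break;
            case 'quality':
                $this->quality = $value;
                break;
            case 'similar_to':
                $this->similar_to = $value;
                break;
            case 'similar_to_how':
                $this->similar_to_how = $value;
                break;
        }
    }

    // Helper for this demonstration only.
    public function dump()
    {
        return array($this->quality, $this->similar_to, $this->similar_to_how);
    }
}

$tool = new ExampleToolSketch();
$tool->addCriteria('quality', '1');
$tool->addCriteria('similar_to', 'ACGU');
$tool->addCriteria('similar_to_how', 'exact');
print_r($tool->dump());
```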

That’s all we need to do for quality, similar_to and similar_to_how. Let’s handle kingdom now. Since that is a more complex task, let’s separate it into its own method:

private function processKingdom($value) {

    // The $value parameter is the value of the first input.
    // If it's 'all', simply return all choices.
    if($value == 'all') {
        // Get all possible keys into an array
        $kingdomArray = array_keys($this->wantedParameters['kingdom']['items']);
        // Remove 'all', since it's not a real value. array_keys() returns a
        // numerically indexed list, so remove it by value, not by key.
        $kingdomArray = array_values(array_diff($kingdomArray, array('all')));
    } else {
        // Declare a new array with one element - the value of the first input
        $kingdomArray = array($value);

        // Read all the kingdom inputs directly from the HTTP POST request (unsafe)
        $kingdoms = $this->completeData['example_kingdom_array'];
        foreach($kingdoms as $kingdom) {
            if($kingdom != 'all') {
                // Add the kingdom to the result array
                $kingdomArray[] = $kingdom;
            }
        }
    }

    return $kingdomArray;
}

The method first checks whether the value of the first kingdom element is all. In that case, the method returns all possible kingdom values. Otherwise, it loops over all input elements and builds the array manually. Note that only the first input is checked for the value all: if all is set in one of the multiplied inputs, it is treated as a mistake and the search is still performed only on the specified kingdoms. To change this behaviour, simply add an else branch to the if inside the foreach loop and do the same thing as in the first if, that is, fill the array with all possible values.
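
The alternative behaviour, where all in any of the inputs selects every kingdom, could be sketched as follows. It is written as a plain function (the hypothetical resolveKingdoms) that takes the selected values and the items array as arguments, so it runs outside the tool class; inside the tool you would read these from the class members instead.

```php
<?php
// Variant of the kingdom handling in which 'all' in ANY input selects
// every kingdom.
function resolveKingdoms(array $selected, array $items)
{
    // All real kingdom keys, i.e. everything except the 'all' shortcut.
    $allKingdoms = array_values(array_diff(array_keys($items), array('all')));

    $result = array();
    foreach ($selected as $kingdom) {
        if ($kingdom != 'all') {
            $result[] = $kingdom;
        } else {
            // 'all' found: the remaining inputs no longer matter.
            return $allKingdoms;
        }
    }
    return $result;
}

$items = array('all' => 'All', 'animalia' => 'Animalia', 'fungi' => 'Fungi');
print_r(resolveKingdoms(array('fungi', 'all'), $items));
print_r(resolveKingdoms(array('fungi'), $items));
```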

The completeData variable basically contains multiplied (_array) inputs from the form (loaded via $httpRequest->getPost() method).

Warning

Using completeData is a must when handling multiplied inputs, because these inputs are not in the form by default and therefore Nette forms don’t recognize them (for safety reasons) and, more importantly, don’t “secure” them - meaning no string escaping or other validation is performed. Therefore such code might lead to security threats and needs more careful handling that we omit here for the sake of brevity.
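
One possible way to add the missing validation is to accept only values that appear among the known kingdom keys. The sketch below is an illustration under that assumption and not the handling rPredictorDB itself uses; filterKingdoms is a hypothetical helper name.

```php
<?php
// Keep only values that are known kingdom keys; everything else is dropped.
function filterKingdoms($rawValues, array $allowedKeys)
{
    if (!is_array($rawValues)) {
        return array();
    }
    $clean = array();
    foreach ($rawValues as $value) {
        // Strict comparison, so that e.g. 0 never matches a string key.
        if (is_string($value) && in_array($value, $allowedKeys, true)) {
            $clean[] = $value;
        }
    }
    // Remove duplicates and reindex.
    return array_values(array_unique($clean));
}

$allowed = array('animalia', 'fungi', 'amoebae', 'plantae',
                 'chromalveolata', 'rhizaria', 'excavata');
// Simulated raw POST data containing junk, a duplicate and a nested array.
$posted = array('fungi', 'plantae; rm -rf /', 'fungi', array('x'), 'rhizaria');

print_r(filterKingdoms($posted, $allowed));
```

Because the result can only contain values from the whitelist, it is safe to use later for filtering results or for building commands.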

The last thing we need to do to store our data is to save the result of processKingdom into a variable in the switch as mentioned before:

private $kingdoms;
...
$this->kingdoms = $this->processKingdom($value);

6.1.4. The execute() method

Now, all that is left is to execute our tool. Let’s say we have two different bash scripts, one for low-quality and one for high-quality search, each of which takes a sequence and a similarity criterion as arguments:

public function execute() {

    // Create temporary file name
    $filename = 'exampletool.' . time() . '.output';

    // Prepare arguments; escapeshellarg() guards against shell injection,
    // since both values come straight from the user
    $arguments = escapeshellarg($this->similar_to) . ' --how ';
    $arguments .= escapeshellarg($this->similar_to_how) . ' --outputfile ' . $filename;

    // Which tool script to use
    if($this->quality == '0') {
        $toolName = 'lowQualitySearch.sh';
    } else {
        $toolName = 'highQualitySearch.sh';
    }

    // Execute the tool
    exec('cd files/exampletool/; ' . $toolName . ' ' . $arguments);

    $results = new \DispatchModule\ResultSet;

    // Read results from file created by the tool
    foreach($this->xmlService->read($filename) as $record) {

        // check whether we're in a proper kingdom
        if(in_array($record['kingdom'], $this->kingdoms)) {

            // create sequence object
            $sequence = $this->setResults($record);

            // add sequence to the resultset
            $results->addSequence($sequence);
        }
    }

    return $results;
}

This method translates user input data into arguments for the bash scripts. After the proper script is executed, the tool needs to read the output file and parse it, for example from XML, although any other format works as well (this is under your control). The xmlService used here should be an external service that reads the records produced by the bash script (or any other utility). The setResults method only sets sequence properties, as in the following example:

private function setResults($record) {
    $sequence = new \DispatchModule\Sequence;
    $sequence->accession = $record['accession'];
    $sequence->start = $record['start'];
    $sequence->stop = $record['stop'];
    $sequence->sequence = $record['sequence'];
    $sequence->kingdom = $record['kingdom'];
    $sequence->quality = $record['quality'];
    // ...
    return $sequence;
}

And that completes the tool. Note that a tool does not have to use any external utilities, like the lowQualitySearch.sh script; it can be written entirely in PHP. When writing more complex tools, it is good practice to split off the functionality into a SomeComplexToolModel class; you can look at \DispatchModule\BlastTool and \DispatchModule\models\BlastModel for reference.

6.2. Integrating the tool into rPredictorDB

Once the tool is done and saved at the proper location, we need to register the tool with rPredictorDB. Find the config.neon file in www/app/BaseModule. Find the section that looks like this:

    services:

#
# [...some other stuff...]
#

            # tools
            dbTool: \DispatchModule\Tools\DbTool
            blastTool: \DispatchModule\Tools\BlastTool
            rfpredictTool: \DispatchModule\Tools\RfpredictTool
            cppredictTool: \DispatchModule\Tools\CppredictTool

Add the line:

exampleTool: \DispatchModule\Tools\ExampleTool

Done! Now the tool should be ready to use.

Once you commit the tool to the repository, we will review it and upload it to the server. (Or, if we find problems or glitches, we’ll work with you on fixing them.)

6.3. rPredictorDB environmental variables

You will very soon find out that you need some environmental variables for your tool - like the paths to the external utilities, some default settings, etc. Environmental variables for rPredictorDB can be set in the file:

www/www/config.php

Two variables important for all tools are set there:

  • TOOL_DIR is a directory where individual tools can store their infrastructure outside the PHP tool/model classes. It is given as an absolute path, so you can safely use it as a prefix for your tool’s executable script paths without worrying about PATH settings.
  • TEMP_PATH is a directory where tools can store their temporary files. Make sure your tool cleans up after itself!
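
A minimal sketch of how these two settings might be used when preparing a command is shown below. It assumes TOOL_DIR and TEMP_PATH are PHP constants; they are defined here with placeholder values only so that the snippet runs on its own. The exampletool subdirectory and the buildExampleCommand helper are hypothetical.

```php
<?php
// Assumption: TOOL_DIR and TEMP_PATH are PHP constants set in config.php.
// Placeholder values are defined here only to make the sketch runnable.
if (!defined('TOOL_DIR')) {
    define('TOOL_DIR', '/opt/rpredictordb/tools');
}
if (!defined('TEMP_PATH')) {
    define('TEMP_PATH', sys_get_temp_dir());
}

// Build a shell command with an absolute script path and escaped arguments.
function buildExampleCommand($script, $sequence, $how, $outputFile)
{
    $scriptPath = TOOL_DIR . '/exampletool/' . $script;
    return escapeshellarg($scriptPath)
        . ' ' . escapeshellarg($sequence)
        . ' --how ' . escapeshellarg($how)
        . ' --outputfile ' . escapeshellarg($outputFile);
}

// Unique output file inside the temporary directory.
$outputFile = TEMP_PATH . '/exampletool.' . uniqid() . '.output';
$command = buildExampleCommand('highQualitySearch.sh', 'ACGU', 'exact', $outputFile);
echo $command, "\n";

// After exec($command) and reading the results, remove the file again.
if (file_exists($outputFile)) {
    unlink($outputFile);
}
```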

6.4. Tool development best practices

These are some tips, tricks and best practices to follow when building tools for rPredictorDB.

  • When using an external utility, remember that the server has its own PATH settings. Use absolute paths to external utilities - ideally, via the TOOL_DIR variable. When uploading the tool to the server, we’ll make sure that the external files are in the correct place in TOOL_DIR.

Note

Some third-party tools (e.g. Blast) install themselves to a directory of their own choice. In that case, you of course do not have to specify the tool path environmental variables relative to TOOL_DIR.

  • Although you can set environmental variables for your tool in the www/www/config.php file, don’t abuse it, because it pollutes the namespace of all other tools. The best practice is to create only an EXAMPLETOOL_DIR variable that holds the path, relative to TOOL_DIR, where all your tool’s external dependencies are kept, and to use a Model class to store all other paths to your tool’s resources relative to EXAMPLETOOL_DIR. See DispatchModule/predictTools/CppredictTool.php and DispatchModule/models/CppredictModel.php for an example.
  • If your tool generates temporary files, make sure you use the TEMP_DIR environmental variable.
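
The Model-class recommendation above could look roughly like the sketch below. It assumes TOOL_DIR and EXAMPLETOOL_DIR are PHP constants (defined here with placeholder values so the snippet runs), and the bin/ and data/ layout as well as the file name sequences.db are invented for illustration. The real CppredictModel may be organised differently.

```php
<?php
// Assumption: TOOL_DIR and EXAMPLETOOL_DIR are constants from config.php;
// placeholder values are defined here only to make the sketch runnable.
if (!defined('TOOL_DIR')) {
    define('TOOL_DIR', '/opt/rpredictordb/tools');
}
if (!defined('EXAMPLETOOL_DIR')) {
    define('EXAMPLETOOL_DIR', 'exampletool');
}

// Hypothetical model: the only place that knows the tool's file layout.
class ExampleToolModel
{
    private $root;

    public function __construct()
    {
        $this->root = rtrim(TOOL_DIR, '/') . '/' . trim(EXAMPLETOOL_DIR, '/');
    }

    // Path to the search script for the requested quality ('0' = low).
    public function getScriptPath($quality)
    {
        $script = ($quality == '0') ? 'lowQualitySearch.sh' : 'highQualitySearch.sh';
        return $this->root . '/bin/' . $script;
    }

    // Path to a (made-up) data file used by the scripts.
    public function getDatabasePath()
    {
        return $this->root . '/data/sequences.db';
    }
}

$model = new ExampleToolModel();
echo $model->getScriptPath('0'), "\n";
echo $model->getDatabasePath(), "\n";
```

With this arrangement, config.php carries a single entry per tool, and moving the tool's files means changing one value.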

7. Creating your own analytical model

Creating your own analytical model is, in a way, similar to creating your own tool and has the same prerequisites. From this point on, the tutorial assumes you know how to create a tool.

You can also find a lot of useful information for creating an analytical model by looking at existing models, e.g. the ConservancyComparator model along with ConservancyPresenter.

7.1. The model class

The model class holds the functionality of the analytical tool itself. Models are located in the AnalyseModule/models directory. The class should be available as a Nette service but has no other requirements.

The main part of this class is usually the execution of some external utility. This tutorial assumes you already have this utility installed and know how to build a command to run it. The utility is usually executed via the PHP exec function. By convention, the main method of the model (where the execution is usually performed) should be named execute in order to keep all the models similar. However, the purpose and number of its arguments are up to you, so there is no defined interface (obviously, the model class should usually accept some ids or accession numbers of the sequences to work with). Other methods of the class usually involve preparing temporary files, cleaning them up, creating the command and so on.
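
A bare-bones model following this convention might look like the sketch below. The class name, the utility path and the idea of passing accession numbers as plain command-line arguments are placeholders chosen for illustration; a real model would build whatever command its utility expects.

```php
<?php
// Hypothetical analytical model; the class name, the utility path and its
// command-line format are placeholders, not part of rPredictorDB.
class ExampleAnalysisModel
{
    private $utilityPath;

    public function __construct($utilityPath)
    {
        $this->utilityPath = $utilityPath;
    }

    // Build the command line; every argument is shell-escaped.
    public function buildCommand(array $accessions)
    {
        $parts = array(escapeshellarg($this->utilityPath));
        foreach ($accessions as $accession) {
            $parts[] = escapeshellarg($accession);
        }
        return implode(' ', $parts);
    }

    // Main entry point, named execute by convention.
    public function execute(array $accessions)
    {
        $output = array();
        $exitCode = 0;
        exec($this->buildCommand($accessions), $output, $exitCode);
        if ($exitCode !== 0) {
            throw new RuntimeException('Utility failed with exit code ' . $exitCode);
        }
        return $output; // lines printed by the utility
    }
}

$model = new ExampleAnalysisModel('/usr/local/bin/example-utility');
// Made-up accession numbers, only to show the resulting command line.
echo $model->buildCommand(array('AB000001', 'AB000002')), "\n";
```

Keeping the command construction in its own method makes it possible to inspect or test the command without running the utility.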

7.2. The temporary files

If the model requires temporary files, they should be placed in TEMP_DIR, just as with regular tools. You should create a new directory named similarly to your model in order to keep things organized. In this temporary directory, your model can do whatever it wants; just don’t forget that it also has to clean up after itself (remove old files that are no longer needed).
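
One way to organise this is to create a fresh working directory per run and remove it afterwards, as sketched below. The snippet assumes TEMP_DIR is a PHP constant and falls back to the system temporary directory so that it runs anywhere; createWorkDir, removeWorkDir and the name examplemodel are hypothetical.

```php
<?php
// Assumption: TEMP_DIR is a constant from config.php; the system temporary
// directory is used as a placeholder so the sketch runs anywhere.
if (!defined('TEMP_DIR')) {
    define('TEMP_DIR', sys_get_temp_dir());
}

// Create a private working directory for one run of the model.
function createWorkDir($modelName)
{
    $dir = rtrim(TEMP_DIR, '/') . '/' . $modelName . '.' . uniqid();
    if (!mkdir($dir, 0700, true)) {
        throw new RuntimeException('Cannot create ' . $dir);
    }
    return $dir;
}

// Remove the working directory and the regular files directly inside it.
function removeWorkDir($dir)
{
    $files = glob($dir . '/*');
    foreach ($files ?: array() as $file) {
        if (is_file($file)) {
            unlink($file);
        }
    }
    rmdir($dir);
}

$dir = createWorkDir('examplemodel');
file_put_contents($dir . '/input.fasta', ">seq1\nACGU\n");
// ... run the external utility here ...
removeWorkDir($dir);
```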

7.3. The presentation

This last section covers creating a presentation layer for your model.

7.3.1. Presenters

Your newly added model should come with exactly one new presenter (placed in the AnalyseModule/presenters directory). This presenter should be named similarly to your model and has to contain a renderDefault method (which may be replaced by actionDefault).

The presenter can have any number of other methods (and of course one template per action). In the presenter and in the template, you are free to do anything you want.

The default template will be loaded via an AJAX request and therefore should not contain certain HTML elements such as the head or body definition. There is a template called @ajax.latte prepared for this in the app/AnalyseModule/templates directory. This template can still contain a block called head (see the Nette documentation for details) with CSS styles or JavaScript files, since the framework can load them once the AJAX HTTP request finishes.

Note that files loaded this way will not be available until the request is finished. Scripts that need to work before the request is finished (e.g. the submit button function, or scripts for manipulating the user interface in general) should be put inside the www/js/Views/Analyse directory. Usually one tool should have exactly one JavaScript file there.

What is left is to add the CSS file and the JavaScript files. Usually the tool should not need more than one CSS file stored inside the www/css directory (the name of the file should respect the name of the tool). As mentioned before, the JavaScript files should be at two locations - app/js/Views/Analyse should contain one file (e.g. with the submit button functionality) and there can be any number of additional files inside the app/js/Classes directory.

7.3.2. Routes

You also need to specify new routes for your newly created presenters. Go to BaseModule/router/RouterFactory.php; in the bottom part of the file, you’ll see a section called ANALYTICAL MODULE specific routes. Here you can copy the route (line) containing Conservancy Comparator, rename it and add new arguments if needed.

7.3.3. Add a submit button

The final thing to do is to add a button to run your model on selected sequences. Since every model can be very specific and have different inputs, there is no auto-loader as it is with Search and Predict tools. However, it is really simple, just a few quick steps:

  1. Add the button itself: open AnalyseModule/templates/Analyse/default.latte and add the button to the div with analyseButtons. You should do this the Nette way, so you also need to open AnalyseModule/presenters/AnalysePresenter.php and add your new submit button inside createComponentAnalyseForm.
  2. Handle clicking the button via JavaScript: open www/js/Views/Analyse.js, scroll to the very bottom of the file and add handling of the click event on your new button. This mainly involves checking whether the necessary conditions for running your model are met (e.g. the user has selected enough sequences) and changing the action attribute of the form to the path of your new presenter.

Note

The recommended way of adding the JavaScript handler is to copy the handler from Conservancy comparator (which is the first in the EXECUTING OF DIFFERENT MODELS section in the Analyse.js file). Everything you need is there.