.. _User-index:

rPredictorDB User Documentation
*******************************

.. toctree::
   :maxdepth: 2
   :numbered:

   rWeb Tutorial
   Data
   Toolkit
   Toolkit reference
   Database record information (Sequence detail)
   Exporting results
   Similarity search
   CP-predict: Custom rRNA prediction algorithm
   Structural features
   FAQ
   Glossary
   Biological background

.. _User-index-why:

rPredictorDB: the Why & the What
================================

.. only:: latex

   rPredictorDB exists to make bioinformatics over ribosomal RNA easier. The rPredictorDB *website* provides access to various *tools* you can use over a *database* of rRNA molecules. We assembled a toolkit for various common bioinformatical tasks, falling into two broad categories:

   * Search
   * Predict secondary structure

   *Search tools* retrieve a set of rRNA molecules from our database, both according to exact criteria (length, organism name or group, type of rRNA, etc.) and similarity criteria ("Find me more sequences like this one"). *Prediction tools* take an rRNA sequence and suggest base pairs.

   An overview of the available tools is in :ref:`User-toolkit`. This overview should help you select the right tools for your work. A reference manual of the inputs for each individual tool can be found in :ref:`User-toolkit-reference`. This should help you understand in detail how to use the tool you chose.

   .. note:: If there is any confusion about the terminology we use and how we use it, refer to the :ref:`User-glossary`.

   .. note:: A PDF version of the documentation `is available `_ We also have an `RSS feed `_!

   Goals of rPredictorDB
   ---------------------

   The aim of rPredictorDB is twofold. We develop and deploy a technique for predicting ribosomal RNA secondary structure and make the resulting structural information readily available. At the same time, the rPredictorDB database contains rich annotations of rRNA structures and the underlying sequences, providing a unified interface for bioinformaticians involved in rRNA research.
   A secondary goal of rPredictorDB is extensibility: it should be easy to integrate third-party tools.

   Motivation
   ----------

   Gene translation is the process by which genetic information is implemented to form a living organism. The unit central to translation is the ribosome. The "scaffold" (and major part) of the ribosome consists of ribosomal ribonucleic acids (rRNA), which are critical for its function. Because the function of biological molecules is mostly determined by their spatial structure, understanding the role of rRNA in translation depends on understanding rRNA structures.

   While rRNA nucleotide sequences can be obtained relatively easily, determining their three-dimensional structure is very demanding: sequences are known for hundreds of eukaryotic organisms, while spatial structures are known for only about five. Secondary structures are an intermediate step between sequences and three-dimensional structures. Understanding secondary structures enables at least a partial study of rRNA behavior, and secondary structures can be predicted from sequences (to a much greater extent than spatial structures). However, ribosomal RNAs are notoriously hard targets for secondary structure prediction.

   In recent years, improved imaging methods have led to detailed measurements of the spatial structure of the ribosome in a few organisms (and a secondary structure can be derived from these measurements). The availability of these structures, together with the high degree of conservation in the ribosome, should make it possible to derive secondary structures for other ribosomal RNAs as well.

   While the importance of ribosomes and ribosomal RNA has been recognized for several decades, support for bioinformatical work over the available ribosomal data is fragmented and unsatisfactory. Various sites dedicated to rRNA exist: most notably the SILVA database and the Comparative RNA Website (CRW).
   However, none of them is satisfactory: the CRW site is hard to navigate, fails to provide crucial information (e.g. the origin of a provided secondary structure) and does not support mass retrieval; SILVA has no support for structural bioinformatics. The rPredictorDB infrastructure aims to overcome the shortcomings of these sites and make working with rRNA as easy as possible.

   Documentation
   ------------------

   The documentation is split into three parts:

   * rDoc-User, which covers all you need to know to *use* rPredictorDB (you are reading the introduction of rDoc-User right now). Note that installation instructions are not included in the User documentation, because installation is *not* a task for users to undertake; the infrastructure is already set up and available at the `rPredictorDB website `__.
   * rDoc-Technical, which describes how rPredictorDB is built: :ref:`Technical-index`. This includes installation instructions.
   * The API reference, which is useful if you would like to join the rPredictorDB development team. The reference documentation is not a part of the printed documentation. For the rWeb component, it is available on the `rPredictorDB website `__.

   What now?
   ---------

   To start using the rPredictorDB website right away, go `search `__ or `predict `__.

   If you wish to read more about rPredictorDB:

   * For a tutorial on how to make the most of what rPredictorDB can offer, read the :ref:`User-rWebTutorial`.
   * To find out what tools are available through rPredictorDB and how to use them, browse the :ref:`User-toolkit`.
   * To learn about the rPredictorDB database and data sources, read :ref:`User-rData` (or :ref:`Technical-rData` for a technical description of the rPredictorDB database, rDB).
   * To find out how the CP-predict algorithm works, see :ref:`User-cp-predict`.
   * To learn more about the biological background of rPredictorDB, go to :ref:`User-biology`.
   * For the technical documentation of rPredictorDB, go to :ref:`Technical-index`.
   * For eager developers: you can visit the `rWeb API reference documentation of rPredictorDB `_. (But please do at least read about :ref:`Technical-architecture` first.)

.. only:: html

   rPredictorDB exists to make bioinformatics over ribosomal RNA easier. The rPredictorDB *website* provides access to various *tools* you can use over a *database* of rRNA molecules. We assembled a toolkit for various common bioinformatical tasks, falling into two broad categories:

   * Search
   * Predict secondary structure

   Search tools retrieve a set of rRNA molecules from our database, both according to exact criteria (length, organism name or group, type of rRNA, etc.) and similarity criteria ("Find me more sequences like this one"). Prediction tools take an rRNA sequence and suggest base pairs.

   An overview of the available tools is in :ref:`User-toolkit`. This overview should help you select the right tools for your work. A reference manual of the inputs for each individual tool can be found in :ref:`User-toolkit-reference`. This should help you understand in detail how to use the tool you chose.

   .. warning:: If there is any confusion about the terminology we use and how we use it, refer to the :ref:`User-glossary`.


   More accurately and technically, the rPredictorDB infrastructure consists of the following components:

   * rData: a database of rRNA sequences and secondary structures
   * rETL: the infrastructure necessary to populate rData and keep it up to date
   * rTools: a set of tools that perform standard tasks on the data, such as similarity search or secondary structure prediction
   * CP-predict: a new algorithm dedicated to rRNA secondary structure prediction
   * rWeb: an internet portal and back-end that makes the rData and rTools components accessible to the research community and the general public
   * rDoc: thorough documentation, including relevant scientific literature

   Probably of most interest to the casual user is the rWeb component, which is the interface through which you will communicate with the other rPredictorDB components.

   rDoc
   ------------------

   The documentation is split into three parts:

   * rDoc-User, which covers all you need to know to *use* rPredictorDB (you are reading the main page of rDoc-User right now). Installation instructions are not included in the User documentation, because installation is *not* a task for users to undertake.
   * rDoc-Technical, which describes how rPredictorDB is built: :ref:`Technical-index`. This includes installation instructions.
   * rDoc-Reference, which is useful if you would like to join the rPredictorDB development team. `The reference documentation is here. `_

   .. note:: Taken from :ref:`User-FAQ`.

   **Why build a website for rRNA bioinformatics at all?**

   While there are several websites dedicated directly to ribosomal RNA, we (`Bioinformatics laboratory of the Microbiological Institute of the Czech Academy of Sciences `_ and our team at the Faculty of Mathematics and Physics at Charles University) feel that a better job could be done. The main drawbacks of similar bioinformatical websites are a steep learning curve and information that is missing or whose purpose is unclear.
   In order to use a website such as the `Protein Data Bank `_ or the `Comparative RNA Website `_, a user first has to have, or obtain, a good general idea of what the site does, why anyone would want such a site and what a *lot* of terminology means. After this background knowledge is obtained, the user finds out that certain information is not well curated or made explicit: no one (at least publicly) keeps track of whether an RNA sequence was obtained directly or from a DNA transcription site, information about the type of rRNA is sparse, sometimes the phylogeny of a molecule is missing, a resolved secondary structure is not sufficiently labeled, it is unclear what constitutes a truly unique identifier of an RNA sequence or structure, etc. For instance, the `STRAND database `_ of resolved RNA secondary structures contains a number of structures inconsistently labeled for RNA-protein complexes and duplicate sequences.

   Another drawback of such sites is that support for mass data retrieval is often limited or missing. When present, it is usually in the form of pre-packaged archives, and the user has little choice over what subset of the data to download. (A notable exception here is the `SILVA database `_ of rRNA molecules.)

   While such drawbacks are perhaps of less concern to biologists, the fast-growing field of bioinformatics is sensitive to this kind of volatility in data sources. Nobody wants to spend a lot of time finding out what uniquely identifies an RNA sequence or structure, filling in missing fields, etc., let alone downloading a set of several hundred sequences of interest one by one.

   The rWeb and rData components of rPredictorDB were designed with overcoming these drawbacks in mind. Our goal is first and foremost clarity: answers to questions such as "Why would I ever want to do that?" should be easy to find (and relatively easy to read).

   What now?
   =========

   To start using the rPredictorDB website right away, go `search `__ or `predict `__.

   If you wish to read more about rPredictorDB:

   * For a tutorial on how to make the most of what rPredictorDB can offer, read the :ref:`User-rWebTutorial`.
   * To find out what tools are available through rPredictorDB and how to use them, browse the :ref:`User-toolkit`.
   * To learn about the rPredictorDB database and data sources, read :ref:`User-rData` (or :ref:`Technical-rData` for a more technical description).
   * To find out how the CP-predict algorithm works, see :ref:`User-cp-predict`.
   * To learn more about the biological background of rPredictorDB, go to :ref:`User-biology`.
   * For the technical documentation of rPredictorDB, go to :ref:`Technical-index`.
   * For developers: the `API reference documentation of rPredictorDB `_