rPredictorDB is a predictive database of RNA secondary structures and their formatted plots. The structures are generated by template-based prediction of RNA secondary structure, using experimentally identified structures as templates. RNAs with large secondary structures are drawn with a template-based visualization method that keeps the plots formatted and readable. The rPredictorDB website also offers template-based secondary structure prediction for user-uploaded RNA sequences, using the templates stored in rPredictorDB.
The following is brief usage documentation. More detailed technical documentation is available here.
The Search page allows searching the RNAs stored in rPredictorDB, together with their secondary structure(s) and plot(s), based on various criteria. Each item on the Search page has either a static description or a pop-up description that appears when the mouse pointer hovers over it.
The search criteria are Taxonomy, Annotation and Sequence. When a criterion is selected via its check box, the corresponding section appears below. Multiple criteria can be selected.
Taxonomy criterion: Restricts the search in the database to chosen taxonomic group(s) or an organism. A taxonomy browser is available (via a hyperlink) to choose taxonomic group(s) by browsing a phylogenetic tree.
Annotation criterion: Searches the stored RNAs using annotations stored in rPredictorDB. Each annotation item that can be used for the search has a corresponding text box or check box. The text boxes are pre-filled with example keywords.
Sequence criterion: Searches the stored RNAs based on sequence similarity. The input is an RNA sequence. RNAs stored in rPredictorDB whose sequences are similar to the query sequence are identified and listed. The similarity is computed by BLAST.
The Predict page allows template-based RNA secondary structure prediction for user sequences, using templates stored in rPredictorDB.
The page allows sequence upload (by copy-paste) and selection of the template. The selection is either automatic, in which case the template whose sequence is most similar to the query sequence is identified and used for prediction, or manual. Templates are secondary structures extracted from experimentally identified structures, found mostly in PDB.
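The automatic selection described above can be illustrated with a minimal sketch. It is not the rPredictorDB implementation: the template names and sequences are toy values, and Python's `difflib.SequenceMatcher` is used only as a simple stand-in for the similarity measure, which this page does not specify for the Predict page.

```python
from difflib import SequenceMatcher

# Hypothetical templates (name -> sequence); toy data, not rPredictorDB content.
TEMPLATES = {
    "template_A": "GGCUACGUAGCUCAGUUGGUAGAGC",
    "template_B": "AUGCCGUAAGGCUUACGGCAUUUCC",
}


def normalize(seq: str) -> str:
    """Remove whitespace, upper-case, and convert DNA-style T to U."""
    return "".join(seq.split()).upper().replace("T", "U")


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1]; a stand-in for a real alignment score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def pick_template(query: str, templates: dict) -> str:
    """Return the name of the template most similar to the query sequence."""
    query = normalize(query)
    return max(
        templates,
        key=lambda name: similarity(query, normalize(templates[name])),
    )


# A pasted sequence may be lower-case or use T instead of U.
best = pick_template("ggctacgtagctcagttggtagagc", TEMPLATES)
print(best)
```

With manual selection, the `pick_template` step would simply be replaced by the template name chosen by the user.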
The Download page allows downloading the rPredictorDB database in CSV format or as a database dump. The prediction algorithm used is also publicly available.
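As a rough illustration of working with the CSV download, the sketch below reads records with Python's standard `csv` module. The column names (`id`, `rna_type`, `organism`, `sequence`, `structure`) and the two rows are assumptions made up for this example; the actual layout of the rPredictorDB CSV file should be checked in the downloaded file.

```python
import csv
import io

# Toy stand-in for a downloaded file; column names and rows are hypothetical.
SAMPLE_CSV = """id,rna_type,organism,sequence,structure
demo1,tRNA,Homo sapiens,GCAUUGGC,((....))
demo2,5S rRNA,Escherichia coli,GGCCAAGGCC,(((....)))
"""


def load_records(handle):
    """Read all rows of a CSV file into a list of dictionaries."""
    return list(csv.DictReader(handle))


def filter_by_type(records, rna_type):
    """Keep only the records of one RNA type."""
    return [r for r in records if r["rna_type"] == rna_type]


records = load_records(io.StringIO(SAMPLE_CSV))
for rec in filter_by_type(records, "tRNA"):
    # A dot-bracket structure has one symbol per nucleotide.
    assert len(rec["sequence"]) == len(rec["structure"])
    print(rec["id"], rec["organism"], rec["structure"])
```

For a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by `open(path, newline="")`.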
RNA | Source of sequences | Templates and their source | Template sequence length (nucleotides) |
---|---|---|---|
16S rRNA | Silva | E. coli 16S rRNA (PDB ID 2ZM6)<sup>1</sup> | 1542 |
18S rRNA Chordata | Silva | H. sapiens 18S rRNA (PDB ID 4V6X)<sup>1</sup> | 1869 |
18S rRNA Diptera | Silva | D. melanogaster 18S rRNA (PDB ID 4V6W)<sup>1</sup> | 1995 |
5S rRNA Bacteria | Rfam (RF00001) | E. coli 5S (PDB ID 1C2X)<sup>1</sup> | 120 |
5S rRNA Eukarya | Rfam (RF00001) | H. sapiens 5S (PDB ID 6EKO)<sup>1</sup> | 120 |
5.8S rRNA | Rfam (RF00002) | Trypanosoma cruzi 5.8S RNA (PDB ID 5T5H) | 169 |
6S RNA | Rfam (RF00013) | E. coli 6S RNA (Wassarman et al. 2000)<sup>2,3</sup> | 184 |
6S RNA | Rfam (RF00013) | B. subtilis 6S RNA (Ando et al. 2002)<sup>2,3</sup> | 187 |
9S rRNA | Rfam (RF02545) | Trypanosoma brucei 9S rRNA (PDB ID 6HIY) | 621 |
Aphthovirus internal ribosome entry site | Rfam (RF00210) | PDB ID 2NBX<sup>4</sup> | 107 |
AdoCbl variant | Rfam (RF01689) | Marine metagenome (PDB ID 4FRN) | 110 |
Archaeal signal recognition particle RNA | Rfam (RF01857) | Methanocaldococcus jannaschii (PDB ID 3NDB) | 135 |
Cobalamin riboswitch | Rfam (RF00174) | Symbiobacterium thermophilum (PDB ID 4GXY) | 172 |
C-DI-AMP riboswitch | Rfam (RF00379) | Thermovirga lienii C-DI-AMP riboswitch (PDB ID 4QK9) | 123 |
CRPV-IRES | Rfam (RF00458) | Mammalian CRPV-IRES (PDB ID 6D9J) | 190 |
CSFV IRES | Rfam (RF00209) | Viral CSFV IRES (PDB ID 4C4Q) | 233 |
FMN riboswitch | Rfam (RF00050) | PDB ID 3F2Y<sup>4</sup> | 112 |
Fungi U3 | Rfam (RF01846) | S. cerevisiae u3 (PDB ID 5WYK) | 333 |
gcvB | Sharma et al. 2007<sup>3</sup> | S. typhimurium gcvB (Sharma et al. 2007)<sup>3</sup> | 206 |
GLMS ribozyme | Rfam (RF00234) | Bacillus anthracis GLMS ribozyme (PDB ID 3L3C) | 141 |
Group I catalytic intron | Rfam (RF00028) | Staphylococcus virus Twort (PDB ID 1Y0Q) | 192 |
Group II catalytic intron D1-D4-3 | Rfam (RF02001) | PDB ID 3BWP<sup>4</sup> | 172 |
Group II catalytic intron D1-D4-7 | Rfam (RF02012) | Pylaiella littoralis (PDB ID 4R0D) | 157 |
Group II intron lariat | NCBI<sup>5</sup> | Oceanobacillus iheyensis group II intron (PDB ID 5J02) | 418 |
Group II intron lariat in post-catalytic state<sup>6</sup> | NCBI<sup>5</sup> | Pylaiella littoralis (PDB ID 6CIH) | 621 |
IRES HCV | Rfam (RF00061) | H. sapiens IRES HCV (PDB ID 5A2Q) | 257 |
Lariat capping ribozyme | Rfam (RF01807) | Didymium iridis lariat capping ribozyme (PDB ID 4P8Z) | 188 |
Lysine riboswitch | Rfam (RF00168) | T. maritima lysine riboswitch (PDB ID 4ERL) | 161 |
Mammalian CPEB3 ribozyme | Rfam (RF00622) | H. sapiens CPEB3 (Salehi-Ashtiani et al. 2006)<sup>3</sup> | 78 |
M-box | Rfam (RF00380) | B. subtilis M-box (PDB ID 3PDR) | 161 |
micF | Rfam (RF00033) | E. coli micF (Esterling et al. 1994)<sup>3</sup> | 95 |
MLV encapsidation signal | Rfam (RF00374) | Viral MLV (PDB ID 1U6P) | 101 |
ms1 | Hnilicova et al. 2014<sup>3</sup> | M. smegmatis ms1 (Panek et al. 2011, Hnilicova et al. 2014)<sup>3</sup> | 304 |
oxyS | Rfam (RF00035) | E. coli oxyS (Argaman et al. 2000)<sup>3</sup> | 109 |
PHI29 PROHEAD RNA | Rfam (RF00044) | Bacteriophage PHI29 (PDB ID 1FOQ) | 117 |
RNaseP arch | Rfam (RF00373) | Pyrococcus furiosus RNaseP | 347 |
RNaseP bact a | NCBI<sup>5</sup> | T. tengcongensis RNaseP bact a (PDB ID 3Q1R) | 347 |
RNaseP bact b | Rfam (RF00011) | PDB ID 2A64<sup>4</sup> | 414 |
RNaseP nuc | Rfam (RF00009) | H. sapiens RNaseP (Marquez et al. 2005)<sup>3</sup> | 341 |
ryhB | Davis et al. 2005<sup>3</sup> | E. coli ryhB (Davis et al. 2005)<sup>3</sup> | 90 |
SAM I | Rfam (RF00162) | T. tengcongensis SAM I (PDB ID 2GIS) | 94 |
spot42 | Rfam (RF00021) | E. coli spot42 (Moller et al. 2002) | 119 |
SRP bact small | Rfam (RF00169) | E. coli SRP (SRPDB ID esccol3d-97-11-17-stretched.pdb) | 114 |
SRP bact large | Rfam (RF01854) | B. subtilis SRP (PDB ID 4UE4) | 266 |
SRP Metazoa | NCBI<sup>5</sup> | H. sapiens SRP (PDB ID 4P3E) | 301 |
Tetrahymena ribozyme | NCBI<sup>5</sup> | PDB ID 1X8W<sup>4</sup> | 247 |
Tetrahymena telomerase RNA | Rfam (RF00025) | Tetrahymena Telomerase RNA (PDB ID 6D6V) | 159 |
THF riboswitch | Rfam (RF01831) | PDB ID 4LVV<sup>4</sup> | 89 |
tmRNA | Rfam (RF00023) | E. coli tmRNA (PDB ID 3IZ4) | 377 |
TPP riboswitch | NCBI<sup>5</sup> | E. coli TPP (PDB ID 4NYG) | 83 |
tRNA Gly eukaryotic | Rfam (RF00005) | H. sapiens tRNA Gly (PDB ID 5E6M)<sup>1</sup> | 74 |
tRNA Gly bacterial | Rfam (RF00005) | G. kaustophilus tRNA Gly (PDB ID 4MGM)<sup>1</sup> | 75 |
Trypanosomatid mitochondrial large subunit ribosomal RNA | Rfam (RF02546) | Trypanosoma brucei brucei (PDB ID 6HIV) | 560 |
u1 | Rfam (RF00003) | H. sapiens u1 (Nagai et al. 2002)<sup>3</sup> | 163 |
u2 | Rfam (RF00004) | H. sapiens u2 (Nagai et al. 2002)<sup>3</sup> | 188 |
u4 | Rfam (RF00015) | H. sapiens u4 (Krol et al. 1981)<sup>3</sup> | 144 |
u5 | Rfam (RF00020) | H. sapiens u5 (Sievers et al. 2011)<sup>3</sup> | 116 |
u6 | Rfam (RF00026) | H. sapiens u6 (PDB ID 5LQW) | 112 |
vertebrate Telomerase RNA | Rfam (RF00024) | H. sapiens Telomerase RNA (Bentley et al. 2002)<sup>3</sup> | 451 |
yeast u1 | Rfam (RF00488) | S. cerevisiae (PDB ID 5ZWN) | 565 |
<sup>1</sup> The template is applied to sequences according to taxonomy, i.e. a eukaryotic template to eukaryotic sequences and a prokaryotic template to prokaryotic sequences.
<sup>2</sup> Taxonomy alone cannot determine which template should be used, because some bacteria, e.g. Firmicutes, contain 6S RNAs of both template types. Therefore, the template producing a structure with the better z-score is used for each 6S RNA.
<sup>3</sup> Sequences and/or the template structure were copied from the paper that published the template structure.
<sup>4</sup> Organism not described, or a synthetic expression system was used.
<sup>5</sup> The sequences were obtained by an NCBI BLAST search with the "somewhat similar sequences" parameters against the nr database, using query sequences taken from PDB. This was done because the sequences in the corresponding Rfam family seemed incompatible with the PDB structure: they were either short fragments or had very low sequence similarity to the PDB sequence.
<sup>6</sup> This family contains several very short fragments that produce substructures that are hard to match with the template structure. Nevertheless, we included them in rPredictorDB because they had significant BLAST E-values (< 1 × 10<sup>-12</sup>) and because they are a good example of RNAs with extremely fragmented sequences.
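The two template-assignment rules from footnotes 1 and 2 can be sketched as follows. This is only an illustration: the function names and dictionary keys are hypothetical, the z-score values in the usage lines are made up, and because this page does not say whether a higher or a lower z-score counts as "better", the direction is left as an explicit parameter.

```python
from typing import Dict

# Footnote 1: the template follows the taxonomy of the sequence.
# Template names come from the 5S rRNA rows of the table above;
# the dictionary keys are an assumption of this sketch.
TEMPLATES_5S = {
    "Bacteria": "E. coli 5S (PDB ID 1C2X)",
    "Eukarya": "H. sapiens 5S (PDB ID 6EKO)",
}


def template_by_taxonomy(domain: str) -> str:
    """Return the 5S rRNA template matching the taxonomic domain."""
    return TEMPLATES_5S[domain]


def template_by_zscore(zscores: Dict[str, float],
                       higher_is_better: bool = True) -> str:
    """Footnote 2: predict with every candidate template, keep the best z-score.

    `higher_is_better` is an assumption; flip it if lower z-scores are better.
    """
    if not zscores:
        raise ValueError("no candidate templates")
    pick = max if higher_is_better else min
    return pick(zscores, key=zscores.get)


print(template_by_taxonomy("Eukarya"))
# Made-up z-scores for one 6S RNA predicted with both 6S templates.
print(template_by_zscore({"E. coli 6S RNA": 1.2, "B. subtilis 6S RNA": 2.5}))
```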