12. Biological background

Ribosomal RNA secondary structure prediction? What is that?

This page (selectively) describes the biological background of rPredictorDB. It is intended mainly for those who would be interested in some topics rPredictorDB addresses, but lack the background biological knowledge to understand what it’s all about.

Note

A disclaimer: We are worse writers than the Wikipedians who were so generous as to share their knowledge, so we generally recommend reading the Wikipedia link first. Where we felt we had something to add that would make a topic even easier to understand, we did our best.

No more than a high school biology background is assumed.

12.1. What’s a ribosome? Why do we have them?

Ribosome on Wikipedia

Storing information about what the body should do in a genome is - obviously - a great idea. However, if we only had DNA molecules, we wouldn’t get very far in life: DNA is very good at storing information, but not very good at, say, digesting chocolate. We need a mechanism that can take the information stored in DNA and make it do stuff.

The “action guys” that can do all the work like chocolate digestion are proteins: just as DNA and RNA are sequences of some basic building blocks (nucleotides), a protein is a sequence of amino acids. These are fairly simple molecules that can nevertheless be linked one after another, just as various rail cars make up a train. The mechanism that gets us from the planning department (DNA) to the manual laborers (proteins) is called protein synthesis.

Protein synthesis has several complex steps. First, a part of the DNA needs to be copied over to an RNA molecule called messenger RNA, or mRNA. The process of creating an mRNA molecule from DNA is called gene transcription. This information then needs to be used to build the right “train” of amino acids: just as a railyard dispatcher’s list would say “boxcar, boxcar, container car, refrigerator car, boxcar”, the mRNA says “Alanine, Alanine, Histidine, Tryptophan, Alanine”.
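To make the copying step concrete, here is a minimal sketch in Python. It assumes that we are handed the coding strand of the DNA, whose sequence reads the same as the mRNA except that DNA uses thymine (T) where RNA uses uracil (U). The function name `transcribe` and the example sequence are ours, purely for illustration.

```python
def transcribe(coding_strand):
    """Return the mRNA sequence for a DNA coding strand.

    Assumption: the input is the coding (sense) strand, so the mRNA
    reads the same as the DNA with every T replaced by U.
    """
    dna = coding_strand.upper()
    unknown = set(dna) - set("ACGT")
    if unknown:
        raise ValueError(f"not a DNA sequence, found: {sorted(unknown)}")
    return dna.replace("T", "U")


# Hypothetical five-codon gene fragment, made up for this example.
mrna = transcribe("GCTGCTCATTGGGCT")
print(mrna)
```

Real transcription involves far more machinery than a string replacement; the sketch only shows what happens to the information.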

Next, we need to translate this dispatcher’s list to the actual train - we need to translate the information about the amino acid sequence into an actual sequence of amino acids. This is called gene translation.

Gene translation is, finally, where the ribosome steps in: the ribosome is the unit where genetic information carried by mRNA is converted into a protein. The conversion itself is like the working of a railyard: yard locomotives haul cargo cars to assemble the train. The yard locomotives are transfer RNA (tRNA) molecules - short RNAs that can carry an amino acid. The ribosome is the dispatcher that reads the list and says: “Now give me a boxcar.” [1]

In more detail: the ribosome “captures” an mRNA molecule at one of its ends and exposes the first codon (a set of three nucleotides that encodes an amino acid). Transfer RNAs then “bump” into the ribosome with their anticodon, a set of three nucleotides that depends on what kind of amino acid the tRNA is carrying. The ribosome moves the tRNA molecule so that it can check whether its anticodon matches the current codon. If it does match (which happens only if the tRNA is carrying the right amino acid), the ribosome will grab the amino acid from the tRNA, bind it to the last captured amino acid in the protein chain under construction, let the unloaded tRNA go and move along the mRNA chain to the next codon. In this manner, the protein is gradually built up from the individual amino acids. (The Wikipedia link above contains a great animation that shows this process.)

[1] This is where the metaphor breaks down: if the railyard worked like the ribosomes, the yard locomotives would simply drift around at random, with cars or without, and from time to time they would encounter a dispatcher and offer him the car; the car would be either accepted or refused, based on the dispatcher’s list.
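The codon-by-codon reading can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not a model of the ribosome: the codon table below is deliberately tiny (it covers just the three amino acids from the dispatcher's list above, while the real genetic code has 64 codons), and the names `CODON_TO_AMINO_ACID` and `translate` are ours.

```python
# Deliberately incomplete codon table: only the codons needed for the
# "Alanine, Alanine, Histidine, Tryptophan, Alanine" example.
CODON_TO_AMINO_ACID = {
    "GCU": "Alanine",
    "CAU": "Histidine",
    "UGG": "Tryptophan",
}


def translate(mrna):
    """Read the mRNA three nucleotides at a time and look up each codon."""
    if len(mrna) % 3 != 0:
        raise ValueError("mRNA length must be a multiple of three")
    protein = []
    for start in range(0, len(mrna), 3):
        codon = mrna[start:start + 3]
        if codon not in CODON_TO_AMINO_ACID:
            raise KeyError(f"codon {codon} is not in this toy table")
        # The "dispatcher" accepts the matching car and moves on.
        protein.append(CODON_TO_AMINO_ACID[codon])
    return protein


print(translate("GCUGCUCAUUGGGCU"))
# ['Alanine', 'Alanine', 'Histidine', 'Tryptophan', 'Alanine']
```

A lookup table hides the tRNA entirely; in the cell, the match between codon and anticodon is what plays the role of the dictionary lookup.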

12.2. What’s a “secondary structure”?

RNA structure on Wikipedia

Nucleotide on Wikipedia

Base pairs on Wikipedia

An RNA molecule consists of nucleotides, which are simpler molecules that join together into a chain to form the RNA molecule. There are four basic RNA nucleotides: Adenine (A), Cytosine (C), Guanine (G) and Uracil (U).

The first thing the nucleotides do is arrange themselves into the sequence by forming regular covalent bonds with their neighbors. However, after this basic duty is seen to, they start to get restless. Some nucleotides like each other and want to be closer. This is due to hydrogen bonds forming between nucleotides of particular types. The tendency to form hydrogen bonds is by far the strongest between C and G nucleotides (these usually form three H-bonds), followed by A and U (these form two).

The G-C and A-U pairs are called Watson-Crick pairs. The next one down the line is the G-U match, called a “wobble” pair. This one is weaker still, because the G would like to form one more hydrogen bond, but U can’t. So G might very well try to form that bond with some other nucleotide, making the wobble pair less stable.
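The pairing rules so far fit in a small lookup. The sketch below classifies a pair of RNA nucleotides as Watson-Crick, wobble, or neither; the hydrogen bond counts for G-C and A-U are the ones given above. The function and constant names are ours and only serve the illustration.

```python
# Hydrogen bonds in the two Watson-Crick pairs, as stated in the text.
WATSON_CRICK_H_BONDS = {
    frozenset("GC"): 3,
    frozenset("AU"): 2,
}
WOBBLE_PAIRS = {frozenset("GU")}


def classify_pair(first, second):
    """Return "Watson-Crick", "wobble", or None for two RNA nucleotides."""
    pair = frozenset((first.upper(), second.upper()))
    if pair in WATSON_CRICK_H_BONDS:
        return "Watson-Crick"
    if pair in WOBBLE_PAIRS:
        return "wobble"
    return None  # these two do not like each other enough


for a, b in [("G", "C"), ("U", "A"), ("G", "U"), ("A", "C")]:
    print(a, b, classify_pair(a, b))
```

Using a `frozenset` makes the lookup independent of the order of the two nucleotides, so G-C and C-G are treated alike.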

This process of nucleotides finding their partners is neat and orderly in DNA, where two complementary nucleotide chains are available. These two chains simply clamp together like a zipper and stay more or less put. (“Complementary” means that for each C in the first chain there is a G in the other and vice versa, and likewise for A and T: DNA uses thymine (T) where RNA uses uracil (U).)
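Complementarity in DNA is easy to express in code. In DNA the four bases are A, C, G and T, with A pairing with T and C with G. The sketch below builds the partner chain position by position and checks whether two chains zip together; it ignores strand direction (the two chains of real DNA run in opposite directions), and the names are ours.

```python
DNA_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Return the partner base for every position of a DNA strand."""
    return "".join(DNA_COMPLEMENT[base] for base in strand.upper())


def zips_together(first, second):
    """True if every position of `first` faces its partner in `second`."""
    return len(first) == len(second) and complement(first) == second.upper()


print(complement("GATTACA"))
print(zips_together("GATTACA", "CTAATGT"))
```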

However, it is less clear-cut in RNA, where only one chain is available: nucleotides from one part of the chain try to find partners from another part and the chain folds upon itself.

The exact shape of the folded RNA molecule - the positions of individual atoms, or at least nucleotides (also called residues), in space - is very hard to predict. However, it is possible to predict, with a much more reasonable degree of certainty, at least how the nucleotides will pair up: which ones will find a partner, which other nucleotide this significant other will be, and which of the guys stay single.

This pairing is called, at least for the purposes of rPredictorDB (see Glossary), the secondary structure.

Of course, in reality, things aren’t as clear-cut as a nice list of unambiguous base pairs. (For a taste of how complicated things may get, take a look at the Hoogsteen pair Wikipedia article.) However, the secondary structure is still a useful approximation that helps predict at least some useful properties of the given RNA molecule.
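A common way to write such a list of pairs down is dot-bracket notation: one character per nucleotide, where a dot marks an unpaired nucleotide and a matching pair of parentheses marks two nucleotides that pair with each other. The notation is widely used by folding tools, but we are introducing it here only as an illustration; the function name and the example structure below are ours. The sketch turns a dot-bracket string back into an explicit list of pairs.

```python
def parse_dot_bracket(structure):
    """Return the base pairs (0-based positions) of a dot-bracket string."""
    open_positions = []  # positions of "(" still waiting for a partner
    pairs = []
    for position, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(position)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {position}")
            pairs.append((open_positions.pop(), position))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


# A made-up hairpin: three pairs closing a loop of three single nucleotides.
print(parse_dot_bracket("(((...)))"))
# [(0, 8), (1, 7), (2, 6)]
```

Every position that does not appear in any pair is one of the nucleotides that stays single.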

For an example of a typical secondary structure, take a look at some tRNA.

12.3. Why structure?

The structure of RNA is much more important to its function than its chemical composition. All the nucleotides are pretty much made of the same atoms, and there are tens or hundreds of each in the molecule, spread more or less evenly throughout the primary structure. However, thanks to the attraction between certain nucleotides, the seemingly random patterns of nucleotides one after the other in the chain enable the molecule to fold into a complex shape, and furthermore enable different molecules built from the very same set of nucleotides to fold into wildly different shapes. Therefore, to understand how these molecules actually work, we need to understand their structure.

Ribosomes are extra important because they are the site of one of the most important and pervasive processes in practically all life: protein synthesis. Understanding ribosomal RNA structure has, for example, led to the development of powerful antibiotics that can selectively block protein synthesis in bacteria (thanks to the different structure of bacterial and human/animal ribosomal RNA), thus depriving the bacteria of their building material.

You can read a presentation dealing with ribosomes and antibiotics.

12.4. Why prediction?

The Wikipedia article on secondary structure prediction

There are powerful imaging technologies available today that can accurately measure the position of individual atoms in large molecules. Why would we want to settle for guessing, and only guessing an intermediate approximation to the actual structure the rRNA will fold into?

The answer is that these imaging technologies are very expensive and difficult to use. There are over half a million unique ribosomal RNA sequences known; running X-ray crystallography or Nuclear Magnetic Resonance over a significant portion of them would cost just way, way, way too much. Plus, we know that functions of biological molecules are conserved among species that are evolutionarily close together. So, perhaps, we could just measure a few structures and try to extrapolate from there?

Or, we may do away with the observed structures altogether and find a model that accurately describes the preferred foldings of a given RNA molecule. Or try something completely different. The palette of available secondary structure prediction software is not small. And that list is not even comprehensive.
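To give a taste of the "model only" approach, here is a sketch of one of the simplest such models: base-pair maximization, usually called the Nussinov algorithm. It is emphatically not what rPredictorDB or any serious tool does, and it ignores energies entirely. It just finds a nested set of Watson-Crick and wobble pairs that is as large as possible, assuming that at least `min_loop` unpaired nucleotides must sit inside every hairpin. The function name, the `min_loop` parameter and the example sequence are ours.

```python
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"),
                 ("U", "A"), ("G", "U"), ("U", "G")}


def fold(seq, min_loop=3):
    """Return a dot-bracket structure with a maximal number of nested pairs."""
    n = len(seq)
    # best[i][j] = largest number of pairs possible within seq[i..j]
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]  # option 1: nucleotide j stays single
            for k in range(i, j - min_loop):  # option 2: j pairs with k
                if (seq[k], seq[j]) in ALLOWED_PAIRS:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score

    # Walk back through the table to recover one optimal set of pairs.
    structure = ["."] * n
    todo = [(0, n - 1)]
    while todo:
        i, j = todo.pop()
        if j - i <= min_loop:
            continue  # too short to contain any pair
        if best[i][j] == best[i][j - 1]:
            todo.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in ALLOWED_PAIRS:
                left = best[i][k - 1] if k > i else 0
                if left + 1 + best[k + 1][j - 1] == best[i][j]:
                    structure[k], structure[j] = "(", ")"
                    if k > i:
                        todo.append((i, k - 1))
                    todo.append((k + 1, j - 1))
                    break
    return "".join(structure)


print(fold("GGGAAAUCC"))
# (((...)))
```

The output uses dot-bracket notation: a dot is a single nucleotide, and matching parentheses mark a pair. Counting pairs is a crude stand-in for stability, which is why real predictors rely on measured energies, known structures of related molecules, or both.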

In the end, it’s all about the Benjamins: predicting secondary structures saves time and money.