The DNA Code (2) ...

How does the information encoded in DNA get passed to new cells? The double-strand structure of DNA is the key. The DNA double helix can unwind and replicate itself, thus allowing the transmission of hereditary information for producing new cells and for assembling the proteins that the new cells need for survival and reproduction.
The Role of RNA
The instructions for assembling proteins are provided by DNA, but are carried by ribonucleic acid, or RNA. RNA is very similar to DNA, with three differences: the sugar deoxyribose is replaced by a related sugar called ribose, RNA is usually a single long chain rather than a twisted pair, and in RNA the base thymine (T) is replaced by a different base called uracil (U). Three forms of RNA play a role in protein building:

1. Messenger RNA (mRNA), which carries the genetic blueprint for building a protein that it obtained from the DNA.
2. Ribosomal RNA (rRNA), which combines with proteins to form ribosomes, which are a kind of "workbench" for protein construction in the cell.
3. Transfer RNA (tRNA), which links to specific amino acids (the basic building blocks of proteins; a list is given in the table below) and transports them to the ribosomes for protein construction.

All three forms of RNA are manufactured from the DNA. All are critical to the synthesis of proteins, but only mRNA carries the genetic blueprint that supervises protein building. We shall describe the role of RNA in making proteins further below.

Replication of DNA
The double-stranded nature of the DNA molecule, the strict base pairing rules, and the resulting complementary nature of the two strands are central to the replication of DNA and to its method of issuing instructions for the manufacture of proteins. In replication, the DNA unwinds by unlinking the hydrogen bonds between members of each base pair (the unlinking is accomplished through the action of certain proteins called enzymes). This permits each strand of DNA to serve as a template for the synthesis of a new strand. Because of the base pairing rules, each new strand is exactly complementary to the strand that served as its template. Therefore one starts with one DNA molecule and ends up with two DNA molecules. In each of the two resulting molecules the double helix consists of one old strand from the original molecule and one new strand, and the new DNA molecules are identical to the original, base for base. This method of replication is called semiconservative because, in each DNA molecule thus produced, one strand is conserved from the original molecule and one strand is new.
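The replication scheme just described can be illustrated with a minimal Python sketch. The names `DNA_PAIR`, `complement`, and `replicate` are hypothetical, chosen only for this illustration, and the strands are compared position by position without modeling strand direction or the enzymes involved.

```python
# Base pairing rules for DNA: A pairs with T, and G pairs with C.
DNA_PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the strand that pairs, base by base, with the given strand."""
    return "".join(DNA_PAIR[base] for base in strand)


def replicate(duplex: tuple[str, str]) -> tuple[tuple[str, str], tuple[str, str]]:
    """Semiconservative replication of a two-strand molecule.

    Each old strand is kept and serves as the template for one new strand,
    so every daughter molecule holds one old strand and one new strand.
    """
    old_a, old_b = duplex
    daughter_1 = (old_a, complement(old_a))  # old strand a + newly built partner
    daughter_2 = (complement(old_b), old_b)  # newly built partner + old strand b
    return daughter_1, daughter_2


# An arbitrary example sequence (an assumption, not taken from the text).
original = ("ATGCCGTA", complement("ATGCCGTA"))
daughter_1, daughter_2 = replicate(original)
print(daughter_1 == original, daughter_2 == original)
```

Both daughter molecules compare equal to the original because the pairing rules leave no freedom in building the new strand.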

Manufacture of Proteins
The chief building operation for cells is the manufacture of proteins. The manufacture of proteins is also initiated by the unlinking of DNA strands. However, in this case only a portion of the chain, corresponding to particular genes carrying the blueprint for making particular proteins, is unlinked. The base sequence on the unlinked segment then serves as a template to assemble a messenger RNA (mRNA) chain. Because of the base pairing rules, the mRNA molecule that is assembled encodes the complement of the base sequence on the unlinked segment of DNA. Recall that for RNA the base thymine (T) found in the DNA nucleotides is replaced by uracil (U). For the assembly of the RNA strand on the unlinked DNA strand, the base pairing rules are the same as those for DNA, but with T replaced by U. That is, A pairs with U and G pairs with C. The basic procedure for protein manufacture may be described as:

1. DNA guides the synthesis of RNA (the transcription step): At an unwound section of a DNA molecule in the cell nucleus, a single strand of messenger RNA (mRNA) is assembled on the unwound DNA template. This molecule is called the RNA transcript. Since it carries the complement of the bases on the DNA template, it holds the genetic information encoded in the unwound segment of DNA that was transcribed.
2. RNA guides the synthesis of proteins (the translation step): The RNA transcript carries the genetic information transcribed from the DNA out of the cell nucleus into the cytoplasm (the watery portion of the cell that lies between the cell nucleus and the cell membrane). There it interacts with the ribosomes to synthesize a protein from the 20 basic amino acids that can be used to assemble all proteins. As noted above, rRNA and tRNA, also produced from the DNA, are key players in the translation step. Which sequence of amino acids is assembled, and therefore which protein is synthesized, is dictated by the genetic blueprint carried by the mRNA transcript molecule.
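
The transcription step above can be sketched in a few lines of Python. The names `DNA_TO_RNA` and `transcribe` are hypothetical, and the sketch only applies the pairing rules base by base; it does not model the enzymes involved or the direction in which the strand is read.

```python
# Pairing rules used when RNA is assembled on a DNA template:
# the same as for DNA, except that A on the template pairs with U instead of T.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(dna_template: str) -> str:
    """Build the mRNA transcript that is complementary to a DNA template."""
    return "".join(DNA_TO_RNA[base] for base in dna_template)


# The DNA triplets ACA and ACG give the mRNA codons UGU and UGC.
print(transcribe("ACAACG"))  # prints UGUUGC
```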

As might be guessed, the details of manufacturing proteins are much more complex, but in essence the details serve to implement the two steps listed above. This sequence is summarized concisely in what is sometimes called the Central Dogma of Molecular Genetics: "DNA makes RNA, which makes proteins". See the above right figure.

The Triplet Genetic Code

Detailed experiments indicate that each series of three nucleotides along a DNA strand specifies which amino acid building block will be placed at a given position when proteins are synthesized from the RNA. That is, the genetic code is a triplet code. The 3-nucleotide units in RNA strands carrying this information are called codons. For example, UUU and UUG are codons, the first corresponding to the 3-base sequence uracil-uracil-uracil and the second to uracil-uracil-guanine. The codons are transcribed from the DNA and strung out in sequence to make mRNA. Each transfer RNA (tRNA) responsible for transporting the correct amino acid to the protein assembly point carries an anticodon, which is a nucleotide triplet that pairs its bases with an mRNA codon. For example, AAA is the anticodon of UUU, since the RNA base pairing rule is A with U and G with C.
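
As a small illustration of codons and anticodons, the following Python sketch splits an mRNA string into triplets and pairs each one with its anticodon. The names `RNA_PAIR`, `split_codons`, and `anticodon` are hypothetical, and the pairing is done position by position, as in the AAA and UUU example above.

```python
# RNA base pairing rules: A pairs with U, and G pairs with C.
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def split_codons(mrna: str) -> list[str]:
    """Cut an mRNA string into consecutive 3-base codons.

    Any leftover bases at the end that do not fill a triplet are ignored.
    """
    usable = len(mrna) - len(mrna) % 3
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


def anticodon(codon: str) -> str:
    """Return the triplet whose bases pair with the given codon."""
    return "".join(RNA_PAIR[base] for base in codon)


for codon in split_codons("UUUUUG"):
    print(codon, "pairs with", anticodon(codon))
```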

Because there are 4 bases, there are 4 x 4 x 4 = 64 ways that the bases can combine in groups of three. That is, given one of the 4 bases there are 4 possibilities to combine it with a second base (giving 16 independent combinations), and each of these can combine with one of 4 bases, giving a total of 4 x 16 = 64 independent ways to combine 3 bases. Of these 64 codons, 61 each specify exactly one amino acid (this property is called uniqueness), but because there are only 20 amino acids and 61 codons available, most amino acids are specified by more than one codon (this property is called degeneracy). For example, the mRNA codons UGU and UGC both specify the amino acid called cysteine. Since the mRNA is a complementary copy of the DNA that carries the genetic code (because of the base pairing in the transcription step), the genetic code for cysteine corresponds to the sequences ACA or ACG in the DNA (that is, the complement of UGU or UGC). The remaining 3 codons of the 64 possibilities are "stop commands" that end the addition of amino acids to the chain and signal the completion of the protein under construction. The following figure illustrates the 64 possible combinations for the genetic code and the corresponding amino acids for which they code (see the table below for amino acid abbreviations).
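
The counting argument can be checked by listing every triplet explicitly. This is a minimal Python sketch; the names `BASES`, `STOP_CODONS`, and `coding_codons` are hypothetical. The three stop codons are not named in the text above; UAA, UAG, and UGA are those of the standard genetic code and are included here as an added assumption.

```python
from itertools import product

BASES = "UCAG"  # the four RNA bases

# Every ordered choice of 3 bases, with repetition allowed: 4 x 4 x 4 of them.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

# Stop codons of the standard genetic code (an addition, not from the text).
STOP_CODONS = {"UAA", "UAG", "UGA"}

# The codons left over are the ones that specify amino acids.
coding_codons = [codon for codon in codons if codon not in STOP_CODONS]

print(len(codons), len(coding_codons))  # prints 64 61
```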

The complete list of amino acids and their corresponding codons is given in the following table.

The Amino Acids and the Triplet Genetic Code

Symbol  Name           Codons
ala     alanine        GCU GCC GCA GCG
val     valine         GUU GUC GUA GUG
leu     leucine        UUA UUG CUU CUC CUA CUG
ile     isoleucine     AUU AUC AUA
pro     proline        CCU CCC CCA CCG
met     methionine     AUG
phe     phenylalanine  UUU UUC
trp     tryptophan     UGG
gly     glycine        GGU GGC GGA GGG
ser     serine         UCU UCC UCA UCG AGU AGC
thr     threonine      ACU ACC ACA ACG
cys     cysteine       UGU UGC
tyr     tyrosine       UAU UAC
asn     asparagine     AAU AAC
gln     glutamine      CAA CAG
lys     lysine         AAA AAG
arg     arginine       CGU CGC CGA CGG AGA AGG
his     histidine      CAU CAC
asp     aspartic acid  GAU GAC
glu     glutamic acid  GAA GAG
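
To show how the table is used in the translation step, here is a minimal Python sketch that turns the table into a lookup and reads an mRNA string codon by codon. The names `TABLE`, `CODON_TO_AA`, `STOP_CODONS`, and `translate` are hypothetical. The stop codons UAA, UAG, and UGA are not listed in the table; they are those of the standard genetic code and are included as an added assumption. Ribosomes, tRNA, and start-codon handling are not modeled.

```python
# The table above, one amino acid per line: symbol followed by its codons.
TABLE = """
ala GCU GCC GCA GCG
val GUU GUC GUA GUG
leu UUA UUG CUU CUC CUA CUG
ile AUU AUC AUA
pro CCU CCC CCA CCG
met AUG
phe UUU UUC
trp UGG
gly GGU GGC GGA GGG
ser UCU UCC UCA UCG AGU AGC
thr ACU ACC ACA ACG
cys UGU UGC
tyr UAU UAC
asn AAU AAC
gln CAA CAG
lys AAA AAG
arg CGU CGC CGA CGG AGA AGG
his CAU CAC
asp GAU GAC
glu GAA GAG
"""

# Invert the table: each codon maps to exactly one amino acid symbol.
CODON_TO_AA = {}
for line in TABLE.strip().splitlines():
    symbol, *codon_list = line.split()
    for codon in codon_list:
        CODON_TO_AA[codon] = symbol

# Stop codons of the standard genetic code (an addition, not in the table).
STOP_CODONS = {"UAA", "UAG", "UGA"}


def translate(mrna: str) -> list[str]:
    """Read an mRNA string three bases at a time until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3]
        if codon in STOP_CODONS:
            break  # a stop command ends the chain
        protein.append(CODON_TO_AA[codon])
    return protein


print(translate("AUGUGUUGCUAA"))  # prints ['met', 'cys', 'cys']

# Degeneracy: amino acids that are specified by more than one codon.
degenerate = {aa for aa in CODON_TO_AA.values()
              if list(CODON_TO_AA.values()).count(aa) > 1}
print(len(CODON_TO_AA), len(degenerate))  # prints 61 18
```

Only methionine and tryptophan have a single codon each, which is why 18 of the 20 amino acids count as degenerate here.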