15.1: The Genetic Code - Biology


Skills to Develop

  • Explain the “central dogma” of protein synthesis
  • Describe the genetic code and how the nucleotide sequence prescribes the amino acid and the protein sequence

The cellular process of transcription generates messenger RNA (mRNA), a mobile molecular copy of one or more genes with an alphabet of A, C, G, and uracil (U). Translation of the mRNA template converts nucleotide-based genetic information into a protein product. Protein sequences consist of 20 commonly occurring amino acids; therefore, it can be said that the protein alphabet consists of 20 letters (Figure 1). Each amino acid is defined by a three-nucleotide sequence called the triplet codon. Different amino acids have different chemistries (such as acidic versus basic, or polar and nonpolar) and different structural constraints. Variation in amino acid sequence gives rise to enormous variation in protein structure and function.

The Central Dogma: DNA Encodes RNA; RNA Encodes Protein

The flow of genetic information in cells from DNA to mRNA to protein is described by the Central Dogma (Figure 2), which states that genes specify the sequence of mRNAs, which in turn specify the sequence of proteins. The decoding of one molecule to another is performed by specific proteins and RNAs. Because the information stored in DNA is so central to cellular function, it makes intuitive sense that the cell would make mRNA copies of this information for protein synthesis, while keeping the DNA itself intact and protected. The copying of DNA to RNA is relatively straightforward, with one nucleotide being added to the mRNA strand for every nucleotide read in the DNA strand. The translation to protein is a bit more complex because three mRNA nucleotides correspond to one amino acid in the polypeptide sequence. However, the translation to protein is still systematic and colinear, such that nucleotides 1 to 3 correspond to amino acid 1, nucleotides 4 to 6 correspond to amino acid 2, and so on.
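The colinear mapping described above can be sketched in a few lines of Python. This is only an illustration: the function names `transcribe` and `codons` and the example sequence are made up for the sketch, and transcription is simplified to rewriting the DNA coding strand with U in place of T.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA matching a DNA coding strand (T replaced by U)."""
    return coding_strand.upper().replace("T", "U")


def codons(mrna: str) -> list[str]:
    """Split an mRNA into consecutive, non-overlapping triplets."""
    usable = len(mrna) - len(mrna) % 3  # ignore a trailing partial codon
    return [mrna[i:i + 3] for i in range(0, usable, 3)]


mrna = transcribe("ATGGCCATT")  # hypothetical example sequence
for number, codon in enumerate(codons(mrna), start=1):
    first = 3 * (number - 1) + 1
    print(f"nucleotides {first}-{first + 2} ({codon}) -> amino acid {number}")
```

Each pass through the loop pairs one block of three nucleotides with one amino acid position, which is all that "colinear" means here.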

The Genetic Code Is Degenerate and Universal

Given the different numbers of “letters” in the mRNA and protein “alphabets,” scientists theorized that combinations of nucleotides corresponded to single amino acids. Nucleotide doublets would not be sufficient to specify every amino acid because there are only 16 possible two-nucleotide combinations (4²). In contrast, there are 64 possible nucleotide triplets (4³), which is far more than the number of amino acids. Scientists theorized that amino acids were encoded by nucleotide triplets and that the genetic code was degenerate. In other words, a given amino acid could be encoded by more than one nucleotide triplet. This was later confirmed experimentally; Francis Crick and Sydney Brenner used the chemical mutagen proflavin to insert one, two, or three nucleotides into the gene of a virus. When one or two nucleotides were inserted, protein synthesis was completely abolished. When three nucleotides were inserted, the protein was synthesized and functional. This demonstrated that three nucleotides specify each amino acid. These nucleotide triplets are called codons. The insertion of one or two nucleotides completely changed the triplet reading frame, thereby altering the message for every subsequent amino acid (Figure 4). Though insertion of three nucleotides caused an extra amino acid to be inserted during translation, the integrity of the rest of the protein was maintained.
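The counting argument can be checked by enumerating every possible "word" of a given length over the four-letter RNA alphabet. The helper name `codon_count` is an assumption made for this sketch.

```python
from itertools import product

BASES = "ACGU"


def codon_count(length: int) -> int:
    """Number of distinct nucleotide words of the given length."""
    return sum(1 for _ in product(BASES, repeat=length))


for length in (1, 2, 3):
    count = codon_count(length)
    verdict = "enough" if count >= 20 else "too few"
    print(f"length {length}: {count} combinations, {verdict} for 20 amino acids")
```

Only the triplet length gives at least 20 combinations, and it leaves a large surplus, which is what makes degeneracy possible.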

Scientists painstakingly solved the genetic code by translating synthetic mRNAs in vitro and sequencing the proteins they specified (Figure 3).

In addition to instructing the addition of a specific amino acid to a polypeptide chain, three of the 64 codons terminate protein synthesis and release the polypeptide from the translation machinery. These triplets are called nonsense codons, or stop codons. Another codon, AUG, also has a special function. In addition to specifying the amino acid methionine, it also serves as the start codon to initiate translation. The reading frame for translation is set by the AUG start codon near the 5' end of the mRNA.
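A minimal sketch of how the start and stop codons frame a message is shown below. It builds the standard codon table from a compact 64-letter string (one-letter amino acid codes, `*` for stop, codons ordered U, C, A, G at each position). The names `CODON_TABLE` and `translate` and the example mRNA are assumptions for illustration; real translation initiation involves more than finding the first AUG.

```python
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its amino acid (or "*" for a stop codon).
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> str:
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")  # the start codon sets the reading frame
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # UAA, UAG, or UGA ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("GGAUGGCCAUUUAAGG"))  # MAI
```

The two nucleotides before AUG and the two after UAA are never read as codons, because the frame begins at the start codon and translation ends at the stop codon.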

The genetic code is universal. With a few exceptions, virtually all species use the same genetic code for protein synthesis. Conservation of codons means that a purified mRNA encoding the globin protein in horses could be transferred to a tulip cell, and the tulip would synthesize horse globin. That there is only one genetic code is powerful evidence that all of life on Earth shares a common origin, especially considering that there are about 10⁸⁴ possible combinations of 20 amino acids and 64 triplet codons.

Link to Learning

Transcribe a gene and translate it to protein using complementary pairing and the genetic code at this site.

Degeneracy is believed to be a cellular mechanism to reduce the negative impact of random mutations. Codons that specify the same amino acid typically only differ by one nucleotide. In addition, amino acids with chemically similar side chains are encoded by similar codons. This nuance of the genetic code ensures that a single-nucleotide substitution mutation might either specify the same amino acid and have no effect, or specify a similar amino acid, preventing the protein from being rendered completely nonfunctional.

Scientific Method Connection

Which Has More DNA: A Kiwi or a Strawberry?

Question: Would a kiwifruit and strawberry that are approximately the same size (Figure 5) also have approximately the same amount of DNA?

Background: Genes are carried on chromosomes and are made of DNA. All mammals are diploid, meaning they have two copies of each chromosome. However, not all plants are diploid. The common strawberry is octoploid (8n) and the cultivated kiwi is hexaploid (6n). Research the total number of chromosomes in the cells of each of these fruits and think about how this might correspond to the amount of DNA in these fruits’ cell nuclei. Read about the technique of DNA isolation to understand how each step in the isolation protocol helps liberate and precipitate DNA.

Hypothesis: Hypothesize whether you would be able to detect a difference in DNA quantity from similarly sized strawberries and kiwis. Which fruit do you think would yield more DNA?

Test your hypothesis: Isolate the DNA from a strawberry and a kiwi that are similarly sized. Perform the experiment in at least triplicate for each fruit.

  1. Prepare a bottle of DNA extraction buffer from 900 mL water, 50 mL dish detergent, and two teaspoons of table salt. Mix by inversion (cap it and turn it upside down a few times).
  2. Grind a strawberry and a kiwifruit by hand in a plastic bag, or using a mortar and pestle, or with a metal bowl and the end of a blunt instrument. Grind for at least two minutes per fruit.
  3. Add 10 mL of the DNA extraction buffer to each fruit, and mix well for at least one minute.
  4. Remove cellular debris by filtering each fruit mixture through cheesecloth or porous cloth and into a funnel placed in a test tube or an appropriate container.
  5. Pour ice-cold ethanol or isopropanol (rubbing alcohol) into the test tube. You should observe white, precipitated DNA.
  6. Gather the DNA from each fruit by winding it around separate glass rods.

Record your observations: Because you are not quantitatively measuring DNA volume, you can record for each trial whether the two fruits produced the same or different amounts of DNA as observed by eye. If one or the other fruit produced noticeably more DNA, record this as well. Determine whether your observations are consistent with several pieces of each fruit.

Analyze your data: Did you notice an obvious difference in the amount of DNA produced by each fruit? Were your results reproducible?

Draw a conclusion: Given what you know about the number of chromosomes in each fruit, can you conclude that chromosome number necessarily correlates to DNA amount? Can you identify any drawbacks to this procedure? If you had access to a laboratory, how could you standardize your comparison and make it more quantitative?


The genetic code refers to the DNA alphabet (A, T, C, G), the RNA alphabet (A, U, C, G), and the polypeptide alphabet (20 amino acids). The Central Dogma describes the flow of genetic information in the cell from genes to mRNA to proteins. Genes are used to make mRNA by the process of transcription; mRNA is used to synthesize proteins by the process of translation. The genetic code is degenerate because 64 triplet codons in mRNA specify only 20 amino acids and three nonsense codons. Almost every species on the planet uses the same genetic code.

Review Questions

The AUC and AUA codons in mRNA both specify isoleucine. What feature of the genetic code explains this?

  1. complementarity
  2. nonsense codons
  3. universality
  4. degeneracy


How many nucleotides are in 12 mRNA codons?

  1. 12
  2. 24
  3. 36
  4. 48


Free Response

Imagine if there were 200 commonly occurring amino acids instead of 20. Given what you know about the genetic code, what would be the shortest possible codon length? Explain.

For 200 commonly occurring amino acids, codons consisting of four types of nucleotides would have to be at least four nucleotides long, because 4³ = 64 is too few and 4⁴ = 256 is the first power of four that exceeds 200. There would be much less degeneracy in this case.
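The same reasoning works for any number of amino acids: find the smallest codon length whose number of combinations is at least the number of amino acids. The function name `min_codon_length` is an assumption for this sketch, and it ignores the extra codons a real code would need for stop signals.

```python
def min_codon_length(n_amino_acids: int, n_bases: int = 4) -> int:
    """Smallest codon length L such that n_bases ** L >= n_amino_acids."""
    length = 1
    while n_bases ** length < n_amino_acids:
        length += 1
    return length


for n in (20, 200):
    length = min_codon_length(n)
    print(f"{n} amino acids need codons of length {length} ({4 ** length} combinations)")
```

Integer powers are used instead of logarithms so that exact boundaries such as 16 or 64 amino acids are not affected by floating-point rounding.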

Discuss how degeneracy of the genetic code makes cells more robust to mutations.

Codons that specify the same amino acid typically only differ by one nucleotide. This nuance of the genetic code ensures that a single-nucleotide substitution mutation might either specify the same amino acid and have no effect, or may specify a similar amino acid, preventing the protein from being rendered completely nonfunctional.


Central Dogma
states that genes specify the sequence of mRNAs, which in turn specify the sequence of proteins
codon
three consecutive nucleotides in mRNA that specify the insertion of an amino acid or the release of a polypeptide chain during translation
colinear
in terms of RNA and protein, three “units” of RNA (nucleotides) specify one “unit” of protein (amino acid) in a consecutive fashion
degeneracy
(of the genetic code) describes that a given amino acid can be encoded by more than one nucleotide triplet; the code is degenerate, but not ambiguous
nonsense codon
one of the three mRNA codons that specifies termination of translation
reading frame
sequence of triplet codons in mRNA that specify a particular protein; a ribosome shift of one or two nucleotides in either direction completely abolishes synthesis of that protein

Genetic code

The genetic code is the set of rules used by living cells to translate information encoded within genetic material (DNA or mRNA sequences of nucleotide triplets, or codons) into proteins. Translation is accomplished by the ribosome, which links proteinogenic amino acids in an order specified by messenger RNA (mRNA), using transfer RNA (tRNA) molecules to carry amino acids and to read the mRNA three nucleotides at a time. The genetic code is highly similar among all organisms and can be expressed in a simple table with 64 entries. [1]

The codons specify which amino acid will be added next during protein synthesis. With some exceptions, [2] a three-nucleotide codon in a nucleic acid sequence specifies a single amino acid. The vast majority of genes are encoded with a single scheme (see the RNA codon table). That scheme is often referred to as the canonical or standard genetic code, or simply the genetic code, though variant codes (such as in mitochondria) exist.


Instructions on DNA are transcribed onto messenger RNA. Ribosomes are able to read the genetic information inscribed on a strand of messenger RNA and use this information to string amino acids together into a protein.

Quick Notes on Genetic Code | Cell Biology

Living things depend on proteins for existence, since proteins form the enzymes necessary for all chemical reactions in the cell. Structural information required to specify the synthesis of any given protein resides in the molecule of DNA, which has the double-helix configuration proposed by Watson and Crick (1953).

The linear sequence of bases in DNA constitutes an alphabet (hereditary lettering of 4 bases: A, T, G, C) which ‘codes’ for another linear structure, a protein, written in another alphabet of 20 amino acids.

The actual transfer of information is, however, indirect. DNA is a ‘template’ for the formation of RNAs, which associate with ribosomes and in turn act as templates for protein synthesis.

All properties of a protein, including its secondary and tertiary structure, are ultimately determined by chromosomal DNA, and all biological properties are in turn determined by the amino acid sequences of the proteins within an organism, through protein structure and enzyme activity.

The term ‘coding’ describes the relationship between DNA and protein. By coding, the hereditary lettering carried in the four-letter alphabet of DNA is ultimately converted into the protein language, composed of a twenty-letter alphabet of amino acids.

Co-linearity of Gene and Polypeptide:

In 1958, Crick proposed the hypothesis that DNA determines the sequence of amino acids in a polypeptide. Fundamental to this relationship is that both are linear in structure: in one case a sequence of nucleotides, in the other a sequence of amino acids.

By comparing the nucleotide sequence of a gene with the amino acid sequence of a protein, we can determine directly whether the gene and the protein are co-linear or not. A gene of 3N base pairs is required to code for a protein of N amino acids.

The co-linearity of gene and protein was originally investigated in the tryptophan synthetase gene of E. coli by Yanofsky and his co-workers, using polypeptide chain A of the tryptophan synthetase enzyme. They observed that different mutations occurred in the DNA sequence in the same order as the corresponding alterations in the amino acid sequence of polypeptide chain A.

The recombination distances are relatively similar to the actual distances in the protein, so in this case there is much similarity between the recombination map and the physical map.

Eukaryotic split genes contain introns, so not every base sequence of the gene is translated into amino acids. This shows that co-linearity between the base sequence of a gene and the amino acid sequence of its protein may be interrupted but is not violated.

Properties of Genetic Code:

Code is Triplet:

Research has been carried out by Ochoa, Kornberg, Nirenberg, Brenner, Crick and others to determine the coding ratio, i.e., the number of units in one system required to specify one unit in the other system. A one-to-one correspondence between nucleotides and amino acids is clearly impossible.

If each kind of nucleotide specified a single amino acid, only four amino acids could be encoded. Similarly, a correspondence of two nucleotides to one amino acid would give a larger number of possibilities, but still not enough: only 4² = 16.

If a three-letter code is employed, however, a total of 4³ = 64 kinds of units, or codons, are available (Fig. 15.1), more than enough to encode twenty amino acids. The surplus forty-four triplets were initially thought to be nonsense codons and the remaining twenty sense codons.

However, later studies have shown that several triplets can code for one amino acid, so the number of nonsense triplets is very small. Some of the nonsense triplets might also be used as ‘punctuation’, designating the end of a chemical message.

Critical information on the nature of coding units (i.e., that the code is in triplets) was gathered from studies of the effects of mutagens on the polynucleotide chain (DNA).

Application of a mutagen can lead to the deletion or duplication of one nucleotide pair or of several adjacent pairs. Addition or deletion of one or two bases often has a drastic effect, and the organism ultimately dies.

The addition or deletion of three bases together, on the other hand, changes the behaviour of the organism but does not necessarily have a lethal effect, and the organism may survive in an altered, mutant form.

(i) Direct evidence supporting the triplet code concept was provided by Crick et al. (1961), based on their experiments on a virus, the T4 bacteriophage (Fig. 15.2). They found that treatment with a chemical called proflavin either added or removed a base in its DNA molecule, thus damaging the virus and resulting in an altered or mutant form of the virus.

An addition followed by a deletion of base close by resulted in the restoration of the original virus. This implied that the normal sequences of bases in the DNA molecule had been restored by the second change.

A deletion or insertion completely upsets the reading frame as may be seen from the example of the base sequence GTCCAGACC. Normally the sequence will be read as GTC, CAG, ACC, …, but with the insertion of a new base T between the first and second nucleotides, it yields the sequence GTTCCAGACC … and leads to reading in the groups GTT, CCA, GAC, C …, and specifies wrong amino acids.

A similar consequence results from a deletion. Crossing an addition mutant with a deletion mutant restores the correct reading frame of the sequence except in the region between the two mutations. It is easy to see that a combination of two mutants in the form of two insertions or two deletions will still produce a misplaced reading frame.
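The GTCCAGACC example can be reproduced with a short sketch. The helper name `triplets` and the position chosen for the compensating deletion are assumptions made for illustration; the insertion of T after the first nucleotide follows the text.

```python
def triplets(seq: str) -> list[str]:
    """Read a sequence in consecutive triplets, dropping any leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


original = "GTCCAGACC"
inserted = original[:1] + "T" + original[1:]  # insert T after the first base
restored = inserted[:4] + inserted[5:]        # delete one base just downstream

print(triplets(original))  # ['GTC', 'CAG', 'ACC']
print(triplets(inserted))  # ['GTT', 'CCA', 'GAC']
print(triplets(restored))  # ['GTT', 'CAG', 'ACC']
```

After the insertion every triplet is wrong. After the nearby deletion only the triplet between the two changes differs from the original, which is the "restored except in the region between them" behaviour described above.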

Crick (1961) found that three additions or deletions of adjacent nucleotides resulted in the production of the normal virus, due to the restoration of the normal base sequence in DNA.

Thus experiments demonstrating that a combination of three insertions or deletions produced a bacteriophage of perfectly normal appearance, and that recombinants containing insertions or deletions in numbers that are not multiples of three produce only nonfunctional or wrong protein, provided strong evidence that the genetic code operates as a triplet code, i.e., that one triplet of nucleotides constitutes a codon.

(ii) The triplet nature of the code was further confirmed through the research work of Nirenberg and Leder (1965), who found that although little binding of tRNA was possible in the presence of dinucleotide messengers, it occurred preferentially with trinucleotides.

They were able to stimulate binding of different amino acids through different sequences of the same three bases, once again giving credence to the existence of a triplet code.

Code is Non-Overlapping:

In nature, there is always a tendency towards economy. Gamow, in his ‘overlapping’ coding hypothesis, suggested that the code is in the form of triplets that are not arranged end to end in a straight chain. Instead, the triplets overlap, so that a particular nucleotide serves in more than one coding unit.

Gamow suggested overlapping code on the basis of two characteristics:

(a) The distance between two adjacent bases in a DNA molecule is 3.4 Å.

(b) In a protein molecule, the distance between two adjacent amino acids is also 3.4 Å.

This can be explained by mono-coding as well as by overlapping coding, but it is quite improbable with straight-chain triplet coding. In a non-overlapping code six nucleotides would code for two amino acids, while in an overlapping code they would code for up to four (Fig. 15.3).

In the non-overlapping code each letter is read only once, while in the overlapping code it would be read three times, each time as part of a different word. A mutational change in one letter would affect only one word in the non-overlapping code, while it would affect three words in the overlapping code.

There are several lines of evidence for the non-overlapping nature of the genetic code.

(i) The experimental evidence of Crick et al. (1961) argued compellingly against an overlapping code and substantiated the arguments of earlier scientists in favour of a non-overlapping code. They started with a messenger of known triplet sequence and used this to synthesize a particular protein.

On adding a nucleotide to it, the particular protein could no longer be synthesized. The result remained unaltered even with the addition of a second nucleotide. Proper function was restored, however, on introduction of a third nucleotide.

A given nucleotide sequence ACTACTACTACT bears the codons ACT, ACT, ACT, ACT under a non-overlapping coding system. An insertion of a nucleotide G between the first C and the first T will, under such a system, change the nucleotide sequence to ACGTACTACTACT and the codon sequence to ACG, TAC, TAC, TAC, T.

The original protein will not be synthesized after the addition of a nucleotide. Instead, the altered codon sequence will produce an altogether different protein. A second insertion of another nucleotide G, between the first C and the first G of the previously altered nucleotide chain, results in a new nucleotide sequence ACGGTACTACTACT and the corresponding codon sequence ACG, GTA, CTA, CTA, CT.

The particular protein still cannot be synthesized. A third nucleotide addition, an insertion of nucleotide G at the beginning of the nucleotide chain obtained after the last step, causes it to read as GACGGTACTACTACT, and the corresponding codon sequence is GAC, GGT, ACT, ACT, ACT.

The third addition has restored most of the original triplet sequence. The deletion of bases from DNA has the same effect as insertion: one or two deletions disrupt the reading frame, but a third deletion restores most of it and allows a sequence of amino acids differing only slightly from the original one. This suggests that the code is non-overlapping.
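The three successive insertions can be traced step by step. The helper names `insert` and `read` are assumptions for this sketch; the sequences and insertion positions follow the worked example in the text.

```python
def insert(seq: str, index: int, base: str) -> str:
    """Return seq with base inserted before position index (0-based)."""
    return seq[:index] + base + seq[index:]


def read(seq: str) -> tuple[list[str], str]:
    """Read consecutive triplets; also return any leftover bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)], seq[usable:]


step0 = "ACTACTACTACT"
step1 = insert(step0, 2, "G")  # G between the first C and the first T
step2 = insert(step1, 2, "G")  # second G between the first C and the first G
step3 = insert(step2, 0, "G")  # third G at the start of the chain

for seq in (step0, step1, step2, step3):
    print(seq, read(seq))
```

After one or two insertions none of the downstream ACT codons survive. After the third, the frame is back in register and only the first two codons differ from the original message.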

(ii) Further evidence supporting the existence of a non-overlapping code is provided by the effect of single-site mutations.

A single mutation in an overlapping coding system would invariably affect two or more adjacent amino acids in the polypeptide chain. A mutation of the first G to C in the nucleotide sequence ATGATGATG will change only one codon in the case of a non-overlapping code. The original codon sequence ATG, ATG, ATG will become the codon sequence ATC, ATG, ATG after the single mutation.

However, if the code were an overlapping one, the original codon sequence ATG, TGA, GAT, ATG, TGA, GAT, ATG would change into the codon sequence ATC, TCA, CAT, ATG, TGA, GAT, ATG. As a result of a single mutation, three changes take place in the codon sequence when an overlapping code is in operation.

Only one change would be expected in the case of a non-overlapping code. Since only single amino acid changes have been observed in experimental studies of single-site mutations, this evidence reinforces the existence of a non-overlapping code.
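The difference between the two reading schemes can be checked by counting how many codons a single base change alters. The function names `read_codons` and `changed_codons` are assumptions for this sketch; a step of 3 models a non-overlapping code and a step of 1 models a fully overlapping code.

```python
def read_codons(seq: str, step: int) -> list[str]:
    """Read triplets starting every `step` bases (3 = non-overlapping, 1 = overlapping)."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]


def changed_codons(before: str, after: str, step: int) -> int:
    """Count the codon positions that differ between two sequences."""
    pairs = zip(read_codons(before, step), read_codons(after, step))
    return sum(1 for old, new in pairs if old != new)


wild_type = "ATGATGATG"
mutant = "ATCATGATG"  # first G changed to C

print(read_codons(mutant, 3), changed_codons(wild_type, mutant, 3))
print(read_codons(mutant, 1), changed_codons(wild_type, mutant, 1))
```

With the non-overlapping reading one codon changes; with the overlapping reading three do, because the mutated base belongs to three different triplets.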

(iii) Brenner (1957), on the basis of all the published data on the sequence of amino acids in proteins, concluded that there were no forbidden zones in proteins, and that neighbouring amino acids were invariably coded by unrelated groups of nucleotides.

It was further established that no specific amino acid always has the same nearest neighbours and that amino acid sequences appear to be almost completely random. Such findings would not have been possible had the code been of an overlapping nature.

(iv) Yanofsky (1963) provided perhaps the most convincing evidence available that excludes any overlapping code. In his studies of both mutation and recombination through the transduction technique, he found that in each protein with a different amino acid at a given position, the amino acids on either side remained unchanged.

Code is Degenerate:

Sometimes three or four triplet codons code for a particular amino acid. A genetic code in which more than one triplet (codon) codes for a single amino acid is known as a degenerate code. Of the 64 possible codons, 61 code for amino acids.

As there are only 20 amino acids, it is obvious that more than one codon must code for some amino acids. If each amino acid were coded by a single codon, 44 of the 64 codons would be useless or nonsense codons.
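The extent of degeneracy can be tallied directly from the standard codon table. The sketch below rebuilds that table from a compact 64-letter string (one-letter amino acid codes, `*` for stop); the variable names are assumptions for illustration.

```python
from collections import Counter

BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code: codons ordered U, C, A, G at each position.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Count sense codons per amino acid, skipping the stop codons.
codons_per_amino_acid = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

for amino_acid, count in sorted(codons_per_amino_acid.items()):
    print(amino_acid, count)
```

The counts add up to the 61 sense codons mentioned above and range from one codon (methionine, tryptophan) to six (leucine, serine, arginine).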

Several lines of evidence indicate that the genetic code is degenerate.

(i) If only twenty triplets made sense and the remaining forty-four were nonsense, then mutations could occur only at very limited sites, representing one-third of the length of a chromosome, and not throughout its entire length.

But the rate of spontaneous mutation, as well as the results of mutation induced by X-rays, has shown that nearly the entire chromosome is capable of undergoing mutation. This is possible only if the code is degenerate. However, though the degenerate nature of the code has been established, the presence of a high number of repeated sequences may make major segments of chromosomes non-mutable.

(ii) When two bases, U and C, are incorporated in a 3:1 proportion into a synthetic RNA, the possible triplets and their frequencies can be determined mathematically:

UUU = 3/4 × 3/4 × 3/4 = 27/64
UUC = 3/4 × 3/4 × 1/4 = 9/64
UCU = 3/4 × 1/4 × 3/4 = 9/64
CUU = 1/4 × 3/4 × 3/4 = 9/64
UCC = 3/4 × 1/4 × 1/4 = 3/64
CUC = 1/4 × 3/4 × 1/4 = 3/64
CCU = 1/4 × 1/4 × 3/4 = 3/64
CCC = 1/4 × 1/4 × 1/4 = 1/64

An mRNA of this composition could in principle guide the incorporation of eight amino acids, but in fact only four amino acids were detected in the protein chain, indicating the degenerate nature of the code, i.e., some of the codons in this case directed the incorporation of the same amino acid.
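The expected triplet frequencies, and the way eight codons collapse onto four amino acids, can be reproduced with exact fractions. The amino acid assignments used here are those of the standard genetic code for the eight U/C triplets; the variable names are assumptions for this sketch.

```python
from fractions import Fraction
from itertools import product

BASE_FREQ = {"U": Fraction(3, 4), "C": Fraction(1, 4)}

# Standard-code assignments for the eight triplets made only of U and C.
AMINO_ACID = {
    "UUU": "Phe", "UUC": "Phe", "UCU": "Ser", "UCC": "Ser",
    "CUU": "Leu", "CUC": "Leu", "CCU": "Pro", "CCC": "Pro",
}

# Frequency of each triplet is the product of its three base frequencies.
codon_freq = {}
for bases in product("UC", repeat=3):
    codon = "".join(bases)
    codon_freq[codon] = BASE_FREQ[bases[0]] * BASE_FREQ[bases[1]] * BASE_FREQ[bases[2]]

# Pool the codon frequencies by the amino acid each codon specifies.
amino_acid_freq = {}
for codon, freq in codon_freq.items():
    name = AMINO_ACID[codon]
    amino_acid_freq[name] = amino_acid_freq.get(name, Fraction(0)) + freq

print(codon_freq)
print(amino_acid_freq)
```

Eight distinct triplets occur, but they specify only phenylalanine, serine, leucine, and proline, which matches the four amino acids detected.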

(iii) According to the wobble hypothesis of Crick (1966), the first two bases of the triplet codon pair according to the set rules, i.e., A with U and G with C, but the third base, having much more freedom of movement than the other two, wobbles and permits more than one type of pairing at that position. Thus the wobble hypothesis explains the degeneracy of the code to some extent.

It is sometimes argued that the third base of a codon is not very important and that the specificity of a codon is largely determined by the first two bases. It has been shown that the same tRNA can recognise more than one codon differing only at the third position. This pairing is not very stable and is allowed due to wobbling in base pairing at this third position.

Crick proposed the wobble hypothesis in 1966 to explain this phenomenon. He proposed that if U is present at the first position of the anticodon, it can pair with either A or G at the third position of the codon. Similar is the case with G in the anticodon, which can pair with either C or U of the codon (Table 15.1 A).

The wobble hypothesis visualizes that many codons are able to tolerate mutations at the third base site because of the less restrictive spatial constraints on the corresponding base in the anticodon. The third nucleotide in many codons was better tolerated and could be substituted without damage.

The corresponding base in the anticodon would wobble and accommodate. This kind of wobbling allows economy of the number of tRNA molecules since several codons meant for same amino acid are recognized by same tRNA.
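The pairing rules can be turned into a small lookup that lists the codons a single tRNA is expected to read. The U and G rows of the `WOBBLE` table come from the text above; the C, A, and I (inosine) rows are taken from Crick's standard wobble rules and are an addition for completeness. The function name `codons_read_by` is an assumption, and both anticodon and codon are written 5' to 3', so the first anticodon base pairs with the third codon base.

```python
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# First (5') anticodon base -> codon third-position bases it can pair with.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}


def codons_read_by(anticodon: str) -> list[str]:
    """Codons (5'->3') recognised by an anticodon written 5'->3'."""
    wobble_base, middle, last = anticodon
    first_codon_base = WATSON_CRICK[last]     # pairs with anticodon position 3
    second_codon_base = WATSON_CRICK[middle]  # pairs with anticodon position 2
    return sorted(
        first_codon_base + second_codon_base + third
        for third in WOBBLE[wobble_base]
    )


print(codons_read_by("GAA"))  # ['UUC', 'UUU'], both phenylalanine codons
print(codons_read_by("UUU"))  # ['AAA', 'AAG'], both lysine codons
```

One anticodon covering two synonymous codons is the economy in tRNA number that the wobble hypothesis describes.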

Code is Comma-less:

A comma-less code means that no punctuation marks are needed between two words. In other words, after one amino acid is coded, the second amino acid is automatically coded by the next three letters and no letters are wasted (Fig. 15.4).

However, the code for an entire polypeptide having several amino acids is always terminated by a nonsense codon, which serves as a full stop in the coding terminology.

If the genetic code functioned with commas, a specific nucleotide would serve as a punctuation mark. Experiments have established that poly-A (AAA) codes for lysine, poly-C (CCC) for proline, and poly-U (UUU) for phenylalanine, which implies that the commas cannot be made up of A, C or U.

Code is Non-Ambiguous:

Ambiguity denotes that a single codon may code for more than one amino acid. Non-ambiguous means that there is no ambiguity about a particular codon. A particular codon will always code for the same amino acid.

The genetic code is generally non-ambiguous. This can be confirmed experimentally using a specific single triplet-ribosome complex, which directs the binding of a specific tRNA. For example, the UUU triplet-ribosome complex directs the binding of phenylalanine-tRNA and the AAA triplet-ribosome complex directs the binding of lysine-tRNA.

In the similar manner, by using the triplets of known sequence, the codons for valine, cysteine, leucine and some other amino acids were determined, thus clearly establishing the non-ambiguous nature of the genetic code under natural physiological conditions.

Code is Universal:

The genetic code is universal. This means that the same codon codes for the same amino acid in all organisms, from human beings to viruses.

The universal nature of the genetic code has been demonstrated experimentally.

(i) The crucial point in the genetic code is the fitting of a tRNA with a specific anticodon to the codon of the mRNA.

Thus, if mRNA is taken from a eukaryote and tRNA from a prokaryote, and protein synthesis can be carried out as coded in the mRNA, then the code is shown to be universal. If mRNA and ribosomes are taken from E. coli, and amino acids and tRNA from rat, protein synthesis can be carried out as coded in the mRNA of E. coli. The same is true the other way round.

Von Ehrenstein and Lipmann found that E. coli tRNA to which labeled amino acids were added would form haemoglobin when incubated with the mRNA and ribosomes of rabbit reticulocytes.

The precision with which this interspecific attachment occurs was shown by converting cysteine into alanine on the amino acid-activated cysteine tRNA and then observing that this alanine was now inserted into peptide positions ordinarily occupied by cysteine. In other words, the anticodon of the cysteine-tRNA of a bacterial species recognized the cysteine codon of mammalian mRNA in spite of the fact that the tRNA was carrying an alanine amino acid.

(ii) The tRNA from E. coli, Xenopus laevis and guineapig bind to the same trinucleotides as shown by Nirenberg et al., indicates the univer­sality of the code.

(iii) Studies of Merril and co-workers (1971) revealed that a bacterial enzyme X-D-galactose -1 phosphate uridyl transferase which catalyses the metabolism of galactose sugars is produced in human tissue culture cells, previously unable to make it, after infection by a virus carrying the E. coli gal + gene. This provides strong evidence in favour of the universality of the code.

(iv) The correlated nucleotide and amino acid sequences in the overlapping genes of the DNA bacteriophage φX174 and in the capsid protein gene of the RNA bacteriophage MS2 indicate that the genetic code is universal.

(v) Uniformity in the amino acid sequence of homologous proteins, e.g., cytochrome c collected from widely divergent species such as humans, horses, chickens, yeast and bacteria, is consistent with the universality of the genetic code.

(vi) Finally, genes from humans and other organisms have been expressed in E. coli, and those from bacteria and other organisms in plants. In each such case, the polypeptide produced by a gene in the new organism was identical to the one it produced in the organism of its origin.

Exceptions to the Genetic Code:

Either a triplet codon demands its own tRNA with a complementary anticodon, or a single tRNA responds to both members of a codon pair or to all (or at least some) of the four members of a codon family. Often one tRNA can recognise more than one codon, i.e., the code is degenerate.

This means that the base in the first position of the anticodon must be able to partner alternative bases in the corresponding third position of the codon. In such cases there may be differences in the efficiencies of the alternative recognition reactions (as a general rule, codons that are commonly used tend to be more efficiently read).

In addition to the construction of a set of tRNAs able to recognise all the codons, there may be multiple tRNAs that respond to the same codon. The predictions of wobble pairing accord very well with the observed abilities of almost all tRNAs. But there are exceptions in which the codons recognized by a tRNA differ from those predicted by the wobble rules.

Such effects probably result from the influence of neighbouring bases and/or the conformation of the anticodon loop in the overall tertiary structure of the tRNA. Indeed, the importance of the structure of the anticodon loop is inherent in the idea of the wobble hypothesis itself.

Further support for the influence of the surrounding structure is provided by the isolation of occasional mutants in which a change in a base in some other region of the molecule alters the ability of the anticodon to recognize codons.

Another unexpected pairing reaction is presented by the ability of the bacterial initiator, fMet-tRNAfMet, to recognize both AUG and GUG. This misbehavior involves the third base of the anticodon. Although the genetic code is generally non-ambiguous, GUG codes for methionine when used as an initiator codon but for valine when present at an internal position, which shows that this codon is ambiguous.

The universality of the genetic code is striking, but some exceptions exist. They tend to affect the codons involved in initiation or termination and result from the production (or absence) of tRNAs representing certain codons. Almost all of the changes found in principal genomes affect termination codons.

In the prokaryote Mycoplasma capricolum, UGA is not used for termination but instead codes for tryptophan. In fact, it is the predominant Trp codon, and UGG is used only rarely. Two Trp-tRNA species exist, with the anticodons UCA (reads UGA and UGG) and CCA (reads only UGG).
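As a rough illustration of how one anticodon can read more than one codon, the sketch below enumerates the codons predicted for an anticodon under Crick's standard wobble rules. The function name `codons_read_by` and the rule table are written for this example only; base modifications and the structural effects discussed above are ignored, so real tRNAs may behave differently.

```python
# Watson-Crick partners, used for the first two codon positions.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Crick's wobble rules: the 5' base of the anticodon (key) may pair with
# any of these bases in the 3' (third) position of the codon. "I" is inosine.
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}


def codons_read_by(anticodon):
    """Return the codons (5'->3') predicted for an anticodon written 5'->3'."""
    first, second, third = anticodon.upper()
    # Pairing is antiparallel: codon position 1 pairs with anticodon position 3.
    prefix = COMPLEMENT[third] + COMPLEMENT[second]
    return sorted(prefix + base for base in WOBBLE[first])


print(codons_read_by("UCA"))  # ['UGA', 'UGG']
print(codons_read_by("CCA"))  # ['UGG']
```

Under these simplified rules the two Trp anticodons behave as described: UCA is predicted to read both UGA and UGG, while CCA reads only UGG.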

Some ciliates (unicellular protozoa) read UAA and UAG as glutamine instead of termination signals. Tetrahymena thermophila, one of the ciliates, contains three tRNAGln species. One recognises the usual codons CAA and CAG for glutamine, one recognises both UAA and UAG (in accordance with the wobble hypothesis), and the last recognises only UAG.

We assume that the release factor eRF has a restricted specificity, compared with that of other eukaryotes.

In another ciliate (Euplotes octocarinatus), UGA codes for cysteine. Only UAA is used as a termination codon, and UAG is not found. The change in meaning of UGA might be accomplished by a modification in the anticodon of tRNACys that allows it to read UGA along with the usual codons UGU and UGC.

The only substitution in coding for amino acids occurs in a yeast (Candida), where CUG means serine instead of leucine (and UAG is used as a sense codon).

All of these changes are sporadic, which is to say that they appear to have occurred independently in specific lines of evolution. They may be concentrated on termination codons, because these changes do not involve substitution of one amino acid for another. Thus the divergent uses of the termination codons could represent their ‘capture’ for normal coding purposes.
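The reassignments described above can be pictured as small overrides on top of the standard code. The following is a minimal sketch, assuming the standard table written in U, C, A, G order with one-letter amino acid symbols ("*" marks a stop); the names `STANDARD`, `VARIANTS`, `code_for` and `reassignments` are hypothetical and chosen only for this example, and organisms are keyed by genus for brevity.

```python
from itertools import product

BASES = "UCAG"
# Standard code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Nuclear and prokaryotic reassignments mentioned in the text.
VARIANTS = {
    "Mycoplasma": {"UGA": "W"},               # stop -> tryptophan
    "Tetrahymena": {"UAA": "Q", "UAG": "Q"},  # stop -> glutamine
    "Euplotes": {"UGA": "C"},                 # stop -> cysteine
    "Candida": {"CUG": "S"},                  # leucine -> serine
}


def code_for(organism):
    """Return the standard table with that organism's overrides applied."""
    table = dict(STANDARD)
    table.update(VARIANTS.get(organism, {}))
    return table


def reassignments(organism):
    """List (codon, standard meaning, variant meaning) for one organism."""
    return [(codon, STANDARD[codon], aa)
            for codon, aa in sorted(VARIANTS[organism].items())]


for organism in VARIANTS:
    print(organism, reassignments(organism))
```

Representing each variant as a short override dictionary mirrors the point made in the text: the changes are few, and most of them capture a termination codon for an amino acid.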

Exceptions to the universal genetic code also occur in the mitochondria from several species.

The earliest change was the employment of the universal stop codon UGA to code for tryptophan, which is common to all (non-plant) mitochondria. It is not likely that UGA coded for tryptophan in the universal code but was changed to termination in cytoplasmic translation, because it is a stop codon in bacteria, plant mitochondria and nuclear genomes.

Departures from the universal code, all in non-plant mitochondria, are CUN (leucine) for threonine (in yeasts), AAA (lysine) for asparagine (in Platyhelminthes and echinoderms), UAA (stop) for tyrosine (in Planaria), and AGR (arginine) for serine (in several animal orders) and for stop (in vertebrates) [N = A, U, G or C; R = A or G] (Table 15.1B).

The mitochondria of plants and protozoans differ in importing and utilizing tRNAs encoded by the nuclear as well as the mitochondrial genome, whereas in animal mitochondria, all the tRNAs are encoded by the organelle.

The small number of tRNAs encoded by the mitochondrial genome highlights an important feature of the mitochondrial genetic system — the use of a slightly different genetic code, which is distinct from the universal code used by both prokaryotic and eukaryotic cells.

Some of these changes make the code simpler by replacing two codons that had different meanings with a pair that has a single meaning. Pairs treated like this include UGG and UGA (both Trp instead of one Trp and one termination) and AUG and AUA (both Met instead of one Met and the other Ile).

The changes are typically preceded by loss of a codon from all coding sequences in an organism or organelle, often as a result of directional mutation pressure, accompanied by loss of the tRNA that translates the codon.

The codon reappears later by conversion of another codon and emergence of a tRNA that translates the reappeared codon with a different assignment. Changes in release factors also contribute to this revised assignment. Thus the genetic code, formerly thought to be frozen, is now known to be in a state of evolution.

Deciphering the Genetic Code:

It was not possible to say which of the 64 possible codons codes for which of the 20 amino acids until the first clue came when M.W. Nirenberg used an in vitro system to synthesize a polypeptide from an artificially synthesized mRNA molecule.

In 1961 Nirenberg and Matthaei characterized the first specific coding sequences, which helped in the analysis of the genetic code.

Their success in deciphering the code depended on two experimental systems:

(i) In vitro (cell free) protein synthesizing system,

(ii) An enzyme, polynucleotide phosphorylase which allowed the synthesis of synthetic mRNAs. These mRNAs served as templates for polypeptide synthesis in the cell free system.

The enzyme polynucleotide phosphorylase functions metabolically in bacteria to degrade RNA, but with high concentrations of ribonucleoside diphosphates, the reaction can be ‘forced’ in the opposite direction to synthesize RNA.

Unlike RNA polymerase, it does not require any DNA template; each addition of a ribonucleotide is random, based on the relative concentrations of the four ribonucleoside diphosphates added to the reaction mixture. The probability of insertion of a specific ribonucleotide is proportional to the availability of that molecule relative to the other available ribonucleotides.
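The template-free, concentration-dependent synthesis described above can be mimicked with a short simulation. This is only a sketch: the function name `synthesize_random_rna`, the fixed seed and the chain length are assumptions made for illustration, and the model ignores all enzyme kinetics beyond "each base is drawn in proportion to its relative concentration".

```python
import random


def synthesize_random_rna(concentrations, length, seed=None):
    """Build a random RNA string without a template.

    `concentrations` maps each base to its relative amount; every position
    is filled independently, in proportion to those amounts.
    """
    rng = random.Random(seed)
    bases = list(concentrations)
    weights = [concentrations[base] for base in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))


rna = synthesize_random_rna({"A": 1, "C": 5}, length=6000, seed=0)
print(rna[:30])
print(rna.count("A") / len(rna))  # expected to be close to 1/6
```

Supplying a single base, for example `{"U": 1}`, yields a homopolymer such as poly-U.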

The cell free system for protein synthesis and the availability of synthetic mRNAs provided a means of deciphering the ribonucleotide composition of various triplets encoding specific amino acids.

Homopolymers Technique (Poly U Experiment):

In their initial experiments, Nirenberg and Matthaei synthesized RNA homopolymers, each consisting of only one type of ribonucleotide, i.e., the mRNA produced in the in vitro system was either UUUUU…, AAAAA…, CCCCC… or GGGGG…. In testing each mRNA, it was easy to determine which amino acid was incorporated into the polypeptide chain.

Different amino acids were labelled with ¹⁴C and tested separately by radioactive counting. In the RNA synthesized using only uracil, there was no other base along the length of the mRNA, and the only possible triplet was UUU.

When such a poly-U (RNA) was used in the synthesis of a polypeptide (using all extracts from E. coli, and supplying all the required components of protein synthesizing machinery), only polyphenylalanine was synthesized, meaning that the only amino acid coded was phenylalanine.

It was, therefore, immediately concluded that UUU codes for the amino acid phenylalanine. Subsequently, poly-A gave polylysine and poly-C gave polyproline. Therefore, UUU was assigned to phenylalanine, AAA to lysine and CCC to proline. Poly-G, however, did not serve as a template because it folds back on itself, so other methods had to be used for this assignment.

Heteropolymers (Random): Mixed Copolymers Technique:

The study of polynucleotides was further extended to copolymers used as synthetic messengers containing two or more bases in definite proportions in the cell free system. These randomly synthesized polynucleotides directed the incorporation of amino acids into protein in a manner which indicated that a number of different code words are involved in the binding of different amino acids.

In cell free culture, the different amino acids incorporated under the direction of these synthetic polyribonucleotides could be clearly correlated with the expected variations in the frequency of different triplets in the synthetic copolymers. Thus this experiment showed the way to derive the nucleotide composition of the triplets for each of the amino acids.

Nirenberg, Matthaei and Ochoa did their experiments using RNA heteropolymers. In this technique two or more different ribonucleoside diphosphates were added in combination to form the artificial message. The frequency of a particular triplet codon in the synthetic mRNA depended on the relative proportions of the ribonucleotides added to the cell free system.

The percentage of incorporation of a particular amino acid into the polypeptide chain could then be used to predict the composition of the triplet codon responsible.

For example, suppose A and C are added to the system in a ratio of 1A:5C. The insertion of a ribonucleotide at any position along the RNA molecule during its synthesis is determined by the ratio of A to C, so there is a 1/6 chance for an A and a 5/6 chance for a C to occupy each position.

On this basis, we can calculate the frequency of any given triplet appearing in the message. For AAA, the frequency is (1/6)³, or about 0.5%. For AAC, ACA and CAA, the frequencies are identical: (1/6)² × (5/6), or 2.3% each, and 6.9% for all three together. In the same way, each triplet of the 1A:2C class has a frequency of (1/6) × (5/6)², or 11.6%, giving about 34.7% for the three together, whereas CCC accounts for (5/6)³, or 57.9%, of the triplets.
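The arithmetic for the 1A:5C copolymer can be checked with a few lines of code. The sketch below uses exact fractions and assumes, as the text does, that each position is filled independently; the function name `triplet_frequencies` is chosen for this example.

```python
from fractions import Fraction
from itertools import product


def triplet_frequencies(ratio):
    """Map every possible triplet to its expected frequency in a random copolymer.

    `ratio` gives the relative amounts of each base, e.g. {"A": 1, "C": 5}.
    """
    total = sum(ratio.values())
    p = {base: Fraction(amount, total) for base, amount in ratio.items()}
    return {"".join(t): p[t[0]] * p[t[1]] * p[t[2]]
            for t in product(sorted(p), repeat=3)}


freqs = triplet_frequencies({"A": 1, "C": 5})
for codon, freq in freqs.items():
    # Show the exact fraction and the rounded percentage side by side.
    print(codon, freq, f"{float(freq):.1%}")
```

The eight triplets fall into four composition classes (3A, 2A:1C, 1A:2C, 3C) with exact frequencies 1/216, 5/216, 25/216 and 125/216, and the frequencies of all eight sum to 1.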

Now, by examining the percentage of any given amino acid incorporated into the protein synthesized under the direction of this message, it is possible to propose a probable base composition. Because proline accounts for 69% of the incorporation, it can be deduced that proline is likely to be coded by CCC (57.9%) and also by one of the triplets of the 1A:2C class (11.6%), i.e., 57.9 + 11.6.

Histidine incorporation is 14%, so histidine is probably coded by one triplet of the 1A:2C class and one of the 1C:2A class (11.6 + 2.3)%. Threonine shows 12% incorporation and is likely to be coded by one triplet of the 1A:2C class. Asparagine and glutamine each appear to be coded by one of the 1C:2A triplets, and lysine appears to be coded by AAA.

By using up to all four ribonucleotides to construct random heteropolymers of this kind as synthetic mRNAs, the composition of the triplet code words corresponding to all 20 amino acids could be determined (Table 15.2).

Heteropolymers (Ordered): Repeating Copolymers Technique:

In the early 1960s H.G. Khorana was able to chemically synthesize long RNA molecules consisting of short sequences repeated many times. The short sequences were di-, tri- or tetranucleotides, which were replicated many times and finally joined enzymatically to form long polynucleotides.

Dinucleotide repeats are translated into a polypeptide of two alternating amino acids; trinucleotide repeats can be read as three possible triplets, depending on the point at which initiation occurs; and a tetranucleotide repeat creates four repeating triplets.

When these synthetic mRNAs were added to a cell free system and amino acid incorporation was measured, the results could be combined with the conclusions from composition assignment and triplet binding to make specific codon assignments.

When the repeating dinucleotide sequence is UCUCUCUC…, it produces the alternating triplets UCU and CUC, which incorporate leucine and serine into the polypeptide. When the repeating trinucleotide sequence is UUCUUCUUC…, the possible triplets are of three kinds, UUC, UCU and CUU, depending on the initiation point, and they incorporate phenylalanine, serine and leucine.

From the above two results it can be concluded that UCU and CUC encode serine and leucine (in some order), and that either UUC or CUU encodes serine or leucine, while the other encodes phenylalanine. Further, when the tetranucleotide sequence UUAC is repeated, it produces the triplets UUA, UAC, ACU and CUU.

Here the incorporated amino acids are leucine, threonine and tyrosine. In the last two cases, the common codon is CUU and the common amino acid incorporated is leucine, so it can be concluded that CUU encodes leucine.

From these experiments it can be deduced that UCU encodes serine, UUC encodes phenylalanine, and CUC encodes leucine (Table 15.3).

In this way, by logical interpretation, Khorana reaffirmed triplets that had already been deciphered and filled in gaps left by other approaches (Table 15.4).
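The triplets produced by each repeating copolymer can be enumerated mechanically by reading the message in all three frames. The following is a small sketch; the helper names `reading_frames` and `distinct_triplets` are invented for this example and are not part of any library.

```python
def reading_frames(repeat_unit, n_codons=4):
    """Return the codons read in each of the three frames of a repeating copolymer."""
    # Repeat the unit often enough that every frame yields n_codons full triplets.
    message = repeat_unit * (3 * n_codons + 3)
    return [[message[i:i + 3] for i in range(offset, offset + 3 * n_codons, 3)]
            for offset in range(3)]


def distinct_triplets(repeat_unit):
    """Collect every triplet that appears in any reading frame."""
    return {codon for frame in reading_frames(repeat_unit) for codon in frame}


for unit in ("UC", "UUC", "UUAC"):
    print(unit, sorted(distinct_triplets(unit)))

# The only triplet shared by the UUC and UUAC repeats:
print(distinct_triplets("UUC") & distinct_triplets("UUAC"))  # {'CUU'}
```

A trinucleotide repeat gives a single codon per frame (hence three possible homopolypeptides), whereas di- and tetranucleotide repeats give the same set of alternating codons whichever frame is used. The shared triplet CUU, together with the shared amino acid leucine, is what supports the assignment made in the text.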

Triplet Binding Technique:

Nirenberg and Leder found in 1964 that if a synthetic trinucleotide of known sequence is mixed with ribosomes and a particular aminoacyl-tRNA, a complex forms provided that the trinucleotide is a codon for the amino acid attached to that aminoacyl-tRNA.

In order to work out the code for all 20 amino acids, all the possible 64 triplets had to be tried in cell free culture.

In the experiment, 20 samples of a mixture of all 20 amino acids were taken, and in each sample one amino acid was made radioactive, such that every amino acid was radioactive in exactly one sample and no two samples had the same radioactive amino acid. For instance, in one sample valine was labelled and the other 19 amino acids remained unlabelled.

Similarly, in another sample lysine was labelled and the other 19 remained unlabelled. The tRNAs and ribosomes were then mixed with each of these samples, and the same codon was used for all of them. When a mixture is poured onto a nitrocellulose membrane, radioactivity is retained on the membrane only when the radioactive amino acid takes part in the formation of the complex.

Since the radioactive amino acid in each sample is known, the amino acid coded by a given codon can be detected by the presence of radioactivity on the membrane. This treatment was applied to all 64 synthetic codons, and their respective amino acids were identified.

Codon Dictionary:

The base sequence in mRNA and the resulting amino acid sequence in protein reveal the code for each amino acid. All 64 codons, along with their amino acids, are represented in Table 15.5.

An examination of the code table reveals the following characteristics:

i. Each codon consists of three nucleotides, i.e., the code is triplet. 61 codons represent the 20 amino acids. Three codons (UAA, UAG, UGA) serve as punctuation marks for termination of protein synthesis.

ii. Almost all amino acids are coded by more than one codon; the exceptions are methionine and tryptophan, which have only one codon each. Phenylalanine, tyrosine, histidine, glutamine, asparagine, lysine, aspartic acid, glutamic acid and cysteine are the nine amino acids represented by two codons each. Three amino acids, arginine, serine and leucine, have six codons each. The table thus shows the degeneracy of the genetic code.

iii. If an amino acid has more than one codon, the first two nucleotides are identical and the third nucleotide can be either cytosine or uracil. Adenine and guanine are also similarly interchangeable at the third position. For example, UUU and UUC both code for phenylalanine, and UCU, UCC, UCA and UCG code for serine.

However, there are some exceptions to the equivalence rule of the first two nucleotides, as AGU and AGC also code for serine apart from UCU, UCC, UCA and UCG.

Similarly, the amino acid leucine is also coded by six codons, i.e., UUA, UUG, CUU, CUC, CUA and CUG.

The frequent interchange of cytosine and uracil or guanine and adenine suggests that great variations can occur in the AT/GC ratio of certain organisms without causing large changes in the relative proportions of amino acids present in them, as for almost every amino acid there is one codon that carries G or C and another that carries A or U as its third nucleotide.

Two organisms carrying the same protein sequence information in their DNA can, by selecting one or the other kind of synonymous codon, show different AT/GC ratios.

iv. The genetic code has a definite structure in the sense that the synonyms for the same amino acid are not randomly dispersed over the table but are usually found together. The only exceptions are the codons, six each for arginine, serine, and leucine, which are spread over the table.

v. Multiple codons for an amino acid show in general the similarity in first two nucleotides and it is the third nucleotide which varies.

AUG is the initiation codon, i.e., the polypeptide chain starts with methionine. In prokaryotes this initiating amino acid is the formylated form of methionine. The initiation codon binds to fmet-tRNA, whose anticodon 3′ UAC 5′ is identical to that of met-tRNA, i.e., both met-tRNA and fmet-tRNA respond to AUG, but the signal for the starting amino acid is much more complex than the signal for all other amino acids.

According to Stent, there exist two separable species of tRNA capable of accepting methionine. The methionine on only one of these is converted into formylmethionine by the action of a special formylation enzyme. The other, ordinary met-tRNA incorporates methionine into the interior of the growing polypeptide chain and responds to the codon AUG only.

Formyl-met-tRNA initiates the polypeptide chain and also responds to GUG (a valine codon). When present at the initiation point, GUG codes for methionine, whereas at an internal position it codes for valine. The anticodon of this species of tRNA seems to be permissive with respect to the first nucleotide base of the codon and selective with respect to the second and third nucleotide bases.

UAA, UAG and UGA are the chain termination codons. They do not code for any of the amino acids but serve as stop codons. These codons do not have any tRNA but are read by specific proteins called release factors. These codons are also called nonsense codons.

A mutation from a sense to a nonsense codon in the middle of a genetic message results in the premature release of incomplete polypeptides which do not have any biological activity. Nonsense mutations can be induced by mutagens. UAG was formerly known as amber, UAA as ochre and UGA as opal.
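The degeneracy figures listed in the characteristics above can be tallied directly from the standard code. The sketch below assumes the standard table written in U, C, A, G order with one-letter amino acid symbols ("*" marks a termination codon); the variable names are chosen for this example.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Count how many codons specify each amino acid, setting the stops aside.
codons_per_aa = Counter(CODE.values())
stops = codons_per_aa.pop("*")

# Group amino acids by their number of synonymous codons.
by_degeneracy = {}
for aa, n in sorted(codons_per_aa.items()):
    by_degeneracy.setdefault(n, []).append(aa)

print("stop codons:", stops)
for n in sorted(by_degeneracy):
    print(n, "codon(s):", " ".join(by_degeneracy[n]))
```

The tally reproduces the pattern described in the list: 61 sense codons and 3 stops, two amino acids with a single codon (M, W), nine with two codons, and three with six (L, R, S). It also shows that isoleucine has three codons and five amino acids have four.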

Section Summary

The genetic code refers to the DNA alphabet (A, T, C, G), the RNA alphabet (A, U, C, G), and the polypeptide alphabet (20 amino acids). The Central Dogma describes the flow of genetic information in the cell from genes to mRNA to proteins. Genes are used to make mRNA by the process of transcription; mRNA is used to synthesize proteins by the process of translation. The genetic code is degenerate because 64 triplet codons in mRNA specify only 20 amino acids and three nonsense codons. Almost every species on the planet uses the same genetic code.

Additional Self Test Questions

1. Imagine if there were 200 commonly occurring amino acids instead of 20. Given what you know about the genetic code, what would be the shortest possible codon length? Explain.

2. Discuss how degeneracy of the genetic code makes cells more robust to mutations.


Answer to question 2: Codons that specify the same amino acid typically only differ by one nucleotide. In addition, amino acids with chemically similar side chains are encoded by similar codons. This nuance of the genetic code ensures that a single-nucleotide substitution mutation might either specify the same amino acid and have no effect, or may specify a similar amino acid, preventing the protein from being rendered completely nonfunctional.

The Cipher of the Genetic Code

A new approach to understanding the genetic code is developed. In order to overcome the key paradox (and Darwinian selection problem) that the highly complex amino acid Phe is encoded by the simplest codons (UUY) and the simplest, Gly, by the most complex codons (GGN), as well as the paradox of the duplication of some amino acids in the encoding process (Leu, Ser, Arg), we propose an extension of the notion (and concept) of the genetic code. For a better (and lighter) understanding of genetic coding, we propose the hypothesis that (under the conditions of allowed metaphoricity and modeling in biology) the genetic code has to be understood, by analogy with cryptology, as the unity of three entities: the code, the cipher of the code and the key of the cipher. In this hierarchy the term (and notion) "genetic code" remains what it has been from the beginning: a connection between a four-letter alphabet (four Py-Pu nucleotides, in the form of codons) and a twenty-letter alphabet (twenty amino acids); the cipher is a specific chemical complementarity in the chemical properties of molecules in the form of similarity in dissimilarity versus dissimilarity in similarity ("Sim in Diss vs Diss in Sim"); and the key of the cipher is the complementarity on the binary tree of the genetic code in the form 0-15, 1-14, 2-13, …, 6-9, 7-8. These concepts improve the understanding that within the two main Genetic Code Tables (of the nucleotide doublets and nucleotide triplets) there exists a sophisticated nuancing and balancing in the properties of the constituents of the genetic code, including the balance of the number of molecules, atoms, and nucleons.

Keywords: Binary tree; Canonical amino acids; Cipher of the code; Genetic code; Gray code; Particles number balance; Protein amino acids; The key of the cipher.

Amino Acid Structure

Figure 2. Structures of the 20 amino acids found in proteins are shown. Each amino acid is composed of an amino group (NH₃⁺), a carboxyl group (COO⁻), and a side chain (blue). The side chain may be nonpolar, polar, or charged, as well as large or small. It is the variety of amino acid side chains that gives rise to the incredible variation of protein structure and function.

RNA: replicated from DNA

It is possible for RNA to replicate itself by mechanisms related to those used by DNA, even though it has a single-stranded instead of a double-stranded structure. In early cells RNA is thought to have replicated itself in this way. However, all of the RNA in present-day cells is synthesized by special enzymes that construct a single-stranded RNA chain by using one strand of the DNA helix as a template. Although RNA molecules are synthesized in the cell nucleus, where the DNA is located, most of them are transported to the cytoplasm before they carry out their functions.

The RNA molecules in cells have two main roles. Some, the ribozymes, fold up in ways that allow them to serve as catalysts for specific chemical reactions. Others serve as “messenger RNA,” which provides templates specifying the synthesis of proteins. Ribosomes, tiny protein-synthesizing machines located in the cytoplasm, “read” the messenger RNA molecules and “translate” them into proteins by using the genetic code. In this translation, the sequence of nucleotides in the messenger RNA chain is decoded three nucleotides at a time, and each nucleotide triplet (called a codon) specifies a particular amino acid. Thus, a nucleotide sequence in the DNA specifies a protein provided that a messenger RNA molecule is produced from that DNA sequence. Each region of the DNA sequence specifying a protein in this way is called a gene.

By the above mechanisms, DNA molecules not only direct their own duplication but also dictate the structures of all protein molecules. A single human cell contains about 10,000 different proteins produced by the expression of 10,000 different genes. Actually, a set of human chromosomes is thought to contain DNA with enough information to express between 30,000 and 100,000 proteins, but most of these proteins seem to be made only in specialized types of cells and are therefore not present throughout the body.


The genetic code allows biologists to read the "Books of Life." Codons, which consist of three nucleotides, are the "words" of these books. Living organisms "maintain these books in volumes" called chromosomes, and the collection of chromosomes constitutes the genome.
Since there are four different nucleotides T, C, A and G, there are 4³ = 64 possible codons. With the exception of three particular codons, which are "stops", each codon during protein synthesis leads to one of twenty amino acids. In other words, each three-letter nucleotide sequence corresponds to a specific amino acid. This correspondence is known as the genetic code and is summarized in the following table:

[Table: the genetic code, arranged by first, second and third position of the codon]

How to Read the Table

The letters T, C, A and G represent respectively the four nucleotides thymine, cytosine, adenine and guanine. Of the three letters in a codon, the first appears at the left of the table, the second at the top, and the third on the right. For example, ATG codes methionine, TGG codes tryptophan, and either CAT or CAC codes histidine. The stops, which signal the end of an amino acid sequence (that is, a protein), correspond to TAA, TAG and TGA. Because the number of codons (64) is much bigger than the number of amino acids (20), the correspondence is not one-to-one: It is degenerate because several codons often correspond to the same amino acid, as is evident in the table.
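The table-reading procedure can be mimicked by indexing into a 64-letter string that lists the standard code row by row. This is a minimal sketch: the T, C, A, G ordering of rows and columns is an assumption about the layout of the table (it is the conventional one), amino acids are given as one-letter symbols, "*" marks a stop, and the function name `lookup` is chosen for this example.

```python
BASES = "TCAG"  # assumed order of rows (first letter) and columns (second letter)
# Standard code, codons ordered TTT, TTC, TTA, TTG, TCT, ...; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def lookup(codon):
    """Find a DNA codon's meaning from its first, second and third letters."""
    first, second, third = (BASES.index(base) for base in codon.upper())
    # 16 codons share a first letter, and 4 share a first and second letter.
    return AMINO[16 * first + 4 * second + third]


for codon in ("ATG", "TGG", "CAT", "CAC", "TAA", "TAG", "TGA"):
    print(codon, lookup(codon))
```

The examples from the text come out as expected: ATG gives M (methionine), TGG gives W (tryptophan), CAT and CAC both give H (histidine), and TAA, TAG and TGA are stops.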

This information is provided by the staff of Jupiter Scientific, an organization devoted to the promotion of science through books, the internet and other means of communication.


5.8 Using the genetic code

DNA is transcribed into RNA, which is translated into a polypeptide. Simply, a series consisting of four bases (A,G,C,T) is transcribed and translated into a series of 20 amino acids. This shift from one chemical language to another is accomplished through the genetic code, a set of precise rules that govern how every possible sequence of three RNA nucleotides (a codon) corresponds to a specific amino acid.

The genetic code can be read as a table, whereby each three-letter codon is broken into its first, second and third RNA bases. These bases converge on a single amino acid, and this amino acid is the actual translated product of the original DNA nucleotides. For example, figure 6 tells us that the amino acid tyrosine (Tyr) corresponds to the mRNA codon UAC. The genetic code also contains an initiation (or “start”) codon, AUG, and three “stop” codons: UAA, UAG and UGA. The initiation codon also codes for the amino acid methionine. Methionine (with tryptophan) is one of only two amino acids that correspond to a single codon. Most amino acids correspond to several three-base sequences. In other words, the genetic code can be very redundant.
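To show how the start and stop codons frame a message, here is a minimal translation sketch. It assumes the standard code with one-letter amino acid symbols ("*" marks a stop) and simply begins at the first AUG it finds; the function name `translate_from_start` and the sample mRNA are hypothetical, and real initiation involves more than locating an AUG.

```python
from itertools import product

BASES = "UCAG"
# Standard code, codons ordered UUU, UUC, UUA, UUG, UCU, ...; "*" = stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate_from_start(mrna):
    """Translate from the first AUG up to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(CODE["UAC"])                               # Y (tyrosine)
print(translate_from_start("GGAUGUACUGGUAAUC"))  # MYW
```

In the sample message the reading frame is set by AUG, so the codons read are AUG, UAC, UGG and then the stop codon UAA, giving Met-Tyr-Trp.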

Figure 5.9 The genetic code consists of all the RNA codons and their associated amino acids.


What else does DNA do?

Only 2% of the human genome codes for proteins, leading some scientists to refer to the remaining nucleotides as “junk DNA.” However, this moniker may be too harsh. Rather, it seems that much of your DNA is regulatory, and controls how much, or what kind, of a protein is synthesized. Some DNA is part of a gene that was active in our ancestors but is no longer functional; for example, humans have several vestigial olfactory genes, suggesting that modern humans are not as dependent on our sense of smell as were our hominid ancestors. And some DNA was adopted from viruses that attacked human ancestors; these endogenous retroviruses say a lot about our evolutionary past with pathogens, but are not functioning genes.

It is also probable that many DNA sequences have functions yet to be identified. In 2012, a consortium of scientists reported on a decade-long project, The Encyclopedia of DNA Elements (ENCODE), suggesting that over 80% of DNA is somehow functional. ENCODE’s results have been applauded and criticized, a span of responses that highlights how DNA science is still in its infancy. Regardless of the exact percentages, it is clear that while much of our DNA does not directly code for protein, the human genome is far from being packed with junk.
