Information

Why are prokaryotic promoter sequences written 5' to 3', when transcription proceeds from 3' to 5'?


It seems that the promoter regions are written from the 5' to 3' direction. My lecture notes, for example, give the -35 and -10 promoter regions sequences like so:

Wikipedia seems to agree, on their article about promoters.

However, RNA polymerase transcribes in the 3' to 5', as in it reads the template strand in the 3' to 5' direction. In that case, why aren't the promoter regions written also in the 3' to 5' direction?

Meaning, instead of the -10 sequence being 5'-TATAAT-3', why is it not 3'-ATATTA-5', in agreement with the direction of transcription?

My intuitive guess is that the σ factor recognises the promoter region in a 5' to 3' directionality. But I cannot find any information to back up this guess. Additionally, if this is true, how could it have been determined (experimentally)? Furthermore, whatever the answer is, does it also apply to transcription in eukaryotes?


By convention, a DNA sequence is written starting from the 5' end, so the directionality is assumed without explicitly writing the 5' and 3' labels. As for why it is not the template-strand sequence, I would say that is also a convention for simplicity: you only need to write one sequence when discussing genes and regulatory elements.
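To make the convention concrete, here is a minimal Python sketch showing that the 5'-TATAAT-3' written in lecture notes and the 3'-ATATTA-5' proposed in the question describe the same double-stranded site, just read from opposite strands. The function names are illustrative only and do not come from any library.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the paired bases in the same left-to-right order as the input."""
    return "".join(COMPLEMENT[base] for base in seq)


def reverse_complement(seq: str) -> str:
    """Return the opposite strand rewritten in its own 5'->3' order."""
    return complement(seq)[::-1]


coding_5to3 = "TATAAT"                        # -10 consensus as conventionally written
template_3to5 = complement(coding_5to3)       # opposite strand, read 3'->5'
template_5to3 = reverse_complement(coding_5to3)  # same strand, rewritten 5'->3'

print(template_3to5)  # ATATTA
print(template_5to3)  # ATTATA
```

Because either strand can be derived from the other, writing only the 5' to 3' coding-strand sequence loses no information.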


Initiation of Transcription in Prokaryotes

Prokaryotes do not have membrane-enclosed nuclei. Therefore, the processes of transcription, translation, and mRNA degradation can all occur simultaneously. The intracellular level of a bacterial protein can quickly be amplified by multiple transcription and translation events occurring concurrently on the same DNA template. Prokaryotic transcription often covers more than one gene and produces polycistronic mRNAs that specify more than one protein.

Our discussion here will exemplify transcription by describing this process in Escherichia coli, a well-studied bacterial species. Although some differences exist between transcription in E. coli and transcription in archaea, an understanding of E. coli transcription can be applied to virtually all bacterial species.


Biology 171

By the end of this section, you will be able to do the following:

  • List the different steps in prokaryotic transcription
  • Discuss the role of promoters in prokaryotic transcription
  • Describe how and when transcription is terminated

The prokaryotes, which include Bacteria and Archaea, are mostly single-celled organisms that, by definition, lack membrane-bound nuclei and other organelles. A bacterial chromosome is a closed circle that, unlike eukaryotic chromosomes, is not organized around histone proteins. The central region of the cell in which prokaryotic DNA resides is called the nucleoid region. In addition, prokaryotes often have abundant plasmids, which are shorter, circular DNA molecules that may only contain one or a few genes. Plasmids can be transferred independently of the bacterial chromosome during cell division and often carry traits such as those involved with antibiotic resistance.

Transcription in prokaryotes (and in eukaryotes) requires the DNA double helix to partially unwind in the region of mRNA synthesis. The region of unwinding is called a transcription bubble. Transcription always proceeds from the same DNA strand for each gene, which is called the template strand. The mRNA product is complementary to the template strand and is almost identical to the other DNA strand, called the nontemplate strand, or the coding strand. The only nucleotide difference is that in mRNA, all of the T nucleotides are replaced with U nucleotides ((Figure)). In an RNA double helix, A can bind U via two hydrogen bonds, just as in A–T pairing in a DNA double helix.


The nucleotide pair in the DNA double helix that corresponds to the site from which the first 5′ mRNA nucleotide is transcribed is called the +1 site, or the initiation site. Nucleotides preceding the initiation site are denoted with a “-” and are designated upstream nucleotides. Conversely, nucleotides following the initiation site are denoted with “+” numbering and are called downstream nucleotides.
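The numbering scheme can be sketched in a few lines of Python. The helper below is hypothetical; it assumes positions are counted along the coding strand written 5′ to 3′ and follows the usual convention (not stated explicitly above) that there is no position 0, so the nucleotide just upstream of +1 is -1.

```python
def position_label(index: int, tss_index: int) -> str:
    """Label a 0-based string index relative to the initiation site.

    index and tss_index are positions in a coding-strand sequence written
    5'->3'. Assumes the common convention that no position 0 exists.
    """
    offset = index - tss_index
    if offset >= 0:
        return f"+{offset + 1}"   # the initiation site itself is +1
    return str(offset)            # upstream positions are negative


print(position_label(16, 16))  # +1
print(position_label(15, 16))  # -1
print(position_label(6, 16))   # -10
```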

Initiation of Transcription in Prokaryotes

Prokaryotes do not have membrane-enclosed nuclei. Therefore, the processes of transcription, translation, and mRNA degradation can all occur simultaneously. The intracellular level of a bacterial protein can quickly be amplified by multiple transcription and translation events that occur concurrently on the same DNA template. Prokaryotic genomes are very compact, and prokaryotic transcripts often cover more than one gene or cistron (a coding sequence for a single protein). Polycistronic mRNAs are then translated to produce more than one kind of protein.

Our discussion here will exemplify transcription by describing this process in Escherichia coli, a well-studied eubacterial species. Although some differences exist between transcription in E. coli and transcription in archaea, an understanding of E. coli transcription can be applied to virtually all bacterial species.

Prokaryotic RNA Polymerase

Prokaryotes use the same RNA polymerase to transcribe all of their genes. In E. coli, the polymerase is composed of five polypeptide subunits, two of which are identical. Four of these subunits, denoted α, α, β, and β′, comprise the polymerase core enzyme. These subunits assemble every time a gene is transcribed, and they disassemble once transcription is complete. Each subunit has a unique role: the two α-subunits are necessary to assemble the polymerase on the DNA; the β-subunit binds to the ribonucleoside triphosphate that will become part of the nascent mRNA molecule; and the β′ subunit binds the DNA template strand. The fifth subunit, σ, is involved only in transcription initiation. It confers transcriptional specificity such that the polymerase begins to synthesize mRNA from an appropriate initiation site. Without σ, the core enzyme would transcribe from random sites and would produce mRNA molecules that specified protein gibberish. The polymerase comprising all five subunits is called the holoenzyme.

Prokaryotic Promoters

A promoter is a DNA sequence onto which the transcription machinery, including RNA polymerase, binds and initiates transcription. In most cases, promoters exist upstream of the genes they regulate. The specific sequence of a promoter is very important because it determines whether the corresponding gene is transcribed all the time, some of the time, or infrequently. Although promoters vary among prokaryotic genomes, a few elements are evolutionarily conserved in many species. At the -10 and -35 regions upstream of the initiation site, there are two promoter consensus sequences, or regions that are similar across all promoters and across various bacterial species ((Figure)). The -10 sequence, called the -10 region, has the consensus sequence TATAAT. The -35 sequence has the consensus sequence TTGACA. These consensus sequences are recognized and bound by σ. Once this interaction is made, the subunits of the core enzyme bind to the site. The A–T-rich -10 region facilitates unwinding of the DNA template, and several phosphodiester bonds are made. The transcription initiation phase ends with the production of abortive transcripts, which are polymers of approximately 10 nucleotides that are made and released.
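As a rough illustration of what "consensus sequence" means in practice, the Python sketch below scans a sequence for the hexamers closest to TTGACA and TATAAT by counting mismatches. The test sequence is hypothetical and the helper names are made up for this example; real promoter finders use far richer models than a mismatch count.

```python
MINUS_35 = "TTGACA"  # consensus -35 sequence from the text
MINUS_10 = "TATAAT"  # consensus -10 sequence from the text


def mismatches(site: str, consensus: str) -> int:
    """Count positions where a candidate site differs from the consensus."""
    return sum(1 for a, b in zip(site, consensus) if a != b)


def best_match(seq: str, consensus: str) -> tuple[int, int]:
    """Return (start index, mismatch count) of the closest window in seq.

    Ties are broken in favour of the leftmost window.
    """
    k = len(consensus)
    candidates = [
        (mismatches(seq[i:i + k], consensus), i)
        for i in range(len(seq) - k + 1)
    ]
    mm, idx = min(candidates)
    return idx, mm


# Hypothetical coding-strand sequence, written 5'->3'.
seq = "GGTTGACAGCTAGCTCAGTCCTAGGTATAATGCTAGC"

idx35, mm35 = best_match(seq, MINUS_35)
idx10, mm10 = best_match(seq, MINUS_10)
spacer = idx10 - (idx35 + len(MINUS_35))

print(idx35, mm35)  # 2 0
print(idx10, mm10)  # 25 0
print(spacer)       # 17
```

Natural promoters rarely match both hexamers perfectly, which is why the sketch reports a mismatch count instead of requiring an exact hit.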


View Transcription (video – Walter & Eliza Hall) to see the first part of transcription and the base sequence repetition of the TATA box.

Elongation and Termination in Prokaryotes

The transcription elongation phase begins with the release of the σ subunit from the polymerase. The dissociation of σ allows the core enzyme to proceed along the DNA template, synthesizing mRNA in the 5′ to 3′ direction at a rate of approximately 40 nucleotides per second. As elongation proceeds, the DNA is continuously unwound ahead of the core enzyme and rewound behind it. The base pairing between DNA and RNA is not stable enough to maintain the stability of the mRNA synthesis components. Instead, the RNA polymerase acts as a stable linker between the DNA template and the nascent RNA strands to ensure that elongation is not interrupted prematurely.
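The quoted rate gives a quick back-of-the-envelope estimate of how long elongation takes. The Python sketch below is only that arithmetic; the 1,000-nucleotide transcript length is a hypothetical example, and the calculation ignores pausing and initiation time.

```python
ELONGATION_RATE_NT_PER_S = 40  # approximate rate quoted in the text


def elongation_time_seconds(transcript_length_nt: int,
                            rate: float = ELONGATION_RATE_NT_PER_S) -> float:
    """Estimate elongation time assuming a constant rate with no pauses."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return transcript_length_nt / rate


# Hypothetical 1,000-nucleotide transcript.
print(elongation_time_seconds(1000))  # 25.0
```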

Prokaryotic Termination Signals

Once a gene is transcribed, the prokaryotic polymerase needs to be instructed to dissociate from the DNA template and liberate the newly made mRNA. Depending on the gene being transcribed, there are two kinds of termination signals. One is protein-based and the other is RNA-based. Rho-dependent termination is controlled by the rho protein, which tracks along behind the polymerase on the growing mRNA chain. Near the end of the gene, the polymerase encounters a run of G nucleotides on the DNA template and it stalls. As a result, the rho protein collides with the polymerase. The interaction with rho releases the mRNA from the transcription bubble.

Rho-independent termination is controlled by specific sequences in the DNA template strand. As the polymerase nears the end of the gene being transcribed, it encounters a region rich in C–G nucleotides. The mRNA folds back on itself, and the complementary C–G nucleotides bind together. The result is a stable hairpin that causes the polymerase to stall as soon as it begins to transcribe a region rich in A–T nucleotides. The complementary U–A region of the mRNA transcript forms only a weak interaction with the template DNA. This, coupled with the stalled polymerase, induces enough instability for the core enzyme to break away and liberate the new mRNA transcript.

Upon termination, the process of transcription is complete. By the time termination occurs, the prokaryotic transcript would already have been used to begin synthesis of numerous copies of the encoded protein because these processes can occur concurrently. The unification of transcription, translation, and even mRNA degradation is possible because all of these processes occur in the same 5′ to 3′ direction, and because there is no membranous compartmentalization in the prokaryotic cell ((Figure)). In contrast, the presence of a nucleus in eukaryotic cells precludes simultaneous transcription and translation.


View Transcription (video – ndsuvirtualcell) to see the process of prokaryotic transcription.

Section Summary

In prokaryotes, mRNA synthesis is initiated at a promoter sequence on the DNA template comprising two consensus sequences that recruit RNA polymerase. The prokaryotic polymerase consists of a core enzyme of four protein subunits and a σ protein that assists only with initiation. Elongation synthesizes mRNA in the 5′ to 3′ direction at a rate of 40 nucleotides per second. Termination liberates the mRNA and occurs either by rho protein interaction or by the formation of an mRNA hairpin.

Free Response

If mRNA is complementary to the DNA template strand and the DNA template strand is complementary to the DNA nontemplate strand, then why are base sequences of mRNA and the DNA nontemplate strand not identical? Could they ever be?

DNA is different from RNA in that T nucleotides in DNA are replaced with U nucleotides in RNA. Therefore, they could never be identical in base sequence.

In your own words, describe the difference between rho-dependent and rho-independent termination of transcription in prokaryotes.

Rho-dependent termination is controlled by the rho protein, which tracks along behind the polymerase on the growing mRNA chain. Near the end of the gene, the polymerase stalls at a run of G nucleotides on the DNA template. The rho protein collides with the polymerase and releases mRNA from the transcription bubble. Rho-independent termination is controlled by specific sequences in the DNA template strand. As the polymerase nears the end of the gene being transcribed, it encounters a region rich in C–G nucleotides. This creates an mRNA hairpin that causes the polymerase to stall right as it begins to transcribe a region rich in A–T nucleotides. Because A–U bonds are less thermostable, the core enzyme falls away.

A fragment of bacterial DNA reads:

3’ –TACCTATAATCTCAATTGATAGAAGCACTCTAC– 5’

Assuming that this fragment is the template strand, what is the sequence of mRNA that would be transcribed? (Hint: Be sure to identify the initiation site.)

By examining the DNA sequence, we can see that there is a -10 consensus sequence near the 3’ end of the fragment. If we then count downstream, the +1 initiation site is the T immediately following the sequence AAT. This means the DNA fragment that will serve as the template for transcription has the sequence TGATAGAAGCACTCTAC. The mRNA made from this template will have complementary base pairing with uracil (U) instead of thymine (T). This gives us ACUAUCUUCGUGAGAUG as the transcribed mRNA sequence.
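The worked answer can be checked with a short Python sketch. It locates TATAAT in the fragment, skips a fixed number of nucleotides to reach the initiation site, and pairs each remaining template base with its RNA partner. The six-nucleotide gap between the -10 box and +1 is an assumption read off this particular answer, and the function name is illustrative only.

```python
# RNA base paired with each DNA template base.
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe_from_minus10(template_3to5: str,
                            box: str = "TATAAT",
                            gap: int = 6) -> str:
    """Transcribe a template strand written 3'->5' into mRNA written 5'->3'.

    gap is the number of nucleotides between the end of the -10 box and
    the +1 site; 6 is assumed here because it reproduces the worked answer.
    """
    box_start = template_3to5.find(box)
    if box_start == -1:
        raise ValueError("no -10 box found in the fragment")
    start = box_start + len(box) + gap
    return "".join(RNA_PAIR[base] for base in template_3to5[start:])


fragment = "TACCTATAATCTCAATTGATAGAAGCACTCTAC"  # template strand, 3'->5'
mrna = transcribe_from_minus10(fragment)
print(mrna)  # ACUAUCUUCGUGAGAUG
```

Because the template is written 3' to 5', pairing its bases in left-to-right order yields the mRNA already in 5' to 3' order, so no reversal is needed.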



Study Notes on Transcription in Prokaryotes | Cell Biology

In prokaryotic organisms, transcription occurs in three phases, known as initiation, elongation and termination, with the help of a single RNA polymerase.

RNA is synthesized by a single RNA polymerase enzyme which contains multiple polypeptide subunits.

In E. coli, the RNA polymerase has five subunits:

two α, one β, one β′ and one σ subunit (α2ββ′σ). This form is called the holoenzyme. The σ subunit may dissociate from the other subunits to leave a form known as the core enzyme.

These two forms of the RNA polymerase have different roles in transcription. The σ subunit, also called the sigma factor, recognizes the start signal on the DNA and directs the binding of the enzyme to the initiation site on the DNA template. The binding of RNA polymerase to DNA involves the β subunit.

Initiation involves binding of RNA polymerase to the promoter site.

Promoter Site for Initiation:

Transcription cannot start randomly but must begin specially at the start of a gene. Signals for the initiation of transcription occur in the promoter sequence which lies directly upstream of the transcribed sequence of the gene.

The promoter contains specific DNA sequences that act as points of attachment for the RNA polymerase. The exact sequences can vary between promoters but all conform to an overall pattern known as the consensus sequence.

In E. coli, two sequence elements, the -10 sequence and the -35 sequence, are recognized by the RNA polymerase. The consensus -10 sequence, also called the “Pribnow box”, is TATAAT and the consensus -35 sequence, also called the “recognition sequence”, is TTGACA (Fig. 16.2).

The σ subunit of the RNA polymerase is responsible for recognizing and binding the promoter, probably at the -35 box. In the absence of the σ subunit, the enzyme can still bind to DNA but binding is more random.

Initiation of RNA Synthesis:

When the enzyme binds to the promoter, it initially forms a closed promoter complex in which the promoter DNA remains as a double helix. The enzyme covers about 60 base pairs of the promoter including the -10 and -35 boxes.

To allow transcription to begin, the double helix partially dissociates at the -10 box, which is rich in weak A-T bonds, to give an open promoter complex. The σ subunit then dissociates from the open promoter complex, leaving the core enzyme. At the same time the first two ribonucleotides bind to the DNA, the first phosphodiester bond is formed and transcription is initiated (Fig. 16.3).

During elongation, the RNA polymerase moves along the DNA molecule, melting and unwinding the double helix as it progresses. The enzyme adds ribonucleotides to the 3′ end of the growing RNA molecule with the order of addition determined by the order of the bases on the template strand.

In most cases, a leader sequence of variable length is transcribed before the coding sequence of the gene is reached. Similarly, at the end of the coding sequence, a noncoding trailer sequence is transcribed before transcription ends. During transcription, only a small portion of the double helix is unwound at any one time.

The unwound area contains the newly synthesized RNA base-paired with the template DNA strand and extends over 12-17 bases. The unwound area needs to remain small because unwinding in one region necessitates overwinding in adjacent regions, and this imposes strain on the DNA molecule.

To overcome this problem, the RNA is released from the template DNA as it is synthesized, allowing the DNA double helix to reform (Fig. 16.4).

Chain elongation takes place by addition of activated ribonucleoside triphosphates (ATP, UTP, GTP and CTP) to one strand of the DNA template. For each nucleotide added to the growing RNA chain, pyrophosphate (PPi) is given off. This is rapidly hydrolysed to inorganic phosphate (Pi).

The synthesis of the RNA chain is energized by expenditure of energy in the form of pyrophosphates. For each nucleotide monomer added to the chain, two high-energy phosphates are expended.

The entire reaction may be summarized as follows:

Elongation of the RNA chain takes place by means of the core enzyme, which moves along the DNA template. During transcription, RNA is synthesized by the polymerization of ribonucleotide triphosphate subunits (ATP, UTP, GTP, CTP).

The 3′-OH of one ribonucleotide reacts with the 5′ phosphate of another to form a phosphodiester bond. The transcript is synthesized in the 5′→3′ direction but, because the chains must be antiparallel for base pairing, the template strand runs in the opposite, 3′→5′ direction.

RNA Chains Grow in the 5′→3′ Direction:

If RNA chains are synthesized in the 5′→3′ direction, then the first nucleotide should carry a triphosphate group (PPP). If, on the other hand, the chain grows in the 3′→5′ direction, then the triphosphate group would be on the nucleotide at the growing end. It has been found that the triphosphate group is attached to the first nucleotide at its 5′ end and that there is a free hydroxyl group at the 3′ end.

This shows that growth takes place in the 5′ →3′ direction.

Only One DNA Strand of a Gene Transcribes mRNA:

In double stranded DNA, a given gene is transcribed from one of the two strands. The transcribed RNA is complementary to only one of the two strands. All the transcribed genes need not, however, be on one strand of the DNA double helix.

One gene may transcribe mRNA from one strand while another transcribes from the other strand. The two strands of the double helix are called the template and the non-template strands. RNA is produced using the template strand, and the RNA molecule synthesized is a copy of the non-template strand (Fig. 16.5), also named the sense (+) strand or the coding strand. The RNA molecule synthesized is called a transcript.

The termination of transcription occurs non-randomly and takes place at specific points after the end of the coding sequence. In E. coli, termination occurs at sequences known as palindromes. These are symmetrical about their middle such that the first half of the sequence is followed by its exact complement in the second half.

In single-stranded RNA molecules, this feature allows the first half of the sequence to base pair with the second half to form what is known as a stem-loop structure (Fig. 16.6). These appear to act as signals for termination. In some cases, the stem-loop sequence is followed by a run of 5-10 As in the DNA, which form weak A-U base pairs with the newly synthesized RNA.

It is thought that the RNA polymerase pauses just after the stem-loop and that the weak A-U base pairs break, causing the transcript to detach from the template.

In other cases, the run of As is absent and a different mechanism occurs based on binding of a protein called Rho (ρ), which disrupts base-pairing between the template and the transcript when the polymerase pauses after the stem-loop. The termination of transcription involves the release of the transcript and the core enzyme, which may then re-associate with the σ subunit and go on to another round of transcription.
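The stem-loop signal described above can be expressed as a simple check in Python: the second arm of the stem must be the reverse complement of the first, with a few unpaired bases in between, and a run of U residues (transcribed from the As in the DNA template) may follow. The RNA sequences, the stem and loop lengths, and the function names are all hypothetical choices for this sketch; it is not a real terminator predictor.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement_rna(seq: str) -> str:
    """Return the RNA sequence that would base pair with seq, read 5'->3'."""
    return "".join(RNA_PAIR[base] for base in reversed(seq))


def forms_stem_loop(rna: str, stem_len: int, loop_len: int) -> bool:
    """True if rna starts with a stem of stem_len pairs around a loop."""
    left = rna[:stem_len]
    right = rna[stem_len + loop_len:2 * stem_len + loop_len]
    return len(right) == stem_len and right == reverse_complement_rna(left)


def stem_loop_with_u_run(rna: str, stem_len: int, loop_len: int,
                         min_u_run: int = 5) -> bool:
    """True if a stem-loop is immediately followed by at least min_u_run Us."""
    tail = rna[2 * stem_len + loop_len:]
    return forms_stem_loop(rna, stem_len, loop_len) and tail.startswith("U" * min_u_run)


# Hypothetical transcripts: stem GCCGCC, loop UUCG, second arm GGCGGC.
with_u_run = "GCCGCC" + "UUCG" + "GGCGGC" + "UUUUUU"
without_u_run = "GCCGCC" + "UUCG" + "GGCGGC" + "ACGACG"

print(stem_loop_with_u_run(with_u_run, 6, 4))     # True
print(forms_stem_loop(without_u_run, 6, 4))       # True
print(stem_loop_with_u_run(without_u_run, 6, 4))  # False
```

The second transcript corresponds to the case in the text where the stem-loop is present but the run of As is absent, which is where Rho is said to take over.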


Elongation

As elongation proceeds, the DNA is continuously unwound ahead of the core enzyme as the hydrogen bonds that connect the complementary base pairs in the DNA double helix are broken (Figure 2). The DNA is rewound behind the core enzyme as the hydrogen bonds are reformed. The base pairing between DNA and RNA is not stable enough to maintain the stability of the mRNA synthesis components. Instead, the RNA polymerase acts as a stable linker between the DNA template and the newly forming RNA strand to ensure that elongation is not interrupted prematurely.

Figure 2 During elongation, RNA polymerase tracks along the DNA template, synthesizes mRNA in the 5′ to 3′ direction, and unwinds then rewinds the DNA as it is read.





Visit this BioStudio animation to see the process of prokaryotic transcription.


Overview of Transcription

Transcription is the first stage of the expression of genes into proteins. In transcription, an mRNA (messenger RNA) intermediate is transcribed from one of the strands of the DNA molecule. The RNA is called messenger RNA because it carries the "message," or genetic information, from the DNA to the ribosomes, where the information is used to make proteins. RNA and DNA use complementary coding where base pairs match up, similar to how the strands of DNA bind to form a double helix.

One difference between DNA and RNA is that RNA uses uracil in place of the thymine used in DNA. RNA polymerase mediates the manufacture of an RNA strand that complements the DNA strand. RNA is synthesized in the 5' -> 3' direction (as seen from the growing RNA transcript). There are some proofreading mechanisms for transcription, but not as many as for DNA replication. Sometimes coding errors occur.


1 Introduction

The pace at which whole genomes are sequenced is far ahead of that of genome sequence annotation, let alone a molecular understanding of genome organization. Compared to experimental approaches, computational tools offer a faster option for sequence annotation, their reliability being largely decided by their careful design, which in turn depends on a fair understanding of the molecular mechanisms concerned.

One of the most sought-after tasks in genome annotation is the identification of promoter regions, not only for the validation of predicted genes and identification of novel genes but also to understand transcriptomic regulatory networks (with respect to promoter location and architecture). During the initial years of sequence accumulation, promoter identification was regarded as a trivial task because of certain experimentally derived sequence information rules (Pribnow, 1975). However, with each passing year, it has emerged as one of the most daunting challenges in genome annotation. Genomic sequences in prokaryotes are highly adaptive within genomes and highly diversified across species to enable their survival in diverse and extreme conditions. This makes it difficult to detect conserved regulatory sites by sequence homology. Further, variability in the length of 5′ untranslated regions and the presence of multiple transcriptional start sites (TSSs) mean that promoters cannot simply be sought in the immediate upstream region of the annotated coding sequence. The complexity is further compounded by high gene densities, with neighbouring genes generally having very short intergenic spaces or, in some cases, overlapping coding regions. Recent reports on pervasive transcription, where transcription can initiate from any location, have further deepened the mystery (Wade and Grainger, 2014).

Many efforts have been made to develop efficient promoter prediction tools based on different logics. Sequence-based computational methods for promoter prediction have been moderately successful (Dekhtyar et al., 2008; Jacques et al., 2006), though machine learning on huge training data of promoter sequences has led to some tools that predict well but are highly genome specific (de Jong et al., 2012; de Silva et al., 2011; Lai et al., 2019; Shahmuradov et al., 2016; Solovyev and Salamov, 2011; Umarov and Solovyev, 2017; Umesh et al., 2014). Lately, attention has also been given to capturing the structural and/or energetic signals of promoter regions. Some of the structural and/or energetic properties used for promoter prediction are bendability, curvature, inter-base pair (BP) properties, free energy, A-philicity and stress-induced DNA duplex destabilization, among others (Abeel et al., 2008; Florquin et al., 2005; Goñi et al., 2007; Rangannan and Bansal, 2010; Wang and Banham, 2006). DNA shape, determined by four distinct features (minor groove width, propeller twist, roll and helical twist), has been found to be an important determinant in the identification of transcription factor binding sites (TFBs) and TSSs (Chiu et al., 2015; Levo et al., 2015; Zhou et al., 2013, 2015). These studies undoubtedly establish that the three-dimensional structure of DNA, beyond the primary sequence, is a determinant of protein–DNA binding specificity. Despite the many insights resulting from such studies over the years, development of a promoter prediction tool giving high performance is still awaited. It is apparent that the extant conceptual frameworks are not yet adequate to understand the nature of promoter signals fully. There is a need to develop new ideas/models to explain the fine tuning of the structural and energy state of the promoter sequence with respect to interacting proteins/transcription factors/ligands.

Rationale: the target was to develop a new model for capturing the structural and energy state of promoters. For structural characterization, instead of taking DNA shape (the cumulative effect of various parameters), we decided to use all the individual parameters (28 in all) involved in the spatial organization of the bases and BP steps: backbone organization, inter-BP arrangements, intra-BP arrangements and the relative positioning with respect to the BP axis. The rationale was to have a broad horizon for finding some novel information. Structural analysis was guided by some breakthrough studies in the analysis of nucleic acid structures during the last few decades (Beveridge et al., 2004, 2012; Dixit et al., 2005; Hassan and Calladine, 1995; Lavery et al., 2009, 2010; Olson et al., 1998; Pasi et al., 2014; Yanagi et al., 1991). For energy characterization, we relied on our own experience. We have been working to understand the language of DNA in terms of its energetics during the last 15 years and have obtained modest success in the process. In this series, we have reported that hydrogen bond, stacking and solvation energies show clear signatures of the functional destinies of DNA sequences (Dutta et al., 2006; Khandelwal et al., 2012, 2014; Khandelwal and Bhyravabhotla, 2010; Khandelwal and Jayaram, 2012; Singh et al., 2017; Singhal et al., 2008). So with 31 parameters (28 structural and 3 energy), we characterized 16 519 primary prokaryotic promoter sequences (Mishra et al., 2018). It was found that all parameters yield a signature signal at/near the TSS and that the information for this signature signal is in-built in promoter sequences. The power of an accurate model lies not only in its ability to explain but also to predict precisely. If this structural and energetic model of promoters is accurate, it should lead to a reliable promoter prediction tool with an uncommon ability to efficiently and accurately predict prokaryotic promoters irrespective of genome/species.
With this aim, we directed our efforts to develop a suitable promoter prediction algorithm from the 31 parameters by using various statistical techniques. Here, we present a novel method, SEProm, which is applicable to all prokaryotes including archaea and performs considerably well in comparison with the available promoter prediction programmes. The tool is freely downloadable with easy to follow instructions.



What is a gene? At one level, a gene is an ordered string of nucleotides that encodes a polypeptide. Such genes are "structural" genes. We also know that genes can also encode RNA, including messenger RNA (mRNA), transfer RNA (tRNA), ribosomal RNA (rRNA) as well as other RNA types. But something has to turn on and terminate gene expression, as well as regulate it. The regulatory sequences, which may be "promoters" or "enhancers/silencers" may be located far away from the coding regions. So now our view of a gene must include the idea of separate regions of a chromosome. What if the information as transcribed onto mRNA doesn't reflect the final protein until it is further modified? This is "posttranscriptional modification". Now the concept of a gene is becoming even cloudier. What if there are "overlapping" coding regions? Clearly our definition of a gene is not going to be a simple one.

Functionally, though, we can describe a gene as having a distinct coding region and a distinct regulatory region, the latter controlling the rate at which DNA is transcribed into mRNA. We will see that the regulatory units are composed of DNA "motifs" and that every motif will need to be occupied by a regulatory protein if a gene is to be regulated properly. Not only must there be the appropriate attachment of the protein, but the proteins all have to fit together with proteins binding to other nearby motifs in the way that jigsaw puzzle pieces fit together. And there is only one correct way for everything to fit together. So, it's not just a simple matter of DNA directing mRNA synthesis, which then directs protein synthesis; proteins are intrinsically involved in the regulation of protein production at the level of transcription. This can get to be a nightmare if you begin to think about the regulation of the production of regulatory proteins.

We also have to consider an important difference between eukaryotes and prokaryotes with regard to the transcription of structural, or protein-coding, genes. In eukaryotes, the genes are transcribed individually while in prokaryotes, genes with related functions ("operons") can be transcribed together. As an example, the Lac operon includes three protein-encoding genes as well as their control sequences. The operon is transcribed as a single unit as a "polycistronic mRNA". Eukaryotic structural genes are transcribed as monocistronic mRNA.

DNA-Directed RNA Synthesis

There are three steps that characterize DNA-directed RNA synthesis:

(1) Initiation by binding of the transcription apparatus to the DNA template

(2) Elongation of the mRNA chain

(3) Termination of the mRNA chain

The piece of mRNA that results from the direct transcription of the DNA that encodes a "gene" is called the "primary transcript" and it undergoes modification, sometimes quite extensively, before it can translate its message into protein.

The class of enzymes that synthesize RNAs are known as RNA polymerases. They are all multisubunit complexes that are present in all cells and they catalyze the reaction:

$(\text{RNA})_{n\ \text{residues}} + \text{NTP} \rightleftharpoons (\text{RNA})_{n+1\ \text{residues}} + \text{PP}_i$

Pyrophosphate is irreversibly hydrolyzed to 2 Pi, thus driving the reaction to the right. The individual nucleotides that are read off of the DNA template strand are transcribed into the nucleotides of the corresponding RNA, so the final result is a single-stranded polymer, namely the mRNA, whose nucleotides are complementary to those of the DNA template strand, except that wherever an "A" appears in the template strand, a "U" (rather than a "T") appears in the mRNA. (The possible NTPs, then, are ATP, CTP, GTP and UTP.)

Transcription in Prokaryotes

The most studied RNA polymerase is that from E. coli, so we will study it as the prototype of the RNA polymerases. The holoenzyme is a 449 kD protein composed of a "core enzyme" and a "σ subunit", and the entire complex is denoted (core)σ. The core enzyme directs the polymerization reaction, and it is built from four different subunits: core enzyme = α2ββ'ω. The inorganic ions Zn2+ (two of them in the β' subunit) and Mg2+ are required for catalytic activity, and the three-dimensional structure of the enzyme resembles a hand. The thumb of the hand can be envisioned as grasping a piece of B-DNA that lies in a channel represented by the curved fingers and palm of the hand. This channel is cylindrical, with dimensions on the order of 25 Å by 55 Å. These dimensions allow a fit of about 16 base pairs of B-DNA.
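The figure of about 16 base pairs can be checked with a line of arithmetic. The sketch below assumes the textbook rise of roughly 3.4 Å per base pair for B-DNA, a value that is not stated in these notes; the function name is hypothetical.

```python
# Rise per base pair of B-form DNA in angstroms.
# Assumption: the commonly quoted textbook value, not given in the notes above.
B_DNA_RISE_ANGSTROM = 3.4


def base_pairs_in_channel(channel_length_angstrom: float,
                          rise: float = B_DNA_RISE_ANGSTROM) -> float:
    """Return how many base pairs of B-DNA span a channel of the given length."""
    return channel_length_angstrom / rise


if __name__ == "__main__":
    # The channel is described as roughly 55 angstroms long.
    print(round(base_pairs_in_channel(55.0), 1))  # 16.2
```

A 55 Å channel therefore holds a little over 16 base pairs, consistent with the estimate in the text.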

The "hand" structure appears in other enzymes that we will study, including DNA polymerase and reverse transcriptase. You can further study the hand structure of RNA polymerase by looking at T7 RNA polymerase (see PDB below).

We will look at transcription from the point of view of the gene, which we have already mentioned is a rather ambiguous entity. Nevertheless, it is clear that there must be a starting point for correct transcription to take place, and it is reasonable to include this as part of the gene, even though it does not get transcribed itself. So, the problem of initiation is really one of recognition of a starting point. But which of the two strands of the DNA serves as the template and how does the polymerase choose?

Either strand can serve as the template, but RNA is always synthesized in the 5' to 3' direction, which means that the polymerase reads the template strand from its 3' end toward its 5' end. The strand that serves as the template (read 3' to 5') is called the "antisense" or noncoding strand, and the complementary strand, written 5' to 3' (which has the same nucleotide sequence as the subsequently transcribed mRNA, with the exception of "U"s for "T"s), is the "sense" or "coding" strand. To be consistent and clear, we will use the convention that our description of position along a sequence of nucleotides will be from the point of view of the sense strand, as this is the same ordering as that of the mRNA that is transcribed. The part of the gene that serves as the initiation site is called the "promoter" and it is sought out by the RNA polymerase holoenzyme. The holoenzyme binds weakly and nonspecifically to DNA, with a dissociation constant (Kd) of about 10^-7 M, and this allows it to move along the DNA in search of the promoter. The σ subunit is specific for its promoter sequence, and once the promoter is found the holoenzyme binds tightly (Kd of about 10^-14 M).
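The relationship between the coding strand, the template strand and the mRNA can be made concrete with a short sketch. The 9-nucleotide coding sequence is a made-up example, and the function names are hypothetical; only the base-pairing rules come from the text.

```python
# Watson-Crick partners for DNA bases.
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
# RNA base paired with each DNA template base during transcription.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna_5_to_3: str) -> str:
    """Return the opposite strand, also written 5' to 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna_5_to_3))


def transcribe(template_3_to_5: str) -> str:
    """Read the template 3' to 5' and build the mRNA 5' to 3'."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_3_to_5)


coding = "ATGGCTTAA"                          # hypothetical coding strand, 5' to 3'
template_5_to_3 = reverse_complement(coding)  # template strand, 5' to 3'
template_3_to_5 = template_5_to_3[::-1]       # same strand as the polymerase reads it
mrna = transcribe(template_3_to_5)

print(template_3_to_5)  # TACCGAATT
print(mrna)             # AUGGCUUAA

# The mRNA matches the coding strand with U in place of T.
assert mrna == coding.replace("T", "U")

# The -10 consensus TATAAT on the coding strand corresponds to
# ATATTA on the template strand when that strand is written 3' to 5'.
print(reverse_complement("TATAAT")[::-1])  # ATATTA
```

Because the mRNA reads the same as the coding strand, writing sequences from the coding strand in the 5' to 3' direction lets one sequence stand for both.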

The promoter is recognized by an approximately 40 bp nucleotide sequence on the 5' side of the initiation site, and within this sequence are two "conserved" sequences. One of these is 6 bp in length and is centered about 10 bp upstream from the starting site of transcription. This is the "Pribnow Box" and it has a consensus sequence TATAAT. The other, less highly conserved, sequence is centered about 35 bp upstream and has a consensus sequence of TTGACA. The start site is indicated by the notation +1 and is almost always A or G.
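As an illustration of how the two consensus hexamers might be located in a coding-strand sequence, here is a minimal sketch. It is a naive mismatch-counting scan, not a real promoter predictor; the allowed spacer of 15 to 19 bp between the hexamers and the mismatch threshold are assumptions, and the test sequence is invented.

```python
MINUS_35 = "TTGACA"  # consensus from the text
MINUS_10 = "TATAAT"  # Pribnow box consensus from the text


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq, max_mismatch=2, spacers=range(15, 20)):
    """Return (pos_35, pos_10, total_mismatches) for candidate promoters.

    seq is the coding strand written 5' to 3'. max_mismatch applies to each
    hexamer separately; spacers is the assumed range of gap lengths (bp)
    between the end of the -35 hexamer and the start of the -10 hexamer.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 6 + 1):
        m35 = mismatches(seq[i:i + 6], MINUS_35)
        if m35 > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            if j + 6 > len(seq):
                break
            m10 = mismatches(seq[j:j + 6], MINUS_10)
            if m10 <= max_mismatch:
                hits.append((i, j, m35 + m10))
    return hits


# Invented sequence: perfect hexamers separated by a 17 bp spacer.
example = "GG" + "TTGACA" + "A" * 17 + "TATAAT" + "GCGCGCA"
print(find_promoters(example, max_mismatch=0))  # [(2, 25, 0)]
```

Real promoters rarely match both consensus hexamers perfectly, which is why a mismatch tolerance is included.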

RNA polymerase holoenzyme contacts the promoter at roughly the centers of the two regions (-10 and -35) and the core enzyme tightly binds to the duplex DNA. Its action is that of melting the double-stranded DNA along a sequence of about 11 bp, from -9 to +2. The σ factor splits off as transcription begins.
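The count of 11 bp for the region from -9 to +2 follows from the numbering convention, in which the start site is +1, the base before it is -1, and there is no position 0. A small sketch of that bookkeeping (the function name is hypothetical):

```python
def span_length(start: int, end: int) -> int:
    """Number of positions from start to end inclusive.

    Uses promoter coordinates, where +1 is the transcription start site,
    -1 is the base just upstream of it, and position 0 does not exist.
    """
    if start == 0 or end == 0:
        raise ValueError("position 0 does not exist in this numbering")
    if start > end:
        raise ValueError("start must not be downstream of end")
    length = end - start + 1
    if start < 0 < end:
        length -= 1  # skip the nonexistent position 0
    return length


print(span_length(-9, 2))  # 11
```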

It is the specific σ factors within a cell that determine which genes will be transcribed. Thus individual cell types are characterized by their σ factors.

Chain elongation proceeds in the 5' to 3' direction, and the "transcription bubble" (the length of "melted" DNA) travels with the RNA polymerase. As a consequence, the unmelted DNA is overwound in front of the bubble and underwound behind the bubble. Topoisomerases then act to relax the positive and negative supercoils. The mRNA that is produced is hybridized for a short length to the DNA at the downstream position, and exists separate from the DNA as a "tail", the point of attachment being at the downstream end. The RNA polymerase does not fall off of the DNA as it is processing because of its relatively tight, but nonspecific, binding on both sides of the transcription bubble, stabilized by its "thumb" wrapping around the DNA. About 20 to 50 nucleotides are transcribed per second at 37 °C and one nucleotide is incorrectly transcribed in about every 10^4. As genes are repeatedly transcribed, this error rate is not too deleterious, especially when coupled with the fact that there are multiple codons ("synonyms") for each amino acid subsequently translated and that single amino acid substitution errors in a protein usually do not hinder its function.
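To get a feel for these numbers, the sketch below estimates how long a transcript takes to make and how often it contains an error. The 1000-nucleotide transcript length is a hypothetical example, and the calculation assumes that errors occur independently at each position.

```python
ERROR_RATE = 1e-4  # about one misincorporation per 10^4 nucleotides (from the text)


def transcription_time_s(length_nt: int, rate_nt_per_s: float) -> float:
    """Seconds needed to transcribe length_nt nucleotides at a constant rate."""
    return length_nt / rate_nt_per_s


def expected_errors(length_nt: int, error_rate: float = ERROR_RATE) -> float:
    """Mean number of misincorporated nucleotides per transcript."""
    return length_nt * error_rate


def fraction_error_free(length_nt: int, error_rate: float = ERROR_RATE) -> float:
    """Fraction of transcripts with no errors, assuming independent errors."""
    return (1.0 - error_rate) ** length_nt


length = 1000  # hypothetical transcript length in nucleotides
print(transcription_time_s(length, 20))        # 50.0 seconds at the slow end
print(transcription_time_s(length, 50))        # 20.0 seconds at the fast end
print(round(expected_errors(length), 3))       # 0.1
print(round(fraction_error_free(length), 3))   # 0.905
```

Under these assumptions roughly nine out of ten such transcripts are error-free, which fits the point that the error rate is tolerable.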

Spontaneous termination of gene transcription is signaled by "termination sequences". In E. coli, the final signal to stop transcription is a series of 4 to 10 A-T base pairs with the As on the template strand. For each A in this region, the mRNA transcript will have a U. Just upstream from this sequence is a region rich in G and C bases followed by a spacer of nucleotides and another region rich in G and C. The two G,C-rich regions are such that one region can be superimposed upon the other by a rotation of 180°. This relationship of base pairs around a center of rotational symmetry is called a "palindromic sequence". The resulting string of nucleotides at the 3' end of the mRNA is such that a hairpin loop can form, the Gs base-pairing with the Cs and vice versa, and the As with the Us. The most terminal part of the 3' end is a series of Us followed by a hydroxyl group. As the loop is forming, the RNA polymerase pauses at the termination site. The terminal oligo-U tail, which is only weakly bound to the DNA template strand, is displaced by the non-template DNA strand. Now the mRNA strand is free of the DNA template. However, there are numerous other factors that influence the overall process of termination.
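The features described above (a G,C-rich inverted repeat that can fold into a hairpin, followed directly by a run of Us) can be searched for with a simple scan. This is only a sketch: the stem length, loop sizes, GC threshold and the requirement for perfect base pairing are all assumptions chosen for illustration, and the example RNA is invented.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def rna_reverse_complement(rna: str) -> str:
    """Return the sequence that would base-pair with rna, written 5' to 3'."""
    return "".join(RNA_PAIR[base] for base in reversed(rna))


def find_terminators(rna, stem=5, loops=range(3, 9), min_u=4, min_gc=0.6):
    """Return (stem_start, loop_length) for hairpins followed by a U tract.

    A hit needs a GC-rich left arm of `stem` bases, a loop whose length is
    in `loops`, a perfectly complementary right arm, and then at least
    `min_u` consecutive Us. All thresholds are illustrative assumptions.
    """
    hits = []
    for i in range(len(rna) - 2 * stem - min(loops) - min_u + 1):
        left = rna[i:i + stem]
        gc_fraction = sum(base in "GC" for base in left) / stem
        if gc_fraction < min_gc:
            continue
        for loop in loops:
            right_start = i + stem + loop
            right = rna[right_start:right_start + stem]
            tail = rna[right_start + stem:right_start + stem + min_u]
            if right == rna_reverse_complement(left) and tail == "U" * min_u:
                hits.append((i, loop))
    return hits


# Invented transcript end: stem GCCGC, loop GAAA, stem GCGGC, then six Us.
example_rna = "AC" + "GCCGC" + "GAAA" + "GCGGC" + "UUUUUU"
print(find_terminators(example_rna))  # [(2, 4)]
```

Real terminator hairpins often contain mismatches or bulges, so a practical tool would score stability rather than require exact pairing.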

Nonspontaneous termination of transcription requires a "rho factor" protein, which also functions to improve the spontaneous termination efficiency. The rho factor recognizes a sequence on the growing mRNA chain, upstream from the termination site, after which it attaches and moves along the chain in the 5' to 3' direction until it reaches the RNA polymerase that is paused at the termination site. The transcript is released from its template strand when the rho factor unwinds the RNA-DNA duplex.

Transcription in Eukaryotes:

While very similar to that in prokaryotes, the "machinery" and control sequences of transcription in eukaryotes are much more complex, and there are numerous RNA polymerases.

Ribosomal RNA (rRNA) constitutes about 95% of all RNA and about 67% of the mass of ribosomes. The remainder of RNA includes transfer RNA (tRNA), messenger RNA (mRNA) and other types present in smaller amounts, like "small nuclear" RNAs (snRNAs) involved in mRNA splicing and "guide" RNAs that are involved in editing of RNA. These latter two processes occur in the post-transcriptional stage of the life cycle of eukaryotic mRNA. All RNAs are coded for by DNA, and the different types of RNA polymerase in eukaryotes reflect this and the fact that, in eukaryotes, translation of mRNA into protein occurs outside of the nucleus.

Precursors of most rRNA are synthesized in nucleoli with the enzyme RNA polymerase I. Precursors of mRNA are synthesized in the nucleoplasm by RNA polymerase II while RNA polymerase III, also in the nucleoplasm, synthesizes precursors of 5S RNA, tRNAs and other RNAs found both in the nucleus and cytoplasm. Mitochondria have their own RNA polymerases, and these are analogous to the chloroplast RNA polymerases found in plants. We will focus on RNA polymerase II as it is the one that transcribes protein-coding genes into mRNA precursors.

You can look at the structure of yeast RNA polymerase II (see PDB below) as we discuss its structure as a prototype. These are large, multisubunit enzymes, with some of the subunits being homologs of the α, β, and β' subunits in the prokaryotic RNA polymerase. The overall shape of the enzyme is similar to that of the prokaryotic RNA polymerase (and DNA polymerase), namely that of a hand with a "thumb" motif that flanks a channel big enough to contain a piece of B-DNA (about 25 Å wide).

We did not yet consider the chemistry of the elongation of the mRNA chain, but we will do so here. The chains are elongated in the 5' to 3' direction by nucleophilic attack of the 3' OH group of the growing chain on the α-phosphate of the incoming NTP, with release of pyrophosphate.

As in prokaryotes, eukaryotic transcription begins by recognition of promoters. There are many copies of the rRNA genes that direct rRNA synthesis, all with almost identical sequences. This redundancy assures an adequate supply of rRNA which, as we mentioned previously, comprises about 95% of cellular RNA. The promoters for these almost identical genes are, therefore, identical, so RNA polymerase I must only recognize one promoter sequence. However, the RNA polymerase I is species-specific (RNA poly II and III are not species specific).

For transcription of mammalian rRNA genes, there is a "core promoter element" that spans the region -31 to +6 (note that this overlaps a region of the gene that is transcribed) and an "upstream promoter element" that spans -187 to -107.

For transcription of genes by RNA polymerase III, the promoter is sometimes located in a segment within the transcribed part of the gene, between +40 and +80, but can also be partially upstream or entirely upstream from the start site.

RNA Polymerase II Promoters and Control Sequences

Promoter sequences for RNA polymerase II are diverse. We can divide these into two classes: those that are found in genes that produce proteins at about the same rate in all cells ("constitutive enzymes") and those for genes whose production rates vary greatly from one cell type to another and depend upon the needs of a differentiated cell at a given time ("inducible enzymes").

Constitutive Gene Promoter Elements:

The GC Box : This is a region containing one or more copies of the sequence GGGCGG (or its complement) in a location upstream from the start site, and it is analogous to the prokaryotic promoter elements.

Other promoter elements are also found in the -50 to -110 region upstream from the GC box.
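Because the GC box can occur as GGGCGG or as its complement, a search has to check both strands. On a sequence written 5' to 3', a box lying on the opposite strand appears as the reverse complement, CCGCCC. A minimal sketch, with an invented test sequence and a hypothetical function name:

```python
GC_BOX = "GGGCGG"          # GC box as written in the text
GC_BOX_REVCOMP = "CCGCCC"  # how a box on the opposite strand appears


def find_gc_boxes(seq: str):
    """Return sorted (position, strand) pairs for every GC box in seq.

    seq is one strand written 5' to 3'. "+" marks a box on that strand and
    "-" marks a box on the complementary strand. Positions are 0-based.
    """
    seq = seq.upper()
    hits = []
    for motif, strand in ((GC_BOX, "+"), (GC_BOX_REVCOMP, "-")):
        start = seq.find(motif)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(motif, start + 1)  # allow overlapping matches
    return sorted(hits)


print(find_gc_boxes("ATCCGCCCTTGGGCGGAT"))  # [(2, '-'), (10, '+')]
```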

Selectively Expressed Gene Promoter Elements:

The TATA Box : A region located at about -25 to -30 that is rich in the nucleotides "A" and "T" and that resembles the Pribnow Box (TATAAT). Genes can still be transcribed in the presence of a defective TATA box, and it is thought that the TATA box is involved in choosing the transcription start site.

The CCAAT Box : This is a sequence that is often found upstream to the TATA box, located at about -70 to -90. These bind RNA polymerase II as well as other proteins needed for initiation of transcription.

Control Sequences for Structural Genes:

Other regions of the chromosome, some far-removed from the start site, can affect the binding of RNA polymerase II to promoter elements. These gene elements are called "enhancers" and "silencers". Proteins called "activators" and "repressors" can bind to the enhancers and silencers, thus affecting polymerase binding to the promoters. Furthermore, the same protein can function as either an activator or a repressor, depending upon the specific interaction ("dual-acting" transcription factors).

Recruitment of RNA Polymerase II to the Promoter:

Eukaryotes do not have a single protein that corresponds to the σ factor in prokaryotes. Rather, there is a set of proteins that together perform the same function as the σ factor, and these are the "general transcription factors" ("GTFs"). We have already looked at structures of transcription factors when we discussed DNA-protein interaction in a previous lecture. Otherwise, the general mechanisms of transcription initiation are similar.

There are 6 GTFs that are required for a low and invariant basal rate of transcription, and this rate can be increased by the participation of other protein factors. These GTFs form a "preinitiation complex" that begins when the "TATA binding protein" ("TBP") binds to the TATA box (if there is one) of a promoter. The specific sequence at which it binds identifies the transcription start site. As a result of this binding, the DNA is distorted by kinks at both ends of the TATA box. Other GTFs bind successively, followed by the binding of RNA polymerase. Finally, the remaining GTFs bind.

After TBP (which is a component of TFIID) binds, the remaining factors bind in the following order: TFIIA, TFIIB, RNA polymerase II together with TFIIF, TFIIE, and finally TFIIH.

TFIIH has two important enzyme activities. The first is an ATP-dependent helicase activity that assists the formation of an open complex and the second is a kinase activity that results in the phosphorylation of the largest subunit of RNA polymerase II at its C-terminal end. Now the transcription elongation process can begin, with the various GTFs (except TFIIF) dissociating from the complex as elongation occurs. TFIID remains bound to the promoter so that repeated transcription can occur as GTFs reassemble to form the preinitiation complex.

This discussion has focused on RNA polymerase II; different transcription factors are needed for RNA polymerases I and III. However, all three require TBP.

Cells control the transcription of every gene individually. A unique combination of silencers and enhancers for each gene modulates the transcription rate. How do activator and repressor proteins that are bound far from the promoter influence the transcription of genes?

"Specificity protein 1" (Sp1) was the first human transcription factor found to recognize a specific GC regulatory enhancer sequence. This protein has two interesting modules:

(1) A module of 3 zinc fingers at one end

(2) A module at the opposite end with 2 discrete segments rich in Gln.

Mutants that do not have the glutamine-rich end can bind to DNA but transcription is not stimulated. Therefore, the glutamine-rich end must need to bind to something else for transcription to occur, and these are the "coactivators". They are also called "TBP-Associated Factors" or "TAFs" and there are at least eight of them that are important to transcriptional activation. These are not basal factors (GTFs) and they do not bind to specific DNA sequences. Rather, they bind avidly to TBP and provide multiple "docking sites" for the activators. In this sense, they are "adaptor molecules". A "toolkit" of such adaptor molecules provides for tremendous diversity of options to modulate transcription of a gene. So, expanding on our previous comparison of the preinitiation complex of GTFs to the prokaryotic σ factor, a better comparison would be between the σ factor and the entire complex of activator-coactivator-basal preinitiation complex. As to how this arrangement modulates or influences the rate of transcription, it is probably mediated primarily by distortion of DNA that facilitates the movement of RNA polymerase II along the coding region.

Latchman (TRENDS in Biochemical Sciences Vol. 26 No. 4 April 2001) has pointed out the importance of the DNA binding site itself as playing a key role in transcriptional modulation. The same transcription factor can assume different conformations as a result of binding to different sites. The conformational changes are induced by the DNA-protein interaction, thereby increasing the flexibility of the spectrum of control of transcription, since one protein can act like an entire collection of proteins, each having its own effect (activation, inhibition or no effect).

To carry this one step further, one can imagine that a similar phenomenon can occur when coactivators bind to activators. Perhaps different conformational changes are similarly induced in the bound protein depending upon the type of protein-protein interaction. Such conformational changes can then result in a different ability to modulate transcription.

The activation domains of transcription factors are often glutamine-rich, but others are proline-rich or acidic. In some cases, hydrophobic residues are interspersed among the acidic or glutamine residues and are important for activation. Tjian (Cell, Vol. 77, 5-8, April 8, 1994) suggests that hydrophobic forces drive cohesion of activation domains with their targets and that specificity is achieved by the periodicity of the cohesive elements.

Genes are transcribed at measurable rates only if the correct activators are present and are able to overcome the effects of repressors.

