9: Protein Conservation - Biology

Learning Objectives

At the end of this laboratory, students should be able to:

  • identify amino acids by their 1-letter code.
  • explain the differences between high and low scores on the BLOSUM 62 matrix.
  • use the BLASTP algorithm to compare protein sequences.
  • identify conserved regions in a multiple sequence alignment.

As species evolve, their proteins change. The rate at which an individual protein sequence changes varies widely, reflecting the evolutionary pressures that organisms experience and the physiological role of the protein. Our goal this semester is to determine if the proteins involved in Met and Cys biosynthesis have been functionally conserved between S. pombe and S. cerevisiae, species that are separated by close to a billion years of evolution. In this lab, you will search databases for homologs of S. cerevisiae sequences in several species, including S. pombe. Homologs are similar DNA sequences that are descended from a common gene. When homologs are found in different species, they are referred to as orthologs.

Homologs within the same genome are referred to as paralogs. Paralogs arise by gene duplication, but diversify over time and assume distinct functions. Although a whole genome duplication occurred during the evolution of S. cerevisiae (Kellis et al., 2004), only a few genes in the methionine superpathway have paralogs. Interestingly, MET17 is paralogous to three genes involved in sulfur transfer: STR1 (CYS3), STR2 and STR4, reflecting multiple gene duplications. The presence of these four distinct enzymes confers unusual flexibility to S. cerevisiae in its use of sulfur sources. The SAM1 and SAM2 genes are also paralogs, but their sequences have remained almost identical, providing functional redundancy if one gene is inactivated (Chapter 6).

Protein function is intimately related to its structure. You will recall that the final folded form of a protein is determined by its primary sequence, the sequence of amino acids. Protein functionality changes less rapidly during evolution when the amino acid substitutions are conservative. Conservative substitutions occur when the size and chemistry of a new amino acid side chain is similar to the one it is replacing. In this lab, we will begin with a discussion of amino acid side chains. You will then use the BLASTP algorithm to identify orthologs in several model organisms. You will perform a multiple sequence alignment that will distinguish regions which are more highly conserved than others.

As you work through the exercises, you will note that protein sequences in databases are written in the 1-letter code. Familiarity with the 1-letter code is an essential skill for today’s molecular biologists.

Protein Conservation and Variation Suggest Mechanisms of Cell Type-Specific Modulation of Signaling Pathways

Affiliations EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Barcelona, Spain, Universitat Pompeu Fabra (UPF), Barcelona, Spain, Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain

Affiliations EMBL/CRG Systems Biology Research Unit, Centre for Genomic Regulation (CRG), Barcelona, Spain, Universitat Pompeu Fabra (UPF), Barcelona, Spain

The conservation pattern of short linear motifs is highly correlated with the function of interacting protein domains

Background: Many well-represented domains recognize primary sequences usually less than 10 amino acids in length, called Short Linear Motifs (SLiMs). Accurate prediction of SLiMs has been difficult because they are short (often < 10 amino acids) and highly degenerate. In this study, we combined scoring matrices derived from peptide libraries with conservation analysis to identify protein classes enriched in functional SLiMs recognized by SH2, SH3, PDZ and S/T kinase domains.

Results: Our combined approach revealed that SLiMs are highly conserved in proteins from functional classes that are known to interact with a specific domain, but that they are not conserved in most other protein groups. We found that SLiMs recognized by SH2 domains were highly conserved in receptor kinases/phosphatases, adaptor molecules, and tyrosine kinases/phosphatases, that SLiMs recognized by SH3 domains were highly conserved in cytoskeletal and cytoskeletal-associated proteins, that SLiMs recognized by PDZ domains were highly conserved in membrane proteins such as channels and receptors, and that SLiMs recognized by S/T kinase domains were highly conserved in adaptor molecules, S/T kinases/phosphatases, and proteins involved in transcription or cell cycle control. We studied Tyr-SLiMs recognized by SH2 domains in more detail, and found that SH2-recognized Tyr-SLiMs on the cytoplasmic side of membrane proteins are more highly conserved than those on the extra-cellular side. Also, we found that SH2-recognized Tyr-SLiMs that are associated with SH3 motifs and a tyrosine kinase phosphorylation motif are more highly conserved.

Conclusion: The interactome of protein domains is reflected by the evolutionary conservation of SLiMs recognized by these domains. By combining scoring matrices derived from peptide libraries with conservation analysis, we can find the protein groups that are more likely to interact with specific domains.

Conservation Exercises

You may find NCBI’s Amino Acid Explorer helpful for this exercise.

1. Under the amino acid sequence below, write the same sequence using the 1-letter code. Met-Glu-Asn-Asp-Glu-Leu-Pro-Ile-Cys-Lys-Glu-Asp-Pro-Glu-Cys-Lys-Glu-Asp

2. What is the net charge of this peptide? (Assign -1 for each acidic amino acid and +1 for each basic amino acid. Add up the total charges.)

3. Using the Venn diagram above, propose a conservative substitution for:
Trp – His – Arg – Leu –

4. Write the name of a music group that you enjoy. Then transpose the name into an amino acid sequence written with the 3-letter code. Pass the amino acid sequence to a friend and have him/her decode it. (Note: the 1-letter code uses all of the alphabet, except B, J, O, U, X and Z).
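
The conversion and charge-counting steps in items 1 and 2 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original exercise. The function names are hypothetical, and the charge rule is the simplified one stated in item 2, with His counted as neutral (an assumption, since the exercise does not say how to treat it). The usage example uses a different peptide from the one in item 1.

```python
# Standard 3-letter to 1-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}

# Simplified side-chain charges from the exercise: -1 for acidic, +1 for basic.
# Assumption: His is treated as neutral here.
SIDE_CHAIN_CHARGE = {"D": -1, "E": -1, "K": +1, "R": +1}


def to_one_letter(peptide: str) -> str:
    """Convert a hyphen-separated 3-letter peptide into the 1-letter code."""
    return "".join(THREE_TO_ONE[residue] for residue in peptide.split("-"))


def net_charge(one_letter: str) -> int:
    """Sum the simplified side-chain charges over a 1-letter sequence."""
    return sum(SIDE_CHAIN_CHARGE.get(aa, 0) for aa in one_letter)


example = to_one_letter("Gly-Lys-Asp-Arg-Ser")
print(example, net_charge(example))
```

The terminal amino and carboxyl groups are ignored, as in the exercise's own counting rule.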

Exercise 2 – The BLOSUM62 matrix
  1. Find the BLOSUM scores for the conservative substitutions that you suggested in Exercise 1. Does the BLOSUM data support your hypotheses?
  2. Find the two substitutions with the highest BLOSUM scores. In what ways are the biochemical properties of the substituted amino acid similar or dissimilar to the amino acid that it replaces?
  3. Find the three amino acids for which there is no evidence of amino acid substitutions that have occurred more frequently than predicted by chance alone. What special features do these amino acids have?
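
BLOSUM scores are log-odds values: they compare how often a pair of amino acids is observed aligned with how often the pair would be expected by chance. The sketch below shows that relationship, assuming the half-bit scaling commonly quoted for BLOSUM62 (score = 2 × log2 of the observed/expected ratio, rounded). The function name and the frequencies in the usage lines are made up for illustration and are not taken from the real matrix.

```python
import math


def log_odds_score(observed_freq: float, expected_freq: float) -> int:
    """Return a rounded half-bit log-odds score.

    A positive score means the pair is seen more often than chance predicts,
    zero means about as often, and a negative score means less often.
    """
    return round(2 * math.log2(observed_freq / expected_freq))


# Hypothetical frequencies, chosen only to show the sign of the score.
print(log_odds_score(0.020, 0.005))  # pair seen 4x more often than chance
print(log_odds_score(0.005, 0.005))  # pair seen exactly as often as chance
print(log_odds_score(0.005, 0.020))  # pair seen 4x less often than chance
```

Read this way, a high positive entry marks a substitution that evolution tolerates well, and a strongly negative entry marks one that is rarely accepted.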
Exercise 3 – Using BLASTP
  1. Direct your browser to NCBI BLAST. Choose Protein BLAST.
  2. Enter the NP_ number for the S. cerevisiae protein that your group is studying.
  3. Choose the records to be searched. For the database, select reference proteins. For the organism, type Neurospora crassa. (This is taxid 5141 from the dropdown box.)
  4. Expand the algorithm parameters at the bottom of the page. We will use the default values for the word size (=3), threshold value (=10) and a gap penalty (=11). The search could be made more stringent by increasing the word size, threshold value or gap penalty. The search could be made less stringent by decreasing these values.
  5. Click BLAST and wait for the results to appear.
  6. Analyze the results page:
    • The graphic summary at the top gives you an instant overview about the extent and strength of the match with N. crassa sequences. Colors are used to distinguish alignments with different ranges of bit scores. The top line represents a match between the S. cerevisiae Metp protein and its closest N. crassa ortholog. There may be shorter and less significant matches with other N. crassa protein sequences.
    • The summary table provides the numerical data. Matches with an E-value of 1E-10 or less and total scores above 100 are likely to be significant.
    • Cursor down to see the actual alignment between the sequences. Dashes have been introduced to either the S. cerevisiae or N. crassa sequence where gaps interrupt the alignments. The center row summarizes the homology between the protein sequences. If an amino acid is conserved between the two species, its 1-letter code is shown. Plus signs indicate conservative substitutions, i.e. substitutions with BLOSUM values of 1 or more.
    • Record your XP number: XP_______________ (XP_ is a prefix used to identify “modelled” sequences, in this case a protein sequence identified by sequence analysis but not necessarily functional confirmation. This is true of any gene, mRNA or protein with an X prefix).
    • Click on the link to the XP_ record for the N. crassa ortholog. Upon scrolling down, you should find the “CDS” designation providing you information about the gene that encodes your protein.
    • Find the NCU number, this is the N. crassa database designation for the gene which encodes your protein. Record the NCU number (you’ll need it later today) ______________
    • Also note whether your gene has a gene name. For instance, the N. crassa MET2 homolog is known as MET5. Not all genes have been named (more fertile ground for us to characterize!). Does your gene have a name? If so, record it here: ___________________
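
The center row of a BLASTP alignment, described in the bullets above, follows a simple rule that can be sketched in Python: identical residues are echoed, substitutions with a positive score get a plus sign, and everything else (including gaps) is left blank. The sketch is illustrative only. The function name is hypothetical, and the score table is a small hand-copied excerpt of positive BLOSUM62 entries, so any pair missing from it is treated as non-positive (an assumption that would not hold for the full matrix).

```python
# Small excerpt of positive off-diagonal BLOSUM62 scores (illustration only).
# Assumption: any pair not listed here is treated as scoring 0 or less.
POSITIVE_PAIRS = {
    frozenset("KR"): 2, frozenset("DE"): 2, frozenset("IV"): 3,
    frozenset("FY"): 3, frozenset("IL"): 2, frozenset("LM"): 2,
    frozenset("ST"): 1, frozenset("EQ"): 2, frozenset("DN"): 1,
}


def midline(query: str, subject: str) -> str:
    """Build the center row for two gapped, equal-length aligned sequences."""
    if len(query) != len(subject):
        raise ValueError("aligned sequences must have the same length")
    row = []
    for q, s in zip(query, subject):
        if q == "-" or s == "-":
            row.append(" ")   # gap in either sequence
        elif q == s:
            row.append(q)     # identical residue: show its 1-letter code
        elif POSITIVE_PAIRS.get(frozenset((q, s)), 0) > 0:
            row.append("+")   # conservative substitution
        else:
            row.append(" ")   # non-conservative substitution
    return "".join(row)


print(repr(midline("MKDL-V", "MRELAI")))
```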
Exercise 4 – Multiple sequence alignments

BLASTP gives a pairwise alignment of sequences that is very useful for identifying homologs. Multiple sequence alignments compare a larger number of sequences simultaneously. By comparing a larger number of sequences over a wider evolutionary range, multiple sequence alignments allow researchers to identify regions of a protein that are most highly conserved, and therefore, more likely to be important for the function of a protein. In this exercise, we will study conservation of protein sequences in a number of model organisms that are widely used in genetic studies. The genomes for model organisms have been sequenced, and techniques for genetic analysis are well-developed. In addition, database and clone resources are available to support research with model organisms. The organisms below have been selected because they represent important branches of evolution and because they are potential candidates for future research in this course.

Bacteria – these represent two major divisions of the bacteria

  • Escherichia coli strain K-12 (gram negative K-12 is the standard laboratory strain)
  • Bacillus subtilis strain 168 (gram positive reference strain)

Eukaryotes – model organisms

  • Saccharomyces cerevisiae – needs to be included in trees and alignments!
  • Neurospora crassa
  • Arabidopsis thaliana – thale cress model organism for flowering plants
  • Caenorhabditis elegans – nematode model organism used in developmental studies
  • Mus musculus – laboratory mouse
Collect the sequences and BLAST data

The first step in a multiple sequence alignment is to collect the sequence data and analyze the BLASTP data that compare the sequences with the S. cerevisiae sequence. We will be using the reference sequences for the organisms, which begin with an NP___ number. Since you already know how to find NP____ records and use BLASTP, we will take some shortcuts to finding the remaining numbers and BLASTP statistics. For the eukaryotic sequences, we will use BLASTP data that are already available in NCBI’s Homologene database (Sayers et al., 2012). The accession numbers for the bacterial species will be available on Canvas and in the lab.

Click on Release Statistics to see the species that have been included in the BLASTP searches. Enter the name of your gene into the search box. This brings up the various Homologene groups that have a gene with that name. If the search brings you to a page that lists more than one Homologene group, click on the Homologene group that contains the S. cerevisiae gene.

Record the accession number for the Homologene group:

The top line of a Homologene record provides the accession number and summarizes the taxonomic distribution of homologs in eukaryotes (“Gene conserved in _________”) A narrowly conserved protein might only be found in the Ascomycota, while a widely-distributed protein would be found in the Eukaryota.

What phylogenetic divisions have homologs of your gene?

The left column of each Homologene record has links to comprehensive gene summaries prepared by NCBI curators. The right column has links to the NP___ records and a graphic showing conserved domains in the homologs. (Domains are noted with different colors.)

How many domains are found in the S. cerevisiae protein? Are the domains equally well-conserved between species?

Record the NP___ or XP___ numbers for homologs of your S. cerevisiae Metp protein in N. crassa, A. thaliana, C. elegans and M. musculus. Add the NP___ or XP___ numbers for E. coli and B. subtilis homologs from the posted data sheet. (Some bacterial records may have XP___ or ZP___ prefixes, because the proteins have not been studied experimentally.) If you have fewer than five entries, e.g. because the protein is narrowly restricted to Ascomycota, add two additional species from the Homologene group that contains your homolog.

NOTE: Does the N. crassa ortholog of your MET gene have a different name? You will need this information later in this chapter.

Next, perform a pairwise BLASTP alignment for each sequence against the S. cerevisiae sequence. Collecting BLASTP data is easy with Homologene: Use the grey box on the lower hand side of the page to set up each BLASTP comparison. Record the total score, % coverage and E-value for each match.

In the next step, you will prepare a multiple sequence alignment using the sequence information in the NP___ or XP ___ records. Using the BLASTP data, it may be possible to exclude some sequences from further study. The best matches will have high total scores and % coverage (fraction of the two proteins that are aligned) and low E-values. For the rest of this assignment, exclude sequences where the total score is less than 100 and E-values are greater than 1E-10.
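
The screening rule above can be written as a short filter. The sketch below is illustrative: the class and function names are hypothetical, the accessions in the usage example are placeholders rather than real records, and the rule is read strictly (a hit is kept only if its total score is at least 100 and its E-value is at most 1E-10), which is an assumption about how the two cutoffs combine.

```python
from dataclasses import dataclass


@dataclass
class BlastHit:
    accession: str       # placeholder label, not a real accession
    total_score: float
    coverage_pct: float
    evalue: float


def keep_hit(hit: BlastHit, min_score: float = 100.0,
             max_evalue: float = 1e-10) -> bool:
    """Keep a hit only if it passes both the score and the E-value cutoff."""
    return hit.total_score >= min_score and hit.evalue <= max_evalue


def filter_hits(hits: list[BlastHit]) -> list[BlastHit]:
    """Return the hits that remain for the multiple sequence alignment."""
    return [hit for hit in hits if keep_hit(hit)]


hits = [
    BlastHit("hit_A", 420.0, 95.0, 1e-140),
    BlastHit("hit_B", 85.0, 40.0, 1e-12),   # score too low
    BlastHit("hit_C", 150.0, 60.0, 1e-5),   # E-value too high
]
print([hit.accession for hit in filter_hits(hits)])
```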

Prepare the multiple sequence alignment.

We will use the Phylogeny suite of programs to construct a multiple sequence alignment and phylogenetic tree. Phylogeny describes itself as providing “Robust Phylogenetic Analysis for the Non-Specialist.” You will be working with material at two different sites, so you need two operational browser pages. One browser tab should remain at NCBI, where you will retrieve records. Direct the other browser page to

  1. Under the Phylogeny Analysis tab, select One Click. After you enter the data, your sequences will be automatically brought through multiple alignment and phylogenetic tree building algorithms. The advanced option on this page would allow you to adjust the parameters associated with each program. We will let Phylogeny make these decisions for us!
  2. Enter the protein sequence in FASTA format. To obtain a FASTA file, enter the NP__number into the search box of the NCBI Protein Database. (Alternatively, you can click to the NP_ record from the Homologene summary page.) The first sequence in your analysis should be the S. cerevisiae protein. Click the FASTA link at the upper left side of the NP record. Copy the title line, beginning with > and the entire amino acid sequence. Paste the FASTA sequence DIRECTLY into the Phylogeny text box. Repeat this step with each of the sequences that you would like to compare.
  3. Edit the title lines of the FASTA files to include ONLY the species name. (You will see why later!) Each FASTA title line must begin with a > symbol (bird-beak) and end with a hard return. These characters provide the punctuation for the computer. DO NOT use a rich-text editor or word processor to edit the FASTA files, since these introduce hidden punctuation that interferes with the phylogenetic analysis.
  4. When you are finished, enter your email address (this is useful if you want to come back to your analysis in the next few days) and click the Submit button. Your results will be posted on a web page.
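
The title-line editing in step 3 can also be done with a small script, which produces plain text and so avoids hidden formatting. The sketch below is one possible approach, not part of the original protocol. It assumes that each title line ends with the organism name in square brackets, as NCBI protein FASTA title lines usually do, and it joins genus and species with an underscore (an assumption made so that the label contains no spaces). The function name and the example record are hypothetical.

```python
import re


def species_only_headers(fasta_text: str) -> str:
    """Rewrite each FASTA title line so it contains only the species name."""
    output = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # Assumption: the organism appears in [brackets] at the end.
            match = re.search(r"\[([^\]]+)\]\s*$", line)
            if match is None:
                raise ValueError(f"no organism found in title line: {line}")
            genus_species = match.group(1).split()[:2]  # drop strain names
            line = ">" + "_".join(genus_species)
        output.append(line)
    return "\n".join(output) + "\n"


record = (
    ">XX_000000.0 example protein [Saccharomyces cerevisiae S288C]\n"
    "MKVLLA\n"
)
print(species_only_headers(record), end="")
```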
Export and print the multiple sequence alignment
  1. Click on the Alignment tab to view the multiple sequence alignment.
  2. Under outputs, ask for the alignment in ClustalW format. The ClustalW alignment appears on a new web page. Note that the bottom line of each cluster marks positions where the amino acid is invariant with an asterisk. Positions with conservative substitutions are marked with colons in the bottom line.
  3. Right-click on the page and download the Clustal alignment with a new filename that makes sense to you. The page will download as a text file that you will open in Word or a text editor.
  4. Open the file in a word processor. Adjust the font size and page breaks so that sequences are properly aligned and all members of a cluster fit on the same page. Choose a non-proportional font such as Courier so that the amino acids line up properly.
  5. Print the file and check that the format is correct! Turn it in with the Phylogeny assignment.
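
The asterisks in the bottom line of a ClustalW cluster mark columns in which every sequence has the same residue. The sketch below reproduces only that part of the annotation for a set of already-aligned sequences. The colon symbols depend on ClustalW's own residue groupings and are not reproduced here. The function name and the toy alignment are hypothetical.

```python
def invariant_line(aligned: list[str]) -> str:
    """Return a line with '*' under every fully invariant alignment column."""
    if not aligned:
        return ""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    marks = []
    for column in zip(*aligned):
        residues = set(column)
        # A column is invariant if it holds one residue and no gap.
        marks.append("*" if len(residues) == 1 and "-" not in residues else " ")
    return "".join(marks)


alignment = ["MKV-L", "MRVAL", "MKVAL"]
for seq in alignment:
    print(seq)
print(invariant_line(alignment))
```

Counting the asterisks and dividing by the alignment length gives a quick estimate of how strongly a region is conserved.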
Construct a phylogenetic tree.
  1. Click the Tree Rendering tab to access your phylogenetic tree.
  2. You may use the editing tools to alter the appearance of your tree. Pay particular attention to the legends in the “leaves” of the tree, which should have the species names.
  3. Download the file in a format of your choice. Print the file and turn it in with the phylogeny assignment.
Investigate N. crassa homologs
  • Return to NCBI Homologene. N. crassa should be one of the organisms listed as containing a homolog of your gene of interest. Under the gene accession, note the size of the homologous protein in amino acids (aa): _______________________
  • Click on the organism name, N. crassa, to be taken to your gene’s NCBI page. Under “Genomic regions, transcripts & products” note the location of your gene on the N. crassa chromosomes. Pay particular attention to the presence of any introns, small pieces of DNA that are spliced out of RNA before it is translated. In the graphical depiction of your gene, an intron is denoted by an unfilled portion of the gene arrow. Additionally, in the section labelled “Genomic context”, you are given an exon count. One exon would indicate 0 introns, while 2 exons would indicate 1 intron and so on.
  • Does your gene have any introns? If so, how many? ________________ Introns are very common in N. crassa genes, and had to be considered when cloning gene sequences to be expressed in S. cerevisiae. In fact, S. cerevisiae is unable to process introns out of foreign DNA, therefore, previously processed RNA (called cDNA) was used as a cloning template for our genes of interest.
A Neurospora crassa database

Like S. cerevisiae, N. crassa is a model organism with its own large community of researchers. The Broad Institute, located in Cambridge, MA, directs genomic searches of many organisms, including N. crassa. Access this database here.

Pathways over Time: functional genomics in the introductory lab

This site includes class materials from BIOL2040 (formerly BI204) – Investigations in Molecular Cell Biology. The course introduces students to molecular cell biology within the context of a semester-long research project in functional genomics. In the Pathways over Time project, students study the evolutionary conservation of genes in methionine synthesis. Each chapter includes both theoretical background material as well as detailed experimental procedures. Chapters can be used alone or in combination, depending on the course. All of the experiments have been thoroughly tested with Boston College students. Learn more about the course and student outcomes from our paper in CBE-Life Science Education or from our presentation at the 2015 ASMCUE meeting.

Students and educators are welcome to use our class materials, which were developed with the generous support of the National Science Foundation. We have tried to design a flexible course, and we would love to hear how the materials are being used. Please contact Clare O’Connor or Doug Warner for more information.

Investigations in Molecular Cell Biology by Clare M. O’Connor is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

C. elegans as a model to study the role of the CxCxxC motif

Our analysis shows that, in contrast to D. melanogaster, C. elegans is an excellent model to study the role of the separase regulatory domain, because it is orthologous to that in humans. As expected, multiple residues, distributed throughout this domain (Additional file 4), are conserved among nematode separases, which are different from residues conserved among vertebrate separases. None of these conserved residues are in contact with the separase inhibitory chaperone, securin. The nature of this conservation remains unclear. None of the highly conserved residues appear to form a contact with another similarly conserved residue, as judged by the proximity of beta carbons on the separase structure (Additional file 5). However, no changes in these residues are observed in human allelic variants or sequenced cancer populations, implying that mutations in these residues may be lethal (Additional file 6). More importantly, a mutation in the newly identified CxCxxC motif (C450Y) of the C. elegans separase (sep-1(e2406)) results in a temperature-sensitive phenotype that leads to exocytosis defects [12, 33]. We identified multiple intragenic suppressors of sep-1(e2406) that exclusively introduce mutations to the regulatory domain [34]. These positions are not conserved, do not contact securin and are distributed throughout the regulatory domain (Additional file 4). Understanding effects of these mutations will require further investigation, but our finding that the regulatory domain of separase has distinct conserved elements strongly supports its functional importance. In the future, it will be of great interest to determine the active conformation of separase and to investigate the functional role of this motif. Our analysis also demonstrates the utility of studies in C. elegans in understanding separase regulation in humans.


Comparisons between P. falciparum and P. chabaudi orthologs

The rodent malaria species Plasmodium chabaudi was used as comparator species given its status in terms of genome completion. Protein and DNA sequences were obtained from Plasmodb (version: 2009.03.24). Forty orthologous pairs of sequences were chosen, fulfilling the following criteria: no known or hypothesised role in drug resistance or host interaction, syntenic relationship between P. falciparum and P. chabaudi, and > 75% coverage in the P. reichenowi ortholog [18] (used for measurement of divergence within hominid malaria parasites). Since sexual-stage genes are released from purifying selection in asexual culture (experienced by several of the isolates under study) [43], genes with no evidence of asexual expression in transcriptomic surveys [44, 45] were also excluded. In order to reflect the types of genes which are implicated in drug resistance, as well as to obtain a range of conservation levels across this reference set, the reference genes consisted of ATPases (8), secondarily active transporters (12), glycolytic enzymes (10) and enzymes involved in DNA and RNA processing (10) (Table 1). The Plasmodb gene model for PfL0590c (PfATPase4) was found to be inconsistent with published data based on cDNA; the latter were used as coding sequence [23].

CLUSTALW alignments of orthologous protein sequences from P. falciparum and P. chabaudi were performed using the default settings of BioEdit. BLOSUM62 scores (reflecting conservation) were then calculated for each P. falciparum residue. Regions that could not be aligned between P. falciparum and P. chabaudi orthologs were defined as the gaps in BLASTP alignments of P. falciparum and P. chabaudi orthologs (BLOSUM62 matrix, gap penalties: existence 9, extension 2). A manual check of the protein alignment in BioEdit was also performed, and on rare occasions where short alignments had been excluded by the BLASTP search these were retained. For residues where there was no aligned P. chabaudi residue, a conservation score of -5 was applied. Relatively small regions with no P. reichenowi coverage were also removed from analysis to ensure comparable denominators for inter- and intra-species comparison.
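
The per-residue scoring described above can be sketched as follows. This is an illustration of the stated rule, not the authors' code. The function name is hypothetical, and the substitution matrix is passed in as a plain dictionary keyed by residue pairs, so the tiny table in the usage example stands in for the full BLOSUM62 matrix.

```python
UNALIGNED_SCORE = -5  # score given in the text to residues with no aligned partner


def residue_conservation(query_aln: str, ortholog_aln: str,
                         matrix: dict[tuple[str, str], int]) -> list[int]:
    """Return one conservation score per residue of the query sequence.

    Both inputs are rows of the same pairwise alignment, with '-' for gaps.
    """
    if len(query_aln) != len(ortholog_aln):
        raise ValueError("alignment rows must have the same length")
    scores = []
    for q, o in zip(query_aln, ortholog_aln):
        if q == "-":
            continue                        # no query residue in this column
        if o == "-":
            scores.append(UNALIGNED_SCORE)  # query residue has no partner
        else:
            scores.append(matrix[(q, o)])
    return scores


# Toy excerpt standing in for the full matrix (illustration only).
toy_matrix = {("M", "M"): 5, ("K", "R"): 2, ("D", "D"): 6}
print(residue_conservation("MK-DL", "MRAD-", toy_matrix))
```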

Annotation of polymorphisms and fixed differences

We analysed single-nucleotide polymorphisms (SNPs) derived from Plasmodb, based on available sequence for various P. falciparum strains from around the world generated by the Broad Institute [46], Wellcome Trust Sanger Institute [18] and NIH [47]. Fixed differences between P. falciparum 3D7 strain and P. reichenowi (Oscar strain) were also obtained from Plasmodb [18]. Radical amino acid substitutions were defined as those with BLOSUM62 matrix score < 0.

The challenge of identifying single nucleotide changes within a sequence that is undergoing frequent insertion-deletion polymorphism has been described [48]. In addition, we noted that although complex polymorphisms were said to be excluded in publications, the Plasmodb lists of SNPs within P. falciparum and fixed differences between P. falciparum 3D7 strain and P. reichenowi sometimes contained repetitive mutations within tandem repeats (confirmed by Pustell protein matrix, MacVector) that were clearly part of complex indel polymorphisms and hence not genuine SNPs. These regions contributed 27.5% of all SNPs among P. falciparum isolates and 9.4% of the interspecies divergence, and were excluded from both polymorphism and divergence analyses.

Calculation of positions and transitions

Calculation of synonymous and non-synonymous positions was undertaken for each P. falciparum ortholog using a standard substitution matrix (assuming equal mutation rates) with Jukes-Cantor correction [49]. Confidence intervals for dS were determined assuming a continuous distribution. Confidence intervals for dN/dS were determined using the delta method. Transitional bias was determined by studying synonymous fixed differences between P. falciparum and P. reichenowi orthologs occurring at amino acids encoded by four codons. Consistent with previous measurements on a chromosomal scale [50], synonymous sites made up 20.1% of all sites within the reference genes (see Table 4), with 8.9% of sites being 4-fold degenerate sites (nucleotide positions at which all mutations are synonymous). For synonymous differences at 4-fold degenerate sites, which we assume are selectively neutral, transitions made up 41.4% of changes and transversions 58.6%, consistent with a moderate transitional bias that would produce falsely low dN/dS ratios [51] (since transitional mutations are associated with degeneracy at many 2-fold degenerate sites). Taking this factor into account led to upwards revision of dN/dS ratios for divergence of between 12 and 16% according to the level of conservation; the effect was greatest for non-aligned sequence, where the adjusted dN/dS ratio rose to 0.64 (Table 1).
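
For readers unfamiliar with the Jukes-Cantor correction, the sketch below shows the standard formula, d = -(3/4) ln(1 - 4p/3), applied separately to the proportions of differing synonymous and non-synonymous sites. It is a generic illustration, not the authors' pipeline: the function names are hypothetical, the counts in the usage example are invented, and the adjustment for transitional bias described above is not included.

```python
import math


def jukes_cantor(p: float) -> float:
    """Correct an observed proportion of differing sites for multiple hits."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(nonsyn_diffs: float, nonsyn_sites: float,
          syn_diffs: float, syn_sites: float) -> float:
    """Return dN/dS from difference counts and site counts."""
    dn = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ds = jukes_cantor(syn_diffs / syn_sites)
    if ds == 0:
        raise ValueError("dS is zero; the ratio is undefined")
    return dn / ds


# Invented counts, for illustration only.
print(round(jukes_cantor(0.10), 4))
print(round(dn_ds(12, 800, 20, 200), 3))
```

A ratio well below 1 is usually read as purifying selection, because amino acid changes accumulate more slowly than silent changes.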

Studies of hydrophilicity and complexity

Hydrophilicity scores were measured by the Kyte-Doolittle index (window = 14). Low-complexity regions were defined using the SEG algorithm at its default parameters [52].
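
A windowed Kyte-Doolittle profile of the kind mentioned above can be computed as a moving average over the published hydropathy scale. The sketch below is a minimal version with a window of 14. The function name is hypothetical, and the handling of sequence ends (windows are only taken where they fit completely) is an assumption. On this scale positive values are hydrophobic, so hydrophilic stretches show up as negative averages.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 14) -> list[float]:
    """Return the mean hydropathy of every full window along the sequence."""
    values = [KYTE_DOOLITTLE[aa] for aa in sequence]
    if len(values) < window:
        return []  # assumption: no partial windows at the ends
    return [
        sum(values[start:start + window]) / window
        for start in range(len(values) - window + 1)
    ]


profile = hydropathy_profile("MKKDEEKRSDNQEKILLVVAIFLLAVIG")
print([round(score, 2) for score in profile])
```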

Drug-resistance chromosomal regions

P. chabaudi orthologs were again used to generate the conservation score. In the case of one gene (MAL8P1.111) there was no rodent malaria parasite ortholog as previously reported [53] and in consequence the syntenic P. vivax ortholog was used for comparison. For the single apicoplast gene rpl4 the partial P. chabaudi sequence PC103611.00.0 was available for generation of the cross-genus conservation score at sites of mutation. For all studies at drug-resistance regions a neighbourhood conservation score was used (averaging the individual conservation scores across a sliding, overlapping window of 9 residues). This allows for the possibility that drug-resistance mutations may occur at residues that have previously undergone conservative change within a wider area of conservation, thereby reducing stochastic loss of sensitivity. This also obviated the need for a specific step to identify non-aligned regions at genomic regions. Non-synonymous SNPs (nSNPs) between sensitive and resistant parasites were studied at each locus, spanning the drug-resistance gene in each case and extending outwards symmetrically until 10 nSNPs outside the drug-resistance gene itself had been documented in at least one pairwise comparison. All residues known to be intrinsic to resistance haplotypes, whether or not each individual residue has been shown to cause drug-resistance independently, were included. Chi-squared testing was undertaken testing whether the distribution of amino acid variants in terms of conservation level was the same as the distribution of total sequence, using 13 conservation levels (bins of 1), and hence 12 degrees of freedom. The test was performed first for drug-resistance genes and mutations, and then for the other genes and the mutations within them.
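
The chi-squared comparison described above asks whether variants fall into conservation bins in proportion to the amount of sequence in each bin. The sketch below computes the test statistic and degrees of freedom under that reading. It is an illustration rather than the authors' analysis: the function name is hypothetical, the counts are invented, and converting the statistic to a p-value is left to a statistics package.

```python
def chi_squared_statistic(variant_counts: list[int],
                          sequence_counts: list[int]) -> tuple[float, int]:
    """Compare variant counts per bin with the distribution of all sites.

    Returns the chi-squared statistic and the degrees of freedom (bins - 1).
    """
    if len(variant_counts) != len(sequence_counts):
        raise ValueError("both lists must have one entry per bin")
    total_variants = sum(variant_counts)
    total_sites = sum(sequence_counts)
    statistic = 0.0
    for observed, sites in zip(variant_counts, sequence_counts):
        expected = total_variants * sites / total_sites
        if expected == 0:
            raise ValueError("every bin must contain some sequence")
        statistic += (observed - expected) ** 2 / expected
    return statistic, len(variant_counts) - 1


# Invented counts for 13 conservation bins, for illustration only.
variants = [9, 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 2]
sites = [200] * 13
print(chi_squared_statistic(variants, sites))
```

With 13 bins the function reports 12 degrees of freedom, matching the figure given in the text.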

Conserved protein terminal

I was wondering what it meant for a protein/a protein domain/a protein terminal to be conserved.

The term 'conserved' in this sense refers to the amino acid sequence of a protein or a protein domain (a part of a protein). If a sequence is conserved, that means it can be found in multiple species that are distantly related. If a protein has a highly conserved amino acid sequence across two or more species, that means that it has not changed much since the species diverged from each other. For example, a protein (or gene) that is highly conserved between humans and chimpanzees has changed very little in about 5 million years (the approximate time since humans and chimps diverged from their common ancestor). A protein or gene with a highly conserved sequence usually has an important function: mutations still occur, but most of those that alter it are harmful and are removed by natural selection.


Additional file 1: Table of p-values and alignment profiles for additional 37 intein, group I or group II host proteins. Arrows point to intron or intein positions. Blue dots indicate intein positions, green dots group I intron positions, and orange dots group II intron positions. (PDF 2 MB)

Additional file 2: List of introns found in cytochrome c oxidase subunit I. The list contains all introns from species for which at least one of their cox1 gene introns were BLAST hits when introns from Podospora anserina (X55026) and Saccharomyces cerevisiae (V00694) cox1 genes were used as query sequences. (PDF 103 KB)


Additional file 3: Accession numbers for sequences used in protein alignment. Lists of accession numbers for each protein used for each conservation profile. (DOC 212 KB)


Additional file 4: Perl scripts used to calculate the conservation profiles. Perl scripts used to calculate conservation profiles. (ZIP 3 KB)


Table 1 contains the species from add. file 1 predicted to possess at least one widely conserved promoter in the plastome. Predictions are identical for their close relatives with a corresponding orthologous gene (not shown). Within flowering plants the promoter sequences are similar and well aligned, therefore we illustrate results on Arabidopsis thaliana and Spinacia oleracea only. The five positive predictions are described below. Our analyses suggest that widely conserved promoters are absent elsewhere in streptophyte plastomes.

Gene psbA (protein D1 of the photosystem II active center) in plastomes. Promoters of this chloroplast gene were experimentally studied in selected species, including Arabidopsis, mustard, and spinach [3, 12, 13], for which our predictions are in good agreement with the experiment. The algorithm predicted candidate conserved promoters upstream of this gene in most Streptophyta, primary and secondary endosymbionts, Bigelowiella natans from the Chlorarachniophyceae, and Cyanophora paradoxa from the Glaucocystophyceae (ref. to Fig. 1, psbA). The gene alignments are given in Fig. 1, per-site nucleotide frequency distributions are given in Fig. 2 (constructed with the Weblogo program [14]). We suggest that this ancient promoter with the consensus TTGACA-15-TGTwATAmT is ancestral for at least all Streptophyta. The linker between the boxes is usually 18 bases long, but is 17 bases in Cycas taitungensis, Adiantum capillus-veneris, Staurastrum punctulatum, Mesostigma viride and B. natans. Many predictions possess the 5'-extension (TG or TGTG) of the "-10" box, which enhances the promoter efficiency. In the gymnosperm C. taitungensis, the predicted "-35" box essentially differs from the alignment consensus and the bacterial-like promoter. The psbA promoter was not found in the hornwort Anthoceros formosae, although in other bryophytes it is highly conserved. In the early emerging alga Chlorokybus atmophyticus only the "-35" box was identified, while the complete promoter was found in M. viride. Two dodder species (Cuscuta gronovii, C. obtusiflora) with a largely reduced plastome also lack the psbA promoter, which, however, is found in their close relatives (C. exaltata, C. reflexa) and most angiosperm plants. The lack of promoters correlates with the reduction of genomes: Cuscuta gronovii and C. obtusiflora do not photosynthesize and lack most of the photosynthetic genes. Although the psbA gene retains an open reading frame in these species, it lacks the PEP-promoter and is probably poorly expressed compared to photosynthetic species.
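
A consensus such as TTGACA-15-TGTwATAmT can be turned into a search pattern by expanding the IUPAC ambiguity codes (w = A or T, m = A or C). The sketch below does this with a regular expression. It is a simplified illustration and not the prediction algorithm used in the study: it requires an exact match to the consensus, whereas real promoter searches tolerate mismatches, and the spacer range of 14 to 15 bases is an assumption based on the statement that the linker is one base shorter in some species. The function names and the test sequence are hypothetical.

```python
import re

# IUPAC codes needed for this consensus (subset of the full table).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "[AT]", "M": "[AC]"}


def consensus_pattern(spacer_min: int = 14, spacer_max: int = 15) -> re.Pattern:
    """Compile the -35 box, a variable spacer and the extended -10 box."""
    box35 = "TTGACA"
    box10 = "".join(IUPAC[base] for base in "TGTwATAmT".upper())
    return re.compile(f"{box35}[ACGT]{{{spacer_min},{spacer_max}}}{box10}")


def find_promoters(dna: str) -> list[int]:
    """Return 0-based start positions of exact consensus matches."""
    return [match.start() for match in consensus_pattern().finditer(dna.upper())]


# Hypothetical sequence containing one exact match starting at position 2.
sequence = "GG" + "TTGACA" + "A" * 15 + "TGTAATAAT" + "CC"
print(find_promoters(sequence))
```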

Predicted promoters upstream of genes psbA, psbB, psbE, rbcL, psaA. In the first column, only the first occurrence of each taxon name is given. The promoter boxes and the 5'-extension of the "-10" box are shown in yellow. Numbers are the distance to the start codon; its location is given in the last column, prepended with "c" for complement sequences. The experimentally identified transcription initiation sites in Arabidopsis thaliana and Spinacia oleracea upstream of psbA, psbB, rbcL, psaA are shown in violet.
