Can a regulator gene be more highly expressed when it has more places to bind?


I am doing some research on small cell lung cancer and, from what I have found, many tumors show high levels of ASCL1, which is a regulator for neuroendocrine cell differentiation. However, no papers propose a mechanism for the higher levels of ASCL1. Instead, many of them refer to ASCL1 target genes with "amplified" enhancers and propose that ASCL1 may bind more frequently to these enhancer regions.

In general, if there are more places for the regulator to bind on an enhancer, is it possible that this is sufficient for the regulator to be more highly expressed?

CircRNAs and cancer: Biomarkers and master regulators

Circular RNAs (circRNAs) are a novel class of regulatory RNAs that, despite being relatively abundant, have only recently begun to be explored. Many thousands of genes appear capable of producing circRNAs; however, the function of all but a handful remains to be determined. What is emerging about these highly conserved molecules is that they play important roles in biology, and in cancer biology in particular. The most explored function of circRNAs is as master regulators of gene expression that act to sequester or 'sponge' other gene expression regulators, in particular miRNAs. They have also been demonstrated to function via direct modulation of transcription and by interfering with splicing mechanisms. Although generally expressed in low abundance compared with their linear counterparts, they are often expressed in a tissue- and developmental stage-specific manner. Coupled with their remarkable resistance to RNase activity due to a covalently closed circular structure, circRNAs show great promise as novel biomarkers of cancer and other diseases. In this review we consider the current state of knowledge regarding these molecules, their synthesis, function, and association with cancer. We also review some of the challenges that remain to be resolved if this emerging class of RNAs is really to become useful in the clinic.

'Jumping genes' repeatedly form new genes over evolution

In the same way that Lego pieces can be arranged in new ways to build a variety of structures, genetic elements can be mixed and matched to create new genes, according to new research.

A long-proposed mechanism for creating genes, called exon shuffling, works by shuffling functional blocks of DNA sequences into new genes that express proteins.

A study, "Recurrent Evolution of Vertebrate Transcription Factors by Transposase Capture," published Feb. 19 in Science, investigates how genetic elements called transposons, or "jumping genes," are added into the mix during evolution to assemble new genes through exon shuffling.

Transposons, first discovered in the 1940s by Cornell alum and Nobel Prize-winner Barbara McClintock '23, M.A. '25, Ph.D. '27, are abundant components of genomes—they make up half of human DNA—and have the ability to hop and replicate selfishly in the genome. Some transposons contain their own genes that code for enzymes called transposase proteins, which cut and paste genetic material from one chromosomal location to another.

The study, which focused on tetrapods (four-limbed vertebrates), is important because it shows that transposons represent an important force in the creation of new genes during evolution. The work also explains how genes critical for human development were born.

"We think it's very likely this mechanism may extend beyond vertebrates and could be more of a fundamental mechanism that occurs in non-vertebrates as well," said first author Rachel Cosby, Ph.D. '19, a postdoctoral researcher at the National Institutes of Health. Cosby is a former graduate student in the lab of senior author Cedric Feschotte, professor in the Department of Molecular Biology and Genetics in the College of Agriculture and Life Sciences.

"You are putting the bricks in in a different way and you construct a whole new thing," Feschotte said. "We are looking at the question of how genes are born. The originality is that we are looking at the role of transposons in creating proteins with novel function in evolution."

In the study, the researchers first mined existing databases for genomes of tetrapods, because genomes for more than 500 species have been fully sequenced. Cosby and colleagues searched for combinations of DNA sequences known to be characteristic of transposons fused to host sequences to find good candidates for study. They then chose genes that evolved relatively recently—within tens of millions of years ago—so they could trace the history of the gene's development through the vertebrate tree of life.

Though genes fused with these transposases are relatively rare, the researchers found them all over the vertebrate tree of life. The researchers identified more than 100 distinct genes fused with transposases born in the past 350 million years along different species lineages, including genes in birds, reptiles, frogs, bats and koalas, and a total of 44 genes born this way in the human genome.

Cosby and colleagues selected four recently evolved genes and performed a wide range of experiments in cell culture to understand their functions. They found the proteins derived from these genes are able to bind to specific DNA sequences and turn off gene expression. Such genes are known as transcription factors and act as master regulator genes for development and basic physiology. One such gene, PAX6, is well studied, plays a key role as a master regulator in the formation of eyes in all animals and is highly conserved throughout evolution.

"If you put a PAX6 gene from a mouse into a Drosophila [fruit fly], it works," Feschotte said. Though others have proposed before that PAX6 is derived from a transposase fusion, the researchers in this study further validated the hypothesis.

Cosby and colleagues isolated one of these recently evolved genes in bats, called KRABINER, and then used CRISPR gene-editing technology to delete it from the bat genome and see what genes were affected, before adding it back in. The experiment revealed that when KRABINER was removed, hundreds of genes were dysregulated, and when they restored it, normal functioning returned. The protein expressed by the KRABINER gene bound to other related transposons in the bat genome, Cosby said.

"The experiment revealed that it controls a large network of other genes wired through the past dispersion of related transposons throughout the bat genome—creating not just a gene but what is known as a gene regulatory network," Feschotte said.

Award presentation by Richard Lifton

The very existence of life seems miraculous. The development from a single cell of a plant, an insect, or a human that each contains specialized cells and organs with radically different functions represents one of the great mysteries. We now understand that all life forms on earth use information encoded in DNA as the instruction manual for building an adult organism. Instructions are executed by copying segments of DNA, called genes, into RNA copies that direct the production of the proteins that create a muscle cell, liver cell or neuron. This requires that different sets of genes are turned on, turned off, or tuned to the right levels in different cell types. Moreover, gene expression must also be modulated to respond to periodic changes in the external environment.

Our first insights into how genes are selectively turned on and off came from studies of the bacterium E. coli by Jacob and Monod in the early 1960s. They established the paradigm that transcription factors (proteins that bind to specific DNA sequences) promote or inhibit the copying of DNA into RNA.

Nonetheless, there are striking differences between bacteria and eukaryotes—that is, plants, animals and fungi. For example, the human genome is nearly 1000-fold larger than E. coli’s. The extended DNA from one human cell is 6 feet in length, but is compressed into a nucleus that has a radius of only 3 microns, posing a serious organizational/management problem. Chromosomal DNA of eukaryotes, but not bacteria, is associated with a family of small proteins called histones. Research by Roger Kornberg, Tim Richmond and others showed that this DNA-protein complex comprises a string of beads—called nucleosomes—with each bead having a core of histone proteins with DNA wrapped around the outside. These beads are packed into higher order assemblies in chromosomes, with the length of the DNA fibers compressed up to 10,000-fold during cell division.

The uniformity of histone beads and their lack of preference for specific DNA sequences led to the wide presumption that these nucleosomes were simply the inert packaging material for DNA in the nucleus. Nonetheless, in the 1960s Vincent Allfrey had made the provocative observation that histone proteins could be chemically modified. He showed that acetylated histones were enriched in highly expressed parts of the genome, and depleted from segments that do not express genes. However, available tools at that time could not establish whether histone modification was a cause or consequence of gene expression, and these observations languished.

Enter Michael Grunstein, who in the 1980s set out to test the idea that histones played an active role in the regulation of gene expression. He studied baker’s yeast (S. cerevisiae) because this single-celled fungus could be genetically manipulated. Grunstein engineered yeast in which histone production could be turned on and off at will. In 1987, Grunstein and colleagues reported that turning off production of histones led to nucleosome depletion. Amazingly, they found that this resulted in activation of expression of genes that were normally turned off, indicating for the first time that nucleosomes regulate gene expression in vivo.

He then asked a yet more ambitious question: do specific DNA mutations that alter histone proteins alter gene expression? In an elegant series of experiments, he first showed that deleting a short piece at one end of one of the histone proteins, called H4, selectively eliminated the induction of expression of genes that are highly regulated by the availability of different nutrients in the environment. Importantly, this segment included all four sites in the protein that were acetylated. He then mutated these four sites so they could no longer be acetylated and showed that these mutations were sufficient to prevent the turning on of these genes.

Conversely, Grunstein further discovered mutations in histone tails that turned gene expression on by preventing the normal compaction of chromosomes into highly condensed segments called heterochromatin in which gene expression is turned off. He elucidated a beautiful biochemical mechanism showing how this heterochromatin spreads in continuity along the chromosome, explaining a well-described but mysterious phenomenon.

Grunstein’s pioneering work established for the first time the causal relationship between alterations of specific sites in histone proteins and the normal activation and repression of gene expression.

In parallel, David Allis was using biochemistry to isolate the enzymes that mediate histone modification. The levels of these enzymes were extremely low in cells, leading Allis to turn to Tetrahymena, an unusual protozoan in which gene expression occurs in nuclei in which chromosomal DNA is shattered into small gene-sized pieces which are then amplified to high copy number. These nuclei have high levels of gene expression with high levels of histone acetylation. Through an ingenious experimental approach Allis’ team in 1996 purified the first histone acetyl transferase. The sequence of this protein provided a spectacular surprise, revealing that it was closely related to a puzzling yeast protein, Gcn5p, which had been identified as a protein that did not bind DNA but was nonetheless required as a partner to different transcription factors to turn on expression of highly regulated genes. In short order, Allis’ team showed that this yeast co-activator, like its Tetrahymena counterpart, selectively acetylated the sites in histone tails that are acetylated in vivo.

These studies collectively implicated covalent modification of histone proteins in the normal regulation of gene expression, and set the field ablaze, drawing in a large number of talented scientists. The resulting torrent of work has changed our understanding of regulated gene expression and revealed a previously unrecognized language in which different chemical modifications of specific sites in histone proteins have distinct consequences. For example, acetylation at one particular site promotes activation of gene expression while methylation at that same site inhibits gene expression. In contrast, methylation at a different site, working through a different adaptor protein, promotes activation of expression, and phosphorylation of yet another site promotes the extreme compaction of chromosomes prior to cell division. Allis has played an important role in the discovery of many of these histone modifications and the mechanism of their effects. Importantly, these chemical modifications can persist, providing a cellular memory that sustains the differentiation of cell types, and can prime particular genes that have been transiently turned on, for example in response to injury, to be ready for prompt re-activation if injury at the same site recurs.

Grunstein and Allis played the key roles in opening up this robust field, and have played direct roles in many of these transformative discoveries.

The importance of these histone modifications has become increasingly clear with time. Some classic developmental abnormalities in the fruit fly Drosophila, in which one body part is transformed into another, proved to be caused by mutations that knock out histone-modifying enzymes. And in the last several years these findings have extended to humans, where mutations in histone modifiers and readers of these modifications are the most frequent cause of congenital malformations such as congenital heart disease and are also a frequent cause of neurodevelopmental abnormalities such as autism. Furthermore, somatic mutations in dozens of different histone modifiers, or at the sites in histones that they modify, are drivers of a wide range of cancers. In one striking example, pediatric gliomas, a devastating brain cancer, are commonly attributable to a single recurrent histone mutation at a site normally methylated by a histone-modifying enzyme. Allis’ team has shown that this mutant histone binds tightly to its modifying enzyme, sequestering it and preventing it from doing its normal job.

All this work has spurred efforts to develop new therapeutics that target these histone-modifying enzymes, some of which are now in routine clinical use. For example, inhibitors of enzymes that remove acetyl groups from histones are beneficial in cancers including cutaneous T-cell lymphoma and multiple myeloma, and many more are in clinical investigation for diverse diseases.

Grunstein and Allis’ discoveries were original and unanticipated; they have transformed our understanding of the regulation of gene expression. Their work was also courageous, devoting years of effort, overcoming daunting technical challenges in their research programs, while swimming against a strong current of an advanced field that did not foresee the need for a role for histone proteins in gene regulation. Additionally, it’s worth noting that Grunstein and Allis’ discoveries came from their insightful selections of simple experimental systems for study: yeast and Tetrahymena. Such simple systems have yielded countless fundamental insights into biology, reminding us that public support of curiosity-driven research continues to produce profound insights that ultimately impact our understanding of human health and disease. I am delighted to congratulate Michael and David on their spectacular scientific achievements; they are exceptionally deserving of this year’s Albert Lasker Basic Medical Research Award.

Synthetic genomics

From the first International Conference of Synthetic Biology (SB1.0), organized by the Massachusetts Institute of Technology (MIT) in 2004, to SB7.0, held in Singapore in 2017, synthetic biology has gradually entered a stage of rapid development [1]. Since the early twenty-first century, the whole genomes of many species have been sequenced, providing data support for DNA synthesis, which endows living beings with new genetic traits. Synthetic genomics includes many techniques, such as chemical synthesis, genome design, assembly, and transplantation, with emphasis on the design and synthesis of whole genomes. As synthetic biology advances, more and more attention is being paid to research in synthetic genomics.

The development of DNA synthesis

Initially, DNA synthesis belonged to “replication synthesis”, in which a biologically active genome is chemically synthesized without major changes to its native genomic sequence (Fig. 3) [45,46,47,48,49,50,51,52,53,54,55,56,57,58]. The main task in this period was to reconstruct viral genomes using reverse genetics. In 2002, Cello et al. assembled the full-length poliovirus complementary DNA (cDNA), 7.5 kb in length, using chemically synthesized oligonucleotides of plus and minus strand polarity [45]. One year later, Smith et al. chemically synthesized the complete infectious genome of a bacteriophage, 5386 bp in length, from a single pool of assembled oligonucleotides [46]. Subsequently, owing to the rapid development of synthetic genomics, researchers began to attempt the synthesis of larger genomes, moving from a few kb to a few hundred kb [47, 48]. In 2008, Gibson et al. synthesized the genome of the well-known minimal prokaryotic cell, M. genitalium, whose genome is 582 kb [49]. Two years later, the design, synthesis, and assembly of the 1.08-mega-base pair (1.08 Mbp) Mycoplasma mycoides JCVI-syn1.0 genome were completed [51]. Since then, synthetic genomics has entered the “design and synthesis” era, and more genome structures have been designed and reconstructed [50]. Representative work includes minimization of the Mycoplasma genome, recoding of the E. coli genome, and artificial synthesis of S. cerevisiae chromosomes. Notably, in 2016, Venter et al. constructed a genome smaller than that of any autonomously replicating cell found in nature by minimizing the synthetic genome of M. mycoides JCVI-syn1.0 (1079 kbp) to JCVI-syn3.0 (531 kbp, 473 genes) [54]. Meanwhile, Ostrov et al. recoded the genome of E. coli and successfully designed a new genome that uses only 57 codons [59]. This work underscored the feasibility of rewriting genomes and established a framework for large-scale design, assembly, troubleshooting, and phenotypic analysis of synthetic organisms. In 2009, Dymond et al. proposed the Synthetic Yeast Genome Project (Sc2.0 Project) [52]. To date, six chromosomes of S. cerevisiae have been successfully designed and synthesized [53, 55, 57, 58, 60, 61].

Development of the synthetic genomics

Currently, the prevailing methods for DNA synthesis are PCR-based and ligase-based. To advance synthetic genomics, new synthesis and assembly techniques are needed to meet increasing demands, and many research groups are devoted to this topic. For example, a PCR-based two-step DNA synthesis (PTDS) method was modified to synthesize long segments of DNA, which involves first synthesizing individual fragments of the DNA of interest [62]. 60-mer oligonucleotides with 20 bp overlaps can produce a 500 bp DNA fragment, and PCR amplification with the two outermost primers can then assemble the entire sequence of the DNA of interest. This modified method can produce DNA fragments of 5–6 kb with high G+C content within 5–7 days [63]. Then, a method for PCR-based accurate synthesis (PAS) of long DNA sequences was developed. It adds purification using PAGE and error correction using overlap-extension PCR to the PTDS method, yielding longer DNA fragments of 12 kb within 7 days [64].
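The oligonucleotide arithmetic in the paragraph above can be checked with a short calculation. The sketch below assumes an idealized tiling in which every adjacent pair of oligonucleotides shares exactly the same overlap; the function names are hypothetical and are not taken from the cited methods, and real designs typically vary oligo lengths and overlaps.

```python
import math


def assembled_length(n_oligos: int, oligo_len: int = 60, overlap: int = 20) -> int:
    """Length of a fragment tiled from n_oligos equal-length oligos.

    Assumption (illustrative only): each adjacent pair overlaps by
    exactly `overlap` bases, so each overlap is counted once.
    """
    if n_oligos < 1:
        raise ValueError("need at least one oligo")
    return n_oligos * oligo_len - (n_oligos - 1) * overlap


def oligos_needed(target_len: int, oligo_len: int = 60, overlap: int = 20) -> int:
    """Smallest number of oligos whose idealized tiling covers target_len."""
    if target_len <= oligo_len:
        return 1
    step = oligo_len - overlap  # new bases contributed by each extra oligo
    return 1 + math.ceil((target_len - oligo_len) / step)


if __name__ == "__main__":
    n = oligos_needed(500)
    print(n, assembled_length(n))
```

Under these assumptions, each oligonucleotide after the first contributes 40 new bases, so twelve 60-mers tile a 500 bp fragment, which is consistent with the figures quoted above.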

Directed evolution in vitro is another powerful molecular tool for designing new biological parts [65]. DNA synthesis methods are often used to investigate the function of enzymes of interest. Semi-rational design and high-throughput screening, together with DNA shuffling, have been applied in directed in vitro evolution to investigate the function of reporter enzymes [66, 67]. The rapid growth of DNA synthesis in synthetic biology includes longer fragments, higher accuracy, and new capabilities, making it possible not only to read but also to edit and rewrite the genes and cells of organisms.

Minimal genomes

Another significant part of synthetic biology is minimal genomes, which contain only the minimal DNA sequence needed to maintain life. The ideal minimal genome is composed only of genes that are essential for the survival of the organism under defined conditions. Non-essential genes and non-coding regions are usually eliminated, such as genetic elements of alternative metabolic pathways or those encoding responses to stress situations [54]. If a microbial chassis contains only the minimum set of essential genes, i.e., a minimal genome, the production of desired compounds would be more economically feasible. It is believed that minimal cells built on minimal genomes can serve as efficient platforms with new functions. In contrast to traditional recombinant technology, which fiddles with a handful of genes to tune one or two metabolic pathways, the minimal genome approach is far more advanced and directed. Most studies have adopted top–down approaches for mapping essential and non-essential genes.

In the past, natural product discovery was largely a ‘luck by chance’ event driven by bioactivity-guided chemical screening. Nowadays, novel secondary metabolites with greater functional abilities can be discovered faster via genome mining. Many whole genome-sequencing projects have revealed a number of gene clusters specialized in the production of novel chemicals [68]. For example, proteobacteria of the Burkholderia, Photorhabdus, and Xenorhabdus genera are being explored for the full extent of their secondary metabolite biosynthetic capabilities [69, 70]. Unique gene clusters can be introduced as heterologous pathways into a suitable expression host or chassis for the production of value-added compounds. To this end, it is important to understand the genome structure of the chassis in terms of modularity and essentiality, so that the minimal essential gene sets can be identified or filtered.

RANKL in immunity

RANKL signaling is crucial for the development of various organs, including immune organs. In fact, RANKL was first reported as an activator of dendritic cells expressed by T cells [4]. The immune organs consist of immune cells and stromal cells. Studies using mice have shown that several of these cell types express RANKL or RANK, transducing signals for the development and function of the immune system as described below.

Bone marrow formation

The bone marrow is one of the primary lymphoid organs, where lymphocytes emerge and mature. Both T and B cells are born in the bone marrow and the latter cells mature in this organ. Other types of hematopoietic cells including erythrocytes reside in this space as well. Because the bone marrow space is preserved by osteoclastic bone resorption within the bone, RANKL functions as a maintainer of the bone marrow and its indwelling immune cells. In most types of osteopetrosis, the patients exhibit mild to severe hematological defects, which can lead to anemia, hemorrhage, and severe or recurrent infectious diseases [55, 56].

Thymus development

The thymus is another primary lymphoid organ, where T cell progenitors undergo positive and negative selection to acquire the ability to distinguish non-self from self-antigens, thereby establishing self-tolerance. During negative selection, cells that strongly interact with the self-antigens presented on major histocompatibility complex (MHC) molecules undergo apoptosis [57]. In this process, these antigens, including a portion of the tissue-specific antigens (TSAs), are expressed by medullary thymic epithelial cells (mTECs) under the control of a crucial factor, autoimmune regulator (Aire) [58, 59]. RANKL is a key cytokine for inducing Aire expression in these epithelial cells, and it is provided by lymphoid tissue inducer (LTi) cells, single positive thymocytes, Vγ5+ γδ T cells, and invariant natural killer T (iNKT) cells (Fig. 2a) [60,61,62,63]. Because thymic development is normal in mice deficient in soluble RANKL, it is suggested that membrane-bound RANKL on these cells induces mTEC development [17].

Fig. 2 RANKL in immunity. (a) RANKL–RANK interaction in the development of the thymus. RANKL is produced by LTi cells, T cells, and iNKT cells and interacts with the RANK expressed on mTECs. This interaction induces the expression of Aire, resulting in the expression of TSAs on MHC molecules. The TSA–MHC complex is necessary for negative selection, the key process for establishing self-tolerance. (b) RANKL–RANK interaction in lymph node development. Lymph node development begins with the interaction between LTi cells and LTo cells. LTα1β2 is expressed by LTi cells and interacts with LTβR on LTo cells, which in turn leads to the expression of RANKL on LTo cells. The expressed RANKL stimulates LTi cells to induce more LTα1β2, forming a positive feedback loop. With the stimulation of LTα1β2, some LTo cells mature into MRCs. The RANKL on LTo cells and MRCs binds to the RANK on lymphatic endothelial cells, resulting in the recruitment of macrophages. (c) RANKL–RANK interaction in the gastrointestinal tract. (Left) ILC3s interact with each other through RANKL and RANK. The interaction decreases the proliferation and IL-17/IL-22 production of these cells, resulting in the suppression of excessive inflammation. (Right) RANKL–RANK interaction in M cell development. Mesenchymal cells beneath the epithelium of the gastrointestinal tract express RANKL and interact with RANK-expressing epithelial cells. These cells differentiate into morphologically and functionally unique cells called M cells. These cells enable the transfer of antigens from the lumen of the gastrointestinal tract to DCs, leading to IgA production. (d) RANKL–RANK interaction in the skin. Keratinocytes express RANKL upon UV irradiation. The RANKL binds to RANK on LCs in the skin. These LCs contribute to the generation of Treg cells, which decrease skin inflammation and promote the resolution of dermatitis in psoriasis and atopic dermatitis. (e) RANKL–RANK interaction in CNS inflammation. (Left) TH17 cells induce CCL20 expression in astrocytes at the blood–brain barrier via RANKL–RANK signaling. CCL20 recruits CCR6-expressing cells, including TH17 cells. These accumulated cells penetrate the barrier and infiltrate the CNS to elicit inflammation. (Right) In the context of ischemic stroke, dead cells in the brain release DAMPs, which are recognized by TLRs. TLR stimulation of microglial cells leads to the production of pro-inflammatory cytokines including IL-6 and TNF-α, leading to inflammation and further cell death. The RANKL–RANK signal in the microglial cells inhibits the production of these cytokines, resulting in the protection of the brain.
Abbreviations: RANKL, receptor activator of NF-κB ligand; RANK, receptor activator of NF-κB; LTi cell, lymphoid tissue inducer cell; iNKT cell, invariant natural killer T cell; mTEC, medullary thymic epithelial cell; Aire, autoimmune regulator; TSA, tissue-specific antigen; MHC, major histocompatibility complex; LTo cell, lymphoid tissue organizer cell; LT, lymphotoxin; LTβR, lymphotoxin β receptor; MRC, marginal reticular cell; ILC3, group 3 innate lymphoid cell; IL, interleukin; DC, dendritic cell; UV, ultraviolet; LC, Langerhans cell; Treg cell, regulatory T cell; CNS, central nervous system; TH17 cell, T helper 17 cell; CCL20, C-C motif chemokine ligand 20; CCR6, C-C motif chemokine receptor 6; DAMP, damage-associated molecular pattern; TLR, Toll-like receptor.

Lymph node development

RANKL also contributes to the development and function of the secondary lymphoid organs, where immune responses take place. The lymph node (LN) is one such organ, distributed throughout the body. LNs consist of lymphocytes and their surrounding stromal cells, establishing a complex but well-organized structure, with B and T cells localized in distinct regions [64]. LN organogenesis begins with the condensation of LTi cells, which are CD45+ CD4+ CD3− IL-7R+ RORγt+, and specific mesenchymal cells named lymphoid tissue organizer (LTo) cells. RANKL is expressed on LTi cells, LTo cells, and the descendants of the latter, marginal reticular cells (MRCs) [65, 66]. The expression of RANKL on the stromal cells in the LNs is reported to be enhanced by lymphotoxin β receptor (LTβR) signaling [67]. The RANKL signal, more likely via the membrane-bound type [17], induces the maturation of the LNs by increasing cellularity and the attraction of immune cells to the LNs [6, 65]. It was recently reported that the RANKL expressed by LTo lineage cells stimulates lymphatic endothelial cells to recruit and maintain macrophages in the LNs (Fig. 2b) [68].

Intestinal immunity

The gastrointestinal (GI) tract is the largest entry site for pathogenic bacteria, with a surface area 100 times that of the body surface. In order to protect the body from these bacteria, the GI tract has developed a highly specialized defense system. Lymphocytes lacking antigen receptors, innate lymphoid cells (ILCs), are known to be abundant in the mucosal tissues and constitute a part of the barrier function by secreting cytokines [69, 70]. Group 3 ILCs, including LTi cells and ILC3s, express the transcription factor RORγt and produce high amounts of the cytokines IL-17 and IL-22, contributing to homeostasis in the intestine [71, 72]. A recent study reported that ILC3s are divided into NKp46− CCR6−, NKp46+ CCR6−, and NKp46− CCR6+ cells. The expression of both RANKL and RANK was highest in the CCR6+ cells, which cluster within the cryptopatches [73, 74]. The proliferation and IL-17A/IL-22 expression of the CCR6+ ILC3s were suppressed by RANKL [73], indicating that these cells interact with each other in the cryptopatches to suppress excessive proliferation and inflammation (Fig. 2c).

Peyer’s patches (PPs) are lymphoid follicles beneath the intestinal epithelium. Within the epithelium covering the PPs (follicle-associated epithelium, FAE), there is a unique cell subset, M cells. Unlike their surrounding epithelial cells, M cells lack villi, but have a micro-fold structure on the apical side and a sac-like structure (the M-cell pocket) on the basal side. These cells have a high capacity for transcytosis, thus transferring the bacteria in the lumen to the DCs in the M-cell pocket. Antigen presentation to DCs via M cells results in an immune response to the transcytosed bacteria, i.e., IgA production [75]. RANKL is necessary and sufficient for M cell development, and its source during the process has been shown to be the mesenchymal cells in the lamina propria (Fig. 2c). Deficiency in soluble RANKL did not affect the development of these cells [76]. The RANKL in these mesenchymal cells also plays a role in IgA production [14].

Skin inflammation

The skin is the front line of defense against external stimuli, and is thus equipped with a specific immune system. Langerhans cells (LCs) reside in the epidermis and are one of the key components of skin immunity [77, 78]. LCs are classified as a DC subset, with neuron-like dendrites, a high capacity for antigen presentation, and a capacity to migrate into the LNs, where LCs present antigens to T cells, thereby generating inflammatory or regulatory T (Treg) cells. RANKL has been shown to be expressed by keratinocytes upon ultraviolet (UV) irradiation via the prostaglandin E receptor (EP) 4 signal [79]. The RANKL expressed by the keratinocytes interacts with RANK on LCs, resulting in the expansion of Treg cells. The increased Treg cells exert immunosuppressive effects [80], decreasing excessive inflammation in the skin (Fig. 2d). The immunosuppression induced by UV is the basis of the phototherapy used for psoriasis and atopic dermatitis, but it can also lead to carcinogenesis [81].

Inflammation in the central nervous system

The central nervous system (CNS) is an immune-privileged site, owing to the presence of the blood–brain barrier (BBB), which is composed of endothelial cells, pericytes, and astrocytes. This barrier restricts the entry of cells and microorganisms [82]. A study showed that penetration of the BBB by pathogenic TH17 cells in a multiple sclerosis mouse model depended on RANKL signaling: TH17 cells expressing RANKL interact with RANK-expressing astrocytes, which in turn secrete C-C motif chemokine ligand 20 (CCL20), further attracting C-C motif chemokine receptor 6 (CCR6)-expressing cells into the CNS (Fig. 2e) [83].

In brain tissue affected by ischemic stroke, inflammation is elicited by immune cells including microglial cells, macrophages, DCs, and γδ T cells [84, 85]. Reduced blood flow in the brain leads to brain cell death, which results in the release of damage-associated molecular patterns (DAMPs) from the dead cells. These DAMPs include high mobility group box-1 (HMGB1) and peroxiredoxin (Prx), which lead to BBB breakdown and the stimulation of the immune cells listed above [86]. Clinical studies have observed that serum OPG concentration is higher in patients with ischemic stroke and is positively correlated with severity [87]. A study showed that RANKL suppresses the production of pro-inflammatory cytokines, such as IL-6 and TNF-α, induced via Toll-like receptor 4 (TLR-4) (Fig. 2e) [84].

These studies have revealed that the RANKL signal functions in various immune settings, including organogenesis, immune cell development, and the regulation of immune cell function. Because RANKL is sometimes beneficial and at other times harmful, modulating this cytokine may have therapeutic utility in diseases affecting the immune system. Careful studies are needed to avoid potential side effects.

Crosstalk and cooperation of mechanotransduction pathways

Several publications have identified crosstalk and cooperation between the mechanosensing pathways covered by this review (Fig. 3c). YAP negatively regulates myocardin expression as well as its association with SRF to control the phenotypic switch of vascular smooth muscle cells in response to stimulation with platelet-derived growth factor. Overexpression of YAP inhibited contractile gene expression including α-SMA, SM22α, SMMHC and MYOCD itself, whilst promoting transcription of pro-proliferative genes [122]. YAP was found to specifically interact with myocardin, which reduced its co-immunoprecipitation with SRF, hence reducing SRF-directed transcription of smooth muscle genes (Fig. 3c). YAP therefore plays a functional role in controlling the vascular smooth muscle cell phenotype in a myocardin-dependent manner. This is functionally relevant in response to vascular injury (e.g., balloon injury-induced vessel lesion formation) in which YAP expression is induced [122]. Under these conditions YAP acts as a negative regulator of SRF-mediated gene transcription. However in another study YAP and MRTF-A were found to cooperate to promote GPCR/RhoA stimulated gene transcription and cellular proliferation [123] (Fig. 3c). Knockdown of YAP or MRTF-A blocked induction of CCN1 (Cyr61) expression stimulated by S1P-mediated activation of GPCRs in glioblastoma cells. Like myocardin, MRTF-A was found to associate with YAP in co-immunoprecipitation experiments following GPCR stimulation. Functionally, both YAP and MRTF-A bind to the CCN1 promoter to drive S1P-stimulated glioblastoma cell proliferation [123]. Consistent with this, a recent paper by Cui et al. [93] reported that knockdown of either MRTF-A or YAP blocked cyclic stretch-stimulated spreading and proliferation of primary mouse embryonic fibroblasts on soft surfaces. 
Interestingly, knockdown of either YAP or MRTF-A impeded nuclear localisation of the other protein in response to cyclic stretching, though the mechanism of this regulation is yet to be elucidated.

More recently, two reports identified a link between MRTF and TAZ [124, 125]. MRTF/SRF signaling promotes TAZ gene expression and protein abundance downstream of activation by heregulin β1 in breast cancer cells [124]. Comparably, MRTF knockdown in a porcine kidney cell line resulted in significant down-regulation of TAZ mRNA and protein [125]. Similar to previous reports that found MRTFs could interact directly with YAP, Speight et al. [125] demonstrated that TAZ and MRTF associate, at least in part, by WW domain/PPxY-mediated interaction [126, 127]. Importantly, however, the authors showed that despite their interaction, MRTF and TAZ translocate independently to the nucleus upon actin polymerization [125]. In fact, in a complex scheme of protein crosstalk, TAZ and MRTF reciprocally mitigate each other’s nuclear localisation and accumulation induced by low calcium (Fig. 3c). This observation is hypothesised to be mediated by TAZ-MRTF interaction, which may sequester both proteins in the cytoplasm. Furthermore, MRTF was found to up-regulate 14-3-3 expression, which is expected to increase cytoplasmic sequestration of both TAZ and YAP [125]. The crosstalk between these transcriptional co-factors is significant because the interaction of TAZ and MRTF can have different transcriptional outcomes. Specifically, TAZ and MRTF antagonize each other on the α-SMA promoter, whilst synergizing on TEAD elements that are not located near an SRE/CArG sequence [125].

Heregulin β1 (a splicing isoform of neuregulin 1) is a soluble protein that binds to and activates the receptor protein tyrosine kinase ERBB4. Upon activation, the intracellular cytoplasmic domain (ICD) of ERBB4 translocates to the nucleus where it can activate transcription. Via a WW domain/PPxY-mediated interaction, YAP interacts with ERBB4 ICD to stimulate transcription [128]. This interaction, producing a YAP-TEAD-ERBB4 tripartite complex, was later shown to induce YAP target genes such as CTGF, and promoted YAP-dependent cell migration in response to neuregulin treatment in mammary carcinoma cells [129]. Interestingly, protein tyrosine kinases (including ERBB4) are principally involved in the formation of focal adhesions and rigidity sensing (reviewed in [130]). Knockdown of ERBB4 in cultured human fibroblasts significantly reduced rigidity-dependent cell polarization, characterised by reduced cell elongation and focal adhesion alignment, but with increased focal adhesion number, on both soft and rigid substrates [131]. These findings reveal that activation of ERBB4 via chemical (heregulin β1/neuregulin signaling) or mechanical (rigidity) cues can alter YAP/TAZ signaling via two different mechanisms. Hence ERBB4 should be considered to be a key regulator of YAP/TAZ activity.

As discussed above, MRTF associates with Smad3 to drive slug expression [27]. Intriguingly, Smad3 inhibits MRTF-dependent activation of the α-SMA promoter by reducing MRTF association with SRF [132] (Fig. 3c). TAZ has also been reported to cooperate with Smad3 to drive expression of α-SMA, and in an additional layer of complexity, treatment with TGFβ altered the relative interaction between MRTF, Smad3 and TAZ [125]. This is meaningful because TGFβ is a potent biochemical inducer of fibrogenesis, mediated by downstream MRTF signaling. The relative abundance of these signaling mediators, together with the mechanical and chemical stimuli detected by cells, will therefore dictate the response at the level of gene transcription.

As another example of crosstalk between mechanosensing pathways, β-catenin was identified as a positive regulator of MRTF signaling that alleviates Smad3 inhibition via two mechanisms [133] (Fig. 3c). First, β-catenin competes with Smad3 for MRTF binding, freeing MRTF to associate with SRF. Second, β-catenin suppresses Smad3-mediated recruitment of glycogen synthase kinase-3β to MRTF, which otherwise leads to MRTF ubiquitination and degradation, thus increasing MRTF protein stability [133]. Interestingly, YAP and β-catenin cooperate to regulate mechanical strain-induced cell proliferation [134]. Cell cycle re-entry and subsequent progression from G1 to S phase are mediated by YAP and β-catenin signaling, respectively; however, inhibition of either is sufficient to block cellular proliferation as determined by EdU incorporation. Notably, treatment with inhibitors of YAP activity (e.g., YAP1-TEAD inhibitory peptide or verteporfin) also blocked cell cycle entry, as evidenced by a marked reduction in Ki67-positive staining [134]. Thus, through different but complementary roles, YAP and β-catenin coordinate to regulate biological function (Fig. 3c).

EBV-associated tumours

Lymphoproliferative disease in immunosuppression

The lymphoproliferations that arise following iatrogenic immunosuppression for transplant surgery are collectively known as post-transplant lymphoproliferative disorders (PTLDs). Similar tumours are observed in patients with certain forms of inherited immunodeficiency syndromes, such as X-linked lymphoproliferative syndrome and Wiskott–Aldrich syndrome, and in AIDS patients. They are most often of B-cell origin and represent a family of lesions ranging from atypical polyclonal B-cell proliferations, which often regress following withdrawal or reduction of immune suppression, to aggressive monomorphic non-Hodgkin's lymphomas (NHLs), which generally do not resolve following immune reconstitution (Niedobitek and Young, 1997).

The incidence and clinical presentation of PTLDs vary with the organ transplanted, the duration of immunosuppression, and the dosage and number of agents used. There are nevertheless a number of common clinical features, including frequent occurrence in multiple extranodal locations such as the gastrointestinal tract or even in the allograft organ itself. The high incidence of PTLDs in the transplanted organ suggests that chronic antigen stimulation in the graft might be important in the pathogenesis of these lesions. Indeed, T-cells are required for the development of PTLD-like tumours in severe combined immunodeficient (SCID) mice, suggesting an important role for T-cell help in the growth of B-cell-derived PTLDs (Johannessen et al., 2000).

The majority of PTLD cases are EBV positive and many show a Lat III pattern of gene expression (Young et al., 1989). Thus, in many cases, PTLDs appear to represent the in vivo counterpart of in vitro immortalized LCLs and, by implication, are likely to be primarily driven by EBV. However, other forms of latency (i.e. Lat I and Lat II) are occasionally observed and EBV-negative forms of PTLD have been described, which include some T-cell tumours. These EBV-negative tumours tend to be monomorphic, present later than EBV-positive tumours and are more aggressive (Dotti et al., 2000; Nelson et al., 2000). Interestingly, a proportion of these tumours respond to a decrease in immunosuppression, which might suggest the involvement of other viral agents.

Burkitt's lymphoma

The recognition that BL in Africa was apparently restricted to areas where infection with Plasmodium falciparum malaria was holoendemic led to the suggestion that an infectious agent might be involved and eventually to Epstein, Achong and Barr's discovery of EBV. The so-called ‘endemic’ or high-incidence form of BL occurs at an annual incidence of approximately 5–10 cases per 100 000 children in equatorial Africa and parts of Papua New Guinea. By contrast, sporadic cases of BL occur worldwide but at a much lower frequency (at least 50-fold less than in the high-incidence areas).
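The fold difference quoted above can be turned into a rough ceiling on the sporadic rate. The following is only an illustrative sketch: it assumes the "at least 50-fold less" comparison is made on the same per-100 000-children-per-year basis as the endemic figure, which the text does not state explicitly, and the function and constant names are hypothetical.

```python
# Endemic BL incidence quoted in the text: 5-10 cases per 100 000 children per year.
ENDEMIC_RATE_PER_100K = (5.0, 10.0)

# Sporadic BL is "at least 50-fold less" frequent, so 50 is the minimum fold reduction.
MIN_FOLD_REDUCTION = 50.0


def sporadic_upper_bound(endemic_rate, fold=MIN_FOLD_REDUCTION):
    """Upper bound on the sporadic rate implied by a minimum fold reduction.

    Assumes (hypothetically) that both rates share the same denominator.
    """
    return endemic_rate / fold


low, high = (sporadic_upper_bound(rate) for rate in ENDEMIC_RATE_PER_100K)
print(f"Implied sporadic BL incidence: at most {low:.1f}-{high:.1f} per 100 000 per year")
```

Under that assumption the implied ceiling is on the order of 0.1 to 0.2 cases per 100 000 per year. Because the fold difference is a lower bound, the true sporadic rate could be lower still.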

Whereas virtually every BL tumour found in the high-incidence regions is EBV positive, only about 15% of sporadic BL tumours carry the virus (Rickinson and Kieff, 2001). In addition, certain ‘intermediate-incidence’ areas outside the regions of holoendemic malaria, such as Algeria and Egypt, have increased numbers of cases that correlate with an increased proportion of EBV-positive tumours. BL is also observed as a consequence of HIV infection, frequently occurring before the development of full-blown AIDS. Only 30–40% of cases of AIDS-BL are associated with EBV. A consistent feature of all BL tumours, irrespective of geographical location or AIDS association, is chromosomal translocations involving the long arm of chromosome 8 (8q24) in the region of the c-myc proto-oncogene and either chromosome 14 in the region of the immunoglobulin heavy-chain gene or, less frequently, chromosomes 2 or 22 in the region of the immunoglobulin light-chain genes. This translocation results in deregulated expression of the c-myc oncogene.

The precise role of EBV in the pathogenesis of BL remains to be established, although the detection of monoclonal EBV episomes in virus-positive BL biopsies suggests that EBV infection preceded proliferation of the precursor B cells (Neri et al., 1991). The apparent origin of BL in the germinal centre is based on phenotypic studies and is supported by the ability of BL risk factors such as holoendemic malaria and chronic HIV infection to stimulate proliferation of B cells in the germinal centre (Rowe et al., 1987). These cells are also programmed to undergo somatic mutation of immunoglobulin genes and this event, in conjunction with the stimulation of germinal centre proliferation and EBV infection, might be responsible for the generation and selection of B cells carrying the c-myc translocation.

BL cells from primary tissues appear to show a very restricted pattern of viral gene expression, where EBNA1 is seemingly the only EBV protein consistently detectable (Hatzubai et al., 1987; Gregory et al., 1990). Although some reports have documented expression of LMP1 and EBNA2 in small numbers of cells in a few cases of endemic BL (Niedobitek et al., 1995), and LMP1 in several cases of sporadic BL (Carbone and Gloghini, 1995), this does not seem to be a frequent occurrence. When cells from some EBV-positive BL tumours are passaged in culture, the other EBNAs and LMPs are expressed, and the EBNA2- and LMP1-induced cell surface antigens, such as CD23, CD30, CD39, LFA1, LFA3 and ICAM1, are also upregulated (Gregory et al., 1990). EBNA2 and LMP1 are the major mediators of EBV-induced B lymphocyte growth in vitro and the lack of expression of these proteins in tumour cells suggests that they are not required for the maintenance of BL tumours. Altered MYC expression may replace EBV-driven cell proliferation and allow cells to survive and proliferate with downregulation of the EBNAs and LMPs, which may in turn enable the infected cells to evade CTL immunosurveillance (Rowe et al., 1987). This may explain why the drift to an LCL phenotype seen in some BL lines in vitro occurs only at a low level in vivo (Carbone and Gloghini, 1995; Niedobitek et al., 1995), since ‘drifted’ cells would be selectively removed by the CTL response. EBV-positive BL lines that have retained the tumour cell phenotype in vitro are not sensitive to lysis by EBV-specific CTLs. In addition to the downregulation of the highly immunogenic EBNAs and LMPs, several phenotypic features contribute to diminish the immunogenicity of BL tumour cells.
These include reduced expression of cell adhesion molecules, and a general and allele-selective downregulation of MHC class I expression (Rickinson et al., 1992), defects in antigen processing (de Campos-Lima et al., 1993) and peptide transport (Khanna et al., 1994).

Evidence that EBV and altered MYC expression can cooperate to alter B lymphocyte growth comes from studies in which EBV was used to transform human B lymphocytes in vitro, followed by the introduction of a rearranged MYC gene, cloned from a BL cell line, into these cells (Lombardi et al., 1987). The EBV-transformed cells initially had very low cloning efficiencies in soft agar and did not form tumours in nude mice, but after gene transfer of a rearranged MYC, they grew more efficiently in soft agar and were tumorigenic. Activated MYC gene introduced into an EBV-transformed cell line in which EBNA2 was rendered oestrogen-dependent was shown to be capable of inducing continuous proliferation of these cells in the absence of functional LMP1 and EBNA2, suggesting that MYC might substitute for LMP1 and EBNA2 in BL progenitor cells (Polack et al., 1996).

Recent evidence also suggests greater involvement of EBV in sporadic BL than previously documented. Thus, rearranged defective EBV genomes have been detected in some sporadic BL tumours from the USA (Razzouk et al., 1996). Such viral rearrangements can lead to constitutive expression of the immediate-early gene BZLF-1, which by transient transfection has been shown to result in partial elimination of EBV episomes from infected cells. Thus, the presence of these defective genomes may go undetected by conventional EBV testing (e.g. EBER in situ hybridization), and suggests a process of viral DNA rearrangement and loss during malignant progression consistent with a ‘hit and run’ role for EBV in the pathogenesis of at least a proportion of sporadic BL cases.

Hodgkin's disease

As early as 1966 MacMahon had proposed that HD might be caused by an infectious agent (MacMahon, 1966). The first evidence that this agent might be EBV came from the detection of raised antibody titres to EBV antigens in HD patients when compared with other lymphoma patients (Levine et al., 1971) and, furthermore, from the finding that these raised levels preceded the development of HD by several years (Mueller et al., 1989). In addition, the relative risk of developing HD in individuals with a history of IM, relative to those with no prior history, was shown to range between 2.0 and 5.0 (Gutensohn and Cole, 1980). Weiss et al. (1991) were the first to demonstrate the presence of EBV DNA in HD tissue specimens using the cloned BamHI W fragment of EBV as an in situ hybridization probe. The subsequent development of in situ hybridization to target the highly abundant EBERs provided a reliable and simple method for the detection of EBV in archival HD specimens (Wu et al., 1990). Several investigators have demonstrated the clonality of EBV in HD tissue by hybridization with the viral TRs (Anagnostopoulos et al., 1989). These findings indicate clonal expansion of single EBV-infected cells and further underline a possible aetiological role of EBV in a proportion of HD cases. Immunohistochemical assays (Figure 2) and transcriptional analysis on fresh biopsies have demonstrated that the malignant Hodgkin/Reed-Sternberg (HRS) cells of EBV-positive cases express high levels of LMP1 in the absence of EBNA2 expression (Lat II pattern) (Pallesen et al., 1991; Murray et al., 1992; Deacon et al., 1993).

Figure 2. Detection of EBV gene products in HD. Panel (a) shows the detection of the highly abundant EBV-encoded RNAs (EBERs) using in situ hybridization. EBER expression is mainly restricted to the nuclei of infected HRS cells. Panel (b) shows expression of LMP1 using the monoclonal reagent CS1-4. LMP1 is almost always detectable in the HRS cells of EBV-associated HD. Panel (c) shows that LMP2 is also detectable in HRS cells of EBV-associated cases. A single LMP2-positive HRS cell is shown.

EBV is regularly detectable in up to one-half of all HD tumours from developed countries and in a greater proportion of those cases arising in developing communities. Striking epidemiological and clinical differences point to important underlying differences in the aetiology and behaviour of the EBV-positive and EBV-negative forms of HD. Thus, EBV is preferentially associated with the mixed cellularity form of HD and also with males rather than females. EBV-positive HD also affects more Asians and Hispanics than whites or blacks (Glaser et al., 1997). Recent studies from the UK also show a strong association between EBV positivity and South Asian ethnicity in paediatric HD patients (Flavell et al., 2001).

HD in older patients (>55 years of age) and in children, especially boys under 10 years, has been shown to be more likely to be EBV associated than HD in young adults (Armstrong et al., 1998). This has led to the suggestion that HD consists of three disease entities: HD of childhood (EBV positive, MC type), HD of young adults (EBV negative, NS type) and HD of older adults (EBV positive, MC type) (Armstrong et al., 1998). While an unusual response to primary EBV infection might account for the incidence of virus-positive HD cases in children, the association of EBV with the tumour in older patients could reflect increased EBV activity as a result of failing T-cell immunity. In this respect, the overall incidence of HD is only marginally increased in AIDS patients, but the majority of HD tumours arising in AIDS patients are EBV associated (Uccini et al., 1990).

These age groups seem to be important when considering the impact of EBV status on the outcome for HD patients. Thus, in young adults, there seems to be a marginal prognostic advantage when patients carry the EBV genome in their tumour (Glavina-Durdov et al., 2001). Yet in older adults (Clarke et al., 2001) and in children (Flavell et al., 2001) the presence of EBV in the tumour carries a poorer prognosis. Thus, the influence of EBV on survival in HD might reflect differences at these different ages in the balance between the immune response to the virus-infected tumour cells and the oncogenic capacity of the virus. EBV presence might also generally reflect a poor immune status, which in turn means that patients might tolerate disease and its treatment less well.

LMP1 has important effects in B cells and its high-level expression in EBV-infected HRS cells suggests that it is likely to be responsible for some of the observed differences between the virus-positive and virus-negative forms of HD. Studies that have attempted to show a correlation between LMP1 and expression of many of the genes known to be upregulated by LMP1 in vitro have shown that such differences do exist in primary HRS cells. For example, certain LMP1-regulated genes, such as IL-10 and TRAF1, are more highly expressed in EBV-positive compared with EBV-negative HD (Herbst et al., 1996; Durkop et al., 1999; Murray et al., 2001).

Although EBV-positive and EBV-negative HD might represent similar morphological end points, the means by which this is achieved may well be different. Constitutive NF-κB activation has been consistently detected in HRS cells (Bargou et al., 1996) and nuclear NF-κB expression can be observed in HRS cells by immunohistochemistry in both EBV-positive and EBV-negative tumours (Murray, unpublished). Inhibition of NF-κB activity in HD cell lines leads to their increased sensitivity to apoptosis after growth factor withdrawal and impaired tumorigenicity in severe combined immunodeficiency (SCID) mice (Bargou et al., 1997). Although NF-κB activation is a common feature of HRS cells, the molecular routes to this activation might be different between EBV-positive and EBV-negative HD. Thus, by single cell PCR of HRS cells, Jungnickel et al. (2000) detected clonal mutations in the IκBα gene in 2/3 cases of EBV-negative HD, but no such defects were detected in the two EBV-positive cases examined. This suggests that the constitutive activation of NF-κB by LMP1 in EBV-positive HRS cells might be substituted by IκBα gene mutations in HRS cells not infected by EBV. Further studies on larger numbers of cases are required to substantiate this.

Although EBV normally persists throughout the course of HD and is also found in multiple sites of HD (Coates et al., 1991), EBV-negative HD arising as a relapse of a formerly EBV-positive HD tumour has been reported (Nerurkar et al., 2000). The possibility that EBV may contribute to tumorigenesis by a ‘hit and run’ mechanism, similar to that already described for BL, has prompted the search for evidence of defective rearranged EBV DNA in tumours that, by conventional testing (e.g. using EBER in situ hybridization), are ostensibly virus negative. The results of these studies are somewhat conflicting. In one study, Gan et al. (2002) searched for the presence of defective rearranged EBV genomes in HD using both standard PCR and PCR in situ hybridization. They successfully amplified sequences that span abnormally juxtaposed BamHI W and Z fragments (which characterize defective heterogeneous EBV DNA) from two out of 24 EBER-negative tumours in which the standard viral genome could not be detected. However, in another study, fluorescence in situ hybridization (FISH) analysis found no evidence of integrated EBV genomes in EBV-negative HD tumours (Staratschek-Jox et al., 2000).

T-cell lymphomas

EBV has been linked to variable proportions of T-cell NHLs; of these, a very high incidence of EBV genomes has been reported in sinonasal T-NHLs occurring in Japanese, Chinese, Peruvian, European and United States patients (Niedobitek and Young, 1997; Rickinson and Kieff, 2001). Sinonasal T-NHLs display peculiar phenotypic and genotypic features, including the frequent absence of T-cell antigens, expression of NK cell markers and the absence of T-cell receptor gene rearrangements.

An intriguing aspect of EBV-positive T-cell lymphomas is the frequent detection of the virus in only a fraction (5–50%) of the tumour cells implying that EBV infection might have occurred subsequent to tumour development (Niedobitek and Young, 1997). The documented increase in the proportion of EBV-positive tumour cells with T-cell lymphoma progression or recurrence suggests that the virus might provide an additional growth/survival advantage to the transformed T cells.

Most EBV-associated T-NHLs are extranodal and have a cytotoxic phenotype, as demonstrated by immunohistochemical staining for T-cell intracytoplasmic antigen-1 (TIA-1) and granzyme B (Brink et al., 2000), suggesting that these tumours might arise following EBV infection of CTLs during the killing of EBV-infected cells by virus-specific CTLs. Interestingly, EBV-positive B cells are frequently detectable in some EBV-negative T-cell lymphomas, and in contrast to the EBV-positive small lymphocytes detectable in UNPC or HD, these cells display a Lat III phenotype, suggesting that the presence of the neoplastic T cells might be a stimulus for EBV-induced B-cell transformation (Niedobitek et al., 2000). A further possibility is that the EBV-infected B cells present in T-cell lymphomas might contribute to the growth of the neoplastic T cells, possibly by the secretion of cytokines or perhaps more directly by interaction of their costimulatory molecules with partner molecules on T cells.

Nasopharyngeal carcinoma

The tumour showing the most consistent worldwide association with EBV is the undifferentiated form of nasopharyngeal carcinoma (WHO type III UNPC). UNPC is characterized by the presence of undifferentiated carcinoma cells together with a prominent lymphocytic infiltrate; the latter is believed to be important for the growth of the tumour cells. A link between EBV and UNPC was suggested as early as 1966 on the grounds of serological studies, and substantiated later by the demonstration of EBV DNA and the EBNA complex in the tumour cells of UNPCs using in situ hybridization and the anticomplement immunofluorescence (ACIF) assay (zur Hausen et al., 1970). Southern blot hybridization of DNA from UNPC tissues revealed monoclonality of the resident viral genomes, suggesting that EBV infection had taken place before clonal expansion of the malignant cell population (Raab-Traub and Flynn, 1986).

UNPC is particularly common in areas of China and South-East Asia, reaching a peak incidence of around 20–30 cases per 100 000. Incidence rates are also high in individuals of Chinese descent irrespective of where they live, and particularly in Cantonese males. In addition to this genetic predisposition, environmental cofactors such as dietary components (i.e. salted fish) are thought to be important in the aetiology of NPC (Yu et al., 1986). Experimental evidence to support a role for nitrosamines as a constituent of salted fish in NPC development is provided by the observation that albino rats fed on a diet of salted fish develop carcinoma of the nasal cavity (Huang et al., 1978).

Extensive serological screening has identified elevated EBV-specific antibody titres in high-incidence areas; in particular, IgA antibodies to EBV capsid antigen (VCA) and early antigens (EA) have proved useful in diagnosis and in monitoring the effectiveness of therapy (Zeng, 1985).

EBNA1 and the EBERs (Figure 3a) are expressed in all EBV-positive cases and LMP1 is present in up to approximately 65% of cases (Young et al., 1988; Niedobitek et al., 1992), although the detection of LMP1 in a given tumour is partly dependent on the sensitivity of the method used: RT-PCR identifies more positive cases than immunohistochemistry (Figure 3b). PCR studies have also revealed expression of LMP2A, though the LMP2A protein has yet to be detected in NPC tumours (Brooks et al., 1992).

Figure 3. Detection of viral gene expression in EBV-associated epithelial malignancies. Panel (a) shows typical EBER expression in an undifferentiated NPC tumour, showing brown staining of malignant nuclei by in situ hybridization for the EBERs. The surrounding abundant lymphocytic infiltration characteristic of this tumour is not stained and appears blue. Panel (b) shows LMP1 expression from the same tumour. However, unlike HD, not all cases of UNPC express LMP1. Expression of LMP2 at the protein level has yet to be demonstrated in UNPC, although LMP2 transcription can be detected. Panel (c) shows a poorly differentiated EBV-positive gastric adenocarcinoma. EBV-positive carcinoma cells are highlighted by detection of the EBERs (brown staining). Note that these tumours also often have a prominent lymphocytic infiltrate.

The association of EBV with the other two forms (WHO types I and II) of NPC is controversial. Viral DNA is detectable in extracts from squamous cell NPCs by Southern blot hybridization (Raab-Traub et al., 1987), though the clonality of the viral episomes could not be ascertained in these cases. Most in situ hybridization studies have failed to detect EBV DNA or the EBERs in squamous cell NPC, and PCR only identifies EBV DNA in a small proportion of squamous cell NPCs, suggesting that EBV is present only in reactive B lymphocytes in these lesions. However, one report has demonstrated the expression of the EBERs in all of 31 squamous NPCs (Pathmanathan et al., 1995).

Undifferentiated carcinomas of nasopharyngeal type

Carcinomas with features similar to UNPC have been described at other sites including the thymus, tonsils, lungs, stomach, skin or uterine cervix, and are often referred to as undifferentiated carcinomas of nasopharyngeal type (UCNT) or ‘lymphoepitheliomas’. The morphological similarities of UCNTs to UNPCs prompted several groups to examine such cases for the presence of EBV. UCNTs of the stomach are consistently EBV positive (Shibata et al., 1991), whereas the association of the other UCNTs with EBV is less strong. EBV has been demonstrated in thymic epithelial tumours from Chinese but not Western patients (Fujii et al., 1993). Salivary gland UCNTs are EBV associated in Greenland Eskimos and Chinese but not in Caucasian patients (Raab-Traub et al., 1991), and several case reports have demonstrated the absence of EBV from UCNTs arising in the uterine cervix and breast (Weinberg et al., 1993; Dadmanesh et al., 2001).

Gastric carcinoma

EBV is also found in a small proportion of more typical gastric adenocarcinomas (Figure 3c) of either diffuse or intestinal type (Shibata and Weiss, 1992; Rowlands et al., 1993; Tokunaga et al., 1993). However, the histological features of these more typical differentiated gastric carcinomas can show heterogeneity, with some cases having lymphoepithelioma-like tumour growth next to adenocarcinoma tumour (Rowlands et al., 1993; Gulley et al., 1996), making the histological distinction between tumours more complicated. Immunohistochemical studies of virus-associated gastric carcinomas (including both UCNTs and adenocarcinomas) have shown a restricted pattern of expression limited to the EBERs, EBNA1 and BZLF1, but not LMP1 or the other EBNAs (Imai et al., 1994; Sugiura et al., 1996).

EBV-positive gastric cancers are more common in males than in females (Shibata and Weiss, 1992; Tokunaga et al., 1993), less likely to be found in the gastric antrum than in the cardia or body of the stomach (Tokunaga et al., 1993; Yuen et al., 1994; Galetsky et al., 1997), and more likely to be poorly differentiated tumours (Yuen et al., 1994; Gulley et al., 1996; Shin et al., 1996; Galetsky et al., 1997). Less consistent associations have been found for age, with some studies finding more EBV-positive tumours in older people, 60+ years (Qiu et al., 1997) or 56+ years (Gulley et al., 1996). Recently, it has been shown that EBV-positive gastric carcinoma is more common in tumours from Texas Hispanics than from non-Hispanic whites or African Americans (Vo et al., 2002), reinforcing the long-held view, which surprisingly has never been properly tested, that the EBV association in gastric carcinoma varies with ethnicity.

Is EBV associated with other common epithelial malignancies?

Detection of the EBERs by in situ hybridization has become the standard method to detect EBV infection in routinely processed tumour tissues. Although the EBERs are thought to be expressed in all forms of latency, two studies have suggested the possibility of EBER-negative forms of latency and that such forms of latency might exist in hitherto unrecognized EBV-associated malignancies. In the first of these, the detection of EBV in a proportion of classical breast tumours was reported by PCR, immunohistochemistry for EBNA1 protein, and Southern blotting (Bonnet et al., 1999). However, EBER expression was not detectable by in situ hybridization. EBV was also detected more frequently in breast tumours that were hormone-receptor negative and of high histological grade. In the second study, EBV was reported in a series of hepatocellular carcinomas (HCC) again in the absence of EBER expression (Sugawara et al., 1999). Furthermore, a single-terminal fragment of EBV DNA was identified in these tissues, suggesting that the EBV-infected cells in HCC represent clonal proliferations. Western blotting and reverse transcription–polymerase chain reaction also demonstrated expression of EBNA1 and the BamHI A transcripts. More recently, the results of Grinstein et al. (2002) using the monoclonal antibody 2B4-1 and whole section-based PCR seemed to suggest that EBV is also present in a proportion of other common epithelial malignancies, including those of the lung, colon and prostate. Shimakage et al. (2002) examined EBV presence in oral squamous cell carcinoma using RNA ISH for EBERs. They also seemingly detected EBNA2 and LMP1 expression in these lesions. Gonzalez et al. (2002) detected EBV DNA by PCR in one-fifth of oral cancers and expression of LMP1 in almost all EBV-positive cases. Intriguingly, EBV presence was significantly more frequent in cancers of the lateral tongue (interestingly also the site for oral hairy leucoplakia) and was associated with greater nuclear atypia.

Clearly, further studies are required to substantiate these findings. In our opinion, definitive designation of a tumour as ‘EBV associated’ should require unequivocal demonstration of the EBV genome or virus gene products within the tumour cell population. Unfortunately, much of the methodology used to detect EBV in many of these studies (where EBER expression is presumed to be undetectable) has involved either analysis of whole tumour sections or antibody reagents that lack specificity. Future studies using robust methodologies will be required to establish whether HCC, breast cancer or any of the other common epithelial neoplasms described above are truly EBV associated.


Identification of the HD-zip gene family in wheat

Wheat genome data used in this study were downloaded from the Chinese Spring IWGSC RefSeq v1.1 reference genome assembly (https://wheat-urgi.versailles.inra.fr/). We first converted the wheat genome into a local BLAST database using a UNIX pipeline. Then, we used 90 Arabidopsis and rice HD-Zip protein sequences to perform a BLAST search (BLASTP) against this local database with an E-value cut-off of <1e-10. After removing all redundant sequences with the CD-HIT program, the remaining protein sequences were screened for the HD domain and LZ motif using the Simple Modular Architecture Research Tool (SMART NORMAL = 1). In a recent study, a total of 46 HD-Zip genes were identified in wheat by a genome-wide bioinformatic survey [35]. In this study, we identified 67 additional HD-Zip genes in the latest wheat genome, extending the total membership to 113. Based on genomic position information, the 113 HD-Zip genes were located across all 21 wheat chromosomes, with 3 to 8 genes per chromosome. Chromosomes 5A/B/D have the most HD-Zip genes (24 total, 8 per chromosome), followed by chromosomes 4A/B/D (18 total, 6 per chromosome) (Table 1; Additional file 1: Figure S1). According to their phylogenetic relationships, the 113 HD-Zip proteins were grouped into 40 homoeologous clusters, and the members in each of 39 clusters were assigned to the A, B or D sub-genomes. Finally, we designated wheat HD-Zip genes as TaHDZX_ZA, TaHDZX_ZB, or TaHDZX_ZD, where X denotes the gene number and Z the wheat chromosome where it is located. The detailed information on HD-Zip family genes in wheat, including the nomenclature proposed in the previous study [35], is listed in Table 1. As shown in Table 1, the identified HD-Zip genes in wheat encode proteins ranging from 192 (TaHDZ12-6D) to 890 (TaHDZ35-1B) amino acids (aa) in length, with an average of 501 aa.
Furthermore, the computed molecular weights of these HD-Zip proteins ranged from 20.88 (TaHDZ12-6D) to 96.02 (TaHDZ35-1B) kDa. The theoretical pI of the deduced HD-Zip proteins ranged from 4.59 (TaHDZ5-6A) to 9.79 (TaHDZ12-6D).
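The E-value filtering step of the search described above can be illustrated with a minimal sketch. It assumes the BLASTP hits were written in the 12-column tabular layout (BLAST+ `-outfmt 6`), which the text does not state; the function name `filter_hits` and all sequence identifiers are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-10  # cut-off stated in the text


def filter_hits(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return the set of subject IDs with at least one hit below the cutoff.

    Assumes 12-column tabular BLAST output, where column 2 is the
    subject ID and column 11 is the E-value.
    """
    kept = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        subject_id, e_value = row[1], float(row[10])
        if e_value < cutoff:
            kept.add(subject_id)
    return kept


# Hypothetical hits: query and subject identifiers are invented for illustration.
example = (
    "AtHB1\tTraesCS_demo1\t62.1\t120\t40\t2\t1\t120\t5\t124\t3e-45\t160\n"
    "AtHB1\tTraesCS_demo2\t35.0\t80\t50\t1\t1\t80\t10\t89\t2e-04\t40.1\n"
    "OsHOX4\tTraesCS_demo1\t70.3\t118\t33\t1\t1\t118\t5\t122\t1e-52\t181\n"
)
candidates = filter_hits(example)
print(sorted(candidates))
```

Collecting subject IDs in a set removes duplicate hits from different queries, but it is not a substitute for the sequence-level redundancy removal that the text attributes to CD-HIT.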

Phylogenetic analysis of HD-zip gene family

Our study aimed to understand the phylogenetic relationships between plant HD-Zip proteins. We began by identifying HD-Zip genes from seven other plant species with varying levels of complexity for which entire genomes were accessible, including Chlamydomonas reinhardtii, Physcomitrella patens, the monocotyledonous angiosperms Brachypodium distachyon, Oryza sativa, and Zea mays, and the dicotyledonous angiosperms Arabidopsis thaliana, Populus trichocarpa, and Vitis vinifera. From this analysis, we found that the HD-Zip gene family seems to be restricted to land plants: all genomes except that of the alga contained genes for HD-Zip proteins. We then analyzed their evolutionary relationships using full-length HD-Zip proteins from eight land plant species to construct a neighbour-joining phylogenetic tree. Accordingly, the phylogenetic tree was divided into four well-conserved subfamilies, designated HD-Zip I to IV (Fig. 1a). The phylogenetic tree also revealed the species-biased distribution of these plant HD-Zip proteins (Fig. 1b). HD-Zip I was the largest subfamily in the plant species examined, except for Brachypodium distachyon and wheat, where HD-Zip II and IV were the largest, respectively. In contrast, the HD-Zip III subfamily comprised the fewest HD-Zip members, except in moss (Fig. 1c).
Subfamily I included 31 TaHDZ genes, grouped into 11 clusters (TaHDZ1-4A/B/D, TaHDZ2-5A/B/D, TaHDZ3-4A/B/D, TaHDZ4-5A/B/D, TaHDZ5-6A/D, TaHDZ6-5A/B/D, TaHDZ7-2A/B/D, TaHDZ8-6A/B/D, TaHDZ9-4A/B/D, TaHDZ10-2B/D, and TaHDZ11-2A/B/D). Similarly, subfamily II contained 31 TaHDZs, grouped into 12 clusters (TaHDZ12-6A/B/D, TaHDZ13-6A/B/D, TaHDZ14-7A/B/D, TaHDZ15-1A/B/D, TaHDZ16-4B/D, TaHDZ17-3B/D, TaHDZ18-5A/B/D, TaHDZ19-3A/B/D, TaHDZ20-1A/B/D, TaHDZ21-2A/B/D, TaHDZ22-4A, and TaHDZ23-7A/D). Subfamily III was the smallest and contained 14 TaHDZs, grouped into 5 clusters (TaHDZ24-3A/B/D, TaHDZ25-1A/B/D, TaHDZ26-4B/D, TaHDZ27-5A/B/D, and TaHDZ28-5A/B/D). Subfamily IV contained 36 TaHDZs, grouped into 12 clusters (TaHDZ29-3A/B/D, TaHDZ30-4A/B/D, TaHDZ31-5A/B/D, TaHDZ32-3A/B/D, TaHDZ33-6A/B/D, TaHDZ34-7A/B/D, TaHDZ35-1A/B/D, TaHDZ36-6A/B/D, TaHDZ37-2A/B/D, TaHDZ38-5A/B/D, TaHDZ39-7A/B/D, and TaHDZ40-2A/B/D) (Table 1).

Phylogeny and distribution of HD-Zip proteins from eight plant species. a Phylogenetic tree of HD-Zip proteins from Arabidopsis, Populus, Vitis, wheat, rice, maize, Brachypodium, and moss. Phylogeny was constructed by PhyML using maximum likelihood analysis. Bootstrap support values, as percentages, are shown on selected major branches. The bar indicates substitutions per site. b Percentage representation of HD-Zips across the eight plant species within each subfamily. c Percentage representation of distributions for HD-Zips within each plant species

To clarify the paralog and ortholog relationships of wheat HD-Zip members, we further divided each subfamily into subclasses. According to this reshaped phylogenetic tree (Fig. 2), each subfamily contains HD-Zip proteins from Arabidopsis, rice, and wheat, suggesting that these subfamilies appeared before the dicot-monocot split. Consistent with the nomenclature in previous studies of Arabidopsis and rice [36], the HD-Zip I subfamily was divided into seven subclasses, i.e., α, β, γ, δ, ε, φ and ζ (Fig. 2). Clades ε and φ contain only sequences from Arabidopsis. Clade ζ contains sequences from both rice and wheat, with no members in Arabidopsis, suggesting gene loss in Arabidopsis during the long period of evolution of this group. The HD-Zip II subfamily was divided into ten subclasses, from α to κ, according to Hu et al. (2012) [37]. Clade β contains only sequences from Arabidopsis. Clades α and γ contain sequences from rice, wheat, and Arabidopsis, while the other clades contain only sequences from rice and wheat. The HD-Zip III subfamily was divided into only three subclades, designated clades α, β and γ, consistent with previous studies [37]. Each clade contains sequences from rice, wheat, and Arabidopsis. The HD-Zip IV subfamily was divided into six subclades, designated clades α, β, γ, δ, ε and ζ as in a previous study [37]. Clade δ excluded genes from rice and Arabidopsis, while clade ζ included only sequences from rice and wheat. Eudicot- and monocot-specific clustering patterns of HD-Zip genes emerged when tree topology was examined. This pattern may reflect the evolutionary history of these subgroups: HD-Zip genes in eudicots were likely retained after they diverged from monocots and then expanded.

The phylogenetic tree of HD-Zip proteins from wheat, Arabidopsis, maize and rice. Members of the HD-Zip genes from wheat are marked in red. Two-letter prefixes for sequence identifiers indicate species of origin: Ta, Triticum aestivum; At, Arabidopsis thaliana; Os, Oryza sativa; Zm, Zea mays. The tree was constructed using the Neighbor-Joining algorithm with 1000 bootstrap replicates based on the full-length sequences of HD-Zip proteins. The HD-Zip proteins are grouped into four distinct groups

Gene structure and motif composition analysis

Exon-intron structural divergence can play an important role in the evolution of multiple gene families [38]. We constructed a phylogenetic tree using only the 113 full-length wheat HD-Zip protein sequences to further examine patterns in wheat. We found that wheat HD-Zip proteins also fell into the four subfamilies described previously (Fig. 3a). We further mapped the exon/intron organization in the coding regions of each TaHDZ gene. Specifically, 21 TaHDZ genes had two introns, 28 had three introns, 15 had four introns, two had five introns, two had seven introns, 11 had eight introns, 12 had nine introns, three had 10 introns, five had 11 introns, two had 15 introns, two had 16 introns, and 10 had 18 introns (Fig. 3b, c). In general, orthologous genes are highly conserved with respect to gene structure, and this conservation is sufficient to reveal their evolutionary relationships [38]. In wheat, HD-Zip genes within the same subfamily shared similar gene structures (intron number and exon length), especially the members of the HD-Zip I and HD-Zip III subfamilies: HD-Zip I genes mainly had two or three introns in their gene regions, and HD-Zip III genes mainly had 18 introns. However, the exon/intron compositions of HD-Zip II and IV genes were more variable: HD-Zip II members possessed two to four introns, and the number of introns in HD-Zip IV family members varied from 4 to 11 (Fig. 3b, c).
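As a quick consistency check on the intron counts listed above, the distribution can be tallied to confirm that it accounts for all 113 genes. This is only an arithmetic sketch of the figures quoted in the text; the variable names are ours.

```python
# Intron count -> number of TaHDZ genes, exactly as listed in the text.
intron_distribution = {
    2: 21, 3: 28, 4: 15, 5: 2, 7: 2, 8: 11,
    9: 12, 10: 3, 11: 5, 15: 2, 16: 2, 18: 10,
}

total_genes = sum(intron_distribution.values())
two_or_three = intron_distribution[2] + intron_distribution[3]
mean_introns = sum(n * genes for n, genes in intron_distribution.items()) / total_genes

print(f"genes tallied: {total_genes}")
print(f"genes with two or three introns: {two_or_three}")
print(f"mean introns per gene: {mean_introns:.2f}")
```

The tally sums to 113, matching the number of genes analysed, and shows that genes with two or three introns make up the largest share of the family.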

Phylogenetic relationships and gene structures of wheat HD-Zip genes. a Phylogenetic tree of 113 full-length HD-Zip proteins from wheat, constructed with MEGA 6.0 using the Neighbour-Joining (NJ) method with 1000 bootstrap replicates. b Exon/intron structures of wheat HD-Zip genes. Exons and introns are represented by purple boxes and grey lines, respectively. c The distribution of intron numbers among the four HD-Zip subfamilies of wheat

The allohexaploid bread wheat genome is known to have formed by fusion of the T. urartu (subgenome A), Aegilops speltoides (subgenome B), and A. tauschii (subgenome D) genomes at least several hundred thousand years ago. A majority (60.1–61.3%) of genes in the A, B, and D sub-genomes have orthologs in all the related diploid genomes. To better understand intron gain or loss among homoeologous TaHDZ genes in wheat, we compared the intron/exon structures of TaHDZ genes that clustered together in the phylogenetic tree. Among these, fourteen clusters showed changes in their intron/exon structure, including TaHDZ1-4A/B/D, TaHDZ3-4A/B/D, TaHDZ5-6A/D, TaHDZ10-2B/D, TaHDZ12-6A/B/D, TaHDZ20-1A/B/D, TaHDZ24-3A/B/D, TaHDZ25-1A/B/D, TaHDZ30-4A/B/D, TaHDZ32-3A/B/D, TaHDZ35-1A/B/D, TaHDZ38-5A/B/D, TaHDZ39-7A/B/D, and TaHDZ40-2A/B/D (Fig. 3b). Because there are many orthologs in the wheat A, B, and D sub-genomes, intron gain/loss in these orthologs significantly increases the transcriptome and proteome complexity of wheat.

To further examine the diverse structures of wheat HD-Zip proteins, conserved motifs were identified by searching the SALAD database, with subsequent annotation using InterPro (Additional file 2: Figure S2). Seven of these motifs were found to be associated with functionally defined domains. Motifs 1 and 2 corresponded to the HD domain, the typical conserved domain found in the middle of all the TaHDZ proteins, and motif 5 was associated with the adjacent LZ domain. Motifs 17 and 34 specifically made up the MEKHLA domain in subfamily III proteins of wheat (14 members). Motifs 3 and 4 were associated with the START region, which has been identified in subfamily III and IV proteins (Additional file 2: Figure S2). TaHDZ proteins that cluster together share similar motif compositions, which suggests that members of a given group have similar functions.

Tissue-specific expression profile of TaHDZ genes

Gene family members can exhibit different expression patterns in different tissues to accommodate various physiological processes. To gain insight into the temporal and spatial expression patterns and putative functions of HD-Zip genes in wheat growth and development, the tissue-specific expression patterns of the 113 TaHDZ genes were investigated using RNA-seq data from 10 different tissues. All TaHDZ genes were found to be expressed in at least one of the tissues examined (Fig. 4; Additional file 3: Table S1). Subfamily I TaHDZ genes were found to be much more highly expressed in seedling roots, stems, leaves, flag leaves, young spikes, and 5-day-old grains; for example, TaHDZ1-4A/B/D are highly expressed in leaves and 5-day-old grains, TaHDZ8-6A/B/D are highly expressed in leaves and young spikes (15-days-old), and TaHDZ11-2A/B/D are highly expressed in leaves and 5-day-old spikes (Fig. 4; Additional file 4: Figure S3). Subfamily II TaHDZ genes are more highly expressed in seedling roots, stems, leaves, flag leaves, and young spikes; for example, TaHDZ19-3A/B/D are highly expressed in young spikes, while TaHDZ20-1A/B/D are highly expressed in seedling stems, leaves, and 5-day-old spikes (Fig. 4; Additional file 5: Figure S4). Subfamily III TaHDZ genes showed relatively higher expression levels in seedling stems, leaves, and young spikes; TaHDZ24-3A/B/D are highly expressed in seedling leaves, and TaHDZ27-5A/B/D are highly expressed in seedling stems and leaves (Fig. 4; Additional file 6: Figure S5). Subfamily IV TaHDZ genes are highly expressed in seedling stems, young spikes, and grains; TaHDZ29-3A/B/D are highly expressed in 10-day-old grains, TaHDZ32-3A/B/D are highly expressed in 5–20 day-old grains, and TaHDZ38-5A/B/D are highly expressed in seedling stems and young spikes (Fig. 4; Additional file 7: Figure S6).
Thus, genes in the four wheat HD-Zip subfamilies display obvious differences in expression patterns and levels, which indicates that these genes have undergone functional differentiation and redundancy. It is worth mentioning that most homologous genes show similar expression patterns during development. However, it should also be noted that many clustered expression profiles do not reflect gene similarities, and this includes the copies of individual HD-Zip gene types from the sub-genomes. Some of them even show opposite expression patterns. For instance, TaHDZ7, which is located on chromosome 2D, is preferentially expressed in the seedling leaves and flag leaves, whereas the homologous TaHDZ7 gene from 2A is only expressed in the flag leaves, and the TaHDZ7 homolog from 2B is preferentially expressed in flag leaves and 5-day-old spikes (Fig. 4; Additional file 4: Figure S3). TaHDZ37 on 2A shows relatively higher expression in 10–15 day-old grains, while its homolog from 2B is preferentially expressed in seedling leaves and 20-day-old grains, and the homolog from 2D is highly expressed in 15-day-old grains (Fig. 4; Additional file 7: Figure S6). The divergence in expression profiles between homologous genes from the different subgenomes suggests that some of them may have lost their function or acquired a new function after polyploidization during the evolution of wheat.

Expression profiles of TaHDZ genes in ten different organs or tissues. The heat map was drawn using Log10-transformed expression values. The red or green colors represent the higher or lower expression level of each transcript in each sample. R, root of wheat seedling at five-leaf stage; S, stem of wheat seedling at five-leaf stage; L, leaf of wheat seedling at five-leaf stage; FL, flag leaf at heading stage; YS5, young spike at early booting stage; YS15, spike at heading stage; GR5, grain of 5 days post-anthesis; GR10, grain of 10 days post-anthesis; GR15, grain of 15 days post-anthesis; GR20, grain of 20 days post-anthesis

Expression patterns of TaHDZ genes in response to drought stress

Wheat productivity is severely affected by drought stress, and therefore the study of drought-responsive genes is important for increasing wheat yield. Many studies have shown that the HD-Zip genes play a crucial role in the response to abiotic stresses in plants. To gain more insight into the roles of wheat HD-Zip genes in stress tolerance, we first identified the cis-elements within the 2 kb promoter region using the online program PlantCARE. We found a number of cis-acting elements related to stress response in the promoters of TaHDZs. They included DRE (dehydration-responsive element), ABRE (ABA-responsive element), MBS (MYB binding site involved in drought-inducibility), MYC (MYC recognition site), MYB (MYB recognition site), and LTR (low temperature responsive element) (Additional file 8: Table S2). To further understand the potential role of TaHDZ genes in the drought stress response, we reanalyzed the expression profiles of all wheat HD-Zip genes using RNA-seq data from roots and leaves that were subjected to drought treatment. We found that the wheat HD-Zip genes could be mainly classified into two groups based on their expression patterns (Fig. 5a, b; Fig. 6a, b). In leaves, the expression levels of 45 TaHDZ genes were up-regulated at one or more time points during drought stress treatment; these included 20 genes from the HD-Zip I subfamily (TaHDZ2-5A/B/D, TaHDZ4-5A/B/D, TaHDZ5-6A/D, TaHDZ6-5A/B/D, TaHDZ7-2A/B/D, TaHDZ8-6A/B/D, TaHDZ9-4B/D, and TaHDZ11-2D), 19 genes from the HD-Zip II subfamily (TaHDZ18-5A/B/D, TaHDZ20-1A/B, TaHDZ16-4A/B/D, TaHDZ12-6A/D, TaHDZ13-6A/B/D, TaHDZ14-7A/B, TaHDZ15-1A/B/D, and TaHDZ17-3D), one gene from the HD-Zip III subfamily (TaHDZ24-3A), and five genes from the HD-Zip IV subfamily (TaHDZ29-3A, TaHDZ30-4B, TaHDZ31-5D, TaHDZ37-2A/B) (Fig. 5a, c, and d).
In contrast, 50 TaHDZ genes showed down-regulated expression under drought stress, including seven genes from subfamily I, six genes from subfamily II, 12 genes from subfamily III, and 25 genes from subfamily IV (Fig. 5a, c, d). In roots, 34 TaHDZ genes were found to be up-regulated in response to drought stress, including 16 genes from subfamily I (TaHDZ4-5A/B/D, TaHDZ6-5A/B, TaHDZ7-2A/B/D, TaHDZ8-6A/B/D, TaHDZ9-4B/D, and TaHDZ11-2A/B/D), 16 genes from subfamily II (TaHDZ15-1A/B/D, TaHDZ16-4A/B/D, TaHDZ17-3B, TaHDZ19-3A/B/D, TaHDZ20-1A/B/D, TaHDZ21-2A/B, and TaHDZ22-4A) and two genes from subfamily IV (TaHDZ37-2B and TaHDZ40-2B) (Fig. 6a, c, d). In contrast, 51 TaHDZ genes were down-regulated under drought stress in roots, including 12 genes from subfamily I, 8 genes from subfamily II, 13 genes from subfamily III, and 18 genes from subfamily IV (Fig. 6a, c, d). These results indicate that most TaHDZ genes in subfamilies I and II may play important roles in the response to drought stress.
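The per-subfamily counts reported above can be summarised as the share of drought-responsive genes that were up-regulated in each tissue. The sketch below uses only the counts given in the text; the zero for subfamily III in roots is inferred from the absence of any listed gene, and the fraction computed here is our own summary, not necessarily the ratio plotted in panel d of Figs. 5 and 6.

```python
# Up- and down-regulated TaHDZ gene counts per subfamily, from the text.
# The 0 for subfamily III in roots is inferred: no such gene is listed.
counts = {
    "leaves": {
        "up": {"I": 20, "II": 19, "III": 1, "IV": 5},
        "down": {"I": 7, "II": 6, "III": 12, "IV": 25},
    },
    "roots": {
        "up": {"I": 16, "II": 16, "III": 0, "IV": 2},
        "down": {"I": 12, "II": 8, "III": 13, "IV": 18},
    },
}


def up_fraction(tissue, subfamily):
    """Fraction of responsive genes in a subfamily that were up-regulated."""
    up = counts[tissue]["up"][subfamily]
    down = counts[tissue]["down"][subfamily]
    return up / (up + down)


for tissue, by_direction in counts.items():
    total_up = sum(by_direction["up"].values())
    total_down = sum(by_direction["down"].values())
    print(f"{tissue}: {total_up} up, {total_down} down")
    for subfamily in ("I", "II", "III", "IV"):
        print(f"  HD-Zip {subfamily}: {up_fraction(tissue, subfamily):.2f} up-regulated")
```

In both tissues the up-regulated share is well above one half for subfamilies I and II and well below one half for subfamilies III and IV, which is the pattern behind the conclusion drawn in the text.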

Expression profiles of TaHDZ genes in seedling leaves under drought stress treatment. a hierarchical clustering of the relative expression level of TaHDZ genes under drought stress treatment. The heat map was drawn in Log10-transformed expression values. The red or green colors represent the higher or lower relative abundance of each transcript in each sample. b Expression patterns of TaHDZ genes under drought stress treatment. c The numbers of up-regulated and down-regulated TaHDZ genes in four HD-Zip subfamilies. d The ratios of up-regulated and down-regulated TaHDZ genes in four HD-Zip subfamilies

Expression profiles of TaHDZ genes in seedling roots under drought stress treatment. a Hierarchical clustering of the relative expression level of TaHDZ genes under drought stress treatment. The heat map was drawn in Log10-transformed expression values. The red or green colors represent the higher or lower relative abundance of each transcript in each sample. b Expression patterns of TaHDZ genes under drought stress treatment. c The numbers of up-regulated and down-regulated TaHDZ genes in four HD-Zip subfamilies. d The ratios of up-regulated and down-regulated TaHDZ genes in four HD-Zip subfamilies

TaHDZ5-6A confers drought tolerance in Arabidopsis

The phylogenetic analysis and gene expression profiles suggest that TaHDZ5-6A/D may participate in regulating the drought stress response in wheat. Protein sequence analysis revealed that TaHDZ5-6A and TaHDZ5-6D share 95% sequence similarity (Additional file 9: Figure S7). To further confirm the potential role of TaHDZ5 in the drought stress response, we performed quantitative real-time PCR (qRT-PCR) using RNA isolated from different tissues and drought conditions. The PCR primers were designed to amplify the homologous alleles of TaHDZ5. The results showed that TaHDZ5 is expressed at higher levels in the seedling leaves, flag leaves and young spikes, with the highest expression detected in the seedling leaves, and that TaHDZ5 was upregulated by drought stress throughout the testing period (Additional file 10: Figure S8). To further investigate the role of TaHDZ5 in the drought stress response, we generated 35S::TaHDZ5-6A transgenic Arabidopsis lines. Three independent transgenic lines (OE1, OE2, and OE3) were chosen for analysis based on their TaHDZ5-6A expression levels (Fig. 7a). WT and 35S::TaHDZ5-6A transgenic plants were grown for 3 weeks in soil before water was withheld for 14 d. There were no obvious phenotypic differences between 35S::TaHDZ5-6A transgenic and WT plants under normal conditions (Fig. 7c). After the drought treatment and six days of rewatering, 72–88% of the 35S::TaHDZ5-6A plants had survived, whereas only 8% of the WT plants were alive (Fig. 7b, c). Thus, the ectopic expression of TaHDZ5-6A greatly improved drought tolerance in transgenic Arabidopsis.

Phenotype of the 35S::TaHDZ5-6A transgenic Arabidopsis. a RT-PCR analysis of TaHDZ5-6A transcript levels in the three transgenic lines. b Statistical analysis of survival rates after the drought-stress treatment. The average survival rates and standard errors were calculated based on data obtained from three independent experiments. Significant differences were determined by a t-test. *P < 0.05, **P < 0.01. c Drought tolerance of 35S::TaHDZ5-6A transgenic Arabidopsis. Photographs were taken before and after the drought treatment, followed by a six-day period of re-watering. d Stomatal aperture of WT and 35S::TaHDZ5-6A transgenic plants under normal and drought conditions. e Statistical analysis of stomatal aperture of WT and 35S::TaHDZ5-6A transgenic plants. Values are mean ratios of width to length. Error bars represent standard errors of three independent experiments (n = 60). Bars, 10 μm. f Water loss from detached rosettes of WT and 35S::TaHDZ5-6A transgenic plants. Water loss was expressed as the percentage of initial fresh weight. Values are means from eight plants for each of three independent experiments. Significant differences were determined by a t-test. *P < 0.05, **P < 0.01. g Free proline content of WT and 35S::TaHDZ5-6A transgenic plants under normal and drought stress treatment

The stomatal apertures of leaves from 35S::TaHDZ5-6A and WT plants grown in soil were measured. The stomatal aperture indices of the OE1, OE2, and OE3 plants were 0.41, 0.42 and 0.41, respectively, while that of the WT plants was 0.40, when grown under normal conditions (Fig. 7d, e). After 10 d of drought stress, the stomatal aperture indices of the OE1, OE2, and OE3 plants decreased to 0.22, 0.18, and 0.22, respectively, significantly lower than that of the WT (Fig. 7d, e). Consistent with these results, water loss from detached leaves of 35S::TaHDZ5-6A transgenic plants was much slower than that from WT plants under dehydration (Fig. 7f). These results indicate that the 35S::TaHDZ5-6A transgenic plants removed water from the soil more slowly than did the WT plants, reducing the rate of wilting. To explore whether TaHDZ5-6A ectopic expression influences proline accumulation, we compared the free proline contents of 35S::TaHDZ5-6A transgenic and WT plants. Consistent with the drought tolerance phenotype, the proline contents were much higher in transgenic plants than in WT plants under drought conditions (Fig. 7g). These findings collectively indicate that TaHDZ5-6A can enhance drought tolerance in transgenic Arabidopsis.
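The two measurements used in this section are simple ratios. A minimal sketch follows, taking the stomatal aperture index as the ratio of width to length and water loss as a percentage of initial fresh weight, as defined in the Fig. 7 legend. The function names and the numbers in the usage lines are hypothetical, not measurements from the study.

```python
def aperture_index(width_um, length_um):
    """Stomatal aperture index: ratio of pore width to pore length."""
    if length_um <= 0:
        raise ValueError("length must be positive")
    return width_um / length_um


def water_loss_percent(initial_fresh_weight, current_weight):
    """Water loss expressed as a percentage of the initial fresh weight."""
    if initial_fresh_weight <= 0:
        raise ValueError("initial fresh weight must be positive")
    return 100.0 * (initial_fresh_weight - current_weight) / initial_fresh_weight


# Hypothetical values for illustration only.
print(aperture_index(4.0, 10.0))         # a 4 um wide, 10 um long pore
print(water_loss_percent(200.0, 150.0))  # 200 mg rosette weighing 150 mg later
```

Because the index is dimensionless, apertures from stomata of different sizes can be compared directly, which is presumably why the legend reports width-to-length ratios rather than raw widths.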

Global gene expression changes in 35S::TaHDZ5-6A transgenic Arabidopsis

We used RNA sequencing to investigate how ectopic expression of TaHDZ5-6A confers drought tolerance. The transcriptome of the 35S::TaHDZ5-6A transgenic plants was compared to that of WT plants under normal, non-stress conditions. In transgenic plants, 495 genes were upregulated and 111 genes were downregulated by at least 2-fold (P < 0.001, FDR < 0.05) compared with the WT (Fig. 8a, b; Additional file 11: Table S3). The upregulated genes included genes related to water deprivation, abscisic acid, hormones, and abiotic stimuli, and downregulated pathways included those responsive to auxin stimuli, oxidative stress, and defense responses (Fig. 8c). We then chose 10 genes that were upregulated in transgenic plants and are known to be involved in the response to drought: DREB2A [39], RD29A [40], RD29B [40], RD26 [41], RD17 [42], PP2CA [43], RAB18 [42], ANAC019 [44], NCED3 [45], and RD20 [46]. We used qRT-PCR to measure their relative expression levels under normal and drought conditions in transgenic and WT plants (Fig. 8d). The qRT-PCR results agreed with those of RNA-seq, indicating that TaHDZ5-6A may positively regulate the transcription of these 10 genes and thereby play a role in the response of transgenic Arabidopsis plants to drought conditions, including rapid stomatal closure and reduction of water loss.
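The thresholds quoted above (at least 2-fold change, P < 0.001, FDR < 0.05) amount to a simple per-gene classification rule, sketched below. This is an illustration of the stated cut-offs only, not the authors' pipeline; the fold change is assumed to be the transgenic-to-WT expression ratio, and the function name and example values are hypothetical.

```python
def classify_gene(fold_change, p_value, fdr,
                  min_fold=2.0, max_p=0.001, max_fdr=0.05):
    """Label a gene 'up', 'down' or 'unchanged' using the stated cut-offs.

    fold_change is assumed to be the transgenic/WT expression ratio,
    so a value of 0.5 corresponds to a 2-fold decrease.
    """
    if p_value >= max_p or fdr >= max_fdr:
        return "unchanged"
    if fold_change >= min_fold:
        return "up"
    if fold_change <= 1.0 / min_fold:
        return "down"
    return "unchanged"


# Hypothetical genes: (fold change, P-value, FDR).
examples = {
    "gene_a": (3.2, 1e-5, 0.002),
    "gene_b": (0.4, 2e-4, 0.01),
    "gene_c": (2.5, 0.01, 0.2),
    "gene_d": (1.4, 1e-6, 1e-4),
}
for name, values in examples.items():
    print(name, classify_gene(*values))
```

Checking significance before effect size means a large fold change with weak statistical support, such as the hypothetical `gene_c`, is not counted as differentially expressed.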

Transcriptomic analyses of the 35S::TaHDZ5-6A transgenic Arabidopsis under normal condition. a venn diagrams of up- or down-regulated genes in transgenic plants relative to WT plants using a significance cutoff of P < 0.001, and a fold-change (FC) > 2. b Hierarchical clustering of up- or down-regulated genes in 35S::TaHDZ5-6A transgenic Arabidopsis lines relative to WT plants. The indicated scale is the log2 value of the normalized level of gene expression. c Gene ontology of biological pathways (GOBPs) enriched in TaHDZ5-6A transgenic plants based on up or downregulated genes. d qRT-PCR analysis of drought induced genes in the transgenic and WT plants under normal and drought conditions. The error bars indicate standard deviations derived from three independent biological experiments. Significant differences were determined by a t-test. *P < 0.05, **P < 0.01

Materials and methods

Pre-processing input data

We first generated a binary matrix of predictions of TF–gene regulatory interactions. For predictions based on condition-specific ChIP-chip data, a ‘1’ was encoded if the binding P-value of the TF to the gene's promoter region was <0.005, otherwise a ‘0’ was encoded. For TFs without condition-specific ChIP-chip data, we followed a version of the regulatory code of Harbison et al (2004). For these TFs, a TF–gene pair had a ‘1’ encoded in the matrix if the TF bound the promoter of the gene with a P-value <0.005 in at least one ChIP-chip experiment and there was a motif in its promoter that was evolutionarily conserved in at least two other yeast species. If no ChIP-chip data were available for a gene, then a ‘0’ was encoded for all entries. All time-series data were transformed to start at ‘0’ so that the value at each time point represents the log ratio change from an unstressed control. Genes were filtered if there was more than one missing value or if the gene did not change sufficiently at any time point (see Results section in Supplementary information).
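The encoding rules above can be written down compactly. The following is a minimal sketch under stated assumptions: the function names are ours, a missing measurement is represented by `None`, and the shift to zero is implemented by subtracting the first time point from a series of log ratios, which is one reading of "transformed to start at ‘0’".

```python
P_CUTOFF = 0.005  # binding P-value threshold stated in the text


def encode_condition_specific(p_value):
    """1 if the condition-specific binding P-value is below the cutoff, else 0.

    None stands for "no ChIP-chip data available" and is encoded as 0.
    """
    return 1 if p_value is not None and p_value < P_CUTOFF else 0


def encode_regulatory_code(p_values, conserved_in_other_species):
    """1 if the TF binds in at least one experiment AND the promoter motif
    is conserved in at least two other yeast species, else 0."""
    bound = any(p < P_CUTOFF for p in p_values)
    return 1 if bound and conserved_in_other_species >= 2 else 0


def shift_to_zero(log_ratios):
    """Shift a time series so it starts at 0 (assumed: subtract first value)."""
    first = log_ratios[0]
    return [value - first for value in log_ratios]


# Hypothetical values for illustration.
print(encode_condition_specific(0.001))        # bound under this condition
print(encode_regulatory_code([0.2, 0.004], 2)) # bound once, motif conserved
print(encode_regulatory_code([0.2, 0.004], 1)) # motif not conserved enough
print(shift_to_zero([0.5, 1.5, 0.25]))
```

Requiring both binding and motif conservation for TFs without condition-specific data makes the second rule stricter than the first, trading some sensitivity for fewer false-positive interactions.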

Dynamic regulatory events miner algorithm

Each state of the probabilistic model is associated with a Gaussian distribution. A tree structure was used among the states and their transitions. At time point 0, there is one state, which is the root of tree. Every state except those associated with the last time point has at least one child, and for the results in this paper we allowed not more than two children. Any state having more than one child has a logistic regression classifier with L1 loss penalty ( Krishnapuram et al, 2005 ) associated with it. This classifier maps the set of predicted TF interactions for a gene to a probability distribution of transitions to each of the child states. To learn a dynamic regulatory map, the DREM algorithm first performs a search over tree structures. A randomly selected subset of genes is used to train the Gaussian distribution parameters and the classifiers in the tree structure under consideration. The remaining genes are used to score the various tree structures considered. Training is carried out using a version of the Baum–Welch algorithm ( Durbin et al, 1998 ). After the best scoring structure is found using the test set of genes, weakly supported splits are pruned to avoid overfitting the test set of genes. After a final model structure is selected, all genes are used to train the parameters of the final model. See Methods section in Supplementary information for full details.

DREM software: The DREM software is available for download at

Inferring gene assignments and TF scores

Genes are assigned to their most likely path through the model using the Viterbi algorithm (Durbin et al, 1998). These assignments are used to determine whether certain paths are overenriched for genes regulated by certain TFs, and the resulting overenrichment scores associate TFs with paths. The scores are obtained using the hypergeometric distribution, with a lower score meaning a stronger association. The base set of genes for the hypergeometric distribution can be either the genes going into the previous split, which gives a TF split association score, or all genes on the microarray, which gives an overall association score of a TF for a path.
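As a rough illustration of such a score, the sketch below computes an upper-tail hypergeometric probability using only the Python standard library. Treating the score as the upper tail P(X ≥ overlap) is our assumption (the text only says the hypergeometric distribution is used), and the function and parameter names are hypothetical.

```python
from math import comb


def hypergeom_enrichment(base_size: int, regulated: int,
                         on_path: int, overlap: int) -> float:
    """Probability of seeing at least `overlap` TF-regulated genes
    among `on_path` genes drawn from a base set of `base_size` genes,
    of which `regulated` are predicted targets of the TF.

    base_size is either the number of genes entering the previous
    split (split score) or all genes on the array (overall score).
    """
    upper = min(regulated, on_path)
    tail = sum(comb(regulated, k) * comb(base_size - regulated, on_path - k)
               for k in range(overlap, upper + 1))
    return tail / comb(base_size, on_path)
```

A smaller value indicates that the overlap is less likely to arise by chance, matching the convention that a lower score means a stronger association.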

GO P-values

GO P-values were computed in the DREM software based on the hypergeometric distribution. All P-values reported are uncorrected, but are still significant at the 0.01 level when correcting for multiple hypothesis testing using a randomization procedure (Ernst and Bar-Joseph, 2006).

Saccharomyces cerevisiae strain list

For the immunoprecipitation experiments, we used Myc-tagged W303 yeast strains obtained as a gift from Rick Young. The genotype of the Ino4 strain is MATa:ade2-1:trp1-1:can1-100:leu2-3,112:his3-11,15:ura3:GAL+:psi+:INO4::9myc:TRP1 and that of the Gcn4 strain is MATa:ade2-1:trp1-1:can1-100:leu2-3,112:his3-11,15:ura3:GAL+:psi+:GCN4::9myc:TRP1.

Growth condition

For the AA starvation experiments, cells were grown in complete minimal medium (SCD) to early-log phase. Cells were collected by centrifugation and resuspended in an equal volume of minimal medium lacking AAs and adenine (YNB−AA, 2% glucose, 20 mg/l uracil) and allowed to grow. Samples for location analysis were taken before resuspension in AA starvation conditions and 4 h afterwards.

For the MMS experiments, cells were grown in YPD media to early-log phase at 30°C until the culture reached an OD600 of 0.8–1.0. MMS (Sigma) was added to a final concentration of 0.03%, and the culture was grown for an additional hour. Samples for genome-wide location analysis were taken before adding MMS and 15 and 60 min after adding MMS.

Chromatin immunoprecipitation–PCR

Bound proteins were formaldehyde-crosslinked to DNA in vivo, followed by cell lysis and sonication to shear DNA. Crosslinked material was immunoprecipitated with an anti-myc antibody, followed by reversal of the crosslinks to separate DNA from protein (Aparicio, 1999; Orlando, 2000). Enrichment of Ino4-binding sites was measured by semiquantitative PCR using primers designed to detect the upstream regions of the genes YDR497C, YNL169C, YGR196C, and YHR123W. Primer sequences are as follows: YDR497C: TAGCGCACCAAACTGAAAGA, AAGCGCATATACTTAGTTCTCTCCA; YNL169C: CGACCAAGAAGGATTTGAGC, CCAGCACCTTTTTGGTGTTT; YGR196C: CGCTTTCCAGAAAAAGGGTA, CGTCGTTTGTTTGTTTGGTG; YHR123W: TGGCAAAATACAGAACACAGG, TATGCTCAGTCCAGCCCTTT. As a negative control, primers for the upstream region of Cts1 were used (AGTGGTTGGTTGGTGGGAATA, TCTTTGACCAATGCCTATGAA). Enrichment was quantified as the ratio of the IP signal to the input signal for the target gene, divided by the ratio of the IP signal to the input signal for the negative control gene (Cts1). After PCR, the fragments were separated on a 1% agarose gel and imaged with a CCD camera. Band intensities were quantified using TINA 2.09d quantification software (Raytest,
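The enrichment ratio described above reduces to a single expression. The sketch below is a hypothetical helper, not part of TINA; the band intensities passed in are made-up numbers used only to show the arithmetic.

```python
def chip_enrichment(target_ip: float, target_input: float,
                    control_ip: float, control_input: float) -> float:
    """Fold enrichment of a target region over the negative control.

    (IP / input for the target) divided by
    (IP / input for the negative control, here Cts1).
    All arguments are band intensities in the same arbitrary units.
    """
    target_ratio = target_ip / target_input
    control_ratio = control_ip / control_input
    return target_ratio / control_ratio


# Illustrative (invented) band intensities:
fold = chip_enrichment(target_ip=400.0, target_input=100.0,
                       control_ip=50.0, control_input=100.0)
```

With these invented intensities the target ratio is 4.0 and the control ratio is 0.5, so the fold enrichment is 8.0.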

ChIP on chip

Genome-wide location analysis was performed as described previously (Ren et al, 2000). Briefly, following purification of the DNA in the ChIP procedure, immunoprecipitated DNA and DNA from an unenriched sample were amplified and differentially fluorescently labeled by ligation-mediated PCR. These samples were hybridized to a microarray consisting of spotted PCR products representing the intergenic regions of the S. cerevisiae genome. The data have been deposited into ArrayExpress with the accession numbers E-MEXP-905 (Gcn4 experiments) and E-MEXP-906 (Ino4 experiments).

Budding index calculation

Cells grown continuously at 30°C were collected by centrifugation, resuspended in an equal volume of 37°C medium, and returned to 37°C for growth. Samples were collected at the time points described in Figure 4C. For each time point, cells were mildly sonicated to separate clumps, counted, and divided into three groups: ‘No Bud’ (a single cell), ‘Small Bud’ (a cell with a small bud attached), and ‘Big Bud’ (a cell with a bud that is more than half the size of the cell it is attached to).
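The three-way grouping can be expressed as a simple rule. The sketch below assumes, purely for illustration, that each cell is described by a numeric bud size and cell size; the text does not say that sizes were measured this way, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

GROUPS = ("No Bud", "Small Bud", "Big Bud")


def classify_cell(bud_size: float, cell_size: float) -> str:
    """Assign a cell to one of the three groups described in the text."""
    if bud_size <= 0:
        return "No Bud"
    # "Big Bud": bud is more than half the size of the attached cell
    if bud_size > 0.5 * cell_size:
        return "Big Bud"
    return "Small Bud"


def group_fractions(cells: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Fraction of cells in each group for one time point."""
    labels = [classify_cell(bud, cell) for bud, cell in cells]
    if not labels:
        raise ValueError("no cells counted at this time point")
    counts = Counter(labels)
    return {group: counts[group] / len(labels) for group in GROUPS}
```

Calling `group_fractions` once per time point gives the proportions of unbudded, small-budded, and big-budded cells over the course of the temperature shift.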