What is the relationship between protein-protein interaction networks and metabolic networks?

I am trying to find out how these networks can be linked together. I know that Protein-protein interaction networks and metabolic networks both fall under the Intra-cellular type of biological networks that describe the cellular functioning. But what is the relationship between them?

Thank you very much.

Proteins often interact with each other for regulatory purposes, or to co-localize several enzymatic reactions for greater efficiency. For example, some proteins inhibit their binding partners. Similarly, the DNA replication complex is made up of many proteins that do different jobs but sit together in one physical complex (e.g. helicase with DNA polymerase).

Metabolic networks are sets of chemical reactions, which may even take place in different compartments of the cell. While you can investigate a protein interaction network by cross-linking proteins and then analyzing the complexes you extract, metabolic networks are constructed by deciphering sequences of chemical reactions (e.g. metabolic cycles) and finding the appropriate enzyme for each reaction.

A metabolic process (e.g. DNA replication) can be run by interacting proteins, but it might also include proteins that don't directly interact with each other. For example, the DNA polymerase complex does not produce nucleotides, but they are necessary for its function.

In short, proteins in Protein-protein interaction networks perform a function by directly interacting with one another. They may for instance bind to one another forming permanent or momentary complexes (e.g. insulin binding to insulin receptor or the pieces of F0-F1 ATPase combining), or they may chemically modify one another (e.g. protein kinases adding a phosphate to the residue of another protein).

Proteins in a metabolic network might never be in physical contact with one another. They affect one another by producing chemical compounds that are used by other enzymes, forming a network of chemical transformations.

Of course an enzyme in a metabolic network might also have its activity modulated through interacting with other proteins in a protein-protein network. The two terms are of course not mutually exclusive, but describe different sorts of relationships between proteins.

The basic structure of a metabolic network (MN) is like this: molecule1 -> molecule2, where the edges are enzymes.

And the basic structure of a PPIN is like this: Protein1 - Protein2, where the edges are physical (binding) interactions between proteins.

There are some differences: a PPIN is undirected, whereas an MN is directed, following the direction of the spontaneous reaction. A PPIN also has two modes of interaction: date interactions (1-1) and party interactions (many-many).

One easy way to combine the two is to invert the MN so that the enzymes become the nodes and the metabolites become the edges, like this: enzyme1 -> enzyme2, and then to map the proteins of the PPIN onto it, like this: enzyme1(protein1) -> enzyme2(protein2). With this approach, however, you need to demonstrate that the relevant properties of the network are conserved.
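To make the inversion idea concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the reactions, enzyme names and the single PPI edge are hypothetical toy data, and the function names `invert_metabolic_network` and `relationships` are made up for this example, not taken from any library.

```python
from collections import defaultdict

# Hypothetical toy data (not from any real database).
# Each reaction is (enzyme, substrates, products).
REACTIONS = [
    ("enzyme1", {"A"}, {"B"}),
    ("enzyme2", {"B"}, {"C"}),
    ("enzyme3", {"C"}, {"D"}),
]
# Hypothetical undirected physical interactions between the same proteins.
PPI_EDGES = [("enzyme1", "enzyme3")]


def invert_metabolic_network(reactions):
    """Return directed enzyme -> enzyme edges, labelled by the shared metabolites."""
    producers = defaultdict(set)
    consumers = defaultdict(set)
    for enzyme, substrates, products in reactions:
        for metabolite in products:
            producers[metabolite].add(enzyme)
        for metabolite in substrates:
            consumers[metabolite].add(enzyme)

    edges = defaultdict(set)
    for metabolite, sources in producers.items():
        for src in sources:
            for dst in consumers.get(metabolite, ()):
                if src != dst:
                    edges[(src, dst)].add(metabolite)
    return dict(edges)


def relationships(a, b, reactions, ppi_edges):
    """List which kinds of relationship link proteins a and b."""
    metabolic = invert_metabolic_network(reactions)
    ppi = {frozenset(pair) for pair in ppi_edges}  # undirected edges
    kinds = set()
    if (a, b) in metabolic:
        kinds.add("a_feeds_b")
    if (b, a) in metabolic:
        kinds.add("b_feeds_a")
    if frozenset((a, b)) in ppi:
        kinds.add("ppi")
    return kinds
```

In this toy case enzyme1 and enzyme2 are linked only through metabolite B, while enzyme1 and enzyme3 are linked only by a physical interaction, which is exactly the distinction the answers above describe.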

Protein-Protein Interaction Networks

Proteins are vital macromolecules that facilitate diverse biological processes at both the cellular and systemic levels. A vast number of molecular processes are regulated by protein components organized through protein-protein interactions (PPIs), which are specific physical contacts established between two or more proteins that result in particular biochemical events. Such interactions lie at the core of the interactome of living cells; unsurprisingly, specific PPIs have been associated with multiple diseases.

Figure 1. Protein-protein interaction hot spots and allosteric sites. (Turnbull A P. et al. 2014)

The discovery and verification of a protein-protein interaction is the first step towards understanding where, how and under what conditions the proteins interact in vitro or in vivo, and what the functional implications of the interaction are. Table 1 lists some popular methods used in PPI studies.

Table 1. The most popular methods in PPI studies

Method                                        Types of PPIs
Co-Immunoprecipitation (co-IP)                Stable or strong
Pull-Down Assay                               Stable or strong
Crosslinking Protein Interaction Analysis     Transient or weak
Label Transfer Protein Interaction Analysis   Transient or weak
Far-Western Blot Analysis                     Moderately stable

Protein-Protein Interaction Analysis at Creative Proteomics

Our group applies various techniques to study PPIs. Each method has its own advantages and limitations, and we recommend the most suitable method to our customers. Our services include but are not limited to:

Co-IP is a useful in vitro method for determining which proteins are part of a complex and whether they bind to each other tightly. It uses antibodies specific to a target protein to capture that protein together with the proteins bound to it. Co-IP thus captures and purifies not only the primary target but also other macromolecules involved in the interaction.

The pull-down assay is capable of detecting a physical interaction between two or more proteins and can identify previously unknown PPIs. In a pull-down assay, a bait protein is tagged and immobilized on an affinity resin. When a sample is incubated with the bait protein, proteins that bind to it are captured and “pulled down”.

Crosslinking protein interaction analysis is suitable for transient or weak interactions and can be performed in vivo or in vitro. In this method, crosslinking reagents (crosslinkers) arrest protein-protein complexes by binding them covalently, after which the complexes are isolated and characterized.

Label transfer is used to detect transient or weak PPIs that are difficult to capture with other in vitro detection strategies. A label transfer reagent is used to tag proteins that interact with a protein of interest. The development of new non-isotopic reagents and methods has made label transfer analysis simpler and more accessible.

Far-western blotting, which is based on the immunoblotting procedure, also detects PPIs in vitro and does not require the native state of the target protein to be preserved. In this method, a purified and labeled "bait" protein is used to probe the target "prey" protein on the membrane.

The outcome of two or more proteins interacting for a specific functional purpose can manifest in several different ways. The measurable effects of protein interactions are outlined below:

  • Inactivate or denature a protein
  • Alter the kinetic properties of enzymes
  • Create a new binding site, typically for small effector molecules
  • Change the specificity of a protein for its substrate through the interaction with different binding partners
  • Serve a regulatory role in either an upstream or a downstream event
  • Allow for substrate channeling by moving a substrate between domains or subunits, resulting ultimately in an intended end product

Creative Proteomics has a team of scientists with specific experience in protein-protein interaction studies. Our protein-protein interaction network platform will help you decipher protein-protein interactions and broaden the outlook of your research.


1. Turnbull A P, Boyd S M, Walse B. Fragment-based drug discovery and protein–protein interactions. Research and Reports in Biochemistry, 2014, 4: 13-26.

Systems biology and metabolic networks predict heterosis

When genetically distant individuals are crossed, their offspring often show greater vigour than their parents for quantitative traits. For example, growth is faster, age of reproduction is earlier, fertility is higher and resistance to disease is stronger. This phenomenon, called heterosis, has been exploited by humans in animal and plant breeding and has implications for evolutionary biology.

The introduction of cross-pollinated (hybrid) plants was one of the most important innovations in agriculture and global food security. Most annual crops show heterosis. Hybrid maize, for example, can yield twice as much as the parent plants. The growth rate of some hybrid yeasts exceeds that of parental strains by more than an order of magnitude.

So far heterosis has been used without full knowledge of the underlying genetic or molecular principles. Because the effects of heterosis can be so impressive, such an understanding could greatly advance breeding. Most recent studies of heterosis have focused on detecting quantitative trait loci (QTL) – sections of DNA that correlate with variation in a trait. QTL are used to explore genetic effects or to search for expression of transcripts or proteins in hybrids in the hope of identifying molecular mechanisms for heterosis in particular traits.

However, descriptive approaches like this cannot provide a general and biologically realistic model accounting for the pervasiveness of heterosis. This is where a systemic approach based on metabolic network modelling is proving useful.

Systems and network biology
Heterosis has inspired many genetic, genomic and molecular studies, but has less often been investigated from the perspective of systems biology, Prof Dominique de Vienne’s focus. Systems biologists model complex biological systems, such as molecules and their interactions within a living cell, rather than looking at isolated parts.

Related to this is network biology, which enables the representation and analysis of biological systems with tools derived from graph theory, which uses mathematical structures to model multiple relations between objects, and from topology, which considers the arrangement of the elements of a network.

Network analysis works with the complexity of the network to extract meaningful information that you would not have if individual components were examined separately. The data explosion from the ‘omics’ era of biological research has led to more systemic approaches to data analysis and a move away from single gene/protein studies.

Adding the ending ‘omics’ to a molecular term implies a comprehensive assessment of a set of molecules. Genomics, the first omics approach, focused on entire genomes as opposed to genetics which looks at individual variants or single genes. Quantitative proteomics provides expression data for hundreds of proteins, including enzymes, while metabolomics techniques can access thousands of metabolites.


Complex information like this can be represented by networks to model the biological system of interest. Some of the most common types of biological networks are protein–protein interaction networks, metabolic networks, genetic interaction networks, gene/transcriptional regulatory networks and cell signalling networks: heterosis could emerge from all these networks.

Studying genotype–phenotype relationships
The genetic makeup of an individual, its ‘genotype’, determines its characteristics or ‘phenotype’ in a given environment. The genotype–phenotype relationship is of fundamental interest to breeders as it describes how genetic polymorphism causes phenotypic variation. Genetic polymorphism due to mutations in the genotype produces the variety of forms seen in populations.

Heterosis in action. The middle plant is the offspring of the two plants on either side: its increased vigor is clear. Photo Credit: Julie B. Fiévet

If the genotype–phenotype relationships were linear (proportionality between genotype and phenotype values), offspring would have intermediate trait values compared to their parents, not better ones. Actually, the cellular processes involved in biological functions and structures are complex, and the relationship between measurable parameters at genotype and phenotype levels is often found to be non-linear. Network models in systems biology are typically highly non-linear and can be used effectively to study phenotype responses to genotype variation.

Dominique de Vienne and colleagues suggest that heterosis is an emergent property of living systems resulting from non-linear relationships between genotypic variables and phenotypes or between different phenotypic levels, from the molecular to the individual. They use a systemic approach to show that the key to understanding heterosis may lie in the ‘law of diminishing returns’.

This ‘law’ states that in all productive processes, adding more of one factor while holding all others constant, will eventually yield lower incremental returns. In biological terms, when the concentration or activity of a cellular component increases gradually, the effect on the phenotype is at first high but then begins to fade away.

Mathematical modelling of physiological dominance suggested that heterosis is an intrinsic property of non-linear relationships between traits.

For example, if the concentration of an enzyme in a metabolic network increases, the metabolic flux (rate of synthesis of molecules catalysed by enzymes) through this network initially grows rapidly, then slows down as the enzyme concentration goes up. Thus the kinetics (or rates) of biochemical and molecular reactions are intrinsically non-linearly related to enzyme concentrations.
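The diminishing-returns curve can be sketched numerically. The Python snippet below is an illustration under a stated assumption: it uses a simple hyperbolic flux formula for a linear pathway of unsaturated enzymes, J = 1 / Σ 1/(aᵢ·Eᵢ), a common simplification in metabolic control analysis. The function names, the activity constants and the toy concentrations are all hypothetical and are not the parameters estimated in the work described here.

```python
def pathway_flux(concentrations, activities=None):
    """Flux through a linear pathway under the assumed hyperbolic model.

    Each enzyme contributes a 'resistance' 1 / (activity * concentration);
    the flux is the inverse of the summed resistances.
    """
    if activities is None:
        activities = [1.0] * len(concentrations)
    return 1.0 / sum(1.0 / (a * e) for a, e in zip(activities, concentrations))


def flux_gains(levels, others):
    """Flux at each level of the first enzyme, and the gain between levels."""
    fluxes = [pathway_flux([level] + list(others)) for level in levels]
    gains = [b - a for a, b in zip(fluxes, fluxes[1:])]
    return fluxes, gains


# Double the first enzyme repeatedly while two other enzymes stay at 1.0.
fluxes, gains = flux_gains([1.0, 2.0, 4.0, 8.0], others=[1.0, 1.0])
for level, flux in zip([1.0, 2.0, 4.0, 8.0], fluxes):
    print(f"E1 = {level:>4}: flux = {flux:.4f}")
print("gain per doubling:", [round(g, 4) for g in gains])
```

Each doubling of the first enzyme buys a smaller increase in flux than the previous one, and the flux can never exceed the ceiling set by the other two enzymes (0.5 in this toy case).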

Non-linear genotype–phenotype relationship is the key to heterosis
Non-linearity has been demonstrated at different levels of organisation, from genetic transcription/translation to fitness-related characteristics. When considering one locus, it appears to explain the dominance of the most active allele over the least active, as proposed as early as 1934 by Sewall Wright, a famous American evolutionary geneticist.


When the trait is controlled by many loci, which is the most common situation, heterosis arises as a consequence of two linked phenomena. First, the slightly deleterious recessive alleles of one parent are complemented by superior dominant alleles of the other parent. The hybrid can therefore have a higher value than both parents, and so hybrid vigour is expected to be stronger when the parents are genetically distant due to better complementarity. This model, likewise, explains inbreeding depression as the accumulation of deleterious recessive alleles at homozygous loci, i.e., loci with identical alleles. Second, the non-linearity results in epistasis, meaning that the effect of substituting one allele for another is dependent on the genotype at other loci. This genetic effect also plays a role in heterosis.

Dominique de Vienne and colleagues have mathematically formalised and validated the dominance/epistasis model of heterosis experimentally using in vitro and in silico (computer simulated) genetics. They have worked with the glycolytic pathway in yeast to look at metabolic flux prediction and optimisation in relation to heterosis.

They first reconstituted in vitro the four-enzyme upstream segment of the glycolysis pathway, simulating genetic variability by varying enzyme concentrations in test tubes. “Hybrids” were obtained by mixing the content of “parental” tubes, and their fluxes were measured. They found that usually the phenotypic value of a hybrid is higher than the average of its parents, and in some cases higher than that of the best parent.

Then they used mathematical modelling. Modelling metabolic networks relies on mathematical tools and specialised computer programs. But identifying and estimating the many enzyme parameters for biochemical processes is difficult. So, modelling efforts based on conceptual shortcuts are essential to simulate complex cellular behaviours from a smaller amount of biological data.

A simplified formalism based on metabolic control analysis was used to derive global parameters that accounted for the kinetic behaviour of four enzymes from the upstream part of glycolysis. According to the structure of the pathway and the position of the enzyme in the pathway, just one or two parameters per enzyme were sufficient.

Genetic variability was created by varying in silico enzyme concentrations. The virtual parents were crossed to get hybrids, the flux of which was computed. Again the curvature of the relationship describing the genotype–phenotype relationship resulted in heterosis. This result is robust, as it was confirmed by explicit modelling of the whole glycolysis and a similar in silico genetics approach.
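A toy version of such an in silico cross can be written in a few lines of Python. This is a sketch under assumptions, not the authors' model: the flux formula is the simple hyperbolic form J = 1 / Σ 1/Eᵢ for a four-enzyme linear pathway, the hybrid is assumed to inherit the mean of the parental enzyme concentrations (as when equal volumes of two "parental" tubes are mixed), and the parental values and function names are hypothetical.

```python
def pathway_flux(concentrations):
    """Assumed hyperbolic flux for a linear pathway (all activities set to 1)."""
    return 1.0 / sum(1.0 / e for e in concentrations)


def cross(parent1, parent2):
    """Hybrid enzyme concentrations, assuming additive (mean) inheritance."""
    return [(a + b) / 2 for a, b in zip(parent1, parent2)]


def heterosis(parent1, parent2):
    """Compare the hybrid flux with the mid-parent and best-parent values."""
    j1, j2 = pathway_flux(parent1), pathway_flux(parent2)
    jh = pathway_flux(cross(parent1, parent2))
    return {
        "parents": (j1, j2),
        "hybrid": jh,
        "mid_parent_heterosis": jh - (j1 + j2) / 2,
        "best_parent_heterosis": jh - max(j1, j2),
    }


# Hypothetical complementary parents: each is weak where the other is strong.
complementary = heterosis([2.0, 0.5, 2.0, 0.5], [0.5, 2.0, 0.5, 2.0])
# Hypothetical non-complementary parents: one is uniformly better.
proportional = heterosis([1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0])
print(complementary)
print(proportional)
```

With complementary parents the hybrid flux exceeds both parental fluxes, whereas with proportional parents it simply equals the mid-parent value. Because the assumed flux function is concave, the hybrid never falls below the mid-parent value in this sketch, which is the curvature argument made in the text.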

This mechanism for heterosis is valid beyond the metabolic systems. In another recent work, Dominique de Vienne, co-author François Vasseur and colleagues successfully predicted the amplitude of heterosis for two fitness-related traits – growth rate and fruit number – in a series of hybrids among accessions of Arabidopsis thaliana, a valuable model plant for studies of growth and development and the first plant genome to be fully sequenced.

The traits of interest were non-linearly related to individual biomass, in the same way that metabolic fluxes are non-linearly related to enzyme concentrations. This non-linearity forces hybrids to deviate from the average value of their parents, which leads to their better vigour. Mathematical modelling made it possible to predict up to 75% of the amplitude of heterosis while the genetic distance between parents explained at best 7% of heterosis.

Both mathematical and experimental results suggested that the appearance of heterosis in hybrids is a systemic property emerging from biological complexity. These findings were consistent with various observations in quantitative and evolutionary genetics, and provide a model unifying the genetic effects underlying heterosis. The geometric view of genotype–phenotype relationship in crop plants has potential for predicting heterosis in traits affecting yield and environmental stability.

Personal Response

How has the field of plant genetics changed since you began working in it?

Since I began working in plant genetics, there has been spectacular evolution of techniques available for the biologist. Robots for high-throughput genotyping and phenotyping, mass spectrometers for proteomics and metabolomics, increasingly powerful computers, etc., make it possible to accumulate and analyse huge amounts of data in a relatively short time. It is now possible to map finely and identify QTL for traits at all levels of biological organisation, from transcript/protein/metabolite abundances to fitness components, and to better understand the genomic bases of phenotypic trait variation. For the breeder, marker-assisted or genomic selection allows more efficient selection methods to be implemented.


Influence of metabolic network structure and function on enzyme evolution

7 5 R39

© 2006 Vitkup et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Metabolic enzyme evolution

An analysis of evolutionary constraints, gene duplication and essentiality in the yeast metabolic network demonstrates that the structure and function of a metabolic network shapes the evolution of its enzymes.

Most studies of molecular evolution are focused on individual genes and proteins. However, understanding the design principles and evolutionary properties of molecular networks requires a system-wide perspective. In the present work we connect molecular evolution on the gene level with system properties of a cellular metabolic network. In contrast to protein interaction networks, where several previous studies investigated the molecular evolution of proteins, metabolic networks have a relatively well-defined global function. The ability to consider fluxes in a metabolic network allows us to relate the functional role of each enzyme in a network to its rate of evolution.

Our results, based on the yeast metabolic network, demonstrate that important evolutionary processes, such as the fixation of single nucleotide mutations, gene duplications, and gene deletions, are influenced by the structure and function of the network. Specifically, central and highly connected enzymes evolve more slowly than less connected enzymes. Also, enzymes carrying high metabolic fluxes under natural biological conditions experience higher evolutionary constraints. Genes encoding enzymes with high connectivity and high metabolic flux have higher chances to retain duplicates in evolution. In contrast to protein interaction networks, highly connected enzymes are no more likely to be essential compared to less connected enzymes.

The presented analysis of evolutionary constraints, gene duplication, and essentiality demonstrates that the structure and function of a metabolic network shapes the evolution of its enzymes. Our results underscore the need for systems-based approaches in studies of molecular evolution.


In the present study, we ask how the topology of a metabolic network and the metabolic fluxes (a metabolic flux is the rate at which a chemical reaction converts reactants into products) through reactions in the network influence the evolution of metabolic network genes through point mutations and gene duplication. Our results suggest that both network structure and function need to be understood to fully appreciate how metabolic networks constrain the evolution of their parts. The present study has become possible with the recent publication of a comprehensive compendium of metabolic reactions in the yeast Saccharomyces cerevisiae [10]. This compendium comprises 1,175 metabolic reactions and 584 metabolites, and involves about 16% of all yeast genes.

Using the stoichiometric equations that describe chemical reactions, we calculate the connectivity of an enzyme as the number of other metabolic enzymes that produce or consume the enzyme's products or reactants (see Materials and methods and Additional data file 1). In other words, a metabolic enzyme A and a metabolic enzyme B are connected if they share the same metabolite as either a product or reactant. Highly connected enzymes in this representation are enzymes that share metabolites with many other enzymes. Including the most highly connected metabolites and cofactors such as ATP or hydrogen in a network representation would render the network structure dominated by these few nodes, and would obscure functional relationships between enzymes. We thus excluded the top 14 most highly connected metabolites: ATP, H, ADP, pyrophosphate, orthophosphate, CO2, NAD, glutamate, NADP, NADH, NADPH, AMP, NH3, and CoA [12]. The results we report below are qualitatively insensitive to the exact number of removed metabolites.
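The connectivity definition above translates directly into a short computation. The following Python sketch is illustrative only: the three-enzyme input is a hypothetical toy fragment rather than the yeast compendium, and the function name `enzyme_connectivity` is an assumption made for this example. The excluded set is the list of 14 metabolites named in the text.

```python
from collections import defaultdict

# The 14 highly connected metabolites excluded in the text.
EXCLUDED = {
    "ATP", "H", "ADP", "pyrophosphate", "orthophosphate", "CO2", "NAD",
    "glutamate", "NADP", "NADH", "NADPH", "AMP", "NH3", "CoA",
}


def enzyme_connectivity(enzyme_metabolites, excluded=EXCLUDED):
    """Count, for each enzyme, the other enzymes sharing a metabolite with it.

    `enzyme_metabolites` maps an enzyme name to the set of all its reactants
    and products; metabolites listed in `excluded` are ignored.
    """
    users = defaultdict(set)  # metabolite -> enzymes producing or consuming it
    for enzyme, metabolites in enzyme_metabolites.items():
        for metabolite in metabolites:
            if metabolite not in excluded:
                users[metabolite].add(enzyme)

    neighbours = {enzyme: set() for enzyme in enzyme_metabolites}
    for enzymes in users.values():
        for enzyme in enzymes:
            neighbours[enzyme] |= enzymes - {enzyme}
    return {enzyme: len(others) for enzyme, others in neighbours.items()}


# Hypothetical toy fragment: three consecutive glycolytic steps.
toy = {
    "hexokinase": {"glucose", "ATP", "G6P", "ADP"},
    "phosphoglucose isomerase": {"G6P", "F6P"},
    "phosphofructokinase": {"F6P", "ATP", "F16BP", "ADP"},
}
print(enzyme_connectivity(toy))
print(enzyme_connectivity(toy, excluded=set()))
```

In the toy fragment, keeping ATP and ADP links hexokinase to phosphofructokinase even though they share no pathway intermediate, which illustrates why such cofactors are removed before connectivity is counted.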

Highly connected enzymes evolve slowly

We will first discuss how network structure - specifically, an enzyme's position in the network - influences enzyme evolution. Generally, enzymes in central parts of metabolism such as the tricarboxylic acid cycle will have more neighbors than enzymes in peripheral metabolic pathways (Figure 1). The correlation shown in Figure 1 arises from the fact that more connected enzymes have a direct access to many network nodes and consequently have shorter path lengths to other enzymes in the network. The evolutionary constraints on a metabolic enzyme can be estimated through the normalized ratio of non-synonymous to synonymous substitutions per nucleotide site (Ka/Ks) that occurred in the gene coding for the enzyme [13]. A small Ka/Ks ratio suggests higher evolutionary constraints on the enzyme, that is, a smaller fraction of accepted amino acid substitutions. In our analysis, we used the average ratio Ka/Ks of unambiguous orthologs in four sequenced Saccharomyces species: S. cerevisiae, S. paradoxus, S. bayanus, and S. mikatae [14]. The average Ka/Ks values used in the main analysis were taken from the study by Kellis et al. [14]. We also recalculated the average ratios using the maximum-likelihood method of Yang and Nielsen [15] and obtained qualitatively similar results.

The correlation between enzyme connectivity and centrality in the yeast metabolic network. Spearman's rank correlation r = -0.74, P < 0.0001; Pearson's correlation r = -0.67, P < 0.0001. The centrality of an enzyme is equal to the mean length of network distances from the enzyme to all other enzymes in the network (pairs of enzymes not connected by any path in the network were excluded from the calculation).
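The centrality measure in this caption, the mean network distance from one enzyme to every other enzyme it can reach, can be sketched with a breadth-first search. The Python below is a minimal illustration: the graph is a hypothetical toy network and `mean_distance` is a name chosen for this example, not a function from the paper or from a library.

```python
from collections import deque


def mean_distance(graph, source):
    """Mean shortest-path length from `source` to all reachable other nodes.

    `graph` maps each node to the set of its neighbours. Nodes with no path
    to `source` are left out, as in the caption; returns None if nothing
    is reachable.
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    others = [d for node, d in distance.items() if node != source]
    return sum(others) / len(others) if others else None


# Hypothetical toy network: a chain a - b - c - d plus an isolated enzyme e.
toy = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c"},
    "e": set(),
}
for enzyme in toy:
    print(enzyme, mean_distance(toy, enzyme))
```

In the toy chain the better-connected middle enzymes have a smaller mean distance than the end enzymes, which is the direction of the negative correlation reported above.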

Figure 2 demonstrates a statistically significant negative correlation between the metabolic connectivity of an enzyme and the ratio Ka/Ks (Spearman's rank correlation r = -0.20, P = 1.1 × 10^-4; Pearson's correlation r = -0.18, P = 7 × 10^-4). The inset in Figure 2 shows that this negative association holds over a broad range of connectivities, and that it is not caused by a small number of highly connected proteins. Additional data file 2 demonstrates a weaker negative correlation between non-synonymous (amino acid changing) substitutions Ka and gene connectivity (Spearman's rank correlation r = -0.13, P = 1.6 × 10^-2). The reason is that using only Ka, instead of the preferable Ka/Ks, as a measure of evolutionary constraints does not compensate for gene-specific differences in synonymous substitution rates and thus introduces additional noise in the data. Additional data file 3 shows that synonymous (silent) substitutions Ks and enzyme connectivity are not significantly correlated (Spearman's rank correlation r = 0.056, P = 0.30). This is to be expected, as synonymous substitutions do not cause amino acid changes and are thus selectively neutral for the purpose of our analysis.
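For readers unfamiliar with the statistic, Spearman's rank correlation is the Pearson correlation computed on ranks. The Python sketch below shows the calculation using only the standard library; the connectivity and Ka/Ks values are invented for illustration and are not data from this study, and the helper names are assumptions of this example.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented example values: connectivity and Ka/Ks for five enzymes.
connectivity = [2, 5, 9, 14, 30]
ka_ks = [0.30, 0.22, 0.25, 0.10, 0.05]
print(round(spearman(connectivity, ka_ks), 3))
```

Because it depends only on ranks, the statistic is insensitive to the heavy right tail of the connectivity distribution, which is one reason to report it alongside Pearson's correlation. The P values quoted in the text require an additional significance test that this sketch does not implement.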

The relationship between enzyme connectivity in the yeast metabolic network and evolutionary constraint quantified by the Ka/Ks ratio. Spearman's rank correlation r = -0.20, P = 1.1 × 10^-4; Pearson's correlation r = -0.18, P = 7 × 10^-4. The connectivity of a metabolic enzyme is equal to the total number of other network enzymes producing or consuming the enzyme's reactants and products. Ka is the fraction of amino acid replacement substitutions per amino acid replacement site on DNA; Ks is the fraction of silent substitutions per silent site on DNA. The inset shows the histogram of binned enzyme connectivity versus median evolutionary constraint Ka/Ks (using the same data as in the main figure). The standard errors in each bin are also shown.

Enzymes that carry large metabolic fluxes evolve slowly

Table: Correlation between enzymatic flux magnitude and evolutionary constraint Ka/Ks. Columns: maximum uptake rates (mmol/gDW/h); Spearman's rank correlation (P value) with zero fluxes; Spearman's rank correlation (P value) without zero fluxes. The correlation between enzymatic flux magnitude and evolutionary constraint Ka/Ks was calculated with and without enzymes carrying zero fluxes. gDW, grams dry weight.

Gene duplication correlation with connectivity and flux

Gene duplications have effects opposite from those of most amino acid changes: they may increase rather than reduce flux through an enzymatic reaction. We established that highly connected enzymes and enzymes with high associated flux are especially sensitive to amino acid changes (Figures 2 and 3). Are their enzyme-coding genes, conversely, also more likely to undergo duplication? Figure 4 shows that this is indeed the case for enzyme connectivity. The figure demonstrates an association between an enzyme-coding gene's number of duplicates and enzyme connectivity (only enzymes with sequence identity higher than 40% were considered as duplicates). Mean connectivity for genes with no duplicates is 15.0, and for genes with duplicates it is 19.2 (non-parametric Wilcoxon test, P = 1.4 × 10^-4). This result suggests that duplicates of enzymes producing or consuming widely used metabolites are more likely to be retained in evolution. Figure 5 and Additional data file 5 demonstrate that a similar association exists between non-zero enzymatic flux through a reaction and the number of duplicates of the respective enzyme's coding gene. Specifically, the higher the flux through a reaction, the more duplicates an enzyme-coding gene has. Qualitative association between enzymatic flux and gene duplication was also recently shown by Papp et al. [22].

The relationship between metabolic flux and evolutionary constraint. (a) The relationship between metabolic flux values and evolutionary constraint Ka/Ks for aerobic growth on glucose (maximal uptake rate for glucose 15.3 mmol/g dry weight (DW)/h; maximal oxygen uptake 0.2 mmol/gDW/h). Spearman's rank correlation r = -0.30, P = 2.7 × 10^-3; Pearson's correlation r = -0.24, P = 1.7 × 10^-2. The metabolic fluxes were calculated using flux balance analysis to maximize the cell growth rate. Fluxes more than two orders of magnitude larger than the median non-zero flux - representing large glycolytic fluxes - were excluded from the analysis. (b) The same as (a) but using log coordinates for the metabolic flux magnitude.

The relationship between enzyme connectivity and the average number of duplications in corresponding enzyme-coding genes. Enzymes with sequence identity larger than 40% over 100 or more aligned amino acids were considered as duplicates.

The relationship between the number of duplicates of an enzyme-coding gene and the magnitude of the metabolic flux through the enzymatic reaction. The results are shown for aerobic growth on glucose (maximal uptake rate for glucose 15.3 mmol/gDW/h; oxygen 0.2 mmol/gDW/h). Putative duplicate pairs with less than 40% amino acid similarity or less than 100 aligned amino acid residues were excluded.

Connectivity, essentiality, and metabolic robustness

Evolutionary constraints on enzymes are indirect indicators of metabolic robustness to amino acid changes, changes that a metabolic network tolerated for well over millions of years of evolution. Another type of biological robustness is that against complete gene deletions. Robustness against gene deletions can be derived from laboratory studies in which the effects of gene deletions on growth rate and other indicators of fitness are studied [23, 24]. These studies determine essential genes, that is, genes whose elimination in one or more laboratory environments is effectively lethal. Our use of available essentiality data is motivated by the observation that highly connected proteins in protein interaction networks may be more likely to be essential to a cell [1]. We carried out analyses using data on essential genes derived from a large scale gene deletion study by Giaever et al. [23], and used the Saccharomyces genome database (SGD) [25] to collect the essentiality data.

The relationship between enzyme connectivity and gene essentiality

The connectivity of a metabolic enzyme is equal to the total number of other network enzymes producing or consuming the enzyme's reactants and products. The information on gene essentiality was obtained from the systematic gene deletion study by Giaever et al. [23] using the SGD database [25].
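The connectivity definition in this caption can be sketched as below. The data structure (a mapping from each enzyme to the set of metabolites it produces or consumes) and the enzyme and metabolite names are hypothetical. The sketch counts each neighbouring enzyme once, even if it shares several metabolites, which is one reading of "total number of other network enzymes".

```python
def enzyme_connectivity(enzyme_metabolites):
    """Count, for each enzyme, the other enzymes sharing one of its metabolites."""
    # Index: metabolite -> enzymes that produce or consume it.
    metabolite_to_enzymes = {}
    for enzyme, metabolites in enzyme_metabolites.items():
        for metabolite in metabolites:
            metabolite_to_enzymes.setdefault(metabolite, set()).add(enzyme)

    connectivity = {}
    for enzyme, metabolites in enzyme_metabolites.items():
        neighbours = set()
        for metabolite in metabolites:
            neighbours |= metabolite_to_enzymes[metabolite]
        neighbours.discard(enzyme)  # an enzyme is not its own neighbour
        connectivity[enzyme] = len(neighbours)
    return connectivity

# Hypothetical four-enzyme network.
toy_network = {
    "E1": {"A", "B"},
    "E2": {"B", "C"},
    "E3": {"C", "D"},
    "E4": {"B"},
}
toy_connectivity = enzyme_connectivity(toy_network)
```

Here E2 is the most connected enzyme because it shares metabolite B with E1 and E4 and metabolite C with E3.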

In sum, we demonstrate that both highly connected enzymes and enzymes that carry high metabolic fluxes in the yeast metabolic network have tolerated fewer amino acid substitutions in their evolutionary history. Why are enzymes carrying larger fluxes more constrained? The likely answer comes from the observation that most mutations affecting enzymatic activity may reduce rather than increase flux. Enzymes carrying high fluxes tend to have reaction products that enter a large number of metabolic pathways. Consequently, a mutational reduction in the activity of such enzymes should be more detrimental than a reduction in the activity of enzymes with lower flux.

We also show that the genes encoding enzymes with high flux have more duplicates. Importantly, we do not argue that duplications arise more frequently for genes whose products carry high flux, but that such duplications are more likely to be preserved in evolution because of the advantage, higher flux, that they provide. While a gene's duplicates can initially be preserved through an advantageous increase in metabolic flux, after divergence they may provide other functional benefits [30]. Divergence of metabolic genes in their expression and regulation is well established for genes in intensely studied parts of metabolism, such as the tricarboxylic acid cycle enzymes [31].

We found that the association between predicted enzymatic flux and evolutionary rate is most pronounced for carbon sources that dominate the natural environment of yeast. This suggests that one can use the association between flux and evolutionary constraint to search for conditions that dominated the evolution of metabolic networks. Similar analyses, which use genomic data to infer the environment that has shaped an organism's evolution, have been used before to show that carbon limitation may have influenced the evolution of the E. coli metabolic network more strongly than nitrogen limitation [19], and to show that yeast evolution favored fermentation over respiration [32].

It should not be surprising that the observed associations are weak in magnitude. The reason is that many other factors influence the evolution of enzyme-coding genes. Two of these factors are gene expression levels (discussed in the paper) and constraints stemming from the tertiary and quaternary structure of enzymes, which may differ among enzymes (little is known about such constraints). The key point is that, in addition to all these other factors, metabolic network function and structure also have a clear influence on protein evolution.

In conclusion, our analysis of evolutionary constraints, gene duplication, and essentiality demonstrates that the structure and function of a metabolic network shapes the evolution of its enzymes. In the long run, system analyses of biological networks will allow us to increasingly place the evolution of genes in the larger context in which they operate, as building blocks of cellular networks.

The following additional data are available with the online version of this paper. Additional data file 1 is a figure showing examples of metabolic connectivity. (a) An example of the metabolic reaction network from sphingoglycolipid metabolism; metabolites are drawn as small circles (DHSP, sphinganine 1-phosphate; PETHM, ethanolamine phosphate; SPH, sphinganine; CDPETN, CDPethanolamine; ETHM, ethanolamine) and enzyme-encoding genes are shown in rectangles. (b) Metabolic connectivity of the dpl1 gene (solid edges), as defined by the reactions shown in (a). The dpl1 gene has a total of six metabolic connections: two established through ethanolamine phosphate (red edges) and four through sphinganine 1-phosphate (blue edges). Metabolic connections between other enzymes are shown by dashed edges. Additional data file 2 demonstrates the relationship between enzyme connectivity and the average amino acid divergence Ka. Spearman's rank correlation r = -0.13, P = 1.6 × 10^-2. Additional data file 3 shows the relationship between enzyme connectivity and the average silent divergence Ks. Spearman's rank correlation r = -0.056, P = 0.30. Additional data file 4 is a histogram of the calculated metabolic fluxes in the yeast network for aerobic growth on glucose (maximal uptake rates: glucose 15.3 mmol/g dry weight/h; oxygen 0.2 mmol/g dry weight/h). Note the small number of fluxes, representing glycolysis, with disproportionately large magnitudes. Similar flux distributions were also obtained for other growth conditions. Additional data file 5 shows the correlation between non-zero enzymatic flux through a reaction and the number of duplicates of the respective enzyme's coding gene. Additional data file 6 provides connectivity and evolutionary parameters (Ka/Ks, Ka, Ks) for yeast metabolic enzymes.

We thank Dr Andrey Rzhetsky, Dr Uwe Sauer, and Dr Eugene Koonin for valuable discussions. We also thank two anonymous reviewers for several very helpful suggestions.

Lethality and centrality in protein networks.

Protein dispensability and rate of evolution.

Highly expressed genes in yeast evolve slowly.

Evolutionary rate in the protein interaction network.

No simple dependence between protein evolution rate and the number of protein-protein interactions: only the most prolific interactors tend to evolve slowly.

Molecular evolution in large genetic networks: does connectivity equal constraint?

Comparative assessment of large-scale data sets of protein-protein interactions.

How reliable are experimental protein-protein interaction data?

The Pathway Tools software.

Genome-scale reconstruction of the Saccharomyces cerevisiae metabolic network.

The Escherichia coli MG1655 in silico metabolic genotype: its definition, characteristics, and capabilities.

Filling gaps in a metabolic network using expression information.

Sunderland: Sinauer Associates

Sequencing and comparison of yeast species to identify genes and regulatory elements.

Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models.

Dissecting the regulatory circuitry of a eukaryotic genome.

Biochemical production capabilities of Escherichia coli.

In silico predictions of Escherichia coli metabolic capabilities are consistent with experimental data.

Analysis of optimality in natural and perturbed metabolic networks.

Large-scale evaluation of in-silico gene deletions in Saccharomyces cerevisiae.

The Molecular Biology of the Yeast Saccharomyces.

Cold Spring Harbor Press, NY

Metabolic network analysis of the causes and evolution of the enzyme dispensability in yeast.

Functional profiling of the Saccharomyces cerevisiae genome.

Systematic screen for human disease genes in yeast.

Saccharomyces genome database: underlying principles and organisation.

Properties of metabolic networks: structure versus function.

Role of duplicate genes in genetic robustness against null mutations.

Robustness against mutations in genetic networks of yeast.

Robustness analysis of the Escherichia coli metabolic network.

Metabolic functions of duplicate genes in Saccharomyces cerevisiae.

Molecular genetics of yeast TCA cycle isozymes.

Inferring lifestyle from gene expression patterns.

Boston: Free Software Foundation

GenomeHistory: a software tool and its application to fully sequenced genomes.

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

A general method applicable to the search for similarities for amino acid sequences of two proteins.

A likelihood approach for comparing synonymous and nonsynonymous nucleotide substitution rates, with application to the chloroplast genome.

A codon-based model of nucleotide substitution for protein-coding DNA sequences.


G-5. The human E3 ubiquitin ligase enzyme protein interaction network

Kar G (1,*), Keskin O (1), Nussinov R (2,3), Gursoy A (1)

Ubiquitination is crucial for protein degradation in eukaryotic cells. It is achieved by a sequential cascade of ubiquitin-activating (E1), ubiquitin-conjugating (E2) and ubiquitin-ligating (E3) enzymes. E3 ligases mediate ubiquitin transfer from E2s to substrates and as such confer substrate specificity. Despite their essential role, current knowledge of their distinct biological functions and interaction partners is limited. Here, using structural data, efficient structural comparison algorithms and appropriate filters, we construct the human E3 ubiquitin ligase enzyme protein interaction network.

Materials and Methods

We first compile the available structures for E2 and E3 proteins in the human ubiquitination pathway. Second, we apply our efficient protein-protein interaction prediction algorithm PRISM, which uses experimental (X-ray, NMR) protein-protein interface templates to model the interactions of E3 and E2 proteins in a large, proteome-scale docking strategy based on interface structural motifs. Then, we include flexibility and energetic considerations in our modeling using FiberDock, a flexible docking refinement server, to obtain more physically and biologically relevant interactions.


Analysis of the human E3 ubiquitin ligase enzyme protein interaction network reveals important functional features and uncovers previously unknown E3-E2 and E3-E3 interactions. Our results show that E3 proteins such as Mdm2 and Huwe1 share E2 partners, which may explain how both Mdm2 and Huwe1 ubiquitinate the p53 tumor suppressor protein for degradation. In addition, we discover the mode of E3-E3 interactions such as Mdm2-Siah1, which are known to enhance the degradation of the Numb protein.


Here, for the first time, we constructed a structural human E3 ubiquitin ligase enzyme protein interaction network. Our strategy allows elucidation of both which E3s interact with which E2s in the human ubiquitination pathway and how they interact. In addition to identifying E3-E2 interactions, our strategy also reveals functionally-relevant E3-E3 interactions in the human ubiquitination pathway that were hitherto unknown.

Author Affiliations

(1) Koc University, Center for Computational Biology and Bioinformatics, and College of Engineering, Rumelifeneri Yolu, 34450 Sariyer Istanbul, Turkey; (2) Basic Research Program, SAIC-Frederick, Inc., Center for Cancer Research Nanobiology Program, NCI-Frederick, Frederick, MD 21702, USA; (3) Sackler Inst. of Molecular Medicine, Department of Human Genetics and Molecular Medicine, Sackler School of Medicine, Tel Aviv University, Tel Aviv 69978, Israel


Dataset construction

Experimental proteomic/genomic data comparing normal (N) and preeclamptic (PE) pregnancies were obtained by analysing the Gene Expression Omnibus (GEO) database [37]. The datasets considered are listed in Table 4.

Each experiment was analyzed independently in order to reduce the number of genes. We used an adjusted p-value ≤ 0.05 and a fold change in expression ≥ 2 as thresholds, as discussed elsewhere [6, 7, 25–27, 38, 39]. Initially, the p-value was obtained by a bootstrapping procedure with 1000 or 10000 iterations (depending on the sample size), yielding 645 statistically significant modulated genes; after applying the false discovery rate (FDR) correction by the Benjamini-Hochberg method [40], this set was reduced to 330 genes.
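The selection step described here, Benjamini-Hochberg adjustment followed by p-value and fold-change thresholds, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: gene names and numbers are invented, fold change is assumed to be a positive expression ratio, and a ratio of 0.5 or less is treated as a two-fold decrease.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_genes(records, alpha=0.05, min_fold=2.0):
    """Keep genes with adjusted p <= alpha and at least min_fold change.

    records: list of (gene, raw_p_value, expression_ratio) tuples.
    Assumption: ratios are positive; down-regulation is ratio <= 1/min_fold.
    """
    adjusted = benjamini_hochberg([p for _, p, _ in records])
    selected = []
    for (gene, _, ratio), q in zip(records, adjusted):
        fold = max(ratio, 1.0 / ratio)
        if q <= alpha and fold >= min_fold:
            selected.append(gene)
    return selected

# Hypothetical records: (gene, raw p-value, PE/N expression ratio).
toy_records = [
    ("G1", 0.01, 3.0),
    ("G2", 0.04, 2.5),
    ("G3", 0.03, 0.4),
    ("G4", 0.20, 5.0),
]
toy_adjusted = benjamini_hochberg([p for _, p, _ in toy_records])
toy_selected = select_genes(toy_records)
```

In this toy set G2 and G3 have raw p-values below 0.05 but adjusted values just above it, which mirrors how the FDR correction shrinks the list of significant genes.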

In addition, several text-mining tools were used to complement the GEO results. Many text-mining tools are available, but several of them require extra information (e.g. a chromosome region) rather than a phenotype or disease notation (e.g. a disease name or related keywords). We chose methods that do not require previous genetic knowledge of the disease [8]. Moreover, text-mining procedures can produce many false-positive associations, so tools that combine text mining with other data sources are preferred [8, 41]. Considering these aspects, we used the following tools: PolySearch [42], Candid [43] and PhenoPred [44]. Candid and PhenoPred use several heterogeneous data sources to overcome bias, while the PolySearch analysis was restricted to PubMed publications. Many other algorithms could be used as alternatives. To reduce the risk of including biased relationships, the top 10–20 genes/proteins with the highest scores were selected and individually assessed against the preeclampsia-related scientific literature. Some of the top genes were also present in the previous dataset (GEO); the final dataset therefore contained 347 genes.

Protein-protein interaction network (PPI)

The proteins associated with these 347 genes were identified and cross-referenced with iRefIndex (v1.16) [45] and a curated signaling database [46], which were used to create the protein-protein interaction (PPI) network. iRefIndex provides an index of protein interactions available in several databases, such as BIND, BioGRID, DIP and HPRD, which simplifies the time-consuming process of inter-database mapping and gives comprehensive coverage of the known protein interaction space. In addition, this PPI database is easily integrated into Cytoscape. Many diseases are related to modifications of signaling pathways, so including the signaling interaction database considerably improves coverage of the PPI space. The interaction search was restricted to Homo sapiens and included all kinds of experimental procedures as well as some predicted interactions (mainly from the OPHID database). The final database was curated both manually and with in-house software to remove duplicate interactions and to map isoform notations to unique genes. The final PPI network contained 3279 interactions and 2400 nodes.
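The curation step, removing duplicate interactions and mapping isoforms to unique genes, might look roughly like the sketch below. The in-house software is not described in the text, so everything here is an assumption: the isoform naming, the mapping table, and the decision to treat interactions as undirected.

```python
def curate_interactions(pairs, isoform_to_gene=None):
    """Return a set of unique undirected edges after isoform mapping.

    pairs: iterable of (protein_a, protein_b) identifiers.
    isoform_to_gene: optional dict mapping isoform identifiers to gene symbols.
    """
    mapping = isoform_to_gene or {}
    edges = set()
    for a, b in pairs:
        a = mapping.get(a, a)
        b = mapping.get(b, b)
        # Sorting makes (A, B) and (B, A) the same undirected edge.
        edges.add(tuple(sorted((a, b))))
    return edges

def network_nodes(edges):
    """Collect every node that appears in at least one edge."""
    return {node for edge in edges for node in edge}

# Hypothetical raw interactions with a reversed duplicate and an isoform label.
toy_pairs = [
    ("P1", "P2"),
    ("P2", "P1"),
    ("P1-iso2", "P3"),
    ("P1", "P3"),
]
toy_mapping = {"P1-iso2": "P1"}
toy_edges = curate_interactions(toy_pairs, toy_mapping)
toy_nodes = network_nodes(toy_edges)
```

Counting `len(toy_edges)` and `len(toy_nodes)` on the real data would correspond to the interaction and node totals reported for the final network.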

Some of the proteins in our initial dataset had no known experimental interactions (at least in humans), and therefore the 2400 nodes cover only 234 (67.45%) of the genes in the initial set (347). Network visualization and the network topology indexes used in the hub detection process were computed using Cytoscape 2.8.2 and CytoHubba[47, 48].

Several methodologies are available for identifying hubs and essential genes, each with its own advantages and limitations[47, 49–54]. Some strategies use genetic algorithms or machine learning procedures[49, 50]; however, centrality approaches are by far the most widely applied, both because of their simplicity and because several studies have pointed out their applicability[47, 51, 52]. Therefore, several centrality indexes were evaluated: betweenness, bottleneck, density of maximum neighborhood component (DMNC), node degree, edge percolation component (EPC), eccentricity, maximal clique centrality (MCC), maximum neighborhood component (MNC), radiality and stress[47]. To obtain a single scoring index, we created the measurement Score I as follows:

Where Ici are the values of the centrality indexes, i = 1…Nc, and Nc is the number of calculated centrality indexes (Nc = 10). Score I is the sum of the percent values of all the indexes after individual normalization and is therefore restricted to a maximal value of 100×Nc, which further simplifies the selection of the top genes. With the normalized centrality indexes we also performed a model-based clustering analysis using an R package[53] in order to study the distribution of hubs with respect to centrality ranks. We also performed a community (or clique) network analysis by the clique percolation method using CFinder[54]. The community analysis provides a better description of the network topology, including the location of highly connected sub-graphs (cliques) and/or overlapping modules that usually correspond to relevant biological information.
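The Score I formula itself is not reproduced in this text, so the following is only a minimal sketch of one reading of the description (a sum of per-index percent values, capped at 100×Nc). It assumes that each index is normalized by its maximum over all nodes; that choice, the function name `score_i` and the toy values are assumptions, not the authors' stated method.

```python
def score_i(centralities):
    """Sum of per-index percent values for each node.

    centralities: dict mapping index name -> dict mapping node -> raw value.
    Assumption: each index is scaled so that its largest value equals 100.
    """
    scores = {}
    for values in centralities.values():
        top = max(values.values())
        for node, value in values.items():
            percent = 100.0 * value / top if top else 0.0
            scores[node] = scores.get(node, 0.0) + percent
    return scores


# Hypothetical toy input with Nc = 2 indexes and three nodes.
toy = {
    "degree": {"A": 4, "B": 2, "C": 1},
    "betweenness": {"A": 10.0, "B": 5.0, "C": 0.0},
}
scores = score_i(toy)
for node in sorted(scores, key=scores.get, reverse=True):
    print(node, scores[node])
```

Under this reading no node can exceed 100 times the number of indexes, which matches the bound stated in the text.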

Pathway and diseases enrichment analysis

The pathway and disease enrichment analyses were performed with the DAVID bioinformatics resource 6.7[55], exploring the well-known databases KEGG, BioCarta and Reactome (pathway related) as well as OMIM and the Genetic Association Database (GAD) (disease related). This online resource integrates a wide range of enrichment analyses across different databases in a fast computational analysis and also provides a substantial statistical description. The analysis was carried out considering the complete gene space of the PPI network. We also used DAVID to perform a gene ontology enrichment analysis on the obtained clusters.

Unraveling protein interactions between the temperate virus Bam35 and its Bacillus host using an integrative Yeast two hybrid-High throughput sequencing approach

Bacillus virus Bam35 is the model Betatectivirus and member of the Tectiviridae family, which is composed of tailless, icosahedral, and membrane-containing bacteriophages. The interest in these viruses has greatly increased in recent years as they are thought to be an evolutionary link between diverse groups of prokaryotic and eukaryotic viruses. Additionally, betatectiviruses infect bacteria of the Bacillus cereus group, known for their applications in industry and notorious since it contains many pathogens. Here, we present the first protein-protein interactions network for a tectivirus-host system by studying the Bam35-Bacillus thuringiensis model using a novel approach that integrates the traditional yeast two-hybrid system and Illumina high-throughput sequencing. We generated and thoroughly analyzed a genomic library of Bam35’s host B. thuringiensis HER1410 and screened interactions with all the viral proteins using different combinations of bait-prey couples. In total, this screen resulted in the detection of over 4,000 potential interactions, of which 183 high-confidence interactions were defined as part of the core virus-host interactome. Overall, host metabolism proteins and peptidases are particularly enriched within the detected interactions, distinguishing this host-phage system from the other reported host-phage protein-protein interaction networks (PPIs). Our approach also suggests biological roles for several Bam35 proteins of unknown function, resulting in a better understanding of the Bam35-B. thuringiensis interaction at the molecular level.

Author summary

Members of the family Tectiviridae, composed of non-tailed icosahedral, membrane-containing bacteriophages, have been increasingly scrutinized in recent years for their possible role in the origin of dsDNA viruses. In particular, the genus Betatectivirus receives increased attention as these phages can infect clinical strains as well as industrially relevant members of the B. cereus group. However, little is known about the interactions between these temperate viruses and their hosts. Here, we present the first high-throughput study of tectivirus-host protein-protein interactions focusing on Bam35, model virus of betatectiviruses, and its host B. thuringiensis, an important entomopathogenic bacterium. We adapted the well-known technique yeast-two-hybrid and integrated high-throughput sequencing and bioinformatics for the downstream analysis of the results which enables large-scale analysis of protein-protein interactions. In total, 182 detected interactions show an enrichment in host metabolic proteins and peptidases, in contrast with the current knowledge on host-phage PPIs. Specific host-viral protein-protein interactions were also detected enabling us to propose functions for uncharacterized proteins.


Construction of the Cell Metabolism-Based Human Disease Network.

As a starting point of our analysis, we used the Kyoto Encyclopedia of Genes and Genomes (KEGG) Ligand database (15) and a database of biochemically, genetically and genomically structured genome-scale metabolic network reconstructions (BiGG) (16), each representing a manually curated list of metabolic reactions in a generic human cell and the enzymes catalyzing them. We used the list of disorder–gene association pairs available in the Online Mendelian Inheritance in Man (OMIM) database (23) to identify the disorders associated with each of the enzymes present in the human metabolic network (Fig. 1a), finding that in the KEGG (BiGG) database 737 (1,116) among the total of 1,493 (3,742) metabolic reactions are associated with at least one disease. Similarly, 337 (378) among the 1,437 distinct disorders identified in OMIM are related to at least one metabolic reaction in KEGG (BiGG).

MDN. (a) Construction of the MDN. (Upper) A local region of the glycolysis, where the catalytic enzymes are shown with red background and their corresponding genes are shown with orange background. (Lower) A local neighborhood of the metabolic diseases (blue) associated with the shown reactions. The gene ENO3 encodes the enzyme catalyzing the conversion between phosphoenolpyruvate and glycerate-2P, and its mutation is involved in the development of enolase-β deficiency. The gene products of PGAM2 and BPGM, catalyzing the reaction involving glycerate-2P and glycerate-3P, are connected to myopathy and hemolytic anemia. Then the two diseases are not only connected with each other but also linked to enolase-β deficiency due to the adjacency of their associated reactions. (b) In the network representation, 308 nonisolated diseases (nodes) are connected by 878 metabolic links combining the potential links predicted by KEGG and OMIM reconstructions. The color of the nodes indicates the disease class (see SI Text and Dataset S1), and node size is proportional to the prevalence of each disease in the Medicare dataset. The width of the link between diseases is proportional to the comorbidity C of the two connected diseases. We show with red the links with significant (P < 0.01) comorbidity. Clusters of diseases associated with purine metabolism (blue shading), fatty acid metabolism (red shading), and porphyrin metabolism (green shading) are shown.

If the same substrate is shared between two metabolic reactions, the scarcity or abundance of that substrate may affect the fluxes of both reactions, potentially coupling their activity. For example, in Fig. 1a, if the phosphoglycerate mutase is not active, the production (or consumption) of glycerate-2P, and in turn of phosphoenolpyruvate, is expected to also be altered. In the following, we consider two metabolic reactions linked if they process a common metabolite, i.e. if they are adjacent to each other in a metabolic reaction map (see SI Text, Dataset S1, Dataset S2 and Dataset S3).

The altered activity of some metabolic enzymes is known to be associated with specific disorders. For example, mutations in the ENO3 gene (that encodes the enolase enzyme) are known to cause enolase-β deficiency, an autosomal recessive disorder characterized by muscle weakness and fatigability. Similarly, mutations in the BPGM gene (encoding one isoform of the phosphoglycerate mutase enzyme) can lead to hemolytic anemia. Our hypothesis is that, given that the two diseases can result from metabolic defects affecting coupled reactions, linked by glycerate-2P (Fig. 1a), their pathogenesis may also be related. That is, we hypothesize that the occurrence of one of the two diseases in a patient may enhance the likelihood of developing the other disease phenotype as well. The sum of all such cell metabolism-based links among disease phenotypes can be represented as a metabolism-based human disease network, hereafter referred to as metabolic disease network (MDN). In the MDN, each node corresponds to a disease and two diseases are connected if the metabolic reactions they are associated with are adjacent, suggesting that their fluxes may be coupled.
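The construction rule described above (two reactions are adjacent if they share a metabolite, and diseases inherit links from their reactions) can be sketched as follows. The reaction names and the toy data are hypothetical, loosely modelled on the glycolysis example. Linking diseases that map to the same reaction follows the Fig. 1a caption and is treated here as an assumption.

```python
from itertools import combinations_with_replacement

# Hypothetical toy data: metabolites and diseases per reaction.
reaction_metabolites = {
    "R_enolase": {"glycerate-2P", "phosphoenolpyruvate"},
    "R_mutase": {"glycerate-3P", "glycerate-2P"},
    "R_other": {"metabolite-x", "metabolite-y"},
}
reaction_diseases = {
    "R_enolase": {"enolase-beta deficiency"},
    "R_mutase": {"myopathy", "hemolytic anemia"},
    "R_other": {"disease-z"},
}


def build_mdn(reaction_metabolites, reaction_diseases):
    """Return the set of undirected disease-disease links."""
    links = set()
    reactions = sorted(reaction_metabolites)
    # Pairing a reaction with itself links diseases mapped to the same reaction.
    for r1, r2 in combinations_with_replacement(reactions, 2):
        if reaction_metabolites[r1] & reaction_metabolites[r2]:
            for d1 in reaction_diseases.get(r1, ()):
                for d2 in reaction_diseases.get(r2, ()):
                    if d1 != d2:
                        links.add(frozenset((d1, d2)))
    return links


mdn_links = build_mdn(reaction_metabolites, reaction_diseases)
for link in sorted(sorted(pair) for pair in mdn_links):
    print(" -- ".join(link))
```

In this toy case the three glycolysis-related diseases form a triangle, while `disease-z` stays isolated because its reaction shares no metabolite with the others.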

Characterizing the MDN.

The complete MDN is shown in Fig. 1b. The network has a large disease cluster, often called the giant component in network theory (11, 24–26), and several smaller ones. The giant component includes 197 disorders of various disease classes, such as diabetes mellitus, obesity, Parkinson disease, asthma, unipolar depression, hypertension, and coronary artery diseases. The observed clustering of the MDN mirrors the existence of the distinct metabolic pathways. To illustrate this, in Fig. 1b, we highlighted with background colors the diseases associated with some of the better known pathways. For example, according to KEGG, human purine metabolism consists of 62 reactions associated with 33 diseases including congenital dyserythropoietic anemia and nucleoside phosphorylase deficiency. These diseases form a visually distinct cluster, highlighted with blue shading in Fig. 1b. Fatty acid metabolism, containing 34 reactions and 34 associated diseases, such as trifunctional protein deficiency and syndrome of hemolysis, elevated liver enzymes, and low platelet count (HELLP), appears again as a highly interlinked group (pink shading in Fig. 1b).

The statistical characteristics of the MDN are shown in Fig. S2. We find that on average, a disease is connected to about five other diseases and that the degree distribution is much broader than that of a random network with the same number of nodes and links, indicating that there are considerable differences among the metabolism-based relatedness of various diseases. For example, some diseases, like hypertension, warfarin resistance/sensitivity, and hemolytic anemia, act as “hubs” (11, 24, 27), with links to 27, 19, and 17 other diseases, respectively. In contrast, the majority of diseases have links only to few other diseases (see Fig 1b, and Figs. S3 and S4). To a degree, this is expected because the studied disease phenotypes span a wide range of conditions, from simple Mendelian disorders, such as enolase-β deficiency (caused by deficiency of a single enzyme), to highly heterogeneous complex diseases, such as hypertension and diabetes (for which a fraction of the genetic contribution is in the form of susceptibility alleles that are neither necessary nor sufficient to cause the disease).

Gene Expression and Flux Coupling-Based Functional Relationships Among Disease Genes.

To examine the functional relevance of the MDN, next we explored to what degree the predicted links between metabolic diseases and the associated enzymes represent detectable functional relationships. By using published microarray data for gene expression in 36 normal human tissues (28), we computed the Pearson correlation coefficient (PCC) between the expression profiles of each pair of genes in the metabolic network. We find that the average coexpression of gene pairs connected by metabolic links is higher than the coexpression between genes for which no such metabolic link is known (29) (Fig. 2b and Fig. S5) with P < 10⁻⁸. For example, the genes ENO3 and PGAM2 (Fig. 1a) have a PCC = 0.66 with P < 10⁻⁵, a 7-fold increase over the average expectation.
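As an illustration of the coexpression comparison described above, the sketch below computes the PCC between expression profiles and averages it over a set of gene pairs. The gene names and expression values are hypothetical, and the significance test reported in the text is not reproduced.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    # Undefined (division by zero) if either profile is constant.
    return cov / sqrt(var_x * var_y)


def mean_pcc(pairs, expression):
    """Average PCC over a list of (gene, gene) pairs."""
    return sum(pearson(expression[a], expression[b]) for a, b in pairs) / len(pairs)


# Hypothetical expression profiles across four tissues.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],
    "geneC": [4.0, 3.0, 2.0, 1.0],
}
linked_pairs = [("geneA", "geneB")]    # pairs joined by a metabolic link
unlinked_pairs = [("geneA", "geneC")]  # pairs without a known link

print(mean_pcc(linked_pairs, expression), mean_pcc(unlinked_pairs, expression))
```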

Flux coupling and coexpression of metabolic genes. (a) To illustrate the use of flux-coupling analysis, we show the reactions that display directional coupling (DC) with the reaction converting propanoyl–CoA to (S)-methylmalonyl–CoA. In blue, we indicate the genes encoding the corresponding enzymes, and in red, we indicate the associated diseases. The production (consumption) of pentadecanoyl–CoA is performed by a single reaction, catalyzed by CPT2 (ACADM, ACADS), and therefore the ratio of their fluxes should be a constant (full coupling FC). On the contrary, propanoyl–CoA may be produced by four reactions and is consumed by only one reaction. Therefore a nonzero flux of any of those four reactions implies a nonzero flux of the reaction consuming propanoyl–CoA, but the opposite is not the case, which is DC. Because of the FC between the reactions producing and consuming pentadecanoyl–CoA, the reaction (CPT2) has DC also with the reaction (PCCA, PCCB). (b) Distribution of the PCC for all pairs of metabolism-related genes and for the pairs of genes connected by metabolic links based on the KEGG database. (c) Average PCC for all pairs of genes, all pairs of metabolism-related genes, genes connected by metabolic links, and genes associated with flux-coupled reactions displaying DC or FC. The coexpression is stronger for connected genes and significantly higher for flux-coupled genes.

The causal relationship among diseases may not be limited to those associated with adjacent reactions but could extend to disease pairs that are associated via reactions whose fluxes are coupled (22, 30, 31). By using the flux coupling finder methodology (22, 30–32), we identified two types of coupling between pairs of reactions i and j: (i) directional coupling (i → j), if a nonzero flux for i implies a nonzero flux for j but not necessarily the reverse or (ii) full coupling (i ↔ j), if a nonzero flux for i implies not only a nonzero but also a fixed flux for j and vice versa (31) (Fig. 2a). For the BiGG reconstruction, we identified 2,605 gene pairs catalyzing flux-coupled reactions. The average coexpression (PCC) of the flux coupled genes is 0.31, higher than 0.24 found for the genes catalyzing adjacent reactions and significantly higher than PCC = 0.10 characterizing all gene pairs (Fig. 2c). We also find that reactions connected by directional coupling show a significantly higher PCC (0.36) than those fully coupled (PCC = 0.17) (Fig. 2c). Taken together, these results confirm the existence of functional links between adjacent and flux-coupled reactions, suggesting the significance of these links for the coexistence of the related diseases in humans.

Comorbidity Analysis.

Disease pathobiologies originate from a full or partial breakdown of physiological cellular processes together with subsequent (often compensatory) interactions among components of the genome, proteome, metabolome, and the environment. Therefore, the affected metabolic network activity is likely to contribute to disease progression and comorbidity on the cell, organ, and organismal level.

To examine whether the links in the MDN predict disease cooccurrences, we analyzed the Medicare records of 13,039,018 elderly patients in the United States who, over the period 1990–1993, had a total of 32,341,348 hospital visits. These records are highly complete and accurate and are frequently used for epidemiological and demographic research (33, 34). The present sample was abstracted from a complete set of all hospital visits of all elderly patients (aged 65–113) in the Medicare program, which is 96% of all elderly Americans. The sample of 13 million hospitalized patients has a mean age of 76.5 ± 7.5 years; 41.7% were male, and 90.1% were Caucasian (Fig. S6). Most patients were diagnosed with several diseases during the observation period, a cooccurrence that in some cases is accidental but is also often causal, i.e. one disease increases the likelihood of the development of other diseases (C.A. Hidalgo, N. Blumm, A.-L.B., and N.A.C., unpublished data; 36), perhaps in part because of causal effects rooted in the metabolic network-based links among the cellular components implicated in the particular disease.

To test whether the links of the MDN can be detected in the population as significant cooccurrences between metabolically linked diseases, for each pair of diseases X and Y, we computed the comorbidity index (CXY, SI Text), which captures to what degree the two diseases cooccur in the same group of patients. A positive comorbidity indicates that patients with disease X are likely to develop disease Y as well, whereas a negative comorbidity indicates a potential protective effect from a disease Y in a patient with disease X. We prepared a hand-curated mapping of the ICD-9-CM codes based on the genetic disorders in OMIM by using an expert coder and standard coding procedures implemented in hospitals for assigning ICD-9-CM codes to prose descriptions of diseases (e.g. converting “diabetes” to ICD-9-CM code 250), thus allowing us to compute the comorbidity of each pair of diseases CXY in the MDN, where X and Y are indices for the 337 diseases associated with KEGG and the 378 diseases associated with BiGG.
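The exact definition of the comorbidity index is given in the SI Text, which is not reproduced here. As a hedged illustration only, the sketch below assumes the φ-correlation (the Pearson correlation of two binary disease indicators), a common choice that is positive when two diseases cooccur more often than expected and negative otherwise. It also computes the cooccurrence count expected under independence. The function names and the patient counts are hypothetical.

```python
from math import sqrt


def expected_cooccurrence(n_x, n_y, n_total):
    """Number of patients expected to have both diseases if they were independent."""
    return n_x * n_y / n_total


def phi_correlation(n_xy, n_x, n_y, n_total):
    """Phi-correlation between two binary disease indicators.

    Assumption: used here as a stand-in for the comorbidity index C_XY.
    """
    denom = sqrt(n_x * n_y * (n_total - n_x) * (n_total - n_y))
    return (n_xy * n_total - n_x * n_y) / denom


# Hypothetical counts: 1,000 patients, 100 with X, 200 with Y.
print(expected_cooccurrence(100, 200, 1000))  # cooccurrences expected by chance
print(phi_correlation(20, 100, 200, 1000))    # observed equals expected
print(phi_correlation(40, 100, 200, 1000))    # observed exceeds expected
```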

The overall tendency of the diseases to cooccur is supported by the right-skewed comorbidity distribution (Fig. 3a and Fig. S5), implying that in general, metabolically connected diseases show a higher than average comorbidity. The average comorbidity for all diseases is 0.0009 (0.0008) for the KEGG (BiGG) reconstruction, in contrast with metabolically connected pairs of diseases for which the average comorbidity is 0.0027 (0.0023), three times larger than the average for all diseases (P < 10⁻⁸). Furthermore, the average comorbidity of the diseases associated with the reactions whose fluxes are fully (directionally) coupled is 0.0062 (0.0041), ≈7 (5) times larger than the average for all diseases. In general, we find that 17% (16%) of all metabolic disease pairs for the KEGG (BiGG) reconstruction show significant (P < 0.01) comorbidity. This fraction is elevated to 31% (28%) for the disease pairs connected by a metabolic link and 28% for the flux-coupled diseases according to the KEGG (BiGG) reconstruction, a highly significant enhancement with P < 10⁻⁸.

Comorbidity and the human MDN. (a) Comorbidity distributions for all pairs of metabolism-related diseases and for connected diseases. (Inset) The average comorbidities. (b) Distribution of the prevalence of metabolism-related diseases, well approximated by a power–law with exponent −2.03 ± 0.05 (see red line). (c) Prevalence as a function of the degree of the disease in the MDN. The prevalence increases with the degree with the PCC 0.333 for KEGG database and 0.092 for BiGG database with P values <10⁻⁷ and ≈0.07, respectively. (d) Comorbidity as a function of the distance between diseases in the MDN, decreasing as the distance increases. The PCCs are −0.06233 and −0.12511 for the KEGG and BiGG databases, respectively, and the P values are <10⁻⁸ for KEGG and ≈0.0002 for BiGG database. (e) Mortality as a function of disease degree in the MDN. The mortality increases with the degree with the PCC 0.162 for KEGG database and 0.0693 for BiGG database with P values 0.044 and 0.22, respectively. (f) Correlation of potential disease comorbidity factors with disease comorbidity. PCCs between the presence of common associated genes, of metabolic links, and of flux-coupled links, with disease comorbidity are presented for metabolism-related diseases and classical metabolic diseases.

We also identified the prevalence IX of each disease, defined as the fraction of the patients having disease X (Fig. 1b). The prevalence distribution is well approximated by a power–law with exponent −2.0 (Fig. 3b), indicating that although the vast majority of diseases are rare, a few affect a significant fraction of the examined patient population. Hypertension is one of the most prevalent diseases with prevalence 0.337 followed by coronary artery disease (0.246), diabetes mellitus (0.167), and pulmonary disease (0.147). Given this broad distribution of prevalence (Fig. 3b), it is plausible that the more links a disease has to other diseases in the MDN, the higher its prevalence is, given the increased likelihood that it will be induced by other diseases in the network. Therefore, we measured the correlation between the prevalence and the degree of connectivity of each disease in the MDN (Fig. 1b), finding that the average value of the disease prevalence (Dataset S4) increases with the degree (PCC is 0.333 for KEGG, P < 10⁻⁷, Fig. 3c). Thus, the more connected a disease is in the MDN, the higher the likelihood that it may contribute to the emergence of other diseases.

We next examined whether the comorbidity effects are limited to adjacent reactions or whether comorbidity relationships also can be discerned spreading over longer distances in the MDN (i.e. if disease X is linked to disease Y, which in turn is linked to disease Z, can one expect comorbidity between X and Z?). To address this question, we define the network distance between two diseases as the length (number of links) of the shortest reaction pathway connecting them within the MDN, a metric often used in network theory (10, 11, 24, 25). We find that the PCC between the network distance and comorbidity is −0.062 (−0.13) with P < 10⁻⁸ (P < 0.0002) for KEGG (BiGG), indicating that the comorbidity of two diseases decreases as their network distance in the MDN increases (Fig. 3d). This finding suggests that although the direct or local links are the most relevant for average comorbidity, measurable effects persist up to three links, leading to a potential clustering of diseases discerned in the comorbidity relationships. We also found that reactions associated with diseases are active in more than one tissue (Figs. S13 and S14). In particular, ≈27% (12%) of the reaction pairs associated with diseases displaying significant comorbidity are active in all tissues, from the KEGG (BiGG) database, suggesting that the reactions associated with diseases are located in the core of the metabolic network (37).
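The network distance defined above is the shortest-path length in an unweighted graph, which a breadth-first search finds directly. The following is a minimal sketch on a hypothetical three-disease chain; the adjacency dictionary and disease labels are illustrative only.

```python
from collections import deque


def network_distance(adjacency, source, target):
    """Number of links on the shortest path, or None if the nodes are disconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None


# Hypothetical MDN fragment: X -- Y -- Z, with W isolated.
adjacency = {"X": ["Y"], "Y": ["X", "Z"], "Z": ["Y"], "W": []}
print(network_distance(adjacency, "X", "Z"))
print(network_distance(adjacency, "X", "W"))
```

Correlating these distances with the comorbidity of each disease pair would then give the PCC values quoted in the text.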

The widely different connectivities of various diseases (Fig. 1b) prompted us to ask whether the more connected diseases are associated with higher mortality rates (deaths) than the less connected ones. Therefore, we quantified the mortality rate associated with each disease, defined as the percentage of all elderly people who died in an 8-year period after the diagnosis with the particular disease. We find that the connectivity of a disease to other diseases in the MDN and its associated mortality rate display a PCC 0.16 (0.07) in the KEGG (BiGG) database (Fig. 3e). A potential explanation for this is that a patient diagnosed with a hub disease is very likely to also develop the diseases connected to it, whether they are diagnosed or not, and they together elevate the mortality of the hub disease.

Previous work has indicated that although most diseases can be grouped into a human disease network based on the genes the diseases share, metabolic diseases are the most disconnected class in this network (18). The main hypothesis behind the present work is that the potential relatedness of metabolic diseases is better predicted by the shared metabolites and correlated metabolic reactions than by shared genes. Therefore, we next tested whether metabolic links indeed offer a better measure of functional relatedness than shared genes by using multivariate analysis to quantify the contribution to comorbidity of the various potential links between diseases, distinguishing shared genes, metabolic links, or flux-coupled links. We find that when considering all diseases linked to metabolic enzymes (i.e., all nodes in Fig. 1b), the strongest comorbidity effects are predicted by the metabolic links in the KEGG database followed closely by shared genes (Fig. 3f). However, many diseases in Fig. 1b are not classical metabolic diseases but are related to metabolic diseases through multifunctional enzymes (6). To correct for those effects, we repeated the analysis for only diseases that are classified as metabolic diseases in the medical literature (shown as red symbols in Fig. 1b). For these, we find that the strongest predictors of comorbidity are the metabolic links, representing an equally strong effect in the KEGG and BiGG databases (Fig. 3f). In contrast, shared genes and, surprisingly, flux coupled enzymes offer a negligible predictive power. This result supports our initial hypothesis that for metabolic diseases, coupled metabolic reactions offer the best predictors for disease relatedness.

MDN-Predicted Significant Comorbidity Effects Between Diseases.

The MDN-based methodology allowed us to uncover 193 pairs of diseases that are metabolically linked according to either the KEGG or the BiGG dataset and also show significant comorbidity. The full list is provided in Dataset S5, and the subset of diseases connected in both datasets and showing the highest level of comorbidity is shown in Table S1. Among the pairs of diseases having high gene coexpression and high comorbidity are diabetes and obesity, a well known comorbidity relationship (38), but less obvious pairs, such as glutathione synthetase deficiency and myocardial infarction, are also apparent.

We also find that a detailed analysis of individual disease pairs can help to understand the way by which disturbance in the underlying metabolic network may contribute to shared pathophysiology and suggest other potential disease-modifying factors. For example, diabetes mellitus and hemolytic anemia show higher than expected comorbidity (Table S1): in our database, we find 1,656 patients who are diagnosed with both diseases, in contrast with the 1,215 expected if the two diseases were to occur independently (P < 10⁻⁸). Inspecting the relationship between the genes associated with the two diseases, we find that some of the mutated genes associated with them encode enzymes catalyzing adjacent metabolic reactions (Fig. S7). Indeed, NADPH deficiency due to glucose-6-phosphate dehydrogenase deficiency causes a reduction in the levels of glutathione that is a main factor in protecting against oxidative damage. In turn, impaired glucose uptake due to glucokinase mutation may not only alter the threshold of insulin release in pancreatic β-cells but may also increase their sensitivity to oxidative damage by reducing substrate flow toward the pentose phosphate pathway (that produces NADPH). Thus, single nucleotide polymorphisms (SNPs) in the coding region of enzymes directly or indirectly affecting the redox capacity of cells (39, 40) are expected to be among the different factors that affect the phenotype and penetrance of either or both diseases (Fig. S7).

Finally, similar disease cooccurrence associations, linking the metabolic dependency and the MDN to comorbidity, can be found for many other disease pairs, such as hypertension and coronary spasm (Fig. S8), glutathione synthetase deficiency and myocardial infarction (Fig. S9), alcoholism and epilepsy (Fig. S10), and asthma and atherosclerosis (Fig. S11), together indicating the MDN based approach's utility in discovering comorbidity effects and highlighting their potential mechanisms.


Surprisingly, genes occupying the higher hierarchical positions of the human signal transduction network are not subject to stronger levels of purifying selection, suggesting that they are not more important for the function of the network and the fitness of the organism than genes occupying the lower hierarchical positions. This observation sharply contrasts with the patterns observed in metabolic and transcriptional regulatory pathways and networks, in which upstream genes are generally the most selectively constrained. These contrasting patterns of evolution might reflect fundamental differences in the function and organization of signaling and biosynthetic and transcriptional regulatory networks. In any case, results presented here broaden our knowledge on how natural selection distributes across molecular networks.


Construction of Species-Specific Metabolic Networks.

We constructed the metabolic networks of 325 bacterial organisms following the approach outlined in ref. 32. Metabolic data were collected from KEGG (release 39, September 2006). Parsing KEGG data on reactions, compounds, and enzymes, we created a list of the reactions existing in each species in our collection, together with their products, substrates, and directionality. Water, protons, and electron components were removed from the networks as in ref. 33. Highly connected metabolites that participate in >10 reactions were also removed, and reactions that have one of these compounds as their sole product or substrate were subsequently removed (analogous to the procedure used in ref. 34). A mapping associating metabolic enzymes to the reactions they catalyze was generated, based on the information in the KEGG database.

The metabolic network of each organism was generated from its list of reactions as follows: Each enzyme is represented as a node in the network. Let E1 = <e_1^1, e_2^1, …, e_n^1> denote the set of enzymes that catalyze reaction R1, and E2 = <e_1^2, e_2^2, …, e_m^2> denote the set of enzymes that catalyze reaction R2. If a product of R1 is a substrate of R2, then edges are assigned between all nodes of E1 and all nodes of E2. Edges are also assigned within E1 nodes and within E2 nodes. Edges in the network are considered undirected. For each network, we computed the ratio between the number of metabolic enzymes and the overall number of genes in the genome of the pertaining species. Networks for which this ratio was <0.05 were considered as lacking sufficient data and were omitted from our analysis (overall 12 networks were filtered out, resulting in a total of 325 metabolic networks).
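The edge rule described above can be sketched as follows. The reaction records are hypothetical, and adding within-set edges for every reaction (rather than only for reactions that take part in a product-substrate link) is one reading of the text, flagged here as an assumption.

```python
def build_enzyme_network(reactions):
    """Return undirected enzyme-enzyme edges as sorted tuples.

    reactions: dict mapping reaction name -> (substrates, products, enzymes),
    each given as a set of identifiers.
    """
    edges = set()
    for name1, (_, products1, enzymes1) in reactions.items():
        # Assumption: enzymes catalyzing the same reaction are always linked.
        for a in enzymes1:
            for b in enzymes1:
                if a < b:
                    edges.add((a, b))
        for name2, (substrates2, _, enzymes2) in reactions.items():
            # A product of the first reaction is a substrate of the second.
            if name1 != name2 and products1 & substrates2:
                for a in enzymes1:
                    for b in enzymes2:
                        if a != b:
                            edges.add((min(a, b), max(a, b)))
    return edges


# Hypothetical toy pathway: A -> B (enzymes e1, e2), then B -> C (enzyme e3).
reactions = {
    "R1": ({"A"}, {"B"}, {"e1", "e2"}),
    "R2": ({"B"}, {"C"}, {"e3"}),
}
enzyme_edges = build_enzyme_network(reactions)
print(sorted(enzyme_edges))
```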

Identifying Topological Features of the Network.

For each metabolic network, we computed the network centrality measure and the mean degree of its nodes. A network's centrality is computed as follows: All pairwise shortest paths were determined, using the Floyd–Warshall algorithm (35), and for each node, its mean shortest-path distance to all other nodes in the network was computed, denoting the node's centrality. In cases where the network has more than one connected component, nodes from two different components are assumed to have a distance of twice the maximal distance obtained within the components. The node with the smallest mean shortest distance is considered the most central node, and its mean distance is defined as the network's centrality.
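A minimal sketch of this centrality computation is given below. It runs Floyd–Warshall on unit-weight undirected edges and applies the stated penalty of twice the largest finite distance to pairs in different components. The function name and toy graphs are illustrative, and the network is assumed to have at least two nodes and one edge.

```python
INF = float("inf")


def network_centrality(nodes, edges):
    """Return (most central node, its mean shortest-path distance)."""
    index = {node: i for i, node in enumerate(nodes)}
    size = len(nodes)
    dist = [[0 if i == j else INF for j in range(size)] for i in range(size)]
    for a, b in edges:
        dist[index[a]][index[b]] = 1
        dist[index[b]][index[a]] = 1
    # Floyd-Warshall: allow each node k in turn as an intermediate stop.
    for k in range(size):
        for i in range(size):
            for j in range(size):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    # Pairs in different components get twice the largest finite distance.
    penalty = 2 * max(d for row in dist for d in row if d != INF)
    means = {}
    for node, i in index.items():
        others = [penalty if dist[i][j] == INF else dist[i][j]
                  for j in range(size) if j != i]
        means[node] = sum(others) / len(others)
    center = min(means, key=means.get)
    return center, means[center]


# Hypothetical path graph A - B - C - D.
path_center, path_value = network_centrality(
    ["A", "B", "C", "D"], [("A", "B"), ("B", "C"), ("C", "D")])
# Hypothetical graph with an isolated node E.
split_center, split_value = network_centrality(
    ["A", "B", "C", "E"], [("A", "B"), ("B", "C")])
print(path_center, path_value, split_center, split_value)
```

Floyd–Warshall takes time cubic in the number of nodes; on unweighted graphs a breadth-first search from every node yields the same distances more cheaply.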

Computing Network Modularity.

The modularity score of each metabolic network is computed by using the algorithm presented in ref. 23. Newman's algorithm partitions the network into modules such that the number of edges between modules is significantly less than expected by chance. The algorithm provides a mathematical measure for modularity with network-size normalized values, ranging from 0 (low modularity) to 1 (maximum modularity). The use of Newman's algorithm provides a size-invariant modularity measure and thus enables us to study the role of network size on modularity as an independent, interesting topological variable [this is different from Parter et al. (20), which used a modified measure and examined equal size networks].
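To make the quantity being optimized concrete, the sketch below evaluates the standard Newman–Girvan modularity Q for a partition that is already given. It does not implement the partition search of the algorithm in ref. 23, and the toy graph and community labels are hypothetical.

```python
def modularity(edges, community):
    """Modularity Q of a given partition of an undirected, unweighted graph.

    edges: list of (node, node) pairs, each undirected edge listed once.
    community: dict mapping node -> community label.
    """
    m = len(edges)
    inside = {}      # number of edges with both ends in a community
    degree_sum = {}  # total degree of the nodes in a community
    for a, b in edges:
        ca, cb = community[a], community[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            inside[ca] = inside.get(ca, 0) + 1
    # Q = sum over communities of (fraction of edges inside)
    #     minus (expected fraction for the same degrees at random).
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Hypothetical graph: two triangles joined by a single edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
two_modules = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
one_module = {n: "a" for n in range(1, 7)}
q_two = modularity(edges, two_modules)
q_one = modularity(edges, one_module)
print(q_two, q_one)
```

Splitting the graph into its two triangles scores higher than treating it as a single module, which is the behavior a modularity-maximizing partition search exploits.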

Characterizing Bacterial Environments.

We first used the number of transporter genes in a species' genome as a rough correlate of the diversity of the environment in which it resides. The number of transporter genes was computed by counting the number of appearances of the words “transporter” and “permease” in the pertinent .ent file of each organism in the KEGG database, which describes the organism's genomic data: gene numbers, names, functional description, orthology, position, etc. A second, more refined characterization of the environment of each species was obtained from the prokaryotic attributes table of the National Center for Biotechnology Information Genome Project. For each organism, we obtained four features: salinity, oxygen requirements, habitat, and temperature range. Each of these features is defined by discrete categories as follows: salinity: nonhalophilic, mesophilic, moderate halophile, or extreme halophile; oxygen requirements: aerobic, microaerophilic, facultative, or anaerobic; temperature range: cryophilic, psychrophilic, mesophilic, thermophilic, or hyperthermophilic; habitat: host-associated, aquatic, terrestrial, specialized, or multiple. This four-feature description of each organism's environment was then used to search for specific environmental characteristics that may influence metabolic modularity.
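The keyword count used as a transporter proxy can be sketched as below. The layout of the KEGG .ent files is not described here, so the function works on plain text. Case-insensitive substring matching (which also counts plurals such as "transporters") is an assumption, and the sample annotation lines are invented for illustration.

```python
import re

def count_transporter_terms(text):
    """Count occurrences of 'transporter' and 'permease', ignoring case."""
    return len(re.findall(r"transporter|permease", text, flags=re.IGNORECASE))

# Invented annotation lines, not real KEGG records.
sample_text = "\n".join([
    "gene0001  ABC transporter permease",
    "gene0002  lactose permease",
    "gene0003  DNA polymerase",
    "gene0004  sugar Transporter",
])
n_terms = count_transporter_terms(sample_text)  # 4
```

Note that a gene annotated with both words, as in the first sample line, is counted twice under this scheme.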

Phylogenetic Analysis and Reconstruction of Ancestral Metabolic Networks.

The tree of life generated in ref. 21 was used to identify the phylogenetic relations between the species studied in our analysis and to infer ancestral metabolic networks along the tree. This tree includes a relatively large number of species, covering most of the taxonomic groups for which metabolic data are available. Specifically, this tree was used to measure the distance of each extant and ancestral species to the last universal common ancestor of bacteria and to calculate the pairwise phylogenetic distances between species (measured as the sum of the distances from the two species to their last common ancestor). The phylogenetic reconstruction part of our analysis was restricted to bacterial species that could be matched to those included in the reference tree, resulting in a total of 138 species. The ancestral metabolic networks (corresponding to internal nodes in the tree) were reconstructed from the presence/absence pattern of each enzyme across extant species, using Fitch's small-parsimony algorithm (36) to determine the presence/absence of each enzyme at every internal node.
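For a single enzyme, Fitch's small-parsimony reconstruction can be sketched as follows. The sketch assumes a rooted binary tree stored as a dict from each internal node to its two children. Resolving ties toward absence (state 0) is an arbitrary choice made here, not something stated in the text, and the function name and toy tree are hypothetical.

```python
def fitch_presence(children, root, leaf_states):
    """Infer presence (1) / absence (0) of one enzyme at every tree node.

    `children` maps internal node -> (left, right); leaves are not keys.
    `leaf_states` maps leaf -> 0 or 1.
    Returns (states, changes): a state per node and the minimum number
    of gain/loss events needed on the tree.
    """
    sets = {}
    changes = 0

    def up(node):
        # Bottom-up pass: intersect the child sets, or take the union and
        # count one change when the intersection is empty.
        nonlocal changes
        if node not in children:
            sets[node] = {leaf_states[node]}
            return
        left, right = children[node]
        up(left)
        up(right)
        common = sets[left] & sets[right]
        if common:
            sets[node] = common
        else:
            sets[node] = sets[left] | sets[right]
            changes += 1

    states = {}

    def down(node, parent_state):
        # Top-down pass: keep the parent's state when it is allowed here.
        if parent_state in sets[node]:
            states[node] = parent_state
        else:
            states[node] = min(sets[node])  # tie-break toward absence
        for child in children.get(node, ()):
            down(child, states[node])

    up(root)
    down(root, None)
    return states, changes

# Toy tree ((A, B), (C, D)); the enzyme is present in A, B, C, absent in D.
toy_tree = {"root": ("n1", "n2"), "n1": ("A", "B"), "n2": ("C", "D")}
toy_leaves = {"A": 1, "B": 1, "C": 1, "D": 0}
toy_states, toy_changes = fitch_presence(toy_tree, "root", toy_leaves)
```

In the toy tree the enzyme is inferred present at the root and at both internal nodes, with a single loss on the branch leading to D. Running the same procedure for every enzyme gives the enzyme set of each ancestral node, from which an ancestral network can be assembled; the recursive form is fine for small trees, but a very deep tree would need an iterative traversal.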
