Metabolic Engineering Prof. Amit Ghosh School of Energy Science and Engineering Indian Institute of Technology, Kharagpur Lecture - 14 Regulatory Networks
Today we will start a new topic: regulatory networks. So far you have learned a little about metabolic networks. A metabolic network assumes that all the metabolites are present inside the cell, without any regulation. Once regulation is taken into account, however, not all the metabolites are being synthesized inside the cell at a given time. That is why the regulation of metabolism is crucial and why regulatory networks are so important for metabolism. If you want to understand metabolism or the metabolic network, you also have to understand its regulation, because regulation is what actually drives metabolism.
(Refer Slide Time: 01:20)
To understand metabolic regulatory networks, you need to understand the difference between a biochemical reaction network and a statistical influence network. The metabolic network is sometimes also called a biochemical reaction network, while the regulatory network is an example of a statistical influence network. We will compare these 2 networks and see how they differ. Then we will look at transcriptional regulation and its basic concepts, and at the 3 fundamental data types used for regulatory networks.
Finally, there are mainly 2 approaches to building regulatory networks: top-down and bottom-up. We will also learn today what the top-down and bottom-up approaches are.
(Refer Slide Time: 02:03)
This is the slide on annotation that we saw in my previous lecture. Thanks to genomics, genome sequences are readily available, and a genome sequence can be annotated based on its open reading frames (ORFs). Using the open reading frames you can identify the different components encoded in the genome: this is one component, this is another, and this is a third.
Identifying these components is known as one-dimensional annotation. Beyond that, the protein produced from a gene may bind the DNA. If it is a transcription factor, it binds the DNA strand, so protein-DNA interactions take place. It may also happen that 2 proteins interact with each other and become subunits of a single enzyme that catalyzes a reaction.
Here the 2 subunits come from 2 different genes, gene 1 and gene 2, and together they catalyze the reaction A to B. These are the components of the interaction, and they can be represented in a matrix. In the stoichiometric matrix we learned in the previous class, the rows are the metabolites, such as A and B, and the columns are the reactions.
Protein-protein interactions can also be stored in a matrix, and so can regulatory interactions. So, mathematically, you can store the different kinds of information (protein-protein, protein-DNA and protein-metabolite interactions) in matrices, just as the metabolites participating in each reaction are stored in the stoichiometric matrix.
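As a minimal sketch of the matrix idea (assuming a hypothetical toy network, not a slide from the lecture), the Python below stores the reaction A to B, plus an uptake of A and a secretion of B, in a stoichiometric matrix with metabolites as rows and reactions as columns. Multiplying the matrix by a flux vector gives the net production rate of each metabolite; all zeros means the network is at steady state. The reaction names `EX_A`, `R1` and `EX_B` are illustrative only.

```python
# Hypothetical toy network (for illustration only):
#   EX_A: -> A    (uptake of A)
#   R1:   A -> B  (catalysed by the enzyme built from gene 1 and gene 2 subunits)
#   EX_B: B ->    (secretion of B)
metabolites = ["A", "B"]
reactions = ["EX_A", "R1", "EX_B"]

# Stoichiometric matrix: rows = metabolites, columns = reactions.
# Negative entries are consumed, positive entries are produced.
S = [
    [1, -1, 0],  # A
    [0, 1, -1],  # B
]


def mass_balance(S, v):
    """Return S·v, the net production rate of each metabolite for flux vector v."""
    return [sum(s * flux for s, flux in zip(row, v)) for row in S]


print(mass_balance(S, [2.0, 2.0, 2.0]))  # steady state: no metabolite accumulates
print(mass_balance(S, [2.0, 1.0, 1.0]))  # uptake exceeds conversion: A accumulates
```

The same row/column layout can hold other interaction types, for example a gene-by-reaction matrix for the subunit relationship on the slide.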
(Refer Slide Time: 04:10)
Biological networks are broadly classified into 2 kinds: biochemical reaction networks and statistical influence networks. The biochemical reaction network is essentially the metabolic network, which captures the biochemistry of the cell. Every cell has a particular biochemistry, established through a large number of experiments performed over many years, and the biochemical reaction network is built entirely on that biochemistry.
Statistical influence networks, on the other hand, come from transcriptomics, proteomics and metabolomics data. Using these high-throughput data, which are readily available nowadays, you try to infer the types of interaction: protein-metabolite, protein-protein, protein-DNA and DNA-DNA interactions.
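One simple way to infer an influence network from high-throughput data is a co-expression network: connect 2 genes when their expression profiles are strongly correlated. This is only a sketch under stated assumptions (hypothetical expression values, Pearson correlation, an arbitrary cutoff of 0.9), not necessarily the inference method behind the networks in this lecture. It also shows why such a network is data driven: a different or larger dataset can change the edges.

```python
from itertools import combinations
from math import sqrt

# Hypothetical expression levels of 4 genes measured under 5 conditions.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 4.0, 3.0, 2.0, 1.0],
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def infer_edges(expression, threshold=0.9):
    """Link two genes when |r| >= threshold; the sign gives a positive or negative association."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 2)))
    return edges


for edge in infer_edges(expression):
    print(edge)
```

A correlation edge has no direction and does not by itself prove a regulatory interaction; it is a statistical influence, which is exactly the distinction drawn in this lecture.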
These are essentially interactomics data, and from them you infer a network, the statistical influence network. It can be represented as a Boolean network, in which the state of a gene is a logical function of other components. Here, for example, the 2 components A and B are combined by an AND gate, and together they activate gene C. The state of C thus becomes a function of A, B and D, C = f(A, B, D), and from that you get the product.
In this way you can build a network from high-throughput data. Similarly, there is the reaction stoichiometry I have already explained: every enzyme catalyzes a corresponding reaction (reaction 1, 2, 3 and so on), and these reactions can be written down from the biochemistry of the cell. These 2 sources, the reaction stoichiometry and the interaction network derived from transcriptomics, proteomics and metabolomics data, together also help you build regulatory networks.
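The Boolean representation can be sketched in a few lines of Python. The AND gate on A and B follows the slide; treating D as a repressor of C (so that C = A AND B AND NOT D) is an assumption made only for illustration, since the exact role of D in f(A, B, D) is not specified here.

```python
from itertools import product

# Boolean update rules: the next state of each node as a function of the current state.
# The AND gate on A and B follows the slide; D acting as a repressor of C is an assumption.
RULES = {
    "A": lambda s: s["A"],  # external input, held constant
    "B": lambda s: s["B"],  # external input, held constant
    "D": lambda s: s["D"],  # external input, held constant
    "C": lambda s: s["A"] and s["B"] and not s["D"],
}


def step(state):
    """Synchronously update every node from the current state."""
    return {node: bool(rule(state)) for node, rule in RULES.items()}


def truth_table():
    """Return ((A, B, D), C_next) for every input combination, starting with C off."""
    rows = []
    for a, b, d in product([False, True], repeat=3):
        nxt = step({"A": a, "B": b, "C": False, "D": d})
        rows.append(((a, b, d), nxt["C"]))
    return rows


for (a, b, d), c in truth_table():
    print(f"A={int(a)} B={int(b)} D={int(d)} -> C={int(c)}")
```

Under this assumed rule, C is switched on in only 1 of the 8 input combinations, which is the on/off switch behaviour that Boolean models are meant to capture.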
(Refer Slide Time: 06:25)
Now let us compare biochemical reaction networks and statistical influence networks. A statistical influence network is a statistical, system-level inference: you infer the influences between components at the system level from data. The first difference is that a biochemical reaction network requires significant knowledge of the system.
Only if you have significant knowledge of the system can you construct the biochemical reaction network. A statistical influence network, on the other hand, does not need much prior knowledge; you can build the network without knowing much about the system. A biochemical reaction network is applicable only where the biochemistry is known: if the biochemistry is known you can construct it, otherwise you cannot.
And it takes a lot of time to work out the biochemistry of a cell. A statistical influence network, by contrast, is built without knowledge of the biochemistry, so you do not need to know the biochemistry of the cell. In a biochemical reaction network the laws of physics and chemistry can be applied, whereas in a statistical influence network physicochemical laws are typically not applicable.
Another difference concerns how closely each network relates to the phenotype. Biochemical reaction networks are closely related to the phenotype, that is, the fluxes: from a biochemical reaction network you can calculate the fluxes under any condition, whether or not you apply regulation. Statistical influence networks, in contrast, relate more directly to high-throughput data.
Once reconstructed from biochemical data, a biochemical reaction network is not likely to change. This is an important difference: the biochemistry of the cell does not change with time. A statistical influence network, by contrast, is data driven, so as you add more data its wiring diagram changes.
The biochemical reaction network, on the other hand, does not change; it is almost rock solid. These are the major differences between the 2 kinds of network present in biology: the biochemical reaction network and the statistical influence network.
(Refer Slide Time: 09:16)
Now I want to explain what transcriptional regulation is. In transcriptional regulation there is some input signal, and based on that input signal a regulatory component, such as a transcription factor (TF), gets activated. The activated transcription factor can then either repress or activate the expression of a gene.
This is how the RNA and protein output changes because of the regulatory component. Ultimately the behaviour of the cell changes, and the structure of the cell also changes because of the regulation. That is why transcriptional regulatory networks are so important: they act like on/off switches at the gene level, and whether a gene is expressed or not depends on the transcriptional regulatory network.
This is very important for metabolism as well, because a metabolic model without regulation may not be accurate. You may construct the metabolic network, but if you do not apply the transcriptional regulatory network, your predictions from that network are incomplete.
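A minimal sketch of how a regulatory state can be imposed on a metabolic model, similar in spirit to regulatory flux balance analysis (rFBA): a reaction whose gene is switched off by the regulatory network has its flux bounds set to zero. The reaction IDs and bounds below are hypothetical; `lacY`, `lacZ` and `ptsG` are real E. coli genes, but the one-gene-per-reaction mapping is a simplification.

```python
# Hypothetical reaction IDs mapped to E. coli genes; the bounds are illustrative.
REACTION_GENE = {"LAC_UPTAKE": "lacY", "LAC_HYDROLYSIS": "lacZ", "GLC_UPTAKE": "ptsG"}
DEFAULT_BOUNDS = {rxn: (0.0, 10.0) for rxn in REACTION_GENE}  # mmol/gDW/h, assumed


def apply_regulation(bounds, reaction_gene, gene_on):
    """Close (set to 0, 0) the bounds of every reaction whose gene is switched off."""
    return {
        rxn: ((lb, ub) if gene_on[reaction_gene[rxn]] else (0.0, 0.0))
        for rxn, (lb, ub) in bounds.items()
    }


# Assumed regulatory state with glucose present and lactose absent: lac genes off.
state = {"lacY": False, "lacZ": False, "ptsG": True}
for rxn, b in apply_regulation(DEFAULT_BOUNDS, REACTION_GENE, state).items():
    print(rxn, b)
```

In a full model the constrained bounds would then be passed to a flux balance calculation; that optimisation step is omitted here.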
(Refer Slide Time: 10:47)
So why do we care about regulation? Consider E. coli: it is estimated that around 400 regulatory genes are present in E. coli alone, and 178 regulatory and putative regulatory genes have been found in its genome.
In addition, 690 transcription units, that is, contiguous genes sharing a common promoter and terminator and expressed under common conditions, have been identified in the RegulonDB database. Databases such as RegulonDB give you more data on transcriptional regulation, for example how many transcription units there are, and this information also helps in predicting cellular behaviour.
Once you add the regulatory component, predictions of cellular behaviour become more accurate because the model is much closer to the actual organism. The behaviour of the cell can also be manipulated much more effectively when you include the regulatory component.
(Refer Slide Time: 12:07)
Here I have shown the lac operon in E. coli. The lac operon consists of 3 structural genes, lacZ, lacY and lacA, which are involved in lactose utilization. The operon is regulated by lactose and glucose signals, mediated by 2 DNA-binding proteins, the lac repressor and CAP respectively. The lac repressor binds DNA only in the absence of lactose, whereas CAP binds DNA only in the absence of glucose.
So we have 2 DNA-binding transcription factors with 2 different roles: CAP binds in the absence of glucose, and the lac repressor binds in the absence of lactose. Let us see how this works in each condition. First, when only glucose is present, there is no lactose.
That is why the lac repressor binds the promoter region of the DNA, and the operon is off. The only time the lac repressor is absent from the DNA is when lactose is present. Here you can see the lac repressor moving away because lactose is present; when the repressor does not bind, RNA polymerase can be recruited to the promoter region, the gene is expressed, and mRNA is formed.
In addition, since there is no glucose in this case, CAP binds and enhances transcription. Now consider the case where neither glucose nor lactose is present: the lac repressor is still bound, so there is no expression of the gene. In the last case, both glucose and lactose are present. What happens then? CAP does not bind, and the lac repressor does not bind either.
Since neither transcription factor is bound, the gene is still not effectively expressed: without CAP, RNA polymerase does not bind the promoter efficiently, so the mRNA level is close to 0. You can see that this works almost like a switch: the gene is expressed in only one condition, the on condition, where only lactose is present. So the lac operon is activated only when lactose is present and glucose is absent; if both lactose and glucose are present, it is effectively off.
This is because CAP binds only in the absence of glucose and the lac repressor binds DNA only in the absence of lactose, so whenever lactose is absent the lac repressor is bound and the expression level is 0. When both sugars are present, RNA polymerase binds only weakly to the promoter and the operon is transcribed at a low level. Only when lactose alone is present does the lac repressor move away while CAP is bound, giving full expression.
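The 4 conditions described above can be captured as a small logic function. This sketch encodes only the 2 rules stated in the lecture (repressor bound when lactose is absent, CAP bound when glucose is absent); the labels "off", "low" and "on" are a qualitative simplification, not measured expression levels.

```python
def lac_operon(glucose: bool, lactose: bool) -> str:
    """Qualitative lac operon output from the two binding rules stated in the lecture."""
    repressor_bound = not lactose  # lac repressor binds DNA only when lactose is absent
    cap_bound = not glucose        # CAP binds DNA only when glucose is absent
    if repressor_bound:
        return "off"  # repressor blocks transcription regardless of CAP
    if cap_bound:
        return "on"   # repressor released and CAP helps recruit RNA polymerase
    return "low"      # repressor released but no CAP: weak polymerase binding


for glucose in (False, True):
    for lactose in (False, True):
        print(f"glucose={int(glucose)} lactose={int(lactose)} -> {lac_operon(glucose, lactose)}")
```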
For the references for this lecture, you can read the paper "Biochemical and statistical network models for systems biology", published in Current Opinion in Biotechnology, and the Nature Biotechnology paper on 2-dimensional annotation. Thank you for listening.
Prof. Amit Ghosh
School of Energy Science and Engineering
Indian Institute of Technology – Kharagpur
Lecture – 15
Reconstruction of Metabolic Networks
Welcome to the metabolic engineering course. Today we will discuss the reconstruction of metabolic networks. As you know, metabolic networks are very important in metabolic engineering, and the reconstruction procedure, which has been developed over the last 2 decades, is quite distinctive.
(Refer Slide Time: 00:51)
Reconstruction involves many algorithms and methods that enable you to reconstruct the network. I have divided the topic into 4 parts. The first is the 4-level functional decomposition of metabolism, in which you decompose metabolism into 4 levels. This is followed by data collection, which mainly involves genome annotation, biochemical data and physiological data.
Then you have to map the gene-protein-reaction associations, which is how you construct the genotype-to-phenotype relationship in a metabolic network. Another important part of a metabolic network is the biomass composition, which is generally measured experimentally.
(Refer Slide Time: 01:44)
Let us start with biochemical network reconstruction. As you can see on the right-hand side, the job of the network reconstruction procedure is to identify all the reactions in the metabolic network. First you identify the reactions involved in glycolysis, that is, how many reactions there are, and then those of the citric acid cycle and various other processes such as the urea cycle and the Calvin cycle.
The Calvin cycle uses carbon dioxide and the energy of light to produce hexoses. Beyond these major pathways there are the lipid production pathways, the pyruvate formation pathway, the acetyl-coenzyme A formation pathway, and the biosynthesis pathways for the nucleotides and the 20 amino acids. All of these pathways have to be identified for a given organism. They change from organism to organism, but most of the time the central metabolic pathways are quite similar across organisms.
Metabolism is like a chemical engine: with its many metabolic pathways it acts like a chemical factory that converts raw materials into energy and into the building blocks of biological structures. It obeys the laws of physics and chemistry, and it also has a regulatory structure, so all these metabolic reactions are under regulation. You should keep in mind that the metabolic network, as a biochemical reaction network, assumes that all metabolites are available.
But in a living cell, when you model the cellular system, regulation plays an important role, because whether a reaction is active or not depends on the regulation. Also, these networks are not separate; lipid metabolism, for example, is not separate from glycolysis. Everything is connected and interdependent. That is why understanding the network matters in metabolic engineering: when you remove a gene or add a new pathway, the entire metabolic network is perturbed.
In this way you can understand the metabolic fluxes. As Professor Pinaki told you, you can only determine the fluxes if you have a metabolic model.
(Refer Slide Time: 04:03)
The 4-level functional decomposition of metabolism starts with cellular inputs and outputs. Overall, metabolism is composed of enzymatic reactions that transform substrate molecules into the essential building blocks of macromolecules and other products vital for growth and maintenance.
A growth-kinetics description of the overall activity of metabolism has the substrate as input and biomass and metabolic by-products as output. So the substrate is the input, and what you observe, namely growth and waste by-products, is the output. This is level 1: the cellular inputs and outputs.
This description consists of a simple set of coupled mass and energy balances with various empirically determined yield coefficients that describe how the consumed substrates are partitioned. So you can calculate the yield, that is, how much carbon ends up in the products relative to how much carbon is consumed, and from this you can see how efficiently your cell converts input into output.
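As a worked example of the level-1 input/output description, the sketch below computes a biomass yield coefficient Y_X/S (grams of biomass per gram of substrate) and a carbon recovery, that is, the fraction of substrate carbon accounted for in the measured outputs. All culture numbers are hypothetical; glucose (molar mass 180.16 g/mol, 6 carbon atoms) is assumed as the substrate.

```python
GLUCOSE_MW = 180.16   # g/mol
GLUCOSE_CARBONS = 6   # carbon atoms per glucose molecule


def biomass_yield(biomass_g, glucose_mmol):
    """Y_X/S: grams of biomass formed per gram of glucose consumed."""
    glucose_g = glucose_mmol * GLUCOSE_MW / 1000.0
    return biomass_g / glucose_g


def carbon_recovery(glucose_mmol, carbon_out_mmol):
    """Fraction of the consumed glucose carbon found in the measured outputs (1.0 = closed)."""
    carbon_in = glucose_mmol * GLUCOSE_CARBONS
    return sum(carbon_out_mmol.values()) / carbon_in


# Hypothetical culture: 10 mmol glucose consumed, 0.9 g biomass formed.
print(round(biomass_yield(0.9, 10.0), 3))
# Hypothetical carbon (mmol C) recovered in biomass, CO2 and acetate.
print(round(carbon_recovery(10.0, {"biomass": 30.0, "CO2": 20.0, "acetate": 8.0}), 3))
```

A recovery noticeably below 1.0 would suggest an unmeasured product or a measurement error.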
(Refer Slide Time: 31:05)
Next we define the gene-protein-reaction (GPR) association. This is very important, because GPR associations let us relate genotype to phenotype: the gene is the genotype, the reaction is the phenotype, and the two are connected through the protein. Not all genes have a 1-to-1 relationship with reactions.
It is not always true that 1 gene corresponds to 1 reaction. There are cases where many genes are needed to catalyse 1 reaction. One example is fumarate reductase, whose 4 subunits combine to form the enzyme that catalyses the conversion of fumarate to succinate. There are also cases where 1 gene catalyses many reactions.
Fumarate reductase has 4 subunits, and each subunit comes from a different gene; that is the many-genes, 1-reaction example. The other case is 1 gene, many reactions: here the gene encodes transketolase, an enzyme that catalyses 2 reactions in the pentose phosphate pathway.
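GPR associations are commonly written as Boolean rules: AND when several gene products form one enzyme complex, and OR when alternative enzymes catalyse the same reaction. The sketch below encodes the 2 examples above. The E. coli fumarate reductase subunit genes frdA, frdB, frdC and frdD are used for the complex; the reaction IDs (`FRD`, `TKT1`, `TKT2`) and the use of a single transketolase gene `tktA` for both reactions are illustrative simplifications.

```python
def gpr_active(rule, present):
    """Evaluate a GPR rule: a gene name, or ("and" | "or", [sub-rules])."""
    if isinstance(rule, str):
        return rule in present
    op, args = rule
    results = [gpr_active(arg, present) for arg in args]
    if op == "and":
        return all(results)
    if op == "or":
        return any(results)
    raise ValueError(f"unknown operator: {op}")


GPR = {
    # Many genes -> 1 reaction: all 4 fumarate reductase subunits are required.
    "FRD": ("and", ["frdA", "frdB", "frdC", "frdD"]),
    # 1 gene -> many reactions: one transketolase gene for two reactions (simplified).
    "TKT1": "tktA",
    "TKT2": "tktA",
}


def active_reactions(present):
    """Sorted IDs of the reactions whose GPR rule is satisfied by the present genes."""
    return sorted(rxn for rxn, rule in GPR.items() if gpr_active(rule, present))


all_genes = {"frdA", "frdB", "frdC", "frdD", "tktA"}
print(active_reactions(all_genes))             # every reaction active
print(active_reactions(all_genes - {"frdB"}))  # one subunit missing: FRD is lost
print(active_reactions(all_genes - {"tktA"}))  # one gene missing: two reactions are lost
```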
(Refer Slide Time: 32:41)
Next is the integration of omics data through the GPR relationship. From the genome you identify the open reading frames (ORFs), which gives you the gene annotation; transcriptomics measures the mRNA level of that gene; the mRNA is translated into a protein, and here the protein catalyses 2 reactions, reaction 1 and reaction 2.
Here you can see that you connect the gene to the reaction, and in a metabolic network you have to map from gene to reaction. You can get the reactions from the KEGG database, but the reaction entries mainly link enzymes (proteins) to reactions; from the protein you have to go back to the gene and then to the open reading frame of your organism. You need this information for metabolic network reconstruction, so that you can perturb a gene and obtain the resulting change in the phenotype, that is, the reaction fluxes.
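The chain from ORF to gene to protein to reaction is essentially a join of lookup tables. In this sketch, assuming hypothetical identifiers throughout, a reaction-to-protein table (of the kind a reaction database provides) is combined with a protein-to-gene table from the genome annotation to find which reactions each gene affects, matching the slide's case of 1 protein catalysing 2 reactions.

```python
# Hypothetical identifiers for illustration only.
REACTION_TO_PROTEINS = {"R1": ["P1"], "R2": ["P1"]}  # e.g. from a reaction database
PROTEIN_TO_GENES = {"P1": ["geneX"]}                 # from the genome annotation
GENE_TO_ORF = {"geneX": "ORF_0001"}                  # ORF located in the genome


def gene_to_reactions(reaction_to_proteins, protein_to_genes):
    """Invert and join the tables: for each gene, the set of reactions its protein catalyses."""
    mapping = {}
    for rxn, proteins in reaction_to_proteins.items():
        for protein in proteins:
            for gene in protein_to_genes.get(protein, []):
                mapping.setdefault(gene, set()).add(rxn)
    return mapping


for gene, rxns in gene_to_reactions(REACTION_TO_PROTEINS, PROTEIN_TO_GENES).items():
    print(gene, GENE_TO_ORF[gene], sorted(rxns))
```

A protein with no annotated gene simply drops out of the mapping, which flags a gap in the reconstruction.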
(Refer Slide Time: 33:52)
Another example is an isoenzyme: fructose-1,6-bisphosphate aldolase, where 3 proteins catalyse the same reaction. If you remove any one of the genes, what happens? The reaction can still be carried out by the other 2 proteins. If you want to stop the reaction, you have to remove all 3 enzymes, because they are isozymes, that is, multiple enzymes that catalyse the same reaction.
If any one of these genes is present, the reaction will be active. GPR (gene-protein-reaction) relationships of this kind are encoded in the metabolic network, linking the gene, the protein and the reaction. The reaction flux is the phenotype, the gene encodes the protein, and the protein catalyses the reaction.
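For an isozyme the GPR rule is an OR, so the reaction is blocked only when every gene is deleted. The sketch enumerates all deletion sets for the 3 aldolase genes; the gene names `iso1` to `iso3` are hypothetical placeholders.

```python
from itertools import combinations

# Hypothetical placeholder names for the 3 isozyme genes of the aldolase reaction.
ALDOLASE_GENES = ["iso1", "iso2", "iso3"]


def reaction_active(present, isozyme_genes):
    """Isozymes form an OR rule: any one remaining gene keeps the reaction active."""
    return any(gene in present for gene in isozyme_genes)


def blocking_deletions(isozyme_genes):
    """All gene-deletion sets (smallest first) that block the reaction."""
    blocking = []
    for k in range(1, len(isozyme_genes) + 1):
        for deleted in combinations(isozyme_genes, k):
            remaining = set(isozyme_genes) - set(deleted)
            if not reaction_active(remaining, isozyme_genes):
                blocking.append(deleted)
    return blocking


print(blocking_deletions(ALDOLASE_GENES))
```

Only the triple deletion appears, which is the point made on the slide.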
(Refer Slide Time: 35:09)
Another example is pyruvate metabolism, where 3 genes catalyse 3 reactions. The reaction ACKr is catalysed by 3 proteins: AckA, PurT and TdcD. TdcD also catalyses PPAKr, and PurT also catalyses GART. So you can see that the same enzyme can catalyse 2 reactions.
Suppose I want to stop ACKr, that is, remove this reaction from the network. How many genes do you have to delete? Can you guess? You have to find how many proteins are involved, and you can see that all 3 proteins catalyse ACKr.
So, to stop this reaction we have to remove all 3 genes. In this way you can identify how many genes are involved in a given reaction. This mapping is known as the gene-protein-reaction (GPR) association, and it is very important information stored in the metabolic network, because it lets you map from gene to phenotype.
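The ACKr question can also be answered mechanically from a gene-to-reaction table. The sketch below uses only the associations described on the slide (ackA, purT, tdcD) and assumes that no other gene catalyses these reactions; under that assumption it also reports a side effect: deleting all 3 genes blocks GART and PPAKr as well.

```python
# Gene -> reactions, using only the associations described on the slide.
# Assumption: no other gene in the organism catalyses ACKr, GART or PPAKr.
GENE_TO_REACTIONS = {
    "ackA": {"ACKr"},
    "purT": {"ACKr", "GART"},
    "tdcD": {"ACKr", "PPAKr"},
}


def genes_to_delete(reaction):
    """Genes that must be deleted to block `reaction` (isozymes: OR rule, so all of them)."""
    return sorted(g for g, rxns in GENE_TO_REACTIONS.items() if reaction in rxns)


def reactions_lost(deleted):
    """Reactions left with no catalysing gene after deleting the genes in `deleted`."""
    all_rxns = set().union(*GENE_TO_REACTIONS.values())
    remaining = set().union(*(r for g, r in GENE_TO_REACTIONS.items() if g not in deleted))
    return sorted(all_rxns - remaining)


to_delete = genes_to_delete("ACKr")
print(to_delete)                  # the 3 genes that must be deleted
print(reactions_lost(to_delete))  # ACKr plus the side effects on GART and PPAKr
```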
(Refer Slide Time: 36:59)
These are reconstructed networks available for Haemophilus influenzae, E. coli, H. pylori, S. cerevisiae and many other organisms; metabolic models are now available for a large number of organisms. The main components to look at in a metabolic network are the number of genes the model has, the number of metabolites and the number of reactions.
Over time, you will see that a model is published in a journal and then, a few years later, an updated model becomes available with an updated number of genes, metabolites and reactions. As more data and information become available, you can update the model accordingly.
(Refer Slide Time: 37:54)
Another important thing is the biomass composition, which you determine experimentally and put into the model. The biomass composition is required because the biomass production, or growth rate, of the cell depends on it. The measured biomass composition indicates the demand placed on the system. For less-characterized organisms like H. pylori and H. influenzae, the precursor demands of a similar, better-characterized network may be used to approximate the biomass composition.
So these organisms also have to be well characterized in terms of biomass composition, because the biomass composition determines the growth rate of the cell you are modelling. If the biomass composition is wrong or inappropriate, the predicted growth rate will also be wrong. So the exact amounts, in millimoles per gram of dry weight, have to be measured experimentally and fed into the model, and then you look at the growth rate predicted by the model.
You then compare the predicted growth rate with the experimental value. If the experimental and theoretical growth rates match, the model is in good agreement with the experiment. In this way you can make the model much more accurate by comparing with the experimental growth rate, and for that you need the experimentally determined biomass composition.
Each component, such as ATP, NAD, G6P and F6P, can be measured experimentally and fed into the model, which makes the model much more appropriate. This is an important part of metabolic reconstruction: the more accurate your biomass composition, the more accurate the model. If the biomass composition is wrong, you may not be able to reproduce the experimental data, that is, the experimental growth rate of the cell you are modelling.
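To see how the biomass composition enters the model, note that each biomass component is drained at a rate equal to its coefficient (mmol per gram dry weight) times the growth rate μ (per hour). The sketch below uses hypothetical coefficients, not measured values, and also computes the relative error between a predicted and an experimental growth rate, as in the comparison described above.

```python
# Hypothetical biomass coefficients in mmol per gram dry weight (not measured values).
BIOMASS = {"atp": 50.0, "g6p": 0.2, "f6p": 0.07, "nad": 0.002}


def precursor_drain(biomass, mu):
    """Drain flux (mmol/gDW/h) of each biomass component at growth rate mu (1/h)."""
    return {met: coeff * mu for met, coeff in biomass.items()}


def relative_error(predicted, measured):
    """Relative deviation of the model growth rate from the experimental one."""
    return abs(predicted - measured) / measured


print(precursor_drain(BIOMASS, 0.5))
# Hypothetical model prediction of 0.62 1/h versus a measured 0.65 1/h.
print(f"{relative_error(0.62, 0.65):.1%}")
```

Because every drain scales with the coefficients, an error in a measured coefficient propagates directly into the precursor demand and hence into the predicted growth rate.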
(Refer Slide Time: 39:58)
In conclusion, complex networks carry out complicated biological functions. Metabolism, for instance, is a very complex network that carries out complicated biological functions. All networks based on biochemical reactions are described by a stoichiometric matrix, so every metabolic network, like any biochemical reaction network, is represented by the stoichiometric matrix I have already told you about.
The stoichiometric matrix is therefore an important part of network reconstruction. A hierarchy can be used to conceptualize a network at various resolutions; we have seen the 4 levels, and at each level you can conceptualize and visualize the network in more detail. Metabolism is the best-characterized network in terms of biochemistry, kinetics and thermodynamics.
Network reconstruction requires a detailed examination of all the components that make up the network, and many resources can provide this information. Metabolic network reconstruction thus gives you detailed information on the components, how they are connected to each other, and how different sources of data can be used to build the network.
Networks do not act independently of each other, so the integration of all networks is necessary to describe a cellular function. This is very important if you want to describe a cellular function that can be compared with experimental results: you have to integrate the metabolic network, the regulatory network and the signalling network, and these 3 networks together describe the cellular function.
(Refer Slide Time: 41:42)
These are the references I have already mentioned. Thank you for listening.