Prof. Amit Ghosh
School of Energy Science and Engineering
Indian Institute of Technology – Kharagpur
Lecture – 16
The Stoichiometric Matrix: Representing Reconstructed Network Mathematically
Welcome to the metabolic engineering course. Today we will talk about the stoichiometric matrix, which is a very important part of metabolic network analysis because it represents the reconstructed network mathematically. So, we want to build a mathematical representation of the network that holds all the information about all the reactions, which you can then operate on for different purposes.
(Refer Slide Time: 01:00)
So, today's lecture has the following subheadings: how to form a stoichiometric matrix, what S is for a metabolic network, defining network boundaries, topological maps and the stoichiometric matrix, and different topological properties of S.
(Refer Slide Time: 01:25)
So, for the mathematical representation of a reconstructed network we have the following questions. How do we go about taking information from the chemical reactions in our network reconstruction and putting it into some kind of mathematical framework? And what are the characteristics of this mathematical form that we would like to know when we bring all the reactions into a network framework?
And what does this mathematical formulation tell us about the state of the biological or chemical network, what insights do you get from it? Mathematically you can represent any network in the form of a stoichiometric matrix, and then you can evaluate different properties of the network from the matrix, because a matrix is easy to operate on.
Because matrix algebra is very well known, many problems can be addressed once you put the network in matrix form. With matrix algebra, matrix optimization and other tools we can play around and bring out many insights from the network; that is why the network is always represented in the form of a matrix. Today we are going to learn how you can build this matrix, the stoichiometric matrix S, and we will see how different properties of S bring out new features of the network.
(Refer Slide Time: 03:10)
So, in a previous class also I gave this example of a small network. On the left hand side is the substrate: some carbon source is entering the network, and then I have 2 products, product 1 and product 2. This is a very simple network with only 3 internal metabolites, A, B and C. Suppose you consider this as a cell; it has an extracellular flux which is entering.
That is v 1, and there are product fluxes, v 4 and v 5, which go outside the cell and form product 1 and product 2. So, finally we have 5 reactions, v 1, v 2, v 3, v 4 and v 5, and the internal metabolites A, B and C. Now we want to write the differential equations, that is, the time derivatives of the concentrations dA / dt, dB / dt and dC / dt. How do you write them? I think by now you will be able to write them easily.
If you write them in terms of the fluxes v 1, v 2, v 3, v 4, v 5, then whichever flux is coming in gets a plus sign and whichever is going out gets a minus sign. So, for dA / dt you have v 1 – v 2 – v 3, for dB / dt we have v 2 – v 4, and for dC / dt we have v 3 – v 5.
So, these are the time derivatives of the concentrations, dA / dt, dB / dt and dC / dt; for example, dA / dt is the change in metabolite A as a function of time. These equations can be represented in stoichiometric matrix form. The stoichiometric matrix for these 3 equations is written as S. There are 5 reactions, so that is why S has 5 columns, one each for v 1, v 2, v 3, v 4 and v 5.
So, column 1 is for v 1, column 2 for v 2, column 3 for v 3, column 4 for v 4 and column 5 for v 5, and the metabolites are in the rows. We have 3 metabolites, A, B and C, which is why I have only 3 rows. Now we can see how many reactions metabolite A is involved in: v 1, v 2 and v 3. That is why I have nonzero terms for v 1, v 2 and v 3, whereas v 4 and v 5 are 0.
Metabolite A does not take part in reactions v 4 and v 5, so those entries are 0. v 1 is coming in, so it is plus 1, and v 2 and v 3 are going out, so they are minus 1 and minus 1. Similarly, only v 2 and v 4 are active for metabolite B: v 2 is plus 1 and v 4 is minus 1. And for metabolite C we have v 3 as plus 1 and v 5 as minus 1.
So, in this way you can write the network mathematically in the form of a stoichiometric matrix S. However big the network is, you can compress its information into matrix form. Here we have only a 3 by 5 matrix, with 3 rows and 5 columns, but in a real metabolic network we may have 1000 metabolites and 2000 reactions, and S becomes a 1000 by 2000 matrix.
More importantly, this matrix is a sparse matrix: most of the elements are 0 and only a few are nonzero. Now I will describe how we form the stoichiometric matrix from the chemical equations and how the stoichiometric coefficients are determined.
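As a minimal sketch of the small network above, the 3 by 5 matrix S can be written down directly and multiplied by a flux vector. The flux values here are hypothetical numbers chosen only for illustration, and the helper `matvec` is not from the lecture.

```python
# Stoichiometric matrix of the three-metabolite, five-reaction example.
# Rows: metabolites A, B, C. Columns: reactions v1..v5.
S = [
    [1, -1, -1,  0,  0],   # A: produced by v1, consumed by v2 and v3
    [0,  1,  0, -1,  0],   # B: produced by v2, consumed by v4
    [0,  0,  1,  0, -1],   # C: produced by v3, consumed by v5
]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

# Hypothetical flux values (arbitrary units), assumed for illustration only.
v = [10.0, 6.0, 4.0, 6.0, 4.0]

# Time derivatives [dA/dt, dB/dt, dC/dt] = S . v
dxdt = matvec(S, v)
print(dxdt)

# Fraction of zero entries, a simple measure of how sparse S is.
entries = [x for row in S for x in row]
sparsity = entries.count(0) / len(entries)
print(sparsity)
```

With these particular fluxes every derivative is zero, so the toy network is at steady state. Even in this tiny example more than half of the entries of S are zero.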
(Refer Slide Time: 08:13)
So, the stoichiometric coefficients are determined by the coefficients of the chemical reaction. Here a and c are the coefficients of the reactants A and C; each is an integer and corresponds to the stoichiometric coefficient of that metabolite. Like that, e and h are the stoichiometric coefficients of the products E and H respectively. This is the reaction v i which I want to represent in the stoichiometric matrix.
Since A is a reactant, its entry in the column for v i is minus a, and likewise reactant C has the coefficient minus c, while the products have plus e and plus h. This way you can represent the reaction in a matrix where the rows are the metabolites. So A, B, C, D, E, H are the metabolites, one row each, and each reaction you represent becomes a column. Along each row you can see in how many reactions that metabolite participates, and in this way the stoichiometric coefficients are entered in the matrix.
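The sign convention above can be sketched in a few lines of Python. The coefficients (1, 2, 1, 3) and the function name `reaction_to_column` are assumptions made for this illustration; the lecture only names them a, c, e and h.

```python
# The metabolite order fixes the row order of S.
metabolites = ["A", "B", "C", "D", "E", "H"]

# Hypothetical reaction v_i:  1 A + 2 C -> 1 E + 3 H
# Reactants carry negative coefficients, products positive ones.
reaction_vi = {"A": -1, "C": -2, "E": 1, "H": 3}

def reaction_to_column(reaction, metabolite_order):
    """Return the column of S for one reaction; absent metabolites get 0."""
    return [reaction.get(m, 0) for m in metabolite_order]

column = reaction_to_column(reaction_vi, metabolites)
print(column)
```

Metabolites B and D do not take part in this reaction, so their rows get 0 in this column.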
(Refer Slide Time: 09:35)
Now, the basic features of the stoichiometric matrix: what does the stoichiometric matrix mean? It is generally represented by S. It contains the stoichiometric information from the reactions in the chemical reaction network. Associated with S is additional information about open reading frames, transcript levels, enzyme complex formation and protein localization.
So, S can also be linked to transcript level, enzyme complex formation and protein localization information, because S is determined based on this information. You can find such reconstructions in the BiGG database, which is available online, where you can check all the reactions and also look at how the stoichiometric data is stored.
S is an interface between high throughput data and in silico analysis. You can integrate different high throughput data and correlate them with the matrix. So, this matrix looks very simple, but it is very powerful in the sense that it can integrate high throughput data and then support in silico analysis. Today we can see that biologists are generating a lot of high throughput data.
And sometimes we do not know what to do with the data: they are generated in huge amounts and we do not know how to use them. You can integrate the data into the stoichiometric matrix and do in silico analysis, which can predict new behavior and help us understand the biology better. That is why the stoichiometric matrix plays an important role.
The centerpiece of network analysis is the stoichiometric matrix. So, the stoichiometric matrix is given great attention; sometimes building the stoichiometric matrix for a given organism can be one PhD. Stoichiometric matrices are readily available for many microorganisms, and they are also available for human cell lines.
We have Recon 1, Recon 2 and Recon 3, the successive generations of the human network, where the entire metabolic network of the human cell is available as S. The construction of S is very laborious in the case of a multicellular organism, where you have many compartments; for a microbial cell it is quite easy, since there are no compartments. But as you go to higher organisms you will see that there are many compartments inside the cell, and they have to be taken care of properly when you build the network.
(Refer Slide Time: 12:33)
Other basic features of the stoichiometric matrix: its entries are the stoichiometric coefficients of the reactions that comprise the network, and these entries are integers. The columns of S correspond to reactions and the rows of S correspond to compounds, as I told earlier also. By looking across a row, one observes all the reactions in which a given compound participates.
So, if you consider a row you will see in how many reactions a given compound participates and how the reactions are interconnected. Just by looking at the stoichiometric matrix you can understand how the network is connected and what its basic properties are. For example, given a compound, you know how many reactions it is involved in.
S transforms the flux vector into a new vector that contains the time derivatives of the concentrations, and therefore S is a linear transformation of the flux vector. With the small network earlier I have also shown that for the concentration vector, dC / dt = S dot v.
So, v is the flux vector, and the S matrix transforms it into the time derivatives of the concentrations. This is a linear transformation.
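Written out for the three-metabolite example from earlier in the lecture, the transformation looks as follows. This is only a restatement in LaTeX; the symbol x for the concentration vector (written C in the spoken text) and the scalars alpha and beta are notation assumed here.

```latex
\[
\frac{d\mathbf{x}}{dt} = S\,\mathbf{v},
\qquad
\frac{d}{dt}
\begin{pmatrix} A \\ B \\ C \end{pmatrix}
=
\begin{pmatrix}
1 & -1 & -1 &  0 &  0 \\
0 &  1 &  0 & -1 &  0 \\
0 &  0 &  1 &  0 & -1
\end{pmatrix}
\begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \\ v_5 \end{pmatrix}
\]

\[
S\,(\alpha\,\mathbf{v}_a + \beta\,\mathbf{v}_b)
= \alpha\,S\,\mathbf{v}_a + \beta\,S\,\mathbf{v}_b
\]
```

The second line is what "linear" means here: scaling or adding flux vectors scales or adds the resulting time derivatives in the same way.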
(Refer Slide Time: 14:55)
So, now another important thing: the reactions you import from various databases like KEGG, MetaCyc and BRENDA may not be mass balanced or charge balanced. All the elements have to be balanced during a chemical conversion, so the number of C, H and O atoms has to be equal on both sides of the reaction. You have seen this at school level also, where you balanced reactions so that the number of atoms on the left hand side equals the number of atoms on the right hand side.
But a metabolic network has about 2000 to 3000 reactions, and manually correcting the elements is laborious and may introduce a lot of errors. So, in this class we will see how, using matrix algebra, you can check and correct the mass balance of a reaction. This is the scheme used for balancing a reaction, where you can balance the carbon, hydrogen and oxygen atoms.
And you balance both the left hand and right hand sides of a given reaction. For example, here the left hand side has glucose and ATP, and the products are glucose 6 phosphate and ADP. The reaction is v 1. For this single reaction you can make a stoichiometric matrix: since we have only one reaction we have only one column, and since we have 4 metabolites we have 4 rows. So, it is a 4 by 1 matrix.
The reactants are minus 1 and the products are plus 1: minus 1 for glucose, minus 1 for ATP, plus 1 for glucose 6 phosphate and plus 1 for ADP, very simple. Once you construct the stoichiometric matrix, the next step is to construct the elemental matrix. The elemental balance of a reaction vector can be checked using the elemental matrix.
So, let us construct the elemental matrix. What is the elemental matrix? Here the rows are the atoms: carbon, hydrogen, oxygen, phosphorus and nitrogen. And the columns are the metabolites. So, for a glucose molecule, how many carbons are there? 6. How many hydrogens? 12. And how many oxygens? 6. The remaining atoms, phosphorus and nitrogen, are 0.
So, in this way you fill in the numbers in each row, and similarly for ATP, glucose 6 phosphate and ADP you write the number of each element present in the molecule: the number of carbon, hydrogen, oxygen, phosphorus and nitrogen atoms. We write these down column by column, and the result is the elemental matrix E. Once you have the elemental matrix and the stoichiometric matrix, you take their dot product.
(Refer Slide Time: 18:07)
So, if the dot product E dot S = 0, then you say that the reaction is mass balanced. If E dot S is not 0, then there is a problem. Here you can see the same reaction, glucose plus ATP giving glucose 6 phosphate and ADP. To perform this calculation you should know how to do matrix multiplication. The matrix E has the dimension 5 by 4.
And the stoichiometric matrix has a dimension of 4 by 1. That is why you can multiply these 2 matrices, and you get a 5 by 1 matrix, with one entry per element. In this 5 by 1 matrix not all the entries are 0: the entry for hydrogen is minus 1, and since it is not 0, the reaction is not elementally balanced. A value of minus 1 means there is one extra hydrogen on the left hand side.
Since the left hand side has more hydrogen, I have added one proton on the right hand side to balance the reaction. So, if I add one more hydrogen on the product side, then this reaction is balanced. By taking the dot product E dot S you are able to find out whether the reactions are elementally balanced, and you can do this by running a small code that finds how many atoms of each element are on the left hand side and how many are on the right hand side.
And if there is any imbalance then you can add accordingly to balance the reaction. So, elemental balance is required when you form the metabolic network.
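The small code mentioned above might look like the following sketch. The molecular formulas are assumptions taken in their commonly tabulated charged forms (glucose C6H12O6, ATP C10H12N5O13P3, glucose 6 phosphate C6H11O9P, ADP C10H12N5O10P2); the slide's own numbers may differ, and the helper name `balance` is hypothetical.

```python
# Elemental matrix E: rows are elements, columns are metabolites
# in the order glucose, ATP, glucose 6 phosphate, ADP.
# The formulas are assumed values, not read from the lecture slide.
elements = ["C", "H", "O", "P", "N"]
E = [
    [6, 10, 6, 10],    # C
    [12, 12, 11, 12],  # H
    [6, 13, 9, 10],    # O
    [0, 3, 1, 2],      # P
    [0, 5, 0, 5],      # N
]

# Stoichiometric column for: glucose + ATP -> glucose 6 phosphate + ADP
S = [-1, -1, 1, 1]

def balance(elemental_matrix, stoich_column):
    """Return E . S: the net gain of each element across the reaction."""
    return [sum(e * s for e, s in zip(row, stoich_column))
            for row in elemental_matrix]

imbalance = balance(E, S)
print(dict(zip(elements, imbalance)))   # hydrogen is off by one

# Fix: add a proton (one H, nothing else) as a fifth metabolite on the product side.
E_with_h = [row + [1 if el == "H" else 0] for row, el in zip(E, elements)]
S_with_h = S + [1]
balanced = balance(E_with_h, S_with_h)
print(dict(zip(elements, balanced)))    # every element is now zero
```

A nonzero entry of E dot S points at the element that is out of balance, and its sign tells you on which side the extra atoms sit.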
(Refer Slide Time: 20:35)
Similarly, charge balance is also required, and for charge balance too you can construct the stoichiometric matrix. This reaction is superoxide dismutase, where superoxide combines with protons to give hydrogen peroxide plus oxygen. The stoichiometric matrix is very easy: it is just minus 2, minus 2, plus 1, plus 1. You can also make the elemental matrix for the charge: the superoxide has a charge of minus 1 and the proton has a charge of plus 1, while hydrogen peroxide and oxygen are 0 and 0.
So, you take the dot product E dot S and you see that it is 0, which means the reaction is charge balanced, so you do not have to do anything. In this way you can check the charge and the mass, and make sure all the reactions are mass balanced and charge balanced. Then you can apply the laws of physics, such as mass conservation, and many other techniques, provided you have elemental and charge balance. This way you make the network accurate, and it is also something you can publish.
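The same check for charge needs only one row. A minimal sketch, assuming the reaction is 2 superoxide + 2 protons giving hydrogen peroxide + oxygen; the metabolite identifiers are hypothetical labels.

```python
# Charge of each metabolite (one-row "elemental matrix" for charge).
charge = {"superoxide": -1, "proton": 1, "h2o2": 0, "o2": 0}

# Stoichiometric coefficients: reactants negative, products positive.
reaction = {"superoxide": -2, "proton": -2, "h2o2": 1, "o2": 1}

# E . S for the charge row: net charge gained across the reaction.
net_charge = sum(charge[m] * coeff for m, coeff in reaction.items())
print(net_charge)
```

A result of zero means the charges on the two sides cancel; any other value would show how much charge is missing on one side.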
(Refer Slide Time: 22:00)
Now, the stoichiometric matrix for a metabolic reaction network, from the genes to the reactions: now you have to add the genes. The reaction stoichiometry you have added, but on top of that we have the gene information. A very important part of a metabolic network is that it includes the gene information, so every reaction is mapped back to a gene. So, we have this stoichiometric matrix which has many reactions, v A, v BC, v D1, v D2.
Each column is a reaction, and each reaction is mapped back through another mapping table. The matrix itself may not hold the mapping, but we have a parallel file which takes care of it and says that reaction v A corresponds to enzyme A, which in turn corresponds to a gene. This way you have the mapping for each reaction.
So, this is a one to one mapping, one gene, one enzyme, one reaction, which is very simple: gene A gives rise to enzyme A, and enzyme A catalyzes reaction v A. But the biology sometimes becomes more complicated, in the sense that you can have 2 genes, one enzyme, one reaction. This is the case where there are 2 genes, gene B and gene C, which form a protein complex.
Gene B and gene C give rise to proteins which are the subunits of one enzyme, and the complex catalyzes reaction v BC. So, in this way 2 genes give rise to one enzyme and then one reaction. And there are cases where we have one gene and one enzyme, but it catalyzes 2 reactions. So, this is also a possibility, where you have one gene, one enzyme and 2 reactions.
So, you have to do this mapping based on the gene and enzyme annotation. This is also known as the GPR relationship; in the last class also I told that gene protein reaction (GPR) associations are required, so each and every reaction has to be mapped to a gene. That is how you go from the genes to the stoichiometric matrix, compiling all the reaction vectors so that each reaction is connected to some gene. This is what makes the network complex.
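One way to hold such a parallel mapping file in code is sketched below. The data structure, the gene names and the function `active_reactions` are all assumptions for illustration (in particular "geneD" for the enzyme behind v D1 and v D2), not the format of any specific tool.

```python
# GPR rules as a list of alternative gene sets per reaction.
# Each inner set is one enzyme: all of its genes are needed (AND).
# Several sets for one reaction would be alternative enzymes (OR).
gpr = {
    "vA":  [{"geneA"}],            # one gene, one enzyme, one reaction
    "vBC": [{"geneB", "geneC"}],   # two genes form one enzyme complex
    "vD1": [{"geneD"}],            # one gene, one enzyme, ...
    "vD2": [{"geneD"}],            # ... catalyzing two reactions
}

def active_reactions(gpr_rules, deleted_genes):
    """Return the reactions that still have at least one intact enzyme."""
    deleted = set(deleted_genes)
    return sorted(
        rxn for rxn, enzymes in gpr_rules.items()
        if any(not (genes & deleted) for genes in enzymes)
    )

print(active_reactions(gpr, []))          # nothing deleted
print(active_reactions(gpr, ["geneB"]))   # the complex for vBC is broken
print(active_reactions(gpr, ["geneD"]))   # both vD1 and vD2 are lost
```

Keeping the rules outside the matrix, as the lecture describes, means S stays purely numerical while gene deletions are translated into lost columns through this table.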
(Refer Slide Time: 14:17)
So, this problem, how to solve the problem, then how we can measure the flux? So, then FBA, flux
So, at this point you eliminate states by regulation and still you get a solution: a small solution space of allowed steady states, and among the allowed states you reach a point which is the optimal solution, where the biomass is maximum.
(Refer Slide Time: 27:26)
(Refer Slide Time: 32:24)
So, we will learn how to fix these thermodynamically infeasible cycles. In case 1 you can see a duplicate reaction: PGCD and PGCDr, where one is the forward reaction and the other is the reversible reaction. This kind of duplicate reaction exists in the network because as you keep on adding reactions you might end up with 2 reactions for the same conversion from metabolite A to B.
And how do you remove this infeasible loop? The better way is to remove the duplicate reaction. If you remove PGCD, you are not losing any information in the network, because you keep the reversible reaction, which goes both forward and reverse; you are just crossing out 1 component. So, you can fix the infeasible cycle just by removing the duplicate reaction.
Then I come to lumped reactions, which are reactions where many reactions are lumped together, for example citrate to isocitrate. You can either go from citrate to isocitrate directly through ACONT, or you can go through two reactions, ACONTa and then ACONTb. But if you keep both routes together, then there will be an infeasible cycle.
So, how do you remove this infeasible cycle? Just by removing ACONT, the lumped citrate to isocitrate reaction. If you remove ACONT you are not losing any information, the reaction network is still complete, and you have removed the lumped reaction. So, by removing the lumped reaction you can remove the infeasible cycle. There is also cofactor specificity, which exists in the network.
Sometimes you keep 2 reactions for the same metabolic conversion, where one is specific to NAD and the other is specific to NADP. If you keep both reactions in the network, then again it will create an infeasible cycle, and you can remove it by removing one of the reactions. So, you have to see which of these 2 cofactors the enzyme is actually more specific for; that one you keep, and you remove the other.
So, in this way you can, for example, remove the NAD specific reaction and keep the NADP specific one, but you have to see which cofactor the enzyme is actually highly specific for. So, these are the 3 major categories where you can remove thermodynamically infeasible cycles. But there exist many other infeasible cycles in the network, which you have to look at individually in order to remove them.
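The first category, duplicate reactions, is easy to screen for in the matrix itself, because duplicates show up as identical (or exactly opposite) columns of S. The sketch below assumes a toy two-metabolite network and hypothetical helper names; it does not detect lumped reactions or cofactor pairs.

```python
def column(S, j):
    """Extract column j of a matrix stored as a list of rows."""
    return [row[j] for row in S]

def find_duplicate_reactions(S, names):
    """Return pairs of reactions whose columns are equal or exact negatives."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            ci, cj = column(S, i), column(S, j)
            if ci == cj or ci == [-x for x in cj]:
                pairs.append((names[i], names[j]))
    return pairs

# Toy network (assumed): rows are metabolites A and B.
# uptake: -> A, PGCD: A -> B, PGCDr: A <-> B, export: B ->
names = ["uptake", "PGCD", "PGCDr", "export"]
S = [
    [1, -1, -1,  0],   # A
    [0,  1,  1, -1],   # B
]

print(find_duplicate_reactions(S, names))

# Drop the irreversible copy and keep the reversible one, as in the lecture.
keep = [j for j, n in enumerate(names) if n != "PGCD"]
S_fixed = [[row[j] for j in keep] for row in S]
names_fixed = [names[j] for j in keep]
print(find_duplicate_reactions(S_fixed, names_fixed))
```

The reversibility itself is not stored in S, so deciding which copy to drop still needs the reaction bounds or annotations.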
(Refer Slide Time: 35:34)
So, in another example I will show you how we can bridge network gaps. While you construct the metabolic network, you will see that many gaps are left in the network, and how do you remove the gaps? Here in the example you can see that xanthosine is consumed in this network but it is not produced. So, I want to add a reaction so that xanthosine is produced inside the cell.
To do that, I can look for a hypothetical protein through a homology search and try to add a reaction. I can add a reaction between these 2 points, or these 2 points, or these 2 points, and based on the homology search I look for a hypothetical protein which can catalyse that reaction.
So, I started with the search and I found that there are 3 enzymes. One is from GMP to guanosine; with this reaction you can bridge the gap, but I see that this gene is not present in the closely related species of the organism for which I am building the network. So, I cannot add this one. Then, if I add the guanine aminohydrolase, from guanine to xanthine, I see that it forms an infeasible loop.
So, with these red dotted lines you can see that it is forming an infeasible cycle, so this one is also not possible. What I am left with is the other one, xanthosine 5 phosphate phosphohydrolase. This is much more feasible. So, when you bridge the network, when you fill a gap, you have to decide logically which reaction is actually suitable for your network.
So, out of the 3 choices, I found that only this one is actually feasible: if you add this reaction, then xanthosine is produced inside the cell and you are able to bridge the gap. Network gaps can also be bridged based on literature data.
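A gap like this one can be spotted from the rows of S: a metabolite whose row has only negative entries is consumed but never produced. The sketch below assumes a toy three-metabolite network with every reaction written in its forward, irreversible direction; the identifiers and the simplified phosphohydrolase column (water and phosphate omitted) are illustrative only.

```python
def dead_end_metabolites(S, metabolites):
    """Return (only_consumed, only_produced), assuming irreversible reactions."""
    only_consumed, only_produced = [], []
    for name, row in zip(metabolites, S):
        produced = any(x > 0 for x in row)
        consumed = any(x < 0 for x in row)
        if consumed and not produced:
            only_consumed.append(name)
        if produced and not consumed:
            only_produced.append(name)
    return only_consumed, only_produced

# Toy network (assumed). Columns: supply of XMP, xanthosine -> xanthine,
# drain of xanthine.
metabolites = ["xmp", "xanthosine", "xanthine"]
S = [
    [1,  0,  0],   # xmp: produced, never consumed
    [0, -1,  0],   # xanthosine: consumed, never produced (the gap)
    [0,  1, -1],   # xanthine
]
before = dead_end_metabolites(S, metabolites)
print(before)

# Gap filling: add xanthosine 5 phosphate phosphohydrolase, XMP -> xanthosine.
new_column = [-1, 1, 0]
S_filled = [row + [x] for row, x in zip(S, new_column)]
after = dead_end_metabolites(S_filled, metabolites)
print(after)
```

The row scan only finds the gap; choosing among candidate reactions still needs the homology and feasibility reasoning described above.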
(Refer Slide Time: 38:09)
So, there are many objective functions in the metabolic network: you can minimize ATP production, or minimize the nutrient uptake rate, or maximize the production of a certain metabolite, or, as I already told, maximize the biomass, or even maximize both biomass and metabolite production. More detailed objective functions consider the thermodynamics and kinetics of the cell.
(Refer Slide Time: 38:35)
So, you have to understand whether the reaction you are maximizing is growth coupled or not. How do you check that? The best way is to maximize that reaction and look at the growth: if you maximize a certain reaction and you see that the growth is 0, then you can say that the reaction is not growth coupled.
And if you maximize a certain reaction and you see that growth automatically comes up, that is, you get a nonzero growth, then it is a growth coupled reaction. This way you can find which reactions are growth coupled and which are not. To simulate growth, the biomass and maintenance requirements have to be accounted for: there is the growth associated component that you add to the network, and there is a constant energy drain that needs to be satisfied even in the absence of growth.
(Refer Slide Time: 39:40)
So, these are the constraint based reconstruction and analysis methods, known as COBRA. There is also a COBRA toolbox available in MATLAB. These methods involve a lot of constraints: in each of the available optimization methods different constraints are applied, and you can see the solution space shrinking with an increasing number of constraints.
And they (())(40:13) use basically linear programming most of the time, and then mixed integer linear programming, and sometimes nonlinear programming and also quadratic programming. Various algorithms have been developed, which we will discuss in subsequent classes, for understanding gene deletions or regulatory constraints, and for flux variability analysis, OptKnock, flux coupling and sampling.
So, various methods are available in the COBRA toolbox, which you can apply to analyse the network and understand it better: you can infer the flows, predict new phenotypes, or apply different genetic perturbations to improve the production of certain metabolites.
(Refer Slide Time: 41:08)
So, in conclusion, flux balance analysis and optimality based methods for flux prediction are among the most popular modelling approaches for metabolic systems. Using FBA you will be able to predict fluxes. Flux balance analysis typically deals with an under-determined system, where the number of fluxes is more than the number of measured fluxes. So, n is greater than m, and the system cannot be solved uniquely using Gaussian elimination.
That is why you have to use a formalism like FBA: you maximize the biomass and you get the flux solution. The key precursors must be synthesised to ensure biomass production; the biomass components should be synthesized inside the cell, otherwise the cell will not grow. So, how do you determine the biomass components? It is through experiment.
So, you have to run an experiment to determine how much DNA, how much protein and how much RNA is present inside the cell, and that much must be produced inside the cell to make it grow. The reconstruction also enables integration of multi omics data: today we have lots of high throughput data that you can integrate into the metabolic network to understand its different properties and to make the network much more constrained, so that it is specific to the problem you want to address.
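The FBA formalism summarised above is usually written as a linear program. The statement below is a generic sketch, assuming S is the m by n stoichiometric matrix (m balance equations, n fluxes, n greater than m), c is a vector that picks out the biomass reaction, and the flux bounds are symbols introduced here for illustration.

```latex
\[
\begin{aligned}
\max_{\mathbf{v}} \quad & \mathbf{c}^{\mathsf{T}} \mathbf{v} \\
\text{subject to} \quad & S\,\mathbf{v} = \mathbf{0}, \\
& \mathbf{v}^{\min} \le \mathbf{v} \le \mathbf{v}^{\max}
\end{aligned}
\]
```

Because n is greater than m, the steady state condition alone leaves many feasible flux vectors; the objective and the bounds are what select one optimal solution out of that space.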
(Refer Slide Time: 42:39)
In the references, you can see that there is a protocol for generating a high quality genome scale metabolic reconstruction. This one you can read; it is very useful, and from it you will know what steps are required for a genome scale metabolic reconstruction. The flux balance analysis which you learned today you can read about in more detail in the Nature Biotechnology paper. The constraint based tools you can also read about in more detail in Nature Reviews Microbiology, which gives details of the different methodologies used to constrain the network.
You can go through all the references and get a better idea about flux balance analysis and how we can use it to address different metabolic engineering problems. So, I close here, thank you.