Module 1: Proteomics

Proteins and Proteomics

Welcome to the MOOC NPTEL course on bioengineering: an interface with biology and medicine. In the last lecture we discussed some basic concepts of amino acids and proteins. Today, we are going to talk about the fascinating world of proteins and proteome technologies.

Let us first start with the genome sequencing projects. The human quest to understand and analyse all the genes present in humans and various other organisms really materialized during the period from the 1990s to 2002. Many genome sequencing projects, covering bacterial genomes, yeast (Saccharomyces cerevisiae), the fruit fly (Drosophila melanogaster), Arabidopsis thaliana, mouse and human, progressed during that time frame. In 2001, the first draft of the human genome was reported in the journals Nature and Science. Molecular medicine is progressing beyond classical genetics to genomics and proteomics, and the success of the human genome project gave rise to a new field, proteomics, which aims to study all the human proteins.

Let us first look at what we obtained from the genome sequencing projects. If you look at the screen, the fruit fly, Drosophila melanogaster, has around 14,000 genes. Roundworm, C.
elegans has around 19,500 genes. Arabidopsis thaliana, or thale cress, has around 27,000 genes. And human has around 20,500 genes. So if you look at biological complexity from the fruit fly to the roundworm to the plant as well as the human, the gene numbers do not scale in proportion to biological complexity. It was really unexplained how 20,000 genes govern such a complex physiology as that of the human, so the new field of proteomics immediately started getting attention: it is probably not the genes alone but the proteins, formed through the processes of transcription and translation, that govern many of the functions, and these are much more important for us to learn about.

Therefore we went back to the central dogma concept, where the quest was to unravel the genome and then further the proteome. Just to remind you about the central dogma: from the genes, after transcription and alternative splicing, the transcripts are formed, and then in the process of translation, proteins are formed. We call all the genes of a system the genome, all the transcripts of a system the transcriptome, and all the proteins of a system the proteome. So while we have around 20,000 genes, the transcript number could be as high as 100,000 and the proteome number could be as high as 1 million; we have literally no idea exactly how many proteins are present, because even after protein synthesis many modifications happen, which are known as post-translational modifications. While the genome provides a static snapshot of the total information encoded in a cell, the biological processes are very dynamic and complex, and are probably better explained by transcriptomics as well as proteomics.

So, this figure illustrates different circles and different levels of complexity. For example, the inner circle shows all the human genes, which can be studied under genomics. Now, from the same gene, different types of functional RNA molecules can be formed in the
process of alternative splicing, and that blue sphere one can study under transcriptomics. Each transcript can then give rise to protein forms, which can be further modified in the process of post-translational modification. The yellow circle shows the complex world of the proteome. So the proteome is the set of all the proteins expressed by a genome, and the field of proteomics aims to study the proteins and their properties to provide an integrated view of all the cellular processes, whether it is protein modification, localization or protein-protein interaction.

So, proteomics broadly aims to study the extent of protein expression; how co- and post-translational modifications occur; how enzymatic regulation happens, whether it is activation or inactivation; how intermolecular interactions happen, especially protein-protein interactions; where the proteins are localized; and what the structure and function of these proteins are. So, broadly, proteomics has very ambitious goals, and much more complex objectives to achieve than genomics.

To conceptualise the comparison of genome versus proteome, let us go back to the basics and look at the pre-mRNA structure. There are different exons and introns, and after the process of alternative splicing the same pre-mRNA can give rise to 2 different types of mature mRNA, as you can see: mature mRNA-A and mature mRNA-B.
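
The idea that one pre-mRNA can yield two mature mRNAs can be sketched in a few lines of Python. This is only an illustration, not a model of the real splicing machinery: the nucleotide sequences are made up, and the choice of which exons each isoform keeps (1, 2, 4 for A and 1, 3, 4 for B) follows the slide example for a hypothetical four-exon gene.

```python
# Hypothetical pre-mRNA: four exons separated by three introns.
# All sequences are invented for illustration only.
PRE_MRNA = [
    ("exon1", "AUGGCU"),
    ("intron1", "GUAAGUAG"),
    ("exon2", "GAACCU"),
    ("intron2", "GUGAGCAG"),
    ("exon3", "UUCGGA"),
    ("intron3", "GUAUGUAG"),
    ("exon4", "CAUUAA"),
]


def splice(pre_mrna, keep_exons):
    """Join the selected exons in their original order.

    Introns are never in keep_exons, so they are always removed.
    """
    return "".join(seq for name, seq in pre_mrna if name in keep_exons)


# Two alternative splicing choices applied to the same pre-mRNA.
mrna_a = splice(PRE_MRNA, {"exon1", "exon2", "exon4"})
mrna_b = splice(PRE_MRNA, {"exon1", "exon3", "exon4"})

print("mature mRNA-A:", mrna_a)
print("mature mRNA-B:", mrna_b)
```

The two strings share the same first and last exon but differ in the middle, which is why translation of the two transcripts would give two different protein products from a single gene.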
Mature mRNA-A has exons 1, 2 and 4. Mature mRNA-B has exons 1, 3 and 4. So alternative splicing is a process by which the exons, or coding sequences, of the pre-mRNA produced by the transcription of a gene are combined in different ways during RNA splicing. The resulting mature mRNAs can give rise to different protein products: for example, mature mRNA-A will give rise to protein A, whereas mature mRNA-B will give rise to protein B. Therefore, a single gene can give rise to multiple protein products.

If we continue further on the complexity of proteins, they can be further modified in the process of post-translational modification. Here we can see that if phosphate groups are attached, the process is known as phosphorylation; if sugar moieties like glucose are attached, it is glycosylation; if methyl groups are attached, it is protein methylation; and if hydroxyl groups are attached, it is known as hydroxylation. Many proteins undergo post-translational modification at some of their amino acid residues after synthesis has already happened. Some examples are shown on the screen, but many other types of PTMs occur, which further add to the complexity of the proteins. These PTMs are also highly relevant for various physiological processes; for example, many signal transduction cascades are governed by phosphorylation.

So, 2001 to 2003 was the time frame when the human genome projects were getting completed, but considering the complexity of the proteome and of all the proteins present in the human, the process of identifying the proteins took longer. Only in 2014 and 2015 were the first drafts of the human proteome map reported. Even then, the aim was only to report the products of all the protein-coding genes. It means we are not talking about millions of proteins but only the proteins directly coded by the genes: can we at least detect those 20,500 proteins in the human body? And of course, we
need experimental evidence for detecting those peptides and proteins. So, mass spectrometry based approaches, as well as microscopy and tissue based maps of the human proteome, were reported during this time frame of 2014 and 2015. However, these are still drafts, because they reported almost 17,500 proteins, so we have not yet accomplished understanding and knowing about all the proteins present in the human or in any other given system. Nevertheless, the advancements in proteomic technologies have now started showing promise, and there is hope that some of the missing proteins, which we have missed so far, will be screened and discovered.

So, let us review the comparison of genomics and proteomics in the following animations.

Several genome sequencing projects that aim to elucidate the complete genome sequence of organisms have been undertaken by research groups all over the world. The DNA sequences are identified by the shotgun sequencing technique and then aligned using suitable software to provide the complete genome sequence. The genome sequences of a large number of prokaryotic and eukaryotic organisms have been successfully deduced.

The immense amount of information held by the human genome motivated researchers to understand the nature and content of genetic material in great detail. A collaborative effort between 6 countries and 20 laboratories was undertaken in 1990 to produce a draft of the human genome sequence. Work proceeded rapidly, with a draft covering most of the genome being completed by 2000 and greater coverage being achieved by 2003. Before sequencing the entire genome, physical maps of the chromosomes were made; this helped in providing key tools for identification of disease genes and anchoring points in the genomic sequence. Pilot projects were then launched to create a draft of the genome sequence. Doubts regarding the potential of the available sequencing techniques were overcome in this phase. When it was established that they
could sequence the genome efficiently, the shotgun approach was a fundamental technique used for large-scale sequencing of the human genome, which also makes use of Sanger sequencing. The collaborative effort to sequence the entire genome was challenged in 1998 by a privately funded organization, which aimed to reach the target before the publicly funded group. Progress in sequencing was very rapid, and by 2001 a draft of the sequence was ready, covering around 83% of the genome.

Several hurdles were encountered during sequencing of the human genome, the largest of them being the presence of long stretches of repeating sequences. These repeat regions made it difficult to assemble the genome accurately, and the improved drafts published in 2003 covered 92% of the genome, with a large part of the remaining 8% being due to the repeat sequences. Nevertheless, these genome sequencing studies successfully provided many findings about the human genome.

Genomic DNA consisting of introns and exons gets transcribed as such into its pre-mRNA. Specific recognition sequences within the intron recruit the spliceosome assembly, which cleaves the intron out of the pre-mRNA. The resulting sequence, consisting of only the coding exons, is known as mature mRNA and is ready for translation into the corresponding protein. A gene in genomic DNA is often made up of several coding exons interspersed with non-coding introns. Alternative splicing, a common phenomenon observed in eukaryotes, allows the exons to be reconnected in multiple ways by several different
mechanisms. The diversity of proteins encoded by a genome is greatly increased by alternative splicing, as each mature mRNA formed gives rise to a different protein product upon translation.

Let us now talk about the history of proteomics. Is this field very new? As I mentioned, the new field of proteomics really got attention after genomics; however, the development of proteomics actually spans a very long time scale. If you think about the various developments over the period, even in the 1970s, 2-dimensional electrophoresis, which separates proteins based on their molecular weight and isoelectric point, was in use by 1975. Simultaneously, different advancements were happening in the field of mass spectrometry, and various types of nucleotide sequencing, ESTs and genome-scale approaches were under development during the 1980s and 1990s. Then, over the period, advancements in mass spectrometry, especially the soft ionization techniques like ESI and MALDI, started giving a much better understanding of proteins and a good capability for performing large-scale protein analysis. All these developments were happening in parallel, along with the genome sequencing projects, which were giving rise to a lot of data.

Now people started realizing the need for developing new algorithms, new databases and new ways of searching genes and proteins. Simultaneously, different types of chip-based approaches, different types of microarray chips and various genetic approaches were all under development. But only after completion of the genomics projects did scientists start realizing the need for studying proteins comprehensively under the field of proteomics. Marc Wilkins coined the term proteome in 1995, and then, after completion of the genome sequencing projects, the whole field of proteomics really came into the limelight, and now, from 2003 onwards,
a lot of new developments are happening in the area of proteomics.

For any kind of complex proteome analysis, there are certain major steps involved, whether we talk about human, bacteria, plant or any other kind of sample. The first thing in studying proteins is that you have to extract them out of that complex system: you want to rupture all the membranes, get all the cellular contents out and extract only the proteins, whereas nucleic acids, lipids, carbohydrates etc. you want to get rid of. Then you want to separate the proteins. The separation could be based on chromatography or on electrophoretic techniques; there are many different ways of separating proteins. Then you want to identify those proteins, and different mass spectrometry based platforms like MALDI-TOF/TOF, ESI-Q-TOF or ESI-Orbitrap can be very helpful both for protein identification and, if you want to compare control versus disease conditions, for protein quantification. In this process, bioinformatics, with its different algorithms and software, becomes very crucial, because without it you cannot do proper identification or quantification. Then, if you are interested biologically in a given protein, doing structural and functional studies under protein characterization can be very important. And different types of technologies, including various structural studies, CD, NMR etc.
and microarray technologies, are all helpful for protein characterization.

So, looking at these various technological approaches to understanding the proteome, one could broadly classify the whole field as follows. Gel-based proteomics looks into protein analysis using gels like SDS-PAGE, 2D gels or DIGE gels. Gel-free proteomics essentially looks into direct protein analysis using shotgun approaches, especially mass spectrometry based approaches. Functional proteomics aims to understand the function of a given protein, and different functional technologies, including microarrays, surface plasmon resonance etc., have been very helpful for functional proteomic analysis. Then comes targeted proteomics, where you have identified given peptides or protein sequences and now want to look only at those targets and ignore all the remaining millions of peptides. How to selectively analyse those peptides from a complex sample is studied under the new field of targeted proteomics.

So, very briefly, let us glance through the different proteomic technologies which are currently being used. There are gel-based platforms, as I mentioned, which include 2D gels and difference gel electrophoresis. There are different types of protein microarray platforms, including protein arrays and antibody arrays. There are gel-free approaches, which are essentially mass spectrometry based and which also include quantitative proteomics using iTRAQ, SILAC, TMT etc. So there are many technologies currently under development which are advancing the field of proteomics. In the subsequent lectures we are going to talk about various protein technologies in much more detail; however, let me give you a glimpse of the different proteomic technologies which are currently being used for many
biological applications.

For example, in gel-based proteomics, as I mentioned, you can separate proteins based on their molecular weight in SDS-PAGE, or based on their isoelectric point as well as molecular weight in 2-dimensional gel electrophoresis. Or, if your intention is to quantitate the proteins from control and test conditions, you can use a newer technology, difference gel electrophoresis or DIGE.

All right, so now let us talk about mass spectrometry. Mass spectrometers have increasingly become the method of choice for analysis of complex protein samples in proteomic studies because of their ability to identify as well as quantify thousands of proteins. Broadly, any mass spectrometer has an ionization source, a mass analyser and a detector. There are different types of ionization sources; the popular ones are electrospray ionization (ESI) and matrix-assisted laser desorption ionization (MALDI). There are also different types of mass analysers; some of the popular configurations include time of flight (TOF), quadrupoles, Orbitraps, ion traps etc.

This slide shows the major steps involved in mass spectrometry based proteomics. You have a sample inlet, typically liquid chromatography. The peptides come into the mass spectrometer at the ionization source, and then enter the mass analyser, where you want to resolve these ions based on their mass-to-charge ratio. They are then detected at the detector, and the data can be analysed using databases. We can also look at their identification as well as relative abundance or quantitation. Let us now watch this animation to get an overview of mass spectrometry based proteomics.

The ionization source is responsible for converting analyte molecules into gas-phase ions in vacuum. This has been made possible by the development of soft ionization techniques like matrix-assisted laser desorption ionization and electrospray ionization, which ensure that
the non-volatile protein sample is ionized without completely fragmenting it. In MALDI, the analyte of interest is mixed with an aromatic matrix and bombarded with short pulses of laser light. The laser energy is transferred to the analyte molecules, which undergo rapid sublimation into gas-phase ions. In ESI, the sample is present in liquid form, and ions are created by spraying a dilute solution of the analyte at atmospheric pressure from the tip of a fine metal capillary, creating a mist of droplets. These ions are then accelerated towards the mass analyser depending upon their mass and charge.

The mass analyser resolves the ions produced by the ionization source on the basis of their mass-to-charge ratios. Various characteristics such as resolving power, accuracy, mass range and speed determine the efficiency of these analysers. Commonly used mass analysers include time of flight (TOF), quadrupole (Q) and ion trap. Here, we will focus on TOF and quadrupole mass analysers.

The time-of-flight analyser accelerates charged ions generated by the ionization source along a long tube known as a flight tube. Ions are accelerated to different velocities depending on their mass-to-charge ratios; ions of lower mass are accelerated to higher velocities and reach the detector first. The TOF analyser is most commonly used with a MALDI ionization source, since MALDI tends to produce singly charged peptide ions. The time of flight under such circumstances is proportional to the square root of the molecular mass of the ion.

Quadrupole mass analysers use oscillating electrical fields to selectively stabilize or destabilize the paths of ions passing through a radio frequency (RF) quadrupole field. The quadrupole mass analyser can be operated in either the radio frequency mode or the scanning mode. In the RF mode, ions of all m/z are allowed to pass through and are then detected by the detector. In the scanning mode, the quadrupole analyser selects ions of a specific m/z value set by the user. A range can also be entered, in
which case only those specific ions satisfying the criteria will move towards the detector and the rest are filtered out.

Some of the most commonly used MS configurations are MALDI with TOF or ion trap, and ESI with Q, TOF and ion trap. 2 mass analysers can also be connected in series such that the first one separates intact ions while the second one separates the fragment ions. This helps in providing better resolution and allows identification of proteins through peptide fingerprinting. Hybrid TOF analysers such as Q-TOF make it possible to carry out high-throughput analysis.

So, we have broadly discussed 2 major technology streams: one is gel-based proteomics, and the second is gel-free or mass spectrometry based proteomics. Now let us briefly look into interactomics, where the intention is to look at protein-protein interactions or various biomolecular interactions, and even some functional characterization of proteins whose functions you want to identify.

For example, suppose you have identified various candidate proteins or biomarkers from a discovery set-up, using either a 2D gel based approach, iTRAQ or various other quantitative mass spectrometry based methods, and you want to study these proteins of interest further. To do that you can use a heterologous system, where you clone the gene, express that particular protein in a bacterial system and purify the protein. You can then study it on a functional array surface, or microarray, to see which other proteins it interacts with, or you can start doing various modifications and structural characterizations to know this protein in much more detail.

Let us watch this animation to learn how protein microarrays can be used for studying protein-protein interactions.

Interaction studies of proteins with various biomolecules help in deciphering and understanding the functions of various
proteins in the complex network of cellular pathways. Proteins interact with other biomolecules such as nucleic acids, lipids, hormones etc. to execute a multitude of functions in living organisms, such as signal transduction, growth and regulation, and metabolism, to mention a few. Protein interactions with other biomolecules can be of several different types: they may be weak or strong, obligate or non-obligate, transient or permanent. The physical basis for these interactions includes electrostatic, hydrophobic and steric interactions, hydrogen bonds etc.

Protein microarrays are widely used for protein interaction studies. One of the proteins to be analysed is printed onto a microarray surface, usually made of glass. These proteins, known as bait proteins, get immobilized onto the array surface, which is functionalized with reagents like nickel or aldehyde compounds that interact with groups present in the protein. The bait protein is then probed for interactions with suitably labelled query or prey proteins. Once any unbound proteins are washed off the array surface, the protein interactions are detected by means of an array scanner. These protein microarrays are extremely useful in studying interactions with other proteins as well as with small molecules, DNA or RNA.

Another major area is label-free detection, which mainly uses surface plasmon resonance (SPR) or SPR imaging technologies. Again, take the protein of interest which you want to further characterize. Let us say an antibody raised against that protein is immobilized on the chip surface, and you want to study the antigen-antibody interaction. You can flow the protein over the chip and see whether this antibody is binding the protein of interest or not. If you want to look at a protein-drug interaction, then you can immobilize the protein of interest on the chip surface, flow the drug molecules over it and see whether the drug
or inhibitors are forming any complex with this protein of interest or not. So, again, various types of label-free biosensors are currently being used for further characterizing these proteins of interest, and they can provide not only the binding information but also kinetic information and the KD values that describe how these binding events are governed.

So, if you look at any biological application, and more specifically at clinical applications, the various proteomic technologies which we are going to discuss in much more detail in the subsequent lectures can all fit in, in various ways, to address different types of questions. For example, say you are studying some clinical application: you are looking for disease biomarkers in the human context, and you are looking at tissue biopsies. You extract the proteins from them; you can also take blood samples and extract the proteins from them, and then analyse them either by mass spectrometry or on gel-based platforms. You have now identified certain characteristic proteins whose pattern looks very interesting in diseased individuals as compared to healthy individuals. After looking at these proteins of interest, you see that many of them belong to a given network, so now you want to look at their interactome or interaction networks, and you can use microarray platforms. So, many of these technologies which we are talking about can be used together to understand a given system in much more detail and much more comprehensively.

Especially, if you think about the clinical context, these technologies can be used to monitor therapeutic responses as well as for early disease detection. Many of these applications are not possible without the advent and advancement of these proteomic technologies;
however, while I am mainly discussing proteomic technologies, I would like to highlight here that there are different spheres of omics: all the gene information as part of the genome, the transcripts in the transcriptome, all the proteins in the proteome, all the metabolites in the metabolome and the phenotypic behaviour in the phenome. Biological complexity is governed by all of these spheres; studying only proteins or only genes will not give us the full picture of what is happening in a physiological context. Therefore the new field of proteogenomics, where the intention is to correlate genome and proteome information, has become much more powerful.

So, studying all these different biomolecules in a systems, network or cell biology context is becoming much more powerful. While the genome provides a static snapshot of the total information encoded in the cell, the biological processes are very dynamic and are governed by proteins, transcripts and metabolites, so studying all of them together, comprehensively, can be much more powerful for knowing what exactly is happening in a given system.

Currently a lot of data is being generated. Big data is coming from various DNA sequencing platforms: the next-generation sequencing technologies are advancing really fast and are now able to give us the entire whole-genome sequence as well as RNA expression analysis, whereas different mass spectrometry based technologies are able to provide very big data sets for the proteome as well as the metabolites, or metabolome. So, systems approaches are really required to analyse and interpret these large data sets, because if you really want to comprehensively investigate these biological systems, we need to analyse the data together. This will help us to find the molecular mechanisms and
therapies for diseases; it could also relate the molecular phenotypes to the relevant clinical characteristics.

So, in conclusion, today we tried to see the journey from genome to proteome, with a brief comparison of the key differences between the genome and the proteome. We have also seen that there are many technologies for studying the proteome, starting from the gel-based platforms to the different advances in mass spectrometry platforms, different protein microarrays, label-free biosensors, as well as the emerging field of targeted proteomics. Unraveling the structural and functional details of proteins at the proteome level is a very daunting task. Proteomics has quickly evolved to become an integral aspect of human biology and medicine; however, there is still a need to integrate omics at different levels, whether we talk about the genome, transcriptome, proteome, metabolome or phenome, so that we can make a real impact in physiology and medicine.

I will stop here, and in the subsequent lectures we are going to talk to you about some of these proteomic technologies in much more detail, with certain lab sessions. Thank you very much.
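
Supplementary sketch: the time-of-flight principle from the mass spectrometry part of this lecture can be made concrete with a short calculation. It assumes an idealised linear TOF analyser in which an ion of mass m and charge z is accelerated through a voltage V, so that zeV = (1/2)mv^2, and then drifts along a field-free tube of length L. The tube length (1 m) and accelerating voltage (20 kV) are illustrative assumptions, not values from the lecture, and the function name is hypothetical.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton


def flight_time(mass_da, charge=1, tube_length_m=1.0, accel_voltage_v=20000.0):
    """Drift time in seconds for an ion in an idealised linear TOF tube.

    Energy gained in the source: z * e * V = 0.5 * m * v**2,
    so v = sqrt(2 * z * e * V / m) and t = L / v.
    """
    mass_kg = mass_da * DALTON
    velocity = math.sqrt(2 * charge * ELEMENTARY_CHARGE * accel_voltage_v / mass_kg)
    return tube_length_m / velocity


# Singly charged peptide ions of increasing mass (values chosen for illustration).
for mass in (1000, 2000, 4000):
    print(f"{mass} Da -> {flight_time(mass) * 1e6:.2f} microseconds")
```

Under these assumptions a four-fold increase in mass doubles the flight time, which is the square-root dependence on mass for singly charged ions: lighter ions arrive at the detector first.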