Transcription Factors

Today, we will focus on transcription factors, and then we will move on to modifications to
DNA itself and how they influence differential gene expression and, therefore, development.
Transcription factors usually contain roughly three different domains. This does not mean that
all of them have all three all the time, but by and large most of them have a DNA-binding
domain, the part of the protein whose amino acids interact directly with the DNA double helix;
a trans-activating domain, where other proteins or factors bind and modulate the activity of
that particular transcription factor; and a third, protein-protein interaction domain. Some
transcription factors, for example, act as dimers, so the two polypeptide chains interact in
that region. Sometimes other proteins interact there and influence their activity. The
trans-activating domains are the ones responsible for actually activating, or not activating,
RNA pol II.

(Refer Slide Time: 01:41)

So a defect in one of these transcription factors can cause disease. An example is MITF. This
transcription factor is expressed in the ear, the skin, and the pigment-forming cells of the
eye, such as those of the iris. If you have a mutation in it, you will have a hearing problem,
multicolored irises, and a white forelock. As you see in this picture, the mother has a white
forelock, and her daughter has inherited it. So particular problems arise when a specific
transcription factor is missing; their activity is essential.
(Refer Slide Time: 02:36)

Going back to those three domains: in this model, the protein dimerizes through the middle
portion, which is the protein-protein interaction domain. The long carboxy-terminal region is
the one that helps in recruiting other proteins, for example a histone deacetylase, and the
amino-terminal region is where the DNA-binding domain is located.

So there are different types of DNA binding domains, and based on that, the transcription factors
are classified into several classes. Within those classes, small variations in the sequence might
define which promoter they bind and do not bind.
(Refer Slide Time: 03:33)

To give you an idea, let us look at some of them in the table from the book. The homeodomain
is a particular conserved DNA-binding domain that is present in the proteins listed here; we
will see the Hox proteins in detail several lectures from now. Some factors have a
helix-loop-helix (HLH) domain; it is present in these transcription factors, and their
functions are listed in the rightmost column. Then there is the leucine zipper, which forms a
zipper-like structure based on the leucines present in it; usually every seventh amino acid is
a leucine. Then you have the zinc finger motifs, which historically were discovered much
earlier than the others. Here a coordinated zinc helps in interacting with the DNA, and the
motif is present in proteins such as Krüppel and Engrailed. These were all discovered
initially in Drosophila, the names are based on the mutant phenotypes, and they are expressed
in these tissues. The nuclear hormone receptors, which include the steroid hormone receptors,
also have zinc fingers. Then Sry-Sox is another domain. So these are the classes, based on
variations in the structure of the DNA-binding domain.
(Refer Slide Time: 05:06)

So how do these transcription factors function? How do they activate, inactivate, or otherwise
control transcription? Often it is by one or both of two ways. First, they recruit
histone-modifying enzymes. For example, when a transcription factor binds to a particular
sequence, it may recruit a histone acetyltransferase, or it may recruit an enzyme that removes
inhibitory methyl groups from histones. By doing that, they displace the nucleosomes, the DNA
opens up, and it becomes more accessible to RNA pol II and other transcription factors. So
primarily by altering the modifications on histones, they open up the chromatin, which allows
transcription. Second, they stabilize RNA pol II. RNA pol II is bound to the core
transcription factors, as shown in this cartoon, but that complex is often not very stable.
When transcription factors bound to an enhancer interact with all these proteins, they make a
more stable initiation complex. They stabilize RNA pol II on the promoter, increasing the
probability that it will initiate and go on to the elongation phase. In this structure, you
see that the enhancers can be at a great distance, but through protein-protein interactions
the DNA can loop like this.

This explains how an enhancer can be present within the coding sequence, in an intron, or even
in the downstream sequence. There are variations for each transcription factor, but generally
speaking these are the primary ways by which transcription factors help in controlling the
rate of transcription.
(Refer Slide Time: 07:16)

So how powerful are these transcription factors? The digestive enzyme-producing part of the
pancreas is made of exocrine cells. They produce the digestive enzymes, the proteolytic
enzymes and so on, and they do not produce hormones such as insulin or glucagon. Here in the
image, the blue shows the DNA in the nucleus. Now you express three different transcription
factors in these cells. The first is Pdx1, which is expressed in the pancreatic lineage,
starting from cells that are initially required for intestinal tube formation. Among those
cells, the ones that express Pdx1 establish the pancreatic lineage, and within that lineage,
the cells that also have the two transcription factors Ngn3 and Mafa become the endocrine
cells of the pancreas.

Now here you have taken exocrine cells. This is in the organism, not in an in vitro cell
culture. When you express these three transcription factors in them, you get insulin-producing
cells. The insulin is stained here in red, and one of the transcription factors is fused to
GFP, so you see green; wherever both are present, you get yellow. So these factors are so
powerful that they can change the fate of a cell from exocrine to endocrine.
(Refer Slide Time: 09:17)

Of course, more dramatic things have now been done; people have shown that by expressing a
few transcription factors, a differentiated cell can be converted into an undifferentiated
pluripotent cell. This leads to a question: how do transcription factors themselves get
expressed in a tissue-specific manner?

The answer is quite simple, like a story from when I was a kid. I knew a person who was
several years older than me; when I was in elementary school, he was in college. He talked
about some game that he played, and I asked who taught it to him. He said his PT master. Then
who taught the PT master? His own PT master. I kept on asking, and I never got the real
answer, because the real answer is whoever first discovered it. Similarly, why is this
transcription factor expressed in endocrine cells? Because another transcription factor
activated it. Why is that one active only in the pancreatic lineage? Because another one
activated it in the endoderm lineage. That leads to what are called transcription factor
cascades; they work in cascades. For example, Mbx activates Pax6, and Pax6 activates the genes
for crystallin, insulin, glucagon, somatostatin, etc.

Similarly, MyoD, a muscle-specific and really powerful transcription factor, activates
myogenin, which activates other genes involved in skeletal muscle differentiation. And so on
and so forth, one after the other. So the central concept here is that there is a cascade.
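
To make the cascade idea concrete, here is a minimal sketch in Python. It only encodes the two examples mentioned above as a lookup table and walks down from a factor to everything it activates, directly or indirectly. The table `CASCADE` and the function `downstream_targets` are illustrative names, not a real gene network or an existing library.

```python
from collections import deque

# Activation edges taken from the two examples in the lecture.
# This table is an illustration only, not a complete regulatory network.
CASCADE = {
    "Mbx": ["Pax6"],
    "Pax6": ["crystallin", "insulin", "glucagon", "somatostatin"],
    "MyoD": ["myogenin"],
    "myogenin": ["skeletal muscle differentiation genes"],
}


def downstream_targets(factor, cascade=CASCADE):
    """Return every gene reachable from `factor`, in breadth-first order."""
    seen = []
    queue = deque(cascade.get(factor, []))
    while queue:
        gene = queue.popleft()
        if gene in seen:
            continue
        seen.append(gene)
        # A target may itself be a transcription factor with its own targets.
        queue.extend(cascade.get(gene, []))
    return seen


print(downstream_targets("Mbx"))
# ['Pax6', 'crystallin', 'insulin', 'glucagon', 'somatostatin']
```

Asking for the targets of a gene at the bottom of the cascade, such as insulin, returns an empty list, which is the "end of the story" in the PT master analogy.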
(Refer Slide Time: 11:17)

If you follow the cascade up to the top, you reach what are called pioneer transcription
factors. These transcription factors can open up highly condensed heterochromatin and initiate
transcription; the chromatin need not already be poised for access by proteins. A good example
is Pbx, which can go and bind to sequences in highly condensed, repressed chromatin.

So that is the definition of a pioneer transcription factor. It probably binds to inhibitors
bound to that repressed chromatin. Once this transcription factor binds, it can recruit other
transcription factors, for example MyoD, which comes with accessory factors that help in
finally activating transcription and opening up the region, so that Mef3, Mef2, etc. can go
and bind to their respective enhancers and initiate transcription.

So these are the pioneer transcription factors. On top of that, you have proteins like the
Drosophila Polycomb and Trithorax group proteins. These proteins bind to the histone
modifications and maintain a memory of the original activation. Memory here means that once a
particular cell fate is specified, and that cell goes on dividing within the individual
organism during its development, all the descendants of that cell know which regions they
have to keep active and which regions suppressed. That is done by the Polycomb and Trithorax
group proteins. So this is all about trans-acting factors controlling transcription.

Alongside enhancers, there is an opposite phenomenon: other DNA sequences act as negative
enhancers, meaning that the sequence prevents the spread of an activating activity. For
example, if an enhancer activates a gene by disassembling nucleosomes, and that effect spreads
along the length of the chromosome, then the adjacent genes might also get activated. You do
not want that; you want that particular gene to be expressed in that tissue, not all genes.
So something has to restrict the activation, and for that you have DNA sequences to which
proteins bind, which insulate or restrict these enhancer activities. That is what we are going
to see next, and they are often called silencers.
(Refer Slide Time: 14:22)

So silencers are the opposite of enhancers. Here is one example, an element called the neural
restrictive silencer element. It binds a protein that is expressed in all tissues except
neurons. As a result, in all those tissues this sequence is bound by the protein, and the
genes that are under the influence of this silencer are not expressed.

Therefore, the genes downstream of those promoters are expressed only in neurons, and that is
why it is called a neural restrictive silencer. Here in the image is a reporter in which,
instead of the actual gene, you have LacZ, because you can assay the activity of the
LacZ-encoded protein. When you have this silencer sequence adjacent to LacZ, you find that the
reporter is expressed only in the central nervous system, here in the 11.5-day-old mouse
embryo. If you do not have the silencer element, it is expressed everywhere. So these elements
do the opposite of enhancers: they restrict the influence. Otherwise, the enhancer effect
would not be specific and restricted to the genes that need to be activated; it would spread,
the control would not be tight, and adjacent genes might be partially activated.
(Refer Slide Time: 16:10)

Next, we go to modifications that happen to the DNA itself. Earlier we saw methylation and
other modifications on the histone proteins, and how they affect the chromatin architecture:
whether it is tightly coiled, with histone H1 bringing the nucleosomes together into a
solenoid structure, or whether it is opened up by methylation or acetylation. We also saw that
some methylation of the H3 tail can be activating. So do not be misled into automatically
assuming that methylation means inactivation and acetylation means activation. Acetylation is
activation, but that generalization does not hold for histone methylation. Now we are going to
look at methylation of the DNA.

As mentioned earlier, to perpetuate an active or repressed state, we have the Trithorax and
Polycomb proteins that bind to the modified histones. For example, if a region is acetylated
and you want it to stay active, the Trithorax proteins bind there and maintain the active
state. Very similar, but more robust, are the modifications that happen to the DNA itself, by
methylation of cytosine residues. A CH3 group is added at the fifth position of the cytosine
ring, giving 5-methylcytosine. This matters a lot in regulation. Here methylation usually
means a repressed state, an inactive gene that is not going to be transcribed, and this state
can be perpetuated through mitotic cell divisions. We will see how that happens in a couple of
slides. Second, there can be a developmental time factor involved.

The modification happens at particular places and times, not all the time. A good example is
the hemoglobin genes. In the adult, the globin that is expressed is β-globin. In the early
embryo, the ε version of the globin gene is expressed; its promoter is not methylated, whereas
γ-globin, which is usually expressed in the fetus, is methylated and so not expressed. As the
embryo progresses, the γ-globin gene, which was dormant, gets demethylated and turned on,
while the ε-globin gene gets turned off. When the infant starts to grow, the γ-globin gene
gets methylated and inactivated, and the β-globin gene gets activated; that is what is
expressed in our body.

So our genome has the ε and γ sequences, but they are methylated and not expressed. They were
expressed sequentially during your embryonic and childhood development. Now you have only
β-globin, and there are consequences if there is a problem with this regulation. You may have
heard of the disease thalassemia, which results from a failure in this sequential methylation
and demethylation. In these patients, there is a problem activating β-globin. Let us say you
have a mutation in β-globin, so no functional β-globin protein is produced. Although perfectly
good copies of the other globin genes are present in the chromosome, unfortunately they are
methylated, so they are not expressed. When the β-globin gene is involved, that is
β-thalassemia.

This is a very well characterized congenital disease in India, particularly in some pockets
bordering Andhra Pradesh and Tamil Nadu, in certain communities where there is marriage among
close relatives: first-cousin marriage, or sometimes an uncle marrying a niece, that is, a
brother marrying his elder sister's daughter. Those marriages may be rare now, but a couple of
generations ago they were not unusual in those families. For example, the sister may be
heterozygous and her brother may also be heterozygous, because they come from the same
parents, and they survived because they are heterozygous. Now, when both parents are
heterozygous, there is a one-fourth chance that their child will be homozygous for the mutant
allele. That is how you have β-thalassemia running in families, and the underlying cause is
this methylation issue.
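
The one-fourth figure can be checked with a small Punnett square. The following Python sketch assumes both parents are heterozygous carriers, written "Aa", where "A" stands for the wild-type allele and "a" for the mutant allele; these symbols and the function name `offspring_genotypes` are illustrative choices, not part of the lecture.

```python
from fractions import Fraction
from itertools import product


def offspring_genotypes(parent1, parent2):
    """Return {genotype: probability} for one gene.

    Each parent is a two-letter string such as "Aa"; every allele is assumed
    to be passed on with equal probability.
    """
    counts = {}
    for a, b in product(parent1, parent2):
        # 'A' sorts before 'a', so a carrier is always written "Aa".
        genotype = "".join(sorted(a + b))
        counts[genotype] = counts.get(genotype, 0) + 1
    total = sum(counts.values())
    return {g: Fraction(n, total) for g, n in counts.items()}


cross = offspring_genotypes("Aa", "Aa")
for genotype, probability in sorted(cross.items()):
    print(genotype, probability)
# AA 1/4
# Aa 1/2
# aa 1/4
```

The "aa" row is the child who is homozygous for the mutant allele; half of the children are carriers like their parents.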

(Refer Slide Time: 22:06)

So now, how do you perpetuate this? Usually, this methylation blocks transcription by
preventing transcription factors from binding to the enhancer. Sometimes inhibitors are also
involved; they bind to the unmethylated sequence and not to the methylated one.

In the sequence in this particular cartoon, you have C and G next to each other. Such regions
are often called CpG islands, and their significance will become clear in a couple of slides.
For now, just take it that this promoter region is subject to methylation and demethylation.
When it is not methylated, the transcription factor binds and activates transcription from
the downstream promoter; when it is methylated, the transcription factor does not bind, and as
a result the gene is not active. So this example shows that DNA methylation blocks
transcription factor binding to an enhancer.
(Refer Slide Time: 23:21)

Another way in which methylation functions is that the methylated cytosine may recruit a
protein, in this case MeCP2, which can do two things: first, it removes the acetylation marks
by recruiting a histone deacetylase, and second, it recruits a histone methyltransferase,
which marks the histones with inhibitory methyl groups. Through these two actions, methylated
promoters end up blocking transcription.
(Refer Slide Time: 23:59)

This sort of methylation-based transcriptional repression can be perpetuated through mitosis,
because these cytosines are always adjacent to a guanine residue. That is the CpG; the p
stands for the phosphate in between, and it probably helps in pronouncing it better, otherwise
I would say CG. People normally talk about CpG repeats; CpG islands means that here and there
in the chromosome you have a lot of repeats of CpG. These are recognized by a
methyltransferase called Dnmt3, which does not need either of the two Cs that you see here to
be methylated already. CG means the opposite strand will be GC, so due to base complementarity
you have a C in both strands. Here neither C is methylated, and Dnmt3 can still recognize such
sequences; that is why it is called a de novo methyltransferase. It can methylate without
prior information.

Then you have a perpetuating methyltransferase. Remember, the methyl group is not erased
during mitosis; it remains there. After replication, one strand has the methylated cytosine
and the newly made strand does not. Dnmt1 recognizes such methylated cytosines and methylates
the nearest C in the opposite strand. That is why the adjacent G is crucial. So now both
strands are methylated. When the DNA replicates again, one strand of each daughter is again
methylated and the other is not, and Dnmt1 again methylates the new strand. This is how the
repressed state is maintained through cell divisions. So at some point during embryonic
development, inactivation by methylation takes place. Let us say the transcription factor
cascade and chromosome modifications ended up methylating the DNA; now all the cells
descending from that original cell will maintain that active or inactive state.
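
The division of labor between the de novo and the maintenance enzyme can be sketched as a toy simulation in Python. The model is a deliberate simplification and an assumption of this example: a DNA duplex is a list of CpG sites, each site is a pair of flags (top strand methylated, bottom strand methylated), and the function names are made up for illustration.

```python
def dnmt3_de_novo(duplex, sites):
    """De novo step: methylate both strands at the chosen sites, with no prior mark needed."""
    return [(True, True) if i in sites else pair for i, pair in enumerate(duplex)]


def replicate(duplex):
    """Semiconservative replication: each daughter keeps one old strand,
    and the newly made strand starts out unmethylated."""
    daughter_a = [(top, False) for top, _bottom in duplex]
    daughter_b = [(False, bottom) for _top, bottom in duplex]
    return [daughter_a, daughter_b]


def dnmt1_maintenance(duplex):
    """Maintenance step: where one strand is methylated, methylate the C opposite."""
    return [(top or bottom, top or bottom) for top, bottom in duplex]


def methylated_sites(duplex):
    """Indices of sites that are methylated on both strands."""
    return {i for i, (top, bottom) in enumerate(duplex) if top and bottom}


def divide(cells, rounds, maintain=True):
    """Run several rounds of division, with or without the maintenance enzyme."""
    for _ in range(rounds):
        daughters = []
        for cell in cells:
            for daughter in replicate(cell):
                daughters.append(dnmt1_maintenance(daughter) if maintain else daughter)
        cells = daughters
    return cells


founder = dnmt3_de_novo([(False, False)] * 4, sites={1, 2})
with_dnmt1 = divide([founder], rounds=3)
without_dnmt1 = divide([founder], rounds=3, maintain=False)

print(all(methylated_sites(cell) == {1, 2} for cell in with_dnmt1))   # True
print(sum(1 for cell in without_dnmt1 if methylated_sites(cell)))     # 0
```

With maintenance, all eight descendants carry the founder's pattern. Without it, no descendant keeps a fully methylated site, because every replication leaves one strand unmarked.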
(Refer Slide Time: 26:30)

This has significant consequences in many situations; here we will look at dosage
compensation. What is dosage compensation? In mammals like humans, females have two X
chromosomes and males have only one. While the Y chromosome does not have many essential
genes, the X chromosome has a lot of important genes. So will females produce double the
amount of those proteins compared to males, and will that not cause a problem in terms of the
phenotype? That has to be taken care of, and it happens by one of three different mechanisms.
In C. elegans, for example, the output of both X chromosomes in the XX animal is reduced by
half, so the final quantity is like that of one X chromosome, the same as in males.

In Drosophila, the output of the single male X chromosome is doubled. Its chromatin is
modified such that it is truly euchromatin, and its output is more efficient. In humans we do
the opposite: one of the two X chromosomes in the female is converted into heterochromatin and
repressed. In this human-derived cell, the arrow points to a large dark region, which is the
condensed, inactive X chromosome. And this one is from a person with three X chromosomes, and
therefore you see two dark bodies. These are called Barr bodies.
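
The three strategies give the same result by different arithmetic, which a short Python sketch can show. Output is counted in arbitrary units where one fully active X chromosome produces 1.0; the strategy labels and function names are illustrative, and the model ignores everything except the number of X chromosomes.

```python
def x_linked_output(n_x, strategy):
    """Relative X-linked output for an animal with `n_x` X chromosomes (1 or 2)."""
    if strategy == "mammal":
        # One X stays active; any additional X is condensed and silenced.
        return 1.0 if n_x >= 1 else 0.0
    if strategy == "drosophila":
        # The single male X is transcribed at twice the normal rate.
        return 2.0 if n_x == 1 else float(n_x)
    if strategy == "c_elegans":
        # Each X in the XX animal runs at half the normal rate.
        return 0.5 * n_x if n_x == 2 else float(n_x)
    raise ValueError(f"unknown strategy: {strategy}")


def barr_bodies(n_x):
    """Number of condensed, inactive X chromosomes in a mammalian cell."""
    return max(n_x - 1, 0)


for strategy in ("mammal", "drosophila", "c_elegans"):
    print(strategy, x_linked_output(1, strategy), x_linked_output(2, strategy))
# mammal 1.0 1.0
# drosophila 2.0 2.0
# c_elegans 1.0 1.0

print(barr_bodies(3))  # 2
```

In each case the animal with one X and the animal with two X chromosomes end up with the same output, and a cell with three X chromosomes shows two Barr bodies, as in the picture.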

So that is how the inactivation works. The important thing here is in panels B and C. In B you
have a very early embryo, in which the reporter LacZ is fused to a promoter on the paternal X
chromosome. LacZ is expressed if the paternal X chromosome is active; otherwise there is no
LacZ, and the blue color does not appear. So the pink cells are those where the paternal X
chromosome is not expressed. At this very early stage you see most of the cells having this
color; this is the inner cell mass, from which the entire embryo develops. When you go to the
later stage in C, these cells do not have LacZ expression. It turns out that in the mouse, the
trophoblast cells preferentially inactivate the X chromosome of paternal origin, but in other
regions both kinds are mixed.

There are three essential points to remember about this inactivation. First, it starts early
in the embryo, and whatever a cell decides is passed on: an entire tissue, or part of a
tissue, derived from that cell will have no expression from the inactive chromosome. If the
paternal X is inactivated, paternal expression is absent; if the maternal X is inactivated,
maternal expression is absent. Second, the X chromosome is inactivated randomly, either the
maternal or the paternal one. Third, once inactivated, it is irreversible. It remains so in
the descendants of that lineage, and because of that you can have patches of variation in the
somatic body. That is readily visible in organisms with patchy coat color, as seen in calico
cats. So these three points need to be kept in mind: it happens very early, it happens
randomly, and once it happens it is irreversible.
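
Those three points together are what produce the patches, and a toy Python simulation can show it. The numbers here (eight founder cells, three rounds of division) and the function names are arbitrary assumptions for illustration; the only rules encoded are that each founder picks at random and that descendants never change the choice.

```python
import random


def choose_inactive_x(n_founders, rng):
    """Each early cell independently silences its maternal or its paternal X."""
    return [rng.choice(["maternal", "paternal"]) for _ in range(n_founders)]


def grow_patches(founder_choices, divisions):
    """Every founder gives 2**divisions descendants that inherit its choice unchanged."""
    return [[choice] * (2 ** divisions) for choice in founder_choices]


rng = random.Random(0)  # fixed seed so the run is repeatable
founders = choose_inactive_x(8, rng)
patches = grow_patches(founders, divisions=3)

for patch in patches:
    # Within a patch, every cell has the same inactive X.
    print(patch[0], len(patch))
```

Each printed line is one patch of eight cells. Which X is silent differs from patch to patch but never within a patch, which is the pattern seen in a calico coat.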

All the descendants will have the inactivation. If that is the case, then suppose I have a
vital gene on the X chromosome. If the copy from one parent is inactivated, then having one
wild-type copy is not necessarily enough; it depends on which parent it came from. For certain
genes, the mother's copy is required, and similarly, for some genes, the father's copy is
essential.

When you draw a Punnett square, it normally does not matter whether an allele comes from the
mother or the father. But there are situations, like this X chromosome dosage compensation,
where that matters. We will see that in the next set of slides.
(Refer Slide Time: 32:36)

Existing methylation gets erased during gametogenesis, and new methylation takes place. This
does not happen to all genes. Specific genes are methylated depending on whether the gametes
are forming in a male or a female body. For example, some genes may be methylated only during
spermatogenesis, and some other genes only during oogenesis. Usually they are mutually
exclusive: the genes that are methylated during oogenesis are not methylated during
spermatogenesis, and vice versa. This is called genomic imprinting. So genomic imprinting
means sex-specific methylation and, as a consequence, sex-specific expression.

To explain further: if a particular gene is methylated during spermatogenesis, it will not be
expressed from the paternal allele in the offspring. If it is methylated during oogenesis,
that specific gene will not be expressed from the maternal allele. So if you have two alleles,
and assume both are wild-type, one allele will be inactive, and that takes care of the dosage
compensation. For example, if the maternal allele is inactive, it does not matter even if it
is wild-type; the paternal allele is required for the normal function of that gene. The same
logic holds in the opposite direction too.

The germline uses a rather different combination of the existing molecular biology rules in
taking care of its genome. You cannot readily mess with its genome, and the way it protects it
has its peculiarities. One of them is this erasing of all the methylation and then methylating
the genes anew. The new methylation is based on whether you are female or male. For example, a
particular sequence could have come from your mother; if you are a male, male-specific
methylation will be added to that gene in your germ cells, even though it initially came from
a maternal source, and vice versa. So the methylation is sex-specific, according to the sex of
the individual in whom the gamete is forming. As a result, for imprinted genes, meaning genes
with sex-specific methylation, you need to have both alleles, because one allele is going to
be inactivated and the other is going to be the only one available. If you have a mutation in
such a gene, you will have a sex-specific phenotypic outcome, depending on which parent the
mutation came from. That is what we are going to see next.
(Refer Slide Time: 36:16)

In this cartoon, in (A), Igf-2 is not transcribed because an insulator protein binds to the
Igf-2 locus and prevents its expression. This protein binds to unmethylated DNA. Remember that
several slides ago we saw a situation where an activator binds to unmethylated DNA and not to
methylated DNA, so there methylation inactivated transcription. Here it is the other way
round: Igf-2 is not produced when this sequence is unmethylated, as it is in the maternal
genome. The locus is methylated on the paternal chromosome, meaning it gets methylated during
spermatogenesis and not during oogenesis. On the maternal chromosome, the insulator protein
binds to this sequence (these are the proteins that bind to silencers) and insulates the
coding sequence from the effect of the enhancer. The other gene in the cartoon just happens to
share the same enhancer, so we need not worry about it here. On the sperm-derived chromosome,
a methyl group is present; as a result the insulator does not bind, the enhancer acts on
Igf-2, and it gets expressed. So having one wild-type copy is not enough; it matters which
parent it came from. If the father carried a mutant version, you will not have Igf-2 protein,
because although your maternal copy is wild-type, it will not be expressed. Due to that, you
are going to be deficient in this protein.
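
The logic of this example fits in a few lines of Python. This is only a sketch of the rule described above, with made-up function names: an allele gives Igf-2 only if it is functional and its insulator-binding sequence is methylated, and the imprint is assumed to be methylated on the copy from the sperm and unmethylated on the copy from the egg.

```python
def allele_expressed(functional, methylated):
    """An Igf-2 allele is expressed only if it is functional and the insulator
    site is methylated, so that the insulator protein cannot bind."""
    return functional and methylated


def offspring_makes_igf2(paternal_functional, maternal_functional):
    """Apply the imprint from the lecture: sperm copy methylated, egg copy not."""
    paternal = allele_expressed(paternal_functional, methylated=True)
    maternal = allele_expressed(maternal_functional, methylated=False)
    return paternal or maternal


print(offspring_makes_igf2(paternal_functional=True, maternal_functional=True))    # True
print(offspring_makes_igf2(paternal_functional=False, maternal_functional=True))   # False
print(offspring_makes_igf2(paternal_functional=True, maternal_functional=False))   # True
```

The second line is the case from the cartoon: a mutant paternal copy leaves the offspring without Igf-2 even though the maternal copy is wild-type, while a mutant maternal copy makes no difference.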
(Refer Slide Time: 38:25)

Such mutations are not hypothetical; they have real-life consequences. Take these two
syndromes, Angelman syndrome and Prader-Willi syndrome. They have two different phenotypes,
with features ranging from intellectual disability to seizures and so on. Though both have
defects, Angelman syndrome is more severe than Prader-Willi, and having both is lethal.

Look at the first case in this Punnett square for chromosome 15. If it comes from a wild-type
sperm and a wild-type egg, you are totally wild-type. Now let us say a particular region of
chromosome 15 is deleted in the copy from the male. You have a wild-type copy of chromosome 15
from the egg, but that does not help, because the required genes are inactivated on it by
maternal-specific imprinting; when you do not get them from the sperm, you get Prader-Willi
syndrome. It is because those particular genes are usually expressed from the paternal copy.

Now look at the opposite case, where a wild-type chromosome comes from the male but a deletion
comes from the female. A different set of genes is affected, because the methylation is
sex-specific. Here the corresponding alleles on the paternal copy are inactivated, so you need
them from the maternal copy, and they are not available because of the deletion. Because
different genes are affected, you get Angelman syndrome. This locus is shown in the next
cartoon.
(Refer Slide Time: 40:30)

This is a complex locus with several genes. Grey indicates inactivation, except for PWS in the
male copy; that is a typo or drawing error in the book. PWS is active in the paternal copy,
and it activates other genes there. But in the maternal copy, due to methylation, you have a
block in PWS, and because of that these genes are not expressed. In this particular case,
UBE3A is expressed only from the maternal copy, while these blue ones are expressed from the
paternal copy. When you need both sets of gene products, you need to have both the maternal
and the paternal alleles, and that is why you get those diseases. So this is genomic
imprinting, which is a consequence of sex-specific methylation.

The critical thing to remember is that existing methylation marks are erased during
gametogenesis. Right now, your cells carry the imprints: the paternal allele is marked as
paternal by paternal-specific methylation, and the maternal allele is marked as maternal by
maternal-specific methylation, or by the lack of it in either case. When both are there, it is
fine.

Now when germ cells enter gametogenesis, these marks are erased and new methylation happens.
If it is spermatogenesis, you methylate one specific set of loci; if it is oogenesis, you
imprint a different set of loci. These are mutually exclusive.
