L07v01a complete export [00:00:00.00]

[00:00:01.08] SPEAKER: Hi there. In today's series of videos, we're going to talk about gene-specific transcription. In the previous class, we talked about general features of transcription and translation: what, in general, has to happen for genes to get turned on or off. But it's not enough to be able to turn genes on or off. You have to turn on selected genes when you want them, and keep others off when you need them to be off.

[00:00:27.82] So this is about a higher level of control. And the very basic switch in that control is a DNA binding protein binding to a specific DNA sequence to turn on a gene that's right in that locality. Call this a lock and a key. The pieces have to fit perfectly. They're made to match each other. And that's how a gene can be activated.

[00:00:56.12] We'll look at some of the molecular details, in terms of how proteins contact DNA, which I describe as reading the code of the DNA in the major and minor grooves to recognize that this is the gene that I'm supposed to turn on. And then lastly, in this video we'll talk extremely briefly about some different classes of DNA binding proteins, or transcription factors.

[00:01:22.38] And this slide sort of highlights the point. Let's start with the carrot. Here we have a single cell with one nucleus, with one copy of the genome, and yet all the information is there so that, over time, it develops from a young embryo, to a young plant, and finally to a mature carrot plant. This is a timed orchestration of turning genes on and off so they produce the proper proteins, the proper molecules, at the proper time. They build the proper structures and, in the end, allow the life cycle to continue. And they have produced a functioning adult organism.

[00:02:07.51] The frog and the cow examples focus in on the cloning techniques, and we'll discuss those later in the course.
But it's the same question: how are the various instructions that are contained in the DNA deployed in a time-dependent manner to produce the final result of an adult organism?

[00:02:31.52] So here we see a slide again. Six levels where genes can be controlled and their functions modulated so that you can have, in the end, an active protein doing the job that you want it to do. We'll talk mostly today about this very first step, transcriptional control. Because of its primacy, it's one of the best studied of these features. And it is, in essence, the question of which of your 20,000 to 25,000 genes we are going to turn on at this particular time to perform the next steps that we need to do in order to respond to the environment or carry out the plan that is inherent in our genome.

[00:03:17.97] Once you've made the RNA, there are lots of other ways to influence this process of getting to the active protein: processing the RNA, transporting the RNA, binding to it, localizing it, making it free and available, or sequestering it from being used. There is control over how much protein is made from each mRNA, how actively or quickly the mRNA is degraded, and how the folded protein's activity is controlled. Because an inactive versus an active protein can produce completely different results. And by the second part of this course, we will have discussed all of these in pretty decent detail.

[00:04:10.55] OK, transcriptional control. Two basic components, a lock and a key. The biological equivalents are a short stretch of defined DNA sequence, which is the gatekeeper for a gene, and a gene regulatory protein that recognizes and binds to that sequence. This is the key that's sort of opening the lock, or turning the gene on.

[00:04:42.51] So how will the proteins read the sequence that is present? We know, and this is a picture from a previous slide, how DNA recognizes itself. A and T participate in hydrogen bonding. And T recognizes an A by two hydrogen bonds.
And C recognizes G by three hydrogen bonds. But the protein is not able to recognize these bases based on this base pairing. Those groups are in the center of the DNA double helix, and they are occupied with each other. They are not available.

[00:05:21.12] What is available to the proteins is the edges of the bases that are exposed in the major and minor grooves. And although it's poorly drawn, this is what the protein sees, looking at the edge. This is what's in the major groove. Same here. Down here, this side, in the way it's drawn in this book (which is a poor representation of the actual geometry), is what's seen in the minor groove.

[00:05:56.14] So let's look here. Let's look at a particular slice, right here, end-on at a base pair. You can see the atoms color coded: carbon, hydrogen, nitrogen, and oxygen, with the carbons in the darker blue and the nitrogens in the lighter blue. And this pattern, which we will schematize on the next slide, is how a DNA binding protein recognizes whether there is a G, a C, an A, or a T at that location. And it has the ability to query several base pairs at a time. You'll see, because of the geometry of the double helix, because it is twisting, that it's going to be hard for a single protein, for instance, to recognize 20 consecutive bases.

[00:06:58.64] Because of the approach, it might be able to recognize quite a few. But it might recognize five here, and then five down here, just because that is the face of the DNA that is exposed to the protein. And here we make that clearer. This is drawn in better perspective. And it's color coded in a very convenient way. So let's imagine the protein is approaching this GC base pair from the major groove. It'll see a hydrogen, nitrogen, oxygen, a hydrogen, a hydrogen, and a hydrogen.

[00:07:39.09] Also note that, if we go left to right, a GC base pair will look different to it than a CG base pair. So the protein can distinguish between these two binding situations.
If we compare the GC to the AT, we can see, for instance, that if the protein is looking at a G, it's going to see an oxygen here, where for an A it's seeing a hydrogen in sort of the corresponding position. And then there's this methyl group, a hydrophobic group, on the T base, which is not present on C. So this is the way that proteins can recognize these differences in DNA sequence without trying to access the hydrogen bonding, which we think of as synonymous with defining the G base.

[00:08:33.66] And now, we've completely schematized the view. The helix of the DNA is running up and down. And we're looking into the major groove from the side. And we'll see, for a GC base pair, hydrogen bond acceptor, acceptor, donor, and hydrogen atom. And you can see that the patterns for the other base pairs are all unique.

[00:09:02.96] This is the DNA. This is the ones and zeros, essentially, of the DNA code. Of course, that analogy works on different levels. The information content of DNA is two bits per base pair. And so a single one or zero is not enough to distinguish among the four bases, just like a single nucleotide can't code for a single amino acid. Two binary digits, each a one or a zero, could code for the four different bases.

[00:09:36.87] Just building in some molecular details, in terms of how a DNA binding protein will access or read the information: it does that by positioning a collection of amino acids in three-dimensional positions such that they can interact properly with the sequence, here and with several stacks of bases above or below this one. In this case, the amino acid asparagine is making two hydrogen bond interactions with the base, from a single amino acid.

[00:10:08.64] Now there is a great deal of interest in learning the code: learning which amino acids are contacting which bases, so you can predict which sequence is going to respond to a particular DNA binding protein. And that's a super challenging problem, and we haven't gotten there.
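
As a small aside on the groove-reading idea above, here is a minimal sketch of the schematized view as a lookup table. The GC pattern (acceptor, acceptor, donor, hydrogen) is the one read out in the lecture. The other major-groove patterns and all of the minor-groove patterns are the standard textbook ones and are an assumption about what the slide shows. The names `MAJOR_GROOVE`, `MINOR_GROOVE`, `distinguishable`, `bits_per_base`, and `read_major_groove` are made up for this illustration.

```python
import math

# Pattern seen across each groove, read from the first-named base to its partner.
# A = hydrogen bond acceptor, D = hydrogen bond donor,
# H = nonpolar hydrogen, M = methyl group.
# "GC" is the pattern stated in the lecture; the rest are standard textbook
# values, assumed here rather than taken from the transcript.
MAJOR_GROOVE = {"GC": "AADH", "CG": "HDAA", "AT": "ADAM", "TA": "MADA"}
MINOR_GROOVE = {"GC": "ADA", "CG": "ADA", "AT": "AHA", "TA": "AHA"}

PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def distinguishable(patterns):
    """True if every base pair shows a different pattern in this groove."""
    return len(set(patterns.values())) == len(patterns)


def bits_per_base(alphabet_size=4):
    """Bits needed to name one symbol out of alphabet_size equally likely ones."""
    return math.log2(alphabet_size)


def read_major_groove(top_strand):
    """Major-groove pattern for each base pair along one strand."""
    return [MAJOR_GROOVE[base + PARTNER[base]] for base in top_strand.upper()]


print(distinguishable(MAJOR_GROOVE))  # all four base pairs look different
print(distinguishable(MINOR_GROOVE))  # GC vs CG and AT vs TA look the same
print(bits_per_base())                # two bits for four possible bases
print(read_major_groove("GAT"))
```

Under these assumptions, the major groove gives four distinct patterns, while the minor groove cannot tell GC from CG or AT from TA. That is one way to see why the major groove is where most of the sequence reading happens.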
And in my mind, it's an open question whether we'll get there or not.

[00:10:36.41] So in this slide, we start to look at different classes of DNA binding proteins. There are, I believe (I'll have to check some of these numbers), about 1,000 proteins which are known to bind to DNA. And a lot of them are general factors, like histone proteins. Of the transcription factors that bind DNA in a sequence-specific manner, with the express purpose of regulating gene expression, I think the human genome encodes about 400 different proteins.

[00:11:11.90] But of course, humans build complexity and diversity by forming heterodimers of two different proteins, or by alternative splicing, which could possibly influence the base pairs which are recognized. Anyway, there are about five major classes. I thought it was worth highlighting three of them. One of the very common ones is called the helix-turn-helix motif. And the larger of these two helices lies right in the major groove, allowing many amino acids to potentially interact with the exposed edges of the DNA bases.

[00:11:57.18] These proteins do not regularly act as monomers. Here, a monomer is only making contact with, at most, about six bases. But a dimer of these, either a homodimer or a heterodimer, could interact with about 12 base pairs and start to give you reasonable specificity in terms of binding the sequence that you're interested in.

[00:12:22.66] Another class of proteins that bind DNA specifically are called zinc fingers. Here you see a picture of a zinc finger protein with three fingers, if you will, three portions of alpha helix binding in the major groove over about 180 degrees, or more, actually, of the major groove of the DNA. You see a schematic. These are three relatively independent helices contacting approximately six bases still, even though it's spread out over nine or so. And they're called zinc fingers because these three balls are atoms of zinc.
They are coordinated by cysteine and histidine amino acids. And their positions are relatively conserved.

[00:13:18.23] This class of proteins is of extreme interest to biotechnologists because you have the ability to engineer these three different fingers relatively independently. That is, if I make a change here in trying to alter which base it recognizes, to a first approximation, I'm not going to greatly affect which bases these other two fingers recognize.

[00:13:49.60] And this gives engineers the ability to start designing circuits where specific genes that you want to be turned on only at a certain time could be controlled by creating the protein that you want and a unique sequence, which exists only where you want it in the genome, and so really creating and controlling that interaction. And here we see a detailed view of the zinc finger: the zinc atom, the two cysteines, and the two histidine residues. Overall, there are about a hundred of these proteins in the human genome, in this class.

[00:14:33.31] The last class that we want to introduce ourselves to is the leucine zipper proteins. And you see these two long alpha helices are grabbing the DNA, kind of like chopsticks. And here, the alpha helices will fit into two portions of the major groove. And these helices interact by hydrophobic faces. In this case, it's right here on the side. Now this would be rotating around to the back. And then it would be coming over to this side. Hydrophobic faces between these two helices.

[00:15:10.79] Leucine is one of the smaller hydrophobic amino acids, although there are smaller ones. And leucines predominate on the interface between these two alpha helices. That's true whether these two helices form a homodimer, from two copies of the same subunit, or a heterodimer, meaning that the two helices belong to two separate proteins.
This class of proteins sort of leapt into scientific consciousness when a couple of the very important cancer-promoting genes, jun and fos, were recognized to encode leucine zippers.

[00:15:53.49] So now we've talked a little bit about how proteins can recognize DNA. And in the next video, we will see why the proteins sometimes will bind and sometimes won't bind. So we're getting further along in our quest to control DNA in a time-specific manner. Thanks.
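
Returning to the point made earlier about a monomer contacting about six bases and a dimer contacting about 12, here is a rough back-of-the-envelope sketch of why the longer site gives more specificity. It assumes a random genome with all four bases equally likely and a genome size of about 3.2 billion base pairs. Neither assumption comes from the lecture, and the function name `expected_chance_sites` is made up for this illustration.

```python
# Approximate haploid human genome size in base pairs (assumption of this sketch).
HUMAN_GENOME_BP = 3.2e9


def expected_chance_sites(site_length, genome_bp=HUMAN_GENOME_BP, both_strands=True):
    """Expected number of exact matches to one fixed sequence by chance alone.

    Assumes every position is independent and each base has probability 1/4,
    which a real genome does not strictly satisfy.
    """
    p_match = 0.25 ** site_length            # chance that one window matches
    windows = genome_bp - site_length + 1    # possible start positions
    strands = 2 if both_strands else 1       # a site can sit on either strand
    return strands * windows * p_match


for length in (6, 12):
    print(length, round(expected_chance_sites(length)))
```

Under these assumptions, a six-base site turns up by chance on the order of a million times in a genome that size, while a 12-base site turns up only a few hundred times. That is the sense in which a dimer starts to give reasonable specificity.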