
In this lecture, we move a little forward and talk about other kinds of artifacts in images that are practically useful, specifically blobs and corners. Before we go there, we left one question for you at the end of the last lecture. I hope you spent some time working out the solution yourself; if not, we will briefly discuss the outline now and let you work out the details on your own.

We talked about using Canny edges to get straight lines in images. How do you do this? You first compute Canny edges in your image, which means you compute the gradients in the x direction and the y direction. You can also compute your thetas, the angles of the edges, given by the tan inverse of the gradient with respect to y divided by the gradient with respect to x.

As the next step, we assign each edge pixel to one of eight directions. Eight is just a number we chose; you could choose anything else. For each of these eight directions, call one of them d, we obtain what are known as edgelets. Edgelets are connected components of edge pixels with directions in d − 1, d and d + 1. So out of the eight direction bins you assigned your edge directions to, you take one of them and connect edge pixels whose orientations fall in d − 1, d or d + 1; this gives you a set of edgelets.

Once you compute your edgelets, we then compute two quantities for each edgelet: its straightness and its orientation. The way we go about doing this is using the eigenvectors and eigenvalues of the second moment matrix of the edgelet's pixels. So you take all of the pixels that belong to a particular edgelet and compute a matrix of the form

M = [ Σ(x − μx)², Σ(x − μx)(y − μy); Σ(x − μx)(y − μy), Σ(y − μy)² ]

If you look at the first entry, it involves x − μx, where μx is the mean x coordinate of all the pixels in the edgelet and each x is the x coordinate of one of those edge pixels. So you are simply computing, in some sense, a variance of the edge pixels along the x direction with respect to their mean. Similarly, the second diagonal entry is the variance of those edge pixels along the y direction with respect to their mean, and the off-diagonal entries are the covariance between the x and y directions.

Once you have this matrix, similar to how principal component analysis works, we take the eigenvectors and eigenvalues of this second moment matrix. Remember, the eigenvector corresponding to the largest eigenvalue gives you the direction of maximum variance among these pixels, and that is exactly what we are looking for. We finally say that the orientation of the edgelet is given by the tan inverse of v₁ over v₀, where v₁ is the larger eigenvector (by larger we mean the eigenvector corresponding to the larger eigenvalue) and v₀ is the other eigenvector. Remember that M is a 2 × 2 matrix, which means you have at most two eigenvalues. The resulting angle gives you the direction of the overall edgelet.
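To make that second-moment-matrix step concrete, here is a minimal NumPy sketch. It is my own illustration rather than code from the lecture: the function name is made up, the edgelet is assumed to arrive as an N × 2 array of (x, y) pixel coordinates, and the orientation is taken from the components of the dominant eigenvector.

```python
import numpy as np

def edgelet_orientation_straightness(pixels):
    """pixels: (N, 2) array of (x, y) coordinates of one edgelet's edge pixels."""
    xy = np.asarray(pixels, dtype=float)
    d = xy - xy.mean(axis=0)                      # subtract the mean (mu_x, mu_y)
    # 2 x 2 second moment matrix of the edgelet pixels
    M = np.array([[np.sum(d[:, 0] ** 2),      np.sum(d[:, 0] * d[:, 1])],
                  [np.sum(d[:, 0] * d[:, 1]), np.sum(d[:, 1] ** 2)]])
    evals, evecs = np.linalg.eigh(M)              # eigh returns eigenvalues in ascending order
    lam1, lam2 = evals[1], evals[0]               # lam1 = largest, lam2 = second largest
    v = evecs[:, 1]                               # eigenvector of the largest eigenvalue
    orientation = np.arctan2(v[1], v[0]) % np.pi  # direction of maximum variance, modulo 180 degrees
    straightness = lam2 / lam1 if lam1 > 0 else 0.0
    return orientation, straightness

# Pixels lying exactly along a line give a straightness ratio of (almost) 0
line = np.stack([np.arange(20.0), 0.5 * np.arange(20.0)], axis=1)
print(edgelet_orientation_straightness(line))     # roughly (0.46, 0.0): slope atan(0.5), ratio ~ 0
```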
In case you are finding this difficult to follow, please go back and read about principal component analysis, and you will be able to connect it to this particular idea. We then define straightness to be λ₂ / λ₁, where λ₂ is the second largest eigenvalue and λ₁ is the largest eigenvalue. This ratio takes its highest value when λ₂ and λ₁ are close to equal, and it is close to zero when the pixels spread along a single direction, which is exactly the straight-line case. Once you have this quantity, you simply threshold the straightness appropriately and store the line segments. It really does not matter whether you measure straightness as λ₂ / λ₁ or λ₁ / λ₂; you just have to construct your threshold appropriately. If you invert the straightness ratio, you only have to threshold it at the appropriate value and then obtain your line segments.

Here is a visual example. After applying Canny, you get all the edges on the left. You do get lots of edges which are fairly well connected and not too noisy, but not all of them correspond to straight lines; many of them correspond to edges that represent, say, the texture of the floor and so on. Let's say there is an application where we really do not want those. You then use this kind of approach to obtain just the straight lines from your Canny output. Do think more about this; knowing more about PCA will help you understand it a bit better.

Moving forward, as we said, the focus is going to be on going from edges to newer artifacts, namely blobs and corners. Remember, last lecture we talked about the Laplacian of Gaussian, and we said that the zero crossings of the Laplacian of Gaussian could give you a measure of the edges in an image; that was another way you could obtain edges. We are going to talk about it slightly differently now. Just to recap: this is your derivative of Gaussian, and this is your Laplacian of Gaussian, where the Laplacian is defined as ∇²f = ∂²f/∂x² + ∂²f/∂y².

Now, a discrete Laplacian kernel is going to look somewhat like this: a −4 in the centre, 1, 1, 1, 1 at its four nearest neighbours, and 0, 0, 0, 0 at its next level of neighbours, the diagonals. Can you try to guess why this is the relevant kernel? One point to note: the kernel could equally have been 0, 0, 0, 0, −1, −1, −1, −1 and 4, with the signs flipped. If you visualise this kernel as a surface, at the centre there is a peak, at its immediate nearest neighbours there is a value that dips below zero, and there are zeros everywhere else. That shape is very, very similar to the shape of the Laplacian of Gaussian we saw on the earlier slide. But that is a geometric, conceptual perspective. Why did we say 4? Why not 8, why not any other number? Once again, it goes back to approximating the gradients in some manner, and we can work this out. Remember, we are computing second derivatives here. ∂²f/∂x² can be approximated from first principles as f(x + 1, y) + f(x − 1, y) − 2f(x, y), and similarly you can write out ∂²f/∂y². Now, if you put both of these into your Laplacian equation, you get

∇²f = f(x + 1, y) + f(x − 1, y) + f(x, y + 1) + f(x, y − 1) − 4 f(x, y)

This should straightaway ring a bell as to why you got that kind of kernel: for f(x, y) the coefficient is −4, and for its neighbours x + 1, x − 1 and similarly y + 1, y − 1 the coefficient is 1. So the kernel simply came from taking an approximation of the gradient.
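As a quick sanity check of that finite-difference derivation, the short sketch below (my own illustration, not code from the lecture) builds the 3 × 3 Laplacian kernel directly from the two one-dimensional second-difference stencils and recovers the −4 centre.

```python
import numpy as np

# Second derivative along x from first principles: f(x+1, y) + f(x-1, y) - 2 f(x, y)
d2_dx2 = np.array([[0,  0, 0],
                   [1, -2, 1],
                   [0,  0, 0]])
# Second derivative along y is the same stencil, transposed
d2_dy2 = d2_dx2.T

laplacian = d2_dx2 + d2_dy2      # grad^2 f = d2f/dx2 + d2f/dy2
print(laplacian)
# [[ 0  1  0]
#  [ 1 -4  1]
#  [ 0  1  0]]   -4 at the centre, 1 at the four nearest neighbours, 0 at the diagonals
```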
Remember, you could have other kinds of approximations of the gradient with respect to the local neighbourhood; you could consider larger windows, and so on. If you do that, obviously the definition of the kernel would have to change appropriately.

Here is another visual illustration. Here is the original image, here is the result of simply taking the Laplacian, and here is the result if you take the Laplacian of Gaussian. Remember again that the Gaussian gives you the smoothing effect, which smooths out the noise, and the Laplacian then picks out the edges of your original image. Once again, remember that we are just taking a filter and convolving the image with it, moving it around to every point in the image and recording the response.

Now let's ask a question; this is something you partially saw last time. What else can the Laplacian of Gaussian do? Do you have ideas? The other thing the Laplacian of Gaussian can do is detect blobs. Why? Remember that a Laplacian of Gaussian filter looks somewhat like this. You could write the Laplacian one way or the other, with the signs flipped; it really does not matter, because at the end of the day, for edge detection and for detecting other artifacts, we only take the absolute value of the response. Whether the output is negative or positive, we just take the absolute value. The sign would change if you had a white circle on a black background or a black circle on a white background, and we ideally do not care: both of them are blobs to us, and we want to recognise blobs in both cases.

So you could write your Laplacian of Gaussian one way, where the central peak goes up and the surrounding part dips below, or the other way around. This is similar to writing the discrete Laplacian with a −4 in the centre and 1s around it, or a 4 in the centre and −1s around it. Now, if you try to visualise this as an image (a 3 × 3 Laplacian of Gaussian may be too small, so say you take a larger neighbourhood, 7 × 7 or 11 × 11) you would find that the Laplacian of Gaussian filter itself, which is also a matrix and can therefore be visualised as an image, would look something like this: a black blob in the middle, a white ring around it where you have the peak, and gray everywhere else. You could also have the inverse of this, with white in the middle, a black ring, and gray all over; both of these are equivalent for us. And just by looking at the filter, you can say that it is likely to detect blobs in some sense: convolving with this filter can be viewed as comparing a little picture of what you want to find against all local regions in the image. There is a slight nuance here, though. This does look a bit similar to template matching, for which you would use cross-correlation, whereas in convolution you double-flip the filter and then search for it across the image. But when your filter is symmetric, it really does not matter; cross-correlation and convolution will be looking for the same pattern in the image.

So once you have a Laplacian of Gaussian like this, you can, for example, count sunflowers in a field, or detect red blood cells in a blood test; any structure that appears as a blob in an image can be picked up this way. Clearly, in this image with sunflowers there are blobs of different sizes, so you would have to run Laplacians of Gaussian tuned to different blob sizes to be able to capture all of the blobs in the image.
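Here is a small sketch of that multi-scale idea, using SciPy's gaussian_laplace filter. It is my own illustration, not the lecture's code; the particular sigma values are arbitrary, and multiplying by sigma squared is the usual scale normalisation so that responses at different blob sizes can be compared.

```python
import numpy as np
from scipy import ndimage

def log_blob_responses(image, sigmas=(2, 4, 8, 16)):
    """Absolute Laplacian-of-Gaussian response of an image at several blob sizes."""
    image = image.astype(float)
    responses = []
    for s in sigmas:
        r = (s ** 2) * ndimage.gaussian_laplace(image, sigma=s)  # scale-normalised LoG
        responses.append(np.abs(r))   # bright-on-dark and dark-on-bright blobs treated alike
    return np.stack(responses, axis=0)                           # shape (num_scales, H, W)

# Toy usage: a bright disc of radius 8 responds most strongly at a sigma near 8 / sqrt(2)
yy, xx = np.mgrid[0:64, 0:64]
disc = (((xx - 32) ** 2 + (yy - 32) ** 2) < 8 ** 2).astype(float)
resp = log_blob_responses(disc)
print(resp.reshape(len(resp), -1).max(axis=1))   # strongest peak sits at one of the middle scales
```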
Let's now move forward to the next artifact, which is very useful to extract from images. Extracting corners from an image was a very, very important area of research in computer vision, probably in the late nineties and early two-thousands. We will try to describe one popular method today.

Let's start by asking a question. Suppose we had an image such as this; what would be the interesting features that set this image apart from any other image? Remember, before we do any processing, we want to be able to extract some unique elements of the image. So what are those unique elements in this image? You are probably going to say that those are going to be its corners, but let's build up to that.

We ideally want to look for image regions that are unusual, something that sets that region of the image apart from other images. If you have a region that is textureless, for example the blue sky, then that could be common across several images; it may not really be unique to that particular image, and you might not be able to localise that region in any other image with similar content. You are ideally looking for patches with large contrast changes, that is, large gradients. Edges are a good example, but we will talk about why an edge may not be the right artifact in a moment. So we are looking for patches or regions in the image which have large contrast changes, such as edges. The problem with edges is that they suffer from what is known as the aperture problem. We will see a visual illustration of the aperture problem on the next slide; while edges are good, unique aspects of a particular image, they do suffer from this problem.

So what we are ideally looking for are regions of the image, which we are going to call corners, where the gradients in at least two different directions are significant. Remember, an edge is an artifact where you have a significant gradient in one direction, the direction normal to the edge: whatever the direction of the edge may be, the normal to that edge gives its orientation. What we want instead are points where there is a significant gradient in two directions. What does that mean?

Let's take a tangible example. Assume there is an artifact such as this; it looks like an inverted V, let's say. We want to find out which part of this image is unique. Assume the inverted V is the image, and ignore the blue box for now; the blue box is only there to explain the idea to you. Just consider the image with the inverted V.
You want to find out which aspect of the image is unique to it, which can help us recognise it at other times, or when we view the same scene from other viewpoints. If you consider the blue box placed here, you see that it covers a flat, textureless region: there is no change in intensity in that region, so it is not going to be very useful. It is like the blue sky, with absolutely no change; it will not help when you try to compare this image with other images. If you now place the blue box on the edge part of the image, this is better: there is some artifact there that is useful. But the problem is, if you slide this patch along the edge, all of those positions have the same response. You will never know whether you placed your box here or here, because all of them would have exactly the same response; there is no difference in the local characteristics at any of these places. So while edges are useful, there is something they are lacking. If you try to match, say you take a panoramic photo on your phone and try to align two images, you may not know which part of the edge to align to. You are ideally looking to place the box at a point where there is change in two directions, and that point could be unique to this particular image. Ideally, we are looking for many such points in an image; such points are the kind of points we want to detect.

How do we find such points in an image? We know how to do it intuitively now, but let's go one level further and see how you actually find such corners. To do that, we are going to define a quantity called autocorrelation. As the name states, autocorrelation is correlation with itself: we are not going to use any external filter; we are going to take a patch in an image and see how it correlates with itself. What does that mean? Let's try to quantify it.

We define the autocorrelation function as follows: you take a patch in the image and compute the sum of squared differences between pixel intensities under small variations of the image patch position,

E(Δu) = Σᵢ w(pᵢ) [ I(pᵢ + Δu) − I(pᵢ) ]²

So if you have a point pᵢ in the image, say at the centre of the patch, and you have a small Δu, which is the shift, you move the patch by that delta. Remember, pᵢ and Δu are both 2D quantities: pᵢ = (xᵢ, yᵢ) and Δu is a 2D displacement, say (δu, δv). I(pᵢ) is the image intensity at the point pᵢ; you then move to pᵢ + Δu, move it a little bit, and look at the image intensity at that new location. You take the sum of squared differences over all pixels in that particular patch. So if you take a square patch, you take every point in it, shift the entire box a bit, compute the pairwise differences between the same locations in the original patch and the shifted patch, and sum them all up. This is what we define as the autocorrelation function. The w(pᵢ) is a weighting function that tells you how much weight to give to each particular point in that patch: maybe you want to give more weight to the central pixel, and for a pixel at the periphery of that blue square you may want to weight it a little less. You could have a fixed weight for all points in the square, or you could have a Gaussian defining w(pᵢ), where you weight the central pixel more and the pixels at the periphery a little less. This is how we define autocorrelation.
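Before the Taylor-series shortcut on the next slide, the autocorrelation can be evaluated literally as defined. Here is a small, deliberately naive NumPy sketch of my own (the patch location, patch size, shift range and Gaussian window are arbitrary choices, not from the lecture) that shifts a patch and accumulates the weighted sum of squared differences.

```python
import numpy as np

def autocorrelation_surface(image, center, half=7, max_shift=3):
    """E(du, dv) = sum_i w(p_i) * (I(p_i + du) - I(p_i))^2 over a (2*half+1)^2 patch."""
    cy, cx = center
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    w = np.exp(-(xs ** 2 + ys ** 2) / (2.0 * half ** 2))   # Gaussian window: centre weighted more
    patch = image[cy - half:cy + half + 1, cx - half:cx + half + 1].astype(float)

    E = np.zeros((2 * max_shift + 1, 2 * max_shift + 1))
    for dy in range(-max_shift, max_shift + 1):             # try every small shift of the patch
        for dx in range(-max_shift, max_shift + 1):
            shifted = image[cy + dy - half:cy + dy + half + 1,
                            cx + dx - half:cx + dx + half + 1].astype(float)
            E[dy + max_shift, dx + max_shift] = np.sum(w * (shifted - patch) ** 2)
    return E   # flat everywhere at a textureless point, a ridge along an edge, a bowl at a corner

# Toy usage: a bright square whose corner sits near the patch centre; E grows for every shift
img = np.zeros((64, 64))
img[:32, :32] = 1.0
print(np.round(autocorrelation_surface(img, (32, 32)), 1))
```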
Now let's see how you actually compute this autocorrelation, and then come back to how you find corners using it. Let's look at it more deeply. Consider the Taylor series expansion of I(pᵢ + Δu). Remember, Δu is about taking the patch that was centred at pᵢ, moving it by Δu and placing it at a slightly offset location in the same image. From the Taylor series expansion, you can write this as I(pᵢ + Δu) ≈ I(pᵢ) + ∇I(pᵢ)ᵀ Δu, where ∇I(pᵢ) is the image gradient at that particular location, (∂I(pᵢ)/∂x, ∂I(pᵢ)/∂y). You know how to compute the gradient; we have already seen that.

Now let's write out the autocorrelation. Remember, the definition we wrote on the previous slide is E(Δu) = Σᵢ w(pᵢ) [ I(pᵢ + Δu) − I(pᵢ) ]². We now replace I(pᵢ + Δu) with its Taylor series expansion. Plugging that in, you get Σᵢ w(pᵢ) [ I(pᵢ) + ∇I(pᵢ)ᵀ Δu − I(pᵢ) ]². Just to explain the notation: by Δu we mean the vector (δu, δv); because the summation is taken outside, we simply write it as Δu inside. Once you have this, you can see that I(pᵢ) and −I(pᵢ) cancel, and you are left with Σᵢ w(pᵢ) [ ∇I(pᵢ)ᵀ Δu ]², which we are going to write as the quantity Δuᵀ A Δu.

Remember, you have this term squared; you simply split it into two parts, where one Δuᵀ comes to the left and one Δu goes to the right, and the rest, w(pᵢ) times the outer product of the gradient with itself, you collect into a matrix. How would that matrix look? It combines the weights with the products of the gradients, so it looks like this:

A = Σ_{x, y} w(x, y) [ Ix², Ix·Iy; Ix·Iy, Iy² ]

where Ix and Iy are your image gradients. That is your A matrix.

So we defined autocorrelation in a particular manner, where we said: take a patch, move it a bit, and see how the image changes in that local region. We then took that definition, played with it a little, and came up with an expansion that looks like Δuᵀ A Δu. We now focus on this matrix A. Δu is simply the change that you imposed on the patch location; A is what tells you how the gradients change between those two patches; and w is the weighting factor that tells you how much to weight the central part of the patch versus the peripheral parts. Since A is what defines the intensity change, we are going to consider an eigendecomposition of A, given by A = U Λ Uᵀ, where Λ is a diagonal matrix with λ₁ and λ₂, the two eigenvalues.
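To make the matrix A concrete, here is a minimal sketch, again my own illustration rather than the lecture's code. It assumes simple Sobel gradients and a Gaussian window as the weighting w, and returns the three distinct entries of A at every pixel.

```python
import numpy as np
from scipy import ndimage

def structure_tensor(image, sigma=1.5):
    """Per-pixel entries of A = sum_w [[Ix^2, Ix*Iy], [Ix*Iy, Iy^2]] with Gaussian weights w."""
    I = image.astype(float)
    Ix = ndimage.sobel(I, axis=1)                  # gradient along x (columns)
    Iy = ndimage.sobel(I, axis=0)                  # gradient along y (rows)
    Axx = ndimage.gaussian_filter(Ix * Ix, sigma)  # weighted local sums of the gradient products
    Axy = ndimage.gaussian_filter(Ix * Iy, sigma)
    Ayy = ndimage.gaussian_filter(Iy * Iy, sigma)
    return Axx, Axy, Ayy

# Usage: near a corner of a bright square, both eigenvalues of A come out clearly non-zero
img = np.zeros((64, 64))
img[:32, :32] = 1.0
Axx, Axy, Ayy = structure_tensor(img)
A = np.array([[Axx[31, 31], Axy[31, 31]],
              [Axy[31, 31], Ayy[31, 31]]])
print(np.linalg.eigvalsh(A))                       # two sizeable eigenvalues at the corner
```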
Remember, A is a 2 × 2 matrix, which means the maximum number of eigenvalues you can have is two; A uᵢ = λᵢ uᵢ is your standard eigen relation. So, once again: we started with autocorrelation, wrote it in a slightly different manner, and we now have the eigendecomposition of A. Where do we go from here? How do we go from this to finding a corner? That is the question I will leave with you; think about it for a moment. How do you think you can go from the eigenvalues of A to finding a corner?

It is very similar to the discussion we had in the early part of the lecture. When both λ₁ and λ₂ are large, you know that the intensity changes in both directions at that point. When either λ₂ is much greater than λ₁, or λ₁ is much greater than λ₂, it is going to be an edge, because there is change only along one direction. And if both λ₁ and λ₂ are small, you are going to say it is a flat or textureless region. Which means, from the eigendecomposition of A, all we are looking for is both eigenvalues being high, and then we know we have probably hit a corner. Let's see how you actually do this inference.

Another way of looking at it: wherever there is a vertical edge or a horizontal edge, you will have either λ₁ much greater than λ₂ or λ₂ much greater than λ₁; at a corner you will have both λ₁ and λ₂ large; and in a flat region both λ₁ and λ₂ will be very small.

So what do we do with this? This is the way we are going to compute corners. This is a method given by Harris and Stephens, which is why it is known as the Harris corner detector. It is a very popular detector and it was used for many, many years; of course, there have since been lots of improvements and better methods, but this was one of the earliest corner detectors developed, and it was used for many years. The entire process goes something like this. You compute gradients at each point in the image. Using those, you compute your A matrix. You can use a weighting function, or, if you do not, you simply treat all positions in the patch equally. Then you would ideally compute the eigenvalues and decide on a corner based on them; but because the eigendecomposition itself can be a costly operation, we make a slight deviation and compute a cornerness measure instead.

We define the cornerness measure as λ₁ λ₂ − κ (λ₁ + λ₂)². I will let you work out for yourself that when this entire quantity is high, both λ₁ and λ₂ are high; try out different values of λ₁ and λ₂ and you will see what I am saying. What is interesting here is that λ₁ λ₂ is nothing but the determinant of A, and λ₁ + λ₂ is nothing but the trace of A, which means we can define our cornerness measure as det(A) − κ · trace(A)². κ is just a constant that you have to set to get what you want; for different images you may have to set it differently.
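Here is a short sketch of that cornerness measure, my own illustration. It works directly with the three A entries from the previous sketch, and κ = 0.05 is just a typical value; the little check at the end confirms that only the "both eigenvalues large" case scores highly.

```python
def harris_response(Axx, Axy, Ayy, kappa=0.05):
    """Cornerness R = det(A) - kappa * trace(A)^2 = lam1*lam2 - kappa*(lam1 + lam2)^2."""
    det = Axx * Ayy - Axy * Axy
    trace = Axx + Ayy
    return det - kappa * trace ** 2

# Sanity check with hand-picked eigenvalues (A diagonal, so Axx and Ayy are the eigenvalues)
for lam1, lam2, label in [(10.0, 9.0, "corner"), (10.0, 0.1, "edge"), (0.1, 0.1, "flat")]:
    print(label, harris_response(lam1, 0.0, lam2))
# Only the corner case gives a large positive response; the edge case goes negative
# and the flat case stays near zero.
```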
Why is this useful? You no longer need to compute the eigendecomposition of a matrix; you only need to compute a determinant and a trace, which is a bit easier than computing eigenvalues. Finally, you take all points in the image whose cornerness measure is greater than a threshold. So you can take lots of points, probably all points, compute the cornerness using autocorrelation, and whichever exceed a particular threshold, you call corners. You can finally also perform non-maximum suppression: if you find that there are many corners in a very small local neighbourhood, you pick the one with the highest cornerness measure. That is the same non-maximum suppression we talked about for the Canny edge detector; non-maximum suppression will keep coming back at various stages in this course, in various use cases.

Here is a visual illustration of the Harris corner detector. Consider two images of the same object; they are taken from different angles, the object is at different poses, and the illumination is also different. Let's run the Harris corner detector. Ideally, what we want is that in both these images the same parts, the same corners on the object, get picked. Why does that matter? Once again, if I take the example of stitching different images in your phone's panoramic mode, we want the same points on the object to be located in both images so that they can be matched and probably stitched together to get a panoramic image; that is just one example application.

Let's see it now. Here is the step of computing the cornerness, where you do the autocorrelation, get your cornerness values, and then take all the high responses by using a threshold. There is no non-maximum suppression yet, which is why you see many points clustered in small regions. You then take just the point with the highest cornerness value in each neighbourhood, that is, you do the non-max suppression, and now you get a set of distinct points. If you visualise them on the image and look at, say, these highlighted examples, you can see that although the two views are fairly differently oriented and lit, you get similar corners in both of these regions, and in many other regions too. Obviously you also get a few corners in one image that are not there in the second image, but those can be overcome at the matching phase, as you will see a bit later in this course. The focus of this particular lecture is only on detecting the corners; how to do the matching between the two images, we will come back to a bit later.

What we saw here is one variant of the Harris corner detector, the one developed by Harris and Stephens in 1988, where we use the determinant minus κ (also written α) times the square of the trace; that is what we used here. There have been other variations of the same method. The researcher Triggs suggested that you can use λ₀ − α λ₁, where λ₀ is the first, larger eigenvalue and λ₁ is the second eigenvalue. Brown and colleagues proposed the determinant of A divided by the trace of A, instead of the determinant of A minus α times the square of the trace of A; the κ or α we saw a couple of slides ago is just a constant, it does not matter much, so you simply have det(A) over trace(A).
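To finish the pipeline, here is one way to do the thresholding and the non-maximum suppression on a cornerness map. This is a sketch of my own (the lecture does not give code); the 5-pixel suppression neighbourhood is an arbitrary choice.

```python
import numpy as np
from scipy import ndimage

def detect_corners(response, threshold, nms_size=5):
    """Keep pixels that exceed the threshold AND are the maximum of their local neighbourhood."""
    local_max = ndimage.maximum_filter(response, size=nms_size)   # strongest value nearby
    keep = (response == local_max) & (response > threshold)       # non-maximum suppression
    rows, cols = np.nonzero(keep)
    return list(zip(rows.tolist(), cols.tolist()))                # corner locations as (row, col)

# Usage on a synthetic cornerness map with two candidates close together and one far away
R = np.zeros((32, 32))
R[10, 10], R[11, 11], R[25, 5] = 1.0, 0.8, 0.6
print(detect_corners(R, threshold=0.5))   # keeps (10, 10) and (25, 5); (11, 11) is suppressed
```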
All of these are different ways of playing around with the same quantities to get what you want; all of them effectively try to measure the cornerness using the eigenvalues of your A matrix, which come from the autocorrelation.

Let me ask a few questions about the properties of the Harris corner detector before concluding this lecture. The first question: is the Harris corner detector scale invariant? What do we mean by scale? Remember, scale is complementary to resolution. If an object appears very small in an image, we call that a smaller scale; if it appears large, we call that a larger scale. So you could have an artifact such as this curved line in a particular image, or you could have the same curved line on a larger canvas; maybe you took one image from close up and another image farther out, but it is the same object that is being captured. In this particular case, if you observe the second image, this point would be considered a corner by the Harris corner detector; but in the first image, if you took the same size patch to do the autocorrelation, you would find that all of these points get categorised as edges, and none of them get categorised as corners. Which means the Harris corner detector need not necessarily be scale invariant. How do you make it scale invariant? We will talk about that later.

What about rotation invariance: is the Harris corner detector rotation invariant? It turns out that it is. Whether you have this inverted V as it is, or whether you rotate it in the image, this particular corner will have a high change in both directions in both cases, and you would detect this corner in both images. As long as there is no change in scale, you would detect the same corner in both.