Virtual Reality Engineering
Dr. Steve Lavalle
Department of Multidisciplinary
Indian Institute of Technology, Madras
Lecture – 11
Human Vision (depth perception, cont’d)
Welcome back, everyone. In the most recent lecture I explained how the human eye works, and we looked at the entire visual pathway: from the photoreceptors, where the light hits, through the cells that lie in front of them in the retina, up to the ganglion cells, which communicate that information through the optic nerve and back into the visual cortex. We emphasized the hierarchical processing that goes on. Once we understand that, we would like to start talking about perception, which is when the brain makes some kind of conclusion about what it is seeing, or sensing if it is another sense beyond seeing, and we gave examples of depth perception.
I gave you a number of examples of that. We also talked about eye movements in the last lecture, which are very important for doing things like keeping the image stabilized on the retina for a moving target, or when you are moving your head, or some combination of the two. One thing to think about, to continue from last time, is the combination of the various depth cues. I talked about many different kinds of cues.
(Refer Slide Time: 01:17)
For example, one object is in front of another, or you look at the overall size of the object on the retina, that is, the size of the image of the object on the retina.
We looked at these things, and the brain is taking all of them into account in order to make a judgement about depth. We also had what you would expect, binocular disparity, coming from the two different eyes. But I just want to emphasize that last time I gave many examples, almost a dozen, of monocular cues, which use just a single eye to make conclusions about depth.
One way to think about how the cues are combined is that it is very much like statistical decision theory, which also shows up in machine learning. If you wanted to construct some kind of model of how the brain might be doing this, perceptual psychologists very often like to consider it as a Bayesian, or probabilistic, model. As such, the brain is considering how useful each one of these cues is in the particular context where it is being observed.
What are the priors? The brain is using a lot of prior information given the context. If I have been out in the forest many times in my life and then I go out in the forest again, there are a lot of expectations about what I will see there. You will not see a bunch of cars driving right through the middle of the forest, let us say, or something completely absurd suddenly appearing in front of you. There are expectations about the context, and your brain is filling in this information, in some ways trying to reconstruct a full picture without requiring additional information or additional reasoning power.
It is falling back on priors very often. These are things that are very important in Bayesian modeling and analysis, and this seems to happen as well in the human vision system. There is prior bias: you are biased by your expectations of what you will see. In the process of putting together information from multiple cues, the brain is trying to decide whether that information is consistent, in which case it will increase your confidence in what you are seeing, or whether it is contradictory.
If it is contradictory, it will lower your confidence. When we see some of these optical illusions, they may contain contradictory cues, and this causes a significant amount of confusion. (Refer Slide Time: 04:16)
How useful, or how discriminatory, let us say, is each cue in the context? The context could be in combination with all of your other senses and your memory of what places like the one you are seeing have looked like before. That is one extra thing I wanted to add to depth perception, and these kinds of ideas generally apply to all aspects of perception: not only combining multiple cues from the same sense, but also combining information from multiple senses to make a coherent view of the world with a very high amount of confidence. (Refer Slide Time: 05:05)
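The Bayesian view of cue combination described above can be sketched in a few lines. This is only an illustrative model, not a claim about what the brain computes: it assumes each cue and the context prior can be treated as an independent Gaussian estimate of depth, and all names and numbers (`Cue`, `combine`, the depths and uncertainties) are made up for the example. Under that assumption, the fused estimate weights each source by its inverse variance, so more reliable cues count for more.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    name: str
    depth_m: float  # depth suggested by this source, in meters (hypothetical)
    sigma_m: float  # assumed standard deviation, i.e. how unreliable it is


def combine(cues, prior=None):
    """Fuse independent Gaussian estimates by inverse-variance weighting."""
    sources = list(cues) + ([prior] if prior is not None else [])
    weights = [1.0 / (c.sigma_m ** 2) for c in sources]
    total = sum(weights)
    mean = sum(w * c.depth_m for w, c in zip(weights, sources)) / total
    sigma = (1.0 / total) ** 0.5  # fused uncertainty shrinks as sources are added
    return mean, sigma


# Illustrative values only: two cues plus a weak prior from the context.
cues = [
    Cue("binocular disparity", depth_m=2.0, sigma_m=0.2),
    Cue("retinal image size", depth_m=2.4, sigma_m=0.4),
]
prior = Cue("context prior", depth_m=3.0, sigma_m=1.0)

mean, sigma = combine(cues, prior)
print(f"fused depth: {mean:.2f} m, uncertainty: {sigma:.2f} m")
```

Note one limitation of this simple sketch: the fused uncertainty depends only on the individual uncertainties, so it does not capture the drop in confidence that contradictory cues produce. Modeling that would need something richer, such as checking how well the cues agree before fusing them.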
Another related topic, which I will not cover very much, is scale perception, which you can compare to depth perception.
How large is the object that I see? If you make a virtual world and you start putting furniture in there, does it correspond to furniture that you have seen before? Does it just look vaguely similar, like maybe there are some sofas or chairs appearing? Based on the way a chair looks, you might be able to estimate how large it should be, but depth perception is coming into play again. How far away is this object? Is the object very far away and enormous, or is it closer and smaller?
You can see that depth and scale are very closely intertwined. These are very important concepts for virtual reality because maybe you would like to reach and grab something, if you are tracking your hands, for example. Even if you are not tracking your hands, you may still have some kind of simple controller interface, but you want to reach and grab something with your virtual arms.
How perception of depth and scale come together will be very important in a context like that. As well, your perception of scale and depth is going to be affected by your interpupillary distance (IPD) in the virtual world. That is another interesting thing that we talked about.
If I put your eyes very, very close together in the virtual world, you may perceive yourself as smaller or perceive the outside world as larger. You can think about that very philosophically if you want; they are more or less the same thing geometrically. How does your brain choose and interpret that when you are in some kind of virtual world? It is very difficult to say; it probably depends on the amount of realism.
If the virtual world looks very much like some familiar physical place, then if you make the IPD very small you will most likely feel small in that world, and if you make the IPD very large you will most likely feel large in that world. If it is a completely synthetic, cartoon-like world, it is very difficult to say what will happen, or how your brain will interpret it. Any questions about this?
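The remark that a smaller IPD and a larger world are "more or less the same thing geometrically" can be checked with a small calculation. The sketch below assumes a simplified geometry that is not from the lecture: a point straight ahead of the viewer, with the two eyes converging on it, so the only quantity compared is the vergence angle. The function name and the numbers are illustrative.

```python
import math


def vergence_angle(ipd_m, distance_m):
    """Angle (radians) between the two lines of sight to a point straight ahead."""
    return 2.0 * math.atan((ipd_m / 2.0) / distance_m)


normal_ipd = 0.064  # meters; an illustrative value
distance = 2.0      # meters to the object
s = 0.5             # scale factor applied to the IPD

# Case 1: halve the IPD, leave the world alone.
shrunk_eyes = vergence_angle(normal_ipd * s, distance)

# Case 2: keep the IPD, make every distance in the world twice as large.
enlarged_world = vergence_angle(normal_ipd, distance / s)

print(shrunk_eyes, enlarged_world)
print(math.isclose(shrunk_eyes, enlarged_world))
```

Both cases give the same angle because it depends only on the ratio of IPD to distance, which is why the geometry alone cannot say whether you became smaller or the world became larger.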
Lecture – 11-1
Human Vision (motion perception)
I want to get into another perception topic, called motion perception. This is obviously very important for virtual reality. You can, of course, look at static panoramas in virtual reality, but our perception of motion is really fundamental to the kinds of things that we do all over the place in virtual reality.
(Refer Slide Time: 00:35)
So, that is the next big topic: motion perception. Remember, as I said last time, it is kind of a template. There is X perception, where X may be all sorts of things: depth, scale, motion, color, and on and on. We may also apply the same concept to hearing or touch, for example perception of different kinds of textures. Again, as we go through these pathways from the sense organ all the way up to the higher parts of the brain, there is some hierarchical processing going on, and there is a perceptual process in there. That is what we are trying to understand and characterize. So, what are some of the main purposes of motion perception?
Well, one of them is segmentation, or segregation. In other words, if we are looking at a scene in the jungle, let us say, we might like to know that there is an animal moving there. The perception of that motion is very important, and you might imagine that for evolutionary reasons it is very important to know if there is something moving in the jungle or in the forest that is about to, I do not know, potentially eat you, let us say. So, it is very important to identify that.
So, perceiving that motion against a static background is very important. You might notice that our eyes tend to fixate very quickly on something that is moving. In modern times, our ability to detect a moving car, cycle, or some other vehicle that may be dangerous to us as we cross the road is also important. So, it is a similar kind of purpose: we are very good at extracting an object that is moving from a background that is stationary.
Another purpose is to extract the 3D structure of an object from its motion. For example, if I see this book moving around in some way, I get some idea about its structure. This book does not have a very complex structure, but take something much more complicated, like a chair with legs, move it around, and you get an idea of its structure just from the way it is moving around on your retina.
There is some 2-dimensional image that is changing on your retina, but you can infer 3-dimensional structure. You can learn a lot from that; it is almost as good as if you walked up and started probing the object with your arms and used all of that information to build a 3D picture. So, we are very good at somehow building 3D pictures in our minds of objects that we have maybe never seen exactly before, but we are using a lot of prior information; again, prior bias.
(Refer Slide Time: 03:25)
And from the context, we have a lot of information about how the light is falling on it in a certain setting that we are familiar with, and we make judgments about its 3-dimensional shape.
You can make illusions that fool the brain in some cases and lead to false conclusions, but nevertheless we rely on this very frequently in a large number of settings. And finally, another key purpose is visual guidance of action. This is exactly where what I wrote on scale perception comes in: one of the important aspects is reaching out and grabbing something. If we want to reach out and grab something, then we are using motion perception: we are perceiving the motion of our hand as we reach and grab, and that visual feedback helps us to accomplish the task. So, visual guidance of our actions is useful for manipulation, like grabbing a cup, and for general hand-eye coordination, and it may provide information about other kinds of self-motion.
Any questions about that? I am going to cover a couple more topics, and then I want to get to one of the main aspects of this perception of motion, which is frames per second. I really would like to talk about that. That is obviously something that is on our minds a lot; we talked about it in connection with the virtual reality lab, and we would like to know how many frames per second are enough.
(Refer Slide Time: 05:10)
But I want to give a few more basic concepts first. One thing that I find fascinating is the neural circuitry. Remember, we talked about the visual pathways from the photoreceptors all the way up to the visual cortex. Along that path, and actually very early in the path, we have the following kind of circuitry. Suppose there is an object that goes moving across the retina. And imagine that there are neural detectors, let us say: somewhere back here there is a detector A and a detector B. Imagine that this neuron, A, fires whenever the object moves across this location, and as the object moves across the location here, this one, B, fires.
So, which one is going to fire first? Well, first A fires, then B fires. What is interesting is that there is extra neural circuitry that looks like the following: the signal from A goes through a box that I am going to describe in a minute, and then these feed together into some third neural structure called, let us say, C. Inside this box is a delay, say some delta t; perhaps it is 50 milliseconds. What happens here is that C fires if both A and B fire, but only if A fired a little while ago. So, A fired a little while ago and B is firing right now; the superposition of those in time, A a little while ago and B right now, will fire C. That is a very, very simple motion detector. We have those all over the place, for various speeds and various distances between neighboring detectors. One interesting outcome of that is that you are able to fool this sense by something called the wagon wheel effect.
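The A, B, C circuit just described can be written as a small delay-and-coincidence sketch. It assumes discrete time steps of 10 ms and binary spike trains, which are simplifications chosen for the example; the function name `detector_c` and the spike patterns are hypothetical. C fires at a step only if B fires at that step and A fired exactly one delay earlier.

```python
def detector_c(a_spikes, b_spikes, delay_steps):
    """Return C's output: 1 where B fires now and A fired delay_steps earlier."""
    out = []
    for t in range(len(b_spikes)):
        # The "box" on A's output: look back by the delay (no input before t=0).
        a_delayed = a_spikes[t - delay_steps] if t >= delay_steps else 0
        out.append(1 if (a_delayed and b_spikes[t]) else 0)
    return out


DELAY = 5  # 5 steps of 10 ms each, i.e. the 50 ms delay mentioned in the lecture

# Object moves from A's location to B's location: A fires at step 1, B at step 6.
a_forward = [0, 1, 0, 0, 0, 0, 0, 0]
b_forward = [0, 0, 0, 0, 0, 0, 1, 0]
forward = detector_c(a_forward, b_forward, DELAY)

# Object moves the other way: B fires first (step 1), then A (step 6).
a_backward = [0, 0, 0, 0, 0, 0, 1, 0]
b_backward = [0, 1, 0, 0, 0, 0, 0, 0]
backward = detector_c(a_backward, b_backward, DELAY)

print("A then B:", forward)
print("B then A:", backward)
```

The detector responds only to the A-then-B order at the matching speed, so a bank of such units with different delays and spacings would be needed to cover different speeds and directions.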
(Refer Slide Time: 07:01)
If you have ever made a pattern like the following, say we put some stripes around the rim of a wheel, and then you spin the wheel, it may start to look like it is rotating backwards, even though it is going forward. This is interfering with the basic operation of this neural circuitry; it is easy to fool it into thinking that the wheel is going in the opposite direction, because it is just based on very simple timings of signals appearing, in some sense flashing, at neighboring spots along your retina, and those photoreceptors responding. So, you can very easily fool it into perceiving the reverse direction, unless there is a more coherent picture of something moving from place to place to place along a very long sequence.
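One common way to reason about the wagon wheel effect is as an ambiguity between identical stripes. The sketch below is an assumption-laden illustration, not the lecture's model: it supposes the wheel is compared at two instants separated by a fixed interval, and that each stripe is matched to the nearest stripe position at the later instant. The function name and the numbers are hypothetical.

```python
import math


def apparent_step(true_step_rad, n_stripes):
    """Rotation that a nearest-stripe match would report between two instants."""
    period = 2.0 * math.pi / n_stripes  # angular spacing between identical stripes
    # Wrap the true step into the interval [-period/2, +period/2).
    return (true_step_rad + period / 2.0) % period - period / 2.0


n = 12  # stripes are 30 degrees apart

slow = apparent_step(math.radians(5.0), n)   # wheel advances 5 degrees per interval
fast = apparent_step(math.radians(25.0), n)  # wheel advances 25 degrees per interval

print(f"true +5 deg  -> apparent {math.degrees(slow):+.1f} deg")
print(f"true +25 deg -> apparent {math.degrees(fast):+.1f} deg")
```

When the wheel advances by more than half the stripe spacing per interval, the nearest match is the stripe behind, so the reported rotation has the opposite sign even though the wheel is going forward.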
(Refer Slide Time: 08:06)
So, this is very interesting, I think. Another thing to think about is object motion versus observer motion. Understanding this is one of the key ingredients to avoiding simulator sickness. Suppose I could somehow make a recording of the images that hit your array of photoreceptors, and imagine that we now play back a video of what is hitting your photoreceptors.
I might see some kind of motion occurring there, but how do I know if that motion is due to you moving or due to some object in the scene moving? It could be some combination of the two, but let us suppose it is one or the other. How would I distinguish between the two if I were just watching the video? I do not think there is any way to fully distinguish them, unless maybe you start to guess from the context what is probably happening and use your powerful brain to make an inference. But if you just look at the raw data, there may not be enough there to really make the judgement. Let me give a simple example.
Let us suppose we have an eye looking upward. I will draw the cornea sticking out here, and I have an object that moves from right to left, so it is moving along like this. If I draw out the retina of the eye, then when the object is over here on the right, there is an image of it over here on this part of the retina. And when it moves across, it ends up over on this part of the retina. I am not drawing too careful a diagram here, it is getting really curvy, but I think you will understand the principle. So, as the object moves from right to left, imagine the eye is held fixed, and then the image goes from left to right. Makes sense? All right. So, that is one possibility: the object moves, and the observer's head and eye, let us say, are fixed. Then there is another case, in which I will hold the object in one place, and I have an eye here and I move the eye over here. (Refer Slide Time: 10:42)
So, when I start off on this side, I have the retina coming around. Draw it again over here; it should be the same eye, just moving across from left to right. When I start off over here, I get the image on this side, and when I come over to this side, I get the image on the other side, way off there, I see. All right. So, in this case, the eye moves and the object is fixed. For these two cases, I just want to say that if you imagine how the image looks along the retina, they should more or less correspond to the same thing.
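The two cases in the diagram can be compared numerically with a simple pinhole model of the eye. This sketch makes several assumptions that are not in the lecture: the eye looks along a fixed direction without rotating, the retina is treated as a flat image plane behind a pinhole, and the focal length, depth, and positions are illustrative. The function name `retinal_x` is hypothetical.

```python
import math


def retinal_x(object_x, eye_x, depth, focal=0.017):
    """Horizontal image position on a flat 'retina' behind a pinhole (inverted)."""
    return -focal * (object_x - eye_x) / depth


steps = [i * 0.1 for i in range(6)]  # 0.0, 0.1, ..., 0.5 meters of travel

# Case 1: the object moves from right to left; the eye stays at x = 0.
object_moves = [retinal_x(0.25 - s, 0.0, depth=1.0) for s in steps]

# Case 2: the object stays at x = 0.25; the eye moves from left to right.
eye_moves = [retinal_x(0.25, s, depth=1.0) for s in steps]

same = all(math.isclose(a, b) for a, b in zip(object_moves, eye_moves))
print("identical retinal image paths:", same)
```

Only the relative position of object and eye enters the projection, so the raw image sequence is the same in both cases, and in both the image travels in the opposite direction to the object's motion relative to the eye.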
Now, I should admit that this is a bit of a fictitious scenario, because it is actually very hard to do this in practice. If I try to look at an object that is moving across my field of view, it is very hard to prevent my eyes from tracking it and doing smooth pursuit. It is also very hard to disable your vestibulo-ocular reflex if you are moving around while you are fixated on something. In fact, if I go back and forth like this while looking at the camera, it is very hard for me to keep from continuing to look at the camera. I have to put my fingers on the sides of my eyes like this, and then I can keep them from rotating; that is about the only thing I can do. Do not try that at home; I guess maybe that is hazardous to your health.
So, it is very hard to stop these reflexes. Given that your eyes are going to be rotating anyway, that is going to make these two cases look very similar even to the case of no movement at all, because your eye is rotating to keep a stable image on the retina. I just want to point out that even if your eyes are not rotating, these two cases are somewhat indistinguishable. However, when your eyes do rotate, the case where you are moving your head back and forth will invoke the vestibulo-ocular reflex (you can also do it with pure rotation, which is also translating your eyes through space), and the other case corresponds to smooth pursuit, if the object is moving slowly enough. Those are different modes of operation, which is part of the key to how the brain distinguishes between these cases.
So, let me tell you how the brain distinguishes between these cases. Let me just list the information that is being used. It is not just looking at a raw video signal, let us say, from the array of photoreceptors; it has more sources of information coming together. (Refer Slide Time: 13:50)
So, the brain uses more information to distinguish. We talked last time about how, during saccades, there is saccadic masking, or suppression; when this happens, it suppresses the motion detectors. So, while we are performing a saccade, the outputs of the motion detectors, the A, B, and C neural circuit that I just erased, are suppressed. Already, then, motion detection information is not being fed to your brain if your brain knows that you are performing a saccade. Very interesting. So, there is already some inhibition that occurs there. Another thing that the brain uses is eye movement commands.
That is very interesting. Somewhere there are commands issued to contract or relax the muscles that move and rotate your eyes. If that is the case, it is a matter of sending those signals to the right place, where the motion perception is being performed, and using that information. Neuroscientists sometimes call these efference copies of the motor commands. It is very powerful information.
The same thing happens in engineering. If I give a command to a robot that says move forward, and then I observe what the robot is doing, I can make a much better inference if I have access to the command: oh, I know that the robot was commanded to go forward. If you programmed the robot yourself, you have that information; if you did not program the robot and do not know what is inside of it, then you do not have that information. So, this is extra information that is inside the brain, throughout our neural system, let us say, and it can be used here.
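The value of having access to the command can be illustrated with a toy calculation. This is a deliberately simplified sketch, not a model of the actual neural mechanism: it assumes one-dimensional motion measured in degrees, a stationary scene that shifts on the retina by exactly the opposite of the eye rotation, and perfect knowledge of the command. The function names are hypothetical.

```python
def predicted_retinal_shift(eye_rotation_command_deg):
    """Image shift expected from the eye command alone, for a stationary scene."""
    return -eye_rotation_command_deg


def residual_motion(observed_shift_deg, eye_rotation_command_deg):
    """Motion left over after subtracting what the command predicts."""
    return observed_shift_deg - predicted_retinal_shift(eye_rotation_command_deg)


# The same observed image shift of -5 degrees, under two different commands.
eye_turned = residual_motion(observed_shift_deg=-5.0, eye_rotation_command_deg=5.0)
eye_still = residual_motion(observed_shift_deg=-5.0, eye_rotation_command_deg=0.0)

print("eye was commanded to turn:", eye_turned)  # shift fully explained by the eye
print("eye was commanded to stay:", eye_still)   # shift attributed to the object
```

The observed shift is identical in both cases; only the copy of the command separates "my eye moved" from "the object moved", which is the same advantage the robot programmer has over an outside observer.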
And then finally, one more piece of information is large-scale motion. If the entire scene is moving, what is usually the case? Is it usually the case that everything around us is moving simultaneously? No; in that event it usually means that we are moving. If I all of a sudden notice a huge change, I go like this, it is very unlikely that all of you are rotating in some kind of roll. So, that is prior information: if there is an enormous large-scale motion across the retina, probably you are the one that is moving.
And so, that is a fallback. You might remember that I talked about the big swing illusion, where, in an amusement park over 100 years ago, a bunch of participants sat down in a swing that was actually stationary, and the entire room started rocking. That is such an unusual scenario: in reality everything is moving around you, but your brain interprets it as, I must be moving and everything else is stationary. That is what made people sick very quickly, believing that the swing they were sitting in was going all the way around, 360 degrees.
It was tricking this source of information, basically giving you an artificial scenario where the large-scale motion assumption is in fact incorrect. Yeah?
Student: In what cases does saccadic masking occur? Even when we are moving, or when the object is moving?
Let us see when saccadic masking is occurring. Well, let us define the scenario first. Let us suppose there is a stationary object, and we are looking around the object, like reading a book, for example. In that case there is a lot of change across the retina, but the motion detection is suppressed. There is another scenario where, let us say, the object is moving slowly; in that case there is no saccade, it is smooth pursuit. There is also a scenario where the object is moving very fast, and some additional saccades are inserted in order to keep up. But I think at that point there has already been enough motion detected, I believe, from the smooth pursuit times in between the saccades. I am no expert on neuroscience; I am trying to speculate as best as possible for those different scenarios.
Student: Yeah, but still, how is it different whether the book is moving or you are moving the other way? Both cases are identical except for the large-scale motion.
No, I think the saccades we do most often, the normal saccade mode, are to point, or orient, the fovea so that you are getting higher-acuity images of a single fixed target. Because of the saccadic masking, those particular motions are interpreted as motion of the eye, just in that mode. In the case of smooth pursuit, it is usually not the case that you are doing extra saccades to try to take in all the information about the object by pointing the fovea while it is also moving; usually, smooth pursuit is trying to keep a stable image on the retina. So, it is not a saccade case.
Student: Is the vestibular sense involved in this? That can also be a source of information.
That is a very good question. Yeah, I think that is reasonable; why not add the vestibular sense here? Very good, absolutely. If the head is moving, there is going to be additional vestibular information. If the head is fixed, then it uses eye movement commands. So, I like that very much: vestibular input if the head moves. Or I guess your entire body could be moving without your head moving relative to it, if you are on some kind of moving cart or in a car, for example; then the head is still technically moving with respect to the world frame, but may not move with respect to your collarbone, for example. All right? Very good. Any other questions or comments? Yeah.
Student: You gave two examples, one was smooth pursuit and the other was vestibulo-ocular.
Student: Their result is kind of the same, right? Or is there any discrepancy between them?
Smooth pursuit versus vestibulo-ocular: well, the vestibulo-ocular reflex is using your vestibular signal.
Student: Yeah.
I guess that is where it got used for one of the cases. It is using exactly your self-motion in order to compensate. The smooth pursuit case is not about your motion; it is about the motion of the object you are trying to see. It is based purely on the expected motion of the object, and on trying to keep that stable on your retina. Both are trying to maintain, let us say, a stable image on your retina. The fundamental difference is whether the vestibular signal is getting used, evoked by your own motion, or whether you are just trying to track a moving target. I am just trying to say that geometrically, in terms of just raw images, these look identical. But there is so much more information being obtained by the brain that it can distinguish these two cases.
Student: So it seems that without the extra information, they are pretty much identical.
That is what I want you to think, yes, that is right. Just from the images, it does not have enough. If we were to think like engineers and just turn on a video signal and look at it, well, how does the brain know? So, I am saying yes, the brain knows; there is a lot more information here. Very good; these are very good comments and very good ways to think about things. There is one more piece of information that is very naturally extracted from the images that change across your retina, and this is called optical flow. This ends up being very important, and it is one of the most important aspects in relation to motion sickness, or simulator sickness, I should say, not necessarily motion sickness.
Because the VR participant is not necessarily the one moving. Usually we say motion sickness for something you might get from moving in a vehicle, especially, let us say, riding in a car as a passenger. Simulator sickness would refer to what you get from a virtual reality experience, for example, where you may perceive your own motion but you are actually not moving. So, those two are kind of inverses: motion sickness versus simulator sickness.
(Refer Slide Time: 22:51)
So, optical flow. What happens in this case? The brain can keep track of images, or features, let us say, on the retina. There may be some particular feature; I see a red dot above the camera there. I can move around in some way, and that dot is going to appear somewhere on the retina, and I can keep track of it. If there are several dots, then if I get closer to them the dots will start spreading apart, if I go further away they will start contracting, and if I go to the left they will move to the right. This is called optical flow information, where features are moving in the image. Our brain perceives each of them to be the exact same object or element in the world; there are a bunch of them, and we perceive a kind of general flow because of it.
So, it is tracking the movement of features on the retina, and we end up with a kind of vector field, a kind of velocity field, on the image plane, or roughly the image sphere if you think about the way the retina is shaped.
(Refer Slide Time: 24:27)
So, if we think about various self-motions, as I just illustrated, if I am going forward, then the flows tend to be outward from the center. Also, the flows tend to be faster at the periphery than straight in front of me. So, not only are there directions in the optical flow, but there is also a magnitude. These are full-fledged velocities in the image plane, or image sphere if you would like to view it that way. If we go backward, then these reverse, and if we go to the left, then we see flow to the right. When I move myself to the left, I see you flowing to the right. Also, if I turn myself counterclockwise, I also see you flowing to the right. So, flow to the right also goes with counterclockwise rotation.
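The flow patterns for forward, backward, and sideways self-motion can be reproduced with the standard pinhole-camera expression for translational optical flow. The sketch below assumes things the lecture does not specify: a flat image plane in front of the pinhole with focal length 1, x pointing to the right, z pointing forward, all scene points at a known depth, and no rotation. The function name and sample values are illustrative.

```python
def translational_flow(x, y, depth, velocity):
    """Image velocity (u, v) at image point (x, y) for observer velocity (tx, ty, tz).

    Derived from x = X/Z, y = Y/Z with the scene moving at -velocity relative
    to the observer: u = (-tx + x*tz) / Z and v = (-ty + y*tz) / Z.
    """
    tx, ty, tz = velocity
    u = (-tx + x * tz) / depth
    v = (-ty + y * tz) / depth
    return u, v


forward = (0.0, 0.0, 1.0)
backward = (0.0, 0.0, -1.0)
leftward = (-1.0, 0.0, 0.0)

# Forward motion: flow points away from the center and grows toward the periphery.
near_center = translational_flow(0.1, 0.0, depth=2.0, velocity=forward)
periphery = translational_flow(0.5, 0.0, depth=2.0, velocity=forward)
left_side = translational_flow(-0.5, 0.0, depth=2.0, velocity=forward)

# Backward motion reverses the pattern; leftward motion gives flow to the right.
reversed_flow = translational_flow(0.5, 0.0, depth=2.0, velocity=backward)
sideways = translational_flow(0.0, 0.0, depth=2.0, velocity=leftward)

print(near_center, periphery, left_side, reversed_flow, sideways)
```

Rotational self-motion adds a second term that does not depend on depth; it is left out here to keep the sketch short.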
(Refer Slide Time: 25:47)
Of course, I will get flow in the opposite direction by going right or by a clockwise rotation. Our ability to perceive optical flow turns out to be strongest, as I look forward, right about here, I think somewhere around 20 to 30 degrees off; this is the place where you have a very high density of rods, if you remember when we looked at photoreceptor density. And generally, it is very strong in the periphery. That means that if there is some kind of motion in the periphery, we respond very quickly to it and turn our heads; it could be a predator, so maybe there are some evolutionary reasons for that. So, generally speaking, it is stronger in the periphery; I believe it peaks around 30 degrees or so, and then it starts trailing off again because of the low density of photoreceptors that we have as we gradually go further to the side.
(Refer Slide Time: 26:50)
So, based on this optical flow, we have a big problem in virtual reality, one that I am sure you have already experienced in the lab: the illusion of self-motion from optical flow. This is called vection. I have said this before: if you are sitting stationary in a train, and the train next to you starts moving, you are looking out the window, maybe you are not even paying a lot of attention.