Concept for VR and AR


Virtual Reality Engineering
Dr. M. Manivannan
Department of Biomedical Engineering
Indian Institute of Technology, Madras

Lecture - 86
Principles of Perception
Gestalt principles of visual perception
- Gestalt – a movement in experimental psychology which began prior to WWI
- We perceive objects as well-organized patterns rather than as separate components
- The whole is greater than the sum of its parts
- Based on the concept of "grouping"

- Gestalt means "form"
- Two of the most important principles:
  o Figure-ground
  o Grouping


Virtual Reality Engineering
Dr. M. Manivannan
Department of Biomedical Engineering
Indian Institute of Technology, Madras

Lecture - 87
Introduction to Kalman Filter

So, there is a specific algorithm required for augmented reality called SLAM, which stands for simultaneous localization and mapping. There are two things happening, localization and mapping, and they are happening simultaneously.
Localization is to estimate the pose of the camera, that is, in augmented reality, to estimate the pose of the user, where exactly he is. Mapping is to build a 3D model of the environment, so that we can place virtual objects onto the real environment and overlay the 3D virtual objects into the physical environment in the camera images.

In this example, I am going to introduce a very simple case: a motor vehicle moving on a road. Let us say this is the road we are talking about, and we need to look at the vehicle's position; let us say this is the position we need to measure. And let us say here is a sensor which measures the position; so when it takes a measurement, it measures the distance.
But this sensor is noisy. The output of the sensor itself is noisy, so we are not very confident about what the sensor reports. Most sensors are like this: the measurement is a little bit uncertain. So, in the case of uncertain measurements, how are we going to make better measurements of our objects in the virtual reality environment? That is the question.
So, we are talking about how to improve the noisy measurement process. What the Kalman filter does is that it uses a prediction algorithm, and then uses this prediction to update the measured distance, for example here. And then the updated distance is used to predict again; those are the two stages in the Kalman filter.
So, in a normal process, what we would do is use only the measured distance, without worrying about prediction. Using the measured distance directly is a little risky because the data is noisy. To avoid this, what the Kalman filter does is use a prediction algorithm and use this prediction to update the measured distance. So, the measured distance is not used directly; it is updated based on the prediction as well as the measurement data. Once we update the measured data, we use it to predict again. Those are the two stages of the Kalman filter: predict and update.
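To make the two stages concrete, here is a minimal sketch of a one-dimensional Kalman filter for the vehicle-on-a-road example above. The velocity, noise variances, and measurements are illustrative assumptions, not values from the lecture.

```python
import random

def kalman_1d(measurements, velocity, dt=1.0,
              process_var=0.5, meas_var=4.0):
    """Track a 1D position from noisy distance measurements."""
    x, p = measurements[0], meas_var     # initial estimate and its variance
    estimates = []
    for z in measurements[1:]:
        # Predict stage: push the state forward with the motion model.
        x = x + velocity * dt
        p = p + process_var              # uncertainty grows while predicting
        # Update stage: fuse the prediction with the new measurement.
        k = p / (p + meas_var)           # Kalman gain
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# A vehicle moving at 1 m/s, observed by a noisy distance sensor.
truth = [i * 1.0 for i in range(20)]
readings = [pos + random.gauss(0, 2.0) for pos in truth]
print(kalman_1d(readings, velocity=1.0))
```

The estimates track the true positions much more closely than the raw readings, because each measurement is only trusted in proportion to the gain k.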

So, essentially, if you take a Gaussian distribution, where x is the measured distance, mu is the mean and sigma^2 is the variance, then the standard equation for the Gaussian bell-shaped curve is

f(x) = 1 / (sigma * sqrt(2*pi)) * exp(-(x - mu)^2 / (2 * sigma^2)).

Now, if we fuse two such distributions, with means mu_0, mu_1 and variances sigma_0^2, sigma_1^2, the fused mean is

mu_fused = mu_0 + sigma_0^2 * (mu_1 - mu_0) / (sigma_0^2 + sigma_1^2),

and similarly the fused variance is

sigma_fused^2 = sigma_0^2 - sigma_0^4 / (sigma_0^2 + sigma_1^2).

These are the equations for the fused distribution. If you observe, the factor sigma_0^2 / (sigma_0^2 + sigma_1^2) appears in both; if you call this constant k, then we can write the same equations as

mu_fused = mu_0 + k * (mu_1 - mu_0),
sigma_fused^2 = sigma_0^2 - k * sigma_0^2.

This k plays the role of the Kalman gain in the update stage.
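A quick numeric check of the fusion formulas; the two readings below are made-up values for illustration.

```python
def fuse(mu0, var0, mu1, var1):
    """Fuse two Gaussian estimates of the same quantity."""
    k = var0 / (var0 + var1)        # gain: how much to trust the second reading
    mu = mu0 + k * (mu1 - mu0)
    var = var0 - k * var0           # always smaller than var0
    return mu, var

# Two noisy readings of the same vehicle position.
print(fuse(10.0, 4.0, 12.0, 4.0))   # -> (11.0, 2.0)
```

Note that the fused variance is smaller than either input variance: combining two uncertain readings gives a more confident estimate.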

Virtual Reality Engineering
Dr. M. Manivannan
Department of Biomedical Engineering
Indian Institute of Technology, Madras

Lecture - 88
Introduction to Extended Kalman Filter


We are talking about the extended Kalman filter, usually called EKF in the reading material and in textbooks. Just to recollect the Kalman filter we looked at: we have the state vector at t plus 1, and we wanted to predict this state vector from the earlier state. It is some function of the earlier state vector, plus another function of u(t), the control vector, plus some noise:

x(t+1) = F * x(t) + B * u(t) + w(t),

where x(t) is the state vector at the earlier time stamp, u(t) is the control vector at the current step, and w(t) is the process noise.
And similarly, we have the measurement as some function of x(t), plus a measurement noise:

z(t) = H * x(t) + v(t).

These are the two equations we saw in the last class about the Kalman filter.
So, we mentioned these functions: F, B, and H. They are assumed to be linear functions. If they are linear, this algorithm works fine, but in most real cases they are non-linear functions. If they are non-linear, then we cannot use the Kalman filter as it is; we need to linearize the non-linear equations before we use them for the VSLAM techniques.
So, EKF is nothing but linearizing the non-linear functions about the current mean and variance. It is an approximation of the non-linear functions, which is what we have in most real situations; that is what the extended Kalman filter is.
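As a sketch of what "linearizing about the current mean" looks like in code, here is a minimal EKF predict/update step that builds the Jacobians numerically. The motion and measurement functions at the bottom are stand-in examples, not the ones from any particular VSLAM system.

```python
import numpy as np

def jacobian(f, x, eps=1e-6):
    """Numerically linearize f about x: the EKF's core approximation."""
    fx = f(x)
    J = np.zeros((len(fx), len(x)))
    for i in range(len(x)):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (f(x + dx) - fx) / eps
    return J

def ekf_step(x, P, z, f, h, Q, R):
    # Predict: propagate the state through the non-linear motion model f.
    F = jacobian(f, x)
    x_pred = f(x)
    P_pred = F @ P @ F.T + Q
    # Update: correct with the measurement z via the non-linear model h.
    H = jacobian(h, x_pred)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)   # Kalman gain
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy example: state = [position, velocity], range-only (non-linear) sensor.
f = lambda x: np.array([x[0] + x[1], x[1]])
h = lambda x: np.array([np.hypot(x[0], 10.0)])
x, P = np.array([0.0, 1.0]), np.eye(2)
x, P = ekf_step(x, P, np.array([10.2]), f, h, 0.01 * np.eye(2), np.array([[0.5]]))
print(x)
```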
Again, I am not going to go into the details of the mathematical equations. I am going to give a tutorial on the website; I am going to ask you to go through the tutorial and then do a very simple assignment, as required in the course. That will help you pick up the basic concepts of the extended Kalman filter which are necessary for visual SLAM.

Now, with the EKF, how can we do visual SLAM? In visual SLAM, specifically, we are going to look at monocular SLAM. Monocular SLAM involves four steps; the first step is the identification of landmarks from the images, that is, identifying landmarks from the video frames. These landmarks are also called features.
Examples of these features are edges or corners; the many edges and corners in a typical image can be used as landmarks. Identifying the landmarks in an image is the first step of monocular SLAM, and then the landmarks are matched across successive frames; we are talking about video frames here, so the video frames are matched.
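As an illustration of these first two steps, here is a sketch using OpenCV's ORB detector to find corner-like features and match them across two consecutive frames. The file names are placeholders.

```python
import cv2

# Placeholder file names for two consecutive video frames.
frame0 = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
frame1 = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Detect corner-like landmarks and compute their descriptors.
orb = cv2.ORB_create(nfeatures=500)
kp0, des0 = orb.detectAndCompute(frame0, None)
kp1, des1 = orb.detectAndCompute(frame1, None)

# Match the landmarks between the successive frames.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des0, des1), key=lambda m: m.distance)
print(f"{len(matches)} landmark matches between frames")
```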

So, in visual SLAM, the camera state is nothing but the pose. The pose can be written either in quaternion or vector notation, or whatever notation you want to use. And then we have the feature states; each feature state f_i is, let us say, (x_i, y_i, z_i). And then the EKF system state can be written as x = (camera, f_1, f_2, ...): the camera pose followed by feature one, feature two, and so on.
So, this is the system state, and for this state we can also come up with a covariance matrix; the estimation or the measurement from the camera will also have a variance. The prediction stage as well as the update stage can then be used to accurately estimate the pose as well as the map. Again, in all of these algorithms, the pinhole camera model is used. In the pinhole camera model, we have one image plane and the origin here; from this origin we have the x axis, the y axis, and the z axis, which goes through the image plane; that is the z axis of the camera.
Any point on the image then corresponds to a ray; let us say this point p is (x_i, y_i), then the ray passes through p = (x_i, y_i, z_c), where z_c is the depth of this particular image plane. So, all the points on the plane have the depth z_c; that is the pinhole camera. With this simple camera model and the EKF algorithm, we can recursively estimate the pose as well as the map of the environment; that is monocular SLAM.
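A minimal sketch of the pinhole model just described: projecting a 3D point onto the image plane and recovering a point on a pixel's ray. The focal length and principal point below are assumed values.

```python
import numpy as np

# Assumed intrinsics: focal length and principal point, in pixels.
fx = fy = 500.0
cx, cy = 320.0, 240.0
K = np.array([[fx, 0, cx], [0, fy, cy], [0, 0, 1.0]])

def project(point_3d):
    """Project a 3D point in the camera frame onto the image plane."""
    u = K @ point_3d
    return u[:2] / u[2]                  # divide by the depth z

def backproject(pixel, depth):
    """Recover the 3D point on the pixel's ray at a given depth z_c."""
    x = (pixel[0] - cx) / fx * depth
    y = (pixel[1] - cy) / fy * depth
    return np.array([x, y, depth])

p = project(np.array([0.1, -0.2, 2.0]))
print(p, backproject(p, 2.0))            # round-trips to the original point
```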

So, again, to make it a little clearer: let us start with the video frame. From the video frame we extract features; from the features we build the measurement model; from the measurement model we have the EKF update. The EKF update also needs a prediction, the motion model prediction. And from the motion model prediction, the predicted feature positions can again be an input to the feature extraction. This motion model prediction can also take input from an IMU; these are the inputs.

Virtual Reality Engineering
Dr. M. Manivannan
Department of Biomedical Engineering
Indian Institute of Technology, Madras

Lecture - 89
Grand Challenges in VR/AR
Welcome back. In this class, we will discuss the grand challenges in virtual reality and augmented reality. We are almost at the final stages of this course, and in this lecture we will summarize some of the topics we discussed in the earlier classes and then see what their challenges are in the coming days.
Specifically, we will focus on the most difficult concepts, the technical challenges in virtual reality and augmented reality. As we said in the first classes, the whole of virtual reality rests on two pillars: immersion and interaction. So, we will look at the grand challenges in immersion first and later on the grand challenges in interaction. (Refer Slide Time: 01:30)

And if there are a few other technical challenges beyond immersion and interaction, we will see them after the immersion and interaction challenges. Let us start with the immersion grand challenges. We are talking about immersion, then interaction, and then the other challenges; these are the three topics we will discuss today.
We will start with immersion. Within immersion we are talking about visual immersion, auditory immersion, haptic immersion, and the other sensory immersions. Of course, visual immersion is the most important aspect of virtual reality and augmented reality, so we will see visual immersion first and later go into the other sensory immersions.
As far as visual immersion is concerned, one of the major technical challenges is pixel density. We talked in one of the earlier classes about the pixel density required so that a human does not see any pixelization in virtual reality displays.
What is just enough pixel density? In one of the classes we derived some numbers from the physiological limits of the eye, and later on we talked about the limiting resolution using perception; for example, we looked at contrast sensitivity functions, and using those we derived some numbers.
For example, we talked about some 10 pixels per mm as the current state, whereas what is required is about 80 pixels per mm; these are approximate numbers we got earlier. For example, one of the latest VR gadgets in the market is the Vive Pro, the professional Vive, which has a display resolution of something like 2880 x 1600 pixels.
That is about 615 dpi, or about 25 pixels per mm. So, our goal is to reach about 80 pixels per mm, whereas we are somewhere near 25; it will probably take about another 5 years to get there.
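A quick sanity check on the pixel-density figures above; dpi and pixels per mm differ only by the 25.4 mm per inch factor, and the 615 dpi value is the commonly quoted Vive Pro panel density.

```python
MM_PER_INCH = 25.4

dpi = 615.0                       # quoted Vive Pro panel density
px_per_mm = dpi / MM_PER_INCH
print(f"{px_per_mm:.1f} px/mm")   # ~24.2, i.e. roughly 25 px/mm

goal = 80.0                       # target pixel density from earlier classes
print(f"gap to goal: {goal / px_per_mm:.1f}x")   # about 3.3x short
```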
So, pixel density is one thing; the next major thing is the field of view. What is the necessary field of view? The latest Vive Pro has a field of view of about 110 degrees, whereas in one of the earlier classes we saw that we need about 240 degrees to give the feeling of immersion. There are also some studies which say that field of view is very important to avoid cybersickness, motion sickness, and the other discomforts of using VR or AR displays.
But there are a few other studies, very few, which found that field of view does not really matter: even with a 240 degree field of view, users still get motion sickness or cybersickness.
There are a few HMDs which are aiming at about a 200 degree field of view, or something like that. So, instead of going with a flat display, a few novel HMDs in the market have bent displays so that the field of view can be increased. The flat displays used in HMDs can have only a limited field of view, but if the same flat display is bent, it can increase the field of view.
A few such HMDs are already available in the market. The ideal, full-spherical field of view is going to be 360 degrees by 180 degrees. That will be our aim; reaching that is going to be the challenge, and it will probably take about another 10 years to get to that stage.
(Refer Slide Time: 08:26)

The third challenge is latency. Again, as we discussed in one of the earlier classes, latency is the time lag between when the user actually moves and the perception of that motion in the display. There are a few displays already in the market with a latency of about 20 to 30 milliseconds, which is barely perceivable, but what is actually needed is 2 to 10 milliseconds.
For example, the Oculus has 20 to 30 milliseconds, the Vive about 17 milliseconds, the Gear VR 20 milliseconds, and the PSVR about 18 milliseconds. So, what we really want is to aim at 2 to 10 milliseconds. There are software techniques that can be used to reduce the existing latency towards the required latency. For example, Oculus uses a technique called asynchronous time warping: essentially, we can estimate what the next frame is going to be and render that frame ahead of time. This is to reduce the latency.
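As a very rough sketch of the idea behind time warping (not Oculus's actual implementation): just before scan-out, the last rendered frame is re-projected with the newest head rotation, which for a rotation-only correction is the homography H = K R K^-1. The intrinsics and rotation below are assumed values.

```python
import numpy as np
import cv2

# Assumed intrinsics of the rendered view.
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])

def timewarp(last_frame, yaw_rad):
    """Re-project the last rendered frame for a small new head yaw."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])   # rotation since render
    H = K @ R @ np.linalg.inv(K)                       # rotation-only homography
    h, w = last_frame.shape[:2]
    return cv2.warpPerspective(last_frame, H, (w, h))

frame = np.zeros((480, 640, 3), dtype=np.uint8)        # placeholder frame
warped = timewarp(frame, yaw_rad=0.01)                  # ~0.6 degree head turn
```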
So, this method pre-renders: it warps an image before the scene hits the visual display, so that we can reduce the latency. There are a few other techniques as well; again, what we need to achieve is 2 to 10 milliseconds. The next challenge is to ensure the quality of the visual virtual content. For the quality of the virtual content, the number of polygons is a very crude measure: the number of polygons that can be processed by the graphics pipeline per second.
There are NVIDIA graphics hardware and AMD hardware specializing in virtual reality and augmented reality. Specifically, they have SDKs and hardware for VR and AR. For example, NVIDIA has VRWorks; both Unity and Unreal Engine support VRWorks, and we can use VRWorks, which is implemented in hardware in the NVIDIA graphics cards.
So, in Unity, we can then render many more polygons and therefore increase the quality of the virtual content. The number of frames per second is another crude measure of the quality of the virtual content; we are looking at at least 50 to 100 frames per second. Again, the graphics hardware is going to help here; AMD has LiquidVR; these are the SDKs.
So, again, LiquidVR is supported in Unity and Unreal Engine; you can download it and start experimenting with it. Then the third measure could be the depth cue; it is very difficult to provide depth information.
The quality of the virtual content is really about the 3D immersive environment. The 3D immersion comes from the depth information, and how well we can render depth in the virtual environment is still a challenge for a lot of researchers. Which depth cues to use is still active research. So, depth perception is still active research, and if the depth is not rendered properly, the user will get cybersickness very quickly.
So, it will cause a lot of health effects, which we are going to see later. As far as depth perception is concerned, I am going to emphasize another important point, which is still a grand challenge.
(Refer Slide Time: 14:45)

I would call it depth perception. Within depth perception, one of the most important grand challenges is called the vergence-accommodation conflict, the VAC in the literature. There is a conflict between vergence and accommodation. How are we going to solve this issue? That is one of the major issues still being researched.
(Refer Slide Time: 15:40)

And then the fifth major challenge is aligning the virtual content with the real environment. This is where AR comes into the picture: how quickly the algorithms we saw in the last videos, the extended Kalman filter used in monocular SLAM, visual SLAM, or SLAM in general, can do this.
In general, how accurately and how quickly can we align the virtual content with the real environment? In most situations, the scene does not have any features; features in the sense that there are no sharp edges, no sharp corners, whereas SLAM assumes that there are edges and corners.
If the scene does not have any sharp edges and is all very curved, very smooth, then feature extraction is going to be extremely difficult, and when no features are available, simultaneous localization and mapping is going to be extremely difficult. It is still a challenge, still an active area of research as far as AR is concerned. For a general scene where there are no edges and corners, what kind of other, more general features can be used for SLAM is still a challenge. So that is about visual immersion.
Similarly, we have auditory immersion. We are talking about how to provide real auditory immersion in virtual reality. Again, a very high quality HRTF (head-related transfer function) is necessary for very good sound rendering. But the HRTF that most 3D audio rendering algorithms use is for a general user: it is averaged from many users. Individual users will each have a different HRTF, whereas this one is for the general user.
So, audio rendering with a personalized HRTF is going to be the main challenge in the coming days. We do not know yet how important this is, but in order to improve the auditory immersion effect, it may be important. Similarly, we can talk about haptic immersion. For haptic immersion, as we saw in earlier classes, we should be able to give touch feedback without the feeling of wearing heavy objects.
So, we need to render force feedback without adding inertia to the user. As of now, there are gloves available, cyber gloves, which add a lot of weight to the user's hand, and therefore inertia, and therefore change the feeling of immersion. So, ideally, can we feel the force feedback without any of these gadgets on the hand? That is a challenge.
How do we feel the force feedback? It is still a challenge; most virtual reality without force feedback is not very useful, as the user feels the realism is missing without the haptic feedback. So, the visual immersion, the auditory immersion, and the haptic immersion are all very important.
(Refer Slide Time: 21:38)

Recently, Apple has come up with an SDK, ARKit, which specifically talks about improving the visual immersion as well as the auditory immersion and the haptic immersion; you can take a look at it. Similarly, Google has also come up with another SDK called ARCore.
So, if you are not aware of these things, I am going to request you to pause the video for a minute and get yourself exposed to those kits, which may be very important in the coming years. There you can see that, in order to improve immersion, there is a war between the major software giants to come up with better SDKs.
Apart from this, there is also, recently, Magic Leap. Magic Leap is one of the AR glasses; it is much lighter than the Microsoft HoloLens, and it has a lot of other features. I think the full specification itself is not out yet. Again, you can look forward to seeing advanced features in Magic Leap.
One of the most talked-about features of Magic Leap concerns virtual objects in AR together with real objects: there are real objects which are behind the virtual objects, or in front of the virtual objects.
Usually, the occlusion does not happen where it is supposed to happen. Magic Leap demonstrates that when virtual objects are in front of or behind real objects, the occlusion is taken care of very well, whereas in the other kits, say ARKit or ARCore, that is not happening yet.
And also, Magic Leap is very lightweight. So, that is about the immersion challenges; let me stop here. (Refer Slide Time: 25:21)

Now, let us look at the interaction challenges. Among the interactions, we are talking about head tracking, hand tracking, object tracking in virtual reality, and then the sound interactions and the haptic interactions.
In each of these there are challenges; head tracking itself is a lot of challenge. The HMDs we are talking about have to be very light; as of now, HMDs weigh as much as 1 kilogram. If the HMD weighs around 1 kilogram, we cannot wear it for a long time, and even if you do wear it for a long time, you get used to the weight on your head.
And when you come out of virtual reality, you get cybersickness: you have adapted to having a lot of weight on your head. It is suggested that an HMD should weigh much less than 1 kilogram; in the future, all the big companies are racing towards making Magic Leap kind of devices. It should be like sunglasses; it should not be much heavier than sunglasses.
But we should still be able to render virtual objects in augmented reality and experience them; that is the direction we are looking at. As of now, HMDs weigh about 1 kg. So, imagine how much sunglasses weigh versus how much the current HMDs weigh; that is a long way to go.
Hand tracking, again, without any gadgets added to the hand, is a challenge. There are techniques available, say Leap Motion or RealSense, which track the hands, but reliable tracking is one of the important challenges.
What is the accuracy of hand tracking using Leap Motion or RealSense? That is again a big challenge. How do we improve the accuracy of hand tracking, and not only hand tracking but also finger tracking? Because in a virtual reality environment it is not just about moving around, not just navigation; in the future many applications will come where precise finger or hand movements are necessary.
For example, in skills training, precise hand tracking is still one of the important challenges here. Again, object tracking: there will be many real objects in the environment, and these objects should be tracked and the corresponding virtual objects updated; accurate object tracking is important.
The Vive tracker has solved this problem in some ways, but not without adding inertia to the objects. For example, if I have a very light object, we cannot put a Vive tracker on it, since the tracker itself weighs a couple of hundred grams. So, if the object itself is 10 grams and you add the Vive tracker, then it kills the immersion of interacting with the object; that is again a big challenge.
As I mentioned, haptic interaction is going to be a big challenge: without adding inertia to your hand, without adding extra gadgets to your hand, how do we give force feedback? That is still a challenge. And then there is sound interaction, which we also talked about. Speech recognition is still a challenge; most speech recognition engines implemented in VR or AR environments are trained on one particular user, but they have to recognize an entirely new voice.
So, that is going to be a future challenge. If you have a lot of data, maybe much better machine learning algorithms can be used to recognize the speech of an entirely new user; that is still a research topic.
Of course, we also talked about eye tracking, and the foveated rendering we covered in one of the earlier classes. Locomotion: leg and locomotion tracking is again a big challenge. There are a few gadgets or products available which let you walk around.
But there are still challenges here; the gadgets are very big. Without adding much inertia to the legs, or restricting your motion, how do we give an immersive effect of locomotion? It is a challenge. So, you can see that there are many more interaction challenges. Apart from those interaction challenges, there are other challenges.
The other challenges are something like this: how to let you build VR systems, VR environments, quickly and easily. There are Unity and Unreal Engine, which you are already aware of; they are solving this problem. Their aim is to help you build virtual reality systems quickly, but there are challenges.
For example, if you look at Unity, some things are not possible; for example, rendering soft objects is still a challenge. A very simple soft object you can render, but very complex soft objects, which are necessary for some specialized applications, are still a challenge. For example, soft objects interacting with soft objects in Unity is going to be a challenge; in Unreal Engine, soft objects may be possible, but there are other challenges in Unreal Engine.
So, coming up with an SDK like Unity or Unreal Engine, with a lot of features, that at the same time lets you build the virtual environment quickly, is a challenge. An SDK can restrict its features so that you can speed up development, but having many features and at the same time speeding up development is still a challenge.
So, that is where Apple's ARKit and ARCore are helping you build much better systems. One more challenge, as far as VR or AR systems are concerned, is that the computer systems doing the number crunching, along with the graphics hardware, have to be very small.