An Introduction to Attention Models in Computer Vision

Learn about the applications of attention mechanisms in various computer vision tasks with this free online course.

Publisher: NPTEL
Humans process visual information by focusing on the parts of an image that carry the most relevant information. Have you ever wondered whether neural networks can mimic this cognitive skill? The primary objective of this course is to show how attention mechanisms determine which parts of the input data matter most for a specific output. Study the various kinds of attention mechanisms applied to computer vision tasks.
Duration: 3-4 Hours

Attention mechanisms in computer vision are algorithms that imitate the way humans selectively observe a scene. The course explains how a model interprets an input image by focusing on its most important parts and giving less weight to the rest. It begins by describing the technical features of recurrent neural networks in deep learning. Then, you will explore techniques for focusing on a specific segment of the input in complex datasets, including how neural networks rank the significant features of an image. Next, you will learn about the connection between autoencoders and principal component analysis (PCA). The course explains how text captions are generated for an image using computer vision techniques. Then, you will study how Long Short-Term Memory (LSTM) networks help improve the quality of image captions.
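The idea of focusing on the vital parts of an image while fading the rest can be illustrated with a minimal soft-attention sketch. This is not taken from the course material: the region features, the query vector, and the function names below are hypothetical, and dot-product scoring is only one of several possible scoring choices.

```python
import math

def softmax(scores):
    """Turn raw relevance scores into positive weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def soft_attention(region_features, query):
    """Weight each region by its dot-product similarity to the query.

    Returns the attention weights and the weighted sum (context vector).
    """
    scores = [sum(f * q for f, q in zip(feat, query)) for feat in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context

# Hypothetical toy data: three image regions, each a 2-dimensional feature vector.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]  # hypothetical query that favours the first feature

weights, context = soft_attention(regions, query)
print(weights)  # regions 0 and 2 receive equal, larger weights than region 1
print(context)
```

Regions whose features align with the query receive larger weights, so they dominate the context vector, while poorly matching regions are faded rather than discarded.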

Next, the course explains the notion of visual question answering (VQA) in computer vision. You will discover how questions target various aspects of an image and its context. This section includes the procedure for building an artificial intelligence (AI) system that answers such queries automatically. In addition, you will explore the various VQA datasets and models used to answer questions based on images. Following this, you will study visual dialogue, in which an AI agent holds a meaningful conversation with humans about visual content. You will discover how individual responses can be evaluated and progress benchmarked. The evaluation protocols for visual dialogue are described. You will study the importance of deep neural networks in building encoder-decoder architectures for meaningful visual dialogue.

Finally, the course illustrates different applications of attention models. You will discover how Neural Turing Machines perform read-write operations and how the Deep Recurrent Attentive Writer (DRAW) mimics sequential drawing. Strategies for increasing geometric invariance using spatial transformer networks (STN) are highlighted. You will be taught about self-attention and transformers in computer vision. You will discover how the inputs interact with each other to determine which areas require more attention. You will learn how self-attention mechanisms have replaced recurrent neural networks (RNNs) in modelling dependencies. Lastly, the course explains how transformers speed up the training of attention models. Would you like to understand how predictive and analogical human behaviours are modelled in computer vision using mathematical and computational methods? This course aims to address this fascinating question. Enrol today and master attention models in computer vision.
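As a rough illustration of the self-attention idea mentioned above, in which the inputs interact with each other to decide where attention should go, here is a minimal scaled dot-product sketch. It is a simplification and an assumption on our part, not the course's own code: queries, keys, and values are all set equal to the inputs, whereas a real transformer applies learned projections first. The toy `tokens` are hypothetical.

```python
import math

def softmax(row):
    """Normalise a list of scores into weights that sum to 1."""
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Scaled dot-product self-attention with Q = K = V = x.

    Assumption: no learned projection matrices, to keep the sketch short.
    Returns the output vectors and the attention weight matrix.
    """
    d = len(x[0])
    scale = math.sqrt(d)
    outputs, all_weights = [], []
    for q in x:
        # Score this input against every input, including itself.
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in x]
        weights = softmax(scores)
        # Each output is a weighted average of all inputs.
        out = [sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)]
        all_weights.append(weights)
        outputs.append(out)
    return outputs, all_weights

# Hypothetical toy inputs: three 2-dimensional vectors (e.g. image patches).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, attn = self_attention(tokens)
for row in attn:
    print(row)
```

Because every input is scored against every other input in one step, no recurrence is needed to relate distant positions, which is the property that lets self-attention stand in for an RNN when modelling dependencies.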
