Alison's New App is now available on iOS and Android! Download Now
In this unit, I will try to define the notion of information, a mathematical notion of information, which we will be able to use for practical applications as well. In doing so, we will have to review some basic concepts of probability, which is the underlying mathematical tool that will be used throughout this course. So, let us get started.
So, let us start with today's class, with the following question. What is information? Now, when I raise this question in my class, some students answer information is power. Well, there are classes where that may be an acceptable answer, but this is one of these, this is not one of those classes. Here, we are looking for a mathematical measure of information, a concrete notion of information that will also allow us to measure information. So to help us motivate the answer to this question, let us try to think of the following, how much information will I reveal when I answer the following questions to you.
So, you look at these questions, these are very simple fill in the blanks. So the first question here is you see the letter T, you see the letter H and then you see a blank, what do you think is the answer to the blank? Well, here is what comes to my mind. This letter can be either E making the word THE, or it can be Y making the word THY, nothing else that comes to my mind. The second question is similar, maybe the answers which come to your mind are letters A, making the word CAT, letter O making the word COT, or the letter U making the word CUT. So, when you see these questions, you probably already have this guess list of answers, you know that the answer is one of these three. Similarly, for the last one, the words that, the letters that come to my mind are letters H, C making it CAT, P making it PAT, and maybe you can have some more guesses here, but when you see this question some guesses come to your mind.
So, that you already know, so given that when I answer this question how much information do you all reveal. Let us look at some other question. Here is a very vague question, what is the next word I am going to say. Well, you will have a long guess list and I will say one of those words.,Here is another very vague question, what is the 20th decimal place of pi? Now, if you have a computer, you can write a small program and it will give you precisely the 20th decimal place of pi, answering this question for you. But, so when I tell the answer to this question to you, how much information do you think I reveal to you? How do I measure that information?
So here is something we have observed that when I ask a question you already have a guess list of answer, but you are not sure which one is correct. When I answer, what I help you figure out is, I help you narrow down your guess list to a fixed answer. I basically reduce it uncertainty.So if you take a look at this example, if I just show you letter CO, you have a guess list, it can be COOL or COAL or COAT, and suppose now I reveal the next letter to you, I tell you actually the next letter is A then I have reduced your guess list to now only two possibilities from three in the beginning. And that is the informativeness of this revealing of this A, that is how informative my answer A was to you.
So this suggests a very simple definition of information. Information is the reduction of uncertainty; that is your definition of information; information is reduction of uncertainty. So in this course, we will try to formalize this principle that information is reduction of uncertainty and in particular, to use this principle, to be able to use this principle to quantitatively measure information, we need to figure out what is this uncertainty, what is, how do we measure uncertainty? So, we have replaced the question of measuring information with that of measuring uncertainty and information is simply the reduction in uncertainty. So we will justify this principle slowly we will build a little theory around it and this theory will help us do many things, it will help us do compression, communication, computation, and statistics; and we seek a quantitative theory of information and this particular notion of information, a measure of information is simply reduction of uncertainty, will allow us to come up with a quantitative theory of information.
How do we model uncertainty, how do we measure uncertainty, that is the question we now have to answer because we decided that for us information will be reduction in uncertainty.
So one possible, very simple answer to this question is we simply use the number of possible values in a guess list as the measure of uncertainty. So when I raise the question to you, you have a guess list of answers and the number of possible values, the size of that guess list is a measure of uncertainty. So let us take few more examples.
Suppose, I tell you that I have a prime P in my mind and that prime is less than 10, so you know that if it is less than 10, there are just four possibilities for this prime P, it can either be 2, 5, 7; 2, 3, 5 or 7; so there are just four possible values and therefore, with this measure of uncertainty, the number of possible values, the uncertainty of this P, the uncertainty in this unknown P is just 4.
Let us look at another question. Suppose, I showed you this letter, this word, C question mark T, and I told you that I have a word in my mind and it is a three-letter word, the first letter is C and the last letter is T and the middle letter is something I am not sure about. Then, in this case, you know that there are three possible values of this middle letter and therefore, your uncertainty here is 3. Let us look at a slightly more complicated example.
We think of this auto-completion in an email client, so I use Gmail sometime and Gmail’s auto-completion basically, tries to complete the sentences for me. So here is a sentence that I have that I try to type on my email client and perhaps, you cannot see it let me read it for you.
So it says - Dear all, there will be no information theory class today. A make-up class will be held on Tuesday. Sorry for, and then my clients suggest, the confusion. At this point, there could have been a, there could have been various guesses for completing my sentence, for the confusion was one, that is what my email client chose. But, it could have been, for the inconvenience; there could have been many more, but perhaps, these are the only two possibilities which my email client had in mind. So uncertainty for my email client is 2.
For you, there are many more possibilities because you have not seen my emails, you do not know what I typically write when I am writing my email, and so for you, this uncertainty is much larger. So this was another interesting example. Let us look at an example of different kind.
Suppose, I told you that I have a vector, a three-dimensional vector in my mind, which lies within a unit ball, a Euclidean ball of radius 1, and I told you that this vector can be stored using 16 bits per coordinate, what is the uncertainty of x? Well, this question is not very concrete, but maybe, you can think about the answer, what are the possible values of this x, how many such x are possible?
So I will leave you to think about this question but, let us move forward with another example this is also another interesting example.
So earlier, I told you that I had a prime in my mind and it was less than 10. Suppose, now I told you that I have a prime in my mind and it is less than 10 to the power 100, what is the uncertainty now? Well, now the guess list is really big, you need a really large computer to list down all the primes less than 10 to the power 100 and count the number of possibilities, and that is the uncertainty.
So, each of these examples are of different kinds. The first example where I told you that I have a prime P less than 10 in my mind, that example is very concrete all of us will, may agree that the uncertainty in this example is just 4, there are just four possibilities there.
The second example when I told you that I have three-letter word in my mind and the first letter is C and the last letter is T, well, that is a little bit less concrete. Based on how good your vocabulary is, different words may come to your mind but for most of us, that list would be either CAT, CUT, or COT.
Example three, which was the example of my Gmail client, email client, everyone can think of many other guesses and your guesses will depend on how well you know me. So my email client, which knows me really well because it reads all my emails, for that that guess list was very small, it just narrowed it down to two or three guesses; “sorry for the inconvenience”, “sorry for the confusion”, perhaps “sorry for the complication”, “sorry for the late notice”, it could have guessed any of these.
But for you, “sorry for the pizza” is a valid choice because you have no clue what I am going to say next because you do not know me well enough. So, what this example brings out is that this measure of uncertainty depends on who is trying to answer the question.
Example four, which was this unknown vector, three-dimensional vector within a unit ball, there a different aspect comes out. The number of guesses that you have depends on the acceptable accuracy. So, there can be many, there are infinitely many vectors in the unit ball, but maybe the person asking the question, that is me here, will accept an answer to a certain accuracy, and based on that accuracy there can be finitely many guesses and that is the measure of uncertainty here, it is related to what accuracy is acceptable to you.
The final question, when I just told you that I have a prime in my mind and it is less than 10 to the power 100, well, there the possible guess list can be very large actually, and it depends on how much computation you have that you can form your guess list.
Someone with a very large computation at their disposal can form a very large guess list, someone with smaller computation at their disposal will make smaller guess list. There is another interesting aspect that the uncertainty also depends on computation that is available to you.
So one more point, maybe this is slightly different aspect. Till now we were answering, we were measuring uncertainty as the number of possible values but suppose, we need to store the answer to the previous questions, how many bits will it take? What we will do is, we will look at the number of possible answers and take the log of that, log to the base 2, and that is the number of bits it will take. So, this log to the base 2 of the number of possible answers is the number of bits it will take to store the answer.
So, instead of just saying that the cardinality of my guess list is my uncertainty if I want to measure uncertainty in bits I can say the log of that cardinality, log to the base 2, is the measure of uncertainty. Later, we will see in this course why this makes more sense than just saying the cardinality is the measure of uncertainty; the log of cardinality makes more sense, we will see that point.
So from these examples, some very interesting lessons come out. First, uncertainty can vary based on your knowledge of belief, different people can have different amount of uncertainty for the same question.
Second, all the elements in your guess list may not be equally likely, you may have different likelihood for different guesses, for instance, when I told you that I am thinking of a word, a three-letter word, which starts with C and ends with T, perhaps, the most likely answer is CAT; perhaps, the second most likely answer is COT; maybe, some of you get to know that I have a young kid and then you may think that the most likely answer is COT and second most likely answer is CAT.
So these guesses do not come out just as a guess, each guess has a likelihood of occurring in your mind. Third aspect, which we saw in that example of an unknown x in the unit Euclidean ball, in the unit Euclidean ball is the notion of accuracy. The accuracy up to which answer is acceptable also determines the measure of uncertainty. Well, this point is not very important for now, but we will come back to it. So these are some of the lessons which come out.
So one very important thing that I have touched upon here is that all guesses are not equally likely and they depend on, and how you rank order the guesses which guesses are more likely for you and which are less, depend on your previous knowledge of belief.
Motivated by these things what we will do is, we will use not only the cardinality of guess list as a measure of uncertainty we will have some more refined information about it, we will use probabilities to model the likelihood of different guesses. So, not only we will have a list of guesses, we will also have probabilities associated with each guess to model the likelihood of each guess.
So, here is what we will do, I am just setting up some basic notation for the course. All right, so throughout this strange-looking X is my calligraphic X, it is my set of guesses, set of all possible answers that I can have. The set X, in most cases, will be a discrete set but in some cases, we will also look at continuous.
So, discrete set is something like natural numbers, something like numbers between 1 and 10; a continuous set is something like reals, which is infinitely many, uncountably many points. The notation here is notation for cardinality of a set, and we assume that this set X, the set of all possible answers, has k possible values, number of possible outcomes; it is some number k because it is a discrete finite set we can have this number k.
For each guess X in this set, we have a probability P x associated with that answer. So the answer x is the correct answer in our view with probability P x, this moderates the likelihood of answer x being correct. So, with this we have a list of likelihoods P x1 to P xk, remember there are k elements in this set, so we have a list of likelihoods P x1 to P xk which represent how likely our guesses are in our view and this provides a ranking between the guesses as well.
One question will come to your mind, may come to your mind, who came up with this P? Well, this answer can only be subjective at this point, so this P is with the answer, the person who is trying to answer the question. So he or she came up with that guess list, he or she came up with also the likelihoods of different guesses, it depends on his or her knowledge and, or you can think of it as some general wisdom or some common knowledge which everyone has agreed on on this guess list.
But, for this course, we will not worry about that, we will assume that this guess list is available and different answers have different possibilities. For example four, which was that three-dimensional vector example, we need not just look at discrete probabilities like here, we also need to look at continuous distributions.
So with this in mind what we realize is that to measure and model uncertainty, we have to use probability. So what I will do next, I will try to review some basics of probability, some basic probability, maybe advanced undergraduate probability, so that we can build on it and have a concrete mathematical model of uncertainty, that is what we will do next.
Log in to save your progress and obtain a certificate in Alison’s free Diploma in the Foundations of Information Theory online course
Sign up to save your progress and obtain a certificate in Alison’s free Diploma in the Foundations of Information Theory online course
Please enter you email address and we will mail you a link to reset your password.