### Measuring Randomness in a Random Variable


So, now that we have seen a definition of a distance between distributions, namely the total variation distance, which is a good measure of the distance between two distributions, we return to the question of measuring the amount of randomness in a random variable. We start with the first question: how many perfectly uniform random bits can be extracted from a given random variable? In fact, we will relax the requirement a little bit; instead of asking for exactly uniform random bits, we will ask for an approximately uniform random variable, that is, almost uniform bits. Here is the formal version. Given a single sample X from a distribution P, we want to generate a uniform random variable U out of X by applying some function to X. You choose your accuracy epsilon, a number between 0 and 1; you can even imagine something large like 0.99. The question is: what is the largest value of L such that we can find a function f which takes X as input, produces L bits, and satisfies the following bound? The bound says that the output f(X) is approximately uniform on L bits, just like a random L-bit string; the approximation is in distribution, meaning the total variation distance between f(X) and U is at most epsilon. That is the requirement; what is the largest such L we can achieve? You may be a bit worried about this error and how it will affect the quality of the randomness you have generated. Remember that the total variation distance can be expressed as the supremum, over all events A, of the difference between the probability of A under the first PMF and under the second.
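To make the distance concrete, here is a minimal sketch (not from the lecture; all names are illustrative) computing the total variation distance between two PMFs both as half the L1 distance and as the supremum over events, the form the transcript appeals to:

```python
# Total variation distance between two PMFs p and q on a common
# finite alphabet, computed two equivalent ways.
from itertools import chain, combinations

def tv_half_l1(p, q):
    # TV(p, q) = (1/2) * sum over x of |p(x) - q(x)|
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(x, 0.0) - q.get(x, 0.0)) for x in support)

def tv_sup_over_events(p, q):
    # TV(p, q) = max over events A of |P(A) - Q(A)|
    # (brute force over all subsets; only sensible for tiny alphabets)
    support = sorted(set(p) | set(q))
    events = chain.from_iterable(
        combinations(support, r) for r in range(len(support) + 1))
    return max(abs(sum(p.get(x, 0.0) for x in a)
                   - sum(q.get(x, 0.0) for x in a)) for a in events)

p = {"a": 0.75, "b": 0.25}
u = {"a": 0.5, "b": 0.5}   # uniform on the same two symbols
# Both formulas give TV(p, u) = 0.25 here.
```

The supremum form is exactly why the approximation is harmless: any event's probability shifts by at most epsilon when U is replaced by f(X).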
Now suppose you have designed some algorithm which uses randomness generated from P_U. What that formula says is that if you replace U with f(X), then all your error probabilities increase by at most epsilon. So it is not a big loss to replace U with f(X), and that is why it makes sense to seek random number generation in this approximate sense: you do not want perfectly uniform, you want almost uniform. Suppose this is the question you want to solve; what will you do? Here is a very simple "scheme"; I am putting scheme in quotes because it is really only the idea for a scheme. You divide the domain of X into parts with roughly equal probability, and this "roughly" is important: you may not be able to divide it into parts with exactly equal probability, and the mismatch between the probabilities of the parts is what contributes to epsilon. Also, the more parts you have, the larger is L, so you want to divide the domain into as many parts as possible such that each part has roughly the same probability under P; if you can do that, you can make L very large. One small point, but perhaps good to revise here: you can view a function of X as simply a partition of the domain, where each part corresponds to those x for which the function declares a particular output.
For instance, if X has just two elements with probabilities 3/4 and 1/4, then there is nothing you can do about it: the only partition you can create keeps the element with probability 3/4 in one part and the element with probability 1/4 in the other, and there is no way to make this random variable uniform to very high accuracy. So it is important to first take into account how large the probability of the largest element is. In particular, we can work with a sort of typical set T_lambda; lambda is a parameter that I will bring in soon.
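As an illustration of the "divide into roughly equal parts" idea, here is a sketch (with made-up names; not the lecture's exact construction) that greedily assigns each symbol to the currently lightest part:

```python
# Greedily partition the alphabet into 2^L parts of roughly equal
# probability; f maps each symbol to the index of its part, so f(X)
# is an L-bit output that is close to uniform when the parts balance.
def greedy_extractor(pmf, num_bits):
    num_parts = 2 ** num_bits
    parts = [[] for _ in range(num_parts)]
    mass = [0.0] * num_parts
    # Place heavy symbols first, always into the currently lightest part.
    for x, px in sorted(pmf.items(), key=lambda kv: -kv[1]):
        i = mass.index(min(mass))
        parts[i].append(x)
        mass[i] += px
    f = {x: i for i, part in enumerate(parts) for x in part}
    return f, mass

pmf = {x: 1 / 8 for x in range(8)}   # uniform on 8 symbols
f, mass = greedy_extractor(pmf, 2)   # extract L = 2 bits
# Each of the 4 parts collects mass exactly 1/4, so f(X) is uniform.
```

For the two-point distribution (3/4, 1/4) above, no assignment can balance two parts better than masses 3/4 and 1/4, which is exactly why the largest point mass limits the achievable accuracy.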
In this video, we will visit our second notion of a measure of randomness: we want to find the least number of bits L such that, if I give you a uniform random variable on L bits, that is, L independent unbiased coin tosses, then you can find a function f that generates a given distribution P from this U.
So, P_{f(U)} is close to the given distribution P, close in total variation distance up to epsilon. What is the minimum number of bits required? This is the number of bits required to generate this random variable, and we will think of it as a measure of the randomness in it. It is the complementary definition to the one we saw in the previous video. Since this problem is very much related to the one we saw last time, I will jump directly into the scheme.
The scheme is very much like the one we saw last time. There, we were given a random variable with a given distribution and wanted to generate a uniform random variable out of it. What we did was divide the domain into parts such that each part had roughly the same probability; some leftover mass remained, and we dumped it into a final part, which we called X_{M+1}.
But the first M parts were balanced: they had roughly the same probability. Therefore, if you declare different values on these different parts, the resulting random variable is almost uniform, and we showed this formally. The amazing part of the scheme was the number of bits it extracted: it was the entropy of the random variable.
So now we will look at a complementary scheme where we are given a uniform random variable and want to generate a PMF P out of it. This is something that may be very important in practice. You may have seen similar questions in a probability course, where you use the inverse of the cumulative distribution function to do this; what we will do now is very different, because we are accounting for the number of random bits used.
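For contrast, the inverse-CDF method alluded to here draws a single real-valued uniform and does not count random bits; a hedged sketch (illustrative names):

```python
import random

# Inverse-CDF sampling: draw a real U ~ Uniform[0,1) and return the
# first symbol whose cumulative probability exceeds U. This uses an
# "infinite precision" uniform and does not account for the number
# of random bits consumed, unlike the scheme developed in the lecture.
def inverse_cdf_sample(pmf, u=None):
    if u is None:
        u = random.random()
    cumulative = 0.0
    symbols = sorted(pmf)
    for x in symbols:
        cumulative += pmf[x]
        if u < cumulative:
            return x
    return symbols[-1]  # guard against floating-point round-off

pmf = {"a": 0.5, "b": 0.25, "c": 0.25}
# u = 0.6 lands in the "b" interval [0.5, 0.75).
```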
So here is the scheme. We define a new set T_lambda, different from the previous one: unlike the previous one, where lambda was a large-probability lower bound on minus log P(x), here lambda is a large-probability upper bound on minus log P(x), where P is the distribution you want to generate. So T_lambda is the set of x for which minus log P(x) is at most lambda, and we assume that T_lambda has large probability under P.
You can very easily check that for each x in T_lambda, P(x) is at least 2^(-lambda). So what we do is set M to be greater than 2^lambda, with the same lambda, and consider a partition of {1, ..., M}. Since each probability is at least 2^(-lambda), we can say something about the cardinality of the set.
If you sum the probabilities of all the elements of the set, the total must be at most 1; but each element has probability at least 2^(-lambda), so there are at most 2^lambda elements in the set. Now consider a partition Y_1, ..., Y_{M'+1} of {1, ..., M}, where M' equals the cardinality of T_lambda.
So, remember, the cardinality of T_lambda is at most 2^lambda. Why? Because each element has probability at least 2^(-lambda) and the probabilities sum to at most 1, so the number of elements in T_lambda cannot exceed 2^lambda. Therefore this M' is less than M, by our assumption that M is greater than 2^lambda.
Therefore you can form this partition of {1, ..., M} into M' + 1 parts. What do we want out of this partition? We want that for each element x_i of T_lambda (remember, there are M' such elements), the corresponding part Y_i has probability, under the uniform distribution on {1, ..., M}, between P(x_i) and P(x_i) + 1/M.
So the question is, why, or rather how, can we do this? Each element of T_lambda has probability at least 2^(-lambda), and 1/M is smaller than that, since M is greater than 2^lambda. Under the uniform distribution, the probability of a part is just the number of elements it contains divided by M, so it can be tuned in steps of 1/M: keep adding elements to Y_i until its probability first reaches P(x_i), which overshoots by less than 1/M.
You can convince yourself that such a partition Y_1, ..., Y_{M'+1} can be designed; it is just very simple bookkeeping. Here U is the uniform distribution on {1, ..., M}. So this is what we have done: we have constructed M' + 1 parts, just one more than the number of elements in T_lambda, and for each of the first M' parts the probability lies between P(x_i) and P(x_i) + 1/M.
So we roughly have parts whose probabilities match the probabilities of the elements of T_lambda. Now you can imagine what we are trying to do: whenever the uniform sample falls in the part Y_i, we will declare x_i, because we want to generate something from the PMF P. Each part is off by up to 1/M, and that is the error we have to account for.
And this last element x_{M'+1} is some arbitrary element outside T_lambda that you can choose. Now, what is the function we use to generate the PMF? It is very simple: when the uniform sample lands in Y_i, you declare x_i. That is the function.
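The construction just described can be sketched as follows. As an assumption beyond the transcript, I size each block Y_i as ceil(M * P(x_i)) cells, which keeps its uniform probability within 1/M of P(x_i); the sketch presumes the blocks fit inside {1, ..., M}, as in the lecture's regime:

```python
import math

# Sketch of the generation map f: {1, ..., M} -> alphabet. Each x_i in
# T_lambda gets a block Y_i of ceil(M * P(x_i)) consecutive cells, so
# the uniform probability of Y_i is within 1/M of P(x_i). Leftover
# cells form the final part Y_{M'+1} and output a fallback symbol.
def build_generator(pmf, lam, M, fallback):
    t_lambda = [x for x in sorted(pmf) if pmf[x] >= 2.0 ** (-lam)]
    f = {}
    next_cell = 1
    for x in t_lambda:
        size = math.ceil(M * pmf[x])
        for j in range(next_cell, next_cell + size):
            f[j] = x
        next_cell += size
    for j in range(next_cell, M + 1):   # the (M'+1)-th part
        f[j] = fallback
    return f

pmf = {"a": 0.5, "b": 0.25, "c": 0.25}
f = build_generator(pmf, lam=2, M=8, fallback="c")
# Cells 1-4 map to "a", 5-6 to "b", 7-8 to "c": here f(U) has PMF
# exactly P, because every M * P(x_i) is an integer.
```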
So let us analyze the total variation distance between the generated output and the target distribution P; we want to show that they are close. For each element x in T_lambda, we compare the probability that we output x with the probability of x under P; we will sum these differences over all elements and claim the sum is small. Note that this function never declares anything outside T_lambda, except one element.
The only element it declares outside T_lambda is x_{M'+1}. So we must also account for the error from the final part Y_{M'+1}, as well as for the probability of T_lambda complement; you can check the resulting bound. We will bound each of these terms.
So, for the first term, recall how we constructed the partition: the condition, the green one here, says that U(Y_i) and P(x_i) differ by at most 1/M. So each part contributes at most 1/M, and there are M' elements in T_lambda, giving a total of at most M'/M.
The first term was easy, and the third term is just the probability of T_lambda complement. The remaining term, the last one, requires some thought. What we have seen is that for each x_i, P(x_i) is at most U(Y_i); that is the construction of this partition. We already have the bound for the first term, and now we sum this inequality over all i.
Summing P(x_i) gives P(T_lambda), and summing U(Y_i) over the first M' parts gives, since we have partitioned all of {1, ..., M} into just these M' + 1 parts, exactly 1 minus the uniform probability of the last part. Therefore U(Y_{M'+1}) = 1 - sum of U(Y_i), which is at most 1 - sum of P(x_i) = 1 - P(T_lambda), the probability of T_lambda complement; by our assumption this is at most epsilon.
So what we have shown is that the total variation distance is at most half of the sum of three terms: M'/M for the parts in T_lambda, the U(Y_{M'+1}) term, and the P(T_lambda complement) term, each of which is at most epsilon once we substitute our bounds. For the first of these we still need a bound on M'; I put a question here for you to think about, and maybe you can pause, because I will answer it right away: why is the cardinality of T_lambda at most 2^lambda?
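Collecting the three bounds, the chain of inequalities reconstructed from the steps above (a sketch of the argument, not the lecture's exact display) reads:

```latex
d_{\mathrm{TV}}\big(P_{f(U)}, P\big)
  = \frac{1}{2}\Bigg[\sum_{x_i \in T_\lambda} \big|U(Y_i) - P(x_i)\big|
    + \big|U(Y_{M'+1}) - P(x_{M'+1})\big|
    + \sum_{\substack{x \notin T_\lambda \\ x \neq x_{M'+1}}} P(x)\Bigg]
  \le \frac{1}{2}\Big[\frac{M'}{M} + U(Y_{M'+1}) + P(T_\lambda^{c})\Big]
  \le \frac{3}{2}\,\varepsilon ,
```

using U(Y_{M'+1}) <= P(T_lambda^c) <= epsilon, P(T_lambda^c) <= epsilon, and M'/M <= 2^lambda / M = epsilon for the choice M = 2^(lambda + log(1/epsilon)).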
So remember that our construction tells us that each element in T_lambda has probability at least 2^(-lambda); we verified this earlier. If this is true for all x in T_lambda, then adding the probabilities of all the elements of T_lambda, the sum must be at least the cardinality of T_lambda times 2^(-lambda). But this sum must be at most 1.
Because probabilities sum to at most 1, the cardinality of T_lambda must be at most 2^lambda. So you get this bound, and you can choose any M you like; we choose M = 2^(lambda + log(1/epsilon)). Substituting this choice, we get that the total variation distance is at most (3/2) epsilon. And log M is the number of uniform random bits, the size of the uniform random variable, that we used.
So here is the theorem we have shown. Suppose you can find a lambda for which minus log P(x) is at most lambda with probability at least 1 minus epsilon; that is, lambda is a large-probability upper bound on minus log P(x). This is the quantity we have been playing around with this week and also last week. So suppose that is true.
This lambda, which gives the bound, is associated with the PMF P, the one you want to generate. Then, if you use L = lambda + log(1/epsilon) bits, which is just log M, so if you generate a uniform random variable of this length, that is, these many uniform random bits, you can find a function f, the one we specified above, such that the total variation distance between f(U) and P is at most (3/2) epsilon.
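As a numerical illustration of the bit count in the theorem (the numbers lambda = 20 and epsilon = 0.01 are made up for the example):

```python
import math

# Suppose -log2 P(X) <= lam holds with probability >= 1 - eps, for
# lam = 20 and eps = 0.01 (illustrative values). The theorem says
# L = lam + log2(1/eps) uniform bits suffice to generate P within
# total variation distance (3/2) * eps.
lam, eps = 20, 0.01
L = lam + math.log2(1 / eps)
tv_bound = 1.5 * eps
# L is about 26.64, so 27 bits in practice, with TV at most 0.015.
```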
So the number of bits needed is roughly lambda. Now, once again, what is a good estimate for lambda? Earlier we had a large-probability lower bound on minus log P(x); now we need a large-probability upper bound on it. Well, a good estimate is the expected value of this quantity, which is the same quantity as before, namely the entropy.