Probability Models for Count Data

In this lecture we discuss probability models for counts. We discuss two discrete distributions, the binomial and the Poisson, where we count occurrences and ask for the probability of a particular count out of some number of possibilities.

Let us explain this in detail. A familiar situation is tossing a coin, which we meet whenever we study probability. The simple questions are: if I toss a coin 10 times, what is the probability that I get 4 heads, what is the probability that I get 6 tails, what is the probability that I do not get a head at all, what is the probability that I get more than 1 head, and so on.

Another situation often discussed in textbooks and literature is a medical representative trying to meet a doctor. The representative goes and asks for a meeting; the doctor may meet him or may say later, so there is a probability of meeting the doctor. Similar questions arise: out of 10 attempts, what is the probability that the representative is able to meet the doctor 6 times? To extend it, if the probability of meeting is the same for every doctor and he tries to meet 10 different doctors, what is the probability that he meets 6 of them? Winning a match is a very similar situation: just as with tossing a coin, there is a probability associated with winning a match.

All these have 3 common characteristics. First, each is a random variable that has 2 outcomes, one of which is called a success and the other a failure. The moment we call one of the outcomes a success, the other automatically becomes a failure.
In the example of winning a match, you could say winning is a success and losing is a failure. In the example of the medical representative, successfully having a meeting with the doctor would be called a success, and not being able to meet the doctor would be called a failure. In tossing a coin, we have to define what success is and what failure is: one person could define getting a head as a success and getting a tail as a failure, while somebody else could define getting a tail as a success and a head as a failure. Sometimes, when we do an inspection to find defective items, success could be identifying a defective item, even though a defective item is not, in reality, something successful. So it only depends on what we define as success; whatever is not a success becomes a failure. So there are 2 outcomes, which we call success and failure.

Second, the probability of success is the same every time the trial is repeated, and tossing a coin is a very good example. We might have just got a head; if we toss again, the probability of getting a head is still half. It neither increases nor decreases because of the earlier attempt.

Third, the results of successive trials are independent, and once again tossing a coin is a very good example. It does not matter whether the previous toss resulted in a head or a tail; the probabilities of head and tail remain the same, so to that extent the tosses are independent. Winning a match is also expected to be independent: it does not matter whether you won the previous match; when you play a match, your probability of victory is the same.
A medical representative meeting a doctor is expected to be independent, but at times we may question that: perhaps the representative met the doctor on the last attempt, and therefore the doctor might decline this time. But if we extend the example so that the representative is trying to meet 10 different doctors, we can quickly see that the events are independent, unless the doctors talk to each other. So let us assume that these 3 common characteristics are present; such a trial is called a Bernoulli trial.

To represent the same thing formally, there are 2 outcomes: $B = 1$ if the trial is a success and $B = 0$ if it is a failure. Success has probability $p$ and failure has probability $1 - p$. So the expected value of $B$ is
$$E(B) = 1 \cdot p + 0 \cdot (1 - p) = p.$$
The variance of $B$ is
$$\operatorname{Var}(B) = (0 - p)^2 P(B = 0) + (1 - p)^2 P(B = 1) = p^2 (1 - p) + (1 - p)^2 p = p(1 - p).$$

To repeat, random variables with these 3 characteristics are known as Bernoulli trials: there are only 2 possible outcomes, called success and failure; the probability of success is the same for every trial; and the results are independent. We should ask whether this holds in reality. We discussed this particularly with the medical representative visiting a doctor: repeated visits to one doctor may not be independent, but if there are 10 different doctors, the visits are independent.
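As a quick check of the Bernoulli mean and variance derived above, here is a minimal Python sketch (standard library only). The function name `bernoulli_mean_var` and the value $p = 1/5$ are our own illustrative assumptions; exact fractions are used so the variance can be compared with $p(1 - p)$ without rounding.

```python
from fractions import Fraction

def bernoulli_mean_var(p):
    """Mean and variance of a Bernoulli variable B that is 1 with probability p, else 0."""
    outcomes = {1: p, 0: 1 - p}  # value of B -> probability of that value
    mean = sum(b * prob for b, prob in outcomes.items())
    var = sum((b - mean) ** 2 * prob for b, prob in outcomes.items())
    return mean, var

p = Fraction(1, 5)  # illustrative probability of success
mean, var = bernoulli_mean_var(p)
print(mean, var)
print(var == p * (1 - p))  # variance agrees with p(1 - p)
```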
The same is true with tossing a coin. The problem is the same whether one individual tosses a coin 10 times and we want the probability of getting 4 heads, or 10 different people toss coins at the same time, each with the same probability of getting a head, and we want the probability that 4 of these 10 got heads.

Now we define a random variable $Y$ that counts the number of successes. Every binomial random variable is the sum of a given number of iid (independent, identically distributed) Bernoulli trials. Let $n$ be the number of Bernoulli trials and $p$ the probability of success for each trial, so that $Y = B_1 + B_2 + \dots + B_n$. The expected value of $Y$ is
$$E(Y) = E(B_1) + E(B_2) + \dots + E(B_n) = p + p + \dots + p = np.$$
Because the trials are independent, the variance of $Y$ is the sum of the variances:
$$\operatorname{Var}(Y) = \operatorname{Var}(B_1) + \operatorname{Var}(B_2) + \dots + \operatorname{Var}(B_n) = np(1 - p).$$
We consistently use $p$ and $1 - p$ for the probabilities of success and failure. At times we also use the additional notation $q = 1 - p$, the probability of failure, and write the variance as $npq$.

Now we define what are called binomial probabilities. Assume $n = 10$. The probability that $Y = 0$ is the probability that the first trial is 0, the second is 0, the third is 0, and so on. Each is a failure with probability $1 - p$, so $1 - p$ is multiplied 10 times: $P(Y = 0) = (1 - p)^{10}$. Next, consider $Y = 1$, one success.
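The claim that a binomial count is a sum of iid Bernoulli trials can be checked by brute force for small $n$. This sketch assumes $n = 10$ and an illustrative $p = 1/5$ (not a value from the lecture). It enumerates all $2^{10}$ success/failure sequences, multiplies the per-trial probabilities (valid only because the trials are independent), and compares the results with $(1 - p)^{10}$, $10p(1 - p)^9$, $np$ and $np(1 - p)$. The helper name `count_distribution` is our own.

```python
from fractions import Fraction
from itertools import product

def count_distribution(n, p):
    """Distribution of Y = B1 + ... + Bn, found by enumerating all 2**n outcome sequences."""
    dist = {}
    for seq in product([0, 1], repeat=n):
        k = sum(seq)                          # number of successes in this sequence
        prob = p ** k * (1 - p) ** (n - k)    # independence: multiply per-trial probabilities
        dist[k] = dist.get(k, 0) + prob
    return dist

n, p = 10, Fraction(1, 5)
dist = count_distribution(n, p)
mean = sum(k * q for k, q in dist.items())
var = sum((k - mean) ** 2 * q for k, q in dist.items())
print(dist[0] == (1 - p) ** 10, dist[1] == 10 * p * (1 - p) ** 9)
print(mean == n * p, var == n * p * (1 - p))
```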
One success out of 10 means the first trial succeeds and the others fail, or the second succeeds and the others fail, and so on. There are 10 such cases, so $P(Y = 1) = 10\,p(1 - p)^9$. In general, we can show that the probability of $x$ successes in $n$ trials is
$$P(Y = x) = \binom{n}{x} p^x q^{n - x}.$$
There are $n$ trials of which $x$ succeed, giving $p^x$; the remaining $n - x$ fail, giving $q^{n - x} = (1 - p)^{n - x}$; and the $x$ successes can be placed among the $n$ trials in $\binom{n}{x}$ ways.

For example, for $Y = 2$ one could list trials 1 and 2 succeeding with the rest failing, trials 1 and 3 succeeding with the rest failing, trials 1 and 4, and so on. Finally it boils down to choosing 2 out of 10, so there are $\binom{10}{2}$ ways, multiplied by $p^2$ and by $q^8 = (1 - p)^8$. In general, it is $\binom{n}{x} p^x q^{n - x}$.

Let us find the probabilities for $n = 6$ and $p = 0.2$. For $x = 0$, $\binom{6}{0} p^0 q^6 = 0.8^6 = 0.262144$. The probability of one success out of 6 is $\binom{6}{1} p^1 q^5 = 6 \times 0.2 \times 0.8^5 = 0.393$. Similarly, 2 out of 6 is 0.246, 3 out of 6 is 0.082, 4 out of 6 is 0.015, 5 out of 6 is 0.001536, and all 6 out of 6 is 0.000064.
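The table of probabilities for $n = 6$, $p = 0.2$ can be reproduced with a few lines of Python. This is a sketch assuming Python 3.8+ for `math.comb`; `binomial_pmf` is a helper name we introduce here, not something from the lecture.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(Y = x) = nCx * p**x * q**(n - x), with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 6, 0.2
for x in range(n + 1):
    print(x, round(binomial_pmf(x, n, p), 6))
```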
If we plot these, they add up to 1, as we can check: 0.26 plus 0.39 is about 0.65; adding 0.25 gives about 0.9; then 0.98, 0.998, and so on, and the fractions add up to 1. The plot also tells us something interesting. With $n = 6$ and $p = 0.2$, the maximum probability occurs at $x = 1$, and one can show that as $p$ increases, the peak moves to the right. Beyond that, $P(4)$, $P(5)$ and $P(6)$ have very small values: the cumulative sum comes close to 1, and the remaining probabilities are close to 0 and progressively decreasing.

Now we look at Poisson random variables. Again, consider some situations: the number of visitors in an hour, the number of phone calls per hour in a call center, the number of defects in a square centimeter of a wafer, and so on. The Poisson random variable describes the number of events generated by a random process during an interval (the interval is very important). The parameter $\lambda$, the Greek letter lambda, represents the rate, that is, the average number of events per interval, and it is the same in each of the disjoint intervals.

If $X$ denotes a Poisson random variable with parameter $\lambda$, its probability distribution is
$$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \dots$$
In the Poisson distribution there is no upper limit on the value: $X$ can take any non-negative integer value. For example, suppose people arrive at a rate of $\lambda = 2$ per minute. The probability that 0 people arrive is $e^{-2} = 0.135$, which comes from $e^{-\lambda} \lambda^x / x!$ with $x = 0$.
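A sketch that checks the two observations about the binomial plot: the cumulative probabilities reach 1, and the most likely count shifts right as $p$ grows. The values $p = 0.4$ and $p = 0.6$ are illustrative choices, not from the lecture, and `binomial_pmf` is our own helper.

```python
from itertools import accumulate
from math import comb

def binomial_pmf(x, n, p):
    """P(Y = x) = nCx * p**x * (1 - p)**(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n = 6
probs = [binomial_pmf(x, n, 0.2) for x in range(n + 1)]
running = list(accumulate(probs))  # cumulative probabilities P(Y <= x)
print([round(c, 4) for c in running])

# Most likely count (mode) for a few values of p
modes = {}
for p in (0.2, 0.4, 0.6):
    pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]
    modes[p] = max(range(n + 1), key=lambda x: pmf[x])
print(modes)
```

For $n = 6$ the modes come out as 1, 2 and 4 for $p = 0.2$, $0.4$ and $0.6$, which is the rightward shift described above.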
The probability of 1 person arriving in the interval is 0.27, obtained from $e^{-\lambda} \lambda^1 / 1!$ with $\lambda = 2$, and the probability of 2 people arriving is also 0.27. The probability of 3 people arriving in that interval is 0.18, of 4 people 0.09, and of 5 people 0.036; as $x$ increases, $P(X = x)$ becomes very small. So even though the average is 2 per minute, it is quite acceptable that no person comes about 13 percent of the time, 1 person comes 27 percent of the time, 2 people come 27 percent of the time, and so on.

If we start adding, 0.135 plus 0.27 is 0.405, plus 0.27 is 0.675, plus 0.18 is 0.855, plus 0.09 is 0.945, and plus 0.036 is about 0.98. So by $x = 5$ the total almost reaches 1, even though $X$ can take any value: as $x$ becomes larger, the probability becomes very small in a Poisson random variable. Though we are not going to prove it, the expected value of a Poisson random variable is $\lambda$, and its variance is also $\lambda$.

Customers arrive at an average rate of one every 10 minutes. Assuming a Poisson process, what is the probability that 6 people arrive in the next hour? One every 10 minutes is 6 per hour, so $\lambda = 6$, and
$$P(6) = \frac{e^{-6}\, 6^6}{6!} = 0.1606.$$
So even though 6 people arrive per hour on average, the actual probability that exactly 6 people arrive in an hour is only about 16 percent.

Let us continue this topic with a little discussion, as we have been doing in all previous topics. We now have a match-the-following exercise: assume that $X$ is a Poisson random variable and $Y$ is a binomial random variable.
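The Poisson numbers above ($\lambda = 2$ per minute, and $\lambda = 6$ per hour for the customer example) can be reproduced as below. This is a minimal sketch: `poisson_pmf` is our own helper, and the mean and variance are estimated by truncating the infinite sum at $x = 59$, an assumption that works here only because the tail probabilities for such a small $\lambda$ are negligible.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e**(-lam) * lam**x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

lam = 2  # people arriving per minute
for x in range(6):
    print(x, round(poisson_pmf(x, lam), 3))

# Mean and variance from a truncated sum; both should be very close to lam
xs = range(60)
mean = sum(x * poisson_pmf(x, lam) for x in xs)
var = sum((x - mean) ** 2 * poisson_pmf(x, lam) for x in xs)
print(round(mean, 6), round(var, 6))

# One customer every 10 minutes -> lam = 6 per hour; P(exactly 6 in an hour)
p_six = poisson_pmf(6, 6)
print(round(p_six, 4))
```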
The mean of $X$, a Poisson variable, is $\lambda$. The expected value of $Y$, a binomial variable, is $np$. The variance of $Y$ is $npq = np(1 - p)$. For the probability that $X = 1$, the Poisson formula $e^{-\lambda} \lambda^x / x!$ with $x = 1$ has $x! = 1$, so $P(X = 1) = \lambda e^{-\lambda}$. For the chance of failure in a binomial trial: in the binomial we define success and failure, the probability of success is $p$, and so the probability of failure is $1 - p$. For the probability that $Y = n$, that is, $n$ successes, we use $\binom{n}{x} p^x q^{n - x}$ with $x = n$: $\binom{n}{n} = 1$ and $q^{n - n} = q^0 = 1$, so $P(Y = n) = p^n$.

Now let us look at some situations. Past data indicate that 5 percent of arriving parts have defects. 1000 parts have arrived, and the inspector picks 25 at random and tests them for defects. Strictly, the Bernoulli assumption is incorrect because the population is finite: each part picked changes the proportion of defects among the parts that remain. One may disagree and say that 1000 parts are large enough to act as a population. We can take the population to be reasonably large and continue, and that is exactly how most inspection happens: we take a reasonably large lot and inspect a small fraction of it.

The binomial model can assume $n = 25$ and $p = 0.05$, the 5 percent. It depends on what we define as success and what we define as failure: if a defect is a success, then $n = 25$ and $p = 0.05$; if not being defective is the success, then $p = 0.95$.
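The match-the-following entries can be spot-checked numerically. The sketch below uses illustrative parameter values ($\lambda = 1.5$ is arbitrary; $n = 25$ and $p = 0.05$ are borrowed from the inspection example) and asserts that the general formulas reduce to the simplified answers. The helper names are our own.

```python
from math import comb, exp, factorial, isclose

def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)

lam, n, p = 1.5, 25, 0.05  # illustrative values only

# P(X = 1) for a Poisson variable reduces to lam * e**(-lam)
assert isclose(poisson_pmf(1, lam), lam * exp(-lam))
# P(Y = n) for a binomial variable reduces to p**n, since nCn = 1 and q**0 = 1
assert isclose(binomial_pmf(n, n, p), p ** n)
# Mean and variance computed from the binomial pmf match np and np(1 - p)
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
assert isclose(mean, n * p) and isclose(var, n * p * (1 - p))
print(mean, var)
```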
Assuming binomial, with a defect as success, the probability that the first 3 parts are all faulty is $0.05^3$, not $p$ alone; equivalently, it is $(1 - 0.95)^3$ if we had taken $p = 0.95$ for a good part.

Next, the probability of winning a match is 0.4, assuming there are no draws, ties or no results. Which has a higher probability: WWWFFF (F denotes a loss, or failure) or a sequence with the same 3 wins and 3 losses in a different order? If we model this as binomial, we realize that each sequence is 3 victories and 3 defeats out of 6 matches. The sequence does not matter; I think that is a big learning from this. Any specific order of 3 wins and 3 losses has probability $0.4^3 \times 0.6^3 \approx 0.0138$, so both sequences are equally likely. The probability of 3 wins in any order is $\binom{n}{x} p^x q^{n - x}$ with $n = 6$ and $x = 3$:
$$\binom{6}{3} (0.4)^3 (0.6)^3 = 0.276.$$

A die has 4 sides colored red and 2 sides colored green. It is rolled 6 times. Which has a higher probability: 4 red and 2 green, or 3 red and 3 green? Even though this question is about a die, it is not about the numbers 1 to 6, so we should not use the probability 1/6. Since 4 sides are red and 2 are green, if we define red as a success, the probability of success is $4/6 = 2/3$ and the probability of failure is $1/3$. The probability of 4 red and 2 green in 6 rolls is
$$\binom{6}{4} \left(\tfrac{2}{3}\right)^4 \left(\tfrac{1}{3}\right)^2 = 0.329,$$
and the probability of 3 red and 3 green is
$$\binom{6}{3} \left(\tfrac{2}{3}\right)^3 \left(\tfrac{1}{3}\right)^3 = 0.2195.$$
Therefore 4 red and 2 green has a higher probability than 3 red and 3 green.

Now, 2 separate teams have to write code that is merged to form the final code before testing. Each team has a 50 percent chance of completing its part in time.
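The inspection, match and die calculations above can be checked exactly with fractions. This is a sketch using only the standard library; the helper `binomial_pmf` and the variable names are our own.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(Y = x) = nCx * p**x * (1 - p)**(n - x); works with a Fraction p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Inspection: first 3 parts all defective, with P(defect) = 0.05
first_three_faulty = Fraction(5, 100) ** 3
print(first_three_faulty)

# Match: P(win) = 2/5. One specific order of 3 wins and 3 losses
win = Fraction(2, 5)
one_order = win ** 3 * (1 - win) ** 3
print(float(one_order))                # the same for WWWFFF and any other order
print(float(binomial_pmf(3, 6, win)))  # 3 wins in any order: C(6, 3) = 20 orders

# Die with 4 red and 2 green faces, rolled 6 times; success = red
red = Fraction(4, 6)
four_red = binomial_pmf(4, 6, red)
three_red = binomial_pmf(3, 6, red)
print(float(four_red), float(three_red), four_red > three_red)
```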
Is there a 50 percent chance that testing will start in time? No. We could take 1/2 as the probability of success and 1/2 as the probability of failure, because of the 50 percent, and then realize that testing starts in time only when both teams are successful: $\binom{2}{2} (1/2)^2 (1/2)^{2 - 2} = 1/4$. Another way of doing it: the probability that team A is successful is 0.5 and that team B is successful is 0.5, so the probability that both are successful is $0.5 \times 0.5 = 0.25$. Therefore we do not have a 50 percent chance of starting the testing in time.

Now, a jeweler breaks the gem 1 percent of the time while fitting a gem into an ornament. If he works on 100 stones, what is the probability of breaking at least 2 stones? We could model this as binomial or as Poisson. Poisson needs a $\lambda$: he breaks 1 percent of the stones, out of 100, so $\lambda = 100 \times 0.01 = 1$. With Poisson, the probability of breaking at least 2 stones is 1 minus the probability of breaking no stone minus the probability of breaking 1 stone. With $\lambda = 1$, each of these is $e^{-1} = 1/e$, so the answer is $1 - 1/e - 1/e = 1 - 2/e = 0.2642$.

If we use the binomial, we again have 1 minus the probability of 0 breaks minus the probability of 1 break:
$$1 - 0.99^{100} - \binom{100}{1} (0.01)(0.99)^{99},$$
which on simplification gives 0.2642. So in this instance, approaching it either as binomial or as Poisson gives us the same probability.

There is a 10 percent chance that a cow eats a harmful plant and becomes sick. What is the probability that none of the 10 cows is sick after they grazed yesterday in an area that has these plants? Try binomial and Poisson.
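A sketch that reproduces the team and jeweler calculations, computing the jeweler's answer both ways so the Poisson and binomial values can be compared side by side. Standard library only; the variable names are illustrative.

```python
from math import comb, exp

# Two independent teams, each on time with probability 0.5;
# testing starts on time only if both are on time
both_on_time = comb(2, 2) * 0.5 ** 2 * 0.5 ** 0
print(both_on_time)

# Jeweler: each stone breaks with probability 0.01; 100 stones; P(at least 2 break)
n, p = 100, 0.01
lam = n * p
poisson = 1 - exp(-lam) - lam * exp(-lam)                          # 1 - P(0) - P(1), Poisson
binomial = 1 - (1 - p) ** n - comb(n, 1) * p * (1 - p) ** (n - 1)  # 1 - P(0) - P(1), binomial
print(round(poisson, 4), round(binomial, 4))
```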
With the binomial, the probability that a cow is not sick is 0.9, since there is a 10 percent chance that it becomes sick. Therefore the probability that none of the cows is sick is $0.9^{10} = 0.3487$. With Poisson, there is a 10 percent chance and there are 10 cows, so $\lambda = 1$ and we want $X = 0$: $e^{-\lambda} \lambda^x / x! = e^{-1}\, 1^0 / 0! = e^{-1} = 0.3679$.

A batsman, on average, hits a six every 10 balls. What is the probability that he hits 6 sixes in an innings where he faces 30 balls? He hits one six every 10 balls, so $p = 0.1$ and $q = 0.9$, and we want the probability of exactly 6 sixes out of 30:
$$\binom{30}{6} (0.1)^6 (0.9)^{24} = 0.0474.$$
With Poisson, he hits a six every 10 balls, so over 30 balls $\lambda = 3$, and with $x = 6$ sixes,
$$P(6) = \frac{e^{-3}\, 3^6}{6!} = 0.0504.$$

We have used Poisson and binomial alternately, and for some problems we have used both. It is possible to show that the binomial approaches the Poisson distribution when $n$ is large and $p$ is small (with $np = \lambda$ held fixed); therefore we find that in some cases the answers are close, while in others they are slightly different. With this, we complete our discussion of binomial and Poisson models.
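The cow and batsman examples, again computed both ways. This is a sketch with our own helper names; in each case the Poisson $\lambda$ is taken as $np$, as in the text.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    return exp(-lam) * lam ** x / factorial(x)

# Cows: 10 cows, each sick with probability 0.1; want 0 sick
cows_bin = binomial_pmf(0, 10, 0.1)
cows_poi = poisson_pmf(0, 10 * 0.1)
print(round(cows_bin, 4), round(cows_poi, 4))

# Batsman: 30 balls, a six with probability 0.1 per ball; want exactly 6 sixes
bat_bin = binomial_pmf(6, 30, 0.1)
bat_poi = poisson_pmf(6, 30 * 0.1)
print(round(bat_bin, 4), round(bat_poi, 4))
```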

In this lecture, we study the normal probability model, where the random variable follows a normal distribution. We first try to understand the normal distribution. Normal random variables have bell-shaped histograms. All of us have seen the bell-shaped curve, and we will also show it in this lecture. The probability distribution of a normal variable is the bell curve. The probability distribution of any random variable that is the sum of enough independent random variables is also bell-shaped, as we will see. If the random variables being summed have a normal distribution, then the sum has a normal distribution. The sum of just about any random variables eventually becomes normally distributed: if we take random variables and keep summing them, at some point we arrive at the normal distribution. We have seen 2 or 3 statements here, and we will try to explain them suitably.

For example, consider tossing a coin 10 times and computing the probabilities of 0 heads through 10 heads, so $p = 0.5$ and $n = 10$. Since $p = 0.5$, the probability of success equals the probability of failure. Taking a head as a success, $P(0)$, the probability of getting 0 heads, can be calculated from the binomial formula $\binom{n}{x} p^x q^{n - x}$, where $q = 1 - p$, as we saw in the previous lecture. Because $p = q$, $P(0) = \binom{10}{0} p^0 q^{10} = q^{10}$, and in all cases $p^x q^{n - x} = 0.5^{10}$; only $\binom{n}{x}$ changes. So the probability of 0 heads equals the probability of 10 heads, which is 0.0009766. Similarly, the probability of getting 1 head equals the probability of getting 9 heads. Please note that this happens here because $p = q$, both equal to 0.5.
That probability is 0.009766. $P(2) = P(8) = 0.044$, the probability of 3 heads equals the probability of 7 heads, 0.117, $P(4) = P(6) = 0.205$, and $P(5) = 0.246$. If we add $P(0)$ through $P(10)$, we get 1. This comes from the binomial distribution.

If we plot this, we get a picture very similar to the normal curve. The central limit theorem, a very important theorem, says that the probability distribution of a sum of independent random variables of comparable variance tends to a normal distribution as the number of summed variables increases. So for coin tossing, as $n$ tends to infinity, we will get this bell shape.

What we will look at more in this lecture is the standard normal distribution. There are further equations that describe the normal distribution which we may not cover in this introductory course on probability and statistics; a more advanced course would look at all of them. This is the normal distribution curve, or bell-shaped curve. This is also the standard normal, and we will see the difference.

The standard normal distribution has $\mu = 0$ and variance 1, and the area under the curve is 1. A general normal curve would be centered at its $\mu$; here it is centered at 0. Also notice that the curve does not touch the X-axis on either side; it approaches the axis asymptotically and goes on and on. Therefore, in principle, the random variable can take any value. Remember that the normal curve and the standard normal curve look similar: their shape is the same, except that the standard normal has $\mu = 0$ where the normal distribution has its own $\mu$ (and its own $\sigma$).

We will see examples to understand all of this. So, which of the following can be treated as normal?
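To see the bell shape emerging, this sketch sets the exact binomial probabilities for 10 fair coin tosses next to the height of a normal density with the same mean $np = 5$ and variance $npq = 2.5$. The normal density formula is the standard one; comparing heights at integer points is an informal illustration we chose, not a formal statement of the central limit theorem. `normal_pdf` is our own helper name.

```python
from math import comb, exp, pi, sqrt

n, p = 10, 0.5
mu, var = n * p, n * p * (1 - p)  # 5.0 and 2.5 for a fair coin tossed 10 times

def normal_pdf(x, mu, var):
    """Height of the normal density with mean mu and variance var at x."""
    return exp(-(x - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)

exact = [comb(n, x) / 2 ** n for x in range(n + 1)]  # binomial P(x heads), p = q = 0.5
approx = [normal_pdf(x, mu, var) for x in range(n + 1)]
for x in range(n + 1):
    print(x, round(exact[x], 4), round(approx[x], 4))
```

The exact column is symmetric about 5 and sums to 1, and the two columns stay within about 0.01 of each other at every point.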
Whenever the normal comes up, one has to look for the symmetry, the peak in the middle, and so on. When we plot the data, what kind of curve do we get, and from that curve can we say that something is normal? The simple characteristics are: the random variable can take any value, there is a peak at $\mu$, the curve is bell-shaped, and it is symmetric. We will look at all these factors and try to answer these questions.

Marks obtained out of 100 by 200 students in a subject. Generally, we could look at this as a normal distribution, though there are some interesting reasons why it need not be. We just saw that a normal variable can take a very large value or a very small value, but for an exam in a subject we have clearly defined boundaries, say 0 to 100, so we do not have values below 0 or above 100. In spite of that, we could expect a reasonable amount of symmetry, and we could think of this as close to normal.

The money value of each purchase in a supermarket in a day may not be close to normal. The distribution need not peak at the average: we could have a few very large purchases and a large number of small purchases, so it could be a skewed distribution. A skewed distribution is not symmetric; we have seen skewed distributions earlier in this course. If it is right-skewed, it tapers to the right and the peak shifts to the left, and the other way if it is left-skewed.

Career scores of a cricketer listed in ascending order: this lists individual scores, so by itself we would not be able to judge normality. But if we sort these career scores and build a histogram, one might get a picture that shows whether it is reasonably close to normal.
But again, in this case there will be a small number of very large scores and a large number of small scores, so we could expect some skewness in the data, and therefore we need not treat it as close to normal.

The number of visitors in a day to a department. Again, this may not be very close to normal. On some days we could have a bunch of visitors, say 40 or 50, and on other days only a few. We will have a large number of days with a small number of visitors and a small number of days with a large number of visitors, and therefore the distribution would not be close to normal.

Now, what is the relationship between the normal distribution and the standard normal distribution? We will be working with the standard normal distribution most of the time. A given normal distribution has a given $\mu$ and a given $\sigma$; the standard normal changes that to $\mu = 0$ and variance 1. So what is the difference? The z-score measures the number of standard deviations that separate a given value from the mean. If $\mu$ is the mean, $\sigma$ the standard deviation and $x$ a given value, we calculate
$$z = \frac{x - \mu}{\sigma}.$$
Here $x - \mu$ is the difference, and dividing it by the standard deviation tells us how many standard deviations separate the value from the mean. For example, if $z = 2$, then $x - \mu = 2\sigma$, so the value is 2 standard deviations from the mean.

Now a quick computation. The marks in a class of 200 students are assumed to be normal with mean 60 and standard deviation 20, so $\mu = 60$ and $\sigma = 20$. Find the probability that a randomly chosen student has a mark greater than 70. We first find $z$; in this case we normally use lowercase $z$.
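The z-score formula is a one-liner. A minimal sketch; `z_score` is our own name, and the second call uses a mark of 100, an illustrative value chosen to show $z = 2$, meaning two standard deviations above the mean.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations separating x from the mean mu."""
    return (x - mu) / sigma

print(z_score(70, 60, 20))   # the class example: mu = 60, sigma = 20, x = 70
print(z_score(100, 60, 20))  # illustrative mark 2 standard deviations above the mean
```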
I have shown it as uppercase $Z$ here, but we use $z = (x - \mu)/\sigma$ with $\mu = 60$, $\sigma = 20$ and $x = 70$. So the $z$ corresponding to $x = 70$ is $(70 - 60)/20 = 10/20 = 0.5$. The probability that $x > 70$ is therefore the same as the probability that $z > 0.5$, because we have converted the given $\mu$ and $\sigma$ into a z-score, and from now on we work with the z-score and the standard normal table, where the area corresponds to the probability. What we have to understand is that, given $\mu$ and $\sigma$, $x$ is related to $z$ by $z = (x - \mu)/\sigma$. For a given $x$ we can calculate $z$, then for that $z$ value read figures from the standard normal table and use them to solve for the given $x$. That is what we will do: we will use the area under the standard normal curve, from the tables, to compute the probability.

Now, how do we do that? I have taken these 2 tables from the internet, and I acknowledge that; such tables are available in open sources and in most statistics books, so it is not difficult to get them. You will see a clutter of numbers, and sometimes a small picture that tells you what the numbers represent. In this table, the picture is replaced by a sentence that says: table values represent area to the left of the Z score. Since I have to show the entire table on one slide, I had to reduce the font size, so you will not be able to read it; I am reading it for you. Table values represent area to the left of the Z score. So what is in it? There is a Z score down the side, starting from $-3.9$ and going to 0 in this table, and you also see column headings .00, .01, .02, .03, and so on, which give the second decimal place of $Z$.
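Instead of reading a printed table, the standard normal area can be computed with Python's `statistics.NormalDist` (available from Python 3.8). This sketch, under the lecture's assumption that marks are normal with $\mu = 60$ and $\sigma = 20$, computes $P(X > 70)$ directly and via $z = 0.5$; both routes should agree.

```python
from statistics import NormalDist

marks = NormalDist(mu=60, sigma=20)  # marks assumed normal, as in the example
standard = NormalDist()              # standard normal: mu = 0, sigma = 1

z = (70 - 60) / 20
p_direct = 1 - marks.cdf(70)   # area to the right of x = 70
p_via_z = 1 - standard.cdf(z)  # area to the right of z = 0.5
print(z, round(p_direct, 4), round(p_via_z, 4))
```

Both routes give about 0.3085, since the area to the left of $z = 0.5$ is about 0.6915.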
So if your Z score is $-3.43$ (I am placing the mouse at that place), we find the row for $-3.4$ and the column for .03, and the area to the left of $z = -3.43$ is 0.0003.

Now we go to the next table, which is a similar table, again giving the area to the left. Here $Z$ varies from 0 to 3.9. If we look at $Z = +3.25$, say, we find 3.2 in the row and .05 in the column, and 0.99942 is the area to the left of 3.25. So if we use these 2 tables, given a $Z$ value, they give us the area to the left of that $Z$. Since the total area under the standard normal curve is 1, the area to the right of a given $Z$ is 1 minus the area to the left of it. We will use this to solve some of these problems.

So what do we do here? Earlier we said we have to find the area for $z = 0.5$. Before we do that, let us also understand something from the 2 tables. If you look carefully, the first table goes from $Z = -3.9$ up to $Z = 0$, and the second starts at $Z = 0$. If you look a little carefully, at $z = 0.00$ the area is 0.5. $Z = 0.00$ is this point: this is $Z$ equal to 0.
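The table lookups can be cross-checked the same way. This sketch assumes Python 3.8+ for `statistics.NormalDist`; `phi` is simply a local alias for its `cdf` method, which gives the area to the left of $z$, exactly what the tables list.

```python
from statistics import NormalDist

phi = NormalDist().cdf  # area to the left of z under the standard normal curve
for z in (-3.43, 0.0, 3.25):
    print(z, round(phi(z), 5))

# Area to the right of a given z is 1 minus the area to the left
print(round(1 - phi(3.25), 5))
```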