Lecture – 06
Basics of Sampling
[FL] In today’s class, we will have a look at the Basics of Sampling. (Refer Slide Time: 00:19)
Now, sampling is an important technique and the first concept is the difference between census and sampling.
(Refer Slide Time: 00:25)
Now, a census, as we know, is something like the population census that we have in India every 10 years. So, what happens in a census? In a census, a person comes to every home and asks for information about who the residents are, details about the family, and other data. But the point to note here is that each and every person of this country is counted, which is why it is called a census.
On the other hand, when we look at sampling, we have this whole population and only a very small portion of it is taken out for the study; that portion becomes the sample. So, the major difference between census and sampling is that in a census all the members of the population are studied, whereas in sampling only a very small proportion of the people or the objects of study are studied, and that small portion is called a sample.
(Refer Slide Time: 01:32)
So, what is the objective of sampling? The objective of sampling is to secure a sample which will represent the population and reproduce the important characteristics of the population under study as closely as possible. What do we mean by that? Now, in any case it is always good to have a census of the complete population because that would give us a hundred percent correct information.
But when we are doing a census, it will take quite a lot of effort, quite a lot of time, and quite a lot of money. If, in place of studying the complete population, we can get the same information with appreciable levels of precision and accuracy from a very small sample, then we are able to reduce our effort, our cost, and our time.
So, the aim of sampling is to do the sampling in such a way that even this small bit of information that we are taking out from the sample is able to adequately represent the complete population, so that we have a good amount of data.
(Refer Slide Time: 02:40)
Now what is the population? The word population is defined as the aggregate of units from which the sample is chosen.
So, in the case of humans, the population in the case of our census would mean all the residents, all the citizens of the country. In the case of wildlife, the population would mean all the animals of the particular species that we are studying which are there in our protected area, whether it is a national park, a sanctuary, a tiger reserve, or whatever. So, for instance, suppose you wanted to know what proportion of the animals, say chital, in our park are diseased. Then the population would mean all the individuals of the chital species that are present in our national park. Next is the sampling unit.
(Refer Slide Time: 03:31)
Now, a sampling unit may be an administrative unit, or a natural unit like topographical sections and sub compartments, or it may be an artificial unit like strips of a certain width or plots of a definite shape and size. The unit must be a well defined element or group of elements, identifiable in the forest area, on which observations on the characteristics under study could be made. The population is thus subdivided into suitable units for the purpose of sampling, and these are called sampling units. (Refer Slide Time: 04:09)
So, to show it graphically, suppose this is our complete forest and we have divided it into some compartments. Now, those compartments could be artificial. So, for instance, we take strips of equal width. So, in this case we have divided the whole forest into this gridded arrangement and we can very clearly demarcate this portion of the forest from this portion of the forest.
So, we can very clearly see that this portion is different and this portion is different. Say we call this portion number 1 and this one portion number 20. We have demarcated it in such a way that if we ever wanted to go to this place again, if we said that we wanted to go to section number 1, we would be able to clearly identify that this is our section number 1.
So, in this case each of these sections is a sampling unit. So, to take another numbering we have, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40.
So, in this case, we have 40 different units that are well defined and that cover our forest completely. Each of these 40 units is a sampling unit. The next thing is a sampling frame.
(Refer Slide Time: 06:31)
The list of sampling units is called a sampling frame. So, essentially, when we are considering this whole area of the forest and we say that our units are 1, 2, 3 and so on up till 40, then this list becomes the sampling frame. Now, it is important to note here that this was one arrangement, in which we have put up a gridded system.
(Refer Slide Time: 07:11)
On the other hand, in the forest we could also have an administrative system. So, in our country we have an arrangement such as this. The smallest management units are the compartments, then we have the beats, then we have ranges, then we have sub divisions, and then we have a division.
Now, suppose this whole forest forms part of a division called division A. Now this division is divided into two sub divisions, which are sub divisions A 1 and A 2. Now the sub division called A 1 is divided into, say, three ranges. So, these three ranges are called, say, R 1, R 2, and R 3. Now a division is governed by a divisional forest officer. A sub division is governed by a sub divisional forest officer. A range is governed by a range forest officer.
Now, each of these ranges, so let us consider range R 2, will be divided into a number of beats. So, let us call these B 1, B 2, B 3, B 4, B 5, and B 6. Now each of these beats in the range R 2 is governed by a forest guard, and in each beat we will have some compartments. So, let us call these C 1, C 2, C 3, C 4, C 5, C 6, C 7, C 8, and so on.
So, when we say compartment C 2, we can very clearly say that this compartment C 2 is a part of beat B 1, which is a part of range R 2, which is a part of sub division A 1, which is a part of the division A. So, in place of our previous gridded arrangement, we could also make use of these administrative divisions.
Now, once we have decided on the frame: in the gridded case the frame was 1 to 40. In the administrative case, our frame would be, say, all the compartments. So, we have C 1, C 2, C 3, and so on, say up till C 100. So, this whole list becomes our sampling frame.
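To make the idea of a frame built from administrative units concrete, here is a small Python sketch. It is only an illustration: the function name `build_frame` and the assignment of compartments to beats are assumptions made for this example, and only a small part of division A is shown rather than all 100 compartments.

```python
# Hypothetical slice of the hierarchy: division -> sub division -> range -> beat -> compartments.
# Which compartments fall in which beat is assumed here purely for illustration.
hierarchy = {
    "A": {
        "A1": {
            "R2": {
                "B1": ["C1", "C2"],
                "B2": ["C3", "C4"],
                "B3": ["C5", "C6"],
            },
        },
    },
}


def build_frame(hierarchy):
    """Flatten the nested hierarchy into a list of fully identified compartments."""
    frame = []
    for division, subdivisions in hierarchy.items():
        for subdivision, ranges in subdivisions.items():
            for range_name, beats in ranges.items():
                for beat, compartments in beats.items():
                    for compartment in compartments:
                        frame.append((division, subdivision, range_name, beat, compartment))
    return frame


frame = build_frame(hierarchy)
for unit in frame:
    print(" > ".join(unit))
```

Each entry in the list identifies one compartment together with the beat, range, sub division and division it belongs to, so that the same unit can always be located again.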
(Refer Slide Time: 10:25)
Now, from this sampling frame, the next concept is that of a sample. So, coming back to the slides. A sample is one or more sampling units selected from a population according to some specified procedure. So, essentially what we are saying is that, coming back to the board, once we have decided on our sampling frame, our sample would, say, consist of 10 compartments out of these 100 compartments.
So, now the question is which 10 compartments should form our sample? Do we take the first 10 compartments? Do we take the last 10 compartments? Do we take every tenth compartment, in which case we will be taking C 10, C 20, C 30, C 40 up till C 100? So, the question is, what is the rule using which we will be taking out the sampling units from the sampling frame that would constitute the sample? So, one or more sampling units that are selected from a population according to some specified procedure will constitute a sample. The next concept is that of sampling intensity.
(Refer Slide Time: 11:33)
So, intensity of sampling is defined as the ratio of the number of units in the sample to the number of units in the population. So, now, coming back to the board. Suppose we have a scenario 1; in scenario 1, we took a sample of 10 compartments out of the 100 compartments in division A. So, in this case our sampling intensity would be the number of compartments in the sample divided by the number of compartments in the division, which would be 10 by 100, which is 0.1 or 10 percent. So, this is our sampling intensity. (Refer Slide Time: 12:39)
On the other hand, in place of taking 10 compartments, in another scenario 2, we took 20 compartments in the sample out of 100 compartments. Then our sampling intensity becomes the number of compartments in the sample divided by the number of compartments in the division, which is 20 by 100, which is 0.2 or 20 percent.
Sampling intensity (SI)
\[ \mathrm{SI} = \frac{\text{number of compartments in sample}}{\text{number of compartments in division}} = \frac{20}{100} = 0.2 \text{ or } 20\% \]
So, we can see here that as we increase the number of compartments in our sample, out of a total of 100 compartments, our sampling intensity increases. Now, what happens when we have all the 100 compartments that were there in the division in our sample? In that case our sampling intensity would become 100 divided by 100, which is 1 or 100 percent, in which case our results will be the same as the census results, because every unit has been studied. (Refer Slide Time: 13:53)
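The arithmetic for the scenarios above can be checked with a few lines of Python. This is a minimal sketch; the function name `sampling_intensity` is just a label chosen for this example.

```python
def sampling_intensity(units_in_sample: int, units_in_population: int) -> float:
    """Ratio of the number of units in the sample to the number in the population."""
    if units_in_population <= 0:
        raise ValueError("the population must contain at least one unit")
    if not 0 <= units_in_sample <= units_in_population:
        raise ValueError("the sample size must lie between 0 and the population size")
    return units_in_sample / units_in_population


# Scenario 1 (10 of 100), scenario 2 (20 of 100), and the full census (100 of 100).
for n in (10, 20, 100):
    si = sampling_intensity(n, 100)
    print(f"{n} of 100 compartments: SI = {si:.2f} ({si:.0%})")
```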
Now, next, when we are selecting our areas; coming back to the board. So, in this case, we had selected our sampling units in the form of these grids, but in place of these grids we could also select some plots. So, what do we mean by a plot?
(Refer Slide Time: 14:15)
So, in this forest area, we could use our computer to find out 3 coordinates. At these 3 coordinates we draw circular plots, and we say that these three circular plots become our sample.
So, in that case our sampling intensity would be the sum of the areas of the plots divided by the total area of the forest. So, essentially, if these plots had areas a 1, a 2, and a 3, and the total area of the forest was A, our sampling intensity would be a 1 plus a 2 plus a 3 divided by capital A.
\[ \mathrm{SI} = \frac{\sum \text{area of plots}}{\text{total area of forest}} = \frac{a_1 + a_2 + a_3}{A} \]
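As a rough illustration of the area-based sampling intensity, the Python sketch below works through three circular plots. The plot radius (20 m) and the forest area (500 ha) are made-up numbers for this example, and the function name `plot_sampling_intensity` is an assumption, not something from the slides.

```python
import math


def plot_sampling_intensity(plot_areas, forest_area):
    """Sum of the plot areas divided by the total forest area (same units for both)."""
    if forest_area <= 0:
        raise ValueError("the forest area must be positive")
    return sum(plot_areas) / forest_area


# Hypothetical numbers: three circular plots of radius 20 m in a 500 ha forest.
radius_m = 20.0
plot_area_ha = math.pi * radius_m ** 2 / 10_000  # 1 ha = 10,000 square metres
plot_areas_ha = [plot_area_ha] * 3                # a1, a2, a3
forest_area_ha = 500.0                            # A

si = plot_sampling_intensity(plot_areas_ha, forest_area_ha)
print(f"Each plot covers {plot_area_ha:.4f} ha; SI = {si:.6f} ({si:.4%})")
```

With plots this small relative to the forest, the sampling intensity comes out far below the 10 or 20 percent of the compartment example, which is typical when plots are used instead of whole compartments.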
Now, in place of taking circular plots, we could also take rectangular plots. So, things like this. So, in place of grids, we could also go for rectangular plots we could also go for strips. (Refer Slide Time: 15:20)
So, in the case of a strip, we take this whole area and we divide it into strips and let us say that we include every third strip starting from strip number 2. So, this is strip number 2, this is a part of the sample, then we leave two others, then we take the third one, then we leave two others and then we take the third one and so on.
So, we can even use strips to demarcate our samples. The fourth thing is topographical units, which are generally used in the hills.
(Refer Slide Time: 16:01)
So, essentially if you have hills you could say that we will use the contour lines. So, contour lines will tell us the height.
So, let us say that these are contour lines that represent, say, heights of 100 meters, 200 meters, 300 meters, 400 meters, 500 meters, and 600 meters. So, we could even say that a unit is all the area which falls between two contour lines, and we will take every other such band as our sample. So, this is a part of our sample, this is a part of our sample, this is a part of our sample. So, here we have taken a sampling intensity of close to 50 percent and we have taken topographical units.
So, the plots that are used to demarcate our sampling units could be any of these. So, coming back to the slides, we could have circular plots, we could have rectangular plots, we could have strip plots, which are generally used in plain areas, or we could have topographical units, which are generally used in hilly areas.
So, once we have decided on the sampling units, what is the formula that we use to take out our sample from the sampling units?
(Refer Slide Time: 17:31)
Now that formula would give us different kinds of sampling. These are the most common types of sampling: simple random sampling, systematic sampling, stratified sampling, multi stage sampling and PPS sampling, which stands for probability proportional to size sampling.
(Refer Slide Time: 17:51)
So, let us now look at these in greater detail. The first is simple random sampling. A sampling procedure such that each possible combination of sampling units out of the population has the same chance of being selected is referred to as simple random sampling. For example, a lottery or random numbers.
So, coming back to our example of the grids or let us take the compartments. So, here we have 100 compartments C 1 to C 100. Now, if we say that we are going to select 10 compartments and the method of selecting those 10 compartments is lottery. So, what we do is we take out chits of paper, we write all these names 1 to 100, then we fold them, we put them into a box, we mix it up and then we take out 10 slips.
So, the numbers that are there will form a part of our sample. So, such a procedure will be known as a simple random sampling because all of these 100 compartments have an equal probability of forming a part of this sample. Another way of doing the same thing is by using random number generators. So, we can write a computer program to generate random numbers for us and those random numbers would be from 1 to 100 and then we take the first 10 random numbers.
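The lottery can be sketched in Python with the standard library. This is only an illustration: the function name `simple_random_sample` and the `seed` parameter are assumptions made for this example. `random.sample` draws without replacement, which matches taking slips out of the box without putting them back.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame; every unit is equally likely to be drawn."""
    rng = random.Random(seed)  # a fixed seed only makes the draw repeatable
    return rng.sample(frame, n)


# The sampling frame: compartments C1 to C100.
frame = [f"C{i}" for i in range(1, 101)]

sample = simple_random_sample(frame, 10, seed=42)
print(sorted(sample, key=lambda name: int(name[1:])))
```

Drawing without replacement also avoids a pitfall of simply taking "the first 10 random numbers" from a generator, which may contain repeats.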
(Refer Slide Time: 19:19)
So, formulae such as these form a part of simple random sampling. The second type of sampling is known by the name of systematic sampling. Now, systematic sampling employs a simple rule of selecting every kth unit, starting with a number chosen at random from 1 to k as the random start. So, let us take the example of our 40 grids.
(Refer Slide Time: 19:47)
So, we have, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40.
So, we have these 40 grids and we are doing a systematic sampling. So, in the case of systematic sampling, we will first decide how many numbers we want out of these 40. Let us say that we want 10 numbers out of these 40 numbers. So, we have 40 units and we have 10 units that will form our sample. So, we will have the value k equal to 40 by 10, which is 4. So, if we take every fourth value, then we will have a total of 10 units in our sample.
Now, those fourth values can start from any of the first 4, and that number would be taken at random. So, let us take the second value. So, this becomes our first value for the sample. Now, the next unit would be two plus four, which is the sixth one.
So, we will leave 1, 2, 3, and we take the fourth one because our k is equal to 4 here, and we leave 1, 2, 3, take the fourth one; 1, 2, 3, take the fourth one; 1, 2, 3, take the fourth one; 1, 2, 3, take the fourth one; 1, 2, 3, take the fourth one; 1, 2, 3, take the fourth one; 1, 2, 3, take the fourth one; 1, 2, 3, take the fourth one. How many have we taken till now? 1, 2, 3, 4, 5, 6, 7, 8, 9, 10.
So, these 10 values, that is 2, 6, 10, 14, 18, 22, 26, 30, 34 and 38, become part of our sample. When we have done this whole exercise, we have constituted our sample from the population by making use of systematic sampling.
Now, if we take the starting position as something else, suppose we started at number 3.
So, here also we will perform the same procedure. So, we leave 1, 2, 3, take the fourth one; 1, 2, 3, take the fourth one. So, essentially these values, 3, 7, 11 and so on up till 39, would form part of our sample. The main thing to note in the case of systematic sampling is that here we are making use of some formula. The formula is to take every kth unit, and how do we get the value of k? We get the value of k from our sampling intensity.
So, in this case we had said that we wanted 10 sampling units in the sample out of 40 sampling units in the population, so we get a sampling intensity of 10 by 40, which is 25 percent. Once we have decided on the sampling intensity, we can get the value of k as its reciprocal, here 40 by 10 which is 4, and from that value of k and a first value selected at random, we can form a sample. So, this is all about systematic sampling.
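The every-kth-unit rule can be written down in a few lines of Python. This is a sketch under a simplifying assumption: the population size divides evenly by the sample size, as in the 40 and 10 example. The function name `systematic_sample` and its parameters are chosen for this illustration only.

```python
import random


def systematic_sample(n_units, sample_size, start=None, seed=None):
    """Select every kth unit from units numbered 1..n_units, after a random start in 1..k."""
    if sample_size <= 0 or n_units % sample_size != 0:
        raise ValueError("this sketch assumes n_units is an exact multiple of sample_size")
    k = n_units // sample_size  # the reciprocal of the sampling intensity
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start among the first k units
    if not 1 <= start <= k:
        raise ValueError("the start must lie between 1 and k")
    return list(range(start, n_units + 1, k))


print(systematic_sample(40, 10, start=2))  # [2, 6, 10, 14, 18, 22, 26, 30, 34, 38]
print(systematic_sample(40, 10, start=3))  # [3, 7, 11, 15, 19, 23, 27, 31, 35, 39]
print(systematic_sample(40, 10))           # random start, chosen from 1 to 4
```

Note that once the start is fixed, the whole sample is fixed; the only random element is the choice of the first unit.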
(Refer Slide Time: 23:21)
Now the next thing is stratified sampling. The basic idea of stratified random sampling is to divide a heterogeneous population into sub populations, usually known as strata, each of which is internally homogeneous. In that case, a precise estimate of any stratum mean can be obtained based on a small sample from that stratum, and by combining such estimates, a precise estimate for the whole population can be obtained.
(Refer Slide Time: 24:01)
So, what we are referring to in this case is that, suppose in our big patch of forest we have some areas that are teak forest, then we have some areas that have water, so we have a pond, and then we have some areas that are grassland. And we wanted to take a sample of our complete forest, so we could divide our forest into these three areas, which are already natural units. So, why do we divide it into these natural units? Let us consider any animal, say a water bird. Now a water bird will be found in the pond area, but will not be found in the grassland area and will not be found in the teak area.
On the other hand, consider an animal such as chital. Chital will be found in the grassland and chital will be found in the teak forest, but chital will not be found in the pond, because chital is a deer, and this deer uses grassland for feeding and might even be using the teak forest as a shelter or a resting place.
So, in such a scenario, when we are considering the chital population, we can say that our whole forest is divided into three strata. So, the first stratum is the teak forest, and if we take any sample from the teak forest, we would be able to extrapolate the results to the whole of the teak forest area, but we will not be able to extrapolate those results to the pond area and, similarly, we will not be able to extrapolate those results to the grassland area. But if we take another sample from the grassland area, we would be able to extrapolate the result to the whole of the grassland area, but not to the teak area and not to the pond area, and similarly for a sample in the pond area.
So, in this case we have divided our whole forest into three strata: the teak forest, the grassland and the pond; then we will take samples out of all these three strata. Now those samples could be taken out by means of a simple random sampling or by means of a systematic sampling. Suppose we are using simple random sampling. So, in the case of the teak forest, we divided it into grids, then we took a random number generator, and we took out these portions as part of our sample.
So, we have these three portions. So, we are doing a simple random sampling in the teak area. Similarly, in the grassland area also, we are dividing this whole area into grids, then we are using a lottery system to pick the sampling units that will form part of our sample. So, suppose we got these units. So, these were also selected randomly, and similarly for the pond area.
So, from these three samples, we are extrapolating the results of the teak samples to the whole of the teak area, the results of the grassland samples to the whole of the grassland, and similarly for the pond, by multiplying these by the areas of the three strata. So, in this case, we get a stratified sample. So, this is a stratified random sampling. If, in place of using random sampling in all these strata, we went for systematic sampling, it would be a stratified systematic sampling of all the 3 strata.
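The two steps described above, sampling separately within each stratum and then scaling each stratum's result by its area, can be sketched in Python as follows. Everything numeric here is hypothetical: the stratum areas, the number of grid cells, and the chital densities are invented for illustration, and the function names are assumptions of this sketch.

```python
import random
from statistics import mean


def stratified_sample(strata_units, n_per_stratum, seed=None):
    """Simple random sample drawn independently inside each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum[name])
            for name, units in strata_units.items()}


def stratified_total(stratum_areas, sampled_densities):
    """Extrapolate each stratum mean to its own area only, then add the strata up."""
    return sum(stratum_areas[name] * mean(sampled_densities[name])
               for name in stratum_areas)


# Hypothetical grid cells in each stratum.
strata_units = {
    "teak": [f"T{i}" for i in range(1, 31)],
    "grassland": [f"G{i}" for i in range(1, 16)],
    "pond": [f"P{i}" for i in range(1, 6)],
}
picked = stratified_sample(strata_units, {"teak": 3, "grassland": 3, "pond": 3}, seed=1)
print(picked)

# Hypothetical field results: chital per hectare observed in the sampled cells.
stratum_areas_ha = {"teak": 600.0, "grassland": 300.0, "pond": 100.0}
densities = {
    "teak": [0.2, 0.4, 0.3],
    "grassland": [1.0, 1.5, 0.5],
    "pond": [0.0, 0.0, 0.0],
}
print(f"Estimated chital in the forest: {stratified_total(stratum_areas_ha, densities):.0f}")
```

The teak mean is applied only to the teak area and the grassland mean only to the grassland area, which is exactly the restriction on extrapolation described above.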
(Refer Slide Time: 27:41)
The next thing is a multistage sampling. Now, a multi stage sampling is a procedure of first selecting large size units and then choosing a specified number of subunits from the selected large units to get a sub sample.
(Refer Slide Time: 28:01)
So, what we mean in that case is that if we have this whole forest that we have divided into grids and suppose we have 100 grids. When we are doing a simple random sampling, we will take all these 100 grids together and put it into our lottery system. In the case of a multi stage sampling, we would say, that we are dividing this whole area into say 4 large units. So, we take this line and we take this line and then we will say, that for section A, B, C, and D, for every section we will be doing a simple random sampling.
So, for instance, in the case of section A, we did a simple random sampling and we got these two units; in the case of section B, we got these two units; in the case of section C, we got these two units; and in the case of section D, we got these two units. So, what we have done in this case is, in place of putting 100 grids together, we have divided this whole area into 4 sections, each section consisting of 25 grids, and within each section we are doing a simple random sampling.
So, because we have done our procedure in two stages, one of selecting the large sized units and two of doing a simple random sampling within each large unit, this becomes a multi stage sampling procedure.
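A two-stage version of this procedure can be sketched in Python as below. The layout is an assumption made for this example: a 10 by 10 grid numbered 1 to 100 row by row and cut into four quadrants A, B, C and D of 25 cells each. The function names and the `n_primary` parameter are also assumptions; setting `n_primary=4` reproduces the board example, where all four sections are used, while a smaller value selects only some of the large units first.

```python
import random


def build_quadrants(side=10):
    """Split a side x side grid (cells numbered row by row from 1) into four quadrants."""
    half = side // 2
    sections = {"A": [], "B": [], "C": [], "D": []}
    for row in range(side):
        for col in range(side):
            if row < half:
                name = "A" if col < half else "B"
            else:
                name = "C" if col < half else "D"
            sections[name].append(row * side + col + 1)
    return sections


def two_stage_sample(sections, n_primary, n_secondary, seed=None):
    """Stage 1: pick large units at random. Stage 2: pick cells at random inside each."""
    rng = random.Random(seed)
    chosen_sections = rng.sample(sorted(sections), n_primary)
    return {name: rng.sample(sections[name], n_secondary) for name in chosen_sections}


sections = build_quadrants()
print(two_stage_sample(sections, n_primary=4, n_secondary=2, seed=3))  # all four sections
print(two_stage_sample(sections, n_primary=2, n_secondary=2, seed=3))  # only two sections
```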
(Refer Slide Time: 29:41)
The fifth procedure is probability proportional to size. When units vary in their size and the variable under study is directly related to the size of the unit, the probabilities may be assigned proportional to the size of the unit. This type of sampling, where the probability of selection is proportional to the size of the unit, is known as PPS sampling.
(Refer Slide Time: 30:11)
A good example here is the trees in a forest. So, let us consider a forest in which we have some large sized trees, some medium sized trees, some small sized trees, and some very small sized trees. Now, suppose we wanted to figure out the total amount of biomass in our forest. If we, say, left out this small tree, it would not make much of a difference to our calculation. But if we left out this tree, which is a large sized tree and has quite a substantial amount of biomass, then our results will be very different from the true results. So, we will not be getting accurate results in that case.
So, probability proportional to size sampling says that for the large sized trees the probability of getting into the sample should be greater than for the smaller sized trees like this one, and for the medium sized trees the probability should be in between. We make use of this sort of sampling when we are doing point sampling in a forest.
(Refer Slide Time: 31:47)
So, in that case we make use of, say, a pen, and we hold it at arm's length. And we would say that, looking from here, any tree whose bole, or trunk, appears wider than the pen is included in our sample.
So, this type of sampling would become a probability proportional to size sampling, because we will be incorporating the larger sized trees into our sample while maybe disregarding the smaller sized trees. So, this is probability proportional to size sampling.
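The idea that larger units should enter the sample with higher probability can be illustrated with a short Python sketch. Note the assumptions: this is a list-based draw, not the pen-at-arm's-length field method described above; the tree sizes are invented numbers; and `random.choices` draws with replacement, so the same tree may be drawn more than once, which is a simplification made here for brevity.

```python
import random


def selection_probabilities(sizes):
    """Probability of each unit on a single draw, proportional to its size."""
    total = sum(sizes)
    return [size / total for size in sizes]


def pps_sample(units, sizes, n, seed=None):
    """Draw n units with probability proportional to size (with replacement)."""
    rng = random.Random(seed)
    return rng.choices(units, weights=sizes, k=n)


# Hypothetical trees and a size measure for each (for example, basal area).
trees = ["large", "medium", "small", "very small"]
sizes = [8.0, 4.0, 2.0, 1.0]

for tree, p in zip(trees, selection_probabilities(sizes)):
    print(f"{tree:>10}: probability {p:.3f} per draw")

print(pps_sample(trees, sizes, n=5, seed=11))
```

With these sizes the large tree is eight times as likely to be drawn as the very small tree on any one draw, and the medium tree sits in between.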
So, we looked at 5 different kinds of sampling, which are the most important types. The first was simple random sampling, in which all the sampling units are put into a lottery system and every sampling unit has an equal probability of getting into the final sample. The second was systematic sampling, in which case we made use of a formula. The third was stratified sampling, in which we had divided the whole of our area into strata, each of which was internally homogeneous. Internally homogeneous, as in the teak forest is homogeneous, the grassland is homogeneous, the pond is homogeneous.
So, that is stratified sampling. The fourth one was multi stage sampling, in which our sub units may or may not be homogeneous. This is just a way of simplifying our lottery system. It is also important because, in this example, if we had taken our sample out of the complete grid of 100 units, it is possible that all our sampled units would be concentrated in one region.
So, in that case we would say that our results may not be completely representative of the complete forest. So, when we have divided it into these large units, even though these units may or may not be homogeneous, we have a much greater confidence that the results from our sample would be closer to the results for the whole of the population. So, this was multi stage sampling, and the final one was probability proportional to size sampling, in which the larger sampling units have a higher probability of getting into our sample. So, that is all for today. Thank you for your attention. [FL]