Market Basket Analysis Hands-On

Hello everybody, welcome to the marketing analytics class. We are in week 8, session 5, and in this session we will discuss how to do a market basket analysis in a hands-on way.

If you go to the files section of week 8, you will find an R file for Market Basket Analysis, and you will also find our grocery store data. If I open this data, each row is a transaction. For example, whole milk, butter, yogurt, rice and abrasive cleaner have been bought together, and pip fruit, yogurt, cream cheese and meat spreads have been bought together. This is a publicly available dataset; we have taken it and will use it for market basket analysis.

For that, I will use two libraries, arules and datasets, and as usual, if I do not have them, I have to install them. arules stands for association rules, and it installs easily. The datasets package, however, does not show up as a separate package to install in the newer version, so let us see whether we can do without installing it. In fact, datasets is part of base R and is already installed, so I just load arules.

The data is loaded with data(Groceries). It is a built-in dataset that comes with the arules package, the same grocery data I showed you, and its documentation describes the data class and so on. So, I have used that.

The first step is to find out which individual items are the most frequent. For that, I create an item frequency plot of the top 20 items. These are my top 20 products: whole milk is the most common product, followed by other vegetables, rolls/buns, soda, yogurt, bottled water and so on. I have around 10,000 transactions, and whole milk occurs in about 2,500 of them, while the other
items occur less often. That is something important to understand. Just one minute, let me check how many rows I have; yes, there are around 10,000.

Now, there are three definitions that we already know: support, confidence and lift. Consider a rule {a, b} ⇒ {c}: if a customer buys a and b, how likely is it that they will also buy c?

Support is the fraction of transactions in which the whole item set occurs, that is, the fraction of transactions that contain a, b and c together: P(a, b, c).

Confidence is the probability that the rule is correct, that is, the probability of c given a and b: P(c | a, b).

Lift is the ratio by which the confidence of a rule exceeds the expected confidence. In other words, lift is P(c | a, b) divided by P(c).

These three measures are important for marketing. The higher the lift, the stronger, more applicable and more actionable the rule: if a rule raises the chance of an item being bought by a large factor, you might want to focus on that rule rather than on others.

As I said in the previous presentation, rules are judged on these three measures. In the simplest form: a rule must have a minimum support, so that it is not a fluke or a statistical error; a higher confidence means the rule is more accurate; and a higher lift means the rule is more useful. These are the three criteria we use to judge which rules we
will act on and which we will discard.

To find the rules, we use the Apriori algorithm on the Groceries dataset; the syntax is apriori(Groceries, parameter = list(supp = 0.001, conf = 0.8)). What do the parameters mean? The support cutoff of 0.001 means the item set has to occur in at least 0.1 percent of transactions, which, as I told you, is quite rare. And the confidence has to be at least 0.8, so the rule has to be correct at least 80 percent of the time. If I run this, a lot of rules are created.

To see what they look like, let us inspect the first 5 rules with inspect(rules[1:5]). These five rules are not sorted by any measure; they are simply the first ones the algorithm produced. The first rule has liquor and red/blush wine on the left-hand side and bottled beer on the right-hand side: given that liquor and wine are bought, what is the probability that bottled beer will also be bought? The corresponding support is 0.0019, which means the three items occur together in about 0.19 percent of transactions, which is small. But when the left-hand side occurs, the right-hand side occurs with about 90 percent probability: 0.9 is the confidence. The lift is also very high, 11.2. On its own, bottled beer is bought far less often (about 0.9 / 11.2 ≈ 8 percent of the time), but when the left-hand side occurs, the probability of buying it is 11.2 times the normal probability, so that is a huge lift.

Similarly, for curd and cereals the support is low, but when they occur, buying whole milk is very likely: the confidence is 0.91. So why is the lift only 3.6? Because people buy whole milk anyway: as I showed you, whole milk occurs in about 2,500 out of around 10,000 transactions, so people buy it roughly 25 percent of the time regardless. Given the left-hand side, a probability of 0.91 is therefore less remarkable than it looks, and the lift is only 3.6, which means that, actually,
the natural probability of buying whole milk is 0.91 divided by 3.6, about 0.25. Similarly, for another rule with a confidence of 0.81 and a lift of 3.2, 0.81 divided by 3.2 also comes to about 0.25. So, about 25 percent of the time people buy whole milk anyway, and that is why the lift is not so high.

Next, I look at a summary of these rules. The average support is 0.00125, the maximum support is 0.003, and the minimum is 0.001, which is the cutoff we set. The confidence ranges from the 0.8 cutoff up to 1, with an average of about 0.87, and the lift ranges from about 3.1 to 11.2, which is good.

The summary also gives the distribution of rule sizes, counting the left-hand and right-hand sides together: there are 29 rules with three items, 229 with four items, 140 with five items and 12 with six items, for a total of 410 rules with the cutoffs we have chosen.

Now, which rules are the most likely, that is, the most accurate? For that, I sort the rules by confidence in decreasing order, with sort(rules, by = "confidence", decreasing = TRUE). If I now look at them, these are the rules with a confidence of 1: whenever people buy these items, they always buy whole milk. But notice that their count is very small, and their support is also very small.

I can also see that some of these rules are long, meaning many items have to occur together before the rule applies. For example, in rule number 4, root vegetables, whipped/sour cream and flour lead to whole milk, so the rule has 4 items, which is big. Similarly, in another rule, butter, soft cheese and domestic eggs, 3 items on the left-hand side, lead to only 1 item on the right-hand side, which is also a big rule. If I want to limit the size of the rules, I can set maxlen
equal to 3. What does maxlen = 3 mean? It is the maximum length of a rule, counting both sides: the left-hand and right-hand sides together can contain at most 3 items. If I do that and then sort by confidence, what does it look like? I have created the rules, and if I inspect them, you can see that no rule has more than 3 items. Again, the confidence is very high, but the support is small. You can make these kinds of changes and play with the parameters in this way.

I will show two more variations and then close. For example, I can fix a particular item on the left-hand side, or on the right-hand side, and ask what the rules are. Here, I set appearance = list(default = "lhs", rhs = "whole milk"): the left-hand side is left at the default, and the right-hand side must be whole milk. So, every rule will have whole milk on the right-hand side; then I sort these rules by confidence and get their values. Similarly, with appearance = list(default = "rhs", lhs = "whole milk"), the left-hand side must be whole milk, so this creates rules in which there is no other choice for the left-hand side.

You can see that I have also written minlen = 2. What does minlen = 2 mean? The minimum length of a rule has to be 2, so every rule has at least 2 items, and some rules will have more. If I also set maxlen = 2, every rule will have exactly one item on the left-hand side and one item on the right-hand side.

So, that is how we create lots of rules. You can sort them by confidence, by lift or by support, or you can create a combination of lift and support. Because the rules are all stored here, you could, for example, give support a weight of x percent,
confidence a weight of y percent, and so on. You can try multiple combinations to find out which rules are acceptable and take a call based on that.

So far, we have just been creating rules and playing with them, fixing items on the left-hand or right-hand side. The marketing decision comes later: what will I do with these rules? For example, what will I do if I know that whole milk is also sold along with rice and sugar? Or recall the initial rules that we created. Just one minute. If I inspect the rules again, the very first one, yes. This one shows 3.9; not this one, sorry. The very first one was this. So, what will I do with this rule? The lift is very high, the confidence is very high, and the support is not bad. If liquor and red/blush wine are bought, bottled beer is also bought along with them, so I can put them together, create a bundle, or place them at different ends so that people can purchase them. These are the kinds of decisions you have to take as a marketing manager.

This part is only the data science: it creates hypotheses that can be tested and, once confirmed, used to create marketing insights. So, that is what market basket analysis is all about. The analysis is very easy, it is easy to understand, and you have to use it in your marketing context.

So, that is all for week 8. We will meet you in week 9, where we will discuss further aspects of franchises and later text mining. Thank you for being with me, and I will see you in the next video.
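The support, confidence and lift discussed in this session can be written compactly for a rule {a, b} ⇒ {c} over N transactions. This uses the standard definitions (in arules, the support of a rule is the support of all its items taken together). Rearranging the lift formula gives P(c) = confidence / lift, which is the arithmetic the lecture uses to recover the baseline probability of buying whole milk.

```latex
\begin{aligned}
\operatorname{supp}(\{a,b\} \Rightarrow \{c\}) &= P(a, b, c) = \frac{\#\{\text{transactions containing } a, b, c\}}{N} \\
\operatorname{conf}(\{a,b\} \Rightarrow \{c\}) &= P(c \mid a, b) = \frac{\operatorname{supp}(\{a,b,c\})}{\operatorname{supp}(\{a,b\})} \\
\operatorname{lift}(\{a,b\} \Rightarrow \{c\}) &= \frac{P(c \mid a, b)}{P(c)} \quad\Longrightarrow\quad P(c) = \frac{\operatorname{conf}}{\operatorname{lift}} \\
P(\text{whole milk}) &\approx \frac{0.91}{3.6} \approx \frac{0.81}{3.2} \approx 0.25, \qquad
P(\text{bottled beer}) \approx \frac{0.9}{11.2} \approx 0.08
\end{aligned}
```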
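The top-20 item frequency plot and the three measures can be mimicked in a few lines of plain Python. This is only an illustrative sketch: the lecture itself works in R with arules, the baskets below are a hypothetical toy list (not the Groceries data), and the function names item_frequency, support, confidence and lift are made up for this example.

```python
from collections import Counter

# Hypothetical toy baskets (NOT the Groceries data): one set of items per transaction.
TRANSACTIONS = [
    {"whole milk", "butter", "yogurt", "rice"},
    {"pip fruit", "yogurt", "cream cheese", "meat spreads"},
    {"whole milk", "other vegetables", "yogurt"},
    {"whole milk", "curd", "cereals"},
    {"other vegetables", "rolls/buns", "soda"},
    {"whole milk", "rolls/buns"},
    {"curd", "cereals", "whole milk", "yogurt"},
    {"soda", "bottled water"},
]


def item_frequency(transactions, top_n=20):
    """(item, count) pairs for the top_n items by number of baskets they appear in."""
    counts = Counter(item for basket in transactions for item in basket)
    return counts.most_common(top_n)


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= set(t) for t in transactions) / len(transactions)


def confidence(lhs, rhs, transactions):
    """P(rhs | lhs), estimated as support(lhs and rhs together) / support(lhs)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)


def lift(lhs, rhs, transactions):
    """Confidence of lhs => rhs divided by the baseline support of rhs."""
    return confidence(lhs, rhs, transactions) / support(rhs, transactions)


if __name__ == "__main__":
    print(item_frequency(TRANSACTIONS, top_n=3))
    lhs, rhs = {"curd", "cereals"}, {"whole milk"}
    print(support(lhs | rhs, TRANSACTIONS),
          confidence(lhs, rhs, TRANSACTIONS),
          lift(lhs, rhs, TRANSACTIONS))
```

On these toy baskets, whole milk is the most frequent item (5 of 8 baskets), and the rule {curd, cereals} ⇒ {whole milk} has a support of 0.25, a confidence of 1.0 and a lift of 1.6. As in the lecture, the lift stays modest because whole milk is common anyway.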
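The apriori() call with a minimum support and confidence, together with the maxlen and appearance options, can be approximated by a level-wise Apriori sketch in plain Python. This is a simplified illustration, not the arules implementation: the toy baskets, the function names and the rhs_item argument (a loose stand-in for appearance = list(default = "lhs", rhs = ...)) are all hypothetical. Like arules, it puts exactly one item on the right-hand side, and it only emits rules with a non-empty left-hand side, which is what minlen = 2 guarantees in the lecture.

```python
from itertools import combinations

# Hypothetical toy baskets, invented for illustration (not the Groceries data).
TRANSACTIONS = [
    {"whole milk", "curd", "cereals"},
    {"whole milk", "curd", "cereals", "yogurt"},
    {"whole milk", "yogurt"},
    {"whole milk", "rolls/buns"},
    {"soda", "rolls/buns"},
    {"curd", "yogurt"},
]


def frequent_itemsets(transactions, min_support, maxlen=None):
    """Level-wise Apriori: return {itemset: support} for every itemset with
    support >= min_support and at most maxlen items (no limit if None)."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def supp(itemset):
        return sum(itemset <= b for b in baskets) / n

    # Level 1: frequent single items.
    level = [frozenset([i]) for i in sorted(set().union(*baskets))
             if supp(frozenset([i])) >= min_support]
    frequent = {s: supp(s) for s in level}
    k = 1
    while level and (maxlen is None or k < maxlen):
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets with exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
        level = [c for c in candidates if supp(c) >= min_support]
        frequent.update({c: supp(c) for c in level})
    return frequent


def generate_rules(frequent, min_confidence, rhs_item=None):
    """Build rules lhs => rhs with one item on the right-hand side and a
    non-empty left-hand side; optionally keep only rules whose rhs is rhs_item.
    Returns (lhs, rhs, support, confidence, lift) tuples, highest confidence first."""
    rules = []
    for itemset, s in frequent.items():
        if len(itemset) < 2:
            continue  # a rule needs at least one item on each side
        for rhs in itemset:
            if rhs_item is not None and rhs != rhs_item:
                continue
            lhs = itemset - {rhs}
            conf = s / frequent[lhs]  # lhs is frequent because itemset is
            if conf >= min_confidence:
                lift = conf / frequent[frozenset([rhs])]
                rules.append((set(lhs), rhs, s, conf, lift))
    return sorted(rules, key=lambda r: r[3], reverse=True)


if __name__ == "__main__":
    freq = frequent_itemsets(TRANSACTIONS, min_support=0.3)
    for lhs, rhs, s, conf, lift in generate_rules(freq, min_confidence=0.8):
        print(sorted(lhs), "=>", rhs, f"supp={s:.3f} conf={conf:.2f} lift={lift:.2f}")
```

With min_support = 0.3 and min_confidence = 0.8, these toy baskets yield five rules; passing maxlen=2 to frequent_itemsets keeps only the two one-item-to-one-item rules, and rhs_item="whole milk" keeps only the two rules ending in whole milk. Counting over every basket for each candidate keeps the sketch short but would be slow on large data.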
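The lecture suggests ranking rules by a weighted combination of measures, with x percent weight on support, y percent on confidence, and so on. Because support, confidence and lift live on very different scales, one possible approach is to min-max normalise each measure before weighting. The rule names and measure values below are made up, and the normalise-then-weight scheme is an assumption, not something the lecture prescribes.

```python
# Made-up rule measures, for illustration only.
RULES = [
    {"rule": "A", "support": 0.002, "confidence": 0.90, "lift": 11.0},
    {"rule": "B", "support": 0.010, "confidence": 0.85, "lift": 3.5},
    {"rule": "C", "support": 0.001, "confidence": 1.00, "lift": 4.0},
]


def min_max(values):
    """Rescale values linearly to [0, 1]; a constant column maps to all 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def rank_rules(rules, weights):
    """Return (rule name, score) pairs, best first, where score is the weighted
    sum of the min-max-normalised measures named in weights."""
    scores = [0.0] * len(rules)
    for measure, weight in weights.items():
        for i, norm in enumerate(min_max([r[measure] for r in rules])):
            scores[i] += weight * norm
    order = sorted(range(len(rules)), key=lambda i: scores[i], reverse=True)
    return [(rules[i]["rule"], scores[i]) for i in order]


if __name__ == "__main__":
    # e.g. 50% weight on support, 30% on confidence, 20% on lift
    for name, score in rank_rules(RULES, {"support": 0.5, "confidence": 0.3, "lift": 0.2}):
        print(name, round(score, 3))
```

With these made-up numbers, the 50/30/20 split ranks B first because of its much higher support, while putting all the weight on lift ranks A first; the ranking depends entirely on the weights you choose.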