Marketing Analytics Professor Swagato ChatterjeeVinod Gupta School of Management Indian Institute of Technology, KharagpurLecture 20 Demand Forecasting and Pricing (Contd.)Hello everybody.Welcome to Marketing Analytics course.This is Doctor Swagato Chatterjee from Vinod Gupta School of Management, IIT Kharagpurwho is taking this course.We are in week four and we are discussing demand forecasting and pricing.So still now we have talked about a little bit of pricing and we have talked about demandfunctions.And now in this particular class I will discuss about how to use analytics to forecast a demand.So in your data set, you will have a filed called computer, I would say sales dot excelfile(Sales.excel).So, sales dot excel(Sales.excel) file will have a data set which looks like this, whichhas if you carefully see that it has year component and then the timestamp.Time stamp means whether, it is not timestamp actually, it is a time period component.And the month and the sales, so for 2016, from 2016 till 2019 I have 4 years of data.And I have monthly data of sales of something.And the timestamp is nothing but the month-year combination of that.So, as TS increases, you go, come closer to the current thing.And now our job is to try to estimate what this particular thing is, how I can forecastthe next demand.So what happens is, in real life situation, what happens is that if you have Xt, Xt isconsidered to have three components.One is trend, which is Tt plus one is seasonally, St and then.So there are lots of ways of actually estimating a time series.But here we will be focusing on the simplest way, what we will break this particular thinginto three parts, one is the trend component, one is the seasonality component and one isthe error component.So trend, what is trend component?So, let us say, if I have a time series which looks like this, if this is the time series,you can easily see that I can plot a middle value which goes like this, the dotted one.So this is the trend, the overall, the overall function I would say function of this particularthing whether it is going up, going down, going up and then coming down, or somethinglike that.That is trend.And then, so one a little bit of less undulations are there.And then seasonality is more related to certain patterns which come back after certain periodof time.For example, let us say, the sales of a cold beverage is going up, why?Because population is going up and if population is going up and their purchasing power isgoing up, then the sales of cold beverage will go up over years.So that is something that is a positive trend.But in the summer season, the sales goes up much higher than the winter season.So, you will see that within that going up, within the sales going up like this, therewill be a jump in the summer season and little bit dip in the winter season and again a jumpand dip, again a jump and dip and so on.So, that part is called seasonality which, where the pattern actually repeats itselfover certain period of time.So when this particular seasonality happens for a very large period of time, we call itcycle which is generally we do not use to forecast because forecasting is generallydone for a shorter period of time and then the trend is like this.The rest of the part is error.Now, this error part is something which your time series cannot explain.What is time series cannot explain, what does that mean?That means that for example, let us say, if I ask you that how much will you score ina math exams which will come in the next two months, the best bet of your math’s examscore that will happen in last two months, in the next two months is whatever you scoredin the last two months probably.So what I am trying to say is that your recent past performance of math’s is the closestpredictor of your recent future performance.So similarly, for any time series there are some part of the time series is past whichexplains what will happen in the future.So that is the part where we do time series analysis.The trend is how it actually talks about, how the past of this particular observationis going to affect the future.And seasonality is also related to how the past, not exactly past, how the time willimpact the changes in the x variable.But other than my past, other than my time factor, there can be so many other factorswhich might, which are cross-sectional, which is at that time period, which might impactmy performance.For example, let us say, in case of your score in math’s, probably your health whetheryou are okay on that particular day or not will actually impact your performance.So, that is the component, all those components are stored in Et.So, if there is no such component, if absolutely there is no such component, then this Et willhave a mean value of 0 and absolutely, but if by chance that if there is certain thingswhich might impact then you have to find out Xt minus Tt minus St which is Et.And then you have to find out Et as a function of various stuff.For example, let us say in this context price, so price at tth time period, price of theproduct and price of the competing product, P dash t.So, all of these things might impact and this is where the demand function actually comesin So you, if you get a trend and seasonality, let us say from this data that we have isa sales data.Now, this sales has happened because of multiple reasons.Something is actually related to its past sales, so if I remove that past sale component,the whatever effect of past sales on present sales, remove that component and if I alsoremove the seasonality component that okay, in certain seasons the production goes up,that is why the sales goes up.I will actually remove those part and then I will keep the rest.But the time factor is not anymore affecting.I will get some error.Now, by chance if I also have some price data here, for every time period price of the productand price of the competing product, so price let us say something like this.So, if I have this data, these two columns, then what I will do?I will try to find out how the error part whatever is remaining is related to this price.So, I will do a regression on the error with price and price dash as my independent variables,and I will try to create a demand function out of it.So, before I do that, I have to do the, find out how to remove the trend and seasonalitycomponent of a data set.So that is something that is more important to do.So what I am going to do here is the first job is to try to plot in.And the plot looks like this.So, I can easily see at least in this particular plot that you can see that, so there is apattern that, it is 8 let us say, and this one is around 18.So 8 to 18, 11 is something that there are coming back.Now this is 18.I will explain if the, this is 29, so then here also you will see that the jump is 11.And then if it is 29, then this one is 40, 41.So I can say that 11 or 12 will be something where the pattern is repeating itself.Now, then you should not only see the picture, you should also focus on the common sense.So, it is monthly data.So probably, it is not 11, 12 is the, so it is only because of these 3 years I got 11but probably it is a 12 number of months actually which create the seasonality factor.So what I will do is I will know that, so this guy has a seasonality of 12 months, somethinglike that.So one job, one good job is to do that, so I will create first thing is I will try tofind out the trend.So the easy way to find out the trend is to do a smoothening.Now, you can also see that 7, here it is 8 and here it is I think 10.And then here it is 13 and then this one is probably 16.And this one is 18, so 2, 3, 3, 2.And then this one is let us say 18 to 22, 4.And then this one is probably another 22 to 325, so average 3, in average 3 there is asmall jump and coming down, small jump and coming down, small jump and coming down.So, I can also think about that as my seasonality component or something like that.So keeping that in mind, I will be finding out the trend.The easy of finding out the trend is something called simple moving average method.What is simple moving average?Simple moving average is saying that, average of these three terms.So average of one point earlier, now and one point later, something like that.Now, here I have chosen the timer period for the simple moving average is 3.You can choose as 5 or 7.By chance if you do choose even number, then you have to do double smoothing.Two times you have to do the smoothing.So, once with the even number and the second time with 2 as your time period.But if you do odd smoothing, then you are more or less safe.Then that it is centralized.So I will go ahead with doing with odd number.So 3 is the thing that we chose by seeing that curve, so 3.So because there is no past data for this particular sale, this is remains blank andbecause there is no past data for this particular sale, this also remains blank.So these are the two values that remains blank.Now, if I just try to plot this table, whatever I created.The simple moving average table if I try to plot, you will see that it is a little bitof smoothened version, the undulations has come down.But this looks very-very beautiful.That it goes up and comes down, goes up and comes down, goes up and comes down and inthe SMA itself, in the, I would say in the moving average itself, there is a seasonalitycomponent.That is there, that is number one.Now if I find out the error, so how much did I do mistake, how much I could not explain?That is nothing but this minus this, something like this and then I, if I just drag it up,it is like this.Now if I just try to plot this part which is the part which I could not explain, itlooks like this.So this is very, very I would say peculiar.So, if you see that this particular guy has a, it is more or less the average value is0, more or less it has come down to the average value, the trend value, the trend component,the increasing or decreasing component it is not there.But there is a seasonality component that is there.And I think the seasonality size is almost 3, So, every three time periods there is ajump and coming down.So you will see that this peak, this lower is 0.7 and this lower is 0.9 and this loweris 12.And then this lower is 15 and this one is 17, this one is 19.So, almost 2 or 3, something like that, this one is 21.And then this one is probably 24, so 2 or 3 is something where things are repeatingitself.This is zig zig, zig zig zig.So, I can take also 2 or 3 depending on the situation.So, let us go ahead because there are lots of, so for example, 24, this is 26.This is 28 and this is 30.This one is 33.So I can take also 2 probably as my in the picking up.So how much will be the every jump?So let us do that.Now there are two jumps.One is to find out the seasonality component and another thing is to find out the SMA,how I can estimate this SMA.Now carefully see, if I do the first job first that if I try to create a function which willdefine what this SMA is, I will pick up this time series, two time series components.So timestamp, or let us say I will pick up the whole thing here and I am pasting it asspecial.And if I just zoom it up and I remove this, because this is the part that I do not focuson, this is the thing that I am interested in.So I can do a regression on this one.I have to find out a linear equation, so I can find out, do a regression or I can easilyjust simply do a time series component.So, let us do a regression.And I say that okay, TS is 1 to 48, and timestamp is a numeric variable.And one job of this thing is to do a let us say, if there are 12 time periods, so 1, 2and then 10, 12.So I will put a value of 1 here, all of these values are 1 let us say.And I will see that okay, so if the predicted value here, predicted value here is if thisvalue is 1, so if this, I will actually VlookUp this month value, the small table and I willput F4 because I will drag later and then 2 comma false.So you take from the second column and I will drag it up.So all 1’s because they are, now what is the error part?The error part, how much is error remaining after this thing?Error is nothing but basically, or error square is this minus this square.That is the error square that I got.And what is RMSE?RMSE is equal to basically square root of mean of or average of, square root of averageof this particular column.So I got the RMS.Now I have to find out such values here such that this minimizes.So what I will do, I will go to the data solver, I will minimize i15 by changing what?By changing these values and these values now can be negative as well.So not positive and solve.And the solution is giving me something and carefully see what is solution giving.So the solution is saying that, that this is the thing that will repeat itself.This is the thing that will repeat itself.And how the repetition looks like?So this particular table, so 1 to 24, this will go on repeating itself.And you will find out this thing.So if you remember, the table was, the actual SMA values were these, the actual SMA value.So, if I just draw two lines, this is the, actual series 1 is the red line which is theactual line.And the red line is the one that I predicted and they are more or less same.They are pretty good in terms of the predictions.So, I can say that, so this particular thing can actually talk about the trend part.So this is something that is done that I can find out.I can also run if you want, you can run a regression equation with timestamp minus 12as your variable.So, this minus 12 or this is square as your, as your x variable and try to predict.Now, this is how I am trying to predict the trend part, how the trend will look like.So if I, if you ask me that okay, what will be the trend component here?I will find out, okay, so this value is 12.Whatever value is corresponding to 12 put that.So the value is 650.So I will just drag it up for the next probably 12 and then again 1, 2, 3, 4, 5, 6, 7, 8,9, 10, 11, 12.So, next one year’s data I know that this one will be the trend component.So this is something which I know.Now, next component, now the problem with this model is that I am saying that the trendis same, fixed.There is no going up or going down.So sometimes we prefer, that is why I am saying that you can also run a regression equation,regression equation will give you a movement but it will be slightly going up.So you can actually instead of doing this, you can actually do this one as well.So, you can probably delete this and try to find out SMA as a function of this time seriescomponent.So if you remember it went up and then came down and then again went up and then camedown and so on.So, I can use this month 1 and then timestamp both as my x variable, these two is my x variableand this is my y variable and try to predict that.And that will also give me some kind of solution, the problem is that here you have to treatthese values, the month values as categorical variable.So, you have to open it in R file, change this one to a factor variable and then runthe regression.This is your y variable, this timestamp is your x variable which is continuous and monthshould be a categorical variable because month 1, month 2, month 3 are actually January,February, March.This has to be treated as factor variable.You can find out equation for your trend component.Now what is to do about this one, the error part which is, now here what is this errorpart?The previous slide, the previous slide error part is we have actually already consideredthe….So we have considered as I told, X t is equal to what?Trend t plus St plus Et So what then?What is that?Xt minus Tt is absolutely seasonality and error part.So this is the part that I will focus on.The seasonality part and the error part and I saw that the seasonality is either 2 monthsor 3 months or something like that.So I can do the same thing that I did earlier.I can say that okay, this is of 3 months.And the values are 1, 1, and 1.You can try out with two months also.I do not know which one will give better result.The methodology is same.So I say that okay, VlookUP, VlookUP, basically I have to repeat this particular value everythree times.So if I have to repeat this particular value every three months, I have to divide it by3, this month I have to divide it by 3 and try to find out, so the timestamp, for example,this timestamp I will actually divide it by 3 and I will find out what is the remainder.So what was that?So, remainder is probably, so if I am not wrong which is, I have to find out the remainder,mod.You can see, remainder after a number is divided.Mod of this by 3.So this, 3 basically.This comma 3.So that gives you the remainder.So see, 1, 2, 0; 1, 2, 0; 1, 2, 0 gets repeat.So this is the season component and I am saying 1, 2, 0, these are my three seasons.And I am saying that VlookUp, VlookUp of what?VlookUp of this particular value, the season value.For this one, it is this one is value and then what?In the small table, and I have to put 4 because I will actually drag it and then 2 comma false.So that gives me the values and if I just drag everyone comes one.The last value will be not there.So this is my predicted seasonality, predicted season.And what means the error now?The error now after seasonality, so error 2, error 2 is nothing but basically this,this is what?Is this, square it up, no, I have not squared it up.So basically, is equal to this error minus this predicted square, square of that andthen how much is the RMSE?The RMSE is nothing but the square root of average of this particular column.Not the 0, 1; not the last 0, 1.So, if I just do that and press an enter, I get the, so what is then?I have to again optimize this RMSE.How to optimize?I will go to data, click on solver, and the solver saying that this is my reference cell.And which one I will change?I will change these three cells.And I unconstrained, so solve.And it gave me solution and it told me that this month should be the seasonality.519 and minus 27, and this is the predicted values and this is the actual values.And now RMSE is coming as 65, you can also try out with two time periods and see thatwhether RMSE is coming up further lower.And if I just try to predict, plot these two and see that how good it is, I will see that,not so good probably.I should have a better option than this one.So, I have to find out whether some other variable is predicting better or not.So, probably two time period or three time period.Two time period is something probably might work better.You have to check that.For example, how to do that two time period?The simple thing is if I just change it 2, then this one will be 1 and 0, so I removethis and then I put this as here, 1 and 0.And then what is the VlookUP of C3?In K2 to L3, 2 falls, so this part remains same.And then the error also remains same, RMSE remains same and then I will run the solveronce more.65, if you should remember that previous one was 65 or something like that.Solver, solve it.So L2 to L3, solve.And now it went up actually.It did not work well.So I will go with the three time period only.And if I just see the picture here, the picture is coming up to be something like this, whichis worse.So, previous one was better so something like this you can do to find out the undulations.Now, there can be other things.These undulations, they are one that we could not predict which is still remaining here.The square of that is still remaining here, can be actually predicted by many other factors.Not everything is actually coming from your past.So you do not have to worry if there is something which is still left after removing out theseasonality component and trend component.That remaining part is something that you try to regress on price or let us say somethingelse.So what we do is in that case that we try to find out this that okay, so now I havefound the seasonality component.So, I say that Xt minus Tt minus St is equal to Et . This should be, this should be somethinglike, it should look like this.But it is, no trend component is there.And some, so absolutely this is 0 thing and it is actually distributed over 0.Now what are these particular spikes?Why these spikes are coming?I will, what I will do is I will write E t is probably alpha plus beta 1 into P t, myprice.And then beta 2 into P dash t, someone else’s price and so on . And then if I have let ussay advertisement expenditure, if I have price, if I have probably some other kind of promotionsgoing on, all those variables will now come in.After the time series component has been taken care of, all those components will now comein to do the predictions.There can be also some things where you can say that okay, so there were cases where wesee that the sales actually is not only related to current pricing but also related to pastpricing.For example, let us say during the festival season before the festival season or duringthe festival season, during the festival season prices goes up.Before the festival season they give huge discounts, 50 percent discount in pantaloonsand let us say Big Bazaar and here and there.Lot of huge huge discounts and people go and buy.Big billion day, after big billion day you will see for a very long period of time, probably1 month or 2 month there will be a dip.So the sales is very high in big billion day or etcetera.Will it actually impact the sales later because people has accumulated and stored lots ofstuff and they have created their inventory and they will not buy?So you can say that the pricing at T minus 2, T minus 3 of time period is actually affectingyour current sales, also the sales at that time actually affecting the current sale.And so those kind of factors are there that those are ergonometric issues that whetherthe price of two time periods back is dependent variable independent variable or the salescomponent is an independent variable.But in general, if I actually remove the sales part, if I remove the seasonality part thenI generally do not have an issue.There are other methods, there are other methods but here if you see I am only consideringthe past effect of three time periods at max.But there can be a scarce situation where the sales of the current time period is dependenton the sales of three time periods or two time periods back.So we can also use concepts of ARIMA, ARIMAX is actually classic example.ARIMA part is the part where you take care of the component which is the time seriescomponent, you remove all of those stuff and then you also include the cross-sectionalcomponent in the x variables.So that is also something can be used in forecasting.We are not actually focusing on that part because that is something which will be, whichitself is a course.The time series forecasting and etcetera itself is a, is probably 20 hours, 30 hours courseand it will actually overshoot our this particular course’s purpose.So that is why we are not going to that but this is something that is very basic, anybodycan use it in any context and can get fairly good result.So by saying that, I will be meeting you in the next video and we will talk about a littlebit of a different kind of forecasting and different situation.We call it mass diffusion model where we will see in the next slide about it.Thank you very much for being with me.
Log in to save your progress and obtain a certificate in Alison’s free Segmentation and Demand Focusing in Marketing Analytics online course
Sign up to save your progress and obtain a certificate in Alison’s free Segmentation and Demand Focusing in Marketing Analytics online course
Please enter you email address and we will mail you a link to reset your password.