Plots and Charts and Loading Data
00:00 Speaker 1: Hello, everybody, and welcome to part seven of section two for our Data Visualization with Python and Matplotlib tutorial series. In this part, what we're gonna be talking about is scatter plots. So with that, let's go ahead and get started. An example of when we might actually use a scatter plot is if we're trying to show maybe groups or correlation categories, stuff like that, sometimes you'll use a scatter plot. Now let's cover an example where you might do it and we'll do a correlation or a trend example. So I think we could continue on our last example which was with these test scores and see if we can't find some sort of correlation with test scores and time spent on tests or something like that.
00:55 S1: So the way that we might do that is with a "time_spent" variable and from there we can just populate that with supposed times that students spent on the test. So the question really is time spent on test, and the reason why we might ask that question is to kind of derive whether or not more time on the test equaled, in general, a better grade on the test. So from here, let's go ahead and get started. Let's say the students had an hour to do the test, and this is how maybe long people spent on it. So just to make it easy for me to see when we're done, I'm gonna go ahead and add one more space there, that way everything lines up, and let's get started. So I'm just gonna put... You can put whatever numbers you want in here. I'm just gonna kind of make them up. So here we go.
01:47 S1: 11 minutes, this guy spent 10 minutes, this guy spent 22, this guy spent 23, 28, 32, 54, 55, 43, 23, 53, 33, 23, 64... No we can't spend 64, 55, 23, 33, 38, 48, 22, 35, 37, 42, 29 and 12. Okay. So we've got test scores and time spent, and we don't need bins, and we really don't need any of this. So the next thing that we would do is we would do a scatter, and the way that we do that is with plt., and you might be able to guess it, scatter. Then we wanna scatter and we'll just do time spent and then the test scores. So the x-axis will be time spent, the y-axis will be test scores. So we can plot that up and there we have it. Now, there appears to be possible a trend to this way, but it's really hard to tell and then also, this is a really great example of a graph that really, really, really begs us to have some labels and a title, [chuckle] because otherwise you look at this graph and you have literally no clue what is happening.
03:22 S1: So, we might... You maybe do the "plt.xlabel" and this was time spent on test, and then "plt.ylabel" and that was test score. Okay. And then we could also add a title like "plt.title," test scores versus time spent, something like that. So now we have it and we can kind of see there might be a trend, but it's certainly not a very strong trend, but of course this is just fake data. But if this was the real data we would say there's maybe a slight correlation but not a really strong one. Anyway, it's just a nice, simple scatter plot with Matplotlib, really not much more to it than that.
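A minimal sketch of the scatter plot described above. The time values are the first ten the speaker reads out; the test scores are not listed in the transcript, so the ones below are placeholders, and the helper name `draw_scatter` is only for illustration.

```python
import matplotlib.pyplot as plt

# Minutes spent on the test (first ten values read out in the video).
time_spent = [11, 10, 22, 23, 28, 32, 54, 55, 43, 23]
# Placeholder scores: the transcript does not give the real list.
test_scores = [55, 48, 61, 70, 72, 78, 90, 88, 81, 65]


def draw_scatter(x, y):
    """Scatter y against x, then add the labels and title the graph needs."""
    plt.figure()
    plt.scatter(x, y)
    plt.xlabel("Time spent on test")
    plt.ylabel("Test score")
    plt.title("Test scores vs. time spent")
    return plt.gca()  # the current axes, handy for inspection


if __name__ == "__main__":
    draw_scatter(time_spent, test_scores)
    plt.show()
```

Both lists must be the same length, since each (x, y) pair becomes one point.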
04:15 S1: Now, sometimes with scatter plots you might wanna show multiple data sets. Okay. So like I was saying before, a lot of scatter plots are used to show categories or groups as well. So you might have something like this. We'll get rid of all of this for now. Actually, we'll just leave all that and then we'll plot a second graph. So for example you might have "x1 =" and "y1 =", and then maybe you've got one, two, three, four, five... Five's enough for now. And then we'll have two, three, two, four, two. Okay. And actually let's make... Let's just say x is one, two, three, four, five, and then we'll have different y's. And then y2 will be eight, eight, six, seven, six. Okay. Now, conceivably you might see something like this, "plt.scatter(x, y1)," and then you can change the marker. So you can say the marker... By default, the marker is, I believe... The default would be an "o" possibly, so let's... We'll keep that, I guess, and then let's do color equals, and then we'll do "c" for cyan. Then let's do another one, so "plt.scatter(x, y2)," and then we'll say this marker equals, and we'll do a "v", which I believe will be an upside-down triangle, and then the color.
06:00 S1: Here it will be magenta, and now let's do a "plt.show". The first one will just be the test grade thing, but then we'll come down here, and yep. So now you see that you've got maybe possibly a group up here and then a group down here. So you can use scatter plots in that regard too and plot different groups and clearly see the difference between them. So anyways, that's it for scatter plots. In the next tutorial, we're gonna be covering stack plots, which are kind of a way to show all the parts that add up to a whole, and usually that's where I see them used. So, we'll use company expenditures as an example of why you might use a stack plot. So anyways, that's what you guys have to look forward to, so stay tuned for that.
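The two-group example can be sketched as follows, using the numbers from the video. The function name `draw_groups` is an assumption made for this sketch.

```python
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y1 = [2, 3, 2, 4, 2]  # lower group
y2 = [8, 8, 6, 7, 6]  # upper group


def draw_groups():
    """Plot two data sets on one figure, told apart by marker and color."""
    plt.figure()
    plt.scatter(x, y1, marker="o", color="c")  # circles, cyan
    plt.scatter(x, y2, marker="v", color="m")  # downward triangles, magenta
    return plt.gca()


if __name__ == "__main__":
    draw_groups()
    plt.show()
```

Each call to `plt.scatter` adds one collection of points to the same axes, which is why both groups end up on a single graph.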
00:00 Speaker 1: What's going on, everyone? And welcome to the eighth part of our Section Two, all about data visualization with Python and matplotlib. In this part, what we're gonna be talking about is a stack plot. So I'm gonna go ahead and delete basically everything except for the original import here, and we're gonna be talking about stack plots now. So the idea of a stack plot is, it's a way to kind of visualize the entire whole but also see how the parts make up that whole. So it's like a pie chart, only it's a pie chart that also has another axis to it. So a pie chart at any one time really only has the one dimension to it, but this will have two dimensions. You have an X and a Y. So a pie chart can only show you a slice of a pie at any given time, whereas a stack plot can show you the slices of that pie over time. So that's what we'll be doing now.
00:56 S1: So let's go ahead and say... Let's make the assumption that we are operating a company and we've been operating this company for 10 years. So we're just gonna have I guess a year here and that will be a list, and we're gonna have a one, two, three, four, five, six, seven, eight, nine, and 10. So our company's been alive for 10 years and then we've got some expenditures. Now of course, a real company has just hundreds of things but we'll just make up a few for now. We'll say "Taxes," and we'll just assume that these numbers are in the thousands. So we've got taxes. So we paid on the first year, $17,000 in taxes and then 18,000, then 40, then 43, then 44,000, then 8,000 we didn't make much that year, the 43,000, 32, 39, and 30. So that should be one, two, three, four, five, six, seven, eight, nine, 10. Right? Yes. Okay. So that's 10. Now we're just gonna say... We'll say our overhead, and these are just the cost of production basically for us to make whatever the trinket is that we happen to make. So we'll say 30, 22, 9, 29, 17, 12, 14, 24, 49, and 35.
02:23 S1: Then we're gonna have some entertainment, and these are just kind of like the frivolous spending that we might do in the company. So we'll do this 41, 32, 27, 13, 19, 12, 22, 18, 28, and 20. That should also be 10. Yes. Okay. Now, what we can do is we can stack all these up, so we can do "plt.stackplot," and we can do year, taxes, overhead, entertainment. So the way stack plot works is it takes the parameter X and then as many Ys as you want as arguments. So with methods you've got args and kwargs, which are keyword arguments, and you have to pass the arguments before you get to the keyword arguments, but you have an unlimited amount of arguments that you can pass here, so inside stack plot it's really probably something like "for arg in args[1:]" or something like that, and those are your Ys, for those who are curious. Anyways, so we'll stack plot that bad boy, and we'll do the "plt.show." So let's see where we are at the moment.
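A short sketch of the stack plot call with the expense figures read out above (in thousands). The helper name `draw_stack` is illustrative only.

```python
import matplotlib.pyplot as plt

year = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
taxes = [17, 18, 40, 43, 44, 8, 43, 32, 39, 30]
overhead = [30, 22, 9, 29, 17, 12, 14, 24, 49, 35]
entertainment = [41, 32, 27, 13, 19, 12, 22, 18, 28, 20]


def draw_stack():
    """Stack the three expense series on top of each other over the years."""
    plt.figure()
    # First positional argument is x; every one after it is another y series.
    layers = plt.stackplot(year, taxes, overhead, entertainment)
    return layers  # one filled polygon collection per series


if __name__ == "__main__":
    draw_stack()
    plt.show()
```

The top edge of the stack at each year is the sum of the three series for that year, which is what makes the "parts of a whole over time" reading possible.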
03:40 S1: So there you go. You've got a nice stack plot, and this is kinda how it all works. So these are company expenses basically even though taxes wouldn't really be an expense, but we're just using them. But here's the other question though when we look at this graph, we're like "Hmm, which is which here?" Right? It's very difficult but unfortunately, there's really no, with stack plot really no way for us to label with legends. So, we have to get a little hacky with it but we can do this. So first of all what we can do is we can... Because what we have to do is, if we want to spoof a legend so to speak, what we can do is something like this. We can say, "plt.plot," and we can plot just empty sets. So X, Y of nothing and then we can assign values. So we could say color here is equal to magenta, and then we can give it a label. We can say label equals taxes. And then we could say," plt.plot," and then again we will plot an empty set so empty set and then we would say color equals cyan and then label equals overhead.
05:12 S1: And if we don't force the colors like this, it will cycle through colors. So this would be blue, then I think it goes green, red, then it does, I don't know, purple and black, and it cycles through. So if we don't force the colors, it'll just kind of cycle them to be different colors. But then we can do this, so "plt.plot," and we'll just continue along here: empty sets, color equals, and we'll make this one blue, label equals entertainment. And then what we can do is come up... Or actually down here to the stack plot itself, and we can pass the colors as a list. So we do colors equals, and this will be a list of colors, so we have three colors. And our taxes were magenta.
06:00 S1: And then our overhead was cyan, and then we had a "b" for entertainment. So now we have the proper colors, and then all we have to do is call "plt.legend," and while we're at it, let's do a "plt.title" of company expenses, and "plt.xlabel" will be year. And then "plt.ylabel" will be cost in thousands. Something like that. Okay. Let's save and run that. And now... Oops, we forgot to... Oh, we did "plt.legend" without the parens there. Anyway, there we go. And now we've actually done it. So here we can see, okay, this is how much we're spending on entertainment. We really cut back on entertainment these years. Really cut back on entertainment and overhead. And we paid a lot less taxes, look at how that worked out for us. Anyway, that was by luck, moving on. So we have our colors and we can kinda see. But again, we're seeing how our legend is actually in our way this time, but we can move things down and do this. There we go. It's still in the way. There's no way to get away from this problem. One thing you can do, actually: if you use the zoom button, you can click it. And normally you click and drag to zoom in, but if you hold your right mouse button instead, it actually zooms out in accordance with whatever your selection was.
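The legend workaround described above can be sketched like this: empty line plots carry the labels and colors, and the stack plot is forced to use the same colors. The function name `draw_stack_with_legend` is an assumption for this sketch.

```python
import matplotlib.pyplot as plt

year = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
taxes = [17, 18, 40, 43, 44, 8, 43, 32, 39, 30]
overhead = [30, 22, 9, 29, 17, 12, 14, 24, 49, 35]
entertainment = [41, 32, 27, 13, 19, 12, 22, 18, 28, 20]


def draw_stack_with_legend():
    """Stack plot with a 'spoofed' legend built from empty line plots."""
    plt.figure()
    # Empty plots draw nothing, but they register a label and a color.
    plt.plot([], [], color="m", label="Taxes")
    plt.plot([], [], color="c", label="Overhead")
    plt.plot([], [], color="b", label="Entertainment")
    # Same colors, same order, so the legend matches the filled layers.
    plt.stackplot(year, taxes, overhead, entertainment, colors=["m", "c", "b"])
    plt.legend()  # note the parentheses; without them nothing is called
    plt.title("Company expenses")
    plt.xlabel("Year")
    plt.ylabel("Cost (in thousands)")
    return plt.gca()


if __name__ == "__main__":
    draw_stack_with_legend()
    plt.show()
```

Recent Matplotlib releases also accept a `labels=` argument on `stackplot`, so depending on the installed version the empty-plot trick may not be needed.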
07:26 S1: So keep that in mind 'cause you can do stuff like that. So now we can do it like this and force ourselves to have more space in our graph. That would be better, so we can kinda cheat. Anyway, okay, so that's just stack plots and zooming out and all that kinda fun stuff, and also kinda spoofing a legend. So there are some plot types that legends just don't work with, and stack plots are probably the most common. I'm trying to think of some of the other ones, like fills. If all you plot is a fill, that's not a line plot, and so it doesn't get a legend entry. So basically anything that's a polygon. So anything that fills in between lines generally is not gonna be legend-able. But you can always do this to create kind of a fakish legend. I'm not really sure why they don't just do something like this in the background, even if it's just a fill and you really wanna have a label. But they don't. So anyways, you gotta kind of hack your way through it. But that's okay. So anyway, that is stack plots, and in the next tutorial, since we were talking about how stack plots are like pie charts with a timeline, we're gonna talk about pie charts next. So anyways, stay tuned for that. Thanks for watching.
00:00 Speaker 1: What is going on everyone? Welcome to part nine of Section Two with data visualization using matplotlib in Python. In this part what we're gonna be talking about is pie charts. So, pretty nice inclusion to matplotlib is pie charts, and they do a few nice things behind the scenes for us, like automatically converting to percentage of the pie, and so on. So, with that, let's hop right in. First thing that we're gonna do is the habitual deletion of everything except for that first import and we're ready to rumble. So, with pie charts in matplotlib, you generally give it the... Suppose your X is your amount of the pie, and then your Y would be your labels, and then you can pass through colors, if you want, and then you can pass a start angle, even, if you want, so if you want your little pie chart to be nicely oriented, you can do that, and then also you can fill in percentages too if you wanted, I'll show you guys how to do that using the code from matplotlib, and that's about it.
01:07 S1: So, let's go ahead and get started now. So the first thing that we're gonna do is we're gonna assign some labels. The labels will be a list or a tuple but we'll make it a tuple for now. And these'll just be the slices, they're gonna be ordered, and they will be plotted this way, and generally the plots will go counterclockwise, just in case you really care. So, we've got our taxes, we've got our overhead, and we've got our... And oops! We've got our entertainment. Okay. And then we're gonna say our sizes of these, and we'll make this a list, the sizes will be 25, 32, 12. And then we specify the colors. I don't really remember what the colors were before, but we'll do, let's do cyan, magenta, and I think we did blue. We'll do that, I'm not sure if those were the ones that pertain to these specifically, but that'll be fine. Now, what we're gonna do is, we can do plt.pie and then we pass the X, which was sizes. Then we have your labels, and actually labels should equal "labels" because really all you have to do... You really could just probably get away actually passing sizes, I'm pretty sure. Let's go ahead and just run a plt.show, see what happens.
02:42 S1: Yeah, so this'll give you a pie chart without really anything to it, [chuckle] so that's okay. And another thing that we can do as well is do "plt axis" and we can pass equal here, let's just make sure, see what that does for us, right. So, if you notice, the default was kind of a tilted pie chart. People like to do those tilted pie charts for some reason, but if you don't want it to be tilted like that and kind of distorted, you can use equal there and that will keep it from doing that. Now, because those sizes don't really mean much to us without labels, we're gonna add labels and we're also gonna force those little colors like we did before. So pie, you've got sizes, then we'll say labels equals labels, and then we'll say that colors are equal to the colors, we can pass start angle and we say that is equal to 90, okay? So let's go ahead and run that now. And so now you can see we've got our labels. This is our taxes, this is our entertainment expenditure, and this is our overhead, so that's it. And the start angle, by the way is, you started at 90 degree angle, that's why, so here's your straight up-and-down line, and then again, things are plotted counterclockwise, so you've got taxes, overhead, entertainment, and that's why it did it that way.
04:05 S1: Now... Oh right, so the percentages on the actual chart, you could do something like this: "autopct" equals, and then you can use "%1.1f%%", and then now let's try this one more time, and now you can see that you've got your percentages here. So you can do that. Another thing that we can add is a "shadow equals True", so this should add a little shadow to it, so you can kind of see a three-dimensional element to it. So now we've done the shadow, maybe this won't look so goofy without the thing. I don't know, I still feel like this looks really distorted to me for some reason, I don't know, I don't really like it. You can make up your own mind though. [chuckle] But I don't seem to like it. The shadow's okay, though, I don't mind that.
05:01 S1: So, got shadow, and another thing that we can do is we can add an explosion. So an explosion is kind of where we pull a piece out a little bit, so for example, we could add explode, so we could say "explode equals", and then we can pass through, since we have three elements in this pie chart, you could have zero, zero and zero, oops, and to explode something just a little bit, we could do 0.1. So this means that... Oops, we forgot our comma. This means the second piece, so overhead, will be kind of pulled out a little bit, like we're about to eat a piece of that pie. So there you go, it's pulled out a little bit. But we could pull it out quite a bit, [chuckle] so as you can see it's pulled really far out. You could also... You could pull them all out if you wanted. You could do 0.1, 0.1 and 0.1, something like that and then they would all be kind of apart from each other and stuff like that.
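Putting the pie chart options together, a minimal sketch with the labels, sizes and colors from the video. The helper name `draw_pie` is made up for this example.

```python
import matplotlib.pyplot as plt

labels = ("Taxes", "Overhead", "Entertainment")
sizes = [25, 32, 12]      # converted to percentages of the total automatically
colors = ["c", "m", "b"]
explode = (0, 0.1, 0)     # pull only the second slice (Overhead) out a little


def draw_pie():
    """Pie chart with labels, forced colors, percentages, shadow and one exploded slice."""
    plt.figure()
    wedges, texts, autotexts = plt.pie(
        sizes,
        explode=explode,
        labels=labels,
        colors=colors,
        autopct="%1.1f%%",  # one decimal place followed by a literal percent sign
        shadow=True,
        startangle=90,      # first slice starts at the straight-up position
    )
    plt.axis("equal")       # equal aspect ratio keeps the pie circular
    return wedges, texts, autotexts


if __name__ == "__main__":
    draw_pie()
    plt.show()
```

Slices are drawn counterclockwise from the start angle, in the order the sizes are given, and `explode` needs one entry per slice.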
06:11 S1: Okay, so that's about it with pie charts. But as you can see, there's a lot of little customization stuff that we can do. So, that's pretty cool. In the next tutorial, what we're gonna be talking about is loading data from files. So actually, the next two tutorials, that's what we're gonna be talking about. So, this would be kind of our first entry into reading data from other sources or whatever, but a lot of times people have data that's maybe in a CSV or even a text file, but it's usually separated by some something.
06:41 S1: So, we can use the methods for opening CSVs on just about any file, same thing with text files. And so, we'll talk first about the native CSV module that we... That's come part of our standard library with Python. Talk about how to use that to open up a CSV file and then we'll talk about using NumPy to do it as well. NumPy is a little better at doing pretty much everything. It's a very efficient module whereas CSV modules aren't really that great, but for small files you probably won't notice the difference. But anyway, that's what you guys have to look forward to, so stay tuned for that.
00:00 Speaker 1: Hello, everybody, and welcome to the 10th programming tutorial with data visualization in Python using Matplotlib. In this tutorial, what we're gonna be talking about is loading data from a CSV file. So, to do that we're gonna go ahead and use the CSV module. Now, first things first, the proverbial "delete all." Okay, and now we're ready to continue.
00:26 S1: So, I have been coding all of our code in this little video code directory. So that's where we're going to go ahead and put this script as well. If you're on Windows, you can code with local paths, if you're not on Windows you'll have to still give the full path but let's go ahead and create a new file. And I'm just gonna call this, "example." And for now, it's a text file. We should be able to get by with a text file. It shouldn't matter but you can also make yours a CSV or whatever you want. But then in this example file, I'm just gonna have a one, two, three, four, five, six, seven, eight, nine, and a ten and a 10 we'll go ahead and add some commas, not periods. Comma, comma, comma, comma, comma, comma, comma. Okay, and then what we're going to do, those are like our Xs and then we're gonna add some Ys, let's just make some stuff up. You don't have to copy me perfectly. That 78's gonna have to be something else. There you go. Okay, so, save the example there and that's basically it. So tutorial over. No.
01:32 S1: We'll move this aside now. And in our code, we wanna be able to reference that example file. So, what we're gonna go ahead and do is we're gonna say... First we need to "import csv", and that's part of your standard library, so every installation of Python, at least after three, should for sure have it. I'm pretty sure it comes in two as well though. And let's just make an empty list for X and for Y. Now, what we're gonna go ahead and do is we're going to say "with open", and this will be example, and I made mine a text file so I'm just gonna have "example.txt", with the intention to read, "as csvfile". We're gonna say "plots equals csv.reader" and it's gonna read the csvfile, and the delimiter equals a comma, so obviously you could have just about any delimiter you want. Like for example when I do maybe text data, so like I was saying before I worked with Natural Language Processing, so in sentences a lot of people have commas. So generally my delimiter is like a triple colon like that, 'cause that never really occurs in text, so that's what I'll use. [chuckle]
02:52 S1: Anyway, so you've got plots equals that and then we're gonna say "for row in plots", we're gonna say "x.append", and because we're reading text, this comes through as a string. So we have to convert. If you have floats in there you'd convert it to a float, but we know that in the file it's all integers. So we're going to convert to int, and the zeroth element of that row is our X, and our Y is the int value of the first element of the row. Cool. So that's basically it with that. So we've populated X and Y, so all we have to do now is plot it. So, "plt.plot(x, y)," the label is going to be loaded from file, and then we're going to have "plt.xlabel" and that will be our plot number, then "plt.ylabel" is gonna be randomly chosen tutorial number, and then we'll go ahead and have a "plt.legend" and we'll do "plt.title" and we'll call this "awesome graph." And finally, our "plt.show." So save and run that, and there we go, and there's our information loaded from our example file.
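A sketch of the csv-module approach just described. It assumes one "x,y" pair per line with integer values; the Y values in the video's file are not given in the transcript, so the sample rows and the file name `example_sketch.txt` below are placeholders, as are the helper names.

```python
import csv

import matplotlib.pyplot as plt


def load_xy(path):
    """Read comma-separated 'x,y' rows from a text file into two integer lists."""
    x, y = [], []
    with open(path, "r", newline="") as csvfile:
        plots = csv.reader(csvfile, delimiter=",")
        for row in plots:
            if not row:  # skip blank lines, which the reader yields as []
                continue
            x.append(int(row[0]))  # values arrive as strings, so convert
            y.append(int(row[1]))
    return x, y


def plot_xy(x, y):
    """Line plot of the loaded data with a label, axis names, legend and title."""
    plt.figure()
    plt.plot(x, y, label="Loaded from file")
    plt.xlabel("Plot number")
    plt.ylabel("Randomly chosen tutorial number")
    plt.legend()
    plt.title("Awesome graph")
    return plt.gca()


if __name__ == "__main__":
    # Placeholder data written to a hypothetical file so the sketch is runnable.
    with open("example_sketch.txt", "w") as f:
        f.write("1,5\n2,3\n3,4\n4,7\n5,4\n")
    xs, ys = load_xy("example_sketch.txt")
    plot_xy(xs, ys)
    plt.show()
```

If the file held decimals rather than whole numbers, `float(...)` would replace `int(...)` in the two conversions.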
04:36 S1: Now, that's with the CSV module and in the next tutorial, we're going to cover how to use the NumPy module which is a third party module. It doesn't come with Python but it's a super useful module. You guys may actually have it depending on how you installed matplotlib. It's also probably one of the most popular modules alongside matplotlib that people just have because it's such a useful module. So anyways, that's what we're going to be talking about in the next tutorial so stay tuned for that.
00:00 Speaker 1: What's going on, everybody? Welcome to the 11th part of our second section for data visualization with Python and matplotlib. In this tutorial, we're gonna be talking about using NumPy to load data from files. So NumPy is generally a faster method for loading data from files and you can also do some number crunching as you're loading them in. We're not really gonna be talking too much about that, but we do wanna show using NumPy simply because down the road especially you'll find yourself probably using NumPy, and using it to import data from files just makes a whole lot of sense. So that's what we're gonna be talking about here. So, we'll go ahead and basically leave... We're just gonna delete this. And then, we need to get NumPy. So first of all, make sure you have NumPy. So try to do "import numpy", for example, in a... Just open up IDLE and do "import numpy". And if you can import NumPy, great. If you can't, you need to get it. To get it, you would do "pip install numpy" like that.
01:11 S1: If that says no program or whatever found for pip, you just do C:/Python, whatever your version is, mine is 3.4, so "C:/Python34/Scripts/pip install numpy". So make sure you get NumPy, and then what we're gonna go ahead and do is we're gonna say x, y... So yeah, we don't need these actually. "x, y equals np.loadtxt" and then you specify the text file or the... Again, it's like csv, right? With csv, we loaded a text file; here with loadtxt you could load a CSV, it does not matter. So loadtxt, and then we're gonna say "example.txt". Then we're gonna say what our delimiter... Oops, that's not supposed to be in quotes. Delimiter equals comma, and then we're gonna say unpack equals True. So unpacking is basically what we're doing here. So when you have a function, say "def simple", and this function returns five and eight, let's say, and you say... Let's do "y, u = simple()", what you're doing is you're assigning five to y and eight to u, and that's what's called unpacking.
02:47 S1: So if you just did this, like "y = simple()", y would be equal to five, eight, like a tuple. And in fact, actually, that may not even work, but let's run that really quick. NumPy not defined. So we have to import NumPy. So first let's do "import numpy as np". Let's run it one more time. X and Y must have the same... Oh, I know what we've done. What we need to say is like "i = simple()". We'll try again. I'm really just trying to run it. Okay, so it did work. So if we come over here... So first of all, this graph did indeed load, and if we come over here and we ask what's... Oh, wait, well, the graph is up, so let's close the graph, and we say i now, and i is indeed equal to the tuple of five, eight. But if we say like "i, u = simple()", we're unpacking into those values. So, i, u, and so that's basically what's happening up here. When we say unpack equals True, we're unpacking the values that are split by this comma delimiter, we're unpacking them into x and y in the order they appear on each line.
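Both ideas from this part, tuple unpacking and `np.loadtxt` with `unpack=True`, can be sketched together. An in-memory `io.StringIO` buffer stands in for the example file here, and its contents are placeholder values rather than the data used in the video.

```python
import io

import numpy as np


def simple():
    """Return two values at once; Python packs them into a tuple."""
    return 5, 8


i = simple()     # no unpacking: i is the whole tuple (5, 8)
y, u = simple()  # unpacking: y gets 5, u gets 8

# Placeholder file contents, one "x,y" pair per line.
fake_file = io.StringIO("1,5\n2,3\n3,4\n")

# unpack=True returns the data column by column, so the first column
# lands in xs and the second in ys (as float arrays by default).
xs, ys = np.loadtxt(fake_file, delimiter=",", unpack=True)
```

With a real file, the buffer would be replaced by the file name, for example `np.loadtxt("example.txt", delimiter=",", unpack=True)`, and the resulting arrays can be passed straight to `plt.plot`.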
04:04 S1: So, as you can see we removed quite a few lines that were used for the whole CSV operation. So you can see this is just even programming wise far more efficient but NumPy is a C accelerated module so it uses the language of C and in a lot of cases a lot faster. This isn't really all that you can do with NumPy either and you can load files that have date stamps and convert them and all kinds of fancy stuff.
04:31 S1: So, anyway, that's it for loading data with NumPy, and really the two popular ways you might load data into matplotlib. Another one would be with Pandas or something like that, but that's a whole, another tutorial. So, anyway, that's it for loading data with NumPy.