Basic Customization Options

00:00 Speaker 1: Hello, everybody, and welcome to Part 2 of Section 3 of our Data Visualization with Python and Matplotlib tutorial series. In this part, what we're gonna be talking about in this section are a lot of the customization options that we can do with Matplotlib. But first we need to have a decent dataset. So we could go through and we could manually create this dataset. That would be kind of silly, especially when there's plenty of datasets that we can get on the Internet. So the dataset type that I'd like us to use is stocks. So with stock data, it represents a pretty good tutorial dataset for quite a few reasons actually. First of all, stock data tends to come with multiple series. So what I mean by that is it has multiple types of data, so a lot of times you'll get the x-axis will be dates, so that's one interesting aspect, but also you'll have pricing data, and sometimes the pricing data will be bid/ask if you're paying a lot of money for your data anyway, or you can get open-high-low-close data, and so that's automatically four lines. We'll talk about what open-high-low-close is when we get to it.

01:12 S1: And then also you might get volume data, which is how many shares of that company were traded in that time period. So with that, it's another dataset. And volume is a completely different scale than price is, so it allows us to cover a lot of customization, plus we can also do various kinds of transforms on the data. So we can apply moving averages, or whatever, and plot those as well. So it allows us to kind of work with a lot of variations of datasets. And then also, like I was saying before, because it's a time series, we're working with dates and stuff like that, so plotting dates, dates are not numbers, so your graph does not automatically understand what a date is. So what we have to do is we usually have to convert the data that we have into some sort of date format that is accepted by our module.

02:03 S1: So Matplotlib has its own little mpl dates format that it likes. And for example, another charting module you might find yourself using down the road, maybe, would be like some sort of JavaScript plotting module. So the one that comes to my mind would be something like Highcharts, which takes data in the form of Unix time times 1000 (milliseconds), that's like the JavaScript time. So you need to be comfortable with converting date stamps to Unix and Unix to date stamps, and all kinds of formats like that. So working with stock data is just very useful, so that's what we're gonna go ahead and do. So first we're gonna do our proverbial 'delete everything,' except for matplotlib.pyplot. And apparently I've lost my mouse. There it is. And so, we'll delete everything up to this point, and now we're gonna go ahead and import one more thing from Matplotlib, so import matplotlib.dates as mdates. Again, most people that use matplotlib.dates shorthand it to mdates, so we're just gonna continue following that standard.
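To make the date conversions mentioned above concrete, here is a minimal sketch using only the Python standard library. The sample date stamp and the helper names (`datestamp_to_unix`, `unix_to_js_millis`, `unix_to_datestamp`) are illustrative assumptions, not code from the video, and every date is treated as midnight UTC.

```python
from datetime import datetime, timezone


def datestamp_to_unix(stamp, fmt="%Y%m%d"):
    """Convert a date stamp such as '20111003' to Unix seconds (midnight UTC)."""
    dt = datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)
    return int(dt.timestamp())


def unix_to_js_millis(unix_seconds):
    """JavaScript-style timestamps are Unix seconds multiplied by 1000."""
    return unix_seconds * 1000


def unix_to_datestamp(unix_seconds, fmt="%Y%m%d"):
    """Convert Unix seconds back to a readable date stamp (UTC)."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).strftime(fmt)


# Hypothetical sample value, chosen only for illustration.
unix = datestamp_to_unix("20111003")
print(unix, unix_to_js_millis(unix), unix_to_datestamp(unix))
```

Pinning everything to UTC is a design choice for this sketch: it keeps the round trip from date stamp to Unix time and back independent of the machine's local time zone.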

03:06 S1: Then we're also... We may or may not touch matplotlib.dates in this specific tutorial right now, but it will be used in this section. Next we're gonna go ahead and import urllib because we're gonna be accessing the Internet. And then we're gonna import numpy as np, because we're gonna use NumPy to do some of our basic crunching and stuff like that. So like I was saying before, NumPy becomes pretty integral, pretty fast, any time you're working with numeric data. So if you don't have NumPy, again, you do pip install numpy, no problem.

03:40 S1: So first what we wanna go ahead and do is we need some sort of function that's gonna grab the data, and we're gonna go ahead and use the Yahoo Finance API. If you're not familiar with it that's totally fine. There's no authorization or authentication that goes on there. It's free information, so you don't need to have an account with Yahoo or anything like that. So to start, we'll go ahead and do def, and we're gonna call this 'graph_data,' and then we're gonna have a parameter that we pass, and that's gonna be stock. So what that's gonna be is whatever stock we pass through as that parameter, that's what's gonna be pulled and graphed. So to help us know what company we're doing, we're gonna go ahead and print that company, so we'll print, and we'll do 'currently pulling', then we'll do, comma, stock. So to just say, currently pulling, and then we'll do a colon there and it will print out the ticker that we're pulling. So if you're not familiar... Stocks are identified by their tickers. So for example, Apple, the company that sells phones and computers and stuff, their ticker... Their name is A-P-P-L-E, Apple Incorporated, and then their ticker is AAPL. Or like another one would be like Tesla with their electric cars, they're TSLA, so just understand that. You can always go to Google and type in the company's name and you can find out the ticker if you don't already know it.

05:05 S1: So anyways, we'll pull that and now we're gonna write the URL. So this will be the URL that... And we'll call this URL equals... And this will be the URL for that specific ticker from the Yahoo Finance API. So we're gonna put this in a string, and it will be 'http://api.finance.yahoo.com/instrument/1.0/'++'' and then this in here, between the plus signs, will be the stock, so this is actually the ticker. So currently pulling, this will be ticker, this is the ticker for that stock, this is the ticker. So this one might be in theory 'AAPL/chartdata;quote' because it's a price quote, and then we can specify the range here. So range will be equal to whatever we want and we're just gonna do 10 years for now. We'll play with ranges as we go on. The Yahoo Finance API switches up its date representations.
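The function described so far might look something like the sketch below. The host and path follow the narration, but the exact spelling of the `type=quote;range=10y` parameters is my assumption, since the transcript only mentions "quote", a range of 10 years, and a trailing `/csv`. This endpoint may no longer respond, so the sketch only builds and prints the URL and makes no network request.

```python
def graph_data(stock):
    """Print the ticker being pulled and build the chart-data URL for it."""
    print('Currently pulling:', stock)

    # Host and path as narrated; the parameter spelling after 'chartdata;'
    # is an assumption reconstructed from the spoken description.
    url = ('http://api.finance.yahoo.com/instrument/1.0/'
           + stock
           + '/chartdata;type=quote;range=10y/csv')
    print(url)
    return url


# Example ticker from the narration.
tesla_url = graph_data('TSLA')
```

Returning the URL as well as printing it is a small addition for convenience, so the string can be inspected or reused by later code.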

06:25 S1: So if you do really short term like one day or three days, you'll get time stamps that are Unix time. And if you do longer time frames, I think anything after like 10 days, it will be your... 10 days and longer, it will be represented in like date stamps that you can look at and read. So anyway, we'll deal with both so you would know how to handle them. But for now we'll start with 10 years and then we'll just do /csv and close off our quote there. So, in theory, let's go ahead and we'll just print the URL. Okay, so we'll come down here and then, we'll say stock equals input, and then the prompt like this, 'stock2plot: '. Add a space at the end because otherwise what you type will sit right up against it.

07:19 S1: So stock2plot, so this will allow the user to type something in. So input just allows us to write into the console basically. And then we'll go ahead and run graph_data for whatever that stock is. So we'll just say stock. We pass through stock here, which is also called stock, and it builds this URL at least. And then we're just gonna print the URL, and then we'll visit it manually to look at it. So, let's go ahead and save and run that. So, up should pop your console over here. Let me drag it over. Can't seem to touch it right now. So move it over here and let's say we wanna do Tesla. So this would be the URL that it feeds to us. So let's copy that URL and let's open it up in a browser. So, when we visit it in a browser, this is what we get. It's a little small, so let me just zoom in a little bit for you guys. And so this is the data. So this was with the 10 year CSV. And so you can see here that there's some initial information here that is basically useless to us. But then if we scroll down a little bit, this starts to look like some pretty normalized data. So, we can look at this visually and kinda take a gander at what this is.

[chuckle]

08:33 S1: So let's see. We'll go to here. So you've got let's say this line here. This is the date stamp and we can sort of deduce. It might be kind of hard to see visually but we know that this is the year. So, 2011, the month, so 10 so October and then the day, so the 3rd. So October 3rd, 2011, and then we have the pricing information. And they order it kind of funky. They do close-high-low-open. And then finally the... So this is close, this is the high for the entire day, this is the low for the entire day, and this is the open. So, when that day's prices opened up, at market open, what was the price of the company? It was 24.95. For that entire day, what was the highest price? 27.6 and then you've got some other information there. And then this is the volume so this is how many shares of that company were traded. That's actually quite a bit of volatility from 22 to 27. Like, oh, my gosh. [chuckle] That was... This is a pretty volatile stock as I suppose initially. Oh, my goodness. They had quite a few... Like these ranges are massive for comp stocks. Like usually you've got like 1% in a day as the most. Anyway, so this is our data and we're gonna continue working with this in the coming tutorials. We'll be talking about how to parse all of this data out and assign it to variables, and then soon we'll be graphing it and all of that. So, stay tuned for that.
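As a rough illustration of the row layout just described (date stamp, then close, high, low, open, then volume), here is a small parsing sketch using only the standard library. The sample row is hypothetical: the date, open, and high echo the narration, while the close, low, and volume values are made up. The `parse_row` helper is likewise my own name, not something from the video.

```python
from datetime import datetime

# Hypothetical row in the narrated order: date,close,high,low,open,volume.
# Only the date, open (24.95) and high (27.6) come from the narration.
SAMPLE_ROW = "20111003,23.73,27.60,22.83,24.95,1234500"


def parse_row(row):
    """Split one CSV row into a date plus price and volume fields."""
    date_str, closep, highp, lowp, openp, volume = row.strip().split(",")
    return {
        "date": datetime.strptime(date_str, "%Y%m%d").date(),
        "close": float(closep),
        "high": float(highp),
        "low": float(lowp),
        "open": float(openp),
        "volume": int(volume),
    }


parsed = parse_row(SAMPLE_ROW)
print(parsed["date"], parsed["open"], parsed["high"], parsed["volume"])
```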

00:00 Speaker 1: Hello everybody, and welcome to the fourth part of Section Three, which is data visualization in Python with matplotlib. In this part, what we're gonna be talking about is creating this conversion function for the date information that we're pulling from the Internet. So like I was saying, sometimes the conversion of date data so we can actually plot it in a chart can be confusing, but once you get used to handling date data it isn't so tedious anymore. So, anyways, in the conversion we've specified a conversion function but we don't actually have that function. So we're gonna go ahead and build that function now. So let's just make a new function here and we're gonna call it the same thing obviously, so bytespdate2num, and then this function takes a format and then it takes encoding. And the encoding is gonna be UTF-8 because that's the encoding of the data we get back.

00:56 S1: So what we're gonna do now is we're gonna first start with a str_converter, and the string converter is the equivalent of mdates.strpdate2num, which normally is how it used to work, and it was great, but it doesn't quite fully work yet in Python 3. So str_converter equals mdates.strpdate2num, so strip date to number, and then format, fmt. So in the past, you used to be able to say your converter was basically equal to this, right? Like this. So we could copy this and paste. And that used to work in Python 2 but it won't work in 3 because of this bytes information stuff. So, the string converter is equal to that but then we have to do a few more things. So we're gonna create this little inner function and it will be called bytes converter, and then we'll pass in b here, and then we'll say s = b.decode using the encoding format there. So, we're decoding UTF-8, then we're gonna return str_converter of s, then we return the bytes_... Or bytes, we didn't put an underscore. Let's add an underscore here. So, underscore and underscore.
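The converter just described can be sketched as follows. As far as I know, newer Matplotlib releases no longer ship `mdates.strpdate2num`, so this sketch swaps in a standard-library stand-in: it parses the string with `datetime.strptime` and returns the proleptic ordinal as a float, which should match the classic Matplotlib date numbers (days counted from year 1) for whole dates. With Matplotlib installed, `mdates.date2num` would be the usual call instead, and recent versions count from 1970 by default, so their numbers would differ. The `'%Y%m%d'` format is an assumption based on the date stamps described in this series.

```python
from datetime import datetime


def bytespdate2num(fmt, encoding='utf-8'):
    """Return a converter that turns a bytes date stamp into a date number."""

    def str_converter(s):
        # Stand-in for mdates.strpdate2num(fmt): days since year 1 as a float.
        return float(datetime.strptime(s, fmt).toordinal())

    def bytes_converter(b):
        # Python 3 hands the raw field over as bytes, so decode it first.
        s = b.decode(encoding)
        return str_converter(s)

    return bytes_converter


# Hypothetical sample value, for illustration only.
convert = bytespdate2num('%Y%m%d')
print(convert(b'20111003'))
```

A converter built this way can be passed per column to a loader that yields bytes fields, which is the situation the narration describes for Python 3.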

02:34 S1: So this should convert the data to the format that we want, which is the mdates format. So we come down here and let's go ahead and just print date. So after this long line here, let's print date and see if it actually worked the way that we wanted. So we'll just save and run real quick. We'll continue on with TSLA. Okay, sure enough we print these mdates. Okay, so these are your dates and everything is great. So, even though... Yeah, again these dates mean nothing probably to you but they mean a lot to matplotlib. So those are the numbers we wanna see, something in the 700,000s for now. So close that. Cool. So our conversion worked even though it may not look the best, but now we're ready to actually plot some information. So what we can do now is instead of printing date there, we'll just delete that. And then now, we can do plt.plot_date, so we're notifying matplotlib we're about to pass dates to it. The date and then closep, then we'll do a plt.show real quick. plt.show().

03:47 S1: Now with matplotlib, when we do a plot date it doesn't default to the line plot. It defaults to a scatter plot, so let me show you. Not a scatter plot exactly, it just defaults to a dot marker, so if we do TSLA again, the chart that'll come up here. Right, so you see the little markers are these little dots? Okay, so we can handle that by closing here and adding another argument here, and that's just the line type, so just do a dash, right? So the dash is two keys over from your backspace key. Now let's run that one more time. Oh, I was like "Where's the chart? It's not popping up." TSLA and now we actually have a line graph. Now, let me make it a little bigger here. So we can see we've got July, Jan; basically Jan and July all the way through, but what's pretty cool about the date formatting is right now you can see it's just the month and the year, and really not many months, it's just Jan and July, and July is just halfway through. So really we're just getting a marker there every six months.
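The plotting step above can be sketched roughly like this. The narration uses `plt.plot_date(date, closep, '-')`; I believe `plot_date` is deprecated in recent Matplotlib releases, so this sketch passes `datetime.date` objects to the ordinary `plot` call with the same `'-'` line style, which Matplotlib also draws on a date axis. The three data points and the `plot_close` helper are made up purely for illustration.

```python
from datetime import date

import matplotlib.pyplot as plt


def plot_close(dates, closep):
    """Draw closing prices against dates as a solid line instead of dots."""
    fig, ax = plt.subplots()
    # '-' asks for a solid line; date objects get date-aware tick labels.
    ax.plot(dates, closep, '-')
    return fig, ax


# Hypothetical sample data, not real prices.
dates = [date(2011, 10, 3), date(2011, 10, 4), date(2011, 10, 5)]
closep = [23.7, 23.9, 25.4]

fig, ax = plot_close(dates, closep)
```

Calling `plt.show()` afterwards opens the chart window, as in the video; it is left out of the block so the sketch can also run where no display is available.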

04:55 S1: But if we zoom in to a specific point, you'll see that, "Oh, we get more months." Right? We got February and April, but then we can zoom in some more, and now we've got multiple days inside of February. And we could continue zooming in and you can see now we got six, seven, eight, nine, 10, 11, 12 and we don't have really high granularity data here. But we actually could continue zooming in and now you can see these are timestamps in that date, okay? So, we don't have data for that time but we can in theory just continue zooming in and get really, really, really, really close. [chuckle] Anyway, okay, so that worked. So now we've got our data set, we got a really simple graph. We don't have our titles and our labels and all that stuff yet. But what we're gonna be doing in this series is customizing this chart to include all kinds of awesome customizations, colors, options, and all this good stuff. So, this should be pretty exciting and luckily we won't really be doing too much rewriting of code, we'll just be adding on top of basically this chart. So not so much of that proverbial deleting this time. So anyways, pretty cool series or section coming your way soon. So, stay tuned for that and thanks for watching.