Alison's New App is now available on iOS and Android! Download Now

Module 1: Database Management Systems

    Study Reminders
    Support

    Okay, so let us begin in the last lecture I was talking to you about this idea of data models right.A data model, let us briefly recall that and then proceed with the rest of the material. Basically a data model will allow us to describe the information system or the database system ata certain level of abstraction, I want you to understand clearly is that why do you need severallevels of abstraction for the information system. We need that mainly because informationsystem is a very complex entity, it is a sometimes a large pieces of information and very detailedinformation is you know descriptions possible, etc.So it is a complex entity and so, we need to have a description of this thing this entireinformation system at a level where you know, it can be easily understood by the stakeholders ofthis information system. There are several stakeholders for this information system. First thing isthe people who want it to be built and then use okay. And then we have what are called I will tellyou a little later what are called main users of the information system.Who just want to interact with the information system, just feed information and then you knowlike that. So, there are various groups of people who are interacting with this information system.And when we are collecting the requirements for building an information system, there is reallyno need for us to kind of expose all the internal details of how this information system is going tobe organized to the stakeholders.First thing is, they would not like to know, they do not want to know those details. So we shouldhave the ability to hide those hide certain unnecessary details from these users. And thendescribe the database at a level of abstraction which is comfortable for them. So that is why weneed various again, of course when you have to build the system itself. Then we need all thedetails of that.So, we as computer science people would like to look at the information system in a much moredetailed level right. And so we need a different abstraction level for us and ultimately theinformation system, you know is going to be stored on a secondary storage, which is discs andtapes if necessary, if the amount of information is too huge, then we may even want to use tapes,the usage of tapes is very limited these days.But then the disk the actual database is going to lie there in the storage medium on the disks. So,we need some to understand how exactly the database is going to be organized in the form ofalways as files on disk systems, so that is a different level of detail for me. So, because there arethese various levels of details at which we can describe a database system, we need differenttoolkits for doing that.So basically a data model provides that toolkit, the conceptual toolkit to kind of describe thedatabase as at a level of abstraction. So, for a high level description, will use this conceptual datamodels and this is basically useful for understanding the requirements and also collecting them.We use representational data models to describe the database at a logical level.This is what actually we computer science people would like to use, okay. So, this will give thefull logical details about the database. And then we have also a data model that can be used todescribe the full details of the record formats, file structures everything. And that is what wenormally call as a physical level data model physical data model okay. So, let me spend a littlebit time with each of at least the first 2 of these things. So, a popular conceptual level data model is what is called the entity relationship model. E/Rmodel. So what this provides is basically, concepts like entities, relationships and attributes,basically entities are things that we would like to keep track of the important things in theinformation system that we would like to keep track off. And relationships basically, orassociations between these important entities in the religion in the information system that wewould like to keep track of.And attributes are like, for example, the details such as the names, roll numbers, addresses of astudent entity. And name employee number, phone number of faculty entity. These are specificattributes. And here are some examples of relationships. So, for example, the enrollmentrelationship between student and course, students enroll for courses and we'd like to keep trackof these association.And the employment relationship between faculty and a specific department. Now, so the it isvery easy for us to kind of make use of these terms like, you know, entities and relationships andattributes. These are the conceptual tools that we will use, and then sit across with thestakeholders who want this information system to be developed, and then discuss with them whatare your entities, what are the relationships you bother about, etc.And then describe that entire database in terms of these 3 concepts, there is a few more concepts,then we will see them when we actually discuss this model in a separate module in the course.Okay, I will give you more details little later, but I just want to give you some idea about whatconceptual level data model looks like okay, any questions in this. Then we will move on to a popular representational level data model, which is actually therelational data model. Relational data model is going to be the central you know concept for thiscourse and we will spend a lot of time understanding this relational data model and also otherdata is connected with the relational data model. So, to give you a brief idea about this basicallyit provides this concept of a relation.And a relation is nothing but what we are familiar with in discrete mathematics course, which isa subset of cross product of sets, basically, a relationship. So, we are going to use that and thatwill play a central role in this data model. So, for example, in the context of a university datamodel database, we can think of a relation called student. So, this is the name of the relation andthen there are here as the various attributes for the student relation.So, yes name, roll number, the joining year, birth date and the program in which the student isenrolled for and what is the department in which the student is studying. So, these are theattributes and we also call that as the schema of the relation. And here is a specific pieces of data,which we will talk about a specific student called Sriram, who has all these details.So, we call this the entire thing as a data tuple okay, data tuple okay and a relation is basicallywill consist of a set of such data tuples set, okay. Please remember this definition with calling itas a set of tuples not a sequence of tuples. There is slight difference between these 2 things right.So, this is a set of tuples, a finite set of tuples will constitute our relational data instance we call itinstance.So, in general relational database is going to have several relations with these kind of attributesetc. and a bunch of these finite number of these relations will constitute our relational database and we will go into much more details about this relational model when we take it up in when wego into the relational data model okay. Now, let us move on basically the I want you tounderstand that there is this important notion called a data model which is a collection ofconceptual tools to describe the database at a certain level of abstraction.You can see that this level of abstraction this gives you, you know a different tool to describe thedatabase. This is the relation to whereas in the previous slide, we have seen entities andrelationships because in relationships, etc. okay, so please do not hesitate to stop me if you haveany questions. Now, this point we have also mentioned earlier that the distinction between data and metadata isone of the central ideas in the database systems field and the DBMS database managementsystem basically becomes generic in nature, because it is able to separate these data and metadataand then store them separately. So, when a DBMS has to deal with a particular database, okay.What it will do is to console the metadata corresponding to that particular database. Open that upin some is as you know, and then plan to do certain modifications or whatever is required for thatparticular database. And that is why it is not tied to a single database. It is capable of managingseveral databases at a time. I will give you some more details about the architecture of a RDBMSas we go along.So in our DBMS, relational database management system context, the schema, basically, or themetadata basically consists of all these table names and we call them table also, it is actually inthe last slide I call it as relation right. Relations are informally called also as tables. So the namesof the relations, names of the attributes, along with their data types for each of these relations.And we will also see later on that there are certain kinds of constraints. For example, if you putan age attribute there, you cannot put a negative number right. So, there are certain kind ofconstraints about these attributes and also about table relations in general. So, all this informationtogether is what is called the metadata or the schema information and this will be storedseparately from the data.And then whenever we have to deal with the database, we will conserve this schema and thenoperate on the on the database. So, the database definition basically talks about you know whenyou say that we are defining a database, what we mean is first designing this is that schema forthe database and then we will when you say database loading data is getting updated what wemean is to actually store data into this the skeletal structure which we have set up.The skeletal structure is basically the table of tables with table names, attribute teams, etc. atblanks in it. So that is what the distinction between the schema and data versus schema. This is avery central idea in database systems. Okay, so moving on. Let me take you to this interesting architecture diagram which describes the internal abstractionlevels that are available within a database management system. Now, we are talking about adatabase management system, specifically relational database management system and what arethe different abstractions it provides in as part of its architecture. So this is popularly called thethree schema architecture.And we will see how exactly what are the various things that are there. Now, basically, themiddle layer is the actually the important layer, which is this logical level representation of thedatabase. It consists of a set of relations. Guys I was previously telling you, the relationaldatabase basically consists of a bunch of relational relations as set of relations. So, this is what isthe representational data model, which is the relational data model and the description of thedatabase is given as a bunch of relations.Now at a low level of detail, we have the actual data is sitting in files, there are a set of files,okay, which are actually organized on the desk. And there are also what are called index files.Basically, index files help us to retry record efficiently from a data file. A data file is the one thatactually contains a lot of records in it. Each record basically is somewhat similar to our structurethat you have defined as part of C programs etc. right.So, these are records. And then we also have index files, index files are again basically, theyimplement some kind of a data structure, and then they will help us retrieve a data record from adata file efficiently. So, we will spend some a lot more time trying to understand what are thevarious kind of indexing techniques later on in the course. But right now, we can just say thatindex files help us locate data records efficiently.So, the data is actually lying in the physical level on the disk in the form of these files index filesand a few other things, okay. So that is the physical level scheme. So to say the metadata at thatlevel. So the metadata here basically consists of what is a file, what is the record structure etc. isthat record a fixed length record or is that record having variable number of fields in it, variousother details like that.So, that is the physical level scheme. So, we will come back to this diagram. Let me first takeyou, okay, so at the top level is this view level. So, I have a slide for that. So, let us look at what is view level. So, view level logical levelphysical level. These are the 3 levels we are talking about. The view level basically consists of afinite number of views, what is the view, a view describes certain aspect of the database which isrelevant to a particular group of users okay. A aspect of the database which is relevant to aparticular group of users, let me give you an example.For instance, let us look at the context of a library database system. Okay if you have gone to theuniversity library or the institute library, you will see that there is this place where issues ofbooks, return of books and all of them are managed, and most of them you will interact with thatparticular section because that is where you borrow items, return items, probably pay fines andthings like that.If you are delayed in returning an item, they will collect, you collect fines from you and all thatright. So, that is the issue return management section. And then when new users, new studentscome into the campus, they need to be enrolled into the library. So they have to be given smartcards or they have to be given their details need to be taken down, etc. So, that is the usersmanagement section you will see that it is there in a separate place okay.And then of course, the library has a huge collection of books and so faculty keep requesting fornew books to be added to the library, etc. And so the books have to be purchased. So there is aprocurement section. A complete section dedicated for interacting with the suppliers of books,placing orders for books, receiving books, etc. that you do not see, because you do not need that,right you do not need to know it is it is kind of hidden from you.But to manage a library of our kind, you can imagine that you need a separate section, which willkeep taking the requests from the faculty taking appropriate approvals, you know, looking at thebudget and then placing orders receiving books and managing books. Once the books arepurchased, then they will be stopped and so that issues and returns can happen. And so, so youcan see that a large even a enterprise like a library has different groups of people who areconcerned with different aspects of the same enterprise.It is the same library enterprise, but you can see that there are a group of people who arebothered about books, purchase of books, dealing with suppliers and things like that, there is agroup of people who are actually are issuing and written the books returning and managing allthat. So, you can see that these are best model as some kind of a view of the entire databasespecifically meant for a particular section like a book purchase section okay.So you can see that each of these sections views a portion of the entire data. For example, thebooks purchase section does not really bother about what is happening in the issue return section.There is not concerned with that. In a similar way, this issue returns section, you know, simplydoes not bother about what the book purchase for where people are doing, they have their owntask cut out for them.So, they keep doing. So this is little interaction between these 2 people, but all of them areinteracting with the same enterprise database system. Now, how nice it would be to kind of youknow, give you a view of the entire enterprise to this book section people separately, so that theycan always deal with that part of the database only okay. And this is possible by what is calledcreating of these views.So, we can create views which are nothing but actually virtual relations. Okay, they can answerthey may not be actually be physical you know relations as part of the entire relational schema,but they will be some kind of virtual relation. So, it is as far as that group is concerned thatrelation exists and they can interact with that schema. So, a part of the, so that is what weactually mean by a view. A View describes an aspect of the database relevant to a particulargroup of users. Whereas the middle layer, which is the logical level schema, this describes the entire database.But of course, no physical level details are given at the logical level schema. Now, this is ofcourse, useful because this is as I was telling you that this is the most important schema becauseit gives the entire logical structure of the whole database. Now, the physical structure of data interms of as I was telling you record formats, file structures, indexes is given by the physical levelschema.Now, views are actually optional, you know if your enterprises not to date and you know, if thereare no easily identifiable user groups who want a particular you know, aspect or a view of thedatabase then we can probably not create these views at all okay. So, it is up to the modern allthe modern relational database management systems, the software do provide this option of creating views. So that when a group of people want to take a restricted view of the entire database and then work with that, it is possible for us to enable that in the RDBMS okay. So, the views are actuallyoptional. And the whereas the logical scheme of course is essential because that is the one that describes the entire database in its full form and the physical level most of the modern RDBMS software is actually completely hiding this physical level. And giving the even the database administrators giving a limited access to this physical layer, we will see why it is it is happening. You may want to actually have access to this physical layer as to how the data is actually stored on files, formats and things like that, if you want to tune the performance of the database, how is the database performing for certain kinds of queries, if the database is taking unusually long time, you would like to go investigate as to why it is happening. And then you may want to kind of tinker the data at the physical level. But the trend is to hide the details of the physical layer, but then of course, when there will be ways in which you can actually access the physical level details okay. So I hope this three scheme architecture is kind of clear. This is not to be confused with the data models that I was mentioning just a while ago, okay. The 3 kinds of data models that I was mentioning a while ago. So, a while ago I mentioned conceptual level data models, those conceptual level data models are different in the within relational data model, which is a representational level data model. We are now looking at the implementation of the relational data model. The RDBMS is the one that implements the relational data model. And while implementing the relational data model, it provides these kind of other, you know, facilities. One of them is the facility of creating views okay, and sometimes it is very useful to have that facility. Now, within this context we traditionally describe 2 things called the physical so let me take you to that slide. We call it physical data independence, okay. The notion of physical data independence is important. What it basically says is that I should have the ability to modify my physical level schema without affecting the logical or the view level schemas okay. So, the database management system should be so architected such that, I should have the ability to kind of modify the physical level details like you know, creating a new file, inserting it into a system or taking a existing file and splitting it into 2 different files etc. Such kind of modifications I should be able to do without affecting the logical and the view level descriptions of the levels of the schemas. Why would you require that, first thing is that once you describe this database there are these what are called applications okay, that will be built based on the assumption that the database has all these details okay. So, what exactly is an application. An application is a program written in a typical high level language like C, C++, Java which uses SQL and interacts with the database and modifies the database, retrieves results from the database okay. So, these application programs are very important and they are the ones that actually update the database okay and they in fact implement the day to day requirements of the enterprise for modification of the database okay. Now, supposing I have already set up a few application programs like this, I do not want to redevelop all those application programs okay, because a new file has been added into thedatabase, you know, for example, okay. So is it really possible. That is the kind of question thatwe are facing. So, if it is really possible, then we will call that as a physical data independence. And actually, the three schema architecture enables this. How does it actually do that and first of all, why do we have to do any modifications at the physical level that I supposed to be clear, because we may want to do performance tuning, a certain application is running slow, because it is trying to access the database in a certain way. And we now find that after looking through the application, we now find that if we have an additional file, which stores a slightly different kind of records. Then this application will now become faster. And so we would like now add that file at the physical level. So this is what is called performance tuning. And so, because of this, we should be able to, we want to have this ability of doing a modification to the physical level. And ideally, if we do this physical level modification we will not like the logical and view level schemas to change okay. So that is this ability that we are talking about. So, how is it achieved. It is achieved by ensuring that the modification we have done is in some sense localized okay. And nothing comes free. So how does it exactly come. It is achieved this particular ability for physical data independence is achieved by suitably modifying the physical layer to logical layer mappings okay. Now, it is time to go back to this three schema architecture picture, I think I can just go back like this okay. So, I talked about this set of relations and then I talked about this set of views, these are the ones that provide specified you know use view for a group of people etc. Now is the time to talk about these boxes that are lying between these layers. So, this is a box that will that basically captures the logical level to physical level mappings, what basically it consists of is okay, you have a relation, let us say R1. How is it actually implemented. What are the files, where are the remembers the relation R1. R1 is a set of tuples. It is a set of tuples okay. Each tuples consists of a bunch of values. So it is a conceptual thing. Now physically, it has to be stored as a each of those tuples might be stored asa single record, or each of those tuples might be stored as 2 records. We do not know exactly how it happens okay. So there is a bunch of ways probably 1 file or probably 2 files or 3 files that are kind of implementing this relation okay. So that details are actually here. How is this related, mapped to actual physical right. Now, that detail is there in this particular layer box. And so, if the basically this physical data independence is basically achieved by suitably modifying this mapping, if you create a new file now at the physical layer. Then we will record the details, will modify the details of this logical level physical level mapping suitably to reflect this new situation that we now have a new file and so and so relation that was not using this particular file will no use that etc. whatever is the appropriate modification will have to record it here. And with that we will be able to, now any application that was using relation are 1 will continue to work. Because now the software the RDBMS package can looking while you know, servicing the request for version R1 will go through these mappings, new mappings and then look up the appropriate files okay. So that is how logical physical data independencies achieved. In a similar spirit we have okay. So, another similar notion is this notion of logical data independence. So, the ability to change the logical level schema without affecting the view level schemas and the application programs is what is called logical data independence. Now, why do you need to do this in the first place. For example slightly you know adding a new attribute to some relation, because then the designers have now realized that they also need to keep track of this new attribute, which was not earlier thought of it is a term. Now that has to be added. And certain attribute has to be deleted, let us say. So if these requirements come, then how do we handle them. That is the thing that we are talking about. So of course, if you add a new attribute, then to add some relation, then what we would like to do is that with this situation we like is that there should be no need to change the programs or the views that do not require to use this new attribute. Obviously, things that want to use this new attribute will have to be changed because this is what has come into picture. So those things that are not using this new attribute can continue to be used as they are in a similar way, if we delete an attribute all those are programs that you know use the remaining data there should be no need for changing them. Whereas, the ones that are using this particular deliverable it obviously has to be changed. So, how do we achieve this logical data independence. Again, it is the views which are on the top layer, again not defined in terms of the lower level relations okay, as I was briefly mentioning earlier. Views are basically nothing but virtual relations. So, those views would be defined in terms of the actual relation that there are the logical layer, we will see how exactly we will start to be defined when we go into the relational data model in detail. So, the logical data independence is achieved by looking at the real view level to logical level mapping and then suitably modifying that mapping we will be able to achieve this logical data independence okay. So, are there any questions on these three schema architecture, it is a kind of a abstraction, layers of abstraction that are provided within the RDBMS package or the DBMS system. Of course, if you look at some RDBMS they may not provide you know, the ability to create views. Okay so they kind of implemented the RDBMS in a partial manner. They think that okay, the users oftheir RDBMS may not really require views and things like that we are not probably handling such large enterprises. So they may, you know, optionally not provided. But the ability to create views is part of SQL standard. And so any RDBMS package has to implement the SQL standard. And, and so it should be impossible, it should be available okay. Now, let us look at the development process of a database system. How exactly we are going to do the development of a typical database okay. The first thing is we have to collect the requirements, we have to sit with the users end users of this information system and collect our requirements and for this and there are 2 kinds of things that we have to collect. First thing is what is called the data model the data model requirements. That means, what are the various pieces of data to be stored and their interrelationships, this information and this is typically presented using a conceptual level conceptual data model like an E/R model, E/R model is entity relationship model. You also have some alternatives actually at the conceptual level model which is UML I am sure you would have heard about UML also unified modeling language.UML is more used when you focus on you know, large scale software development. Whereas in the in the database world, the entity relationship model is more popular both of them are conceptual level data models. So, these data model requirements are presented in the form of E/R model. And then there is what is called the functional requirements from the end users that were to collect. Now, once the data model requirements are collected, you would typically represented in a entity relationship model and this model we will see more details about this model in the in the next module. Basically, it also has a diagrammatic representation.