Amazing April Sale! 🥳 25% off all digital certs & diplomas!Ends in  : : :

Claim Your Discount!

Module 1: Introduction to Relational Data Model

    Study Reminders
    Support

    So in order to talk about referential integrity constraints we introduced the notion of a foreignkey. A foreign key is something in a scheme is a group of attributes which would be used to refer totuples in other relations. That is what we call a foreign key. And the referential integrityconstraint basically says that this kind of referencing you know, should be to existing tuples inthe database instance, and they should not be dangling references okay. So if there are anyquestions in that, we can take it up. In this class, I am going to give you a set of an example database, an example relationaldatabase. And I am going to kind of continue using this example throughout this module and alsoin the SQL and other later modules. So it is going to be important to clearly understand thisrelational database scheme and we will keep it. I suggest that you write down this the just therelational and names of the attributes separately in a sheet of paper.So that it is accessible to you when we are discussing queries and SQL and things like that okay.So okay so let us go ahead with this. So we have database is to capture the academic detailsabout a in institute which has a number of departments and students studying in the departmentsfor various degrees. So, we have student relation which has these are the following attributes forthe student relation.We have roll number, name, degree, years, sex, department number and advisor. So, here theconventional relational schemes when you write down relational schemes is to underline the keyattribute. So when an attribute or a group of articles written together are underlined, what itmeans is that it is a key okay. So roll number is obviously the key for this particular. So there aremultiple case then there will be multiple groups of attributes which are underlined in therelational scheme.So roll number is a key for this student relation because attribute the value of roll numbers can beused to uniquely identify a particular rho or a particular tuple in the student relation, right. Andso, name these are obvious, you know, degree is the program, the value of the degree attribute isthe strings that stand for the various programs that the student might have been enrolled for.So, what are the degree for which he has joined the institute and the year is the year of admissionand advisor attribute is the employee Id, employee ID of a faculty member identified as thestudents advisor okay. Also all students required to have an advisor, so there is a advisor attributefor student which will take values that are from the employee Id domain of employee Id of afaculty member.So that way you will be able to identify as to who is the advisor for the student. And of course, ifyou want to know more details about the students and advisor, then you have to look up theappropriate other relation. So, and here is the other relations, the there are actually some more inthe next slide. So department is another relation. It has department Id, name, hod, and phone.So phone is the department's office phone number, and hod is a faculty member who is the headof that particular department who is kind of academically responsible for the running of thedepartment. And what is the name of the department and what is an ID. So, we give somenumbers for the department so that they are easily identifiable and serve as a key. In this case,even the name could serve as a key.But we choose to have a department Id which is a number. Now, professor, employee Id becauseprofessors are all employees of the institute. So, there is an employee Id and name, sex start year.So, start year is the year when the faculty member has kind of joined that particular department.So, all faculty members work in departments. So, what is the department in which particularperson is employed is given by the department number and then phone number of the particularprofessor. So, these are the various attributes we have for professor relation okay. Now, next one is the other important things that we need to keep track of course, a course is asubject of study, right and it has been named, it has been given a name c name, we call it to justto not confuse it with other names. So C name is the course name. And each course has certaincredits, right the number of credits that you earn if you complete that course and then departmentnumber indicates the department that offers that particular course.And, then the course Id. So we use a course Id to kind of identify the courses so to like, act like akey for the course relation. And then we have the issue of roll numbers. Sorry, the issue ofenrollments. So students enroll for courses. Now in this case, we see that there is a need for youknow taking a key, which consists of 4 attributes, the reason being that you know, course Idalone cannot be, you know, roll number and course Id.A student typically does several courses in a particular semester. And so and of course getsoffered multiple times across the years, right, say so for example, you are now doing CS3700.The same course would have been offered last year and it will get offered next year and so on soforth. And so the year in which it is offered, and the semester in which it is offered, that is theone that is you know, in which a particular student is enrolling and obtaining a particular grade.So because of this reason, you need roll number, course Id, semester in which this course isrunning, and the year all of them to be part of the key for this relation. And what we will assumehere is that the same attribute stands for semester, and it has the value either odd or evenstanding for the August semester and the January semester okay. So, we are in the odd semesternow, though there is actually nothing odd about the semester.Anyway, so the value ah grade will assume that the value grade will have null for the currentsemester. And then we will have appropriate values after the semester gets over, okay. And sofor all the previous semesters, there will be some value for the grade. So that is how this is. Nowso you can see that the issue of how students, you know, enroll into courses and then applyinggrades is all captured by this enrollment relation.Then we have the from the faculty side we have, we want to keep track of who teaches a course.So we have teaching relation, which basically captures that a particular employee, a particularprofessor is teaching a particular course, in a particular semester and a year okay. For example,so, I am teaching the CS3700 course of the odd semester of 2019 in the classroom, CS25 like sothat is one.So naturally you can also see that the need for all of the attributes to be the key here because nosubset of this will be able to uniquely identify a rho in this particular relation and employee Idtypically employee are typically teachers several courses across the that is you knowemployment and so, obviously, that itself is not a key for this and course can be taught by severalprofessors.And also so, that itself cannot be the and then you know, obviously, you need to have thissemester and year only then you will see that a particular edition of the course is being talkedabout and so you have the unique way of capturing that particular topic okay. So, the identifyingthe keys are depends on the domain that we are modeling and so in this case these are the keys.Now, finally we also have prerequisite relation which basically has 2 attributes calledpreReqCourse and course Id.And here so, if some tuple C1C2 exist in this particular relation, what it basically indicates is thatC1 is a prerequisite for C2, C1 should be successfully completed before enrolling for C2. So, thatis the meaning of a prerequisite course. So, for example, data structures is a prerequisite fordatabase systems course. So you should have completed data structures before you take databasesystems course. So there will be a tuple like that here okay.So I hope the example relations scheme is kind of clear. So we will use this relation scheme inthis module and also in the SQL module. Now, I will show this entire relation scheme in oneslide. So that you know we have it all in one place and now actually we will show what are the foreignkeys here, and where are they, what are they referring to okay, well let us identify the referentialintegrity constraints and the foreign key and then how with what a foreign key is refer to. So youcan start identifying lots of these in each of these relations there are foreign keys. So forexample, here department number is a foreign key right.Because this is the one that we will use in the student tuples to kind of identify what is thedepartment in which that particular student is available is present. And then, you know, we canuse this to kind of link up with the appropriate department tuple okay. So to refer to the computerscience department tuple, so we will use the computer science department Id and then put it here,right. That is how we refer to the. So to indicate this we have. So, those things will come up now. So, I will now indicate them byusing these rhos. So for example, hod is a foreign key in the department tuple departmentscheme. And it refers to the employee Id of the professor. So, one of the professors is the hod,and so you have the employee Id of that particular person sitting here. So, and that can be used torefer to the actual professor who is the hod for that particular department.So, you can see that this attribute has the you know, the qualities that we are looking for aforeign key, which basically is a bunch of attributes that we use to refer to tuples in other relationand you know and the constraint here is that it either it should be null or it should have valuesthat are existing in the other column of the other referring the relation that we are referring to.So likewise now we can identify, so prerequisite course and course Id are both foreign keysreferring to the course Id right. So, there are some more one more one thing that you will noticehere is that the same the same attribute name might occur in multiple relations okay. And if thereis a ambiguity as to which course Id attribute are we referring to we will use the conventional dotnotation to disambiguate that name.So, if you say course dot course Id then we are referring to this attribute, if you say employmentdot course Id then we are referring to this particular attribute like that and also the same youknow kind of quantity is actually being referred to by 2 different names like for example, heredepartment number and department Id, you know, we know that they are, in some sense the samekind of thing.So, but then we give the freedom saying that you can use whatever name you want, but then thisis a foreign key that refers to the primary key of the other relation okay. So, even in suchsituations it is not compulsory that now this particular attribute name should be the same as thisparticular attribute name. That is what I am trying to emphasize okay. Now, let us identify someof this employee Id, obviously is a foreign key refers to professors.And then roll number obviously, is a foreign key in the enrollment relation, referring to student.So even though these are part of some keys and all that they can independently be foreign keysdepartment number is a foreign key refers to department Id. Then this department number here inthe professor is also a foreign key reference department. Course Id here is a foreign key referringto that. Obviously here also course it referred to.So this is also as a foreign key referring to that and then here advisor is a foreign key that refersto the employment Id. And I suppose that is all, okay. Department number here in the courserefers to the department Id remove, okay, that is it okay. So this diagram is what we traditionally call as a relational schema diagram, in which we havethe relation names, attribute names, keys are underlined and then we have arrows to indicate theforeign keys and what are the foreign keys referring to, what are the relations they are referringto. So, this particular slide, I want you to you know, note down in a separate page of your book.So that you can refer to this particular diagram when you are talking about queries in eitherrelational algebra or SQL, have it handy so that we can discuss this okay, any questions aboutthis. So, I think, feel clear okay. Now, let me introduce, start introducing the next major topic inthis module, which is the topic of relational algebra. Now we have seen the relational modelrelational algebra is part of this the whole relational approach to databases.And is a basically a set of operators that take relation instances as arguments and return newrelation instances, new relations okay. So obviously it is an algebra, so it works on some domainof things. So the things, it works on our relations. That is why it is a relational algebra. So ittakes relations and produces new relations. So each of the operators here are like that.So we have a bunch of operators, both unary and binary that take relations since this isarguments and return new relations. Now, what we will see here is that using these optionsoperators will be able to express the retrieval request will retrieval see database has a lot ofinformation. And we want to retry certain information from the database right. So how do weexpress your requirement, how do we express your query, it is called query in the databaseparlance, we call it a query.A query is basically a data retrieval request, we want this kind of data, this is what we want fromthe database to be retired. So how do they express that. So, relational algebra operators give usone way of expressing what we want actually. So but interestingly you can also see as we goalong that this is a procedural way of expressing what we want okay, a procedural way ofexpressing what we want, that means it kind of says that okay.You do this operation here on this relation, do another operation on other relation, combine thislike this and then combine it with something else and then give me this okay. So, when you havean algebraic expression, you can think of it as a tree right. And so, you can think of informationflowing up like that finally, the output comes at the root in some sense, we will understand thisas we go along and then discuss the queries .So, relational algebra forms the core component of the relational query engine, which is the heartof the relational database system, uses these relational algebraic operators and efficientimplementation of these operators. Right now in this module we will not really bother otherswith the implementation details about these algebra operators, we will think of them as logicaloperators.And will confine ourselves to the conceptual understanding of what is operators, we will justthink of them as mathematical operators and then see how we can make use of this operators,later on of course, we need to study as to how to exactly implement these operators, how do youwrite a program to realize the operator okay, and those programs will go into will all be availablein the in the query engine okay.So SQL queries so we will study SQL in detail and SQL a declarative way of specifying queries,a largely declarative way of specifying queries, we will see that later. And that is internallytranslated into rational algebraic expressions as I was mentioning earlier also. And because ofthat it kind of provides us an opportunity for optimizing that relation, optimizing the way thatparticular query gets actually executed okay.So we will see all this in this particular set of slides. So what are the operators. It is surprisingthat we have just about half a dozen you know operators, actually, we will see that the minimumset of operators is you know, we will come to that little later. But right now we just have 1 2 3 45 6, just about 6 7 operators. And, if you feel understand these operators, this kind of sufficientfor expressing all kinds of queries on relational databases.So, so that is the power of this particular up the system of the algebra. So we have select projectwhich are unary operators and cross product operation, the union intersection the difference andyou can also see that actually not all of them are really required, you can express some of them interms of the other. So let us say union intersection and difference and then join operator. We willlater discuss as to what is the minimum set of operators and things like that okay.So let us start looking at these operators one by one and understand the meaning of what arethose operators do, for it is not necessarily that you should be familiar with SQL at this stage ofthe course. But if you are familiar with SQL, do not confuse the select operator if you are goingto talk about here with the select clause of SQL and the select clause of SQL actuallycorresponds to the project operator okay. So but even if you do not know SQL at this stage, itdoes not matter okay. So let us start with the select operator, it is a unary operator. And basically it can be used toselect those tuples in the relation instance that satisfy a particular given condition okay. So it hasa parameter which is the condition and what it does is to select all those tuples in the relationinstance with satisfy a certain condition, the given condition. And the notation here is to write itas sigma theta and the relation operator the input religion.So theta is the selection condition, sigma is the select operator and r is the relation name. So it isa very simple operator. So, it basically takes a condition and then returns all the tuples in thisrelation instance r that satisfy that particular condition. So, it gives a new relation instance andthen it is easy to see that the scheme of the new religion instance is the same as the scheme of theinput relation right.So, result is a relation with the same scheme as r, the same set of attributes, same sequence ofattributes, same order of attributes as r consisting of those tuples in all that satisfy the conditiontheta and the operation is commutative. So, if you write sigma c 1 of sigma c 2 of r, okay noticethat we can compose the operators and then get algebraic expressions right. And that is the wholeidea of algebra right.So, you could sigma c 2 r is now a new relation. So I can now use that relation and then makeanother selection operation, right. So sigma c 1 is and then inside that I am giving this newrelation, which is a result of this particular doing selection with c 2 and r. And you will noticethat this is actually same as doing sigma c 2 on sigma c 1 r. So if you are first selecting thosetuples that satisfy c 1 okay.And then from there you are selecting the tuples that said c 2 is the same as doing first the tuplesas c 2 and then choose from among them. So, basically the tuple sets were both c 1 and c 2 andso the order does not really matter and so, this operation is actually committed. So, it is this kindof identities involving this algebra operators that will later help us I know in rewriting thesealgebraic expressions.So that it will be advantages for us to actually run the query in a efficient manner okay. So that isthe simplest operator select. Here are some okay, let us go into the details about what exactly is the selection condition. So,the selection condition can be either a basic condition or a composite kind of condition, the basiccondition can be either of these 2 forms. So, basically when you take a attribute and compare itwith another attribute or what are these attributes, where are these attributes from, obviously theyare from the relation scheme that is on which it is being applied, the relation on which it is beingapplied.So, these A I, A j are the attributes of the relation well, that is the input for this operator. And so,you compare those 2 attributes, somewhat a pair of attributes or you compare an attribute with aconstant appropriate constant each of these attributes come from a certain domain. And so if youtake a particular string you cannot probably compare it to the integer, but you will compare itwith strings right.So, appropriate things and appropriate constants, constants of appropriate type. So, these are thekind of atomic kind of conditions, simplest of the conditions, the simplest of the conditions willinvolve comparing an attribute with another attribute and or comparing an attribute with aconstant. So, these are the simple conditions and then you can combine these conditions with thelogical operators, the logical operators are and or not are all allowed.And then make a complex condition or composite condition, I am calling it as a compositecondition. So, basic conditions combined with logical operators, and these are familiar to us, wehave been using them. So, earlier courses also, so and or and not that we used and so. So, all ofthem of course, are Boolean value right. So, this is a comparison operator is A 1 greater than A2, either it is true or false right.So, these are Booleans so, they can be combined with Boolean operators and or not and then youcan now get a composite condition and that is how you will use. So, this theta is of this form, thetheta the selection condition is either a basic condition or a composite condition. And if it iscomposite, it basically uses logical operators and then combines this atomic conditions and thewhat are the competition operators.The commercial notice the operators are the obvious ones here are the less than less than equal togreater than greater than equal to and equal to not equal, the six possible comparison operatorswhich compare values basically they compare values and of course, we will. So, in order for thewe will assume that if you are applying this comparison operators the corresponding domains areyou know linearly ordered sets like that.The order relations do exist on those attributes only then it will make sense for us okay. So, c is aconstraint of appropriate data right okay. So let us start using this condition and then express afew data requests. So let us look at some examples select expression. So obtain information about a professor withname Giridhar start with that, there may be several Giridhars in the institute. So now we aresaying seek name is a attribute of professor relation. So, this information is there in professorrelation, the details of professor obviously or the professor relation. And so what we are sayingand we want information that tuples that have Giridhar as the name.So we are putting a simple condition here, name should be equal to the string Giridhar in thisrelation professor and then obtain all those tuples one. So if there are multiple Giridhars, theywill all be listed as. So the entire information about the professor tuples will be, so now this isnow stands for a relation, which has the same scheme as professor okay, but contains only thosetuples in which the name is exactly Giridhar this thing okay.So it is a subset of the original set of tuples. Now, we have not had given a name for thatparticular result right. It can, we have to give a name in case we are using this relation that hasbeen produced by this particular expression later on in some other context, we may actually wantto give a name for it okay, so we will see that later. So obtain information about professors whojoined the university between 80 and 85. Both inclusive, some numbers okay.So there is a start year attribute in the professor which says that when the professor has startedworking in that particular institute or department to be specific. So, here we are saying so, I usethis short way of you know expressing 9 and start here is either get an equal to 1980 or less and itis actually less than equal to 85 between exactly between these 2 things. So, this will contain theall the professor tuples, whose start here lays exactly between 1980 and 85 inclusive.So now, these are typical data retrieval requests. And we can see that we can express them usingthis particular operator. So, moving on let me also introduce a project operator, the project operator while the selectoperator who actually working on the tuples and then choosing a subset of these tuples, what theproject operator will do is to help you to kind of get rid of unwanted columns in relation.Supposing you want a relation is pretty wide, and you want to focus on a only particularattributes.And you do not want to really look at all the attributes of the relation, then this project operatorwill be useful to you okay, so what project operator does is to keep only the required attributes ofa relation instance and throw away others in some sense. So I do not want to, you know, otherinformation so I will just throw it away okay, so here is the notation pi is the symbol that we usefor this operator pi.And here is a list of attributes that we you want. So obviously this and then this is the relation onwhich we want to apply that. So, obviously each of these A 1 through A k is part of the schemeof r okay. So they need not be consecutive anything like that. So, some set of attributes from theoriginal scheme which were interested in. So, the this L list L is the desired attributes in thescheme r and exactly what is the output or what is the result of this particular expression is givenhere.The result is the set of tuples v 1 through v k matching all these A 1 through A k such that v 1belongs to the domain of A i okay for each of these case, and there is some tuple t in r. Such thatt. A 1 = v 1 t. A 2 = v 2 and t. A k = v k okay. There may be multiple tuples actually. But as longas there is some tuple in t such that this condition is satisfied, we take that bunch of values v 1through v k and then put them in the output okay.So, basically remember that the result of a particular expression is a relation and what is ourdefinition of a relation, it is a set of tuples, okay. And so it will not have duplicates. It is a set. Sothere is no scope or duplicate elements okay. So that is why even if you let you know, there aremultiple tuples that have these particular corresponding values v 1 through v k for theseattributes A 1 through A k only one sub tuple of this v 1 through v k will exist in the relationshipokay.Let me illustrate that. So before we go further, so if r 1 is the new, you know, relation name weare introducing and then assigning it as this one pi l r 2, then basically we are calling this pi l r 2as this new relation r 1, then the scheme of r 1 is l. The projection list, what is this scheme of theoutput, the scheme of the output is this projection list, right. Because that is what you areprojecting.These are the attributes that you have chosen. The remaining attributes around being thrownaway. So it is kind of obvious that the, you know, schema for the output is this projection list. So, here is the example, just look at this slide student. So, we have Mahesh in computer sciencedepartment and there is a Mahesh in the mechanical engineer department, both of them are doingM.S etc. And so now you do a sigma degree M.S degree equals M.S on student. And projectmean okay. So now degree M.S. So, sigma degree M.S on student will give me this tuple, thistuple and the last tuple the 3 tuples.Because all of them are enrolling the M.S degree. Now, if you do a project name alone withoutprojecting roll number, what is it means is that I only need the column name or attribute nameinformation of this particular relation and do not want anything else. So now if you do that, thenyou will actually lose information because only one Mahesh will exist, and the other tuplecontribution Rajesh.And that is it, you would have actually lost some information. They had to be careful about howto use it, but this just illustrates the idea that if you have a desired set of attributes and youproject that you get information corresponding to that. If you do roll number name from thestudent. Supposing I just want to get only the names of the students I am not really botheredabout degree and all the other details, then you would do pi roll number name on student.