Alison's New App is now available on iOS and Android! Download Now

Module 1: Introduction to Relational Data Model

    Study Reminders
    Support

    Relational data model is a very you know important component of the whole course. Though wecall it a database system these we generally think of them as relational database systems thesedays, a database system is considered by default as a relational database system. So, thisrelational has become such a important thing, before the advent of the relational data model therewere other kind of models in us like network model and hierarchical model and all that.But relational data model, you know, is proposed by Codd in the 70s. And then it became verypopular because very simple and elegant model with a mathematical basis. We will see thosedetails as we go along and called Codd Turing award in 1981, much after it was proposed inearly 70s. But, about almost a decade later, he got the Turing award, which is one of the highestyou know, it is the highest award in computer science.The introduction of this relational data model led to the development of the theory of datadependencies and we will later on spend a you know lot of time trying to understand what arethese data dependencies are, these are essentially to do with evaluating whether one particularrelational data model for a particular application is you know better than some other design.How do we compare database designs. So, it is in this context that we talk about what are calleddata dependencies and then we define what are called normal forms and then evaluate designsbased on these normal forms and then come to a conclusion about what design to adopt. So, therelational database model led to this entire body of knowledge concerned with theory of datadependencies.And then how to evaluate a database designs, then the as part of this relational model, we willalso be discussing what are called relational algebra operations and these algebra operations arethe ones actually that get implemented. And finally, when queries given in SQL, I have to beexecuted by this relational database system, it is in terms of these relational algebra, you knowprograms that implement relational algebra operators that actually work gets done ultimately.So, that what it means is that, the SQL is a you know a declarative style of, you know specifyingthe query and it will get internally translated into these relational algebraic expressions. And as Iwas mentioning earlier also, we then have an opportunity of you know, looking at this relationalalgebra operators and then you know, try to make certain transformations on these algebraoperations.So, that they are the transformations are meaning preserving, but then will help us evaluate theexpression in a much better way. So, optimization is possible. So, after the model has beendefined setup and theoretical issues have been worked out, it kind of laid the development forthis what is called a tuple relational calculus. Relational algebra is one thing, relational calculusis a different thing.And relational calculus is actually based on mathematical logic is and that later kind ofdeveloped into SQL language, so SQL language first and then the of course, that led to thedatabase standard and so, it is very, very important model to understand and as I was telling you,a data model is nothing but a bunch of concepts that we will use to describe the database.And the relational data model is used to describe the database at the logical level completedescription can be given using relational model okay, so let us now start looking at all theingredients of this relational data model. So the idea is actually pretty simple that there is also very interesting thing that is such a simpleidea of a relation as we know of it in discrete mathematics like as a subset of the cross product ofsets has an indeed you know, has such a great impact. So, okay in the database context we talk of2 things here, one is database sorry relation scheme and then the relation instance okay.Relation scheme will basically consist of the name of the relation, for example here and this setof fields or we also call them attributes in this model also we call them as attributes and but thereis an important difference between the attributes that we have seen earlier and the attributes here,I will tell you that. So these attributes are field names or column names. Sometimes we all justcall them as column names because it is table are structure okay, and each of these attributes hasan associated domain.Now, so here is an example. So, we have student name and the domain is strings and roll numberwe also modeled as a string, phone number is an integer, and year of admission is an integerbranch of studies a string. So, these are domains, these are attribute names and then the wholething is what is called one relation scheme. So the relation name comes and traditionally write itthis all the attribute names in parentheses.And now this what is currently is a domain. A domain is nothing but a set of atomic values. Thisis most important a set of atomic or indivisible values okay. So, that means this is a single valueand we will not you know, try to go into the parts of that and things like that, we will not assumethat there it has any components in it will simply deal it as though it is a single value, atomic,indivisible value.So, as again this in the entity relationship model, when we were talking about attributes, forexample, we talked about multivalued attributes, composite attributes and things like that, right.But here now, when we say an attribute, it is always assumed to be an a value that comes from aset of atomic values okay. Now, you can now. So, now this important this assumption is veryfundamental to be database model.And it is so fundamental that it is actually called it is later on in the when we discuss the normalforms, we give a name for that it is called the first normal form okay. Good so that is whatrelation scheme is about. And then we have the relation instance a finite set of tuples consisting, so, they constitute arelation instance and then suppose we have a scheme like this R A 1 through A m which are allattribute names, then we basically have a set of tuples and all this finite number of tuplesconstitute the relation instance, what exactly is a tuple is easy to see because we are kind offamiliar with that.The tuple is basically an ordered sequence of values, v 1 through v m such that this v i belongs tothe domain of the corresponding attribute A i. For all I 1 to m. So, in this case student name RaviTeja and roll number, some year of admission phone number etc. okay. So you can have dot dotdot so you can it is not an infinite, it is finite number of topics because it is a sect. And it is afinite set emphasis on finite set.A finite set of tuples constitute a relation instance okay, so at this point of time, we are the onlyconstraint we're placing here is that all these values. So this is a string. So even though it has, itlooks like having 2 pieces in that it is a string. So there is a blank character here. So we treated asa string, that is it, okay. So, all these are atomic pieces of data. And each of these things shouldcome from the domain define for that particular attribute.That is the only constraint we have and there are a finite number of them, this is what is arelation instances. So relation scheme and relation instance. This should be obvious to you thatthere are no duplicate rhos in relation instance right. Why is it that there are no duplicate tuplesrelation instance because it is defined as a set not a multiset, it is a set, so set means we do nothave duplicates.Later on of course, when you deal with SQL tables, SQL tables, we for practical reasons we mustallow you know rhos you may allow duplicate rhos inside SQL tables. So, we will discuss thatwhen we come to that. So, but right now from the relational theory, we will assume that this is aset of tuples and hence there is no duplicate tuples at all okay. Here is an example. Here is another example. Look at this. I am capturing the enrollment ofstudents into courses. So I am putting student name, roll number okay, it should be sameactually, course number and section number. So let us assume that courses have sections and so,course number section number together is what you know, one a student has to enroll for registerfor. So, in this case now this person Rajesh is registered for these 2 courses.Whereas Suresh is registered for this course, notice the way information gets replicated, becausein order to you know, record the fact that Rajesh has registered or enrolled for this particularcourse, you must record it like this CS3200 and then since the schema is there like that, we needas the point I am trying to emphasize here is that you cannot put CS3200, CS3CS3700 here, whatdoes it violate.It violates our principle that the values here should be atomic. It should not be lists, should not bea waste or it should not be anything else it should be atomic basis of data. Because of that, youcan see that we need to actually replicate the rest of the values and then change only this valuegood. So, here is another example. So, we will see a lot more examples of relations. Now, let us move on to the notion of keys for a relation. A key is defined like this, a set ofattributes K whose values uniquely identify a tuple in any instance, that is important any instanceof that relation is what is called a key and none of the proper subsets of this K have this property.So, it is in some sense the minimal set of it is the minimal set of attributes that have thisinteresting property that their values will uniquely identify at tuple in the set of tuples, a set oftuples are there.So we want to uniquely identify one tuple. What is it that helps us, if you look at this forexample, what is the key, what do you think would be key here, can roll number be key here, rollnumber in this particular relation scheme roll number is not a key because roll number cannotidentify a rho uniquely. There are people who are doing multiple courses and so roll number getsrepeated in multiple rolls.And so you cannot do that. So, you can see that actually roll number along with course numberwould serve as a key in this particular relation okay. So here is another, so roll number for thestudent relation is in fact of course a key there are no 2 students in the same roll number and soin the student relation, which we have seen a little while it before, roll number is a key okay.What about roll number and name, you can see that as long as roll number is there, you canthrow in any other attribute. And then that a set of attributes will anyway will have the propertyof identifying that tuple uniquely because it contains roll number, right. So, this set has theproperty of uniquely identifying the tuple but it is not minimal.Because you can throw away name. You can you do not need name, you just need only rollnumber. So, this set though it can uniquely identify a tuple, it is not minimal and hence is not akey because key is a set of attributes whose values uniquely identify at tuple in any instance andnone of the profits of sense of that has this property, it has to be minimum okay. Now, oneimportant point about keys is that you cannot look at a database a relation instance.And then somehow figure out what could be a key, what is the key for that relationship. It isdifficult to do that. That is because in that particular instance, it so happens that we have a bunchof tuples and some attribute or a set of attributes. Their values can uniquely identify a tuple inthat instance, but then remember that the key is defined that it should in that in not only just inthat tense instance.But in any instance, this attribute should have that property. So, it is something like an intrinsicproperty of the scheme. So, only the it can be determined by you know by understanding themeaning of those attributes right it cannot be simply determined by just looking at a particularinstance of the relational data okay. So it is an intrinsic property of a scheme.And so people who understand the domain only can actually identify what should what are thekeys for the relationship. Now, an important question here is that does every relation have a key,as a key is it always the case that religion has a key. Having understood what is the definition ofkey. Now, can we make a statement saying that every relation has a key, you can write okay.What is the worst case, the worst case scenario is take the entire set of attributes. The entire setof attributes will act like a key because it is a set of attributes. So, set of tuples. So, in any case,the entire if you take the entire set of attributes that is the worst case, but oftentimes you will findsome subset of the set of attributes, which will have this property and that is and does a relationhave multiple keys is another question.Can a relation have multiple keys, yes can right nothing wrong. So a relation can have multiplekeys and then it always has a key. So relation can have more than one key. So here is an example for that. So if it has multiple keys,we will call each of them as candidate key, then new term candidate key, we call all the keys ascandidate keys. Later on we will see that one of the candidate keys will get chosen as a primarykey when we move on to the implementation stages and things like that. But right now they areall keys, so we call them as candidate keys.Now let us look at this example of a book. Where isbn number, author, name, title publisher,year are the attributes, I suppose you know, what is isbn number, right. It is a unique numbergiven to all books all you know, its international standard, and standard book number, so the isbnstands for that. So let us make another additionally, we can make an assumption here that just toyou know, illustrate the point that books have only one author.It is not usually the case, but let us make that assumption but books have only one author. So ifthat is the case, then there are keys like isbn number is a key because it is, you know, setup likethat the it is international standard book number for books and author name and title togetherserve as a key. No person will write 2 books with the same title, right, there will be somedifference in the title.If you are writing a book and then you are revising it, then you will add the edition number to it.So fundamentals of database systems 4th edition is the title of the book. So when you come upwith the 5th edition and fundamentals of database system, fifth editions is the title. So peoplewrite books with different titles and so author name, title because we made this assumption thatauthor there is only one author is also can serve as a key for this particular relation.So we know that so when we say this, it says that whatever be the instance of this particularrelation, it is always possible to uniquely identify a book given the author name and title okay,that is what we are okay, so this is the question I was asking a little while before. Does a relationalways have a key, yes, it always has a key and the set of all attributes, in case no proper subsetis a key will work as a key.Now, another term, we call something as a super key. what is super key a set of attributes thatcontains a key is called a subset sorry, the key contains a key as a subset is called a super keyokay, so I give you an example of a super key in the last slide, the roll number plus name, rollnumber and name is a super key because it contains the key which is roll number for the studentrelation.For the student relation roll number is the key. So anything you add to that then roll numbername, roll number, phone number, etc. they are all will become super keys. So, if you have thisnotion of the super key then key can be actually defined in terms of that. So key can also be canbe defined as a minimal super key. The super key is something which has this property that itsvalues can be used to uniquely identify the tuple and we are now adding minimality to it, thatmeans no proper subset has this property.So a minimum super key. So, in some books, they introduced super key first and then define keyin terms of super keys as minimal super key. So, these are various terms that you will hear. So,primary key is one of the candidate keys is chosen for indexing purposes and things like that. So,we will see those details later on okay, so, this is a important idea, the key for a relations here. Ihope it is clear. Moving on what is a relational database scheme. A relation database scheme D consists of finitenumber of relations schemes and a set of integrity constraints okay. Now, what are integrityconstraints. We will talk about that, first thing is okay it consists of finite number of relationschemes and internationally it is just not these schemes, but we also need certain constraints andwhat are these constraints.Constraints are certain necessary conditions to be satisfied by this data values in these instancesin the relations senses. So that that instances constitute a meaningful database okay, now,normally we talk about 3 kinds of constraints, 1 domain constraints, key constraints andreferential integrity constraints. So, in the next few slides, I will explain these terms. Domainconstraints, key constraints and referential integrity constraints.And then so we call this relational database scheme. So earlier we just had relation scheme. Nowwe have database scheme. A database scheme is nothing but a finite number of relations schemesalong with this integrated constraints and database instance is a collection of relational instancesof all those relations schemes which satisfy the integrity constraints okay. So, we have 2 thingshere the relation scheme and relation instance, database schema and database instance.A database instance is nothing but a collection of relational instances but not just any collection,but those that satisfy the integrity constraints. Now let us go into a little bit deeper into whatthese constraints are. And why do we first call them as constraints and let us get into that okay. This basically the idea of domain and key constraints is to capture the domains and domains forattributes and the keys for the relation. So we will see that now. So, attributes have associateddomains as I have already mentioned. And the domain is nothing but a set of atomic data values.So the constraint here, what it specifies is that the actual values of an attribute in any tuple in therelation instance must indeed belong to this declared domain okay.So, that is the constraint would like to turn that information into this constraint mainly becausewe would later on charge this relational database manager management system responsible formaintaining these satisfying these constraints okay. So that is the reason why we call them asconstraints. So, the corresponding constraint basically stipulates that the actual values of theattribute.So, the data entry errors are often seen. Suppose you define a cell number as something whichhas exactly 10 digits, a string with 10 digits, okay and no all numeric 10 digits, that is it, right.And then you see that somebody entered a 12 digit number there, then it is a violation of theconstraint right. So, whatever is the declared domain the values must come from belong to thatdeclared domain.So, we term this as a domain constraint. And in a similar way we have this what key constraintsbasically, if you in the relation scheme, you say you have kind of declared are asserted that thereare certain keys candidate keys. Now the corresponding constraints would be that if K issupposed to be a key for this relation schema, then in any relation instance r little r okay we willuse lowercase letters to indicate the instances and uppercase letters to you indicate the schemasin general.So, in any relation instance r little r on the schema r should not have 2 tuples that have identicalvalues for these attributes of K okay, it is not just enough to say that okay this is key, but theinstance should you know satisfied, right. Otherwise it is a useless instance. So, none of the andin addition the key attributes should not have null values could not have is null is a special kindof a value views in order to kind of indicate that we do not know the value or the value is notapplicable.So, we use this null values okay. So, these are reasonably straightforward to understand that thatbasically we will use this notion of domain and key and then convert this into domain constraintsand key constraints. So that you know when you are specifying the database scheme you wouldnot only specify the relation schemas, but you will also specify all these constraints. So, that thewhen you consider a database instance it should be an instance of all these relations, relationssenses and together satisfying all these constraints will later on see.I mean we are also going to see one more important constraint called referential integrityconstraint okay. So in order to introduce this referential integrity constraints. I must know introduce another term that uses the name key but it is called foreign Key let uslook at this what exactly are foreign keys okay. Now, it often required see, notice one thing thatin this model so far we have only one thing called the notion of relation. As a contrast to this, ifyou consider the ER model we had entities and relationships, so relationships between entitiesthat captured the associations between entities, okay.But now those relationships also have to be captured as relations in the relational model okay, sowe will let us see exactly how that mapping will come out. But in general, you can understandthat tuples in one relation may have to refer to tuples in another relation. So say r1 on capital R 1some trouble in that may need to refer to the tuple in R 2. If that is the case, how does it refer.So, so, maybe this is to capture the relationships between entities okay. So let me give you anexample of that. But before we look at an example, here is a formal definition for that. So let ussay primary key of R 2 is B 1 through B j okay, so we are talking about 2 relations R 1 and R 2.So let us stay R 2 has some key called B 1 through B j. Then we will define that set of attributesF of R1 which are A 1 through A j same number of attributes.But their names may be different, same number of attributes would satisfy this condition suchthat the domain of A i is same as the domain of B i corresponding domains are same,corresponding domains for these attributes are same. And the values o f these attributes are whatwe will use in order to refer to tuples in R 2 okay, these are the attributes we would like to usewhose values are used to refer to tuples in R 2 is what is called a foreign key in R1.And specifically will say it is a foreign key R 1 referring to R 2 okay. Now the one little twisthere is that R 1 and R 2 can be the same scheme also, sometimes we may use some set ofattributes to refer to the tuples in the same relation. I will show you some example of that littlelater okay, so now let us first understand this and give an example of this and there can be morethan one foreign key in a relation scheme.So this is also possible. So we will call such a collection of attributes as a foreign key because itis basically refers to tuples in a foreign relation okay. Here is a simple example. The let us look at these 2 relations schemes course and department. Soof course has certain things and then so course has course ID, name, credits and departmentnumber to kind of indicate us to what is the department that is responsible for offering thatcourse okay. Now here is department details. So department relation and that has department ID,name, who is the HOD.And what is the phone number, probably the office phone number and the office phone numberlet us say. Now here is the notion of foreign key. So these values we would like to now use inorder to kind of refer to the details of what is the department that is offering it. So algorithms isbeing offered by department number 1, okay, which is actually you can look up here and find outthat is in fact, computer science department.And this DSP digital signal processing is being offered by 2, which is relatively new department.So like that, so basically what we are saying here is that it is possible that we will in one relation,we have certain attributes whose values, we will use in order to refer to tuples in anotherrelationship and obviously the domain of this and the domain of that should be same right. So,that is what we put in the definition.And if there are possibly 2 attributes or 3 attributes of that kind, then obviously thecorresponding domains have to be same. And so such attributes are what we call us the asforeign keys. So in this case, we have taken a department number is a foreign key that refers todepartment, foreign key of course, relation and refers to the department relations.So this is how we enable tuples from to kind of link to other tuples. Notice that this is not aphysical pointer. This is value based. It is not a physical pointer, right. This is not like, this is alogical thing. It is a value. So if this value kind of represents what is the topic that is talkingabout okay. Here is it is possible for a foreign key in relation to refer to the wait, I think, okay. It is possiblefor a foreign key in relation to refer to the primary key of the relationship itself. Let us look anexample. So let us say you have a university employee with employee number, name, sex, salarydepartment and reports to let us assume that you know, just for the example say that allemployees are supposed to report to some other employee except possibly the top personally youknow stay with vice chancellor.So, obviously in this case reports to should be having values that come that our whatever is usedfor employee numbers, let us say employee numbers are some six digit numbers okay. So, thissupposed to also should be similar 6 digit numbers so that when you are looking at one tuple ofone particular employee from Ramesh we can look at here and then find out what is the valuehere reports to value.And that if you look now look up in the set of tuples will tell you who is reporting, who is hissupervisor or boss right. So, in this case, we will see that the information about the employee allthe employees are in the same relation and so the reports has to basically refer to the samerelation instance, it has to refer to the same relation instance. So, the here is a foreign key thatrefers to the relation itself.So, here is the normal key and here is a foreign key that that refers to the table the same relation.So, I hope this is clear to now that every employee in okay, so we basically are trying to captureas to who reports to whom. So, a bunch of employees might be reporting to one particular personand that particular person might be reporting to somebody else, and so on there is it you canimagine that there is some kind of a hierarchy in the university except of course, the director,vice chancellor which you will have a null value. Because he does not report to anybody you canimagine that way. Now, that we have set up this notion of a foreign key, we can talk about what are calledreferential integrity constraints, we will call them as RIC referential integrity constraint, whichbasically you know is some kind of a guard for this foreign keys. So, let us say F is a foreign keyin the scheme R1, referring to some scheme R 2. And let us say K is the primary key for thatparticular R 2.The RIC what it says is that in any religion instance R 1 on this relation, capital R 1 and R 2 itshould be such that any tuple t in r1 if you look at its F attributes, F is the set F be a foreign keyright. So either it is F attributes values are all null or all null or there are identical to the Kattribute values of some tuple in R 2 okay, you got it. What we are saying here is that F is aforeign key in R 1 and K is the primary key for R 2 okay.