Data Base introduction in simple language , Study notes of Database Management Systems (DBMS)

Data Base System in simple language define and Sql Queries in detail

Typology: Study notes

2017/2018

Uploaded on 03/16/2018

flat-slap
flat-slap 🇵🇰

1 document

1 / 76

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
P a g e | 1
Msc IT (3rd Term) ILM College Jauharabad M.Wasim
Data Base System
Course Outline:
1. Databases Overview: Basic Concept; File Processing & Database Approach, Database Applications,
Advantages of the DB, Components of the DB Environment, and Evolution of DBs.
2. Database Architecture: DB Development Process, Three Schema Architecture, Data Modeling, E-R Modeling
(Basic Concepts)
3. Logical Design: E-R Modeling (Entities, Attributes, Relationships; Cardinality Constraints), RDBMS: Logical
View of Data; The Relational Data Model
4. Logical Design:Constraints, Transforming ERD/EERD into Relations
5. The Relational Model: Types, Relations, Relational Algebra, Relational Calculus, Integrity
6. Normalization: First Normal Form, Second Normal Form
7. Normalization: Third Normal Form (3NF), Boyce Codd Normal Form (BCNF)
8. EE-R Diagrams: Development & Constraints, DB Design Life Cycle,
9. DB Development & Management: Introduction to SQL and Basic Commands, SQL Integrity Constraints.
10. Physical DB Design, DB architecture, Query Optimization
11. SQL Commands: Saving, Listing, Editing, Restoring Table Contents; Logical Operators, Management
Commands
12. Arithmetic Operators, Complex Queries and SQL Functions, Aggregate Function, Grouping Functions 13.
Virtual Tables, Views, Indexes, Joins
14. Clint-Server & Distributed Environment, ODBC, Bridges, and Connectivity Issues.
15. Concurrency Control with Locking, Serializability, Deadlocks, Database Recovery Management.
16. Distributed Processing and Distributed Databases,DDBMS: Evolution, Architecture, Components,
Advantages, Security and Authorization. Physical Design: Storage and File Structure, Efficiency And Tuning
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23
pf24
pf25
pf26
pf27
pf28
pf29
pf2a
pf2b
pf2c
pf2d
pf2e
pf2f
pf30
pf31
pf32
pf33
pf34
pf35
pf36
pf37
pf38
pf39
pf3a
pf3b
pf3c
pf3d
pf3e
pf3f
pf40
pf41
pf42
pf43
pf44
pf45
pf46
pf47
pf48
pf49
pf4a
pf4b
pf4c

Partial preview of the text

Download Data Base introduction in simple language and more Study notes Database Management Systems (DBMS) in PDF only on Docsity!

Data Base System

Course Outline:

  1. Databases Overview: Basic Concept; File Processing & Database Approach, Database Applications,

Advantages of the DB, Components of the DB Environment, and Evolution of DBs.

  1. Database Architecture: DB Development Process, Three Schema Architecture, Data Modeling, E-R Modeling

(Basic Concepts)

  1. Logical Design: E-R Modeling (Entities, Attributes, Relationships; Cardinality Constraints), RDBMS: Logical

View of Data; The Relational Data Model

  1. Logical Design:Constraints, Transforming ERD/EERD into Relations
  2. The Relational Model: Types, Relations, Relational Algebra, Relational Calculus, Integrity
  3. Normalization: First Normal Form, Second Normal Form
  4. Normalization: Third Normal Form (3NF), Boyce Codd Normal Form (BCNF)
  5. EE-R Diagrams: Development & Constraints, DB Design Life Cycle,
  6. DB Development & Management: Introduction to SQL and Basic Commands, SQL Integrity Constraints.
  7. Physical DB Design, DB architecture, Query Optimization
  8. SQL Commands: Saving, Listing, Editing, Restoring Table Contents; Logical Operators, Management

Commands

  1. Arithmetic Operators, Complex Queries and SQL Functions, Aggregate Function, Grouping Functions 13.

Virtual Tables, Views, Indexes, Joins

  1. Clint-Server & Distributed Environment, ODBC, Bridges, and Connectivity Issues.
  2. Concurrency Control with Locking, Serializability, Deadlocks, Database Recovery Management.
  3. Distributed Processing and Distributed Databases,DDBMS: Evolution, Architecture, Components,

Advantages, Security and Authorization. Physical Design: Storage and File Structure, Efficiency And Tuning

Data Base over view and Basic Concepts:

What is Data?

Data is nothing but facts and statistics stored or free flowing over a network, generally it's raw and unprocessed.

Data becomes information when it is processed, turning it into something meaningful.

Database:

Database is a collection of related data and data is a collection of facts and figures that can be processed to produce information.

A Database is a collection of related data organized in a way that data can be easily accessed, managed and updated.

Database can be software based or hardware based, with one sole purpose, storing data. During early computer days, data was collected and stored on tapes, which were mostly write-only, which means once data is stored on it, it can never be read again. They were slow and bulky, and soon computer scientists realized that they needed a better solution to this problem.

What is DBMS?

A DBMS is software that allows creation, definition and manipulation of database, allowing users to store, process and analyze data easily.

DBMS provides us with an interface or a tool, to perform various operations like creating database, storing data in it, updating data, creating tables in the database and a lot more.

DBMS also provides protection and security to the databases. It also maintains data consistency in case of multiple users.

Here are some examples of popular DBMS used these days:

Data Base Application:

 MySql  Oracle  SQL Server  IBM DB  PostgreSQL  Amazon SimpleDB (cloud based) etc.

Components of the DB Environment: The database management system can be divided into five major components, they are:

  1. Hardware
  2. Software
  3. Data
  4. Procedures
  5. Database Access Language

Evolution of DB.

Database Architecture: A Database Management system is not always directly available for users and applications to access and store data in it. A Database Management system can be

Centralized (all the data stored at one location),

decentralized (multiple copies of database at different locations)

or hierarchical , depending upon its architecture.

1 - tier DBMS architecture also exist, this is when the database is directly available to the user for using it to store data. Generally such a setup is used for local application development, where programmers communicate directly with the database for quick response.

Database Architecture is logically of two types:

  1. 2 - tier DBMS architecture
  2. 3 - tier DBMS architecture

2 - tier DBMS Architecture

2 - tier DBMS architecture includes an Application layer

between the user and the DBMS, which is responsible

to communicate the user's request to the database management

system and then send the response from the DBMS to the user.

An application interface known as ODBC (Open Database Connectivity) provides an API that allow client side program to call the DBMS. Such an architecture provides the DBMS extra security as it is not exposed to the End User directly.

3 - tier DBMS Architecture

3 - tier DBMS architecture is the most commonly used architecture for

webapplications.In 3 - tier architecture, an additional Presentation or GUI Layer

is added, which provides a graphical user interface for the End user to interact

with the DBMS.

For the end user, the GUI layer is the Database System, and the end user

has no idea about the application layer and the DBMS system.

If you have used MySQL , then you must have seen PHPMyAdmin ,

it is the best example of a 3-tier DBMS architecture.

Three Schema Architecture

A database schema defines its entities and the relationship among them.

It’s the database designers who design the schema to help programmers

understand the database and make it useful.

Databases are characterized by three-schema architecture

because there are three different ways to look at them. Each schema

is important to different groups in an organization.

Network Model This is an extension of the Hierarchical model. In this model data is organized more like a graph, and are allowed to have more than one parent node.

In this database model data is more related as more relationships are established in this database model. Also, as the data is more related, hence accessing the data is also easier and fast. This database model was used to map many-to-many data relationships.

This was the most widely used database model, before Relational Model was introduced.

Entity-Relationship Model:

Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships among them. While formulating real-world scenario into the database model, the ER Model creates entity set, relationship set, general attributes, and constraints. In this database model, relationships are created by dividing object of interest into entity and its characteristics into attributes.

Different entities are related using relationships.

ER Model is best used for the conceptual design of a database. ER Model is based on −  Entities and their attributes.Relationships among entities.

Entity − An entity in an ER Model is a real-world entity having properties called attributes. Every attribute is defined by its set of values called domain. For example, in a school database, a student is considered as an entity. Student has various attributes like name, age, class, etc.

Relationship − The logical association among entities is called relationship. Relationships are mapped with entities in various ways. Mapping cardinalities define the number of association between two entities. Mapping cardinalities −

o one to one o one to many o many to one o many to many

Relational Model: The most popular data model in DBMS is the Relational Model. It is more scientific a model than others.In this model, data is organized in two-dimensional tables and the relationship is maintained by storing a common field.The basic structure of data in the relational model is tables. All the information related to a particular type is stored in rows of that table. Hence, tables are also known as relations in relational model.

ER MODEL – BASIC CONCEPTS:

The ER model defines the conceptual view of a database. It works around realworld entities and the associations among them. At view level, the ER model is considered a good option for designing databases.

Entity: An entity can be a real-world objectthat can be easily identifiable. For example, in a school database, students, teachers, classes, and courses offered can be considered as entities. All these entities have some attributes or properties that give them their identity. An entity set is a collection of similar types of entities. An entity set may contain entities with attribute sharing similar values. For example, a Students set may contain all the students of a school; likewise a Teachers set may contain all the teachers of a school from all faculties. Entity sets need not be disjoint.

Attributes:

Entities are represented by means of their properties called attributes. All attributes have values. For example, a student entity may have name, class, and age as attributes. There exists a domain or range of values that can be assigned to attributes. For example, a student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be negative, etc.

Types of Attributes:

Simple attribute: Simple attributes are atomic values, which cannot be divided further. For example, a student's phone number is an atomic value of 10 digits.

Composite attribute:

Composite attributes are made of more than one simple attribute. For example, a student's complete name may have first name and last-named.

Derived attribute:

Derived attributes are the attributes that do not exist in the physical database, but their values are derived from other attributes present in the database. For example, average

More than one entities from entity set A can be associated with at most one entity of entity set B, however an entity from entity set B can be associated with more than one entity from entity set A.

Many-to-many:

One entity from A can be associated with more than one entity from B and vice versa.

Basic Relational DBMS Concepts A Relational Database management System (RDBMS) is a database management system based on the relational model introduced by E.F Codd. In relational model, data is stored in relations (tables) and is represented in form of tuples (rows).

Concepts

Tables − In relational data model, relations are saved in the format of Tables. This format stores the relation among entities. A table has rows and columns, where rows represents records and columns represent the attributes. Tuple − A single row of a table, which contains a single record for that relation is called a tuple. Relation instance − A finite set of tuples in the relational database system represents relation instance. Relation instances do not have duplicate tuples. Relation schema − A relation schema describes the relation name (table name), attributes, and their names. Relation key − Each row has one or more attributes, known as relation key, which can identify the row in the relation (table) uniquely. Attribute domain − Every attribute has some pre-defined value scope, known as attribute domain.

Constraints

Every relation has some conditions that must hold for it to be a valid relation. These conditions are called Relational Integrity Constraints. There are three main integrity constraints −

 Key constraints  Domain constraints  Referential integrity constraints

Key Constraints

There must be at least one minimal subset of attributes in the relation, which can identify a tuple uniquely. This minimal subset of attributes is called key for that relation. If there is more than one such minimal subset, these are called candidate keys. Key constraints force that −  In a relation with a key attribute, no two tuples can have identical values for key attributes.  A key attribute cannot have NULL values. Key constraints are also referred to as Entity Constraints.

Domain Constraints

Attributes have specific values in real-world scenario. For example, age can only be a positive integer. The same constraints have been tried to employ on the attributes of a relation. Every attribute is bound to have a specific range of values. For example, age cannot be less than zero and telephone numbers cannot contain a digit outside 0-9.

Referential integrity Constraints

Referential integrity constraints work on the concept of Foreign Keys. A foreign key is a key attribute of a relation that can be referred in other relation. Referential integrity constraint states that if a relation refers to a key attribute of a different or same relation, then that key element must exist.

What is Relational Algebra? Every database management system must define a query language to allow users to access the data stored in the database. Relational Algebra is a procedural query language used to query the database tables to access data in different ways.

In relational algebra, input is a relation (table from which data has to be accessed) and output is also a relation (a temporary table holding the data asked for by the user).

The primary operations that we can perform using relational algebra are:

  1. Select
  2. Project
  3. Union
  4. Set Different
  5. Cartesian product
  6. Rename

Select Operation (σ)

This is used to fetch rows (tuples) from table (relation) which satisfies a given condition.

Syntax: σp(r)

Set Difference (-)

This operation is used to find data present in one relation and not present in the second relation. This operation is also applicable on two relations, just like Union operation.

Syntax: A - B

where A and B are relations.

For example, if we want to find name of students who attend the regular class but not the extra class, then, we can use the below operation:

∏Student(RegularClass) - ∏Student(ExtraClass)

Cartesian product (X)

This is used to combine data from two different relations (tables) into one and fetch data from the combined relation.

Syntax: A X B

For example, if we want to find the information for Regular Class and Extra Class which are conducted during morning, then, we can use the following operation:

σtime = 'morning' (RegularClass X ExtraClass)

For the above query to work, both Regular Class and Extra Class should have the attribute time.

Rename Operation (ρ)

This operation is used to rename the output relation for any query operation which returns result like Select, Project etc. Or to simply rename a relation (table)

Syntax: ρ(RelationNew, RelationOld)

Apart from these common operations Relational Algebra is also used for Join operations like,

 Natural Join  Outer Join  Theta join etc

Relational Calculus

In contrast to Relational Algebra, Relational Calculus is a non-procedural query language, that is, it tells what to do but never explains how to do it. Relational calculus exists in two forms −

Tuple Relational Calculus (TRC)

Filtering variable ranges over tuples Notation − {T | Condition} Returns all tuples T that satisfies a condition.

For example

{ T.name | Author(T) AND T.article = 'database' }

Output − Returns tuples with 'name' from Author who has written article on 'database'. TRC can be quantified. We can use Existential (∃) and Universal Quantifiers (∀). For example

{ R| ∃T ∈ Authors(T.article='database' AND R.name=T.name)}

Output − The above query will yield the same result as the previous one.

Domain Relational Calculus (DRC)

In DRC, the filtering variable uses the domain of attributes instead of entire tuple values (as done in TRC, mentioned above). Notation − { a 1 , a 2 , a 3 , ..., an | P (a 1 , a 2 , a 3 , ... ,an)} Where a1, a2 are attributes and P stands for formulae built by inner attributes.

For example

{< article, page, subject >| ∈TutorialsPoint∧ subject = 'database'}

Output − Yields Article, Page, and Subject from the relation TutorialsPoint, where subject is database. Just like TRC, DRC can also be written using existential and universal quantifiers. DRC also involves relational operators. The expression power of Tuple Relation Calculus and Domain Relation Calculus is equivalent to Relational Algebra.

ER Model to Relational Model

As we all know that ER Model can be represented using ER Diagrams which is a great way of designing and representing the database design in more of a flow chart form.

It is very convenient to design the database using the ER Model by creating an ER diagram and later on converting it into relational model to design your tables.

Not all the ER Model constraints and components can be directly transformed into relational model, but an approximate schema can be derived.

So let's take a few examples of ER diagrams and convert it into relational model schema, hence creating tables in RDBMS.

Entity becomes Table

Entity in ER Model is changed into tables, or we can say for every

Entity in ER model, a table is created in Relational Model.And

the attributes of the Entity gets converted to columns of the table.

  1. Ensuring data dependencies make sense i.e data is logically stored.

Problems without Normalization:

If a table is not properly normalized and have data redundancy then it will not only eat up extra memory space but will also make it difficult to handle and update the database, without facing data loss. Insertion, Updation and Deletion Anomalies are very frequent if database is not normalized. To understand these anomalies let us take an example of a Student table.

In the table above, we have data of 4 Computer Sci. students. As we can see, data for the fields branch, hod(Head of Department) and office_tel is repeated for the students who are in the same branch in the college, this is Data Redundancy.

Insertion Anomaly

Suppose for a new admission, until and unless a student opts for a branch, data of the student cannot be inserted, or else we will have to set the branch information as NULL.

Also, if we have to insert data of 100 students of same branch, then the branch information will be repeated for all those 100 students.

These scenarios are nothing but Insertion anomalies.

Updation Anomaly

What if Mr. X leaves the college? or is no longer the HOD of computer science department? In that case all the student records will have to be updated, and if by mistake we miss any record, it will lead to data inconsistency. This is Updation anomaly.

Deletion Anomaly

In our Student table, two different information’s are kept together, Student information and Branch information. Hence, at the end of the academic year, if student records are deleted, we will also lose the branch information. This is Deletion anomaly.

First Normal Form(FNF)

First Normal Form is defined in the definition of relations (tables) itself. This rule definesthat all the attributes

rollno name branch hod office_tel

401 Abdullah CSE Mr. X 53337

402 Abdul CSE Mr. X 53337

403 Wasim CSE Mr. X 53337

404 Malik CSE Mr. X 53337

in a relation must haveatomic domains. The values in an atomic domainare indivisible units. We re-arrange the relation (table) as below, to convert it to First Normal Form.

Each attribute must contain only a single value from its pre-defined domain.

Second Normal Form(SNF)

Before we learn about the second normal form, we need to understand the following −  Prime attribute − An attribute, which is a part of the candidate-key, is known as a prime attribute.  Non-prime attribute − An attribute, which is not a part of the prime-key, is said to be a non-prime attribute. If we follow second normal form, then every non-prime attribute should be fully functionally dependent on prime key attribute. That is, if X →A holds, then there should not be any proper subset Y of X, for which Y → A also holds true.

We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID. According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent upon both and not on any of the prime key attribute individually. But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is called partial dependency , which is not allowed in Second Normal Form.

Designing Databases/Data Base Life Cycle:

A database is usually a fundamental component of the information system, especially in business oriented systems. Thus database design is part of system development. The following picture shows how database design is involved in the system development lifecycle.

The phases in the middle of the picture (Database Design, Database Implementation) are the phases that you concentrate on in the Database Design course. The other phases are briefly described. They are part of the contents of the Systems Analysis and Design courses, for example.

There are various methods of how the different phases of information system design, analysis and implementation can be done. Here the main tasks or goals are described but no method is introduced.

Database Planning

The database planning includes the activities that allow the stages of the database system development lifecycle to be realized as efficiently and effectively as possible. This phase must be integrated with the overall Information System strategy of the organization.

The very first step in database planning is to define the mission statement and objectives for the database system. That is the definition of:

  • the major aims of the database system
  • the purpose of the database system
  • the supported tasks of the database system
  • the resources of the database system

Systems Definition

In the systems definition phase, the scope and boundaries of the database application are described. This description includes:

  • links with the other information systems of the organization
  • what the planned system is going to do now and in the future
  • who the users are now and in the future.

The major user views are also described. i.e. what is required of a database system from the perspectives of particular job roles or enterprise application areas.

Requirements Collection and Analysis

During the requirements collection and analysis phase, the collection and analysis of the information about the part of the enterprise to be served by the database are completed. The results may include eg:

  • the description of the data used or generated
  • the details how the data is to be used or generated
  • any additional requirements for the new database system

Database Design

The database design phase is divided into three steps:

  • conceptual database design
  • logical database design
  • physical database design

In the conceptual database design phase, the model of the data to be used independent of all physical considerations is to be constructed. The model is based on the requirements specification of the system.

In the logical database design phase, the model of the data to be used is based on a specific data model, but independent of a particular database management system is constructed. This is based on the target data model for the database e.g. relational data model.

In the physical database design phase, the description of the implementation of the database on secondary storage is created. The base relations, indexes, integrity constraints, security, etc. are defined using the SQL language.

Database Management System Selection

This in an optional phase. When there is a need for a new database management system (DBMS), this phase is done. DBMS means a database system like Access, SQL Server, MySQL, Oracle.

In this phase the criteria for the new DBMS are defined. Then several products are evaluated according to the criteria. Finally the recommendation for the selection is decided.

Application Design

In the application design phase, the design of the user interface and the application programs that use and process the database are defined and designed.

Protyping

The purpose of a prototype is to allow the users to use the prototype to identify the features of the system using the computer.

There are horizontal and vertical prototypes. A horizontal prototype has many features (e.g. user interfaces) but they are not working. A vertical prototype has very few features but they are working. See the following picture.