















Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Material Type: Notes; Professor: Huan; Class: Introductn to Database Systems; Subject: Elect Engr & Computer Science; University: University of Kansas; Term: Spring 2009;
Typology: Study notes
1 / 23
This page cannot be seen from the preview
Don't miss anything!
















2/24/
Luke Huan Univ. of Kansas
A database
A database management system
A miniworld
A data model z^
Conceptual model z^
Relational model
Applications insulated from how data is structuredand stored.
Logical data independence
: the capacity to change the
conceptual schema without having to change theexternal schema
Physical data independence
: the capacity to change
the internal schema without having to change theconceptual schema
Q: Why is this particularly important for DBMS?
2/24/
Luke Huan Univ. of Kansas
Using conceptual model such as ER model
2/24/
Luke Huan Univ. of Kansas
Now you have a draft, how do you improve
your design?
Using integrity constraints z^
Attributes value must come from its domain z^
Every relation mush have a primary key z^
The primary key value in a tuple can not be NULL z^
The foreign key value in a referenced tuple must exist orthe foreign key value in the referencing tuple is NULL
2/24/
Luke Huan Univ. of Kansas
How do you improve relational data base
design (cont.)?
Using functional dependency z^
Non-key FD always leads to redundancy z^
BCNF normal form z^ Pick up a non-key FD, do binary decomposition
First normal form: no attribute may be composite ormulti-valued
(
normal form and 3
normal form are coming… )
2/24/
Luke Huan Univ. of Kansas
SELECT
a list of attributes
FROM
a list of relations WHERE
condition
;
Condition may have logical operators AND, OR, NOT
Condition may have comparison operators: <. <=, <>,>=, >
String comparison may use “=” (exactly match) or“LIKE” (matching with regular expressions) z^
%, _, \
(Arithmetic) expressions of attributes are allowed
2/24/
Luke Huan Univ. of Kansas
Bag
Bag
Bag1 UNION ALL Bag
Bag1 EXCEPT ALL Bag
Bag1 INTERSECT ALL Bag
fruit AppleAppleOrange
fruitAppleOrangeOrange fruitAppleAppleAppleOrangeOrangeOrange
fruitApple
fruitAppleOrange
2/24/
Luke Huan Univ. of Kansas
13
Standard SQL aggregate functions:
COUNT
,^ SUM
,^ AVG
,
MIN
,^ MAX
Example: number of students under 18, and theiraverage GPA z^
SELECT
COUNT(*),
AVG(GPA)
FROM
Student WHERE
age
<
18;
z^
COUNT(*)
counts the number of rows
2/24/
Luke Huan Univ. of Kansas
Example: How many students are taking classes? z^
SELECT
COUNT
(SID)
FROM
Enroll;
z^
SELECT
COUNT(DISTINCT
SID)
FROM
Enroll;
2/24/
Luke Huan Univ. of Kansas
SELECT
…
FROM
…
WHERE
…
GROUP BY
…
;
Compute
FROM
(×
)
Compute
WHERE
(σ
)
Compute
GROUP BY
: group rows according to the
values of
GROUP BY
columns
Compute
SELECT
for each group (
π)
z^
For aggregation functions with
DISTINCT
inputs, first
eliminate duplicates within the group
Number of groups = number of rows in the final output
2/24/
Luke Huan Univ. of Kansas
SELECT age, AVG(GPA) FROM Student GROUP BY
age;
Compute
GROUP BY
: group
rows according to thevalues of
GROUP BY
columns
Compute
SELECT
for each
group
Kevin Kim 1306
Susan Wong 1204
Bob Lee
1011
age^2119
Mary Carter 1123
John Smith 1234
gpa
name
sid
Mary Carter 1123
Susan Wong 1204
Bob Lee
1011
age^2119
Kevin Kim 1306
John Smith 1234
gpa
name
sid
age^2122
gpa3.
2/24/
Luke Huan Univ. of Kansas
If a query uses aggregation/group by, then every columnreferenced in
SELECT
must be either
z^
Aggregated, or z^
A^
GROUP BY
column
This restriction ensures that any
SELECT
expression
produces only
one
value for each group
2/24/
Luke Huan Univ. of Kansas
z^
SELECT SID, age FROM Student GROUP BY age; z^
Recall there is one output row per group z^
There can be multiple SID values per group z^
SELECT SID, MAX(GPA) FROM Student; z^
Recall there is only one group for an aggregate query withno
GROUP BY
clause
z^
There can be multiple SID values z^
Wishful thinking (that the output SID value is the oneassociated with the highest GPA) does NOT work