Data Integration and Relational Database Model | ECS 289F, Study notes of Computer Science

Material Type: Notes; Professor: Ludaescher; Class: Data Bases; Subject: Engineering Computer Science; University: University of California - Davis; Term: Winter 2005;

B. Ludaescher, ECS289F-W05, Topics in Scientific Data Management

Course Overview

  1. Data Integration and …
    • structured (relational) databases
    • knowledge-based extensions, ontologies
    • semi-structured (XML) databases
  2. Scientific Workflows
    • Dataflow process networks
    • Web service workflows
    • The Kepler system
  3. Student projects on (1) and (2)

[Diagram: Scientific Data & Workflow Engineering; Data Integration; Knowledge Representation; Process Integration; Databases]

Perfect Recall: Database Systems (→ 165A)

  • A Database System (DBS) consists of a Database (DB) and a Database Management System (DBMS)
  • A Database is a (typically very large) integrated collection of interrelated data which are stored in files.
  • Data can come from commercial or scientific applications and (usually) represent some abstraction/piece of the modeled real world.
  • E.g., a scientific database might contain information about known biological, chemical, or astronomical entities, lab experiments, etc.
  • A Database Management System is a collection of software packages designed to store, access, and manage databases. It provides users and applications with an environment that is convenient and efficient to use.

Relational Database Model

  • Think of a relational DB as a number of tables, each with a particular schema, e.g.:
    • Course(Instructor, Name, Quarter, Department)
  • The table/relation name “Course” identifies which table we are talking about.
  • The attribute/column name (e.g., “Instructor”) corresponds to the “column header”.
  • Elements (aka instances or tuples) of a table/relation can be written, e.g., as follows:
    • Course(“Gertz”, “ECS165A”, “W-2005”, “CS”).
    • Course(“Ludaescher”, “ECS289F”, “W-2005”, “CS”).
    • …
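
One way to make the table/schema/tuple vocabulary concrete is the following minimal Python sketch. It models the Course relation as a set of named tuples; the lowercase field names and the variable name `course` are choices made for this illustration, not part of the notes.

```python
from typing import NamedTuple


class Course(NamedTuple):
    """Schema Course(Instructor, Name, Quarter, Department)."""
    instructor: str
    name: str
    quarter: str
    department: str


# A relation instance is a *set* of tuples that all follow the schema.
course = {
    Course("Gertz", "ECS165A", "W-2005", "CS"),
    Course("Ludaescher", "ECS289F", "W-2005", "CS"),
}

# Inserting an existing tuple changes nothing, because a relation is a set.
course.add(Course("Gertz", "ECS165A", "W-2005", "CS"))

# The attribute names (the "column headers") are available from the schema.
schema = Course._fields
```

Using a set rather than a list reflects that a relation has no duplicate rows and no inherent row order.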

Example

Course
Instructor  | Name    | Quarter | Department
Gertz       | ECS165A | W-2005  | CS
Ludaescher  | ECS289F | W-2005  | CS
…           | …       | …       | …

  • The same in Datalog notation as a set of facts:
    • course('Ludaescher', 'ECS289F', 'W-2005', 'CS').
    • course( … , … , … , … ).

Hmm.. looks like a Spreadsheet …

  • … but there are differences.
  • What are they?

DATALOG

DATALOG: Examples of Relational Operations
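
A minimal sketch of three common relational operations (selection, projection, join), written in Python over the Course relation, with a corresponding Datalog rule in each comment. The rules and the second relation `dept(Code, FullName)` are hypothetical, invented here for illustration; they are not taken from the slides.

```python
# Course(Instructor, Name, Quarter, Department) as a set of plain tuples.
course = {
    ("Gertz", "ECS165A", "W-2005", "CS"),
    ("Ludaescher", "ECS289F", "W-2005", "CS"),
}

# Hypothetical second relation dept(Code, FullName), needed only for the join.
dept = {("CS", "Computer Science")}

# Selection: keep the tuples that satisfy a condition.
#   sel(I, N, Q, D) :- course(I, N, Q, D), I = 'Gertz'.
sel = {t for t in course if t[0] == "Gertz"}

# Projection: keep only some of the columns.
#   proj(I, N) :- course(I, N, _, _).
proj = {(i, n) for (i, n, _q, _d) in course}

# Join: combine tuples of two relations that agree on a shared value.
#   joined(I, N, F) :- course(I, N, _, D), dept(D, F).
joined = {
    (i, n, full)
    for (i, n, _q, d) in course
    for (code, full) in dept
    if d == code
}
```

In the Datalog comments, a repeated variable (here `D`) expresses the join condition, and an underscore marks a column that is projected away.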

What is a Query?

  • A query expression e.g. in SQL or in Datalog denotes a query (but we still don’t know what a query is…)
  • A query is a (generic*) mapping f from instances of an input schema (EDB) to instances of an output schema (IDB): f : inst(EDB) → inst(IDB)
  • Note: Different query expressions can denote the same query (mapping). Example…?
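
One possible example, sketched in Python under stated assumptions: two different expressions, `q1` and `q2`, that denote the same mapping from Course instances to sets of course names. The function names, the Datalog rules in the comments, and the sample tuples with a "MATH" department are all hypothetical.

```python
from itertools import product


def q1(course):
    # q(N) :- course(_, N, _, 'CS').
    return {n for (_i, n, _q, d) in course if d == "CS"}


def q2(course):
    # q(N) :- course(_, N, _, 'CS'), course(_, N, _, _).
    # The second atom is redundant: any tuple matching the first atom
    # also matches the second, so the answer never changes.
    return {
        n1
        for (_i1, n1, _q1, d1), (_i2, n2, _q2, _d2) in product(course, course)
        if d1 == "CS" and n1 == n2
    }


# A few hypothetical instances on which to compare the two expressions.
instances = [
    set(),
    {("Gertz", "ECS165A", "W-2005", "CS")},
    {
        ("Gertz", "ECS165A", "W-2005", "CS"),
        ("Ludaescher", "ECS289F", "W-2005", "CS"),
        ("Smith", "MAT21A", "W-2005", "MATH"),
    },
]
agree = all(q1(inst) == q2(inst) for inst in instances)
```

Agreement on sample instances only illustrates the point; equivalence of the two expressions follows from the redundancy argument in the comment, not from the test.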

What is a Query?

  • A query is a generic mapping f from instances of an input schema (EDB) to instances of an output schema (IDB): f : inst(EDB) → inst(IDB)
  • generic: invariant under renamings r, i.e., f(r(I)) = r(f(I)) for all database instances I of the schema EDB
  • Examples: Consider EDB = {p(X), emp(N,S)}. Which of the following are generic?
    • f_even: “T” if |{x | p(x) is in DB I}| is even
    • f_jeff: {(N,S) | emp(N,S) in DB I, N = “Jeff”}
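
The genericity condition can be checked mechanically on a sample instance. The sketch below assumes that renamings are bijections on the constants (the usual reading), encodes the Boolean answer "T" as the relation {()} and "false" as the empty relation, and uses invented sample data; the helper names `rename`, `rename_db`, and `commutes` are hypothetical.

```python
def f_even(db):
    """Boolean query: true iff the number of p-facts is even.

    True is encoded as {()} and false as set(), so a renaming of
    constants leaves the answer itself unchanged.
    """
    return {()} if len(db["p"]) % 2 == 0 else set()


def f_jeff(db):
    """Select the emp-tuples whose name is the constant "Jeff"."""
    return {(n, s) for (n, s) in db["emp"] if n == "Jeff"}


def rename(r, relation):
    """Apply renaming r (a dict; unmapped constants stay fixed) to a relation."""
    return {tuple(r.get(c, c) for c in t) for t in relation}


def rename_db(r, db):
    """Apply renaming r to every relation of a database instance."""
    return {name: rename(r, rel) for name, rel in db.items()}


def commutes(f, r, db):
    """Check f(r(I)) == r(f(I)) for one renaming r and one instance I."""
    return f(rename_db(r, db)) == rename(r, f(db))


# Invented sample instance and a renaming that swaps "Jeff" and "Ann".
db = {
    "p": {("a",), ("b",)},
    "emp": {("Jeff", 100), ("Ann", 200)},
}
swap = {"Jeff": "Ann", "Ann": "Jeff"}

even_ok = commutes(f_even, swap, db)   # counting is unaffected by renaming
jeff_ok = commutes(f_jeff, swap, db)   # the constant "Jeff" is not renamed in f
```

A single failing pair (r, I) is enough to show that f_jeff is not generic, since it refers to the specific constant "Jeff". A passing check for f_even is only evidence; the actual argument is that a bijective renaming preserves the number of p-facts.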

Problem

  • How can one evaluate DATALOG queries? That is, given a database instance (= a set of facts), how can one obtain the answer to a given query (= a rule or set of rules)?

DATALOG: Fixpoint Semantics (Bottom-Up)
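
A minimal sketch of naive bottom-up evaluation, assuming a hypothetical two-rule program for transitive closure (the relation names `edge` and `tc` and the sample facts are invented for illustration). Starting from no derived facts, the rules are applied repeatedly until nothing new is derived; the result is the least fixpoint.

```python
def immediate_consequences(edge, tc):
    """Apply both rules once to the facts derived so far."""
    # tc(X, Y) :- edge(X, Y).
    derived = set(edge)
    # tc(X, Y) :- edge(X, Z), tc(Z, Y).
    derived |= {(x, y) for (x, z) in edge for (z2, y) in tc if z == z2}
    return derived


def naive_fixpoint(edge):
    """Iterate the rules from the empty set until no new facts appear."""
    tc = set()
    while True:
        new_tc = immediate_consequences(edge, tc)
        if new_tc == tc:      # fixpoint reached: another round adds nothing
            return tc
        tc = new_tc


# Invented EDB facts: a chain a -> b -> c -> d.
edge = {("a", "b"), ("b", "c"), ("c", "d")}
closure = naive_fixpoint(edge)
```

The loop terminates because only finitely many facts can be built from the constants in the database, and each round can only add facts. This naive version recomputes all facts in every round; it is meant to show the idea, not an efficient evaluation strategy.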