B. Ludaescher, ECS289F-W05, Topics in Scientific Data Management
Course Overview
1. Data Integration and …
   - structured (relational) databases
   - knowledge-based extensions, ontologies
   - semi-structured (XML) databases
2. Scientific Workflows
   - Dataflow process networks
   - Web service workflows
   - The Kepler system
3. Student projects on (1) and (2)
Scientific Data & Workflow Engineering
- Data Integration
- Knowledge Representation
- Process Integration
- Databases
Perfect Recall: Database Systems (→ 165A)
- A Database System (DBS) consists of a Database (DB) and a Database Management System (DBMS)
- A Database is a (typically very large) integrated collection of interrelated data which are stored in files.
- Data can come from commercial or scientific applications and (usually) represent some abstraction/piece of the modeled real world.
- E.g., a scientific database might contain information about known biological, chemical, or astronomical entities, lab experiments, etc.
- A Database Management System is a collection of software packages designed to store, access, and manage databases. It provides users and applications with an environment that is convenient and efficient to use.
Relational Database Model
- Think of a relational DB as a number of tables, each having a particular schema:
  - Course(Instructor, Name, Quarter, Department)
- The table/relation name “Course” identifies which table we are talking about.
- The attribute/column name (e.g., “Instructor”) corresponds to the “column header”.
- Elements (aka instances or tuples) of a table/relation can be written, e.g., as follows:
  Course(“Gertz”, “ECS165A”, “W-2005”, “CS”).
  Course(“Ludaescher”, “ECS289F”, “W-2005”, “CS”).
  …
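The table view above can be sketched in code. This is only an illustration in Python (the slides themselves use no programming language at this point); the names `Course`, `course`, and `column` are hypothetical. It models a relation as a schema (the named fields) plus a set of tuples, so duplicates collapse and rows carry no order.

```python
from typing import NamedTuple


class Course(NamedTuple):
    """One tuple of the Course relation; the field names are the attributes."""
    instructor: str
    name: str
    quarter: str
    department: str


# A relation instance is a *set* of tuples: no duplicates, no row order.
course = {
    Course("Gertz", "ECS165A", "W-2005", "CS"),
    Course("Ludaescher", "ECS289F", "W-2005", "CS"),
    Course("Gertz", "ECS165A", "W-2005", "CS"),  # same tuple again, collapses
}


def column(relation, attribute):
    """Return the set of values found in one attribute (column)."""
    return {getattr(t, attribute) for t in relation}


print(Course._fields)                # the schema (attribute names)
print(len(course))                   # number of distinct tuples
print(column(course, "instructor"))  # values in the Instructor column
```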
Example
Course
Instructor | Name    | Quarter | Department
Gertz      | ECS165A | W-2005  | CS
…          | …       | …       | …
Ludaescher | ECS289F | W-2005  | CS

- The same in Datalog notation, as a set of facts:
  course('Ludaescher', 'ECS289F', 'W-2005', 'CS').
  course( … , … , … , …).
Hmm.. looks like a Spreadsheet …
- … but there are differences.
- What are they?
DATALOG
DATALOG: Examples of Relational Operations
What is a Query?
- A query expression (e.g., in SQL or in Datalog) denotes a query (but we still don’t know what a query is…)
- A query is a (generic*) mapping f from instances of an input schema (EDB) to instances of an output schema (IDB): f : inst(EDB) → inst(IDB)
- Note: Different query expressions can denote the same query (mapping). Example…?
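One possible answer to the "Example…?" prompt, sketched in Python as an assumption rather than the example intended in the lecture: the two hypothetical functions `q1` and `q2` below are different expressions (select-then-project versus project-then-select), yet they denote the same mapping from Course instances to sets of instructors. Agreement on a sample instance only illustrates this; the equivalence itself follows from the fact that the selection condition uses only an attribute kept by the projection.

```python
def q1(course):
    """Select tuples with department 'CS', then project onto instructor."""
    return {instr for (instr, _name, _quarter, dept) in course if dept == "CS"}


def q2(course):
    """Project onto (instructor, department) first, then select and project."""
    pairs = {(instr, dept) for (instr, _name, _quarter, dept) in course}
    return {instr for (instr, dept) in pairs if dept == "CS"}


# Instance taken from the Course table in the slides.
course = {
    ("Gertz", "ECS165A", "W-2005", "CS"),
    ("Ludaescher", "ECS289F", "W-2005", "CS"),
}

print(q1(course) == q2(course))
```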
What is a Query?
- A query is a generic mapping f from instances of an input schema (EDB) to instances of an output schema (IDB): f : inst(EDB) → inst(IDB)
- generic: invariant under renamings r, i.e., f(r(I)) = r(f(I)) for all database instances I of the schema EDB
- Examples: Consider EDB = {p(X), emp(N,S)}. Which of the following are generic?
  - f_even: “T” if |{x | p(x) is in DB I}| is even
  - f_jeff: {(N,S) | emp(N,S) in DB I, N = “Jeff”}
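The genericity condition f(r(I)) = r(f(I)) can be tried out on small instances. The following Python sketch makes several assumptions: the helper names `rename` and `commutes`, the sample instances, and the salary values are hypothetical, and f_even returns a Boolean instead of “T”. A single renaming that commutes proves nothing, but one that fails is a counterexample. Here f_even commutes (a bijective renaming preserves cardinality), while f_jeff does not, because it singles out the constant “Jeff”; it is invariant only under renamings that leave “Jeff” fixed.

```python
def rename(value, r):
    """Apply renaming r (a dict on constants) to a value, tuple, or set."""
    if isinstance(value, bool):  # Boolean answers contain no constants
        return value
    if isinstance(value, (set, frozenset)):
        return {rename(v, r) for v in value}
    if isinstance(value, tuple):
        return tuple(rename(v, r) for v in value)
    return r.get(value, value)   # identity on constants r does not mention


def f_even(p):
    """True iff relation p holds an even number of tuples."""
    return len(p) % 2 == 0


def f_jeff(emp):
    """All emp tuples whose name is the constant 'Jeff'."""
    return {(n, s) for (n, s) in emp if n == "Jeff"}


def commutes(f, instance, r):
    """Check f(r(I)) == r(f(I)) for one instance I and one renaming r."""
    return f(rename(instance, r)) == rename(f(instance), r)


r = {"Jeff": "Ann", "Ann": "Jeff"}   # bijective renaming (a swap)
p = {("Jeff",), ("Ann",), ("Bob",)}  # hypothetical instance of p(X)
emp = {("Jeff", 10), ("Ann", 20)}    # hypothetical instance of emp(N,S)

print(commutes(f_even, p, r))
print(commutes(f_jeff, emp, r))
```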
Problem
- How can one evaluate DATALOG queries? That is, given a database instance (= a set of facts), how can one obtain the answer to a given query (= a rule or set of rules)?
DATALOG: Fixpoint Semantics (Bottom-Up)
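A minimal sketch of naive bottom-up evaluation, written in Python and not taken from the slides: starting from the facts, apply every rule to the facts known so far and repeat until nothing new is derived (a fixpoint). The helper names (`is_var`, `match`, `apply_rule`, `fixpoint`), the edge relation `e`, and the transitive-closure program `tc` are all hypothetical. The sketch assumes positive rules without negation, every head variable occurring in the body, and constants written in lowercase so that they are not mistaken for variables.

```python
def is_var(term):
    """Datalog convention: variables start with an uppercase letter."""
    return isinstance(term, str) and term[:1].isupper()


def match(atom, fact, subst):
    """Extend subst so that atom equals fact, or return None on a clash."""
    (pred, args), (fpred, fargs) = atom, fact
    if pred != fpred or len(args) != len(fargs):
        return None
    subst = dict(subst)
    for a, v in zip(args, fargs):
        if is_var(a):
            if subst.setdefault(a, v) != v:  # already bound to another value
                return None
        elif a != v:                         # constant mismatch
            return None
    return subst


def apply_rule(head, body, facts):
    """Derive every head fact obtainable from one rule in a single step."""
    substs = [{}]
    for atom in body:                        # join body atoms left to right
        extended = []
        for s in substs:
            for fact in facts:
                s2 = match(atom, fact, s)
                if s2 is not None:
                    extended.append(s2)
        substs = extended
    hpred, hargs = head
    return {(hpred, tuple(s[a] if is_var(a) else a for a in hargs))
            for s in substs}


def fixpoint(rules, edb):
    """Naive iteration: reapply all rules until no new fact appears."""
    facts = set(edb)
    while True:
        derived = set()
        for head, body in rules:
            derived |= apply_rule(head, body, facts)
        if derived <= facts:
            return facts
        facts |= derived


# Hypothetical program:  tc(X,Y) :- e(X,Y).   tc(X,Y) :- e(X,Z), tc(Z,Y).
rules = [
    (("tc", ("X", "Y")), [("e", ("X", "Y"))]),
    (("tc", ("X", "Y")), [("e", ("X", "Z")), ("tc", ("Z", "Y"))]),
]
edb = {("e", ("a", "b")), ("e", ("b", "c"))}

tc = {args for (pred, args) in fixpoint(rules, edb) if pred == "tc"}
print(sorted(tc))
```

Naive iteration rederives old facts in every round; practical engines usually refine it (for example with semi-naive evaluation), which this sketch does not attempt.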