Distributed Database: Query Processing, Transaction Management, Distributed Concurrency Control. Lecture notes of Software Engineering

Lesson 1 of Advanced Databases

Typology: Lecture notes

2021/2022

Available from 02/23/2025

sammy-gitonga

Distributed Databases

Chapter 1: Introduction

  • Syllabus
  • Data Independence and Distributed Data Processing
  • Definition of Distributed Databases
  • Promises of Distributed Databases
  • Technical Problems to be Studied
  • Conclusion

Syllabus

  • Introduction
  • Distributed DBMS Architecture
  • Distributed Database Design
  • Query Processing
  • Transaction Management
  • Distributed Concurrency Control
  • Distributed DBMS Reliability
  • Parallel Database Systems

Data Independence...

  • The development of DBMSs helped to fully achieve data independence (transparency)
  • A DBMS provides centralized and controlled data maintenance and access
  • Applications are immune to changes in the physical and logical file organization

Data Independence...

  • A distributed database system is the union of what appear to be two diametrically opposed approaches to data processing: database systems and computer networks
    • Computer networks promote a mode of work that goes against centralization
  • Key issues to understand this combination
    • The most important objective of DB technology is integration, not centralization
    • Integration is possible without centralization, i.e., integrating databases and networking does not mean centralization (in fact, quite the opposite)
  • Goal of distributed database systems: achieve data integration and data distribution transparency

Distributed Computing/Data Processing...

  • What can be distributed?
    • Processing logic
    • Functions
    • Data
    • Control
  • Classification of distributed systems with respect to various criteria
    • Degree of coupling, i.e., how closely the processing elements are connected
      • e.g., measured as the ratio of the amount of data exchanged to the amount of local processing
      • weak coupling, strong coupling
    • Interconnection structure
      • point-to-point connection between processing elements
      • common interconnection channel
    • Synchronization
      • synchronous
      • asynchronous

Definition of DDB and DDBMS

  • A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network
  • A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users
  • The terms DDBMS and DDBS are often used interchangeably
  • Implicit assumptions
    • Data is stored at a number of sites; each site logically consists of a single processor
    • Processors at different sites are interconnected by a computer network (we do not consider multiprocessors in DDBMS, cf. parallel systems)
    • A DDBS is a database, not a collection of files (cf. relational data model). Placement and querying of data are affected by the users' access patterns
    • A DDBMS is a collection of DBMSs (not a remote file system)

Definition of DDB and DDBMS...

  • Example: A database consists of 3 relations, employees, projects, and assignment, which are partitioned and stored at different sites (fragmentation).

  • What are the problems with queries, transactions, concurrency, and reliability?
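The fragmentation example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Site` class, the sample rows, the attribute names (`eno`, `pno`, `dept`), and the placement of fragments at S1, S2, and S3 are all hypothetical and are not taken from the lecture. The sketch shows that even a simple query has to gather fragments from several sites before it can be answered.

```python
from dataclasses import dataclass, field


@dataclass
class Site:
    """A hypothetical site holding fragments of one or more relations."""
    name: str
    fragments: dict = field(default_factory=dict)  # relation name -> list of rows


# Assumed placement: employees is split across S1 and S2,
# projects lives at S2, and assignment lives at S3.
site1 = Site("S1", {"employees": [{"eno": 1, "name": "Ann", "dept": "R&D"}]})
site2 = Site("S2", {"employees": [{"eno": 2, "name": "Bob", "dept": "Sales"}],
                    "projects": [{"pno": 10, "title": "CAD"}]})
site3 = Site("S3", {"assignment": [{"eno": 1, "pno": 10},
                                   {"eno": 2, "pno": 10}]})


def scan(relation, sites):
    """Reconstruct a relation as the union of its fragments at all sites."""
    rows = []
    for site in sites:
        rows.extend(site.fragments.get(relation, []))
    return rows


def employees_on_project(pno, sites):
    """Join assignment with employees; the inputs may live at different sites."""
    enos = {a["eno"] for a in scan("assignment", sites) if a["pno"] == pno}
    return sorted(e["name"] for e in scan("employees", sites) if e["eno"] in enos)
```

Calling `employees_on_project(10, [site1, site2, site3])` touches all three sites. A real DDBMS would also have to decide where to run the join, how to keep the read consistent under concurrent updates, and what to do if one site is unreachable, which are exactly the problems the question above points to.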

What is not a DDBS?

  • The following systems are parallel database systems and are quite different from (though related to) distributed DB systems
    • Shared Memory
    • Shared Disk
    • Shared Nothing
    • Central Databases

Promises of DDBSs

Distributed Database Systems deliver the following advantages:

  • Higher reliability
  • Improved performance
  • Easier system expansion
  • Transparency of distributed and replicated data

Promises of DDBSs...

Higher reliability

  • Replication of components
  • No single point of failure
    • e.g., a broken communication link or processing element does not bring down the entire system
  • Distributed transaction processing guarantees the consistency of the database under concurrent access
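As a rough illustration of how replication removes a single point of failure, the sketch below reads a data item from the first replica that is reachable. The `Replica` class, the `SiteDown` exception, and the `read_any` helper are hypothetical names introduced here; the sketch assumes the replicas already hold identical data and does not show how a DDBMS keeps them consistent.

```python
class SiteDown(Exception):
    """Raised when a site (or the link to it) is unavailable."""


class Replica:
    """A hypothetical copy of some data items stored at one site."""

    def __init__(self, site, data, up=True):
        self.site = site
        self.data = dict(data)
        self.up = up

    def read(self, key):
        if not self.up:
            raise SiteDown(self.site)
        return self.data[key]


def read_any(replicas, key):
    """Return the value from the first reachable replica."""
    for replica in replicas:
        try:
            return replica.read(key)
        except SiteDown:
            continue  # try the next copy instead of failing the whole read
    raise SiteDown("all replicas unavailable")
```

With one replica marked as down, `read_any` still succeeds as long as at least one other copy is reachable; only when every copy is down does the read fail.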

Promises of DDBSs...

Easier system expansion

  • The issue is database scaling
  • Emergence of microprocessor and workstation technologies
    • A network of workstations is much cheaper than a single mainframe computer
  • Data communication cost versus telecommunication cost
  • Increasing database size

Promises of DDBSs...

Transparency

  • Refers to the separation of the higher-level semantics of the system from the lower-level implementation issues
  • A transparent system "hides" the implementation details from the users.
  • A fully transparent DBMS provides high-level support for the development of complex applications.
    • (a) User wants to see one database
    • (b) Programmer sees many databases

Promises of DDBSs...

  • Network/Distribution transparency allows a user to perceive a DDBS as a single, logical entity
  • The user is protected from the operational details of the network (or even does not know about the existence of the network)
  • The user does not need to know the location of data items; a command used to perform a task is independent of both the location of the data and the site where the task is performed (location transparency)
  • A unique name is provided for each object in the database (naming transparency)
    • In the absence of this, users are required to embed the location name as part of an identifier
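Location transparency can be pictured as a catalog lookup that happens behind the user's command. The following sketch is an assumption-laden illustration: the `Catalog` and `DDBMS` classes and their methods are hypothetical and do not correspond to any real system's API. It shows that the same `select_all("employees")` call keeps working after the relation is moved to another site.

```python
class Catalog:
    """Hypothetical catalog mapping a logical relation name to its site."""

    def __init__(self):
        self._location = {}

    def register(self, relation, site):
        self._location[relation] = site

    def locate(self, relation):
        return self._location[relation]


class DDBMS:
    """Hypothetical front end: users name relations, never sites."""

    def __init__(self, catalog, sites):
        self.catalog = catalog
        self.sites = sites  # site name -> {relation name: list of rows}

    def select_all(self, relation):
        site = self.catalog.locate(relation)  # hidden from the user
        return list(self.sites[site][relation])

    def move(self, relation, new_site):
        """Relocate a relation; user commands do not change."""
        old_site = self.catalog.locate(relation)
        self.sites[new_site][relation] = self.sites[old_site].pop(relation)
        self.catalog.register(relation, new_site)
```

Because the site name appears only inside the catalog, moving data changes one catalog entry rather than every query that mentions the relation.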

Promises of DDBSs...

Different ways to ensure naming transparency:

  • Solution 1: Create a central name server; however, this results in
    • loss of some local autonomy
    • central site may become a bottleneck
    • low availability (if the central site fails, the remaining sites cannot create new objects)
  • Solution 2: Prefix each object with the identifier of the site that created it
    • e.g., branch created at site S1 might be named S1.BRANCH
    • Also need to identify each fragment and its copies
    • e.g., copy 2 of fragment 3 of Branch created at site S1 might be referred to as S1.BRANCH.F3.C
  • An approach that resolves these problems uses aliases for each database object
    • Thus, S1.BRANCH.F3.C2 might be known as local branch by a user at site S
    • The DDBMS has the task of mapping an alias to the appropriate database object
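The site-prefixed naming scheme and the alias mapping can be sketched as follows. The `ObjectName` tuple, the `parse_name` and `resolve` helpers, and the alias spelling `local_branch` are hypothetical; the `F<n>` and `C<n>` suffix format is inferred from the single example name S1.BRANCH.F3.C2 and may differ in a real DDBMS.

```python
from typing import NamedTuple, Optional


class ObjectName(NamedTuple):
    site: str                # site that created the object, e.g. "S1"
    relation: str            # relation name, e.g. "BRANCH"
    fragment: Optional[int]  # fragment number, if the name includes one
    copy: Optional[int]      # copy number, if the name includes one


def parse_name(full_name):
    """Split a site-prefixed name such as 'S1.BRANCH.F3.C2' into its parts."""
    parts = full_name.split(".")
    if len(parts) < 2:
        raise ValueError(f"expected at least SITE.RELATION, got {full_name!r}")
    site, relation = parts[0], parts[1]
    fragment = copy = None
    for part in parts[2:]:
        if part.startswith("F") and part[1:].isdigit():
            fragment = int(part[1:])
        elif part.startswith("C") and part[1:].isdigit():
            copy = int(part[1:])
        else:
            raise ValueError(f"unrecognized name component {part!r}")
    return ObjectName(site, relation, fragment, copy)


# Assumed alias table kept by the DDBMS for one user.
aliases = {"local_branch": "S1.BRANCH.F3.C2"}


def resolve(alias):
    """Map a user-visible alias to the underlying database object."""
    return parse_name(aliases[alias])
```

Here `resolve("local_branch")` yields site S1, relation BRANCH, fragment 3, copy 2, so the user works with a short local name while the DDBMS keeps the globally unique one.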