Distributed Computing Systems, Lecture notes of Distributed Database Management Systems

University of Zambia Distributed Database Management Systems

The distributed systems course teaches the fundamental principles and practical aspects of designing and implementing systems where multiple networked computers coordinate to achieve a common goal. Topics include algorithms for synchronization, consensus, and transactions; handling communication latency and failures; security; and emerging areas like cloud computing, peer-to-peer networks, and the Internet of Things (IoT). The curriculum typically involves both theoretical knowledge and hands-on projects to build and debug real systems.

Typology: Lecture notes

2024/2025

Uploaded on 11/17/2025

francine-40 🇿🇲


Syllabus

Distributed Computing

Lecture : 4 Hrs / Week Practical : 3 Hrs / Week

One paper : 100 Marks / 3 Hrs duration Term work : 25 Marks

1. Fundamentals

Evolution of Distributed Computing Systems, System models, issues in design of Distributed Systems, distributed computing environment, web based distributed model, computer networks related to distributed systems and web based protocols.

2. Message Passing

Inter process Communication, Desirable Features of Good Message-Passing Systems, Issues in IPC by Message, Synchronization, Buffering, Multidatagram Messages, Encoding and Decoding of Message Data, Process Addressing, Failure Handling, Group Communication.

3. Remote Procedure Calls

The RPC Model, Transparency of RPC, Implementing RPC Mechanism, Stub Generation, RPC Messages, Marshaling Arguments and Results, Server Management, Communication Protocols for RPCs, Complicated RPCs, Client-Server Binding, Exception Handling, Security, Some Special Types of RPCs, Lightweight RPC, Optimization for Better Performance.

4. Distributed Shared Memory

Design and Implementation issues of DSM, Granularity, Structure of Shared memory Space, Consistency Models, replacement Strategy, Thrashing, Other Approaches to DSM, Advantages of DSM.

5. Synchronization

Clock Synchronization, Event Ordering, Mutual Exclusion, Election Algorithms.


6. Resource and Process Management

Desirable Features of a good global scheduling algorithm, Task assignment approach, Load Balancing approach, Load Sharing Approach, Process Migration, Threads, Processor allocation, Real time distributed Systems.

7. Distributed File Systems

Desirable Features of a good Distributed File System, File Models, File Accessing Models, File-sharing Semantics, File-caching Schemes, File Replication, Fault Tolerance, Design Principles, Sun's Network File System, Andrew File System, comparison of NFS and AFS.

8. Naming

Desirable Features of a Good Naming System, Fundamental Terminologies and Concepts, System-Oriented Names, Name caches, Naming & security, DCE directory services.

9. Case Studies

Mach & Chorus (Keep case studies as tutorial)

Term work/ Practical: Each candidate will submit assignments based on the above syllabus, along with flow charts and program listings, together with the internal test paper.

References:

Pradeep K. Sinha: Distributed Operating Systems, PHI
Andrew S. Tanenbaum: Distributed Operating Systems, Pearson Education
Andrew S. Tanenbaum, Maarten van Steen: Distributed Systems: Principles and Paradigms, Pearson Education
George Coulouris, Jean Dollimore, Tim Kindberg: Distributed Systems: Concepts and Design

Fig. 1.1 Difference between tightly coupled and loosely coupled multiprocessor systems: (a) a tightly coupled multiprocessor system, in which the CPUs share a systemwide memory through interconnection hardware; (b) a loosely coupled multiprocessor system, in which each CPU has its own local memory and the CPUs are connected by a communication network.

Tightly coupled systems are referred to as parallel processing systems, and loosely coupled systems are referred to as distributed computing systems, or simply distributed systems.

In contrast to tightly coupled systems, the processors of a distributed computing system can be located far from each other to cover a wider geographical area. Furthermore, in tightly coupled systems, the number of processors that can be usefully deployed is usually small and limited by the bandwidth of the shared memory. This is not the case with distributed computing systems, which are more freely expandable and can have an almost unlimited number of processors.

In short, a distributed computing system is basically a collection of processors interconnected by a communication network in which each processor has its own local memory and other peripherals, and the communication between any two processors of the system takes place by message passing over the communication network.

For a particular processor, its own resources are local, whereas the other processors and their resources are remote. Together, a processor and its resources are usually referred to as a node, site, or machine of the distributed computing system.

1.2 EVOLUTION OF DISTRIBUTED COMPUTING SYSTEM

Computer systems are undergoing a revolution. From 1945, when the modern computer era began, until about 1985, computers were large and expensive. Even minicomputers cost at least tens of thousands of dollars each. As a result, most organizations had only a handful of computers, and for lack of a way to connect them, these operated independently from one another. Starting around the mid-1980s, however, two advances in technology began to change that situation.

The first was the development of powerful microprocessors. Initially, these were 8-bit machines, but soon 16-, 32-, and 64-bit CPUs became common. Many of these had the computing power of a mainframe (i.e., large) computer, but for a fraction of the price. The amount of improvement that has occurred in computer technology in the past half century is truly staggering and totally unprecedented in other industries. From a machine that cost 10 million dollars and executed 1 instruction per second, we have come to machines that cost 1000 dollars and are able to execute 1 billion instructions per second, a price/performance gain of
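The price/performance comparison above can be worked out from the two machines it describes: performance per dollar is instructions per second divided by cost, and the gain is the ratio of the newer figure to the older one. This is a small Python sketch that uses only the figures quoted in the text; exact fractions are used so that no floating-point rounding creeps in.

```python
from fractions import Fraction

# Figures quoted in the text.
OLD_COST_USD = 10_000_000   # early machine: 10 million dollars
OLD_IPS = 1                 # ... executing 1 instruction per second
NEW_COST_USD = 1_000        # later machine: 1000 dollars
NEW_IPS = 1_000_000_000     # ... executing 1 billion instructions per second


def price_performance(ips: int, cost_usd: int) -> Fraction:
    """Instructions per second obtained per dollar, kept as an exact fraction."""
    return Fraction(ips, cost_usd)


# Ratio of the new machine's performance-per-dollar to the old machine's.
gain = price_performance(NEW_IPS, NEW_COST_USD) / price_performance(OLD_IPS, OLD_COST_USD)
print("price/performance gain:", gain)
```

With the quoted figures this evaluates to 10**13: the new machine delivers 10**6 instructions per second per dollar, the old one 10**-7.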

The second development was the invention of high-speed computer networks. Local-area networks (LANs) allow hundreds of machines within a building to be connected in such a way that small amounts of information can be transferred between machines in a few microseconds or so. Larger amounts of data can be

Distributed computing became popular because of the difficulties of centralized processing on mainframes.

With mainframe software architectures, all components are within a central host computer. Users interact with the host through a terminal that captures keystrokes and sends that information to the host. In the last decade, however, mainframes have found a new use as a server in distributed

Whilst three-tier architectures proved successful at separating the logical design of systems, collaborating interfaces remained relatively complex because of technical dependencies between interconnecting processes. Standards for Remote Procedure Calls (RPC) were then introduced in an attempt to standardise interaction between processes.

As an interface for software to use, RPC consists of: a set of rules for marshalling and unmarshalling parameters and results; a set of rules for encoding and decoding information transmitted between two processes; a few primitive operations to invoke an individual call, to return its results, and to cancel it; and provision in the operating system and process structure to maintain and reference state that is shared by the participating processes. RPC requires a communications infrastructure to set up the path between the processes and to provide a framework for naming and addressing.

There are two models that provide the framework for using these tools, known as the computational model and the interaction model. The computational model describes how a program executes a procedure call when the procedure resides in a different process. The interaction model describes the activities that take place as the call progresses. A marshalling component and an encoding component are brought together by an Interface Definition Language (IDL). An IDL program defines the signatures of RPC operations. The signature is the name of the operation, its input and output parameters, the results it returns, and the exceptions it may be asked to handle.

RPC has a definite model of a flow of control that passes from a calling process to a called process. The calling process is suspended while the call is in progress and is resumed when the procedure terminates. The procedure may, itself, call other procedures, which can be located anywhere in the systems participating in the application.
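To make the stub, marshalling, and blocking-call ideas concrete, here is a minimal Python sketch. It is not any particular RPC system: the hypothetical `INTERFACE` table stands in for an IDL definition, JSON stands in for the encoding rules, and a direct function call stands in for the communications infrastructure. All class and function names are illustrative assumptions.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical IDL-like signatures: operation name -> parameter names.
INTERFACE: Dict[str, List[str]] = {"add": ["a", "b"], "greet": ["name"]}


def marshal_call(op: str, args: Dict[str, Any]) -> bytes:
    """Check a call against its signature and encode it as a request message."""
    if op not in INTERFACE:
        raise ValueError(f"unknown operation {op!r}")
    if sorted(args) != sorted(INTERFACE[op]):
        raise TypeError(f"{op} expects parameters {INTERFACE[op]}")
    return json.dumps({"op": op, "args": args}).encode("utf-8")


class ServerStub:
    """Unmarshals requests, runs the local procedure, marshals the reply."""

    def __init__(self, procedures: Dict[str, Callable[..., Any]]) -> None:
        self.procedures = procedures

    def handle(self, request: bytes) -> bytes:
        msg = json.loads(request.decode("utf-8"))
        try:
            result = self.procedures[msg["op"]](**msg["args"])
            reply = {"ok": True, "result": result}
        except Exception as exc:  # report the exception back to the caller
            reply = {"ok": False, "error": f"{type(exc).__name__}: {exc}"}
        return json.dumps(reply).encode("utf-8")


class ClientStub:
    """Makes a remote call look local; the caller blocks until the reply arrives."""

    def __init__(self, transport: Callable[[bytes], bytes]) -> None:
        self.transport = transport

    def call(self, op: str, **args: Any) -> Any:
        reply = json.loads(self.transport(marshal_call(op, args)).decode("utf-8"))
        if not reply["ok"]:
            raise RuntimeError(reply["error"])
        return reply["result"]


server = ServerStub({"add": lambda a, b: a + b, "greet": lambda name: f"hello, {name}"})
client = ClientStub(transport=server.handle)  # stands in for a network round trip
print(client.call("add", a=2, b=3))
print(client.call("greet", name="node-7"))
```

Because `ClientStub.call` does not return until `transport` hands back a reply, the calling code is suspended for the duration of the call, as in the flow-of-control model described above. Swapping `server.handle` for a function that sends the bytes over a socket would leave both stubs unchanged.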

1.3 DISTRIBUTED COMPUTING SYSTEM MODELS

Various models are used for building distributed computing systems. These models can be broadly classified into five categories – minicomputer, workstation, workstation-server, processor pool, and hybrid. They are briefly described below.

1.3.1 Minicomputer Model :

The minicomputer model is a simple extension of the centralized time-sharing system. As shown in Fig. 1.2, a distributed computing system based on this model consists of a few minicomputers (they may be large supercomputers as well) interconnected by a communication network. Each minicomputer usually has multiple users simultaneously logged on to it. For this, several interactive terminals are connected to each minicomputer. Each user is logged on to one specific minicomputer, with remote access to other minicomputers. The network allows a user to access remote resources that are available on some machine other than the one onto which the user is currently logged.

The minicomputer model may be used when resource sharing with remote users is desired (such as sharing of information databases of different types, with each type of database located on a different machine).

The early ARPAnet is an example of a distributed computing system based on the minicomputer model.

Fig. 1.2 : A distributed computing system based on the minicomputer model (several minicomputers, each with its own terminals, interconnected by a communication network)

1.3.2 Workstation Model :

As shown in Fig. 1.3, a distributed computing system based on the workstation model consists of several workstations

1. How does the system find an idle workstation?
2. How is a process transferred from one workstation to get it executed on another workstation?
3. What happens to a remote process if a user logs onto a workstation that was idle until now and was being used to execute a process of another workstation?

Three commonly used approaches for handling the third issue are as follows:
1. The first approach is to allow the remote process to share the resources of the workstation along with its own logged-on user's processes. This method is easy to implement, but it defeats the main idea of workstations serving as personal computers, because if remote processes are allowed to execute simultaneously with the logged-on user's own processes, the logged-on user does not get his or her guaranteed response.
2. The second approach is to kill the remote process. The main drawbacks of this method are that all processing done for the remote process gets lost and the file system may be left in an inconsistent state, making this method unattractive.
3. The third approach is to migrate the remote process back to its home workstation, so that its execution can be continued there. This method is difficult to implement because it requires the system to support a preemptive process migration facility.

For a number of reasons, such as higher reliability and better scalability, multiple servers are often used for managing the resources of a particular type in a distributed computing system. For example, there may be multiple file servers, each running on a separate minicomputer and cooperating via the network, for managing the files of all the users in the system. For this reason, a distinction is often made between the services that are provided to clients and the servers that provide them. That is, a service is an abstract entity that is provided by one or more servers. For example, one or more file servers may be used in a distributed computing system to provide file service to the users.

In this model, a user logs onto a workstation called his or her home workstation. Normal computation activities required by the user's processes are performed at the user's home workstation, but requests for services provided by special servers (such as a file server or a database server) are sent to a server providing that type of service, which performs the user's requested activity and returns the result of request processing to the user's workstation. Therefore, in this model, the user's processes need not be migrated to the server machines for getting the work done by those machines.

For better overall system performance, the local disk of a diskful workstation is normally used for such purposes as storage of temporary files, storage of unshared files, storage of shared files that are rarely changed, paging activity in virtual-memory management, and caching of remotely accessed data. As compared to the workstation model, the workstation-server model has several advantages:

In general, it is much cheaper to use a few minicomputers equipped with large, fast disks that are accessed over the network than a large number of diskful workstations, with each workstation having a small, slow disk.
Diskless workstations are also preferred to diskful workstations from a system maintenance point of view. Backup and hardware maintenance are easier to perform with a few large disks than with many small disks scattered all over a building or campus. Furthermore, installing new releases of software (such as a file server with new functionalities) is easier when the software is to be installed on a few file server machines than on every workstation.
In the workstation-server model, since all files are managed by the file servers, users have the flexibility to use any workstation and access the files in the same manner irrespective of which workstation they are currently logged on to. Note that this is not true with the workstation model, in which each workstation has its local file system, because different mechanisms are needed to access local and remote files.
In the workstation-server model, the request-response protocol described above is mainly used to access the services of the server machines. Therefore, unlike the workstation model, this model does not need a process migration facility, which is difficult to implement. The request-response protocol is known as the client-server model of communication. In this model, a client process (which in this case resides on a workstation) sends a request to a server process (which in this case resides on a minicomputer) for getting some service, such as a block of a file. The server executes the request and sends back a reply to the client that contains the result of request processing.
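The request-response exchange can be sketched in Python with a server thread and message queues standing in for the server machine and the network. This is an illustrative assumption, not how a real file server is built: the block size, the file contents, and the names `file_server` and `read_block` are all hypothetical.

```python
import queue
import threading
from typing import Dict, Optional, Tuple

BLOCK_SIZE = 4  # hypothetical, kept tiny so the example output stays short

# Hypothetical files held on the file server's disk.
FILES: Dict[str, bytes] = {"notes.txt": b"distributed systems"}

Request = Optional[Tuple[str, int, "queue.Queue[bytes]"]]


def file_server(inbox: "queue.Queue[Request]") -> None:
    """Serve 'read block' requests until a None shutdown message arrives."""
    while True:
        msg = inbox.get()
        if msg is None:
            break
        filename, block_no, reply_to = msg
        data = FILES.get(filename, b"")
        start = block_no * BLOCK_SIZE
        reply_to.put(data[start:start + BLOCK_SIZE])  # the reply carries the result


def read_block(inbox: "queue.Queue[Request]", filename: str, block_no: int,
               timeout: float = 1.0) -> bytes:
    """Client side: send a request, then wait for the matching reply."""
    reply_to: "queue.Queue[bytes]" = queue.Queue(maxsize=1)
    inbox.put((filename, block_no, reply_to))
    return reply_to.get(timeout=timeout)


inbox: "queue.Queue[Request]" = queue.Queue()
server = threading.Thread(target=file_server, args=(inbox,), daemon=True)
server.start()
print(read_block(inbox, "notes.txt", 0))  # b'dist'
print(read_block(inbox, "notes.txt", 1))  # b'ribu'
inbox.put(None)  # ask the server to stop
server.join()
```

Each request carries its own reply queue, so the server never needs to know where the client lives; this mirrors the client sending a request and the server sending back a reply that contains the result.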

In the processor-pool model there is no concept of a home machine. That is, a user does not log onto a particular machine but to the system as a whole.

1.3.4 Hybrid Model :

Out of the four models described above, the workstation-server model is the most widely used model for building distributed computing systems. This is because a large number of computer users only perform simple interactive tasks such as editing jobs, sending electronic mail, and executing small programs. The workstation-server model is ideal for such simple usage. However, in a working environment that has groups of users who often perform jobs needing massive computation, the processor-pool model is more attractive and suitable.

To combine the advantages of both the workstation-server and processor-pool models, a hybrid model may be used to build a distributed computing system. The hybrid model is based on the workstation-server model but with the addition of a pool of processors. The processors in the pool can be allocated dynamically for computations that are too large for workstations or that require several computers concurrently for efficient execution. In addition to efficient execution of computation-intensive jobs, the hybrid model gives guaranteed response to interactive jobs by allowing them to be processed on local workstations of the users. However, the hybrid model is more expensive to implement than the workstation-server model or the processor-pool model.

EXERCISE:

Differentiate between time-sharing, parallel processing, network and distributed operating systems.
In what respect are distributed computing systems better than parallel processing systems?
Discuss the main guiding principles that a distributed operating system designer must keep in mind for good performance of the system.
What are the major issues of designing a Distributed OS?
What is the major difference between Network OS and Distributed OS?
Why is scalability an important feature in the design of a distributed OS? Discuss the guiding principles for designing a scalable distributed system.  

ISSUES IN DESIGNING A DISTRIBUTED OPERATING SYSTEM

Unit Structure:
2.1 Issues in Designing a Distributed Operating System
2.2 Transparency
2.3 Performance Transparency
2.4 Scaling Transparency
2.5 Reliability
2.6 Fault Avoidance
2.7 Fault Tolerance
2.8 Fault Detection and Recovery
2.9 Flexibility
2.10 Performance
2.11 Scalability

2.1 ISSUES IN DESIGNING A DISTRIBUTED OPERATING SYSTEM

In general, designing a distributed operating system is more difficult than designing a centralized operating system for several reasons. In the design of a centralized operating system, it is assumed that the operating system has access to complete and accurate information about the environment in which it is functioning. For example, a centralized operating system can request status information, being assured that the interrogated component will not change state while awaiting a decision based on that status information, since only the single operating system asking the question may give commands. However, a distributed operating system must be designed with the assumption that complete information about the system environment will never be available. In a distributed system, the resources are physically separated, there is no common clock among the multiple processors, delivery of messages is delayed, and messages could even be lost. Due to all these reasons, a distributed operating system does not have up-to-date, consistent knowledge about the state of the various components of the underlying distributed system. Obviously, lack of up-to-date and consistent information

is essential that the replicas have the same name. Consequently, a system that supports replication should also support location transparency.

Concurrency Transparency: It hides the fact that a resource may be shared by several competitive users. For example, two independent users may each have stored their files on the same server and may be accessing the same table in a shared database. In such cases, it is important that each user does not notice that the others are making use of the same resource.
Failure Transparency: It hides the failure and recovery of resources. It is the most difficult transparency to achieve in a distributed system, and it is even impossible when certain apparently realistic assumptions are made. For example, a user cannot distinguish between a very slow resource and a dead one. The same error message appears when a server is down, when the network is overloaded, or when the connection from the client side is lost. So the user is unable to understand what has to be done: whether to wait for the network to clear up, or to try again later when the server is working again.
Persistence Transparency: It hides whether a resource is in memory or on disk. For example, an object-oriented database provides facilities for directly invoking methods on stored objects. First, the database server copies the object's state from disk into main memory, then performs the operation, and finally writes the state back to disk. The user does not know that the server is moving the object between primary and secondary memory.

Transparency    Description
Access          Hide differences in data representation and how a resource is accessed
Location        Hide where a resource is located
Migration       Hide that a resource may move to another location
Relocation      Hide that a resource may be moved to another location while in use
Replication     Hide that a resource is replicated
Concurrency     Hide that a resource may be shared by several competitive users
Failure         Hide the failure and recovery of a resource
Persistence     Hide whether a (software) resource is in memory or on disk

Summary of the transparencies

In a distributed system, multiple users who are spatially separated use the system concurrently. In such a situation, it is economical to share the system resources (hardware or software) among the concurrently executing user processes. However, since the number of available resources in a computing system is restricted, one user process must necessarily influence the action of other concurrently executing user processes, as it competes for resources. For example, concurrent updates to the same file by two different processes should be prevented. Concurrency transparency means that each user has a feeling that he or she is the sole user of the system and other users do not exist in the system. For providing concurrency transparency, the resource-sharing mechanisms of the distributed operating system must have the following four properties :

An event-ordering property ensures that all access requests to various system resources are properly ordered to provide a consistent view to all users of the system.
A mutual-exclusion property ensures that at any time at most one process accesses a shared resource, which must not be used simultaneously by multiple processes if program operation is to be correct.
A no-starvation property ensures that if every process that is granted a resource, which must not be used simultaneously by multiple processes, eventually releases it, every request for that resource is eventually granted.
A no-deadlock property ensures that a situation will never occur in which competing processes prevent their mutual progress even though no single one requests more resources than available in the system.
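One simple way to obtain the mutual-exclusion, event-ordering, and no-starvation properties for a single shared resource is a centralized coordinator that grants the resource to one process at a time and queues the other requests in arrival order. The Python sketch below assumes that approach (distributed algorithms that avoid a single coordinator, and its single point of failure, also exist). It further assumes that every holder eventually releases the resource and that a process does not request the resource again while already waiting; the `Coordinator` class is hypothetical.

```python
from collections import deque
from typing import Deque, Optional


class Coordinator:
    """Grants one shared resource to at most one process at a time, in FIFO order."""

    def __init__(self) -> None:
        self.holder: Optional[str] = None   # mutual exclusion: at most one holder
        self.waiting: Deque[str] = deque()  # ordering: requests served first come, first served

    def request(self, pid: str) -> bool:
        """Return True if pid is granted the resource now; otherwise queue it."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)
        return False

    def release(self, pid: str) -> Optional[str]:
        """Release by the current holder; return the next process granted, if any."""
        if pid != self.holder:
            raise ValueError(f"{pid} does not hold the resource")
        # No starvation: every queued request moves one step closer on each release.
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


c = Coordinator()
print(c.request("P1"))  # True
print(c.request("P2"))  # False
print(c.request("P3"))  # False
print(c.release("P1"))  # P2
print(c.release("P2"))  # P3
```

With only one resource and one queue there is no circular wait among processes, so the no-deadlock property holds trivially here; systems with several resources need more than this sketch shows.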

2.3 PERFORMANCE TRANSPARENCY

The aim of performance transparency is to allow the system to be automatically reconfigured to improve performance, as loads vary dynamically in the system. As far as practicable, a situation in which one processor of the system is overloaded with jobs while another processor is idle should not be allowed to occur. That is, the processing capability of the system should be uniformly distributed among the currently available jobs in the system. This requirement calls for the support of intelligent resource allocation and process migration facilities in distributed operating systems.
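A simple static stand-in for this idea is to place each job on whichever processor is currently least loaded, so that no processor sits idle while another is overloaded. The Python sketch below uses a heap and handles larger jobs first; the job names, costs (hypothetical units of work), and processor names are assumptions, and it does not model the run-time process migration that a dynamically reconfiguring system would also need.

```python
import heapq
from typing import Dict, List, Tuple


def assign_jobs(jobs: List[Tuple[str, int]], processors: List[str]) -> Dict[str, List[str]]:
    """Place each job on the processor with the smallest load so far."""
    heap = [(0, name) for name in processors]  # (current load, processor name)
    heapq.heapify(heap)
    placement: Dict[str, List[str]] = {name: [] for name in processors}
    # Handling larger jobs first tends to leave a more even final spread.
    for job, cost in sorted(jobs, key=lambda j: j[1], reverse=True):
        load, proc = heapq.heappop(heap)
        placement[proc].append(job)
        heapq.heappush(heap, (load + cost, proc))
    return placement


def loads(placement: Dict[str, List[str]], jobs: List[Tuple[str, int]]) -> Dict[str, int]:
    """Total work assigned to each processor."""
    cost = dict(jobs)
    return {proc: sum(cost[j] for j in assigned) for proc, assigned in placement.items()}


jobs = [("j1", 5), ("j2", 4), ("j3", 3), ("j4", 3), ("j5", 1)]
placement = assign_jobs(jobs, ["cpu0", "cpu1"])
print(placement)
print(loads(placement, jobs))
```

For this input the two processors each end up with 8 units of work.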

the designers of the various software components of the distributed operating system must test them thoroughly to make these components highly reliable.

2.7 FAULT TOLERANCE

Fault tolerance is the ability of a system to continue functioning in the event of partial system failure. The performance of the system might be degraded due to partial failure, but otherwise the system functions properly. Some of the important concepts that may be used to improve the fault tolerance ability of a distributed operating system are as follows :

1. Redundancy techniques : The basic idea behind redundancy techniques is to avoid single points of failure by replicating critical hardware and software components, so that if one of them fails, the others can be used to continue. Obviously, having two or more copies of a critical component makes it possible, at least in principle, to continue operations in spite of occasional partial failures. For example, a critical process can be simultaneously executed on two nodes so that if one of the two nodes fails, the execution of the process can be completed at the other node. Similarly, a critical file may be replicated on two or more storage devices for better reliability. Notice that with redundancy techniques, additional system overhead is needed to maintain two or more copies of a replicated resource and to keep all the copies of a resource consistent. For example, if a file is replicated on two or more nodes of a distributed system, additional disk storage space is required, and for correct functioning, it is often necessary that all the copies of the file are mutually consistent. In general, the larger the number of copies kept, the better the reliability but the greater the incurred overhead. Therefore, a distributed operating system must be designed to maintain a proper balance between the degree of reliability and the incurred overhead. This raises an important question : How much replication is enough? For an answer to this question, note that a system is said to be k-fault tolerant if it can continue to function even in the event of the failure of k components [Cristian 1991, Nelson 1990]. Therefore, if the system is to be designed to tolerate k fail-stop failures, k + 1 replicas are needed. If k replicas are lost due to failures, the remaining one replica can be used for continued functioning of the system. On the other hand, if the system is to be designed to tolerate k Byzantine failures, a minimum of 2k + 1 replicas are needed. This is because a voting mechanism can be used to believe the majority k + 1 of the replicas when k replicas behave abnormally.

Another application of the redundancy technique is in the design of a stable storage device, which is a virtual storage device that can withstand even transient I/O faults and decay of the storage media. The reliability of a critical file may be improved by storing it on a stable storage device.

2. Distributed control: For better reliability, many of the particular algorithms or protocols used in a distributed operating system must employ a distributed control mechanism to avoid single points of failure. For example, a highly available distributed file system should have multiple and independent file servers controlling multiple and independent storage devices. In addition to file servers, a distributed control technique could also be used for name servers, scheduling algorithms, and other executive control functions. It is important to note here that when multiple distributed servers are used in a distributed system to provide a particular type of service, the servers must be independent. That is, the design must not require simultaneous functioning of the servers; otherwise, the reliability will become worse instead of getting better. Distributed control mechanisms are described throughout this book.
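The replica counts given under redundancy techniques, together with the majority vote used for Byzantine failures, can be written down directly. This Python sketch follows the rule as stated in the text (k + 1 replicas for k fail-stop failures, 2k + 1 replicas with voting for k Byzantine failures) and assumes the voter itself does not fail; the function names are hypothetical.

```python
from collections import Counter
from typing import Hashable, List


def replicas_needed(k: int, byzantine: bool) -> int:
    """Minimum replicas to survive k failures, per the rule stated in the text."""
    if k < 0:
        raise ValueError("k must be non-negative")
    return 2 * k + 1 if byzantine else k + 1


def majority_vote(replies: List[Hashable], k: int) -> Hashable:
    """Believe the value reported by at least k + 1 replicas."""
    value, count = Counter(replies).most_common(1)[0]
    if count < k + 1:
        raise RuntimeError("no value is backed by k + 1 replicas")
    return value


print(replicas_needed(2, byzantine=False))      # 3
print(replicas_needed(2, byzantine=True))       # 5
print(majority_vote([42, 42, 7, 42, 13], k=2))  # 42
```

In the last call two of the five replicas return wrong values, yet the three matching replies still form the k + 1 majority the voter needs.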

2.8 FAULT DETECTION AND RECOVERY

The fault detection and recovery method of improving reliability deals with the use of hardware and software mechanisms to determine the occurrence of a failure and then to correct the system to a state acceptable for continued operation. Some of the commonly used techniques for implementing this method in a distributed operating system are as follows.

1. Atomic transactions : An atomic transaction (or just transaction for short) is a computation consisting of a collection of operations that take place indivisibly in the presence of failures and concurrent computations. That is, either all of the operations are performed successfully or none of their effects prevails, and other processes executing concurrently cannot modify or observe intermediate states of the computation. Transactions help to preserve the consistency of a set of shared data objects (e.g. files) in the face of failures and concurrent access. They make crash recovery much easier, because transactions can only end in two states : either all the operations of the transaction are performed or none of the operations of the transaction is performed.
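The all-or-nothing behaviour of an atomic transaction can be illustrated with an undo log: every write first records the old value, and if anything goes wrong before commit, the log is replayed backwards. This is a minimal single-process Python sketch over an in-memory dictionary. It shows only failure atomicity; it does not provide isolation from concurrent transactions or survival across crashes, which would additionally need locking and a log kept on stable storage. The `Transaction`, `run_atomically`, and `transfer` names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Store = Dict[str, int]


class Transaction:
    """Records an undo entry before each write so the writes can be rolled back."""

    def __init__(self, store: Store) -> None:
        self.store = store
        self.undo_log: List[Tuple[str, bool, int]] = []  # (key, existed, old value)

    def write(self, key: str, value: int) -> None:
        self.undo_log.append((key, key in self.store, self.store.get(key, 0)))
        self.store[key] = value

    def abort(self) -> None:
        # Undo in reverse order so the earliest old value is restored last.
        for key, existed, old in reversed(self.undo_log):
            if existed:
                self.store[key] = old
            else:
                del self.store[key]
        self.undo_log.clear()


def run_atomically(store: Store, body: Callable[[Transaction], None]) -> bool:
    """Keep all of body's writes, or none of them if body raises."""
    tx = Transaction(store)
    try:
        body(tx)
    except Exception:
        tx.abort()
        return False
    tx.undo_log.clear()  # commit: the writes stay, the undo information is discarded
    return True


def transfer(amount: int) -> Callable[[Transaction], None]:
    """Hypothetical example: move money from account A to account B."""
    def body(tx: Transaction) -> None:
        tx.write("A", tx.store["A"] - amount)
        if tx.store["A"] < 0:
            raise ValueError("insufficient funds")  # fails after a partial update
        tx.write("B", tx.store["B"] + amount)
    return body


accounts: Store = {"A": 100, "B": 50}
print(run_atomically(accounts, transfer(30)), accounts)   # True {'A': 70, 'B': 80}
print(run_atomically(accounts, transfer(500)), accounts)  # False {'A': 70, 'B': 80}
```

The second transfer has already lowered A before it fails, but the abort restores A, so the store ends in one of the only two states a transaction allows.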
