Remote Procedure Calls: A Comprehensive Guide | Summaries Compiler Construction

Remote Procedure Calls

Introduction

Goal: Provide a layer of abstraction for process-to-process communication that enables a process on one system to invoke a

function, or service, on a another system without having to deal with the problems of formatting data and parsing the request.

Introduction

Sockets are a fundamental part of client-server networking. They provide a relatively easy mechanism for a

program to establish a connection to another program, either on a remote or local machine and send messages

back and forth (we can even use read and write system calls). This interface, however, forces us to design our

distributed applications using a read/write (send-receive) interface which is not how we generally design non-

distributed applications. In designing centralized applications, the procedure call is usually the standard interface

model, allowing for the creation of components that have functional interfaces. If we want to make distributed

computing look like centralized computing, input/output-based communications is not the way to accomplish

this.

In 1984, Birrell and Nelson devised a mechanism to allow programs to call procedures on other machines. A

process on machine A can call a procedure on machine B. When it does so, the process on A is suspended and

execution continues on B. When B returns, the return value is passed to A and A continues execution. This

mechanism is called the Remote Procedure Call (RPC). To the programmer, it appears as if a normal

procedure call is taking place. Obviously, a remote procedure call is different from a local one in the underlying

implementation.

Steps in a remote procedure call

Consider how local procedure calls are implemented. This differs among compilers and architectures, so we will

generalize. Every processor provides us with some form of call instruction, which pushes the address of the

next instruction on the stack and transfers control to the address specified by the call. When the called

procedure is done, it issues a return instruction, which pops the address from the top of the stack and transfers

control there. That’s just the basic processor mechanism that makes it easy to implement procedure calls. The

actual details of identifying the parameters, placing them on the stack, and executing a call instruction are up to

the compiler. The compiler generates code to evaluate each parameter, push its value on the stack, and then issue

the call to the function. In the called function, the compiler is responsible for generating code to ensure that any

registers that may be clobbered are saved, allocating stack space for local variables, and then restoring the

registers and stack pointer prior to a return.

None of this makes sense if we want to call a procedure that is loaded on a remote machine. This means that

the compiler has to do something different to provide the illusion of calling a remote procedure.

Remote procedure calls are a programming language construct. They are implemented by the compiler or

interpreter of the programming language as opposed sockets, which is an operating system construct since it

is a facility provided by the operating system. They provide the illusion of calling a procedure on a remote

machine. During this time, execution of the local thread stops until the results are returned. The programmer is

alleviated from packaging data, sending and receiving messages, and parsing results.

The illusion of a remote procedure call is accomplished by generating stub functions. On the client side, the

stub (often called a proxy) is a function with the same interface as the desired remote procedure. Its job is to

take the parameters, marshal them into a network message, send them to the server, await a reply, and then

unmarshal the results and return them to the caller. On the server side, the stub (often known as a skeleton) is

responsible for being the main program that registers the service and awaits incoming requests for running the

remote procedure. It unmarshals the data in the request, calls the user’s procedure, and marshals the results into

a network message that is sent back to the recipient.

Distributed Systems

Partial preview of the text

Download Remote Procedure Calls: A Comprehensive Guide and more Summaries Compiler Construction in PDF only on Docsity!

Remote Procedure Calls

Introduction

Goal: Provide a layer of abstraction for process-to-process communication that enables a process on one system to invoke a function, or service, on a another system without having to deal with the problems of formatting data and parsing the request.

Introduction

Sockets are a fundamental part of client-server networking. They provide a relatively easy mechanism for a program to establish a connection to another program, either on a remote or local machine and send messages back and forth (we can even use read and write system calls). This interface, however, forces us to design our distributed applications using a read/write (send-receive) interface which is not how we generally design non- distributed applications. In designing centralized applications, the procedure call is usually the standard interface model, allowing for the creation of components that have functional interfaces. If we want to make distributed computing look like centralized computing, input/output-based communications is not the way to accomplish this. In 1984, Birrell and Nelson devised a mechanism to allow programs to call procedures on other machines. A process on machine A can call a procedure on machine B. When it does so, the process on A is suspended and execution continues on B. When B returns, the return value is passed to A and A continues execution. This mechanism is called the Remote Procedure Call ( RPC ). To the programmer, it appears as if a normal procedure call is taking place. Obviously, a remote procedure call is different from a local one in the underlying implementation.

Steps in a remote procedure call

Consider how local procedure calls are implemented. This differs among compilers and architectures, so we will generalize. Every processor provides us with some form of call instruction, which pushes the address of the next instruction on the stack and transfers control to the address specified by the call. When the called procedure is done, it issues a return instruction, which pops the address from the top of the stack and transfers control there. That’s just the basic processor mechanism that makes it easy to implement procedure calls. The actual details of identifying the parameters, placing them on the stack, and executing a call instruction are up to the compiler. The compiler generates code to evaluate each parameter, push its value on the stack, and then issue the call to the function. In the called function, the compiler is responsible for generating code to ensure that any registers that may be clobbered are saved, allocating stack space for local variables, and then restoring the registers and stack pointer prior to a return. None of this makes sense if we want to call a procedure that is loaded on a remote machine. This means that the compiler has to do something different to provide the illusion of calling a remote procedure. Remote procedure calls are a programming language construct. They are implemented by the compiler or interpreter of the programming language as opposed sockets, which is an operating system construct since it is a facility provided by the operating system. They provide the illusion of calling a procedure on a remote machine. During this time, execution of the local thread stops until the results are returned. The programmer is alleviated from packaging data, sending and receiving messages, and parsing results. The illusion of a remote procedure call is accomplished by generating stub functions. On the client side, the stub (often called a proxy ) is a function with the same interface as the desired remote procedure. Its job is to take the parameters, marshal them into a network message, send them to the server, await a reply, and then unmarshal the results and return them to the caller. On the server side, the stub (often known as a skeleton ) is responsible for being the main program that registers the service and awaits incoming requests for running the remote procedure. It unmarshals the data in the request, calls the user’s procedure, and marshals the results into a network message that is sent back to the recipient.

Distributed Systems

Figure 1. Steps in executing a remote procedure call

The client calls a local procedure, called the client stub. To the client process, this appears to be the actual procedure, because it is a regular local procedure. It just does something different since the real procedure is on the server. The client stub packages the parameters to the remote procedure (this may involve converting them to a standard format) and builds one or more network messages. The packaging of arguments into a network message is called marshalling and requires serializing all the data elements into a flat array-of- bytes format.
Network messages are sent by the client stub to the remote system (via a system call to the local kernel using sockets interfaces).
Network messages are transferred by the kernel to the remote system via some protocol (either connectionless or connection-oriented).
A server stub , sometimes called the skeleton , receives the messages on the server. It unmarshals the arguments from the messages and, if necessary, converts them from a standard network format into a machine-specific form.
The server stub calls the server function (which, to the client, is the remote procedure), passing it the arguments that it received from the client.
When the server function is finished, it returns to the server stub with its return values.
The server stub converts the return values, if necessary, and marshals them into one or more network messages to send to the client stub.
Messages get sent back across the network to the client stub.
The client stub reads the messages from the local kernel.
The client stub then returns the results to the client function, converting them from the network representation to a local one if necessary. The client code then continues its execution. The major benefits of RPC are twofold. First, the programmer can now use procedure call semantics to invoke remote functions and get responses. Secondly, writing distributed applications is simplified because RPC hides all of the network code into stub functions. Application programs don’t have to worry about details such as sockets, port numbers, and data conversion and parsing. On the OSI reference model, RPC spans both the session and presentation layers (layers five and six).

Challenges in implementing RPC

There are a few hurdles to overcome in implementing remote procedure calls: Parameter passing Most parameters in our programs are passed by value. That is easy to do remotely: just send the data in a network message. Some parameters, however, are passed by reference. A reference is a memory address of the parameter. The problem with passing this is that memory is local and the memory address passed from a client to a server will now refer to memory on the server rather than to the contents on the client. There is no good solution to this except to understand the data that is being referenced, send it to the remote side (pass by value), where it will be placed in some temporary memory. A local reference can then be passed to the server function. Because the contents might have been modified by the function, the data will need to be sent back to the calling client and copied back to its original location. Marshalling

Is the server accepting messages only from legitimate clients? Can the server identify the user at the client side? Can the message be sniffed by other processes while it traverses the network? Can the message be intercepted and modified by other processes while it traverses the network from client to server or server to client? Is the protocol subject to replay attacks? That is, can a malicious host capture a message an retransmit it at a later time? Has the message been accidentally corrupted or truncated while on the network? Does the user’s identity exist on the remote system and can access controls be properly applied? Programming with remote procedure calls Figure 2. Compilation steps for remote procedure calls Most languages used today (C, C++, Scheme, et alia) were not designed with built-in syntax for remote procedures and are therefore incapable of generating the necessary stub functions. To enable the use of remote procedure calls, one common solution is to provide a separate compiler that generates the client and server stub functions. This compiler takes its input from a programmer-specified definition of the remote procedure call interface. Such a definition is written in an interface definition language. The interface definition generally looks similar to function prototype declarations: it enumerates the set of functions along with input and return parameters. After the RPC compiler is run, the client and server programs can be compiled and linked with the appropriate stub functions (Figure 2). The client procedure has to be modified to initialize the RPC mechanism (e.g. locate the server and possibly establish a connection) and to handle the failure of remote procedure calls. Even if a language has support for remote procedure calls (e.g., Java), one may still need a separate IDL compiler if there is a need to handle different RPC protocols than the one provided by the language. Advantages of RPC You don’t have to worry about getting a unique transport address (picking a unique port number for a socket on a machine). The server can bind to any available port and then register that port with an RPC name server. The client will contact this name server to find the the port number that corresponds to the program it needs. All this will be invisible to the programmer. The system can be independent of transport provider. The automatically-generated server stub can make itself available over every transport provider on a system, both TCP and UDP. The client can choose dynamically and no extra programming is required since the code to send and receive messages is automatically generated. Applications on the client only need to know one transport address: that of the name server that is responsible for telling the application where to connect for a given set of server functions. The function-call model can be used instead of the send/receive (read/write) interface provided by sockets. Users don’t have to deal with marshalling parameters and then parsing them out on the other side. RPC API Implementing remote procedure calls requires various supporting services. The key components to support RPC will often include: Marshalling operations

Data that goes to and from a remote procedure (arguments, return values, and exceptions) has to be converted to a format that can be transmitted over the network (serialization) and then converted that data back into its original format (deserialization) on the other end. It needs to be augmented with additional information, such as the function or method that is being called, object instance (if objects are supported), and version number (if versioning is supported). Common serialization formats in Python include XML (for XML-RPC), JSON (for JSON-RPC), and Protocol Buffers (for gRPC). The choice of serialization format can influence the speed and efficiency of the RPC communication. Name service and discovery operations Register and look up binding information (ports, machines, protocols). Allow an application to use dynamic (operating system assigned) ports. Transport protocol support Once data is serialized, it needs to be sent to the remote system. This is achieved through a transport protocol. Some systems use TCP or UDP protocols directly. Others, such as XML-RPC may use HTTP/HTTPS over TCP or, in the case of gRPC, HTTP/2 over TCP. Connection Management The system needs to handle the creation, maintenance, and termination of network connections. This can involve dealing with connection pooling, retries, timeouts, and other network-related issues. Service definition and stub/skeleton generation In some RPC frameworks, especially those using strongly-typed data structures (like gRPC), service interfaces need to be explicitly defined through an interface definition language. These definitions often provide information about the methods available, their input parameters, and their return types. Stubs (client-side proxies) and skeletons (server-side method implementations) are then auto-generated from these definitions. Security operations The system should provide mechanisms for the client and server to be able to authenticate each other and to provide a secure communication channel between the two. Stub memory management and garbage collection Stubs may need to allocate memory for storing parameters, particularly to simulate pass-by-reference semantics. The RPC package needs to allocate and clean up any such allocations. They also may need to allocate memory for creating network buffers. For RPC packages that support objects, the RPC system needs a way of keeping track whether remote clients still have references to objects or whether an object can be deleted. Error Handling Errors can occur during remote calls, both at the application level (e.g., an exception in the remote method) or at the network level (e.g., network outage, timeout). A good RPC mechanism will have robust error handling and might also support features like exception propagation. Object and function ID operations Allow the ability to pass references to remote functions or remote objects to other processes. Not all RPC systems support this. Security operations Depending on the use case, the RPC communication might need to be secured. This can involve authentication – verifying the identity of the caller; authorization – determining if the authenticated caller has permission to perform the requested operation; encryption – ensuring that the data transmitted over the network can’t be read by unauthorized parties. Systems such as gRPC that sit on top of HTTP and HTTP/2 use Transport Layer Security (TLS) to support encrypted, authenticated RPC communication.

Remote Procedure Calls: A Comprehensive Guide, Summaries of Compiler Construction

Related documents

Partial preview of the text

Download Remote Procedure Calls: A Comprehensive Guide and more Summaries Compiler Construction in PDF only on Docsity!

Remote Procedure Calls

Introduction

Introduction

Steps in a remote procedure call

Distributed Systems

Challenges in implementing RPC