Comparison of Grid and Database Approaches for Efficient Petabyte Data Management, Study notes of Architecture

The approaches of grid research and distributed database research towards managing and replicating large amounts of data, focusing on the use of object-oriented databases management systems (odbms) in scientific experiments. The author outlines the differences between computational grids and data grids, discusses the need for optimized data replication and access over wan, and proposes solutions for efficient, asynchronous replication and data consistency policies in a data grid. The document also covers replica catalogues and directory services, related work on data replication, and commonalities between the two communities.

Typology: Study notes

2014/2015

Uploaded on 07/28/2015

mohsin_aftab
mohsin_aftab 🇬🇧

1 document

1 / 12

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Distributed Database Management Systems and the Data Grid
Heinz Stockinger
CERN, European Organization for Nuclear Research, Geneva, Switzerland
Institute for Computer Science and Business Informatics, University of Vienna, Austria
tel +41 22 767 16 08
Abstract
Currently, Grid research as well as distributed
database research deals with data replication but
both tackle the problem from different points of
view. The aim of this paper is to outline both
approaches and try to find commonalities between
the two worlds in order to have a most efficient
Data Grid that manages data stored in object-
oriented databases. Our target object-oriented
database management system is Objectivity/DB
which is currently the database of choice in some
existing High Energy Physics (HEP) experiments as
well as in next generation experiments at CERN.
The characteristics of Data Grids are described,
especially within the High Energy Physics
community, and needs for Data Grids are defined.
The Globus toolkit is the Grid middle-ware on
which we base our discussions on Grid research.
1 Introduction
Grid computing in general comes from high-
performance computing, super computing and later
cluster computing where several processors or work
stations are connected via a high-speed interconnect
in order to compute a mutual program. Originally,
the cluster was meant to span a local area network
but then it was also extended to the wide area. A
Grid itself is supposed to connect computing
resources over the wide area network.
The Grid research field can further be divided
into two large sub-domains: Computational Grid
and Data Grid. Whereas a Computational Grid is a
natural extension of the former cluster computer
where large computing tasks have to be computed
at distributed computing resources, a Data Grid
deals with the efficient management, placement and
replication of large amounts of data. However, once
data are in place, computational tasks can be run on
the Grid using the provided data. The need for Data
Grids stems from the fact that scientific
applications like data analysis in High Energy
Physics (HEP), climate modelling or earth
observation are very data intensive and a large
community of researchers all around the globe
wants to have fast access to the data.
In the remainder of this paper we concentrate
on the specific needs of High Energy Physics which
can be regarded as a representative example for
other data intensive research communities. In
particular, we focus on the data intensive Large
Hadron Collider (LHC) experiments of CERN, the
European Organization for Nuclear Research in
Geneva, Switzerland. At CERN, recently the
DataGrid project [1] has been initiated in order to
set up a Data Grid. One of the working groups
explicitly deals with data management in a Data
Grid [2], i.e. in the DataGrid project. The tasks to
be solved include data access, migration and
replication as well as query estimation and
optimisation in a secure environment. In this paper
we deal with the replication aspects that need to be
solved in the DataGrid project. The Globus toolkit
[3] is the middle-ware which we will use for the
Grid infrastructure.
Scientific, data intensive applications use
large collections of files for storing data. As regards
the HEP community, data generated by large
detectors have to be stored persistently in mass
storage systems like disks and tapes in order to be
available for physics analysis. In some HEP
experiments, databases are used to store Terabytes
and even Petabytes of persistent data. The usage of
databases is still a unique feature for a Data Grid.
Let us compare this to the climate modelling
community: in that research domain large
collections of files are available and stored in so
called “flat files” without databases. This requires
additional data management tasks like keeping a
catalogue of available files whereas in some
physics experiments in the HEP community the
database management system (DBMS) takes care of
this.
Currently, some new experiments in HEP use
object-oriented databases management systems
(ODBMS) for data management. This is true for
1
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download Comparison of Grid and Database Approaches for Efficient Petabyte Data Management and more Study notes Architecture in PDF only on Docsity!

Distributed Database Management Systems and the Data Grid

Heinz Stockinger CERN, European Organization for Nuclear Research, Geneva, Switzerland Institute for Computer Science and Business Informatics, University of Vienna, Austria [email protected] tel +41 22 767 16 08

Abstract

Currently, Grid research as well as distributed database research deals with data replication but both tackle the problem from different points of view. The aim of this paper is to outline both approaches and try to find commonalities between the two worlds in order to have a most efficient Data Grid that manages data stored in object- oriented databases. Our target object-oriented database management system is Objectivity/DB which is currently the database of choice in some existing High Energy Physics (HEP) experiments as well as in next generation experiments at CERN. The characteristics of Data Grids are described, especially within the High Energy Physics community, and needs for Data Grids are defined. The Globus toolkit is the Grid middle-ware on which we base our discussions on Grid research.

1 Introduction

Grid computing in general comes from high- performance computing, super computing and later cluster computing where several processors or work stations are connected via a high-speed interconnect in order to compute a mutual program. Originally, the cluster was meant to span a local area network but then it was also extended to the wide area. A Grid itself is supposed to connect computing resources over the wide area network. The Grid research field can further be divided into two large sub-domains: Computational Grid and Data Grid. Whereas a Computational Grid is a natural extension of the former cluster computer where large computing tasks have to be computed at distributed computing resources, a Data Grid deals with the efficient management, placement and replication of large amounts of data. However, once data are in place, computational tasks can be run on the Grid using the provided data. The need for Data Grids stems from the fact that scientific applications like data analysis in High Energy Physics (HEP), climate modelling or earth

observation are very data intensive and a large community of researchers all around the globe wants to have fast access to the data. In the remainder of this paper we concentrate on the specific needs of High Energy Physics which can be regarded as a representative example for other data intensive research communities. In particular, we focus on the data intensive Large Hadron Collider (LHC) experiments of CERN, the European Organization for Nuclear Research in Geneva, Switzerland. At CERN, recently the DataGrid project [1] has been initiated in order to set up a Data Grid. One of the working groups explicitly deals with data management in a Data Grid [2], i.e. in the DataGrid project. The tasks to be solved include data access, migration and replication as well as query estimation and optimisation in a secure environment. In this paper we deal with the replication aspects that need to be solved in the DataGrid project. The Globus toolkit [3] is the middle-ware which we will use for the Grid infrastructure. Scientific, data intensive applications use large collections of files for storing data. As regards the HEP community, data generated by large detectors have to be stored persistently in mass storage systems like disks and tapes in order to be available for physics analysis. In some HEP experiments, databases are used to store Terabytes and even Petabytes of persistent data. The usage of databases is still a unique feature for a Data Grid. Let us compare this to the climate modelling community: in that research domain large collections of files are available and stored in so called “flat files” without databases. This requires additional data management tasks like keeping a catalogue of available files whereas in some physics experiments in the HEP community the database management system (DBMS) takes care of this. Currently, some new experiments in HEP use object-oriented databases management systems (ODBMS) for data management. This is true for

BaBar (an experiment at the Stanford University where currently about 150 TB of data are available in Objectivity/DB [4]), as well as for the CERN experiments, CMS and Atlas. For the CERN experiments the final decision about the DBMS (object-oriented or relational, which vendor, etc.) has not been made yet, but the current software development processes uses an object-oriented approach and Objectivity for storing data persistently. Consequently, we base our work on object-oriented database management systems and in particular Objectivity/DB. Recently, Grid research as well as distributed database research tackles the problem of data replication but from a different point of view. The aim of this paper is to outline their approaches and to find commonalities in order to have a most efficient Data Grid that can manage Petabytes of data stored in object-oriented databases. We will provide a first basis for such an effort. Since Data Grids are very new in the research community, we see a clear need for identifying the characteristics and requirements of Data Grids and how they can be met in a most efficient way. Special attention will be given to data consistency and communication issues. Optimising data replication and access to data over the WAN is not addressed sufficiently in database research. In a DBMS there is normally only one method of accessing data. For instance, a data server serves pages to a client. For the Data Grid, such a single access method may not be optimal. Using an ODMBS also has some restrictions, which are pointed out here and some possible solutions are given. We elaborate on different data consistency models and how global transactions can contribute to this. Global transactions are built on top of transactions provided by a database management system at a local site. As opposed to database research, a separation of data communication is required. In particular, global control messages are exchanged by a message passing library whereas the actual data files are transported via high-speed file transfer protocols. A single communication protocol is usually used in a commercial ODBMS for exchanging small amounts of data within database transactions. This communication mechanism is optimised for relatively small transactions but may not be optimised for transferring large files over the WAN with high performance control information exchanged between distributed sites. We will propose solutions for efficient, asynchronous replication and policies with different levels of data consistency in a Data Grid.

The paper is organised as follows. Section 2 will give an introduction to related work in the database as well as the Grid community. The issues raised there will be further analysed in later sections of the paper. Section 3 deals with replica catalogues and directory services that are used in Grid research and points out how these techniques can be mapped to the database approach. Section 4 discusses Objectivity, its replication option and some general ODBMS aspects. Since we assume that the usage of Grid applications will not be fully transparent to the end user, we dedicate Section 5 to possible implications. Section 6 discusses data consistency and replication methods known in database research and practise. Possible update synchronisation solutions are given in Section 7, which is followed by concluding remarks.

2 Related work on data replication

We first identify some selected related work in the database community as well in the Grid community. From this we derive commonalities and discuss briefly what the contribution for an efficient Data Grid can be. The common aspects will be dealt with throughout this paper.

2.1 Distributed DBMS research Distributed database research basically addresses the following issues.

  • Replica synchronisation is based on relatively small transactions where a transaction consists of several read and/or write operations. In contrast, in HEP a typical data production job can write a whole file and thus a transaction should be relatively “large”.
  • Synchronous and asynchronous replication (more details can be found in Section 6): Most of the evaluation techniques are based on the amount of communication messages needed.
  • Cost functions for network or server loads are rarely integrated into the replica selection process of a DBMS.
  • Rather low amount of data as compared to the Petabyte scale of HEP. Data access over the WAN and optimisation of replicas is rarely addressed in database research. In a DBMS there is normally only one internal method of accessing data. For instance, a data server serves pages to a client. In detail, a client sends a request for data to the DBMS, and the DBMS (in particular the data server) uses the implemented method for serving the requested data. For read and write access the same method is used, independent of the amount of data accessed. For the Data Grid, a single access method is not optimal depending on

heterogeneous Data Grid environment with many sources of data: managing replicas of files and dealing with heterogeneous data stores.

  • Managing replicas: As for Objectivity, a file catalogue exists. The granularity for replication is now assumed to be on the file level. When multiple copies of a file exist, they have to be introduced to a global replica catalogue which is an extended version of the native file catalogue of the DBMS. Furthermore, a name space has to be provided that takes into account multiple replicas. Both features are enhancements to a DBMS that only considers single sources of information. Note that we assume here that the underlying DBMS does not support file replication. Using the LDAP protocol and a database backend, the required replica catalogue information can be stored. Another approach is to simply build the catalogue and the access methods with the native schema of the DBMS and store the replica information in a particular file of the DBMS.
  • Heterogeneous data stores: The diversity of the HEP community requires an approach which supports heterogeneous data stores since different kinds of data may be stored in different formats and also in data stores of different vendors. Even the two different database paradigms, relational and object- oriented, should be supported. In this case, a standard protocol for directory information like LDAP seems to be a good choice since replica location information can be accessed in a uniform way without the knowledge of the underlying data store. Combining a DBMS with a directory service means to expose the replica catalogue to a larger user community, where it is not necessary to have a database specific front end to access the directory information. However, the LDAP protocol still needs to have a database backend where the file information is stored and concurrency mechanisms have to be established. The database backend can either be a vendor specific and LDAP specialised database, or any database of choice like Oracle or Objectivity/DB. The usage of a directory service allows an open environment and hence allows the integration of data sources of different kinds. Note that the problem of accessing heterogeneous data sets is not addressed here and may require mediators for data integration. However, introducing LDAP is at least a starting point for extensibility. Another point important point in replica management is synchronisation of single sites and

keeping replica catalogues up to date. Using the LDAP directory service also implies that each site, that stores replica catalogue information, runs an LDAP server locally. LDAP commands can be used for synchronisation. For a homogeneous database approach, where only a single DBMS is used to store the entire data of the Data Grid, it may not be necessary to have an LDAP server for synchronisation - any communication mechanism for exchanging synchronisation information between sites in the Data Grid is sufficient. However, since each data site using a directory service will have its own LDAP server, this server can also be used as a synchronisation point for distributed sites. Whenever a file is introduced to a site, this information has to be made public within a global name space spanning all the other sites in the Data Grid. The replica synchronisation protocol to use depends on the data consistency requirements imposed by the application. We conclude that the usage of LDAP in combination with a DBMS seems to be useful in a heterogeneous environment and for synchronisation of sites holding replica catalogue information.

4 Objectivity/DB

Since Objectivity is the main ODBMS system we are referring to in this paper, we dedicate this section to explaining more details of this product and problems that we were confronted with when using Objectivity for wide area data replication. Some of the problems mentioned are specific to all object-oriented data stores while others are Objectivity specific. However, the usage of an ODBMS allows us some simplifications like a global namespace and a global schema information. Objectivity/DB is a distributed, object- oriented DBMS which has a Data Replication Option called DRO. This option is optimised for synchronous replication over a local area network. In this section, we briefly describe DRO and its drawbacks, and state complications that one comes across when persistent objects have to be replicated.

4.1 Objectivity’s data replication option DRO provides a synchronous replication model on the file level, i.e. entire Objectivity databases which are mapped to physical files can be replicated. Note that in the following section we use the term replica to refer to a physical instance (a copy) of an Objectivity file. There are basically two ways to use DRO: data can be written first, and then synchronised and replicated ( populate - replicate ). Multiple replicas can be created and synchronised and the data can be written into all the replicas at

the same time ( replicate - populate ). Once a replica is synchronised with another one, all the database transactions on a single replica are synchronised with other replicas. Objectivity/DRO performs a dynamic quorum calculation when an application accesses a replica. The quorum which is required to read or write replicas can be changed, which provides some flexibility concerning data consistency. The usage of a quorum method is well established in database research and goes back to an early paper [10]. There is one possibility to overcome the immediate synchronisation. A replica can be set to be off-line. However, only if a quorum still exists, data can be written to the off-line replica. There is no clean way in DRO to perform a real asynchronous or batch replication where you specify a synchronisation point in time. An asynchronous or batch replication method allows replicas to be out of sync for a certain amount of time. Reconciliation of updates is only done at certain synchronisation points, e.g. every hour. The lack of such an explicit asynchronous replication method is one of the reasons why DRO is not considered as a good option for WAN replication. Like many commercial databases, Objectivity does not provide any optimisation for accessing replicas. The replica catalogue only contains the number of copies and the location of each single copy of a file. Once an object of a replicated file has to be accessed, Objectivity tries to get the first copy of the file which appears in the catalogue. It does not examine the bandwidth to the site nor takes into account any data server load. Some operations such as database creation require all replicas to be available. When we talk about Objectivity in any other section or subsection of this paper we mostly ignore the fact that DRO exists and thus see Objectivity only as a distributed DMBS with single copies of a file. However, these single copies of a file can exist at distributed sites in the Data Grid and thus remote access to Objectivity data is possible.

4.2 Partial replication and associations In this paper, as well as in many Grid environments, replication is regarded to be done on the file rather than on the object level and hence a file is regarded to be the smallest granularity of replication. In Objectivity, single objects are stored in containers. Several containers are stored together in an Objectivity database which corresponds to a file. Each object can have associations (also called links or pointers) to other objects which may reside in any database or container. Let us now assume that two objects in two different files are connected

via one association. When an Objectivity database (a file) is replicated, this association gets lost when only one of the files is replicated. Note that there is still the possibility of remotely accessing objects. Hence, the replication decision has to be carefully made and possible associations between objects have to be considered. This may result in replicating a set of files in order to keep all the associations between files. Furthermore, this also imposes severe restrictions on partial replication where only particular objects of a file are replicated. Again, when a certain set of objects is selected which does not have associations to objects which are not in the set, the replication set is association safe. This is a particular restriction of object-oriented data stores and does not only hold for Objectivity. This is also an import difference and complication compared to a relational DBMS. Whereas in an ODBMS an object can be stored in any file, in a relational database data items are stored in tables that have well defined references to other tables.

4.3 The Objectivity file catalogue The integration of files into the native Objectivity file catalogue is done with a tool called ooattachdb which adds a logical file name of the physical location into the catalogue. Furthermore, it guarantees that links to existing Objectivity databases are correct. The schema of the file is not checked. This feature is used when a file created at one site has to be integrated into the file catalogue of another site. Since the native catalogue only has a one-to- one mapping from one logical to one physical file, replicas are not visible to the local site (not taking into account DRO). Furthermore, it is possible to have single links from one site to another one. For instance, site 1 has a file called X and this file shall be shared (not replicated) between several sites. The file name and the location can be integrated into the local file catalogue and a remote access to the file can be established. Note that this requires an Objectivity Advanced Multi-threaded Server (AMS) running or the usage of a shared file system like AFS that connects both sites. The AMS is responsible for transferring (streaming) objects from one machine to another one and thus establishes a remote data access functionality. We want to address the more general solution where no shared file system is available. In the HEP community it is generally agreed and proven that DRO is not optimised for the use in a WAN. Hence, all our discussions here neglect the DRO of Objectivity/DB and we conclude that the

simple copy of an original but still has a logical connection to the original copy, we have to address the data consistency problem. The easiest way to tackle the consistency problem is in the field of read-only data. Since no updates of any of the replicas are done, the data is always consistent. We can state that the consistency can reach its highest degree. Once updates are possible on replicas, the degree of data consistency normally has to be decreased (provided we have read and write access to data) in order to have a reasonable good response time for accessing data replicated over the WAN. Clearly, consistency also depends on the frequency of updates and the amount of data items covered by the update. Thus, we can state that the degree of data consistency depends on the update frequency, amount of data items covered by the update and the expected response time of the replicated system, i.e. the Data Grid. Let us now identify in more detail the different consistency options which are possible and which are reasonable for HEP and within the DataGrid project. In this section we also introduce the term global transaction which has to be distinguished from a local transaction. A local transaction is done by the DBMS at the local site whereas a global transaction is an inter-site transaction which spans multiple sites and thus multiple database management systems. Furthermore, local consistency is maintained by the DBMS whereas the global consistency spans multiple sites in the Data Grid.

6.1 Synchronous replication The highest degree of consistency can be established by having fully synchronous replicas. This means that each local database transaction needs to get acknowledgements from other replicas (or at least a majority of replicas). In practice, as well as in the database research community, this is gained by global transactions which span multiple sites. Objectivity/DRO supports such a replication model. Each time a single local write transaction takes place, the 2-phase-commit protocol and normally also the 2-phase-locking protocol are used to guarantee serialisability and global consistency. This comes at the cost of relatively worse performance for global writes compared to local writes with no replication at all. Consequently, one has to decide carefully if the application requires such a high degree of consistency. We derive from this statement as well as from the current database literature that the type of replication protocol and hence the data consistency model has to be well adapted to the application.

Data in the DataGrid project will have different types and not all of them require the same consistency level. In this paper, we try to point out briefly where different replication policies are required. Furthermore, we claim that a replication system of a Data Grid should not only offer a single policy but several ones which satisfy the needs of having different data types and degrees of consistency. For a middle-ware replication system it is rather difficult to provide this high degree of consistency since global, synchronous transactions are difficult to establish. Within a DBMS a global, synchronous transaction can be an extension of a conventional, local transaction, i.e. the DBMS specific locking mechanism is extended to the replicas. Since a distributed DBMS like Objectivity has built-in global transactions, no additional communication mechanism like sockets or a message passing library is required. Hence, the performance for an integrated distributed DBMS is superior to middle-ware replication systems that have to use external communication mechanisms. Now the following question arises: why not use a distributed DBMS like Objectivity to handle replication? Several points are already covered in Section 4 but there is another major point. Objectivity does not provide flexible consistency levels different kinds of data. Hence, we aim for a hybrid solution where a local site stores data in a DBMS which also handles consistency locally by managing all database transactions locally. A Grid middle-ware is required to provide communication and co-ordination between the local sites. The degree of independence of a single site needs to be flexibly managed. A form of global transaction system is necessary. Let us illustrate this by an example. Some data are allowed to be out of sync (low data consistency) whereas other types of data always need to be synchronised immediately (high consistency). Thus, global transactions have to be flexible and do not necessarily always have to provide the highest degree of consistency. Furthermore, a site may even want to be independent of others once data are available.

6.2 Asynchronous Replication Based on the relative slow performance of write operations in a synchronously replicated environment, the database research community is searching for efficient protocols for asynchronous replication at the cost of lower consistency. Currently, there is no standard for replication available, but a few commonly agreed solutions:

  • Primary-copy approach : This is also known as “master-slave” approach. The basic idea is

that for each data item or file a primary copy exists and all the other replicas are secondary copies [13]. The updates can only be done by the primary copy which is the owner of the file. If a write request is sent to a secondary copy, the request is passed on to the primary copy which does the updates and propagates the changes to all secondary copies. This policy is implemented in the object data stores Versant [14] and ObjectStore [15]. Also Oracle provides such a feature. The primary-copy approach provides a high degree of data consistency and has improved write performance features compared to synchronous replication because the lock on a file has not to be agreed among all replicas but only by the primary copy.

  • Epidemic approach : User operations are performed on any single replica and a separate activity compares version information (e.g. time-stamps) of different replicas and propagates updates to older replicas in a lazy manner (as opposed by an eager, synchronous approach) [16]. Thus, update operations are always executed locally first and then the sites communicate to exchange up-to-date information. The degree of consistency can be very low here and this solution does not exclude dirty reads or other possible database anomalies. Such a system can only be applied for non time-critical data since the propagation of updates can also result in conflicts which have to be solved manually.
  • Subscription and relatively independent sites : Similar to the epidemic approach, another policy is that a site only wants to have data and does not care about consistency at all. When any other site does updates on a particular file, only certain sites are notified of the updates. This follows a subscription model where a site subscribes explicitly to a data producing site. A site that has not subscribed is itself responsible to get the latest information from other sites. This allows a site to do local changes without the agreement of other sites. However, such a site has to be aware that the local data is not always up-to-date. A valid solution to this problem is to provide an export buffer, where the newest information of a site is stored, and an import buffer, where a local site stores all the information that needs to be imported from a remote site. For instance, a local site has finished writing 10 different files and puts the file names into the export buffer. A remote site can than be notified and transfers the information from the export buffer to its

local import buffer and requests the files if necessary. This approach allows more flexibility concerning data consistency and independence for a local site. A site can decide itself which data to import and which information to filter. Furthermore, a data production site may not export all the locally available information and can filter the export buffer accordingly. A valid implementation of this approach can be found in the Grid Data Management Pilot (GDMP) [17].

6.3 Communication and Transactions As outlined above, there is a clear need for global transactions. Such a transaction does not necessarily need to create locks at each site, but at least a notification system is required to automate and trigger the replication and data transfer process. In general, there is a clear separation between exchanging control messages which organise locks and update notifications, and the actual data transfer. This is an important difference to current database management systems. Replication protocols are often compared by the amount of messages sent in order to evaluate their performance. The type of a message is dependent on the DBMS. A message of the same type is then sent to do the actual update using the same communication protocol. In Data Grids where most of the data are read-only, we can divide the required communication into the following two parts. This concept is also realised in GDMP [17].

  • control messages : These are all messages that are used to synchronise distributed sites. For instance, one site notifies another site that new data are available, update information is propagated, etc. To sum up, replication protocols are using these control messages.
  • data transfer : This includes the actual transfer of data and meta data files from one site to another one. This separation is similar to the FTP protocol where we also have this clear separation between the two tasks [18]. The main point for this separation is to use the most appropriate protocol for a specific communication need. Simple message passing is appropriate for exchanging control messages whereas a fast file transfer protocol is required to transfer large amounts of (large) files. In a Grid environment both protocols are provided. To be more specific, in Globus the GlobusIO library can be used for control messages and Grid-FTP implementation [19], which is based on the WU- FTP server and the NC-FTP client, serves as the data transport mechanism. A single communication protocol used in an ODBMS like Objectivity may

update synchronisation on the object rather than on the file level since a middle ware system cannot access DBMS internals like pages or object tables. A common solution in database research is to communicate only the changes of a file to remote sites. Since an Objectivity database file can only be interpreted correctly by a native Objectivity process, the file itself appears like any binary file and a conventional process does not see any structure in the file. We call this the binary difference approach. As second possibility to update data stored in an ODBMS, we use an object- oriented approach which requires knowledge about the schema and data format of the files to be updated. Both approaches are outlined in this section.

7.1 Binary Difference Approach There is also the possibility to use a tool called XDelta [20], which produces the difference between any two binary files. This difference can than be sent to the remote site. This site can then update the file, which is out of date, by merging the diff file with the original data file. XDelta is a library interface and application program designed to compute changes between files. These changes (deltas) are similar to the output of the "diff" program in that they may be used to store and transmit only the changes between files. However, unlike diff, the output of XDelta is not expressed in a human-readable format - XDelta can also apply these deltas to a copy of the original file(s).

7.2 Object-oriented approach Another approach is to create objects that are aware of replicas. In principle, an object can be created at any site and the creation method of this object has to take care of or delegate the distribution of this object. The class definition has to be designed in a way that there is some information on the amount and the site of replicas. For instance, an object should be created at site 1 and replicated to the sites X and Y. A typical creation method can look like follows:

object.create (site1, siteX, siteY);

The advantage of this approach is that all the necessary information is available to create the object at any site. When an update on one of the objects is done, the update function has to be aware of all the replicas and the sites that need to be updated. This can be compared to the stored procedure approach which is known in the relational database world. In principle, the model

presented here has similar ideas. A local site may update immediately and can store the updates into a log file. Based on the consistency requirement of remote sites, the log information is sent to the remote sites which apply the same update function as the original local site. The update synchronisation problem is then passed to a ReplicatorObject that is aware of replicas and the replication policy. The ReplicatorObject in turn can provide different consistency levels like updating remote sites immediately, each hour, day, etc. When large amounts of data are used, there may be most likely a scalability problem of managing all the logging information. However, since each object is identified by a single OID, only the parameters of an update method together with the OID have to be stored.

object.update_parameter_x (200); // OID = 38-23-222-

The log file stores the triple (x/38-23-222-442/200) where the first argument is the parameter of the object, the second the OID and the third the new value of the parameter. The modus operandi for communicating the changes is like the following. A local site gains an exclusive, global lock on the file and updates the required objects. In parallel the log file is written. The file itself is transferred to remote sites with an efficient file transfer protocol whereas the remote sites are notified via control messages. Since such a replication policy is rather cost intensive in terms of exchanging communication messages sending data, it should only be applied to a relatively small amount of data. In the HEP environment, there exist many meta data sources like replica catalogues, indices etc. which require a high consistency of data. For such data this approach is useful.

8 Conclusion

The data management efforts of the two research communities distributed databases and Grid deals with the problem of data replication where the Grid community specifically deals with large amounts of data in wide area networks. In the HEP community, data are often stored in database management systems, and it is appropriate to try to understand the research issues of both communities: distributed databases and Grid; analyse differences and commonalities, and combine common ideas to form an efficient Data Grid. We have presented research issues and possible solutions. Thus, we provide a

first basis for the effort of combining both research communities.

Acknowledgement

We want to thank colleagues from the following groups for fruitful discussions: DataGrid work package “Data Management” (including CERN, Caltech, LBL and INFN), CMS Computing group (CERN, Princeton and Caltech), Globus project in Argonne and ISI, BaBar experiment at SLAC and colleagues taking part in the Data Grid discussions in the Grid Forum 5 in Boston.

References

[1] T h e E u r o p e a n D a t a G r i d P r o j e c t : http://www.cern.ch/grid/ [2] Wolfgang Hoschek, Javier Jaen-Martinez, Asad Samar, Heinz Stockinger, Kurt Stockinger, Data Management in International Data Grid Project, 1st 1EEE, ACM International Workshop on Grid Computing (Grid'2000) , Bangalore, India, 17-20 Dec.

[3] The Globus Project, http://www.globus.org [4] Objectivity Inc., http://www.objectivity.com [5] Heinz Stockinger, Kurt Stockinger, Erich Schikuta, Ian Willers. Towards a Cost Model for Distributed and Replicated Data Stores, 9th Euromicro Workshop on Parallel and Distributed Processing PDP 2001 , IEEE Computer Society Press, Mantova, Italy, February 7-9, 2001. [6] Particle Physics Data Grid (PPDG), http://www.ppdg.net [7] GriPhyN, http://www.griphyn.org [8] Koen. Holtman, Peter van der Stok, Ian Willers. Towards Mass Storage Systems with Object Granularity. Proceedings of the IEEE Mass Storage Systems and Technologies , Maryland, USA, March 27-30, 2000 [9] Koen Holtman, Heinz Stockinger, Building a Large Location Table to Find Replicas of Physics Objects, Proc. of Computing in High

Energy Physics (CHEP 2000) , Padova, Febr.

[10] D.K. Gifford. Weighted Voting for replicated data_. ACM-SIGOPS Symp. on Operating Systems Principles_ , Pacific Grove, December

[11] Kurt Stockinger, Dirk Duellmann, Wolfgang Hoschek, Erich Schikuta. Improving the Performance of High Energy Physics Analysis through Bitmap Indices_. 11th International Conference on Database and Expert Systems Applications_ , London - Greenwich, UK, Springer-Verlag, Sept. 2000. [12] L. M. Bernardo, A. Shoshani, A. Sim, H. Nordberg. Access Coordination of Tertiary Storage for High Energy Physics Applications. IEEE Symposium on Mass Storage Systems , College Park, MD, USA, March 2000. [13] Yuri Breitbart, Henry Korth. Replication and Consistency: Being Lazy Helps Sometimes, Proc. 16 ACM Sigact/Sigmod Symposium on the Principles of Database Systems , Tucson, AZ 1997. [14] Versant, Inc. http://www.versant.com/ [15] ObjectStore http://www.exceloncorp.com /products/objectstore.html [16] Divyakant Agrawal, Amr El Abbadi, R. Steinke: Epidemic Algorithms in Replicated Databases (Extended Abstract). PODS 1997 ,

[17] Asad Samar, Heinz Stockinger. Grid Data Management Pilot (GDMP): A Tool for Wide Area Replication, IASTED International Conference on Applied Informatics (AI 2001) , Innsbruck, Austria, 2001. [18] J. Postel, J. Reynolds, RFC 959: File Transfer Protocol (FTP), October 1985. [19] Globus Project, Universal Data Transfer for the Grid, White Paper, 2000. [20] J o s h u a P. M a c D o n a l d , X D e l t a , http://www.XCF.Berkeley.EDU/~jmacd/xdelt a.html