





























In this chapter we discuss the construction of baseline models of existing systems. This activity relies on knowledge of the hardware, software, workload, and monitoring tools associated with the system under study. It also requires access to information recorded by accounting and software monitors during system operation. Here, we describe general approaches applicable to a variety of systems. In Chapter 17, we illustrate these approaches with an example based on a specific system (IBM’s MVS) and a specific monitoring tool (RMF).

In Chapter 4 we divided the inputs of queueing network models into three groups: the customer description, the center description, and the service demands. The structure of the present chapter reflects this division.

Section 12.3 is devoted to the customer description: the correspondence of the workload components of the system to the customer classes of the model. In specifying the values of the customer description parameters, we are answering questions such as:
There is little reason to construct a model of an existing system unless this model is to be used for performance projection. Consequently, we cannot completely separate the task of constructing a baseline model of an existing system (the subject of this chapter) from the task of using the model to project performance for an evolving system (the subject of Chapter 13). Our (somewhat artificial) separation between the two tasks will be the following: problems that arise from limitations or shortcomings of current monitoring tools and techniques will be treated in this chapter, while problems that would persist even with ideal monitoring capabilities will be deferred to the next chapter.

12.2. Types and Sources of Information

The information required to specify parameter values for a queueing network model of an existing system includes static information about the system configuration and dynamic information extracted from records produced during system operation by various monitoring packages. Some information is recorded for purposes of accounting, while other information is recorded explicitly for performance evaluation purposes. Software packages of varying degrees of sophistication are available for storing, analyzing, and reporting the information recorded during system operation. In this section, we discuss briefly the information needed, how it can be obtained, and how it can be managed. Our intention is not to be comprehensive, but rather to highlight points of particular relevance to the construction and use of queueing network models.
One type of information required is a description of the hardware and software of the system. With respect to hardware, this information includes an enumeration of the components of the system (processors, channels, storage devices, communication devices, etc.) and an indication of their interconnections (e.g., the paths over which data can be moved from a particular storage device to memory). With respect to software, this information includes the operating system in use, and the values of parameters that influence resource allocation. Examples of such parameters include CPU scheduling priorities for various workload components, placement of files on storage devices, etc.
present the most significant performance problems. The duration of the observation interval should be long enough that end effects do not significantly affect the accuracy of the measurements. End effects are measurement errors caused by the fact that some customers are processed partly within and partly outside of the observation interval. In particular, it is typical to assume that the system operates in flow balance over the measurement interval, so that the job arrival and completion rates are equal. However, because some jobs arrive but do not complete during the interval, and other jobs arrive before but complete during the interval, flow balance may not hold. Clearly, measurements obtained from longer observation intervals are affected less by these end effects than are those from shorter intervals. Typically, observation intervals of thirty to ninety minutes are appropriate for obtaining software monitor data. If monitoring overhead is a concern, shorter intervals can be used, but the danger of anomalies is increased.

Other sources of useful information include hardware monitors and monitors specialized for particular application subsystems (such as database or telecommunications subsystems). Hardware monitors, because they are “external observers” of the system, obtain accurate measurements and do not perturb system operation. They are incapable, however, of associating resource usage with workload components. The specialized application subsystem monitors are helpful in assessing the performance of subsystems whose autonomy from the host operating system prevents standard monitors from being able to record their activity. (For example, special monitors are needed for IBM’s IMS database system because RMF does not record information about individual IMS transactions.) While any information that is available from hardware and specialized application subsystem monitors should be exploited, our discussion in this chapter will be restricted to the kinds of information that are commonly reported in most medium or large computer installations.
Table 12.1 summarizes the information typically available from various sources. Information from different sources (accounting and software monitors, or even two different software monitors) may be based on different underlying assumptions. For this reason, and also because of end effect anomalies, information from different sources may appear to be contradictory. For example, consider a small interactive system in which monitors report that in a thirty minute observation interval:
We would conclude that throughput during the observation interval was:
$$\frac{7200 \text{ transactions}}{1800 \text{ seconds}} = 4 \text{ transactions/second}$$
278 Parameterization: Existing Systems
source               information provided

system description   hardware configuration
                     operating system (and version)
                     resource allocation and scheduling strategies
                     tuning parameter values

accounting monitor   CPU usage, by workload component
                     logical I/O operation count, by workload component
                     customer completions, by workload component

software monitor     measured busy time, by device
                     physical I/O operation count, by device
                     average queue length, by device
                     throughput, by workload component
                     average response time, by workload component

hardware monitor     observed busy time, by device

Table 12.1 - Sources of Information

Because the observation interval is long relative to the average response time, we could be confident that end effects would not lead to significant errors in the estimates of throughput or response time. Considering Little’s law, however, we would find the sum of the queue lengths (18) to be much higher than expected from the product of throughput (4 transactions/second) and response time (3 seconds). One possible explanation for such a situation is that the queue lengths include system tasks that are not counted in either the throughput or response time calculations. On the other hand, if the sum of the queue lengths had been reported as 8 (and other values remained the same), then Little’s law would reveal a discrepancy in the other direction. A possible explanation for the second case would be that requests were queueing for admission to memory, thus spending a significant part of their response time where they were not included in the queue length of any device. The fundamental laws presented in Chapter 3 can be used to detect such apparent contradictions. System intuition and careful thought are required to resolve them.

Enhanced awareness of the problems of configuration management and capacity planning has led recently to some encouraging progress in the use and management of system measurement data. First, special reporting routines tailored to the requirements of queueing network modelling have been developed for some systems. These routines analyze records produced by existing accounting and software monitors. Some are capable of defining a queueing network in a format directly acceptable by particular queueing network modelling software packages.
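The Little's law consistency check applied to the thirty minute example earlier in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `littles_law_gap` is hypothetical, and the inputs are the figures quoted in the text (7200 completions in 1800 seconds, a 3 second response time, and reported queue length sums of 18 and 8).

```python
def littles_law_gap(throughput, response_time, reported_queue_sum):
    """Compare a reported total queue length with the Little's law value.

    Returns (expected, gap), where expected = throughput * response_time
    and gap = reported_queue_sum - expected.
    """
    expected = throughput * response_time
    return expected, reported_queue_sum - expected


# Figures quoted in the text for the small interactive system.
completions = 7200
interval_seconds = 1800
throughput = completions / interval_seconds  # transactions/second

for reported in (18, 8):
    expected, gap = littles_law_gap(throughput, 3.0, reported)
    if gap > 0:
        hint = "queue lengths may include work not counted as transactions"
    elif gap < 0:
        hint = "some response time may be spent outside any device queue"
    else:
        hint = "measurements are consistent"
    print(f"reported={reported} expected={expected} gap={gap}: {hint}")
```

A positive gap corresponds to the first case discussed above (for example, system tasks counted in the queue lengths), and a negative gap to the second (for example, queueing for admission to memory).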
• Classes may be made to correspond to accounting and performance groups. This facilitates the calculation of various parameter values, since accounting data is organized by accounting group.
• Classes may be used to distinguish work generated by various organizational units (e.g., divisions of a company). This permits unit-specific performance projections, and facilitates later modification analysis (since workload forecasts frequently are made on an organizational unit basis).

A first step in identifying customer classes is to group portions of the workload according to whether they are best represented as batch, terminal, or transaction types. Often, the nature of a workload component suggests an appropriate type: if requests arrive at a constant rate, then transaction; if requests are generated by a set of users that await the completion of service to one request before generating another, then terminal; if the number of active requests is constant, then batch. Variations are possible, though, especially in conducting a modification analysis. As one example, a workload component might in fact consist of users at terminals, but for planning purposes its intensity might be described in terms of a request arrival rate. In this case, the use of a transaction type might be appropriate. As another example, a system might have many workload components, only a few of which are of interest. The presence of the other components might be reflected in the model by a single “aggregate” class of transaction type (so that its throughput is guaranteed to equal the measured value).

Within each type of customer class, further separation of workload components may be desirable. Batch work of different priorities may be represented as distinct classes. Different interactive systems (e.g., APL and TSO in an IBM environment) may be treated as separate terminal classes. If trivial transactions (such as simple editing commands) can be distinguished from substantive transactions (such as complex database queries), then different classes can be used to distinguish the two groups.

The queueing network model input parameter C is simply the number of customer classes, determined according to the guidelines suggested above. Models of simple systems typically have just one or two classes, while models of complex multi-purpose systems may have eight or more. In some special situations it is useful to have a very large number of classes, say, twenty to forty.

One example of a situation in which a large number of classes was used is a model developed for projecting the performance of a hospital information system used in many hospitals. There were roughly thirty major transaction types (admit-patient, order-blood-test, set-dietary-restriction, etc.), each one of which was represented as a separate customer class. In this way, the arrival rate of each transaction type and the priority assigned to the transaction type (reflecting its urgency in a particular hospital) could be represented directly in the model. The hospitals using the system differed substantially in size and in the hardware on which they ran the system. Also, they differed significantly in the particular mix of transactions that were processed. The model proved useful in configuration design. The response times for various transaction classes could be related to the arrival rates and priorities of the classes for various contemplated hardware configurations.
Having identified each workload component to be represented as a distinct customer class and determined the type of that class, the next step is to establish the workload intensity of each class. For a transaction class, the workload intensity is the transaction arrival rate. Over a reasonably long observation interval in a system that is not saturated, the arrival rate is essentially the same as the completion rate. Consequently, an estimate for the arrival rate of class c is:

$$\lambda_c = \frac{\text{measured completions of class } c}{\text{length of measurement interval}}$$
For a batch class, the workload intensity is given by the average number of batch customers active. An estimate for $N_c$, the number of class c customers, can be obtained in several ways:

• If jobs are processed in a fixed number of regions and memory queueing times are high (so that it is known that each region is busy throughout most of the observation interval), then $N_c$ is the number of processing regions.
• If the software monitor provides an estimate of the average multiprogramming level of the class over the observation interval by sampling, then $N_c$ can be taken to be that estimate.
• If accounting data provides the residence time of each job in the central subsystem, then $N_c$ can be estimated by:

$$N_c = \frac{\sum_{\text{class } c \text{ jobs}} \text{measured job residence time}}{\text{length of measurement interval}}$$

(This alternative is impractical without the use of a reduction package capable of automatically extracting this information from accounting records.)

For a terminal class, workload intensity is specified by the number of active terminals, $N_c$, along with the average think time, $Z_c$. Three possibilities for estimating $N_c$ for terminal classes correspond directly to the three methods used for batch classes:
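Setting the terminal case aside, the transaction and batch estimates given earlier in this section reduce to simple ratios over the measurement interval. The following sketch shows both; the function names and the sample residence times are hypothetical, and it assumes that all times are expressed in the same unit (seconds here).

```python
def arrival_rate(completions, interval_length):
    """Transaction class: estimated arrival rate, taken as the completion rate."""
    return completions / interval_length


def batch_population(job_residence_times, interval_length):
    """Batch class: average number of active jobs, from accounting data.

    job_residence_times holds the measured residence time of each job of
    the class in the central subsystem during the interval.
    """
    return sum(job_residence_times) / interval_length


interval = 1800.0  # thirty minute interval, in seconds

# Hypothetical measurements for one transaction class and one batch class.
print(arrival_rate(7200, interval))
print(batch_population([600.0, 900.0, 300.0], interval))
```

The batch estimate is the time-averaged number of jobs resident, which is why it needs per-job residence times from accounting records rather than a simple job count.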
The service centers of a queueing network model correspond to significant points of congestion or delay in the system. There are many ways of representing system resources by a set of service centers. Here we suggest only the most widely accepted methods, which have proven successful in a large number of modelling studies.

For systems with single CPUs and for tightly-coupled multiprocessors, a single service center is used to represent the CPU(s) in the queueing network model. Loosely-coupled multiprocessors are modelled by including one service center per processor. Front end communications processors and back end database machines also may be represented as separate service centers.

The representation of disk subsystems can be done in a variety of ways. (See the discussion in Chapter 10.) A number of components are involved in each disk I/O operation. The modelling approach that has proven most successful, however, is to use a single service center to represent each disk. Congestion due to other I/O subsystem components is represented by calculating an appropriate effective service demand for each center.

Other peripheral devices can be represented more simply than disks. Because tape drives are not capable of operation independent of the channel, a group of tape drives on a channel can be represented by a single service center. The service demands at the center can be established using channel utilization only, and ignoring the individual tape drives. Unit record equipment typically is ignored in constructing queueing network models. This is justified in many systems because spooling makes the use of unit record devices asynchronous. Similarly, terminal controllers typically are not represented. If delays in the communications front end are thought to be important in a particular study, then a special approach must be used. This might involve a hierarchical model in which a conventional central subsystem model is evaluated, and then the delays due to communication are represented in a high-level model that includes an FESC representing the central subsystem.

12.5. Service Demands

The final set of values needed to parameterize a queueing network model are the service demands at each center of the customers belonging to each class. Obtaining these values can be a difficult and time consuming process. As a practical consideration, it is important to concentrate on obtaining accurate estimates for the most heavily utilized centers, because a small error in estimating the service demands at the bottleneck center will affect performance projections more than a much larger error at a lightly utilized center.
In estimating service demands, the three center types (delay, FESC, and queueing) are treated differently.
Delay centers have service demands that represent a delay that is not caused by congestion (e.g., a propagation delay in a communication network). It usually is not difficult to determine appropriate values for delay centers. In addition, errors in the service demands at delay centers are not “magnified” by queueing delay calculations when the model is evaluated.

For FESCs, the load dependent service rates can be determined in many ways, as described in Chapter 8. Two major approaches are evaluating low-level queueing network models (as illustrated in Chapter 9 for the case of memory constraints) and considering hardware characteristics (as illustrated in Chapter 11 for the case of tightly-coupled multiprocessors).
The remainder of this section is devoted to the case of queueing centers, by far the most common center type in queueing network models. Conceptually, estimating service demands for queueing centers is straightforward: at the conclusion of the measurement interval, the measured busy time for each class at each device is divided by the number of system completions for the class. In practice, however, two difficulties arise:
• In the multiple class case, the available data frequently is insufficient to apportion the measured busy time among the classes with certainty. The reasons and the remedies differ for various devices and various systems.
• A portion of the busy time attributed to each class is intrinsic to that class: its basic processing and I/O requirements. The remainder consists partly of service demand inflation and partly of overhead. Service demand inflation, introduced in Chapter 10, is the component of measured disk busy times due to contention in the I/O subsystem. (There is no service demand inflation for processors.) Overhead is work done by the operating system “on behalf of” the customers of the class. Part of the overhead component is fixed, in that it does not depend on system congestion (e.g., the CPU service required to initiate user I/O operations), and part of it is variable and typically increases with system load (e.g., paging I/O). In a baseline model these distinctions do not matter, but in conducting a modification analysis they can be crucial, for the service demand inflation and variable overhead components of the model usually change in a new environment.
Consider a system with a workload consisting of two components: batch jobs and interactive users. Assume that information comparable to that listed in Table 12.1 has been obtained. Let $f_{BATCH}$ and $f_{INTER}$ be (unknown) factors by which the attributed CPU busy time for each class must be multiplied so that all measured CPU busy time is attributed to some class. (Observe that $f_c$ is the inverse of the capture ratio for class c.) This leads to the equation:

$$B_{CPU} = f_{BATCH} \times A_{BATCH,CPU} + f_{INTER} \times A_{INTER,CPU}$$

where $A_{c,CPU}$ is the CPU usage attributed to class c, and $B_{CPU}$ is the total measured CPU busy time.

To determine unique values for $f_{BATCH}$ and $f_{INTER}$ we must establish a relationship between them in addition to this equation. Several possibilities exist:

• Assume that the ratio of total CPU time to attributed CPU time is the same for each class, yielding:

$$f_{BATCH} = f_{INTER} = \frac{B_{CPU}}{A_{BATCH,CPU} + A_{INTER,CPU}}$$

• Since the unattributed CPU busy time is likely to be overhead, use class based information on activities likely to cause CPU overhead (such as paging rate, swapping rate, spooling, user I/O, and job initiations) to determine a relative measure of total overhead for each class. For instance, assuming that overhead is due almost entirely to page fault handling, and letting $OV_c$ (the relative overhead of class c) be the measured number of pages transferred because of class c faults, we have:

$$f_{INTER} = 1 + \frac{OV_{INTER}}{OV_{INTER} + OV_{BATCH}} \times \frac{B_{CPU} - \left[ A_{INTER,CPU} + A_{BATCH,CPU} \right]}{A_{INTER,CPU}}$$

The second approach is the more reasonable. Unfortunately, more than one factor inevitably contributes to overhead. Thus, $OV_c$ is better defined as the weighted sum of several factors:

$$OV_c = \sum_{\text{all factors } i} \text{weight}_i \times \text{factor}_{c,i}$$

When one attempts to apply this approach in practice, two common problems are apt to be encountered:

• Even for a single measurement interval, it may be difficult to determine which factors to consider, and what weights to assign to these factors. Iteration inevitably is required: estimate weights, calculate service demands, evaluate model, re-estimate weights, etc.
• If one truly is to have confidence in the weights selected, then data from a number of measurement intervals must be considered, and weights must be found that yield good model results when applied to each set of data. An ad hoc approach can be adopted, or linear regression techniques can be used.

Once $f_{BATCH}$ and $f_{INTER}$ have been determined, the service demands of the two classes can be estimated by the equation:

$$D_{c,CPU} = \frac{f_c \times A_{c,CPU}}{\text{measured class } c \text{ completions}}$$

Note that the service demands determined in this way include intrinsic service, fixed overhead, and an amount of variable overhead that reflects the degree of system congestion in the interval covered by the measurement data.
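The two ways of apportioning unattributed CPU time described in this section can be compared with a short sketch. It assumes a single overhead factor per class (pages transferred, as in the paging example), uses hypothetical function names, and runs on made-up measurements chosen only to make the arithmetic easy to follow.

```python
def equal_ratio_factors(b_cpu, attributed):
    """First approach: the same factor f for every class."""
    f = b_cpu / sum(attributed.values())
    return {c: f for c in attributed}


def overhead_weighted_factors(b_cpu, attributed, overhead):
    """Second approach: split unattributed time in proportion to OV_c."""
    unattributed = b_cpu - sum(attributed.values())
    total_overhead = sum(overhead.values())
    return {
        c: 1 + (overhead[c] / total_overhead) * unattributed / attributed[c]
        for c in attributed
    }


def cpu_demands(factors, attributed, completions):
    """D_{c,CPU} = f_c * A_{c,CPU} / measured class c completions."""
    return {c: factors[c] * attributed[c] / completions[c] for c in attributed}


# Hypothetical measurements (seconds of CPU time, pages, completions).
b_cpu = 1200.0
attributed = {"BATCH": 600.0, "INTER": 400.0}
pages = {"BATCH": 1000, "INTER": 3000}
completions = {"BATCH": 100, "INTER": 5000}

for name, factors in (
    ("equal ratio", equal_ratio_factors(b_cpu, attributed)),
    ("overhead weighted", overhead_weighted_factors(b_cpu, attributed, pages)),
):
    print(name, factors, cpu_demands(factors, attributed, completions))
```

In both cases the factors satisfy the constraint that the scaled attributed times sum to the measured CPU busy time; the approaches differ only in how the unattributed 200 seconds is divided between the classes.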
12.5.2. Estimating I/O Service Demands
I/O activity in most current computer systems is dominated by operations on direct access storage devices (fixed head, movable head, and electronic disks). Tape I/O and I/O for staging data to and from mass storage devices play a secondary role. Other types of peripheral devices typically are inconsequential with respect to performance. Our discussion in this section focuses on disk I/O, reflecting its importance.

In Section 10.7 we described how the lengths of certain portions of disk service requirements (seek, latency, rotation, and transfer) could be established from system knowledge (e.g., device characteristics) and measurement data. We assumed that both the visit counts and the service times per visit for each class at each disk were known. In this section, we suggest a method for determining these quantities. First we consider the visit counts, then the service times.

We distinguish two ways of viewing I/O operations. Physical I/O operations correspond to activations of I/O subsystem components to transfer data to or from peripherals. Logical I/O operations correspond to operating system calls by customers requesting access to blocks of information. For a number of reasons physical and logical I/O operations do not correspond directly to one another. Sometimes, a logical I/O operation may not result in a physical I/O operation; for example, a logical I/O operation may request access to a block of information that already is in
                 disk 1    disk 2    disk 3    total
BATCH                                          $L_{BATCH} \times g_{BATCH}$
INTER                                          $L_{INTER} \times g_{INTER}$
physical I/Os    $P_1$     $P_2$     $P_3$

Table 12.2 - Physical Disk I/Os by Class and Device

Table 12.2 suggests a way of thinking about the problem of determining the number of physical I/Os by each class at each device, again for the case of two classes, batch (BATCH) and interactive (INTER). The central rows correspond to classes, while the central columns correspond to disks. The entry to be filled in at column k of row c is the number of physical I/Os by class c at device k ($V_{c,k} \times$ measured class c completions). The information available, however, is only that the columns must add to $P_k$ while the rows must add to $L_c \times g_c$. This provides a number of equations equal to the sum of the number of classes and the number of disks, whereas the number of $V_{c,k}$ values that we must estimate is equal to the product of these quantities. (For instance, in Table 12.2 there are five constraints corresponding to the two row sums and three column sums, but there are six $V_{c,k}$ values to be determined.) Consequently, we must use additional information to specify the $V_{c,k}$ values uniquely. Alternatives include:

• The simplest assumption, which can be used in the absence of any other information, is that all classes use the various disks in the same proportions:

$$\frac{V_{c,k}}{V_{c,k'}} = \frac{V_{c',k}}{V_{c',k'}}$$

for classes c and c’, and disks k and k’

• The software configuration portion of the system description frequently indicates the location of various key data sets: paging files, swapping files, catalogs, files devoted to various applications, etc. If a particular class is known not to use a device, then its visit count there can be set to zero. If a particular class is known to be the exclusive user of a device, then its visit count there can be set to the measured physical I/O count of the device divided by the measured number of completions of the class. The remaining visit counts can be resolved in a series of stages. At each stage, the distribution of I/Os for the class for which the least flexibility remains is determined.
• In some systems there are software monitors capable of observing directly the number of physical I/Os broken down by both class and device. Although such monitors cause too much overhead to be used continuously, they can be used over short intervals (e.g., 10 minutes) to obtain an indication of the distribution of physical I/Os by class and device.
• Occasionally, the breakdown of logical I/Os by device as well as by class is known. This additional information makes it possible to proceed with greater confidence. In particular, if we can assume that the ratio of physical I/Os to logical I/Os is the same for each class, then the physical I/Os at a particular device can be attributed to classes in the same proportions as are the logical I/Os.

We turn now to the problem of determining the $S_{c,k}$. It is customary to assume that, at any particular disk, all classes have the same service time per visit. With this simplification, the service times are given by:
Situations in which one class has a substantially larger service time at a disk than another class typically arise when the former class uses a much larger block size. In such cases, disk characteristics (transfer rates, rotation times, and seek time functions) can be used to estimate the ratios $S_{c,k} / S_{c',k}$ for each pair of classes c and c’ that use the disk. Those ratios, together with the equation:

$$B_k = \sum_{\text{all classes } c} V_{c,k} \times S_{c,k} \times (\text{measured class } c \text{ completions})$$

allow unique determination of the $S_{c,k}$. In both the cases of equal and unequal service times across classes, the service demands are given by:

$$D_{c,k} = V_{c,k} \, S_{c,k}$$
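The simplest combination of the assumptions in this section (all classes use the disks in the same proportions, and all classes have the same service time per visit at a disk) can be sketched as follows. Under the equal service time assumption, the busy time equation above reduces to a per-visit service time of $B_k / P_k$ at disk k, which is the form used in the code; that reduction, the function names, and the sample measurements are assumptions of this sketch rather than anything prescribed by the text. The per-class totals stand for the row sums of Table 12.2 and are assumed to add up to the same grand total as the per-device counts.

```python
def proportional_io_counts(class_totals, device_totals):
    """Fill in physical I/Os by class and device, assuming every class
    spreads its I/Os over the disks in the same proportions."""
    grand_total = sum(device_totals.values())
    return {
        c: {k: row * p / grand_total for k, p in device_totals.items()}
        for c, row in class_totals.items()
    }


def disk_demands(class_totals, device_totals, busy_time, completions):
    """D_{c,k} = V_{c,k} * S_{c,k}, with S_{c,k} = B_k / P_k for all classes."""
    counts = proportional_io_counts(class_totals, device_totals)
    demands = {}
    for c, per_device in counts.items():
        demands[c] = {}
        for k, n_ios in per_device.items():
            visit_count = n_ios / completions[c]            # V_{c,k}
            service_time = busy_time[k] / device_totals[k]  # S_{c,k}
            demands[c][k] = visit_count * service_time
    return demands


# Hypothetical measurements: I/O counts, busy seconds, completions.
class_totals = {"BATCH": 30000, "INTER": 60000}
device_totals = {"disk1": 45000, "disk2": 27000, "disk3": 18000}
busy_time = {"disk1": 900.0, "disk2": 540.0, "disk3": 450.0}
completions = {"BATCH": 100, "INTER": 6000}

print(disk_demands(class_totals, device_totals, busy_time, completions))
```

A useful sanity check on any such apportionment is that, for each disk, the demands multiplied by the class completions add back up to the measured busy time of that disk.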
We now consider briefly the estimation of service demands for tape devices. As noted in an earlier section, it generally is appropriate to represent the tape channels rather than the individual drives. Further, it generally is appropriate to model all classes as using the various tape channels in the same proportions (although different classes will have different total amounts of tape I/O activity). Thus, the visit counts are given by:

$$V_{c,k} = \frac{L_c}{\sum_{\text{all classes } j} L_j} \times P_k \times \frac{1}{\text{measured class } c \text{ completions}}$$

where the $P_k$ and $L_c$ now are measured physical tape I/Os at center k and logical tape I/Os of class c, respectively. Assuming that all classes use
should place special emphasis on the performance measures of that class. Table 12.3 suggests rough guidelines for reasonable expectations of model accuracy during validation.
An important point to note is that queueing network models typically project percentage changes in performance with more accuracy than absolute levels of performance. For example, consider the projection of the effect on interactive response time of adding a batch workload to a system. Assume that the measured response time in the original system was six seconds, and the baseline model validated within 20%, giving a response time of five seconds. If the modified model then projected a ten second response time after the batch workload was added, we should anticipate a response time in the modified system of twelve seconds (rather than ten) since the model projected a doubling of the response time.
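The adjustment in this example amounts to applying the model's projected ratio of change to the measured baseline value. A minimal sketch, with a hypothetical function name and the figures from the example above:

```python
def scaled_projection(measured_baseline, model_baseline, model_modified):
    """Apply the relative change projected by the model to the measured value."""
    return measured_baseline * (model_modified / model_baseline)


# Measured 6 s, baseline model 5 s, modified model 10 s (a doubling).
print(scaled_projection(6.0, 5.0, 10.0))  # 12.0
```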
model type                   system       system response   device         device queue
                             throughput   time              utilizations   lengths

single class                 0 to 5%      5 to 20%          0 to 5%        5 to 20%
multiple class (per class)   5 to 10%     10 to 30%         5 to 10%       10 to 30%

Table 12.3 - Reasonable Tolerances in Validation
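As a rough illustration of how the guidelines in Table 12.3 might be applied during validation, the sketch below compares a model output with the corresponding measurement. Treating the upper end of each range as a pass/fail threshold is an assumption of this sketch, as are the names used; the table itself offers only rough guidelines.

```python
# Upper ends of the ranges in Table 12.3, used here as thresholds
# (an assumption of this sketch, not a rule stated in the text).
TOLERANCE_UPPER = {
    "single": {
        "throughput": 0.05, "response_time": 0.20,
        "utilization": 0.05, "queue_length": 0.20,
    },
    "multiple": {
        "throughput": 0.10, "response_time": 0.30,
        "utilization": 0.10, "queue_length": 0.30,
    },
}


def relative_error(measured, modelled):
    """Absolute difference as a fraction of the measured value."""
    return abs(modelled - measured) / measured


def within_tolerance(model_type, measure, measured, modelled):
    """True if the model output is within the chosen threshold."""
    limit = TOLERANCE_UPPER[model_type][measure]
    return relative_error(measured, modelled) <= limit


# Response time example: measured 6 s, single class model gives 5 s.
print(within_tolerance("single", "response_time", 6.0, 5.0))
```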
Often, even in well conceived and well executed modelling studies, an initial model will not satisfy the validation criterion. In such cases, reasonable modifications of the assumptions used in estimating input parameters (especially service demands) should be attempted. For example, by noting which classes have throughputs underestimated, the analyst may be guided in a reassessment of how overhead should be attributed to the various classes. This review is repeated until the model can be validated. It is not unusual for several iterations to be required at this stage. In some cases, however, no reasonable technique for estimating inputs yields acceptable results. This is a sign that some important aspect of the system’s behavior has not been captured in the model. In many such cases, accuracy can be improved by adding more detail to the model.

It is important to realize the significance of validating a model successfully. If information from measurement data is used to establish values of model inputs, then the fact that the model outputs match the measurement data is, at first glance, not surprising. After a little thought, however, one realizes that success in validation carries the significant implication that the numerous assumptions made in establishing the model are acceptable in the context of the particular system under study. With a validated model, we are prepared to proceed to the modification analysis and performance projection, the subjects of the next chapter.
The inputs required by queueing network models can be divided into three groups: the customer description, the center description, and the service demands. The information required to determine the values of these inputs is obtained from a system description and data recorded and reported by various monitors. Many of the input values can be determined in a straightforward manner from this information. Other values, however, must be inferred. The bulk of this chapter has been devoted to techniques for doing so, for various inputs.

An appropriate modelling strategy is to start with the simplest model that might suffice, adding detail as necessary. The process of model validation may involve several iterations in which input values are revised and detail is added.

Thorough validation must be based on several measurement intervals. It also must be based on knowledge of the kinds of performance projection questions for which the model is to be used.
Several good books on computer system performance measurement techniques are available, such as [Ferrari 1978], [Ferrari et al. 1983], and [Svobodova 1976]. These, however, do not deal specifically with the needs of queueing network modelling.

Rose [1978] treats the queueing network parameterization problem in general, and also relates the techniques to various specific systems. Kienzle and Sevcik [1979] review the approaches to parameterization taken by a number of early queueing network modelling case studies. Curtin [1979] describes a performance database which serves as a repository for measurement data, and which can be accessed by the SAS statistical analysis package to produce reports suitable for both managers and analysts. Lindsay [1980] reports on the accuracy of a software performance monitor by comparing its results to those of a hardware monitor. Artis [1979] suggests a technique for identifying customer classes based on the similarity of their resource demand patterns. Cooper [ describes both the identification of customer classes and the use of capture ratios as part of his presentation of an overall capacity planning methodology. Anderson [1979] proposes a sophisticated method for apportioning unattributed device activity to classes using multiple linear regression.