Reliability and Safety in Software Engineering: Measurement, Models, and Assurance

An overview of software reliability, focusing on measurement techniques, reliability growth models, safety arguments, and safety assurance. It also discusses the challenges of operational profile generation and the use of reliability models. Topics include validation costs, operational profiles, reliability measurement problems, reliability prediction, and various reliability models and approaches.

Typology: Study notes

Pre 2010

Uploaded on 07/30/2009

koofers-user-rqj
cmsc435 - 1

Reliability

Objectives

• To explain how system reliability can be measured and how reliability growth models can be used for reliability prediction

• To describe safety arguments and how these are used

• To discuss the problems of safety assurance

• To introduce safety cases and how these are used in safety validation

Validation costs

• Because of the additional activities involved, the validation costs for critical systems are usually significantly higher than for non-critical systems.

• Normally, V & V costs for such systems take up more than 50% of the total system development costs.

Reliability validation

• Reliability validation involves exercising the program to assess whether or not it has reached the required level of reliability.

• This cannot normally be done as part of defect testing, because the data used for defect testing is usually atypical of actual usage.

• Reliability measurement therefore requires a specially designed data set that replicates the pattern of inputs to be processed by the system.
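To get a feel for how much testing reliability validation can require, the sketch below computes how many consecutive failure-free runs are needed before a reliability claim is supported at a given confidence. It assumes that runs are independent and drawn from the expected pattern of use, and that reliability is expressed as a probability of failure on demand (a metric not defined in this extract). The class and method names are hypothetical.

```java
// Sketch: number of failure-free test runs needed to support a reliability claim.
// Assumes independent runs that follow the expected usage pattern (hypothetical helper).
class ReliabilityDemonstration {

    /**
     * Smallest n such that n consecutive failure-free runs give the stated
     * confidence that the probability of failure on demand is below pofod.
     * Follows from requiring (1 - pofod)^n <= 1 - confidence.
     */
    static long failureFreeRunsNeeded(double pofod, double confidence) {
        if (pofod <= 0.0 || pofod >= 1.0 || confidence <= 0.0 || confidence >= 1.0) {
            throw new IllegalArgumentException(
                    "pofod and confidence must lie strictly between 0 and 1");
        }
        return (long) Math.ceil(Math.log(1.0 - confidence) / Math.log(1.0 - pofod));
    }

    public static void main(String[] args) {
        // Illustrative targets only; compare how the run count grows as the target tightens.
        System.out.println(failureFreeRunsNeeded(0.01, 0.95));
        System.out.println(failureFreeRunsNeeded(0.001, 0.95));
    }
}
```

Tightening the target by a factor of ten needs roughly ten times as many runs, which is one reason highly reliable systems are expensive to validate by testing alone.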

Reliability measurement problems

• Operational profile uncertainty

  - The operational profile may not be an accurate reflection of the real use of the system.

• High costs of test data generation

  - Costs can be very high if the test data for the system cannot be generated automatically.

• Statistical uncertainty

  - You need a statistically significant number of failures to compute the reliability, but highly reliable systems will rarely fail.

[Figure: An operational profile]

Operational profile generation

• Should be generated automatically whenever possible.

• Automatic profile generation is difficult for interactive systems.

• May be straightforward for ‘normal’ inputs, but it is difficult to predict ‘unlikely’ inputs and to create test data for them.
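As a minimal sketch of using an operational profile to drive test selection, the class below draws input classes in proportion to their assumed probability of occurrence. The input class names and probabilities are illustrative assumptions, not figures from the lecture.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

// Sketch: select test-input classes in proportion to an operational profile.
// OperationalProfile and the probabilities used below are hypothetical.
class OperationalProfile {
    private final Map<String, Double> probabilityByClass = new LinkedHashMap<>();

    void add(String inputClass, double probability) {
        probabilityByClass.put(inputClass, probability);
    }

    /** Maps a uniform draw u in [0, 1) to an input class via the cumulative distribution. */
    String pick(double u) {
        double cumulative = 0.0;
        String last = null;
        for (Map.Entry<String, Double> entry : probabilityByClass.entrySet()) {
            cumulative += entry.getValue();
            last = entry.getKey();
            if (u < cumulative) {
                return last;
            }
        }
        return last; // fallback if rounding leaves the total slightly below 1
    }

    /** Draws one input class at random according to the profile. */
    String sample(Random random) {
        return pick(random.nextDouble());
    }

    public static void main(String[] args) {
        OperationalProfile profile = new OperationalProfile();
        profile.add("normal", 0.90);
        profile.add("boundary", 0.09);
        profile.add("unlikely", 0.01);
        Random random = new Random();
        for (int i = 0; i < 5; i++) {
            System.out.println(profile.sample(random));
        }
    }
}
```

With a profile like this, the ‘unlikely’ class is rarely selected, which illustrates why rare inputs need either very long test runs or separately designed test data.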

Reliability prediction

• A reliability growth model is a mathematical model of how system reliability changes as the system is tested and faults are removed.

• It is used as a means of reliability prediction by extrapolating from current data.

  - Simplifies test planning and customer negotiations.

  - You can predict when testing will be completed and demonstrate to customers whether or not the required reliability will ever be achieved.

• Prediction depends on the use of statistical testing to measure the reliability of a system version.

Reliability prediction

[Figure: measured reliability plotted against time, with a fitted reliability model curve; the point where the curve meets the required reliability gives the estimated time of reliability achievement.]

Reliability measures

• TIME-DEPENDENT APPROACHES

  - TIME BETWEEN FAILURES (Musa Model)

  - FAILURE COUNTS IN SPECIFIED INTERVALS (Goel/Okumoto)

• TIME-INDEPENDENT APPROACHES

  - ERROR SEEDING

  - INPUT DOMAIN ANALYSIS

• PROBLEMS WITH USE OF RELIABILITY MODELS

  - LACK OF CLEAR UNDERSTANDING OF INHERENT STRENGTHS AND WEAKNESSES

  - UNDERLYING ASSUMPTIONS AND OUTPUTS NOT FULLY UNDERSTOOD BY USER

  - NOT ALL MODELS APPLICABLE TO ALL TESTING ENVIRONMENTS

Reliability measures - Musa

ASSUMPTIONS:

  1. ERRORS ARE DISTRIBUTED RANDOMLY THROUGH THE PROGRAM
  2. TESTING IS DONE WITH REPEATED RANDOM SELECTION FROM THE ENTIRE RANGE OF INPUT DATA
  3. THE ERROR DISCOVERY RATE IS PROPORTIONAL TO THE NUMBER OF ERRORS IN THE PROGRAM
  4. ALL FAILURES ARE TRACED TO THE ERRORS CAUSING THEM AND CORRECTED BEFORE TESTING RESUMES
  5. NO NEW ERRORS ARE INTRODUCED DURING DEBUGGING

$T = \frac{1}{KE} e^{Kt}$

WHERE E IS TOTAL ERRORS IN THE SYSTEM, t IS THE ACCUMULATED RUN TIME (STARTS AT 0), T IS THE MEAN TIME TO FAILURE, AND K IS THE CONSTANT OF PROPORTIONALITY FROM ASSUMPTION 3
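The sketch below shows how such a model might be used for prediction. It reads the slide's formula as T = e^(Kt) / (KE), which is an assumption because the formula is garbled in this extract, and it solves for the accumulated run time at which a target mean time to failure is reached. The class name, method names, and parameter values are hypothetical.

```java
// Sketch of a mean-time-to-failure growth curve, reading the slide's formula as
// T = e^(K t) / (K E). K, E and the numbers in main are illustrative assumptions.
class MttfGrowth {

    /** Mean time to failure after t units of accumulated run time. */
    static double mttf(double k, double totalErrors, double t) {
        return Math.exp(k * t) / (k * totalErrors);
    }

    /**
     * Accumulated run time at which the MTTF reaches targetMttf, obtained by
     * solving the formula for t. Returns 0 if the target is already met at t = 0.
     */
    static double runTimeToReach(double targetMttf, double k, double totalErrors) {
        return Math.max(0.0, Math.log(targetMttf * k * totalErrors) / k);
    }

    public static void main(String[] args) {
        double k = 0.02;            // assumed constant of proportionality
        double totalErrors = 100.0; // assumed E
        System.out.println("MTTF at t = 0: " + mttf(k, totalErrors, 0.0));
        System.out.println("Run time for MTTF of 10: " + runTimeToReach(10.0, k, totalErrors));
    }
}
```

In practice K and E are not known in advance and would have to be estimated from observed failure data, so any prediction is only as good as the fit and the five assumptions listed.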

Reliability measures – Failure rate

$m = M_0 \left[ 1 - \exp\left( -\frac{C t}{M_0 T_0} \right) \right]$

WHERE t IS THE CUMULATIVE EXECUTION TIME, m IS THE FAILURES EXPERIENCED, AND M_0 IS THE TOTAL FAILURES
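A minimal sketch of this curve follows, reading the slide's formula as m = M0 [1 - exp(-C t / (M0 T0))]. The failure rate is taken here to be the slope dm/dt = (C / T0) exp(-C t / (M0 T0)), which is an interpretation rather than something stated in the extract. The parameter values are illustrative assumptions.

```java
// Sketch of the curve m = M0 * (1 - exp(-C t / (M0 T0))) and its slope.
// MusaModel is a hypothetical class; C and T0 are the constants named on the slide.
class MusaModel {
    final double totalFailures; // M0
    final double c;             // C
    final double t0;            // T0

    MusaModel(double totalFailures, double c, double t0) {
        this.totalFailures = totalFailures;
        this.c = c;
        this.t0 = t0;
    }

    /** Failures experienced (m) after cumulative execution time t. */
    double failuresExperienced(double t) {
        return totalFailures * (1.0 - Math.exp(-c * t / (totalFailures * t0)));
    }

    /** Slope dm/dt of the curve at time t, used here as the failure rate. */
    double failureRate(double t) {
        return (c / t0) * Math.exp(-c * t / (totalFailures * t0));
    }

    public static void main(String[] args) {
        MusaModel model = new MusaModel(100.0, 2.0, 5.0); // assumed values
        for (double t = 0.0; t <= 1000.0; t += 250.0) {
            System.out.println(t + " " + model.failuresExperienced(t) + " " + model.failureRate(t));
        }
    }
}
```

As t grows, the failures experienced approach M0 and the failure rate decays towards zero, which is the reliability growth the model is meant to capture.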


Reliability – Combining approaches

• CLEAN ROOM

  - DEVELOPER USES READING TECHNIQUES, TOP DOWN DEVELOPMENT

  - TESTING DONE BY INDEPENDENT ORGANIZATION AT INCREMENTAL STEPS

  - RELIABILITY MODEL USED TO PROVIDE DEVELOPER WITH QUALITY ASSESSMENT

• FUNCTIONAL TESTING/COVERAGE METRICS

  - USE FUNCTIONAL TESTING APPROACH

  - COLLECT ERROR DISTRIBUTIONS, E.G., OMISSION vs COMMISSION

  - OBTAIN COVERAGE METRICS

  - KNOWING NUMBER OF ERRORS OF OMISSION, EXTRAPOLATE

• ERROR ANALYSIS AND RELIABILITY MODELS

  - ESTABLISH ERROR HISTORY FROM PREVIOUS PROJECTS

  - DISTINGUISH SIMILARITIES AND DIFFERENCES TO CURRENT PROJECT

  - DETERMINE PRIOR ERROR DISTRIBUTIONS FOR CURRENT PROJECT

  - SELECT CLASS OF STOCHASTIC MODELS FOR CURRENT PROJECT

  - UPDATE PRIOR DISTRIBUTIONS AND COMPARE ACTUAL DATA WITH THE PRIORS FOR THE CURRENT PROJECT

Approaches toward reliability: Fault tolerance

• Fault detection

  - The system must detect that a fault (an incorrect system state) has occurred.

• Damage assessment

  - The parts of the system state affected by the fault must be identified.

• Fault recovery

  - The system must restore its state to a known safe state.

• Fault repair

  - The system may be modified to prevent recurrence of the fault. As many software faults are transitory, this is often unnecessary.

Fault detection

• Preventative fault detection

  - The fault detection mechanism is initiated before the state change is committed. If an erroneous state is detected, the change is not made.

• Retrospective fault detection

  - The fault detection mechanism is initiated after the system state has been changed. This is used when an incorrect sequence of correct actions leads to an erroneous state or when preventative fault detection involves too much overhead.

Type system extension

• Preventative fault detection in effect involves extending the type system by including additional constraints as part of the type definition.

• These constraints are implemented by defining basic operations within a class definition.
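A minimal sketch of this idea is shown below: a class that wraps an integer and enforces an extra constraint in its basic operations, so that an invalid value is rejected before the state change is committed. The class PositiveEvenInteger and its constraint are hypothetical, chosen only for illustration.

```java
// Sketch: a constrained type whose operations check the constraint before
// committing a change. PositiveEvenInteger is a hypothetical example.
class PositiveEvenInteger {
    private int value;

    PositiveEvenInteger(int initial) {
        assign(initial);
    }

    /** Checks the constraint first; the stored value changes only if the check passes. */
    void assign(int candidate) {
        if (candidate <= 0 || candidate % 2 != 0) {
            throw new IllegalArgumentException(
                    "value must be positive and even: " + candidate);
        }
        value = candidate;
    }

    int get() {
        return value;
    }

    public static void main(String[] args) {
        PositiveEvenInteger n = new PositiveEvenInteger(4);
        try {
            n.assign(7); // violates the constraint, so the assignment is refused
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("value is still " + n.get());
    }
}
```

Because every change goes through assign, no other code can leave the object in an erroneous state, which is the preventative style of fault detection.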


Fault recovery and repair

• Forward recovery

  - Apply repairs to a corrupted system state.

• Backward recovery

  - Restore the system state to a known safe state.

• Forward recovery is usually application specific: domain knowledge is required to compute possible state corrections.

• Backward error recovery is simpler. Details of a safe state are maintained and this replaces the corrupted system state.

Forward recovery

• Corruption of data coding

  - Error coding techniques which add redundancy to coded data can be used for repairing data corrupted during transmission.

• Redundant pointers

  - When redundant pointers are included in data structures (e.g. two-way lists), a corrupted list or filestore may be rebuilt if a sufficient number of pointers are uncorrupted.

  - Often used for database and file system repair.
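The redundant-pointer case can be sketched with a two-way list: if the backward links are corrupted but the forward links are intact, the backward links can be rebuilt from them. The sketch assumes the forward chain is uncorrupted and acyclic; the class and method names are hypothetical.

```java
// Sketch: forward recovery of a two-way list by rebuilding corrupted backward
// links from the redundant forward links. All names here are hypothetical.
class RedundantLinks {

    static final class Node {
        final int value;
        Node next;
        Node prev;

        Node(int value) {
            this.value = value;
        }
    }

    /** Builds a correctly linked two-way list and returns its head (null if empty). */
    static Node build(int... values) {
        Node head = null;
        Node tail = null;
        for (int v : values) {
            Node node = new Node(v);
            if (head == null) {
                head = node;
            } else {
                tail.next = node;
                node.prev = tail;
            }
            tail = node;
        }
        return head;
    }

    /** Rewrites each wrong prev link using the next links; returns the number repaired. */
    static int repairBackwardLinks(Node head) {
        int repaired = 0;
        Node previous = null;
        for (Node node = head; node != null; node = node.next) {
            if (node.prev != previous) {
                node.prev = previous;
                repaired++;
            }
            previous = node;
        }
        return repaired;
    }
}
```

The repair works only because the list stores the same ordering twice. If both directions were damaged at the same node, there would not be enough uncorrupted pointers to rebuild it.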


Backward recovery

• Transactions are a frequently used method of backward recovery. Changes are not applied until computation is complete. If an error occurs, the system is left in the state preceding the transaction.

• Periodic checkpoints allow the system to 'roll back' to a correct state.
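A minimal sketch of transaction-style backward recovery follows. The update is applied to a working copy of the state, and the live state is replaced only if the update completes, so a failure leaves the state as it was before the transaction. CheckpointedStore and its methods are hypothetical, and a real system would also need to handle concurrency and persistence.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Sketch: backward recovery by applying changes to a copy and committing only
// on success. CheckpointedStore is a hypothetical, single-threaded example.
class CheckpointedStore {
    private Map<String, Integer> state = new HashMap<>();

    /** Returns the stored value for key, or 0 if the key is absent. */
    int get(String key) {
        return state.getOrDefault(key, 0);
    }

    /**
     * Runs the update against a working copy. If it throws, the live state is
     * left exactly as it was; otherwise the copy becomes the new live state.
     */
    boolean transact(Consumer<Map<String, Integer>> update) {
        Map<String, Integer> workingCopy = new HashMap<>(state);
        try {
            update.accept(workingCopy);
        } catch (RuntimeException e) {
            return false; // nothing was committed
        }
        state = workingCopy;
        return true;
    }

    public static void main(String[] args) {
        CheckpointedStore store = new CheckpointedStore();
        store.transact(m -> m.put("balance", 100));
        boolean committed = store.transact(m -> {
            m.put("balance", 50);
            throw new IllegalStateException("simulated failure");
        });
        System.out.println(committed + " " + store.get("balance"));
    }
}
```

Copying the whole state is simple but costly for large states, which is why practical systems tend to use logs or periodic checkpoints instead.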

Safety assurance

• Safety assurance and reliability measurement are quite different:

  - Within the limits of measurement error, you know whether or not a required level of reliability has been achieved;

  - However, quantitative measurement of safety is impossible. Safety assurance is concerned with establishing a confidence level in the system.

• Confidence is developed through:

  - Past experience with the company developing the software;

  - The use of dependable processes and process activities geared to safety;

  - Extensive V & V including both static and dynamic validation techniques.

Construction of a safety argument

• Establish the safe exit conditions for a component or a program.

• Starting from the END of the code, work backwards until you have identified all paths that lead to the exit of the code.

• Assume that the safe exit condition is false, that is, that the exit is unsafe.

• Show, for each path leading to the exit, that the assignments made in that path contradict the assumption of an unsafe exit from the component.

Example code: Delivery code

    currentDose = computeInsulin();

    // Safety check - adjust currentDose if necessary
    // if statement 1
    if (previousDose == 0) {
        if (currentDose > 16)
            currentDose = 16;
    } else if (currentDose > (previousDose * 2))
        currentDose = previousDose * 2;

    // if statement 2
    if (currentDose < minimumDose)
        currentDose = 0;
    else if (currentDose > maxDose)
        currentDose = maxDose;

    administerInsulin(currentDose);

Safety argument model

Program paths

• Neither branch of if-statement 2 is executed

  - Can only happen if currentDose is >= minimumDose and <= maxDose.

• then branch of if-statement 2 is executed

  - currentDose = 0.

• else branch of if-statement 2 is executed

  - currentDose = maxDose.

• In all cases, the post conditions contradict the unsafe condition that the dose administered is greater than maxDose.
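The argument can be cross-checked by testing, although a test is not a substitute for it. The sketch below rewrites the two safety checks from the delivery code as a pure function and then checks the exit condition over a bounded range of inputs. The method names, the tested range, and the assumption that maxDose is not negative are all hypothetical additions.

```java
// Sketch: the delivery code's two safety checks as a pure function, plus a
// bounded exhaustive check that the result never exceeds maxDose.
// DoseSafety and its method names are hypothetical; assumes maxDose >= 0.
class DoseSafety {

    /** Applies if-statement 1 and if-statement 2 to the computed dose. */
    static int adjustDose(int computedDose, int previousDose, int minimumDose, int maxDose) {
        int currentDose = computedDose;

        // if statement 1: limit the increase relative to the previous dose
        if (previousDose == 0) {
            if (currentDose > 16) {
                currentDose = 16;
            }
        } else if (currentDose > previousDose * 2) {
            currentDose = previousDose * 2;
        }

        // if statement 2: force the dose to zero or cap it at maxDose
        if (currentDose < minimumDose) {
            currentDose = 0;
        } else if (currentDose > maxDose) {
            currentDose = maxDose;
        }
        return currentDose;
    }

    /** True if no input pair in the tested range produces a dose above maxDose. */
    static boolean neverExceedsMax(int minimumDose, int maxDose, int range) {
        for (int computed = -range; computed <= range; computed++) {
            for (int previous = 0; previous <= range; previous++) {
                if (adjustDose(computed, previous, minimumDose, maxDose) > maxDose) {
                    return false;
                }
            }
        }
        return true;
    }
}
```

The check covers only the sampled range, whereas the safety argument covers every path, so the two are complementary rather than interchangeable.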


Run-time safety checking

• During program execution, safety checks can be incorporated as assertions to check that the program is executing within a safe operating ‘envelope’.

• Assertions can be included as comments (or using an assert statement in some languages). Code can be generated automatically to check these assertions.

Insulin administration with assertions

    static void administerInsulin() throws SafetyException {
        int maxIncrements = InsulinPump.maxDose / 8;
        int increments = InsulinPump.currentDose / 8;

        // assert currentDose <= InsulinPump.maxDose
        if (InsulinPump.currentDose > InsulinPump.maxDose)
            throw new SafetyException(Pump.doseHigh);
        else
            for (int i = 1; i <= increments; i++) {
                generateSignal();
                if (i > maxIncrements)
                    throw new SafetyException(Pump.incorrectIncrements);
            } // for loop
    } // administerInsulin

Will discuss this next time – formal specifications


Security assessment

• Security assessment has something in common with safety assessment.

• It is intended to demonstrate that the system cannot enter some state (an unsafe or an insecure state) rather than to demonstrate that the system can do something.

• However, there are differences:

  - Safety problems are accidental; security problems are deliberate;

  - Security problems are more generic: many systems suffer from the same problems. Safety problems are mostly related to the application domain.

Security validation

• Experience-based validation

  - The system is reviewed and analysed against the types of attack that are known to the validation team.

• Tool-based validation

  - Various security tools such as password checkers are used to analyse the system in operation.

• Tiger teams

  - A team is established whose goal is to breach the security of the system by simulating attacks on the system.

• Formal verification

  - The system is verified against a formal security specification.