Software Reliability - Dependable Software Systems | CS 576, Study notes of Computer Science

Material Type: Notes; Professor: Mancoridis; Class: Dependable Software Systems; Subject: Computer Science; University: Drexel University; Term: Unknown 1989;


Uploaded on 08/19/2009

Dependable Software Systems (Reliability) © SERG

Dependable Software Systems

Topics in Software Reliability

Material drawn from [Sommerville, Mancoridis]

What is Software Reliability?

  • A Formal Definition: Reliability is the probability of failure-free operation of a system over a specified time within a specified environment for a specified purpose.
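The definition above can be read as something to estimate from observations. A minimal sketch, assuming we have recorded how long each of several runs operated before its first failure (the function name and the data are hypothetical, chosen only for illustration):

```python
def empirical_reliability(failure_free_durations, mission_time):
    """Fraction of observed runs that operated at least `mission_time` without failing."""
    if not failure_free_durations:
        raise ValueError("need at least one observed run")
    survived = sum(1 for d in failure_free_durations if d >= mission_time)
    return survived / len(failure_free_durations)


# Hypothetical data: hours each run operated before its first failure,
# all collected in the same environment and for the same purpose.
observed_hours = [120, 45, 300, 80, 210, 15, 95, 160]

print(empirical_reliability(observed_hours, 100))  # 0.5
```

The estimate is only meaningful for the environment and purpose in which the runs were observed, which is why the definition specifies both.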

Software Reliability

  • It is difficult to define the term objectively.
  • It is difficult to measure user expectations.
  • It is difficult to measure environmental factors.
  • It is not enough to consider a simple failure rate:
    • Not all failures are created equal; some have much more serious consequences.
    • It might be possible to recover from some failures reasonably.

Failures and Faults

  • A failure corresponds to unexpected run-time behavior observed by a user of the software.
  • A fault is a static software characteristic which causes a failure to occur.

Improving Reliability

  • Primary objective: Remove faults with the most serious consequences.
  • Secondary objective: Remove faults that are encountered most often by users.
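One way to picture the two objectives is as a sort order: severity first, frequency of encounter second. The sketch below is illustrative only; the `Fault` record, the severity scale, and the sample faults are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fault:
    name: str
    severity: int       # hypothetical scale: higher means more serious consequences
    hits_per_1000: int  # hypothetical: how often users hit the fault per 1000 uses


def removal_order(faults):
    """Order faults by severity first (primary), then by frequency (secondary)."""
    return sorted(faults, key=lambda f: (-f.severity, -f.hits_per_1000))


faults = [
    Fault("typo in report footer", 1, 400),
    Fault("wrong dosage written to record", 5, 2),
    Fault("session timeout too short", 2, 150),
    Fault("crash on malformed input", 5, 20),
]

order = [f.name for f in removal_order(faults)]
for name in order:
    print(name)
```

Note that the frequently hit but harmless typo comes last, even though users encounter it far more often than either severity-5 fault.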

Improving Reliability (Cont’d)

  • Fixing N% of the faults does not, in general, lead to an N% reliability improvement.
  • 90-10 Rule: 90% of the time you are executing 10% of the code.
  • One study showed that removing 60% of software “defects” led to a 3% reliability improvement.
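A toy model shows why the fraction of faults removed and the reliability gain can diverge. It assumes, purely for illustration, that 100 faults are spread evenly over the code, that 10% of the code receives 90% of the execution time, and that a fault's contribution to failures is proportional to how often its region runs. These numbers are assumptions and are not meant to reproduce the study cited above.

```python
def failure_intensity_reduction(removed_hot, removed_cold,
                                hot_faults=10, cold_faults=90,
                                hot_exec_share=0.9, cold_exec_share=0.1):
    """Fraction of total failure intensity eliminated by removing the given faults.

    Each fault in a region is assumed to contribute an equal slice of that
    region's share of execution time.
    """
    per_hot_fault = hot_exec_share / hot_faults
    per_cold_fault = cold_exec_share / cold_faults
    return removed_hot * per_hot_fault + removed_cold * per_cold_fault


# Remove 60% of all faults, but only from rarely executed code.
cold_only = failure_intensity_reduction(removed_hot=0, removed_cold=60)
# Remove 10% of all faults, all from the frequently executed code.
hot_only = failure_intensity_reduction(removed_hot=10, removed_cold=0)

print(f"60 cold faults removed: {cold_only:.1%} fewer failures")  # 6.7%
print(f"10 hot faults removed:  {hot_only:.1%} fewer failures")   # 90.0%
```

Under these assumptions, where a fault lives matters far more than how many faults are removed.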

The Cost of Reliability (Cont’d)

  • Cost of software failure often far outstrips the cost of the original system:
    • data loss
    • down-time
    • cost to fix

Measuring Reliability

  • Hardware failures are almost always physical failures (i.e., the design is correct).
  • Software failures, on the other hand, are due to design faults.
  • Hardware reliability metrics are not always appropriate for measuring software reliability, but software reliability metrics evolved from them.

Reliability Metrics (ROCOF)

  • Rate Of Occurrence Of Failure (ROCOF):
    • Frequency of occurrence of failures.
    • E.g., ROCOF of 0.02 means 2 failures are likely in each 100 time units.
  • Relevant for transaction processing systems.
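As a small sketch, ROCOF can be computed as observed failures divided by the amount of time observed, in whatever time unit the system uses. The function name `rocof` is hypothetical.

```python
def rocof(failure_count, observed_time_units):
    """Rate of occurrence of failure: failures per time unit."""
    if observed_time_units <= 0:
        raise ValueError("observed_time_units must be positive")
    return failure_count / observed_time_units


# The example from the notes: 2 failures in each 100 time units.
print(rocof(2, 100))  # 0.02
```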

Reliability Metrics (MTTF)

  • Mean Time To Failure (MTTF):
    • Measure of time between failures.
    • E.g., MTTF of 500 means an average of 500 time units passes between failures.
  • Relevant for systems with long transactions.
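A minimal sketch of estimating MTTF from a failure log, assuming the log is a sorted list of the times at which failures occurred and that repair time is negligible. The function name and the sample timestamps are hypothetical.

```python
def mttf(failure_times):
    """Mean gap between consecutive failures, given failure times in ascending order."""
    if len(failure_times) < 2:
        raise ValueError("need at least two failures to measure a gap")
    gaps = [later - earlier
            for earlier, later in zip(failure_times, failure_times[1:])]
    return sum(gaps) / len(gaps)


# Hypothetical log: time units at which each failure was observed.
failure_times = [400, 950, 1400, 2000, 2400]

print(mttf(failure_times))  # 500.0
```

If failures arrive at a roughly constant rate, MTTF is approximately the reciprocal of ROCOF, so a ROCOF of 0.02 corresponds to an MTTF of about 50 time units.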

Time Units

  • What is an appropriate time unit?
  • Some examples:
    • Raw execution time, for non-stop real-time systems.
    • Number of transactions, for transaction-based systems.

Types of Failures

  • Not all failures are equal in their seriousness:
    • Transient vs permanent
    • Recoverable vs non-recoverable
    • Corrupting vs non-corrupting
  • Consequences of failure:
    • Malformed HTML document.
    • Inode table trashed.
    • Incorrect radiation dosage reported.
    • Incorrect radiation dosage given!

Automatic Bank Teller Example

  • Bank has 1000 machines; each machine in the network is used 300 times per day.
  • Lifetime of software release is 2 years.
  • Therefore, there are about 300,000 database transactions per day, and each machine handles about 200,000 transactions over the 2 years.
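The arithmetic behind these figures can be checked directly. The sketch below assumes a 365-day year and one database transaction per use of a machine; the constant names are illustrative.

```python
MACHINES = 1000
USES_PER_MACHINE_PER_DAY = 300
RELEASE_LIFETIME_YEARS = 2
DAYS_PER_YEAR = 365  # assumption: leap days ignored

# Network-wide load per day.
transactions_per_day = MACHINES * USES_PER_MACHINE_PER_DAY

# Load on a single machine over the lifetime of the release.
transactions_per_machine = (
    USES_PER_MACHINE_PER_DAY * DAYS_PER_YEAR * RELEASE_LIFETIME_YEARS
)

print(transactions_per_day)      # 300000
print(transactions_per_machine)  # 219000, i.e. about 200,000
```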

Example Reliability Specification

| Failure class | Example | Reliability metric |
| --- | --- | --- |
| Permanent, non-corrupting | The system fails to operate with any card; must be restarted. | ROCOF = 1 occ./ days |
| Transient, non-corrupting | The magnetic strip on an undamaged card cannot be read. | POFOD = 1 in 1000 trans. |
| Transient, corrupting | A pattern of transactions across the network causes DB corruption. | Should never happen |
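To get a feel for what the POFOD figure permits, it can be combined with the transaction volume from the bank teller example. This is a rough sketch that assumes every use of a machine involves exactly one card read and that read failures are independent; the variable names are illustrative.

```python
from fractions import Fraction

POFOD = Fraction(1, 1000)           # from the specification: 1 in 1000 transactions
NETWORK_TRANSACTIONS_PER_DAY = 300_000
USES_PER_MACHINE_PER_DAY = 300

# Expected number of unreadable-card failures the specification tolerates.
expected_network_failures_per_day = NETWORK_TRANSACTIONS_PER_DAY * POFOD
expected_failures_per_machine_per_day = USES_PER_MACHINE_PER_DAY * POFOD

print(float(expected_network_failures_per_day))      # 300.0
print(float(expected_failures_per_machine_per_day))  # 0.3
```

So a POFOD of 1 in 1000 still allows roughly one card-read failure per machine every three to four days, or about 300 per day across the network.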