Docsity
Docsity

Prepara i tuoi esami
Prepara i tuoi esami

Studia grazie alle numerose risorse presenti su Docsity


Ottieni i punti per scaricare
Ottieni i punti per scaricare

Guadagna punti aiutando altri studenti oppure acquistali con un piano Premium


Guide e consigli
Guide e consigli


Appunti di Computing Infrastructures, Dispense di Sistemi Integrati

Appunti di Computing Infrastructures in lingua inglese. Argomenti trattati: - Storage: HDD, SSD, RAID, NAS, SAN - Virtual environment: Virtual Machines, Emulation - Performance

Tipologia: Dispense

2019/2020

In vendita dal 26/03/2020

Dr.Stoned
Dr.Stoned 🇮🇹

4

(1)

13 documenti

1 / 68

Toggle sidebar

Questa pagina non è visibile nell’anteprima

Non perderti parti importanti!

bg1
COMPUTING INFRASTUCTURES
Index
Architectures
o Hard Disk (HDD), SSD (Solid State Disk), RAID
o NAS, SAN, DAS
o Dependability
o Data Centers
o Virtual Machines, Emulators
Performance
o Forced Flow Law, Throughput, Transaction workloads, Balance flow
o Model with one job class
Files
Disks can be seen by an OS as a collection of data block that can be read or written independently.
To allow the ordering/management among them, each block is characterized by a unique numerical
address called LBA (Logical Block Address).
Typically, the OS groups blocks into clusters to simplify the access to the disk.
Clusters: minimal unit that an OS can read from or write to a disk.
Files: a better way to address data on the clusters than LBA numbers.
Clusters contains:
File data: the actual content of the files
Metadata: the information required to support the file system (additional data)
Metadata contains:
File names
Directory structures and symbolic links
File size and file type
Creation, modification, last access dates
Security information
Links to the LBA where the file content can be located on the disk
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a
pf1b
pf1c
pf1d
pf1e
pf1f
pf20
pf21
pf22
pf23
pf24
pf25
pf26
pf27
pf28
pf29
pf2a
pf2b
pf2c
pf2d
pf2e
pf2f
pf30
pf31
pf32
pf33
pf34
pf35
pf36
pf37
pf38
pf39
pf3a
pf3b
pf3c
pf3d
pf3e
pf3f
pf40
pf41
pf42
pf43
pf44

Anteprima parziale del testo

Scarica Appunti di Computing Infrastructures e più Dispense in PDF di Sistemi Integrati solo su Docsity!

COMPUTING INFRASTUCTURES

Index

• Architectures

o Hard Disk (HDD), SSD (Solid State Disk), RAID

o NAS, SAN, DAS

o Dependability

o Data Centers

o Virtual Machines, Emulators

• Performance

o Forced Flow Law, Throughput, Transaction workloads, Balance flow

o Model with one job class

Files

Disks can be seen by an OS as a collection of data block that can be read or written independently.

To allow the ordering/management among them, each block is characterized by a unique numerical

address called LBA (Logical Block Address).

Typically, the OS groups blocks into clusters to simplify the access to the disk.

Clusters : minimal unit that an OS can read from or write to a disk.

Files: a better way to address data on the clusters than LBA numbers.

Clusters contains:

  • File data: the actual content of the files
  • Metadata: the information required to support the file system (additional data)

Metadata contains:

  • File names
  • Directory structures and symbolic links
  • File size and file type
  • Creation, modification, last access dates
  • Security information
  • Links to the LBA where the file content can be located on the disk

We can:

  • Accessing the meta-data to locate its blocks → Access the blocks to read its content
  • Accessing the meta-data to locate free space → Write the data in the assigned blocks

Since the file system can only access clusters, the real occupation of space on a disk for a file is always a

multiple of the cluster size.

Given:

  • s: file size
  • c: cluster size (cannot be 1 byte because the metadata)
  • a: actual size on disk

Then, we have:

Optimal: 𝑎 = 𝑠

Commonly: 𝑎 > 𝑠

And the quantity 𝑤 = 𝑎 − 𝑠 is wasted disk space due to the organization of the file into clusters

This waste of space is called internal fragmentation of files.

Ex.

Deleting a file requires:

  • Only to update the meta-data to say the blocks where the file was stored are no longer in use by

the O.S.

Disk service time and response time

Service time 𝑠 𝑑𝑖𝑠𝑘

𝑑𝑖𝑠𝑘

  • Seek time : head movement time, is a function of the number of cylinders traversed

[

]

  • Latency time : time to wait for the sector (

1

2

𝑟𝑜𝑢𝑛𝑑) [𝑚𝑠𝑒𝑐]

  • Transfer time : is a function of rotation speed, storing density, cylinder position [

𝑀𝐵

𝑠𝑒𝑐

]

  • Controller overhead : buffer management (data transfer) and interrupt sending time

Response time R: time spent in queue waiting for the resource (that is busy executing other requests) and

for the execution of the request.

Response time is function of the:

  • Number of requests in the waiting queue
  • Utilization level of the resource
  • Mean disk service time
  • Variance of disk service time (distribution)
  • Arrival rate and inter-arrival times distribution

Assuming the same mean values of the metrics, the response time increases as the variability increases.

Ex.2.

Read/write of a sector of 512 𝐵𝑦𝑡𝑒 = 0. 5 𝐾𝐵

The previous service times consider only the very pessimistic case where sectors are fragmented on the

disk in worst possible way.

This is true if the files are very small or the disk is very fragmented.

In many circumstances, this is not the case: files are larger than one block, and they are stored in a

contiguous way to decrease the needs of seek and rotational latency times.

Suppose we can measure the data locality of a disk as the percentage of blocks that do not need seek or

rotational latency to be found.

Ex.2.

Read/write of a sector of 512 𝐵𝑦𝑡𝑒 = 0. 5 𝐾𝐵

RAID disk

Redundant Arrays of Independent Disks

The data are striped across several disks accessed in parallel:

  • High data transfer rate : large data accesses (heavy I/O op.)
  • High I/O rate : small but frequent data accesses (light I/O op.)
  • Load balancing across the disks

Two orthogonal techniques:

  • Data striping : to improve performance
  • Redundancy : to improve reliability

I/O virtualization: data are distributed transparently over the disks

Data striping:

  • striping : data are written sequentially (a vector, a file, a table, ...) in units (stripe unit: bit, byte,

blocks) on multiple disks according to a cyclic algorithm

  • stripe unit : dimension of the unit of data that are written on a single
  • disk stripe width : number of disks considered by the striping algorithm (does not necessarily

coincide with the number of physical disks in the array – there can be “hot spares”)

  1. Multiple independent I/O requests will be executed in parallel by several disks decreasing the

queue length (and time) of the disks.

  1. Single multiple-block I/O requests will be executed by multiple disks in parallel increasing of the

transfer rate of a single request

RAID architectures

  • RAID 0: stripping only
  • RAID 1: mirroring only
  • RAID 2: bit interleaving
  • RAID 3: byte interleaving - redundancy
  • RAID 4: block interleaving – redundancy
  • RAID 5: block interleaving – redundancy
  • RAID 6: greater redundancy (2 failed disks are tolerated)
  • RAID 7: proprietary solutions

N.B. RAID levels can be combined

RAID level 0: striping, no redundancy

Data are written on a single logical disk and splitted in several blocks distributed across the disks according

to a striping algorithm.

Used where performance and capacity, rather than reliability, are the primary concerns, minimum two

drives required.

Pros: lowest cost because it does not employ redundancy and best write performance because it does not

need to update redundant data and it is parallelized

Cons: single disk failure will result in data loss

RAID level 1: mirroring

Whenever data is written to a disk it is also duplicated (mirrored) to a second disk, minimum 2 disk drives.

Pros: high reliability (when a disk fails the second copy is used), read of data (it can be retrieved from the

disk with the shorter queueing, seek and latency delays), fast writes (no error correcting code should be

computed, but still lower than RAID 0)

Cons: high costs (50% of the capacity is used)

In principle, a RAID 1 can mirror the content over more than one disk. This give resiliency to errors even if

more than one disk break. It allows also with a voting mechanism to identify errors not reported by the disk

controller.

NEVER USED IN PRACTICE , because the overhead and costs are too high.

Two organization RAID 0+1 or RAID 1+0 are possible.

RAID 0+

  • High performance and reliability
  • Minimum 4 drives
  • After the first failure the model becomes as a RAID 0
  • High overhead (disk duplication)

RAID 1+

  • High performance and higher reliability than R0+
  • Minimum 4 drives are required
  • Used in databases with very high workload (fast writes)

R01 VS R

The blocks are the same but are allocated in a different order. Some controllers 0+1 execute striping and

mirroring as a single operation.

o Two concurrent disk failures are not tolerated

o After a disk fails, any other disk failure concerning another stripe is a single point of failure

o All the disks of the array are needed to recover the lost information

o One disk failure for each RAID1 group is tolerated, but after this failure any other disk

becomes a single point of failure of the entire array

RAID level 2: bit interleaved parity (not used)

Assumption: the failed component is unknown.

Memory stile ECC, less cost than mirroring but still very high

Number of redundant disks: proportional to the log of the total number of disks.

Parity blocks are computed for several subset of overlapped data.

When a disk fails several of the parity blocks will have inconsistent values, the failed component is the one

held in common by each incorrect subset.

RAID level 3: bit or byte interleaved parity

RAID5 data losstwo disks failed concurrently

Ex.4 - RAID level 5 -

5 physical disks configured as a single logical disk RAID5.

Write op. that update only one block require 5 disk I/Os (read-modify write)

Example: write of a new data in Block

  1. Read Block 2,3 and 4
  2. Computation of new Parity Block 1-4 with new Block 1
  3. Write Block 1 and Parity Block 1- 4

INEFFICIENT

5 disk I/O are required → BAD!

Disk 0 Disk 1 Disk 2 Disk 3 Disk 4

Stripe 1 Block 1 Block 2 Block 3 Block 4 Parity 1- 4

Stripe 2 Block 5 Block 6 Block 7 Parity 5- 8 Block 8

Stripe 3 Block 9 Block 10 Parity 9- 12 Block 11 Block 12

Write new data in Block 1

  1. Read old Block 1, old Parity 1- 4
  2. Compute new Parity 1-4 from old Block 1, new Block 1, old Parity 1- 4
  3. Write new Block 1 and Parity Block 1- 4

Only 4 disk I/O are required.

  • Write : compute new Parity 1-4 = old Parity 1- 4 – Block 1 + new Block 1
  • Read : parity blocks are not accessed during a read op. However, they are read if a sector suffer

from a CRC (Cyclic Redundancy Check) error. In this case the sector with the error can be

reconstructed from the remaining blocks of the stripe and the parity block.

RAID level 3, 4 and 5 comparison

Raid level 3,4 and 5 have all the same characteristics in terms of overhead, failure protection and total

available capacity.

However, they require controllers gradually more sophisticated:

  • Raid 3 has a simple controller, but requires disks synchronization
  • Raid 4 has the bottleneck on the parity disk
  • Raid 5 needs to implement a more complex algorithm

Currently, with the reduction of the hardware costs, Raid 5 are exactly as expensive as Raid 3 and 4.

For this Raid 3 and 4 are no longer used.

Current controller has also reduced the impact of interleaving, making thus the choice between corse and

fine grained less influent.

RAID level 6

As the size of the disk arrays are increasing, multiple failures become likely: this is due to the longer

recovery times required to consider all the sectors of the disks.

The probability of encountering an URE (Unrecoverable – Uncorrectable Read Error) can be significant.

RAID6: more fault tolerance with respect RAID

Uses Solomon-Reeds codes with two redundancy schemes: (P+Q) distributed and independent 2

concurrent failures are tolerated high overhead for the computation of parities N + 2 disks required with 4

data disks, each write require 6 disk accesses due to the need to update both the P and Q parity blocks

(slow writes) adopted for very critical applications.

N.B. Not efficient when the number of disks is small

When the number of disks increases the loss of efficiency decreases but the probability of 2 concurrent

failures increases, so RAID 6 becomes mandatory.

Let as call 𝐷 𝑖

the i-th data block

, and P and Q the two parity values.

Parity P is computed exactly as in RAID 5:

0

1

𝑛− 1

𝑖

𝑛− 1

𝑖= 0

Parity Q is instead computed using a generator value g:

0

1

𝑛− 1

𝑛− 1

𝑖

𝑖

𝑛− 1

𝑖= 0

Ex.

Disk 0 Disk 1 Disk 2 Disk 3 Disk 4

Stripe 1 Block 1 Block 2 Block 3 P 1- 3 Q 1- 3

Values 2 10 7 19 50

Let as call the 𝐷 𝑖

′ the old data in the i-th block

, and P’ and Q’ the two old paries.

The new parity P can be computed as:

𝑛𝑒𝑤

𝑜𝑙𝑑

𝑛𝑒𝑤

𝑖𝑖

𝑜𝑙𝑑

𝑗−𝑖

𝑗− 1

𝑖

𝑗− 1

The secondary disk, can then be recovered from the first by simply using parity P:

𝑗

𝑖

Ex.5.

Disk 0 Disk 1 Disk 2 Disk 3 Disk 4

Stripe 1 Block 1 Block 2 Block 3 P 1- 3 Q 1- 3

Values 2 10 7 19 50

2 − 1

2 − 1

− 1

𝑗− 1

1

2

DAS (Directed Attached Storage)

DAS is a storage system directly attached to a server or workstation.

The term is used to differentiate non-networked storage from SAN and NAS.

DAS has:

  • Limited scalability
  • Complex manageability
  • Limited performance
  • To read files in other machines, the “file sharing” protocol of the OS must be used

DAS does not necessary mean “internal drives”, all the external disks, connected with a point to point

protocol to a PC can be considered as DAS.

USB and eSATA drive are example of DAS.

The current state of the art connection system for DAS is called SAS – Serial Attached SCSI.

Both DAS and NAS can potentially increase availability of data by using RAID or clustering.

Comparing NAS with local DAS, the performance of NAS depends mainly on the speed of and congestion on

the network.

SAN (Storage Area Network)

SAN are remote storage units that are connected to PC using a specific networking technology.

NAS vs SAN

  • NAS provide both storage and a file system
  • This is often contrasted with SAN which provides only block-based storage and leaves file system

concerns on the “client” side

  • One way to loosely conceptualize the difference between a NAS and a SAN is that

o NAS appears to the client OS as file server

o A disk available through a SAN still appears to the client OS as a disk: it will be visible in the

disks and volumes management utilities, and available to be formatted with a file system

In conclusion:

  • NAS is used for low-volume access to a large amount of storage by many users
  • SAN is the solution for petabytes of storage and multiple , simultaneous access to files , such as

streaming audio/video

SANs have a special network devoted to the accesses to storage devices.

Two distinct networks (TCP/IP and dedicated network)

High scalability

Problems of TCP/IP based storage networks

TCP/IP stack over Ethernet adds several overheads:

  • Managing Ethernet frames
  • Size of Ethernet frames
  • Routing of IP packets

This overhead is in most cases not required to transfer disk commands.

Moreover the TCP/IP overhead is proportional to the quantity of data transferred.

A specific protocol can reduce the overhead and increase performance ( fibre channel , a high speed

network technology that is well established in the open system environment as the underlining

architecture of the SAN).

SAN + NAS head

  • SANs are used to actually store the data
  • SAN disks are accessed using standard Ethernet, using a special appliance called NAS head

Solid State Disk (SSD)

A solid state drive is a non-volatile storage device.

A controller is included in the device with one or more solid state memory components.

The device should use traditional hard disk (HDD) interfaces and form factors.

Data are stored in array of NAND cells:

  • SLC: cells store one bit
  • MLC: cells store more bit by multilevel voltage

o Twice the capacity of SLC using MLC

o Tolerance on adjacent level smaller than SLC → decreasing data reliability

  • NAND flash is organized into:

o Pages: smallest unit that can be read/written. In particular: a sub-unit of an erase block

consisting of number of bytes which can be read from and written to in single operations,

through the loading or unloading of a page buffer and the issuance of a program or read

command

o Blocks (or Erase Block): smallest unit that can be erased, typically consisting of multiple

pages

Ex.

Hypothetical SSD:

  • Page size: 4KB
  • Block size: 5 Pages
  • Drive size: 1 Block
  • Read Speed: 2 KB/s
  • Write Speed: 1 KB/s

Let’s write a 4KB text file to the brand new SSD

Now let’s write a 8KB pic file to the almost brand new SSD

Now let’s delete the txt file in the first page

Finally lets write a 12KB pic to the SSD.

How long should it take? 1KB/s write speed

Problems:

  • The OS is told there are 3 open pages on the SSD when there are only 2 available
  • Time for the SSD to do some fancy footwork to open up the space

So steps:

  1. Read block into cache
  2. Delete page from cache
  3. Write new pic into cache
  4. Delete the old block on SSD
  5. Write cache to SSD

The OS only thought it was writing 12KB of data when in fact the SSD had to read 12KB and then write

20KB, the entire block.

Since the SSD is quite slow the operation should have taken 12sec but actually took 26sec, resulting in a

write speed of 0.46KB/s and not 1KB/s!

SSD drawbacks

  • They cost more than the conventional HDD
  • Flash memory can be written only a limited number of times (wear):

o Have a shorter lifetime

o Error correcting codes

o Over-Provisioning

  • Different read/write speed
  • Write performance degrades of one order of magnitude after the first writing
  • Often the controller become the real bottleneck to the transfer rate
  • SSD are not affected by data-locality and must not be defragmented

Definitions

  • Retention failure: a data error occurring when the SSD is read after an extended period of time

following the previous write

  • Endurance failure: a failure caused by endurance stressing
  • Endurance rating (TBW rating): the number of terabytes that may be written to the SSD while still

meeting the requirements

  • Write amplification factor (WAF): the data written to the NVM divided by data written by the host

to the SSD

  • Uncoverable Bit Error Ratio (UBER): a metric for the rate of occurrence of data errors, equal to the

number of data error per bits read

Trim command

As we have seen, deleting a file only updates the meta data:

  • The TRIM command allows an operating system to indicate blocks that are no longer used in a

solid-state drive, such as blocks released after deleting one or more files

  • Generally, in the delete operation performed by an OS the data blocks are marked as not in use
  • The TRIM allows the OS to pass this information to the SSD controller which otherwise would not

be able to know which blocks to delete