Computing Infrastructures (Course Notes), Information Systems lecture notes

These notes rework and expand the notes taken in class together with the lecture slides, and provide complete theoretical and practical preparation for the Computing Infrastructures course taught at Politecnico di Milano.

Type: Lecture notes

2022/2023

On sale since 02/07/2023

MattBlue00

COMPUTING INFRASTRUCTURES

1. COMPUTING INFRASTRUCTURES

Modern large-scale data centers require the seamless integration of different components (i.e. applications, computation nodes, storage devices, and networks) into one computing infrastructure.

1.1 BASIC DEFINITIONS

A computing infrastructure is a technological infrastructure that provides hardware and software for computation to other systems and services. Examples of computing infrastructures are Internet of Things, embedded devices, Edge Computing Systems and data centers. A virtual machine provides the full stack (OS, LIB, APP), and applications depend on the guest OS. A container is an application packaged with all its dependencies into a standardized unit for software development/deployment.

1.2 PROS AND CONS OF NOTABLE INFRASTRUCTURES

Data centers have many advantages:

  • lower IT costs
  • high performance
  • instant software updates
  • “unlimited” storage capacity
  • increased data reliability
  • universal document access
  • device independence

However, they also have some drawbacks:

  • they require a constant Internet connection
  • they do not work well with low-speed connections
  • hardware features might be limited
  • privacy and security issues
  • high power consumption
  • latency in making decisions

Another crucial issue with data centers is water consumption: indeed, a midsize data center uses roughly as much water as about three average hospitals.

Edge Computing Systems have some advantages:

  • high computational capacity
  • distributed computing
  • privacy and security
  • reduced latency in making decisions

However, they also have some drawbacks:

  • they require a power connection
  • they require a connection with the Cloud

Embedded Devices have some advantages:

  • pervasive computing
  • high performance unit
  • availability of development boards
  • programmed as PCs
  • large community

However, they also have some drawbacks:

  • pretty high power consumption
  • (some) hardware design has to be done

Internet of Things systems have some advantages:

  • highly pervasive
  • wireless connection
  • battery powered
  • low costs
  • sensing and actuating

However, they also have some drawbacks:

  • low computing ability
  • constraints on energy
  • constraints on memory (RAM/FLASH)
  • difficulties in programming

dedicated hardware infrastructure that is de-coupled and protected from other systems in the same facility. Applications tend not to communicate with each other. Those data centers host hardware and software for multiple organizational units or even different companies. Instead, WSCs belong to a single organization, use a relatively homogeneous hardware and system software platform and share a common systems management layer. WSCs run a smaller number of very large applications (or internet services). The common resource management infrastructure allows significant deployment flexibility. The requirements of homogeneity, single-organization control and cost-efficiency motivate designers to take new approaches in designing WSCs.

Initially designed for online data-intensive web workloads, WSCs now power public cloud computing systems (e.g. Amazon, Google, Microsoft). Such public clouds do run many small applications, like a traditional data center; all of these applications rely on virtual machines (or containers), and they access large, common services for block or database storage, load balancing and so on, fitting very well with the WSC model.

Notice that WSCs are not just a collection of servers: the software running on these systems executes on clusters of hundreds to thousands of individual servers (far beyond a single machine or a single rack), and the machine is itself this large cluster or aggregation of servers and needs to be considered as a single computing unit. Multiple data centers are (often) replicas of the same services, so as to reduce user latency and to improve serving throughput. However, a request is typically fully processed within one data center.

2.3 HIERARCHICAL APPROACH OF WSC

The world is divided into Geographic Areas (GAs), defined by geo-political boundaries (or country borders) and determined mainly by data residency. In each GA, there are at least 2 computing regions.

Customers see Computing Regions (CRs) as the finer-grained discretization of the infrastructure, i.e. multiple data centers in the same region are not exposed. Each region has a latency-defined perimeter of 2 ms for the round trip. Regions are hundreds of miles apart, with different flood zones and so on: the latency between them is too high for synchronous replication, but it is acceptable for disaster recovery.

Availability Zones (AZs) are finer-grained locations within a single computing region. This allows customers to run mission-critical applications with high availability and fault tolerance to data center failures. The AZs are indeed fault-isolated locations with redundant power, cooling and networking. There is application-level synchronous replication among AZs: 3 is the minimum, and it is enough for quorum.

Services provided through WSCs must guarantee high availability, typically aiming for at least 99.99% uptime (i.e. less than one hour of downtime per year). Achieving such fault-free operation is difficult when a large collection of hardware and system software is involved. WSC workloads must be designed to gracefully tolerate large numbers of component faults with little or no impact on service-level performance and availability.
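The availability target and the quorum size mentioned above can be checked with a few lines of arithmetic. The following is only a sketch: the function names are hypothetical, a non-leap year of 365 days is assumed, and "quorum" is assumed to mean a simple majority of replicas.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def max_downtime_minutes(availability: float) -> float:
    """Yearly downtime budget (in minutes) allowed by an availability target."""
    return (1.0 - availability) * HOURS_PER_YEAR * 60


def majority_quorum(replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return replicas // 2 + 1


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        print(f"{target:.2%} uptime allows {max_downtime_minutes(target):.2f} min/year of downtime")
    # With 3 replicas (one per AZ), 2 must agree, so one AZ can fail.
    print("quorum with 3 AZs:", majority_quorum(3))
```

Under these assumptions, 99.99% uptime corresponds to about 52.56 minutes of downtime per year, which is consistent with the "less than one hour" rule of thumb.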

2.4 ARCHITECTURAL OVERVIEW OF WSC

The hardware implementation of WSCs might differ significantly from one to another; however, the architectural organization of these systems is relatively stable. Servers are like ordinary PCs, but they have form factors that allow them to fit into three different types of structures:

  • rack (1U or more)
  • blade enclosure format
  • tower

These may differ in:

  • number and type of CPUs
  • available RAM
  • locally attached disks (HDD, SSD or not installed)

3. SERVERS

Servers hosted in individual shelves are the basic building blocks of WSCs. They are interconnected by hierarchies of networks, and are supported by the shared power and cooling infrastructure.

3.1 MAIN PROCESSING EQUIPMENT

Servers are like ordinary computers, but with a form factor that allows them to fit into shelves, in various formats. They are usually built in a tray or blade enclosure format, housing the motherboard, the chipset and additional plug-in components. The motherboard provides sockets and plug-in slots to install CPUs, memory modules (DIMMs), local storage (such as Flash SSDs or HDDs), and network interface cards (NICs) to satisfy the range of resource requirements. WSCs use a relatively homogeneous hardware and system software platform. Servers also have:

  • from 1 to 8 CPU sockets
  • from 2 to 192 DIMM slots (i.e. available RAM)
  • from 1 to 24 drive bays (HDD or SSD, SAS or SATA) as locally attached disks
  • from 1 to 20 GPUs/TPUs per node, as other special-purpose devices
  • a form factor either from 1U to 10U, or tower

3.2 RACKS, TOWERS, BLADES

Racks are special shelves that accommodate all the IT equipment and allow their interconnection. They are used to store rack servers. Server racks are measured in rack units, or “U’s” (1U is 44.45 mm). The advantage of using these racks is that they allow designers to stack other electronic devices along with the servers. Of course, IT equipment must conform to specific sizes in order to fit into the rack shelves. A rack server is designed to be positioned in a bay, vertically stacked one over the other along with other devices. However, a rack is not only a physical structure: racks do hold tens of servers together, but they also handle the shared power infrastructure, including power delivery, battery backup and power conversion. The width and depth of racks vary widely across WSCs. Notice that it is often convenient to connect the network cables at the top of the rack; such a rack-level switch is appropriately called a Top of Rack (TOR) switch.

A tower server looks and feels much like a traditional tower PC.

Blade servers, instead, are the latest and most advanced type of server on the market. They can be described as hybrid rack servers, in which servers are placed inside blade enclosures, forming a blade system. The biggest advantage of blade servers is that they are the smallest type of server currently available and are great for conserving space. Notice that a blade system also meets the IEEE standard for rack units, and each rack is measured in “U’s”.

The IT equipment is stored in corridors and organized into racks. Server racks are never back-to-back: corridors where servers are located are split into a cold aisle, where the front panels of the equipment are reachable, and a warm aisle, where the back connections are located. Cold air flows in from the front (cold aisle), cools down the equipment and leaves the room from the back (warm aisle).

SERVER TYPE: PROS AND CONS

RACK SERVERS

Pros:

  • failure containment: very little effort is needed to identify, remove and replace a malfunctioning server with another
  • simplified cable management: it is easy and efficient to organize cables
  • cost-effective: computing power and efficiency are achieved at relatively low costs

Cons:

  • power usage: racks need additional cooling systems due to their overall high component density, thus consuming more power
  • maintenance: since multiple devices are placed in racks together, maintaining them gets considerably tougher as the number of racks increases

TOWER SERVERS

Pros:

  • scalability and ease of upgrade: they are customized and upgraded based on necessity
  • cost-effective: they are probably the cheapest of all kinds of servers
  • cools easily: since they have an overall low component density, they cool down easily

Cons:

  • consumes a lot of space: they are difficult to manage physically
  • provides a basic level of performance: a tower server is ideal for small businesses that have a limited number of clients
  • complicated cable management: devices aren’t easily routed together
  • $y$ is the output;
  • $f$ is the prediction function;
  • $x$ is a feature.

Given a training set of labeled examples, we can estimate the prediction function $f$ by minimizing the prediction error on the training set. If we apply $f$ to a never-before-seen test set, we can output the predicted value.
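As a minimal sketch of what "estimating the prediction function by minimizing the prediction error" can look like, the example below assumes a linear prediction function and a squared-error criterion. Both choices, as well as the function name `fit_linear` and the toy data, are assumptions made for illustration; the notes do not prescribe a specific model.

```python
def fit_linear(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares.

    Returns the estimated prediction function f.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form least-squares solution for a single feature.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    def f(x):
        return slope * x + intercept

    return f


# Hypothetical training set of labeled examples (generated by y = 2x + 1).
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

f = fit_linear(train_x, train_y)

# Apply the estimated function to an example not seen during training.
prediction = f(10.0)
print(prediction)
```

Real machine learning workloads use far richer models, but the structure is the same: fit $f$ on the training set, then evaluate it on unseen inputs.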

3.4 HARDWARE ACCELERATORS

Deep learning models began to appear and be widely adopted, enabling specialized hardware to power a broad spectrum of machine learning solutions. Since 2013, AI training computation requirements have doubled every 3.5 months, far faster than what Moore’s Law would predict. In order to satisfy the growing computation needs of deep learning, WSCs deploy specialized hardware accelerators: GPUs, TPUs and FPGAs.

Graphics Processing Units (GPUs) are characterized by data-parallel computations, i.e. the same program is executed on many data elements in parallel. Scientific codes are mapped onto matrix operations, and high-level languages (such as CUDA and OpenCL) are required. On such data-parallel workloads, GPUs can be up to 1000x faster than CPUs.

A deep neural network can be trained on multiple GPUs. The performance of such a synchronous system is limited by the slowest learner and the slowest messages through the network. Since the communication phase is on the critical path, a high-performance network enables fast reconciliation of parameters across learners. GPUs are configured with a CPU host connected to a PCIe-attached accelerator tray with multiple GPUs. GPUs within the tray are connected using high-bandwidth interconnects such as NVLink. In the A100 GPU, each NVLink lane supports a data rate of 50x4 Gbit/s in each direction. The total number of NVLink lanes increases from six in the V100 GPU to 12 in the A100 GPU, which yields 600 GB/s of overall bandwidth.

While suited to machine learning, GPUs are still relatively general-purpose devices. In recent years, designers have further specialized them into machine-learning-specific hardware, thanks to custom-built integrated circuits tailored for TensorFlow. A tensor is an n-dimensional matrix, which is the basic unit of operation in TensorFlow. Tensor Processing Units (TPUs) are used for training and inference: TPUv1 is an inference-focused accelerator connected to the host CPU through PCIe links, whereas TPUv2, TPUv3 and TPUv4 support both training and inference. Each Tensor core has an array for matrix computations (MXU) and a connection to high-bandwidth memory (HBM) to store parameters and intermediate values during computation.

TPUv2 has 8 GiB of HBM for each TPU core, one MXU for each TPU core, and 4 chips (2 cores per chip). In a rack, multiple TPUv2 accelerator boards are connected through a custom high-bandwidth network, to provide 11.5 petaflops of machine learning computations. The high-bandwidth network enables fast parameter reconciliation with well-controlled tail latencies. In a TPU Pod, there are up to 512 total TPU cores and 4 TB of total memory (as if there were 64 TPU units).

TPUv3 is liquid-cooled, and 2.5x faster than TPUv2. Such supercomputing-class computational power supports new machine learning capabilities (e.g. AutoML) and rapid neural architecture search. The TPUv3 Pod provides a maximum configuration of 256 devices for a total of 2048 TPUv3 cores, 100 petaflops and 32 TB of TPU memory.
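The bandwidth and memory figures quoted above can be cross-checked with simple arithmetic. This is a sketch under stated assumptions: "50x4 Gbit/s" is read as 4 signal pairs at 50 Gbit/s each per lane and per direction, the 600 GB/s total is assumed to count both directions, and pod memory is treated in binary units (1 TiB = 1024 GiB). The function names are hypothetical.

```python
def nvlink_total_gbytes_per_s(lanes, pairs_per_lane=4, gbit_per_pair=50):
    """Aggregate NVLink bandwidth in GB/s, counting both directions.

    Assumption: each lane carries `pairs_per_lane` signal pairs at
    `gbit_per_pair` Gbit/s per direction.
    """
    per_direction_gbit = pairs_per_lane * gbit_per_pair  # Gbit/s per lane
    per_direction_gbyte = per_direction_gbit / 8          # GB/s per lane
    return lanes * per_direction_gbyte * 2                # both directions


def pod_memory_tib(total_cores, gib_per_core):
    """Total HBM of a pod in TiB, given the per-core HBM size in GiB."""
    return total_cores * gib_per_core / 1024


def pod_cores(boards, chips_per_board=4, cores_per_chip=2):
    """Total TPU cores in a pod built from identical boards."""
    return boards * chips_per_board * cores_per_chip


if __name__ == "__main__":
    print("A100 NVLink, 12 lanes:", nvlink_total_gbytes_per_s(12), "GB/s")
    print("TPUv2 pod cores (64 boards):", pod_cores(64))
    print("TPUv2 pod memory:", pod_memory_tib(512, 8), "TiB")
    # Per-core memory implied by the TPUv3 pod figures (32 TB over 2048 cores).
    print("TPUv3 GiB per core:", 32 * 1024 / 2048)
```

Under these assumptions the numbers in the text are mutually consistent: 12 lanes give 600 GB/s, 64 boards give 512 cores, and 512 cores with 8 GiB each give 4 TiB.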

4. STORAGE

We live in a data-driven world. Nowadays, machines generate data at an unprecedented rate: major big-data sources are media, sensors, surveillance cameras, digital medical imaging devices, Industry 4.0 and AI. Such growth favors a centralized storage strategy, which limits redundant data, automates replication and backup, and reduces management costs. Storage technology is dominated by HDDs, but recent technologies include SSDs, NVMe and hybrid solutions of HDDs and SSDs (such as SSHDs). Anyway, keep in mind that tapes will never die!

4.1 FILES

Disks can be seen by an operating system as a collection of data blocks that can be read or written independently. In order to allow the ordering/management among them, each block is characterized by a unique numerical address called Logical Block Address (LBA). Typically, the operating system groups blocks into clusters to simplify the access to disk. Clusters are the minimal unit that an operating system can read from or write to a disk; a typical cluster’s size ranges from 1 disk sector (512B) to 128 sectors (64KB). Clusters contain:

  • file data. They are the actual content of the files.
  • metadata. They hold the information required to support the file system: file name, directory structure and symbolic links, file size, file type, creation/modification/last access dates, security information (owners, access list, encryption) and links to the LBA where the file content can be located on the disk.

Reading a file requires two steps:

  1. access the metadata to locate its blocks;
  2. access the blocks to read its content.

Writing a file requires two steps:

  1. access the metadata to locate free space;
  2. write the data in the assigned blocks.

Since the file system can only access clusters, the real occupation of space on a disk for a file is always a multiple of the cluster size. If we define $s$ as the file size, $c$ as the cluster size and $a$ as the actual size on disk, we have:

$$a = \lceil s / c \rceil \cdot c$$

and the quantity $w = a - s$ is the wasted disk space due to the organization of the file into clusters. This waste of space is called internal fragmentation of files.

Deleting a file only requires updating the metadata to say that the blocks where the file was stored are no longer in use by the operating system. Deleting a file never actually deletes the data on disk: when a new file is written on the same clusters, the old data is replaced by the new one.

As the life of the disk evolves, there might not be enough space to store a file contiguously. In this case, the file is split into smaller chunks that are inserted into the free clusters spread over the disk. The effect of splitting a file into non-contiguous clusters is called external fragmentation. As we will see, this can significantly reduce the performance of an HDD.
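The relation between file size, cluster size and wasted space can be illustrated with a short computation. This is a sketch only: the function names and the example sizes (a 10,000-byte file on 4 KiB clusters) are hypothetical.

```python
def actual_size_on_disk(file_size: int, cluster_size: int) -> int:
    """Size occupied on disk: the file size rounded up to whole clusters."""
    clusters = -(-file_size // cluster_size)  # integer ceiling division
    return clusters * cluster_size


def internal_fragmentation(file_size: int, cluster_size: int) -> int:
    """Bytes wasted because the last cluster is only partially used."""
    return actual_size_on_disk(file_size, cluster_size) - file_size


if __name__ == "__main__":
    s, c = 10_000, 4096  # hypothetical file size and cluster size, in bytes
    a = actual_size_on_disk(s, c)
    w = internal_fragmentation(s, c)
    print(f"file size s = {s} B, cluster size c = {c} B")
    print(f"actual size a = {a} B, wasted space w = {w} B")
```

With these example values the file needs 3 clusters, so it occupies 12,288 bytes and wastes 2,288 bytes. Larger clusters reduce the number of accesses per file but increase the average waste per file.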

4.2 HARD DISK DRIVES (HDD)

A Hard Disk Drive (HDD) is a data storage device that uses rotating disks (platters) coated with magnetic material. Data are read in a random-access manner, meaning that individual blocks of data can be stored or retrieved in any order rather than sequentially. An HDD consists of one or more rigid (“hard”) rotating disks with magnetic heads arranged on a moving actuator arm to read and write data to the surfaces. Externally, hard drives expose a large number of sectors (blocks), typically of 512 or 4096 bytes. Each sector has a header and an error correction code. Notice that individual sector writes are atomic, whereas multi-sector writes may be interrupted (a torn write happens when only a part of a multi-sector update is written successfully to disk). A drive’s sector is

Disks are subject to delays:

  1. rotational delay. It is the time needed to rotate the desired sector to the read head. It is related to RPM: indeed, the full rotation delay is $R = 1 / \text{DiskRPM}$. In seconds: $R_{sec} = 60 \cdot R$. Therefore: $T_{rotation\_avg} = R_{sec} / 2$;
  2. seek delay ($T_{seek}$). It is the time needed to move the read head to a different track. This means the read head has to accelerate, coast (i.e. go at constant speed), decelerate and settle. The model assumes a linear dependency on the distance, therefore: $T_{seek\_avg} = T_{seek\_max} / 3$;
  3. transfer time. It is the time needed to read or write bytes from/to the surface. It is the final I/O phase that takes place, and it includes the time for the head to pass over the sectors and the I/O transfer (rotation speed and storage density influence this delay);
  4. controller overhead. It is the overhead needed for the request management (i.e. buffer management and interrupt sending time).

Service time is defined as:

$$T_{I/O} = T_{seek} + T_{rotation} + T_{transfer} + T_{overhead}$$

Response time is defined as:

$$\tilde{R} = T_{queue} + T_{I/O}$$

where $T_{queue}$ depends on queue length, resource utilization, mean and variance of disk service time (distribution) and request arrival distribution.

The worst-case scenario is the one in which files are very small (each file is contained in one block) or the disk is heavily externally fragmented. Thus, each access to a sector requires paying rotational latency and seek time. In many circumstances, this is not the case, as files are larger than one block and are stored contiguously. We can measure the data locality of a disk as the percentage of blocks that do not need seek or rotational latency to be found. Therefore, the average service time can be defined as:

$$T_{I/O} = (1 - \text{Data\_Locality}) \cdot (T_{seek} + T_{rotation}) + T_{transfer} + T_{overhead}$$

Caching helps improve disk performance, but it can’t make up for poor random access times. If there’s a queue of requests for a disk, they can be reordered to improve performance. The estimation of the request’s length is feasible knowing the position of data on the disk. Thus, there are several scheduling algorithms:

  • First Come, First Served (FCFS). This is the most basic scheduler, as it serves requests in arrival order. It spends a lot of time seeking;
  • Shortest Seek Time First (SSTF). Its idea is to minimize seek time by always selecting the block with the shortest seek time. The good part is that SSTF greedily minimizes each individual seek and can be easily implemented, but the bad part is that it is prone to starvation;
  • Elevator algorithm (SCAN). The head sweeps across the disk, servicing the requests in order. The good part is that it provides reasonable performance and no starvation, but the bad part is that average access times are uneven: requests at high and low addresses wait longer than those in the middle of the disk;
  • C-SCAN (Circular SCAN). It is like SCAN, but it only services requests in one direction. The good part is that it is fairer than SCAN, but the bad part is that its performance is worse than SCAN;
  • C-LOOK. It is a C-SCAN variant that peeks at the upcoming addresses in the queue, i.e. the head only goes as far as the last request.

The last problem to address is where to implement disk scheduling:

  • OS scheduling. It can perform request re-ordering by LBA, but cannot account for rotation delay;
  • on-disk scheduling. The disk knows the exact position of the head and the platters, thus it can implement more advanced schedulers. However, it requires specialized hardware and drivers;
  • disk command queue. It is available in all modern disks. It is a queue where a disk stores pending read/write requests (this is called NCQ, Native Command Queuing). The disk may re-order items in the queue to improve performance;
  • joint OS and on-disk scheduling. This can lead to problems, e.g. NCQ and the OS I/O scheduler may conflict.
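The service-time model and the request-ordering policies of this section can be sketched together in a few functions. This is an illustrative sketch, not a disk simulator: all names and example numbers are hypothetical, requests are plain track numbers, the seek cost is taken as the absolute track distance, and `scan_order` reverses at the last pending request instead of travelling to the disk edge (so the head travel measured on it is a lower bound for true SCAN).

```python
def avg_rotation_ms(rpm):
    """Average rotational delay: half of a full rotation, in milliseconds."""
    full_rotation_ms = 60_000 / rpm
    return full_rotation_ms / 2


def service_time_ms(seek_ms, rpm, transfer_ms, overhead_ms, data_locality=0.0):
    """Average T_I/O: only non-local accesses pay seek and rotation."""
    mechanical = seek_ms + avg_rotation_ms(rpm)
    return (1 - data_locality) * mechanical + transfer_ms + overhead_ms


def head_travel(start, order):
    """Total track distance covered when serving `order` starting at `start`."""
    travel, position = 0, start
    for track in order:
        travel += abs(track - position)
        position = track
    return travel


def fcfs(start, requests):
    """First Come, First Served: keep the arrival order."""
    return list(requests)


def sstf(start, requests):
    """Shortest Seek Time First: always pick the closest pending track."""
    pending, position, order = list(requests), start, []
    while pending:
        nearest = min(pending, key=lambda track: abs(track - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


def scan_order(start, requests):
    """Elevator order: sweep upward first, then serve the rest going down."""
    up = sorted(t for t in requests if t >= start)
    down = sorted((t for t in requests if t < start), reverse=True)
    return up + down


def c_look(start, requests):
    """C-LOOK: sweep upward, jump back to the lowest request, sweep up again."""
    up = sorted(t for t in requests if t >= start)
    wrapped = sorted(t for t in requests if t < start)
    return up + wrapped


if __name__ == "__main__":
    # Hypothetical disk: 6 ms seek, 10,000 RPM, 1 ms transfer, 0.2 ms overhead.
    print("T_I/O, no locality :", service_time_ms(6, 10_000, 1, 0.2))
    print("T_I/O, 75% locality:", service_time_ms(6, 10_000, 1, 0.2, 0.75))

    # Hypothetical queue of pending track numbers, head initially at track 53.
    queue, head = [98, 183, 37, 122, 14, 124, 65, 67], 53
    for policy in (fcfs, sstf, scan_order, c_look):
        order = policy(head, queue)
        print(f"{policy.__name__:10s} travel={head_travel(head, order):4d} order={order}")
```

On this example queue, reordering cuts the head travel to well under half of what FCFS needs, which is the motivation for scheduling. The sketch ignores rotational position, which is exactly the information an OS-level scheduler lacks and an on-disk scheduler has.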

4.3 SOLID STATE DISKS (SSD)

A Solid State Disk (SSD) is a storage device that has no mechanical or moving parts (unlike an HDD) and is built out of transistors (like memory and processors). It retains information despite power loss (unlike typical RAM). The device includes a controller and one or more solid-state memory components. It uses traditional HDD interfaces (protocol and physical connectors) and form factors (as technology evolves, this is becoming less true), and it provides higher performance than an HDD.