Warehouse-Scale Computing: Architectures, Storage, and Virtualization - Prof. Roveri, Outlines and Concept Maps for Architettura Dei Calcolatori (Computer Architecture)

Politecnico di Milano (POLIMI), Architettura Dei Calcolatori

Prof. Manuel Roveri

A detailed overview of warehouse-scale computers (WSCs), covering their architecture, storage solutions such as RAID, and virtualization techniques. It explores key components such as motherboards, CPUs, RAM, and network interface cards, as well as cooling systems and power supply units. The document also covers storage architectures, including HDDs and SSDs, and discusses the Flash Translation Layer (FTL) and its role in managing flash memory. It then examines RAID architectures for improving performance and reliability, and virtualization techniques including hypervisors and containerization. The document concludes with a discussion of energy consumption, data center tiers, and the importance of reliability and availability in modern datacenters.

Type: Outlines and concept maps

2024/2025

On sale since 02/09/2025

anita-grossi-1 🇮🇹


COMPUTING INFRASTRUCTURES

Data center

A data center is a building hosting many servers and communication devices, placed together for:

Environmental needs (cooling, humidity)
Physical security
Easier maintenance

Traditional data centers run many small or medium-sized applications. Each application uses its own dedicated, isolated hardware and does not interact with the others. A traditional data center is often shared by different organizational units or companies.

Warehouse-Scale Computer

To support large-scale internet services (like Gmail, Maps, Google Search), a new architecture was developed: the Warehouse-Scale Computer. A WSC:

Runs a few very large applications
Uses a homogeneous hardware/software platform
Belongs to a single organization
Shares a central resource management layer
Is seen as one logical computing unit, even though it consists of thousands of servers

Availability and Geographic Distribution

Because services running on WSCs need to be available almost all the time (typically 99.99% uptime, meaning less than 1 hour of downtime per year), the architecture is designed to be fault-tolerant and geographically distributed. To achieve this, WSCs are deployed across multiple locations, organized in a hierarchy:

Geographic Areas (GAs): Defined by legal or geopolitical borders, each containing at least two computing regions
Computing Regions (CRs): Groups of data centers close enough (within ~2 ms latency) to support coordinated activity and disaster recovery
Availability Zones (AZs): Independent data centers within a region that offer synchronous replication and redundancy for fault tolerance (usually 3 per region to allow for majority-based decision making)
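The link between an availability target and the downtime it allows is plain arithmetic. The following is a minimal sketch, assuming a 365-day year; the function name `downtime_minutes_per_year` is an illustrative choice, not something from the notes.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def downtime_minutes_per_year(availability: float) -> float:
    """Maximum yearly downtime allowed by an availability target in [0, 1]."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * MINUTES_PER_YEAR


if __name__ == "__main__":
    for target in (0.99, 0.999, 0.9999):
        minutes = downtime_minutes_per_year(target)
        print(f"{target:.2%} availability -> {minutes:.1f} minutes of downtime per year")
```

For the 99.99% target quoted above this gives roughly 52.6 minutes per year, which is consistent with "less than 1 hour".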

HW Infrastructures

Node Level: Servers

In a Warehouse-Scale Computer, a node is the smallest operational unit: a server that handles computation, storage, and network operations. Each node typically includes:

A motherboard that holds one or more CPU sockets, multiple DIMM slots for RAM (up to 192 slots), PCIe slots for GPUs or network cards, and SATA/SAS ports for local disks
RAM: often hundreds of GB, ECC-protected
Storage: from a few to 24 HDDs or SSDs, either SATA or faster SAS
Network Interface Cards (NICs): often 10 Gbps or faster
Power Supply Unit (PSU) and cooling systems (fans, airflow guides)

Think of it as a "super-powered PC" built to work efficiently with others in a rack.

Servers are mounted in standardized racks (typically 19 inches wide and 48 inches deep) as flat, stackable units. Each rack:

Hosts multiple servers (1U, 2U, etc.)
Shares power and cooling
Has Top-of-Rack (ToR) switches for network aggregation
Supports modular cable management and airflow optimization

Cooling system

Racks are arranged into alternating corridors:

Cold aisle → where server fronts face each other (cold air intake)
Hot aisle → where server backs face each other (hot air exhaust)

Cold air is pushed into the front of the racks, flows through the servers to cool components, and hot air is expelled out the back. This approach is called cold aisle / hot aisle containment, and it is key for energy-efficient cooling in large-scale datacenters.

Other techniques:

Raised floor airflow
Liquid cooling (e.g., Google TPUs)
Evaporative cooling for energy savings

Power Infrastructure Inside the Datacenter

In addition to cooling and server management, datacenters must ensure a stable and redundant power supply:

Rack-level PDUs (Power Distribution Units)
UPS (Uninterruptible Power Supplies) for short-term backup
Battery systems integrated with racks or rows
Diesel generators and redundant power feeds for long outages

Power systems are deeply integrated into the rack and corridor structure, making the datacenter a cohesive, engineered environment. Some datacenters occupy more than 100,000 m². Power consumption may exceed 150 MW (equivalent to a small city). A typical goal is 99.99% availability → less than 1 hour of downtime per year.

Design priorities:

Redundancy (N+1, 2N)
Modularity (easy to replace parts)
Maintainability
Energy efficiency

Hardware accelerators

As AI workloads exploded and Moore's Law slowed down, WSCs increasingly adopted hardware accelerators to handle the rising compute demands efficiently. Main types:

GPUs: Ideal for parallel workloads like ML training; widely supported (CUDA, TensorFlow)
TPUs (Google): Custom ASICs optimized for TensorFlow; much faster but vendor-locked
FPGAs: Reconfigurable for specific logic; efficient but harder to program

Accelerators are attached via PCIe or custom interconnects like NVLink, depending on the workload.

To optimize storage, modern disks use:

Zoned Bit Recording: More sectors on outer tracks → faster sequential reads
Skewing: Slight delays between adjacent tracks to minimize access time

Access Time Components

1 Seek Time: move the head to the correct track → includes acceleration, coasting, deceleration, settling → average seek time: T_seek_avg = T_max / 3
2 Rotational Latency: wait for the sector to spin under the head → T_rotation = 60 s / RPM → average latency: T_rotation_avg = T_rotation / 2
3 Transfer Time: time to read/write the actual bytes → T_transfer = data_size / transfer_rate
4 Controller Overhead: small delays in handling the request

Performance Depends on Locality

High locality: Data read in one seek/rotation → fast
Low locality: Each block requires its own seek/rotation → very slow
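The four access-time components above can be added up for a concrete disk. This is a rough sketch under stated assumptions: the function name, the parameter names, the sample drive (7200 RPM, 9 ms average seek, 100 MB/s) and the convention 1 MB = 1024 KB are all illustrative and not taken from the notes.

```python
def avg_io_time_ms(rpm: float, avg_seek_ms: float, transfer_mb_s: float,
                   request_kb: float, overhead_ms: float = 0.0) -> float:
    """Average service time of one random request, in milliseconds."""
    rotation_ms = 60_000.0 / rpm            # T_rotation = 60 s / RPM, in ms
    avg_rotation_ms = rotation_ms / 2.0     # on average, wait half a rotation
    # T_transfer = data_size / transfer_rate (assumes 1 MB = 1024 KB)
    transfer_ms = (request_kb / 1024.0) / transfer_mb_s * 1000.0
    return avg_seek_ms + avg_rotation_ms + transfer_ms + overhead_ms


if __name__ == "__main__":
    # Hypothetical drive: 7200 RPM, 9 ms average seek, 100 MB/s, 0.2 ms overhead
    t = avg_io_time_ms(rpm=7200, avg_seek_ms=9.0, transfer_mb_s=100.0,
                       request_kb=4.0, overhead_ms=0.2)
    print(f"average time for a random 4 KB read: {t:.2f} ms")
```

With these numbers the mechanical terms (seek and rotation) dominate and the transfer term is negligible, which is why low locality is so costly on HDDs.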

2. SSDs - Solid State Drives

A Solid-State Drive (SSD) is a storage device based on NAND flash memory, with no moving parts. Unlike HDDs, which rely on spinning platters and mechanical arms, SSDs are purely electronic, making them:

Much faster, especially for random access
More reliable, with no mechanical failures
Silent and energy-efficient
Better suited for data-intensive, high-throughput workloads common in WSCs

SSDs are built from:

Flash memory chips (organized in pages and blocks)
A controller to manage operations
An internal firmware layer called the Flash Translation Layer (FTL)

[Margin notes carried over from the HDD section: T_IO = T_seek + T_rotation + T_transfer + T_overhead; T_response = T_queue + T_IO; a (1 - locality) factor scales the seek and rotation terms. Disk scheduling policies: FCFS (arrival order), SSTF (closest request first), SCAN, C-SCAN, C-LOOK.]

SSDs organize their memory as follows:

Page: the smallest unit that can be read or written → typically 4 KB
Block: the smallest unit that can be erased → typically consists of 64 or 128 pages (e.g., 256 KB–512 KB)

Important constraint: you can read and write individual pages, but you can erase only entire blocks. Only empty pages can be written. To free a page, you must erase its entire block.

The Write Amplification Problem

When you update a page (e.g., 4 KB), you can't just overwrite it in place. Instead, a naive SSD must:

1 Read the whole block into cache
2 Modify the relevant page(s)
3 Erase the block
4 Write back the updated version of the entire block

→ This means that writing 4 KB may involve reading and writing hundreds of KB. This overhead is called write amplification: the device writes much more than the original data size.

Consequences:

Slower write performance
Increased wear on memory cells
Shorter SSD lifespan

The Flash Translation Layer (FTL)

To manage the complex nature of flash memory, SSDs rely on the FTL, a key firmware layer that provides:

a. Logical to Physical Mapping

Maps OS-level LBAs (logical block addresses) to internal physical pages
Enables overwriting without needing an immediate erase

b. Garbage Collection (GC)

Finds blocks full of dirty pages
Copies valid pages to a new block
Erases the old block to reclaim space

GC introduces latency and write amplification, especially when triggered during heavy I/O.

c. Wear Leveling

Flash blocks have limited erase/write cycles (typically 3,000–100,000)
FTL spreads writes across all blocks to ensure uniform wear
Prevents premature failure of frequently written areas

Delete and TRIM Command

Deleting a file is not enough on an SSD, because the OS typically just removes the file metadata and the SSD still treats the pages as valid. → The TRIM command allows the OS to explicitly inform the SSD about which blocks are no longer in use. TRIM helps the SSD:

Identify garbage early
Reduce unnecessary copying during GC
Improve long-term performance
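The interplay of logical-to-physical mapping, garbage collection, and TRIM described above can be sketched with a toy model. Everything here is an assumption made for illustration: the class `ToyFTL`, its tiny geometry, the greedy victim choice, and the simplification that GC buffers the valid pages in RAM and rewrites them after the erase (a real FTL copies them to a spare block and also applies wear leveling).

```python
FREE, DIRTY = "free", "dirty"  # any other page value is the LBA stored there


class ToyFTL:
    """Toy flash translation layer: out-of-place writes plus greedy GC."""

    def __init__(self, num_blocks: int = 2, pages_per_block: int = 4):
        self.blocks = [[FREE] * pages_per_block for _ in range(num_blocks)]
        self.mapping = {}                 # LBA -> (block index, page index)
        self.erase_counts = [0] * num_blocks
        self.host_writes = 0              # pages written by the OS
        self.flash_writes = 0             # pages actually programmed

    def _find_free_page(self):
        for b, block in enumerate(self.blocks):
            for p, state in enumerate(block):
                if state == FREE:
                    return b, p
        return None

    def _program(self, lba: int) -> None:
        b, p = self._find_free_page()
        self.blocks[b][p] = lba
        self.mapping[lba] = (b, p)
        self.flash_writes += 1

    def _garbage_collect(self) -> None:
        # Greedy policy: pick the block with the most dirty pages.
        victim = max(range(len(self.blocks)),
                     key=lambda b: self.blocks[b].count(DIRTY))
        if self.blocks[victim].count(DIRTY) == 0:
            raise RuntimeError("device full: nothing to reclaim")
        survivors = [s for s in self.blocks[victim] if s not in (FREE, DIRTY)]
        self.blocks[victim] = [FREE] * len(self.blocks[victim])  # block erase
        self.erase_counts[victim] += 1
        for lba in survivors:             # rewriting valid pages = extra writes
            self._program(lba)

    def write(self, lba: int) -> None:
        self.host_writes += 1
        if lba in self.mapping:           # an update invalidates the old copy
            b, p = self.mapping[lba]
            self.blocks[b][p] = DIRTY
        if self._find_free_page() is None:
            self._garbage_collect()
        self._program(lba)

    def trim(self, lba: int) -> None:
        """OS hint: this LBA is no longer in use, so its page is garbage."""
        if lba in self.mapping:
            b, p = self.mapping.pop(lba)
            self.blocks[b][p] = DIRTY

    def write_amplification(self) -> float:
        return self.flash_writes / self.host_writes


if __name__ == "__main__":
    ftl = ToyFTL()
    for lba in [0, 1, 2, 3, 4, 5, 6, 0, 4]:
        ftl.write(lba)
    print("write amplification:", round(ftl.write_amplification(), 2))
```

In this run the last write triggers GC on a block that still holds three valid pages, so 9 host writes become 12 flash writes. Calling `trim` on pages the OS no longer needs before that point would leave fewer survivors to copy.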

RAID Architectures

RAID (Redundant Array of Independent Disks) is a technique used to:

Combine multiple physical disks into a single logical storage unit
Improve performance, reliability, or both
Offer transparency to the OS, which sees one logical volume regardless of underlying complexity

Why Use RAID in WSCs?

Disks are prone to failure → risk increases as more disks are added
RAID adds redundancy and/or parallelism
Used in databases, storage backends, and critical data systems

Striping: distributes data across disks to improve speed
Redundancy: introduces duplication or parity to tolerate failures

RAID 0 – Striping Only

Data is evenly split across disks (the full capacity is available as storage space)
No redundancy: any disk failure = total data loss
Best for temporary, non-critical data like scratch space

RAID 1 – Mirroring Only

Every disk has an exact copy (mirror), so only 50% of the disk capacity is usable
High fault tolerance
Ideal for small, critical datasets or system partitions

RAID 0+1: Stripe first, then mirror

Stripe first: creates 2 groups of RAID 0
Then mirror: creates a copy (RAID 1)
A single disk failure disables its whole stripe group; a second failure in the other group causes data loss

RAID 1+0 (RAID 10): Mirror first, then stripe

Mirrors pairs of disks, then stripes across them
Can tolerate multiple failures (as long as one disk of each mirror survives)

RAID 10 is preferred in databases and high-performance workloads.

[Figure: mirror and stripe layouts for RAID 0+1 and RAID 1+0, with a hand-annotated capacity example for N disks of capacity C = 1 TB each; the annotations are not recoverable from the extraction.]

RAID 5 – Striping with Single Parity

One disk's worth of capacity is used for recovery: a parity block per stripe allows recovery from one disk failure. Usable capacity is (N - 1) · C, e.g., 5 TB for N = 6 disks of C = 1 TB.

Write operations are slower because they require:

Reading old data + old parity
Computing and writing new parity

Good balance of performance, capacity, and reliability

RAID 6 – Double Parity

Two disks' worth of capacity are used for recovery: two parity blocks per stripe allow recovery from two simultaneous failures. Usable capacity is (N - 2) · C.
Even slower writes (and higher cost)
Used in archival systems and in WSCs with large disk arrays; useful when MTTR is large

Parity

Block A = 00110011
Block B = 10100011
Parity = A ⊕ B = 10010000

Now suppose A is lost:
→ Recovered A = Parity ⊕ B = 10010000 ⊕ 10100011 = 00110011

The parity disk does not store the original data. It stores enough redundant information (parity), computed over all the other disks, to rebuild the lost data if one disk fails.

Formulas (n = number of disks, MTTF = mean time to failure of a single disk, MTTR = mean time to repair, DL = data loss):

MTTF_RAID0 = MTTF / n

MTTF_RAID1 = 1 / Prob(DL), with Prob(DL) = Pr(1st fail) · Pr(2nd fail)
Pr(1st fail) = n / MTTF
Pr(2nd fail) = MTTR / MTTF
→ MTTF_RAID1 = MTTF² / (n · MTTR)

MTTF_RAID10 = 1 / Prob(DL), with Prob(DL) = Pr(1st fail) · Pr(2nd fail)
Pr(1st fail) = n / MTTF
Pr(2nd fail) = MTTR / MTTF (the mirror disk fails during repair)
→ MTTF_RAID10 = MTTF² / (n · MTTR)

MTTF_RAID01 = 1 / Prob(DL), with Prob(DL) = Pr(1st fail) · Pr(2nd fail)
Pr(1st fail) = n / MTTF
Pr(2nd fail) = MTTR · (n / 2) / MTTF
→ MTTF_RAID01 = 2 · MTTF² / (n² · MTTR)

MTTF_RAID5 = 1 / Prob(DL), with Prob(DL) = Pr(1st fail) · Pr(2nd fail)
Pr(1st fail) = n / MTTF
Pr(2nd fail) = MTTR · (n - 1) / MTTF
→ MTTF_RAID5 = MTTF² / (n · (n - 1) · MTTR)

MTTF_RAID6 = 1 / Prob(DL), with Prob(DL) = Pr(1st fail) · Pr(2nd fail) · Pr(3rd fail)
Pr(1st fail) = n / MTTF
Pr(2nd fail) = MTTR · (n - 1) / MTTF
Pr(3rd fail) = MTTR · (n - 2) / (2 · MTTF)
→ MTTF_RAID6 = 2 · MTTF³ / (n · (n - 1) · (n - 2) · MTTR²)
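The XOR parity example and the MTTF expressions above can be checked with a few lines of code. This is a sketch under assumptions: the function names are illustrative, each MTTF is taken as 1 / Prob(DL) with the failure terms listed in the notes, and all times must use the same unit (for example hours).

```python
from functools import reduce


def xor_parity(blocks):
    """Bytewise XOR of equally sized blocks (bytes objects)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity."""
    return xor_parity(list(surviving_blocks) + [parity])


def mttf_raid0(mttf, n):
    return mttf / n


def mttf_raid10(mttf, mttr, n):
    # Pr(1st) = n / MTTF, Pr(2nd) = MTTR / MTTF (also RAID 1 with n = 2)
    return mttf ** 2 / (n * mttr)


def mttf_raid01(mttf, mttr, n):
    # Pr(2nd) = MTTR * (n / 2) / MTTF
    return 2 * mttf ** 2 / (n ** 2 * mttr)


def mttf_raid5(mttf, mttr, n):
    # Pr(2nd) = MTTR * (n - 1) / MTTF
    return mttf ** 2 / (n * (n - 1) * mttr)


def mttf_raid6(mttf, mttr, n):
    # Pr(3rd) = MTTR * (n - 2) / (2 * MTTF)
    return 2 * mttf ** 3 / (n * (n - 1) * (n - 2) * mttr ** 2)


if __name__ == "__main__":
    a = bytes([0b00110011])
    b = bytes([0b10100011])
    parity = xor_parity([a, b])
    print(f"parity      = {parity[0]:08b}")
    print(f"recovered A = {recover_missing([b], parity)[0]:08b}")

    # Hypothetical numbers: disk MTTF = 1000 h, MTTR = 10 h, n = 4 disks
    for name, value in [("RAID 0", mttf_raid0(1000, 4)),
                        ("RAID 10", mttf_raid10(1000, 10, 4)),
                        ("RAID 01", mttf_raid01(1000, 10, 4)),
                        ("RAID 5", mttf_raid5(1000, 10, 4)),
                        ("RAID 6", mttf_raid6(1000, 10, 4))]:
        print(f"{name}: {value:,.0f} h")
```

The comparison makes the ordering visible: with the same disks, RAID 10 outlasts RAID 01, and RAID 6 gains an extra factor of roughly MTTF / MTTR over RAID 5.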

Networking

Warehouse-Scale Computers (WSCs) include thousands of interconnected servers running distributed applications (e.g., microservices, machine learning, replication). As compute power increases, network interconnects must scale accordingly. A good Intra-Datacenter Network (DCN) must provide:

High throughput
Low latency
Fault tolerance
Load balancing

Key Concepts

1. East-West vs North-South Traffic

East-West = server-to-server (internal), e.g., service calls, replication
North-South = client-to-datacenter (external), e.g., user web requests
→ In modern DCs, ~76% of traffic is East-West, so internal bandwidth is far more important

Communication patterns: unicast (1-to-1), multicast (1-to-many), incast (many-to-1)

2. Bisection Bandwidth

The total bandwidth between two halves of the datacenter. It must scale linearly with the number of servers to avoid bottlenecks.

DCN Architecture Types

Traditional DCN architecture: 3-Tier Model

Layers:

1 Access (ToR switches)
2 Aggregation (connects ToRs)
3 Core (connects to Internet or other DCs)

Limitations:

Limited bisection bandwidth
Costly and hard to scale beyond a few thousand servers

ToR vs EoR switches:

ToR: 1 per rack, easy cabling, harder to manage
EoR: 1 per corridor, centralized, harder cabling

Oversubscription: Bandwidth to upper layers is often oversubscribed (e.g., 4:1), which may hurt latency-sensitive apps.
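An oversubscription ratio such as the 4:1 mentioned for the upper layers is just downstream capacity divided by upstream capacity at a switch. A minimal sketch, assuming a hypothetical ToR switch with 40 servers at 10 Gbps and 4 uplinks at 25 Gbps (these figures and the function name are illustrative, not from the notes):

```python
def oversubscription_ratio(servers_per_rack: int, server_link_gbps: float,
                           uplinks: int, uplink_gbps: float) -> float:
    """Downstream (server-facing) over upstream (uplink) bandwidth of a switch."""
    downstream = servers_per_rack * server_link_gbps
    upstream = uplinks * uplink_gbps
    if upstream <= 0:
        raise ValueError("the switch needs at least one uplink with positive bandwidth")
    return downstream / upstream


if __name__ == "__main__":
    ratio = oversubscription_ratio(servers_per_rack=40, server_link_gbps=10,
                                   uplinks=4, uplink_gbps=25)
    print(f"oversubscription = {ratio:.0f}:1")
```

A ratio of 1:1 means every server can send at line rate out of the rack at the same time; at 4:1 each server is only guaranteed a quarter of its link when all of them talk across racks.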

Modern DCN architectures

Modern designs aim to:

Use commodity switches
Maximize parallelism and path diversity
Achieve scalability without bottlenecks

Clos / Spine-and-Leaf

Two layers: leaf switches (ToR) and spine switches (backbone)
Each leaf connects to all spines
All paths are equal-cost (→ ECMP load balancing)

Benefits:

Scalable, uniform latency
High bisection bandwidth
Easy to expand (just add more switches)
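Because every leaf reaches every spine at equal cost, a leaf can spread flows by hashing each flow's identifiers and picking a spine from the result. The sketch below illustrates the idea only: real switches do this in hardware with vendor-specific hash functions, and the use of CRC32 over a 5-tuple string, like the function name, is an assumption made for the example.

```python
import zlib


def ecmp_pick_spine(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
                    protocol: str, num_spines: int) -> int:
    """Map a flow's 5-tuple to one of the equal-cost spine uplinks."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{protocol}".encode()
    return zlib.crc32(key) % num_spines


if __name__ == "__main__":
    # Hypothetical flows from one server to another, differing only in source port
    for port in range(50000, 50008):
        spine = ecmp_pick_spine("10.0.1.5", "10.0.9.7", port, 443, "tcp", num_spines=4)
        print(f"flow with source port {port} -> spine {spine}")
```

Hashing per flow rather than per packet keeps all packets of a connection on one path, which avoids reordering while still spreading different flows across the spines.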

Used in practice in hyperscale datacenters such as those of Google, Facebook, and Azure.

Fat-Tree

A specific Clos variant optimized for commodity switches. Built around Pods, each with servers and two switch layers.

Let k = number of ports per switch → supports up to k³/4 servers using 5k²/4 switches

Advantages:

Modular and symmetric
Full bandwidth between any two servers
Perfect for scale-out designs

Recent innovations:

Optical Circuit Switching (OCS) for direct, high-bandwidth connections
Central SDN control for traffic engineering, fault response, and live reconfiguration
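The size of a fat-tree follows directly from the switch port count k. This sketch assumes the standard k-ary construction: k pods, each with k/2 edge and k/2 aggregation switches, (k/2)² core switches, and k/2 servers per edge switch. The function name and the returned dictionary keys are illustrative.

```python
def fat_tree_size(k: int) -> dict:
    """Switch and server counts of a k-ary fat-tree built from k-port switches."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even number of ports, at least 2")
    half = k // 2
    edge = k * half               # k pods, k/2 edge switches each
    aggregation = k * half        # k pods, k/2 aggregation switches each
    core = half * half            # (k/2)^2 core switches
    servers = edge * half         # k/2 servers per edge switch -> k^3 / 4
    return {
        "pods": k,
        "edge": edge,
        "aggregation": aggregation,
        "core": core,
        "switches": edge + aggregation + core,   # 5 * k^2 / 4
        "servers": servers,
    }


if __name__ == "__main__":
    for k in (4, 24, 48):
        size = fat_tree_size(k)
        print(f"k={k}: {size['servers']} servers, {size['switches']} switches")
```

For example, 48-port commodity switches give 27,648 servers with 2,880 switches, which shows how the design scales out without any large, expensive core device.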

Two-Loop vs Three-Loop

Two-loop:

First loop: Air circulates within the datacenter (via CRACs)
Second loop: Liquid coolant removes heat to rooftop exchangers

Three-loop:

Adds chillers to further cool water before reuse
More efficient, but also more expensive and complex

Advanced Cooling Techniques

As servers get more powerful (especially with GPUs/TPUs), we need cooling closer to the source.

In-Rack Cooling

Heat exchangers are placed inside the racks, right at the hot air exit
Very effective and fast cooling

In-Row Cooling

Cooling units placed between server racks
Easier to maintain, and allows flexible datacenter design

Liquid Cooling

Uses cold plates directly on high-heat components
Heat is transferred via liquid to a nearby exchanger
Efficient, but not all components can be cooled this way (some remain air-cooled)

Container-Based Datacenters

These are modular server systems built inside shipping containers (6–12 meters long).

Include integrated power and cooling
Can be deployed rapidly, even in remote locations
Ideal for edge computing or fast scaling

Energy consumption and sustainability

Datacenters consume huge amounts of energy:

Cooling alone may account for ~50% of the total power
Globally, datacenters use ~3% of electricity (more than the UK) and emit ~2% of global CO₂ (similar to aviation)

This makes energy efficiency and sustainability a major concern in datacenter design.

[Margin notes on the cooling loops, translated: cool the servers; re-cool the air; the extra loop adds a second re-cooling stage.]

PUE – Power Usage Effectiveness

PUE is the standard metric for datacenter efficiency:

PUE = Total power used (including cooling, lights, etc.) / Power used by IT equipment only

Inversely, DCiE (Data Center infrastructure Efficiency) is:

DCiE = 1 / PUE

→ Ideal PUE = 1.0 (impossible in practice): the closer to 1, the more efficient the datacenter [In 2012, Google achieved PUE < 1.1]

Data Center Tiers

Availability and reliability are standardized in Tier levels (from 1 to 4):

Tier I: Basic capacity, no redundancy
Tier II: Redundant capacity components
Tier III: Concurrent maintainability (no shutdowns during maintenance)
Tier IV: Fault-tolerant infrastructure

Higher tier = higher cost, higher reliability
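PUE, defined above as total facility power divided by IT equipment power, and its inverse DCiE are easy to compute once both figures are known. A small sketch, assuming hypothetical power readings in kW; the function names are illustrative.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power Usage Effectiveness = total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    if total_facility_kw < it_equipment_kw:
        raise ValueError("total power cannot be lower than IT power")
    return total_facility_kw / it_equipment_kw


def dcie(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Data Center infrastructure Efficiency = 1 / PUE."""
    return 1.0 / pue(total_facility_kw, it_equipment_kw)


if __name__ == "__main__":
    # Hypothetical facility: 1500 kW in total, of which 1000 kW reaches IT equipment
    print(f"PUE  = {pue(1500, 1000):.2f}")
    print(f"DCiE = {dcie(1500, 1000):.1%}")
```

In this example 500 kW goes to cooling, power conversion, and other overhead, giving a PUE of 1.5; a facility at PUE 1.1 would spend only 100 kW of overhead for the same IT load.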

Reliability

1. System VMs

Emulate the entire machine, including the ISA and hardware
Support full operating systems
Managed by a Virtual Machine Monitor (VMM), also known as a Hypervisor
Type 1 (Bare-metal): runs directly on hardware
Type 2 (Hosted): runs inside an OS

2. Process VMs

Focus only on a single application
Translate code into instructions for the host OS
Examples: JVM, .NET CLR

Virtualization Methods

Multi-Programming

Not true virtualization, but OS-based abstraction:

The OS gives each process its own address space and file system
Feels like each user has their own machine

Emulation

Used when the guest ISA ≠ host ISA:

Software translates instructions from one ISA to another
Useful for running legacy systems (e.g., old games or other architectures)

High-Level Language VMs

Focused on portability across platforms
Run applications in sandboxed environments
Example: JVM runs Java bytecode the same way on any OS

Properties of Virtualization

Virtualization offers key features for cloud systems:

Partitioning: multiple OSs on one physical server
Isolation: failure in one VM doesn't affect others
Encapsulation: entire VM state can be saved/restored or migrated live
Hardware Independence: move VMs between servers easily

[Figure: Type 1 vs Type 2 hypervisor]

The Hypervisor

A Hypervisor (aka VMM) is the software layer that manages virtual machines.

Allocates CPU, memory, and I/O
Handles privileged instructions
Ensures security and performance isolation

Types of Hypervisors

Type 1 – Bare-Metal

Installed directly on hardware
More efficient and secure
Two types:
Monolithic: all drivers inside the hypervisor
Microkernel: uses a separate service VM for drivers

Type 2 – Hosted

Runs inside a traditional OS (e.g., VirtualBox, VMware Workstation)
Easier to set up and manage, but with slightly higher overhead

Hardware-Level: the VMM sits between hardware and OS; entire OSs can run as guests
Application-Level: the VMM sits between OS and applications; apps run inside sandboxed containers or language VMs
System-Level: the VMM allows secondary OSs to run atop a host OS

Paravirtualization

The guest OS is modified to be "aware" it's running on a VM
Uses hypercalls instead of slow system traps
High performance, but requires OS modification

Full Virtualization

Guest OS runs unmodified
The hypervisor traps and emulates privileged instructions
Compatible with all OSs, but with higher overhead; may require hardware support

Containers vs Virtual Machines

Containers are another virtualization method, but at the OS level.

Share the same kernel across containers
Contain all code, dependencies, configs needed to run the app
Ideal for microservices, CI/CD, and cloud-native development
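The trap-and-emulate behaviour described under Full Virtualization can be illustrated with a toy model. This is only a sketch: the instruction names (`ADD`, `SET_TIMER`, `DISABLE_INTERRUPTS`), the classes, and the per-VM state fields are invented for the example and do not correspond to any real ISA or hypervisor API.

```python
from dataclasses import dataclass

PRIVILEGED = {"SET_TIMER", "DISABLE_INTERRUPTS"}  # hypothetical privileged opcodes


class PrivilegedInstructionTrap(Exception):
    """Raised when deprivileged guest code issues a privileged instruction."""

    def __init__(self, opcode: str, operand: int):
        super().__init__(opcode)
        self.opcode = opcode
        self.operand = operand


@dataclass
class VirtualCPUState:
    acc: int = 0
    timer: int = 0
    interrupts_enabled: bool = True


def run_guest_instruction(state: VirtualCPUState, opcode: str, operand: int) -> None:
    """Unprivileged instructions run directly; privileged ones trap."""
    if opcode in PRIVILEGED:
        raise PrivilegedInstructionTrap(opcode, operand)
    if opcode == "ADD":
        state.acc += operand
    else:
        raise ValueError(f"unknown opcode {opcode}")


class ToyHypervisor:
    def __init__(self):
        self.vms = {}
        self.traps = 0

    def create_vm(self, name: str) -> None:
        self.vms[name] = VirtualCPUState()

    def step(self, name: str, opcode: str, operand: int = 0) -> None:
        state = self.vms[name]
        try:
            run_guest_instruction(state, opcode, operand)
        except PrivilegedInstructionTrap as trap:
            self.traps += 1
            self._emulate(state, trap)

    def _emulate(self, state: VirtualCPUState, trap: PrivilegedInstructionTrap) -> None:
        # The effect is applied to the VM's virtual state, never to the real hardware.
        if trap.opcode == "SET_TIMER":
            state.timer = trap.operand
        elif trap.opcode == "DISABLE_INTERRUPTS":
            state.interrupts_enabled = False


if __name__ == "__main__":
    hv = ToyHypervisor()
    hv.create_vm("vm1")
    hv.create_vm("vm2")
    hv.step("vm1", "ADD", 5)
    hv.step("vm1", "SET_TIMER", 100)
    hv.step("vm1", "DISABLE_INTERRUPTS")
    print(hv.vms["vm1"], hv.vms["vm2"], "traps:", hv.traps)
```

The second VM is unaffected by what the first one does, which is the isolation property; a paravirtualized guest would replace the trapping instructions with explicit calls into the hypervisor, saving the cost of the trap.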