











































Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
University of Texas at Austin CS352H - Computer Systems Architecture Fall 2009 Don Fussell 2. Introduction. Goal: connecting multiple computers.
Typology: Slides
1 / 51
This page cannot be seen from the preview
Don't miss anything!












































Multiprocessors Scalability, availability, power efficiency
High throughput for independent jobs
Single program run on multiple processors
Chips with multiple processors (cores)
Synchronization
Associativity
Cache Coherence
Redundant Arrays of Inexpensive Disks
Otherwise, just use a faster uniprocessor, since it’s easier!
Partitioning Coordination Communications overhead
Speed up from 10 to 100 processors
add
Time = 10 × tadd + 100/10 × tadd = 20 × tadd Speedup = 110/20 = 5.5 (55% of potential)
Time = 10 × tadd + 100/100 × tadd = 11 × tadd Speedup = 110/11 = 10 (10% of potential)
add
Time = 10 × tadd + 10000/10 × tadd = 1010 × tadd Speedup = 10010/1010 = 9.9 (99% of potential)
Time = 10 × tadd + 10000/100 × tadd = 110 × tadd Speedup = 10010/110 = 91 (91% of potential)
Hardware provides single physical address space for all processors Synchronize shared variables using locks Memory access time UMA (uniform) vs. NUMA (nonuniform)
Each processor has ID: 0 ≤ Pn ≤ 99 Partition 1000 numbers per processor Initial summation on each processor sum[Pn] = 0; for (i = 1000Pn; i < 1000(Pn+1); i = i + 1) sum[Pn] = sum[Pn] + A[i];
Reduction: divide and conquer Half the processors add pairs, then quarter, … Need to synchronize between reduction steps
Network of independent computers Each has private memory and OS Connected using I/O system E.g., Ethernet/switch, Internet Suitable for applications with independent tasks Web servers, databases, simulations, … High availability, scalable, affordable Problems Administration cost (prefer virtual machines) Low interconnect bandwidth c.f. processor/memory bandwidth on an SMP
limit = 100; half = 100;/* 100 processors / repeat half = (half+1)/2; / send vs. receive dividing line / if (Pn >= half && Pn < limit) send(Pn - half, sum); if (Pn < (limit/2)) sum = sum + receive(); limit = half; / upper limit of senders / until (half == 1); / exit with final sum */ Send/receive also provide synchronization Assumes send/receive take similar time to addition
E.g., Internet connections Work units farmed out, results sent back
E.g., SETI@home, World Community Grid
Schedule instructions from multiple threads Instructions from independent threads execute when function units are available Within threads, dependencies handled by scheduling and register renaming
Two threads: duplicated registers, shared function units and caches