CS 491: Overview and Existential Crisis
“Parallel & Distributed Computing”
- What does it mean to you?
- Coordinating Threads
- Supercomputing
- Multi-core Processors
- Beowulf Clusters
- Cloud Computing
- Grid Computing
- Client-Server
- Scientific Computing
- All contexts for “splitting up work” in an explicit way
What is Supercomputing?
- Supercomputing is the biggest, fastest computing right this minute.
- Likewise, a supercomputer is one of the biggest, fastest computers right this minute.
- The definition of supercomputing is, therefore, constantly changing.
- A Rule of Thumb: A supercomputer is typically at least 100 times as powerful as a PC.
- Jargon: Supercomputing is also known as High Performance Computing (HPC) or High End Computing (HEC) or Cyberinfrastructure (CI).
Fastest Supercomputer vs. Moore
[Figure: Fastest Supercomputer in the World. x-axis: Year (1992 to 2007); y-axis: Speed in GFLOPs (log scale, 1 to 10,000,000); series: Fastest, Moore]
GFLOPs: billions of floating-point calculations per second
Over recent years, supercomputers have benefitted directly from microprocessor performance gains, and have also gotten better at coordinating their efforts.
Hold the Phone
- Why should we care?
- What useful thing actually takes a long time to run anymore? (especially long enough to warrant investing 6/7/8/9 figures on a supercomputer)
- Important: It’s usually not about getting something done faster, but about getting a harder thing done in the same amount of time
- This is often referred to as capability computing
What Is HPC Used For?
- Simulation of physical phenomena, such as
- Weather forecasting
- Galaxy formation
- Oil reservoir management
- Data mining: finding needles of information in a haystack of data, such as:
- Gene sequencing
- Signal processing
- Detecting storms that might produce tornados (want forecasting, not retrocasting…)
- Visualization: turning a vast sea of data into pictures that a scientist can understand
- Oak Ridge National Lab has a 512-core cluster devoted entirely to visualization runs
[Figure: Tornadic Storm, May 3 1999 [1][2][3]]
What is Supercomputing About?
- Size: Many problems that are interesting™ can’t fit on a PC, usually because they need more than a few GB of RAM or more than a few hundred GB of disk.
- Speed: Many problems that are interesting™ would take a very very long time to run on a PC: months or even years. But a problem that would take a month on a PC might take only a few hours on a supercomputer.
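To put rough numbers on the "month on a PC, hours on a supercomputer" claim, here is a back-of-the-envelope sketch. It assumes the "at least 100 times as powerful" rule of thumb translates directly into a 100x shorter run time, which is an idealization; the function name and constants are made up for illustration.

```python
# Rough arithmetic only: assumes a 100x more powerful machine runs the
# same job 100x faster (perfect scaling, which real codes rarely achieve).
PC_RUNTIME_DAYS = 30   # "a month on a PC"
SPEEDUP = 100          # rule-of-thumb factor, an idealization


def supercomputer_hours(pc_days: float, speedup: float) -> float:
    """Convert a PC run time in days to hours on a machine `speedup` times faster."""
    return pc_days * 24 / speedup


print(supercomputer_hours(PC_RUNTIME_DAYS, SPEEDUP))  # 7.2
```

Under that assumption, 720 hours of PC time shrinks to about 7 hours, consistent with "only a few hours".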
Supercomputing Issues
- Parallelism: doing multiple things at the same time
- finding and coordinating this can be challenging
- The tyranny of the storage hierarchy
- The hardware you’re running on matters
- Moving data around is often more expensive than actually computing something
Parallel Processing
- The term parallel processing is usually reserved for the situation in which a single task is executed on multiple processors
- This discounts the idea of simply running separate tasks on separate processors: a common thing to do to get high throughput, but not really parallel processing
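As a minimal sketch of "a single task executed on multiple processors", the example below splits one summation into slices, hands each slice to a worker, and combines the partial results. The names `chunk_sum` and `parallel_sum` are hypothetical. Threads are used only to keep the sketch self-contained; in standard CPython the global interpreter lock typically prevents threads from speeding up pure-Python arithmetic, so the structure is the point here, not the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def chunk_sum(chunk):
    """Work done by one worker: sum its own slice of the data."""
    return sum(chunk)


def parallel_sum(data, workers=4):
    """Split one task (summing `data`) into slices and combine the partial sums."""
    size = max(1, (len(data) + workers - 1) // workers)  # ceil(len / workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(chunk_sum, chunks))
    return sum(partials)


print(parallel_sum(list(range(1_000_001))))  # 500000500000
```

Running four unrelated programs on four processors would also keep the hardware busy, but by the definition above that is throughput, not parallel processing.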
Key questions in hardware design:
1. How do parallel processors share data and communicate?
- shared memory vs distributed memory
2. How are the processors connected?
- The number of processors is determined by a combination of #1 and #2
How is Data Shared?
- Shared Memory Systems
- All processors share one memory address space and can access it
- Information sharing is often implicit
- Distributed Memory Systems (AKA “Message Passing Systems”)
- Each processor has its own memory space
- All data sharing is done via programming primitives to pass messages
- e.g. “Send data value to processor 3”
- Information sharing is always explicit
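The message-passing style can be imitated in a few lines. This is only a sketch: Python threads really do share memory, so the "processors" here follow the distributed-memory discipline by convention, keeping their state in local variables and communicating only through queues. The names `worker` and `run` and the `None` end-of-data sentinel are assumptions made for illustration.

```python
import queue
import threading


def worker(rank, inbox, outbox):
    """A 'processor' with private state; it only learns values via messages."""
    local_total = 0                  # private memory, never touched by others
    while True:
        msg = inbox.get()            # explicit receive
        if msg is None:              # sentinel: no more data is coming
            break
        local_total += msg
    outbox.put((rank, local_total))  # explicit send of the result


def run(values, n_workers=3):
    """Deal values out to workers round-robin and collect each worker's total."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(r, inboxes[r], results))
        for r in range(n_workers)
    ]
    for t in threads:
        t.start()
    for i, value in enumerate(values):
        inboxes[i % n_workers].put(value)  # "send data value to processor r"
    for inbox in inboxes:
        inbox.put(None)
    for t in threads:
        t.join()
    return dict(results.get() for _ in range(n_workers))


totals = run([1, 2, 3, 4, 5, 6])
print(sorted(totals.items()))  # [(0, 5), (1, 7), (2, 9)]
```

Every piece of shared information appears as a `put` or a `get`, which is what "always explicit" means in practice.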
Shared Memory Systems
- Processors all operate independently, but operate out of the same logical memory.
- Data structures can be read by any of the processors
- To properly maintain ordering in our programs, synchronization primitives are needed! (locks/semaphores)
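A small sketch of why synchronization primitives are needed: several threads update one shared counter, and a lock makes each read-modify-write step indivisible. The `Counter` class and `run` function are hypothetical names for illustration, not part of any course code.

```python
import threading


class Counter:
    """A shared data structure that any thread can read or update."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Without the lock, two threads could both read the same old value
        # and one of the two updates would be lost.
        with self._lock:
            self.value += 1


def run(n_threads=4, increments=10_000):
    """Have several threads increment one shared counter, then return its value."""
    counter = Counter()

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run())  # 40000
```

Here the sharing itself is implicit (every thread simply uses `counter`), and only the ordering has to be managed explicitly.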
[Figure: three processors, each with its own cache, connected by a single bus to memory and I/O]
Connecting Multiprocessors
The Cache Coherence Problem
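[Figure: processors P1, P2 and P3, each with its own cache ($), sharing memory and I/O devices; across numbered events 1 to 5, each cache and memory hold a copy of u, one processor writes u = 7, and two later reads are marked u = ?]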
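The problem can be reproduced with a toy model. The sketch below makes several assumptions that the slide does not state: u starts at 5, caches are write-through, and the event order is P1 reads, P3 reads, P3 writes u = 7, then P1 and P2 read. The `Cache` class deliberately has no coherence protocol.

```python
class Cache:
    """A private cache with no coherence protocol (deliberately broken)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:       # miss: fetch the value from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]          # hit: may return a stale copy

    def write(self, addr, value):
        self.lines[addr] = value         # update this cache's own copy
        self.memory[addr] = value        # write-through to memory (assumed)


def scenario():
    memory = {"u": 5}                    # initial value is an assumption
    p1, p2, p3 = Cache(memory), Cache(memory), Cache(memory)
    p1.read("u")                         # event 1: P1 caches u = 5
    p3.read("u")                         # event 2: P3 caches u = 5
    p3.write("u", 7)                     # event 3: P3 writes u = 7
    return p1.read("u"), p2.read("u")    # events 4 and 5


print(scenario())  # (5, 7)
```

P1 still sees the old value because nothing told its cache that u changed, while P2, which had no cached copy, fetches the new value from memory. Two processors now disagree about the same address.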
Cache Coherence Solutions
- Two most common variations:
- “snoopy” schemes
- rely on broadcast to observe all coherence traffic
- well suited for buses and small-scale systems
- example: SGI Challenge or Intel x
- directory schemes
- use centralized information to avoid broadcast
- scale well to large numbers of processors
- example: SGI Origin/Altix
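The difference between the two families can be sketched with a toy write-invalidate model. Everything here is a simplification chosen for illustration: the class names are hypothetical, memory is updated on every write, and `messages` simply counts how many other caches each write has to notify. Real protocols track more states than this.

```python
class Cache:
    """A private cache that drops a line when told to invalidate it."""

    def __init__(self, name):
        self.name = name
        self.lines = {}

    def invalidate(self, addr):
        self.lines.pop(addr, None)


class SnoopyBus:
    """Write-invalidate over a broadcast bus: every cache observes every write."""

    def __init__(self, caches, memory):
        self.caches, self.memory, self.messages = caches, memory, 0

    def read(self, cache, addr):
        if addr not in cache.lines:
            cache.lines[addr] = self.memory[addr]
        return cache.lines[addr]

    def write(self, cache, addr, value):
        for other in self.caches:             # broadcast to all other caches
            if other is not cache:
                other.invalidate(addr)
                self.messages += 1
        cache.lines[addr] = value
        self.memory[addr] = value


class Directory:
    """Central record of which caches hold each address; no broadcast needed."""

    def __init__(self, caches, memory):
        self.memory, self.sharers, self.messages = memory, {}, 0

    def read(self, cache, addr):
        if addr not in cache.lines:
            cache.lines[addr] = self.memory[addr]
            self.sharers.setdefault(addr, set()).add(cache)
        return cache.lines[addr]

    def write(self, cache, addr, value):
        for other in self.sharers.get(addr, set()) - {cache}:
            other.invalidate(addr)            # notify actual sharers only
            self.messages += 1
        self.sharers[addr] = {cache}
        cache.lines[addr] = value
        self.memory[addr] = value


def demo(system_factory, n_caches=8):
    memory = {"u": 5}
    caches = [Cache(f"P{i + 1}") for i in range(n_caches)]
    system = system_factory(caches, memory)
    system.read(caches[0], "u")               # P1 caches u
    system.write(caches[2], "u", 7)           # P3 writes u = 7
    return system.read(caches[0], "u"), system.messages


print(demo(SnoopyBus))   # (7, 7)
print(demo(Directory))   # (7, 1)
```

Both versions return the fresh value 7 to P1, but in this model the snoopy write disturbs all seven other caches while the directory write contacts only the one cache that actually held u. That gap grows with the processor count, which is the intuition behind the scaling claims above.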