Semaphore - High Performance Computing - Lecture Slides, Slides of Computer Science

Some concepts in High Performance Computing are Addressing Modes, Program Execution, Basic Computer Organization, Control Hazard Solutions, Least Recently Used, Memory Hierarchy Progression. Main points of this lecture are: Semaphores, Mutex Locks, Concurrent Program, Process, Deadlock, Acquire Lock, Cycle of Processes, Resources Needed

Typology: Slides

2012/2013

Uploaded on 04/28/2013

dewaan

High Performance Computing

Lecture 21

Semaphore Examples

• Semaphores can do more than mutex locks
• Example: Consider our concurrent program where process P1 reads 2 matrices, process P2 multiplies them and process P3 outputs the product

• Semaphores: S1, S2

Process P1          Process P2              Process P3
Read A[ ], B[ ]     P(S1)                   P(S2)
V(S1)               C[ ] = A[ ] * B[ ]      Write C[ ]
                    V(S2)
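The ordering between P1, P2 and P3 can be sketched with POSIX semaphores. This is a minimal sketch, not the lecture's own code: it assumes threads stand in for the three processes, that both semaphores start at 0, and that a Linux-style `sem_init` is available. The names `run_pipeline`, `out` and the 2x2 input values are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define N 2                       /* matrix dimension (illustrative) */

static double A[N][N], B[N][N], C[N][N];
static double out[N][N];          /* stands in for the output written by P3 */
static sem_t s1, s2;              /* assumed to start at 0: nothing is ready yet */

/* P1: "reads" A and B (here: fills them with made-up data), then V(S1). */
static void *p1_read(void *arg) {
    (void)arg;
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = i + j + 1;
            B[i][j] = (i == j) ? 1.0 : 0.0;   /* identity matrix */
        }
    sem_post(&s1);                /* V(S1): inputs are ready */
    return NULL;
}

/* P2: P(S1), multiply, then V(S2). */
static void *p2_multiply(void *arg) {
    (void)arg;
    sem_wait(&s1);                /* P(S1): block until P1 has finished reading */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0.0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
    sem_post(&s2);                /* V(S2): product is ready */
    return NULL;
}

/* P3: P(S2), then "write" C (here: copy it into out). */
static void *p3_write(void *arg) {
    (void)arg;
    sem_wait(&s2);                /* P(S2): block until P2 has finished */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            out[i][j] = C[i][j];
    return NULL;
}

int run_pipeline(void) {
    /* Started in reverse order on purpose: the semaphores, not the
       creation order, enforce read -> multiply -> write. */
    void *(*stage[3])(void *) = { p3_write, p2_multiply, p1_read };
    pthread_t t[3];
    sem_init(&s1, 0, 0);
    sem_init(&s2, 0, 0);
    for (int i = 0; i < 3; i++) {
        int rc = pthread_create(&t[i], NULL, stage[i], NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&s1);
    sem_destroy(&s2);
    return 0;
}
```

Calling `run_pipeline()` from a `main` and compiling with `-pthread` should leave the product in `out`. A mutex alone could not express this, since the thread that signals is never the one that waits.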

Classical Problems

Producers-Consumers Problem

• Bounded buffer problem
  • Producer process makes things and puts them into a fixed size shared buffer
  • Consumer process takes things out of shared buffer and uses them
  • Must ensure that producer doesn't put into full buffer or consumer take out of empty buffer
  • While treating buffer accesses as critical section

Producers-Consumers Problem

shared Buffer[0 .. N-1]

Producer: repeatedly
    Produce x
    ; if (buffer is full) wait for consumption
    Buffer[i++] = x
    ; signal consumer

Consumer: repeatedly
    ; if (buffer is empty) wait for production
    y = Buffer[--i]
    ; signal producer
    Consume y
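One common way to realize the wait and signal steps is with two counting semaphores plus a mutex for the critical section. The sketch below is an assumption about how the pseudocode could be filled in, not the lecture's solution: `empty_slots`, `full_slots`, `buffer_lock`, `ITEMS` and `run_producer_consumer` are hypothetical names, one producer and one consumer thread are used, and a Linux-style `sem_init` is assumed.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

#define N 4                       /* buffer capacity (illustrative) */
#define ITEMS 100                 /* items passed through the buffer (illustrative) */

static int buffer[N];
static int i = 0;                 /* next free slot, as in the pseudocode */
static sem_t empty_slots;         /* counts free slots, starts at N */
static sem_t full_slots;          /* counts filled slots, starts at 0 */
static pthread_mutex_t buffer_lock = PTHREAD_MUTEX_INITIALIZER;
static long consumed_sum = 0;     /* what the consumer "does" with each item */

static void *producer(void *arg) {
    (void)arg;
    for (int x = 1; x <= ITEMS; x++) {        /* Produce x */
        sem_wait(&empty_slots);               /* buffer full: wait for consumption */
        pthread_mutex_lock(&buffer_lock);     /* buffer access is a critical section */
        buffer[i++] = x;
        pthread_mutex_unlock(&buffer_lock);
        sem_post(&full_slots);                /* signal consumer */
    }
    return NULL;
}

static void *consumer(void *arg) {
    (void)arg;
    for (int n = 0; n < ITEMS; n++) {
        sem_wait(&full_slots);                /* buffer empty: wait for production */
        pthread_mutex_lock(&buffer_lock);
        int y = buffer[--i];
        pthread_mutex_unlock(&buffer_lock);
        sem_post(&empty_slots);               /* signal producer */
        consumed_sum += y;                    /* Consume y */
    }
    return NULL;
}

long run_producer_consumer(void) {
    pthread_t p, c;
    i = 0;
    consumed_sum = 0;
    sem_init(&empty_slots, 0, N);
    sem_init(&full_slots, 0, 0);
    int rc = pthread_create(&p, NULL, producer, NULL);
    assert(rc == 0);
    rc = pthread_create(&c, NULL, consumer, NULL);
    assert(rc == 0);
    (void)rc;
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&empty_slots);
    sem_destroy(&full_slots);
    return consumed_sum;          /* safe to read: both threads have been joined */
}
```

Because the buffer is used like a stack (`i++` and `--i`), items may be consumed out of production order; the sum of the consumed values is the same either way.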

THREADS

Thread

• Thread of control in a process
• 'Lightweight process'
• Weight related to
  • Time for creation
  • Time for context switch
  • Size of context
• Recall: Process as a Data Structure

Process as a Data Structure

• What is the data manipulated by these process operations?

1. Text, Data, Stack, Heap

2. Data stored in hardware

3. Other information maintained by the OS
  • Process, parent and user identifiers
  • Memory management information: Page table
  • CPU time used by the process, in user/system
  • File related info: Open files, file pointers

PROCESS CONTEXT

Thread Implementation

• Could either be supported in the operating system or by a library
• Pthreads: POSIX thread library
  • int pthread_create(pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg)
  • pthread_attr
  • pthread_join
  • pthread_exit
  • pthread_detach
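A short sketch of creating a thread and collecting its result with Pthreads. The helper names `square` and `square_in_thread` are hypothetical, and passing `NULL` for the attributes simply requests the defaults.

```c
#include <assert.h>
#include <pthread.h>

/* Start routine: takes a void * argument and returns a void *. */
static void *square(void *arg) {
    int *value = arg;
    *value = (*value) * (*value);
    return arg;                   /* handed back to whoever joins this thread */
}

int square_in_thread(int n) {
    pthread_t tid;
    void *result = NULL;

    /* NULL attributes: use the default thread attributes. */
    int rc = pthread_create(&tid, NULL, square, &n);
    assert(rc == 0);

    /* Wait for the thread to finish and pick up its return value. */
    rc = pthread_join(tid, &result);
    assert(rc == 0);
    (void)rc;

    return *(int *)result;
}
```

Passing the address of the local `n` is safe here only because `square_in_thread` joins the thread before returning; a detached thread would need storage that outlives the caller.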


Synchronization Primitives

Mutex locks

int pthread_mutex_lock(pthread_mutex_t *mutex)

If the mutex is already locked, the calling thread blocks until the mutex becomes available. Returns with the mutex object referenced by mutex in the locked state with the calling thread as its owner.

pthread_mutex_unlock

Semaphores

sem_init

sem_wait

sem_post
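As an illustration of the mutex calls, the sketch below protects a shared counter that several threads increment. The names `counter`, `counter_lock`, `add_many` and `count_with_mutex`, and the thread and iteration counts, are assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>

#define NTHREADS 4                /* illustrative */
#define INCREMENTS 100000         /* illustrative */

static long counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

static void *add_many(void *arg) {
    (void)arg;
    for (int k = 0; k < INCREMENTS; k++) {
        pthread_mutex_lock(&counter_lock);    /* blocks if another thread holds it */
        counter++;                            /* critical section */
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

long count_with_mutex(void) {
    pthread_t t[NTHREADS];
    counter = 0;
    for (int n = 0; n < NTHREADS; n++) {
        int rc = pthread_create(&t[n], NULL, add_many, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int n = 0; n < NTHREADS; n++)
        pthread_join(t[n], NULL);
    return counter;               /* all threads joined, so no lock is needed */
}
```

Without the lock and unlock calls the increments could interleave and the final count would typically fall short of `NTHREADS * INCREMENTS`.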

Basic Computer Organization

[Figure: block diagram of a computer showing the CPU (Control, ALU, Registers), MMU, Cache, Memory, Bus and I/O devices]

Performance of Processor

• Which is more important?
  • execution time of a single instruction
  • throughput of instruction execution, i.e., number of instructions executed per unit time
• Cycles Per Instruction (CPI)
  • Current ideas: CPI between 3 and 5
• Pipelining
  • Why keep Fetch hardware idle while instruction is being decoded?
  • Inspired by petroleum pipelines?

Inside the Processor

[Figure: five-stage datapath with Instruction Fetch (IF), Instruction Decode (ID), Execution (EX), Memory (MEM) and Write Back (WB); labelled components: PC, NPC, Mem, IR, Reg File, sign extend, A, B, Imm, ALU, ALU out, Zero?, Cond, LMD]

Processor Pipelining

(one row per instruction i; time in clock cycles runs left to right)

IF  ID  EX  MEM WB
    IF  ID  EX  MEM WB
        IF  ID  EX  MEM WB
            IF  ID  EX  MEM WB

• Execution time of each instruction is still 5 cycles, but the throughput is now 1 instruction per cycle
• Initial pipeline fill time (4 cycles), after which 1 instruction completes every cycle
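The fill-time arithmetic can be checked with a small calculation. This sketch assumes an ideal pipeline with no stalls or hazards, in which a k-stage pipeline finishes n instructions in k + (n - 1) cycles; the function names are hypothetical.

```c
#include <assert.h>

/* Cycles to run n instructions on an ideal k-stage pipeline (no stalls):
   k cycles for the first instruction, then one more per extra instruction. */
long pipelined_cycles(long n, long k) {
    assert(n >= 0 && k >= 1);
    return (n == 0) ? 0 : k + (n - 1);
}

/* Cycles without pipelining: each instruction takes all k cycles by itself. */
long unpipelined_cycles(long n, long k) {
    assert(n >= 0 && k >= 1);
    return n * k;
}
```

For the four instructions and five stages of the diagram this gives 8 cycles instead of 20, and for long instruction streams the ratio approaches the number of stages.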