High Performance Computing Notes, Study notes of Computational Methods

HPC Notes. Introduces HPC basics

Typology: Study notes

2018/2019

Uploaded on 04/07/2019

loukit-khemka

Most high performance systems are based on Reduced Instruction Set Computer (RISC) processors.
High performance RISC processors are designed to be easily inserted into a multiple-processor
system with 2 to 64 CPUs accessing a single memory using symmetric multiprocessing (SMP).
Each processor is very powerful, and a small number of processors can be put into a single
enclosure. Often applications need to span multiple enclosures. In such cases, the enclosures are
linked with a high-speed network to function as a network of workstations (NOW). The machines in
a NOW can be used individually through a batch queueing system, or together as a large
multicomputer using a message passing tool such as Parallel Virtual Machine (PVM) or the
Message Passing Interface (MPI).
The scalable parallel processing systems with hundreds or thousands of processors come in two
flavours. The first is programmed using message passing. Its processors are connected using a
proprietary, scalable, high-bandwidth, low-latency interconnect. Because of the high performance
interconnect, these systems can scale to thousands of processors while keeping the time spent
on communication overhead to a minimum.
The second type of large parallel processing system is the scalable non-uniform memory
access (NUMA) system. These systems also use a high performance interconnect to implement a
distributed shared memory that can be accessed from any processor using a load/store paradigm.
Programming them is similar to programming SMP systems, except that some areas of memory have
slower access than others.
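The effect of the slower memory areas can be estimated with a weighted average. The sketch below is only an illustration: the function name and the two latency figures are hypothetical and are not taken from any real machine.

```python
def average_access_ns(local_fraction, local_ns, remote_ns):
    """Expected latency of one memory access when a given fraction is local."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, chosen only for illustration.
LOCAL_NS = 100.0   # memory attached to the processor making the access
REMOTE_NS = 300.0  # memory reached through the interconnect

for fraction in (1.0, 0.9, 0.5):
    print(fraction, average_access_ns(fraction, LOCAL_NS, REMOTE_NS))
```

Under these assumed numbers, the average access time rises as more accesses go to remote memory, which is why data placement matters on NUMA systems even though the programming model looks like SMP.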
High Performance Microprocessors:
A complex instruction set computer (CISC) instruction set is made of powerful primitives, close in
functionality to the primitives of high-level languages like C or FORTRAN. It captures the sense of
"don't do in software what you can do in hardware". RISC emphasizes low-level primitives, far
below the complexity of a high-level language. As a result, a RISC program generally needs more
machine instructions than a CISC program to compute the same result.
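The difference in instruction count can be seen with a toy machine. This is a minimal sketch: the opcodes (ADDM, LOAD, ADD, STORE), the register names, and the run function are hypothetical and do not correspond to any real instruction set.

```python
def run(program, memory):
    """Execute a list of instruction tuples; return (memory, instructions executed)."""
    mem = dict(memory)  # work on a copy so the caller's memory is untouched
    regs = {}
    count = 0
    for op, *args in program:
        count += 1
        if op == "LOAD":        # register <- memory
            reg, addr = args
            regs[reg] = mem[addr]
        elif op == "STORE":     # memory <- register
            reg, addr = args
            mem[addr] = regs[reg]
        elif op == "ADD":       # register <- register + register
            dst, a, b = args
            regs[dst] = regs[a] + regs[b]
        elif op == "ADDM":      # memory <- memory + memory (CISC-style)
            dst, a, b = args
            mem[dst] = mem[a] + mem[b]
        else:
            raise ValueError(f"unknown opcode {op}")
    return mem, count


memory = {"a": 2, "b": 3, "c": 0}

# CISC style: one powerful memory-to-memory instruction.
cisc_program = [("ADDM", "c", "a", "b")]

# RISC style: only LOAD and STORE touch memory; arithmetic uses registers.
risc_program = [
    ("LOAD", "r1", "a"),
    ("LOAD", "r2", "b"),
    ("ADD", "r3", "r1", "r2"),
    ("STORE", "r3", "c"),
]

cisc_mem, cisc_count = run(cisc_program, memory)
risc_mem, risc_count = run(risc_program, memory)
print(cisc_mem["c"], cisc_count)
print(risc_mem["c"], risc_count)
```

Both programs leave the same value in c, but the RISC-style version takes four simple instructions where the CISC-style version takes one complex one.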
Why CISC?
In the past, the design variables favoured CISC. 50 years ago, high-level language compilers didn't
generate the fastest code and they weren't thrifty with memory. Hence, programming was done in
assembly language. A good instruction set was both easy to use and powerful. "Powerful" instructions
accomplished a lot and saved the programmer from specifying many little steps, which made them
easy to use. An instruction that could roll all the steps of a complex operation, such as a do-loop, into
a single opcode was a plus, because it saved time and memory, and memory was precious.
Complex instructions saved time too. When a single instruction can perform several operations, the
overall number of instructions retrieved from memory can be reduced. Minimizing the number of
instructions was important because, with few exceptions, the machines of the late 1950s were very
sequential; not until the current instruction was completed did the computer initiate the process of
going out to memory to get the next instruction. Modern machines form a bucket brigade, passing
instructions in from memory and figuring out what they do on the way, so that there are fewer
gaps in processing.
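The gain from the bucket brigade can be estimated with a simple cycle count. The sketch below assumes an idealized machine in which every stage takes one clock cycle and nothing ever stalls; the five-stage depth and the function names are hypothetical.

```python
def sequential_cycles(n_instructions, n_stages):
    """Cycles when each instruction must finish before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions, n_stages):
    """Cycles when a new instruction enters the pipeline every cycle (no stalls)."""
    if n_instructions == 0:
        return 0
    # The first instruction needs n_stages cycles to pass through;
    # each later instruction completes one cycle after the previous one.
    return n_stages + (n_instructions - 1)


STAGES = 5  # hypothetical pipeline depth
for n in (1, 10, 100):
    print(n, sequential_cycles(n, STAGES), pipelined_cycles(n, STAGES))
```

For a long run of instructions, the idealized pipelined count approaches one cycle per instruction, while the strictly sequential machine pays the full number of stages every time.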
The assembly language programmers used the complicated machine instructions, but compilers
generally did not.
Fundamentals of RISC:
The following factors contributed to the growth of RISC:
(i) The number of transistors that could fit on a single chip was increasing. Eventually, one
would be able to fit all the components for a processor board onto a single chip.
(ii) Techniques like pipelining were being explored to improve performance. Variable-length
instructions and variable-length instruction execution times made implementing pipelines
more difficult.
(iii) As compilers improved, compiler writers found that well-optimized sequences of streamlined
instructions often outperformed the equivalent complicated multi-cycle instructions.
The RISC designers sought to create a high performance single-chip processor with a fast clock rate, which made it necessary to discard the existing CISC instruction sets and develop a new minimal instruction set that could fit on a single chip. For the first generation of RISC chips, the restrictions on the number of components that could be manufactured on a single chip were severe, forcing the designers to leave out hardware support for some instructions, such as floating-point operations and integer multiply. These instructions could instead be implemented as software routines that combined other instructions.
The earliest RISC processors were not successful, for the following reasons:

  • It took time for compilers, OSs, and user software to be retuned to take advantage of the new processors.
  • If an application depended on the performance of one of the software-implemented instructions, its performance suffered dramatically.
  • Because RISC instructions were simpler, more instructions were needed to accomplish a task.
  • Because all RISC instructions were 32 bits long, while commonly used CISC instructions were as short as 8 bits, RISC program executables were often larger.
Because of the last two issues, a RISC program may have to fetch more memory for its instructions than a CISC program. This clogged the memory bottleneck until sufficient caches were added to the RISC processors. Caches reduced the number of instructions that had to be loaded from memory.
RISC processors eventually became successful as the amount of logic available on a single chip increased and floating-point operations were added back onto the chip. Some of the additional logic was used to add on-chip cache, which relieved some of the memory bottleneck caused by the larger appetite for instruction memory.
Characterizing RISC:
The features commonly found in RISC are:
  • Instruction pipelining
  • Pipelined floating-point execution
  • Uniform instruction length
  • Delayed branching
  • Load/store architecture
  • Simple addressing modes
Both CISC and RISC have much in common: each uses registers, memory, etc. Many CISC machines also use caches and instruction pipelines. Focusing on a smaller set of less powerful instructions gave RISC its speed advantage. RISC processors have features such as functional pipelines, sophisticated memory systems, and the ability to issue two or more instructions per clock, making them among the most complicated machines ever built. A good optimizing compiler is a prerequisite for machine performance.
Instruction Pipelines:
Everything within a digital computer happens in step with a clock: a signal that paces the computer's circuitry. The rate of the clock determines the overall speed of the computer. The parameters that place an upper limit on the clock speed include semiconductor technology, packaging, the length of the wires tying the pieces together, and the longest path in the processor.
Reducing the number of clock ticks it takes to execute an individual instruction is a good idea, but cost and practicality become issues beyond a certain point. A greater benefit comes from partially overlapping instructions so that more than one can be in progress simultaneously. The most obvious approach is to start the instructions simultaneously, but this needs hardware for two instructions in a situation where space is usually at a premium. Instead, consider the following scenario: if, a moment after launching one operation, you could