UMA, NUMA and Cluster System-Parallel Processing-Assignments, Exercises of Parallel Computing and Programming

This assignment was set by Prof. Rasul Rangarajan at Deenbandhu Chhotu Ram University of Science and Technology for the Parallel Processing course. It covers: Processing Speed, Number, Cores, Family, Operating, System, Interconnect, Application, Area, Vendor, UMA, NUMA, Cluster

Typology: Exercises

2011/2012

Uploaded on 07/23/2012

parama

Question #01 [36 marks]

Consult the latest 10 Top500 lists of the fastest computers and tabulate the name and count of the best choice under each of the following categories:

1- Processing Speed
2- Number of Cores
3- Processor Family
4- Operating System Family
5- Interconnect Family
6- Application Area
7- Vendor
8- Country

ANSWER:

| Computer | List | P.speed | # of cores | P.family | OS family | IC family | App.area | Vendor | Country |
|---|---|---|---|---|---|---|---|---|---|
| K-computer | 11/2011 | 2.0 GHz | 705024 | Fujitsu cluster | Linux | Tofu-custom | Research | Fujitsu | Japan |
| K-computer | 06/2011 | 2.0 GHz | 705024 | Fujitsu cluster | Linux | Tofu-custom | Research | Fujitsu | Japan |
| Tianhe-1A | 11/2010 | Xeon X5670 6C 2.93 GHz | 186368 | NUDT YH MPP | Linux | Proprietary | Research | NUDT | China |
| Jaguar | 06/2010 | Opteron 6-Core 2.6 GHz | 224162 | Cray XT5-HE | Linux | Proprietary | Research | Cray Inc. | United States |
| Jaguar | 11/2009 | Opteron 6-Core 2.6 GHz | 224162 | Cray XT5-HE | Linux | Proprietary | Research | Cray Inc. | United States |
| Roadrunner | 06/2009 | PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz | 129600 | BladeCenter QS22/LS Cluster | Linux | Infiniband | Research | IBM | United States |
| Roadrunner | 11/2008 | PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz | 129600 | BladeCenter QS22/LS Cluster | Linux | Infiniband | Research | IBM | United States |
| Roadrunner | 06/2008 | PowerXCell 8i 3.2 GHz / Opteron DC 1.8 GHz | 129600 | BladeCenter QS22/LS Cluster | Linux | Infiniband | Research | IBM | United States |
| BlueGene/L | 11/2007 | 700 MHz | 212992 | Power | CNK/SLES 9 | Proprietary | Research | IBM | United States |
| BlueGene/L | 06/2007 | 700 MHz | 212992 | Power | CNK/SLES 9 | Proprietary | Research | IBM | United States |

Analysis: The table shows that before 2010 the United States held the top position, but it was then overtaken by China, when NUDT introduced Tianhe-1A, and by Japan. Fujitsu is now the dominant vendor: its K-computer is four times as powerful as its nearest competitor and is also one of the most energy-efficient systems.
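The question also asks for a count under each category. A minimal sketch of that tally, assuming the ten rows are typed in by hand from the table (only four of the columns are included here, and the names `ROWS`, `FIELDS` and `tally` are illustrative):

```python
from collections import Counter

# Rows copied from the answer table: (computer, list, vendor, country).
ROWS = [
    ("K-computer", "11/2011", "Fujitsu", "Japan"),
    ("K-computer", "06/2011", "Fujitsu", "Japan"),
    ("Tianhe-1A", "11/2010", "NUDT", "China"),
    ("Jaguar", "06/2010", "Cray Inc.", "United States"),
    ("Jaguar", "11/2009", "Cray Inc.", "United States"),
    ("Roadrunner", "06/2009", "IBM", "United States"),
    ("Roadrunner", "11/2008", "IBM", "United States"),
    ("Roadrunner", "06/2008", "IBM", "United States"),
    ("BlueGene/L", "11/2007", "IBM", "United States"),
    ("BlueGene/L", "06/2007", "IBM", "United States"),
]

# Column positions inside each row tuple (illustrative helper).
FIELDS = {"computer": 0, "list": 1, "vendor": 2, "country": 3}


def tally(rows, field):
    """Count how many of the lists each value of `field` topped."""
    return Counter(row[FIELDS[field]] for row in rows)


if __name__ == "__main__":
    for field in ("computer", "vendor", "country"):
        print(field, tally(ROWS, field).most_common())
```

With these rows, the United States appears seven times, Japan twice and China once; by vendor, IBM appears five times, Fujitsu and Cray Inc. twice each, and NUDT once.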

Question #02 [24 marks]

Draw the block diagram of the most powerful pipeline, superscalar and VLIW processors. Identify the major architectural features of each.

1- Pipeline:

The block diagram shows a 15's pipeline. The pipeline starts with a 5-stage fetch phase, in which instructions are fetched from L1. They then move to the decode phase, where they are decoded into micro-ops. After the decode phase, registers are assigned and the instructions are dispatched. The decode phase contains a loop cache that stores the instructions that make up a loop kernel.
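The benefit of a deep pipeline can be illustrated with the standard ideal-pipeline timing model. This is a rough sketch only: it assumes one instruction enters the pipeline per cycle with no stalls, and the depth of 15 stages used in the example call is an illustrative value, not a figure confirmed by the text.

```python
def pipeline_cycles(n_instructions, n_stages):
    """Cycles to finish n instructions on an ideal k-stage pipeline.

    The first instruction needs n_stages cycles to drain through the
    pipeline; every later instruction completes one cycle after the
    previous one (no stalls or flushes are modelled).
    """
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1


def ideal_speedup(n_instructions, n_stages):
    """Speedup over an unpipelined design taking n_stages cycles each."""
    unpipelined = n_instructions * n_stages
    return unpipelined / pipeline_cycles(n_instructions, n_stages)


if __name__ == "__main__":
    # Illustrative depth; substitute the real stage count of the processor.
    print(pipeline_cycles(100, 15), round(ideal_speedup(100, 15), 2))
```

As the instruction count grows, the ideal speedup approaches the number of stages, which is why branch mispredictions (which flush the pipeline) cost more on deeper designs.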

3- VLIW

The block diagram below shows the architecture of the TMS320C6474, which is built on 65 nm process technology and is the most powerful version of these SoCs. The SoC in the diagram is capable of delivering 3.6 GHz of total raw DSP processing power, with performance of up to 28,800 million instructions per second.
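The two headline figures are consistent with each other, as a short calculation shows. The per-core values below are not stated in the text; they are assumptions taken from the usual description of this device (three DSP cores at 1.2 GHz, each issuing up to eight instructions per cycle) and should be checked against the datasheet.

```python
# Assumed device parameters (not given in the text above).
CORES = 3          # assumption: number of DSP cores on the SoC
CLOCK_MHZ = 1200   # assumption: clock of each core in MHz
ISSUE_WIDTH = 8    # assumption: instructions per VLIW packet per cycle


def aggregate_clock_ghz(cores, clock_mhz):
    """Sum of the core clocks, the 'total raw DSP processing power'."""
    return cores * clock_mhz / 1000


def peak_mips(cores, clock_mhz, issue_width):
    """Peak million instructions per second when every slot is filled."""
    return cores * clock_mhz * issue_width


if __name__ == "__main__":
    print(aggregate_clock_ghz(CORES, CLOCK_MHZ), "GHz")
    print(peak_mips(CORES, CLOCK_MHZ, ISSUE_WIDTH), "MIPS")
```

The peak figure is reached only when the compiler fills all issue slots in every cycle, since a VLIW processor relies on static scheduling rather than hardware to find parallel instructions.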

Question #03 [40 marks]

Choose one of the most popular/powerful parallel processors built by industrial/research groups in each of the following categories, and give the architectural block diagram and main features of each:

1- UMA
2- NUMA
3- MPP
4- Cluster System

ANSWER:

1- UMA

Xeon processors use a UMA (uniform memory access) architecture. The processors are connected to an external memory controller through a bus. It is UMA in the sense that every socket sees the same latency when accessing any memory location. In the latest chipsets, a dedicated connection is provided to each socket.

2- NUMA

Opteron uses an integrated memory controller, and each socket connects to the other sockets through direct HyperTransport links. This arrangement is NUMA (non-uniform memory access). Other key features:

• Increased HT3 bandwidth
• 20%-50% higher performance than the Quad-Core AMD Opteron processor
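The difference between the two memory organisations can be sketched as a toy latency model. All latency values here are hypothetical placeholders chosen for illustration, not measurements of Xeon or Opteron systems, and the model ignores caches and multi-hop routes.

```python
def uma_latency_ns(socket, memory_node, base_ns=100):
    """UMA: every socket reaches every memory location at the same cost."""
    return base_ns


def numa_latency_ns(socket, memory_node, local_ns=70, hop_ns=40):
    """NUMA: memory attached to the requesting socket is cheaper.

    A remote access pays one extra inter-socket link traversal
    (hypothetical single-hop model).
    """
    if socket == memory_node:
        return local_ns
    return local_ns + hop_ns


if __name__ == "__main__":
    for sock, node in [(0, 0), (0, 1)]:
        print(sock, node, uma_latency_ns(sock, node), numa_latency_ns(sock, node))
```

In practice this is why NUMA-aware software tries to keep a thread and the memory it uses on the same socket.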

3- Cluster system:

The diagram below shows a cluster system based on Tile64. A Tile64 card comprises 64 processor cores, each running its own Linux OS and communicating through the Tilera API. Tile64 is a system on a chip that can be plugged into a PCI slot and used independently of the host CPU. Each tile is a complete, full-featured processor, including integrated L1 and L2 caches and a non-blocking switch that connects the tile into the mesh.

Applications of this card include advanced networking and digital video.
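How far apart two tiles are on the mesh can be estimated with a hop count. This is a sketch under stated assumptions: the 64 tiles are taken to form an 8 x 8 grid, tiles are numbered row by row (a hypothetical numbering), and a message travels along rows and columns only, so the distance is the Manhattan distance.

```python
MESH_SIDE = 8  # assumption: 64 tiles arranged as an 8 x 8 grid


def tile_position(tile_id, side=MESH_SIDE):
    """Map a tile id (0..side*side-1, row-major) to (row, column)."""
    if not 0 <= tile_id < side * side:
        raise ValueError("tile id out of range")
    return divmod(tile_id, side)


def mesh_hops(src, dst, side=MESH_SIDE):
    """Number of switch-to-switch links between two tiles on the mesh."""
    src_row, src_col = tile_position(src, side)
    dst_row, dst_col = tile_position(dst, side)
    return abs(src_row - dst_row) + abs(src_col - dst_col)


if __name__ == "__main__":
    print(mesh_hops(0, 63))  # opposite corners of the grid
```

Under these assumptions the worst case is corner to corner, 14 hops, so communication cost depends on where cooperating tasks are placed on the chip.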

4- MPP:

• The MPP consists of an array control unit, stage memory, a host processor and an array unit, as shown in the diagram below.
• The stage memory is 32 MB.
• The array unit has 16,384 processing elements (PEs). The array control unit sends commands to the PEs in the array unit.
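The relationship between the array control unit and the PEs can be sketched as a broadcast loop: one command is issued and every enabled PE applies it to its own data. This is an illustrative model only, not the machine's real instruction set; it assumes a square grid of 128 x 128 = 16,384 elements, and the class and method names are hypothetical.

```python
class ArrayUnit:
    """Toy model of a SIMD array: one integer of local state per PE."""

    def __init__(self, side=128):
        # Assumption: PEs form a square grid, 128 x 128 = 16,384 elements.
        self.side = side
        self.pes = [0] * (side * side)

    def broadcast(self, command, mask=None):
        """Apply the same command to every PE whose mask entry is true.

        `command` is a function from the PE's old value to its new value;
        `mask` (optional) is a list of booleans, one per PE.
        """
        for i, value in enumerate(self.pes):
            if mask is None or mask[i]:
                self.pes[i] = command(value)


if __name__ == "__main__":
    array = ArrayUnit()
    array.broadcast(lambda v: v + 1)  # every PE executes the same step
    print(len(array.pes), sum(array.pes))
```

The loop here runs sequentially; on the real hardware all PEs would execute the broadcast command in the same cycle, which is the source of the machine's parallelism.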