Data Center Network Topologies:
FatTree
Hakim Weatherspoon
Assistant Professor, Dept of Computer Science
CS 5413: High Performance Systems and Networking
September 22, 2014
Slides used and adapted judiciously from Networking Problems in Cloud Computing
EECS 395/495 at Northwestern University

Goals for Today

  • A Scalable, Commodity Data Center Network Architecture
    • M. Al-Fares, A. Loukissas, A. Vahdat. ACM SIGCOMM Computer Communication Review (CCR), Volume 38, Issue 4 (October 2008), pages 63-74.
  • Main goal: address the limitations of today's data center network architecture
    • Single point of failure
    • Oversubscription of links higher up in the topology
      • Trade-offs between cost and provisioning
  • Key design considerations/goals
    • Allows host communication at line speed
      • No matter where the hosts are located!
    • Backwards compatible with existing infrastructure
      • No changes to applications; supports layer 2 (Ethernet)
    • Cost effective
      • Cheap infrastructure
      • Low power consumption and heat emission

Topology:

  • 2 layers: 5K to 8K hosts
  • 3 layers: >25K hosts
  • Switches:
    • Leaves: N GigE ports (48-288) plus N 10 GigE uplinks to one or more layers of network elements
    • Higher levels: N 10 GigE ports (32-128)

Multi-path Routing:

  • Example: ECMP (equal-cost multi-path)
    • Without it, the largest cluster = 1,280 nodes
    • Performs static load splitting among flows
    • Leads to oversubscription for simple communication patterns
    • Routing table entries grow multiplicatively with the number of paths, increasing cost and lookup latency
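The "static load splitting among flows" that ECMP performs can be sketched as a hash over a flow's header fields. This is a minimal illustration, not the hash any particular switch uses: the function name `ecmp_next_hop`, the choice of CRC32, and the example flows are all assumptions made for the sketch.

```python
import zlib

def ecmp_next_hop(flow, num_paths):
    """Pick an uplink index by hashing the flow's 5-tuple (hypothetical helper)."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_paths

# Hypothetical flows: (src_ip, dst_ip, protocol, src_port, dst_port)
flow_a = ("10.0.1.2", "10.2.0.3", "tcp", 40000, 80)
flow_b = ("10.0.1.3", "10.2.0.2", "tcp", 40001, 80)

# The same flow always hashes to the same path, so its packets stay in order.
assert ecmp_next_hop(flow_a, 4) == ecmp_next_hop(flow_a, 4)

# The choice ignores flow size and current load, so two large flows may land
# on the same uplink while other uplinks sit idle.
print(ecmp_next_hop(flow_a, 4), ecmp_next_hop(flow_b, 4))
```

Because the mapping is fixed per flow and blind to load, collisions between long-lived flows are what lead to oversubscription even for simple communication patterns.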

Background

[Figure: a three-tier data center between the Internet and the servers: Core (Layer-3 routers), Aggregation (Layer-2/3 switches), Access (Layer-2 switches)]

Common Data Center Topology

  • Leverages specialized hardware and communication protocols, such as InfiniBand and Myrinet
    • These solutions can scale to clusters of thousands of nodes with high bandwidth
    • Expensive infrastructure, incompatible with TCP/IP applications
  • Leverages commodity Ethernet switches and routers to interconnect cluster machines
    • Backwards compatible with existing infrastructure, low cost
    • Aggregate cluster bandwidth scales poorly with cluster size, and achieving the highest levels of bandwidth incurs a non-linear cost increase with cluster size

Current Data Center Network Architectures

  • Single point of failure
  • Oversubscription of links higher up in the topology
    • Trade-off between cost and provisioning
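Oversubscription is usually quoted as the ratio of worst-case host demand to the uplink capacity available to carry it; 1:1 means every host could send at full rate at once. The arithmetic can be sketched as follows. The port counts are hypothetical and are not taken from the slides.

```python
def oversubscription_ratio(host_ports, host_gbps, uplinks, uplink_gbps):
    """Worst-case host demand divided by available uplink capacity."""
    return (host_ports * host_gbps) / (uplinks * uplink_gbps)

# Hypothetical edge switch: 48 x 1 GigE host ports, 2 x 10 GigE uplinks.
ratio = oversubscription_ratio(48, 1, 2, 10)
print(f"{ratio:.1f}:1")  # 2.4:1
```

Lowering the ratio means buying more or faster uplinks, which is the cost-versus-provisioning trade-off noted above.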

Problems with common DC Topology

Properties of the solution

  • Backwards compatible with existing infrastructure
    • No changes in application
    • Support of layer 2 (Ethernet)
  • Cost effective
    • Low power consumption & heat emission
    • Cheap infrastructure
  • Allows host communication at line speed

Clos Networks/Fat-Trees

  • Adopt a special instance of a Clos topology
  • Similar trends in telephone switches led to designing a topology with high bandwidth by interconnecting smaller commodity switches
  • Why Fat-Tree?
    • A fat tree has identical bandwidth at any bisection
    • Each layer has the same aggregate bandwidth
  • Can be built using cheap devices with uniform capacity
    • Each port supports the same speed as an end host
    • All devices can transmit at line speed if packets are distributed uniformly along available paths
  • Great scalability: a k-port switch supports k^3/4 servers

Fat tree network with K = 6 supporting 54 hosts
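The k^3/4 figure follows from the structure of a three-level fat tree built from k-port switches: k pods, each with k/2 edge and k/2 aggregation switches, k/2 hosts per edge switch, and (k/2)^2 core switches. A short sketch of that counting, with `fat_tree_size` as a hypothetical helper name:

```python
def fat_tree_size(k):
    """Component counts for a three-level fat tree of k-port switches."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be an even number of ports, at least 2")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 per pod
        "aggregation_switches": k * half,  # k/2 per pod
        "core_switches": half * half,      # (k/2)^2
        "hosts": k * half * half,          # k^3 / 4
    }

print(fat_tree_size(6)["hosts"])   # 54, matching the K = 6 figure
print(fat_tree_size(48)["hosts"])  # 27648
```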

FatTree-based DC Architecture

Is using a fat-tree topology to interconnect racks of servers in itself sufficient?

  • What routing protocols should we run on these switches?
  • Layer 2 switch algorithm: data plane flooding!
  • Layer 3 IP routing:
    • Shortest-path IP routing will typically use only one path despite the path diversity in the topology
    • If equal-cost multi-path routing is used at each switch independently and blindly, packet re-ordering may occur; furthermore, load may not necessarily be well balanced
  • Aside: control plane flooding!

FatTree Topology is great, But…

  • Enforce a special (IP) addressing scheme in the DC
    • unused.PodNumber.SwitchNumber.EndHost
  • Allows hosts attached to the same switch to route only through that switch
  • Allows intra-pod traffic to stay within the pod
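The addressing scheme can be sketched by enumerating host addresses for a small fat tree. Two details here are assumptions borrowed from the convention in the Al-Fares et al. paper rather than from the slide: the unused first octet is set to 10, and host IDs run from 2 to k/2 + 1. The helper names are hypothetical.

```python
def host_addresses(k, first_octet=10):
    """Yield unused.pod.switch.host addresses for a k-ary fat tree."""
    half = k // 2
    for pod in range(k):
        for switch in range(half):               # edge switches in the pod
            for host_id in range(2, half + 2):   # assumed ID range 2..k/2+1
                yield f"{first_octet}.{pod}.{switch}.{host_id}"

def same_edge_switch(a, b):
    """True if both hosts share pod and switch octets."""
    return a.split(".")[1:3] == b.split(".")[1:3]

def same_pod(a, b):
    """True if both hosts share the pod octet."""
    return a.split(".")[1] == b.split(".")[1]

addrs = list(host_addresses(4))
print(len(addrs), addrs[0], addrs[-1])  # 16 10.0.0.2 10.3.1.3
```

Because the pod and switch are encoded in the address, a switch can tell from the destination alone whether traffic stays on the same edge switch, stays inside the pod, or must go up to the core.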

FatTree Modified

  • Use two-level lookups to distribute traffic and maintain packet ordering
  • First level is a prefix lookup
    • Used to route down the topology to servers
  • Second level is a suffix lookup
    • Used to route up towards the core
    • Maintains packet ordering by using the same port for the same server
    • Diffuses and spreads out traffic
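A two-level lookup can be sketched as a prefix match followed by a fallback suffix match on the host octet. The tables below are hypothetical entries for one aggregation switch in pod 2 of a k = 4 fat tree, using assumed 10.pod.switch.host addresses; real switches would do this in hardware rather than with a Python loop.

```python
import ipaddress

def two_level_lookup(dst, prefix_table, suffix_table):
    """Return an output port: prefix match first, then suffix on host ID."""
    addr = ipaddress.ip_address(dst)
    best = None
    for network, port in prefix_table:
        net = ipaddress.ip_network(network)
        # First level: keep the longest matching prefix (routes down).
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, port)
    if best is not None:
        return best[1]
    # Second level: match on the last octet (routes up towards the core).
    return suffix_table[int(addr) & 0xFF]

# Hypothetical tables: ports 0-1 face edge switches, ports 2-3 face the core.
prefix_table = [("10.2.0.0/24", 0), ("10.2.1.0/24", 1)]
suffix_table = {2: 2, 3: 3}

print(two_level_lookup("10.2.1.3", prefix_table, suffix_table))  # 1
print(two_level_lookup("10.0.1.2", prefix_table, suffix_table))  # 2
print(two_level_lookup("10.3.0.3", prefix_table, suffix_table))  # 3
```

The upward port depends only on the destination host ID, so all packets to one server leave on the same port (preserving order) while different destinations are spread across the uplinks.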

FatTree Modified

  1. Flow scheduling: pay attention to routing large flows. Edge switches detect any outgoing flow whose size grows above a predefined threshold and then send a notification to a central scheduler. The central scheduler tries to assign non-conflicting paths to these large flows.

  • Eliminates global congestion
  • Prevents long-lived flows from sharing the same links
  • Assigns long-lived flows to different links
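One way to picture the central scheduler is as a greedy first-fit placement of large flows onto core switches. This is a simplified sketch, not the scheduler from the paper: it models only the pod-to-core and core-to-pod links, ignores the edge-to-aggregation hop, and the name `assign_large_flows` and the flow tuples are hypothetical.

```python
def assign_large_flows(flows, num_cores):
    """Place each large flow on the first core whose links are unreserved."""
    reserved = set()   # links already held by a large flow
    placement = {}
    for flow_id, src_pod, dst_pod in flows:
        for core in range(num_cores):
            up = ("up", src_pod, core)       # source pod to core link
            down = ("down", core, dst_pod)   # core to destination pod link
            if up not in reserved and down not in reserved:
                reserved.update((up, down))
                placement[flow_id] = core
                break
        else:
            # No conflict-free path found; leave the flow on default routing.
            placement[flow_id] = None
    return placement

# Hypothetical large flows reported by edge switches: (id, src_pod, dst_pod)
flows = [("f1", 0, 1), ("f2", 0, 2), ("f3", 3, 1)]
print(assign_large_flows(flows, 4))  # {'f1': 0, 'f2': 1, 'f3': 1}
```

Here f2 avoids core 0 because f1 already holds pod 0's uplink to it, and f3 avoids core 0 because f1 holds that core's downlink into pod 1.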

FatTree Modified

  • In this scheme, each switch in the network maintains a BFD (Bidirectional Forwarding Detection) session with each of its neighbors to determine when a link or neighboring switch fails
  • Failure between upper-layer and core switches
    • Outgoing inter-pod traffic: the local routing table marks the affected link as unavailable and chooses another core switch
    • Incoming inter-pod traffic: the core switch broadcasts a tag to the upper-layer switches directly connected to it, signifying its inability to carry traffic to that entire pod; the upper-layer switches then avoid that core switch when assigning flows destined to that pod

Fault Tolerance
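The two failure cases between upper-layer and core switches can be sketched as state kept on an upper-layer switch. BFD detection itself is not modeled; the class and method names are hypothetical, and the callbacks stand in for whatever notifications a real implementation would deliver.

```python
class UpperLayerSwitch:
    """Tracks which core switches are usable for each destination pod."""

    def __init__(self, core_uplinks):
        self.core_uplinks = list(core_uplinks)  # directly attached cores
        self.link_down = set()                  # locally detected failures
        self.unreachable = set()                # (core, pod) tags from cores

    def on_link_failure(self, core):
        # Outgoing case: mark the local link to this core as unavailable.
        self.link_down.add(core)

    def on_core_tag(self, core, pod):
        # Incoming case: the core announced it cannot reach this pod.
        self.unreachable.add((core, pod))

    def usable_cores(self, dst_pod):
        return [c for c in self.core_uplinks
                if c not in self.link_down
                and (c, dst_pod) not in self.unreachable]

sw = UpperLayerSwitch(["c0", "c1"])
sw.on_core_tag("c1", 3)
print(sw.usable_cores(3))  # ['c0']
print(sw.usable_cores(2))  # ['c0', 'c1']
```

A tag only removes a core for the affected pod, so traffic to other pods can keep using it.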