EyeQ: Practical Network Performance Isolation for the Multi-tenant Cloud

Vimalkumar Jeyakumar

Mohammad Alizadeh

David Mazières

Balaji Prabhakar

Stanford University

Changhoon Kim

Windows Azure

Abstract

The shared multi-tenant nature of the cloud has raised serious concerns about its security and performance for high-value services. Of the many shared resources such as CPU and memory, the network is pivotal for distributed applications. Benign, or perhaps malicious, traffic interference between tenants can significantly degrade application performance and hence impact revenue. Network performance isolation is particularly hard because of the distributed nature of the problem and the short (few-RTT) timescales at which contention manifests itself. This problem is further exacerbated by the large number of competing entities in the cloud and their volatile traffic patterns.

In this paper, we motivate the design of EyeQ, a system whose goal is to provide predictable network performance to tenants. The enabler for EyeQ is the availability of high bisection bandwidth in data centers. The key insight is that by leaving a headroom of (say) 10% of access link bandwidth, EyeQ reduces a potentially global contention problem to one that is mostly local, at the sender and receiver. This allows EyeQ to enforce predictable network sharing completely at the end hosts, with minimal support from the physical network.

1 Introduction

The shared, multi-tenant nature of cloud providers has raised concerns about their security and performance. Many cloud customers have reported the “noisy-neighbour” [2, 4] problem, where performance of the system is unpredictable if a colocated tenant tries to grab resources (CPU, disk, IO) disproportionately. While hypervisors are equipped with mechanisms to deal with CPU, memory and disk isolation, network isolation has attracted attention only recently. To date, commercial network offerings document only reachability isolation. But network performance isolation is particularly vital for scale-out distributed services, which have a demanding network component, unlike CPU-intensive jobs.

Figure 1: Screenshot of an offering from Amazon EC2. CPU, disk and memory are allocated in familiar units. In contrast, the units of “IO” are unclear.

Many cloud service providers today have some means of allocating CPU, memory and disk resources in familiar units such as “virtual” cores, GB of memory and disk capacity (Figure 1). When it comes to IO, which includes the network, the units are either absent or nebulous: “low, moderate and high.” Customers do not get a clear picture of their network resource guarantees, and are unable to cope with bad performance. Some of them either give up [3] on the cloud, citing bad performance, or significantly rearchitect their applications. For example, Netflix reported that their own data center networks offered good performance, which allowed them to build chatty applications; but on Amazon Web Services, they redesigned their infrastructure to deal with performance variability [1]. Our experience in talking to customers suggests that they would like to have predictable performance, as if the network allocated to them were dedicated.

Conventionally, network sharing has been enforced by making the network aware of competing classes of traffic, i.e., by configuring Class-of-Service queues exposed by many commodity switches. However, as noted by Shieh et al. [18], the number of queues in switches has

2 Why is network sharing hard?

3.3 EyeQ Architecture

4 Discussion and Future Work

5 Conclusion

Acknowledgments