







The shared, multi-tenant nature of the cloud has raised serious concerns about its security and performance for high-value services. Of the many shared resources, such as CPU and memory, the network is pivotal for distributed applications. Benign, or perhaps malicious, traffic interference between tenants can cause significant performance degradation that hurts the performance of applications and, hence, impacts their revenue. Network performance isolation is particularly hard because of the distributed nature of the problem and the short (few-RTT) timescales at which interference manifests itself. This problem is further exacerbated by the large number of competing entities in the cloud and their volatile traffic patterns. In this paper, we motivate the design of our system, called EyeQ, whose goal is to provide predictable network performance to tenants. The enabler for EyeQ is the availability of high bisection bandwidth in data centers. The key insight is that by leaving a headroom of (say) 10% of access link bandwidth, EyeQ turns what is potentially a global contention problem into one that is mostly local, at the sender and receiver. This allows EyeQ to enforce predictable network sharing completely at the end hosts, with minimal support from the physical network.
The shared, multi-tenant nature of cloud providers has raised concerns about their security and performance. Many cloud customers have reported the “noisy neighbour” [2, 4] problem, where the performance of the system is unpredictable if a colocated tenant tries to grab resources (CPU, disk, IO) disproportionately. While hypervisors are equipped with mechanisms to deal with CPU, memory and disk isolation, network isolation has attracted attention only recently. To date, commercial offerings of network performance document only reachability isolation. But network performance isolation is particularly vital for scale-out distributed services, which, unlike CPU-intensive jobs, have a demanding network component.
Figure 1: Screenshot of an offering from Amazon EC2. CPU, disk and memory are allocated in familiar units. In contrast, the unit of “IO” is unclear.
Many cloud service providers today have some means of allocating CPU, memory and disk resources in familiar units such as “virtual” cores, GB of memory and disk capacity (Figure 1). When it comes to IO, which includes the network, the units are either absent or nebulous: “low, moderate and high.” Customers do not get a clear picture of their network resource guarantees, and are unable to cope with bad performance. Some of them either give up [3] on the cloud, citing bad performance, or significantly rearchitect their applications. For example, Netflix reported that their own data center networks offered good performance, which allowed them to build chatty applications; but on Amazon Web Services, they redesigned their infrastructure to deal with performance variability [1]. Our experience in talking to customers suggests that they would like to have predictable performance, as if the network allocated to them were dedicated. Conventionally, network sharing has been enforced by making the network aware of competing classes of traffic, i.e., by configuring Class-of-Service queues exposed by many commodity switches. However, as noted by Shieh et al. [18], the number of queues in switches has
not evolved beyond 8 or 16 queues per port. Hence, we need a scalable mechanism to cope with the number of tenants in the cloud. In the rest of this paper, we focus on the problem of sharing the network at large scale: ∼10K tenants, ∼100K servers and ∼1M VMs. Providing predictable network performance (i.e., guarantees on bandwidth, latency and jitter) is hard because (a) unlike CPU, memory and disk, network contention is a distributed problem: contention can occur anywhere in the network; (b) the number of contending entities and their traffic characteristics are diverse: tens of thousands of tenants with different flavours of TCP and UDP; and (c) contention at very short timescales (a few milliseconds) can affect long-term performance (as shown in Figure 2).
To enable a customer to set clear expectations of network performance, we envision that a cloud provider should be able to provision network resources for its tenants, starting with bandwidth, in familiar units: bits per second. The provider assures each instance (VM) a guaranteed minimum bandwidth, so that the VM is able to transmit and receive at least the chosen capacity. The provider then supports a simple performance abstraction where the customer's ‘virtual’ network of VMs, in aggregate, performs as if the VMs were connected to a single switch with full bisection bandwidth. We asked ourselves, “What are the requirements of a mechanism that tries to enforce such guaranteed performance?” We arrived at the following requirements by talking to developers and operators of a cloud provider; the requirements are by no means complete, but some of them are also highlighted by other proposals [6, 18].
Related Work: Recently, hypervisor-based mechanisms have been proposed that mitigate the effect of many (possibly) malicious traffic patterns. Seawall [18] tackled the problem by tunneling packets through end-to-end congestion-controlled tunnels. On a link, this achieves weighted max-min fairness between multiple source VMs. The inherent problem of weighted sharing is that a tenant's bandwidth share depends on the unknown activity (i.e., the number of senders) of other active tenants; this violates our first requirement. Oktopus [6] also made the case for predictability in data center networks, and used an end-to-end mechanism combined with intelligent placement to provide strict bandwidth guarantees. However, Oktopus operated at fairly large timescales (2 seconds). As we will see in §2, it is important for performance isolation mechanisms to react quickly, within a few round-trip times. Moreover, Oktopus (and SecondNet [13]) statically reserve bandwidth, which violates our third requirement.
The rest of the paper is organized as follows. Section 2 further elaborates on the difficulties of sharing the network. Section 3 lays down some design choices that greatly simplify mechanisms for sharing network bandwidth, and Section 3.2 presents a preliminary design for enforcing predictable network performance. Finally, Section 4 concludes with some directions for future work.
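To make the predictability problem with weighted sharing concrete, here is a minimal sketch of weighted max-min fair allocation (water-filling) on a single bottleneck link. It is not Seawall's implementation. It assumes a 10 Gb/s link, that every sender VM has weight 1, and that all senders are backlogged; the helper names `weighted_max_min` and `tenant_shares` are hypothetical.

```python
from typing import Dict, Hashable, List, Tuple


def weighted_max_min(capacity: float,
                     demands: Dict[Hashable, float],
                     weights: Dict[Hashable, float]) -> Dict[Hashable, float]:
    """Weighted max-min fair split of one link's capacity (water-filling)."""
    alloc = {s: 0.0 for s in demands}
    active = {s for s, d in demands.items() if d > 0}
    remaining = capacity
    while active and remaining > 1e-12:
        # Capacity handed out per unit of weight in this round.
        per_weight = remaining / sum(weights[s] for s in active)
        # Senders whose unmet demand fits in their weighted share are capped at demand.
        satisfied = {s for s in active
                     if demands[s] - alloc[s] <= per_weight * weights[s]}
        if not satisfied:
            # Everyone left is bottlenecked: split what remains by weight and stop.
            for s in active:
                alloc[s] += per_weight * weights[s]
            break
        for s in satisfied:
            remaining -= demands[s] - alloc[s]
            alloc[s] = demands[s]
        active -= satisfied
    return alloc


def tenant_shares(capacity: float,
                  senders: List[Tuple[str, float]]) -> Dict[str, float]:
    """Aggregate per-sender allocations by tenant; each sender VM has weight 1."""
    demands = {i: demand for i, (_, demand) in enumerate(senders)}
    weights = {i: 1.0 for i in demands}
    alloc = weighted_max_min(capacity, demands, weights)
    shares: Dict[str, float] = {}
    for i, (tenant, _) in enumerate(senders):
        shares[tenant] = shares.get(tenant, 0.0) + alloc[i]
    return shares


if __name__ == "__main__":
    link = 10.0  # Gb/s, assumed link capacity
    print(tenant_shares(link, [("A", 10.0), ("B", 10.0)]))
    print(tenant_shares(link, [("A", 10.0)] + [("B", 10.0)] * 4))
```

With one sender per tenant, tenant A receives 5 Gb/s. If tenant B adds three more senders, A's share drops to 2 Gb/s even though A's own behaviour has not changed, which is exactly the dependence on other tenants' activity described above.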
In this section, we will first see why network sharing is a hard problem, and then discuss various scenarios in which performance interference can occur.
Distributed Nature: In a general network topology, contention (or congestion) for bandwidth and latency can happen at any hop inside the network. To cope with increasing application bandwidth demands, modern data center networks have very high bisection bandwidth. Even in such topologies, congestion can happen anywhere if the traffic does not conform to the hose model [11] (i.e., if the traffic matrix is non-admissible). This situation arises when any link in the network receives traffic at a sustained rate greater than its capacity. If the traffic matrix is admissible, then it is difficult to congest the network core, but it is easy to congest any single “port,” i.e., an end host. This happens if more than one host sends data simultaneously to a receiver (an N-to-1 traffic pattern).
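The admissibility condition can be checked mechanically. The sketch below assumes the hose model in its simplest form: each host has one access link whose transmit and receive capacity are both `capacity[h]`, and the traffic matrix lists sustained rates in Gb/s. A matrix is admissible when no host sends or receives more than its capacity. The function names and the four-host example are hypothetical.

```python
from typing import Dict, List, Tuple

TrafficMatrix = Dict[Tuple[str, str], float]  # (src, dst) -> sustained rate, Gb/s


def hose_violations(traffic: TrafficMatrix,
                    capacity: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Return (hosts sending above capacity, hosts receiving above capacity)."""
    tx: Dict[str, float] = {}
    rx: Dict[str, float] = {}
    for (src, dst), rate in traffic.items():
        tx[src] = tx.get(src, 0.0) + rate  # row sum: load on src's uplink
        rx[dst] = rx.get(dst, 0.0) + rate  # column sum: load on dst's downlink
    over_tx = sorted(h for h, load in tx.items() if load > capacity[h])
    over_rx = sorted(h for h, load in rx.items() if load > capacity[h])
    return over_tx, over_rx


def is_admissible(traffic: TrafficMatrix, capacity: Dict[str, float]) -> bool:
    """True if no host's access link is oversubscribed in either direction."""
    over_tx, over_rx = hose_violations(traffic, capacity)
    return not over_tx and not over_rx


if __name__ == "__main__":
    caps = {h: 10.0 for h in ("h1", "h2", "h3", "h4")}
    # N-to-1: three senders, each well within its own uplink, all targeting h4.
    incast = {("h1", "h4"): 5.0, ("h2", "h4"): 5.0, ("h3", "h4"): 5.0}
    print(hose_violations(incast, caps))
    print(is_admissible(incast, caps))
```

In the N-to-1 example every sender is within its 10 Gb/s uplink, yet h4's downlink is asked to carry 15 Gb/s, so the only violation is at the receiver's port rather than anywhere in the core.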
Traffic diversity: Many cloud environments host tens of thousands of tenants and allow them to bring their own code and OS. This lets customers tweak settings in their network stack, use different TCP implementations that have better performance, use multiple TCP connections to boost throughput, disable congestion control, or even use an unreactive UDP session to transmit data.
Figure 3: The difficulty of detecting access link contention on the receive path.
switch, it has been shown that a speedup of about 2 is necessary [8] to perfectly emulate an Output Queued Switch. In practice, it has been found that a smaller speedup (between 1.1 and 1.2) usually suffices [10] to obtain the same benefits as an Output Queued Switch. The purpose of speedup is to simultaneously ensure that (a) the fabric is not a bottleneck; and (b) contention is moved to the edge (i.e., the output queue), where it can be detected and resolved locally.
The above observation guides the design of our mechanism. Viewing the network as a giant switching fabric greatly simplifies a global network contention problem into a local one: contention for the transmit and receive bandwidth of the access link. Contention on the transmit side happens first within the end host, where it can be resolved by packet scheduling mechanisms at the VSwitch. Unfortunately, contention at the receiver first occurs at the access link, which is inside the network. As shown in Figure 3, in-network congestion can cause the network demands of a VM to go undetected. It is important that access link contention be detected quickly. To understand why, consider the example shown in Figure 2. When UDP bursts at 7Gbps, the access link saturates (at short timescales), leading to packet drops. Since the switch is not tenant-aware, it drops both TCP and UDP packets, to which TCP reacts aggressively by backing off exponentially. This ‘elasticity’ of TCP hides the true demand for bandwidth, and hence mechanisms that react at timescales larger than a few RTTs cannot differentiate between two cases: one where there is a genuine lack of demand, and the other where TCP has backed off.
Introducing a small speedup to the network fabric (over the edge links) mitigates contention in the network, moving it completely to the edge, i.e., the VSwitch port inside the hypervisor.
This allows the VSwitch to detect impending network congestion and accurately account for congestion on a per-VM basis. The speedup effectively gives headroom to prevent packet drops in the network, and allows the VSwitch to tease out the true network demand from the underlying TCP flow, without requiring any interaction with the VM's TCP stack. By measuring the rate at which each VM is receiving traffic at the congestion detectors, the VSwitch signals sources that exceed their fair share. Thus, the VSwitch can enforce rich bandwidth sharing policies, such as minimum, maximum, or weighted bandwidth guarantees, in an end-to-end fashion.
To achieve the effect of speedup, we slow down the end hosts by detecting congestion when the access link utilization exceeds some fraction γ (less than, but close to, 1). This bandwidth headroom of (1 − γ) may seem like a big price to pay, but in some cases it actually leads to better link utilization (as shown in Figure 5).
Figure 4: Overall system architecture. (Diagram labels: Network Fabric; VSwitch; TX datapath with rate limiters and a WRR scheduler; RX datapath with congestion detectors and a WRR scheduler; end-to-end flow control feedback from RX to TX; WRR blocks annotated 3G/6G and 3G/7G.)
Taming traffic diversity: Our notion of performance isolation translates to providing minimum end-to-end bandwidth guarantees to tenants. Thus, to be agnostic to the type of traffic, we treat packets between source-destination pairs as a single meta-flow whose aggregate rate is controlled through end-to-end congestion control, irrespective of the tenants' network stack.
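The γ-based detection rule and the protocol-agnostic treatment of traffic can be combined in a per-VM meter that counts bytes from every arriving packet, whether TCP or UDP, and compares an estimated receive rate with the VM's share of γC. This is a minimal sketch, not EyeQ's kernel code. The measurement window (`interval_s`), the EWMA gain (`alpha`) and the class name are assumptions chosen for illustration. For example, with γ = 0.9 on a 10 Gb/s NIC, the VMs' shares add up to at most 9 Gb/s, leaving 1 Gb/s of headroom.

```python
from dataclasses import dataclass, field


@dataclass
class CongestionDetector:
    """Per-VM receive-side congestion detector (illustrative sketch)."""
    fair_rate_bps: float          # this VM's share of gamma * C, set by the RX scheduler
    interval_s: float = 0.0005    # assumed measurement window
    alpha: float = 0.5            # assumed EWMA gain for smoothing rate samples
    rate_bps: float = field(default=0.0, init=False)
    _window_start: float = field(default=0.0, init=False)
    _window_bytes: int = field(default=0, init=False)

    def on_packet(self, now_s: float, nbytes: int) -> bool:
        """Account one arriving packet of any protocol (clocked by arrivals).

        Returns True if feedback should be sent to this packet's source,
        i.e. if the VM's estimated receive rate exceeds its fair rate.
        """
        self._window_bytes += nbytes
        elapsed = now_s - self._window_start
        if elapsed >= self.interval_s:
            sample_bps = 8 * self._window_bytes / elapsed
            self.rate_bps = self.alpha * sample_bps + (1 - self.alpha) * self.rate_bps
            self._window_start = now_s
            self._window_bytes = 0
        return self.rate_bps > self.fair_rate_bps


if __name__ == "__main__":
    gamma, nic_bps = 0.9, 10e9  # assumed headroom parameter and NIC speed
    det = CongestionDetector(fair_rate_bps=gamma * nic_bps / 2)  # two equal-weight VMs
    # 1500-byte packets every 1.2 us is a line-rate (about 10 Gb/s) burst to one VM.
    flags = [det.on_packet(k * 1.2e-6, 1500) for k in range(1, 5001)]
    print(f"estimated rate {det.rate_bps / 1e9:.2f} Gb/s, feedback={flags[-1]}")
```

Because the meter only sees byte counts, a UDP burst and a set of parallel TCP connections are treated identically, which is the point of the meta-flow abstraction.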
Embracing timescales: The example shown in Figure 2(a) illustrates the need to react quickly, at timescales on the order of round-trip times. Implementing the end-to-end rate control mechanism in a distributed fashion in the datapath makes it possible to react to congestion in a timely manner, within a few RTTs.
Figure 4 shows a high-level overview of EyeQ's transmit (TX) and receive (RX) datapaths. The TX datapath consists of rate limiters that enforce admission control. Contention at the transmit side is resolved by a TX weighted round robin (WRR) scheduler that assures each VM its (egress) minimum bandwidth guarantee. To ensure that traffic does not congest any receiver, there are multiple per-destination rate limiters. This is analogous to Virtual Output Queues in switches: multiple per-destination rate limiters prevent packets to uncongested destinations from being head-of-line blocked by other packets. These rate limiters vary their sending rate periodically using a control loop similar to TCP's AI/MD (additive-increase/multiplicative-decrease) process, driven by feedback generated by the RX datapath.
The RX datapath consists of a number of congestion detectors, one per VM. Each detector is assigned a “fair rate” by the RX-WRR scheduler, and generates feedback whenever the VM exceeds its allotted rate. The congestion detector is clocked by the arrival of packets: if a packet arrives for a VM and the VM's rate exceeds its share, the congestion detector sends feedback to the source of the packet. The feedback can take any form: a single bit (such as ECN) or an explicit rate (such as RCP [9]). The RX-WRR scheduler enforces speedup by splitting a maximum of γC between the VMs, where C is the physical NIC capacity. The RX datapath works in tandem with the TX datapath to enforce end-to-end flow control.
Figure 5: A preliminary prototype of our design helps mitigate the harmful bursty nature of UDP traffic, and lets TCP traffic attain its minimum bandwidth guarantee of 6.6Gbps. Figure 2(b) shows the harmful effect of UDP traffic without our isolation mechanism. (Plot: rate, 1G to 9G, versus time, 0 to 120 s; series: tcp, udp, total.)
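The interaction between the TX and RX datapaths can be illustrated with a toy closed loop. The code below is a simplification under stated assumptions, not EyeQ's implementation: the RX scheduler is reduced to a static weighted split of γC among active VMs (no redistribution of unused share), feedback is a single congestion bit evaluated once per control interval, and the additive step (`increase_bps`) and decrease factor are hypothetical values. In the example, two senders start at line rate toward one receiving VM whose fair rate is 4.5 Gb/s (γ = 0.9, C = 10 Gb/s, two equal-weight VMs).

```python
from dataclasses import dataclass
from typing import Dict, List


def rx_fair_rates(gamma: float, capacity_bps: float,
                  weights: Dict[str, float]) -> Dict[str, float]:
    """Split at most gamma * C among active VMs in proportion to their weights."""
    budget = gamma * capacity_bps  # (1 - gamma) * C is left as headroom
    total = sum(weights.values())
    return {vm: budget * w / total for vm, w in weights.items()}


@dataclass
class RateLimiter:
    """Per-destination TX rate limiter with an AI/MD control loop."""
    rate_bps: float
    min_bps: float
    max_bps: float
    increase_bps: float = 100e6   # assumed additive step per control interval
    decrease_factor: float = 0.5  # assumed multiplicative cut on feedback

    def update(self, congested: bool) -> float:
        if congested:
            self.rate_bps = max(self.min_bps, self.rate_bps * self.decrease_factor)
        else:
            self.rate_bps = min(self.max_bps, self.rate_bps + self.increase_bps)
        return self.rate_bps


def simulate(steps: int, fair_rate_bps: float,
             limiters: List[RateLimiter]) -> List[float]:
    """Each step, the receiver's detector compares the aggregate arrival rate
    with its fair rate and, if it is exceeded, signals every sender."""
    history = []
    for _ in range(steps):
        aggregate = sum(lim.rate_bps for lim in limiters)
        congested = aggregate > fair_rate_bps
        for lim in limiters:
            lim.update(congested)
        history.append(aggregate)
    return history


if __name__ == "__main__":
    fair = rx_fair_rates(0.9, 10e9, {"vm1": 1.0, "vm2": 1.0})["vm1"]
    senders = [RateLimiter(rate_bps=10e9, min_bps=1e6, max_bps=10e9) for _ in range(2)]
    trace = simulate(200, fair, senders)
    print(f"fair rate {fair / 1e9:.1f} Gb/s, steady-state range "
          f"{min(trace[3:]) / 1e9:.2f} to {max(trace[3:]) / 1e9:.2f} Gb/s")
```

After a short transient, the aggregate arrival rate saw-tooths between about half the fair rate and at most one additive step per sender above it, mirroring TCP's AI/MD behaviour; the step size and control interval are the knobs that trade convergence speed against overshoot.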
In this paper, we discussed a mechanism for enforcing network performance isolation in a large multi-tenant environment. We have implemented this design as a Linux kernel module, and tested it against many adversarial traffic patterns similar to the scenario discussed in Figure 2(a). We found that EyeQ is able to mitigate the harmful effects of malicious traffic. In particular, Figure 5 shows how EyeQ protects the TCP tenant from bursty UDP traffic, while simultaneously improving the total network utilization from about 4Gbps (Figure 2(b)) to about 8Gbps.
In-network contention: While high-bandwidth network designs present less opportunity for in-network contention, they do not eliminate its possibility. EyeQ does not ignore this possibility, but gracefully falls back to per-sender max-min fairness, using tenant-agnostic congestion notification mechanisms such as ECN. If the network gets congested on ‘large’ timescales (such as a few hours), it strongly indicates an unbalanced system design. We believe that the right approach is to invest more in the network, so that it does not “get in the way” of providing customer satisfaction.
Other Scenarios: The mechanism presented in this paper focuses on a particular kind of traffic pattern, which we call “intra-tenant”: communication between VMs of the same tenant happens over a high-speed network interconnect, interfering with similar communication patterns of other tenants. However, cloud networks also host services such as memcached clusters, storage, and load balancers. These services are usually implemented as tenants, and hence “inter-tenant” communication can also result in performance interference. While we demonstrated EyeQ's ability to enforce minimum bandwidth guarantees on a per-VM basis, EyeQ can directly benefit from techniques such as Distributed Rate Limiting [15] that enforce a limit on aggregate bandwidth consumption. Also, a data center has a lot of network capacity, but typically only ∼100Gbps of uplink bandwidth to the Internet. Since tenants share this uplink, it is important to have automated defense mechanisms to protect this bandwidth, which is crucial for the infrastructure. Finally, it is equally important to protect the downlink and defend against attacks originating from the Internet; this is the holy grail of Internet Quality of Service.
We presented EyeQ, a platform to enforce network sharing policies at large scale. By viewing the data center network as a giant switch, and by trading off a small fraction of the access link bandwidth, EyeQ is able to assure VMs minimum bandwidth guarantees in a timely fashion, completely at the edge, with minimal support from the network.
This work was partly funded by NSF FIA and Google.