Distributed Detection of Node Replication Attacks in Sensor Networks

A thesis submitted by Bryan Parno in partial fulfillment of the requirements for the degree of Master of Science in Electrical and Computer Engineering at Carnegie Mellon University in 2005. It examines the vulnerability of sensor networks to node replication attacks and proposes a randomized multicast protocol for detecting such attacks. The document also includes a comparison of different protocols, security analysis, and future work.

Bryan Jeffrey Parno
2005
Advisor: Prof. Perrig

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree Master of Science

Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, Pennsylvania, May 2005

Abstract

The low-cost, off-the-shelf hardware components in unshielded sensor-network nodes leave them vulnerable to compromise. With little effort, an adversary may capture nodes, analyze and replicate them, and surreptitiously insert these replicas at strategic locations within the network. Such attacks may have severe consequences; they may allow the adversary to corrupt network data or even disconnect significant parts of the network. Previous node replication detection schemes depend primarily on centralized mechanisms with single points of failure, or on neighborhood voting protocols that fail to detect distributed replications. To address these fundamental limitations, we propose two new algorithms based on emergent properties [17], i.e., properties that arise only through the collective action of multiple nodes. Randomized Multicast distributes node location information to randomly-selected witnesses, exploiting the birthday paradox to detect replicated nodes, while Line-Selected Multicast uses the topology of the network to detect replication. Both algorithms provide globally-aware, distributed node-replica detection, and Line-Selected Multicast displays particularly strong performance characteristics. We show that emergent algorithms represent a promising new approach to sensor network security; moreover, our results naturally extend to other classes of networks in which nodes can be captured, replicated and re-inserted by an adversary.

Chapter 1

Introduction

The ease of deploying sensor networks contributes to their appeal. They can quickly scale to large configurations, since administrators can simply drop new sensors into the desired locations in the existing network. To join the network, new nodes require neither administrative intervention nor interaction with a base station; instead, they typically initiate simple neighbor discovery protocols [6, 13] by broadcasting their prestored credentials (e.g., their unique ID and/or the unique ID of their keys).

Unfortunately, sensor nodes typically employ low-cost commodity hardware components unprotected by the type of physical shielding that could preclude access to a sensor’s memory, processing, sensing and communication components. Cost considerations make it impractical to use shielding that could detect pressure, voltage, and temperature changes [11, 33, 36] that an adversary might use to access a sensor’s internal state. Deploying unshielded sensor nodes in hostile environments enables an adversary to capture, replicate, and insert duplicated nodes at chosen network locations with little effort. Thus, if the adversary compromises even a single node, she can replicate it indefinitely, spreading her influence throughout the network.

If left undetected, node replication leaves any network vulnerable to a large class of insidious attacks. Using replicated nodes, the adversary can subvert data aggregation protocols by injecting false data or suppressing legitimate data. Further, blame for abnormal behavior can be spread across the replicas, reducing the likelihood that any one node exceeds the detection threshold. Even more insidiously, node replicas placed at judiciously chosen locations can revoke legitimate nodes and disconnect the network by triggering correct execution of node-revocation protocols that rely on voting schemes [6, 10, 13, 27].

Previous approaches for detecting node replication typically rely on centralized monitoring, since localized voting systems [6, 27] cannot detect distributed replication. Centralized schemes require all of the nodes in the network to transfer a list of their neighbors’ claimed locations^1 to a central base station that can examine the lists for conflicting location claims. Like all centralized approaches, this creates a single point of failure. If the adversary can compromise the base station or interfere with its communications, then the centralized approach will fail. Also, the nodes surrounding the base station are subjected to an undue communication burden that may shorten the network’s life expectancy.

In this paper, we use two different emergent algorithms to provide the first examples of globally-aware distributed node-replication detection systems. The emergent nature of our algorithms makes them extremely resilient to active attacks, and both protocols seek to minimize power consumption by limiting communication, while still operating within the extremely limited memory capacity of typical sensor nodes. An emergent algorithm leverages the features that no individual node can provide, but that emerge through the collaboration of many nodes.

Our first protocol, Randomized Multicast, distributes location claims to a randomly selected set of witness nodes. The Birthday Paradox predicts that a collision will occur with high probability if the adversary attempts to replicate a node. Our second protocol, Line-Selected Multicast, exploits the routing topology of the network to select witnesses for a node’s location and utilizes geometric probability to detect replicated nodes. This protocol has modest communication and memory requirements.

Furthermore, our solutions apply equally well to any class of network in which the adversary can capture, replicate and insert additional nodes. Examples include wireless ad hoc networks and peer-to-peer networks.
We argue that such networks require the resiliency of emergent security techniques to resist an adversary that can subvert an arbitrary number of nodes at unpredictable locations. We expect that distributed algorithms based on emergent properties will provide the best defenses for attacks against these systems.

In the following section, we provide a more detailed description of the node replication attack that we plan to thwart, and we supply a summary of notation used throughout the paper. Then, in Section 3 we summarize some of the earlier proposals and explain why they fail to prevent replication attacks. After discussing some preliminary approaches to distributed detection in Section 4, we present and analyze our two primary protocols, Randomized Multicast and Line-Selected Multicast, in Sections 5 and 6 respectively. We compare and contrast the protocols, discuss synchronization and authentication issues and generalize our

^1 To prevent the adversary from using the location information to find and disable nodes, we could instead broadcast a locator unique to the node’s neighborhood that would reveal less information but still be verifiable by the neighbors. For example, the locator could consist of the node’s list of neighbors. If the list becomes prohibitively long, each node can broadcast the list to its neighbors but sign a hash of the list. The neighbors verify that they are on the list, check the hash, and then only propagate the hash value, instead of the entire list.

Chapter 2

Background

2.1 Goals

For a given sensor network, we would like to detect a node replication attack, i.e., an attempt by the adversary to add one or more nodes to the network that use the same ID as another node in the network. Ideally, we would like to detect this behavior without centralized monitoring, since centralized solutions suffer from several inherent drawbacks (see Section 3.1). The scheme should also revoke the replicated nodes, so that non-faulty nodes in the network cease to communicate with any nodes injected in this fashion.

We evaluate each protocol’s security by examining the probability of detecting an attack given that the adversary inserts L replicas of a subverted node. The protocol must provide robust detection even if the adversary captures additional nodes.

We also evaluate the efficiency of each protocol. In a sensor network, communication (both sending and receiving) requires at least an order of magnitude more power than any other operation [14], so our first priority must be minimizing communication, both for the network as a whole and for the individual nodes (since hotspots will quickly exhaust a node’s power supply). Moreover, sensor nodes typically have a limited amount of memory, often on the order of a few kilobytes [14]. Thus, any protocol requiring a large amount of memory will be impractical.

2.2 Sensor Network Environments

A sensor network typically consists of hundreds, or even thousands, of small, low-cost nodes distributed over a wide area. The nodes are expected to function in an unsupervised fashion even if new nodes are added, or old nodes disappear (e.g., due to power loss or accidental damage). While some networks include a central location for data collection, many operate in an entirely distributed manner, allowing the operators to retrieve aggregated data from any of the nodes in the network. Furthermore, data collection may only occur at irregular intervals. For example, many military applications strive to avoid any centralized and fixed points of failure. Instead, data is collected by mobile units (e.g., unmanned aerial units, foot soldiers, etc.) that access the sensor network at unpredictable locations and utilize the first sensor node they encounter as a conduit for the information accumulated by the network. Since these networks often operate in an unsupervised fashion for long periods of time, we would like to detect a node replication attack soon after it occurs. If we wait until the next data collection cycle, the adversary has time to use its presence in the network to corrupt data, decommission legitimate nodes, or otherwise subvert the network’s intended purpose.

We also assume that the adversary cannot readily create new IDs for nodes. Newsome et al. describe several techniques to prevent the adversary from deploying nodes with arbitrary IDs [27]. For example, we can tie each node’s ID to the unique knowledge it possesses. If the network uses a key predistribution scheme [6, 13], then a node’s ID could correspond to the set of secret keys it shares with its neighbors (e.g., a node’s ID is given by the hash of its secret keys). In this system, an adversary gains little advantage by claiming to possess an ID without actually holding the appropriate keys. Assuming the sensor network implements this safeguard, an adversary cannot create a new ID without guessing the appropriate keys (for most systems, this is infeasible), so instead the adversary must capture and clone a legitimate node.
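The idea of tying a node’s ID to the keys it holds can be sketched as follows. This is only an illustration under our own assumptions: the function name `node_id_from_keys`, the use of SHA-256, the length-prefixed encoding, and the truncation to 16 hex characters are hypothetical choices, not details taken from the key predistribution schemes cited above.

```python
import hashlib


def node_id_from_keys(secret_keys):
    """Derive a node ID as a hash over the node's set of secret keys.

    Hypothetical sketch: the hash function and encoding are assumptions.
    """
    digest = hashlib.sha256()
    for key in sorted(set(secret_keys)):
        # Length-prefix each key so that key boundaries are unambiguous.
        digest.update(len(key).to_bytes(4, "big"))
        digest.update(key)
    return digest.hexdigest()[:16]  # truncated only for readability


keys = [b"k-17", b"k-42", b"k-99"]
node_id = node_id_from_keys(keys)

# The order in which keys are listed does not matter...
same_id = node_id_from_keys(list(reversed(keys)))
# ...but a node holding a different key set gets a different ID.
other_id = node_id_from_keys([b"k-17", b"k-42", b"k-00"])
```

Under this construction, claiming an ID without holding the matching keys gives the adversary nothing to authenticate with, which is the property the safeguard relies on.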

2.3 Adversary Model

In examining the security of a sensor network, we take a conservative approach by assuming that the adversary has the ability to surreptitiously capture a limited number of legitimate sensor nodes. We limit the percentage of nodes captured, since an adversary that can capture most or all of the nodes in the network can obviously subvert any protocol running in the network. Having captured these nodes, the adversary can employ arbitrary attacks on the nodes to extract their private information. For example, the adversary might exploit the unshielded nature of the nodes to read their cryptographic information from memory. The adversary could then clone the node by loading the node’s cryptographic information onto multiple generic sensor nodes. Since sensor networks are inherently

Chapter 3

Previous Protocols

Thus far, protocols for detecting node replication have relied on a trusted base station to provide global detection. For the sake of completeness, we also discuss the use of localized voting mechanisms. We consider these protocols in the abstract; for specific examples of previous protocols, see Section 8.6. Until now, it was generally believed that these two alternatives exhausted the space of possibilities. This paper expands the design space to offer new alternatives with strong security and efficiency characteristics.

3.1 Centralized Detection

The most straightforward detection scheme requires each node to send a list of its neighbors and their claimed locations to the base station. The base station can then examine every neighbor list to look for replicated nodes. If it discovers one or more replicas, it can revoke the replicated nodes by flooding the network with an authenticated revocation message.

While conceptually simple, this approach suffers from several drawbacks inherent in a centralized system. First, the base station becomes a single point of failure. Any compromise of the base station or the communication channel around the base station will render this protocol useless. Furthermore, the nodes closest to the base station will receive the brunt of the routing load and will become attractive targets for the adversary. The protocol also delays revocation, since the base station must wait for all of the reports to come in, analyze them for conflicts and then flood revocations throughout the network. A distributed or local protocol could potentially revoke replicated nodes in a more timely fashion. Finally, many networks do not have the luxury of a powerful base station, making a distributed solution a necessity.

In terms of security, this protocol achieves 100% detection of all replicated nodes, assuming all messages successfully reach the base station. As for efficiency, if we assume that the average path length^1 to the base station is $O(\sqrt{n})$ and each node has an average degree $d$ (for $d \ll n$), then this protocol requires $O(n\sqrt{n})$ communication for all of the reports from the nodes to reach the base station. The storage required at each node is $O(d)$. At the base station, the protocol requires $O(n \cdot d)$ storage, though storage is presumably less of a concern for the base station.
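The check performed at the base station can be sketched as below. This is a minimal illustration under our own assumptions: the report format (a mapping from reporter ID to a list of neighbor ID and claimed location pairs) and the name `find_replicas` are hypothetical, and signature checks on the reports are omitted.

```python
from collections import defaultdict


def find_replicas(reports):
    """Return the node IDs that were reported at more than one location.

    reports: dict mapping reporter ID -> list of (neighbor ID, claimed location).
    The base station stores every report, i.e. roughly n * d entries.
    """
    seen = defaultdict(set)
    for neighbors in reports.values():
        for node_id, location in neighbors:
            seen[node_id].add(location)
    return {nid: locs for nid, locs in seen.items() if len(locs) > 1}


reports = {
    "A": [("B", (1, 0)), ("X", (0, 1))],
    "B": [("A", (0, 0)), ("X", (0, 1))],
    "C": [("X", (9, 9))],  # a second node claiming to be X, far from the first
}
replicas = find_replicas(reports)
```

In this example only `X` is flagged, because it is the only ID reported at two different locations. Every report must first travel to the base station, which is where the routing load near the base station comes from.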

3.2 Local Detection

To avoid relying on a central base station, we could instead rely on a node’s neighbors to perform replication detection. Using a voting mechanism, the neighbors can reach a consensus on the legitimacy of a given node. Unfortunately, while achieving detection in a distributed fashion, this method fails to detect distributed node replication in disjoint neighborhoods within the network. As long as the replicated nodes are at least two hops away from each other, a purely local approach cannot succeed.

^1 This will hold true if the sensor network deployment approximates any regular polygon.

CHAPTER 4. PRELIMINARY APPROACHES

offending node. This protocol achieves 100% detection of all duplicate location claims under the assumption that the broadcasts reach every node. This assumption may not hold if the adversary can jam key areas or otherwise interfere with communication paths through the network. Nodes could employ redundant messages or authenticated acknowledgment techniques to try to thwart such an attack.

In terms of efficiency, this protocol requires each node to store location information about its $d$ neighbors. One node’s location broadcast requires $O(n)$ messages, assuming the nodes employ a duplicate suppression algorithm in which each node only broadcasts a given message once. Thus, the total communication cost for the protocol is $O(n^2)$. Given the simplicity of the scheme and the level of security achieved, this cost may be justifiable for small networks. However, for large networks, the $n^2$ factor is too costly, so we investigate schemes with a lower cost.
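The message count for flooding with duplicate suppression can be checked with a small simulation. This is only a sketch under our own assumptions: the square grid topology and the helper names `flood` and `grid` are hypothetical, and each rebroadcast is counted as one message.

```python
from collections import deque


def flood(adjacency, source):
    """Flood one claim through the network; every node rebroadcasts it once.

    Returns the number of broadcasts needed to reach all reachable nodes.
    """
    seen = {source}
    queue = deque([source])
    broadcasts = 0
    while queue:
        node = queue.popleft()
        broadcasts += 1  # this node broadcasts the claim exactly once
        for neighbor in adjacency[node]:
            if neighbor not in seen:  # duplicate suppression
                seen.add(neighbor)
                queue.append(neighbor)
    return broadcasts


def grid(side):
    """Build a side x side grid in which each node talks to its 4-neighbors."""
    steps = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {
        (x, y): [
            (x + dx, y + dy)
            for dx, dy in steps
            if 0 <= x + dx < side and 0 <= y + dy < side
        ]
        for x in range(side)
        for y in range(side)
    }


network = grid(5)                            # n = 25 nodes
per_claim = flood(network, (0, 0))           # one node's claim: n broadcasts
total = sum(flood(network, s) for s in network)  # every node's claim: n * n
```

With n = 25, one claim costs 25 broadcasts and a full round costs 625, which illustrates why the quadratic cost becomes impractical as the network grows.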

4.2 Deterministic Multicast

To improve on the communication cost of the previous protocol, we describe a detection protocol that only shares a node’s location claim with a limited subset of deterministically chosen "witness" nodes. When a node broadcasts its location claim, its neighbors forward that claim to a subset of the nodes called witnesses. The witnesses are chosen as a function of the node’s ID. If the adversary replicates a node, the witnesses will receive two different location claims for the same node ID. The conflicting location claims become evidence to trigger the revocation of the replicated node.

More formally, in this protocol, whenever node $\gamma$ hears a location claim $l_\alpha$ from node $\alpha$, it computes $F(\alpha) = \{\omega_1, \omega_2, \ldots, \omega_g\}$, where $F$ maps each node ID in the set of possible node IDs, $S$, to a set of $g$ node IDs:

$$F : S \to \{\sigma_i : \sigma_i \subseteq S,\ |\sigma_i| = g\} \tag{4.1}$$

The nodes with IDs in the set $\{\omega_1, \omega_2, \ldots, \omega_g\}$ constitute the witnesses for node $\alpha$. Node $\gamma$ forwards $l_\alpha$ to each of these witnesses. If $\alpha$ claims to be at more than one location, the witnesses will receive conflicting location claims, which they can flood through the network, discrediting $\alpha$.

In this protocol, each node in the network stores $g$ location claims on average. For communication, assuming $\alpha$’s neighbors do not collaborate, we will need each of $\alpha$’s neighbors to probabilistically decide which of the $\omega_i$ to inform. If each node selects $\frac{g \ln g}{d}$ random destinations from the set of possible $\omega_i$, then the coupon collector’s problem [7] assures us that each of the $\omega_i$’s will receive at least one of the location claims. Assuming an average network path length of $O(\sqrt{n})$ nodes, this results in $O\left(\frac{g \ln g \sqrt{n}}{d}\right)$ messages.

Unfortunately, this cost does not provide much security. Since $F$ is a deterministic function, an adversary can also determine the $\omega_i$s. Thus, they become targets for subversion. If the adversary can capture or jam all $g$ of the messages destined to the $\omega_i$s, then she can create as many replicas of $\alpha$ as she desires (limited only by the requirement that no two replicas share a neighbor). Since the communication costs of this protocol grow as $O(g \ln g)$, we cannot afford a large value for $g$, and yet a small value for $g$ allows the adversary almost unlimited replication abilities after compromising a fixed number of nodes; in other words, if the adversary controls the $g$ witnesses for $\alpha$, she can create unlimited replicas of $\alpha$ and suppress the conflicting reports arriving at the witness nodes. These disadvantages make this protocol unappealing.
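One way to picture a deterministic witness function is to rank all node IDs by a hash keyed on the claimant’s ID and take the first g. The sketch below is our own illustration, not the construction used in the thesis: the function names, the SHA-256 ranking, and the per-neighbor budget of about (g ln g)/d destinations (the usual coupon-collector estimate split over d neighbors) are all assumptions.

```python
import hashlib
import math


def witnesses(node_id, all_ids, g):
    """Hypothetical F: choose g witness IDs as a deterministic function of node_id."""
    ranked = sorted(
        all_ids,
        key=lambda i: hashlib.sha256(f"{node_id}|{i}".encode()).digest(),
    )
    return ranked[:g]


def destinations_per_neighbor(g, d):
    """About g * ln(g) draws cover all g witnesses; split them over d neighbors."""
    return math.ceil(g * math.log(g) / d)


ids = list(range(100))
first = witnesses(7, ids, 5)
second = witnesses(7, ids, 5)  # anyone, including the adversary, gets the same set
budget = destinations_per_neighbor(g=20, d=10)
```

Because `witnesses` depends only on public information, the adversary can compute the same set ahead of time and target exactly those nodes, which is the weakness described above.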

CHAPTER 5. RANDOMIZED MULTICAST

hibitive as traditionally assumed. In recent work, Malan et al. demonstrate that they can successfully generate 163-bit ECC keys on the MICA2 in under 34 seconds [22]. Furthermore, the latest generation of Telos sensors come with 10 KB of RAM and can achieve 5x the data rate of the MICA2, making public-key algorithms more practical. In Section 8.3.2, we discuss how we could instead use symmetric-key cryptography to lower the computational overhead, at the expense of additional communication.

5.2 Protocol Description

At a high level, the protocol has each node broadcast its location claim, along with a signature authenticating the claim. Each of the node’s neighbors probabilistically forwards the claim to a randomly selected set of witness nodes. If any witness receives two different location claims for the same node ID, it can revoke the replicated node. The birthday paradox ensures that we detect replication with high probability using a relatively limited number of witnesses.

More formally, each node $\alpha$ broadcasts a location claim to its neighbors, $\beta_1, \beta_2, \ldots, \beta_d$. The location claim has the format $\langle ID_\alpha, l_\alpha, \{H(ID_\alpha, l_\alpha)\}_{K_\alpha^{-1}} \rangle$, where $l_\alpha$ represents $\alpha$’s location (e.g., geographic coordinates $(x, y)$). Upon hearing this announcement, each neighbor, $\beta_i$, verifies $\alpha$’s signature and the plausibility of $l_\alpha$ (for example, if each node knows its own position and has some knowledge of the maximum propagation radius of the communication layer, then it can loosely bound $\alpha$’s set of potential locations). Then, with probability $p$, each neighbor selects $g$ random locations within the network and uses geographic routing (e.g., GPSR [19]) to forward $\alpha$’s claim to the nodes closest to the chosen locations (as in GHT [30]). Since we have assumed the nodes are distributed randomly, this should produce a random selection from the nodes in the network. In Section 5.3, we show that the probability of selecting the same node more than once is generally negligible. Collectively, the nodes chosen by the neighbors constitute the witnesses for $\alpha$.

Each witness that receives a location claim first verifies the signature. Then, it checks the ID against all of the location claims it has received thus far. If it ever receives two different location claims for the same node ID, then it has detected a node replication attack, and these two location claims serve as evidence to revoke the node. It blacklists $\alpha$ from further communication by immediately flooding the network with the pair of conflicting location claims, $l_\alpha$ and $l'_\alpha$. Each node receiving this pair can independently verify the signatures and agree with the revocation decision. Thus, the sensor network both detects and defeats the node replication attack in a fully distributed manner. Furthermore, the randomization prevents the adversary from predicting which node will detect the replication.
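The witness behaviour described above can be simulated in a few lines. This is a rough sketch under our own assumptions rather than the thesis implementation: signature verification and the revocation flood are omitted, drawing random node indices stands in for geographic routing to random locations, and the parameter values are illustrative only.

```python
import random


class Witness:
    """Stores one location claim per node ID and reports conflicts."""

    def __init__(self):
        self.claims = {}

    def receive(self, node_id, location):
        """Return the conflicting pair of locations, or None if there is no conflict."""
        previous = self.claims.setdefault(node_id, location)
        if previous != location:
            return (previous, location)
        return None


def run_round(n, d, p, g, replica_locations, rng):
    """Simulate one round for a single ID claimed at the given locations.

    Returns True if at least one witness receives two different locations.
    """
    witnesses = [Witness() for _ in range(n)]
    detected = False
    for location in replica_locations:
        for _ in range(d):  # each of the d neighbors of this copy
            if rng.random() >= p:
                continue  # this neighbor decides not to forward
            for index in rng.sample(range(n), g):  # g distinct random witnesses
                if witnesses[index].receive("alpha", location) is not None:
                    detected = True
    return detected


rng = random.Random(1)
trials = 200
hits = sum(
    run_round(n=1000, d=20, p=0.5, g=20,
              replica_locations=[(0, 0), (50, 50)], rng=rng)
    for _ in range(trials)
)
detection_rate = hits / trials
honest = run_round(n=1000, d=20, p=0.5, g=20,
                   replica_locations=[(0, 0)], rng=rng)
```

With these illustrative parameters, each copy of the node reaches a few hundred of the 1000 witnesses, so the two witness sets almost always overlap and `detection_rate` should be close to 1. A node that claims a single location never triggers a conflict.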

5.3 Security Analysis

Let malicious node $\alpha$ claim to be at $L$ locations, $l_1, l_2, \ldots, l_L$. We would like to determine the probability of a collision using the randomized multicast protocol outlined above, since a collision at a witness corresponds to detection of $\alpha$’s replication. At each location $l_i$, $p \cdot d$ nodes randomly select $g$ witnesses. If the neighbors coordinated perfectly, this would store $\alpha$’s location claim at exactly $p \cdot d \cdot g$ locations. However, since we prefer to have each neighbor act independently, there may be some amount of overlap between the witnesses each neighbor selects. To determine the impact of this overlap, we would like to determine the number of nodes, $N_{receive}$, that will receive the location claim assuming the neighbors choose witnesses independently. If $P_{claim}$ is the probability that a node hears at least one claim and $P_{none}$ is the probability that a node hears no location claims, then we have:

$$E[N_{receive}] = n \cdot P_{claim} \tag{5.1}$$

$$P_{claim} = 1 - P_{none} \tag{5.2}$$

Since each neighbor is assumed to select $g$ random, unique witness locations, the probability ($P_f$) that a node fails to hear any of the $g$ announcements from one neighbor is:

$$P_f = 1 - \frac{g}{n} \tag{5.3}$$

Since each neighbor decides independently whether to send out location claims, the number of nodes that actually send out location claims is distributed binomially, with mean $p \cdot d$ and variance $d \cdot p(1 - p)$. For a network with $d = 20$ and $p = [\ldots]$, the variance will be less than 0.005, so we will approximate the number of neighbors that send out location claims as $p \cdot d$. Since the neighbors choose their destinations independently, we have:

$$P_{none} = (P_f)^{p \cdot d} \tag{5.4}$$
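To get a feel for how much overlap independent selection causes, the quantities above can be evaluated numerically. The sketch assumes $P_f = 1 - g/n$ and $P_{none} = (P_f)^{p \cdot d}$, treats the number of forwarding neighbors as exactly $p \cdot d$, and uses illustrative parameter values that are not taken from the thesis.

```python
def expected_receivers(n, d, p, g):
    """Evaluate E[N_receive] = n * (1 - (1 - g/n) ** (p * d)).

    Assumes p * d neighbors forward, each to g distinct random nodes.
    """
    p_f = 1 - g / n           # a node misses all g claims sent by one neighbor
    p_none = p_f ** (p * d)   # a node misses the claims of every forwarding neighbor
    return n * (1 - p_none)


n, d, p, g = 1000, 20, 0.5, 20   # illustrative values only
ideal = p * d * g                # storage if neighbors coordinated perfectly
actual = expected_receivers(n, d, p, g)
overlap_loss = ideal - actual
```

For these values the ideal count is 200 witnesses, while the expected number of distinct witnesses comes out at roughly 183, so independent selection costs under a tenth of the ideal coverage.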