













Mobile operating systems, such as Apple’s iOS and Google’s Android, have supported a ballooning market of feature-rich mobile applications. However, helping users understand security risks of mobile applications is still an ongoing challenge. While recent work has developed various techniques to reveal suspicious behaviors of mobile applications, there exists little work to answer the following question: are those behaviors necessarily inappropriate? In this paper, we seek an approach to cope with such a challenge and present a continuous and automated risk assessment framework called RiskMon that uses machine-learned ranking to assess risks incurred by users’ mobile applications, especially Android applications. RiskMon combines users’ coarse expectations and runtime behaviors of trusted applications to generate a risk assessment baseline that captures appropriate behaviors of applications. With the baseline, RiskMon assigns a risk score on every access attempt on sensitive information and ranks applications by their cumulative risk scores. We also discuss a proof-of-concept implementation of RiskMon as an extension of the Android mobile platform and provide both system evaluation and usability study of our methodology.
C.4 [Performance of Systems]: Measurement techniques; D.4.6 [Operating Systems]: Security and Protection—Access controls, Information flow controls
Smartphones; Android; Risk Assessment
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. CODASPY’14, March 3–5, 2014, San Antonio, Texas, USA. Copyright 2014 ACM 978-1-4503-2278-2/14/03 ...$15.00. http://dx.doi.org/10.1145/2557547.2557549.
Mobile operating systems, such as Android and iOS, have fueled a rapidly growing application market over the last few years. Google Play announced 48 billion app downloads in May 2013 [27]. Almost at the same time, Apple’s AppStore reached 50 billion downloads [31]. Such a new paradigm drives developers to produce feature-rich applications that seamlessly cater to users’ growing needs of processing their personal information such as contacts, locations and other credentials on their mobile devices. Unfortunately, the large installed base has also attracted the attention of unscrupulous developers who are interested in users’ sensitive information for a variety of purposes. For example, spyware tracks users’ locations and reports to remote controllers, and adware collects users’ identities to conduct aggressive directed marketing. To defend against such rogue applications, Android helps users review them at install time. Primarily, Android relies on permissions to help users understand the security and privacy risks of applications. In Android, an application must request permissions to be allowed to access sensitive resources. In other words, it is mandatory for Android applications to present their expected behaviors to users. Even though permissions outline the resources that an application attempts to access, they do not provide fine-grained information about how and when such resources will be used. Suppose a user installs an application and allows it to access her location information. It is hard for her to determine whether the application accesses her locations on her demand or periodically without asking for her explicit consent. Therefore, it is imperative to continuously monitor the installed applications so that a user can be informed when rogue applications abuse her sensitive information. Previous work has proposed real-time monitoring to reveal potential misbehaviors of third-party applications [14, 22, 30, 38, 39].
Specifically, TaintDroid [14] and Aurasium [38] inspect an application’s behaviors at variable and syscall level, respectively. While these techniques provide valuable partial insights into a user’s installed applications, it is still critical to answer the following question: are the behaviors in mobile applications necessarily inappropriate? To answer this question, it is an end-user’s responsibility to conduct risk assessment and make decisions based on her disposition and perception. Risk assessment is not a trivial task since it requires the user to digest diverse contextual and technical information. In addition, the user needs to understand the expected behaviors of applications under different contexts before establishing her risk assessment baseline.
However, it is impractical for ordinary users to distill such a baseline. Instead, it is essential to develop an automated approach to continuously monitor applications and effectively alert users upon security and privacy violations. In this paper, we propose an automated and continuous risk assessment framework for mobile platforms, called RiskMon. RiskMon requires a user’s coarse expectations for different types of applications, while user intervention is not required for the subsequent risk assessment. The user needs to provide her selection of trusted applications from the installed applications on her device and her ranking of permission groups in terms of their relevancy to the corresponding application. Then, RiskMon builds the user’s risk assessment baseline for different application categories by leveraging API traces of her selected applications. RiskMon continuously monitors the installed applications’ behaviors, including their interactions with other applications and system services. The risk of each interaction is measured by how much it deviates from the risk assessment baseline. For a better risk perception, RiskMon ranks installed applications based on the risk assessment results in real time. Intuitively, the user can deem an application safe if it is less risky than any of her trusted applications. As RiskMon interposes and assesses API calls before an application gets the results, we foresee the possibility of integrating RiskMon into an automated permission granting process as discussed in [18] and [32]. Furthermore, while we implement RiskMon on the Android platform, RiskMon is equally applicable to other platforms (e.g. Apple iOS and Microsoft Windows Phone) in assisting security experts to discover high-risk applications.
Tools like RiskMon would practically help raise awareness of security and privacy problems and lower the sophistication required for concerned users to better understand the risks of third-party mobile applications. This paper makes the following contributions:
The remainder of this paper proceeds as follows. Section 2 provides the motivation and problem description of this paper. Section 3 provides a high-level overview of the RiskMon framework and system design by illustrating each stage of automated risk assessment. Section 4 presents prototype implementation and evaluation of our framework. Section 5 discusses the limitations of our approach. Section 6 describes related work. Section 7 concludes the paper.
2. MOTIVATION AND BACKGROUND TECHNOLOGIES
Users are concerned about security and privacy issues on mobile devices. However, in most cases they are not aware of the issues unless highlighted. Although Apple’s mandatory application review process [1] and Google Bouncer [25] strive to mitigate misbehaving applications, users are still responsible for defending themselves.
2.1 Use Cases and Threat Model
A continuous and automated risk assessment framework enhances a number of use cases in the current mobile application ecosystems. In general, such a framework improves user experience of security features and promotes understanding of the risks of mobile applications. This enables more users to discover misbehaving applications and possibly write negative reviews, thereby alerting and protecting other users. In addition, it complements static and dynamic analysis by helping security analysts justify whether behaviors are appropriate. This could be applied in both official and alternative application markets as a pre-screening mechanism to select suspicious applications for further analysis. Alternatively, a developer can evaluate her applications against those of her competitors and improve security practices if necessary. For the purposes of this paper we consider the generic scenario where a user assesses her installed applications.
Applications, as long as they are not on users’ devices, do not incur any substantial risk. Once an application is installed, it starts interacting with the operating system and other applications. While the application accesses sensitive resources, it gradually builds a big picture of the system as well as the user. Each access, such as calling an API, returns a tiny fraction of the picture and incurs a small amount of risk. Once the picture is finished, it may contain a user’s personal identities (e.g. contacts), device identities (e.g. manufacturer) and context identities (e.g. locations, WiFi SSIDs). Since risk assessment at the pre-installation stage does not address such threats on users’ devices, we aim to provide continuous risk assessment for normal users.
2.2 Risk Assessment of Mobile Applications
Recent work has proposed mechanisms to extract risk signals from meta information on application markets such as permissions [16, 28, 33, 36], ratings [9, 10], and application descriptions [26]. Their limitation is that such information is fuzzy and fails to provide fine-grained information about how and when sensitive resources are used. For example, an application may stay in the background and keep probing a user’s locations and surroundings. Moreover, a malicious application with split personalities [5] can evade screening mechanisms of application markets. We argue that users deserve the right to understand what is happening on their own devices. Thus, continuously revealing runtime behaviors plays a vital role as a necessary line of defense against rogue applications.
Previous research concerning applications’ runtime behaviors specifies a set of risk assessment heuristics tailored to their specific problems. For example, TaintDroid [14] considers a case in which sensitive data is transmitted over the network. DroidRanger [40] and RiskRanker [20] assume that dynamically loaded code is a potential sign of malware. While these techniques provide valuable insights about run-
Figure 1: RiskMon Architecture for Android (labels shown: Android Applications, Device, Google Play, User, API Traces, Meta Information, Security Requirements, Application Intelligence Aggregator, Baseline Learner, Risk Meter)
scheme to capture various intelligences about applications. This provides a well-founded base for measuring the “distance” between two API calls in the space of runtime behaviors.
Simplified security requirement communication: It is a challenging task for users to specify security requirements for security tools. To tackle this problem, RiskMon adopts a simple heuristic that allows users to communicate security requirements through their coarse expectations. Although this reduces the burden on the user, we cannot entirely eliminate it. We note that acquiring a user’s expectations is necessary since each user has diverse preferences on the same application. For instance, users of the Facebook application may have disparate expectations for controlling their location and camera utilities.
Intuitive risk representation: The way in which risk is presented significantly influences a user’s perception of and decisions about risky applications. A counterexample would be standalone risk scores, such as a risk indicator saying “Facebook incurs 90 units of risk” without proper explanation. As Peng et al. noted in [28], “it is more effective to present comparative risk information”. Inspired by their approach, RiskMon presents a ranking of applications so that a user can compare the potential loss of using an application with other applications. In addition, the user can view the risk composition of an application as supporting evidence.
Iterative risk management: Risk assessment is an ongoing iterative process. As applications get upgraded and bring more functionalities, they introduce new risks that should be measured. To this end, RiskMon should continuously monitor installed applications and update the risk assessment baseline periodically. Moreover, users need to provide their feedback to RiskMon by adding or revising their security requirements.
We now present our risk assessment framework. Figure 1 depicts the RiskMon architecture for Android. Our framework consists of three components: an application intelligence aggregator, a baseline learner, and a risk meter.
The application intelligence aggregator compiles a dataset from API traces collected on a user’s device and meta information crawled from application markets. API traces cover an application’s interactions with other parts of the system via API calls and callbacks. To complement API traces with contextual information, RiskMon uses meta information on application markets such as ratings, number of downloads and category, which provide a quantitative representation of applications’ reputation and intended core functionalities. The baseline learner combines a user’s coarse expectations and aggregated intelligences of her trusted applications to generate a training set. Afterwards, the baseline learner applies a machine-learned ranking algorithm to learn a risk assessment baseline. Then the risk meter measures how much an application’s behaviors deviate from the baseline. Using the deviation to provide risk information, the risk meter ranks a user’s installed applications by their cumulative risks and presents the ranking to the user in an intuitive way. The remainder of this section describes each component in detail.
3.1 Application Intelligence Aggregator
This component aggregates intelligences about a user’s installed applications, including their runtime behaviors and contextual information. As RiskMon monitors runtime behaviors by interposing Binder IPC, we propose a set of features for API traces tailored to the peculiarity of Binder. Also, we seek contextual information from application markets and propose corresponding features to represent and characterize them. The proposed features build a space of application intelligences and enable subsequent baseline generation and risk measurement. Unless explicitly specified, all features are normalized to [0,1] so that each of them contributes proportionally.
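The text does not state which normalization is used. As a minimal sketch, assuming plain min-max scaling, a numeric feature column could be mapped to [0,1] as follows; the function name and the sample parcel lengths are hypothetical.

```python
def min_max_normalize(values):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical parcel lengths (bytes) observed across several transactions.
parcel_lengths = [64, 128, 4096, 512]
normalized = min_max_normalize(parcel_lengths)
```

With every feature on the same scale, no single feature dominates a distance or inner product merely because of its unit.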
3.1.1 Features for API Traces
Android applications frequently use APIs to interact with system services. Considering that using most APIs does not require any permission, we assume that resources protected by at least one permission are a user’s assets.
We are interested in runtime behaviors, i.e. Binder transactions, that are used by APIs to reach the assets. However, APIs do not carry information about Binder transactions. To bridge this gap, we adopt existing work [4, 17] to provide mappings from permissions to APIs. Meanwhile, we analyzed the interface definitions of Android system services and core libraries to generate a mapping from APIs to Binder transactions. As a result, we extracted 1,003 permission-protected APIs, each of which corresponds to one type of Binder transaction. Each type of Binder transaction is identified by the corresponding system service, direction of control flow, and a command code unique to the service. For example, an API named requestLocationUpdate is identified as Binder IPC transaction (LocationManager, callback, 1).
We attempt to represent a Binder transaction with its internal properties and contents. For a specific Binder transaction between an application and a system service, we are interested in its type so as to identify the corresponding asset. We also need to know the direction of control flow to determine who initiates the transaction. As users trust the system services more than applications, RiskMon should differentiate Binder transactions initiated by applications from those initiated by system services. Thus, internal properties are represented with the following features:
Note that we use 1,003 boolean features to represent the type of Binder transactions instead of using one integer value. This is because Binder transactions are independent from each other, and the Binder command codes are simply nominal values. By using the array of 1,003 boolean values, the distances between any two Binder transaction types are set to the same value, which is important for our learning algorithm (Section 3.2.3). In terms of contents, parcels in Binder transactions are unstructured and highly optimized, and it is hard to restore the original data objects without implementation details of the sender and recipient. Therefore, we use length as one representative feature of a parcel. A motivating example is accesses on contacts. From the length of a parcel we can infer whether an application is reading a single entry or dumping the entire contacts database. Thus, we propose the following two features for parcels:
3.1.2 Features for Meta Information
Although meta information on application markets cannot describe applications’ runtime behaviors, it is still viable to use such information as contextual properties that capture users’ and developers’ opinions and complement runtime behavior information. In terms of representing the opinions of users, we use the following features in correspondence with their counterparts of meta information on application markets:
These three features capture an application’s popularity and reputation. The first two features are similar to number of views and comments in online social networks. Recent studies [37] demonstrated that online social networks and crowd-sourcing systems expose a long-tailed distribution. Therefore, we assume they follow the same distribution and use the logarithmic values.
(^1) Number of installs is specified with exponentially increasing ranges: 1+, 5+, ..., 1K+, 5K+, ..., 1M+, 5M+.
Figure 2: SOM Representation of 13 Categories
We emphasize that we do not attempt to extract risk signals from these features. Instead, we adopt these features to capture the underlying patterns of a user’s trusted applications as specified by the user and apply the patterns for the subsequent risk assessment. Next, we propose a feature to capture the developer’s opinion:
Google Play uses an application’s category to describe its core functionalities (e.g. “Communication”). As of this writing, Google Play provides 27 category types. We choose Self-Organizing Map (SOM) to give a 2-dimensional representation of categories. Barrera et al. [6] demonstrated that SOM can produce a 2-dimensional, discretized representation of permissions requested by different categories of Android applications. Categories in which applications request similar permissions are clustered together. Therefore we use the x and y coordinates in the map to represent categories. Figure 2 depicts the coordinates of 13 categories as an example. Some categories clearly bear underlying similarities, such as “Entertainment”, “Media and Video” and “Music and Audio” in the center of the figure^2.
Clearly an unscrupulous developer can claim an irrelevant category to disguise an application’s intended core functionalities. However, a user can easily notice the inconsistencies and remove such applications. In addition, falsifying an application’s meta information violates the terms of the application market’s developer policies and may lead to immediate takedown.
Finally, based on the scheme defined by these features, the application intelligence aggregator generates a dataset consisting of feature vectors extracted from API traces and meta information of each installed application.
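To make the feature scheme of this section concrete, the following is a minimal sketch of how one feature vector might be assembled. It assumes a reduced number of transaction types (5 instead of 1,003) and hypothetical field names and values; it is not the prototype's actual data layout. It also checks the property noted in Section 3.1.1 that a boolean (one-hot) encoding places any two transaction types at the same distance.

```python
import math

# Illustrative size only; the paper uses 1,003 transaction types.
NUM_TYPES = 5


def one_hot(index, size):
    """Boolean (0/1) encoding of a nominal value such as a transaction type."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def log_scale(count, max_count):
    """Log-scale a long-tailed count, then normalize it to [0, 1]."""
    return math.log1p(count) / math.log1p(max_count)


def euclidean(a, b):
    """Euclidean distance between two equally sized vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def feature_vector(type_index, initiated_by_app, parcel_len_norm,
                   installs, max_installs, som_x, som_y):
    """Concatenate API-trace features and meta-information features."""
    return (one_hot(type_index, NUM_TYPES)
            + [1.0 if initiated_by_app else 0.0, parcel_len_norm]
            + [log_scale(installs, max_installs), som_x, som_y])


# Any two distinct transaction types are equally far apart.
d01 = euclidean(one_hot(0, NUM_TYPES), one_hot(1, NUM_TYPES))
d04 = euclidean(one_hot(0, NUM_TYPES), one_hot(4, NUM_TYPES))

# Hypothetical transaction: type 2, initiated by the application.
x = feature_vector(2, True, 0.25, 1_000_000, 5_000_000, 0.4, 0.6)
```

Encoding the type as a single integer would instead make type 0 appear closer to type 1 than to type 4, an ordering that the nominal command codes do not carry.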
3.2 Baseline Learner
The baseline learner is the core module of RiskMon. It takes two types of inputs, which are a user’s expectations and feature vectors extracted by the application intelligence aggregator. Then the baseline learner generates a risk assessment baseline which is represented as a predictive model.
3.2.1 Acquiring Security Requirements
It is challenging for most users to express their security requirements accurately. We aim to find an approach that
(^2) For more details on SOM, please refer to [6].
Support Vector Machine (SVM) solver to classify the order of pairs of objects. Next we explain how we apply RSVM to learn a risk assessment baseline. We assume that a set of ranking functions $f \in F$ exists and satisfies the following:
\[ x_i \prec x_j \iff f(x_i) < f(x_j), \tag{1} \]
where $\prec$ denotes a preferential relationship of risks. In the simplest form of RSVM, we assume that $f$ is a linear function:
\[ f_w(x) = \langle w, x \rangle, \tag{2} \]
where $w$ is a weight vector, and $\langle \cdot, \cdot \rangle$ denotes the inner product. Combining (1) and (2), we have the following:
\[ x_i \prec x_j \iff \langle w, x_i - x_j \rangle < 0. \tag{3} \]
Note that $x_i - x_j$ is a new vector that expresses the relation $x_i \prec x_j$ between $x_i$ and $x_j$. Given the training set $T$, we create a new training set $T'$ by assigning either a positive label $z = +1$ or a negative label $z = -1$ to each pair $(x_i, x_j)$:
\[ (x_i, x_j) : z_{i,j} = \begin{cases} +1 & \text{if } r_i > r_j \\ -1 & \text{if } r_i < r_j \end{cases} \qquad \forall (x_i, r_i), (x_j, r_j) \in T \tag{4} \]
In order to select a ranking function $f$ that fits the training set $T'$, we construct the SVM model to solve the following quadratic optimization problem:
\[ \begin{aligned} \underset{w}{\text{minimize}} \quad & \tfrac{1}{2}\, w \cdot w + C \sum \xi_{i,j} \\ \text{subject to} \quad & \forall (x_i, x_j) \in T' : z_{i,j} \langle w, x_i - x_j \rangle \geq 1 - \xi_{i,j} \\ & \forall i \forall j : \xi_{i,j} > 0 \end{aligned} \tag{5} \]
Denoting $w^*$ as the weight vector generated by solving (5), we define the risk scoring function $f_{w^*}$ for assigning risk scores to the feature vectors in the application intelligence dataset:
\[ f_{w^*}(x) = \langle w^*, x \rangle \tag{6} \]
For any $x \in X$, the risk scoring function measures its projection onto $w^*$, or the distance to a hyperplane whose normal vector is $w^*$. Thus, the hyperplane is indeed the risk assessment baseline.
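The pairwise construction and the linear scoring function above can be illustrated with a small sketch. It assumes a linear kernel and a plain batch subgradient solver written for illustration; the prototype described in Section 4.1 relies on SVMLight with a radial basis function kernel instead. The toy training set, hyperparameters, and function names are hypothetical.

```python
from itertools import combinations


def dot(a, b):
    """Inner product <a, b>."""
    return sum(x * y for x, y in zip(a, b))


def pairwise_transform(training_set):
    """Build T' from T: a difference vector and a label for each pair."""
    pairs = []
    for (xi, ri), (xj, rj) in combinations(training_set, 2):
        if ri == rj:
            continue  # ties express no preference
        diff = [a - b for a, b in zip(xi, xj)]
        pairs.append((diff, 1 if ri > rj else -1))
    return pairs


def train_ranking_svm(pairs, c=1.0, epochs=500, lr=0.01):
    """Minimize 0.5*w.w + C*sum(hinge losses) by batch subgradient descent."""
    w = [0.0] * len(pairs[0][0])
    for _ in range(epochs):
        grad = list(w)  # gradient of the 0.5*w.w regularizer
        for diff, z in pairs:
            if z * dot(w, diff) < 1:  # margin violated, hinge term is active
                grad = [g - c * z * d for g, d in zip(grad, diff)]
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w


# Toy training set T: (feature vector, rank r); a higher rank means riskier.
training_set = [([0.1, 0.9], 1), ([0.5, 0.5], 2), ([0.9, 0.2], 3)]
pairs = pairwise_transform(training_set)
w_star = train_ranking_svm(pairs)
scores = [dot(w_star, x) for x, _ in training_set]
```

In this toy run the learned scores increase with the rank `r`, so the ordering encoded in the training set is preserved by the scoring function.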
3.3 Risk Meter
The risk meter measures the risks incurred by each installed application, including those that are trusted by the user. Note that (6) gives a signed distance. We use the absolute value to represent the deviation and risk. The risks incurred by an application $a_i$ are the cumulative risks of its runtime behaviors:
\[ \sum_{x \in D_{a_i}} |f_{w^*}(x)| \tag{7} \]
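As a minimal sketch of (7), assuming a linear scoring function and hypothetical weights and feature vectors, the cumulative risk of each application and the resulting ranking could be computed as follows.

```python
def risk_score(w, x):
    """Signed score <w, x> of one behavior under the learned baseline."""
    return sum(wi * xi for wi, xi in zip(w, x))


def cumulative_risk(w, feature_vectors):
    """Sum of absolute scores over an application's observed behaviors."""
    return sum(abs(risk_score(w, x)) for x in feature_vectors)


def rank_applications(w, dataset):
    """Return (app, risk) pairs sorted from most to least risky."""
    risks = {app: cumulative_risk(w, vecs) for app, vecs in dataset.items()}
    return sorted(risks.items(), key=lambda item: item[1], reverse=True)


# Hypothetical weight vector and per-application feature vectors.
w_star = [1.0, -2.0]
dataset = {
    "app_a": [[1.0, 0.0], [0.0, 1.0]],  # |1.0| + |-2.0| = 3.0
    "app_b": [[0.5, 0.5]],              # |0.5 - 1.0| = 0.5
}
ranking = rank_applications(w_star, dataset)
```

Because the absolute value is taken per behavior, deviations on either side of the baseline add up rather than cancel out.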
Another goal of the risk meter is to provide supporting evidence to end-users. To this end, it presents the measured risks at three levels of granularity.
Application: In the simplest form, the risk meter presents a ranking of installed applications by their risks as a bar chart. The X axis indicates the applications and the Y axis indicates the risks. A user can trust an application if it is less risky than her trusted ones. In contrast, an application that is significantly risky can also draw a user’s attention. Note that the risk meter does not provide any technical explanation at this level.
Permission group: The ranking of applications may sometimes seem unconvincing to users. In such a case, the risk meter can provide risk composition by permission groups, which is represented as a pie chart. The pie chart intuitively reveals the proportion of the risks incurred by the core functionalities of an application. As users have basic knowledge of permission groups when they specify security requirements, they should be able to interpret the risk composition correctly.
API calls and callbacks: The evidence presented at this level is intended for experienced security analysts who are familiar with the security mechanisms under the hood of Android. This is the raw data generated by the risk scoring function. An analyst can inspect values of features to reconstruct the semantic view of runtime behaviors.
Moreover, RiskMon allows a user to establish and revise her security requirements iteratively. RiskMon may generate biased or unconvincing evidence, as a user may not have clear and accurate security requirements at the very beginning of using RiskMon. Thus, a user can provide her feedback by adjusting her security requirements and/or adding more trusted applications. RiskMon also periodically updates the risk assessment baseline for observed new runtime behaviors. All of these enable RiskMon to approximate an optimum risk assessment baseline to help users make better decisions.
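The permission-group view can be sketched as a simple aggregation. The example below assumes that each scored behavior is already labeled with its permission group; the data layout, function name, and scores are hypothetical.

```python
from collections import defaultdict


def risk_composition(scored_behaviors):
    """Map each permission group to its share of an application's total risk.

    scored_behaviors: iterable of (permission_group, signed_score) pairs.
    """
    totals = defaultdict(float)
    for group, score in scored_behaviors:
        totals[group] += abs(score)
    overall = sum(totals.values())
    if overall == 0:
        return {group: 0.0 for group in totals}
    return {group: value / overall for group, value in totals.items()}


# Hypothetical scored behaviors for one application.
behaviors = [("LOCATION", 1.5), ("NETWORK", -0.5), ("LOCATION", -2.0)]
shares = risk_composition(behaviors)
```

The resulting shares sum to one and correspond to the slices of the pie chart described above.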
4. IMPLEMENTATION AND EVALUATION
In this section we first discuss a proof-of-concept implementation of RiskMon. Then, we present the results of our online user study followed by two case studies. We conclude our evaluation with the usability and performance of our system.
4.1 Implementation and Experimental Setup
We implemented a proof-of-concept prototype of RiskMon on the Android mobile platform. For continuous monitoring, we implemented a reference monitor for Binder IPC by placing hooks inside the Binder userspace library. The hooks tap into Binder transactions and log the parcels with zlog^3, a high-performance logging library. In addition, we implemented automated risk assessment based on SVMLight^4 and its built-in Gaussian radial basis function kernel.
We designed and conducted a user study to evaluate the practicality and usability of RiskMon. We hand-picked 10 applications (Table 2) that were among the most downloaded on Google Play in their respective categories. We assumed that all the participants trust them. Then we used participants’ security requirements for the 10 applications and their application intelligences to generate the baselines. We also randomly selected 4 target applications from the Top Charts of Google Play to calculate their risks based on the generated baselines, including: a) CNN App for Android Phones (abbreviated as CNN); b) MXPlayer; c) Pandora Internet Radio (abbreviated as Pandora); and d) Walmart. For both trusted (10) and target (4) applications, we collected their one-day runtime behaviors on a Samsung Galaxy Nexus phone.
In addition, we developed a web-based system that acquires a participant’s security requirements, feeds them to RiskMon and presents the results calculated by RiskMon to the participant. A participant was first presented with a tutorial page that explains how to specify relevancy levels as her security requirements. Then she was required to set relevancy levels for each permission group requested by each trusted application after reading the application’s descriptions on Google Play. Afterwards, RiskMon generated a risk assessment baseline for the participant based on her inputs and runtime behaviors of the 10 trusted applications. Then RiskMon applied the baseline on each of the 14 applications, and displayed a bar chart that illustrates a ranking of the 14 applications by their measured cumulative risks. Finally, an exit survey was presented to collect the participant’s perceived usability of RiskMon.
Our study protocol was reviewed by our institution’s IRB. We recruited participants through university mailing lists and Amazon MTurk. 33 users participated in the study, and Table 1 lists their demographics.
(^3) https://github.com/HardySimpson/zlog
(^4) http://svmlight.joachims.org/

Table 1: Demographics of the Participants
Category    Group                                  # of users
Gender      Male                                   29 (87.9%)
            Female                                 4 (12.1%)
Age         18-24                                  15 (45.5%)
            25-34                                  16 (48.5%)
            35-54                                  2 (6.1%)
Education   Graduated high school or equivalent    3 (9.1%)
            Some college, no degree                6 (18.2%)
            Associate degree                       1 (3.0%)
            Bachelor’s degree                      11 (33.3%)
            Post-graduate degree                   12 (36.4%)

Table 2: Applications Assumed to be Trusted by the Participants in the User Study
Application                  Category
Amazon Mobile                Shopping
Bejeweled Blitz              Game
Chase Mobile                 Finance
Dictionary.com               Books & Reference
Dropbox                      Productivity
Google+                      Social
Google Play Movies & TV      Media & Video
Hangouts (replaces Talk)     Communication
Movies by Flixster           Entertainment
Yelp                         Travel & Local
4.2 Empirical Results
4.2.1 Security Requirements
Among the trusted applications in our user study (Table 2), we highlight the results of Chase Mobile and Dropbox because they both request some ambiguous permission groups that are hard for users to justify. Figure 4 shows the average relevancy levels set by the participants for each permission group requested by Chase Mobile and Dropbox. The error bars indicate the standard deviation.
Figure 4: Average Relevancy Levels Specified by the Participants for (a) Chase Mobile and (b) Dropbox
Chase Mobile is a banking application with functionalities like depositing a check by taking a picture and locating the nearest branches. Apparently NETWORK is more relevant than others, as participants agree that Chase Mobile needs to access the Internet. Even though Chase Mobile uses LOCATION to find nearby bank branches and CAMERA to deposit checks, both LOCATION and CAMERA have lower relevancy levels than NETWORK. We believe this is because some participants have no experience using such functionalities, but the averages are still higher than neutral. We can also observe that SOCIAL_INFO falls below “neutral”, showing participants’ concerns about why Chase Mobile uses such information.
Dropbox is an online file storage and synchronization service. From its results, we identified an interesting permission group, APP_INFO, whose description in Android’s official document is: group of permissions that are related to the other applications installed on the system. This authoritative description does not provide any cue of negative impacts, which leads to user confusion, as we can see that APP_INFO has the largest standard deviation. STORAGE, SYNC_SETTINGS and ACCOUNTS are all above “probably relevant”, possibly due to their self-descriptive names that are semantically close to Dropbox’s core functionalities.
Moreover, we noticed that the participants tend to set higher relevancy levels for self-descriptive permission groups, while they tend to be conservative for other permission groups. We note that this does not affect RiskMon in acquiring a user’s security requirements, because RiskMon captures the precedence of one permission group over another. Thus, the least relevant permission group (e.g. SOCIAL_INFO of Chase Mobile) always gets the highest risk scores for both trusted and distrusted applications.
Figure 5: Average Cumulative Risk Scores Measured by the Participants’ Risk Assessment Baselines
Table 5: Microbenchmark Results
Benchmark                        Average (s)    Standard Deviation (s)
Feature extraction               8.27           0.
Baseline generation (10 apps)    289.56         235.
Risk measurement (per app)       0.55           0.
4.5 System Overhead
To understand the performance overhead of RiskMon, we performed several microbenchmarks. The experiments were performed on a Samsung Galaxy Nexus phone with a 1.2GHz dual-core ARM CPU. The phone runs Android v4.2.2, and RiskMon is built on the same version. Table 5 shows the average results.
Feature extraction: The application intelligence aggregator extracted feature vectors from the raw API traces of 33,368,458 IPC transactions generated by 14 applications in one day. We measured the CPU time used by parsing the API traces and generating the feature vectors. The average time is 8.27 seconds, which is acceptable on a resource-constrained mobile device.
Baseline generation: We ran baseline generation based on the input acquired in the online user study. The processing time varies for different participants, while the average time is approximately 289.56 seconds due to the computational complexity of the radial basis function kernel of SVMLight.
Risk measurement: Applying the risk assessment baseline is much faster than baseline generation. We measured the time taken to apply a risk assessment baseline on 14 applications. The average time per application is 0.55 seconds, which is imperceptible and demonstrates the feasibility of repeated risk assessment.
Finally, we anecdotally observed that it took 5-10 minutes for the participants to set relevancy levels for 10 applications. This usability overhead is acceptable compared to the lifetime of a risk assessment baseline.
5. DISCUSSION
To capture actual risks incurred by applications used by a user, RiskMon fundamentally requires running them on the user’s device. We note that 48.5% of the respondents in our user study claimed that they often test drive applications on their devices. RiskMon itself does not detect or prevent sensitive data from leaving users’ devices. We would recommend that users use on-device isolation mechanisms (e.g. Samsung KNOX^6) or data shadowing (e.g. [22]). However, running untrusted applications on trusted operating systems is still far from perfect.

RiskMon requires users to specify security requirements through permission groups. While most of the frequently requested permission groups are self-descriptive (e.g. LOCATION and CAMERA), some are ambiguous (e.g. APP_INFO) and contain low-level APIs only known to developers. Although we identify permission groups as an appropriate trade-off between granularity and usability, we admit that permission groups are still a partial artifact in representing sensitive resources for users. Note that we choose permission groups only to demonstrate the feasibility of our approach to communicating security requirements. As our future work, we plan to develop a systematic and intuitive taxonomy of sensitive resources on mobile devices to facilitate more effective requirement communication. Moreover, generating a risk assessment baseline is a compute-intensive task that does not quite fit resource-constrained mobile devices. Thus, we plan to offload such a task to trusted third parties or users’ public or private clouds in the future.

Our current implementation of RiskMon does not address: (1) interactions between third-party applications; and (2) interactions that do not utilize Binder. These gaps are potential attack vectors that can bypass RiskMon. Unauthorized accesses on resources of third-party applications [11] might be possible because such resources are not protected by system permissions. Also, two or more malicious applications can collude via local sockets or covert channels and evade the Binder-centric reference monitor in RiskMon. For our future work, we will extend our framework to maximize the coverage of attack vectors in our approach.
^6 http://www.samsung.com/global/business/mobile/solution/security/samsung-knox#con

6. RELATED WORK

Analysis of meta information: Meta information available on application markets provides general descriptions of applications. Recent work has proposed techniques to distill risk signals from it. Kirin [16] provides a conservative certification technique that enforces policies to mitigate applications with risky permission combinations at install time. Sarma et al. [33] propose to analyze permissions alongside application categories in two large application datasets. Peng et al. [28] use probabilistic generative models to generate risk scoring schemes that assign comparative risk scores to applications based on their requested permissions. In addition to analysis of permissions, Chia et al. [10] and Chen et al. [9] performed large-scale studies on application popularity, user ratings and external community ratings. In particular, Pandita et al. proposed WHYPER [26], which automatically infers an application’s necessary permissions from its natural language description. However, meta information does not accurately describe the actual behaviors of applications. RiskMon uses meta information to provide contextual information so as to complement the analysis of runtime behaviors for risk assessment.

Static and dynamic analysis: Analysis of the execution semantics of applications, such as static analysis of code and dynamic analysis of runtime behaviors, can reveal how applications use sensitive information. Stowaway [17] extracts API calls from a compiled Android application and reveals its least privilege set of permissions. Enck et al. [15] developed a decompiler to uncover usage of phone identifiers and locations. Pegasus [8] checks temporal properties of API calls and detects API calls made without explicit user consent. TaintDroid [14] uses dynamic information flow tracking to detect sensitive data leaking to the network. Regarding malware analysis, DroidRanger [40] and RiskRanker [20] are systematic and comprehensive approaches that combine both static and dynamic analysis to detect dangerous behaviors. DroidScope [39] reconstructs semantic views to collect detailed execution traces of applications. These works focus on fundamental challenges for assessing actual risks incurred by applications. However, they do not provide a baseline to capture the appropriate behaviors under diverse contexts of different applications.
Thus, their approaches are intended more for security analysts than for end users.

Mandatory access control frameworks: RiskMon includes a lightweight reference monitor for Binder IPC. While it monitors IPC transactions for risk assessment, several frameworks mediate IPC channels as part of their approaches to support enhanced mandatory access control (MAC). SEAndroid [34] brings SELinux kernel-level MAC to Android. It adds new hooks in the Binder device driver to address Binder IPC. Quire [13] provides IPC provenance by propagating verifiable signatures along IPC chains so as to mitigate confused deputy attacks. Aurasium [38] uses libc interposition to efficiently monitor IPC transactions without modifying the Android platform. FlaskDroid [7] provides flexible MAC on multiple layers, which is tailored to the peculiarities of the Android system. Along these lines, RiskMon captures Binder transactions with a fine-grained scheme to facilitate risk assessment on applications’ runtime behaviors.
7. CONCLUSION
In this paper, we have presented RiskMon, which continuously and automatically measures risks incurred by a user’s installed applications. RiskMon leverages machine-learned ranking to generate a risk assessment baseline from a user’s coarse expectations and the runtime behaviors of her trusted applications. We have also described a proof-of-concept implementation of RiskMon, along with the extensive evaluation results of our approach.
8. ACKNOWLEDGEMENTS

This work was supported in part by the NSF grant (CNS-0916688). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies. We would also like to thank the anonymous reviewers for their valuable comments that helped improve the presentation of this paper.

9. REFERENCES
[1] App review - apple developer. https://developer.apple.com/support/appstore/app-review/, 2013.
[2] C. Alberts, A. Dorofee, J. Stevens, and C. Woody. Introduction to the octave approach. Pittsburgh, PA, Carnegie Mellon University, 2003.
[3] C. J. Alberts and A. Dorofee. Managing information security risks: the OCTAVE approach. Addison-Wesley Longman Publishing Co., Inc., 2002.
[4] K. W. Y. Au, Y. F. Zhou, Z. Huang, and D. Lie. Pscout: analyzing the android permission specification. In Proceedings of the 2012 ACM conference on Computer and communications security, pages 217–228. ACM, 2012.
[5] D. Balzarotti, M. Cova, C. Karlberger, E. Kirda, C. Kruegel, and G. Vigna. Efficient detection of split personalities in malware. In Proceedings of the 19th Annual Network and Distributed System Security Symposium, 2010.
[6] D. Barrera, H. G. Kayacik, P. C. van Oorschot, and A. Somayaji. A methodology for empirical analysis of permission-based security models and its application to android. In Proceedings of the 17th ACM conference on Computer and communications security, pages 73–84. ACM, 2010.
[7] S. Bugiel, S. Heuser, and A.-R. Sadeghi. Flexible and fine-grained mandatory access control on android for diverse security and privacy policies. In 22nd USENIX Security Symposium (USENIX Security 2013). USENIX, 2013.
[8] K. Z. Chen, N. Johnson, V. D’Silva, S. Dai, K. MacNamara, T. Magrino, E. Wu, M. Rinard, and D. Song. Contextual policy enforcement in android applications with permission event graphs. 2013.
[9] Y. Chen, H. Xu, Y. Zhou, and S. Zhu. Is this app safe for children?: a comparison study of maturity ratings on android and ios applications. In Proceedings of the 22nd international conference on World Wide Web, pages 201–212. International World Wide Web Conferences Steering Committee, 2013.
[10] P. H. Chia, Y. Yamamoto, and N. Asokan. Is this app safe?: a large scale study on application permissions and risk signals. In Proceedings of the 21st international conference on World Wide Web, pages 311–320. ACM, 2012.
[11] E. Chin, A. P. Felt, K. Greenwood, and D. Wagner. Analyzing inter-application communication in android. In Proceedings of the 9th international conference on Mobile systems, applications, and services, pages 239–252. ACM, 2011.
[12] E. Chin, A. P. Felt, V. Sekar, and D. Wagner. Measuring user confidence in smartphone security and