Detecting fraud applications using sentiment analysis

A basic detailed report of the entire project. Data Mining techniques and also Sentiment analysis.

Typology: Study Guides, Projects, Research

2019/2020

Uploaded on 05/03/2020

lazyrover
Discovery of Ranking Fraud for Mobile Apps
Abstract
Ranking fraud in the mobile App market refers to fraudulent or deceptive
activities whose purpose is to bump Apps up the popularity list.
Indeed, it is becoming more and more frequent for App developers to use shady
means, such as inflating their Apps’ sales or posting phony App ratings, to commit
ranking fraud. While the importance of preventing ranking fraud has been widely
recognized, there is limited understanding and research in this area. To this end,
in this paper, we provide a holistic view of ranking fraud and propose a ranking
fraud detection system for mobile Apps.
Specifically, we investigate two types of evidences, ranking based evidences and
rating based evidences, by modeling Apps’ ranking and rating behaviors through
statistical hypothesis tests. In addition, we propose an optimization based
aggregation method to integrate all the evidences for fraud detection. Finally, we
evaluate the proposed system with real-world App data collected from
Apple’s App Store over a long time period. In the experiments, we validate the
effectiveness of the proposed system, and show the scalability of the detection
algorithm as well as some regularity of ranking fraud activities.

Introduction

Web spam refers to all forms of malicious manipulation of user-generated data so as to influence usage patterns of the data. The number of mobile Apps has grown at a breathtaking rate over the past few years. For example, as of the end of April 2013, there were more than 1.6 million Apps at Apple’s App Store and Google Play. To stimulate the development of mobile Apps, many App stores launched daily App leaderboards, which show the chart rankings of the most popular Apps. Indeed, the App leaderboard is one of the most important ways of promoting mobile Apps. A higher rank on the leaderboard usually leads to a huge number of downloads and millions of dollars in revenue. Therefore, App developers explore various means, such as advertising campaigns, to promote their Apps and have them ranked as high as possible on such leaderboards. Indeed, our careful observation reveals that mobile Apps are not always ranked high in the leaderboard, but only in some leading events, which form different leading sessions. Note that we will introduce both leading events and leading sessions in detail later. In other words, ranking fraud usually happens in these leading sessions. Therefore, detecting ranking fraud of mobile Apps is actually detecting ranking fraud within the leading sessions of mobile Apps.

Proposed System

Specifically, we first propose a simple yet effective algorithm to identify the leading sessions of each App based on its historical ranking records. Then, by analyzing Apps’ ranking behaviors, we find that fraudulent Apps often have different ranking patterns in each leading session compared with normal Apps. Thus, we characterize some fraud evidences from Apps’ historical ranking records and develop three functions to extract such ranking based fraud evidences. Nonetheless, the ranking based evidences can be affected by App developers’ reputation and by some legitimate marketing campaigns, such as a “limited-time discount”. As a result, it is not sufficient to use only ranking based evidences. Therefore, we further propose two types of fraud evidences based on Apps’ rating and review history, which reflect anomalous patterns in Apps’ historical rating and review records. In addition, we develop an unsupervised evidence-aggregation method to integrate these three types of evidences for evaluating the credibility of leading sessions of mobile Apps. Figure 1 shows the framework of our ranking fraud detection system.
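The leading-session identification step can be illustrated with a minimal sketch. The text here does not give the algorithm's details, so the two parameters below are assumptions made for illustration: `k_star` is a hypothetical rank threshold (an App counts as "leading" on days where its rank is at or above this position), and `max_gap` is a hypothetical number of days below which two neighbouring leading events are merged into one session. The function names are also illustrative.

```python
from typing import List, Tuple

Event = Tuple[int, int]  # (start_day, end_day), both inclusive


def find_leading_events(ranks: List[int], k_star: int) -> List[Event]:
    """Return maximal runs of consecutive days where rank <= k_star.

    `ranks[i]` is the App's chart position on day i (1 = top).
    `k_star` is an assumed threshold, not a value taken from the report.
    """
    events: List[Event] = []
    start = None
    for day, rank in enumerate(ranks):
        if rank <= k_star:
            if start is None:
                start = day  # a new leading event begins
        elif start is not None:
            events.append((start, day - 1))  # the event ended yesterday
            start = None
    if start is not None:  # event still running on the last day
        events.append((start, len(ranks) - 1))
    return events


def merge_into_sessions(events: List[Event], max_gap: int) -> List[List[Event]]:
    """Group neighbouring events into sessions.

    Two events join the same session when the next one starts fewer than
    `max_gap` days after the previous one ended (assumed merge rule).
    """
    sessions: List[List[Event]] = []
    for event in events:
        if sessions and event[0] - sessions[-1][-1][1] < max_gap:
            sessions[-1].append(event)
        else:
            sessions.append([event])
    return sessions


# Made-up daily ranks for one App; large numbers mean "far from the top".
daily_ranks = [500, 20, 10, 15, 400, 30, 25, 600, 600, 600, 600, 40]
leading_events = find_leading_events(daily_ranks, k_star=50)
leading_sessions = merge_into_sessions(leading_events, max_gap=3)
```

With these made-up ranks, the first two leading events are close enough to form one session, while the last one starts a separate session.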

Advantages of Proposed System

It is worth noting that all the evidences are extracted by modeling Apps’ ranking, rating and review behaviors through statistical hypothesis tests. The proposed framework is scalable and can be extended with other domain-generated evidences for ranking fraud detection. Finally, we evaluate the proposed system with real-world App data collected from Apple’s App Store over a long time period, i.e., more than two years. Experimental results show the effectiveness of the proposed system, the scalability of the detection algorithm, and some regularity of ranking fraud activities. By definition, a leading session is composed of several leading events. Therefore, we should first analyze the basic characteristics of leading events in order to extract fraud evidences. By analyzing the Apps’ historical ranking records, we observe that Apps’ ranking behaviors in a leading event always satisfy a specific ranking pattern, which consists of three different ranking phases, namely, the rising phase, the maintaining phase and the recession phase. Besides ratings, most App stores also allow users to write textual comments as App reviews. Such reviews can reflect the personal perceptions and usage experiences of existing users of particular mobile Apps. Indeed, review manipulation is one of the most important perspectives of App ranking fraud.
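To make the three-phase ranking pattern concrete, here is a minimal sketch that splits the daily ranks of one leading event into rising, maintaining and recession phases. The splitting rule is an assumption chosen for illustration (the maintaining phase is taken to span every day between the first and last time the rank is within `band` positions of the App's best rank); it is not presented as the authors' definition.

```python
from typing import Dict, List


def split_phases(ranks: List[int], band: int = 5) -> Dict[str, List[int]]:
    """Split one leading event into rising / maintaining / recession.

    `ranks` holds the chart position per day (1 = top). `band` is a
    hypothetical tolerance around the best rank reached in the event.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    best = min(ranks)  # smallest number = highest chart position
    near_peak = [day for day, rank in enumerate(ranks) if rank <= best + band]
    first, last = near_peak[0], near_peak[-1]
    return {
        "rising": ranks[:first],            # climbing towards the peak
        "maintaining": ranks[first:last + 1],  # staying near the peak
        "recession": ranks[last + 1:],      # dropping away again
    }


# Made-up ranks for a single leading event.
phases = split_phases([45, 30, 12, 5, 3, 4, 6, 20, 38], band=5)
```

Features such as the length of each phase or the steepness of the rise could then be computed per phase and compared across Apps.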

Rating Based Evidences:

The ranking based evidences are useful for ranking fraud detection. However, it is sometimes not sufficient to use only ranking based evidences. Specifically, after an App has been published, it can be rated by any user who has downloaded it. Indeed, user rating is one of the most important features of App advertisement. An App with a higher rating may attract more users to download it and can also be ranked higher in the leaderboard. Thus, rating manipulation is also an important perspective of ranking fraud. Intuitively, if an App has ranking fraud in a leading session s, the ratings during the time period of s may show anomalous patterns compared with its historical ratings, which can be used for constructing rating based evidences.
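The intuition of comparing a session's ratings with the historical ratings can be sketched as a simple one-sided z-style test. This is an illustrative stand-in under stated assumptions, not the specific hypothesis test used in the report: ratings are treated as roughly normal, the historical sample standard deviation is used as the spread, and the function name and sample data are made up.

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def rating_anomaly(historical: List[float], session: List[float]) -> Tuple[float, float]:
    """Return (z, p) for 'session ratings are higher than historical ones'.

    z measures how many standard errors the session mean lies above the
    historical mean; p is the one-sided normal tail probability.
    """
    if len(historical) < 2 or not session:
        raise ValueError("need >= 2 historical ratings and >= 1 session rating")
    sigma = stdev(historical)
    if sigma == 0:
        raise ValueError("historical ratings have no variance")
    standard_error = sigma / math.sqrt(len(session))
    z = (mean(session) - mean(historical)) / standard_error
    p = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal
    return z, p


# Made-up star ratings: long-term history versus one leading session.
history = [3, 4, 3, 2, 4, 3, 3, 4, 2, 3]
session_ratings = [5, 5, 5, 4, 5]
z_score, p_value = rating_anomaly(history, session_ratings)
```

A small `p_value` suggests the session's ratings are unusually high relative to the App's history, which is the kind of signal a rating based evidence would capture.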

Review Based Evidences:

Besides ratings, most App stores also allow users to write textual comments as App reviews. Such reviews can reflect the personal perceptions and usage experiences of existing users of particular mobile Apps. Indeed, review manipulation is one of the most important perspectives of App ranking fraud. Specifically, before downloading or purchasing a new mobile App, users often first read its historical reviews to ease their decision making, and a mobile App with more positive reviews may attract more users to download it. Therefore, imposters often post fake reviews in the leading sessions of a specific App in order to inflate its downloads and thus propel its ranking position in the leaderboard. Although some previous work on review spam detection has been reported in recent years, the problem of detecting local anomalies of reviews in leading sessions and capturing them as evidences for ranking fraud detection is still under-explored.

POS Tagging (Pre-processing):

A Part-Of-Speech Tagger (POS Tagger) is a piece of software that reads text in some language and assigns parts of speech to each word (and other token), such as noun, verb, adjective, etc., although generally computational applications use more fine-grained POS tags like 'noun-plural'. The tagger was originally written by Kristina Toutanova. Since that time, Dan Klein, Christopher Manning, William Morgan, Anna Rafferty, Michel Galley, and John Bauer have improved its speed, performance, usability, and support for other languages. The system requires Java 1.6+ to be installed. Depending on whether you're running 32 or 64 bit Java and the complexity of the tagger model, you'll need somewhere between 60 and 200 MB of memory to run a trained tagger (i.e., you may need to give java an option like java -mx200m). Plenty of memory is needed to train a tagger. It again depends on the complexity of the model but at least 1GB is usually needed, often more. Several downloads are available. The basic download contains two trained tagger models for English. The full download contains three trained English tagger models, an Arabic tagger model, a Chinese tagger model, and a German tagger model. Both versions include the same source and other required files. The tagger can be retrained on any language, given POS-annotated training text for the language.

Sentiment Analysis Using SentiWordNet:

SENTIWORDNET[8] is the result of the automatic annotation of all the synsets of WORDNET according to the notions of “positivity”, “negativity”, and “neutrality”. Each synset s is associated to three numerical scores Pos(s), Neg(s), and Obj(s) which indicate how positive, negative, and “objective” (i.e., neutral) the terms contained in the synset are. Different senses of the same term may thus have different opinion-related properties. For example, in SENTIWORDNET 1.0 the synset [estimable(J,3)] corresponding to the
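As a rough illustration of how Pos, Neg and Obj scores could be turned into a review-level polarity, the sketch below uses a tiny hand-made lexicon. Everything in it is an assumption for demonstration: the words and their scores are invented and are not real SENTIWORDNET entries, each word is given a single sense, Obj is derived as 1 - Pos - Neg, and averaging Pos minus Neg over matched words is just one simple aggregation choice.

```python
import re
from typing import Dict, Tuple

# Hypothetical (Pos, Neg) scores per word; NOT taken from SENTIWORDNET.
TOY_LEXICON: Dict[str, Tuple[float, float]] = {
    "great": (0.75, 0.0),
    "useful": (0.5, 0.0),
    "slow": (0.0, 0.5),
    "crashes": (0.0, 0.75),
    "fake": (0.0, 0.625),
}


def objectivity(pos: float, neg: float) -> float:
    """Obj score under the assumption that Pos + Neg + Obj = 1."""
    return 1.0 - pos - neg


def review_polarity(text: str) -> float:
    """Average (Pos - Neg) over the words of `text` found in the lexicon.

    Returns a value in [-1, 1]; 0.0 when no word is in the lexicon.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    scored = [TOY_LEXICON[token] for token in tokens if token in TOY_LEXICON]
    if not scored:
        return 0.0
    return sum(pos - neg for pos, neg in scored) / len(scored)


positive_review = review_polarity("Great app, very useful!")
negative_review = review_polarity("Slow and crashes")
```

In a fuller pipeline, the POS tags from the pre-processing step would help select the right sense of each word before looking up its scores.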

Hardware Specification

  • Intel processor IV and above
  • 4 GB RAM
  • 500 GB hard disk

SOFTWARE REQUIREMENTS:

  • Visual Studio 2015
  • SQL Server
  • Windows Operating System

phase is fulfilled. It also checks whether all the proposed functions work properly. This is done in two phases:

  • One before the integration, to see if all the unit components work properly
  • A second after the integration, to see if they still work properly and to check whether any functional compatibility issues arise

PERFORMANCE TESTING:

Expected Result

  • The client should be able to connect to the server without any problems.
  • Establishing the connection between the mobile device and the server should take minimal time.
  • The mobile device should be able to receive data from the server without interruption.
  • Information provided by the application should be correct and match the user's needs.

Observation

  • A connection can be established easily, provided that the server is on.
  • Connecting to the server takes time, as it uses an Internet connection.
  • Receiving data from the server takes time.
  • Information coming from the database is correct.

LOAD / STRESS TESTING:

Expected Result

  • Response time should be unaffected by the number of users.
  • Adding new clients should not make the server behave erratically.
  • Continuous use of the server by different clients should not slow the server down.
  • Response time should not degrade when there is congestion in the network.

Observation

  • The speed of transmission was fine even while new clients were being added.
  • The response of the server was satisfactory even with the introduction of new clients.
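A check like "response time should be unaffected by the number of users" can be approximated with a small harness that fires concurrent requests and records the average latency per client count. This is a sketch under assumptions: `fake_request` is a hypothetical stand-in that only sleeps, and in a real test it would be replaced by an actual call to the project's server; the client counts are arbitrary.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import mean
from typing import Callable, Dict, Iterable


def fake_request(client_id: int) -> float:
    """Stand-in for one client request; returns elapsed seconds."""
    start = time.perf_counter()
    time.sleep(0.01)  # placeholder for real network and server work
    return time.perf_counter() - start


def average_latency(num_clients: int,
                    request: Callable[[int], float] = fake_request) -> float:
    """Run `num_clients` requests concurrently and return the mean latency."""
    with ThreadPoolExecutor(max_workers=num_clients) as pool:
        latencies = list(pool.map(request, range(num_clients)))
    return mean(latencies)


def run_load_test(client_counts: Iterable[int] = (1, 5, 20)) -> Dict[int, float]:
    """Map each client count to the mean latency observed at that load."""
    return {count: average_latency(count) for count in client_counts}


results = run_load_test((1, 4))
```

Comparing the values in `results` across client counts shows whether latency grows as more clients are added.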

Step 1: Software Concept

The first step is to identify a need for the new system. This will include determining whether a business problem or opportunity exists, conducting a feasibility study to determine if the proposed solution is cost effective, and developing a project plan. This process may involve end users who come up with an idea for improving their work. Ideally, the process occurs in tandem with a review of the organization's strategic plan to ensure that IT is being used to help the organization achieve its strategic objectives. Management may need to approve concept ideas before any money is budgeted for their development.

Step 2: Requirements Analysis

Requirements analysis is the process of analyzing the information needs of the end users, the organizational environment, and any system presently being used, and of developing the functional requirements of a system that can meet the needs of the users. The requirements should also be recorded in a document, email, user interface storyboard, executable prototype, or some other form. The requirements documentation should be referred to throughout the rest of the system development process to ensure the developing project aligns with user needs and requirements. Professionals must involve end users in this process to ensure that the new system will function adequately and meet their needs and expectations.

Step 3: Architectural Design

After the requirements have been determined, the necessary specifications for the hardware, software, people, and data resources, and the information products that will satisfy the functional requirements of the proposed system, can be determined. The design will serve as a blueprint for the system and help detect problems before they are built into the final system. Professionals create the system design, but must review their work with the users to ensure the design meets users' needs.

Step 4: Coding and Debugging

Coding and debugging is the act of creating the final system. This step is done by the software developer.

Step 5: System Testing

The system must be tested to evaluate its actual functionality in relation to expected or intended functionality. Some other issues to consider during this stage are converting old data into the new system and training employees to use the new system. End users will be key in determining whether the developed system meets the intended requirements, and the extent to which the system is actually used.

Step 6: Maintenance

Inevitably the system will need maintenance. Software will definitely undergo change once it is delivered to the customer. There are many reasons for the change. Change could happen because of some unexpected input values into the system. In addition, changes in the system could directly affect the software operations. The software should

The model consists of six distinct stages, namely:

  1. In the requirements analysis phase: (a) the problem is specified along with the desired service objectives (goals); (b) the constraints are identified.
  2. In the specification phase, the system specification is produced from the detailed definitions of (a) and (b) above. This document should clearly define the product function.
  3. In the system and software design phase, the system specifications are translated into a software representation. The software engineer at this stage is concerned with:
     • Data structure
     • Software architecture
     • Algorithmic detail
     • Interface representations
     The hardware requirements are also determined at this stage, along with a picture of the overall system architecture. By the end of this stage the software engineer should be able to identify the relationship between the hardware, the software and the associated interfaces. Any faults in the specification should ideally not be passed ‘downstream’.
  4. In the implementation and testing phase, the designs are translated into the software domain.
     • Detailed documentation from the design phase can significantly reduce the coding effort.
     • Testing at this stage focuses on making sure that any errors are identified and that the software meets its required specification.
  5. In the integration and system testing phase, all the program units are integrated and tested to ensure that the complete system meets the software requirements. After this stage the software is delivered to the customer [Deliverable: the software product is delivered to the client for acceptance testing].
  6. The maintenance phase is usually the longest stage of the software life cycle. In this phase the software is updated to:
     • Meet changing customer needs
     • Adapt to changes in the external environment