







∗APA Research Center Isfahan University of Technology {hashemi, babaeizadeh, nowruzi, hadian, shahmoradi, biglarbeigi}@nsec.ir
Abstract—The dramatic growth of Information Technology in every organization has increased the number of computer security incidents in recent years. These incidents result in huge financial and reputational losses, even in small companies. Naturally, the demand for computer-related incident management has increased. Incident Handling is still a very complex and critical task that is mainly done by human expert teams. The cost of keeping such a team ready 24x7 is very high, especially in big organizations with large networks. Consequently, automated Incident Handling is greatly desired. However, the task involves many factors and depends heavily on human judgment, which makes it very challenging to automate. In this study, after a review of Incident Handling methods, a comprehensive workflow for semi-automated Incident Handling is proposed. The workflow is based on common principles in this area and combines automated processing units with expert teams in a way that minimizes the human effort required for Incident Handling.
Keywords: Incident Handling, Semi-Automated Incident Handling, Incident Management, Computer Security.
I. INTRODUCTION
With the appearance of the "Morris Worm" in 1988, organizations faced a new problem: preventing computer security incidents and responding to them [1]. To address this problem, every organization defines a set of disciplines under the title of "Incident Management". The most important discipline of Incident Management is Incident Handling, which is responding to an incident report as fast as possible [2].
Today, the security monitoring of growing networks is mostly performed using Intrusion Detection Systems (IDS) [3]. Looking for evidence of malicious behavior, these systems generate streams of alerts. A security team that tries to detect and analyze an incident faces two major problems: IDSs generate many false alarms, and they generate a large number of alerts for a single event [4]. Correlation techniques try to overcome these problems by reducing and fusing the alerts to produce more accurate and meaningful attack reports [5]. These technologies only detect security events; they provide very basic tools, or no means at all, for responding to them.
The Incident Handling phase currently depends on Computer Emergency Response Teams (CERTs), which face more incident reports every day than they can handle and cannot respond to all of them. To overcome this shortage, Incident Handling routines have been recommended that discard unimportant reports and enhance the performance of security teams. Well-known Incident Handling recommendations have been proposed by ENISA^1 [2][6], NIST^2 [7], CERT/CC [8][9], the SANS Institute [10], and ITU-T^3 [11][12]. All of these recommendations are designed for humans rather than machines and depend highly on the expertise of CERTs. Hence, the cost of Incident Handling grows dramatically, especially in large organizations, because an expert human team must be kept ready 24 × 7 [6]. A more automated Incident Handling system can reduce this cost.
However, Incident Handling is a very complex task that depends on many factors, such as the type of the incident, the status of the organization, business continuity, and the capability of the expert teams, all of which make automating the process very challenging. Furthermore, according to [7]: "... every incident response team relies on the expertise, judgment, and abilities of other teams, including management, information security, IT support, legal, public affairs, and facilities management.... "
This paper proposes a comprehensive semi-automated Incident Handling workflow, since a fully automated Incident Handling workflow seems impractical. The suggested workflow tries to minimize the human effort required for Incident Handling while covering all aspects of the procedure. The rest of the paper is organized as follows: Section II describes the Incident Handling procedure and related work. Section III presents the proposed workflow and describes its details. Finally, Section IV draws conclusions and outlines future work.
II. INCIDENT HANDLING
Incident Handling is the rapid detection of incidents, together with minimizing their effects, mitigating their causes, and restoring the affected resources [7]. CERTs, as the groups responsible for Incident Handling in organizations, use a combination of procedures and tools to achieve these goals.
(^1) European Network and Information Security Agency
(^2) National Institute of Standards and Technology
(^3) International Telecommunication Union Telecommunication Standardization Sector
6'th International Symposium on Telecommunications (IST'2012)
978-1-4673-2073-3/12/$31.00 ©2012 IEEE
Many recommended Incident Handling procedures are available. For example, NIST [7] suggests a four-step approach:
CERT/CC also proposes a four-step procedure for Incident Handling in [9]:
In [6], ENISA suggests a three-step process:
Several other researchers and security groups have proposed similar procedures that are used by CERTs around the world [8] [10] [13] [14]. They have also provided multiple tools to ease Incident Handling tasks. Some of these tools facilitate the Incident Handling process through collaboration tools such as ticketing systems [15], for example AIRT [16] and RTIR [15]. Others attempt to automate decision making in multiple stages of Incident Handling using data mining, expert systems, and artificial intelligence. For example, [17] tries to find the most similar previously handled incidents, based on ontology and Case Based Reasoning (CBR). Similarly, [18] uses the Recency, Frequency, Monetary (RFM) analysis methodology together with CBR to find suspicious users and addresses. Many tools, such as AIRS [19] and TRM [20], also provide technical help to CERTs for incident response.
Although there are many Incident Handling methods, their underlying concepts are similar, and it seems possible to introduce a general procedure that covers the procedures suggested by the various sources. The relation between the phases of this general procedure and those of the well-known recommendations is shown in Table I. A comprehensive method should cover the following principles:
A. Preparations and Receiving
The capability to detect and receive security event reports must be established for the Incident Handling procedure. Security event reports originate from four sources [7]:
B. Triage
Triage is the first step in Incident Handling and comprises two very important activities [21]:
C. Analysis
Analysis is the act of extracting all the information required for an appropriate incident response from available resources [7]. This information includes:
Fig. 1. The proposed workflow: white shapes represent processing units, purple shapes represent human processing, and orange shapes represent archiving the incident and an early end of the workflow.
vulnerability management, a reported incident containing an old vulnerability is mostly a false alarm and therefore should not be considered. A policy like this, which is decided at a higher level of administration, is represented by Filters. Each filter defines an administrative policy for Incident Handling that indicates whether an incident requires handling or not. If the organization keeps a record of the existing vulnerabilities in its assets, it is also possible to implement filters based on this kind of information. Filters mostly require very little processing and are very fast. They can also filter out a large number of false positives and unimportant alarms. Therefore, with enough predefined filters, the number of incidents that need to be processed decreases significantly. This also reduces the amount of human work required by the whole system.
Incidents that pass all the filters are checked in the Verification unit to confirm that they are valid alarms. Verification is a very complicated task that is too hard to automate fully. Therefore, using interactive machine learning methods that draw on previous human decisions is inevitable. These decisions generate, over time, the dynamic knowledge required for automatic decisions. Consequently, automatic verification works only if enough similar data produced by manual verification exists. If such data exists, the current incident is verified using machine learning methods. Otherwise, the workflow passes the incident to the human team for possible identification and then verification. If the automatic method verifies the incident with low confidence, the incident is also passed to the expert team. The need for human intervention therefore decreases over time.
In order to analyze an incident, the very first thing to do is to specify its type and then its priority. These data are determined in Initial Classification and Prioritization.
These components determine the class of the incident (e.g. Malicious Code or Abusive Content) and, if possible, its type (e.g. Botnet or Worm in the case of Malicious Code). A verified incident might be related to other incidents that are currently being processed in the system, so it is very important to find these relationships. A rule engine can be used for this purpose, with the relations between incidents defined as first-order logic rules. If a relation between the current incident and other incidents is found, the data of the older incidents is updated and the new incident no longer needs to be handled separately. This unit also specifies the priority of the incident, based on a predefined Incident Handling routine specialized for the organization. On some special occasions, an incident might require a very quick response. In this situation, the incident is given special treatment from that point on. For example, it is sent directly to the Analysis unit with the highest possible priority and is then responded to as soon as possible.
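As an illustration of the filter and verification stages described above, the following is a minimal sketch and not the implementation used in the proposed workflow. The `Incident` fields, the `old_vulnerability_filter` policy, and the thresholds `min_cases` and `min_confidence` are assumptions introduced only for this example. The paper does not specify how similarity or confidence are computed, so both are passed in as plain numbers.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class Incident:
    """Hypothetical incident record; the field names are illustrative only."""
    incident_id: str
    vulnerability_year: Optional[int] = None  # year the exploited vulnerability was published


# A filter encodes one administrative policy: True keeps the incident.
Filter = Callable[[Incident], bool]


def old_vulnerability_filter(cutoff_year: int) -> Filter:
    """Assumed policy: drop incidents that rely on vulnerabilities older than cutoff_year."""
    def keep(incident: Incident) -> bool:
        year = incident.vulnerability_year
        return year is None or year >= cutoff_year
    return keep


def passes_filters(incident: Incident, filters: Iterable[Filter]) -> bool:
    """An incident needs handling only if every filter keeps it."""
    return all(f(incident) for f in filters)


def route_verification(similar_cases: int, confidence: float,
                       min_cases: int = 20, min_confidence: float = 0.8) -> str:
    """Decide who verifies the incident (thresholds are assumed values)."""
    if similar_cases < min_cases:
        return "human"      # not enough manually verified history yet
    if confidence < min_confidence:
        return "human"      # automatic verdict is too uncertain
    return "automatic"


def triage(incident: Incident, filters: Iterable[Filter],
           similar_cases: int, confidence: float) -> str:
    """Filters run first because they are cheap; verification comes afterwards."""
    if not passes_filters(incident, filters):
        return "archived"
    return route_verification(similar_cases, confidence)
```

Running the cheap filters before verification mirrors the ordering in the text: only incidents that survive every administrative policy reach the more expensive, partly manual verification step.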
B. Analysis
It is essential to assign the available resources to the more important incidents. This assignment relies on various kinds of information, such as the accessibility and quality of resources and incident-related information. In the analysis section, all of the information required for assigning the available resources is determined: incident class, asset sensitivity, technical impact, and maximum time to start responding. Based on this information, incidents are scheduled to be resolved.
The type of the incident is extracted automatically if the initial classification unit could not retrieve it. If the automatic method fails, the security team determines the incident type. The incident type is essential for determining the other required information and the response strategy.
In the next step, the technical impact of the incident is extracted. Technical impact describes the current and potential effects of the incident on the organization. For instance, the current effect of a worm incident may be the infection of a few hosts, while its potential effect is a large-scale infection and probably a huge data loss if it is not contained quickly. Therefore, the technical impact of an incident should be considered in incident response scheduling.
The maximum time to start responding to an incident is another important factor in incident analysis. This criterion indicates the latest time at which the information gathered about the incident, or the CERT's reaction, is still valid and effective. For example, a Distributed Denial of Service incident may last for a very short time but cost a lot. Therefore, it requires a very quick response, and the maximum possible response time is very short.
The asset sensitivity of the incident is another vital factor in incident scheduling; it shows the criticality of an asset for the business continuity of the organization. For instance, an accounting server is possibly the most important asset in an ISP, while a web server is not that sensitive. Bearing all of these criteria in mind, it is possible to schedule and assign the available resources for resolving the incident.
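The scheduling step can be sketched as follows. The paper does not give a concrete scheduling rule, so the ordering used here (earliest maximum start time first, ties broken by the product of asset sensitivity and technical impact) is an assumption, as are the 1 to 5 scales, the field names, and the example class "Availability".

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AnalyzedIncident:
    """Output of the analysis phase; scales and field names are hypothetical."""
    name: str
    incident_class: str
    asset_sensitivity: int   # assumed scale: 1 (low) to 5 (business critical)
    technical_impact: int    # assumed scale: 1 (minor) to 5 (severe)
    max_start_minutes: int   # maximum time to start responding


def schedule(incidents: List[AnalyzedIncident]) -> List[AnalyzedIncident]:
    """Order incidents for response under the assumed rule described above."""
    return sorted(
        incidents,
        key=lambda i: (i.max_start_minutes,
                       -(i.asset_sensitivity * i.technical_impact)),
    )


queue = schedule([
    AnalyzedIncident("defacement", "Abusive Content", 2, 2, 60),
    AnalyzedIncident("worm", "Malicious Code", 5, 4, 60),
    AnalyzedIncident("ddos", "Availability", 3, 5, 5),
])
print([i.name for i in queue])  # ['ddos', 'worm', 'defacement']
```

Under this rule the short-lived DDoS incident is handled first because of its tight deadline, and the worm on a sensitive asset precedes the defacement that shares the same deadline. An organization could substitute any other weighting of the four criteria.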
C. Response and Post-Incident Activity
Response to an incident consists of three main phases:
to a sequence of commands. This loop continues until the incident stops. After a successful execution of a response strategy, some post-incident activities can be carried out. These activities, which aim to improve the organization's security against incidents, include:
[4] G. Vigna, “Network intrusion detection: dead or alive?” in Proceedings of the 26th Annual Computer Security Applications Conference. ACM, 2010, pp. 117–126.