A Comprehensive Semi-Automated
Incident Handling Workflow
Sayed Hadi Hashemi, Mohammad Babaeizadeh,
Mohsen Nowruzi, Hossein Hadian Jazi,
Mohammad Shahmoradi, Elaheh Biglar Beigi Samani
APA Research Center
Isfahan University of Technology
{hashemi, babaeizadeh, nowruzi, hadian, shahmoradi, biglarbeigi}@nsec.ir
Abstract: The dramatic growth of Information Technology in every organization has increased the number of computer security incidents in recent years. These incidents result in huge financial and reputational losses, even in small companies. Naturally, demand for computer-related incident management has increased. Today, Incident Handling is still a very complex and critical task that is mainly done by human expert teams. The cost of keeping such a team ready 24x7 is very high, especially in big organizations with large networks. Consequently, automated Incident Handling is greatly desired. However, this task involves many factors and depends heavily on human judgment, which makes it very challenging to automate. In this study, after a review of Incident Handling methods, a comprehensive workflow for semi-automated Incident Handling is proposed. The workflow is based on principles common to these methods and combines automated processing units with expert teams in a way that minimizes the human effort required for Incident Handling.
Keywords: Incident Handling, Semi-Automated Incident Handling, Incident Management, Computer Security.
I. INTRODUCTION
With the appearance of the “Morris Worm” in 1988, organizations faced a new problem: preventing computer security incidents and responding to them [1]. To address this problem, organizations define a set of disciplines under the title of “Incident Management”. The most important discipline of Incident Management is Incident Handling, which is responding to an incident report as fast as possible [2].
The security monitoring of today's growing networks is mostly performed using Intrusion Detection Systems (IDS) [3]. Looking for evidence of malicious behavior, these systems generate streams of alerts. When trying to detect and analyze an incident, a security team faces two major problems: IDSs generate many false alarms, and they generate a large number of alerts for a single event [4]. Correlation techniques try to overcome these problems by reducing and fusing the alerts to generate more accurate and meaningful attack reports [5].
These technologies only detect security events; they provide very basic tools, or no means at all, for responding to them. The Incident Handling phase currently depends on the Computer Emergency Response Team (CERT), which faces more incident reports every day than it can respond to. To overcome this shortage, Incident Handling routines have been recommended that filter out unimportant reports and improve the performance of security teams.
The best-known Incident Handling recommendations have been proposed by ENISA¹ [2][6], NIST² [7], CERT/CC [8][9], the SANS Institute [10], and ITU-T³ [11][12]. All of these recommendations are written for humans rather than machines and depend heavily on the expertise of CERTs. Hence, the cost of Incident Handling grows dramatically, especially in large organizations, because an expert human team must be kept ready 24×7 [6]. A more automated Incident Handling system can reduce this cost.
Incident Handling is a very complex task that depends on many factors, such as the type of the incident, the status of the organization, business continuity, and the capability of the expert teams, all of which make automating the process very challenging. Furthermore, according to [7]:
“... every incident response team relies on the expertise, judgment, and abilities of other teams, including management, information security, IT support, legal, public affairs, and facilities management. ...”
This paper proposes a comprehensive semi-automated Incident Handling workflow, since crafting a fully automated Incident Handling workflow seems impractical. The suggested workflow tries to minimize the human effort required for Incident Handling while covering all aspects of the procedure.
The rest of the paper is organized as follows: Section 2 describes the Incident Handling procedure and related work. Section 3 presents the proposed workflow and describes its details. Finally, Section 4 draws concluding remarks and outlines future work.
II. INCIDENT HANDLING
Incident Handling is the rapid detection of incidents, minimizing their effects, mitigating their causes, and restoring the affected resources [7]. CERTs, as the groups responsible for Incident Handling in organizations, use a combination of procedures and tools to achieve these goals.
¹ European Network and Information Security Agency
² National Institute of Standards and Technology
³ International Telecommunication Union Telecommunication Standardization Sector
6'th International Symposium on Telecommunications (IST'2012)
978-1-4673-2073-3/12/$31.00 ©2012 IEEE

There are many recommended Incident Handling procedures available. For example, NIST [7] suggests a four-step approach:

  • Preparation: this phase involves establishing an incident response team, acquiring the necessary tools for Incident Handling, and limiting the number of incidents that will occur by using hardening solutions.
  • Detection and Analysis: this step detects and analyzes security breaches and alerts the organization whenever incidents occur.
  • Containment, Eradication and Recovery: the organization tries to mitigate the impact of the incident by containing it and ultimately recovering from it.
  • Post-Incident Activity: after the incident is handled, a report is issued that details the cause and cost of the incident and the steps the organization should take to prevent future incidents.

CERT/CC also proposes a four-step procedure for Incident Handling in [9]:

  • Detecting and Reporting: the ability to receive and review event information, incident reports, and security alerts.
  • Triage: the actions taken to categorize, prioritize, and assign events and incidents.
  • Analysis: the attempt to determine what has happened, what impact, threat, or damage has resulted, and what recovery or mitigation steps should be followed. This can include characterizing new threats that may impact the infrastructure.
  • Incident Response: the actions taken to resolve or mitigate an incident, coordinate and disseminate information, and implement follow-up strategies to prevent the incident from happening again.

In [6], ENISA suggests a three-step process:

  • Receiving incident reports: to receive incident reports from the available sources, such as human reports, security devices, and published security advisories.
  • Incident evaluation: to verify the validity of the report and determine the severity of the incident.
  • Actions: to resolve and handle the incident and prevent future incidents from happening.

Several other researchers and security groups have proposed similar procedures, which are used by CERTs around the world [8] [10] [13] [14]. They have also provided multiple tools to ease Incident Handling tasks. Some of these tools facilitate the Incident Handling process through collaboration tools such as ticketing systems [15], for example AIRT [16] and RTIR [15]. Others attempt to automate decision making in multiple stages of Incident Handling using data mining, expert systems, and artificial intelligence. For example, [17] tries to match the most similar previously handled incidents, based on ontology and Case Based Reasoning (CBR), and [18] uses the Recency, Frequency, Monetary (RFM) analysis methodology together with CBR to find suspicious users and addresses. Many tools also provide technical help to CERTs for incident response, such as AIRS [19] and TRM [20].

III. A COMPREHENSIVE INCIDENT HANDLING PRINCIPLES

Although there are many Incident Handling methods, their concepts look alike, and it seems possible to introduce a general procedure that covers the procedures suggested by the various sources. The relation between the phases of this procedure in the well-known recommendations is shown in Table I. A comprehensive method should cover the following principles:

A. Preparations and Receiving

The capability to detect and receive security event reports must be established for the Incident Handling procedure. Security event reports originate from four sources [7]:

  1. Alerts from security devices on network or hosts (e.g. alerts of an IDS).
  2. Event logs on systems (e.g. event log of Web Server).
  3. Human incident reports (e.g. report of the organization website inaccessibility).
  4. Available information on other sources (e.g. report of a new worm spreading across the world).

B. Triage

Triage is the first step in Incident Handling and refers to two very important steps in this procedure [21]:

  1. Verification: tries to verify the correctness of the reported incident and determines its impact and importance.
  2. Report Assessment: attempts to prioritize the incident and find possibly related incidents.

These steps are necessary for two main reasons [21]. First, each incident type requires its own analysis and response. Second, CERT resources are limited, and a CERT cannot handle all the reported incidents. Therefore, the available resources must be used to handle the most important incidents.

C. Analysis

Analysis is the act of extracting all the information required for an appropriate incident response from available resources [7]. This information includes:

  1. Current and Potential Technical Effect of the Incident: incident handlers should consider the likely future technical effect of the incident as well as its current negative technical effect.
  2. Criticality of the Affected Resources: resources affected by an incident differ in their importance to the organization, which is probably defined in business continuity planning efforts or Service Level Agreements (SLA).

The main goal of these steps is to assign the available resources for Incident Handling properly, given resource limitations and the existence of other incidents waiting for a response.

Fig. 1. The proposed workflow: white shapes refer to processing units, purple shapes denote human processing, and orange shapes denote archiving the incident and an early end of the workflow.

vulnerability management, a reported incident containing an old vulnerability is mostly a false alarm and therefore should not be considered. A policy like this, which is decided at a higher level of administration, is represented by Filters. Each filter defines an administrative policy for Incident Handling that indicates whether an incident requires handling. If the organization keeps a record of the existing vulnerabilities in its assets, it is also possible to implement filters based on this information. Filters mostly require very little processing and are very fast. They can also filter out a large number of false positives and unimportant alarms. Therefore, with enough predefined filters, the number of incidents that need to be processed decreases significantly, which also reduces the amount of human work required by the whole system.

Incidents that pass all the filters are checked for validity in the Verification unit. Verification is a very complicated task that is too hard to automate fully. Therefore, using interactive machine learning methods that draw on previous human decisions is inevitable. Over time, these decisions generate the dynamic knowledge required for automatic decisions. The automatic verification therefore works only if enough similar data from manual verification exists. If such data exists, the current incident is verified using machine learning methods. Otherwise, the workflow passes the incident to the human team for possible identification and then verification. If the automatic method verifies the incident with low confidence, the incident is also passed to the expert team. As this knowledge accumulates, the need for human intervention decreases over time.

To analyze an incident, the first thing to do is to specify its type and then its priority. These data are determined in Initial Classification and Prioritization. These components determine the class of the incident (e.g., Malicious Code or Abusive Content) and, if possible, its type (e.g., Botnet or Worm in the case of Malicious Code). A verified incident might be related to other incidents currently being processed in the system, so it is very important to find these relationships. A rule engine can be used for this purpose, with the relations between incidents defined as first-order logic rules. If a relation between the current incident and other incidents is found, the data of the older incidents is updated and the new incident no longer needs to be handled separately. This unit also specifies the priority of the incident based on a predefined Incident Handling routine specialized for the organization. On some special occasions, an incident might require a very quick response. Such an incident receives special treatment from this point on: for example, it is sent directly to the Analysis unit with the highest possible priority and is responded to as soon as possible.
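The filter and verification stages described above can be illustrated with a minimal sketch. It is not the authors' implementation: the `Incident` fields, the cutoff year, the confidence threshold, and the routing labels are all hypothetical, and a simple majority vote over past human decisions stands in for the interactive machine learning method, which the paper does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Incident:
    signature: str                            # hypothetical key for "similar" incidents
    vulnerability_year: Optional[int] = None  # hypothetical field used by the example filter


# A filter returns True when the incident still requires handling.
Filter = Callable[[Incident], bool]
# A verifier returns (is_valid, confidence), or None when it lacks enough history.
Verifier = Callable[[Incident], Optional[Tuple[bool, float]]]


def ignore_old_vulnerabilities(incident: Incident, cutoff_year: int = 2010) -> bool:
    """Example administrative policy: reports about old vulnerabilities are dropped."""
    year = incident.vulnerability_year
    return year is None or year >= cutoff_year


def make_history_verifier(history: Dict[str, List[bool]], min_samples: int = 5) -> Verifier:
    """Stand-in for the learning component: majority vote over past human decisions."""
    def verify(incident: Incident) -> Optional[Tuple[bool, float]]:
        decisions = history.get(incident.signature, [])
        if len(decisions) < min_samples:
            return None  # not enough similar manual verifications yet
        valid_share = sum(decisions) / len(decisions)
        return valid_share >= 0.5, max(valid_share, 1.0 - valid_share)
    return verify


def route(incident: Incident, filters: List[Filter], verifier: Verifier,
          min_confidence: float = 0.8) -> str:
    """Return where the incident goes next: archive, human team, or onward as verified."""
    if not all(keep(incident) for keep in filters):
        return "archived"            # rejected by an administrative policy
    result = verifier(incident)
    if result is None or result[1] < min_confidence:
        return "human_verification"  # no history, or low-confidence automatic decision
    is_valid, _ = result
    return "verified" if is_valid else "archived"
```

In this sketch, every decision made by the human team would be appended to `history`, so the share of incidents routed to `human_verification` shrinks as similar cases accumulate.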

B. Analysis

It is essential to assign available resources to the more important incidents. This assignment relies on various information, such as the accessibility and quality of resources and incident-related information. In the analysis section, all of the information required for assigning available resources is determined: incident class, asset sensitivity, technical impact, and maximum time to start responding. Based on this information, incidents are scheduled to be resolved.

The type of the incident is extracted automatically if the initial classification unit could not retrieve it. The security team determines the incident type if the automatic method fails. The incident type is essential for determining the other required information and the response strategy.

In the next step, the technical impact of the incident is extracted. Technical impact describes the current and potential effect of the incident on the organization. For instance, the current effect of a worm incident is the infection of a few hosts, while its potential effect is a large-scale infection and probably a huge data loss if it is not contained quickly. Therefore, the technical impact of an incident should be considered in incident response scheduling.

The maximum time to start responding to an incident is another important factor in incident analysis. This criterion indicates the latest time at which the information gathered about the current incident, or the CERT's reaction, is still valid and effective. For example, a Distributed Denial of Service incident may last for a very short time but cost a lot. Therefore, it requires a very quick response, and the maximum time to start responding is very short.

The asset sensitivity of the incident is another vital factor in incident scheduling; it shows the criticality of an asset to the business continuity of the organization. For instance, an accounting server is possibly the most important asset in an ISP, while a web server is not that sensitive. Bearing all of these criteria in mind, it is possible to schedule and assign available resources for resolving the incident.
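As a rough illustration of how the four analysis outputs could feed scheduling, the sketch below orders incidents in a priority queue. The paper does not define a scheduling policy, so the ordering used here (earliest maximum start time first, then higher asset sensitivity, then higher technical impact) is an assumption, as are the 1 to 5 scales and all field names.

```python
import heapq
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AnalyzedIncident:
    name: str
    incident_class: str
    asset_sensitivity: int   # hypothetical scale: 1 (low) to 5 (critical)
    technical_impact: int    # hypothetical scale: 1 (minor) to 5 (severe)
    max_start_minutes: int   # maximum time to start responding


def sort_key(incident: AnalyzedIncident) -> Tuple[int, int, int, str]:
    """Assumed policy: most urgent deadline first, ties broken by sensitivity, then impact."""
    return (
        incident.max_start_minutes,
        -incident.asset_sensitivity,
        -incident.technical_impact,
        incident.name,  # final tie-breaker keeps the order deterministic
    )


def schedule(incidents: List[AnalyzedIncident]) -> List[str]:
    """Return incident names in the order they should be handed to responders."""
    heap = [(sort_key(incident), incident.name) for incident in incidents]
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)
    return order
```

With this policy, a short-lived Distributed Denial of Service incident with a five-minute deadline is scheduled ahead of a worm on a low-sensitivity host, even if the worm has the larger potential impact. An organization could weight the criteria differently.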

C. Response and Post-Incident Activity

Response to an incident consists of three main phases:

  • Determination of Response Strategy: a Response Strategy is a set of primitive actions for the Identification, Containment, Eradication, and Recovery of an incident. This part is done automatically based on previously extracted information. The security team plans the response strategy if the automated procedure fails. For example, the response strategy for a Login Attempt incident is to block the attacker's address and report the incident to the authorities for further investigation.
  • Mapping Strategy to Commands: each primitive action is translated into a sequence of device-dependent commands or human-readable tasks. This mapping depends on the organization's conditions and available response capabilities. As in the previous unit, this task may be done manually if the automatic process fails.
  • Execution: the execution of commands is not included in this workflow and should be done by CERTs or other facilities such as security devices (e.g., firewalls, IDPs, ...). If execution fails to stop the incident, another response strategy is generated and mapped to a sequence of commands. This loop continues until the incident stops.

After the successful execution of a response strategy, some post-incident activities can be carried out. These activities, which aim to improve the organization's security against incidents, include:

  • Improvement of CERTs' knowledge: every handled incident provides valuable experience that CERTs can use to update their knowledge about incidents.
  • Enhancement of the Incident Handling procedure: the knowledge used by the workflow's processing units can be adjusted using the experience gained during Incident Handling. This improves the performance of the automatic parts of the procedure significantly and is useful for the manual parts as well.
  • Mitigation of the organization's weaknesses: any incident is the result of one or more weaknesses in the organization. These weaknesses must be addressed using the documents and reports generated at the end of Incident Handling.
  • Further Investigations: the final report of the incident can be used by other parties (e.g., forensics teams and lawsuits) for further investigations.

V. CONCLUSION

In this article, a comprehensive semi-automated Incident Handling workflow has been proposed. To the best of our knowledge, the proposed workflow covers all the common Incident Handling principles. It tries to minimize human intervention over time, while using the help of security teams whenever required. The workflow is flexible enough to adjust itself to the organization's policies and business continuity plan.

Future work would be to evaluate the performance of the workflow in several settings (e.g., industrial scope, health facilities, ...). Furthermore, a more thorough investigation of each part of the workflow currently in use must be done to determine whether any improvement can be made. These improvements must focus on both the algorithms used (especially in the learning parts) and the required knowledge.

ACKNOWLEDGMENT

The authors would like to thank the Research Institute of ICT, especially Dr. H. Gharaei, Mr. A. Soozani, and Mr. A. R. Ghaznavi, for their support.

REFERENCES

[1] J. McHugh, “Intrusion and intrusion detection,” Int. J. Inf. Sec., vol. 1, no. 1, pp. 14–35, 2001.
[2] European Network and Information Security Agency, “Good practice guide for incident management,” p. 110, 2010.
[3] F. Valeur, G. Vigna, C. Kruegel, and R. Kemmerer, “Comprehensive approach to intrusion detection alert correlation,” Dependable and Secure Computing, IEEE Transactions on, vol. 1, no. 3, pp. 146–169,

[4] G. Vigna, “Network intrusion detection: dead or alive?” in Proceedings of the 26th Annual Computer Security Applications Conference. ACM, 2010, pp. 117–126.