





























A comprehensive overview of reverse engineering and program analysis techniques used in software engineering. It delves into the process of extracting information from source code to understand system specifications, design, and behavior. It covers various levels of abstraction, including function and data abstraction, and examines different types of program analysis, such as control flow analysis, data flow analysis, and program slicing. It also discusses the importance of visualization and program metrics in software engineering.
Typology: Lecture notes






























Software Reverse Engineering
❖ Understanding a software system precedes any type of change.
❖ The comprehension process takes up a great deal of the total time spent on carrying out the change.
❖ The reasons for this include incorrect, out-of-date or non-existent documentation, the complexity of the system and a lack of sufficient domain knowledge on the part of the maintainer.
❖ Reverse engineering is a technique to abstract from the source code relevant information about the system, such as the specification and design, in a form that promotes understanding.
➢ It is the process of analyzing a subject system to identify the system's components and their interrelationships and create representations of the system in another form or at higher levels of abstraction.
❖ The goal of reverse engineering is to facilitate change by allowing a software system to be understood in terms of what it does, how it works and its architectural representation.
❖ The objectives in pursuit of this goal are:
Abstraction
❖ Abstraction is achieved by highlighting the important features of the subject system and ignoring the irrelevant ones.
❖ There are three types of abstraction that can be performed on software systems:
I. Function Abstraction: eliciting functions from the target system, that is, those aspects which operate on data objects and produce the corresponding output.
II. Data Abstraction: eliciting from the target system data objects as well as the functions that operate on them.
✓ An example of data abstraction at the design level in object-oriented systems is the encapsulation of an object type and its associated operations in a module or class.
III. Process Abstraction: abstracting from the target system the exact order in which operations are performed.
✓ Concurrent processes communicate via shared data that is stored in a designated memory space.
✓ Distributed processes usually communicate through 'message passing' and have no shared data area.
Factors that motivate the application of reverse engineering
Program Analysis
❖ Program analysis is the process of automatically analyzing the behavior of computer programs with respect to a property such as correctness, robustness or safety.
❖ It extracts information in order to present abstractions of, or answer questions about, a software system.
Static Analysis
➢ Examines the source code.
➢ Static code analysis is a method of debugging by examining source code before a program is run. It is done by checking the code against a set (or multiple sets) of coding rules.
Dynamic Analysis
➢ Examines the system as it is executing.
➢ Dynamic analysis is the testing and evaluation of a program by executing it on real data.
➢ The objective is to find errors in a program while it is running.
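The contrast between static and dynamic analysis can be sketched in a few lines. The example below is a minimal illustration, not a real analyzer: the subject program `SOURCE`, the single coding rule (flag division by the literal 0) and the helper names `static_check` and `dynamic_check` are all assumptions made for this sketch.

```python
import ast

# Hypothetical subject program, used only for illustration.
SOURCE = """
def ratio(a, b):
    return a / b

def broken(a):
    return a / 0
"""

def static_check(source):
    """Static analysis: inspect the source without running it.

    Illustrative rule: flag any division whose right operand is the literal 0.
    Returns the line numbers of the offending expressions.
    """
    findings = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.BinOp) and isinstance(node.op, ast.Div)
                and isinstance(node.right, ast.Constant)
                and node.right.value == 0):
            findings.append(node.lineno)
    return findings

def dynamic_check(source, func_name, *args):
    """Dynamic analysis: execute the code on concrete inputs and observe it."""
    namespace = {}
    exec(source, namespace)
    try:
        namespace[func_name](*args)
        return "ok"
    except ZeroDivisionError:
        return "ZeroDivisionError"

static_findings = static_check(SOURCE)              # found without executing anything
dynamic_ok = dynamic_check(SOURCE, "ratio", 1, 2)   # this run shows no error
dynamic_fail = dynamic_check(SOURCE, "ratio", 1, 0) # this run exposes the error
```

The static rule reports `broken` without running it but cannot see that `ratio` fails for `b = 0`; the dynamic check finds that failure only when a run actually supplies that input.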
Entities
❖ An entity is a distinct and identifiable object or concept within a system.
❖ It can represent a real-world object, like a person, a car, or a house, or an abstract concept, like a user account, an order, or a financial transaction.
❖ An entity is denoted as a rectangle in an ER diagram.
❖ For example, in a school database, students, teachers, classes, and courses offered can be treated as entities.
❖ In program analysis, entities are the individual items that live in the system, together with the attributes associated with them.
❖ Some examples:
➢ Classes, along with information about their superclass, their scope, and 'where' in the code they exist.
➢ Methods/functions, along with their return type, parameter list, etc.
➢ Variables, along with their types, whether or not they are static, etc.
Relationships
❖ Relationships are interactions between the entities in the system.
❖ Relationships include:
➢ Classes inheriting from one another.
➢ Methods in one class calling the methods of another class, and methods within the same class calling one another.
➢ A method referencing an attribute.
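As a rough sketch of how entities and relationships might be extracted from source code, the example below uses Python's standard `ast` module on a small hypothetical program. The function `extract_facts`, the tuple layouts and the sample classes are assumptions made for illustration; a real fact extractor would cover many more entity and relationship kinds.

```python
import ast

# Hypothetical subject program, used only for illustration.
SOURCE = """
class Shape:
    def area(self):
        return 0

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):
        return self.side * self.side
"""

def extract_facts(source):
    """Collect entities (classes, methods) and relationships between them."""
    entities = []       # (kind, name, line where it is defined)
    relationships = []  # (kind, source entity, target)
    for node in ast.parse(source).body:
        if not isinstance(node, ast.ClassDef):
            continue
        entities.append(("class", node.name, node.lineno))
        # Relationship: a class inheriting from another class.
        for base in node.bases:
            if isinstance(base, ast.Name):
                relationships.append(("inherits", node.name, base.id))
        for item in node.body:
            if not isinstance(item, ast.FunctionDef):
                continue
            method = f"{node.name}.{item.name}"
            entities.append(("method", method, item.lineno))
            # Relationship: a method referencing an attribute via self.<attr>.
            for sub in ast.walk(item):
                if (isinstance(sub, ast.Attribute)
                        and isinstance(sub.value, ast.Name)
                        and sub.value.id == "self"):
                    fact = ("references", method, sub.attr)
                    if fact not in relationships:
                        relationships.append(fact)
    return entities, relationships

entities, relationships = extract_facts(SOURCE)
for fact in entities + relationships:
    print(fact)
```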
TECHNIQUES USED FOR REVERSE ENGINEERING
❖ To extract information which is not explicitly available in source code, automated analysis techniques are used.
❖ The well-known analysis techniques that facilitate reverse engineering are:
I. Lexical analysis
II. Syntactic analysis
III. Control flow analysis
IV. Data flow analysis
V. Program slicing
VI. Visualization
VII. Program metrics
I. Lexical Analysis:
❖ Lexical analysis is the process of decomposing the sequence of characters in the source code into its constituent lexical units.
➢ Various useful representations of program information are enabled by lexical analysis.
➢ The most widely used program information is the cross-reference listing.
➢ A program performing lexical analysis is called a lexical analyzer, and it is a part of a programming language's compiler.
➢ Typically, it uses rules describing lexical program structures that are expressed in a mathematical notation called regular expressions.
➢ Modern lexical analyzers are automatically built using tools called lexical analyzer generators, such as lex and flex.
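The ideas above can be sketched with a small hand-written lexical analyzer driven by regular expressions, followed by a cross-reference listing built from its tokens. This is an illustrative sketch rather than generated lex/flex output; the token categories in `TOKEN_SPEC`, the toy input `PROGRAM` and the function names are assumptions.

```python
import re
from collections import defaultdict

# Lexical rules as (token kind, regular expression); order matters.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"[ \t]+"),
    ("ERROR",  r"."),          # any other character is a lexical error
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC)
)

def tokenize(source):
    """Decompose the character sequence into (kind, text, line) tokens."""
    tokens = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for match in TOKEN_RE.finditer(line):
            kind = match.lastgroup
            if kind == "SKIP":
                continue
            if kind == "ERROR":
                raise ValueError(
                    f"unexpected character {match.group()!r} on line {line_no}"
                )
            tokens.append((kind, match.group(), line_no))
    return tokens

def cross_reference(tokens):
    """Map each identifier to the lines on which it appears."""
    listing = defaultdict(list)
    for kind, text, line_no in tokens:
        if kind == "IDENT" and line_no not in listing[text]:
            listing[text].append(line_no)
    return dict(listing)

PROGRAM = "total = 0\ncount = 3\ntotal = total + count * 2"
tokens = tokenize(PROGRAM)
xref = cross_reference(tokens)
print(xref)  # {'total': [1, 3], 'count': [2, 3]}
```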
II. Syntactic Analysis:
❖ Compilers and other tools such as interpreters determine the expressions, statements and modules of a program.
❖ Syntactic analysis is performed by a parser.
❖ The requisite language properties are expressed in a mathematical formalism called context-free grammars.
❖ Usually, these grammars are described in a notation called Backus–Naur Form (BNF).
❖ In the BNF notation, the various program parts are defined by rules in terms of their constituents.
❖ Similar to lexical analyzers, parsers can be automatically constructed from a description of the grammatical properties of a programming language.
❖ YACC is one of the most commonly used parsing tools.
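To make the link between a BNF grammar and a parser concrete, here is a minimal hand-written recursive-descent parser for a toy expression grammar. It is a sketch under stated assumptions: the grammar, the `Parser` class and the tuple-based tree are invented for illustration, and the parser is written by hand rather than generated by a tool such as YACC.

```python
import re

# Toy grammar in BNF (an assumption for this sketch):
#   <expr>   ::= <term>   | <expr> "+" <term>
#   <term>   ::= <factor> | <term> "*" <factor>
#   <factor> ::= NUMBER   | "(" <expr> ")"

def tokenize(text):
    tokens = re.findall(r"\d+|[+*()]", text)
    if "".join(tokens) != text.replace(" ", ""):
        raise SyntaxError("unexpected character in input")
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, token):
        if self.peek() != token:
            raise SyntaxError(f"expected {token!r}, found {self.peek()!r}")
        self.pos += 1

    def expr(self):
        # The left-recursive rule is implemented as a loop.
        node = self.term()
        while self.peek() == "+":
            self.pos += 1
            node = ("+", node, self.term())
        return node

    def term(self):
        node = self.factor()
        while self.peek() == "*":
            self.pos += 1
            node = ("*", node, self.factor())
        return node

    def factor(self):
        token = self.peek()
        if token == "(":
            self.pos += 1
            node = self.expr()
            self.expect(")")
            return node
        if token is not None and token.isdigit():
            self.pos += 1
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")

def parse(text):
    parser = Parser(tokenize(text))
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"unexpected token {parser.peek()!r}")
    return tree

tree = parse("1 + 2 * (3 + 4)")
print(tree)  # ('+', 1, ('*', 2, ('+', 3, 4)))
```

Each grammar rule maps to one method, which is why a parser can be derived mechanically from the grammar description.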
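Two types of representations are used to hold the results of syntactic analysis: parse tree and abstract syntax tree.
a) The parse tree is the more primitive of the two. It contains details unrelated to the actual meaning of the program, such as punctuation, whose role is to direct the parsing process.
b) Removal of those extraneous details from the parse tree produces a structure called an Abstract Syntax Tree (AST).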
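One way to see what an AST leaves out is to compare the trees that Python's own `ast` module builds for expressions that differ only in grouping parentheses. This is a small sketch using Python as the subject language; the helper name `ast_shape` and the sample expressions are assumptions.

```python
import ast

def ast_shape(expression):
    """Return a textual dump of the AST for a single expression."""
    return ast.dump(ast.parse(expression, mode="eval"))

# Redundant parentheses are punctuation: they leave no trace in the AST.
same = ast_shape("(a + b) * c") == ast_shape("((a + b)) * (c)")

# Parentheses that change the grouping change the tree structure itself.
different = ast_shape("(a + b) * c") != ast_shape("a + b * c")

# The grouping is implicit in the nesting: a multiplication whose
# left operand is an addition.
tree = ast.parse("(a + b) * c", mode="eval").body
root_op = type(tree.op).__name__
left_op = type(tree.left.op).__name__

print(same, different, root_op, left_op)  # True True Mult Add
```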
Interprocedural analysis:
❖ Interprocedural analysis is performed by constructing a call graph.
❖ Calling relationships between subroutines in a program are represented as a call graph, which is basically a directed graph.
❖ Specifically, a procedure in the source code is represented by a node in the graph, and the edge from node f to g indicates that procedure f calls procedure g.
❖ Call graphs can be static or dynamic. A dynamic call graph is an execution trace of the program.
❖ Thus a dynamic call graph is exact, but it only describes one run of the program.
❖ On the other hand, a static call graph represents every possible run of the program.
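The difference between static and dynamic call graphs can be sketched as follows. The static graph is built from the syntax tree, and the dynamic graph is recorded during one run with Python's `sys.setprofile` hook. The subject program `SOURCE`, the file label `"<subject>"` and both function names are assumptions for this sketch, and the static version only handles direct calls to plainly named functions.

```python
import ast
import sys

# Hypothetical subject program, used only for illustration.
SOURCE = """
def helper():
    return 1

def rare():
    return 2

def main(flag):
    if flag:
        return rare()
    return helper()
"""

def static_call_graph(source):
    """Map each function f to the set of functions g it may call (edge f -> g)."""
    graph = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for sub in ast.walk(node):
                if isinstance(sub, ast.Call) and isinstance(sub.func, ast.Name):
                    callees.add(sub.func.id)
            graph[node.name] = callees
    return graph

def dynamic_call_graph(source, entry, *args):
    """Record the (caller, callee) edges observed during one execution."""
    namespace = {}
    exec(compile(source, "<subject>", "exec"), namespace)
    edges = set()

    def profiler(frame, event, arg):
        # Only record calls between functions of the subject program.
        if event == "call" and frame.f_code.co_filename == "<subject>":
            caller = frame.f_back
            if caller is not None and caller.f_code.co_filename == "<subject>":
                edges.add((caller.f_code.co_name, frame.f_code.co_name))

    sys.setprofile(profiler)
    try:
        namespace[entry](*args)
    finally:
        sys.setprofile(None)
    return edges

static_graph = static_call_graph(SOURCE)
dynamic_edges = dynamic_call_graph(SOURCE, "main", False)
print(static_graph["main"])  # both 'helper' and 'rare'
print(dynamic_edges)         # {('main', 'helper')}
```

The run with `flag=False` never reaches `rare`, so the dynamic graph omits that edge while the static graph keeps it.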
IV. Data Flow Analysis:
❖ Data flow analysis (DFA) concerns how values of defined variables flow through and are used in a program.
❖ Control flow analysis (CFA) can detect the possibility of loops, whereas DFA can determine data flow anomalies.
❖ One example of a data flow anomaly is that an undefined variable is referenced.
❖ Another example of a data flow anomaly is that a variable is successively defined without being referenced in between.
❖ Data flow analysis enables the identification of code that can never execute, variables that might not be defined before they are used, and statements that might have to be altered when a bug is fixed.
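The two anomalies named above can be detected with a very small analysis when the program is straight-line code. The sketch below assumes a sequence of simple top-level assignments with no branches, loops, functions or built-in names; the function `data_flow_anomalies` and the anomaly labels are hypothetical. A realistic data flow analysis would work over a control flow graph instead.

```python
import ast

def data_flow_anomalies(source):
    """Report data flow anomalies in straight-line assignment code.

    Returns a list of (anomaly kind, variable name, line number).
    """
    defined = set()        # variables that have been defined so far
    defined_unused = {}    # variable -> line of a definition not yet referenced
    anomalies = []
    for stmt in ast.parse(source).body:
        if not isinstance(stmt, ast.Assign):
            continue  # this sketch handles simple assignments only
        # The right-hand side is evaluated first: these are references.
        for node in ast.walk(stmt.value):
            if isinstance(node, ast.Name):
                if node.id not in defined:
                    anomalies.append(
                        ("undefined-reference", node.id, stmt.lineno)
                    )
                defined_unused.pop(node.id, None)
        # Then the targets are defined (or redefined).
        for target in stmt.targets:
            if isinstance(target, ast.Name):
                if target.id in defined_unused:
                    anomalies.append(
                        ("redefined-without-use", target.id, stmt.lineno)
                    )
                defined.add(target.id)
                defined_unused[target.id] = stmt.lineno
    return anomalies

PROGRAM = "x = 1\nx = 2\ny = x + z\n"
anomalies = data_flow_anomalies(PROGRAM)
for anomaly in anomalies:
    print(anomaly)
# ('redefined-without-use', 'x', 2)
# ('undefined-reference', 'z', 3)
```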