Buffer Overflow 2-Computer Sciences Applications-Project Report, Study Guides, Projects, Research of Applications of Computer Sciences

This report is a final-year project submitted to complete a degree in Computer Science. It emphasizes Applications of Computer Sciences. It was supervised by Dr. Abhisri Yashwant at Bengal Engineering and Science University. Its main points are: Buffer, Overflows, Literature, Review, Register, Memory, Addressing, Architecture, Experimenting, Hardware

Typology: Study Guides, Projects, Research

2011/2012

Uploaded on 07/18/2012

padmini
Table of Contents
1. Introduction
1.1 Buffer Overflows
1.2 Why Buffer Overflows
2. Goals Achieved
2.1 Literature Review
2.1.1 Register
2.1.2 Memory addressing
2.1.3 x86 architecture for memory
2.1.4 Win32 Assembly
2.2 The stack
2.3 The method
2.4 Experimenting of Buffer Overflows
3. Future Goals
4. Key Terminologies
4.1 Hardware Terminologies
4.2 Software Terminologies
4.3 Security Terminologies
5. References
5.1 Related to Books
5.2 Related to URL
5.3 Related to Development tool


List of figures

Figure 1 x86 register set.
Figure 2 Push operations in a stack.
Figure 3 Disassembled code, showing the start of the function.
Figure 4 Result of a successful buffer overflow.


Abstract

Buffer overflow is a major problem facing software developers, and a large number of software products have had their security compromised because of it. In this project I intend to develop a tool that helps developers find buffer overflows (on a small scale). To build this tool, I will first study the different aspects of buffer overflows and then make a tool that automates the tests I perform by hand. I am hopeful that by the end of this project I will at least be capable of testing different software for buffer overflow vulnerabilities, and at most a tool will have been developed that is capable of performing tests for buffer overflows.

1. Introduction

Software engineering is an extremely difficult task, and of all the professions related to software creation, software architects have quite possibly the most difficult job. Initially, software architects were only responsible for the high-level design of the products. More often than not this included protocol selection, third-party component evaluation and selection, and communication medium selection. We make no argument here that these are all valuable and necessary objectives for any architect, but today the job is much more difficult. It requires an intimate knowledge of operating systems, software languages, and their inherent advantages and disadvantages with regard to different platforms. Additionally, software architects face increasing pressure to design flexible software that is impenetrable to wily hackers, which is a near-impossible task in itself.

1.1 Buffer Overflows

Buffer overflows are proof that the computer science, or software programming, community still does not have an understanding (more importantly, firm knowledge) of how to design, create, and implement secure code. Like it or not, all buffer overflows are a product of poorly constructed software programs. These programs may have multiple deficiencies such as stack overflows, heap corruption, format string bugs, and race conditions—the first three commonly being referred to as simply buffer overflows. Buffer overflows can be as small as one misplaced character in a million-line program or as complex as multiple character arrays that are inappropriately handled.

1.2 Why Buffer Overflows

Buffer overflows may cause a process to crash or produce incorrect results. They can be triggered by inputs specifically designed to execute malicious code or to make the program operate in an unintended way. As such, buffer overflows cause many software vulnerabilities and form the basis of many exploits. Sufficient bounds checking by either the programmer or the compiler can prevent buffer overflows. Contrary to popular belief,

input and output, translate and decimal arithmetic, whereas AH (the higher byte of the lower two bytes) is used only for byte multiply and byte divide.
EBX (base) The extended base register is used for translation. Here, translation refers to the translation of memory addresses. It can also be combined with SI and DI for based indexed addressing.
ECX (counter) The extended counter register is used for counting and loops. This register is automatically decremented by the LOOP instruction.
EDX (data) The extended data register is used for word multiply, word divide and indirect input and output operations.

2.1.1.2 Segment

Segment registers hold the base locations for program instructions, data and the stack. All references to memory involve a segment register used as a base location.
CS (code segment) The processor uses the CS segment for all accesses to instructions referenced by the instruction pointer (IP) register. The CS register cannot be changed directly. It is automatically updated during far jump, far call and far return instructions.
DS (data segment) By default, the processor assumes that all data referenced by the general registers (AX, BX, CX, DX) and the index registers (SI, DI) is located in the data segment. The DS register can be changed directly using the POP and LDS instructions.
SS (stack segment) By default, the processor assumes that all data referenced by the stack pointer (SP) and base pointer (BP) registers is located in the stack segment. The SS register can be changed directly using the POP instruction.
ES (extra segment) By default, the processor assumes that the DI register references the ES segment in string manipulation instructions. The ES register can be changed directly using the POP and LES instructions.
FS & GS These registers are relatively new to the x86 architecture. They are used for referring to segments in memory.

2.1.1.3 Index

Index registers contain the offsets of data and instructions. The term offset refers to the distance of a variable or instruction from the base of its segment.
EBP (base pointer) The extended base pointer is usually used for based, based indexed or register indirect addressing.
ESP (stack pointer) The extended stack pointer is a 32-bit register pointing to the program stack.
ESI (source index) The extended source index is used for indexed, based indexed and register indirect addressing, as well as for source data addressing in string manipulation instructions.
EDI (destination index) The extended destination index is used for indexed, based indexed and register indirect addressing, as well as for destination data addressing in string manipulation instructions.

2.1.1.4 Control

EIP (instruction pointer) The extended instruction pointer identifies the location of the next word of the instruction code to be fetched from the current code segment of the memory.

Figure 1 x86 register set.

2.1.2 Memory addressing

In the x86 architecture, the following addressing modes are used for addressing memory locations.
• Implied - the data value/data address is implicitly associated with the instruction.
• Register - references the data in a register or in a register pair.
• Immediate - the data is provided in the instruction.
• Direct - the instruction operand specifies the memory address where data is located.
• Register indirect - the instruction specifies a register containing an address where data is located. This addressing mode works with the SI, DI, BX and BP registers.
• Based - an 8-bit or 16-bit instruction operand is added to the contents of a base register (BX or BP); the resulting value is a pointer to the location where data resides.
• Indexed - an 8-bit or 16-bit instruction operand is added to the contents of an index register (SI or DI); the resulting value is a pointer to the location where data resides.
• Based Indexed - the contents of a base register (BX or BP) are added to the contents of an index register (SI or DI); the resulting value is a pointer to the location where data resides.
• Based Indexed with displacement - an 8-bit or 16-bit instruction operand is added to the contents of a base register (BX or BP) and an index register (SI or DI); the resulting value is a pointer to the location where data resides.

2.1.3 x86 architecture for memory

Processor architectures are roughly divided into little-endian and big-endian, according to the way multibyte data is stored in memory. If the processor stores the least significant byte of a multibyte word at a higher address and the most significant at a lower address, this is the big-endian method. If the situation is reversed, so that the least significant byte is stored at the lowest address in memory and higher bytes at increasing addresses, this is a little-endian system. For example, a 4-byte word 0x12345678 stored at address 0x400 on a big-endian machine would be placed in memory as follows:
0x400 0x12
0x401 0x34
0x402 0x56
0x403 0x78

removed, or popped from the stack. Hence, the stack is a LIFO (last-in, first-out) data structure. For clarity, let’s illustrate this.

Figure 2 Push operations in a stack.

A push operation copies a value onto the stack. When a new value is pushed, ESP (the stack pointer) is decremented. ESP always points to the last value pushed. The PUSH instruction is used to accomplish this. The PUSH instruction does not change the contents of EAX, but rather it copies the contents of EAX onto the stack. As more values are pushed, the stack continues to grow downward in memory. A pop operation removes a value from the top of the stack and places it in a register or variable. After the value is popped from the stack, the stack pointer is incremented to point to the previous value on the stack.

2.3 The Method

For a buffer overrun attack to be possible and be successful, the following events must occur, and in this order:

  1. A buffer overflow vulnerability must be found, discovered, or identified.
  2. The size of the buffer must be determined.
  3. The attacker must be able to control the data written into the buffer.
  4. There must be security sensitive variables or executable program instructions stored below the buffer in memory.
  5. Targeted executable program instructions must be replaced with other executable instructions.

Let’s look at each of these five conditional steps individually.

2.4 Experimenting of Buffer Overflows

As presented in my previous work, I have been working on buffer overflows. Here I have implemented a small piece of C++ code that shows how a buffer overflow works. The following code is a simple login program that takes a user password and compares it with the one already stored. If the two match, the message "You are welcomed!" is displayed; otherwise the program just terminates.

#include<iostream.h>
#include<conio.h>
#include<string.h>

void enter();

int test()
{
    char ch[10];    // 10-byte buffer on the stack
    cout<<"Please enter password:";

Figure 3 Disassembled code, showing the start of the function.

Since Intel processors use the little-endian architecture mentioned above, I reversed the byte order of this address and got 53 11 40 (hexadecimal). I then converted each byte into decimal form, which gave 83, 17 and 64. In the console character set these are the codes of the characters S, ◄ and @. Next, I loaded the executable into OLLYDBG and gave it a long sequence of characters as input so that an uncontrolled buffer overflow occurred. From this I noted the number of characters needed to reach the saved extended instruction pointer, which came to sixteen. I then ran the program again and gave it the following input:

AAAAAAAAAAAAAAAAS◄@

This finally produced a controlled buffer overflow, and the program displayed the message that normally accompanies a correct password. It is clear that anybody with some knowledge of memory access can gain access through this login code.

Figure 4 Result of a successful buffer overflow.

3. Future Goals

Up to now, all the buffer overflows have been performed on a local machine, so in the 8th semester the emphasis will be placed on the study of remote buffer overflows. Here, "remote" means that the buffer overflow will be carried out across a local area network and not over the Internet.

4. Key Terminologies

One of the most daunting tasks for any security professional is to stay on top of the latest terms, slang, and definitions that drive new products, technologies, and services. While most of the slang these days is generated online via chat sessions, specifically IRC, it is also passed around in white papers, conference discussions, and by word of mouth. Since the study of buffer overflows dives into code, complex computer and software topics, and techniques for automating exploitation, we felt it necessary to document some of the most common terms to ensure that everyone is on the same page.

4.1 Hardware Terminologies

The following definitions are commonly utilized to describe aspects of computers and their component hardware as they relate to security vulnerabilities:

interpreters for each system interpret byte-code faster than is possible by fully interpreting a high-level language.
■ Compilers Compilers make it possible for programmers to benefit from high-level programming languages, which include modern features such as encapsulation and inheritance.
■ Data Hiding Data hiding is a feature of object-oriented programming languages. Classes and variables may be marked private, which restricts outside access to the internal workings of a class. In this way, classes function as "black boxes," and malicious users are prevented from using those classes in unexpected ways.
■ Data Type A data type is used to define variables before they are initialized. The data type specifies the way a variable will be stored in memory and the type of data the variable holds.
■ Debugger A debugger is a software tool that either hooks into the runtime environment of the application being debugged or acts similar to (or as) a virtual machine for the program to run inside of. The software allows you to debug problems within the application being debugged. The debugger permits the end user to modify the environment, such as memory, that the application relies on and is present in. The two most popular debuggers are GDB (included in nearly every open source *nix distribution) and Softice (www.numega.com).
■ Disassembler A disassembler is a software tool used to convert compiled programs in machine code to assembly code. The two most popular disassemblers are objdump (included in nearly every open source *nix distribution) and the far more powerful IDA (www.datarescue.com).
■ DLL A Dynamic Link Library (DLL) file has an extension of ".dll". A DLL is actually a programming component that runs on Win32 systems and contains functionality that is used by many other programs. The DLL makes it possible to break code into smaller components that are easier to maintain, modify, and reuse by other programs.
■ Encapsulation Encapsulation is a feature of object-oriented programming. Using classes, object-oriented code is very organized and modular. Data structures, data, and methods to perform operations on that data are all encapsulated within the class structure. Encapsulation provides a logical structure to a program and allows for easy methods of inheritance.
■ Function A function may be thought of as a miniature program. In many cases, a programmer may wish to take a certain type of input, perform a specific operation and output the result in a particular format. Programmers have developed the concept of a function for such repetitive operations. Functions are contained areas of a program that may be called to perform operations on data. They take a specific number of arguments and return an output value.
■ Functional Language Programs written in functional languages are organized into mathematical functions. True functional programs do not have variable assignments; lists and functions are all that is necessary to achieve the desired output.
■ GDB The GNU debugger (GDB) is the de facto debugger on UNIX systems. GDB is available at: http://sources.redhat.com/gdb/.
■ Heap The heap is an area of memory utilized by an application and is allocated dynamically at runtime. Data allocated using the malloc interface is stored on the heap, whereas local variables are stored on the stack.
■ Inheritance Object-oriented organization and encapsulation allow programmers to easily reuse, or "inherit," previously written code. Inheritance saves time since programmers do not have to recode previously implemented functionality.
■ Integer Wrapping In the case of unsigned values, integer wrapping occurs when an overly large unsigned value is sent to an application that "wraps" the integer back to zero or a small number. A similar problem exists with signed integers: wrapping from a large positive number to a negative number, zero, or a small positive number. With signed integers, the reverse is true as well: a "large negative number" could be sent to an application that "wraps" back to a positive number, zero, or a smaller negative number.
■ Interpreter An interpreter reads and executes program code. Unlike a compiler, the code is not translated into machine code and then stored for later re-use. Instead, an interpreter reads the higher-level source code each time. An advantage of an interpreter is that it aids in platform independence. Programmers do not need to compile their source code for multiple platforms. Every system which has an interpreter for the language will