NAVAL POSTGRADUATE SCHOOL
MONTEREY, CALIFORNIA

THESIS

DATABASE CREATION AND STATISTICAL ANALYSIS: FINDING CONNECTIONS BETWEEN TWO OR MORE SECONDARY STORAGE DEVICES

by Jennifer M. Johnson

September 2017

Thesis Advisor: Neil C. Rowe
Second Reader: Michael R. McCarrin

Approved for public release; distribution is unlimited


Jennifer M. Johnson
Civilian, Department of Defense
B.S., San José State University, 2005

Submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE IN COMPUTER SCIENCE

from the
NAVAL POSTGRADUATE SCHOOL
September 2017

Approved by:
Neil C. Rowe, Thesis Advisor
Michael R. McCarrin, Second Reader
Peter J. Denning, Chair, Department of Computer Science


ABSTRACT

We have created a database to analyze digital data to find connections between two or more different secondary storage devices. We used MongoDB and created a document for each secondary-storage image and each unique sector. Ingesting the secondary-storage images took so much time that we had to carefully consider all the reasons for the slowdown and experiment with different ways to insert the data. Using a partial database, we found the fraction of space that is empty (contains NULLS), per secondary-storage image and for the entire database. We also found duplicate images. Future students may continue to grow the database. Rather than make the goal a completed database, the students will analyze the current data and add to the database.


List of Figures

Figure 2.1 Partition Table Layout, mmls Command Output

Figure 2.2 Example of SQL table

Figure 3.1 Four pieces of useful information

Figure 3.2 The _id command used to identify each image in MongoDB

Figure 3.3 Sector layer schema for MongoDB

Figure 3.4 MongoDB Command

Figure 3.5 Histogram of times for inserting secondary-storage images smaller than 500 MB into the database

Figure 3.6 Inserting secondary-storage images that are smaller than approximately 500 MB

Figure 4.1 A MongoDB Command to find most common MD5 hash

Figure 4.2 Most common hash with about 980 images inserted

List of Acronyms and Abbreviations

B bytes

CPU central processing units

CS Computer Science

DEEP Digital Evaluation and Exploitation

DoD Department of Defense

EWF Expert Witness Compression Format

FBI Federal Bureau of Investigation

GB gigabytes

GPU graphical processing units

KiB kibibyte

MB megabytes

MD5 message digest 5

ME Mechanical Engineer

NSF National Science Foundation

NIST National Institute of Standards and Technology

NSRL National Software Reference Library

NTFS New Technology File System

NUS non United States

RAM random access memory

RDC Real Data Corpus

SHA-1 secure hash algorithm 1

SFS Scholarship For Service

SQL structured query language

TB terabytes

RCFL Regional Computer Forensics Laboratory

TSK The Sleuth Kit


CHAPTER 1: Introduction

1.1 The Problem and Motivation

We address two problems. The first is managing large-scale heterogeneous digital-forensic data. The second is finding a digital-forensic connection between two or more secondary-storage devices. The National Institute of Standards and Technology (NIST) defines digital forensics as “the application of science to the identification, collection, examination, and analysis of data while preserving the integrity of the information and maintaining a strict chain of custody for the data” [1].

The growing amount of data is our motivation. In recent years, the per-gigabyte price of storage has been steadily decreasing [2]. It is common for the average consumer to purchase terabytes of digital storage space. As a consequence, law enforcement agencies and cyber divisions in the Department of Defense (DoD) have acquired terabytes of data while collecting criminal evidence. The Regional Computer Forensics Laboratory (RCFL) program, established by the FBI, publishes annual reports, which noted that the Chicago lab, just one of the 15 labs, collected and processed 580 TB of digital data in one year [3].

Currently, examiners process data on secondary-storage images drive-by-drive using forensic tools designed to run on a single workstation. Each drive is considered separately, and little work is done to correlate information across different images. From an analyst’s perspective, this approach means important information may be missed. For example, there is no organized effort to detect collaboration or communication between owners of devices acquired at different times. Likewise, little has been done to study large-scale patterns in acquired data. Studying trends in data may offer insight into longstanding forensic analysis problems. Carving deleted files, for example, is a longstanding forensic problem because it can be time intensive. File carving is the method of detecting a file signature and then extracting the data associated with it [4].
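As a rough illustration of the carving idea described above, the following Python sketch scans a raw buffer for the JPEG start and end markers and extracts the bytes between them. The function name carve_jpegs is hypothetical, and the sketch assumes contiguous, unfragmented files; it is not the tooling used in this work.

```python
from typing import List

# JPEG signatures: start-of-image marker followed by the first byte of the
# next marker, and the end-of-image marker.
JPEG_HEADER = b"\xff\xd8\xff"
JPEG_FOOTER = b"\xff\xd9"


def carve_jpegs(raw: bytes) -> List[bytes]:
    """Return each byte range that starts at a JPEG header and ends at the
    next footer. Assumes files are contiguous (no fragmentation)."""
    carved = []
    pos = 0
    while True:
        start = raw.find(JPEG_HEADER, pos)
        if start == -1:
            break  # no further signatures in the buffer
        end = raw.find(JPEG_FOOTER, start + len(JPEG_HEADER))
        if end == -1:
            break  # header without a footer: stop rather than guess
        end += len(JPEG_FOOTER)
        carved.append(raw[start:end])
        pos = end  # resume scanning after the carved range
    return carved


if __name__ == "__main__":
    # Hypothetical buffer: one fake JPEG surrounded by zero-filled space.
    fake_image = JPEG_HEADER + b"\xe0" + b"payload" + JPEG_FOOTER
    disk = b"\x00" * 512 + fake_image + b"\x00" * 100
    print(len(carve_jpegs(disk)), "candidate file(s) carved")
```

Because the scan must touch every byte of the image, its cost grows with the size of the drive, which is one reason carving is time intensive on large collections.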

A tactic that can reduce the processing time required for file carving is matching blocks that reside in allocated space with those blocks in unallocated space. Allocation means the

Device name
Device hash
Number of sectors
Sector size
Device type
Total disk size
Number of partitions
Partition offsets
Recognizability of the partition?
Volume system type
Block size of volume
Partition type
Partition allocation
Description of partition
File system type
Block size of file system
Number of blocks in file system
Sector offset of file system

Category two comprises features that require more extensive analysis to measure:

Fraction of space that is empty (or contains NULLS)
Fraction of space that is unallocated or allocated
Fraction of space that is unallocated and non-empty
Fraction of non-empty unallocated space that matches allocated space
Average (2-byte Shannon) entropy score of non-empty sectors
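Two of the features listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method used in this work: the 512-byte sector size is assumed, "2-byte Shannon entropy" is interpreted here as entropy over non-overlapping 2-byte symbols, and all function names are hypothetical.

```python
import math
from collections import Counter

SECTOR_SIZE = 512  # assumed sector size in bytes


def split_sectors(raw: bytes, sector_size: int = SECTOR_SIZE):
    """Yield consecutive full sectors; a trailing partial sector is ignored."""
    for off in range(0, len(raw) - sector_size + 1, sector_size):
        yield raw[off:off + sector_size]


def fraction_empty(raw: bytes, sector_size: int = SECTOR_SIZE) -> float:
    """Fraction of sectors that contain only NULL (zero) bytes."""
    sectors = list(split_sectors(raw, sector_size))
    if not sectors:
        return 0.0
    empty = sum(1 for s in sectors if not any(s))
    return empty / len(sectors)


def entropy_2byte(sector: bytes) -> float:
    """Shannon entropy in bits per symbol, where a symbol is one
    non-overlapping 2-byte pair (an assumed reading of "2-byte")."""
    symbols = [sector[i:i + 2] for i in range(0, len(sector) - 1, 2)]
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mean_entropy_nonempty(raw: bytes, sector_size: int = SECTOR_SIZE) -> float:
    """Average 2-byte entropy over sectors that are not all zeros."""
    scores = [entropy_2byte(s) for s in split_sectors(raw, sector_size) if any(s)]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical image: one empty sector and one sector of varied bytes.
    image = b"\x00" * SECTOR_SIZE + bytes(range(256)) * 2
    print("fraction empty:", fraction_empty(image))
    print("mean entropy of non-empty sectors:", mean_entropy_nonempty(image))
```

The allocation-based features in the list need file system metadata to decide which sectors are allocated, so they are left out of this sketch.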

To gather statistical information on all the secondary-storage images in the non-United States (NUS) portion of the RDC, we first need to create a database for our analysis. We have two important steps. Step 1a is building the database and step 1b is the analysis. We have 124,104,544,671,744 bytes (B) of data in the NUS portion of the RDC. An important research question is how long it will take to build a database of sector hashes.
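To give a sense of scale for that question, the following Python sketch converts the corpus size stated above into a sector count and a build-time estimate. Only the byte total comes from the text; the 512-byte sector size and the insertion rates are assumptions chosen for illustration, and the function names are hypothetical.

```python
TOTAL_BYTES = 124_104_544_671_744  # size of the NUS portion of the RDC (from the text)
SECTOR_SIZE = 512                  # assumed sector size in bytes
SECONDS_PER_DAY = 86_400


def sector_count(total_bytes: int, sector_size: int = SECTOR_SIZE) -> int:
    """Number of sectors, rounding up for any trailing partial sector."""
    return -(-total_bytes // sector_size)


def days_to_ingest(total_bytes: int, sectors_per_second: float,
                   sector_size: int = SECTOR_SIZE) -> float:
    """Days needed to hash and insert every sector at a constant rate."""
    seconds = sector_count(total_bytes, sector_size) / sectors_per_second
    return seconds / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{sector_count(TOTAL_BYTES):,} sectors")
    # Hypothetical sustained insertion rates, not measured values.
    for rate in (10_000, 100_000, 1_000_000):
        days = days_to_ingest(TOTAL_BYTES, rate)
        print(f"{rate:>9,} sectors/s -> {days:,.1f} days")
```

With 512-byte sectors the corpus holds 242,391,688,812 sectors, so even at a hypothetical sustained rate of 100,000 sectors per second, ingestion alone would take roughly four weeks.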

1.4 Thesis Structure

In Chapter two we cover the background and related work. In Chapter three we discuss the methodology. In Chapter four we discuss our results. In Chapter five we discuss our conclusions and future work.
