
NAVAL
POSTGRADUATE
SCHOOL
MONTEREY, CALIFORNIA
THESIS
DATABASE CREATION AND STATISTICAL ANALYSIS:
FINDING CONNECTIONS BETWEEN TWO OR MORE
SECONDARY STORAGE DEVICES
by
Jennifer M. Johnson
September 2017
Thesis Advisor: Neil C. Rowe
Second Reader: Michael R. McCarrin
Approved for public release; distribution is unlimited

DATABASE CREATION AND STATISTICAL ANALYSIS: FINDING CONNECTIONS BETWEEN TWO OR MORE SECONDARY STORAGE DEVICES

Jennifer M. Johnson
Civilian, Department of Defense
B.S., San José State University, 2005

Submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE IN COMPUTER SCIENCE

from the NAVAL POSTGRADUATE SCHOOL
September 2017

Approved by:
Neil C. Rowe, Thesis Advisor
Michael R. McCarrin, Second Reader
Peter J. Denning, Chair, Department of Computer Science


ABSTRACT

We have created a database to analyze digital data and find connections between two or more different secondary-storage devices. We used MongoDB and created a document for each secondary-storage image and for each unique sector. Ingesting the secondary-storage images took so much time that we had to carefully consider all the reasons for the slowdown and experiment with different ways to insert the data. Using a partial database, we found the fraction of space that is empty (contains NULLS), both per secondary-storage image and for the entire database. We also found duplicate images. Future students may continue to grow the database. Rather than making a completed database the goal, those students will analyze the current data while adding to the database.


List of Figures

Figure 2.1 Partition Table Layout, mmls Command Output

Figure 2.2 Example of SQL table

Figure 3.1 Four pieces of useful information

Figure 3.2 The _id command used to identify each image in MongoDB

Figure 3.3 Sector layer schema for MongoDB

Figure 3.4 MongoDB Command

Figure 3.5 Histogram of times for inserting secondary-storage images smaller than 500 MB into the database

Figure 3.6 Inserting secondary-storage images that are smaller than approximately 500 MB

Figure 4.1 A MongoDB Command to find most common MD5 hash

Figure 4.2 Most common hash with about 980 images inserted


List of Acronyms and Abbreviations

B bytes

CPU central processing units

CS Computer Science

DEEP Digital Evaluation and Exploitation

DoD Department of Defense

EWF Expert Witness Compression Format

FBI Federal Bureau of Investigation

GB gigabytes

GPU graphical processing units

KiB kibibyte

MB megabytes

MD5 message digest 5

ME Mechanical Engineer

NSF National Science Foundation

NIST National Institute of Standards and Technology

NSRL National Software Reference Library

NTFS New Technology File System

NUS non-United States

RAM random access memory


RDC Real Data Corpus

SHA-1 secure hash algorithm 1

SFS Scholarship For Service

SQL structured query language

TB terabytes

RCFL Regional Computer Forensics Laboratory

TSK The Sleuth Kit


CHAPTER 1: Introduction

1.1 The Problem and Motivation

We address two problems. The first is managing large-scale heterogeneous digital-forensic data. The second is finding a digital-forensic connection between two or more secondary-storage devices. The National Institute of Standards and Technology (NIST) defines digital forensics as “the application of science to the identification, collection, examination, and analysis of data while preserving the integrity of the information and maintaining a strict chain of custody for the data” [1].

The growing amount of data is our motivation. In recent years, the per-gigabyte price of storage has been steadily decreasing [2]. It is now common for the average consumer to purchase terabytes of digital storage space. As a consequence, law enforcement agencies and cyber divisions in the Department of Defense (DoD) have acquired terabytes of data while collecting criminal evidence. The annual reports of the Regional Computer Forensics Laboratory (RCFL) program, established by the FBI, note that the Chicago lab, just one of the 15 labs, collected and processed 580 TB of digital data in one year [3].

Currently, examiners process data on secondary-storage images drive by drive, using forensic tools designed to run on a single workstation. Each drive is considered separately, and little work is done to correlate information across different images. From an analyst’s perspective, this approach means important information may be missed. For example, there is no organized effort to detect collaboration or communication between owners of devices acquired at different times. Likewise, little has been done to study large-scale patterns in acquired data. Studying trends in data may offer insight into longstanding forensic analysis problems. Carving deleted files, for example, is a longstanding forensic problem because it can be time-intensive. File carving is the method of detecting a file signature and then extracting the data associated with it [4].

A tactic that can reduce the processing time required for file carving is matching blocks that reside in allocated space with those blocks in unallocated space. Allocation means the


  • Device name
  • Device hash
  • Number of sectors
  • Sector size
  • Device type
  • Total disk size
  • Number of partitions
  • Partition offsets
  • Recognizability of the partition?
  • Volume system type
  • Block size of volume
  • Partition type
  • Partition allocation
  • Description of partition
  • File system type
  • Block size of file system
  • Number of blocks in file system
  • Sector offset of file system

Category two comprises features that require more extensive analysis to measure:

  • Fraction of space that is empty (or contains NULLS)
  • Fraction of space that is unallocated or allocated
  • Fraction of space that is unallocated and non-empty
  • Fraction of non-empty unallocated space that matches allocated space
  • Average (2-byte Shannon) entropy score of non-empty sectors
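As a minimal sketch of how two of these features might be measured, the Python below computes the fraction of empty sectors and the average entropy of the non-empty sectors for an image held in memory. It rests on several assumptions that are not taken from the thesis: a 512-byte sector size, an "empty" sector meaning one that contains only NULL bytes, and "2-byte Shannon entropy" meaning entropy over non-overlapping 2-byte symbols. All function names are hypothetical.

```python
import math
from collections import Counter

SECTOR_SIZE = 512  # assumed sector size in bytes; not stated in this section


def split_sectors(image: bytes, sector_size: int = SECTOR_SIZE):
    """Yield consecutive fixed-size sectors; a trailing partial sector is dropped."""
    for offset in range(0, len(image) - sector_size + 1, sector_size):
        yield image[offset:offset + sector_size]


def is_empty(sector: bytes) -> bool:
    """Assumed definition of 'empty': every byte in the sector is NULL (0x00)."""
    return sector.count(0) == len(sector)


def entropy_2byte(sector: bytes) -> float:
    """Shannon entropy in bits over non-overlapping 2-byte symbols (assumed reading)."""
    symbols = [sector[i:i + 2] for i in range(0, len(sector) - 1, 2)]
    if not symbols:
        return 0.0
    total = len(symbols)
    counts = Counter(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def image_statistics(image: bytes, sector_size: int = SECTOR_SIZE) -> dict:
    """Return the empty-sector fraction and mean entropy of non-empty sectors."""
    sectors = list(split_sectors(image, sector_size))
    non_empty = [s for s in sectors if not is_empty(s)]
    empty_fraction = (len(sectors) - len(non_empty)) / len(sectors) if sectors else 0.0
    mean_entropy = (
        sum(entropy_2byte(s) for s in non_empty) / len(non_empty) if non_empty else 0.0
    )
    return {
        "sectors": len(sectors),
        "empty_fraction": empty_fraction,
        "mean_entropy_bits": mean_entropy,
    }
```

For a toy image of three NULL sectors followed by one sector of varied bytes, `image_statistics` reports four sectors with an empty fraction of 0.75. The allocation-based features in the list would additionally need file-system metadata, which this sketch does not attempt to model.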

In order to gather statistical information on all the secondary-storage images in the non-United States (NUS) portion of the RDC, we first need to create a database for our analysis. This gives two steps: step 1a builds the database and step 1b performs the analysis. The NUS portion of the RDC holds 124,104,544,671,744 bytes (B) of data. An important research question is therefore: how long will it take to build a database of sector hashes?
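To give a rough sense of scale for that question, the sketch below converts the stated byte count into a number of sectors and then into a build time for a given ingest rate. The 512-byte sector size and any ingest rate passed in are illustrative assumptions, not measurements from this work, and the function names are hypothetical. MD5 is used for the sector hash only because the list of figures refers to MD5 hashes.

```python
import hashlib

NUS_BYTES = 124_104_544_671_744  # size of the NUS portion of the RDC, from the text
SECTOR_SIZE = 512                # assumed sector size in bytes
SECONDS_PER_DAY = 86_400


def sector_count(total_bytes: int, sector_size: int = SECTOR_SIZE) -> int:
    """Number of whole sectors contained in total_bytes."""
    return total_bytes // sector_size


def estimated_days(total_bytes: int, sectors_per_second: float,
                   sector_size: int = SECTOR_SIZE) -> float:
    """Days needed to hash and insert every sector at a given (hypothetical) rate."""
    return sector_count(total_bytes, sector_size) / sectors_per_second / SECONDS_PER_DAY


def sector_md5(sector: bytes) -> str:
    """Hex MD5 digest of one sector, usable as a key for a sector-hash record."""
    return hashlib.md5(sector).hexdigest()


if __name__ == "__main__":
    print(sector_count(NUS_BYTES))  # 242391688812 sectors at 512 bytes each
    # The rate below is purely illustrative, not a measured throughput.
    print(round(estimated_days(NUS_BYTES, sectors_per_second=1_000_000), 1))
```

Under these assumptions the corpus holds about 242 billion sectors, so even a sustained rate of one million sectors per second would need roughly 2.8 days. A realistic estimate depends on the measured insertion rate, which this sketch does not supply.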

1.4 Thesis Structure

In Chapter two we cover the background and related work. In Chapter three we discuss the methodology. In Chapter four we discuss our results. In Chapter five we discuss our conclusions and future work.