Tackling Big Data, Lecture notes of Technology


Michael Cooper & Peter Mell NIST Information Technology Laboratory Computer Security Division

Tackling Big Data

• What exactly is Big Data?
• What are the issues associated with it?
• What role should NIST play with regard to Big Data?
• What is the relationship between Big Data and IT Security?

IT Laboratory Big Data Working Group

What are the issues associated with Big Data?

• Taxonomies, ontologies, schemas, workflow
• Perspectives - backgrounds, use cases
• Bits - raw data formats and storage methods
• Cycles - algorithms and analysis
• Screws - infrastructure to support Big Data

IT Security and Big Data

• Big data sources become rich targets
• Composition of data in one large source as well as across sources
• Security data becoming the source for big data repositories
  - Log/event aggregation and correlation
  - IDS/IPS databases

Peter Mell Senior Computer Scientist NIST Information Technology Laboratory http://twitter.com/petermmell

An Overview of Big Data Technology and Security Implications

NIST Information Technology Laboratory

The ideas herein represent the author's notional views on big data technology and do not necessarily represent the official opinion of NIST.

Any mention of commercial and not-for-profit entities, products, and technology is for informational purposes only; it does not imply recommendation or endorsement by NIST or usability for any specific purpose.

Disclaimer


Section 1: Introduction and Definitions

• The world is creating ever more data
  - (and it's a mainstream problem)
• Mankind created data
  - 150 exabytes in 2005 (an exabyte is a billion gigabytes)
  - 1200 exabytes in 2010
  - 35000 exabytes in 2020 (expected by IBM)
• Examples:
  - U.S. drone aircraft sent back 24 years' worth of video footage in 2009
  - Large Hadron Collider generates 40 terabytes/second
  - Bin Laden's death: 5106 tweets/second
  - Around 30 billion RFID tags produced/year
  - Oil drilling platforms have 20k to 40k sensors
  - Our world has 1 billion transistors/human

Big Data Ȃ the Data Deluge

Credit: The data deluge, Economist; Understanding Big Data, Eaton et al.
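
The volume figures quoted on this slide lend themselves to a quick back-of-the-envelope check. The Python sketch below converts exabytes to gigabytes and derives the compound annual growth implied by the 2005, 2010, and 2020 figures. It assumes decimal (SI) units, and the helper names are hypothetical, chosen only for illustration.

```python
# Assumption: decimal (SI) units, so 1 GB = 10**9 bytes and 1 EB = 10**18 bytes.
GIGABYTE = 10**9
EXABYTE = 10**18

# Data volumes quoted on the slide, in exabytes, keyed by year.
volumes_eb = {2005: 150, 2010: 1200, 2020: 35000}


def exabytes_to_gigabytes(exabytes):
    """Convert a whole number of exabytes to gigabytes."""
    return exabytes * EXABYTE // GIGABYTE


def annual_growth_rate(start_year, end_year):
    """Compound annual growth rate between two of the quoted years."""
    factor = volumes_eb[end_year] / volumes_eb[start_year]
    years = end_year - start_year
    return factor ** (1 / years) - 1


gb_per_eb = exabytes_to_gigabytes(1)
growth_2005_2010 = annual_growth_rate(2005, 2010)
growth_2010_2020 = annual_growth_rate(2010, 2020)

print(f"1 exabyte = {gb_per_eb:,} gigabytes")
print(f"2005-2010 growth: {growth_2005_2010:.1%} per year")
print(f"2010-2020 growth: {growth_2010_2020:.1%} per year")
```

Under these assumptions, the 2005 and 2010 figures imply an eightfold increase, or roughly 52% compound growth per year, while the 2020 projection implies roughly 40% per year.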

• Data is the new "raw material of business" - Economist
• Challenges to achieving the revolution
  - It is not possible to store all the data we produce
  - 95% of created information was unstructured in 2010
• Key observation
  - Relational database management systems (RDBMS) will be challenged to scale up or out to meet the demand

Credit: Data data everywhere, Economist; Extracting Value from Chaos, Gantz et al.

• O'Reilly Radar definition:
  - Big data is when the size of the data itself becomes part of the problem
• EMC/IDC definition of big data:
  - Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.
• IBM says that "three characteristics define big data:"
  - Volume (Terabytes -> Zettabytes)
  - Variety (Structured -> Semi-structured -> Unstructured)
  - Velocity (Batch -> Streaming Data)
• Microsoft researchers use the same tuple

Industry Views on Big Data

Credit: Big Data Now, Current Perspectives from O'Reilly Radar (O'Reilly definition); Extracting Value from Chaos, Gantz et al. (IDC definition); Understanding Big Data, Eaton et al. (IBM definition); The World According to LINQ, Meijer (Microsoft research)

• Big Data Science
  - Big data science is the study of techniques covering the acquisition, conditioning, and evaluation of big data. These techniques are a synthesis of both information technology and mathematical approaches.
• Big Data Frameworks
  - Big data frameworks are software libraries along with their associated algorithms that enable distributed processing and analysis of big data problems across clusters of compute units (e.g., servers, CPUs, or GPUs).
• Big Data Infrastructure
  - Big data infrastructure is an instantiation of one or more big data frameworks that includes management interfaces, actual servers (physical or virtual), storage facilities, networking, and possibly back-up systems. Big data infrastructure can be instantiated to solve specific big data problems or to serve as a general purpose analysis and processing engine.

More Notional Definitions
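
To make the framework definition concrete, the following Python sketch shows the split, map, and merge pattern that many big data frameworks build on, using a word count as the example. It is an illustrative assumption and not any specific framework's API: a local thread pool stands in for a cluster of compute units, and the function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_count(chunk):
    """Map step: count the words within one chunk of lines."""
    return Counter(word.lower() for line in chunk for word in line.split())


def merge_counts(left, right):
    """Reduce step: merge two partial word counts into one."""
    return left + right


def distributed_word_count(lines, workers=3):
    """Split the input, count each chunk in parallel, then merge the results.

    Threads stand in for cluster nodes here; a real framework would ship
    each chunk to a separate machine.
    """
    chunks = [lines[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_count, chunks))
    return reduce(merge_counts, partials, Counter())


sample = ["big data big", "data deluge", "big deluge"]
counts = distributed_word_count(sample)
print(counts)
```

The point of the pattern is that the map step needs no coordination between chunks, so adding compute units increases throughput until the merge step or data movement becomes the bottleneck.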


• NoSQL Origins
  - First used in 1998 to mean "No to SQL"
  - Reused in 2009 when it came to mean "Not Only SQL"
  - Groups non-relational approaches under a single term
• The power of SQL is not needed in all problems
  - Specialized solutions may be faster or more scalable
  - NoSQL generally has less querying power than SQL
• Common reasons to use NoSQL
  - Ability to handle semi-structured and unstructured data
  - Horizontal scalability
• NoSQL may complement RDBMS (but sometimes replaces)
  - RDBMS may hold smaller amounts of high-value structured data
  - NoSQL may hold vast amounts of less valued and less structured data

Big Data Frameworks are often associated with the term NoSQL

[Figure: Structured Storage diagram showing RDBMS and NoSQL]

Credit: NoSQL Databases, Strauch; Understanding Big Data, Eaton et al.
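
The two common reasons for choosing NoSQL listed above, semi-structured data and horizontal scalability, can be sketched with a toy key-value store. The class below is a hypothetical illustration, not the design of any real NoSQL product: it assumes documents are plain Python dictionaries with no fixed schema and spreads keys across shards by hashing.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy key-value store that spreads documents across several shards."""

    def __init__(self, shard_count):
        # Each dict stands in for a separate storage node.
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash of the key decides which shard owns it.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, document):
        self.shards[self._shard_for(key)][key] = document

    def get(self, key):
        return self.shards[self._shard_for(key)].get(key)


store = ShardedKeyValueStore(shard_count=4)

# The documents need not share a schema, unlike rows in a relational table.
store.put("sensor:17", {"type": "pressure", "reading": 101.3})
store.put("tweet:42", {"user": "alice", "text": "hello", "tags": ["demo"]})

print(store.get("sensor:17"))
print(store.get("tweet:42"))
```

Lookups by key touch only one shard, which is what lets such a store grow by adding nodes. The trade-off, as the slide notes, is less querying power: a join or an ad hoc filter across all documents has no direct equivalent here.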

CAP Theorem with ACID and BASE Visualized

[Figure: CAP triangle with corners Consistency, Availability, and Partition Tolerance. Annotations: small data sets can be both consistent and available; BASE with eventual consistency; ACID with eventual availability.]
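
As a rough illustration of the "BASE with eventual consistency" corner of the CAP diagram, the Python sketch below models two replicas that accept writes independently and converge only after they exchange state. The `Replica` class, the explicit version numbers, and the last-write-wins rule are assumptions made for this example; real systems use a variety of reconciliation schemes.

```python
class Replica:
    """Toy replica that stores versioned values and keeps the newest one."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        current = self.data.get(key)
        # Last-write-wins: accept only strictly newer versions.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(first, second):
    """Exchange state in both directions so the replicas converge."""
    for source, target in ((first, second), (second, first)):
        for key, (version, value) in list(source.data.items()):
            target.write(key, value, version)


a, b = Replica(), Replica()

# The write lands on replica a only; both replicas stay available.
a.write("cart", ["book"], version=1)
stale = b.read("cart")   # b has not seen the write yet

sync(a, b)
fresh = b.read("cart")   # after syncing, b agrees with a

print(stale, fresh)
```

Between the write and the sync, a reader of replica `b` sees stale data. That window is the price paid for keeping both replicas available while they cannot communicate.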


Section 2: Big Data Taxonomies