Michael Cooper & Peter Mell
NIST Information Technology Laboratory
Computer Security Division
• What exactly is Big Data?
• What are the issues associated with it?
• What role should NIST play with regard to Big Data?

• What is the relationship between Big Data and IT Security?

• Taxonomies, ontologies, schemas, workflow
• Perspectives
  - Backgrounds, use cases

• Bits
  - Raw data formats and storage methods
• Cycles
  - Algorithms and analysis
• Screws
  - Infrastructure to support Big Data

• Big data sources become rich targets
• Composition of data in one large source as well as across sources
• Security data becoming the source for big data repositories
  - Log/event aggregation and correlation
  - IDS/IPS databases
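As a rough illustration of "log/event aggregation and correlation", here is a minimal sketch, assuming a hypothetical normalized event format (the `Event` type and `correlate_failures` helper are invented for this example, not taken from any product). It merges events from several sources and flags IP addresses whose failures were reported by at least two distinct sources. This is the same cross-source composition that makes such a repository both useful and a rich target.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Event(NamedTuple):
    """Hypothetical normalized log record; the field names are illustrative."""
    source: str   # which sensor or log reported it, e.g. "ids" or "auth"
    ip: str       # remote address involved
    outcome: str  # "ok" or "fail"


def correlate_failures(events: Iterable[Event], min_sources: int = 2) -> dict[str, set[str]]:
    """Return IPs whose failures were reported by at least `min_sources` distinct sources."""
    sources_by_ip: dict[str, set[str]] = defaultdict(set)
    for ev in events:
        if ev.outcome == "fail":
            sources_by_ip[ev.ip].add(ev.source)
    # Keep only IPs corroborated by enough independent sources.
    return {ip: srcs for ip, srcs in sources_by_ip.items() if len(srcs) >= min_sources}


if __name__ == "__main__":
    logs = [
        Event("auth", "10.0.0.5", "fail"),
        Event("ids", "10.0.0.5", "fail"),
        Event("auth", "10.0.0.9", "fail"),
        Event("firewall", "10.0.0.9", "ok"),
    ]
    print(correlate_failures(logs))
```

Real correlation engines also bound matches by time window and normalize many log formats. The sketch leaves both out to keep the aggregation step visible.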
Peter Mell
Senior Computer Scientist
NIST Information Technology Laboratory
http://twitter.com/petermmell
NIST Information Technology Laboratory
… big data technology and do not necessarily represent the official opinion of NIST.
Any mention of commercial and not-for-profit entities, products, and technology is for informational purposes only; it does not imply recommendation or endorsement by NIST or usability for any specific purpose.
• The world is creating ever more data

• Mankind created data
  - 150 exabytes in 2005 (an exabyte is a billion gigabytes)
  - 1,200 exabytes in 2010
  - 35,000 exabytes in 2020 (expected by IBM)

• Examples:
  - U.S. drone aircraft sent back 24 years' worth of video footage in 2009
  - The Large Hadron Collider generates 40 terabytes/second
  - …
  - Around 30 billion RFID tags are produced per year
  - Oil drilling platforms have 20,000 to 40,000 sensors
  - Our world has 1 billion transistors per human
Credit: The data deluge, Economist; Understanding Big Data, Eaton et al.
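The data-creation figures above imply a growth rate that is easy to check with a little arithmetic. The following is a minimal sketch with hypothetical helper names. It converts exabytes to gigabytes using decimal prefixes and computes the compound annual growth rate (CAGR) between the quoted data points. CAGR is our choice of summary statistic; the slides do not use it.

```python
# Figures quoted on the slide (exabytes of data created per year).
# The 2020 value is IBM's projection, not a measurement.
CREATED_EB = {2005: 150, 2010: 1200, 2020: 35000}

GB_PER_EB = 10**9  # decimal units: 1 EB = 10**18 bytes = 10**9 GB


def exabytes_to_gigabytes(eb: float) -> float:
    """Convert exabytes to gigabytes using decimal (SI) prefixes."""
    return eb * GB_PER_EB


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


if __name__ == "__main__":
    years = sorted(CREATED_EB)
    for a, b in zip(years, years[1:]):
        rate = cagr(CREATED_EB[a], CREATED_EB[b], b - a)
        print(f"{a}-{b}: {rate:.1%} per year")
    print(f"2010 total: {exabytes_to_gigabytes(CREATED_EB[2010]):.3g} GB")
```

On these numbers, growth is roughly 52% per year for 2005 to 2010 and about 40% per year for 2010 to 2020.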
• “…” (Economist)
• Challenges to achieving the revolution
  - It is not possible to store all the data we produce
  - 95% of created information was unstructured in 2010
• Key observation
  - Relational database management systems (RDBMS) will be challenged to scale up or out to meet the demand
Credit: Data, data everywhere, Economist; Extracting Value from Chaos, Gantz et al.
• …:
  - Big data is when the size of the data itself becomes part of the problem
• EMC/IDC definition of big data:
  - Big data technologies describe a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery, and/or analysis.
• “…”:
  - Volume (terabytes → zettabytes)
  - Variety (structured → semi-structured → unstructured)
  - Velocity (batch → streaming data)
• Microsoft researchers use the same tuple
Credit: Understanding Big Data, Eaton et al. (IBM definition); The World According to LINQ, Meijer (Microsoft research); …; Gantz et al. (IDC definition)
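The velocity axis (batch to streaming) is easiest to see side by side. The following is a minimal sketch assuming a simple numeric feed. The batch version needs the whole data set in memory before it can answer. The streaming version updates a running mean and variance one record at a time in constant memory. It uses Welford's online algorithm, which is our choice of illustration; the slides name no particular algorithm.

```python
import statistics
from typing import Iterable


def batch_mean_var(values: list[float]) -> tuple[float, float]:
    """Batch: the full data set must be materialized before computing."""
    return statistics.fmean(values), statistics.pvariance(values)


class StreamingStats:
    """Streaming: Welford's online update, O(1) memory per stream."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Population variance of the values seen so far (0.0 if none)."""
        return self._m2 / self.n if self.n else 0.0


def consume(stream: Iterable[float]) -> StreamingStats:
    """Feed records one at a time, as they would arrive from a stream."""
    stats = StreamingStats()
    for x in stream:
        stats.update(x)
    return stats


if __name__ == "__main__":
    data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print(batch_mean_var(data))
    s = consume(iter(data))
    print(s.mean, s.variance)
```

Both paths agree on the answer. The difference is that the streaming version never holds more than three numbers of state, however long the stream runs.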
• Big Data Science
  - Big data science is the study of techniques covering the acquisition, conditioning, and evaluation of big data. These techniques are a synthesis of both information technology and mathematical approaches.
• Big Data Frameworks
  - Big data frameworks are software libraries along with their associated algorithms that enable distributed processing and analysis of big data problems across clusters of compute units (e.g., servers, CPUs, or GPUs).
• Big Data Infrastructure
  - Big data infrastructure is an instantiation of one or more big data frameworks that includes management interfaces, actual servers (physical or virtual), storage facilities, networking, and possibly backup systems. Big data infrastructure can be instantiated to solve specific big data problems or to serve as a general-purpose analysis and processing engine.
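Many big data frameworks (Hadoop MapReduce is a well-known example) split work into independent map tasks, group the intermediate output by key, and then reduce each group. The following is a single-process sketch of that map, shuffle, reduce flow for a word count. It illustrates the programming model only and does not reproduce any framework's API. The "input splits" are plain strings standing in for data that a real cluster would spread across nodes.

```python
import re
from collections import defaultdict
from itertools import chain
from typing import Iterable


def map_phase(split: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(w, 1) for w in re.findall(r"[a-z']+", split.lower())]


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle: group intermediate values by key, as a framework would across nodes."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key."""
    return key, sum(values)


def word_count(splits: list[str]) -> dict[str, int]:
    # In a real cluster each split's map task could run on a different node;
    # here they simply run one after another.
    intermediate = chain.from_iterable(map_phase(s) for s in splits)
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


if __name__ == "__main__":
    print(word_count(["big data is big", "data about data"]))
```

Because map tasks share no state and each reduce call sees only one key, both phases can be parallelized across compute units. That property is what lets frameworks of this kind scale out.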
• NoSQL origins
  - 1998: “…”
  - 2009: “…”
  - Groups non-relational approaches under a single term
• The power of SQL is not needed in all problems
  - Specialized solutions may be faster or more scalable
  - NoSQL generally has less querying power than SQL
• Common reasons to use NoSQL
  - Ability to handle semi-structured and unstructured data
  - Horizontal scalability
• NoSQL may complement RDBMS (but sometimes replaces it)
  - RDBMS may hold smaller amounts of high-value structured data
  - NoSQL may hold vast amounts of lower-value, less structured data
[Figure: Structured storage: RDBMS and NoSQL]
Credit: NoSQL Databases, Strauch; Understanding Big Data, Eaton et al.
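Horizontal scalability usually means spreading data across more machines rather than buying a bigger one. The following is a minimal sketch, assuming a toy in-memory key-value store (the `ShardedKV` class is hypothetical). It routes each key to one of N shards with a stable hash. Many real stores use consistent hashing instead, so that adding a node moves fewer keys. The modulo scheme here is chosen only for brevity.

```python
import hashlib
from typing import Optional


class ShardedKV:
    """Toy key-value store that partitions keys across N in-memory shards."""

    def __init__(self, num_shards: int) -> None:
        if num_shards < 1:
            raise ValueError("need at least one shard")
        self.shards: list[dict[str, str]] = [{} for _ in range(num_shards)]

    def _shard_for(self, key: str) -> dict[str, str]:
        # Use a stable hash (not Python's per-process salted hash()) so that
        # routing is identical across processes and restarts.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shards)
        return self.shards[index]

    def put(self, key: str, value: str) -> None:
        self._shard_for(key)[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._shard_for(key).get(key)


if __name__ == "__main__":
    store = ShardedKV(num_shards=4)
    for i in range(1000):
        store.put(f"user:{i}", f"profile-{i}")
    print(store.get("user:42"))
    print([len(s) for s in store.shards])  # keys per shard
```

Lookups by key touch exactly one shard. A query on anything other than the key would have to scan every shard, which is one concrete reason NoSQL stores generally offer less querying power than SQL.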
[Figure: CAP theorem: Consistency, Availability, Partition Tolerance]
• Small data sets can be both consistent and available
• BASE with eventual consistency
• ACID with eventual availability
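BASE systems let replicas diverge temporarily and converge later. The following is a minimal sketch of one common convergence rule, last-writer-wins by timestamp, assuming hypothetical `Replica` objects and a manually triggered anti-entropy sync. Production stores rely on more careful clocks and conflict handling (for example vector clocks), so treat this only as an illustration of eventual consistency.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Replica:
    """One copy of a key-value map; each entry stores (timestamp, writer_id, value)."""
    name: str
    data: dict[str, tuple[int, str, str]] = field(default_factory=dict)

    def write(self, key: str, value: str, ts: int) -> None:
        self._apply(key, (ts, self.name, value))

    def read(self, key: str) -> Optional[str]:
        entry = self.data.get(key)
        return entry[2] if entry else None

    def _apply(self, key: str, entry: tuple[int, str, str]) -> None:
        # Last-writer-wins: keep the entry with the larger (timestamp, writer) pair;
        # the writer id breaks timestamp ties deterministically.
        current = self.data.get(key)
        if current is None or entry[:2] > current[:2]:
            self.data[key] = entry

    def sync_from(self, other: "Replica") -> None:
        """Anti-entropy: merge every entry the other replica holds."""
        for key, entry in other.data.items():
            self._apply(key, entry)


if __name__ == "__main__":
    a, b = Replica("a"), Replica("b")
    a.write("cart", "book", ts=1)  # accepted on a during a partition
    b.write("cart", "lamp", ts=2)  # concurrently accepted on b
    print(a.read("cart"), b.read("cart"))  # replicas disagree
    a.sync_from(b)
    b.sync_from(a)
    print(a.read("cart"), b.read("cart"))  # both converge to the later write
```

Both replicas stay available for writes while partitioned, at the cost of one write being silently discarded at merge time. This is the availability-for-consistency trade that the BASE label describes.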