Elasticsearch in big data, Lecture notes of Computer Security

This document covers Elasticsearch in the context of big data security.

Typology: Lecture notes

2023/2024

Uploaded on 11/21/2024

arehman001

ELASTIC SEARCH

ELASTICSEARCH

  • Elasticsearch is a search and analytics engine designed to search through and analyze large amounts of data quickly.
  • Indexing in Elasticsearch
    • Index = A storage container for data.
    • Indexing = The process of storing and organizing data in an index.
  • When you store data in Elasticsearch, the data is indexed.
  • This means that Elasticsearch organizes the data in a way that makes it easy to search and retrieve later.
  • This is similar to creating an optimized structure for your data so that Elasticsearch can search through it quickly.
  • When data is saved (e.g., log data or user records), it is stored as a document in an index.
  • Each document is like a record and is typically in JSON format.
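
To make the idea of "a document in an index" concrete, here is a minimal sketch that builds one JSON document and describes the REST call that would store it. The index name `app-logs`, the document ID, and the field values are assumptions chosen for illustration. Nothing is sent over the network.

```python
import json

# Hypothetical log entry; the field names and values are illustrative only.
document = {
    "timestamp": "2024-11-21T10:15:00Z",
    "user_id": "u-1001",
    "event": "login",
}


def build_index_request(index: str, doc_id: str, doc: dict) -> dict:
    """Describe the REST call that would store `doc` in `index`.

    Nothing is sent; this only assembles the method, path, and JSON body.
    """
    return {
        "method": "PUT",
        "path": f"/{index}/_doc/{doc_id}",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(doc),
    }


# "app-logs" and the ID "1" are assumed names for this sketch.
request = build_index_request("app-logs", "1", document)
print(request["method"], request["path"])
print(request["body"])
```

Sending this request to a running cluster would store the document under the given ID, after which its fields become searchable.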

ELASTICSEARCH

  • Data Log Entry:
  • How Indexing Helps Retrieval:
    • Indexing helps Elasticsearch organize this data efficiently so that it can retrieve any record quickly when you search.
    • It creates an inverted index (a special data structure) that allows Elasticsearch to:
      • Quickly search specific fields (like timestamp, user_id, or event).
      • Find records that match your search criteria (e.g., all logs where event = "login").
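
The field-level lookup described above can be sketched as a term query. The log entries below are hypothetical, and `matches_locally` is an illustrative helper that evaluates the query in memory rather than calling Elasticsearch. In a real cluster, an exact-match term query like this usually assumes the field is mapped as a keyword.

```python
# Hypothetical log documents using the fields named above.
logs = [
    {"timestamp": "2024-11-21T10:15:00Z", "user_id": "u-1001", "event": "login"},
    {"timestamp": "2024-11-21T10:17:30Z", "user_id": "u-1002", "event": "logout"},
    {"timestamp": "2024-11-21T10:20:05Z", "user_id": "u-1003", "event": "login"},
]


def term_query(field: str, value: str) -> dict:
    """Build a Query DSL body that matches documents where field equals value."""
    return {"query": {"term": {field: value}}}


def matches_locally(doc: dict, body: dict) -> bool:
    """Evaluate the term query against one in-memory document (illustration only)."""
    field, value = next(iter(body["query"]["term"].items()))
    return doc.get(field) == value


body = term_query("event", "login")
hits = [doc["user_id"] for doc in logs if matches_locally(doc, body)]
print(body)
print(hits)  # ['u-1001', 'u-1003']
```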

INVERTED INDEX IN ELASTICSEARCH

  • In Elasticsearch, inverted indexing is a technique borrowed from information retrieval systems. It is similar to the index in a book, but optimized for fast full-text search across large datasets.
  • 1. Data Indexing: When you index (store) a document in Elasticsearch, it is not simply saved as a whole.
    • Instead, Elasticsearch breaks down the document's contents into individual terms (or tokens), usually words.
    • These terms are then stored in a data structure known as an inverted index.
  • 2. Inverted Index Creation: The inverted index maps terms to the documents that contain them, allowing for fast lookups.
    • Each unique term gets an entry in the index, and each entry keeps track of which documents contain that term.

INVERTED INDEX IN ELASTICSEARCH

  • Step 2: Building the Inverted Index
    • For each unique term, Elasticsearch records the document IDs where the term appears.
    • The inverted index will look something like this:
  • Step 3: Searching with the Inverted Index
    • Now, if you perform a search for "search engine," Elasticsearch can use the inverted index to quickly identify that:
      • "search" appears in Documents 1, 2, and 3.
      • "engine" appears in Document 1.
    • Elasticsearch can then return Document 1 as the most relevant result (since it contains both terms) and rank the other documents based on their relevance.
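
The build and search steps can be sketched in a few lines. The three documents below are hypothetical, chosen only so that "search" appears in documents 1, 2, and 3 and "engine" appears in document 1. The scoring is a deliberate simplification that counts matched query terms, whereas Elasticsearch's actual relevance scoring is more sophisticated.

```python
import re
from collections import defaultdict

# Hypothetical documents: "search" occurs in 1, 2, 3 and "engine" only in 1.
documents = {
    1: "Elasticsearch is a search engine",
    2: "Search large datasets quickly",
    3: "Full-text search with Elasticsearch",
}


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each unique term to the set of document IDs that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[int]], query: str) -> list[tuple[int, int]]:
    """Return (doc_id, matched_term_count) pairs, best match first."""
    scores: dict[int, int] = defaultdict(int)
    for term in set(tokenize(query)):
        for doc_id in index.get(term, set()):
            scores[doc_id] += 1
    # Sort by descending score, then by document ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = build_inverted_index(documents)
print(sorted(index["search"]))         # [1, 2, 3]
print(sorted(index["engine"]))         # [1]
print(search(index, "search engine"))  # [(1, 2), (2, 1), (3, 1)]
```

Because the lookup goes from term to documents, the search never has to scan the text of documents that contain none of the query terms.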

TYPES OF DATA ELASTICSEARCH CAN HANDLE

  • Application Data:
    • Use Case: It can store structured application data, like user profiles, transaction records, or event data.
    • Example: A social media platform might store user activity logs or interaction data, which can be searched and analyzed.
  • Product or Service Data:
    • Use Case: Elasticsearch is commonly used for storing data about products, services, or inventories in e-commerce platforms.
    • Example: Data about product descriptions, pricing, availability, and customer reviews.

TYPES OF DATA ELASTICSEARCH CAN HANDLE

  • Textual and Unstructured Data:
    • Use Case: Elasticsearch is built for text-based search and can store unstructured data such as documents, articles, emails, or customer feedback.
    • Example: Indexing and searching through a collection of articles or support tickets.

Summary:

  • E-commerce: Searching for products, filtering results by price, rating, or category.
  • Security: Storing and analyzing security events and alert data.
  • Business Analytics: Aggregating and visualizing business data such as sales performance or customer metrics.
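
As an illustration of the e-commerce case, the following sketch builds a Query DSL body that combines a full-text match with filters on price, rating, and category. The field names (`description`, `price`, `rating`, `category`) and the sample values are assumptions, since the notes do not define a product schema.

```python
import json


def product_search(text: str, max_price: float, min_rating: float, category: str) -> dict:
    """Build a bool query: full-text match plus price, rating, and category filters.

    All field names here are hypothetical and depend on how products are indexed.
    """
    return {
        "query": {
            "bool": {
                # "must" clauses contribute to the relevance score.
                "must": [{"match": {"description": text}}],
                # "filter" clauses only include or exclude documents.
                "filter": [
                    {"range": {"price": {"lte": max_price}}},
                    {"range": {"rating": {"gte": min_rating}}},
                    {"term": {"category": category}},
                ],
            }
        }
    }


body = product_search("wireless keyboard", max_price=50.0, min_rating=4.0, category="accessories")
print(json.dumps(body, indent=2))
```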

GOOGLE VS ELASTICSEARCH

  • Ingesting logs: If you have web server logs, these logs can be ingested into Elasticsearch so that you can search, analyze, and visualize them.
  • Ingesting data from sensors or devices: Data coming from IoT devices can be ingested into Elasticsearch to monitor and

KEY COMPONENTS OF ELASTICSEARCH ARCHITECTURE:

  • There are several types of nodes:
    • Master Node: Manages the cluster, handles node additions/removals, and manages the distribution of data.
    • Data Node: Stores and manages the data, performs CRUD (Create, Read, Update, Delete) operations, and handles search and aggregation requests.
    • Coordinating Node: Handles requests from clients and forwards them to the appropriate data nodes.
    • Ingest Node: Handles data preprocessing and transformation before storing the data in the index.
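
In recent Elasticsearch versions, a node's type is set through the `node.roles` setting in `elasticsearch.yml`. The sketch below maps the four node types above to the line one might write there. The helper name `render_node_roles` is hypothetical, and the exact setting should be checked against the documentation for the version in use.

```python
# Role lists per node type, as they might appear under `node.roles`.
NODE_ROLES = {
    "master": ["master"],
    "data": ["data"],
    "ingest": ["ingest"],
    "coordinating": [],  # an empty role list leaves only request coordination
}


def render_node_roles(node_type: str) -> str:
    """Render the elasticsearch.yml line for the given node type."""
    roles = NODE_ROLES[node_type]
    if not roles:
        return "node.roles: [ ]"
    return f"node.roles: [ {', '.join(roles)} ]"


for node_type in NODE_ROLES:
    print(f"{node_type:>12}: {render_node_roles(node_type)}")
```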

KEY COMPONENTS OF ELASTICSEARCH ARCHITECTURE:

3. Index: An index is a collection of documents that share the same data structure.

  • It is where data (e.g., logs, metrics, documents) is stored.
  • An index can have multiple shards for scalability and replicas for redundancy.

4. Shard: An index is split into smaller units called shards.

  • Shards allow Elasticsearch to distribute data across multiple nodes in the cluster.
  • Each shard contains a portion of the data, and when you perform a search, Elasticsearch can search through all the shards in parallel.
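
A toy model of sharding, assuming three primary shards, can illustrate both points: documents are assigned to shards by hashing their ID, and a search fans out to every shard in parallel before the partial results are merged. `zlib.crc32` is only a stand-in hash for this sketch, not the function Elasticsearch uses, and the documents are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 3  # hypothetical number of primary shards


def shard_for(doc_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Pick a shard from a hash of the document ID.

    zlib.crc32 is a stand-in; it is not the hash Elasticsearch uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


# Each shard holds a portion of the documents as {doc_id: text}.
shards: list[dict[str, str]] = [{} for _ in range(NUM_SHARDS)]


def index_document(doc_id: str, text: str) -> None:
    """Store the document in the shard selected by its ID."""
    shards[shard_for(doc_id)][doc_id] = text


def search_shard(shard: dict[str, str], term: str) -> list[str]:
    """Return IDs of documents in one shard whose text contains the term."""
    return [doc_id for doc_id, text in shard.items() if term in text.lower()]


def search_all(term: str) -> list[str]:
    """Query every shard in parallel and merge the partial results."""
    with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, term), shards))
    return sorted(doc_id for part in partials for doc_id in part)


index_document("doc-1", "Failed login from 10.0.0.5")
index_document("doc-2", "Password changed")
index_document("doc-3", "Successful login")

print(search_all("login"))  # ['doc-1', 'doc-3']
```

The same document ID always hashes to the same shard, which is why the number of primary shards is normally fixed when an index is created.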

ELASTICSEARCH ARCHITECTURE FLOW

  • Ingestion: Data (logs, metrics, documents) is ingested into Elasticsearch.
  • Indexing: Data is stored as documents in an index, which is split into shards.
  • Distribution: Shards are distributed across multiple nodes in the cluster for scalability.
  • Search: When a query is made, it is forwarded to relevant data nodes, which search the shards and return the results.
  • Aggregation: Elasticsearch performs aggregation queries for analytics and summarization of the data.
    • Calculate the total revenue by summing the prices of all products sold.
    • Find the average price of products within each category.
    • Group sales by month to analyze monthly trends.
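
The three aggregation examples can be worked through on a small in-memory dataset. The sales records are hypothetical, and the code computes the results locally rather than calling Elasticsearch. The comments name the aggregation types (`sum`, `terms` with `avg`, `date_histogram`) that play a comparable role in a real query.

```python
from collections import defaultdict

# Hypothetical sales records.
sales = [
    {"product": "keyboard", "category": "accessories", "price": 40.0, "date": "2024-01-15"},
    {"product": "mouse", "category": "accessories", "price": 20.0, "date": "2024-01-20"},
    {"product": "laptop", "category": "computers", "price": 900.0, "date": "2024-02-03"},
    {"product": "monitor", "category": "computers", "price": 300.0, "date": "2024-02-18"},
]

# Total revenue: comparable to a `sum` aggregation on the price field.
total_revenue = sum(sale["price"] for sale in sales)

# Average price per category: comparable to a `terms` aggregation on
# category with an `avg` sub-aggregation on price.
prices_by_category: dict[str, list[float]] = defaultdict(list)
for sale in sales:
    prices_by_category[sale["category"]].append(sale["price"])
average_price = {cat: sum(p) / len(p) for cat, p in prices_by_category.items()}

# Monthly totals: comparable to a `date_histogram` aggregation with a
# monthly interval and a `sum` sub-aggregation.
monthly_revenue: dict[str, float] = defaultdict(float)
for sale in sales:
    month = sale["date"][:7]  # "YYYY-MM"
    monthly_revenue[month] += sale["price"]

print(total_revenue)          # 1260.0
print(average_price)          # {'accessories': 30.0, 'computers': 600.0}
print(dict(monthly_revenue))  # {'2024-01': 60.0, '2024-02': 1200.0}
```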

BEST PRACTICES FOR SECURING ELASTICSEARCH

  • Always Enable Authentication and Authorization: Don't leave Elasticsearch open without login requirements.
  • Implement Encryption (TLS) on All Connections: Encrypt connections both within the cluster and for external clients.
  • Keep Elasticsearch Updated: Security patches are regularly released, so it's important to stay updated.
  • Regularly Audit Access and Activity: Review audit logs and set up monitoring for abnormal activity.
  • Limit Cluster Exposure: Use network isolation strategies to reduce exposure to only trusted networks.
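
Several of these practices correspond to settings in `elasticsearch.yml`. The sketch below checks a settings dictionary for four such flags and reports which protections are not switched on. The helper `missing_protections` and the sample configuration are hypothetical, and setting names and availability can vary by version and license, so treat this as a checklist aid rather than a complete audit.

```python
# Settings assumed to correspond to the practices above.
REQUIRED_SETTINGS = {
    "xpack.security.enabled": "authentication and authorization",
    "xpack.security.http.ssl.enabled": "TLS for client (HTTP) connections",
    "xpack.security.transport.ssl.enabled": "TLS between cluster nodes",
    "xpack.security.audit.enabled": "audit logging",
}


def missing_protections(settings: dict[str, bool]) -> list[str]:
    """List the protections whose setting is absent or not set to True."""
    return [
        description
        for key, description in REQUIRED_SETTINGS.items()
        if settings.get(key) is not True
    ]


# Hypothetical configuration with TLS between nodes and auditing left off.
example_settings = {
    "xpack.security.enabled": True,
    "xpack.security.http.ssl.enabled": True,
    "xpack.security.transport.ssl.enabled": False,
}

for gap in missing_protections(example_settings):
    print("Not enabled:", gap)
```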

HOW TO IMPLEMENT NETWORK ISOLATION FOR ELASTICSEARCH

  • 1. Private Network Configuration: Host your Elasticsearch cluster in a Virtual Private Cloud (VPC) or private subnet.
    • Ensure it's accessible only within your organization's internal network.
  • 2. Firewalls and Security Groups: Configure firewalls or security groups to allow connections only from trusted IPs or ranges.
    • Block incoming requests from public IPs unless necessary.
  • 3. Access the cluster only via a Virtual Private Network (VPN).
  • 4. Encrypt traffic between Elasticsearch nodes and clients using TLS. This prevents interception or tampering of data.
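
The allow-list rule in step 2 can be illustrated with Python's standard `ipaddress` module. The trusted ranges below are assumptions for the sketch. In practice this check is enforced by the firewall or security group rather than by application code.

```python
import ipaddress

# Hypothetical trusted ranges: an internal VPC range and a VPN subnet.
TRUSTED_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/16"),
    ipaddress.ip_network("192.168.10.0/24"),
]


def is_trusted(client_ip: str) -> bool:
    """Return True if the client address falls inside any trusted range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in TRUSTED_NETWORKS)


for ip in ["10.0.4.7", "192.168.10.25", "203.0.113.9"]:
    verdict = "allow" if is_trusted(ip) else "block"
    print(f"{ip}: {verdict}")
```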

RBAC IN ELASTICSEARCH

  • DataCorp has three main departments with different access needs:
    • Sales Team: Needs read-only access to sales data to monitor trends and customer orders.
    • Data Engineering Team: Manages data ingestion and indexing and requires write access to all data indices, but not admin privileges.
    • Admin Team: Manages the entire Elasticsearch cluster and requires full access, including user and role management.
  • Step 1: Define the Roles in Elasticsearch
  • DataCorp's Elasticsearch administrator creates three roles:
    • sales_read_only
    • data_engineer