Database management system | Summaries Database Management Systems (DBMS)

LEC-15: NoSQL

1. NoSQL databases (aka "not only SQL") are non-tabular databases and store data differently than relational tables. NoSQL databases come in a variety of types based on their data model. The main types are document, key-value, wide-column, and graph. They provide flexible schemas and scale easily with large amounts of data and high user loads.

1. They are schema-free: no fixed schema has to be declared before data is stored.

2. Their data structures are not tabular; they are more flexible and can adapt dynamically.

3. They can handle huge amounts of data (big data).

4. Most NoSQL databases are open source and are capable of horizontal scaling.

5. They simply store data in some format other than relational tables.

2. History behind NoSQL

1. NoSQL databases emerged in the late 2000s as the cost of storage dramatically decreased. Gone were the days of needing to create a complex, difficult-to-manage data model in order to avoid data duplication. Developers (rather than storage) were becoming the primary cost of software development, so NoSQL databases optimised for developer productivity.

2. Data was becoming increasingly unstructured, so structuring it (defining a schema in advance) had become costly.

3. NoSQL databases allow developers to store huge amounts of unstructured data, giving them a lot of flexibility.

4. There was a growing need to rapidly adapt to changing requirements in a software system. Developers needed the ability to iterate quickly and make changes throughout their software stack, all the way down to the database. NoSQL databases gave them this flexibility.

5. Cloud computing also rose in popularity, and developers began using public clouds to host their applications and data. They wanted the ability to distribute data across multiple servers and regions to make their applications resilient, to scale out instead of scale up, and to intelligently geo-place their data. Some NoSQL databases like MongoDB provide these capabilities.

3. NoSQL Databases Advantages

A. Flexible Schema

1. An RDBMS has a pre-defined schema, which becomes an issue when we do not have all the data with us or need to change the schema. Changing the schema on the go is a huge task.
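To illustrate the contrast, here is a minimal Python sketch in which plain dictionaries stand in for documents in a schema-free collection. The `users` list, its field names and the `find` helper are hypothetical, and no real database is involved. Records in the same collection carry different fields, and a new field is added to one record without touching the others (where an RDBMS would typically need an `ALTER TABLE` first).

```python
# A "collection" modelled as a list of dicts (hypothetical, in-memory only).
users = [
    {"_id": 1, "name": "Asha", "email": "asha@example.com"},
    {"_id": 2, "name": "Ravi", "phone": "555-0100"},  # no email field at all
]

# Adding a new field to one document needs no schema change for the others.
users[0]["address"] = {"city": "Pune", "pin": "411001"}

def find(collection, **criteria):
    """Return documents whose fields match all criteria; a missing field never matches."""
    return [
        doc for doc in collection
        if all(field in doc and doc[field] == value for field, value in criteria.items())
    ]

print(find(users, name="Ravi"))  # [{'_id': 2, 'name': 'Ravi', 'phone': '555-0100'}]
```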

B. Horizontal Scaling

1. Horizontal scaling, also known as scale-out, refers to bringing on additional nodes to share the load. This is difficult with relational databases due to the difficulty in spreading out related data across nodes. With non-relational databases, this is made simpler since collections are self-contained and not coupled relationally. This allows them to be distributed across nodes more simply, as queries do not have to “join” them together across nodes.

2. Scaling horizontally is achieved through sharding (splitting the data across nodes) or replica sets (keeping copies of the data on multiple nodes).

C. High Availability

1. NoSQL databases are highly available due to their auto-replication feature, i.e., whenever any kind of failure happens, the replicated copies allow the data to be restored to the preceding consistent state.

2. If a server fails, we can access the data from another server as well, because in a NoSQL database the data is stored on multiple servers.
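A minimal sketch of the failover idea, assuming a toy in-memory cluster where every write is copied synchronously to all live nodes. The `Node` and `ReplicatedStore` classes are hypothetical and do not model any particular database's replication protocol. When the first node asked is down, the read is served by another replica.

```python
class Node:
    """One server holding a full copy of the data (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class ReplicatedStore:
    def __init__(self, nodes):
        self.nodes = nodes

    def write(self, key, value):
        # Copy the write to every live node (simplified synchronous replication).
        # Nodes that are down miss the write; real systems resync them later.
        live = [n for n in self.nodes if n.up]
        if not live:
            raise RuntimeError("no live nodes")
        for node in live:
            node.data[key] = value

    def read(self, key):
        # Try nodes in order and fall back to the next one if a node is down.
        for node in self.nodes:
            if node.up and key in node.data:
                return node.data[key], node.name
        raise KeyError(key)


cluster = ReplicatedStore([Node("A"), Node("B"), Node("C")])
cluster.write("user:1", "Asha")
cluster.nodes[0].up = False       # simulate a server failure
print(cluster.read("user:1"))     # ('Asha', 'B')
```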

D. Easy insert and read operations.

1. Queries in NoSQL databases can be faster than SQL databases. Why? Data in SQL databases is typically normalised, so queries for a single object or entity require you to join data from multiple tables. As your tables grow in size, the joins can become expensive. However, data in NoSQL databases is typically stored in a way that is optimised for queries. The rule of thumb when you use MongoDB is that data that is accessed together should be stored together. Queries typically do not require joins, so the queries are very fast.

2. However, delete and update operations are difficult, because duplicated data may have to be changed in several places.
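Both points can be sketched in Python, using the standard-library `sqlite3` module for the normalised (relational) side and plain dictionaries for denormalised documents. The table names, fields and data are hypothetical. Reading an order with its customer needs a JOIN in the relational version but only a single lookup in the document version; the flip side is that a customer's name copied into every order document must be updated in every copy.

```python
import sqlite3

# Normalised relational version: customers and orders live in separate tables.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customers(id),
                         item TEXT);
    INSERT INTO customers VALUES (1, 'Asha');
    INSERT INTO orders VALUES (10, 1, 'Keyboard'), (11, 1, 'Mouse');
""")
# Reading an order together with its customer needs a JOIN.
row = db.execute("""
    SELECT o.id, c.name, o.item
    FROM orders o JOIN customers c ON c.id = o.customer_id
    WHERE o.id = 10
""").fetchone()
print(row)  # (10, 'Asha', 'Keyboard')

# Denormalised document version: data accessed together is stored together.
orders = {
    10: {"_id": 10, "customer": {"id": 1, "name": "Asha"}, "item": "Keyboard"},
    11: {"_id": 11, "customer": {"id": 1, "name": "Asha"}, "item": "Mouse"},
}
print(orders[10]["customer"]["name"], orders[10]["item"])  # Asha Keyboard

def rename_customer(docs, customer_id, new_name):
    """The cost of duplication: every embedded copy must be updated."""
    updated = 0
    for doc in docs.values():
        if doc["customer"]["id"] == customer_id:
            doc["customer"]["name"] = new_name
            updated += 1
    return updated

print(rename_customer(orders, 1, "Asha K"))  # 2
```

In the relational version the same rename is a single `UPDATE customers ...` statement, which is the trade-off the two points above describe.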

E. Caching mechanism.

F. NoSQL is used more for cloud applications.

4. When to use NoSQL?

1. Fast-paced Agile development

2. Storage of structured and semi-structured data

3. Huge volumes of data

4. Requirements for scale-out architecture

5. Modern application paradigms like microservices and real-time streaming.

5. NoSQL DB Misconceptions

1. Relationship data is best suited for relational databases.

1. A common misconception is that NoSQL databases or non-relational databases don’t store relationship data well. NoSQL databases can store relationship data; they just store it differently than relational databases do. In fact, many find modelling relationship data in NoSQL databases to be easier than in relational databases, because related data doesn’t have to be split between tables. NoSQL data models allow related data to be nested within a single data structure.

2. NoSQL databases don't support ACID transactions.

CodeHelp


1. Another common misconception is that NoSQL databases don't support ACID transactions. Some NoSQL databases like MongoDB do, in fact, support ACID transactions.

6. Types of NoSQL Data Models

1. Key-Value Stores
The simplest type of NoSQL database is a key-value store. Every data element in the database is stored as a key-value pair consisting of an attribute name (or "key") and a value. In a sense, a key-value store is like a relational database with only two columns: the key or attribute name (such as "state") and the value (such as "Alaska").
Use cases include shopping carts, user preferences, and user profiles.
e.g., Redis, Amazon DynamoDB, Oracle NoSQL; MongoDB also supports key-value storage.
A key-value database associates a value (which can be anything from a number or simple string to a complex object) with a key, which is used to keep track of the object. In its simplest form, a key-value store is like a dictionary/array/map object as it exists in most programming paradigms, but which is stored in a persistent way and managed by a Database Management System (DBMS).
Key-value databases use compact, efficient index structures to quickly and reliably locate a value by its key, making them ideal for systems that need to find and retrieve data in (near) constant time.
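A minimal in-memory sketch in Python, assuming a dictionary (a hash table) as the index. The `KeyValueStore` class, its method names and the optional time-to-live are hypothetical and are not the API of Redis, DynamoDB or any other product. Lookups by key take average constant time thanks to the hash index, and the expiry option mirrors the session and caching use cases mentioned in this section.

```python
import time


class KeyValueStore:
    """Toy key-value store: a dict used as a hash index, with optional per-key expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}      # key -> (value, expires_at or None)
        self._clock = clock  # injectable clock so expiry can be tested

    def put(self, key, value, ttl=None):
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key, default=None):
        entry = self._data.get(key)  # average O(1) hash lookup
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]      # lazily drop expired keys on access
            return default
        return value

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", {"user": "asha", "cart": ["book"]}, ttl=1800)
store.put("config:theme", "dark")
print(store.get("config:theme"))  # dark
```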
There are several use cases where choosing a key-value store approach is an optimal solution:
a) Real-time random data access, e.g., user session attributes in an online application such as gaming or finance.
b) Caching mechanism for frequently accessed data or configuration based on keys.
c) Applications designed around simple key-based queries.

2. Column-Oriented / Columnar / C-Store / Wide-Column
Data is stored column by column, so each value of a column is stored next to the other values from that same column.
While a relational database stores data in rows and reads data row by row, a column store is organised as a set of columns. This means that when you want to run analytics on a small number of columns, you can read those columns directly without consuming memory with the unwanted data. Columns are often of the same type and benefit from more efficient compression, making reads even faster. Columnar databases can quickly aggregate the value of a given column (adding up the total sales for the year, for example). Use cases include analytics.
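The difference in layout can be sketched in Python with plain lists. The `rows` table and its columns are hypothetical, and real column stores add on-disk formats and encodings not shown here; the run-length encoding at the end is just one simple compression technique chosen for illustration. Summing one column touches only that column's list in the columnar layout, whereas the row layout walks every full row.

```python
# Row-oriented layout: each record is stored together.
rows = [
    {"id": 1, "region": "north", "year": 2023, "amount": 120.0},
    {"id": 2, "region": "south", "year": 2023, "amount": 80.0},
    {"id": 3, "region": "north", "year": 2024, "amount": 200.0},
]

# Column-oriented layout: the values of each column are stored next to each other.
columns = {name: [r[name] for r in rows] for name in rows[0]}

# Aggregating one column: the row store reads whole rows...
total_rows = sum(r["amount"] for r in rows)
# ...while the column store reads only the "amount" column.
total_cols = sum(columns["amount"])
print(total_rows, total_cols)  # 400.0 400.0

def run_length_encode(values):
    """Same-typed, repetitive column values compress well, e.g. with run-length encoding."""
    encoded = []
    for v in values:
        if encoded and encoded[-1][0] == v:
            encoded[-1][1] += 1
        else:
            encoded.append([v, 1])
    return encoded

print(run_length_encode(columns["year"]))  # [[2023, 2], [2024, 1]]
```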
e.g., Apache Cassandra (wide-column), Amazon Redshift and Snowflake (columnar analytics).

3. Document Based Stores
These databases store data in documents similar to JSON (JavaScript Object Notation) objects. Each document contains pairs of fields and values. The values can typically be of a variety of types, including strings, numbers, booleans, arrays, or objects.
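For illustration, a single document of this kind can be written as a Python dictionary and serialised with the standard `json` module. The order document and its fields are hypothetical. It mixes strings, numbers, a boolean, an array and nested objects in one record, and nested fields are read straight from that one document.

```python
import json

order = {
    "_id": "ord-1001",                              # string
    "customer": {"name": "Asha", "city": "Pune"},   # nested object
    "items": [                                      # array of objects
        {"sku": "BK-1", "qty": 2, "price": 250.0},
        {"sku": "PN-7", "qty": 1, "price": 40.0},
    ],
    "paid": True,                                   # boolean
    "total": 540.0,                                 # number
}

text = json.dumps(order, indent=2)  # the document as JSON text
restored = json.loads(text)         # and back into a Python dict

print(restored["customer"]["city"])                            # Pune
print(sum(i["qty"] * i["price"] for i in restored["items"]))   # 540.0
```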
Use cases include e-commerce platforms, trading platforms, and mobile app development across industries.
Some document stores (e.g., MongoDB) support ACID properties, which makes them suitable for transactions.
e.g., MongoDB, CouchDB.

4. Graph Based Stores
A graph database focuses on the relationship between data elements. Each element is stored as a node (such as a person in a social media graph). The connections between elements are called links or relationships. In a graph database, connections are first-class elements of the database, stored directly. In relational databases, links are implied, using data to express the relationships.
A graph database is optimised to capture and search the connections between data elements, overcoming the overhead associated with JOINing multiple tables in SQL.
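A minimal sketch of why stored connections help: with relationships kept as an adjacency list (a hypothetical in-memory stand-in for a graph database), finding friends-of-friends is a short breadth-first traversal that follows links directly, instead of repeated self-JOINs on a friendship table. The people and links are made up.

```python
from collections import deque

# Each node maps to the nodes it is directly linked to ("FRIEND" relationships).
friends = {
    "Asha": ["Ravi", "Meera"],
    "Ravi": ["Asha", "John"],
    "Meera": ["Asha", "John", "Li"],
    "John": ["Ravi", "Meera"],
    "Li": ["Meera"],
}

def within_hops(graph, start, max_hops):
    """Return {person: distance} for everyone reachable in 1..max_hops links (BFS)."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen[neighbour] = seen[node] + 1
                queue.append(neighbour)
    del seen[start]
    return seen

print(within_hops(friends, "Asha", 2))  # {'Ravi': 1, 'Meera': 1, 'John': 2, 'Li': 2}
```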
Very few real-world business systems can survive solely on graph queries. As a result, graph databases are usually run alongside other, more traditional databases.
Use cases include fraud detection, social networks, and knowledge graphs.

7. NoSQL Databases Disadvantages

1. Data Redundancy
Since data models in NoSQL databases are typically optimised for queries and not for reducing data duplication, NoSQL databases can be larger than SQL databases. Storage is currently so cheap that most consider this a minor drawback, and some NoSQL databases also support compression to reduce the storage footprint.
2. Update & Delete operations are costly.
3. Not every type of NoSQL data model fulfils all of your application's needs.
Depending on the NoSQL database type you select, you may not be able to achieve all of your use cases in a single database. For example, graph databases are excellent for analysing relationships in your data but may not provide what you need for everyday retrieval of the data such as range queries. When selecting a NoSQL database, consider what your use cases will be and whether a general-purpose database like MongoDB would be a better option.
4. Many NoSQL databases don't support full ACID properties.
5. Many don't support data entry with consistency constraints.


LEC-16: Types of Databases

1. Relational Databases
  1. Based on the relational model.
  2. Relational databases are quite popular, even though they were designed in the 1970s. Also known as relational database management systems (RDBMS), relational databases commonly use Structured Query Language (SQL) for operations such as creating, reading, updating, and deleting data. Relational databases store information in discrete tables, which can be JOINed together by fields known as foreign keys. For example, you might have a User table which contains information about all your users, and join it to a Purchases table, which contains information about all the purchases they’ve made. MySQL, Microsoft SQL Server, and Oracle are examples of relational databases.
  3. They are ubiquitous, having acquired a steady user base since the 1970s.
  4. They are highly optimised for working with structured data.
  5. They provide a stronger guarantee of data normalisation.
  6. They use a well-known querying language, SQL.
  7. They have scalability issues (horizontal scaling).
  8. As data becomes huge, the system becomes more complex.
2. Object Oriented Databases
  1. The object-oriented data model is based on the object-oriented-programming paradigm, which is now in wide use. Inheritance, object-identity, and encapsulation (information hiding), with methods to provide an interface to objects, are among the key concepts of object-oriented programming that have found applications in data modelling. The object-oriented data model also supports a rich type system, including structured and collection types. While inheritance and, to some extent, complex types are also present in the E-R model, encapsulation and object-identity distinguish the object-oriented data model from the E-R model.
  2. Sometimes the database can be very complex, having multiple relations, so maintaining the relationships between them can be tedious at times.
    1. In object-oriented databases, data is treated as an object.
    2. All bits of information come in one instantly available object package instead of multiple tables.
  3. Advantages
    1. Data storage and retrieval is easy and quick.
    2. Can handle complex data relations and a greater variety of data types than standard relational databases.
    3. Relatively friendly for modelling advanced real-world problems.
    4. Works with the functionality of OOP and object-oriented languages.
  4. Disadvantages
    1. High complexity causes performance issues: read, write, update and delete operations are slowed down.
    2. Not much community support, as it isn’t as widely adopted as relational databases.
    3. Does not support views the way relational databases do.
  5. e.g., ObjectDB, GemStone etc.
3. NoSQL Databases
  1. NoSQL databases (aka "not only SQL") are non-tabular databases and store data differently than relational tables. NoSQL databases come in a variety of types based on their data model. The main types are document, key-value, wide-column, and graph. They provide flexible schemas and scale easily with large amounts of data and high user loads.
  2. They are schema-free.
  3. Their data structures are not tabular; they are more flexible and can adapt dynamically.
  4. They can handle huge amounts of data (big data).
  5. Most NoSQL databases are open source and are capable of horizontal scaling.
  6. They simply store data in some format other than relational tables.
  7. Refer to the LEC-15 notes…
4. Hierarchical Databases
  1. As the name suggests, the hierarchical database model is most appropriate for use cases in which the main focus of information gathering is based on a concrete hierarchy, such as several individual employees reporting to a single department at a company.
  2. The schema for hierarchical databases is defined by its tree-like organisation, in which there is typically a root “parent” directory of data stored as records that links to various other subdirectory branches, and each subdirectory branch, or child record, may link to various other subdirectory branches.
  3. The hierarchical database structure dictates that, while a parent record can have several child records, each child record can only have one parent record. Data within records is stored in the form of fields, and each field can only contain one value. Retrieving hierarchical data from a hierarchical database architecture requires traversing the entire tree, starting at the root node.
  4. Since the disk storage system is also inherently a hierarchical structure, these models can also be used as physical models.
  5. The key advantage of a hierarchical database is its ease of use. The one-to-many organisation of data makes traversing the database simple and fast, which is ideal for use cases such as website drop-down menus or computer folders in systems like Microsoft Windows OS. Due to the separation of the tables from physical storage structures, information can easily be added or deleted without affecting the entirety of the database. And most major programming languages offer functionality for reading tree structure databases.
  6. The major disadvantage of hierarchical databases is their inflexible nature. The one-to-many structure is not ideal for complex structures, as it cannot describe relationships in which a child node has multiple parent nodes. Also, the tree-like organisation of data requires top-to-bottom sequential searching, which is time consuming, and requires repetitive storage of data in multiple different entities, which can be redundant.
  7. e.g., IBM IMS.
5. Network Databases
Extension of Hierarchical databases
The child records are given the freedom to associate with multiple parent records.
Organised in a Graph structure.
Can handle complex relations.
Maintenance is tedious.
M:N links may cause slow retrieval.
Not much web community support.
e.g., Integrated Data Store (IDS), IDMS (Integrated Database Management System), Raima Database Manager, TurboIMAGE etc.
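The structural difference between the hierarchical and network models can be sketched in Python. In the hierarchical version each record stores exactly one parent (a tree), while in the network version a record may link to several parents (a graph with M:N links). The department, employee and project records are hypothetical.

```python
# Hierarchical model: a dict maps each child to a single parent, so the data
# forms a tree ("Asha" cannot also belong to "Project-X" here).
parent_of = {
    "Engineering": "Company",
    "Sales": "Company",
    "Asha": "Engineering",
    "Ravi": "Sales",
}

def path_from_root(record):
    """Follow the single-parent links upward, then return the path from the root."""
    path = [record]
    while path[-1] in parent_of:
        path.append(parent_of[path[-1]])
    return list(reversed(path))

print(path_from_root("Asha"))  # ['Company', 'Engineering', 'Asha']

# Network model: a child record may be linked to several parent records.
parents_of = {
    "Asha": ["Engineering", "Project-X"],
    "Ravi": ["Sales", "Project-X"],
}

def children_of(parent):
    """Reverse lookup over the M:N links: which records point to this parent?"""
    return sorted(child for child, parents in parents_of.items() if parent in parents)

print(children_of("Project-X"))  # ['Asha', 'Ravi']
```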

LEC-18: Partitioning & Sharding in DBMS (DB Optimisation)

A big problem can be solved easily when it is chopped into several smaller sub-problems. That is what the partitioning technique does. It divides a big database containing data metrics and indexes into smaller, more manageable slices of data called partitions. SQL queries can use the partitioned tables directly, without any alteration. Once the database is partitioned, the data definition language can easily work on the smaller partitioned slices, instead of handling the giant database altogether. This is how partitioning cuts down the problems in managing large database tables.
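As a rough illustration, here is how one small table could be sliced into partitions in Python: row-wise by a range of `id`, and column-wise into two column groups that share the key (both styles are described in the points that follow). The `employees` table, its columns, the boundary value and the helper names are hypothetical. Rebuilding a complete tuple from the column-wise slices requires looking up the same key in both of them.

```python
employees = [
    {"id": 1, "name": "Asha", "dept": "ENG", "salary": 90},
    {"id": 2, "name": "Ravi", "dept": "OPS", "salary": 70},
    {"id": 3, "name": "Meera", "dept": "ENG", "salary": 95},
    {"id": 4, "name": "John", "dept": "HR", "salary": 60},
]

def horizontal_partition(rows, boundary):
    """Row-wise: whole rows go to a partition according to a key range."""
    low = [r for r in rows if r["id"] <= boundary]
    high = [r for r in rows if r["id"] > boundary]
    return low, high

def vertical_partition(rows, column_groups):
    """Column-wise: each partition keeps the key plus one group of columns."""
    return [{r["id"]: {c: r[c] for c in group} for r in rows} for group in column_groups]

part_a, part_b = horizontal_partition(employees, boundary=2)
profile, payroll = vertical_partition(employees, [["name", "dept"], ["salary"]])

def full_row(emp_id):
    """A complete tuple needs data from both vertical partitions."""
    return {"id": emp_id, **profile[emp_id], **payroll[emp_id]}

print([r["id"] for r in part_b])  # [3, 4]
print(full_row(3))                # {'id': 3, 'name': 'Meera', 'dept': 'ENG', 'salary': 95}
```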
Partitioning is the technique used to divide stored database objects into separate servers. This increases performance and the controllability of the data, and lets us manage huge chunks of data optimally. When we horizontally scale our machines/servers, relational databases give us a challenging time, as it is quite tough to maintain the relations across servers. But if we apply partitioning to a database that is already scaled out, i.e., equipped with multiple servers, we can partition our database among those servers and handle big data easily.

3. Vertical Partitioning
1. Slicing a relation vertically / column-wise.
2. Need to access different servers to get complete tuples.

4. Horizontal Partitioning
1. Slicing a relation horizontally / row-wise.
2. Independent chunks of data tuples are stored in different servers.

5. When is Partitioning Applied?
1. The dataset becomes so huge that managing and dealing with it becomes a tedious task.
2. The number of requests is so large that a single DB server takes a long time to respond, and hence the system’s response time becomes high.

6. Advantages of Partitioning
1. Parallelism
2. Availability
3. Performance
4. Manageability
5. Reduced cost, as scaling up (vertical scaling) might be costly.

7. Distributed Database
1. A single logical database that is spread across multiple locations (servers) and logically interconnected by a network.
2. This is the product of applying DB optimisation techniques like clustering, partitioning and sharding.
3. Why is this needed? See point 5.

8. Sharding
1. A technique to implement horizontal partitioning.
2. The fundamental idea of sharding is that instead of having all the data sit on one DB instance, we split it up and introduce a routing layer so that we can forward each request to the right instance that actually contains the data.
3. Pros
  1. Scalability
  2. Availability
4. Cons
  1. Complexity: partition mapping and a routing layer have to be implemented in the system, and non-uniform data distribution creates the need for re-sharding.
  2. Not well suited for analytical queries, as the data is spread across different DB instances (the scatter-gather problem).
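A minimal sketch of a routing layer, assuming hash-based sharding on a user id over three in-memory shards. The `ShardRouter` class, the shard count and the key format are hypothetical, and real systems also have to handle re-sharding, replication and failures. A point lookup contacts only one shard, while an analytical query has to scatter to every shard and gather the partial results.

```python
import hashlib


class ShardRouter:
    """Routes each key to one of several shards (plain dicts standing in for DB instances)."""

    def __init__(self, num_shards):
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, key):
        # Stable hash (unlike Python's per-process salted hash()) so routing is repeatable.
        digest = hashlib.sha1(str(key).encode()).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key, value):
        self.shards[self.shard_for(key)][key] = value

    def get(self, key):
        # Point query: only the owning shard is contacted.
        return self.shards[self.shard_for(key)].get(key)

    def scatter_gather(self, per_shard_fn, combine_fn):
        # Analytical query: run on every shard, then combine the partial results.
        partials = [per_shard_fn(shard) for shard in self.shards]
        return combine_fn(partials)


router = ShardRouter(num_shards=3)
for user_id, spend in [(101, 50), (102, 20), (103, 75), (104, 5)]:
    router.put(user_id, {"spend": spend})

print(router.get(103))  # {'spend': 75}
total = router.scatter_gather(
    lambda shard: sum(doc["spend"] for doc in shard.values()),
    sum,
)
print(total)  # 150
```

Using `key % num_shards` over a stable hash keeps the sketch short; it also shows why re-sharding is painful, since changing `num_shards` remaps most keys.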

LEC-20: CAP Theorem

A basic and one of the most important concepts in distributed databases.
It is useful to know in order to design an efficient distributed system for your given business logic.
Let’s first break down CAP:
1. Consistency: In a consistent system, all nodes see the same data simultaneously. If we perform a read operation on a consistent system, it should return the value of the most recent write operation, and a read from any node should return the same data. All users see the same data at the same time, regardless of the node they connect to. When data is written to a single node, it is then replicated across the other nodes in the system.
2. Availability: When availability is present in a distributed system, it means that the system remains operational all of the time. Every request will get a response regardless of the individual state of the nodes. This means that the system will operate even if there are multiple nodes down. Unlike a consistent system, there’s no guarantee that the response will reflect the most recent write operation.
3. Partition Tolerance: When a distributed system encounters a partition, it means that there’s a break in communication between nodes. If a system is partition-tolerant, the system does not fail, regardless of whether messages are dropped or delayed between nodes within the system. To have partition tolerance, the system must replicate records across combinations of nodes and networks.
What does the CAP theorem say?
1. The CAP theorem states that a distributed system can only provide two of three properties simultaneously: consistency, availability, and partition tolerance. The theorem formalises the tradeoff between consistency and availability when there’s a partition.
CAP Theorem and NoSQL Databases: NoSQL databases are great for distributed networks. They allow for horizontal scaling, and they can quickly scale across multiple nodes. When deciding which NoSQL database to use, it’s important to keep the CAP theorem in mind.
1. CA Databases: CA databases enable consistency and availability across all nodes. Unfortunately, CA databases can’t deliver partition tolerance. In any distributed system, partitions are bound to happen, which means this type of database isn’t a very practical choice. That being said, you can still find a CA database if you need one. Some relational databases, such as MySQL or PostgreSQL, allow for consistency and availability. You can deploy them to nodes using replication.
2. CP Databases: CP databases enable consistency and partition tolerance, but not availability. When a partition occurs, the system has to turn off inconsistent nodes until the partition can be fixed. MongoDB is an example of a CP database. It’s a NoSQL database management system (DBMS) that uses documents for data storage. It’s considered schema-less, which means that it doesn’t require a defined database schema. It’s commonly used in big data and applications running in different locations. MongoDB is structured so that there’s only one primary node that receives all of the write requests in a given replica set. Secondary nodes replicate the data in the primary node, so if the primary node fails, a secondary node can stand in. In a banking system, availability is not as important as consistency, so we can opt for a CP database such as MongoDB.
3. AP Databases: AP databases enable availability and partition tolerance, but not consistency. In the event of a partition, all nodes are available, but they’re not all updated. For example, if a user tries to access data from a bad node, they won’t receive the most up-to-date version of the data. When the partition is eventually resolved, most AP databases will sync the nodes to ensure consistency across them.
Apache Cassandra is an example of an AP database. It’s a NoSQL database with no primary node, meaning that all of the nodes remain available. Cassandra allows for eventual consistency because users can re-sync their data right after a partition is resolved. For apps like Facebook, where availability is valued more than consistency, we’d opt for AP databases like Cassandra or Amazon DynamoDB.
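The CP versus AP choice can be sketched with two toy replicas in Python. The `Cluster` and `Unavailable` classes and the `mode` flag are hypothetical; they only model what each side returns while a network partition is in progress, not any real database's protocol. During the partition the CP cluster refuses requests it cannot keep consistent, while the AP cluster keeps answering and may return stale data until the replicas resync.

```python
import itertools


class Unavailable(Exception):
    """Raised when the toy CP cluster refuses a request to stay consistent."""


class Cluster:
    """Two replicas of a key-value map; mode is "CP" or "AP" (hypothetical toy model)."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError(f"unknown mode: {mode}")
        self.mode = mode
        self.replicas = [{}, {}]   # key -> (version, value) on node 0 and node 1
        self.partitioned = False   # True while the two nodes cannot communicate
        # A shared counter stands in for timestamps (last-writer-wins on heal).
        self._version = itertools.count(1)

    def write(self, node, key, value):
        entry = (next(self._version), value)
        if not self.partitioned:
            for replica in self.replicas:     # normal case: copy to both nodes
                replica[key] = entry
        elif self.mode == "CP":
            # Simplified: a real CP system may keep serving on the majority side.
            raise Unavailable("cannot replicate the write during a partition")
        else:
            self.replicas[node][key] = entry  # AP: accept locally, sync later

    def read(self, node, key):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("cannot confirm the latest value during a partition")
        entry = self.replicas[node].get(key)  # AP may return a stale value here
        return None if entry is None else entry[1]

    def heal(self):
        # Partition resolved: every key converges to its highest-version entry.
        for key in set(self.replicas[0]) | set(self.replicas[1]):
            newest = max(r[key] for r in self.replicas if key in r)
            for replica in self.replicas:
                replica[key] = newest
        self.partitioned = False


for mode in ("CP", "AP"):
    cluster = Cluster(mode)
    cluster.write(0, "balance", 100)
    cluster.partitioned = True  # the link between node 0 and node 1 breaks
    try:
        cluster.write(0, "balance", 80)
        print(mode, "read from node 1:", cluster.read(1, "balance"))
    except Unavailable as err:
        print(mode, "unavailable:", err)
    cluster.heal()
    print(mode, "after heal:", cluster.read(1, "balance"))
# CP unavailable: cannot replicate the write during a partition
# CP after heal: 100
# AP read from node 1: 100
# AP after heal: 80
```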
