Prepare for your exams
Get points
Guidelines and tips
Sell on Docsity
Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Earn points to download

Earn points by helping other students or get them with a premium plan

Guidelines and tips

Sell on Docsity

Docsity AI

Prepare for your exams

Study with the several resources on Docsity

Find documents

Prepare for your exams with the study notes shared by other students like you on Docsity

Search for your university

Find the specific documents for your university's exams

Docsity AINEW

Summarize your documents, ask them questions, convert them into quizzes and concept maps

Explore questions

Clear up your doubts by reading the answers to questions asked by your fellow students

Earn points to download

Earn points by helping other students or get them with a premium plan

Share documents

20 Points

For each uploaded document

Answer questions

5 Points

For each given answer (max 1 per day)

All the ways to get free points

Get points immediately

Choose a premium plan with all the points you need

Study Opportunities

Choose your next study program

Get in touch with the best universities in the world. Search through thousands of universities and official partners

Community

Ask the community

Ask the community for help and clear up your study doubts

Free resources

Our save-the-student-ebooks!

Download our free guides on studying techniques, anxiety management strategies, and thesis advice from Docsity tutors

Data Warehousing: Online Transaction Processing vs. Online Analytic Processing, Slides of Database Management Systems (DBMS)

Stanford University Database Management Systems (DBMS)

The concepts of online transaction processing (oltp) and online analytic processing (olap) in data warehousing. Oltp involves handling short, simple queries and frequent updates, while olap deals with fewer, complex queries and infrequent updates. The document also covers the architecture of data warehouses and the differences between rolap and molap approaches.

Typology: Slides

2011/2012

Uploaded on 01/31/2012

marphy 🇺🇸

4.4

(31)

284 documents

1 / 14

This page cannot be seen from the preview

Don't miss anything!

Warehousing



The most common form of information

integration: copy sources into a single DB and

try to keep it up-to-date.



Usual method: perio dic reconstruction of the

warehouse, perhaps overnight.

Discover Slides of Database Management Systems (DBMS) Stanford University

Partial preview of the text

Download Data Warehousing: Online Transaction Processing vs. Online Analytic Processing and more Slides Database Management Systems (DBMS) in PDF only on Docsity!

Warehousing

The most common form of information integration: copy sources into a single DB and try to keep it up-to-date.

Usual metho d: p erio dic reconstruction of the warehouse, p erhaps overnight.

OLTP Versus OLAP

Most database op erations are of a typ e called on-line transaction processing (OLTP). F Short, simple queries and frequent up dates involving one or a small numb er of tuples. F Examples: answering queries from a Web interface, recording sales at cash-registers, selling airline tickets.

Star Schemas

Commonly, the data at a warehouse is of two typ es:

Fact Data : Very large, accumulation of facts such as sales. F Often \insert-only"; once there, a tuple remains.
Dimension Data : Smaller, generally static, information ab out the entities involved in the facts.

Example

Supp ose we wanted to record every sale of b eer at all bars: the bar, the b eer, the drinker who b ought the b eer, the day and time, the price charged.

Fact data is in a relation with schema:

Sales(bar, beer, drinker, day, time, price)

Dimension data could include a relation for bars, one for b eers, and one for drinkers. Bars(bar, addr, lic) Beers(beer, manf) Drinkers(drinker, addr, phone)

ROLAP

Typical queries b egin with a complete \star join," for example:

SELECT * FROM Sales, Bars, Beers, Drinkers WHERE Sales.bar = Bars.bar AND Sales.beer = Beers.beer AND Sales.drinker = Drinkers.drinker;

Typical OLAP query will:

Do all or part of the star join.
Filter interesting tuples based on fact and/or dimension data.
Group by one or more dimensions.
Aggregate the result.

Example: \For each bar in Palo Alto, nd the total sale of each b eer manufactured by Anheuser-Busch."

Performance Issues

If the fact table is large, queries will take much to o long.

Materializ ed views can help.

Example

For the question ab out bars in Palo Alto and b eers by Anheuser-Busch, we would b e aided by the materialized view:

CREATE VIEW BABMS(bar, addr, beer, manf, sales) AS SELECT bar, addr, beer, manf, SUM(price) AS sales FROM Sales NATURAL JOIN Bars NATURAL JOIN Beers GROUP BY bar, addr, beer, manf;

Slicing and Dicing

Slice = select a value along one dimension, e.g., a particular bar.

Dice = the same thing along another dimension, e.g., a particular b eer.

Drill-Down and Roll-Up

Dril l-down = \de-aggregate" = break an aggregate into its constituents. F Example: having determined that Jo e's Bar in Palo Alto is selling very few Anheuser-Busch b eers, break down his sales by the particular b eer.

Rol l-up = aggregate along one dimension.

F Example: given a table of how much Budweiser each drinker consumes at each bar, roll it up into a table of amount consumed by each drinker.

Performance

As with ROLAP, materialized views can help.

Data-cub es invite materialized views that are aggregations in one or more dimensions.

Dimensions need not b e aggregated completely. Rather, grouping by attributes of the dimension table is p ossible. F Example: a materialized view might aggregate by drinker completely, by b eer not at all, by time according to the day, and by bar only according to the city of the bar. F Example: time is a really interesting dimension, since there are natural groupings, such as weeks and months, that are not commensurate.

Simplest Problem: Find the Frequent Pairs of Items

Given a support threshold s, we could ask:

Find the pairs of items that app ear together in at least s baskets. SELECT b1.item, b2.item FROM Baskets b1, Baskets b WHERE b1.bid = b2.bid AND b1.item < b2.item GROUP BY b1.item, b2.item HAVING COUNT(*) >= s;

A-Priori Trick

Ab ove query is prohibitively exp ensive for large data.

A-priori algorithm uses the fact that a pair (i; j ) cannot have supp ort s unless i and j b oth have supp ort s by themselves.

More ecient implementation uses an intermediate relation Baskets1. INSERT INTO Baskets1(bid, item) SELECT * FROM Baskets WHERE item IN ( SELECT item FROM Baskets GROUP BY item HAVING COUNT(*) >= s );

Then run the query for pairs on Baskets instead of Baskets.

Data Warehousing: Online Transaction Processing vs. Online Analytic Processing, Slides of Database Management Systems (DBMS)

Related documents

Partial preview of the text

Download Data Warehousing: Online Transaction Processing vs. Online Analytic Processing and more Slides Database Management Systems (DBMS) in PDF only on Docsity!