























Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
Query Optimization, Statistics, Table Scan, Table Space Scans, Simple Indexed Access, Index Scan Step, Unclustered Match, Concatenated Index, Index-only Scan, Filter Factors, Indexable Predicates, Concatenated Index, Indexable Predicate, Multiple Index Access, RID List Use, No Index Screening
Typology: Study notes
1 / 31
This page cannot be seen from the preview
Don't miss anything!
























Query optimizer creates a procedural access plan for a query. After submit a Select statement, three steps.
select * from customers where city = 'Boston' and discnt between 10 and 12;
(1) Parse (as in BNF), look up tables, columns, views (query modification), check privileges, verify integrity constraints.
(2) Query Optimization. Look up statistics in System tables. Figure out what to do. Basically consider competing plans, choose the cheapest. (What do we mean by cheapest? What are we minimizing? See shortly.)
(3) Now Create the "program" (code generation). Then Query Execution.
We will be talking about WHY a particular plan is chosen, not HOW all the plan search space is considered. Question of how compiler chooses best plan is beyond scope of course. Assume QOPT follows human reasoning.
For now, considering only queries, not updates.
Section 9.1. Now what do we mean by cheapest alternative plan? What resources are we trying to minimize? CPU and I/O. Given a PLAN, talk about COSTCPU(PLAN) and COSTI/O(PLAN).
But two plans PLAN 1 and PLAN 2 can have incomparable resource use with this pair of measures. See Figure 9.1. ( pg. 535)
PLAN 1 9.2 CPU seconds 103 reads PLAN 2 1.7 CPU seconds 890 reads
*** Which is cheaper? Depends on which resources have a bottleneck. DB2 takes a weighted sum of these for the total cost, COST(PLAN):
A DBA should try to avoid bottlenecks by understanding the WORKLOAD for a system. Estimated total CPU and I/O for all users at peak work time.
Figure out various queries and rate of submission per second. Q (^) k , RATE(Q (^) k ) as in Figure 9.2. (pg. 537.)
Query Type RATE(Query) in Submissions/second Q1 40. Q2 20.
Figure 9.2 Simple Workload with two Queries
Then work out COST (^) CPU (Q (^) k ) and COST (^) CPU (Q (^) k ) for plan that database system will choose. Now can figure out total average CPU or I/O needs per second:
Similarly for I/O. Tells us how many disks to buy, how powerful a CPU. (Costs go up approximately linearly with CPU power within small ranges).
Ideally, the weights W 1 and W 2 used to calculate COST(PLAN) from I/O and CPU costs will reflect actual costs of equipment
Good DBA tries to estimate workload in advance to make purchases, have equipment ready for a new application.
*** Note that one other major expense is response time. Could have poor response time (1-2 minutes) even on lightly loaded inexpensive system.
This is because many queries need to perform a LOT of I/O, and some commercial database systems have no parallelism: all I/Os in sequence.
But long response time also has a cost, in that it wastes user time. Pay workers for wasted time. Employees quit from frustration and others must be trained. Vendors are trying hard to reduce response times.
where city = 'Boston' and discnt between 12 and 14;
The Explain Plan statement puts rows in a "plan_table" to represent individual procedural steps in a query plan. Can get rows back by:
select * from plan_table where queryno = 1000;
Recall that a Query Plan is a sequence of procedural access steps that carry out a program to answer the query. Steps are peculiar to the DBMS.
From one DBMS to another, difference in steps used is like difference between programming languages. Can't learn two languages at once.
We will stick to a specific DBMS in what follows, MVS DB2, so we can end up with an informative benchmark we had in the first edition.
But we will have occasional references to ORACLE, DB2 UDB. Here is the ORACLE Explain Plan syntax.
EXPLAIN PLAN [SET STATEMENT_ID = 'text-identifier'] [INTO [schema.]tablename] FOR explainable-sql-statement;
This inserts a sequence of statements into a user created DB2/ORACLE table known as PLAN_TABLE one row for each access step. To learn more about this, see ORACLE8 documentation named in text.
Need to understand what basic procedural access steps ARE in the particular product you're working with.
The set of steps allowed is the "bag of tricks" the query optimizer can use. Think of these procedural steps as the "instructions" a compiler can use to create "object code" in compiling a higher-level request.
A system that has a smaller bag of tricks is likely to have less efficient ac- cess plans for some queries.
MVS DB2 (and the architecturally allied DB2 UDB) have a wide range of tricks, but not bitmap indexing or hashing capability.
Still, very nice capabilities for range search queries, and probably the most sophisticated query optimizer.
Basic procedrual steps covered in the next few Sections (thumbnail):
Table Scan Look through all rows of table Unique Index Scan Retrieve row through unique index Unclustered Matching Index Scan Retrieve multiple rows through a non-unique index, rows not same order Clustered Matching Index Scan Retrieve multiple rows through a non-unique clustered index Index-Only Scan Query answered in index, not rows
Note that the steps we have listed access all the rows restricted by the WHERE clause in some single table query.
Need two tables in the FROM clause to require two steps of this kind and Joins come later. A multi-step plan for a single table query will also be covered later.
Such a multi-step plan on a single table is one that combines multiple indexes to retrieve data. Up to then, only one index per table can be used.
Single step. The plan table (plan_table) will have a column ACCESSTYPE with value R (ACCESSTYPE = R for short).
Example 9.2.1. Table Space Scan Step. Look through all rows in table to answer query, maybe because there is no index that will help.
Assume in DB2 an employees table with 200,000 rows, each row of 200 bytes, each 4 KByte page just 70% full. Thus 2800 usable pages, 14 rows/pg. Need CEIL(200,000/14) = 14,286 pages.
Consider the query: select eid, ename from employees where socsecno = 113353179;
If there is no index on socsecno, only way to answer query is by reading in all rows of table. (Stupid not to have an index if this query occurs with any frequency at all!)
But if there are lots of queries compared to number of disks and accessed pages are randomly placed on disks, probably keep all disk arms busy already.
But there's another factor operating. Two disk pages that are close to each other on one disk can be read faster because there's a shorter seek time.
Recall that the system tries to make extents contiguous on disk, so I/Os in sequence are faster. Thus, a table that is made up of a sequence of (mainly) contiguous pages, one after another within a track, will take much less time to read in.
In fact it seems we should be able to read in successive pages at full transfer speed would take about .00125 secs per page.
Used to be that by the time the disk controller has read in the page to a memory buffer and looked to see what the next page request is, the page immediately following has already passed by under the head.
But now with multiple requests to the disk outstanding, we really COULD get the disk arm to read in the next disk page in sequence without a miss.
Another factor supports this speedup: the typical disk controller buffers an entire track in it's memory whenever a disk page is requested.
Reads in whole track containing the disk page, returns the page requested, then if later request is for page in track doesn't have to access disk again.
So when we're reading in pages one after another on disk, it's like we're reading from the disk an entire track at a time.
I/O is about TEN TIMES faster for disk pages in sequence compared to randomly place I/O. (Accurate enough for rule of thumb.)
PUT ON BOARD : We can do 800 I/Os per second when pages in sequence (S) instead of 80 for randomly placed pages (R). Sequential I/O takes 0.00125 secs instead of 0.0125 secs for random I/O.
DB2 Sequential Prefetch makes this possible even if turn off buffering on disk (which actually hurts performance of random I/O, since reads whole track it doesn't need: adds 0.008 sec to random I/O of 0.0125 sec)
IBM puts a lot of effort into making I/O requests sequentially in a query plan to gain this I/O advantage!
Example 9.2.2. Table Space Scan with Sequential Advantage. The 14286R of Example 9.2.1 becomes 14286S (S for Sequential Prefetch I/O instead of Random I/O). And 14286S requires 14286/800 = 17.86 seconds instead of the 178.6 seconds of 142286R. Note that this is a REAL COST SAVINGS, that we are actually using the disk arm for a smaller period. Striping reduces elapsed time but not COST.
Cover idea of List Prefetch. 32 pages, not in perfect sequence, but rela- tively close together. Difficult to predict time.
We use the rule of thumb that List Prefetch reads in 200 pages per second.
See Figure 9.10, page 546, for table.
Plan table row for an access step will have PREFETCH = S for sequential prefetch, PREFETCH = L for list prefetch, PREFETCH = blank if random I/O. See Figure 9.10. And of course ACCESSTYPE = R when really Random I/O.
Note that sequential prefetch is just becoming available on UNIX database systems. Often just put a lot of requests out in parallel and depend on smart I/O system to use arm efficiently
select * from T1 where C1 <> 10;
But the statistics usually weigh against it's use and so the query will be performed by a table space scan. More on indexable predicates later.
OK, now what about query:
select * from T1 where C1 = 10 and C2 between 100 and 200 and C3 like 'A%';
These three predicates are all indexable. If have only C1X, will be like previous example with retrieved rows restricted by tests on other two predicates.
If have index combinx, created by:
create index combinx on T1 (C1, C2, C3)...
Will be able to limit (filter) RIDs of rows to retrieve much more completely before going to data. Like books in a card catalog, looking up
authorlname = 'James' (c1 = 10) and authorfname between 'H' and 'K' and title begins with letter 'A'
Finally, we will cover the question of how to filter the RIDs of rows to retrieve if we have three indexes, C1X, C2X, and C3X. This is not simple.
See how to do this by taking out cards for each index, ordering by RID, then merge-intersecting.
It is an interesting query optimization problem whether this is worth it.
OK, now some examples of simple index scans.
Example 9.3.3. Index Scan Step, Unique Match. Continuing with Example 9.2.1, employees table with 200,000 rows of 200 bytes and pctfree = 30, so 14 rows/pg and CEIL(200,000/14) =14,286 data pages. Assume in index on eid, also have pctfree = 30, and eid||RID takes up 10 bytes, so 280 entries per pg, and CEIL(200,000/280) = 715 leaf level pages. Next level up CEIL(715/280) = 3. Root next level up. Write on board:
employees table: 14,286 data pages index on eid, eidx: 715 leaf nodes, 3 level 2 nodes, 1 root node.
Now query: select ename from employees where eid = '12901A';
Root, on level 2 node, 1 leaf node, 1 data page. Seems like 4R. But what about buffered pages? Five minute rule says should purchase enough memory so pages referenced more frequently than about once every 120 seconds (popular pages) should stay in memory. Assume we have done this. If workload assumes 1 query per second of this on ename with eid = predi- cate (no others on this table), then leaf nodes and data pages not buffered, but upper nodes of eidx are. So really 2R is cost of query.
This Query Plan is a single step, with ACCESSTYPE = I, ACCESSNAME = eidx, MATCHCOLS = 1.
Example 9.3.4. Matching Index Scan Step, Unclustered Match. Consider the following query:
select name, straddr from prospects where hobby = 'chess';
Query optimizer assumes each of 100 hobbies equally likely (knows there are 100 from RUNSTATS), so restriction cuts 50M rows down to 500,000.
Walk down hobby index (2R for directory nodes) and across 500,000 entries (1000 per page so 500 leaf pages, sequential prefeth so 500S).
For every entry, read in row -- non clustered so all random choices out of 5M data pages, 500,000 distinct I/Os (not in order, so R), 500,000R.
Total I/O is 500S + 500,002R. Time is 500/800 + 500,002/80, about 500,000/80 = 6250 seconds. Or about 1.75 hours (2 hrs = 7200 secs).
Really only picking up 500,000 distinct pages, will lie on less than 500, pages (out of 5 M). Would this mean less than 500,000 R because buffering keeps some pages around for double/triple hits?
VERY TRIVIAL EFFECT! Hours of access, 120 seconds pages stay in buffer.
Can generally assume that upper level index pages are buffer resident (skip 2R) but leaf level pages and maybe one level up are not. Should calculate index time and can then ignore it if insignificant.
If we used a table space scan for Example 9.3.4, qualifying rows to ensure hobby = 'chess, how would time compare to what we just calculated?
Simple: 5M pages using sequential prefetch, 5,000,000/800 = 625 seconds. (Yes, CPU is still ignored — in fact is relatively insignificant.)
But this is the same elapsed time as for indexed access of 1/100 of rows!!
Yes, surprising. But 10 rows per page so about 1/10 as many pages hit, and S is 10 times as fast as R.
Query optimizer compares these two approaches and chooses the faster one. Would probably select Table Space Scan here But minor variation in CARD(hobby) could make either plan a better choice.
Example 9.3.5. Matching Index Scan Step, Clustered Match. Consider the following query:
select name, straddr from prospects where zipcode between 02159 and 03158;
Recall CARD(zipcode) = 100,000. Range of zipcodes is 1000. Therefore, cut number of rows down by a factor of 1/100. SAME AS 9.3.4.
Bigger index entries. Walk down to leaf level and walk across 1/100 of leaf level: 500,000 leaf pages, so 5000 pages traversed. I/O of 5000S.
And data is clustered by index, so walk across 1/100 of 5M data pages, 50,000 data pages, and they're in sequence on disk, so 50,000S.
Compared to Nonmatching index scan of Example 9.3.4, walk across 1/10 as many pages and do it with S I/O instead of R. Ignore directory walk.
Then I/O cost is 55,000S, with elapsed time 55,000/500 = 137.5 seconds, a bit over 2 minutes, compared with 1.75 hrs for unclustered index scan.
The difference between Examples 9.3.4 and 9.3.5 doesn't show up in the PLAN table. Have to look at ACCESSNAME = addrx and note that this index is clustered, (clusterratio) whereas ACCESSNAME = hobbyx is not.
(1) Clusterratio determines if index still clustered in case rows exist that don't follow clustering rule. (Inserted when no space left on page.)
(2) Note that entries in addrx are 40 bytes, rows of prospects are 400 bytes. Seems natural that 5000S for index, 50,000S for rows.
Properties of index:
Example 9.3.6. Concatenated Index, Index-Only Scan. Assume (just for this example) a new index, naddrx:
create index naddrx on prospects (zipcode, city, straddr, name)
... cluster.. .;
Recall, estimated probability that a random row made some predicate true. By statistics, determine the fraction (FF(pred)) of rows retrieved.
E.g., hobby column has 100 values. Generally assume uniform distribution, and get: FF(hobby = const) = 1/100 = .01.
And zipcode column has 100,000 values, FF(zipcode = const) = 1/100,000. FF(zipcode between 02159 and 03158) = 1000.^ (1/100,000) = 1/100.
How does the DB2 query optimizer make these estimates?
See Figure 9.13, pg. 558. After use RUNSTATS, these statistics are up to date. (Next pg. of these notes) Other statistics as well, not covered.
DON'T WRITE THIS ON BOARD -- SEE IN BOOK Catalog Name
S t a t i s t i c Name
D e f a u l t Value
D e s c r i p t i o n
SYSTABLES CARD NPAGES
1 0 , 0 0 0 CEIL(1+CARD/20)
Number of rows in the table Number of data pages containing rows SYSCOLUMNS COLCARD HIGH2KEY LOW2KEY
2 5 n / a n / a
Number of distinct values in this column Second highest value in this column Second lowest value in this column SYSINDEXES NLEVELS NLEAF FIRSTKEY- CARD FULLKEY- CARD CLUSTER- RATIO
0 CARD/ 2 5
2 5
0% if CLUSTERED = 'N 95% if CLUSTERED =
Number of Levels of the Index B-tree Number of leaf pages in the Index B-tree Number of distinct values in the first column, C1, of this key Number of distinct values in the full key, all components: e.g. C1.C2.C Percentage of rows of the table that are clustered by these index values
Figure 9.13. Some Statistics gathered by RUNSTATS used for access plan determination
Statistics gathered into DB2 Catalog Tables named. Assume that index might be composite, (C1, C2, C3)
Go over table. CARD, NPAGES for table. For column, COLCARD, HIGH2KEY, LOW2KEY. For Indexes, NLEVELS, NLEAF, FIRSTKEYCARD, FULLKEYCARD, CLUSTERRATIO. E.g., from Figure 9.12, statistics for prospects table (given on pp. 552-3). Write these on Board.
SYSTABLES NAME CARD NPAGES
......... prospects 5 0 , 0 0 0 , 0 0 0 5 , 0 0 0 , 0 0 0 .........
SYSCOLUMNS NAME TBNAME COLCARD HIGH2KEY LOW2KEY
............... hobby prospects 1 0 0 Wines Bicycling zipcode prospects 1 0 0 0 0 0 9 9 9 9 8 0 0 0 0 1 ...............
SYSINDEXES NAME TBNAME NLEVELS NLEAF FIRSTKEY CARD
FULLKEY CARD
CLUSTER RATIO
..................... addrx prospects 4 5 0 0 , 0 0 0 1 0 0 , 0 0 0 5 0 , 0 0 0 , 0 0 0 1 0 0 hobbyx prospects 3 5 0 , 0 0 0 1 0 0 1 0 0 0 .....................
CLUSTERRATIO is a measure of how well the clustering property holds for an index. With 80 or more , will use Sequential Prefetch in retrieving rows.
Look at Figure 9.14, pg. 560. QOPT guesses at Filter Factor. Product rule assumes independent distributions of columns. Still no subquery predicate.
Predicate Type Filter Factor Notes Col = const 1/COLCARD "Col <> const" same as "not (Col = const)" Col ∝ const Interpolation formula "∝" is any comparison predicate other than equality; an example follows Col < const or Col <= const
(const - LOW2KEY) (HIGH2KEY-LOW2KEY)
LOW2KEY and HIGH2KEY are estimates for extreme points of the range of Col values Col between const and const
( c o n s t 2 - c o n s t 1 ) (HIGH2KEY-LOW2KEY)
"Col not between const1 and const2" same as "not (Col between const1 and const2)" Col in list (list size)/COLCARD "Col not in list" same as "not (Col in list)" Col is null 1/COLCARD "Col is not null" same as "not(Col is null)" Col like 'pattern' Interpolation Formula Based on the alphabet Pred1 and Pred2 (^) FF(Pred1).FF(Pred2) As in probability Pred1 or Pred2 FF(Pred1)+FF(Pred2) -FF(Pred1).FF(Pred2)
As in probability
not Pred1 1 - FF(Pred1) As in probability
Figure 9.20. Filter Factor formulas for various predicate types Class 17.
Interpret this probabilistically, and expected time for retrieving rows is only 1/80 second. Have to add index I/O of course. 2R, .05 sec.
Example 9.5.2. Concatenated Index, Matching index scan.
select name straddr from prospects where zipcode between 02159 and 04158 and hobby = 'chess' and incomeclass = 10;
Now, important. Not one contiguous interval in index. There is one interval for: z = 02159 and h = 'c' and inc = 10, and then another for z = 02160 and h = 'c' and inc = 10, and... But there is stuff between them.
Analogy in telephone directory: last name between 'Sma' and 'Smz' and first name 'John'. Lot of directory to look through, not all matches.
Query optimizer here traverses from leftmost z = 02159 to rightmost z = 04158 and uses h = 'c' and inc = 10 as screening predicates.
We say the first predicate is a MATCHING predicate (used for cutting down interval of index considered) and other two are SCREENING predicates.
(This MATCHING predicate is what we mean by Matching Index Scan.)
So index traversed is: (2000/100,000) (filter factor) of 500,000 leaf pages, = 10,000 leaf pages. Query optimizer actually calculates FF as
(04158-02159)(HIGH2KEY-LOW2KEY) = 2000/(99998-00001) = 200/99997 or approximately 2000/100,000 = 1/
Have to look through 1/50.^ NLEAF = 5000 pages, I/O cost is 5,000S with elapsed time: 5,000/400 = 12.5 seconds.
How many rows retrieved? (1/50)(1/100)(1/10) = (1/50,000) with 50M rows, so 1000 rows retrieved. Sequential, class?
No. 1000R, with elapsed time 1000/40 = 25 seconds. Total elapsed time is 37.5 secs.
Example 9.5.3. Concatenated Index, Non-Matching Index Scan.
select name straddr from prospects where hobby = 'chess' and incomeclass = 10 and age = 40;
Like saying First name = 'John' and City = 'Waltham' and street = 'Main'. Have to look through whole index, no matching column, only screening predicates.
Still get small number of rows back, but have to look through whole index. 250,000S. Elapsed time 250,000/400 = 625 seconds, about 10.5 minutes.
Number of rows retrieved: (1/100)(1/10)(1/50)(50,000,000) = 1000. 1000R = 25 seconds.
In PLAN TABLE, for Example 9.5.2, have ACCESSTYPE = I, ACCESSNAME = mailx, MATCHCOLS = 1; In Example 9.5.3, have MATCHCOLS = 0.
Definition 9.5.1. Matching Index Scan. A plan to execute a query where at least one indexable predicate must match the first column of an index (known as matching predicate, matching index). May be more.
What is an indexable predicate? Equal match predicate is one: Col = const See Definition 9.5.3. Pg. 565
Say have index C1234X on table T, composite index on columns (C1, C2, C3, C4). Consider following compound predicates.
C1 = 10 and C2 = 5 and C3 = 20 and C4 = 25 (matches all columns)
C2 = 5 and C3 = 20 and C1 = 10 (matches first three: needn't be in order)
C2 = 5 and C4 = 22 and C1 = 10 and C6 = 35 (matches first two)
C2 = 5 and C3 = 20 and C4 = 25 (NOT a matching index scan)
Screening predicates are ones that match non-leading columns in index. E.g., in first example all are matching, in second all are matching, in third, two are matching, one is screening, and one is not in index, in fourth all three are screening.
Finish through Section 9.6 by next class. Homework due next class (Wednesday after Patriot's day). NEXT homework is rest of Chapter 9 non- dotted exercises if you want to work ahead.