CSc 417 Assignment 5

Due Friday 24 March

Query plans

Excercises 16.4.1 (a), (c), (d), (f) - use Y instead of Z, (h) -- Estimate size of results:
    for range queries, estimate 1/3
Here are the statistics for relations W, X, Y, and Z
W(a,b)
X(b,c)
Y(c,d)
Z(d,e)
T(W) = 100
T(X) = 200
T(Y) = 300
T(Z) = 400
V(W,a) = 20
V(X,b) = 50 V(Y,c) = 50 V(Z,d) = 40
V(W,b) = 60 V(X,c) = 100 V(Y,d) = 50 V(Z,e) = 100

(a)  Join of all 4 relations (natural join)
(c)  Selection on Y where c=20
(d) ( Selection on Y where c=20) natural join Z
(f) Selection on Y where d>10
(h) Selection on W where a=1 AND b>2

16.6.3: Greedy algorithm for the join order of:


R(a,b)
S(b,c)
T(c,d)
U(a,d)
T(R) = 1000
T(S) = 100
T(T) = 200
T(U) = 1000
V(R,a) = 100
V(S,b) = 100 V(T,c) = 10 V(U,a) = 100
V(R,b) = 100 V(S,c) = 10 V(T,d) = 150 V(U,d) = 100
Please indicate the size of each join you considered, and the intermediate join sizes. Total cost is the sum of the intermediate join sizes you selected. Use only left-deep trees.
Also choose an algorithm for each join in your final plan, given also that there is an index on S(b). This is the "physical query plan".

16.7.1 Relation R(a,b,c,d) has a clustering indes on a and non-clustering indexes on each of the other attributes (this means that it is stored ordered by a).
Parameters are: B(R) = 1000, T(R) = 5000,
                         V(R,a) = 20, V(R,b) = 1000, V(R,c) = 5000, & V(R,d) = 500.
Give the best query plan: (index scan or table-scan followed by a filter step) and the disk-I/O cost for:

(b) Selection on R where a=1 AND b=2 AND c >= 3
(c) Selection on R where a=1 AND b<=2 AND c >= 3