CS 307, Assignment 6
Due as specified in course webpage.
Submit on paper, as I expect you to draw some trees. Please note that
the Science Secretary leaves at noon on Fridays. You will need to
run EXPLAIN on 4 queries, and print them out.
Query plans using EXPLAIN
explain select .... will
show you the
query plan for any query. From it you can deduce the relational
algebra "tree", the first operations will be at the bottom, and the
last one at the top of the indented list. It will also show you what algorithm is chosen to do
something.
Please note how joins are processed. Also note whether indexes are
used, where they exist. (eg. Battles has an index on the primary key).
"Scan" means get rows from a table. Sequential (seq) is the fastest,
Index
scan will get the rows in order by primary key. (An index is quite useful for a very large
table, but is unlikely to be chosen as our databases are relatively
small.)
Finally, there is an estimate of "cost" on each line. The top line will
have total cost. (Projection operations are not explicitly shown,
though they affect the "width" of results.)
What to do:
For each explain of a query, first derive the relational algebra tree
that is used, and draw it, showing the joins ("Join Filter" is the join
condition), and where the selections are
placed ("Filter"). The order of joins is important to note, and also
the algorithm chosen. The query planner tries to estimate how many rows
will result from each operation, in order to figure "cost." Please
comment on the "cost" estimates you observe.
Here are two queries, explained, from the ships database. Both have the
same result, the second does the useless join with battles defined in
the view OB (as a number of you did.)
ships=# select a.ship, a.battle, b.battle
from outcomes a, outcomes b
where a.ship=b.ship
and a.battle < b.battle
;
ship | battle | battle
-----------------+-----------------+-----------------
California | Guadalcanal | Surigao Strait
Duke of York | North Cape | Surigao Strait
Petri | Denmark Strait | North Cape
Yorktown | Guadalcanal | MidwayIsland
(4 rows)
ships=# explain select a.ship, a.battle, b.battle
from outcomes a, outcomes b
where a.ship=b.ship
and a.battle < b.battle
;
QUERY PLAN
---------------------------------------------------------------------------
Merge Join (cost=143.69..241.66 rows=1768 width=57)
Merge Cond: (a.ship = b.ship)
Join Filter: (a.battle < b.battle)
-> Sort (cost=71.84..74.42 rows=1030 width=38)
Sort Key: a.ship
-> Seq Scan on outcomes a (cost=0.00..20.30 rows=1030 width=38)
-> Sort (cost=71.84..74.42 rows=1030 width=38)
Sort Key: b.ship
-> Seq Scan on outcomes b (cost=0.00..20.30 rows=1030 width=38)
(9 rows)
ships=# explain select a.ship, a.battle, b.battle
from ob a, ob b
where a.ship=b.ship
and a.battle < b.battle
;
QUERY PLAN
--------------------------------------------------------------------------------
Hash Join (cost=26.97..51.59 rows=1 width=57)
Hash Cond: (public.outcomes.ship = public.outcomes.ship)
Join Filter: (public.outcomes.battle < public.outcomes.battle)
-> Hash Join (cost=1.11..25.54 rows=26 width=38)
Hash Cond: (public.outcomes.battle = public.battles.name)
-> Seq Scan on outcomes (cost=0.00..20.30 rows=1030 width=38)
-> Hash (cost=1.05..1.05 rows=5 width=19)
-> Seq Scan on battles (cost=0.00..1.05 rows=5 width=19)
-> Hash (cost=25.54..25.54 rows=26 width=38)
-> Hash Join (cost=1.11..25.54 rows=26 width=38)
Hash Cond: (public.outcomes.battle = public.battles.name)
-> Seq Scan on outcomes (cost=0.00..20.30 rows=1030 width=38)
-> Hash (cost=1.05..1.05 rows=5 width=19)
-> Seq Scan on battles (cost=0.00..1.05 rows=5 width=19)
(14 rows)
Please also comment on the plan, which seems to be based upon wild
estimates.
PostgreSQL has a command, VACUUM ANALYZE, that can be run to clean up a
database, and optionally analyze it, which should give the query
planner more information about things like the number of rows to expect.
After running this command, here are the two new plans: (If the tree remains the same for each
query, you can just say so, but if it changes, draw the new one.)
ships=# vacuum analyze;
VACUUM
ships=# explain select a.ship, a.battle, b.battle
from outcomes a, outcomes b
where a.ship=b.ship
and a.battle < b.battle
;
QUERY PLAN
------------------------------------------------------------------------
Hash Join (cost=1.54..3.30 rows=11 width=48)
Hash Cond: (a.ship = b.ship)
Join Filter: (a.battle < b.battle)
-> Seq Scan on outcomes a (cost=0.00..1.24 rows=24 width=32)
-> Hash (cost=1.24..1.24 rows=24 width=32)
-> Seq Scan on outcomes b (cost=0.00..1.24 rows=24 width=32)
(6 rows)
ships=# explain select a.ship, a.battle, b.battle
from ob a, ob b
where a.ship=b.ship
and a.battle < b.battle
;
QUERY PLAN
--------------------------------------------------------------------------------
--
Hash Join (cost=3.76..5.77 rows=8 width=48)
Hash Cond: (public.outcomes.battle = public.battles.name)
-> Hash Join (cost=2.65..4.54 rows=9 width=48)
Hash Cond: (public.outcomes.battle = public.battles.name)
-> Hash Join (cost=1.54..3.30 rows=11 width=48)
Hash Cond: (public.outcomes.ship = public.outcomes.ship)
Join Filter: (public.outcomes.battle < public.outcomes.battle)
-> Seq Scan on outcomes (cost=0.00..1.24 rows=24 width=32)
-> Hash (cost=1.24..1.24 rows=24 width=32)
-> Seq Scan on outcomes (cost=0.00..1.24 rows=24 width=32
)
-> Hash (cost=1.05..1.05 rows=5 width=16)
-> Seq Scan on battles (cost=0.00..1.05 rows=5 width=16)
-> Hash (cost=1.05..1.05 rows=5 width=16)
-> Seq Scan on battles (cost=0.00..1.05 rows=5 width=16)
(14 rows)
Join or subquery? -- Timetable database.
Four done, four more to go. For the next two, you are more on your own.
Do the EXPAIN in a terminal
yourself, and interpret as before.
select students.* from
students natural join enroll
where course='CSC101A01';
select * from
students where linux in (select linux from enroll
where course='CSC101A01');
Your own database
Finally, explain, and comment on the plan, for two queries you use in your
own database. If you have a view defined involving a join, that would
be a good one to do.