Suppose a Student relation at Bishop's consists of T=4000 tuples,
and occupies B=400 blocks. There are M=100 memory blocks available.
There are B+ trees on both id and name. We want a sorted list under the
following conditions. Should we use a B+ index, or a sequential scan,
followed by a sort? (As we shall see, a 2 phase sort requires 3B block
I/O s: read, write sublists, read back.) Give the cost in Block reads for
each method you consider.
We want the whole list, sorted by name. There is an index on
name, but the blocks are NOT in name order. (Note that we can read 100 blocks
into memory, at which point there is a 25% chance that the next tuple
will already be in memory.)
We just want students named 'Wang, Wei'
We want only first year students, in ID order (so, ID >
2220000 or some such number) and we have statistics that this will be
less than 25% of the tuples. Does your answer depend on whether the
blocks are in ID order?
(Graduate Students) In
general, if we want the whole relation sorted, as in (a), for what
ratio for tuples per block, T/B would using a B+ index be
advantageous?
There may be occasions where an index might be useful in
computing a join. Consider the following: A professor might want a list
of the students in her class. As well as T(enroll), We know V(enroll,
coursecode), [in fact, this is <= T(courses)]. This will give us an
estimate of how many students might be in a class. Let's estimate average class size
for Bishop's as 40. At this point, we have a smaller relation, and wish
to join it with the student relation of question 1. Should we
use an index? Assume a 2-level B+ tree, with the root block kept in
memory. (So to access a single tuple
takes 2 block reads: one for the leaf node, + 1 for the data).
In fact, we only need a semi-join, the student fields for the matching rows, as in
select * from students where id IN
(select id from enroll
where course = 'MAT123')
(a) Describe an algorithm
for a set intersection operator R ∩ S, where S is "smaller" and B(S)
< M, so a search structure for S will fit into M blocks of memory.
Assume that R, at least, is a set (has no duplicates). How much
blocking is involved in your algorithm? That is, when can the first
tuple be output?
3.(b) If R might have duplicates, how could we keep them out of the
result?
An "outer join" of 2 relations includes also tuples of one or
both relations that "do not match", the output being the "dangling"
tuple joined with nulls for the other relation. Describe an algorithm
for R natural right outer join S where S fits in main memory (B(S) <
M). When can the first joined tuple be output? When are the dangling
tuples from S output?
(Graduate Students) A
full outer join, assuming S fits in memory. As above, but when do the
dangling tuples from R appear in the output?