Database Software -  Assignment 5

Due Friday, 16 February 2018

Where M is given, you may assume that a few more buffers would be available, for instance to acumulate output.

1. A website requires users to register in order to view it, and users can also post comments to the site. Now let's assume there are 10,000 users, and 500 comments, made by 100 different people (1% of all users). We want to join comments with users (in order to get their names and email addresses.)

Schema:
user U(id, name, email, aboutme), with a unique index on id.
comment C(id, date, subject, thecomment)   variable length records, no indexes.

Please pick and describe a reasonable join algorithm, based upon the following statistics. Estimate the number of disk I/O's required.
You could compare the I/O requirements if a 2-pass join with that of using an index.

           
  U 
  C    
  T
10000
500
  B
500
250
  V(id)
10000
100
M available
200

2. (Unrelated to Question 1.) Describe a (2-pass) algorithm for R left outer join S on the comon attrribute(s) Y where there are M =100 memory buffers available, and these statistics (Note that there will be "dangling tuples" of R, which must be joind with NULLs of S). No indexes available.


 R
 S
 T
 25000
 4000
 B 
5000
500
 V(Y)
2500
2000

How many disk I/O's will be required?
Estimate how many rows will be in the result.
T/V gives you an average of how many of the same Y values there are.