1. We have seen that hashing can be used for indexing, since we can determine from the primary key which "bucket" a tuple will be stored in. If we choose the number of buckets so that each usually fits in one block, then only one block needs to be read or written for a given value of the key.
Let's suppose that 11 tuples of some relation fit into a block, and we expect to have up to 40,000 tuples. How many "buckets" would you recommend? (Remember that the number of tuples in each bucket will vary somewhat.)
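One way to get a back-of-the-envelope answer is sketched below. The 75% target fill factor is an assumption (not part of the problem statement) — it is just one reasonable way to leave headroom for buckets that come out larger than average.

```python
import math

tuples_total = 40_000      # expected maximum number of tuples
tuples_per_block = 11      # tuples that fit in one block

# If every bucket were packed perfectly full, this many one-block
# buckets would suffice:
min_buckets = math.ceil(tuples_total / tuples_per_block)

# Bucket sizes vary around the average, so leave headroom; the 75%
# target fill is an assumption chosen for illustration.
target_fill = 0.75
recommended = math.ceil(tuples_total / (tuples_per_block * target_fill))

print(min_buckets, recommended)
```

So roughly 3,600 buckets is the bare minimum, and something closer to 4,800–5,000 leaves room for uneven hashing.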
2. Hashing can also be used for a (2-pass) join R ⋈ S, where M buffers are available and, for the smaller relation S, M < B(S) < M². In this case, if the relation is already "hashed" into buckets for indexing, do we need to "re-hash" it into a different number of buckets? If so, how many? Let's try this for:
M = 105 (100 + 5 "extra" just in case)
B(S) = 4900, and we assume that each block holds many tuples and averages 90% full.
B(R) is not limited, but if you want a figure, let's say B(R) = 20,000.
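The parameters above can be sanity-checked with a short sketch. This follows the usual two-pass hash-join accounting (first pass hashes into M − 1 buckets, keeping one buffer for input); the exact buffer bookkeeping on the second pass varies by textbook, so the "M − 2" margin below is an assumption.

```python
import math

M = 105          # available memory buffers
B_S = 4900       # blocks of the smaller relation S

# First pass: hash S (and R) into M - 1 buckets, reserving one
# buffer for the block being read.
n_buckets = M - 1
bucket_blocks = math.ceil(B_S / n_buckets)   # blocks per S-bucket

# Second pass: each S-bucket must fit in memory while we stream the
# matching R-bucket past it; leave buffers for R input and output.
feasible = bucket_blocks <= M - 2
# Rule-of-thumb equivalent: B(S) <= (M - 1)^2, i.e. 4900 <= 104^2 = 10816.

print(n_buckets, bucket_blocks, feasible)
```

Note the contrast with the indexing buckets of problem 1: for indexing, each bucket is about one block, so S would have on the order of 4,900 buckets — far too many for a join that needs at most M − 1 ≈ 104. That is why the question of "re-hashing" arises.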