CSc 417, Assignment 3

Due Friday 28 September 2007

RAID 5 and BLOB

Tying together chapters 11 and 12, we read on p557 that we should distribute tredundant blocks across all the RAID 5 disks. The suggestion made is that

The redundant block for blocks i be stored on block i of disk i mod n, where n is the total number of disks.
Would it be better or worse to distribute by surface, or some other method? (replace block with surface in 1.)

On page 596 we read that to increse the transfer rate of a BLOB we may stripe the blocks across all the disks.
What we want to do is read all the blocks on a track (one revolution of the disk) continuously, in order to eliminate delays due to latency, and continue on for other surfaces of the same cylinder. And we want to do this with all the disks at once.
Let's assume for example that there are

4 disks
4 surfaces each

so that 1 out of 4 blocks is the redundant one.
It would seem that 3 tracks worth of data would need to occupy 4 corresponding tracks, one on each disk. Using the above suggestions for RAID 5, we could, in the time of one revolution,

Read a track from each of 4 disks, 3/4 of which will contain data
Read a track from each of 3 disks, all of which contain data. (We don't need to read the disk whose surface is the redundant one.)

Let's assume that there is enough memory available so that by buffering, we can ignore the fact that since each disk rotates independently, the time at which the "start" of a track is under the head is not the same on each disk. So both of the suggested RAID 5 methods read "3 tracks of data" per rotation.

The question: How is it best to distribute blocks whose order in the BLOB is 0,1,2,3,4,5,6,... so that we can deliver them at the fastest possible transfer rate?

Can you think of any way to increase this to closer to "4 tracks of data"?

Excercises 12.5, Blocks of Records

12.5.1: We have blocks of records ordered by their sort key (sequential file). We know from a "sparse index" structure the range of keys contained in each block. We assume no other pointers from outside. How should we manage insertion and deletion? Eventually the number of blocks will grow. A common operation is fetching one specific record by the key. Compare the average number of disk I/O's to read a record (not counting using the index) once we know where the block is (we are searching for one record).

Split blocks whenever there is an overflow. Adjust the range in the index. (In other words, no "overflow" blocks)
Keep the range fixed in the index, and use overflow blocks ans needed. In each block, keep an offset table for the records in that block alone, and a pointer to the next offset block (so they form a chain. Assume that half the blocks have an overflow block, and half of those have a second overflow block.
Same as (ii) but keep all the offsets in the first block, including those for overflow blocks.
(skip)
Same as (iii) but keep the sort key along with a pointer in the single offset table. (Note that this wastes no space, as the sort key does not need to be duplicated where the rest of the redord is. )

12.5.2 Relational database systems have always preferred to use fixed-length tuples (records) if possible. Give some reasons why.