Review of Database Software Design
Wednesday, 13 April
Disks
(Chapter 13)
- Mechanics: read/write of whole blocks. Timing: seek, rotational
latency, transfer time (a worked example follows this list)
- Disk I/O is the dominant "cost" of computation:
- Speeding it up: Organizing data by cylinders, using multiple
disks, elevator algorithm
- Failures: checksums, stable storage (mirroring), and RAID
- Blocks containing records - fixed length or variable length. Headers
are needed to describe what's there.
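As a rough illustration of the timing components above, here is a small calculation sketch in Python; all the drive parameters (seek time, rotation speed, transfer rate, block size) are made-up "typical" figures, not numbers from the chapter.

    # Rough estimate of the time to read one random block from disk.
    # All drive parameters are assumed "typical" values for illustration.
    avg_seek_ms = 9.0                  # average seek time
    rpm = 7200                         # rotation speed
    transfer_mb_per_s = 100.0          # sustained transfer rate
    block_kb = 16                      # block size

    rotational_latency_ms = 0.5 * (60_000 / rpm)        # half a rotation on average
    transfer_ms = (block_kb / 1024) / transfer_mb_per_s * 1000

    total_ms = avg_seek_ms + rotational_latency_ms + transfer_ms
    print(f"seek {avg_seek_ms:.2f} + rotation {rotational_latency_ms:.2f} "
          f"+ transfer {transfer_ms:.2f} = {total_ms:.2f} ms per block")

The point of working it out: seek and rotational latency dwarf the transfer time, which is why organizing data by cylinders (so consecutive blocks need no new seek) and the elevator algorithm pay off.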
Indexes
- B+ tree: Disk structure
for rapid lookup. A relation can have both primary and secondary
(possibly non-unique) indexes. Useful to fetch records (tuples) in
order, and for range queries.
- Hashing into buckets
(blocks) as primary storage technique. No "data structure" on disk;
each record is simply stored in a "bucket" of one or more blocks,
determined by "hashing" its primary key. To find a record, hash the
sought key and read that bucket -- often just one block read! Useless
for ordering or range queries, and you cannot also hash the relation some
other way (see the sketch after this list).
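A minimal sketch of hashing into buckets, with Python lists standing in for the blocks of each bucket; the bucket count and record layout are arbitrary choices for illustration.

    # Each record is stored in the bucket chosen by hashing its primary key.
    # In a real system a bucket is one or more disk blocks; a list stands in here.
    NUM_BUCKETS = 8
    buckets = [[] for _ in range(NUM_BUCKETS)]

    def bucket_of(key):
        return hash(key) % NUM_BUCKETS     # hash the key to pick a bucket

    def insert(record):                    # record is (key, data)
        buckets[bucket_of(record[0])].append(record)

    def lookup(key):
        # read just that one bucket -- often a single block read
        return [r for r in buckets[bucket_of(key)] if r[0] == key]

    insert((42, "alice"))
    insert((7, "bob"))
    print(lookup(42))                      # [(42, 'alice')]

Nothing here helps with ordering or a range query such as keys between 10 and 50: the matching records are scattered across buckets, which is exactly the limitation noted above.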
Query Execution: scanning and algorithms
(Chapter 15) Cost measured in the number of block I/Os required
- Scanning tables: sequential, or using index.
- There is a trade-off here: depending on how many records fit in
a block, it may be quicker, for example, to scan the whole table and sort
than to use the index.
- One-pass algorithms
- Tuple at a time, for example, selection
- Grouping
- Binary operations, such as join, cost is B(R) + B(S)
- Number of buffers needed -- for join, for example, can the smaller
relation fit in memory? This places limits on the size of the inputs.
- Nested-loop join - no size limit; cost is B(S) + B(S)B(R)/M (see the
sketch after this list)
- Two-pass algorithms
- Generally, cost is 3(B(R) + B(S))
- using sorting
- Multiway merge-sort - be able to describe it; cost: read, write,
and read again of everything
- Sort-join
- using Hashing, size limit on smaller relation only
- index-based: selection and join using an index; zig-zag join, using 2
B-trees
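To see where the B(S) + B(S)B(R)/M figure for nested-loop join comes from, here is a sketch that counts block I/Os; the relations are represented as lists of blocks, and the join-on-the-first-attribute convention and all names are assumptions for illustration. (The code uses M-1 buffers for S, the precise form; the formula above treats M-1 as roughly M.)

    # Block-based nested-loop join, counting block reads.
    # R and S are lists of blocks (each block a list of tuples); S is smaller.
    def nested_loop_join(R, S, M):
        io, result = 0, []
        for i in range(0, len(S), M - 1):           # read S in chunks of M-1 blocks
            chunk = S[i:i + M - 1]
            io += len(chunk)
            s_tuples = [t for block in chunk for t in block]
            for r_block in R:                       # read all of R for each chunk of S
                io += 1
                for r in r_block:
                    for s in s_tuples:
                        if r[0] == s[0]:            # join on the first attribute
                            result.append(r + s[1:])
        return result, io

    # B(S)=4, B(R)=10, M=3  ->  about B(S) + B(S)*B(R)/(M-1) = 4 + 20 = 24 I/Os
    S = [[(k, "s%d" % k)] for k in range(4)]
    R = [[(k % 4, "r%d" % k)] for k in range(10)]
    joined, io = nested_loop_join(R, S, 3)
    print(io, len(joined))                          # 24 I/Os, 10 result tuples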
Query compiler
(Chapter 16)
- select-from-where to relational algebra (tree)
- Project on SELECT list, select on conditions in WHERE clause,
cross-product of FROM list
- Pushing operations up or down the tree, selection in particular
- If the conditions are connected by AND, they can be done in any
order.
- Involving one attribute, push selection down to the
relation(s) containing it
- Involving [equality of] attributes in 2 relations, convert the
cross-product to a join ("join condition")
- cost estimation
- At this stage in the logical query plan, we want to estimate
the size, in tuples, of the result and intermediate stages. This will
be based on statistics of T(R), and V(R,a) -- number of distinct values of a in R.
(For example, if we want a particular value of a, we would expect to
select T(R)/V(R,a) tuples; see the sketch after this list.)
- Best join order: minimize intermediate results
- We should choose the order of the various operations, joins in
particular, to minimize the estimated size of intermediate results. (The
size of the final result will be the same however we arrange the plan.)
- If there are several joins, this is non-trivial, as there are
many possibilities. A simple strategy is to use a left-deep tree, and
pick as the first join the pair with the smallest result.
- Note that most algorithms are asymmetric, with the build relation on the left (put
into M) and the probe
relation on the right (read a tuple at a time, or use index, if
available)
- choosing algorithms
- Finally, choose the best algorithm for each operation, starting
with scanning of tables
- Passing intermediate results
- pipelining
- storing in Memory buffers
- materializing = writing blocks to disk
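Returning to the cost-estimation step referenced above, here is a small sketch of the two textbook size estimates applied to made-up statistics; the relation names, attributes, and numbers are all invented for illustration.

    # T[r]: number of tuples in r.  V[(r, a)]: number of distinct values of a in r.
    T = {"R": 10_000, "S": 5_000, "U": 2_000}
    V = {("R", "a"): 50, ("S", "a"): 100, ("S", "b"): 200, ("U", "b"): 40}

    def select_size(r, a):
        # estimated size of selecting one particular value of a:  T(R)/V(R,a)
        return T[r] / V[(r, a)]

    def join_size(r, s, a):
        # estimated size of a natural join on common attribute a:
        # T(R) * T(S) / max(V(R,a), V(S,a))
        return T[r] * T[s] / max(V[(r, a)], V[(s, a)])

    print(select_size("R", "a"))       # 10000 / 50 = 200 tuples expected

    # Left-deep strategy: pick as the first join the pair with the smallest estimate.
    candidates = {("R", "S"): join_size("R", "S", "a"),
                  ("S", "U"): join_size("S", "U", "b")}
    print(min(candidates, key=candidates.get))      # ('S', 'U'): 50,000 vs 500,000

The greedy "smallest first join" choice matches the simple left-deep strategy described above; the estimates only guide the plan, since the final result size is fixed.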
Logging and Recovery
(Chapter 17)
- Transactions are the basic units that maintain consistency of a
database. In case of power failure, we must recover the latest
consistent state from what we find on disk.
- Accordingly, it is important to write on disk a log of all the
changes made by each transaction, and, when a transaction commits,
write that fact on the log and flush
the log to disk.
- UNDO logging
- When all changes by T have reached disk, then write <commit
T> on log, then flush the log
- To recover after a crash, scan the log backwards and undo all
changes by uncommitted transactions (a sketch follows this list)
- REDO logging
- Write <commit T> on the log and flush the log to disk before
writing any of T's changes to disk; then write them as soon as
convenient.
- To recover after a crash, scan the log backwards to identify
the committed transactions, then forwards to redo all the changes made
by them.
- (There is also UNDO-REDO logging, which is flexible in when data
goes to disk, at the expense of writing both old and new values in the
log.)
- A checkpoint can limit
the length of the log required to be scanned, and kept.
- an archive is a dump of
the whole database, necessary in case of catastrophic failure. Best
kept off-site, and the log also. In fact, the log is supposed to be on
"stable storage", with the redundancy provided by transmitting it to a
secure location.
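A minimal sketch of UNDO-log recovery, as referenced above: the log and the disk state are just Python data here, and the record layout is an invented stand-in for the chapter's <START T>, <T, X, old value>, <COMMIT T> notation.

    # Recovery with an UNDO log: scan backwards, restoring the old value for
    # every change made by a transaction with no <commit> record in the log.
    log = [
        ("start", "T1"), ("write", "T1", "A", 5),    # ("write", T, element, old value)
        ("start", "T2"), ("write", "T2", "B", 10),
        ("commit", "T1"),
        ("write", "T2", "C", 7),
        # crash happens here -- T2 never committed
    ]
    disk = {"A": 50, "B": 99, "C": 77}               # state found on disk after the crash

    committed = {rec[1] for rec in log if rec[0] == "commit"}
    for rec in reversed(log):                        # scan the log backwards
        if rec[0] == "write" and rec[1] not in committed:
            _, _txn, element, old_value = rec
            disk[element] = old_value                # undo the uncommitted change
    print(disk)    # {'A': 50, 'B': 10, 'C': 7}: T2 is undone, T1's commit stands

REDO recovery is the mirror image: identify the committed transactions, then scan forwards and re-apply their new values, which is why a redo log must record new values rather than old ones.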
Concurrency control - locking, etc. - exam will be limited to locking
Why? A transaction should, in isolation, maintain the database in a
consistent state. Even a read-only transaction should "see" a
consistent state. If transactions were executed one at a time
(serially) there would be no problems. But concurrency is good, so we
need to avoid conflict between transactions.
- Read-only transactions will not cause inconsistency, but probably
want to "see" a consistent state.
- Transactions that write must avoid conflict with other
concurrent transactions. There are 2 problems where we cannot say that
one transaction "came before" another:
- The same element is read by one transaction and written by
another
- The same element is written by two different transactions
- Something must be done either to prevent these "race conditions"
from occurring, or to abort one of the transactions involved. (It will
have to be restarted.) A toy example of a lost update follows this list.
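To make the write conflicts above concrete, here is the classic "lost update", written as plain Python rather than any real DBMS interface; the interleaving is the whole point.

    # Two transactions each read X, add 10, and write it back.
    # With this interleaving, one of the updates is lost.
    X = 100

    t1_local = X             # T1 reads X = 100
    t2_local = X             # T2 reads X = 100, before T1 has written
    X = t1_local + 10        # T1 writes 110
    X = t2_local + 10        # T2 writes 110 -- T1's update is lost
    print(X)                 # 110, not the 120 a serial schedule would give

Locking, discussed next, prevents this by making T2 wait until T1 has released its lock on X.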
Locking
The most common method of concurrency control is to have each
transaction lock the elements it plans to read or write. A lock manager
will either grant the lock request, or delay the transaction until the
conflicting lock held by another transaction has been released.
- 2-phase locking: Phase 1: acquire all the locks you will need; after
that you may start releasing locks (see the lock-table sketch after the
lock-modes list below)
- All locks must be released - in fact, this is not up to the
programmer, but the system!
- No locks may be requested after the first unlock, lest another
transaction get "in the middle"
Different lock modes
- Readers do not conflict with each other, so they can have "shared
locks." These lock out any writers.
- Writers must request "exclusive locks" since they need to avoid
conflict with other writers.
- If there are many active readers, a writer might have to wait a
long time. (starvation)
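Here is a minimal lock-table sketch combining the shared/exclusive modes above with the all-at-once release of 2-phase locking; the class name, method names, and the "False means wait" convention are invented for illustration, and no waiting queue or deadlock handling is included.

    # Shared/exclusive lock table: readers share, a writer excludes everyone.
    class LockTable:
        def __init__(self):
            self.locks = {}                     # element -> (mode, set of holders)

        def request(self, txn, element, mode):  # mode is "S" (shared) or "X" (exclusive)
            held = self.locks.get(element)
            if held is None:
                self.locks[element] = (mode, {txn})
                return True                     # granted
            held_mode, holders = held
            if mode == "S" and held_mode == "S":
                holders.add(txn)                # readers do not conflict with each other
                return True
            if holders == {txn}:                # sole holder may upgrade S -> X
                self.locks[element] = ("X" if mode == "X" else held_mode, holders)
                return True
            return False                        # conflict: the transaction must wait

        def release_all(self, txn):             # phase 2 of 2PL: give everything back
            for element in list(self.locks):
                _, holders = self.locks[element]
                holders.discard(txn)
                if not holders:
                    del self.locks[element]

    lt = LockTable()
    print(lt.request("T1", "A", "S"), lt.request("T2", "A", "S"))   # True True: shared
    print(lt.request("T3", "A", "X"))                               # False: writer waits
    lt.release_all("T1"); lt.release_all("T2")
    print(lt.request("T3", "A", "X"))                               # True: now granted

This also shows the starvation point: as long as new readers keep arriving and sharing the lock, T3's exclusive request keeps being refused.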
Optimistic methods (not on exam)
- Timestamps: Each
transaction is given a unique timestamp. It is allowed to run, unless
one of two things occurs:
- Late read: It attempts
to read an element already written by a transaction with a later
timestamp. Possibly the database can supply the proper previous value,
else it must abort.
- Late write: It
attempts to write an element already read by a transaction with a later
timestamp. In this case it must abort.
- Validation: Transactions
have a read phase and a write phase. We know the read set(RS) and write
set(WS) of each transaction that is concurrently active.
- During the read phase they do all their reading and prepare
what they want to write
- Validation: the transaction is ready to commit, but it must abort if its RS or WS overlaps with
the WS of any previously validated recent transaction.
- It now does its writing.
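A sketch of the validation test under stated assumptions: we keep the write set of each recently validated transaction, and a new transaction passes only if neither its read set nor its write set intersects any of them. The data structures and names are invented for illustration.

    # Validation-based ("optimistic") concurrency control.
    validated = [                                   # (txn, write set) of recent validated txns
        ("T1", {"A", "B"}),
        ("T2", {"C"}),
    ]

    def validate(txn, read_set, write_set):
        for other, other_ws in validated:
            if read_set & other_ws or write_set & other_ws:
                return False                        # overlap: abort and restart
        validated.append((txn, write_set))          # passed: now it may do its writing
        return True

    print(validate("T3", {"D"}, {"E"}))             # True: no overlap
    print(validate("T4", {"A"}, {"F"}))             # False: it read A, which T1 wrote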
Dirty reads, Deadlocks - Ahh, transactions
- a "Dirty Read" is a transaction reading a value written by an
incomplete transaction. Since that transaction may still abort for any
number of reasons, the "dirty" value becomes invalid, and should not
have been read. This can lead to "cascading rollback."
- To avoid the problem:
- Strict locking: No locks may be released until a
transaction is surely going to commit.
- Timestamps: Reader must be delayed until other transaction
commits or aborts (and this could lead to deadlock)
- Validation: The problem does not occur, since no writing is done
until the commit decision has been made.
- Deadlock:
- So, course evaluator "Ben," new on the job, was following the
instruction to "wait for Prof. Jensen to sign a paper" before
conducting the course evaluation. Meanwhile, Prof. Jensen was waiting
for the evaluator to emerge from the room before entering, as custom
requires. Being humans, we eventually detected this deadlock.
- Any time a computer system delays something, such as a lock
request, there is a potential for deadlock, and this must be detected,
either by timeout (I got tired of waiting) or by constructing a "wait-for"
graph and searching it for cycles (a sketch follows this list). If a cycle
is detected, one transaction must be aborted.
- In a locking scheme, if the elements can be ordered (say, by
disk address) and locks requested in that order, deadlocks are
avoided. -- some of you did that on the assignment!
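A sketch of deadlock detection with a wait-for graph, as mentioned above: an edge T -> U means "T is waiting for a lock held by U", and a cycle means deadlock, so some transaction on the cycle must be aborted. The example graph and function names are invented for illustration.

    # Depth-first search for a cycle in the wait-for graph.
    waits_for = {
        "T1": ["T2"],
        "T2": ["T3"],
        "T3": ["T1"],      # T3 waits for T1, closing the cycle
        "T4": [],
    }

    def has_cycle(graph):
        visiting, done = set(), set()

        def dfs(node):
            if node in visiting:
                return True                   # back edge: a cycle exists
            if node in done:
                return False
            visiting.add(node)
            if any(dfs(nxt) for nxt in graph.get(node, [])):
                return True
            visiting.remove(node)
            done.add(node)
            return False

        return any(dfs(t) for t in graph)

    print(has_cycle(waits_for))               # True: T1 -> T2 -> T3 -> T1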
Lin Jensen