CS262A Reading
Summary 5
An Evaluation
of Buffer Management Strategies for Relational Database Systems
Hong-Tai Chou, David J. DeWitt
Summary by Feng Zhou
9/16/2002
This paper presents a buffer
management algorithm tailored specifically to DBMS. Strong points:
- The most important
point in this paper is to use distinct buffer management policies for different
accessing patterns, at the same time. The idea behind this is to "build
buffer management policies that 'understand' database systems". This
serves as the unique motivation of the paper. Although it had been
discussed before the paper. What the temporal and special unit(granularity)
should be to
assign a certain policy remained unclear. It is cleared stated in this
paper that the unit should be a single file/relation in a single query/operation.
Inside that unit, the access pattern is consistent and can be given
a certain fixed policy. But the system at any time can be using many
different buffer management policies. That is how it differs
from the general purpose buffer/cache management provided by the OS and also
used widely in DBMS before.
- The paper presents a taxonomy for page reference patterns exhibited
by common access methods and operations - Query Locality Set Model. Nine
access patterns, including five sequential patterns and four hierarchical
patterns, cover most of DB operations.
- The DBMIN buffer management algorithm uses different policies for
each page reference pattern. It divides the buffer space into localities
sets, which correspond to relations under certain operations. Every
locality set has certain size limit according to some heuristics. Pages
are swapped in and out according to the policy within the locality set. To
support sharing of page between locality sets, which is needed by concurrent
data access, there is a global table to reference each page in the whole
buffer. However, as a trade-off for simplicity, every page is allowed
to be owned by only one locality group, which is the first one to grap that
page.
One major flaw:
The simplification of letting
one buffer belong to only one locality set may cause premature swapping out
of useful pages when system is under heavy load. So the statistical
infomation collected in the owned locality set may not reflect the actual
access pattern. Some pages useful to one locality set may be owned
by another locality set, which can later swapping them out, dropping them
to the global free list. This may result in the page be actually replace
before the locality set interested in the page "grabs" it from the
global free list. I think this will often happen when data sharing
is intense and free space is very limited.