CS262A Reading Summary 5

An Evaluation of Buffer Management Strategies for Relational Database Systems

Hong-Tai Chou, David J. DeWitt
Summary by Feng Zhou
9/16/2002

This paper presents a buffer management algorithm tailored specifically to DBMS.  Strong points:

  1. The most important point in this paper is to use distinct buffer management policies for different accessing patterns, at the same time.  The idea behind this is to "build buffer management  policies that 'understand' database systems".  This serves as the unique motivation of the paper.  Although it had been discussed before the paper.  What the temporal and special unit(granularity) should be to assign a certain policy remained unclear.  It is cleared stated in this paper that the unit should be a single file/relation in a single query/operation.   Inside that unit, the access pattern is consistent and can be given a certain fixed policy.  But the system at any time can be using many different  buffer management policies.  That is how it differs from the general purpose buffer/cache management provided by the OS and also used widely in DBMS before.
  2. The paper presents a taxonomy for page reference patterns exhibited by common access methods and operations - Query Locality Set Model.  Nine access patterns, including five sequential patterns and four hierarchical patterns, cover most of DB operations.  
  3. The DBMIN buffer management algorithm uses different policies for each page reference pattern.  It divides the buffer space into localities sets, which correspond to relations under certain operations.  Every locality set has certain size limit according to some heuristics.  Pages are swapped in and out according to the policy within the locality set.  To support sharing of page between locality sets, which is needed by concurrent data access, there is a global table to reference each page in the whole buffer.  However, as a trade-off for simplicity, every page is allowed to be owned by only one locality group, which is the first one to grap that page.

One major flaw:

The simplification of letting one buffer belong to only one locality set may cause premature swapping out of useful pages when system is under heavy load.  So the statistical infomation collected in the owned locality set may not reflect the actual access pattern.  Some pages useful to one locality set may be owned by another locality set, which can later swapping them out, dropping them to the global free list.  This may result in the page be actually replace before the locality set  interested in the page "grabs" it from the global free list.  I think this will often happen when data sharing is intense and free space is very limited.