Adaptive Multi-Policy (AMP) caching Linux implementation and simulator

Introduction
Compiling an AMP-enabled Linux kernel
Compiling user-level tools and the AMP simulator
Collecting application traces
Using the AMP implementation
Running trace driven simulation
License

Introduction

This document describes how to use the AMP program-context-specific disk caching package, which contains a Linux kernel patch and several user-level tools. For more information and recent updates about AMP, see the AMP website. This package allows you to:

Run applications with the AMP Linux page cache implementation and compare it to the original Linux page cache. AMP can improve disk caching performance for certain applications.
Reproduce trace-driven simulation results comparing AMP with other caching policies like OPT, LRU, ARC and PCC (partial implementation).
Use any part of the package, e.g. the I/O trace collector and simulator, for other disk caching related research.

AMP should be considered alpha-quality software. DO NOT run an AMP-patched kernel on a system containing useful data. The implementation deals with persistent data, and bugs in the kernel patch could destroy valuable disk files.

System Requirements:

An x86 system capable of running the patched Linux 2.6.8.1 kernel. Fedora 3 and Debian Sarge work for us.
Java SDK >1.4 for running the user-level tools and simulator. Available from java.sun.com.
Apache Ant build tool, available from the Ant website.

This document was last modified on $Date: 2005/03/29 07:40:33 $.

Compiling an AMP-enabled Linux kernel

Apply the patches in patches/ to the Linux 2.6.8.1 kernel source, amp_trace.patch first and then amp.patch. Configure the kernel as usual (there are currently no AMP-specific configuration options). Then build and install the patched kernel and reboot into it.
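
The patch order above can be sketched as a small shell helper. This is only an illustration: the function name apply_amp_patches is hypothetical, and the -p1 strip level is an assumption about how the patches were generated, so adjust it if patch reports that it cannot find the files.

```shell
# Hypothetical helper: apply the two AMP patches in the required order.
# Usage: apply_amp_patches <kernel-source-dir> <patches-dir>
apply_amp_patches() {
    src_dir=$1
    # Resolve the patch directory to an absolute path, since we cd below.
    patch_dir=$(cd "$2" 2>/dev/null && pwd) || {
        echo "no patch directory: $2" >&2
        return 1
    }
    # Check that both patches exist before touching the kernel tree.
    for p in amp_trace.patch amp.patch; do
        [ -f "$patch_dir/$p" ] || { echo "missing $patch_dir/$p" >&2; return 1; }
    done
    # amp_trace.patch must go first, then amp.patch.
    for p in amp_trace.patch amp.patch; do
        # -p1 is an assumption, not something this README specifies.
        (cd "$src_dir" && patch -p1 < "$patch_dir/$p") || return 1
    done
}
```

For example, apply_amp_patches /usr/src/linux-2.6.8.1 ./patches, where the kernel source location is just a placeholder.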

Compiling user-level tools and the AMP simulator

Build the user-level programs by running ant in the package root directory. To finish installation, add bin/ to your PATH or create a symbolic link to bin/amp from a directory in your PATH. Typing amp should now give you a list of tools to run.
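
As a rough sketch of the symbolic-link option, the helper below links bin/amp into a directory that is already in your PATH. The function name amp_link and the choice of target directory are assumptions made for illustration; run ant first.

```shell
# Hypothetical helper: link <package-root>/bin/amp into a directory on PATH.
# Usage: amp_link <package-root> <dir-on-path>
amp_link() {
    pkg_root=$(cd "$1" 2>/dev/null && pwd) || {
        echo "no such package directory: $1" >&2
        return 1
    }
    link_dir=$2
    [ -f "$pkg_root/bin/amp" ] || {
        echo "bin/amp not found under $pkg_root" >&2
        return 1
    }
    # -f replaces a stale link left over from an earlier installation.
    ln -sf "$pkg_root/bin/amp" "$link_dir/amp"
}
```

For example, amp_link . "$HOME/bin" when run from the package root, assuming $HOME/bin exists and is in your PATH.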

Collecting application traces

IMPORTANT: First make sure the following environment variable is set:

export LD_ASSUME_KERNEL=2.3.99

This tells glibc to stop using certain recent kernel features. On Debian and Fedora 3 this prevents the frame-pointer-less version of glibc from being used. This is necessary because the AMP kernel patch currently relies on user-level frame pointers to walk the user-level stack and obtain program context information.
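
Since a forgotten export is easy to miss, a small guard like the following can be run at the top of any script that starts tracing or a workload. The function name amp_env_ok is hypothetical; it only checks the value described above and changes nothing.

```shell
# Hypothetical guard: succeed only if LD_ASSUME_KERNEL has the value
# this README asks for. With no argument it checks the environment.
amp_env_ok() {
    value=${1-${LD_ASSUME_KERNEL-}}
    if [ "$value" = "2.3.99" ]; then
        return 0
    fi
    echo "LD_ASSUME_KERNEL is '$value', expected 2.3.99" >&2
    return 1
}
```

For example, amp_env_ok || exit 1 at the start of a test script.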

Then start trace collection with a command such as:

amp trace 16 trace1

Here "16" is the size in MB of the memory buffer allocated to hold the trace, and "trace1" is the base name of the trace files. Then run the applications to be traced in other terminal windows. When done, press any key in the terminal running amp trace; the trace files trace1.dat, trace1.bt and trace1.files will be generated.

Caveats:

The memory buffer allocated must be large enough to hold the entire trace (about 50 bytes per access). Samples will be dropped when the buffer is full. The amount of buffer space currently used and free can be examined with cat /proc/tracectl.
Buffer space is currently allocated in big chunks. In some cases an allocation can fail because not enough contiguous memory is available. Rebooting the system often solves this problem.
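
To pick a buffer size, the 50-bytes-per-access figure above can be turned into a quick estimate. The sketch below assumes that 1 MB means 2^20 bytes and that you have a rough idea of how many accesses the workload performs; the function name amp_trace_mb is hypothetical, and the result is a lower bound, so leave some headroom.

```shell
# Hypothetical helper: estimate the trace buffer size in MB.
# Usage: amp_trace_mb <expected-accesses> [bytes-per-access]
amp_trace_mb() {
    accesses=$1
    bytes_per_access=${2:-50}   # "about 50 bytes per access" from this README
    mb=1048576                  # assumption: 1 MB = 2^20 bytes
    # Integer ceiling of accesses * bytes_per_access / mb.
    echo $(( (accesses * bytes_per_access + mb - 1) / mb ))
}
```

For example, amp_trace_mb 1000000 prints 48, so a workload with about one million accesses needs a buffer of at least 48 MB. By the same arithmetic, the 16 MB buffer in the example above holds roughly 335,000 accesses.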

Using the AMP implementation

As with trace collection, first make sure LD_ASSUME_KERNEL is set as described above.

The AMP workflow is not yet automated, and manual intervention is needed whenever the workload changes, because pattern detection is currently done by a user-level program. We may implement continuous in-kernel pattern detection in the future. Until then, the process is as follows. First, a trace is collected while the workload runs. Next, the pattern detection tool is run on the trace to generate a "policy" file. Finally, the policy file is fed into the kernel through the proc interface. From that point on, AMP caching is enabled and file accesses by the workload go through AMP.

Assuming a trace named trace1 has already been collected using the steps described above, here is how to parameterize AMP with the pattern detection results from that trace (run as root):

amp genbps trace1.dat > trace1.bps
amp -o trace1.policy trace1.dat trace1.bps
cat trace1.policy > /proc/amp_partitions
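
Because these steps have to be repeated whenever the workload changes, it may help to wrap them in a function. The sketch below runs the three commands above, copied verbatim, for an arbitrary trace base name. The function name amp_load_policy and the optional second argument (an alternative output file, useful for a dry run without root) are assumptions made for illustration.

```shell
# Hypothetical helper: regenerate a policy from a trace and load it.
# Usage: amp_load_policy <trace-base-name> [proc-file]
amp_load_policy() {
    trace=$1
    proc_file=${2:-/proc/amp_partitions}
    [ -f "$trace.dat" ] || { echo "no trace file $trace.dat" >&2; return 1; }
    # Same three steps as above, stopping at the first failure.
    amp genbps "$trace.dat" > "$trace.bps" || return 1
    amp -o "$trace.policy" "$trace.dat" "$trace.bps" || return 1
    cat "$trace.policy" > "$proc_file"
}
```

For example, amp_load_policy trace1 as root, or amp_load_policy trace1 /tmp/policy.out to inspect the generated policy without loading it into the kernel.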

Some useful details:

Writing a new policy file into /proc/amp_partitions currently resets AMP and frees all cached blocks in all MRU partitions.
A simple way to ensure that the application under test runs with a cold cache is to place all data in a non-root file system and remount that file system before each test run. This evicts all cached pages associated with the file system.
cat /proc/amp_partitions lists the active sizes of all cache partitions.
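
The cold-cache trick can be sketched as follows. This reads "remount" as an unmount followed by a fresh mount, which is an interpretation rather than something this README spells out. It also assumes the mount point has an /etc/fstab entry, so that mount can be given the mount point alone. The function name amp_cold_cache is hypothetical. Run it as root while nothing is using the file system.

```shell
# Hypothetical helper: drop cached pages of a data file system by
# unmounting and mounting it again.
# Usage: amp_cold_cache <mount-point>
amp_cold_cache() {
    mnt=${1-}
    # Never touch the root file system; the data must live elsewhere.
    [ -n "$mnt" ] && [ "$mnt" != "/" ] || {
        echo "refusing to remount '$mnt'" >&2
        return 1
    }
    # Assumes <mount-point> is listed in /etc/fstab.
    umount "$mnt" && mount "$mnt"
}
```

For example, amp_cold_cache /data before each test run, where /data is a placeholder for wherever the test data lives.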

Running trace driven simulation

The package includes a simple simulator of several cache policies, including OPT, LRU, ARC, DEAR, AMP and PCC (only the pattern detection part). The simulator takes a trace as input and outputs the cache hit rate for various policies and cache sizes. For example, to compare AMP and ARC on trace trace1 with cache sizes from 16 to 1024 pages:

amp sim -size 16 1024 -ampc -arc trace1.dat trace1.bps

Type amp sim to see a full list of options. As an example, data/ contains a short trace of running glimpseindex on an 8 MB directory containing some C source code.
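
To run the same comparison over several traces, a loop like the one below may be convenient. It reuses the exact amp sim invocation shown above and assumes each trace X.dat has a matching X.bps next to it. The function name amp_sim_all and the .sim.txt output suffix are made up for this sketch.

```shell
# Hypothetical helper: run the AMP vs. ARC comparison on several traces.
# Usage: amp_sim_all <trace1.dat> [<trace2.dat> ...]
amp_sim_all() {
    for dat in "$@"; do
        base=${dat%.dat}
        # The simulator needs the matching .bps file for each trace.
        [ -f "$base.bps" ] || { echo "skipping $dat: no $base.bps" >&2; continue; }
        # Same options as the example above; output is saved next to the trace.
        amp sim -size 16 1024 -ampc -arc "$dat" "$base.bps" > "$base.sim.txt" || return 1
    done
    return 0
}
```

For example, amp_sim_all trace1.dat trace2.dat leaves the simulator output in trace1.sim.txt and trace2.sim.txt.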

License

Please acknowledge me and drop me an email (Feng Zhou <zf@cs.berkeley.edu>) if you use the AMP code in your research.

The kernel patch is distributed under GPL. See gpl.txt in doc/.

The user-level tools and simulator are distributed under the BSD license.

Copyright (c) 2005, Feng Zhou <zf@cs.berkeley.edu>, University of California, Berkeley
All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

Neither the name of the University of California nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.