This is the readme file for the policy-based allocator described in the article.

BASIC USE:

In order to compile programs using the allocator, you need to have the following files in the same directory.

File					Description
pool_allocator_stl.hpp			Implementation of std::allocator.
GrowthPolicy.hpp			Growth policy
HoldingPolicy.hpp			Holding policy
pool.hpp				Implementation of BunchOfChunks and pool_allocator
PoolChunk.hpp				Implementation of PoolChunk
Threading.hpp				Threading policy
singleton.h				Singleton class (Loki-inspired)
static_assert.h				Static assert class
StoragePolicies.hpp			Storage policy

To use the allocator, once you've decided which set of policies is best, just #include pool_allocator_stl.hpp, and then specify the allocator you want for your container.

For instance, if your old code looked like this:

#include <map>
using namespace std;
int main(int argc, char** argv)
{
	map<int, int> mi;
	for(int i=0; i < 10; ++i)
	{
		m[i]=i+1;
	}
	for(int i=0; i < 10; ++i)
	{
		cout << i << " -> " m[i] << endl;
	}
}

Your new code might look like this:
#include <map>
#include "pool_allocator_stl.hpp"
using namespace std;
int main(int argc, char** argv)
{
	typedef pool_allocator_stl< 				//A pool allocator
		int, 						//for storing ints
		StoragePolicyBoundary<ThreadingSingleThreaded>, //Using the boundary storage policy
		ThreadingSingleThreaded, 			//single threaded environment
		HoldingPolicyStack, 				//store the allcotor on the stack
		GrowthPolicyLinear, 				//grow it linearly
		true > 						//delete memory as the container
		AllocatorType;					//releases it
	//Use simple storage policy in a multithreaded environment, performing 
	map<int, int, less<int>, AllocatorType> mi;
	for(int i=0; i < 10; ++i)
	{
		m[i]=i+1;
	}
	for(int i=0; i < 10; ++i)
	{
		cout << i << " -> " m[i] << endl;
	}
}



WHICH COMBINATION OF POLICIES?

The most reliable way of figuring out which combination of policies is likely to result in the best performance is to actually run the allocation / deallocation code in-place using each different permutation. In order to automate this, I've written a tracing allocator that reports each allocation and deallocation, and a playback system that will run through the allocation / deallocation operations for each combination of policies. Let's say you want to instrument the example from the previous section. 

The files you'll need are the following:
//depot/users/anthaue/alloc_paper/code/Makefile#8 - edit change 31866 (text)
//depot/users/anthaue/alloc_paper/code/parse_log.pl#1 - add change 31895 (text)
//depot/users/anthaue/alloc_paper/code/perf_info.cpp#1 - add change 31576 (text)
//depot/users/anthaue/alloc_paper/code/perf_info.h#2 - edit change 31611 (text)
//depot/users/anthaue/alloc_paper/code/playback_bounded.cpp#7 - edit change 31852 (text)
//depot/users/anthaue/alloc_paper/code/run_all.pl#1 - add change 31852 (text)
//depot/users/anthaue/alloc_paper/code/test_all.pl#1 - add change 31866 (text)
//depot/users/anthaue/alloc_paper/code/timer.h#2 - edit change 31611 (text)
//depot/users/anthaue/alloc_paper/code/trace.h#9 - edit change 31895 (text)
//depot/users/anthaue/alloc_paper/code/trace_allocator.hpp#2 - edit change 31576 (text)
//depot/users/anthaue/alloc_paper/code/uuid.cpp#1 - add change 31358 (text)
//depot/users/anthaue/alloc_paper/code/uuid.h#2 - edit change 31576 (text)

First of all, if you haven't already done so, you'll have to build the playback utility. Run "make playback" to make the utility. [See the makefile itself for how to specify which compiler / stl you're using, as well as updating paths to your boost installation and stlport installation, if you have one].

You'll end up with a file named playback_bounded_COMPILER_THREADING_STL.exe (where the words in caps will be replaced with the 

The code to get the actual instrumentation would look like this:

#include <map>
#include "trace_allocator.hpp"
using namespace std;
int main(int argc, char** argv)
{
	map<int, int, less<int>, trace_allocator<int> > mi;
	for(int i=0; i < 10; ++i)
	{
		m[i]=i+1;
	}
	for(int i=0; i < 10; ++i)
	{
		cout << i << " -> " m[i] << endl;
	}
}

When run, this program will produce a set of allocation dumps for each instantiation of the allocator (note that there may be more than one, as containers often create more than one allocator per instantiation. The file will be named with the typeid of the allocation type, as well as a guid to make sure each instantiation gets its own file. Rename this file to something managable - I'll assume here that you've called it "map.alloc".

Now, you can run playback_bounded.exe on the allocation file, specifying which set of policies you're interested in trying (see the code for syntax, or just run with no arguments). You can also run run_all.pl to run with all combinations of configurations described in the paper (with the exception that we assume single-threaded ThreadingPolicy, but that would be easy enough to change).

The output of this file will tell you average runtime, time spent in user vs. kernel code, maximum working set size, and # of page faults for each run. Based on this information, you can then decide which allocator to use.


MORE RESULTS

I didn't have space in my article to report on all of the different perf metrics for all of the different configurations. You can see the complete set of these (including the numbers for gcc on cygwin, which I didn't have room to mention either) in the spreadsheet results.xls.

CONTACT INFORMATION

For questions, comments, bug reports, bug fixes (much appreciated!), etc., please send me mail at:
anthaue@microsoft.com.

