Sunday, 17 August 2008

Surf Photography: Rip Curl Boardmasters 2008

I took this on Thursday August 7th, 2008 at Fistral Beach, Newquay Cornwall. The Rip Curl Boardmasters is a surf competition and festival which has been at Fistral for as long as I can remember, and it's a great atmosphere.

This was taken at the 300mm end of my 70-300mm lens with a shutter speed of 1/640 s, aperture f/8.0 and ISO 200. It was subsequently adjusted in Photoshop to improve the contrast a little, then cropped to remove a bit of empty sea.

Pebble v1 Call Graph

Following up on my recent post about a C++ implementation of the Mersenne Twister pseudo-random number generator (PRNG), I decided to find out just how much the performance of the PRNG actually affects a Monte Carlo system such as Pebble. Clearly, the random number generation is a critical part of the procedure, but as I'm about to show, it is definitely not the slowest part of Pebble, by a long way.

The valgrind tool allows a programmer to check their programs for many things, including memory leaks (the original goal of the valgrind project). One of the neat things it does is function call profiling and graphing. The idea is simple, you run a program inside valgrind using the callgrind option, and it tracks every function call made within the program. This lets you figure out how many times a function is being called, and how much of the program execution time is spent within a given function, allowing you to easily profile the code and determine functions or classes which are good candidates for optimisation.

Here is the call graph for Pebble v1, which uses a C++ implementation of the Mersenne Twister which I wrote last summer, rather than the new one I have recently posted about.


(Click for larger version)


The random number generation, performed by Pebble::Math::Random::MersenneTwister::getDouble and getLong, account for a mere 3.33% of the time spent in the event generator function Pebble::EventGenerator::generate. In contrast, finding minima of the probability distribution functions (using Pebble::Math::Utility::Minimiser::getMinimum) accounts for 31.29%, and integrating the PDFs for normalisation purposes (using Pebble::Math::Calculus::RombergIntegrator::integrate) accounts for 50.82% of the time spent generating events.

It is clear from this that, if I want to improve on the speed of generation in "Pebble v2", I need to focus my efforts on efficient minimia-finding and numerical integration functions.

There is also 7.47% of the time taken to generate events spent adding particles to an Event object, using the Pebble::Event::addParticle function. A lot of this time is accounted for by the memory allocation, and as I mentioned in my previous post, eliminating such object-oriented extravagances, while maintaining a sensible object-oriented design for the important parts of the program, should help to boost the speed of any new version.

Friday, 15 August 2008

Mersenne Twister

A key part of the Pebble program was the ability to generate lots of good-quality pseudo-random numbers for Monte Carlo (MC). The Mersenne Twister (MT) algorithm is ideally suited to this, and is indeed the algorithm we used. The original author, Makoto Matsumoto, has published C source for the algorithm, which I converted into a C++ class for the project, using the same interface as several other random number generators (RNGs) provided within Pebble.

So, following my post yesterday about considering building a "Pebble v2" or similar, I've spent the afternoon looking back into Mersenne Twister. The C code is fairly small, and reasonably easy to understand (surprisingly for something as complicated as a PRNG!) and I had no difficulty converting it to C++ the first time around. This time, I've ignored a lot of the complicated interface I built in the first time, since I no longer intend to provide other RNGs -- Mersenne Twister is more than good enough for any MC purposes.

Anyway, I modified the MT code to encapsulate it in a C++ class (code below), then ran some timing tests against the C version. In order to get useful timing data, I had to generate 1 million integers and 1 million real numbers, timing the program run for the entire thing. A real number is generated from a single integer, so this is roughly equivalent to two million integer number generations, plus a little overhead in converting half of them to a number between 0 and 1.

For the purposes of this test, I removed all the program output code from both the original C code and my new C++ version, timing the programs as they generated and threw away two million random numbers. The reason for removing output code was to prevent any skew in the results from the different output systems used by C and C++.

The test itself consisted of running a program to generate 1M integers and 1M reals, using the C or C++ versions of the MT algorithm, respectively. Each test was run ten times, using the bash time command to provide timing details. I discarded the real time taken, which refers to what is known as wall time -- the amount of actual elapsed time according to a clock on the wall -- and focused only on the user and system times, which are the number of seconds the CPU actually spent running the program in user-space and in system-calls, respectively.

The graph below shows the results. Both are incredibly fast, with the slowest taking a mere 0.156 seconds of CPU time. If I can generate two million random numbers in 0.2 seconds, this will certainly not be a bottleneck in any MC system!


(Click image for larger version)

The comparison, however, is slightly discouraging. The system times are incredibly low, 0.002 and 0.001 seconds for C and C++ respectively, averaged over ten runs. This is almost certainly because I removed all of the I/O, so the only time spent in the "system" would be the initial memory allocation and the final deallocation. In fact, I'd expect the C++ variant to take longer here, as the MersenneTwister object must be constructed. Still, these times are so close to zero it's hard to talk about them meaningfully.

The "user" times tell a different story. These are also averaged over ten runs, and show that the C++ version took almost twice as long as the C version. I expected there to be a little overhead from dynamically allocating the array used to store the internal state of the MT generator, but that should be a one-time thing, and the rest of the code is almost identical. I can only assume that the discrepancy arises from the function calls being mapped to member functions in C++ rather than global functions in C.

Many would argue that the factor of 2 slowdown is completely unacceptable, even if the C++ version does hide the details of MT (no defines, no global variables, just a few member variables). I agree that the reduction in speed is quite dramatic here, but compared to other components of a functioning Monte Carlo system, this overhead is negligible, and for any program written predominantly in C++, encapsulating the functionality and hiding the details is a good idea. This is where the hardcore C fans would argue that no system which requires speed or efficiency should be written in C++, but I disagree, backed up by plenty of evidence for C++ development in the physical sciences, and in particle physics in particular!

MersenneTwister.hpp

/*
A C++ program for MT19937, with initialization improved 2002/1/26.
Original C Coded by Takuji Nishimura and Makoto Matsumoto.
Converted to a C++ class by Andrew J. Bennieston

Before using, initialize the state by using init(seed)
or initArray(init_key, key_length).

Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura,
All rights reserved.
C++ Class modifications copyright (C) 2008, Andrew J. Bennieston.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions
are met:

1. Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.

3. The names of its contributors may not be used to endorse or promote
products derived from this software without specific prior written
permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.


Any feedback is very welcome.
http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html
email: m-mat @ math.sci.hiroshima-u.ac.jp (remove space)

Feedback on the C++ modifications to andrew @ physical - thought . com (remove spaces)
*/


class MersenneTwister {
private:
const int N;
const int M;
const unsigned long MATRIX_A;
const unsigned long UPPER_MASK;
const unsigned long LOWER_MASK;
int mti;
unsigned long* mt;
public:
// Constructors
MersenneTwister()
: N(624), M(397), MATRIX_A(0x9908b0dfUL),
UPPER_MASK(0x80000000UL), LOWER_MASK(0x7fffffffUL),
mti(N+1)
{ mt = new unsigned long[N]; };

// Destructor
~MersenneTwister() { delete[] mt; };

// Initialisation
void init(unsigned long s);
void initArray(unsigned long init_key[], int key_length);

// Random number generation
unsigned long genInt32(); // Random integer on [0,0xffffffff]
long genInt31(); // Random integer on [0,0x7fffffff]
double genRealClosed(); // Random real on closed range [0,1]
double genReal(); // Random real on half-open range [0,1) (i.e. not including 1)
double genRealOpen(); // Random real on open range (0,1) (i.e. not including 0 or 1)
double genReal53(); // Random 53-bit real on [0,1)
};


MersenneTwister.cpp

#include "MersenneTwister.hpp"

/*
* Initialise mt[N] with a seed
*/
void MersenneTwister::init(unsigned long s)
{
mt[0] = s & 0xffffffffUL;
for (mti=1; mti<N; mti++) {
mt[mti] =
(1812433253UL * (mt[mti-1] ^ (mt[mti-1] >> 30)) + mti);
/* See Knuth TAOCP Vol2. 3rd Ed. P.106 for multiplier.
* In the previous versions, MSBs of the seed affect
* only MSBs of the array mt[].
* 2002/01/09 modified by Makoto Matsumoto
*/
mt[mti] &= 0xffffffffUL;
/* for >32 bit machines */
}
}

/*
* Initialise by an array with length given by key_length
*/
void MersenneTwister::initArray(unsigned long init_key[], int key_length)
{
int i, j, k;
init(19650218UL);
i=1; j=0;
k = (N>key_length ? N : key_length);
for (; k; k--) {
mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1664525UL))
+ init_key[j] + j; /* non linear */
mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */
i++; j++;
if (i>=N) { mt[0] = mt[N-1]; i=1; }
if (j>=key_length) j=0;
}
for (k=N-1; k; k--) {
mt[i] = (mt[i] ^ ((mt[i-1] ^ (mt[i-1] >> 30)) * 1566083941UL))
- i; /* non linear */
mt[i] &= 0xffffffffUL; /* for WORDSIZE > 32 machines */
i++;
if (i>=N) { mt[0] = mt[N-1]; i=1; }
}

mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */
}

/*
* Generate random number on [0,0xffffffff]
*/
unsigned long MersenneTwister::genInt32()
{
unsigned long y;
static unsigned long mag01[2]={0x0UL, MATRIX_A};
/* mag01[x] = x * MATRIX_A for x=0,1 */

if (mti >= N) { /* generate N words at one time */
int kk;

if (mti == N+1) /* if init() has not been called, */
init(5489UL); /* a default initial seed is used */

for (kk=0;kk<N-M;kk++) {
y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
mt[kk] = mt[kk+M] ^ (y >> 1) ^ mag01[y & 0x1UL];
}
for (;kk<N-1;kk++) {
y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
mt[kk] = mt[kk+(M-N)] ^ (y >> 1) ^ mag01[y & 0x1UL];
}
y = (mt[N-1]&UPPER_MASK)|(mt[0]&LOWER_MASK);
mt[N-1] = mt[M-1] ^ (y >> 1) ^ mag01[y & 0x1UL];

mti = 0;
}

y = mt[mti++];

/* Tempering */
y ^= (y >> 11);
y ^= (y << 7) & 0x9d2c5680UL;
y ^= (y << 15) & 0xefc60000UL;
y ^= (y >> 18);

return y;
}

/*
* Generate a random number on [0,0x7fffffff]
*/
long MersenneTwister::genInt31()
{
return (long)(genInt32() >> 1);
}

/*
* Generate a random number on [0,1]-real
*/
double MersenneTwister::genRealClosed()
{
return genInt32() * (1.0 / 4294967295.0);
/* divided by 2^32 - 1 */
}

/*
* Generate a random number on [0,1)-real
*/
double MersenneTwister::genReal()
{
return genInt32() * (1.0 / 4294967296.0);
/* divided by 2^32 */
}

/*
* Generate a random number on (0,1)-real
*/
double MersenneTwister::genRealOpen()
{
return (((double)genInt32()) + 0.5) * (1.0 / 4294967296.0);
/* divided by 2^32 */
}

/*
* Generate a random number on [0,1)-real with 53-bit resolution
*/
double MersenneTwister::genReal53()
{
unsigned long a = genInt32() >> 5;
unsigned long b = genInt32() >> 6;
return(a * 67108864.0 + b) * (1.0 / 9007199254740992.0);
}