Performance Tuning: Profiling and Timing
Materials Simulations
James Sethna, Spring 1999
The efficiency of a simulation depends, in decreasing order of importance, on
Optimization strategies depend in detail on the kind of computer you’re using, but the process of optimization is largely independent of platform. Broadly speaking, one first writes a working program, designed around intelligent algorithms and written with some attention to speed and memory usage. One then profiles the code. On Unix platforms, there are excellent programs (gprof and its various cousins) that report how much time is spent in each subroutine and function call. If you’ve broken your task down into natural pieces, you can quickly find out where most of your CPU time is spent. Unfortunately, I haven’t figured out how to get profiling to work in Visual Studio.
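With gcc or g++ on Unix, gprof needs no changes to the source: compile and link with the -pg flag, run the program once (it writes a file gmon.out), then run gprof on the executable to get the per-routine report. Where no profiler is available, a crude substitute is to time the suspect routines by hand. The sketch below is one way to do that in C++, assuming a compiler with <chrono> (C++11 or later); RoutineProfiler and ScopedTimer are hypothetical names, not part of the course code. It measures wall-clock time, so other programs running on the machine will inflate the numbers.

```cpp
#include <chrono>
#include <map>
#include <string>
#include <utility>

// Accumulated wall-clock time and call counts, keyed by routine name.
class RoutineProfiler {
public:
    // Record one call of `name` that took `seconds`.
    void Add(const std::string& name, double seconds) {
        Entry& entry = entries_[name];
        entry.seconds += seconds;
        entry.calls += 1;
    }

    double TotalSeconds(const std::string& name) const {
        const auto it = entries_.find(name);
        return it == entries_.end() ? 0.0 : it->second.seconds;
    }

    long Calls(const std::string& name) const {
        const auto it = entries_.find(name);
        return it == entries_.end() ? 0 : it->second.calls;
    }

    // Share of all recorded time spent in `name` (0 if nothing recorded yet).
    double Fraction(const std::string& name) const {
        double total = 0.0;
        for (const auto& item : entries_) total += item.second.seconds;
        return total > 0.0 ? TotalSeconds(name) / total : 0.0;
    }

private:
    struct Entry {
        double seconds = 0.0;
        long calls = 0;
    };
    std::map<std::string, Entry> entries_;
};

// Times the enclosing scope and reports it to the profiler when the scope ends.
class ScopedTimer {
public:
    ScopedTimer(RoutineProfiler& profiler, std::string name)
        : profiler_(profiler),
          name_(std::move(name)),
          start_(std::chrono::steady_clock::now()) {}

    ~ScopedTimer() {
        const std::chrono::duration<double> elapsed =
            std::chrono::steady_clock::now() - start_;
        profiler_.Add(name_, elapsed.count());
    }

    ScopedTimer(const ScopedTimer&) = delete;
    ScopedTimer& operator=(const ScopedTimer&) = delete;

private:
    RoutineProfiler& profiler_;
    std::string name_;
    std::chrono::steady_clock::time_point start_;
};
```

For example, putting `ScopedTimer t(profiler, "forces");` at the top of the force routine and a similar line in the integrator, then printing `profiler.Fraction("forces")` at the end of a run, shows roughly what share of the timed work goes into the force calculation. A routine timed inside another timed routine is counted in both buckets, so time non-overlapping pieces if you want the fractions to add up.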
The final step is to turn your elegant inner loop into a CPU-efficient machine, probably at the expense of making it somewhat uglier and more verbose. You try various changes to the code, testing each one to see whether it speeds up the routine noticeably without changing the answer. As a general rule, avoid modifications that make the code harder to modify or that are platform dependent, unless they speed things up a lot. We’ll try out some optimizations on the MD code we wrote earlier.
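A minimal sketch of that test-each-change loop in C++, under stated assumptions: the routine being tuned is a Lennard-Jones pair energy in reduced units (chosen only as an example; the course MD code may organize its force calculation differently), and TimeTrial, SameAnswer, and the two energy functions are hypothetical names. The tuned version replaces two calls to std::pow with one division and a few multiplications, a typical inner-loop change; whether it is actually faster depends on the compiler and machine.

```cpp
#include <chrono>
#include <cmath>
#include <functional>

// Result of timing one version of a routine.
struct TrialResult {
    double value;           // the answer the routine returned
    double secondsPerCall;  // mean wall-clock time per call
};

// Call f() `repetitions` times (must be > 0); keep the last answer and the
// mean time per call.
TrialResult TimeTrial(const std::function<double()>& f, int repetitions) {
    double value = 0.0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repetitions; ++i) value = f();
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return {value, elapsed.count() / repetitions};
}

// Do two answers agree to within a relative tolerance?
bool SameAnswer(double a, double b, double relTol = 1e-12) {
    return std::fabs(a - b) <= relTol * std::fmax(std::fabs(a), std::fabs(b));
}

// Reference version: Lennard-Jones pair energy in reduced units
// (epsilon = sigma = 1) as a function of the squared separation r2.
double LennardJonesPow(double r2) {
    return 4.0 * (std::pow(r2, -6.0) - std::pow(r2, -3.0));
}

// Tuned version: the same energy with one division and a few multiplications.
double LennardJonesMul(double r2) {
    const double inv2 = 1.0 / r2;
    const double inv6 = inv2 * inv2 * inv2;
    return 4.0 * (inv6 * inv6 - inv6);
}
```

Compare the TimeTrial results for the two versions and keep the faster one only if SameAnswer holds for the separations your simulation actually visits. With a constant argument, as in a quick test, an optimizing compiler may be able to compute the answer once and skip the repeated work, so timings taken from the real force loop on real configurations are more trustworthy.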
I’ll warn you in advance that your code is probably pretty sound: over the weekend I wasn’t able to speed mine up noticeably. This is mostly because we’re using Intel machines running Microsoft operating systems. Because Intel’s chips have had to keep running old DOS programs, Intel has not been able to add extra floating-point units, pipelines, and registers to nearly the same degree as the makers of the competing RISC machines. When you port your code to a Unix supercomputer, remember to tune your inner loops again!
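// Draw the average time per force calculation in the upper LHS of the window.
// AverageTime() is assumed to return seconds, so the value shown is in microseconds.
char timeForceCalc[100];
snprintf(timeForceCalc, sizeof(timeForceCalc), "1000000*Time per Force = %6.3f",
         1000000*pDoc->sim.atoms->timer.AverageTime());
bufferDC.TextOut(0, WINDOW_BUFFER_RESOLUTION, timeForceCalc);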
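The drawing code calls timer.AverageTime() and multiplies by 1,000,000, which suggests the timer returns seconds averaged over many force calculations. A minimal sketch of a timer with that interface, assuming C++11 <chrono>; the class name Timer and its other methods are hypothetical, and the course’s own timer class may work differently.

```cpp
#include <chrono>

// Hypothetical timer with the AverageTime() interface used by the drawing
// code. It accumulates wall-clock time over many Start()/Stop() intervals.
class Timer {
    using Clock = std::chrono::steady_clock;

public:
    // Begin timing one interval (for example, one force calculation).
    void Start() {
        start_ = Clock::now();
        running_ = true;
    }

    // End the current interval and add it to the running total.
    // Calling Stop() without a matching Start() does nothing.
    void Stop() {
        if (!running_) return;
        const std::chrono::duration<double> elapsed = Clock::now() - start_;
        totalSeconds_ += elapsed.count();
        ++intervals_;
        running_ = false;
    }

    // Mean seconds per completed interval; 0 before the first Stop().
    double AverageTime() const {
        return intervals_ > 0 ? totalSeconds_ / intervals_ : 0.0;
    }

    long Intervals() const { return intervals_; }

    // Forget all accumulated timings.
    void Reset() {
        totalSeconds_ = 0.0;
        intervals_ = 0;
        running_ = false;
    }

private:
    Clock::time_point start_{};
    double totalSeconds_ = 0.0;
    long intervals_ = 0;
    bool running_ = false;
};
```

Bracketing the force routine with `timer.Start();` and `timer.Stop();` on every step makes AverageTime() the mean wall-clock time per force calculation.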
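// Periodic boundaries (minimum image): shift dx[coord] by a whole number of box
// lengths so it lies in [-boxSize/2, boxSize/2). The (int) cast truncates toward
// zero, which matches floor() only for non-negative arguments; the +1.5 and -1
// keep the argument non-negative whenever dx[coord]*inverseBoxSize[coord] >= -1.5.
dx[coord] = dx[coord] - boxSize[coord] *
            ((int)(dx[coord]*inverseBoxSize[coord]+1.5)-1);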
This version may work better on RISC platforms. It may also benefit from loop unrolling.
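A sketch of both ideas, assuming the fragment is meant to apply the minimum-image convention in a periodic box and that inverseBoxSize[coord] holds 1/boxSize[coord]. The function names and the fixed three-dimensional layout are hypothetical. MinimumImageFloor is the straightforward version, MinimumImageCast repeats the integer-cast trick, and the last two functions show a loop over coordinates before and after unrolling it by hand.

```cpp
#include <cassert>
#include <cmath>

// Each helper shifts dx by a whole number of box lengths so that it lies in
// [-boxSize/2, boxSize/2) (the minimum-image convention).

// Straightforward form using std::floor.
double MinimumImageFloor(double dx, double boxSize, double inverseBoxSize) {
    return dx - boxSize * std::floor(dx * inverseBoxSize + 0.5);
}

// Integer-cast form, as in the fragment. (int) truncates toward zero, which
// agrees with floor() only for non-negative arguments, hence the +1.5 and -1.
double MinimumImageCast(double dx, double boxSize, double inverseBoxSize) {
    assert(dx * inverseBoxSize >= -1.5);  // range where the trick is valid
    return dx - boxSize * ((int)(dx * inverseBoxSize + 1.5) - 1);
}

constexpr int kDim = 3;

// Loop form: squared minimum-image distance between atoms a and b.
double DistanceSquaredLoop(const double a[kDim], const double b[kDim],
                           const double boxSize[kDim],
                           const double inverseBoxSize[kDim]) {
    double r2 = 0.0;
    for (int coord = 0; coord < kDim; ++coord) {
        const double dx = MinimumImageCast(a[coord] - b[coord], boxSize[coord],
                                           inverseBoxSize[coord]);
        r2 += dx * dx;
    }
    return r2;
}

// Hand-unrolled form of the same loop: no loop counter or branch, and the
// three coordinates are computed independently, but it only works in 3D.
double DistanceSquaredUnrolled(const double a[kDim], const double b[kDim],
                               const double boxSize[kDim],
                               const double inverseBoxSize[kDim]) {
    const double dx = MinimumImageCast(a[0] - b[0], boxSize[0], inverseBoxSize[0]);
    const double dy = MinimumImageCast(a[1] - b[1], boxSize[1], inverseBoxSize[1]);
    const double dz = MinimumImageCast(a[2] - b[2], boxSize[2], inverseBoxSize[2]);
    return dx * dx + dy * dy + dz * dz;
}
```

The unrolled version hard-codes three dimensions, which makes the code harder to modify. Optimizing compilers often unroll a short loop with a constant trip count on their own, so time both versions before keeping the hand-unrolled one.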