Tips for Optimizing C/C++ Code
1. Remember Ahmdal’s Law: Speedup =
time
old
time
new
=
1
(1−func
cost
)+func
cost
/func
speedup
• Where func
cost
is the percentage of the program runtime used by the function func, and func
speedup
is
the factor by which you speedup the function.
• Thus, if you optimize the function T riangleIntersect(), which is 40% of the runtime, so that it runs
twice as fast, your program will run 25% faster (
1
(1−0.4)+0.4/2
=
1
0.8
= 1.25).
• This means infrequently used code (e.g., the scene loader) probably should be optimized little (if at all).
• This is often phrased as: “make the common case fast and the rare case correct.”
2. Code for correctness first, then optimize!
• This does not mean write a fully functional ray tracer for 8 weeks, then optimize for 8 weeks!
• Perform optimizations on your ray tracer in multiple steps.
• Write for correctness, then if you know the function will be called frequently, perform obvious optimiza-
tions.
• Then profile to find bottlenecks, and remove the bottlenecks (by optimization or by improving the al-
gorithm). Often improving the algorithm drastically changes the bottleneck – perhaps to a function you
might not expect. This is a good reason to perform obvious optimizations on all functions you know will
be frequently used.
3. People I know who write very efficient code say they spend at least twice as long optimizing code as they spend
writing code.
4. Jumps/branches are expensive. Minimize their use whenever possible.
• Function calls require two jumps, in addition to stack memory manipulation.
• Prefer iteration over recursion.
• Use inline functions for short functions to eliminate function overhead.
• Move loops inside function calls (e.g., change for(i=0;i<100;i++) DoSomething(); into DoSomething()
{ for(i=0;i<100;i++) { ... } } ).
• Long if...else if...else if...else if... chains require lots of jumps for cases near the end of the chain (in
addition to testing each condition). If possible, convert to a switch statement, which the compiler some-
times optimizes into a table lookup with a single jump. If a switch statement is not possible, put the most
common clauses at the beginning of the if chain.
5. Think about the order of array indices.
• Two and higher dimensional arrays are still stored in one dimensional memory. This means (for C/C++
arrays) array[i][j] and array[i][j+1] are adjacent to each other, whereas array[i][j] and array[i+1][j]
may be arbitrarily far apart.
• Accessing data in a more-or-less sequential fashion, as stored in physical memory, can dramatically speed
up your code (sometimes by an order of magnitude, or more)!
• When modern CPUs load data from main memory into processor cache, they fetch more than a single
value. Instead they fetch a block of memory containing the requested data and adjacent data (a cache
line). This means after array[i][j] is in the CPU cache, array[i][j+1] has a good chance of already being
in cache, whereas array[i+1][j] is likely to still be in main memory.
Commentaires sur ces manuels