# Optimizations

If you are like us, you want the fastest possible version of your numerical code, to run as many samples and solve the largest systems possible. To this end, the `xerus` library already provides a number of possible optimizations for you. The following list expands on the most relevant of them, in roughly the order of effectiveness.

## Disabling Runtime Checks
The library contains many runtime checks: for out-of-bounds access, for other invalid inputs (like illegal contractions), for consistency, and even for the correct behaviour of internal structures. Depending on the complexity of your code and the time spent inside `xerus` (rather than in one of the libraries it uses), you can expect a large performance gain by disabling these checks in the `config.mk` file when compiling xerus.

It is not advisable to do this while developing, as it will be much more difficult to detect errors in your calls to `xerus` functions. But once you have established that your code works as expected, you might want to try replacing the `libxerus.so` object used by your project with one compiled with the `-D XERUS_DISABLE_RUNTIME_CHECKS` flag.

## Use c++ instead of Python
The Python bindings wrap the underlying `c++` library and thus add some call overhead to every operation. If you need the last bit of performance, consider porting the performance-critical parts of your code to `c++` and using `xerus` directly.

## Compiling Xerus with High Optimizations
By default the library already compiles with high optimization settings (corresponding essentially to `-O3`), as there is rarely
any reason to use lower settings for numerical code. If you are going to spend a significant number of CPU hours in numerical code
using the `xerus` library though, it might be worthwhile to go even further.

The most significant runtime gains from compiler settings at this point come from link-time optimization
(for `c++` projects using `xerus`).
To make use of it you will need sufficiently recent versions of the `g++` compiler and the `ar` archiver. After compiling the
`libxerus.so` object with the `USE_LTO = TRUE` flag you can then enable `-flto` in your own compilation process. The optimizations
used will then extend beyond individual compilation units and might thus require significant system resources during
compilation.

If link-time optimization is not an option (or not sufficient), it is also possible to replace the high optimization flag in your
`config.mk` file with the `DANGEROUS_OPTIMIZATION = TRUE` flag. This enables non-IEEE-conforming optimizations that
typically only change floating point results in the least significant bit, but may lead to undefined behaviour if a `NaN`
or overflow is encountered at runtime. (It is rumored that there is an even higher optimization setting available for `xerus`
for those who know how to find it and want to get even the last 1% of speedup...)


## Avoiding Indexed Expressions
The comfort of being able to write Einstein-notation-like equations in the source code, of the form `A(i,k) = B(i,j)*C(j,k);`,
comes at the price of a certain runtime overhead. It is in the low single-digit percent range for typical applications,
but can become significant when very small tensors are used and the time for the actual contraction thus becomes negligible.

In such cases it can be useful to replace such equations (especially ones as simple as the above) with explicit calls
for contractions and reshuffles. For the above equation that would simply be
~~~.cpp
contract(A, B, false, C, false, 1);
~~~
i.e., read as: contract two tensors and store the result in `A`; left-hand side `B`, not transposed; right-hand side `C`, not transposed; contract a single mode.

If it is necessary to reshuffle a tensor before it can be contracted in such a way, e.g. `A(i,j,k) = B(i,k,j)`, this can be done
with the `reshuffle` function.
~~~.cpp
reshuffle(A, B, {0,2,1});
~~~

It is our opinion that code written with these functions instead of indexed expressions is often much harder to understand,
and the speedup is typically small... but just in case you really want to, you now have the option to use them.