Profile-guided optimization (PGO) can produce faster CP2K executables, e.g. up to 20 percent faster for hybrid functional calculations. The basic procedure is straightforward with a recent gcc/gfortran (gcc 4.9.2 was used for the steps below; older versions may not work).
1. In the arch file you use (e.g. local.sopt), add the variable $(PROFOPT) to FCFLAGS (the toolchain arch files include it by default).
FCFLAGS = -I$(CP2KINSTALLDIR)/include -std=f2003 -fimplicit-none -ffree-form -fno-omit-frame-pointer -g -O3 -march=native -ffast-math $(PROFOPT) $(DFLAGS) $(WFLAGS)
2. Clean up any leftovers from previous compilations by removing all relevant directories (i.e. realclean).
make -j ARCH=local VERSION=sopt realclean
3. Build the code with extra instrumentation (this binary is slow, and used only for training purposes)
make -j ARCH=local VERSION=sopt PROFOPT=-fprofile-generate
4. Run the binary either on a specific test case or, better, on the full test suite for good coverage. Only those parts of the code executed during the training run can benefit from PGO. This writes additional profile data files (.gcda) to the obj directory.
make -j ARCH=local VERSION=sopt PROFOPT=-fprofile-generate test
5. Remove the old instrumented object files while retaining the .gcda files (i.e. clean, not realclean).
make -j ARCH=local VERSION=sopt PROFOPT=-fprofile-use clean
6. Recompile to build the optimized binary using the profile data.
make -j ARCH=local VERSION=sopt PROFOPT=-fprofile-use
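The steps above can be chained in a small wrapper script. The following is only a sketch, assuming it is started from the directory in which the make commands above are normally run and that the arch file already contains $(PROFOPT). The names DRY_RUN, run and pgo_build are hypothetical helpers introduced here for illustration; they are not part of CP2K. By default the script only prints the commands, so the sequence can be checked before committing to a long build.

```shell
#!/bin/sh
# Sketch of the PGO workflow described above (not an official CP2K script).
# ARCH and VERSION default to the values used in the text.
# DRY_RUN (hypothetical): 1 = only print the commands, 0 = execute them.
set -eu

ARCH="${ARCH:-local}"
VERSION="${VERSION:-sopt}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

pgo_build() {
    # Step 2: remove all leftovers from previous builds.
    run make -j ARCH="$ARCH" VERSION="$VERSION" realclean
    # Step 3: build the instrumented (slow) training binary.
    run make -j ARCH="$ARCH" VERSION="$VERSION" PROFOPT=-fprofile-generate
    # Step 4: training run on the test suite, writes the .gcda files.
    run make -j ARCH="$ARCH" VERSION="$VERSION" PROFOPT=-fprofile-generate test
    # Step 5: remove instrumented objects but keep the .gcda files.
    run make -j ARCH="$ARCH" VERSION="$VERSION" PROFOPT=-fprofile-use clean
    # Step 6: rebuild using the collected profile data.
    run make -j ARCH="$ARCH" VERSION="$VERSION" PROFOPT=-fprofile-use
}

pgo_build
```

Saved under an arbitrary name such as pgo.sh, running "sh pgo.sh" lists the five make invocations, and "DRY_RUN=0 sh pgo.sh" executes them. Because of set -e the script stops at the first failing command, so a failed build or training run does not silently lead to a final binary compiled without usable profile data.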