Nice to Haves

This is a list of nice-to-have features or changes in CP2K that nobody has gotten around to yet. They are noted down here so we don't forget about them.

Some of them may be suitable for one or more Google Summer of Code projects.

External Libraries

Use GNU Scientific Library for special math functions, sorting, splines, physical constants,…
Use the Space Group Library for handling crystal symmetries.
Use BigDFT's wavelet Poisson solvers.

Input / Output

Refactor cp_output_handling.F such that it does not require the input_section. Instead there should be a routine to parse the input section once and store that information into a novel printkey_type.
Make it clear in the output at which nesting level the output is happening. For example, print “Entering nesting level X” before and “Exiting nesting level X” after. Provide tools for highlighting and folding the output files.
Evolve the printkey into a real logging mechanism. If the WRITE statements were converted into function calls, one could annotate the output in a standardized way. Such annotations would then allow for parsing the output in a generic fashion.
A lone keyword of type logical should always shortcut to YES. This should be hard coded.
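
The nesting-level and annotation ideas above could look roughly like the following Python sketch. The `OutputLog` class, its methods, and the `level|tag|message` record format are hypothetical illustrations, not an existing CP2K interface. Only the “Entering/Exiting nesting level X” messages are taken from the item above.

```python
import io
from contextlib import contextmanager


class OutputLog:
    """Hypothetical annotated logger; not an existing CP2K interface."""

    def __init__(self, stream):
        self.stream = stream
        self.level = 0  # current nesting level

    def write(self, tag, message):
        # Every record carries its nesting level and a tag, so a generic
        # parser can split on '|' instead of guessing from free-form text.
        self.stream.write(f"{self.level}|{tag}|{message}\n")

    @contextmanager
    def nested(self):
        self.level += 1
        self.write("NEST", f"Entering nesting level {self.level}")
        try:
            yield self
        finally:
            self.write("NEST", f"Exiting nesting level {self.level}")
            self.level -= 1


def parse(text):
    """Parse the annotated output back into (level, tag, message) tuples."""
    records = []
    for line in text.splitlines():
        level, tag, message = line.split("|", 2)
        records.append((int(level), tag, message))
    return records


buffer = io.StringIO()
log = OutputLog(buffer)
with log.nested():
    log.write("SCF", "step 1 converged")
    with log.nested():
        log.write("GRID", "collocation done")
records = parse(buffer.getvalue())
```

Because every record has the same shape, folding or highlighting tools only need the leading level field and do not depend on the wording of individual messages.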

Missing features

Cut-off auto-calibration (Is it possible to provide good cut-off default values based on basis sets and desired precision?)
Estimation of time remaining until the simulation is done. Probably easy to do for MD using FIST. An order-of-magnitude estimate could possibly be provided using parameters from the input file with a good cost model; heuristics will likely be needed for a more precise estimate.
Convergence analysis of SCF and automated abort if convergence will not be reached (e.g. oscillations) and/or adaptive convergence thresholds like in ORCA.
Estimation of the disk space that the simulation output files will take.
Support LibMints or libcint as alternatives to libint
Support xcfun
Support GROMACS format for force fields and topology files in MM.
Provide a cp2k library/API:
- Due to the lack of full ATTRIBUTE support in gfortran we will have to rely on LD version scripts to control the visibility of symbols in the API. In addition to controlling symbol visibility, this provides an explicit versioning of the API (on top of the version number in the soname). The name mangling of Fortran symbols is fortunately easy: __mymodulename_MOD_myfunctionname
- Changes to the API should be controlled; useful tools in this area are abi-dumper and ABI compliance checker.
- Mark private/internal functions in the Doxygen function description as such to provide a public API documentation
- Automatic generation of the LD version script and/or C-bindings/API. Either use a Fortran parser (see other todo items) or hook into Doxygen for that.
Extend the preprocessor to accept @ELSE in the existing '@IF @ENDIF' and possibly remove the restriction on nesting.
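
For the library/API item above, generating the LD version script from a list of public procedures could be sketched as follows in Python. The module and procedure names and the `CP2K_API_1.0` version tag are made up for illustration. The mangling rule is the gfortran scheme quoted above, and the sketch assumes plain module procedures without BIND(C).

```python
def mangle(module, procedure):
    """gfortran symbol name for a module procedure (names are lower-cased)."""
    return f"__{module.lower()}_MOD_{procedure.lower()}"


def version_script(version_tag, public_api):
    """Build an LD version script exporting only the listed procedures."""
    lines = [f"{version_tag} {{", "  global:"]
    for module, procedure in public_api:
        lines.append(f"    {mangle(module, procedure)};")
    # Everything not listed above stays hidden from library users.
    lines += ["  local:", "    *;", "};"]
    return "\n".join(lines) + "\n"


# Hypothetical API surface, for illustration only.
api = [("my_api_module", "init"), ("my_api_module", "run_input")]
script = version_script("CP2K_API_1.0", api)
print(script)
```

The resulting file would be passed to GNU ld at link time, e.g. via `-Wl,--version-script=<file>`. The list of public procedures is the part that a Fortran parser or a Doxygen hook would have to supply.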

Performance

Investigate the performance delta of the MM code in CP2K compared to other state-of-the-art open source software (e.g. GROMACS). See if there are interesting ideas there that can be used.

Testing

The test coverage of the XC-functionals is pretty low. Since we have libxc as reference, one could easily write a unit-test that compares both implementations by applying them to a randomly generated density.
Set up a regtester with the PGI Community Edition Compiler.
Performance regression testing. Requires an empty machine for reproducibility. Test each kernel (FFT, LA, grid,…) in different regimes (compute-, communication-, overhead-bound).
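
The XC-functional comparison suggested above could be structured like the following Python sketch. It is not wired to CP2K or libxc: `lda_exchange` stands in for the implementation under test and `lda_exchange_reference` stands in for the libxc call. Both evaluate the Slater (LDA) exchange energy density, written in two algebraically equivalent ways. The sampling range and seed are arbitrary assumptions.

```python
import math
import random


def lda_exchange(rho):
    """Slater exchange energy density: -(3/4) (3/pi)^(1/3) rho^(4/3)."""
    return -0.75 * (3.0 / math.pi) ** (1.0 / 3.0) * rho ** (4.0 / 3.0)


def lda_exchange_reference(rho):
    """Same quantity via the Wigner-Seitz radius; stand-in for a libxc call."""
    rs = (3.0 / (4.0 * math.pi * rho)) ** (1.0 / 3.0)
    eps_x = -(3.0 / (4.0 * math.pi)) * (9.0 * math.pi / 4.0) ** (1.0 / 3.0) / rs
    return rho * eps_x


def max_relative_deviation(impl, reference, n_points=1000, seed=42):
    """Compare two functionals on a randomly generated positive density."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(n_points):
        # Log-uniform sampling covers both dilute and dense regions.
        rho = 10.0 ** rng.uniform(-8.0, 2.0)
        a, b = impl(rho), reference(rho)
        worst = max(worst, abs(a - b) / abs(b))
    return worst


deviation = max_relative_deviation(lda_exchange, lda_exchange_reference)
```

A real test would loop over all functionals, include gradients for GGAs, and compare derivatives as well as energies. The fixed seed keeps failures reproducible.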

Wiki

Add support for CP2K input-files to GeSHi, which is what DokuWiki uses for syntax highlighting.

Dev Tools

A Fortran parser library for Python. As a starting point one could take the parser from gfortran and extend it to preserve whitespace. Alternatively, one could try to get the Open Fortran Parser to work with CP2K. A third option would be to improve the Fortran frontend for LLVM. Such a parser library would allow for advanced tools like:
- static code analysis
- generation of nice API docs
- …
Run static code analysis e.g. on the gfortran AST to find common performance issues:
- ALLOCATEs in OMP-regions or tight loops
- ALLOCATABLEs / POINTERs that could go on the stack.
Fuzz testing: Generate random valid input files, then check that they either run successfully for a few seconds or quit with a proper error message. The fuzzing could be extended to other settings, e.g. MPI, OpenMP, arch-file, etc.
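
As a rough illustration of the “ALLOCATEs in OMP-regions” check listed above, here is a line-based Python sketch. It is deliberately not AST-based, so it is only a heuristic: it assumes every parallel region is closed by an explicit `!$OMP END PARALLEL` directive, and it ignores continuation lines and trailing comments. The function name is made up for this example.

```python
import re

OMP_BEGIN = re.compile(r"^\s*!\$OMP\s+PARALLEL\b", re.IGNORECASE)
OMP_END = re.compile(r"^\s*!\$OMP\s+END\s+PARALLEL\b", re.IGNORECASE)
# \b keeps DEALLOCATE and ALLOCATABLE from matching.
ALLOCATE = re.compile(r"\bALLOCATE\s*\(", re.IGNORECASE)


def allocates_in_omp_regions(source):
    """Return 1-based line numbers of ALLOCATE statements inside OMP PARALLEL."""
    hits = []
    depth = 0
    for number, line in enumerate(source.splitlines(), start=1):
        if OMP_END.match(line):
            depth = max(depth - 1, 0)
        elif OMP_BEGIN.match(line):
            depth += 1
        elif line.lstrip().startswith("!"):
            continue  # ordinary comment line
        elif depth > 0 and ALLOCATE.search(line):
            hits.append(number)
    return hits


sample = """\
ALLOCATE(a(n))
!$OMP PARALLEL DO PRIVATE(tmp)
DO i = 1, n
   ALLOCATE(tmp(m))
   ! ALLOCATE(in_a_comment(m))
   DEALLOCATE(tmp)
END DO
!$OMP END PARALLEL DO
ALLOCATE(b(n))
"""
hits = allocates_in_omp_regions(sample)
```

A version built on a real parser or the gfortran AST could also follow calls out of the parallel region, which this text scan cannot do.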

Youthful Folly

Remove reference counting wherever possible. It leads to bugs that are super hard to find, and most objects in CP2K have an obvious owner. This ownership assignment can be enforced by using ALLOCATABLEs in derived types.

Turn POINTERs into ALLOCATABLEs

Allocatables lead to faster code, cannot lead to leaks, and are less prone to programming errors. This would be easier with a (not-yet-existing) -Wneedless-pointer warning in gfortran. In order of difficulty:

Switch local procedure variables from POINTER → ALLOCATABLE if they are
- not passed to a procedure with explicit POINTER dummy arguments
- not used in a pointer assignment
- … ? It probably holds that the code is correct if it compiles.
- special treatment might be useful for NULLIFY and ASSOCIATED
Remove POINTER attribute of dummy arguments
- needs to be careful with INTENT(IN) pointers
Switch derived type members from POINTER to ALLOCATABLE
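
Until such a compiler warning exists, a first pass over local variables could be approximated with a text scan like the Python sketch below. It applies only one of the criteria listed above (the variable is never used in a pointer assignment) and knows nothing about POINTER dummy arguments, continuation lines, or `=>` in USE renames and ASSOCIATE constructs. Its output is a list of candidates to review, not a proof of correctness. All function names are hypothetical.

```python
import re

DECLARATION = re.compile(
    r"^[^!]*,\s*POINTER\b[^!]*::(?P<names>[^!]*)", re.IGNORECASE
)


def split_top_level(text):
    """Split on commas that are not inside parentheses."""
    parts, depth, current = [], 0, ""
    for char in text:
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        if char == "," and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += char
    parts.append(current)
    return parts


def pointer_candidates(source):
    """Declared POINTER variables that never appear in a pointer assignment."""
    declared, body = [], []
    for line in source.splitlines():
        match = DECLARATION.match(line)
        if match:
            for entity in split_top_level(match.group("names")):
                # Drop array specs and '=> NULL()' initializers.
                name = re.split(r"[(=\s]", entity.strip(), maxsplit=1)[0]
                if name:
                    declared.append(name.lower())
        else:
            body.append(line.split("!")[0].lower())
    assigned = set()
    for line in body:
        if "=>" in line:
            assigned.update(re.findall(r"[a-z_]\w*", line))
    return [name for name in declared if name not in assigned]


sample = """\
REAL(dp), DIMENSION(:), POINTER :: buf, work
TYPE(foo_type), POINTER :: view => NULL()
ALLOCATE(buf(n), work(n))
view => some_env%foo
work(:) = 0.0_dp
DEALLOCATE(buf, work)
"""
candidates = pointer_candidates(sample)
```

In the sample, `buf` and `work` are only allocated and deallocated, so they are reported. `view` is the target of a pointer assignment and is left alone.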

DBCSR

Removals

Remove dbcsr_mutable_type.
Remove various init-routines, rely on Fortran type initializers instead.

Restructurings

Eliminate the work matrices, the assignment to threads should be static.
Enforce clear separation of library layers. For example arnoldi should be independent from data storage format.
Do not pass any internal data-structure to the “outside”.
Merge dbcsrwrap and dbcsr_api. There should only be one API.
Strengthen the API with unit-tests.
Remove improper usage of INTERFACEs like this.

Bugs?

dbcsr_add does not check for symmetry.
dbcsr_add does not check the “transpose-state”.
dbcsr_copy has confusing order of arguments.

Missing features?

dbcsr_trace does not work with matrices of different symmetries.
dbcsr_add does not work with matrices of different symmetries.
Complex matrices are not fully supported, which is why they are not used in e.g. RTP.
Expose dbcsr's internal types as Fortran types. This means having separate types for symmetric/non-symmetric and int/float/complex.
MIC port (integrate)
Finish OpenCL port (kernels), make sure it runs on open source OpenCL stack provided by Mesa / Gallium (radeonsi driver is a good target).
Use the CUDA runtime compilation library instead of statically linking only a small list of preselected kernels.
