get rid of lapacke dependency
The lapacke dependency is annoying (it cannot be assumed to be installed). We only need a few of its functions, and those can easily be written manually.
- import lapacke source to 3rdParty folder
- unify functions to get rid of unnecessary transpositions
- allocate a thread_local work array to get rid of allocations per call
- benchmark the result with the performance analysis from the makefile
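
A minimal sketch of the thread_local work array idea, assuming C++11 or later. The name `thread_work_buffer` is hypothetical and not taken from the code base. The buffer is sized in doubles, and real LAPACK routines may also need integer or complex workspaces.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (name is an assumption, not from the code base):
// returns a per-thread scratch buffer holding at least n doubles.
// The vector is created once per thread and only grows, so repeated
// calls with the same or a smaller n perform no heap allocation.
inline double* thread_work_buffer(std::size_t n) {
    thread_local std::vector<double> work;
    if (work.size() < n) {
        work.resize(n);  // allocates only when a larger workspace is needed
    }
    return work.data();
}
```

A wrapper around a LAPACK routine could call `thread_work_buffer(lwork)` and pass the returned pointer as the `work` argument instead of allocating and freeing a workspace on every call. Because the buffer is thread_local, concurrent callers do not share it. Note that the pointer may be invalidated by a later call that requests a larger size.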