Commit 9bcdcdb8 authored by Sebastian Wolf

Merge branch 'v3' of git.hemio.de:xerus/xerus into v3

parents 82865e49 fdf94b53
Pipeline #737 passed with stages
in 8 minutes and 37 seconds
......@@ -16,9 +16,14 @@
!*.sh
!*.rb
!*.py
+!*.md
+!*.xml
+!*.dox
+!*.css
+!*.js
# ...even if they are in subdirectories
!*/
......
......@@ -2,13 +2,16 @@
Potentially breaking changes are marked with an exclamation point '!' at the beginning of their description.
-* 2016-??-?? v3.0.0
+* 2017-??-?? v3.0.0
* Python wrapper now stable.
* ! REQUIRE macro now logs as error instead of fatal error.
* ! All macros and preprocessor defines now use the XERUS_ prefix. The config.mk file changed accordingly.
* ! TT::find_largest_entry and TT::dyadic_product left the TT scope.
* ! Tensor::modify_diag_elements renamed to Tensor::modify_diagonal_entries for naming consistency.
* Some minor bugfixes.
* Much faster solve of matrix equations Ax=b by exploiting symmetry and definiteness where possible. This directly speeds up the ALS as well.
* Added a highly optimized minimal version of the ALS algorithm as xALS.
* Added Tensor.one_norm() and one_norm(Tensor) to calculate the one norm of a Tensor.
* Some minor bugfixes and performance improvements.
* 2016-06-23 v2.4.0
* Introduced nomenclature 'mode'. Marked all functions that will be renamed / removed in v3.0.0 as deprecated.
......@@ -26,7 +29,7 @@ Potentially breaking changes are marked with an exclamation point '!' at the beg
* Bug fix in the handling of fixed indices in TensorNetworks.
* Several static member functions now warn if their return type is not used.
* Initial support for compilation with the intel ICC.
* 2016-03-11 v2.2.1
* Added support for 32bit systems.
......
......@@ -107,7 +107,7 @@ define \n
endef
FLAGS = $(strip $(WARNINGS) $(OPTIMIZE) $(LOGGING) $(DEBUG) $(ADDITIONAL_INCLUDE) $(OTHER))
-PYTHON_FLAGS = $(strip $(WARNINGS) $(LOGGING) $(DEBUG) $(ADDITIONAL_INCLUDE) $(OTHER))
+PYTHON_FLAGS = $(strip $(WARNINGS) $(LOGGING) $(DEBUG) $(ADDITIONAL_INCLUDE) $(OTHER) -fno-var-tracking-assignments)
MINIMAL_DEPS = Makefile config.mk makeIncludes/general.mk makeIncludes/warnings.mk makeIncludes/optimization.mk
......
......@@ -73,6 +73,7 @@ COMPILE_THREADS = 8 # Number of threads to use during link time optimizatio
# DEBUG += -fsanitize=undefined # GCC only
# DEBUG += -fsanitize=memory # Clang only
# DEBUG += -fsanitize=address # find out of bounds access
+# DEBUG += -pg # adds profiling code for the 'gprof' analyzer
# Xerus has a built-in logging system to provide runtime information. Here you can adjust the logging level used by the library.
......
# Notes for Advanced Users
## Multi-Threading
Please note that `xerus` is, at the time of this writing, only thread-safe for up to 1024 threads. This limit stems from the internal handling of indices. To ensure that indices can
be compared and identified uniquely, each index stores a unique id whose first 10 bits denote the number of the thread that created it. With more than 1024 threads these
10 bits overflow, which can lead to collisions: indices that were created on different threads and were meant to be separate suddenly evaluate as equal in all algorithms.
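As a rough illustration (a hypothetical sketch only, not the actual `xerus` implementation), such an id layout and its failure mode could look like this:

~~~ cpp
#include <cstdint>

// Hypothetical sketch, not the actual xerus internals: the top 10 bits of a
// 64-bit index id hold the creating thread's number, the remaining bits a
// per-thread counter. With more than 2^10 = 1024 threads the thread field
// wraps around, so ids created on different threads can coincide.
constexpr unsigned THREAD_BITS = 10;

uint64_t make_index_id(uint64_t threadNumber, uint64_t localCounter) {
    const uint64_t shift = 64 - THREAD_BITS;
    return (threadNumber << shift) | (localCounter & ((uint64_t(1) << shift) - 1));
}
~~~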
......@@ -44,7 +44,7 @@ The output of this executable should then list a number of passed tests and end
-------------------------------------------------------------------------------
~~~
Note in particular that all tests passed. Should this not be the case, please file a bug report with as many details as you
-can in our [issuetracker](https://git.hemio.de/xerus/xerus/issues).
+can in our [issuetracker](https://git.hemio.de/xerus/xerus/issues) or write us an [email](mailto:contact@libxerus.org).
## Building the Library
......
# Indices and Equations
## Blockwise Construction of Tensors
......@@ -24,7 +24,9 @@ The current development version is also available in the same git repository (br
There are a number of tutorials to get you started using the `xerus` library.
* [Building xerus](@ref md_building_xerus) - instructions on how to build the library itself and your first program using it.
* [Quick-Start guide](_quick-_start-example.html) - a short introduction into the basic `xerus` functionality.
* [Tensor Class](@ref md_tensor) - an introduction into the most important functions relating to the `xerus::Tensor` class.
* [TT Tensors](_t_t-_tensors_01_07_m_p_s_08-example.html) - using the MPS or Tensor-Train decomposition.
* [Optimization](@ref md_optimization) - some hints on how to achieve the fastest possible numerical code using `xerus`.
* [Debugging](@ref md_tut_debugging) - using `xerus`'s capabilities to debug your own application.
## Issues
......
# Jekyll configuration
name: Xerus Documentation
description: Documentation for the xerus library.
# baseurl will often be '', but for a project page on gh-pages, it needs to
# be the project name.
# *** IMPORTANT: If your local "jekyll serve" throws errors change this to '' or
# run it like so: jekyll serve --baseurl=''
baseurl: ''
permalink: /:title
# paginate: 3
highlighter: rouge
# highlighter: coderay
markdown: kramdown
# gems: ['TabsConverter']
# exclude: ['README.md', 'LICENSE']
#include <xerus.h>

using namespace xerus;

class InternalSolver {
    const size_t d;

    std::vector<Tensor> leftAStack;
    std::vector<Tensor> rightAStack;
    std::vector<Tensor> leftBStack;
    std::vector<Tensor> rightBStack;

    TTTensor& x;
    const TTOperator& A;
    const TTTensor& b;
    const double solutionsNorm;

public:
    size_t maxIterations;

    InternalSolver(const TTOperator& _A, TTTensor& _x, const TTTensor& _b)
        : d(_x.degree()), x(_x), A(_A), b(_b), solutionsNorm(frob_norm(_b)), maxIterations(1000)
    {
        leftAStack.emplace_back(Tensor::ones({1,1,1}));
        rightAStack.emplace_back(Tensor::ones({1,1,1}));
        leftBStack.emplace_back(Tensor::ones({1,1}));
        rightBStack.emplace_back(Tensor::ones({1,1}));
    }

    void push_left_stack(const size_t _position) {
        Index i1, i2, i3, j1, j2, j3, k1, k2;
        const Tensor &xi = x.get_component(_position);
        const Tensor &Ai = A.get_component(_position);
        const Tensor &bi = b.get_component(_position);

        Tensor tmpA, tmpB;
        tmpA(i1, i2, i3) = leftAStack.back()(j1, j2, j3)
                *xi(j1, k1, i1)*Ai(j2, k1, k2, i2)*xi(j3, k2, i3);
        leftAStack.emplace_back(std::move(tmpA));
        tmpB(i1, i2) = leftBStack.back()(j1, j2)
                *xi(j1, k1, i1)*bi(j2, k1, i2);
        leftBStack.emplace_back(std::move(tmpB));
    }

    void push_right_stack(const size_t _position) {
        Index i1, i2, i3, j1, j2, j3, k1, k2;
        const Tensor &xi = x.get_component(_position);
        const Tensor &Ai = A.get_component(_position);
        const Tensor &bi = b.get_component(_position);

        Tensor tmpA, tmpB;
        tmpA(i1, i2, i3) = xi(i1, k1, j1)*Ai(i2, k1, k2, j2)*xi(i3, k2, j3)
                *rightAStack.back()(j1, j2, j3);
        rightAStack.emplace_back(std::move(tmpA));
        tmpB(i1, i2) = xi(i1, k1, j1)*bi(i2, k1, j2)
                *rightBStack.back()(j1, j2);
        rightBStack.emplace_back(std::move(tmpB));
    }

    double calc_residual_norm() {
        Index i, j;
        return frob_norm(A(i/2, j/2)*x(j&0) - b(i&0)) / solutionsNorm;
    }

    void solve() {
        // Build right stack
        x.move_core(0, true);
        for (size_t pos = d-1; pos > 0; --pos) {
            push_right_stack(pos);
        }

        Index i1, i2, i3, j1, j2, j3, k1, k2;
        std::vector<double> residuals(10, 1000.0);

        for (size_t itr = 0; itr < maxIterations; ++itr) {
            // Calculate residual and check end condition
            residuals.push_back(calc_residual_norm());
            if (residuals.back()/residuals[residuals.size()-10] > 0.99) {
                XERUS_LOG(simpleALS, "Done! Residual decreased from " << std::scientific << residuals[10] << " to " << std::scientific << residuals.back() << " in " << residuals.size()-10 << " iterations.");
                return; // We are done!
            }
            XERUS_LOG(simpleALS, "Iteration: " << itr << " Residual: " << residuals.back());

            // Sweep Left -> Right
            for (size_t corePosition = 0; corePosition < d; ++corePosition) {
                Tensor op, rhs;
                const Tensor &Ai = A.get_component(corePosition);
                const Tensor &bi = b.get_component(corePosition);

                op(i1, i2, i3, j1, j2, j3) = leftAStack.back()(i1, k1, j1)*Ai(k1, i2, j2, k2)*rightAStack.back()(i3, k2, j3);
                rhs(i1, i2, i3) = leftBStack.back()(i1, k1) * bi(k1, i2, k2) * rightBStack.back()(i3, k2);

                xerus::solve(x.component(corePosition), op, rhs);

                if (corePosition+1 < d) {
                    x.move_core(corePosition+1, true);
                    push_left_stack(corePosition);
                    rightAStack.pop_back();
                    rightBStack.pop_back();
                }
            }

            // Sweep Right -> Left : only move core and update stacks
            x.move_core(0, true);
            for (size_t corePosition = d-1; corePosition > 0; --corePosition) {
                push_right_stack(corePosition);
                leftAStack.pop_back();
                leftBStack.pop_back();
            }
        }
    }
};

void simpleALS(const TTOperator& _A, TTTensor& _x, const TTTensor& _b) {
    InternalSolver solver(_A, _x, _b);
    solver.solve();
}

int main() {
    Index i, j, k;

    auto A = TTOperator::random(std::vector<size_t>(16, 4), std::vector<size_t>(7, 2));
    A(i/2, j/2) = A(i/2, k/2) * A(j/2, k/2);

    auto solution = TTTensor::random(std::vector<size_t>(8, 4), std::vector<size_t>(7, 3));
    TTTensor b;
    b(i&0) = A(i/2, j/2) * solution(j&0);

    auto x = TTTensor::random(std::vector<size_t>(8, 4), std::vector<size_t>(7, 3));
    simpleALS(A, x, b);

    XERUS_LOG(info, "Residual: " << frob_norm(A(i/2, j/2) * x(j&0) - b(i&0))/frob_norm(b));
    XERUS_LOG(info, "Error: " << frob_norm(solution-x)/frob_norm(x));
}
import xerus as xe

class InternalSolver :
    def __init__(self, A, x, b):
        self.A = A
        self.b = b
        self.x = x
        self.d = x.degree()
        self.solutionsNorm = b.frob_norm()

        self.leftAStack = [ xe.Tensor.ones([1,1,1]) ]
        self.leftBStack = [ xe.Tensor.ones([1,1]) ]
        self.rightAStack = [ xe.Tensor.ones([1,1,1]) ]
        self.rightBStack = [ xe.Tensor.ones([1,1]) ]

        self.maxIterations = 1000

    def push_left_stack(self, pos) :
        i1,i2,i3, j1,j2,j3, k1,k2 = xe.indices(8)
        Ai = self.A.get_component(pos)
        xi = self.x.get_component(pos)
        bi = self.b.get_component(pos)

        tmpA = xe.Tensor()
        tmpB = xe.Tensor()
        tmpA(i1, i2, i3) << self.leftAStack[-1](j1, j2, j3)\
                *xi(j1, k1, i1)*Ai(j2, k1, k2, i2)*xi(j3, k2, i3)
        self.leftAStack.append(tmpA)
        tmpB(i1, i2) << self.leftBStack[-1](j1, j2)\
                *xi(j1, k1, i1)*bi(j2, k1, i2)
        self.leftBStack.append(tmpB)

    def push_right_stack(self, pos) :
        i1,i2,i3, j1,j2,j3, k1,k2 = xe.indices(8)
        Ai = self.A.get_component(pos)
        xi = self.x.get_component(pos)
        bi = self.b.get_component(pos)

        tmpA = xe.Tensor()
        tmpB = xe.Tensor()
        tmpA(j1, j2, j3) << xi(j1, k1, i1)*Ai(j2, k1, k2, i2)*xi(j3, k2, i3) \
                * self.rightAStack[-1](i1, i2, i3)
        self.rightAStack.append(tmpA)
        tmpB(j1, j2) << xi(j1, k1, i1)*bi(j2, k1, i2) \
                * self.rightBStack[-1](i1, i2)
        self.rightBStack.append(tmpB)

    def calc_residual_norm(self) :
        i,j = xe.indices(2)
        return xe.frob_norm(self.A(i/2, j/2)*self.x(j&0) - self.b(i&0)) / self.solutionsNorm

    def solve(self) :
        # build right stack
        self.x.move_core(0, True)
        for pos in reversed(range(1, self.d)) :
            self.push_right_stack(pos)

        i1,i2,i3, j1,j2,j3, k1,k2 = xe.indices(8)
        residuals = [1000]*10

        for itr in range(self.maxIterations) :
            residuals.append(self.calc_residual_norm())
            if residuals[-1]/residuals[-10] > 0.99 :
                print("Done! Residual decreased from:", residuals[10], "to", residuals[-1], "in", len(residuals)-10, "sweeps")
                return
            print("Iteration:", itr, "Residual:", residuals[-1])

            # sweep left -> right
            for pos in range(self.d):
                op = xe.Tensor()
                rhs = xe.Tensor()
                Ai = self.A.get_component(pos)
                bi = self.b.get_component(pos)

                op(i1, i2, i3, j1, j2, j3) << self.leftAStack[-1](i1, k1, j1)*Ai(k1, i2, j2, k2)*self.rightAStack[-1](i3, k2, j3)
                rhs(i1, i2, i3) << self.leftBStack[-1](i1, k1) * bi(k1, i2, k2) * self.rightBStack[-1](i3, k2)

                tmp = xe.Tensor()
                tmp(i1&0) << rhs(j1&0) / op(j1/2, i1/2)
                self.x.set_component(pos, tmp)

                if pos+1 < self.d :
                    self.x.move_core(pos+1, True)
                    self.push_left_stack(pos)
                    self.rightAStack.pop()
                    self.rightBStack.pop()

            # right -> left, only move core and update stacks
            self.x.move_core(0, True)
            for pos in reversed(range(1, self.d)) :
                self.push_right_stack(pos)
                self.leftAStack.pop()
                self.leftBStack.pop()


def simpleALS(A, x, b) :
    solver = InternalSolver(A, x, b)
    solver.solve()


if __name__ == "__main__":
    i,j,k = xe.indices(3)

    A = xe.TTOperator.random([4]*16, [2]*7)
    A(i/2,j/2) << A(i/2, k/2) * A(j/2, k/2)

    solution = xe.TTTensor.random([4]*8, [3]*7)
    b = xe.TTTensor()
    b(i&0) << A(i/2, j/2) * solution(j&0)

    x = xe.TTTensor.random([4]*8, [3]*7)
    simpleALS(A, x, b)

    print("Residual:", xe.frob_norm(A(i/2, j/2) * x(j&0) - b(i&0))/xe.frob_norm(b))
    print("Error:", xe.frob_norm(solution-x)/xe.frob_norm(x))
/**
* @file
* @short the source code to the "Quick-Start" guide
*/
#include <xerus.h>
int main() {
......@@ -30,9 +25,7 @@ int main() {
std::cout << "ttA ranks: " << ttA.ranks() << std::endl;
// the right hand side of the equation both as Tensor and in (Q)TT format
-xerus::Tensor b({512}, []() {
-    return 1.0;
-});
+auto b = xerus::Tensor::ones({512});
b.reinterpret_dimensions(std::vector<size_t>(9, 2));
xerus::TTTensor ttb(b);
......@@ -56,8 +49,5 @@ int main() {
x(j^9) = b(i^9) / A(i^9, j^9);
// and calculate the Frobenius norm of the difference
-// here i&0 denotes a multiindex large enough to fully index the respective tensors
-// the subtraction of different formats will default to Tensor subtraction such that
-// the TTTensor ttx will be evaluated to a Tensor prior to subtraction.
-std::cout << "error: " << frob_norm(x(i&0) - ttx(i&0)) << std::endl;
+std::cout << "error: " << frob_norm(x - xerus::Tensor(ttx)) << std::endl;
}
import xerus as xe
# construct the stiffness matrix A using a fill function
def A_fill(idx):
    if idx[0] == idx[1] :
        return 2.0
    elif idx[1] == idx[0]+1 or idx[1]+1 == idx[0] :
        return -1.0
    else:
        return 0.0
A = xe.Tensor.from_function([512,512], A_fill)
# and divide it by h^2, i.e. multiply it by N^2 = 512^2
A *= 512*512
# reinterpret the 512x512 tensor as a 2^18 tensor
# and create (Q)TT decomposition of it
A.reinterpret_dimensions([2,]*18)
ttA = xe.TTOperator(A)
# and verify its rank
print("ttA ranks:", ttA.ranks())
# the right hand side of the equation both as Tensor and in (Q)TT format
b = xe.Tensor.ones([512])
b.reinterpret_dimensions([2,]*9)
ttb = xe.TTTensor(b)
# construct a random initial guess of rank 3 for the ALS algorithm
ttx = xe.TTTensor.random([2,]*9, [3,]*8)
# and solve the system with the default ALS algorithm for symmetric positive semi-definite operators
xe.ALS_SPD(ttA, ttx, ttb)
# to perform arithmetic operations we need to define some indices
i,j,k = xe.indices(3)
# calculate the residual of the just solved system to evaluate its accuracy
# here i^9 denotes a multiindex named i of dimension 9 (i.e. spanning 9 indices of the respective tensors)
residual = xe.frob_norm( ttA(i^9,j^9)*ttx(j^9) - ttb(i^9) )
print("residual:", residual)
# as a comparison, solve the system exactly using the Tensor / operator
x = xe.Tensor()
x(j^9) << b(i^9) / A(i^9, j^9)
# and calculate the Frobenius norm of the difference
print("error:", xe.frob_norm(x - xe.Tensor(ttx)))
module Jekyll
  class TabsConverter < Converter
    safe true
    priority :low

    def matches(ext)
      ext =~ /^\.md$/i
    end

    def output_ext(ext)
      ".html"
    end

    def convert(content)
      content.gsub('<p>__tabsInit</p>', "<input id=\"tab1\" type=\"radio\" name=\"tabs\" checked><input id=\"tab2\" type=\"radio\" name=\"tabs\">")
             .gsub('<p>__tabsStart</p>', "<div id=\"tabs\"><label for=\"tab1\">C++</label><label for=\"tab2\">Python</label><div id=\"content\"><section id=\"content1\">")
             .gsub('<p>__tabsMid</p>', "</section><section id=\"content2\">")
             .gsub('<p>__tabsEnd</p>', "</section></div></div>")
             .gsub('<p>__dangerStart</p>', "<div class=\"alert alert-danger\">")
             .gsub('<p>__dangerEnd</p>', "</div>")
             .gsub('<p>__warnStart</p>', "<div class=\"alert alert-warning\">")
             .gsub('<p>__warnEnd</p>', "</div>")
    end
  end
end
---
layout: post
title: "The ALS Algorithm"
date: 1000-12-10
topic: "Examples"
section: "Examples"
---
__tabsInit
# The ALS Algorithm
Implementing the Alternating Least Squares (ALS) algorithm (also known as single-site DMRG) for the first time was the most
important step for us in understanding the TT format and its
intricacies. We still think that it is a good place to start, so we want to provide a simple implementation of the ALS
algorithm as an example. Using the `xerus` library, this results in an efficient implementation of only about 100 lines of code (excluding comments).
## Introduction
The purpose of this page is not to give a full derivation of the ALS algorithm. The interested reader is instead referred to the
original publications on the matter. Let us briefly recap the general idea of the algorithm though, to refresh your memory.
Solving least squares problems of the form $\operatorname{argmin}_x \\|Ax - b\\|^2$ in high dimensions is a difficult endeavour.
Even if $x$ and $b$ are given in the TT-Tensor format and $A$ in the TT-Operator format with small ranks, this is far from trivial.
There is a nice property of the TT format though, that we can use to construct a Gauss-Seidel-like iterative scheme: the linear
dependence of the represented tensor on each of its component tensors.
Due to this linearity, we can formulate a smaller subproblem: fixing all but one component of $x$, the resulting minimization
problem is again a least squares problem of the above form, but with a projected $\hat b = Pb$ and a smaller $\hat A=PAP^T$.
In practice, these projections are obtained by simply contracting all fixed components of $x$ with $A$ and $b$ (assuming all
fixed components are orthogonalized). The ALS algorithm now simply iterates over the components of $x$ and solves these smaller subproblems.
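Written out (a schematic summary of the preceding paragraph, with $P$ denoting the contraction with all fixed, orthogonalized components of $x$), the subproblem for the single free component $\hat x$ reads

$$\operatorname{argmin}_{\hat x} \\|\hat A \hat x - \hat b\\|^2, \qquad \hat A = PAP^T, \quad \hat b = Pb,$$

i.e. again a least squares problem, but of much smaller dimension.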
There are a few things we should note before we start implementing this algorithm:
* It is enough to restrict ourselves to the case of symmetric positive-semidefinite operators $A$. Any non-symmetric problem can be solved by setting $A'=A^TA$ and $b' = A^Tb$.
* We should always move the core of $x$ to the position currently being optimized, to make our lives easier (for several reasons...).
* Calculating the local operators $\hat A$ for components $i$ and $i+1$ is highly redundant: all components of $x$ up to the $(i-1)$'st have to be contracted with $A$ in both cases. Effectively this means that we will keep stacks of $x^TAx$ contracted up to the current index ("left" of the current index) as well as contracted at all indices above the current one ("right" of it), and similarly for $x^T b$ (see the schematic formulas below).
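Schematically, after position $p$ has been processed in a left-to-right sweep, the left stacks hold the partial contractions

$$L^A_p = (x_0 \cdots x_p)^T (A_0 \cdots A_p) (x_0 \cdots x_p), \qquad L^b_p = (x_0 \cdots x_p)^T (b_0 \cdots b_p),$$

with three (respectively two) rank indices left open, while the right stacks hold the analogous contractions built from the last component backwards. Moving the sweep by one position then only pushes one new entry onto one stack and pops one entry from the other.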
## Pseudo-Code
Let us start by writing down the algorithm in pseudo-code and then fill out the steps one by one.
__tabsStart
~~~ cpp
// while we are not done
// for every position p = 0..degree(x) do
// local operator = left stack(A) * p'th component of A * right stack(A)
// local rhs = left stack(b) * p'th component of b * right stack(b)
// p'th component of x = solution of the local least squares problem
// remove top entry of the right stacks
// add position p to left stacks
// for every position p = degree(x)..0 do
// same as above OR simply move core and update stacks
~~~
__tabsMid
~~~ py
# while we are not done
# for every position p = 0..degree(x) do
# local operator = left stack(A) * p'th component of A * right stack(A)
# local rhs = left stack(b) * p'th component of b * right stack(b)
# p'th component of x = solution of the local least squares problem
# remove top entry of the right stacks
# add position p to left stacks
# for every position p = degree(x)..0 do
# same as above OR simply move core and update stacks
~~~
__tabsEnd
## Helper Class
We want our main loop to resemble the above pseudo-code as closely as possible, so we have to define some helper functions to
update the stacks. To ensure that all functions work on the same data without passing references around all the time, we will
define a small helper class that holds all relevant variables: the degree `d` of our problem, the left and right stacks for `A`
and `b`, the TT tensors `A`, `x` and `b` themselves, and the norm of `b`. As a parameter of the algorithm we will also store the
maximal number of iterations.
__tabsStart
~~~ cpp
class InternalSolver {
    const size_t d;

    std::vector<Tensor> leftAStack;
    std::vector<Tensor> rightAStack;
    std::vector<Tensor> leftBStack;
    std::vector<Tensor> rightBStack;

    TTTensor& x;
    const TTOperator& A;
    const TTTensor& b;
    const double solutionsNorm;

public:
    size_t maxIterations;

    InternalSolver(const TTOperator& _A, TTTensor& _x, const TTTensor& _b)
        : d(_x.degree()), x(_x), A(_A), b(_b), solutionsNorm(frob_norm(_b)), maxIterations(1000)
    {
        leftAStack.emplace_back(Tensor::ones({1,1,1}));
        rightAStack.emplace_back(Tensor::ones({1,1,1}));
        leftBStack.emplace_back(Tensor::ones({1,1}));
        rightBStack.emplace_back(Tensor::ones({1,1}));
    }
~~~
__tabsMid
~~~ py
class InternalSolver :
    def __init__(self, A, x, b):
        self.A = A
        self.b = b
        self.x = x
        self.d = x.degree()
        self.solutionsNorm = b.frob_norm()

        self.leftAStack = [ xe.Tensor.ones([1,1,1]) ]
        self.leftBStack = [ xe.Tensor.ones([1,1]) ]
        self.rightAStack = [ xe.Tensor.ones([1,1,1]) ]
        self.rightBStack = [ xe.Tensor.ones([1,1]) ]

        self.maxIterations = 1000
~~~
__tabsEnd
Note here that we initialized the stacks with tensors of dimensions $1\times 1\times 1$ (respectively $1\times 1$). By doing this
we won't have to distinguish between the first, the last, or any other component being optimized. As the first and last components
of a TT tensor in `xerus` already carry an additional mode of dimension 1, we can simply contract them with these single-entry
tensors (containing a 1) as if they were any of the middle components.
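The following small standalone check (a sketch, independent of the solver code) illustrates this extra mode:

~~~ cpp
#include <xerus.h>
#include <iostream>

int main() {
    // The first component of a TT tensor in xerus has the shape
    // (1 x dimension x rank): its leading mode of dimension 1 is what lets it
    // contract with the Tensor::ones({1,1,1}) stack entry like any middle component.
    auto x = xerus::TTTensor::random(std::vector<size_t>(3, 4), std::vector<size_t>(2, 2));
    std::cout << x.get_component(0).dimensions[0] << std::endl; // prints 1
}
~~~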
## Updating the Stacks
To add the next entry to the left (right) stacks we have to contract the next components to the previous results. To increase
readability of the equations, we first store references to the $p$'th components in `Ai`, `xi` and `bi` before we contract them