Implementing the Alternating Least Squares (ALS) algorithm (also known as single-site DMRG) for the first time was the most
important step for us to understand the TT format and its
intricacies. We still think that it is a good place to start, so we want to provide a simple implementation of the ALS
algorithm as an example. Using the `xerus` library, this will be an efficient implementation using only about 100 lines of code (excluding comments).
## Introduction
The purpose of this page is not to give a full derivation of the ALS algorithm. The interested reader is instead referred to the
original publications on the matter. Let us just briefly recap the general idea of the algorithm to refresh your memory, though.
Solving least squares problems of the form $\operatorname{argmin}_x \\|Ax - b\\|^2$ for large dimensions is a difficult endeavour.
Even if $x$ and $b$ are given in the TT-Tensor format and $A$ in the TT-Operator format with small ranks, this is far from trivial.
There is a nice property of the TT format though, that we can use to construct a Gauss-Seidel-like iterative scheme: the linear
dependence of the represented tensor on all of its component tensors.
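This linearity is easy to verify numerically. The following sketch (plain `numpy`, not `xerus`, with hypothetical helper names) builds a small three-component TT tensor and checks that replacing one component by a sum of two cores yields the sum of the represented tensors:

```python
import numpy as np

rng = np.random.default_rng(0)

def tt_to_full(cores):
    # contract a list of order-3 TT cores (r_left, n, r_right) into a full tensor
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.squeeze(axis=(0, -1))  # drop the boundary ranks of size 1

# random TT cores with boundary ranks equal to 1
cores = [rng.standard_normal((1, 4, 3)),
         rng.standard_normal((3, 4, 3)),
         rng.standard_normal((3, 4, 1))]

# the represented tensor depends linearly on each individual component:
# replacing the middle core by G + H sums the represented tensors
G, H = rng.standard_normal((2, 3, 4, 3))
lhs = tt_to_full([cores[0], G + H, cores[2]])
rhs = tt_to_full([cores[0], G, cores[2]]) + tt_to_full([cores[0], H, cores[2]])
assert np.allclose(lhs, rhs)
```

Note that this linearity holds only per component; the map from *all* components jointly to the represented tensor is multilinear, not linear.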
Due to this linearity, we can formulate a smaller subproblem: fixing all but one component of $x$, the resulting minimization
problem is again of the form of a least squares problem as above but with a projected $\hat b = Pb$ and with a smaller $\hat A=PAP^T$.
In practice, these projections can be obtained by simply contracting all fixed components of $x$ to $A$ and $b$ (assuming all
fixed components are orthogonalized). The ALS algorithm now simply iterates over the components of $x$ and solves these smaller subproblems.
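The following dense-matrix analogue (a `numpy` sketch, assuming a symmetric positive-definite $A$ and a $P$ with orthonormal rows, as the orthogonalized fixed components would provide) shows why solving the projected system solves the restricted minimization: the resulting residual is orthogonal to the subspace.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3

# symmetric positive-definite A and a right-hand side b
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)

# P with orthonormal rows plays the role of the contracted fixed components
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
P = Q.T  # shape (k, n)

# the projected, much smaller problem: A_hat y = b_hat
A_hat = P @ A @ P.T
b_hat = P @ b
y = np.linalg.solve(A_hat, b_hat)
x = P.T @ y  # candidate restricted to the subspace spanned by the rows of P

# optimality within the subspace: the residual is orthogonal to it
assert np.allclose(P @ (A @ x - b), 0)
```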
There are a few things we should note before we start implementing this algorithm:
* It is enough to restrict ourselves to the case of symmetric positive-semidefinite operators $A$. Any non-symmetric problem can be solved by setting $A'=A^TA$ and $b' = A^Tb$.
* We should always move the core of $x$ to the position currently being optimized to make our lives easier (for several reasons...).
* Calculating the local operators $\hat A$ for components $i$ and $i+1$ is highly redundant. All components of $x$ up to the $(i-1)$'st have to be contracted with $A$ in both cases. Effectively this means that we will keep stacks of $x^TAx$ contracted up to the current index ("left" of the current index) as well as contracted at all indices above the current one ("right" of it), and similarly for $x^T b$.
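The first of these notes can be checked directly. A small `numpy` sketch (not using `xerus`; the matrix is assumed invertible for simplicity) of the symmetrization trick:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 6))  # non-symmetric, assumed invertible
b = rng.standard_normal(6)

# symmetrized system: A' = A^T A is symmetric positive-semidefinite
A_sym = A.T @ A
b_sym = A.T @ b

x = np.linalg.solve(A_sym, b_sym)

# x solves the original problem, since A'x = b' are exactly the normal
# equations of argmin_x ||Ax - b||^2
assert np.allclose(A @ x, b)
assert np.allclose(A_sym, A_sym.T)
```

Note that squaring $A$ also squares its condition number, which is the usual price of this trick.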
## Pseudo-Code
Let us start by writing down the algorithm in pseudo-code and then fill out the steps one by one.
__tabsStart
~~~ cpp
// while we are not done
// for every position p = 0..degree(x) do
// local operator = left stack(A) * p'th component of A * right stack(A)
// local rhs = left stack(b) * p'th component of b * right stack(b)
// p'th component of x = solution of the local least squares problem
// remove top entry of the right stacks
// add position p to left stacks
// for every position p = degree(x)..0 do
// same as above OR simply move core and update stacks
~~~
__tabsMid
~~~ py
# while we are not done
# for every position p = 0..degree(x) do
# local operator = left stack(A) * p'th component of A * right stack(A)
# local rhs = left stack(b) * p'th component of b * right stack(b)
# p'th component of x = solution of the local least squares problem
# remove top entry of the right stacks
# add position p to left stacks
# for every position p = degree(x)..0 do
# same as above OR simply move core and update stacks
~~~
__tabsEnd
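To make the stack idea from the pseudo-code concrete, here is a `numpy` sketch (hypothetical helper names, not the `xerus` API) of the left stacks for the simpler contraction $x^T b$: each entry extends the previous one by one pair of components, and the final entry is the full inner product.

```python
import numpy as np

rng = np.random.default_rng(3)
rx, rb = 2, 3  # TT ranks of x and b

# TT cores with shapes (r_left, n, r_right) and boundary ranks 1
x_cores = [rng.standard_normal((1, 4, rx)),
           rng.standard_normal((rx, 4, rx)),
           rng.standard_normal((rx, 4, 1))]
b_cores = [rng.standard_normal((1, 4, rb)),
           rng.standard_normal((rb, 4, rb)),
           rng.standard_normal((rb, 4, 1))]

# left_stack[p] = components 0..p-1 of <x, b> contracted together
left_stack = [np.ones((1, 1))]
for xc, bc in zip(x_cores, b_cores):
    left_stack.append(np.einsum('ij,iak,jal->kl', left_stack[-1], xc, bc))

# compare against the direct inner product of the full tensors
def tt_to_full(cores):
    full = cores[0]
    for core in cores[1:]:
        full = np.tensordot(full, core, axes=([-1], [0]))
    return full.squeeze(axis=(0, -1))

direct = np.dot(tt_to_full(x_cores).ravel(), tt_to_full(b_cores).ravel())
assert np.allclose(left_stack[-1].squeeze(), direct)
```

The stacks for $x^T A x$ work the same way, with one extra index for the operator core in each `einsum`.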
## Helper Class
We want our main loop to resemble the above pseudo-code as closely as possible, so we have to define some helper functions to
update the stacks. To ensure that all functions work on the same data without passing references around all the time, we will
define a small helper class that holds all relevant variables: the degree `d` of our problem, the left and right stacks for `A`
and `b`, the TT tensors `A`, `x` and `b` themselves, and the norm of `b`. As a parameter of the algorithm we will also store the