# Convergence of the iterates of the gradient method with constant stepsize

The gradient method with constant step size is the simplest method for solving unconstrained optimisation problems involving a continuously differentiable function with Lipschitz-continuous gradient. The motivation for this post came after reading this Wikipedia article, where it is stated that under certain assumptions the sequence $\{x_k\}$ converges to a local optimum, but no further discussion is provided.
The gradient method with constant step size is defined as

\begin{aligned} x_{k+1} = x_{k} - \frac{1}{L}\nabla f(x_k) \end{aligned}

where $L$ is a Lipschitz constant of $\nabla f$. The following facts about this method are well known:

1. If $f$ is continuously differentiable with $L$-Lipschitz gradient (but not necessarily convex), then
\begin{aligned} \min_{k=0,\ldots, N}\|\nabla f(x_k)\|^2 \leq \frac{2L}{N+1}(f(x_0)-f^*) \end{aligned}
where $f^*$ is the infimum of $f$ (the function may not even have a minimiser).
2. If $f$ is convex and continuously differentiable with $L$-Lipschitz gradient, and its set of minimisers is nonempty, then
\begin{aligned} f(x_k) - f^* \leq \frac{L}{2k}\|x_0 - x^*\|^2, \end{aligned}
where $x^*$ is any minimiser of $f$.

In the first case we should underline that the result only gives $\liminf_k \|\nabla f(x_k)\|^2 = 0$; it does not mean that $\|\nabla f(x_k)\|$ converges to zero. In the second case, where $f$ is convex, we similarly cannot deduce that $\{x_k\}$ converges to a point $x^*\in\mathrm{argmin} f$, nor even that $\{x_k\}$ converges towards the set $\mathrm{argmin} f$ in the sense $d(x_k, \mathrm{argmin} f) \to 0$ (where $d$ is the distance-to-set function).
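As a quick illustration of fact 2, here is a minimal numerical sketch on an assumed toy problem (not from the original post): a positive-definite quadratic, for which $L$ is the largest eigenvalue of $A$ and the minimiser is $x^* = 0$ with $f^* = 0$.

```python
import numpy as np

# Assumed toy example: f(x) = 0.5 * x^T A x with A positive definite is
# convex, its gradient A x is L-Lipschitz with L = lambda_max(A), and the
# unique minimiser is x* = 0 with f* = 0.
A = np.diag([1.0, 4.0, 10.0])
L = 10.0                                  # largest eigenvalue of A
f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

x0 = np.array([1.0, 1.0, 1.0])
x = x0.copy()
for k in range(1, 51):
    x = x - grad(x) / L                   # x_{k+1} = x_k - (1/L) grad f(x_k)
    # fact 2: f(x_k) - f* <= (L / (2k)) * ||x_0 - x*||^2
    assert f(x) <= L / (2 * k) * np.linalg.norm(x0) ** 2 + 1e-12
```

On this example the iterates actually converge much faster than the $O(1/k)$ bound, which is only a worst-case guarantee.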

Nevertheless, we may show that the sequence $\{x_k\}$ is Fejér monotone with respect to $\mathrm{argmin} f$, which will allow us to prove that, indeed, $d(x_k, \mathrm{argmin} f) \to 0$ as $k\to\infty$.

Definition (Fejér sequence). We say that a sequence $\{x_k\}$ is Fejér monotone with respect to a set $C$ if

\begin{aligned} \|x_{k+1}-x\|^2 \leq \|x_{k} - x\|^2, \end{aligned}

for all $x \in C$ and $k \in \mathbb{N}$.

Fejér sequences possess several favourable properties: they are bounded, $\{\|x_k-x\|\}$ converges for every $x\in C$, and $d(x_k, C)$ is nonincreasing and converges.
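To make the definition concrete, here is a hypothetical example with a whole line of minimisers: for $f(x, y) = \tfrac{1}{2}x^2$ the set $C = \mathrm{argmin} f$ is the line $x = 0$ and $d((x,y), C) = |x|$, so the monotone behaviour of $d(x_k, C)$ along the gradient iterates can be observed directly.

```python
import numpy as np

# Assumed toy example: f(x, y) = 0.5 * x**2 is convex, its gradient (x, 0)
# is 1-Lipschitz, and argmin f is the whole line {x = 0}, so the distance
# of a point (x, y) to argmin f is simply |x|.  We take L = 2, which is
# also a valid Lipschitz constant, so the step 1/L = 0.5 halves x each time.
grad = lambda p: np.array([p[0], 0.0])
L = 2.0

p = np.array([3.0, -2.0])
dists = [abs(p[0])]                       # d(x_0, argmin f)
for _ in range(20):
    p = p - grad(p) / L                   # gradient step with step size 1/L
    dists.append(abs(p[0]))               # d(x_k, argmin f)

# the distance to the set of minimisers is nonincreasing, as claimed
assert all(d2 <= d1 for d1, d2 in zip(dists, dists[1:]))
```

Note that the second coordinate never moves: the iterates converge to the particular minimiser $(0, -2)$, not to some canonical point of $C$.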

Proposition (Fejér monotonicity). The sequence $\{x_k\}$ produced by the gradient method with constant step size $1/L$, when $f$ is convex, continuously differentiable and has $L$-Lipschitz gradient, is Fejér monotone with respect to the set of minimisers of $f$ (assumed nonempty).

Proof. Take any minimiser $x^*$ and recall that $\nabla f(x^*) = 0$. Then

\begin{aligned} \|x_{k+1}-x^*\|^2 &= \left\|x_k - \frac{1}{L} \nabla f(x_k) - x^*\right\|^2 \\ &= \|x_k-x^*\|^2 + \frac{1}{L^2}\|\nabla f(x_k)\|^2 - \frac{2}{L} \langle \nabla f(x_k), x_k-x^*\rangle. \end{aligned}

Mere monotonicity of $\nabla f$ is not enough here; instead, since $f$ is convex with $L$-Lipschitz gradient, $\nabla f$ is $\frac{1}{L}$-cocoercive (the Baillon–Haddad theorem), that is,

\begin{aligned} \langle \nabla f(x_k) - \nabla f(x^*), x_k -x^*\rangle \geq \frac{1}{L}\|\nabla f(x_k) - \nabla f(x^*)\|^2, \end{aligned}

which, since $\nabla f(x^*) = 0$, becomes

\begin{aligned} \langle \nabla f(x_k), x_k -x^*\rangle \geq \frac{1}{L}\|\nabla f(x_k)\|^2. \end{aligned}

Therefore,

\begin{aligned} \|x_{k+1}-x^*\|^2 \leq \|x_k - x^*\|^2 + \frac{1}{L^2}\|\nabla f(x_k)\|^2 - \frac{2}{L^2}\|\nabla f(x_k)\|^2 = \|x_k - x^*\|^2 - \frac{1}{L^2}\|\nabla f(x_k)\|^2 \leq \|x_k - x^*\|^2. \qquad \Box \end{aligned}
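A quick numerical sanity check of the proposition, on an assumed strongly convex quadratic (so that the minimiser is unique and available in closed form):

```python
import numpy as np

# Assumed example: f(x) = 0.5 x^T A x + b^T x with A positive definite,
# so grad f(x) = A x + b is L-Lipschitz with L = lambda_max(A) and the
# unique minimiser is x* = -A^{-1} b.
A = np.diag([1.0, 4.0, 10.0])
b = np.array([1.0, -2.0, 3.0])
L = 10.0
grad = lambda x: A @ x + b
x_star = -np.linalg.solve(A, b)

x = np.array([5.0, 5.0, 5.0])
for _ in range(25):
    x_next = x - grad(x) / L              # constant step size 1/L
    # Fejér inequality with respect to the (singleton) set of minimisers
    assert np.linalg.norm(x_next - x_star) <= np.linalg.norm(x - x_star) + 1e-12
    x = x_next
```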

The most important property of Fejér sequences with respect to a set $C$ is that if a cluster point of the sequence belongs to $C$, then the whole sequence converges to that point. In our case $\{x_k\}$ is bounded, so it has cluster points; these are stationary points of $f$ (the descent inequality $f(x_k) - f(x_{k+1}) \geq \frac{1}{2L}\|\nabla f(x_k)\|^2$ forces $\nabla f(x_k) \to 0$) and, because $f$ is convex, they are minimisers. As a result, the whole sequence converges to a minimiser $x^*$.

The Fejér condition can be weakened. We may take $\{x_k\}$ to be a quasi-Fejér sequence, that is

\begin{aligned} \|x_{k+1}-x\|^2 \leq \|x_{k} - x\|^2 + \epsilon_k, \end{aligned}

where $\{\epsilon_k\}$ is a summable sequence of nonnegative scalars.

We may also generalise the above result to a variable step size $h_k$ satisfying the Armijo condition, instead of the constant step size $1/L$.
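As a sketch of what such a step-size rule looks like (the parameter values $\sigma$ and $\beta$ below are assumed for illustration, not taken from the post), one backtracks until a sufficient-decrease (Armijo) condition holds:

```python
import numpy as np

# Hedged sketch: backtracking line search.  Shrink h until the Armijo
# sufficient-decrease condition
#     f(x - h g) <= f(x) - sigma * h * ||g||^2,   g = grad f(x),
# holds, then take the gradient step with that h_k.  The parameters
# sigma in (0, 1) and shrink factor beta in (0, 1) are assumed values.
def armijo_step(f, grad_f, x, h0=1.0, sigma=1e-4, beta=0.5):
    g = grad_f(x)
    h = h0
    while f(x - h * g) > f(x) - sigma * h * (g @ g):
        h *= beta                         # shrink until sufficient decrease
    return x - h * g

# usage on a simple convex quadratic
A = np.diag([1.0, 3.0])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x

x = np.array([4.0, -3.0])
for _ in range(30):
    x = armijo_step(f, grad_f, x)
```

The backtracking loop terminates because, for an $L$-smooth $f$, the condition is guaranteed to hold once $h$ is small enough.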

• H.H. Bauschke and P.L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, 2011.
• A.N. Iusem, On the convergence properties of the projected gradient method for convex optimization, Computational and Applied Mathematics 22(1):37–52, 2003.
• L.M. Graña Drummond and B.F. Svaiter, A steepest descent method for vector optimization, Journal of Computational and Applied Mathematics 175(2):395–414, 2005.