Do no generic termination criteria exist for steepest descent?

Where here we wonder why there are no generic termination criteria for the gradient method which guarantee a desired sub-optimality when $f$ is strictly (not strongly) convex and has $L$-Lipschitz gradient. Does it make sense to terminate when $\|\nabla f(x_k)\|$ is small? And when should we terminate?

Here we discuss the criteria under which the simple gradient method $x_{k+1} = x_{k} - \frac{1}{L}\nabla f(x_k)$ should be terminated for unconstrained minimisation problems

\begin{aligned}\mathrm{minimise}_x\ f(x)\end{aligned},

where $f$ is a strictly convex (but not strongly convex) function with $L$-Lipschitz gradient.

The criterion $\|\nabla f(x_k) \| < \epsilon$ if fine for strongly convex functions with $L$-Lipschitz gradient. Indeed, if $f$ is $\mu$-strongly convex, that is

\begin{aligned} f(y) \geq f(x) + \nabla f(x)^\top (y-x) + \tfrac{\mu}{2} \|y-x\|^2 \end{aligned},

then, for $x^*$ such that $\nabla f(x^*)=0$ (the unique minimiser of $f$), we have

\begin{aligned} f(x) - f(x^*)\leq \tfrac{1}{2\mu}\|\nabla f(x) \|^2, \end{aligned}

so, if $\|\nabla f(x) \|^2 < 2\mu\epsilon$, then $f(x) - f(x^*) < \epsilon$, i.e., $x$ is $\epsilon$-suboptimal.

But termination is a mysterious thing… In general, if $f$ is merely strictly convex, it is not true that we will have $\|x-x^*\|<\epsilon$ if $\| \nabla f(x) \| < \kappa \epsilon$, for some $\kappa > 0$ (not even locally). The reason for that if that the condition that $f$ has Lipschitz gradient, or the condition that $f$ is Lipschitz continuous provides only an upper bound on the steepness of the function, but allows it to be arbitrarily flat – especially close to the optimum.

There might be specific cases where a favourable bound holds, notwithstanding. Unless you draw some additional assumptions on $f$, this will not be a reliable termination criterion.

However, strong convexity is often too strong a requirement in practice. Weaker conditions are discussed in the article:  D. Drusvyatskiy and A.S. Lewis, Error bounds, quadratic growth, and linear convergence of proximal methods, 2016.

Let $f$ be convex with $L$-Lipschithz gradient and define $\mathcal{B}_\nu^f = \{x: f(x) - f^* < \nu\}$. Let us assume that $f$ has a unique minimiser $x^*$ (e.g., $f$ is strictly convex). Then assume that $f$ has the property

\begin{aligned} f(x) - f(x^*) \geq \tfrac{\alpha}{2} \|x-x^*\|^2, \end{aligned}

for all $x\in\mathcal{B}_\nu^f$ for some $\nu>0$. Functions which satisfy this property are not necessarily strongly convex. We say in that case that $x^*$ is a strong minimum. As an example we have $f = (\max\{|x|-1,0\})^2$ which has a strong minimum, but is not strongly convex. Of course if $f$ is strongly convex the above holds as well as if $f$ is given in the form $f(x) = h(Ax)$ where $h$ is a strongly convex function and $A$ is any matrix.

Then, condition the above condition is shown to be equivalent to

\begin{aligned} \|x-x^*\| \leq \frac{2}{\alpha} \|\nabla f(x) \|,\end{aligned}

for all $x\in\mathcal{B}_{\nu}^f$ and with $\alpha < 1/L$.

Clearly in this case we may use the termination condition $\| \nabla f(x) \| < \epsilon\alpha/2$ which will imply that $\|x-x^*\| < \epsilon$.

There are, however, cases where after all the criterion $\| \nabla f(x) \| < \epsilon$ implies some sort of bound on $\|x-x^*\|$. The function $f(x) = x^4$, which is not strongly convex, is such an example. We may show that $|f'(x)-f'(x^*)|^{\tfrac{1}{3}}=|f'(x)|^{\tfrac{1}{3}} = 2^{\tfrac{2}{3}}|x-x^*|$ which provides an error bound, although $f$ is not strongly convex. In general, if $f$ is not strongly convex and it doesn’t have a strong minimum, we may still derive error bounds based on other properties of the function at hand, but for the time being the above results are the most generic that are known.

One response

1. remarkable, notwithstanding.

Like

mathbabe

Exploring and venting about quantitative issues

Look at the corners!

The math blog of Dmitry Ostrovsky

The Unapologetic Mathematician

Mathematics for the interested outsider

Almost Sure

A random mathematical blog

Mathematix

Mathematix is the mathematician of the village of Asterix