Where here we wonder why there are no generic termination criteria for the gradient method which guarantee a desired sub-optimality when is strictly (not strongly) convex and has -Lipschitz gradient. Does it make sense to terminate when is small? And when should we terminate?
Here we discuss the criteria under which the simple gradient method should be terminated for unconstrained minimisation problems
where is a strictly convex (but not strongly convex) function with -Lipschitz gradient.
The criterion if fine for strongly convex functions with -Lipschitz gradient. Indeed, if is -strongly convex, that is
then, for such that (the unique minimiser of ), we have
so, if , then , i.e., is -suboptimal.
But termination is a mysterious thing… In general, if is merely strictly convex, it is not true that we will have if , for some (not even locally). The reason for that if that the condition that has Lipschitz gradient, or the condition that is Lipschitz continuous provides only an upper bound on the steepness of the function, but allows it to be arbitrarily flat – especially close to the optimum.
There might be specific cases where a favourable bound holds, notwithstanding. Unless you draw some additional assumptions on , this will not be a reliable termination criterion.
However, strong convexity is often too strong a requirement in practice. Weaker conditions are discussed in the article: D. Drusvyatskiy and A.S. Lewis, Error bounds, quadratic growth, and linear convergence of proximal methods, 2016.
Let be convex with -Lipschithz gradient and define . Let us assume that has a unique minimiser (e.g., is strictly convex). Then assume that has the property
for all for some . Functions which satisfy this property are not necessarily strongly convex. We say in that case that is a strong minimum. As an example we have which has a strong minimum, but is not strongly convex. Of course if is strongly convex the above holds as well as if is given in the form where is a strongly convex function and is any matrix.
Then, condition the above condition is shown to be equivalent to
for all and with .
Clearly in this case we may use the termination condition which will imply that .
There are, however, cases where after all the criterion implies some sort of bound on . The function , which is not strongly convex, is such an example. We may show that which provides an error bound, although is not strongly convex. In general, if is not strongly convex and it doesn’t have a strong minimum, we may still derive error bounds based on other properties of the function at hand, but for the time being the above results are the most generic that are known.