The gradient method with constant step size is the simplest method for solving unconstrained optimisation problems involving a continuously differentiable function with Lipschitz-continuous gradient. The motivation for this post came after reading this Wikipedia article, where it is stated that under certain assumptions the sequence of iterates converges to a local optimum, but no further discussion is provided.
The gradient method with constant step size is defined as

$$x_{k+1} = x_k - \frac{1}{L}\nabla f(x_k),$$

where $L$ is a Lipschitz constant of $\nabla f$. For this method the following facts are well known:
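As a minimal sketch of the method (my own illustration, not code from any particular library), here is the iteration applied to a convex quadratic, where a Lipschitz constant of the gradient is the largest eigenvalue of the Hessian:

```python
import numpy as np

def gradient_method(grad, x0, L, num_iters=1000):
    """Gradient method with constant step size 1/L.

    grad : callable returning the gradient of f
    L    : a Lipschitz constant of grad
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - (1.0 / L) * grad(x)
    return x

# Example: f(x) = 0.5 * x'Qx with Q symmetric positive definite;
# grad f(x) = Qx and L = largest eigenvalue of Q.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
L = np.linalg.eigvalsh(Q).max()
x_last = gradient_method(lambda x: Q @ x, x0=[5.0, -3.0], L=L)
# x_last is close to the unique minimiser, the origin.
```
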
- If $f$ is continuously differentiable with $L$-Lipschitz gradient (but not necessarily convex), then

$$\min_{0 \le i \le k} \|\nabla f(x_i)\|^2 \le \frac{2L\,(f(x_0) - f^\star)}{k+1},$$

where $f^\star$ is the infimum of $f$ (the function may not even have a minimiser).
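A quick example of the nonconvex-style guarantee without a minimiser (my own choice of function): $f(x) = \log(1 + e^{-x})$ has a $\tfrac14$-Lipschitz gradient and infimum $0$, but no minimiser; the gradient norm still vanishes along the iterates, which drift to $+\infty$:

```python
import numpy as np

# f(x) = log(1 + exp(-x)): continuously differentiable, its gradient is
# 1/4-Lipschitz, inf f = 0, yet f attains no minimum (f(x) -> 0 as x -> inf).
f = lambda x: np.log1p(np.exp(-x))
grad = lambda x: -1.0 / (1.0 + np.exp(x))
L = 0.25

x = 0.0
for k in range(200):
    x = x - (1.0 / L) * grad(x)

# The gradient norm becomes small even though the iterates diverge to +inf.
```
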
- If $f$ is continuously differentiable with $L$-Lipschitz gradient, convex, and its set of minimisers $X^\star$ is nonempty, then

$$f(x_k) - f^\star \le \frac{L\,\|x_0 - x^\star\|^2}{2k}$$

for every $x^\star \in X^\star$.
In the first case we should underline that $\|\nabla f(x_k)\| \to 0$, which does not mean that $(x_k)$ converges. In the second case, where $f$ is convex, we similarly should not deduce that $(x_k)$ converges to a point, and neither should we draw the conclusion that $(x_k)$ converges towards the set $X^\star$ in the sense $\operatorname{dist}(x_k, X^\star) \to 0$ (where $\operatorname{dist}$ is the distance-to-set function).
Nevertheless, we may show that the sequence $(x_k)$ is Fejér monotone with respect to $X^\star$, which will allow us to prove that, indeed, $x_k \to x^\star$ for some $x^\star \in X^\star$ as $k \to \infty$.
Definition (Fejér sequence). We say that a sequence $(x_k)$ is Fejér monotone with respect to a nonempty set $S$ if

$$\|x_{k+1} - y\| \le \|x_k - y\|$$

for all $y \in S$ and $k \in \mathbb{N}$.
Fejér sequences possess certain favourable properties: they are bounded, $\|x_k - y\|$ converges for every $y \in S$, and $\operatorname{dist}(x_k, S)$ is decreasing and converges.
Proposition (Fejér monotonicity). The sequence $(x_k)$ produced by the gradient method with constant step size $1/L$, when $f$ is convex, continuously differentiable and with $L$-Lipschitz gradient, is Fejér monotone with respect to the set $X^\star$ of minimisers of $f$ (assumed nonempty).
Proof. Take any minimiser $x^\star \in X^\star$. Then

$$\|x_{k+1} - x^\star\|^2 = \|x_k - x^\star\|^2 - \frac{2}{L}\langle \nabla f(x_k), x_k - x^\star \rangle + \frac{1}{L^2}\|\nabla f(x_k)\|^2.$$

Since $f$ is convex and $\nabla f$ is $L$-Lipschitz, $\nabla f$ is $\tfrac{1}{L}$-cocoercive (the Baillon–Haddad theorem), and since $\nabla f(x^\star) = 0$,

$$\langle \nabla f(x_k), x_k - x^\star \rangle \ge \frac{1}{L}\|\nabla f(x_k)\|^2,$$

so that

$$\|x_{k+1} - x^\star\|^2 \le \|x_k - x^\star\|^2 - \frac{1}{L^2}\|\nabla f(x_k)\|^2 \le \|x_k - x^\star\|^2. \qquad\blacksquare$$
The most important property of Fejér sequences with respect to a set $S$ is that if a cluster point of the sequence lies in $S$, then the whole sequence converges to that point. In our case we know that $(x_k)$ is bounded, so it has cluster points; these are stationary points of $f$ (since $\|\nabla f(x_k)\| \to 0$ and $\nabla f$ is continuous) and, because $f$ is convex, they are minimisers of our function. As a result, the whole sequence converges to a minimiser $x^\star \in X^\star$.
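The Fejér property of the proposition is easy to check numerically. A small sketch (the quadratic and starting point are my own choices): the distance from the iterates to the minimiser never increases.

```python
import numpy as np

# Convex quadratic f(x) = 0.5 x'Qx - b'x with minimiser x* = Q^{-1} b;
# grad f(x) = Qx - b is L-Lipschitz with L = largest eigenvalue of Q.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
L = np.linalg.eigvalsh(Q).max()
x_star = np.linalg.solve(Q, b)

x = np.array([10.0, 10.0])
dists = [np.linalg.norm(x - x_star)]
for _ in range(50):
    x = x - (1.0 / L) * (Q @ x - b)
    dists.append(np.linalg.norm(x - x_star))

# Fejér monotonicity: the distance to the minimiser is nonincreasing.
is_fejer = all(d1 <= d0 + 1e-12 for d0, d1 in zip(dists, dists[1:]))
```
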
The Fejér condition can be weakened. We may take $(x_k)$ to be a quasi-Fejér sequence, that is,

$$\|x_{k+1} - y\|^2 \le \|x_k - y\|^2 + \epsilon_k \quad \text{for all } y \in S,$$

where $(\epsilon_k)$ is a summable sequence of nonnegative scalars.
We may also generalise the above result to a variable step size which satisfies the Armijo condition, instead of a constant one.
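A sketch of such a variable-step variant with backtracking (the parameter names and values are illustrative choices of mine, not prescribed by the result): the trial step is shrunk until the Armijo sufficient-decrease condition holds, so no Lipschitz constant is needed.

```python
import numpy as np

def gradient_method_armijo(f, grad, x0, s=1.0, beta=0.5, sigma=1e-4,
                           num_iters=200):
    """Gradient method with backtracking steps satisfying the Armijo
    condition  f(x - t g) <= f(x) - sigma * t * ||g||^2.

    s     : initial trial step
    beta  : shrink factor in (0, 1)
    sigma : sufficient-decrease constant in (0, 1)
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        g = grad(x)
        t = s
        while f(x - t * g) > f(x) - sigma * t * np.dot(g, g):
            t *= beta  # shrink until sufficient decrease holds
        x = x - t * g
    return x

# Same kind of convex quadratic as before, minimised at the origin.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
f = lambda x: 0.5 * x @ Q @ x
grad = lambda x: Q @ x
x_final = gradient_method_armijo(f, grad, x0=[5.0, -3.0])
```
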