The gradient method with constant step size is the simplest method for solving unconstrained optimisation problems involving a continuously differentiable function with Lipschitz-continuous gradient. The motivation for this post came after reading this Wikipedia article, where it is stated that under certain assumptions the sequence converges to a local optimum, but no further discussion is provided.

The gradient method with constant step size is defined as

$$x_{k+1} = x_k - \tfrac{1}{L}\nabla f(x_k),$$

where $L$ is a Lipschitz constant of $\nabla f$. For this method the following facts are well known:
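As a concrete illustration, here is a minimal sketch of the method in Python; the quadratic objective and all names below are illustrative assumptions, not part of the original post.

```python
import numpy as np

def gradient_method(grad_f, x0, L, num_iters=100):
    """Gradient method with constant step size 1/L,
    where L is a Lipschitz constant of grad_f."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - grad_f(x) / L  # x_{k+1} = x_k - (1/L) grad f(x_k)
    return x

# Example: f(x) = 0.5 * x' Q x, whose gradient Q x is L-Lipschitz
# with L equal to the largest eigenvalue of Q.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
L = np.linalg.eigvalsh(Q).max()
x_final = gradient_method(lambda x: Q @ x, x0=[5.0, -4.0], L=L, num_iters=200)
```

For this strongly convex quadratic the iterates contract towards the unique minimiser $x^\star = 0$ at every step.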

- If $f$ is continuously differentiable with $L$-Lipschitz gradient (but not necessarily convex), then
$$\min_{0\leq i\leq k}\|\nabla f(x_i)\|^2 \leq \frac{2L\,(f(x_0)-f^\star)}{k+1},$$
where $f^\star$ is the *infimum* of $f$ (the function may not even have a minimiser).
- If $f$ is continuously differentiable with $L$-Lipschitz gradient, it is convex and its set of minimisers $X^\star$ is nonempty, then
$$f(x_k)-f^\star \leq \frac{L\,\|x_0-x^\star\|^2}{2k}$$
for every $x^\star \in X^\star$.

In the first case we should underline that $\nabla f(x_k) \to 0$, which does not mean that $(x_k)$ converges. In the second case, where $f$ is convex, we similarly should not deduce that $(x_k)$ converges to a point, and neither should we draw the conclusion that $(x_k)$ converges towards the set $X^\star$ in the sense $\mathrm{dist}(x_k, X^\star) \to 0$ (where $\mathrm{dist}$ is the distance-to-set function).
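To see that a vanishing gradient alone does not force the iterates to converge, consider the convex function $f(x)=\sqrt{1+x^2}-x$ (my own example, not from the original post): its infimum is $0$, it has no minimiser, and its gradient is $1$-Lipschitz, so the method applies with step size $1$. A quick numerical check shows the iterates escaping to $+\infty$ while the gradient vanishes:

```python
import numpy as np

# f(x) = sqrt(1 + x^2) - x: convex, inf f = 0, no minimiser,
# f'(x) = x / sqrt(1 + x^2) - 1, and |f''(x)| <= 1, so L = 1.
grad = lambda x: x / np.sqrt(1.0 + x**2) - 1.0

x = 0.0
for _ in range(1000):
    x = x - grad(x)  # constant step size 1/L = 1

# The gradient vanishes, yet the iterates grow without bound.
print(x, grad(x))
```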

Nevertheless, we may show that the sequence $(x_k)$ is Fejér monotone with respect to $X^\star$, which will allow us to prove that, indeed, $x_k \to x^\star$ for some $x^\star \in X^\star$ as $k \to \infty$.

**Definition (Fejér sequence).** We say that a sequence $(x_k)$ is Fejér monotone with respect to a nonempty set $C$ if

$$\|x_{k+1}-z\| \leq \|x_k-z\|$$

for all $z \in C$ and $k \in \mathbb{N}$.

Fejér sequences possess certain favourable properties: they are bounded, $\|x_k - z\|$ converges for every $z \in C$, and $\mathrm{dist}(x_k, C)$ is decreasing and converges.

**Proposition (Fejér monotonicity).** The sequence $(x_k)$ produced by the gradient method with constant step size $1/L$, when $f$ is convex, continuously differentiable and with $L$-Lipschitz gradient, is Fejér monotone with respect to the set $X^\star$ of minimisers of $f$ (assumed nonempty).

*Proof.* Take any minimiser $x^\star \in X^\star$. Then

$$\|x_{k+1}-x^\star\|^2 = \left\|x_k - \tfrac{1}{L}\nabla f(x_k) - x^\star\right\|^2 = \|x_k-x^\star\|^2 - \tfrac{2}{L}\langle \nabla f(x_k), x_k-x^\star\rangle + \tfrac{1}{L^2}\|\nabla f(x_k)\|^2.$$

Since $f$ is convex,

$$\langle \nabla f(x_k), x_k-x^\star\rangle \geq f(x_k) - f(x^\star),$$

and since $\nabla f$ is $L$-Lipschitz, the descent lemma applied to the gradient step gives $f(x^\star) \leq f(x_{k+1}) \leq f(x_k) - \tfrac{1}{2L}\|\nabla f(x_k)\|^2$, that is,

$$f(x_k) - f(x^\star) \geq \tfrac{1}{2L}\|\nabla f(x_k)\|^2,$$

therefore,

$$\langle \nabla f(x_k), x_k-x^\star\rangle \geq \tfrac{1}{2L}\|\nabla f(x_k)\|^2.$$

So,

$$\|x_{k+1}-x^\star\|^2 \leq \|x_k-x^\star\|^2 - \tfrac{2}{L}\cdot\tfrac{1}{2L}\|\nabla f(x_k)\|^2 + \tfrac{1}{L^2}\|\nabla f(x_k)\|^2 = \|x_k-x^\star\|^2. \qquad\blacksquare$$

The most important property of Fejér sequences with respect to a set $C$ is that if a cluster point of the sequence belongs to $C$, then the whole sequence *converges* to that point. In our case we know that $(x_k)$ is bounded, so it has cluster points; these are stationary points of $f$ (since $\nabla f(x_k) \to 0$ and $\nabla f$ is continuous) and, because $f$ is convex, they are minimisers. As a result, the whole sequence converges to a minimiser $x^\star \in X^\star$.
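The Fejér property is easy to observe numerically. The following sketch (with an assumed quadratic objective, not taken from the original post) checks that the distance to the minimiser never increases along the iterates:

```python
import numpy as np

# f(x) = 0.5 * x' Q x with Q positive definite: unique minimiser x* = 0.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
L = np.linalg.eigvalsh(Q).max()  # Lipschitz constant of the gradient

x = np.array([5.0, -4.0])
dists = [np.linalg.norm(x)]  # distances ||x_k - x*|| with x* = 0
for _ in range(50):
    x = x - (Q @ x) / L  # gradient step with constant step size 1/L
    dists.append(np.linalg.norm(x))

# Fejér monotonicity w.r.t. {x*}: the distance never increases.
fejer = all(d1 <= d0 + 1e-12 for d0, d1 in zip(dists, dists[1:]))
```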

The Fejér condition can be weakened. We may take $(x_k)$ to be a quasi-Fejér sequence, that is,

$$\|x_{k+1}-z\|^2 \leq \|x_k-z\|^2 + \varepsilon_k,$$

where $(\varepsilon_k)$ is a summable sequence of nonnegative scalars.

We may also generalise the above result to the case where, instead of a constant step size, we use a variable one satisfying the Armijo condition.
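A minimal backtracking sketch of such a step follows; the parameter names `sigma` and `beta` (and their typical values) are my own choices, not from the original post.

```python
import numpy as np

def armijo_gradient_step(f, grad_f, x, sigma=1e-4, beta=0.5, t0=1.0):
    """One gradient step with backtracking: shrink t until the Armijo
    sufficient-decrease condition f(x - t g) <= f(x) - sigma t ||g||^2 holds."""
    g = grad_f(x)
    t = t0
    while f(x - t * g) > f(x) - sigma * t * np.dot(g, g):
        t *= beta
    return x - t * g

# Example on a quadratic: each Armijo step decreases f.
Q = np.array([[3.0, 1.0], [1.0, 2.0]])
f = lambda x: 0.5 * x @ Q @ x
grad_f = lambda x: Q @ x
x0 = np.array([5.0, -4.0])
x1 = armijo_gradient_step(f, grad_f, x0)
```

Since the negative gradient is a descent direction, the backtracking loop is guaranteed to terminate after finitely many halvings.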

Further reading:

- H.H. Bauschke and P.L. Combettes, *Convex Analysis and Monotone Operator Theory in Hilbert Spaces*, Springer, 2011.
- A.N. Iusem, *On the convergence properties of the projected gradient method for convex optimization*, Computational and Applied Mathematics 22(1):37–52, 2003.
- L.M. Graña Drummond and B.F. Svaiter, *A steepest descent method for vector optimization*, Journal of Computational and Applied Mathematics 175(2):395–414, 2004.