Error bounds for second order approximation

Where here we prove an approximation bound for twice continuously differentiable functions f:\mathbb{R}^n \to \mathbb{R} with M-Lipschitzian Hessian, that is \|\nabla^2 f(x) - \nabla^2 f(y)\| \leq M \|x-y\| for all x,y. In particular, we show that for all x,y

\begin{aligned} |f(y) - f(x) - \langle \nabla f(x), y-x\rangle - \frac{1}{2}\langle \nabla f^2(x)(y-x), y-x\rangle| \leq \frac{M}{6}\|x-y\|^3. \end{aligned}

This is stated as Lemma 1.2.4 in: Y. Nesterov, Introductory Lectures on Convex Optimization – A basic course, Kluwer Ac. Publishers, 2004.

Since f is twice continuously differentiable, we may use the fact that

\begin{aligned} f(y) = f(x) + \langle \nabla f(x), y-x\rangle + \frac{1}{2}\nabla f^2(x+\alpha (y-x))(y-x), \end{aligned}\hfill (1)

for some \alpha\in [0,1].

We may write this equivalently as (and this is maybe more convenient)

\begin{aligned} f(y) = f(x) + \langle \nabla f(x), y-x\rangle + \int_0^1 \int_0^\tau \left\langle \nabla f^2(x+\alpha (y-x))(y-x), y-x\right\rangle \mathrm{d} \alpha \mathrm{d} \tau. \end{aligned}\hfill (2)

Since \nabla^2 f is M-Lipschitz,

\begin{aligned} &\|\nabla^2 f(x+\alpha(y-x)) - \nabla^2 f(x)\| \leq M\alpha \|x-y\|, \\ \Leftrightarrow& \nabla^2 f(x+\alpha(y-x)) = \nabla^2 f(x) + H, \end{aligned}\hfill (3)

where \|H\|\leq M\alpha \|x-y\|. We may now easily plug (3) into (2) and prove the original inequality. In particular, because of (1):

\begin{aligned} f(y) &= f(x) + \langle \nabla f(x), y-x\rangle + \int_0^1 \int_0^\tau \left\langle \nabla f^2(x)(y-x), y-x\right\rangle \mathrm{d} \alpha \mathrm{d} \tau\\ &+ \int_0^1 \int_0^\tau \left\langle H(y-x), y-x\right\rangle \mathrm{d} \alpha \mathrm{d} \tau, \end{aligned}\hfill (4)

where the first integral in (4) is

\begin{aligned} \int_0^1 \int_0^\tau \left\langle \nabla f^2(x)(y-x), y-x\right\rangle \mathrm{d} \alpha \mathrm{d} \tau = \tfrac{1}{2}\left\langle \nabla f^2(x)(y-x), y-x\right\rangle \end{aligned}

and the second integral’s absolute value is

\begin{aligned} \left|\int_0^1 \int_0^\tau \left\langle H(y-x), y-x\right\rangle \mathrm{d} \alpha \mathrm{d} \tau \right| &\leq \int_0^1 \int_0^\tau \|H\|\  \|y-x\|^2 \mathrm{d} \alpha \mathrm{d} \tau\\ &\leq \int_0^1 \int_0^\tau M \alpha \|y-x\|^3 \mathrm{d} \alpha \mathrm{d} \tau\\ &=\frac{M}{6}\|y-x\|^3. \end{aligned}

Advertisements

One response

  1. […] derived – for real-valued functions – when second order information is available. Read this post for details. In that case, the right-hand side involves a cubic term […]

    Like

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

mathbabe

Exploring and venting about quantitative issues

Look at the corners!

The math blog of Dmitry Ostrovsky

The Unapologetic Mathematician

Mathematics for the interested outsider

Almost Sure

A random mathematical blog

Mathematix

Mathematix is the mathematician of the village of Asterix

%d bloggers like this: