# Metric subregularity for monotone inclusions

Metric subregularity is a local property of set-valued operators which turns out to be a key enabler for linear convergence in several operator-based algorithms. However, such conditions are often stated in terms of the fixed-point residual operator and are rather difficult to verify in practice. In this post we state sufficient metric subregularity conditions for a monotone inclusion which are easier to verify, focusing on the preconditioned proximal point method (P3M).

## Preliminaries

Let us first provide some necessary mathematical preliminaries and definitions.

Let us denote by $\bar{\mathbb{R}}$ the set of extended real numbers, that is, $\mathbb{R}\cup\{+\infty\}$. Functions which map into $\bar{\mathbb{R}}$ are called extended-real-valued and are often used in optimisation to encode constraints.

Let $\mathcal{H}$ be a real Hilbert space. For a function $f:\mathcal{H}\to\bar{\mathbb{R}}$, its domain is the set $\mathrm{dom} f=\{x\in\mathcal{H}: f(x) < \infty\}$.

We call $f$ proper if it is not everywhere equal to infinity, that is $\mathrm{dom} f\neq \varnothing$.

We say that $f$ is closed (or lower semicontinuous) if $f(x)\leq \liminf_{y\to x}f(y)$ for every $x\in\mathcal{H}$. This is equivalent to requiring that the epigraph of $f$, that is the set $\mathrm{epi} f = \{(x,\alpha)\in\mathcal{H}\times \mathbb{R}: f(x) \leq \alpha\}$, is closed.

The convex conjugate of a proper convex function $f:\mathcal{H}\to\bar{\mathbb{R}}$, is the function $f^*(y) = \sup_{x\in \mathcal{H}}\langle x,y \rangle-f(x)$. The convex conjugate plays a central role in Fenchel duality theory.

The proximal operator of a proper, closed, convex function $f$ with parameter $\gamma>0$ is the mapping $\mathrm{prox}_{\gamma f}(v) = \mathrm{argmin}_z f(z) + \frac{1}{2\gamma}\|v-z\|^2$. The proximal operator of $f$ is also the resolvent of the operator $\partial f$. In general, for an operator $F$, its resolvent is the operator $J_F = (I+F)^{-1}$, where $I$ is the identity operator.
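As a concrete illustration (my own toy example, not from the post), the proximal operator of $f(x)=|x|$ has the closed form of soft-thresholding; the sketch below cross-checks it against the defining minimisation on a fine grid:

```python
import numpy as np

def prox_abs(v, gamma):
    # Soft-thresholding: closed form of prox_{gamma f} for f(x) = |x|,
    # i.e. argmin_z |z| + (1/(2*gamma)) * (v - z)**2.
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

# Brute-force check against the defining minimisation on a fine grid.
gamma, v = 0.5, 1.3
z = np.linspace(-3.0, 3.0, 200001)
z_star = z[np.argmin(np.abs(z) + (v - z) ** 2 / (2 * gamma))]
assert abs(z_star - prox_abs(v, gamma)) < 1e-4  # closed form matches the grid minimiser
```

The same identification works in general: the resolvent of $\partial f$ is exactly $\mathrm{prox}_{\gamma f}$.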

The subdifferential of a proper, convex function $f$ at $x$ is $\partial f(x) = \{g\in\mathcal{H}\mid f(z)\geq f(x) + \langle g, z-x \rangle, \forall z\in\mathcal{H}\}$.

For two Hilbert spaces $(\mathcal{H}, \langle\cdot,\cdot\rangle)$ and $(\mathcal{H}^*, \langle\cdot,\cdot \rangle_*)$ we denote by $\mathcal{H}\oplus \mathcal{H}^*$ the direct sum of the two spaces equipped with the inner product $\langle (x,x_*), (y,y_*)\rangle=\langle x,y \rangle + \langle x_*,y_* \rangle_*$.

We define three important classes of operators $T:\mathcal{H}\to\mathcal{H}$ on Hilbert spaces. We call $T$ nonexpansive if $\|Tx-Ty\|\leq \|x-y\|$ for all $x,y\in\mathcal{H}$. We call $T$ $\alpha$-averaged, for $\alpha\in(0,1)$, if it can be written as $T=(1-\alpha)I+\alpha R$, where $I$ is the identity operator and $R$ is nonexpansive. We say that $T$ is firmly nonexpansive if it is $\tfrac{1}{2}$-averaged.

We say that a set-valued operator $F:\mathcal{H}\rightrightarrows \mathcal{H}$ is a monotone operator if for every $x,y\in\mathcal{H}$ and every $u\in F(x)$ and $v \in F(y)$, we have $\langle x-y, u-v\rangle \geq 0$. For any proper function $f$, $\partial f$ is monotone.

We say that $F$ is maximally monotone if it is monotone and its graph cannot be properly extended to the graph of another monotone operator; in other words, if $x,u\in\mathcal{H}$ satisfy $\langle x-y, u-v\rangle \geq 0$ for all $y\in\mathcal{H}$ and $v\in F(y)$, then $u\in F(x)$. As an example, skew-symmetric bounded linear operators are maximally monotone. Moreover, the subdifferential of a proper, closed, convex function is maximally monotone.

Last, for a convex set $C\subseteq \mathcal{H}$, we define its core to be the set $\mathrm{core} C = \{x\in C: \mathrm{cone}(C-x)=\mathcal{H}\}$, where $\mathrm{cone}$ is the conic hull. The strong relative interior of a set $C$ is the set $\mathrm{sri}(C)=\{x\in C: \mathrm{cone}(C-x)=\overline{\mathrm{span}}(C-x)\}$ (here span stands for the set of linear combinations and the overline denotes the topological closure). The strong relative interior is a recurring notion in infinite dimensional spaces and it appears in results related to strong duality (such as the Attouch-Brézis Theorem).

For an operator $T:\mathcal{H}\to\mathcal{H}$, the set of its fixed points is defined as $\mathrm{fix} T=\{z\in\mathcal{H}: Tz = z\}$.

The inverse of an operator $F:\mathcal{H}\rightrightarrows \mathcal{H}$ is defined as $F^{-1}y = \{x\in \mathcal{H}: y\in Fx\}$. The set of zeros of $F$ is $\mathrm{zer} F = F^{-1}0=\{x\in\mathcal{H}: 0 \in Fx\}$.

## Preconditioned Proximal Point Method

Optimization problems can be stated as monotone inclusions:

\begin{aligned} F(z)\ni 0 \end{aligned},

where $F$ is a maximally monotone operator. The classical proximal point method (P2M) is

\begin{aligned} z_{k+1} = J_{\gamma F}(z_k) \end{aligned},

where $J_{\gamma F}=(I+\gamma F)^{-1}$ is the resolvent of $F$ with parameter $\gamma>0$. More often than not, however, $J_{\gamma F}$ cannot be evaluated as it requires the solution of another optimisation problem.

The preconditioned proximal point method (P3M) consists in replacing $F$ in P2M with $P^{-1}F$, where $P$ is a bounded, invertible linear operator on $\mathcal{H}$ (typically self-adjoint and positive definite, so that $P^{-1}F$ is maximally monotone with respect to the inner product induced by $P$). The modified algorithm reads

\begin{aligned} z_{k+1} &= J_{\gamma P^{-1}F}(z_k)\\ &= (I+\gamma P^{-1}F)^{-1}(z_k)\\ &= (P+\gamma F)^{-1}Pz_k. \end{aligned}

An appropriate choice of $P$ may lead to steps which are easier to evaluate as we shall see in the following section.
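For a single-valued linear operator $F(z)=Az$, the step $(P+\gamma A)^{-1}P$ can be computed directly and checked against $(I+\gamma P^{-1}A)^{-1}$; a minimal numerical sketch with randomly generated toy data (my construction):

```python
import numpy as np

rng = np.random.default_rng(0)
n, gamma = 5, 0.7

# A monotone linear operator: positive semidefinite symmetric part plus a skew part.
M = rng.standard_normal((n, n))
A = M @ M.T + (M - M.T)

# A symmetric, positive definite preconditioner P.
B = rng.standard_normal((n, n))
P = B @ B.T + n * np.eye(n)

z = rng.standard_normal(n)

# One P3M step written two equivalent ways.
step1 = np.linalg.solve(np.eye(n) + gamma * np.linalg.solve(P, A), z)  # (I + gamma P^{-1}A)^{-1} z
step2 = np.linalg.solve(P + gamma * A, P @ z)                          # (P + gamma A)^{-1} P z
assert np.allclose(step1, step2)
```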

## Primal-Dual Optimality Conditions

Consider the following optimisation problem

\begin{aligned} \mathrm{minimise}_{x\in\mathcal{H}}\ f(x) + g(Lx), \end{aligned}

where $f:\mathcal{H}\to\bar{\mathbb{R}}$ and $g:\mathcal{H}^*\to\bar{\mathbb{R}}$ are proper, convex, closed functions and $L:\mathcal{H}\to\mathcal{H}^*$ is a bounded linear operator whose adjoint is denoted by $L^*$.

We assume that $\mathrm{dom}\,(f+g\circ L)\neq \varnothing$, that is, the above optimisation problem has a nonempty feasible domain.

The optimality conditions of this problem are

\begin{aligned} 0 \in \partial (f+ g \circ L)(x), \end{aligned}

and provided that $0 \in \mathrm{sri}(\mathrm{dom} g - L(\mathrm{dom}f))$, the optimality conditions become

\begin{aligned} 0 \in \partial f(x)+ L^* \partial g(Lx). \end{aligned}

Equivalently, the optimality conditions are satisfied if and only if there exist an $x\in\mathcal{H}$ and a $u\in\mathcal{H}^*$ such that

\begin{aligned} 0 &\in \partial f(x) + L^* u,\\ 0 &\in -Lx + (\partial g)^{-1}(u). \end{aligned}

Using the well-known property that $(\partial g)^{-1} = \partial g^*$ (which easily follows from the definitions of the subdifferential and the convex conjugate), the optimality conditions simply become

\begin{aligned} 0 &\in \partial f(x) + L^* u,\\ 0 &\in -Lx + \partial g^*(u). \end{aligned}
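In computations, the proximal operator of $g^*$ is obtained from that of $g$ through the Moreau decomposition $\mathrm{prox}_{\gamma g^*}(v) = v - \gamma\,\mathrm{prox}_{\gamma^{-1} g}(v/\gamma)$; a quick sketch (my own example) using $g=|\cdot|$, whose conjugate is the indicator of $[-1,1]$, so $\mathrm{prox}_{\gamma g^*}$ is the projection onto $[-1,1]$:

```python
import numpy as np

def prox_abs(v, gamma):
    # prox of g(x) = |x| with parameter gamma (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - gamma, 0.0)

def prox_conj_abs(v, gamma):
    # prox of g* via the Moreau decomposition:
    # prox_{gamma g*}(v) = v - gamma * prox_{(1/gamma) g}(v / gamma).
    return v - gamma * prox_abs(v / gamma, 1.0 / gamma)

# For g = |.|, g* is the indicator of [-1, 1], whose prox (for any gamma)
# is the projection onto [-1, 1].
for v in [-2.5, -0.3, 0.0, 0.7, 4.0]:
    for gamma in [0.1, 1.0, 10.0]:
        assert abs(prox_conj_abs(v, gamma) - np.clip(v, -1.0, 1.0)) < 1e-9
```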

Naturally, we define the operator $F:\mathcal{H}\oplus \mathcal{H}^*\rightrightarrows \mathcal{H}\oplus \mathcal{H}^*$ as

\begin{aligned} F(x,u) = \begin{bmatrix}\partial f(x)\\ \partial g^*(u)\end{bmatrix} + \begin{bmatrix}0 & L^*\\-L & 0\end{bmatrix}\begin{bmatrix}x\\u\end{bmatrix}. \end{aligned}

We are now looking for a primal-dual point $z = (x,u)$ which satisfies the monotone inclusion $F(z)\ni 0$.

## The Chambolle-Pock Method

The Chambolle-Pock method is an instance of the P3M method for solving $F(z)\ni 0$ using the following linear operator as a preconditioner

\begin{aligned} P = \begin{bmatrix}I & -\alpha_1 L^*\\ -\alpha_1 L & \frac{\alpha_1}{\alpha_2}I\end{bmatrix}. \end{aligned}

Using its Schur complement, it can be verified that $P$ is positive definite provided that $\alpha_1 \alpha_2 \|L\|^2 < 1$ (where $\|L\|$ is the operator norm of $L$).
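This threshold can be probed numerically; a sketch with a random toy $L$ (hypothetical data), assembling $P$ and testing its smallest eigenvalue on either side of $\alpha_1\alpha_2\|L\|^2 = 1$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 4, 3
L = rng.standard_normal((m, n))   # toy linear operator
norm_L = np.linalg.norm(L, 2)     # operator norm = largest singular value

def precond(alpha1, alpha2):
    # Assemble P = [[I, -alpha1*L^T], [-alpha1*L, (alpha1/alpha2)*I]].
    return np.block([
        [np.eye(n), -alpha1 * L.T],
        [-alpha1 * L, (alpha1 / alpha2) * np.eye(m)],
    ])

# With alpha1 = alpha2 = t/||L||, we get alpha1*alpha2*||L||^2 = t^2,
# so P should be positive definite exactly when t < 1.
for t, expect_pd in [(0.9, True), (1.1, False)]:
    a = t / norm_L
    min_eig = np.linalg.eigvalsh(precond(a, a)).min()
    assert (min_eig > 0) == expect_pd
```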

This linear operator is used to define a modified inner product on $\mathcal{H}\oplus \mathcal{H}^*$ as $\langle p,q\rangle_P = \langle p, Pq \rangle$, which in turn induces the norm $\|p\|_P^2=\langle p,p \rangle_P = \langle p, Pp\rangle$. In what follows, $\mathcal{H}\oplus \mathcal{H}^*$ will be taken with respect to this inner product.

Using this preconditioner, P3M boils down to the following recursion:

\begin{aligned} x_{k+1} &= \mathrm{prox}_{\alpha_1 f}(x_k - \alpha_1 L^* u_k)\\ u_{k+1} &= \mathrm{prox}_{\alpha_2 g^*}(u_k + \alpha_2 L(2x_{k+1}-x_k)). \end{aligned}

This is easy to verify starting from the basic P3M iteration described above, which is $z_{k+1}= (P + \alpha_1 F)^{-1}Pz_k$ where $z_{k}=(x_{k}, u_{k})$.

This defines the following operator

\begin{aligned} (x_{k+1}, u_{k+1}) = T(x_k, u_k), \end{aligned}

which we shall refer to as the Chambolle-Pock operator. The fixed points of $T$, that is, points $(x,u)$ such that $(x,u) = T(x,u) = (P + \alpha_1 F)^{-1}P(x,u)$, are exactly the solutions of the monotone inclusion $F(x,u)\ni 0$.

Operator $T$ is firmly nonexpansive in $\mathcal{H}\oplus \mathcal{H}^*$ with the modified inner product introduced above.

We further define the fixed-point residual operator $R: \mathcal{H}\oplus \mathcal{H}^* \to \mathcal{H}\oplus \mathcal{H}^*$ as

\begin{aligned} R = I - T. \end{aligned}
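The recursion above can be exercised on a tiny instance (my own choice, not from the post): take $f=|\cdot|$, $g=\tfrac12(\cdot-b)^2$ with $b=3$, and $L=1$, so that the minimiser of $|x|+\tfrac12(x-3)^2$ is $x^\star=2$ with dual solution $u^\star = Lx^\star - b = -1$:

```python
import numpy as np

# Toy instance: minimise |x| + 0.5*(x - 3)^2, i.e. f = |.|, g = 0.5*(. - b)^2, L = 1.
b, L_op = 3.0, 1.0
alpha1 = alpha2 = 0.95            # alpha1 * alpha2 * ||L||^2 = 0.9025 < 1

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)  # prox of t*|.|
prox_gconj = lambda v, t: (v - t * b) / (1.0 + t)                # prox of t*g*, g*(u) = 0.5*u^2 + b*u

x, u = 0.0, 0.0
for _ in range(500):
    x_new = soft(x - alpha1 * L_op * u, alpha1)
    u = prox_gconj(u + alpha2 * L_op * (2 * x_new - x), alpha2)
    x = x_new

# The iterates converge to the fixed point of the Chambolle-Pock operator,
# i.e. the primal-dual solution (x*, u*) = (2, -1).
assert abs(x - 2.0) < 1e-8 and abs(u + 1.0) < 1e-8
```

At the limit the fixed-point residual $Rz = z - Tz$ vanishes, which is exactly the fixed-point condition above.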

## Metric Sub-regularity

The notion of metric subregularity is of key importance in establishing linear convergence rates for several algorithms. Let us first state the definition:

Definition. A set-valued operator $\Phi: \mathcal{K}\rightrightarrows \mathcal{K}$ defined on a Hilbert space $\mathcal{K}$ is said to be metrically subregular at $\bar{x}\in\mathcal{K}$ for $\bar{y}\in \Phi(\bar{x})$ if there exist a constant $\kappa > 0$ and a neighbourhood $U$ of $\bar{x}$ so that

\begin{aligned} d(x, \Phi^{-1}(\bar{y})) \leq \kappa d(\bar{y}, \Phi(x)), \end{aligned}

for all $x \in U$.

Metric subregularity is equivalent to calmness of the inverse mapping. It is weaker than metric regularity, than bounded linear regularity for single-valued mappings, and than the popular Aubin property.

A common assumption which is used to establish linear rate of convergence is that the fixed-point residual $R$ is metrically subregular at a solution $z^\star$ for $0$, that is, there is a neighbourhood $U$ of $z^\star$ so that

\begin{aligned} d(z, R^{-1}(0)) \leq \kappa \|Rz\|, \end{aligned}

for all $z \in U$, where $R^{-1}(0)$ is the set of fixed points of $T$. This condition is, however, difficult to verify in practice. Instead, we may use the following condition in conjunction with P3M. We state the following proposition, which is not specific to the Chambolle-Pock algorithm.

Proposition. Let $T=(P+\alpha_1 F)^{-1}P$ be a P3M step and let $R=I-T$ be the corresponding fixed-point residual. If $F$ is metrically sub-regular at $z^\star\in\mathrm{fix} T$ for $0$ with modulus $\kappa$, then $R$ is metrically sub-regular at $z^\star$ for $0$ with modulus $1+\frac{\kappa}{\alpha_1}\|P\|$.

Proof. For given $z^\star\in\mathrm{fix} T$, let $\mathcal{V}_{z^\star}$ be a neighbourhood of $z^\star$ on which the sub-regularity inequality for $F$ holds, and take $\epsilon>0$ so that $B_{z^\star,\epsilon} \triangleq \{z\in\mathcal{H}\oplus \mathcal{H}^*: \|z-z^\star\| \leq \epsilon\} \subseteq \mathcal{V}_{z^\star}$.

For $z\in B_{z^\star,\epsilon}$, define $z'=(P+\alpha_1 F)^{-1}Pz$; since $T$ is nonexpansive and $z^\star = Tz^\star$, we have $\|z'-z^\star\| = \|Tz - Tz^\star\| \leq \|z-z^\star\| \leq \epsilon$, so $z'\in B_{z^\star,\epsilon}$.

Equivalently, $z'$ satisfies $\alpha_1^{-1}(Pz-Pz')\in F(z')$.

Then,

\begin{aligned} \mathrm{dist}(z', F^{-1}0) \leq \kappa \mathrm{dist}(0, Fz') \leq \frac{\kappa}{\alpha_1}\|P\|\|z-z'\|. \end{aligned}

Note that $\mathrm{zer} R = R^{-1}0=F^{-1}0=\mathrm{fix} T$ and $Rz=z-z'$. Using the triangle inequality for the distance-to-set mapping, for $z\in B_{z^\star,\epsilon}$

\begin{aligned} \mathrm{dist}(z,R^{-1}0) &\leq \mathrm{dist}(z',F^{-1}0) + \|z-z'\|\\ &\leq (1+\tfrac{\kappa}{\alpha_1}\|P\|) \|Rz\|, \end{aligned}

which completes the proof.  $\Box$

Shen and Pan (Prop. 3.1) provide conditions under which operators $F$ of the form $F=F_1+F_2$ are metrically sub-regular at a $z^\star\in F^{-1}0$. A condition, which is also mentioned in the book of Dontchev and Rockafellar, requires $F_2$ to be an affine map and $F_1$ to have a polyhedral graph.

The condition of the above proposition is that for $z^\star \in F^{-1}0$, there is a $\kappa>0$ and a neighbourhood $U$ of $z^\star$ so that

\begin{aligned} \mathrm{dist}(z, F^{-1}0) \leq \kappa \mathrm{dist}(0, Fz), \end{aligned}

for all $z\in U$. The right hand side of this last inequality can be written as

\begin{aligned} \mathrm{dist}(0, Fz) = \mathrm{dist} \left( \begin{bmatrix}-L^* u\\ Lx\end{bmatrix}, \begin{bmatrix} \partial f(x) \\ \partial g^*(u)\end{bmatrix}\right). \end{aligned}
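As a sanity check of this kind of bound, for a linear $F(z)=Az$ metric subregularity holds globally with modulus $\kappa = 1/\sigma_{\min}^+(A)$, the reciprocal of the smallest nonzero singular value; a numerical sketch with a toy rank-deficient positive semidefinite $A$ (my construction):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6

# A rank-deficient monotone linear operator F(z) = Az with A = C^T C (positive semidefinite),
# so zer F = null(A) is a nontrivial subspace.
C = rng.standard_normal((3, n))
A = C.T @ C                      # rank 3

# kappa = 1/sigma_min+(A): reciprocal of the smallest nonzero singular value.
_, s, Vt = np.linalg.svd(A)
tol = s.max() * n * np.finfo(float).eps
r = int((s > tol).sum())
kappa = 1.0 / s[r - 1]
null_basis = Vt[r:].T            # orthonormal basis of null(A) = F^{-1}(0)

# Verify dist(z, F^{-1}0) <= kappa * dist(0, Fz) = kappa * ||Az|| at random points.
for _ in range(100):
    z = rng.standard_normal(n)
    dist_to_zer = np.linalg.norm(z - null_basis @ (null_basis.T @ z))
    assert dist_to_zer <= kappa * np.linalg.norm(A @ z) + 1e-9
```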
