Cone programs and self-dual embeddings

This post aims to provide some intuition about cone programs from different perspectives; in particular:

  1. Equivalence of different formulations of cone programs
  2. Fenchel duality
  3. Primal-dual optimality conditions (OC)
  4. OCs as variational inequalities
  5. Homogeneous self-dual embeddings (HSDEs)
  6. OCs for HSDEs

Cones and their properties

A set \mathcal{K} in a vector space X is called a cone if \lambda x \in \mathcal{K} for every x\in\mathcal{K} and every \lambda \geq 0; it is a convex cone if, in addition, it is closed under addition. \mathcal{K} is said to be a pointed cone if x\in \mathcal{K} and -x \in \mathcal{K} imply x=0. Pointed convex cones induce the partial ordering:

\begin{aligned} x \geq_\mathcal{K} y \Leftrightarrow x-y \in \mathcal{K}. \end{aligned}

The polar of a cone \mathcal{K} is a cone \mathcal{K}^\circ defined as

\begin{aligned}\mathcal{K}^\circ = \{x^* {}\mid{} \langle x^*, x\rangle  \leq 0, \forall x\in\mathcal{K}\}. \end{aligned}

If \mathcal{K} is closed and convex, the indicator functions of \mathcal{K} and \mathcal{K}^{\circ} are conjugate to each other, that is

\begin{aligned} (\delta_\mathcal{K})^* = \delta_{\mathcal{K}^\circ}. \end{aligned}

The dual cone of \mathcal{K} is defined as

\begin{aligned} \mathcal{K}^* = \{x^* {}\mid{} \langle x^*, x\rangle  \geq 0, \forall x\in\mathcal{K}\} = -\mathcal{K}^\circ. \end{aligned}

If \mathcal{K} is a closed, convex cone, then \mathcal{K}^{**}=\mathcal{K} and \mathcal{K}^{\circ\circ}=\mathcal{K}.
This result is known as the extended Farkas’ lemma.

Interesting examples of cones are

  1. the set of positive semidefinite matrices \mathcal{K}_{\mathrm{psd}} =\{X \in \mathbb{S}^{n}{}\mid{}X\succeq 0\}, where \mathbb{S}^n is the space of symmetric n\times n matrices, which is a closed, convex, pointed cone,
  2. the ice cream (second-order) cone {\mathcal{C}}_{n+1}=\{z=(x,t)\in\mathbb{R}^n\times \mathbb{R}{}\mid{} \|x\|_2\leq t\}, and
  3. polyhedral cones \mathcal{K}_{\mathrm{poly}}=\{x\in\mathbb{R}^n{}\mid{}Ax\geq 0\}, for some matrix A\in\mathbb{R}^{m\times n}.
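To make these definitions concrete, here is a small numpy sketch with membership tests for the three cones; the helper names and tolerances are my own, not part of any library:

```python
import numpy as np

def in_psd_cone(X, tol=1e-9):
    """X is in the PSD cone if it is symmetric with nonnegative eigenvalues."""
    return np.allclose(X, X.T) and np.linalg.eigvalsh(X).min() >= -tol

def in_icecream_cone(x, t):
    """(x, t) is in the second-order (ice cream) cone if ||x||_2 <= t."""
    return np.linalg.norm(x) <= t

def in_polyhedral_cone(A, x, tol=1e-9):
    """x is in the polyhedral cone {x : Ax >= 0}."""
    return (A @ x >= -tol).all()

# The cone axioms hold, e.g., for the PSD cone: positive scaling and
# addition of PSD matrices stay PSD, and pointedness rules out -X.
X = np.array([[2.0, 1.0], [1.0, 2.0]])   # eigenvalues 1 and 3
assert in_psd_cone(X) and in_psd_cone(5.0 * X) and in_psd_cone(X + np.eye(2))
assert not in_psd_cone(-X)               # pointedness: X and -X only if X = 0
```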

The conjugate of a function f is defined as

\begin{aligned} f^*(y) = \sup_x \langle x,y\rangle  - f(x). \end{aligned}

If f(x) = g(x-b), then f^*(y) = \langle b,y\rangle  + g^*(y). If f(x) = \alpha g(x) with \alpha>0, then f^*(y) = \alpha g^*(y/\alpha).
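The shift rule is easy to sanity-check by approximating the conjugate with a brute-force sup over a grid; below with g(x)=x^2/2 (which is its own conjugate) and a shift b, so f^*(y) should equal by + y^2/2. The grid and tolerance are ad hoc choices:

```python
import numpy as np

b = 1.0
xs = np.linspace(-10.0, 10.0, 20001)      # grid over which we take the sup
f = 0.5 * (xs - b) ** 2                   # f(x) = g(x - b), with g(x) = x^2/2

for y in np.linspace(-2.0, 2.0, 9):
    f_star_num = np.max(xs * y - f)       # f*(y) ~ sup_x <x,y> - f(x)
    f_star_exact = b * y + 0.5 * y ** 2   # <b,y> + g*(y), since g* = g
    assert abs(f_star_num - f_star_exact) < 1e-4
```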

The subdifferential of a convex function f at a point x \in \mathrm{dom} f is

\begin{aligned} \partial f(x) = \{g {}\mid{} f(y) \geq f(x) + \langle g, y-x\rangle , \forall y \in \mathrm{dom} f\}. \end{aligned}

The subdifferential of the indicator function \delta_C of a convex set C is

\begin{aligned} \partial \delta_C(x) = N_C(x) = \{g {}\mid{} \langle g, x\rangle  \geq \langle g, y\rangle , \forall y\in C\}. \end{aligned}

This set is called the normal cone of C at x.

Moreau’s Theorem establishes an important duality correspondence between a cone \mathcal{K} and its polar cone \mathcal{K}^\circ:

For all z, x, y\in X, the following are equivalent:

  1. z=x+y, x\in \mathcal{K}, y\in\mathcal{K}^{\circ}, \langle x,y\rangle =0,
  2. x = \mathrm{proj}_{\mathcal{K}}z, y = \mathrm{proj}_{\mathcal{K}^\circ}z
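Moreau's theorem is easy to verify numerically for \mathcal{K} = \mathbb{R}^n_+, whose polar is \mathbb{R}^n_-: the two projections are simply the componentwise positive and negative parts of z.

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.standard_normal(6)

x = np.maximum(z, 0.0)   # projection onto K  = R^n_+
y = np.minimum(z, 0.0)   # projection onto K° = R^n_-

assert np.allclose(z, x + y)              # z = x + y
assert (x >= 0).all() and (y <= 0).all()  # x in K, y in K°
assert abs(x @ y) < 1e-12                 # <x, y> = 0: disjoint supports
```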

 

Cone programs

A cone program is an optimization problem of the form

\begin{aligned} \mathcal{P}:&\mathrm{minimize}_{x\in\mathbb{R}^n} \langle c, x\rangle \\ &\text{s.t. } b- Ax\in\mathcal{K}, \end{aligned}

with A\in\mathbb{R}^{m\times n}.

The constraint b- Ax\in\mathcal{K} can be equivalently written as (i) b-Ax\geq_{\mathcal{K}} 0, that is Ax \leq_{\mathcal{K}} b, or (ii) Ax + s = b for s\in\mathcal{K}. Problem \mathcal{P} is often written as

\begin{aligned} \mathcal{P}:&\mathrm{minimize}_{x\in\mathbb{R}^n} \langle c, x\rangle \\ &\text{s.t. } Ax + s = b\\ &s\in \mathcal{K} \end{aligned}

In the literature, we often encounter the following standard formulation for a cone program

\begin{aligned} \mathcal{P}':&\mathrm{minimize}_{x\in\mathbb{R}^n} \langle c, x\rangle \\ &\text{s.t. } Ax = b,\\ & x\in \mathcal{K}. \end{aligned}

These two formulations — problems \mathcal{P} and \mathcal{P}' —  are equivalent in the sense that we may transform one into the other.

For instance, starting from the second one, we clearly see that the constraints can be interpreted as (i) x belongs to an affine space defined as E = \{x {}\mid{} Ax = b\} and (ii) x\in\mathcal{K}, that is, x is contained in a cone. Essentially, x must be in the intersection of a cone and an affine space.

Take x_0\in\mathbb{R}^n with Ax_0 = b and let N_A\in\mathbb{R}^{n\times p} be a matrix whose columns span the kernel of A (which has dimension p = n - \mathrm{rank}(A)), that is, AN_Av=0 for all v\in\mathbb{R}^p.

Then E = \{N_A v + x_0 {}\mid{} v\in\mathbb{R}^p\}, so the constraint Ax=b is equivalent to x = N_A v + x_0 for some v, and the problem becomes

\begin{aligned} &\mathrm{minimize}_{v\in\mathbb{R}^p} \langle N_A^*c, v\rangle \\ &\text{s.t. } N_A v + x_0 \in \mathcal{K}, \end{aligned}

which, after dropping the constant \langle c, x_0\rangle from the objective, is in the form of \mathcal{P}.
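This reduction can be carried out numerically, e.g., with scipy.linalg.null_space and a particular solution from least squares; the data below are random, purely for illustration:

```python
import numpy as np
from scipy.linalg import null_space

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 5))        # full row rank (almost surely)
b = A @ rng.standard_normal(5)         # b in the range of A
c = rng.standard_normal(5)

x0, *_ = np.linalg.lstsq(A, b, rcond=None)  # any x0 with A x0 = b
N = null_space(A)                           # columns span ker(A), shape (5, 2)

# Every x = N v + x0 satisfies A x = b, and the objective
# <c, x> becomes <N^T c, v> plus the constant <c, x0>.
v = rng.standard_normal(N.shape[1])
x = N @ v + x0
assert np.allclose(A @ x, b)
assert np.isclose(c @ x, (N.T @ c) @ v + c @ x0)
```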

Conversely, \mathcal{P} can be written in the form of problem \mathcal{P}': introducing the slack s\in\mathcal{K} and noting that \mathbb{R}^n is itself a cone, the pair (x,s) satisfies the linear constraint Ax+s=b and lies in the cone \mathbb{R}^n\times\mathcal{K}.

 

Fenchel duality

The dual of problem \mathcal{P} can be derived using the Fenchel duality framework.

Define f(x) = \langle c,x\rangle and g(z) = \delta_{\mathcal{K}}(b-z). Then, problem \mathcal{P} can be written as

\begin{aligned} \mathcal{P}:\mathrm{minimize}_{x\in\mathbb{R}^n} f(x) + g(Ax), \end{aligned}

whose Fenchel dual is

\begin{aligned} \mathcal{D}:\mathrm{minimize}_{y\in\mathbb{R}^m} f^*(-A^*y) + g^*(y). \end{aligned}

The conjugates of f and g are [note: The conjugate of g is derived as follows: g(z) = \delta_{\mathcal{K}}(b-z) = \delta_{-\mathcal{K}}(z-b), so g^*(y) = \langle b,y\rangle  + (\delta_{-\mathcal{K}})^*(y) = \langle b,y\rangle  + \delta_{(-\mathcal{K})^{\circ}}(y) = \langle b,y\rangle  + \delta_{\mathcal{K}^*}(y).]

\begin{aligned} f^*(y) &= \delta_{\{c\}}(y),\\ g^*(y) &= \langle b, y\rangle  + \delta_{\mathcal{K}^*}(y), \end{aligned}

so the dual problem becomes

\begin{aligned} \mathcal{D}:&\mathrm{minimize}_{y\in\mathbb{R}^m} \langle b,y\rangle \\ &\text{s.t. } y\in\mathcal{K}^*,\\ & A^*y + c = 0. \end{aligned}

Let p^\star be the optimal value of \mathcal{P} and d^\star the optimal value of \mathcal{D}. Strong duality holds if the primal or the dual problem is strictly feasible, in which case p^\star = -d^\star (the sign appears because both problems are written as minimizations).

Overall, we may derive the following optimality conditions for a pair (x^\star, y^\star) to be a primal-dual optimal pair

\begin{aligned} b-Ax^\star \in \mathcal{K},\\ y^\star \in \mathcal{K}^*,\\ A^* y^\star + c = 0,\\ \langle c,x^\star\rangle  + \langle b, y^\star\rangle  = 0. \end{aligned}

These optimality conditions can be seen as the following conditions on (x^\star, s^\star, y^\star)

\begin{aligned} \begin{bmatrix} 0\\s^\star \end{bmatrix} = \begin{bmatrix} 0 & A^*\\ -A & 0 \end{bmatrix} \begin{bmatrix} x^\star\\y^\star \end{bmatrix} + \begin{bmatrix} c\\b \end{bmatrix},\\ s^\star\in\mathcal{K},\ y^\star\in\mathcal{K}^*,\ \langle s^\star,y^\star\rangle =0. \end{aligned}

Here note that the equality \langle s^\star,y^\star\rangle =0 is equivalent to a zero duality gap, that is, \langle c,x^\star\rangle  + \langle b, y^\star\rangle  = 0.
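For \mathcal{K} = \mathbb{R}^m_+ the cone program is a linear program, and the conditions can be checked on a tiny instance; the primal-dual pair below (for min x_1+x_2 subject to x \geq 1 and x_1+x_2 \leq 4) is worked out by hand:

```python
import numpy as np

# b - Ax in R^3_+ encodes: x1 >= 1, x2 >= 1, x1 + x2 <= 4.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([-1.0, -1.0, 4.0])
c = np.array([1.0, 1.0])

x_star = np.array([1.0, 1.0])       # primal solution (by inspection)
y_star = np.array([1.0, 1.0, 0.0])  # dual solution (by inspection)
s_star = b - A @ x_star             # primal slack

assert (s_star >= 0).all()                       # b - Ax* in K
assert (y_star >= 0).all()                       # y* in K* = R^3_+
assert np.allclose(A.T @ y_star + c, 0.0)        # A* y* + c = 0
assert np.isclose(c @ x_star + b @ y_star, 0.0)  # zero duality gap
assert np.isclose(s_star @ y_star, 0.0)          # complementarity
```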

 

Optimality conditions from a different perspective

The optimality conditions of \mathcal{P} (using the splitting we introduced in the previous section) are simply

\begin{aligned} 0 \in \partial(f + g\circ A)(x^\star), \end{aligned}

and, provided a constraint qualification holds (for instance, there exists x with b-Ax in the relative interior of \mathcal{K}), this is equivalent to

\begin{aligned} &0 \in \nabla f(x^\star) + A^* (\partial g)(A x^\star)\\ \Leftrightarrow\ & - \nabla f(x^\star) \in A^* (\partial g)(A x^\star), \end{aligned}

or equivalently

\begin{aligned} &- \nabla f(x^\star)  = A^* y^\star,\\ &y^\star \in (\partial g)(A x^\star). \end{aligned}

We have \partial g(z) = -(\partial \delta_{\mathcal{K}})(b-z) = -N_{\mathcal{K}}(b-z) and

\begin{aligned} &-c  = A^* y^\star,\\ &y^\star \in -N_{\mathcal{K}}(b-Ax^\star). \end{aligned}

or, after the change of variables y^\star \leftarrow -y^\star,

\begin{aligned} &c  = A^* y^\star,\\ &y^\star \in N_{\mathcal{K}}(b-Ax^\star). \end{aligned}

Since \mathcal{K} is a nonempty closed convex cone, for any point b-Ax^\star\in\mathcal{K} we have (book of Hiriart-Urruty and Lemaréchal, Example 5.2.6 (a))

\begin{aligned} N_{\mathcal{K}}(b-Ax^\star) = \mathcal{K}^\circ \cap \{b-Ax^\star\}^{\bot}. \end{aligned}

Indeed, y^\star \in -N_{\mathcal{K}}(b-Ax^\star) = \mathcal{K}^* \cap \{b-Ax^\star\}^{\bot} means that y^\star\in\mathcal{K}^*, that b-Ax^\star\in\mathcal{K} (otherwise the normal cone is empty), and that \langle y^\star, b-Ax^\star\rangle = 0, which, combined with A^*y^\star + c = 0, is exactly the zero duality gap. This recovers the primal-dual optimality conditions stated above.

 

Unboundedness and infeasibility

Infeasibility and unboundedness (dual infeasibility) conditions are provided by the so-called theorems of the alternative. The weak theorems of the alternative state that

  1. (Primal feasibility). At most one of the following two systems is solvable
    • Ax\preceq_{\mathcal{K}} b (i.e., b-Ax\in\mathcal{K} or Ax+s=b for some s \in \mathcal{K})
    • A^* y = 0, y \succeq_{\mathcal{K}^*} 0 and \langle b,y\rangle <0
  2. (Dual feasibility). At most one of the following two systems is solvable
    • A^*y+c = 0 and y\succeq_{\mathcal{K}^*} 0
    • Ax\preceq_{\mathcal{K}} 0 (i.e., -Ax\in\mathcal{K}) and \langle c,x\rangle < 0

The primal-dual optimality conditions stated previously, together with these
feasibility conditions, complete the picture.
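A certificate of primal infeasibility is easy to exhibit on a toy instance: take the (clearly inconsistent) constraints x \geq 1 and x \leq 0 in one variable, with \mathcal{K} = \mathbb{R}^2_+.

```python
import numpy as np

# b - Ax in R^2_+ encodes: x >= 1 and x <= 0, which is infeasible.
A = np.array([[-1.0], [1.0]])
b = np.array([-1.0, 0.0])

y = np.array([1.0, 1.0])          # certificate solving the second system
assert np.allclose(A.T @ y, 0.0)  # A* y = 0
assert (y >= 0).all()             # y in K* = R^2_+
assert b @ y < 0                  # <b, y> < 0, so the first system has no solution
```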

 

Homogeneous Self-Dual Embeddings

Consider the following feasibility problem in (x^\star, s^\star, y^\star) and (\tau^\star, \kappa^\star)

\begin{aligned} \begin{bmatrix} 0\\s^\star\\\kappa^\star \end{bmatrix} = \underbrace{ \begin{bmatrix} 0 & A^* & c\\ -A & 0 &  b\\ -c^* & -b^* & 0 \end{bmatrix}}_{Q} \begin{bmatrix} x^\star\\y^\star\\\tau^\star \end{bmatrix},\\ s^\star\in\mathcal{K},\ y^\star\in\mathcal{K}^*,\ \tau^\star\geq 0,\ \kappa^\star \geq 0. \end{aligned}

Note that for \tau^\star=1 and \kappa^\star=0, the above equations collapse to the
primal-dual optimality conditions. Second, due to the skew-symmetry of Q, any solution (0,s^\star, \kappa^\star) and (x^\star, y^\star, \tau^\star) satisfies

\begin{aligned} \left\langle \begin{bmatrix} 0\\s^\star\\\kappa^\star \end{bmatrix}, \begin{bmatrix} x^\star\\y^\star\\\tau^\star \end{bmatrix}\right\rangle = 0, \end{aligned}

which leads to \langle  s^\star, y^\star \rangle  + \kappa^\star \tau^\star = 0. Since s^\star\in\mathcal{K} and y^\star\in\mathcal{K}^* imply \langle  s^\star, y^\star \rangle \geq 0, while \kappa^\star\tau^\star\geq 0, both terms must vanish: \langle  s^\star, y^\star \rangle = 0 and \kappa^\star\tau^\star=0, i.e., at least one of \kappa^\star, \tau^\star must be zero.

If \kappa^\star=0 and \tau^\star>0, then (x^\star/\tau^\star, y^\star/\tau^\star, s^\star/\tau^\star) is a primal-dual solution. If \tau^\star=0 and \kappa^\star>0, then \langle c,x^\star\rangle + \langle b,y^\star\rangle < 0, so either \langle b,y^\star\rangle<0 (a certificate of primal infeasibility) or \langle c,x^\star\rangle<0 (a certificate of dual infeasibility). If \tau^\star=\kappa^\star=0, no definitive conclusion can be drawn.
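On a toy LP one can assemble Q explicitly, confirm its skew-symmetry, and check that the optimal triple with \tau^\star=1, \kappa^\star=0 solves the embedding; the instance and its solution below are worked out by hand:

```python
import numpy as np

# LP: min x1 + x2 s.t. x1 >= 1, x2 >= 1, x1 + x2 <= 4, with K = R^3_+.
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 1.0]])
b = np.array([-1.0, -1.0, 4.0])
c = np.array([1.0, 1.0])
x = np.array([1.0, 1.0])          # primal solution
y = np.array([1.0, 1.0, 0.0])     # dual solution
s = b - A @ x                     # primal slack
tau, kappa = 1.0, 0.0

# Assemble the skew-symmetric matrix Q block by block.
n, m = 2, 3
Q = np.zeros((n + m + 1, n + m + 1))
Q[:n, n:n + m] = A.T
Q[:n, -1] = c
Q[n:n + m, :n] = -A
Q[n:n + m, -1] = b
Q[-1, :n] = -c
Q[-1, n:n + m] = -b

u = np.concatenate([x, y, [tau]])
v = np.concatenate([np.zeros(n), s, [kappa]])
assert np.allclose(Q, -Q.T)       # skew-symmetry
assert np.allclose(Q @ u, v)      # the embedding holds at the optimum
assert np.isclose(u @ Q @ u, 0.0) # <u, Qu> = 0
```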

Let us define u=(x,y,\tau) and v=(0,s,\kappa). Then, the self-dual embedding becomes

\begin{aligned} \text{Find: } u,\,v,\text{ s.t. } Qu=v,\,(u,v)\in\mathcal{C}\times\mathcal{C}^*, \end{aligned}

where \mathcal{C} = \mathbb{R}^n\times\mathcal{K}^*\times\mathbb{R}_+. The problem can now be cast as an optimization problem

\begin{aligned} \mathrm{minimize}_{u,v} \delta_{\mathcal{C}\times\mathcal{C}^*}(u,v) + \delta_{\{0\}}(Qu - v). \end{aligned}

Furthermore, this is equivalent to the variational inequality

\begin{aligned} 0 \in Qu + N_{\mathcal{C}}(u) \end{aligned}

where, for u\in\mathcal{C}, N_{\mathcal{C}}(u) = \mathcal{C}^{\circ}\cap\{u\}^{\bot}.

This, then, becomes

\begin{aligned} u&\in\mathcal{C}\\ Qu&\in\mathcal{C}^*\\ \langle u, Qu \rangle &= 0 \end{aligned}

 

Up next

Operator splitting algorithms to solve cone programs.

 
