- Equivalence of different formulations of cone programs
- Fenchel duality
- Primal-dual optimality conditions (OC)
- OCs as variational inequalities
- Homogeneous self-dual embeddings (HSDEs)
- OCs for HSDEs

A set $K$ in a vector space is called a *cone* if $\lambda x \in K$ for every $x \in K$ and every $\lambda \geq 0$, and it is closed under addition.

$K$ is said to be a *pointed cone* if $x \in K$ and $-x \in K$ imply $x = 0$. Pointed cones define the partial ordering $x \preceq_K y \Leftrightarrow y - x \in K$.

The *polar* of a cone $K$ is a cone, denoted $K^\circ$, defined as $K^\circ = \{y : \langle y, x \rangle \leq 0 \text{ for all } x \in K\}$.

If $K$ is closed and convex, then $(K^\circ)^\circ = K$ and the indicators of $K$ and $K^\circ$ are conjugate to each other, that is, $\delta_K^* = \delta_{K^\circ}$ and $\delta_{K^\circ}^* = \delta_K$.

The *dual cone* $K^*$ of $K$ is defined as $K^* = \{y : \langle y, x \rangle \geq 0 \text{ for all } x \in K\} = -K^\circ$.

If $K$ is a closed, convex cone, then $(K^*)^* = K$ and $(K^\circ)^\circ = K$.

This result is known as the *extended Farkas’ lemma*.

Interesting examples of cones are

- the set of *positive semidefinite matrices* $\mathbb{S}^n_+$, which is a closed, convex, pointed cone,
- the *ice cream cone* (second-order or Lorentz cone) $\{(x, t) : \|x\| \leq t\}$, and
- *polyhedral cones* $\{x : Ax \leq 0\}$, for some matrix $A$.
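Membership in these example cones is easy to test numerically; the following sketch (helper names are ours, not from any library) uses an eigendecomposition for the semidefinite cone:

```python
import numpy as np

def in_psd_cone(X, tol=1e-9):
    # symmetric with nonnegative eigenvalues
    return np.allclose(X, X.T) and np.linalg.eigvalsh(X).min() >= -tol

def in_ice_cream_cone(x, t):
    # second-order (Lorentz) cone: ||x|| <= t
    return np.linalg.norm(x) <= t

def in_polyhedral_cone(A, x, tol=1e-9):
    # polyhedral cone {x : A x <= 0}
    return bool(np.all(A @ x <= tol))

print(in_psd_cone(np.eye(2)))                      # True
print(in_ice_cream_cone(np.array([3.0, 4.0]), 5))  # True: ||(3, 4)|| = 5
```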

The *conjugate* of a function $f$ is defined as $f^*(y) = \sup_x \{\langle y, x \rangle - f(x)\}$.

If $f$ is proper, convex and lower semicontinuous, then $f^{**} = f$. For all $x$ and $y$ we have the Fenchel-Young inequality $f(x) + f^*(y) \geq \langle x, y \rangle$.

The *subdifferential* of a convex function $f$ at a point $x$ is $\partial f(x) = \{g : f(y) \geq f(x) + \langle g, y - x \rangle \text{ for all } y\}$.

The subdifferential of the indicator function $\delta_C$ of a convex set $C$ is, for $x \in C$, $\partial \delta_C(x) = \{g : \langle g, y - x \rangle \leq 0 \text{ for all } y \in C\}$.

This mapping is called the *normal cone* of $C$ and is denoted by $N_C$.

Moreau’s Theorem establishes an important duality correspondence between a cone $K$ and its polar cone $K^\circ$:

For all $x$, $y$ and $z$, the following are equivalent:

- $z = x + y$, $x \in K$, $y \in K^\circ$, $\langle x, y \rangle = 0$,
- $x = P_K z$ and $y = P_{K^\circ} z$, where $P_K$ denotes the projection onto $K$.
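For the nonnegative orthant $K = \mathbb{R}^n_+$, whose polar is the nonpositive orthant, both projections are componentwise clipping, and Moreau's decomposition can be checked directly (a minimal numeric sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)

p = np.maximum(x, 0.0)    # P_K x for K = nonnegative orthant
q = np.minimum(x, 0.0)    # projection onto the polar cone (nonpositive orthant)

assert np.allclose(p + q, x)   # z = P_K z + P_{K°} z
assert abs(p @ q) < 1e-12      # <P_K z, P_{K°} z> = 0
```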

A *cone program* is an optimization problem of the form

with .

The constraint can be equivalently written as (i) , that is , or (ii) for . Problem is often written as

In the literature, we often encounter the following standard formulation for a cone program

These two formulations — problems and — are equivalent in the sense that we may transform one into the other.

For instance, starting from the second one, we clearly see that the constraints can be interpreted as (i) belongs to an affine space defined as and (ii) , that is, is contained in a cone. Essentially, must be in the intersection of a cone and an affine space.

Take so that and let be a matrix that spans the kernel of (which has dimension ), that is for all .

Then and the requirement that is written as , so the problem becomes

which is in the form of .
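The reparametrization used in this transformation, a particular solution of the linear system plus the kernel of the matrix, can be sketched numerically (random data of our own choosing; the null-space basis is obtained via the SVD):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 5))              # m x n with full row rank (a.s.)
b = rng.standard_normal(2)

x0 = np.linalg.lstsq(A, b, rcond=None)[0]    # one particular solution of A x = b

# Columns of N span ker(A): right singular vectors beyond the rank
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-10))
N = Vt[rank:].T                              # n x (n - rank), so A @ N ~ 0

# Every x = x0 + N y remains in the affine space {x : A x = b}
y = rng.standard_normal(N.shape[1])
x = x0 + N @ y
assert np.allclose(A @ x, b)
```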

Conversely, can be written in the form of problem .

The dual of problem can be derived using the Fenchel duality framework.

Define and . Then, problem can be written as

whose Fenchel dual is

The conjugates of and are [note: The conjugate of is derived as follows: , so .]

and the Fenchel dual problem is

which is

Let be the optimal value of and

be the optimal value of . Then, strong duality holds if the primal or the dual problem is strictly feasible, in which case the two optimal values are equal.

Overall, we may derive the following optimality conditions for a pair to be a primal-dual optimal pair

These optimality conditions can be seen as the following conditions on

Here note that the equality is equivalent to zero duality gap, that is .

The optimality conditions of (using the splitting we introduced in the previous section) are simply

and, provided that has a nonempty relative interior, this is equivalent to

or equivalently

We have and

or

Since is a nonempty convex cone, we have (book of Hiriart-Urruty and Lemaréchal, Example 5.2.6 (a))

This yields the primal-dual optimality conditions we stated above.

Infeasibility and unboundedness (dual infeasibility) conditions are provided by the so-called *theorems of the alternative*. The *weak* theorems of the alternative state that

- (Primal feasibility). At most one of the following two systems is solvable
- (i.e., or for some )
- , and

- (Dual feasibility). At most one of the following two systems is solvable
- and
- and

The primal-dual optimality conditions we stated previously, together with these feasibility conditions, complete the picture.

Consider the following feasibility problem in and

First, note that for and , the above equations collapse to the primal-dual optimality conditions. Second, due to the skew-symmetry of the above system, any solution and satisfies

which leads to , but since we already know that , it follows that , i.e., at least one of the two must be zero.

If and , then is a solution. If and , then the problem is either primal- or dual-infeasible. If , no definitive conclusion can be drawn.

Let us define and . Then, the self-dual embedding becomes

where . The problem can now be cast as an optimization problem

Furthermore, this is equivalent to the variational inequality

and .

This, then, becomes

Operator splitting algorithms can then be used to solve cone programs.


Pointwise convergence of a sequence of functions $(f_n)_n$ to a function $f$ is not in general sufficient to guarantee the convergence of the infima $\inf f_n$ to $\inf f$, nor that of the minimiser sets $\mathrm{argmin}\, f_n$ (which form a sequence of sets) to $\mathrm{argmin}\, f$.

It is in principle easier to answer the question “under what conditions does $\inf f_n$ converge to $\inf f$?”, because the convergence of the minimiser sets requires the introduction of a new notion of convergence for sequences of sets.

As we can see in the animation below, we may have a sequence of *continuous* functions which converge *pointwise* to a function , but neither the infima nor the sets of minimisers converge as you would expect.
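The animation cannot be reproduced here, but a classical one-dimensional sequence of our own choosing exhibits the same pathology: $f_n(x) = nx\,e^{-n^2x^2}$ converges pointwise to $0$ everywhere, yet $\inf f_n = -e^{-1/2}/\sqrt{2}$ for every $n$, so the infima do not converge to $\inf 0 = 0$:

```python
import numpy as np

def f(n, x):
    # f_n(x) = n x exp(-n^2 x^2): the pointwise limit is 0 for every fixed x
    return n * x * np.exp(-(n * x) ** 2)

xs = np.linspace(-2.0, 2.0, 400001)
for n in (1, 10, 100):
    # the infimum is -exp(-1/2)/sqrt(2) ~ -0.4289, independently of n
    assert abs(f(n, xs).min() + np.exp(-0.5) / np.sqrt(2)) < 1e-3

# yet, pointwise, f_n(x) -> 0 for every fixed x
assert abs(f(100, 0.5)) < 1e-12
```

(The minimisers $-1/(\sqrt{2}n)$ do converge to the minimiser of the limit here; it is the infima that fail to converge.)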

What we need here is an alternative notion of convergence of sequences of functions which is better suited for the study of the convergence of infima and minimiser sets. This is exactly the *epigraphical* convergence of .

Instead of looking at the functions at individual points, we look at their *epigraphs*. The epigraph of a function $f$ is defined as $\mathrm{epi} f = \{(x, t) : f(x) \leq t\}$.

We then need to introduce a suitable notion of convergence for sequences of sets. This is the *Painlevé-Kuratowski convergence*, or convergence in the Fell topology.

We then say that a sequence of functions converges *epigraphically* to a function – we denote – if the sequence of epigraphs converges to the epigraph of the limit function in the Painlevé-Kuratowski sense.

Now, according to Thm. 7.33 in (1), assuming that the sequence is *eventually level-bounded* (the functions have bounded level sets for all for some ), and that the functions and the limit are *lower semicontinuous* (i.e., they have closed epigraphs) and proper, then

and the sets are eventually nonempty and form a bounded sequence with

where here is the outer limit of a sequence of sets.

*Note.* In the Wikipedia article on Kuratowski convergence, the term *limit superior* is used in lieu of *outer limit*. In (1) the authors use the term *outer limit* instead, to avoid any confusion with the set-theoretic limit, which is not a topological notion.

Note that it is possible that contains more elements than the outer limit of .

A special case is when the minimiser sets are eventually singletons. Then, we have a strong convergence result, but this would require additional assumptions such as *strict convexity*. It is otherwise quite difficult to establish conditions for the convergence of the minimisers unless we make restrictive assumptions, e.g., that the functions are nested like .

It is also important to note that according to the above theorem, the limit of may not exist.

Regarding the *continuity* of , we should first define a space of functions and equip it with an appropriate topology to judge whether the set-valued functional

is continuous. This will be the space of *lower semi-continuous* and *proper* functions and the topology will be the topology of *total epi-convergence*. It would take a lot of time and effort to explain the notion of *total* epi-convergence, but according to Thm. 7.53 in (1), it is the same as epi-convergence in certain special cases:

when the space is further restricted to contain:

(i) only convex functions

(ii) only positive homogeneous functions

Also, if is nonincreasing, then epi-convergence of implies its total epi-convergence and, finally, if is equi-coercive, again epi-convergence is the same as total epi-convergence.

Under these assumptions, the mapping is outer semi-continuous.

(1) R.T. Rockafellar and R.J-B. Wets, Variational Analysis, Grundlehren der mathematischen Wissenschaften, vol. 317, Springer, Dordrecht 2000, ISBN: 978-3-540-62772-2.


We take the infimum which corresponds to the optimization problem that defines the projection on the epigraph and we have

where we have done the converse of an epigraphical relaxation; we define the function and notice that this is minimized at

and . This is of course only useful to the extent that is computable.


In mathematics, the concept of duality allows us to study an object by means of some other object called its dual. A linear operator can be studied via its *adjoint* operator . Certain properties of a Banach space can be studied via its *topological dual* . A convex set in can be seen as the intersection of a set of hyperplanes, that is , and the latter is often a more convenient representation. These are examples of *dual objects* in mathematics. Likewise, in convex optimization, the dual object which corresponds to a convex function is its *convex conjugate*. When it comes to optimization problems, however, there are several ways in which we may derive dual optimization problems, leading to different formulations. This is because we first need to specify what exactly we dualize and how…

*Notation:* In what follows, denote two Hilbert spaces. Their inner product will be denoted by . With minor adaptations, the results presented here hold for Banach spaces as well. We denote the extended-real line by .

In convex optimization theory, most duality concepts (including the Lagrange and Fenchel duality frameworks) stem from the realization that **convex sets** can be represented as the **intersection of a set of hyperplanes**. This extends elegantly to proper, convex, lower semicontinuous functions, which can be identified with their epigraphs. Recall that the epigraph of a function is the set .

Let us describe the supporting hyperplanes of the epigraph of a function. Let be an affine function with slope which is majorized by , that is for all .

Provided that is proper, convex and lower semicontinuous, for every slope there is a supporting hyperplane of the form for some . This is

The *convex conjugate* of at is the RHS of this equation which returns a value so that is a supporting hyperplane of the epigraph of .

Provided that is proper, convex and lower semicontinuous, knowing can tell us everything about . In fact, according to the Fenchel-Moreau theorem, , where .

The convex conjugates are the *dual* objects of convex functions and, as you can see, they carry the same information as the functions themselves.

The framework of perturbation functions is perhaps the most elegant framework in optimization theory that reveals the essence of dual optimization problems [1], [2]. In fact, every dualization (there is no unique way to define a dual optimization problem) is associated with a perturbation function.

Consider the optimization problem

We introduce a convex function which we shall call a *perturbation function*, so that . Then, the optimization problem above can be equivalently written as

For , the following problem, which depends parametrically on , is called a *perturbed optimization problem*

Let be the optimal value of this problem. We are interested in determining . If is sufficiently regular, that is, proper, convex, lower semicontinuous (we shall discuss later when this is the case), then

otherwise .

The **dual problem** consists in finding .

By definition

therefore,

Therefore, the determination of requires the solution of the optimization problem

This is precisely the **Fenchel dual** optimization problem [3].

Let us see how exactly perturbation functions lead naturally to dual formulations. The convex conjugate of is

therefore,

As a result, the dual optimization problem can be written as

or, equivalently

Juxtapose this with the primal problem which is

Lagrangian duality is a particular type of duality for optimization problems of the form

where , and is meant in the component-wise sense.

We perturb the problem as follows

where . Define the set . The perturbed problem is written as

where is the indicator function of , that is

In other words, the perturbation function is defined as

Its convex conjugate is

where .

This is exactly the **Lagrangian dual** optimization problem which is the Fenchel dual on a properly chosen perturbation function.
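As a toy numerical illustration (our own example, not taken from the cited references): for the problem $\min\{x^2 : x \geq 1\}$, the Lagrangian is $L(x, \lambda) = x^2 + \lambda(1 - x)$ and the dual function is $g(\lambda) = \lambda - \lambda^2/4$; maximizing $g$ over $\lambda \geq 0$ recovers the primal optimal value $1$:

```python
import numpy as np

# min x^2  s.t. 1 - x <= 0; Lagrangian L(x, lam) = x^2 + lam*(1 - x)
# dual function: g(lam) = min_x L(x, lam) = lam - lam^2/4  (minimiser x = lam/2)
lams = np.linspace(0.0, 4.0, 400001)
g = lams - lams ** 2 / 4.0
d_star = g.max()
lam_star = lams[np.argmax(g)]

assert abs(d_star - 1.0) < 1e-9    # dual optimal value equals the primal value
assert abs(lam_star - 2.0) < 1e-4  # optimal multiplier
```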

[1] R.I. Bot, S.M. Grad and G. Wanka, Fenchel-Lagrange Duality Versus Geometric Duality in Convex Optimization, JOTA 129(1):35-54, 2006.

[2] R.T. Rockafellar, R. J.-B. Wets, Variational Analysis, Springer, 2009.

[3] H. H. Bauschke, P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, 2011.


Quadratic constraints (of the form ) can be converted to second-order conic constraints.

Note that in the constraints may or may not be a decision variable (it may as well be a constant).

The following identity does the trick:

Then,

where is the second-order cone, also known as Lorentz cone or ice cream cone.

Define the linear operator

Then, the quadratic constraint becomes

Having for some symmetric positive semidefinite matrix is no different than what we just did since .
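The rotated-cone identity behind this trick, $u^\top u \leq vw$ with $v, w \geq 0$ if and only if $\|(2u,\ v - w)\| \leq v + w$ (our reading of the stripped formula, stated here as an assumption), can be checked numerically:

```python
import numpy as np

def quad_form(u, v, w):
    return u @ u <= v * w

def soc_form(u, v, w):
    # ||(2u, v - w)|| <= v + w, a second-order cone constraint
    return np.hypot(np.linalg.norm(2 * u), v - w) <= v + w

rng = np.random.default_rng(0)
for _ in range(1000):
    u = rng.standard_normal(3)
    v, w = rng.uniform(0.0, 3.0, 2)
    assert quad_form(u, v, w) == soc_form(u, v, w)
```

The equivalence is exact algebraically, since $(v + w)^2 - \|(2u, v - w)\|^2 = 4(vw - u^\top u)$.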


Define with and take $(z,\tau)\notin \mathrm{epi} f$. Let . Following the previous post, the optimality conditions are

Then, the first optimality condition becomes

Then,

where . This third-order polynomial equation can be solved numerically; it has a root which satisfies the second optimality condition above. Once has been determined, can be computed.

The above can be trivially extended to the case where , with $\alpha_i>0$. However, the projection onto the epigraph of where is a symmetric positive (semi)definite matrix is more complex and we cannot derive explicit formulas in the general case.

Here is a simple MATLAB function to compute the projection onto the epigraph of the squared Euclidean norm efficiently:

```matlab
function [x_, z_, details] = epipr_sqnorm(x, z)
if (x'*x <= z)
    x_ = x; z_ = z;
    return
end
theta = 1 - 2*z;
[r, status] = cubic_roots(theta, x);
details.status = status;
% Pick the right root
for i = 1:length(r)
    x_ = x/(1 + 2*(r(i) - z));
    if abs(norm(x_)^2 - r(i)) <= 1e-6
        z_ = r(i);
        break;
    end
end
% Refine the root
[z_, iter, err] = newton_solver(x, theta, z_, 5, 1e-10);
x_ = x/(1 + 2*(z_ - z));
details.newton_iter = iter;
details.newton_err = err;

function [r, status] = cubic_roots(theta, x)
b = 4*theta; c = theta^2; d = -x'*x;
D  = 72*b*c*d - 4*b^3*d + b^2*c^2 - 16*c^3 - 432*d^2;
D0 = b^2 - 12*c;
status.D = D; status.D0 = D0;
if abs(D) <= 1e-14
    if abs(D0) <= 1e-14
        % one triple root (we cannot be here!)
        r = -b/12;
        status.msg = 'one triple root';
    else
        % a double root and a single one
        r = zeros(2,1);
        r(1) = (16*b*c - 144*d - b^3)/(4*D0);  % single
        r(2) = (36*d - b*c)/(2*D0);            % double (cannot be)
        status.msg = 'double plus single';
    end
    return;
end
r = roots([4 b c d]);  % eigenvalues of matrix

function [zsol, i, err] = newton_solver(x, theta, z0, maxit, tolx)
zsol = z0;
zsol_prev = z0 - 1;
for i = 1:maxit
    zsol = zsol ...
        - (4*zsol^3 + 4*theta*zsol^2 + theta^2*zsol - x'*x) ...
          / (12*zsol^2 + 8*theta*zsol + theta^2);
    err = abs(zsol - zsol_prev);
    if err < tolx
        return;
    end
    zsol_prev = zsol;
end
```
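For readers outside MATLAB, here is a Python sketch of the same projection, solving the cubic with `numpy.roots` instead of the closed-form discriminant logic (the function name is ours):

```python
import numpy as np

def proj_epi_sqnorm(x, z):
    """Projection of (x, z) onto epi f with f(u) = ||u||^2 (a sketch)."""
    x = np.asarray(x, dtype=float)
    if x @ x <= z:
        return x, z                 # already in the epigraph
    theta = 1.0 - 2.0 * z
    # the KKT conditions reduce to 4 t^3 + 4*theta*t^2 + theta^2*t - ||x||^2 = 0
    candidates = np.roots([4.0, 4.0 * theta, theta ** 2, -(x @ x)])
    for t in candidates[np.abs(candidates.imag) < 1e-9].real:
        denom = 1.0 + 2.0 * (t - z)
        if denom > 0:
            u = x / denom
            if abs(u @ u - t) <= 1e-6:   # consistency check, as in the MATLAB code
                return u, t
    raise RuntimeError("no valid root found")

u, t = proj_epi_sqnorm([3.0, 0.0], 0.0)
assert abs(t - 1.0) < 1e-8 and abs(u[0] - 1.0) < 1e-8   # cubic root t = 1, u = x/3
```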


**KKT Optimality Conditions**

Let be a proper, convex, continuous function; its epigraph is the nonempty, closed, convex set

For convenience, we define the projection of a pair onto the epigraph of as

Let ; if , then . Suppose . Let and so that ; this pair solves the optimization problem

The KKT conditions for are

Clearly, since . Then, because of the second condition, and, because of the fourth condition, , so the third condition yields

It might be necessary to employ numerical methods to solve the optimality conditions.

**Dual epigraphical projection**

Consider again the optimization problem which defined the epigraphical projection. We introduce the Lagrangian

We compute the partial subdifferentials of with respect to the primal variables :

The *dual function* is defined as

The optimality conditions for the optimization problem which defines the dual function are and , therefore, with

and

Then, the Lagrangian dual function becomes

where is the Moreau envelope function of with parameter . Then, the dual problem is the following 1-dimensional optimization problem

*Up next*: examples of epigraphical projections: the *squared norm* and *norm* cases.


This question was published in Harvey J. Greenberg’s “Myths and Counterexamples in Mathematical Programming,” which can be found online (see NLP Myth 5). The author points to:

D. M. Bloom. FFF #34. The shortest distance from a point to a parabola. The College Mathematics Journal, 22(2):131, 1991,

where Bloom addresses the simple problem of determining the shortest distance between the point on the plane and the parabola .

A simple sketch reveals that the unique minimiser of this problem is the point and the distance is equal to 5.

The original problem, using the squared distance, is

Now if we simply substitute by we get the problem

However, in doing so we have dropped the requirement that . This leads to the “paradox” that the solution of the second problem is which corresponds to an imaginary x-coordinate.

The correct way to go is, of course, to use the KKT conditions of the original problem to determine its critical points.
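The concrete numbers did not survive in this copy of the post, so, purely for illustration, take the point $(0, -5)$ and the parabola $y = x^2$ (a hypothetical choice consistent with a minimiser at distance $5$). Minimising over the parabola gives the vertex, while the naive substitution $y = x^2$ produces a negative $y$, i.e., an imaginary $x$:

```python
import numpy as np

# Hypothetical data (the original numbers are lost): point p = (0, -5),
# parabola y = x^2. Squared distance along the parabola: d(x) = x^2 + (x^2 + 5)^2.
def d(x):
    return x ** 2 + (x ** 2 + 5.0) ** 2

xs = np.linspace(-3.0, 3.0, 600001)
x_star = xs[np.argmin(d(xs))]
# the true minimiser is the vertex (0, 0), at distance 5
assert abs(x_star) < 1e-3
assert abs(np.sqrt(d(x_star)) - 5.0) < 1e-6

# Naive substitution y = x^2 treated as free: minimize y + (y + 5)^2 over ALL y
y_naive = -5.5        # stationary point: 1 + 2*(y + 5) = 0
assert y_naive < 0    # violates y = x^2 >= 0, so x would be imaginary
```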


Let us first provide some necessary mathematical preliminaries and definitions.

Let us denote by the set of extended real numbers, that is . Functions which map into are called extended-real-valued and are often used in optimisation to encode constraints.

Let be a real Hilbert space. For a function , its **domain** is the set .

We call **proper** if it is not everywhere equal to infinity, that is .

We say that is **closed** (or lower semicontinuous) if . This is equivalent to requiring that the epigraph of , that is the set , is closed.

The **convex conjugate** of a proper convex function , is the function . The convex conjugate plays a central role in Fenchel duality theory.

The **proximal operator** of a proper, closed, convex function with parameter is the mapping . The proximal operator of is also the **resolvent** of the operator . In general, for an operator , its resolvent is the operator , where is the identity operator.
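For instance, the proximal operator of $f(t) = |t|$ is soft-thresholding; a quick numerical check against the defining minimisation (a sketch, not tied to any particular library):

```python
import numpy as np

def prox_abs(x, gamma):
    # prox of f(t) = |t|: soft-thresholding (closed form)
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

# check prox_{gamma f}(x) = argmin_u |u| + (1/(2 gamma)) (u - x)^2 numerically
gamma, x = 0.7, 1.5
us = np.linspace(-3.0, 3.0, 600001)
u_num = us[np.argmin(np.abs(us) + (us - x) ** 2 / (2 * gamma))]
assert abs(u_num - prox_abs(x, gamma)) < 1e-4    # both give 0.8
```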

The **subdifferential** of a proper, convex function at is .

For two Hilbert spaces and we denote by the direct sum of the two spaces equipped with the inner product .

We define three important classes of operators on Hilbert spaces. We call **nonexpansive** if . We call –**averaged** if it can be written as , where is the identity operator and is nonexpansive. We say that is **firmly nonexpansive** if it is -averaged.

We say that a set-valued operator is a **monotone** operator if for every and for every and , it is . For any proper function , is monotone.

We say that is **maximally monotone** if it is monotone and its graph cannot be extended to the graph of another monotone operator – in other words, if for every for which for all with , it is . As an example, skew-symmetric bounded linear operators are maximally monotone. Moreover, the subdifferential of a convex, proper, closed function is maximally monotone.

Last, for a convex set , we define its **core** to be the set , where is the conic hull. The **strong relative interior** of a set is the set (here span stands for the set of linear combinations and the overline denotes the topological closure). The strong relative interior is a recurring notion in infinite dimensional spaces and it appears in results related to strong duality (such as the Attouch-Brézis Theorem).

For an operator , the set of its **fixed points** is defined as .

The **inverse** of an operator is defined as . The set of zeros of is

Optimization problems can be stated as monotone inclusions:

where is a maximally monotone operator. The classical **proximal point method** (P2M) is

where is the resolvent of with parameter . More often than not, however, cannot be evaluated as it requires the solution of another optimisation problem.
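As a toy illustration of the proximal point iteration (with an operator of our own choosing): take $T = \partial f$ for $f(x) = |x - 2|$, whose resolvent is a shifted soft-thresholding; the iterates reach the zero of $T$:

```python
import numpy as np

def resolvent(z, lam):
    # (I + lam*T)^(-1) for T = subdifferential of f(x) = |x - 2|:
    # a shifted soft-thresholding
    t = z - 2.0
    return 2.0 + np.sign(t) * max(abs(t) - lam, 0.0)

z, lam = 10.0, 0.5
for _ in range(100):
    z = resolvent(z, lam)       # proximal point iteration z <- J_{lam T}(z)

assert abs(z - 2.0) < 1e-12     # converged to the zero of T (the minimiser of f)
```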

The **preconditioned proximal point method** (P3M) consists in replacing in P2M with , where is a bounded invertible linear operator on . The modified algorithm reads

An appropriate choice of may lead to steps which are easier to evaluate as we shall see in the following section.

Consider the following optimisation problem

where and are proper, convex, closed functions and is a bounded linear operator whose adjoint is denoted by .

We assume that , that is, the above optimisation problem has a nonempty feasible domain.

The optimality conditions of this problem are

and provided that , the optimality conditions become

Equivalently, the optimality conditions are satisfied if and only if there is a and a so that

Using the well-known property that (as it easily follows from the definitions of the subdifferential and the convex conjugate), the optimality conditions simply become

Naturally, we define the operator as

We are now looking for a primal-dual point which satisfies the monotone inclusion .

The Chambolle-Pock method is an instance of the P3M method for solving using the following linear operator as a preconditioner

Using its Schur complement it can be verified that this is positive definite provided that (where is the operator norm of ).

This linear operator is used to define a modified inner product on as which in turn induces the norm . In what follows, will be taken with respect to this inner product.

Using this preconditioner, P3M boils down to the following recursion:

This is easy to verify starting from the basic P3M iteration described above, which is where .

This defines the following operator

which we shall refer to as the Chambolle-Pock operator. The fixed points of , that is, points such that are exactly the solutions of the monotone inclusion .

Operator is firmly nonexpansive in with the modified inner product introduced above.
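A minimal numerical sketch of the Chambolle-Pock iteration (with data of our own choosing): take $f(x) = \tfrac12\|x - b\|^2$, $g = \lambda\|\cdot\|_1$ and $L = I$, so the solution is soft-thresholding of $b$ and both proximal steps are explicit:

```python
import numpy as np

# Toy data: f(x) = 0.5*||x - b||^2, g = lam*||.||_1, L = I,
# so the minimiser of f(x) + g(Lx) is soft-thresholding of b.
b = np.array([3.0, -0.2, 1.0])
lam, tau, sigma = 1.0, 0.5, 0.5                  # tau * sigma * ||L||^2 <= 1

prox_f = lambda v: (v + tau * b) / (1.0 + tau)   # prox of tau*f (explicit)
prox_gs = lambda w: np.clip(w, -lam, lam)        # prox of sigma*g*: projection on [-lam, lam]

x = np.zeros_like(b)
y = np.zeros_like(b)
for _ in range(2000):
    x_new = prox_f(x - tau * y)                  # here L^T y = y since L = I
    y = prox_gs(y + sigma * (2.0 * x_new - x))   # over-relaxed point 2 x^{k+1} - x^k
    x = x_new

x_exact = np.sign(b) * np.maximum(np.abs(b) - lam, 0.0)
assert np.allclose(x, x_exact, atol=1e-6)        # converges to [2, 0, 0]
```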

We further define the fixed-point residual operator as

The notion of metric subregularity is of key importance in establishing linear convergence rates for several algorithms. Let us first state the definition:

**Definition.** A set-valued function defined on a Hilbert space is said to be **metrically subregular** at a for a if there is a constant and a neighbourhood of so that

for all .

Metric subregularity is equivalent to the calmness of the inverse mapping. It is a property which is weaker than metric regularity, bounded linear regularity for single-valued mappings and the popular Aubin property.

A common assumption which is used to establish linear rate of convergence is that the fixed-point residual is metrically subregular at a solution for , that is, there is a neighbourhood of so that

where is the set of fixed-points of . This condition is, however, difficult to verify in practice. Instead, we may use the following condition in conjunction with P3M. We state the following proposition which is not specific to the Chambolle-Pock algorithm.

**Proposition.** Let be a P3M step and let be the corresponding fixed-point residual. If is metrically sub-regular at for with modulus , then is metrically sub-regular at for with modulus .

*Proof.* For given take so that .

For , define for which we know that .

Equivalently, satisfies .

Then,

Note that and . Using the triangle inequality for the distance-to-set mapping, for

which completes the proof.

Shen and Pan (Prop. 3.1) provide conditions under which operators in the form are metrically sub-regular at a . A condition which is also mentioned in the book of Dontchev and Rockafellar requires to be an affine map and to have a polyhedral graph.

The condition of the above proposition is that for , there is a and a neighbourhood of so that

for all . The right hand side of this last inequality can be written as


Here we discuss the criteria under which the simple gradient method should be terminated for unconstrained minimisation problems

where is a *strictly convex* (but not strongly convex) function with -Lipschitz gradient.

The criterion is fine for strongly convex functions with -Lipschitz gradient. Indeed, if is -strongly convex, that is

then, for such that (the unique minimiser of ), we have

so, if , then , i.e., is -suboptimal.

But termination is a mysterious thing… In general, if is merely strictly convex, it is not true that we will have if , for some (not even locally). The reason for that is that the condition that has Lipschitz gradient, or the condition that is Lipschitz continuous, provides only an upper bound on the steepness of the function, but allows it to be arbitrarily flat – especially close to the optimum.

There might, notwithstanding, be specific cases where a favourable bound holds. Unless you make some additional assumptions on , this will not be a reliable termination criterion.

However, strong convexity is often too strong a requirement in practice. Weaker conditions are discussed in the article: D. Drusvyatskiy and A.S. Lewis, Error bounds, quadratic growth, and linear convergence of proximal methods, 2016.

Let be convex with -Lipschitz gradient and define . Let us assume that has a unique minimiser (e.g., is strictly convex). Then assume that has the property

for all for some . Functions which satisfy this property are not necessarily strongly convex. We say in that case that is a *strong minimum*. As an example we have , which has a strong minimum but is not strongly convex. Of course, the above holds if is strongly convex, and also when is given in the form , where is a strongly convex function and is any matrix.

Then, the above condition is shown to be equivalent to

for all and with .

Clearly in this case we may use the termination condition which will imply that .

There are, however, cases where, after all, the criterion implies some sort of bound on . The function , which is not strongly convex, is such an example. We may show that , which provides an error bound, although is not strongly convex. In general, if is not strongly convex and does not have a strong minimum, we may still derive error bounds based on other properties of the function at hand, but for the time being the above results are the most generic ones known.
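The gap between the two situations is easy to see in one dimension (a toy comparison of our own): for $f(x) = x^2/2$ a gradient of norm $\varepsilon$ confines the iterate to an $\varepsilon$-ball around the minimiser, whereas for $f(x) = x^4/4$, which is strictly convex but flat at $0$, the same gradient norm only gives $|x| = \varepsilon^{1/3}$:

```python
import numpy as np

eps = 1e-6

# Strongly convex case: f(x) = x^2/2, grad f(x) = x (mu = 1).
# ||grad f(x)|| <= eps implies ||x - x*|| <= eps.
x_sc = eps                       # the iterate with gradient exactly eps
assert abs(x_sc - 0.0) <= eps

# Merely strictly convex, flat at the minimum: f(x) = x^4/4, grad f(x) = x^3.
x_flat = eps ** (1.0 / 3.0)      # gradient norm is again exactly eps here
assert x_flat > 1000 * eps       # ...but the iterate is far from x* = 0
```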
