We will start by presenting the structure of scenario trees.

Discrete-time stochastic processes over a discrete probability space can be described with the simple structure of a scenario tree. A scenario tree is a structure such as the one shown in the following figure: this can be thought of as a type of graph whose nodes are organized in (time) stages.

Each node in the scenario tree is indexed with a unique index . The nodes are organised in stages . The root node resides in stage and is identified with . The nodes at the final stage are called *leaf nodes*. The nodes at stage are denoted as . A non-leaf node of the tree is linked to a set of nodes in the next stage, which are called its *children* and are denoted by . Every node , except for the root node, has a single *ancestor*, which resides in the previous stage and is denoted by .
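The tree structure described above can be sketched in code. The following Python snippet is illustrative only (the class and attribute names are my own, not from the text): each node stores its stage, its ancestor, the conditional probability of reaching it from its ancestor, and its children.

```python
# A minimal sketch of a scenario-tree data structure (illustrative names):
# each node stores its stage, its ancestor, and the conditional probability
# of being reached from that ancestor.

class Node:
    def __init__(self, idx, stage, ancestor=None, cond_prob=1.0):
        self.idx = idx              # unique node index
        self.stage = stage          # time stage
        self.ancestor = ancestor    # parent node (None for the root)
        self.cond_prob = cond_prob  # P(node | ancestor)
        self.children = []          # nodes at the next stage

    def add_child(self, idx, cond_prob):
        child = Node(idx, self.stage + 1, self, cond_prob)
        self.children.append(child)
        return child

# Root at stage 0 with two children; the first child branches again.
root = Node(0, 0)
a = root.add_child(1, 0.4)
b = root.add_child(2, 0.6)
a1 = a.add_child(3, 0.5)
a2 = a.add_child(4, 0.5)

def leaves(node):
    """Collect the leaf nodes below `node` (depth-first)."""
    if not node.children:
        return [node]
    return [leaf for c in node.children for leaf in leaves(c)]

print([n.idx for n in leaves(root)])
```

A non-leaf node's `children` list and the `ancestor` pointer implement exactly the child/ancestor relations of the text.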

The set of leaf nodes, , is equipped with the sigma-algebra ; we then assign a probability value to every node . This defines a probability measure on that space. This way becomes a probability space.

The definition of a probability space over as explained above, induces a probability space over . We equip with the sigma-algebra and for every we define .

Recursively, we construct probability spaces over for .

We have constructed a sequence of probability spaces . However, the way we have defined these probability spaces conceals the temporal nature of the stochastic process that generates them. In other words, it is as if we are dealing with a collection of dissociated and separate spaces with no apparent connection to one another. We shall remedy that in the following section, where we will identify all these spaces with subspaces of a common probability space.

In this section, we will revisit the construction of probability spaces. The starting point is the definition of as above. Every node has a set of children so that and are disjoint sets. In other words, we define a partition of by grouping together siblings (nodes with a common ancestor). We may define the sigma-algebra (this time on ) as the smallest sigma-algebra containing for .

This is illustrated in the following figure.

In Figure 2, the set of leaf nodes is . At we define the sigma-algebra – the discrete sigma-algebra. The nodes of the previous stage, define a partition on : in particular and . Now at we define the sigma-algebra . Last, for we always have that , the trivial sigma-algebra.

We have this way constructed a filtration on the probability space . This is a sequence of sub-sigma-algebras of , with the property for .

The way a filtration is defined reminds me of a verse in Andreas Empirikos’s poem “spinning mill of night rest” which goes “we are all **within** our future” (είμεθα όλοι εντός του μέλλοντός μας). A filtration encodes a *flow of information* from time , when we have no precise information about the progress of the random process (any outcome in is possible), until time , when we know exactly how the stochastic process has evolved.

We may define (real-valued or vector-valued) random variables on a scenario tree. These are always random variables on the probability space , that is, functions . The random variables which reside at stage are the ones which are measurable with .

We denote the space of random variables as . The spaces of -measurable random variables are denoted by . We have that .

Let be a sequence of random variables on . We say that the sequence is adapted to the filtration if for all , is -measurable (as shown in Figure 3).

In Figure 3, the constraints and are imposed so that the random process is adapted to the filtration. Such constraints are called non-anticipativity (or causality) constraints.

Suppose that the process is the decision variable of a stochastic multistage optimal control problem. Suppose, further, that are -valued random variables.

Let be a function and be a random variable on , i.e., a function . Then is a random variable on the same probability space with

with

To facilitate the presentation of the material in what follows, we will stick with a simpler class of random variables. We will study mappings so that, when composed with random variables as above, they yield the random variable .

Alongside stochastic optimal control formulations, we often need to impose that “ ”, for some , in some probabilistic sense. There are several different ways to pose such constraints, which we discuss below. As an example, we may require that for all ; however, this is most likely too conservative in a probabilistic setting. Instead, one may require that the probability for given .

Before we discuss how we may impose risk-based constraints on multistage scenario-based optimal control problems, we shall motivate the use of risk-based constraints on a space .

Let be an -valued random variable on a space and (as above). Probabilistic or chance constraints are those of the type , where is the underlying probability distribution.

Expectation constraints are those in the form , where is the expectation operator with respect to the underlying probability measure.

Probabilistic constraints can be cast as expectation constraints via the familiar formula where and denotes the characteristic function of .

Therefore, the probabilistic constraint can be written as .

Probabilistic constraints are equivalent to value-at-risk constraints. Recall that the value-at-risk of a real-valued random variable , at level is defined as for . The probabilistic constraint is equivalent to .

A tight convex overapproximation of the value-at-risk is the average value-at-risk which is defined as for and . The average value-at-risk is a convex functional with the property . This means that the above nonconvex probabilistic constraints can be relaxed by . This is exactly a risk constraint.
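The relation between the value-at-risk and its convex overapproximation can be checked numerically. The sketch below uses illustrative numbers and the Rockafellar–Uryasev representation of the average value-at-risk, AVaR(Z) = min over t of { t + (1/a)·E[(Z − t)₊] }, following the convention of Shapiro et al.:

```python
# Sketch: value-at-risk and average value-at-risk of a discrete random
# variable (illustrative outcomes and probabilities).
z = [0.0, 1.0, 2.0, 10.0]   # outcomes (losses)
p = [0.4, 0.3, 0.2, 0.1]    # probabilities
a = 0.1                     # risk level

def var(z, p, a):
    """Left (1 - a)-quantile of Z."""
    acc = 0.0
    for zi, pi in sorted(zip(z, p)):
        acc += pi
        if acc >= 1.0 - a - 1e-12:
            return zi

def avar(z, p, a):
    # For a discrete distribution, the minimum over t is attained at an outcome.
    return min(t + sum(pi * max(zi - t, 0.0) for zi, pi in zip(z, p)) / a
               for t in z)

print(var(z, p, a), avar(z, p, a))   # AVaR dominates VaR
```

Here AVaR at level 0.1 equals the mean of the worst 10% of outcomes, which is at least the corresponding quantile, illustrating the overapproximation property.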

Another way to motivate the introduction of risk constraints is the framework of ambiguous chance constraints. We require that the probabilistic constraint is satisfied for all probability measures in a set , in lieu of just . This means that is required to belong to the set

If (and only if) is a w*-closed, convex set of probability measures, the mapping is a coherent risk measure. The above ambiguous probabilistic constraint becomes:

We call stage-wise risk constraints those that apply at a single stage . Let again be a random process of decision variables which is adapted to the filtration . A stage risk constraint at a stage is a constraint of the form

where is a (coherent) risk measure, is an -measurable random variable and is the maximum allowed risk at stage .

Such constraints can be used as convex surrogates of probabilistic or expectation constraints at a given stage.

The problem with such constraints is that they treat the uncertainty at stage without examining how it is produced, how uncertainty propagates to yield the probability distribution at time , and how what happens at previous stages may affect the probability distribution at that stage.

Given a scenario tree, a scenario is an admissible (nonzero probability) path that connects the root node to a leaf node. There are as many scenarios as leaf nodes. For every leaf node , the corresponding scenario is the sequence , where is the ancestor of node .

Let be a random process on the scenario tree and let us denote by the value of at node of the scenario tree. For every , let us denote by the -tuple of values of on the scenario which connects the root node to node , that is .

Let be a function that maps those tuples to scalars and quantifies a constraint in the space of all scenarios. For example, one might choose

We may then impose constraints on the risk of the random variable with values .

Although this approach uses past information, it makes use of the probability space and disregards how this space is generated (i.e., the filtration ).

Next, we shall explain why it is important to take the filtration, i.e., the tree structure, into account.

In scenario-based optimal control it is important to keep in mind that the probability distribution at every stage of the tree depends on the probability distribution at all previous stages.

Every node *i* of the tree can be identified by a sequence of nodes that connect the root node () with that node in a unique way.

The probability of node *i* is equal to a product of conditional probabilities.
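This chain-rule structure is easy to verify in code. The following sketch uses a hypothetical two-mode Markovian tree (the transition probabilities are illustrative, not those of the figure): each scenario's probability is the product of the conditional probabilities along its path, and the probabilities of all scenarios of a given length sum to one.

```python
from itertools import product

# Sketch: on a Markovian scenario tree, the probability of a node is the
# product of conditional (transition) probabilities along the unique path
# from the root. Illustrative numbers.
P = {('A', 'A'): 0.7, ('A', 'B'): 0.3,
     ('B', 'A'): 0.2, ('B', 'B'): 0.8}   # hypothetical transition probabilities
init = {'A': 0.5, 'B': 0.5}              # hypothetical initial distribution

def scenario_prob(modes):
    """Probability of the scenario visiting `modes` (chain rule)."""
    p = init[modes[0]]
    for prev, nxt in zip(modes, modes[1:]):
        p *= P[(prev, nxt)]
    return p

# All length-3 scenarios: their probabilities sum to one.
total = sum(scenario_prob(m) for m in product('AB', repeat=3))
print(total)
```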

It should be well understood that a scenario tree describes the propagation of uncertainty in time. But what about the uncertainty on this uncertainty? What is the effect of a misestimation of the probabilities at a given stage on subsequent stages?

Let me try to illustrate that with the following figure.

The tree in the figure above is generated by the model of a Markovian system with two modes: A and B with estimated transition probability values , .

In the figure, the red nodes correspond to the occurrence of mode A, the nodes corresponding to mode B are the blue-ish ones. The gray nodes are to be ignored here.

Suppose that all nodes incur a zero cost except for node which has the cost value .

If we use stage-wise risk constraints using the average value-at-risk at level , then, at stage , we account for the probability vector with (instead of its nominal value ).

Again, using the stage-wise approach we described above and using the average value-at-risk at level , at the highest probability value that we may assign to node is .

However, errors at all preceding stages may build up. If, as mentioned previously, the transition probability has been misestimated and is actually equal to , the highest probability that corresponds to is — almost double the value .

Nested risk constraints seek to remedy the issues associated with uncertainty propagation.

Nested risk functionals apply on -measurable random variables and return a scalar value . Given a sequence of conditional risk mappings – where is conditioned by the sigma-algebra – we define

These have the dual representation

with

From that, we see that nested risk measures take into account how the uncertainty at all intermediate stages builds up to the uncertainty at stage .

The constraints

do not suffer from the pathologies of stage-wise risk constraints discussed above.

All these constraints (nested, stage-wise and scenario-wise) can be modeled in a way that allows the efficient solution of the associated optimization problems. More on that, soon.

- A. Shapiro, D. Dentcheva, and A. Ruszczyński, Lectures on stochastic programming: modeling and theory. Second Edition, SIAM, 2009.
- G. Pflug and W. Römisch, Modeling, Measuring and Managing Risk. World Scientific, 2007.
- A. Shapiro, Minimax and risk averse multistage stochastic programming, *European Journal of Operational Research* 219(3):719-726, 2012.

Let be a probability space and . Let denote the set of all probability measures on the measurable space .

Risk measures are defined as mappings . The notion of coherency for risk measures is an essential one; however, different authors use different conventions (e.g., monotonicity vs anti-monotonicity), so it is expedient to state the coherency axioms we use here:

- (Convexity). For every and ,
- (Monotonicity). If and almost surely, then
- (Translation equivariance). For ,
- (Positive homogeneity). For and , .

These are the coherency axioms as stated in the book *Lectures on Stochastic Programming: Modeling and Theory* by A. Shapiro, D. Dentcheva and A. Ruszczyński, SIAM, 2nd Edition, Philadelphia, 2014.

Here, we shall use the average value-at-risk as an example.

Let us recall that the average value-at-risk at level of a random variable has the following representation

where is the *admissibility set* of the average value-at-risk given by

We may associate every with a measure (sometimes denoted as ), which is a measure on so that for all sets it is .

In fact this is a probability measure: since .

Then, .

This way, we have

where, with a little abuse of notation we denote by the following set

This way we identify , which is a set of random variables, with the set given here above, which is a set of probability measures.

In general, coherent risk measures are written in the form

where is a (convex, w*-closed) set of probability measures.

Before we proceed let us recall the definition of the conditional expectation

of a random variable conditioned by a sigma-algebra , sub-sigma-algebra of .

A conditional expectation is an -measurable random variable which satisfies

for all and .

The conditional expectation is not unique, however, all versions of a conditional expectation differ only on a set of measure zero.

We shall define conditional risk mappings in a similar fashion: a conditional risk mapping of a random variable will be an -measurable random variable and it will suffice to define for all .

The conditional version of the average value-at-risk at level , conditioned by a sigma-algebra is a random variable which satisfies the following identity for all

Note that , therefore, .

By virtue of the monotone convergence theorem, the above can be extended for nonnegative -measurable random variables in lieu of yielding

for all .

More generally, the above identities for conditional risk mappings can be written in the following compact form

or, even more generally for all nonnegative random variables .

General conditional risk measures are defined by specifying a collection of sets of random variables of for all so that

Sets for nonnegative random variables may also be defined.

As we shall discuss in what follows, we may compose multiple conditional risk mappings.

Given two sigma-algebras and (sub-sigma-algebras of ) and an -measurable random variable, we define the conditional risk mapping which returns an -measurable random variable .

Its dual representation is given by

We observe that the tower property does not hold for compositions of conditional risk measures: for , is not the same as .

We will revisit these compositions in the following section.

Suppose that . In order to express the conditional average value-at-risk in terms of probability measures we introduce the probability measure (or, in equivalent notation ).

Recall that this is defined by for all .

In light of (*), we have that for all ,

, that is, ,

i.e., .

Clearly, , so is indeed a probability measure (). Moreover, .

Lastly, .

Let be a probability space and let be a finite length filtration. Recall that, according to the definition for all . We will further assume that and .

We define . Let be a sequence of conditional risk mappings, for .

These are mappings ; sometimes, these are defined just as .

The nested risk measure up to stage which is induced by a risk measure is

This is a mapping (or, often, ), so, they are risk measures.

We shall denote the nested average value-at-risk at level up to stage as . This is

where is the set

that is, all in are (finite-length) martingales.

The sequence and belongs to , therefore

for all .

Similarly, for general risk measures other than the average value-at-risk, we have

with

Such nested multiperiod risk measures are used in the formulation of multistage risk-averse optimization problems. In particular, for a finite-length stochastic process which is adapted to the above filtration, that is , we define the risk-averse cost .

This is

where we have used the translation equivariance property of conditional risk mappings.

- A. Shapiro, D. Dentcheva, and A. Ruszczyński, Lectures on stochastic programming: modeling and theory. Second Edition, SIAM, 2009.
- G. Pflug and W. Römisch, Modeling, Measuring and Managing Risk. World Scientific, 2007.
- A. Ruszczyński and A. Shapiro, Conditional risk mappings. Math. Oper. Res. 31(3):544-561, 2006.


Download it here: **probability cookbook v0.1**

Feedback is more than welcome!

Let us first recall a few definitions. A Markov jump linear system (MJLS) in discrete time is a stochastic dynamical system of the form

where is a Markov process.

Hereafter, we shall focus on Markov processes where takes values from a finite set.

We say that the above MJLS is stable in the **p-mean** if as . If , we say that the system is *mean-square stable*.

Let denote the Kronecker product of two matrices. The Kronecker product of with itself p times is denoted as (p times).

We say that the system is stable in the **p-order moment** if as .

If is iid, then stability in the *p*-mean implies stability in the *p*-order moment. Conversely, for even orders *p*, *p*-moment stability entails *p*-mean stability.

Consider a Markov jump linear system (MJLS), and

and

Let be independent and identically (uniformly) distributed.

This system comes from [1] and [8].

Next, we plot and starting from the initial condition .

We notice that the system is mean-square stable.

However, it is not stable in the fourth mean:

In case you wonder, the fourth moment diverges, as shown in the following plot

Let us have a look at the trajectories of the system:

And here we see the histograms of the norm of x(k) at three different time instants:

The example above comes from the PhD dissertation of Masaki Ogura [1] and originally appeared in [8].

According to Theorem 2.12 and Proposition 2.13 in that dissertation, provided that *p* is even and is iid, the system is stable in the *p*-order moment if and only if , where is the spectral norm.

In the above example, we have , while .

Therefore, the system is 2-moment stable and 2-mean stable (mean-square stable), but not 4-moment stable and not 4-mean stable.
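The spectral-radius test above can be sketched numerically. The two modes below are hypothetical scalar (1×1) examples of my own, not the matrices of [1] or [8]; for an i.i.d. uniform switching signal, the test reduces to the spectral radius of the averaged p-fold Kronecker powers of the mode matrices (an assumption spelled out in the comment):

```python
import numpy as np

# Sketch: for i.i.d. uniform switching and even p, p-order moment stability
# is equivalent to the matrix (1/N) * sum_i kron^p(A_i) having spectral
# radius < 1. Hypothetical 1x1 modes chosen so that the system is
# mean-square stable but not 4-moment stable.
A = [np.array([[0.5]]), np.array([[1.2]])]

def kron_power(M, p):
    """p-fold Kronecker product of M with itself."""
    out = M
    for _ in range(p - 1):
        out = np.kron(out, M)
    return out

def moment_radius(A, p):
    S = sum(kron_power(Ai, p) for Ai in A) / len(A)
    return float(np.max(np.abs(np.linalg.eigvals(S))))

print(moment_radius(A, 2))   # (0.25 + 1.44) / 2 = 0.845 < 1
print(moment_radius(A, 4))   # (0.0625 + 2.0736) / 2 > 1
```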

Further reading:

- Masaki Ogura, “Mean stability of switched linear systems,” PhD dissertation, Texas Tech University, 2014.
- Y. Fang and K. A. Loparo, “Stochastic stability of jump linear systems,” IEEE Transactions on Automatic Control, vol. 47, no. 7, pp. 1204–1208, 2002.
- Y. Fang and K. A. Loparo, “On the relationship between the sample path and moment Lyapunov exponents for jump linear systems,” IEEE Transactions on Automatic Control, vol. 47, no. 9, pp. 1556–1560, 2002.
- Y. Fang, K. A. Loparo, and X. Feng, “Almost sure and δ-moment stability of jump linear systems,” International Journal of Control, vol. 59, no. 5, pp. 1281–1307, 1994.
- R. M. Jungers and V. Y. Protasov, “Weak stability of switching dynamical systems and fast computation of the p-radius of matrices,” in 49th IEEE Conference on Decision and Control, pp. 7328–7333, 2010.
- O.L.V. Costa and M. Fragoso, “Comments on “Stochastic stability of jump linear systems”,” IEEE Transactions on Automatic Control, vol. 49, no. 8, pp. 1414–1416, 2004.
- X. Feng, K. A. Loparo, Y. Ji, and H. J. Chizeck, “Stochastic stability properties of jump linear systems,” IEEE Transactions on Automatic Control, vol. 31, no. 1, pp. 38–53, 1992.
- B. Hanlon, N. Wang, M. Egerstedt, and C. Martin, “Switched linear systems: stability and the convergence of random products,” Communications in Information and Systems, vol. 11, no. 4, pp. 327–342, 2011.

and , while, under additional conditions (which are typically met in finite-dimensional spaces), we have

Let be a probability space with . Define .

Real-valued random variables over this space are functions which can be identified by vectors of .

A **risk measure** over is a mapping .

Examples of risk measures are:

- The expectation:
- The essential supremum: ,
- The average value-at-risk at level , defined as , where (using the convention followed by Shapiro *et al.*),
- The entropic risk measure, defined as (a log-sum-exp function),

and many others.

A common requirement for risk measures is that they be **monotone**, that is, for with for all , it is . All of the above risk measures are monotone.
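The risk measures listed above can be evaluated directly on a small finite probability space. The sketch below uses illustrative outcomes and the AVaR convention of Shapiro et al.; it also illustrates that the expectation is dominated by the AVaR, which in turn is dominated by the essential supremum.

```python
import math

# Sketch: risk measures on a finite uniform probability space
# (illustrative numbers).
Z = [1.0, 2.0, 3.0, 10.0]
p = [0.25] * 4

E = sum(pi * zi for pi, zi in zip(p, Z))   # expectation
ess_sup = max(Z)                           # essential supremum

def avar(Z, p, a):
    # Rockafellar-Uryasev representation; minimum attained at an outcome.
    return min(t + sum(pi * max(zi - t, 0.0) for zi, pi in zip(Z, p)) / a
               for t in Z)

def entropic(Z, p, gamma=1.0):
    # log-sum-exp form of the entropic risk measure
    return math.log(sum(pi * math.exp(gamma * zi)
                        for zi, pi in zip(Z, p))) / gamma

# E[Z] <= AVaR_a(Z) <= ess sup Z for every level a in (0, 1].
print(E, avar(Z, p, 0.25), entropic(Z, p), ess_sup)
```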

Let . Then for all , can be seen as a random variable over .

Define , where is a nonempty set. Likewise, can be seen as a random variable over .

We shall hereafter assume that are finite for all those with .

In finite-dimensional spaces, all widely-used risk measures are **continuous** mappings. In fact, all **coherent** risk measures are written as the support function of a convex compact set, therefore, they are continuous. It is, however, sufficient to only assume that is continuous at .

We shall show that

For , define

By the definition of infimum, these sets are nonempty for all and are nested ( whenever ).

For we have

Because of the monotonicity of it is

By virtue of the first inequality in (*), taking the infimum over on both sides, we have

Making use of the second inequality in (*), and because of the fact that is continuous at (take and note that ) we have

which completes the proof.

Assume that is nonempty. It is easy to verify that , i.e., if then . Conversely, take a and suppose that . This means that there is a so that

[Note: the inequality should be interpreted as and for at least one index ]

The monotonicity property of implies that . Under the additional assumption of **strong monotonicity** ( is strongly monotone if whenever with ), we have , which leads to a contradiction because is a minimizer of over .

The interchangeability property of risk measures is used to establish the equivalence of the following formulations of multistage risk-averse problems:

(where are **conditional risk mappings**) and

Here is another use of the interchangeability property: assume that is (element-wise) lower semi-continuous (in the sense that are lower semi-continuous).

Then, and

This means that the inequality — which is a risk constraint — can be written as , which holds if there is a so that and .

References:

- A. Shapiro, D. Dentcheva, A. Ruszczyński, *Lectures on Stochastic Programming: Modeling and Theory*, Second Edition, SIAM, Philadelphia, 2014. See Proposition 6.60 and Section 6.8.1 (in particular the equivalence between (6.323) and (6.325)). Therein, the authors show the interchangeability property in general infinite-dimensional probability spaces.
- D. Bertsekas, *Convex Optimization Theory*, Athena Scientific, 2009.


- Equivalence of different formulations of cone programs
- Fenchel duality
- Primal-dual optimality conditions (OC)
- OCs as variational inequalities
- Homogeneous self-dual embeddings (HSDEs)
- OCs for HSDEs

A set in a vector space is called a *cone* if for every and it is closed under addition. is said to be a *pointed cone* if and imply . Pointed cones define the partial ordering:

The *polar* of a cone is a cone defined as

If is closed, then and the indicators of and are conjugate to each other, that is

The *dual cone* of is defined as

If is a closed, convex cone, then and .

This result is known as the *extended Farkas’ lemma*.

Interesting examples of cones are

- the set of *positive semidefinite matrices* , which is a closed, convex, pointed cone,
- the *ice cream cone* , and
- *polyhedral cones* , for some matrix .

The *conjugate* of a function is defined as

If , then . For all , if , .

The *subdifferential* of a convex function at a point is

The subdifferential of the indicator function of a convex set is

This function is called the *normal cone* of .

Moreau’s Theorem establishes an important duality correspondence between a cone and its polar cone :

For all , the following are equivalent:

- , , , ,
- ,
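Moreau's decomposition is easy to check numerically for the simplest cone. The sketch below (my own example, not from the text) uses the nonnegative orthant, whose polar cone is the nonpositive orthant: every point splits into its projections onto the cone and onto the polar, and the two projections are orthogonal.

```python
# Sketch of Moreau's decomposition for K = R^n_+ (nonnegative orthant),
# with polar cone K^o = R^n_-: x = P_K(x) + P_Ko(x), and the two
# projections are orthogonal.
x = [3.0, -1.5, 0.0, -2.0]

pK  = [max(xi, 0.0) for xi in x]   # projection onto K
pKo = [min(xi, 0.0) for xi in x]   # projection onto the polar cone

assert all(abs(a + b - xi) < 1e-12 for a, b, xi in zip(pK, pKo, x))
assert abs(sum(a * b for a, b in zip(pK, pKo))) < 1e-12   # orthogonality
print(pK, pKo)
```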

A *cone program* is an optimization problem of the form

with .

The constraint can be equivalently written as (i) , that is , or (ii) for . Problem is often written as

In the literature, we often encounter the following standard formulation for a cone program

These two formulations — problems and — are equivalent in the sense that we may transform one into the other.

For instance, starting from the second one, we clearly see that the constraints can be interpreted as (i) belongs to an affine space defined as and (ii) , that is, is contained in a cone. Essentially, must be in the intersection of a cone and an affine space.

Take so that and let be a matrix that spans the kernel of (which has dimension ), that is for all .

Then and the requirement that is written as , so the problem becomes

which is in the form of .

Conversely, can be written in the form of problem .

The dual of problem can be derived using the Fenchel duality framework.

Define and . Then, problem can be written as

whose Fenchel dual is

The conjugates of and are [note: The conjugate of is derived as follows: , so .]

and the Fenchel dual problem is

which is

Let be the optimal value of and be the optimal value of . Then, strong duality holds if the primal or the dual problem is strictly feasible, in which case .

Overall, we may derive the following optimality conditions for a pair to be a primal-dual optimal pair

These optimality conditions can be seen as the following conditions on

Here note that the equality is equivalent to zero duality gap, that is .

The optimality conditions of (using the splitting we introduced in the previous section) are simply

and, provided that has a nonempty relative interior, this is equivalent to

or equivalently

We have and

or

Since is a nonempty convex cone, we have (book of Hiriart-Urruty and Lemaréchal, Example 5.2.6 (a))

This yields the primal-dual optimality conditions we stated above.

Infeasibility and unboundedness (dual infeasibility) conditions are provided by the so-called *theorems of the alternative*. The *weak* theorems of the alternative state that

- (Primal feasibility). At most one of the following two systems is solvable
- (i.e., or for some )
- , and

- (Dual feasibility). At most one of the following two systems is solvable
- and
- and

The primal-dual optimality conditions we stated previously, together with these feasibility conditions, complete the picture.

Consider the following feasibility problem in and

Note that for and , the above equations collapse to the primal-dual optimality conditions. Second, due to the skew-symmetry of the above system, any solution and satisfies

which leads to , but since we already know that , it is , i.e., at least one of the two must be zero.

If and , then is a solution. If and , then the problem is either primal- or dual-infeasible. If , no definitive conclusion can be drawn.

Let us define and . Then, the self-dual embedding becomes

where . The problem can now be cast as an optimization problem

Furthermore, this is equivalent to the variational inequality

and .

This, then, becomes

Operator splitting algorithms can be used to solve such cone programs; more on that soon.


Pointwise convergence of , where , to a function is not in general sufficient to guarantee convergence either of to or of (which is a sequence of sets) to .

It is in principle easier to answer the question “under what conditions converges to ?” because the convergence of sets requires the introduction of a new notion of convergence.

As we can see in the animation below, we may have a sequence of *continuous* functions which converge *pointwise* to a function , but neither the infima nor the sets of minimisers converge as you would expect.

What we need here is an alternative notion of convergence of sequences of functions which is better suited for the study of the convergence of infima and minimiser sets. This is exactly the *epigraphical* convergence of .
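The failure of pointwise convergence can be sketched with a concrete (illustrative) counterexample of my own: with f_n(x) = min(x² + 1, n·|x − n|) we have f_n → f pointwise, where f(x) = x² + 1, yet inf f_n = 0 for every n while inf f = 1, and the minimizers x = n run off to infinity.

```python
# Sketch: pointwise convergence does not preserve infima or minimizers.
def f_n(x, n):
    return min(x ** 2 + 1, n * abs(x - n))   # pointwise limit is x^2 + 1

def f(x):
    return x ** 2 + 1

grid = [i / 100.0 for i in range(-1000, 1001)]   # grid on [-10, 10]
n = 5
inf_fn = min(f_n(x, n) for x in grid)   # attained at x = n, value 0
inf_f  = min(f(x) for x in grid)        # attained at x = 0, value 1
print(inf_fn, inf_f)
```

Epigraphical convergence is designed precisely to rule out such pathologies.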

Instead of looking at at individual points we look at the *epigraphs* . The epigraph of a function is defined as

We then need to introduce a suitable notion of convergence for sequences of sets. This is the *Painlevé–Kuratowski convergence*, or convergence in the Fell topology.

We then say that a sequence of functions converges *epigraphically* to a function – we denote – if the sequence of sets converges to the epigraph of in the Painlevé–Kuratowski sense.

Now, according to Thm. 7.33 in (1), assuming that the sequence is *eventually level-bounded* (the functions have bounded level sets for all , for some ), and that and are *lower semicontinuous* (i.e., they have closed epigraphs) and proper, then

and the sets are eventually nonempty and form a bounded sequence with

where here is the outer limit of a sequence of sets.

*Note.* In the Wikipedia article on Kuratowski convergence, the term *limit superior* is used in lieu of *outer limit*. In (1) the authors use the term *outer limit* instead, to avoid any confusion with the set-theoretic limit, which is not a topological notion.

Note that it is possible that contains more elements than the outer limit of .

A special case is when are eventually singletons. Then, we have a strong convergence result, but this would require additional assumptions such as *strict convexity*. It is otherwise quite difficult to establish conditions for the convergence of the minimisers unless we draw restrictive assumptions, e.g., that the functions are nested like .

It is also important to note that according to the above theorem, the limit of may not exist.

Regarding the *continuity* of , we should first define a space of functions and equip it with an appropriate topology in order to judge whether the set-valued functional

is continuous. This will be the space of *lower semi-continuous* and *proper* functions and the topology will be the topology of *total epi-convergence*. It would take a lot of time and effort to explain the notion of *total* epi-convergence, but according to Thm. 7.53 in (1), it is the same as epi-convergence in certain special cases:

when is further restricted to containing :

(i) only convex functions

(ii) only positive homogeneous functions

Also, if is nonincreasing, then epi-convergence of implies its total epi-convergence and, finally, if is equi-coercive, again epi-convergence is the same as total epi-convergence.

Under these assumptions, the mapping is outer semi-continuous.

(1) R.T. Rockafellar and R.J-B. Wets, Variational Analysis, Grundlehren der mathematischen Wissenschaften, vol. 317, Springer, Dordrecht 2000, ISBN: 978-3-540-62772-2.

We take the infimum which corresponds to the optimization problem that defines the projection on the epigraph and we have

where we have done the converse of an epigraphical relaxation; we define the function and notice that this is minimized at

and . This is of course only useful to the extent that is computable.

In mathematics, the concept of duality allows us to study an object by means of some other object called its dual. A linear operator can be studied via its *adjoint* operator . Certain properties of a Banach space can be studied via its *topological dual* . A convex set in can be seen as the intersection of a set of hyperplanes, that is, , and the latter is often a more convenient representation. These are examples of *dual objects* in mathematics. Likewise, in convex optimization, the dual object which corresponds to a convex function is its *convex conjugate*. When it comes to optimization problems, however, there are several ways in which we may derive dual optimization problems, leading to different formulations. This is because we first need to specify what exactly we dualize and how.

*Notation:* In what follows, denote two Hilbert spaces. Their inner product will be denoted by . With minor adaptations, the results presented here hold for Banach spaces as well. We denote the extended real line by .

In convex optimization theory, most duality concepts (including the Lagrange and Fenchel duality frameworks) source from the realization that **convex sets** can be represented as the **intersection of a set of hyperplanes**. This extends elegantly to proper, convex, lower semicontinuous functions which can be identified by their epigraphs. Recall that the epigraph of a function is the set .

Let us describe the supporting hyperplanes of the epigraph of a function. Let be an affine function with slope which is majorized by , that is for all .

Provided that is proper, convex and lower semicontinuous, for every slope there is a supporting hyperplane of the form for some . This is

The *convex conjugate* of at is the RHS of this equation which returns a value so that is a supporting hyperplane of the epigraph of .

Provided that is proper, convex and lower semicontinuous, knowing can tell us everything about . In fact, according to the Fenchel-Moreau theorem, , where .
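The Fenchel–Moreau theorem can be illustrated numerically. The sketch below (a grid approximation of my own, not a proof) uses f(x) = x²/2, whose conjugate is f*(y) = y²/2; conjugating twice recovers f on the grid.

```python
# Numerical sketch of conjugacy: f(x) = x^2 / 2 is self-conjugate, and
# the biconjugate f** recovers f (Fenchel-Moreau), on a discrete grid.
grid = [i / 20.0 for i in range(-100, 101)]   # grid on [-5, 5]

def conjugate(fvals):
    """fvals maps x -> f(x); returns y -> sup_x (x*y - f(x)) over the grid."""
    return {y: max(x * y - fx for x, fx in fvals.items()) for y in grid}

f = {x: 0.5 * x ** 2 for x in grid}
fs = conjugate(f)     # matches y^2 / 2 on the grid
fss = conjugate(fs)   # biconjugate

err = max(abs(fss[x] - f[x]) for x in grid if abs(x) <= 2)
print(err)            # zero up to rounding (boundary region excluded)
```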

The convex conjugates are the *dual* objects of convex functions and, as the Fenchel–Moreau theorem shows, they carry all the information about the functions themselves.

The framework of perturbation functions is perhaps the most elegant one in optimization theory that reveals the essence of dual optimization problems [1], [2]. In fact, every dualization (there is no unique way to define a dual optimization problem) is associated with a perturbation function.

Consider the optimization problem

We introduce a convex function , which we shall call a *perturbation function*, so that . Then, the optimization problem above can be equivalently written as

For , the following problem, which depends parametrically on , is called a *perturbed optimization problem*

Let be the optimal value of this problem. We are interested in determining . If is sufficiently regular, that is, proper, convex, lower semicontinuous (we shall discuss later when this is the case), then

otherwise .

The **dual problem** consists in finding .

By definition

therefore,

Therefore, the determination of requires the solution of the optimization problem

This is precisely the **Fenchel dual** optimization problem [3].

Let us see how exactly perturbation functions lead naturally to dual formulations. The convex conjugate of is

therefore,

As a result, the dual optimization problem can be written as

or, equivalently

Juxtapose this with the primal problem which is

Lagrangian duality is a particular type of duality for optimization problems of the form

where , and is meant in the component-wise sense.

We perturb the problem as follows

where . Define the set . The perturbed problem is written as

where is the indicator function of , that is

In other words, the perturbation function is defined as

Its convex conjugate is

where .

This is exactly the **Lagrangian dual** optimization problem, which is the Fenchel dual associated with a properly chosen perturbation function.
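A tiny worked example (my own, illustrative) makes the Lagrangian dual concrete: for the problem of minimizing x² subject to 1 − x ≤ 0, the optimal value is 1 (at x = 1), and the dual function g(λ) = min over x of { x² + λ(1 − x) } = λ − λ²/4 is maximized at λ = 2 with g(2) = 1, so there is no duality gap.

```python
# Sketch: Lagrangian duality on   minimize x^2  s.t.  1 - x <= 0.
def dual(lam):
    x_star = lam / 2.0   # unconstrained minimizer of the Lagrangian
    return x_star ** 2 + lam * (1.0 - x_star)

lams = [i / 1000.0 for i in range(0, 5001)]   # grid on [0, 5]
d_star = max(dual(l) for l in lams)
print(d_star)   # equals the primal optimal value 1: zero duality gap
```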

[1] R.I. Bot, S.M. Grad and G. Wanka, Fenchel-Lagrange Duality Versus Geometric Duality in Convex Optimization, JOTA 129(1):35-54, 2006.

[2] R.T. Rockafellar, R. J.-B. Wets, Variational Analysis, Springer, 2009.

[3] H. H. Bauschke, P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, Springer, 2011.

Quadratic constraints (of the form ) can be converted to second-order conic constraints.

Note that in the constraints may or may not be a decision variable (it may as well be a constant).

The following identity does the trick:

Then,

where is the second-order cone, also known as Lorentz cone or ice cream cone.

Define the linear operator

Then, the quadratic constraint becomes

Having for some symmetric positive semidefinite matrix is no different than what we just did since .
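The standard identity behind this conversion — that ‖x‖² ≤ yz with y, z ≥ 0 holds if and only if ‖(2x, y − z)‖ ≤ y + z — can be sanity-checked numerically (a check on random samples, not a proof; I am assuming this is the identity the text alludes to):

```python
import random

# Numerical check of:  ||x||^2 <= y*z, y,z >= 0  <=>  ||(2x, y-z)|| <= y+z.
random.seed(0)

def quad_side(x, y, z):
    return sum(xi * xi for xi in x) <= y * z and y >= 0 and z >= 0

def soc_side(x, y, z):
    lhs = (sum((2 * xi) ** 2 for xi in x) + (y - z) ** 2) ** 0.5
    return lhs <= y + z

for _ in range(10000):
    x = [random.uniform(-2, 2) for _ in range(3)]
    y = random.uniform(0, 4)
    z = random.uniform(0, 4)
    assert quad_side(x, y, z) == soc_side(x, y, z)
print("identity verified on random samples")
```

Squaring both sides of the second-order cone constraint gives 4‖x‖² + (y − z)² ≤ (y + z)², which simplifies to ‖x‖² ≤ yz, confirming the equivalence algebraically.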

]]>