Constrained optimization#

ECON2125/6012 Lecture 8 Fedor Iskhakov

Announcements & Reminders

  • Submit Test 3 before October 4, 23:59

  • The test will include the material from the last two lectures (unconstrained and constrained optimization) with some additional material on fundamentals of optimization from the lecture before (properties of optima and convexity)

  • Next week you will have a guest lecture by Dr. Esben Scriver Andersen who is a postdoc at RSE working on structural estimation of dynamic discrete choice and matching models
    Topic: numerical methods for practical optimization problems

_images/esben.jpeg

Plan for this lecture

  1. Constrained vs unconstrained optimization

  2. Equality constraints and the Lagrange method

  3. Inequality constraints and the Kuhn-Tucker method

  4. Constrained optimization under convexity

Supplementary reading:

  • Simon & Blume: 16.2, whole of chapter 18, 19.3

  • Sundaram: chapters 5 and 6

Setting up a constrained optimization problem#

Let’s start by recalling the definition of a general optimization problem

Definition

The general form of the optimization problem is

$$
V(\theta) = \max_x f(x,\theta)
\quad \text{subject to} \quad
\begin{array}{l}
g_i(x,\theta) = 0, \; i \in \{1,\dots,I\} \\
h_j(x,\theta) \le 0, \; j \in \{1,\dots,J\}
\end{array}
$$

where:

  • $f(x,\theta) \colon \mathbb{R}^N \times \mathbb{R}^K \to \mathbb{R}$ is an objective function

  • $x \in \mathbb{R}^N$ are decision/choice variables

  • $\theta \in \mathbb{R}^K$ are parameters

  • $g_i(x,\theta) = 0, \; i \in \{1,\dots,I\}$, where $g_i \colon \mathbb{R}^N \times \mathbb{R}^K \to \mathbb{R}$, are equality constraints

  • $h_j(x,\theta) \le 0, \; j \in \{1,\dots,J\}$, where $h_j \colon \mathbb{R}^N \times \mathbb{R}^K \to \mathbb{R}$, are inequality constraints

  • $V(\theta) \colon \mathbb{R}^K \to \mathbb{R}$ is a value function

Today we focus on the problem with constraints, i.e. $I + J > 0$

The obvious classification that we follow

  • equality constrained problems, i.e. I>0, J=0

  • inequality constrained problems, i.e. J>0, which can include equalities as special case

Note

Note that every optimization problem can be seen as constrained if we define the objective function to have the domain that coincides with the admissible set.

$$
\max_x f(x) \colon \mathbb{R}^N \to \mathbb{R}
\quad \text{subject to} \quad
g_i(x) = 0, \; i \in \{1,\dots,I\}; \quad
h_j(x) \le 0, \; j \in \{1,\dots,J\}
$$

is equivalent to

$$
\max_x f(x) \colon A \to \mathbb{R}, \qquad
A = \big\{ x \in \mathbb{R}^N \colon g_i(x) = 0, \; i \in \{1,\dots,I\} \text{ and } h_j(x) \le 0, \; j \in \{1,\dots,J\} \big\}
$$

Equality constraints and the Lagrange method#

Constrained Optimization#

A major focus of econ: the optimal allocation of scarce resources

  • optimal = optimization

  • scarce = constrained

Standard constrained problems:

  • Maximize utility given budget

  • Maximize portfolio return given risk constraints

  • Minimize cost given output requirement

Example

Maximization of utility subject to budget constraint

$$\max u(x_1, x_2) \; \text{ such that } \; p_1 x_1 + p_2 x_2 \le m$$

  • $p_i$ is the price of good $i$, assumed $p_i > 0$

  • $m$ is the budget, assumed $m > 0$

  • $x_i \ge 0$ for $i = 1, 2$

  • let $u(x_1, x_2) = \alpha \log(x_1) + \beta \log(x_2)$, $\alpha > 0$, $\beta > 0$

_images/budget_set_1.png

Fig. 86 Budget set when $p_1=1$, $p_2=1.2$, $m=4$#

_images/budget_set_2.png

Fig. 87 Budget set when $p_1=0.8$, $p_2=1.2$, $m=4$#

_images/log_util.png

Fig. 88 Log utility with α=0.4, β=0.5#

_images/log_util_contour.png

Fig. 89 Log utility with α=0.4, β=0.5#

We seek a bundle $(x_1^*, x_2^*)$ that maximizes $u$ over the budget set

That is,

$$\alpha \log(x_1^*) + \beta \log(x_2^*) \ge \alpha \log(x_1) + \beta \log(x_2)$$

for all $(x_1, x_2)$ satisfying $x_i \ge 0$ for each $i$ and

$$p_1 x_1 + p_2 x_2 \le m$$

Visually, here is the budget set and objective function:

_images/budget_set_3.png

Fig. 90 Utility max for p1=1, p2=1.2, m=4, α=0.4, β=0.5#

First observation: $u(0, x_2) = u(x_1, 0) = u(0, 0) = -\infty$

  • Hence we need to consider only strictly positive bundles

Second observation: u(x1,x2) is strictly increasing in both xi

  • Never choose a point (x1,x2) with p1x1+p2x2<m

  • Otherwise can increase u(x1,x2) by small change

Hence we can replace $\le$ with $=$ in the constraint:

$$p_1 x_1 + p_2 x_2 \le m \quad \text{becomes} \quad p_1 x_1 + p_2 x_2 = m$$

Implication: Just search along the budget line

Substitution Method#

Substitute constraint into objective function

This changes

$$\max_{x_1, x_2} \{\alpha \log(x_1) + \beta \log(x_2)\} \; \text{ such that } \; p_1 x_1 + p_2 x_2 = m$$

into

$$\max_{x_1} \{\alpha \log(x_1) + \beta \log((m - p_1 x_1)/p_2)\}$$

Since all candidates satisfy $x_1 > 0$ and $x_2 > 0$, the domain is

$$0 < x_1 < \frac{m}{p_1}$$
_images/one_dim_budget.png

Fig. 91 Utility max for p1=1, p2=1.2, m=4, α=0.4, β=0.5#

_images/log_util_budget_line.png

Fig. 92 Utility max for p1=1, p2=1.2, m=4, α=0.4, β=0.5#

First order condition for

$$\max_{x_1} \{\alpha \log(x_1) + \beta \log((m - p_1 x_1)/p_2)\}$$

gives

$$x_1^* = \frac{\alpha}{\alpha + \beta} \, \frac{m}{p_1}$$

Exercise: Verify

Exercise: Check second order condition (strict concavity)

Substituting into $p_1 x_1^* + p_2 x_2^* = m$ gives

$$x_2^* = \frac{\beta}{\alpha + \beta} \, \frac{m}{p_2}$$
_images/log_util_optimizer.png

Fig. 93 Maximizer for p1=1, p2=1.2, m=4, α=0.4, β=0.5#
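The closed-form maximizer can be cross-checked numerically. A minimal sketch using SciPy's `minimize_scalar` on the substituted one-dimensional problem (the parameter values are the ones used in the figures):

```python
import numpy as np
from scipy.optimize import minimize_scalar

# parameter values from the figures above
alpha, beta, p1, p2, m = 0.4, 0.5, 1.0, 1.2, 4.0

def neg_u(x1):
    """Negative of the substituted objective, with x2 = (m - p1*x1)/p2."""
    x2 = (m - p1 * x1) / p2
    return -(alpha * np.log(x1) + beta * np.log(x2))

# search over the admissible domain 0 < x1 < m/p1
res = minimize_scalar(neg_u, bounds=(1e-6, m / p1 - 1e-6), method='bounded')

x1_star = alpha / (alpha + beta) * m / p1  # closed-form maximizer
```

The numeric maximizer `res.x` agrees with the closed-form expression $x_1^* = \frac{\alpha}{\alpha+\beta}\frac{m}{p_1}$ up to the solver's tolerance.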

Substitution method: recipe#

How to solve

$$\max_{x_1, x_2} f(x_1, x_2) \; \text{ such that } \; g(x_1, x_2) = 0$$

Steps:

  1. Write constraint as $x_2 = h(x_1)$ for some function $h$

  2. Solve univariate problem $\max_{x_1} f(x_1, h(x_1))$ to get $x_1^*$

  3. Plug $x_1^*$ into $x_2 = h(x_1^*)$ to get $x_2^*$

Limitations: substitution doesn’t always work

Example

Suppose that $g(x_1, x_2) = x_1^2 + x_2^2 - 1$

Step 1 from the cookbook says

use g(x1,x2)=0 to write x2 as a function of x1

But $x_2$ has two possible values for each $x_1 \in (-1, 1)$:

$$x_2 = \pm \sqrt{1 - x_1^2}$$

In other words, $x_2$ is not a well defined function of $x_1$

As we meet more complicated constraints such problems intensify

  • Constraint defines complex curve in (x1,x2) space

  • Inequality constraints, etc.

We need more general, systematic approach

Tangency#

Consider again the problem

$$\max_{x_1, x_2} \{\alpha \log(x_1) + \beta \log(x_2)\} \; \text{ such that } \; p_1 x_1 + p_2 x_2 = m$$

Turns out that the maximizer has the following property:

  • budget line is tangent to an indifference curve at maximizer

_images/log_util_maximizer_tangent.png

Fig. 94 Maximizer for p1=1, p2=1.2, m=4, α=0.4, β=0.5#

In fact this is an instance of a general pattern

Notation: Let’s call $(x_1, x_2)$ interior to the budget line if $x_i > 0$ for $i = 1, 2$ (not a “corner” solution, see below)

In general, any interior maximizer $(x_1^*, x_2^*)$ of differentiable utility function $u$ has the property: budget line is tangent to a contour line of $u$ at $(x_1^*, x_2^*)$

Otherwise we can do better:

_images/log_util_not_maximizer.png

Fig. 95 When tangency fails we can do better#

Necessity of tangency often rules out a lot of points. We exploit this fact to

  • Build intuition

  • Develop more general methods

Diversion: derivative of an implicit function

To continue with the tangency approach we need to know how to differentiate an implicit function $x_1(x_2)$ which is given by a contour line of the utility function $f(x_1, x_2) = c$

Let’s approach this task in the following way

  • utility function is given by $f \colon \mathbb{R}^2 \to \mathbb{R}$

  • the contour line is given by $f(x_1, x_2) = c$, which defines an implicit function $x_1(x_2)$

  • let the map $g \colon \mathbb{R} \to \mathbb{R}^2$ be given by $g(x_2) = \big( x_1(x_2), x_2 \big)$

The total derivative of $f \circ g$ can be derived using the chain rule

$$Dg(x_2) = \begin{pmatrix} \dfrac{dx_1}{dx_2} \\ 1 \end{pmatrix}, \qquad
Df(x) = \big( f_{x_1}(x), f_{x_2}(x) \big) = \big( f_1(x), f_2(x) \big)$$

$$D(f \circ g)(x_2) = Df\big(g(x_2)\big) \, Dg(x_2) = f_1\big(g(x_2)\big) \frac{dx_1}{dx_2} + f_2\big(g(x_2)\big)$$

Differentiating the equation $f(x_1, x_2) = c$ on both sides with respect to $x_2$ gives $D(f \circ g)(x_2) = 0$, thus leading to the final expression for the derivative of the implicit function

$$\frac{dx_1}{dx_2}(x_2) = -\frac{f_2(x_1, x_2)}{f_1(x_1, x_2)}$$
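The implicit-function formula can be sanity-checked with a finite difference. A small sketch, where the log-utility contour is an assumption chosen because it can be inverted explicitly:

```python
import numpy as np

# contour alpha*log(x1) + beta*log(x2) = c, solved explicitly for x1
alpha, beta, c = 0.4, 0.5, 0.3

def x1_of_x2(x2):
    return np.exp((c - beta * np.log(x2)) / alpha)

x2 = 1.5
x1 = x1_of_x2(x2)

# implicit-function formula: dx1/dx2 = -f2/f1 with f1 = alpha/x1, f2 = beta/x2
slope_formula = -(beta / x2) / (alpha / x1)

# central finite difference along the contour
h = 1e-6
slope_fd = (x1_of_x2(x2 + h) - x1_of_x2(x2 - h)) / (2 * h)
```

The two slopes agree to several decimal places, confirming the sign and magnitude of the formula.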

Using Tangency: Relative Slope Conditions#

Consider an equality constrained optimization problem where objective and constraint functions are differentiable:

$$f(x_1, x_2) \to \max_{x_1, x_2} \; \text{ such that } \; g(x_1, x_2) = 0$$

How to develop necessary conditions for optima via tangency?

_images/tangency_1.png

Fig. 96 Contours for f and g#

_images/tangency_2.png

Fig. 97 Contours for f and g#

_images/tangency_3.png

Fig. 98 Tangency necessary for optimality#

How do we locate an optimal (x1,x2) pair?

_images/tangency_4.png

Fig. 99 Slope of contour lines#

Let’s choose $(x_1^*, x_2^*)$ to equalize the slopes

That is, choose $(x_1^*, x_2^*)$ to solve

$$-\frac{f_1(x_1^*, x_2^*)}{f_2(x_1^*, x_2^*)} = -\frac{g_1(x_1^*, x_2^*)}{g_2(x_1^*, x_2^*)}$$

Equivalent:

$$\frac{f_1(x_1^*, x_2^*)}{f_2(x_1^*, x_2^*)} = \frac{g_1(x_1^*, x_2^*)}{g_2(x_1^*, x_2^*)}$$

Also need to respect $g(x_1^*, x_2^*) = 0$

_images/tangency_5.png

Fig. 100 Condition for tangency#

Tangency condition algorithm#

In summary, when $f$ and $g$ are both differentiable functions, to find candidates for optima in

$$\max_{x_1, x_2} f(x_1, x_2) \; \text{ such that } \; g(x_1, x_2) = 0$$

  1. (Impose slope tangency) Set

$$\frac{f_1(x_1, x_2)}{f_2(x_1, x_2)} = \frac{g_1(x_1, x_2)}{g_2(x_1, x_2)}$$

  2. (Impose constraint) Set $g(x_1, x_2) = 0$

  3. Solve simultaneously for $(x_1, x_2)$ pairs satisfying these conditions

Example

Consider again

$$\max_{x_1, x_2} \{\alpha \log(x_1) + \beta \log(x_2)\} \; \text{ such that } \; p_1 x_1 + p_2 x_2 - m = 0$$

Then

$$\frac{f_1(x_1, x_2)}{f_2(x_1, x_2)} = \frac{g_1(x_1, x_2)}{g_2(x_1, x_2)}
\quad \Longleftrightarrow \quad
\frac{\alpha}{\beta} \, \frac{x_2}{x_1} = \frac{p_1}{p_2}$$

Solving simultaneously with $p_1 x_1 + p_2 x_2 = m$ gives

$$x_1^* = \frac{\alpha}{\alpha + \beta} \, \frac{m}{p_1} \quad \text{and} \quad x_2^* = \frac{\beta}{\alpha + \beta} \, \frac{m}{p_2}$$

Same as before…
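The tangency condition together with the constraint forms a two-equation system that can also be handed to a numerical root finder. A minimal sketch with SciPy's `fsolve`, assuming the parameter values used in the figures:

```python
import numpy as np
from scipy.optimize import fsolve

alpha, beta, p1, p2, m = 0.4, 0.5, 1.0, 1.2, 4.0

def conditions(z):
    x1, x2 = z
    # tangency f1/f2 = g1/g2, cross-multiplied as f1*g2 - f2*g1 = 0
    tangency = (alpha / x1) * p2 - (beta / x2) * p1
    constraint = p1 * x1 + p2 * x2 - m
    return [tangency, constraint]

x1_star, x2_star = fsolve(conditions, x0=[1.0, 1.0])
```

The root agrees with the closed-form $x_1^* = \frac{\alpha}{\alpha+\beta}\frac{m}{p_1}$ and $x_2^* = \frac{\beta}{\alpha+\beta}\frac{m}{p_2}$.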

Slope Conditions for Minimization#

Good news: The conditions are exactly the same

In particular:

  • Lack of tangency means the point is not an optimizer

  • Constraint must be satisfied

_images/tangency_min_1.png

Fig. 101 Lack of tangency#

_images/tangency_min_2.png

Fig. 102 Condition for tangency#

The method of Lagrange multipliers#

Lagrange theorem

Let $f \colon \mathbb{R}^N \to \mathbb{R}$ and $g \colon \mathbb{R}^N \to \mathbb{R}^K$ be continuously differentiable functions.

Let $D = U \cap \{x \colon g_i(x) = 0, \, i = 1,\dots,K\}$, where $U \subset \mathbb{R}^N$ is an open set.

Suppose that $x^* \in D$ is a local extremum of $f$ on $D$ and that the gradients of the constraint functions $g_i$ are linearly independent at $x^*$ (equivalently, the rank of the Jacobian matrix $Dg(x^*)$ is equal to $K$)

Then there exists a vector $\lambda^* \in \mathbb{R}^K$ such that

$$Df(x^*) - \lambda^* \cdot Dg(x^*) = Df(x^*) - \sum_{i=1}^{K} \lambda_i^* \, Dg_i(x^*) = 0$$

Note

  • λ is called the vector of Lagrange multipliers

  • The assumption that the gradients of the constraint functions gi are linearly independent at x is called the constraint qualification assumption

  • Lagrange theorem provides necessary first order conditions for both maximum and minimum problems

Definition

The function $\mathcal{L} \colon \mathbb{R}^{N+K} \to \mathbb{R}$ defined as

$$\mathcal{L}(x, \lambda) = f(x) - \lambda \cdot g(x) = f(x) - \sum_{i=1}^{K} \lambda_i g_i(x)$$

is called a Lagrangian (function)

  • Different sources define the Lagrangian with minus or plus in front of the second term: mathematically the two definitions are equivalent, but the choice of the sign affects the interpretation of the Lagrange multipliers

  • I like to have a minus in the Lagrangian for consistency with the inequality constrained case where the sign is important

  • We come back to this in lecture 10

Fact

First order conditions for the Lagrangian coincide with the conditions in the Lagrange theorem, and together with the constraint qualification assumption they provide necessary first order conditions for the constrained optimization problem.

Example

$$\max_{x_1, x_2} \{\alpha \log(x_1) + \beta \log(x_2)\} \; \text{ such that } \; p_1 x_1 + p_2 x_2 - m = 0$$

Form the Lagrangian

$$\mathcal{L}(x_1, x_2, \lambda) = \alpha \log(x_1) + \beta \log(x_2) - \lambda (p_1 x_1 + p_2 x_2 - m)$$

FOC

$$\frac{\partial \mathcal{L}}{\partial x_1} = 0 \implies \frac{\alpha}{x_1} - \lambda p_1 = 0$$

$$\frac{\partial \mathcal{L}}{\partial x_2} = 0 \implies \frac{\beta}{x_2} - \lambda p_2 = 0$$

$$\frac{\partial \mathcal{L}}{\partial \lambda} = 0 \implies p_1 x_1 + p_2 x_2 - m = 0$$

From the first two equations we have $x_1 = \frac{\alpha}{\lambda p_1}$ and $x_2 = \frac{\beta}{\lambda p_2}$, and then from the third one $\frac{\alpha}{\lambda} + \frac{\beta}{\lambda} = m$. This gives $\lambda^* = \frac{\alpha + \beta}{m}$, leading to the same result

$$x_1^* = \frac{\alpha}{\alpha + \beta} \, \frac{m}{p_1} \quad \text{and} \quad x_2^* = \frac{\beta}{\alpha + \beta} \, \frac{m}{p_2}$$
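The three first order equations can likewise be solved numerically over $(x_1, x_2, \lambda)$ jointly. A minimal sketch with SciPy's `fsolve`, assuming the same illustrative parameter values as in the figures:

```python
import numpy as np
from scipy.optimize import fsolve

alpha, beta, p1, p2, m = 0.4, 0.5, 1.0, 1.2, 4.0

def foc(z):
    x1, x2, lam = z
    return [alpha / x1 - lam * p1,   # dL/dx1 = 0
            beta / x2 - lam * p2,    # dL/dx2 = 0
            m - p1 * x1 - p2 * x2]   # dL/dlam = 0

x1s, x2s, lam_s = fsolve(foc, x0=[1.0, 1.0, 1.0])
```

The numeric multiplier matches $\lambda^* = (\alpha + \beta)/m$ and the quantities match the closed-form demands.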

Lagrangian as tangency condition#

The “standard machine” for optimization with equality constraints

$$\max_{x_1, x_2} f(x_1, x_2) \; \text{ such that } \; g(x_1, x_2) = 0$$

Set

$$\mathcal{L}(x_1, x_2, \lambda) := f(x_1, x_2) - \lambda g(x_1, x_2)$$

and solve the system of

$$\frac{\partial \mathcal{L}}{\partial x_1} = 0, \quad \frac{\partial \mathcal{L}}{\partial x_2} = 0, \quad \frac{\partial \mathcal{L}}{\partial \lambda} = 0$$

We have

$$\frac{\partial \mathcal{L}}{\partial x_i}(x_1, x_2, \lambda) = 0 \iff f_i(x_1, x_2) = \lambda g_i(x_1, x_2), \quad i = 1, 2$$

Hence $\frac{\partial \mathcal{L}}{\partial x_1} = \frac{\partial \mathcal{L}}{\partial x_2} = 0$ gives

$$\frac{f_1(x_1, x_2)}{f_2(x_1, x_2)} = \frac{g_1(x_1, x_2)}{g_2(x_1, x_2)}$$

Finally

$$\frac{\partial \mathcal{L}}{\partial \lambda}(x_1, x_2, \lambda) = 0 \iff g(x_1, x_2) = 0$$

Hence the method leads us to the same conditions as the tangency condition method!

Second order conditions#

  • similar to the unconstrained case, but applied to the Lagrangian

  • have to do with local optima only

  • similarly, provide sufficient conditions in the strict case, but only necessary conditions in the weak case

Using the rules of differentiation from multivariate calculus we can express the gradient and the Hessian of the Lagrangian w.r.t. $x$ as

$$\nabla_x \mathcal{L}(\lambda, x) = Df(x) - \lambda \cdot Dg(x), \qquad
H_x \mathcal{L}(\lambda, x) = Hf(x) - \lambda \cdot Hg(x)$$

Fact (necessary SOC)

Let $f \colon \mathbb{R}^N \to \mathbb{R}$ and $g \colon \mathbb{R}^N \to \mathbb{R}^K$ be twice continuously differentiable functions ($C^2$) and let $D = U \cap \{x \colon g_i(x) = 0, \, i = 1,\dots,K\}$, where $U \subset \mathbb{R}^N$ is an open set.

Suppose $x^* \in D$ is a local constrained optimizer of $f$ on $D$ and that the constraint qualification assumption holds at $x^*$.

Then:

  • If $f$ has a local maximum on $D$ at $x^*$, then $H_x \mathcal{L}(\lambda^*, x^*)$ defined above is negative semi-definite on the linear constraint set $\{v \colon Dg(x^*) v = 0\}$, that is

$$\forall v \in \mathbb{R}^N \text{ with } Dg(x^*) v = 0 \implies v' H_x \mathcal{L}(\lambda^*, x^*) v \le 0$$

  • If $f$ has a local minimum on $D$ at $x^*$, then $H_x \mathcal{L}(\lambda^*, x^*)$ defined above is positive semi-definite on the linear constraint set $\{v \colon Dg(x^*) v = 0\}$, that is

$$\forall v \in \mathbb{R}^N \text{ with } Dg(x^*) v = 0 \implies v' H_x \mathcal{L}(\lambda^*, x^*) v \ge 0$$

Fact (sufficient SOC)

Let $f \colon \mathbb{R}^N \to \mathbb{R}$ and $g \colon \mathbb{R}^N \to \mathbb{R}^K$ be twice continuously differentiable functions ($C^2$) and let $D = U \cap \{x \colon g_i(x) = 0, \, i = 1,\dots,K\}$, where $U \subset \mathbb{R}^N$ is an open set.

Suppose there exists $x^* \in D$ such that $\mathrm{rank}(Dg(x^*)) = K$ and $Df(x^*) - \lambda^* \cdot Dg(x^*) = 0$ for some $\lambda^* \in \mathbb{R}^K$, i.e. the constraint qualification assumption and the FOC are satisfied at $x^*$

Then:

  • If $H_x \mathcal{L}(\lambda^*, x^*)$ defined above is negative definite on the linear constraint set $\{v \colon Dg(x^*) v = 0\}$, that is

$$\forall v \ne 0 \text{ with } Dg(x^*) v = 0 \implies v' H_x \mathcal{L}(\lambda^*, x^*) v < 0,$$

then $x^*$ is a local constrained maximum of $f$ on $D$

  • If $H_x \mathcal{L}(\lambda^*, x^*)$ defined above is positive definite on the linear constraint set $\{v \colon Dg(x^*) v = 0\}$, that is

$$\forall v \ne 0 \text{ with } Dg(x^*) v = 0 \implies v' H_x \mathcal{L}(\lambda^*, x^*) v > 0,$$

then $x^*$ is a local constrained minimum of $f$ on $D$

Thus, the structure of both the necessary and the sufficient conditions is very similar to the unconstrained case.

The important difference is that the definiteness of the Hessian is checked on the linear constraint set {v:Dg(x)v=0} rather than on the whole RN.

The linear constraint set represents the tangent hyperplane to the constraint set at x.

But how do we check for definiteness on a linear constraint set?

Definition

The Hessian of the Lagrangian with respect to $(\lambda, x)$ is called a bordered Hessian, and is given by

$$H\mathcal{L}(\lambda, x) = \begin{pmatrix}
[0]_{K \times K} & [Dg(x)]_{K \times N} \\
[Dg(x)']_{N \times K} & [Hf(x) - \lambda \cdot Hg(x)]_{N \times N}
\end{pmatrix}$$

Similarly to the Hessian w.r.t. $x$ above, we can derive the bordered Hessian using the rules of differentiation from multivariate calculus, differentiating the gradient of the Lagrangian w.r.t. $(\lambda, x)$, which is given by

$$\nabla \mathcal{L}(\lambda, x) = \big( -g(x), \; Df(x) - \lambda \cdot Dg(x) \big)$$
  • the subscripts in the definition above denote the shape of the blocks in the bordered Hessian

  • some textbooks place the “border”, i.e. the Jacobians of the constraints, in the lower right corner and the Hessian w.r.t. $x$ of the objective in the upper left corner, with corresponding changes in the propositions below

  • the sign for the second term in the Lagrangian also appears and may differ between different texts

Definition

Consider an $n \times n$ matrix $A$ and let $0 < k \le n$.

The leading principal minor of order $k$ of $A$ is the determinant of the matrix obtained from $A$ by deleting its last $n - k$ rows and columns.

Example

$$A = \begin{pmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{n1} & \cdots & a_{nn}
\end{pmatrix}$$

The leading principal minors of order $k = 1, \dots, n$ of $A$ are

$$M_1 = \det(a_{11}) = a_{11}$$

$$M_2 = \det \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} = a_{11} a_{22} - a_{12} a_{21}$$

$$M_3 = \det \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \quad \dots, \quad M_n = \det(A)$$
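Leading principal minors are straightforward to compute numerically. A small helper sketch in NumPy, where the example matrix is an arbitrary illustration:

```python
import numpy as np

def leading_principal_minors(A):
    """Determinants of the upper-left k-by-k submatrices, k = 1..n."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]

# arbitrary illustrative 3x3 matrix
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
minors = leading_principal_minors(A)  # M1 = 2, M2 = 5, M3 = 18
```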

Fact

In the same settings as the propositions above, namely for $f \colon \mathbb{R}^N \to \mathbb{R}$ and $g \colon \mathbb{R}^N \to \mathbb{R}^K$ twice continuously differentiable functions ($C^2$) and $D = U \cap \{x \colon g_i(x) = 0, \, i = 1,\dots,K\}$, where $U \subset \mathbb{R}^N$ is an open set:

$H_x \mathcal{L}(\lambda^*, x^*)$ is negative semi-definite on the linear constraint set $\{v \colon Dg(x^*) v = 0\}$ $\iff$ the last $N - K$ leading principal minors of $H\mathcal{L}(\lambda^*, x^*)$ are zero or alternate in sign, with the sign of the determinant of the full matrix equal to $(-1)^N$.

$H_x \mathcal{L}(\lambda^*, x^*)$ is positive semi-definite on the linear constraint set $\{v \colon Dg(x^*) v = 0\}$ $\iff$ the last $N - K$ leading principal minors of $H\mathcal{L}(\lambda^*, x^*)$ are zero or have the same sign as $(-1)^K$.

$H_x \mathcal{L}(\lambda^*, x^*)$ is negative definite on the linear constraint set $\{v \colon Dg(x^*) v = 0\}$ $\iff$ the last $N - K$ leading principal minors of $H\mathcal{L}(\lambda^*, x^*)$ are non-zero and alternate in sign, with the sign of the determinant of the full matrix equal to $(-1)^N$.

$H_x \mathcal{L}(\lambda^*, x^*)$ is positive definite on the linear constraint set $\{v \colon Dg(x^*) v = 0\}$ $\iff$ the last $N - K$ leading principal minors of $H\mathcal{L}(\lambda^*, x^*)$ are non-zero and have the same sign as $(-1)^K$.

  • Similarly to the unconstrained case, when none of the listed conditions for the bordered Hessian hold, it is indefinite and the point tested by the second order conditions is not a constrained optimizer.

Fact

The determinant of a $3 \times 3$ matrix can be computed by the triangles rule:

$$\det \begin{pmatrix}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{pmatrix}
= \begin{array}{l}
+ \, a_{11} a_{22} a_{33} + a_{12} a_{23} a_{31} + a_{13} a_{21} a_{32} \\
- \, a_{13} a_{22} a_{31} - a_{12} a_{21} a_{33} - a_{11} a_{23} a_{32}
\end{array}$$
_images/det33.png

Example

Let’s check the second order conditions for the consumer choice problem.

$$\mathcal{L}(x_1, x_2, \lambda) = \alpha \log(x_1) + \beta \log(x_2) - \lambda (p_1 x_1 + p_2 x_2 - m)$$

$$\nabla_x \mathcal{L}(\lambda, x) = \Big( \frac{\alpha}{x_1} - \lambda p_1, \; \frac{\beta}{x_2} - \lambda p_2 \Big)$$

$$H_x \mathcal{L}(\lambda, x) = \begin{pmatrix} -\frac{\alpha}{x_1^2} & 0 \\ 0 & -\frac{\beta}{x_2^2} \end{pmatrix}$$

$$\nabla \mathcal{L}(\lambda, x) = \Big( m - x_1 p_1 - x_2 p_2, \; \frac{\alpha}{x_1} - \lambda p_1, \; \frac{\beta}{x_2} - \lambda p_2 \Big)$$

$$H\mathcal{L}(\lambda, x) = \begin{pmatrix}
0 & p_1 & p_2 \\
p_1 & -\frac{\alpha}{x_1^2} & 0 \\
p_2 & 0 & -\frac{\beta}{x_2^2}
\end{pmatrix}$$

Solving FOC we have as before

$$x^* = \Big( \frac{\alpha}{\alpha+\beta} \frac{m}{p_1}, \; \frac{\beta}{\alpha+\beta} \frac{m}{p_2} \Big) \quad \text{and} \quad \lambda^* = \frac{\alpha+\beta}{m}$$

To check the second order conditions compute the bordered Hessian at $x^*$

$$H\mathcal{L}(\lambda^*, x^*) = \begin{pmatrix}
0 & p_1 & p_2 \\
p_1 & -\frac{(\alpha+\beta)^2 p_1^2}{\alpha m^2} & 0 \\
p_2 & 0 & -\frac{(\alpha+\beta)^2 p_2^2}{\beta m^2}
\end{pmatrix}$$

Because $N = 2$ and $K = 1$ we need to check only the last leading principal minor, which is the determinant of the whole $H\mathcal{L}(\lambda^*, x^*)$. Using the triangle rule for computing determinants of $3 \times 3$ matrices, we have

$$\det \big( H\mathcal{L}(\lambda^*, x^*) \big) = \frac{(\alpha+\beta)^2 p_1^2 p_2^2}{\alpha m^2} + \frac{(\alpha+\beta)^2 p_1^2 p_2^2}{\beta m^2} = \frac{(\alpha+\beta)^3 p_1^2 p_2^2}{\alpha \beta m^2} > 0$$

We have that the determinant of the bordered Hessian has the same sign as $(-1)^N$, and the condition for sign alternation is satisfied for the $N - K = 1$ last leading principal minors.

Hence, $H_x \mathcal{L}(\lambda^*, x^*)$ is negative definite on the linear constraint set, implying that $x^*$ is the local constrained maximum of $f$ on $D$ by the sufficient second order condition.
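The sign of the determinant can be confirmed numerically by evaluating the bordered Hessian at the stationary point. A minimal sketch, assuming the figure parameter values:

```python
import numpy as np

alpha, beta, p1, p2, m = 0.4, 0.5, 1.0, 1.2, 4.0

# stationary point from the FOC
x1 = alpha / (alpha + beta) * m / p1
x2 = beta / (alpha + beta) * m / p2

# bordered Hessian: zero block, constraint Jacobian border, Hessian of L wrt x
H = np.array([[0.0,  p1,              p2            ],
              [p1,  -alpha / x1**2,   0.0           ],
              [p2,   0.0,            -beta / x2**2  ]])

d = np.linalg.det(H)  # positive, i.e. the sign of (-1)^N with N = 2
```

The numeric determinant matches the analytic expression $(\alpha+\beta)^3 p_1^2 p_2^2 / (\alpha \beta m^2)$.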

Lagrangian method: recipe#

  1. Write down the Lagrangian function L(x,λ)

  2. Find all stationary points of L(x,λ) with respect to x and λ, i.e. solve the system of first order equations

  3. Derive the bordered Hessian $H\mathcal{L}(x,\lambda)$ and compute its value at the stationary points

  4. Using second order conditions, check if the stationary points are local optima

  5. Compare the function values at all identified local optima to find the global one

  • in practice second order conditions (steps 3 and 4) may be skipped when for example the global shape of the function is known (convex case)

When does Lagrange method fail?#

Example

$$f(x, y) = -y \to \max_{x, y} \quad \text{subject to} \quad g(x, y) = y^3 - x^2 = 0$$

Proceed according to the Lagrange recipe

$$\mathcal{L}(x, y, \lambda) = -y - \lambda (y^3 - x^2)$$

$$\nabla \mathcal{L}(x, y, \lambda) = \big( x^2 - y^3, \; 2 \lambda x, \; -3 \lambda y^2 - 1 \big)$$

The FOC system of equations

$$\begin{cases}
2 \lambda x = 0 \\
3 \lambda y^2 + 1 = 0 \\
x^2 - y^3 = 0
\end{cases}$$

does not have solutions! From the second equation $\lambda \ne 0$, then from the first equation $x = 0$, and so $y = 0$ from the third, but then the second equation is not satisfied.

However, we can deduce that $(x^*, y^*) = (0, 0)$ is a maximizer in the following way. All points $(x, y)$ which satisfy the constraint have $y^3 = x^2 \ge 0$, and thus the maximum attainable value of $f(x, y) = -y$ is $0$, which it takes at $(0, 0)$.
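A quick numeric illustration of both claims, reading the example's objective as $f(x,y) = -y$ and the constraint as $y^3 = x^2$: the constraint gradient vanishes at the origin, and $f \le 0$ on the feasible set with the maximum $0$ attained at $(0,0)$.

```python
import numpy as np

# constraint g(x, y) = y^3 - x^2, with gradient Dg = (-2x, 3y^2)
Dg = lambda x, y: np.array([-2 * x, 3 * y**2])

grad_at_origin = Dg(0.0, 0.0)  # (0, 0): constraint qualification fails here

# feasible points satisfy y = (x^2)^(1/3) >= 0, so f(x, y) = -y <= 0
xs = np.linspace(-1.0, 1.0, 201)
ys = np.cbrt(xs**2)
fmax = (-ys).max()  # 0, attained at x = 0
```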

Why does Lagrange method fail?

$$\nabla g(x, y) = (-2x, \; 3y^2) \implies \nabla g(0, 0) = (0, 0)$$

The Lagrange method fails because the constraint qualification assumption is not satisfied at $(0,0)$: the rank of the Jacobian matrix of the constraints ($1 \times 2$ in this case) is zero, which is less than the number of constraints $K = 1$.

Lagrange method may not work when:

  • Constraint qualification assumption is not satisfied at the stationary point

  • FOC system of equations does not have solutions (thus no local optima)

  • Saddle points – second order conditions can help

  • local optimizer is not a global one – no way to check without studying analytic properties of each problem

Conclusion: great tool, but blind application may lead to wrong results!

Inequality constraints and Kuhn-Tucker method#

Turning now to the inequality constrained optimization problems

Example

Maximization of utility subject to budget constraint

$$u(x_1, x_2) = \alpha x_1 + \beta \log(x_2) \to \max_{x_1, x_2}
\quad \text{subject to} \quad
p_1 x_1 + p_2 x_2 \le m, \quad x_1 \ge 0, \; x_2 \ge 0$$

  • $p_i$ is the price of good $i$, assumed non-negative

  • $m$ is the budget, assumed non-negative

  • $\alpha > 0$, $\beta > 0$

Apply the Lagrange method neglecting the non-negativity requirement, and assume no money is wasted, so the budget constraint binds:

$$\mathcal{L}(x_1, x_2, \lambda) = \alpha x_1 + \beta \log(x_2) - \lambda (p_1 x_1 + p_2 x_2 - m)$$

$$\frac{\partial \mathcal{L}}{\partial x_1} = 0 \implies \alpha - \lambda p_1 = 0$$

$$\frac{\partial \mathcal{L}}{\partial x_2} = 0 \implies \frac{\beta}{x_2} - \lambda p_2 = 0$$

$$\frac{\partial \mathcal{L}}{\partial \lambda} = 0 \implies p_1 x_1 + p_2 x_2 - m = 0$$

Solving the FOC, from the first equation we have $\lambda = \alpha / p_1$, then from the second equation $x_2 = \beta p_1 / (\alpha p_2)$, and from the third $x_1 = m / p_1 - \beta / \alpha$. This is the only stationary point.

Hence, for some admissible parameter values, for example $\alpha = \beta = 1$, $p_1 = p_2 = 1$ and $m = 0.4$, the optimal level of consumption of good 1 would be negative!

_images/corner_sol.png

Fig. 103 Tangent point is infeasible#

Interpretation: No interior solution

Put differently

  • Every interior point on the budget line is dominated by the infeasible solution

  • Hence solution must be on the boundary

Since $x_2 = 0$ implies $x_1 + \log(x_2) = -\infty$, the solution is

  • $x_1^* = 0$

  • $x_2^* = m / p_2 = 0.4$

_images/corner_sol_2.png

Fig. 104 Corner solution#

Let’s look at the systematic solution approach where the corner solutions will emerge naturally when they are optimal.

Karush-Kuhn-Tucker conditions (maximization)

Let $f \colon \mathbb{R}^N \to \mathbb{R}$ and $g \colon \mathbb{R}^N \to \mathbb{R}^K$ be continuously differentiable functions.

Let $D = U \cap \{x \colon g_i(x) \le 0, \, i = 1,\dots,K\}$, where $U \subset \mathbb{R}^N$ is an open set.

Suppose that $x^* \in D$ is a local maximum of $f$ on $D$ and that the gradients of the constraint functions $g_i$ corresponding to the binding constraints are linearly independent at $x^*$ (equivalently, the rank of the matrix composed of the gradients of the binding constraints is equal to the number of binding constraints).

Then there exists a vector $\lambda^* \in \mathbb{R}^K$ such that

$$Df(x^*) - \lambda^* \cdot Dg(x^*) = Df(x^*) - \sum_{i=1}^{K} \lambda_i^* \, Dg_i(x^*) = 0$$

and

$$\lambda_i^* \ge 0 \;\text{ with }\; \lambda_i^* \, g_i(x^*) = 0, \quad i = 1,\dots,K$$

Karush-Kuhn-Tucker conditions (minimization)

In the settings of the KKT conditions (maximization), suppose that $x^* \in D$ is a local minimum of $f$ on $D$ and, as before, the matrix composed of the gradients of the binding constraints has full rank.

Then there exists a vector $\lambda^* \in \mathbb{R}^K$ such that (note the opposite sign)

$$Df(x^*) + \lambda^* \cdot Dg(x^*) = Df(x^*) + \sum_{i=1}^{K} \lambda_i^* \, Dg_i(x^*) = 0$$

and

$$\lambda_i^* \ge 0 \;\text{ with }\; \lambda_i^* \, g_i(x^*) = 0, \quad i = 1,\dots,K$$
  • Very similar to the Lagrange theorem, but now we have inequalities!

  • The last set of conditions is called the complementary slackness conditions, and they play the following role:

    • if for a given $i$, $g_i(x^*) = 0$, that is, the $i$-th constraint is binding, then the corresponding $\lambda_i^* > 0$ acts as a Lagrange multiplier for an equality constraint

    • if on the other hand for a given $i$, $g_i(x^*) < 0$, the corresponding $\lambda_i^*$ must be zero, removing the term with $Dg_i(x^*)$ from the first condition

  • This way the KKT conditions combine the unconstrained and equality constrained conditions in one

_images/KKTdiagram.png

Fig. 105 Binding and non-binding constraint at x#

Karush-Kuhn-Tucker method: recipe#

Essentially the same as for the Lagrange method

  1. Write down the Lagrangian function L(x,λ)

  2. Solve the KKT first order conditions to find the stationary points of L(x,λ) with respect to x, together with the non-negativity of KKT multipliers and complementary slackness conditions

  3. Using second order conditions with the subset of binding constraints, check if the stationary points at the boundary are local optima. Similarly, use the second order conditions for the unconstrained problem to investigate the interior stationary points

  4. Compare the function values at all identified local optima to find the global one

Possible issues with KKT method are similar to the Lagrange method:

  • constraint qualification assumption

  • existence of constrained optima

  • local vs global optimality

Example

Returning to the utility maximization problem with budget constraint and non-negative consumption

$$u(x_1, x_2) = \alpha x_1 + \beta \log(x_2) \to \max_{x_1, x_2}
\quad \text{subject to} \quad
p_1 x_1 + p_2 x_2 \le m, \quad x_1 \ge 0, \; x_2 \ge 0$$

Form the Lagrangian with 3 inequality constraints (have to flip the sign for non-negativity to stay within the general formulation)

$$\begin{aligned}
\mathcal{L}(x_1, x_2, \lambda_1, \lambda_2, \lambda_3)
&= \alpha x_1 + \beta \log(x_2) - \lambda_1 (-x_1) - \lambda_2 (-x_2) - \lambda_3 (p_1 x_1 + p_2 x_2 - m) \\
&= \alpha x_1 + \beta \log(x_2) + \lambda_1 x_1 + \lambda_2 x_2 - \lambda_3 (p_1 x_1 + p_2 x_2 - m)
\end{aligned}$$

The necessary KKT conditions are given by the following system of equations

$$\begin{cases}
\frac{\partial \mathcal{L}}{\partial x_1} = 0 \implies \alpha + \lambda_1 - \lambda_3 p_1 = 0 \\
\frac{\partial \mathcal{L}}{\partial x_2} = 0 \implies \frac{\beta}{x_2} + \lambda_2 - \lambda_3 p_2 = 0 \\
x_1 \ge 0, \quad x_2 \ge 0, \quad p_1 x_1 + p_2 x_2 \le m \\
\lambda_1 \ge 0 \text{ and } \lambda_1 x_1 = 0 \\
\lambda_2 \ge 0 \text{ and } \lambda_2 x_2 = 0 \\
\lambda_3 \ge 0 \text{ and } \lambda_3 (p_1 x_1 + p_2 x_2 - m) = 0
\end{cases}$$

The KKT conditions can be solved systematically by considering all combinations of the multipliers:

  1. $\lambda_1 = \lambda_2 = \lambda_3 = 0$
    The first equation becomes $\alpha = 0$, which is inconsistent with the assumption $\alpha > 0$

  2. $\lambda_1 = \lambda_2 = 0, \; \lambda_3 > 0 \implies p_1 x_1 + p_2 x_2 - m = 0$
    This is the exact case we looked at with the Lagrange method ignoring the non-negativity conditions on consumption. The solution is $x_1^* = \frac{m}{p_1} - \frac{\beta}{\alpha}$ and $x_2^* = \frac{\beta p_1}{\alpha p_2}$ if it also holds that $x_1^* > 0$ and $x_2^* > 0$, i.e. $p_1/m < \alpha/\beta$

  3. $\lambda_1 = \lambda_3 = 0, \; \lambda_2 > 0 \implies x_2 = 0$
    The case $x_2 = 0$ is outside of the domain of the utility function and could in fact be excluded from the start.

  4. $\lambda_1 = 0, \; \lambda_2 > 0, \; \lambda_3 > 0 \implies x_2 = 0$ and $p_1 x_1 + p_2 x_2 - m = 0$
    Inconsistent similarly to the previous case

  5. $\lambda_1 > 0, \; \lambda_2 = \lambda_3 = 0 \implies x_1 = 0$
    The second equation becomes $\beta / x_2 = 0$, which is inconsistent with $\beta > 0$ and $x_2 \ge 0$

  6. $\lambda_1 > 0, \; \lambda_2 = 0, \; \lambda_3 > 0 \implies x_1 = 0$ and $p_1 x_1 + p_2 x_2 - m = 0$
    We have the following system in this case

$$\begin{cases}
\alpha + \lambda_1 - \lambda_3 p_1 = 0 \\
\frac{\beta}{x_2} - \lambda_3 p_2 = 0 \\
x_2 p_2 - m = 0
\end{cases}$$

From the last equation $x_2^* = m / p_2$, combining the two last equations $\lambda_3 = \beta / m$, and from the first equation $\lambda_1 = \beta p_1 / m - \alpha$. The solution holds conditional on $\lambda_1 > 0$, i.e. $p_1/m > \alpha/\beta$.

  7. $\lambda_1 > 0, \; \lambda_2 > 0, \; \lambda_3 = 0 \implies x_1 = 0$ and $x_2 = 0$
    Inconsistent similarly to case 3

  8. $\lambda_1 > 0, \; \lambda_2 > 0, \; \lambda_3 > 0 \implies x_1 = x_2 = 0$ and $p_1 x_1 + p_2 x_2 - m = 0$
    Inconsistent similarly to the previous case

To summarize, the solution to the KKT conditions is given by the following cases (the two solutions coincide when the parameter condition holds with equality):

$$\begin{cases}
x_1^* = \frac{m}{p_1} - \frac{\beta}{\alpha}, \; x_2^* = \frac{\beta p_1}{\alpha p_2}, & \text{if } p_1/m \le \alpha/\beta \\
x_1^* = 0, \; x_2^* = \frac{m}{p_2}, & \text{if } p_1/m > \alpha/\beta
\end{cases}$$

Thus, the corner solution is included in the solution set of the KKT conditions.
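The analytic case analysis can be cross-checked with a numerical solver that handles inequality constraints. A minimal sketch using SciPy's SLSQP method; the parameter values, chosen so that the corner case applies, are an assumption for illustration:

```python
import numpy as np
from scipy.optimize import minimize

def solve_consumer(alpha, beta, p1, p2, m):
    """Maximize alpha*x1 + beta*log(x2) s.t. budget and non-negativity."""
    obj = lambda x: -(alpha * x[0] + beta * np.log(x[1]))
    budget = {'type': 'ineq', 'fun': lambda x: m - p1 * x[0] - p2 * x[1]}
    res = minimize(obj, x0=[0.1, 0.1], method='SLSQP',
                   bounds=[(0.0, None), (1e-10, None)], constraints=[budget])
    return res.x

# p1/m = 2.5 > alpha/beta = 1, so the corner branch of the case analysis applies
x = solve_consumer(alpha=1.0, beta=1.0, p1=1.0, p2=1.0, m=0.4)
```

The solver lands on the corner solution $x_1^* = 0$, $x_2^* = m/p_2 = 0.4$ without any manual case enumeration.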

Second order conditions#

Nearly identical to the second order conditions in the equality constraints case, except written only for the constraints which are binding/active at the point $x^*$ under consideration.

Example

$$f(x, y) = \frac{x^3}{3} - 3 y^2 + 5 x - 6 x y \to \max_{x, y}
\quad \text{subject to} \quad
\Big( \frac{x}{4} \Big)^2 + \Big( \frac{y}{8} \Big)^2 \le 1, \quad x, y \in \mathbb{R}$$
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm

f = lambda x: x[0]**3/3 - 3*x[1]**2 + 5*x[0] - 6*x[0]*x[1]

x = y = np.linspace(-10.0, 10.0, 100)
X, Y = np.meshgrid(x, y)
zs = np.array([f((x,y)) for x,y in zip(np.ravel(X), np.ravel(Y))])
Z = zs.reshape(X.shape)

a,b=4,8
# (x/a)^2 + (y/b)^2 = 1
theta = np.linspace(0, 2 * np.pi, 100)
X1 = a*np.cos(theta)
Y1 = b*np.sin(theta)
zs = np.array([f((x,y)) for x,y in zip(np.ravel(X1), np.ravel(Y1))])
Z1 = zs.reshape(X1.shape)

fig = plt.figure(dpi=160)
ax2 = fig.add_subplot(111)
ax2.set_aspect('equal', 'box')
ax2.contour(X, Y, Z, 50,
            cmap=cm.jet)
ax2.plot(X1, Y1)
plt.setp(ax2, xticks=[],yticks=[])

fig = plt.figure(dpi=160)
ax3 = fig.add_subplot(111, projection='3d')
ax3.plot_wireframe(X, Y, Z, 
            rstride=2, 
            cstride=2,
            alpha=0.7,
            linewidth=0.25)
f0 = f(np.zeros((2)))+0.1
ax3.plot(X1, Y1, Z1, c='red')
plt.setp(ax3,xticks=[],yticks=[],zticks=[])
ax3.view_init(elev=18, azim=154)

plt.show()
_images/9c6f524d6b0a7902469c59abf1311c8d582f92ee14c84d8543831366fabc4433.png _images/d1641a091417d258e07db925184992333893dee5474522b444ef7de3ab6b398e.png

Extra material#

Story of William Karush and his contribution to the KKT theorem by Richard W. Cottle (download pdf)