📖 Unconstrained optimization#


A standard optimization problem#

Example

Consider a (monopolistic) firm that is facing a market demand for its products $D(p)=\alpha+\frac{1}{4\alpha p^2}-p$ and the cost of production $C(q)=\beta q+\delta q^2$. As usual, $p$ and $q$ denote the price and the quantity of the product, respectively.

To maximize its profit $\pi(p,q)=pq-C(q)$, the firm solves the following optimization problem

$$\Pi(\alpha,\beta,\delta)=\max_{p,q}\pi(p,q)=\max_{p,q}\big[pq-C(q)\big]
\quad\text{subject to}\quad q=D(p),\; q\ge 0,\; p>0$$

Plugging in the functions $D(p)$ and $C(q)$ we have an equivalent formulation

$$\Pi(\alpha,\beta,\delta)=\max_{p,q}\big[(p-\beta)q-\delta q^2\big]
\quad\text{subject to}\quad q=\alpha+\frac{1}{4\alpha p^2}-p,\; q\ge 0,\; p>0$$
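To make the example concrete, here is a minimal numerical sketch, assuming SciPy, the demand function as reconstructed above, and hypothetical parameter values $\alpha=2$, $\beta=1$, $\delta=0.5$. The equality constraint is handled by substituting $q=D(p)$ into the objective, and the inequality $q\ge 0$ by a crude penalty.

import numpy as np
from scipy.optimize import minimize_scalar

alpha, beta, delta = 2.0, 1.0, 0.5       # hypothetical parameter values

def D(p):
    # market demand as reconstructed above
    return alpha + 1 / (4 * alpha * p**2) - p

def profit(p):
    q = D(p)                             # substitute the equality constraint q = D(p)
    if q < 0:                            # crude penalty for violating q >= 0
        return -1e10
    return (p - beta) * q - delta * q**2

# maximize profit over p > 0 by minimizing its negative on a bounded interval
res = minimize_scalar(lambda p: -profit(p), bounds=(1e-3, 10.0), method='bounded')
print(f"p* = {res.x:.4f}, q* = {D(res.x):.4f}, Pi = {-res.fun:.4f}")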

Components of the optimization problem#

  1. Objective function: the function to be maximized or minimized, also known as the maximand
    In the example above, the profit function $\pi(p,q)=pq-C(q)$ is to be maximized

  2. Decision/choice variables: the variables that the agent can control in order to optimize the objective function, also known as controls
    In the example above, the price $p$ and the quantity $q$ are the variables that the firm can choose to maximize its profit

  3. Equality constraints: restrictions on the choice variables in the form of equalities
    In the example above, $q=\alpha+\frac{1}{4\alpha p^2}-p$

  4. Inequality constraints (weak and strict): restrictions on the choice variables in the form of inequalities
    In the example above, $q\ge 0$ and $p>0$

  5. Parameters: the variables that are not controlled by the agent, but affect the objective function and/or the constraints
    In the example above, $\alpha$, $\beta$ and $\delta$ are the parameters of the problem

  6. Value function: the “optimized” value of the objective function as a function of parameters
    In the example above, $\Pi(\alpha,\beta,\delta)$ is the value function

Definition

The general form of the optimization problem is

$$V(\theta)=\max_x f(x,\theta)
\quad\text{subject to}\quad
g_i(x,\theta)=0,\; i\in\{1,\dots,I\},
\qquad
h_j(x,\theta)\le 0,\; j\in\{1,\dots,J\}$$

where:

  • $f(x,\theta)\colon\mathbb{R}^N\times\mathbb{R}^K\to\mathbb{R}$ is the objective function

  • $x\in\mathbb{R}^N$ are the decision/choice variables

  • $\theta\in\mathbb{R}^K$ are the parameters

  • $g_i(x,\theta)=0,\; i\in\{1,\dots,I\}$, where $g_i\colon\mathbb{R}^N\times\mathbb{R}^K\to\mathbb{R}$, are the equality constraints

  • $h_j(x,\theta)\le 0,\; j\in\{1,\dots,J\}$, where $h_j\colon\mathbb{R}^N\times\mathbb{R}^K\to\mathbb{R}$, are the inequality constraints

  • $V(\theta)\colon\mathbb{R}^K\to\mathbb{R}$ is the value function
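As an illustration of this notation, here is a minimal sketch, assuming SciPy and an entirely made-up objective and constraints, of how the general form maps into code. Note that SciPy minimizes by convention, and its 'ineq' constraints require fun(x) ≥ 0, hence the sign flips.

import numpy as np
from scipy.optimize import minimize

theta = np.array([2.0, 1.0, 0.5])        # parameter vector (hypothetical values)

def f(x, theta):
    # objective function to be maximized (made-up example)
    return -(x[0] - theta[0])**2 - (x[1] - theta[1])**2

g = [lambda x: x[0] + x[1] - theta[2]]   # equality constraints, g_i(x,theta) = 0
h = [lambda x: -x[0]]                    # inequality constraints, h_j(x,theta) <= 0

constraints = ([{'type': 'eq', 'fun': gi} for gi in g] +
               [{'type': 'ineq', 'fun': (lambda x, hj=hj: -hj(x))} for hj in h])

# max f = -min(-f); the optimized value approximates V(theta)
res = minimize(lambda x: -f(x, theta), x0=np.zeros(2), constraints=constraints)
print('x* =', res.x, ' V(theta) =', -res.fun)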

Definition

The set of admissible choices (admissible set) contains all the choices that satisfy the constraints of the optimization problem.

Note

Sometimes the equality constraints are dropped from the definition of the optimization problem, because each of them can always be represented as a pair of inequality constraints $g_i(x,\theta)\le 0$ and $g_i(x,\theta)\ge 0$

Note

Note that strict inequality constraints are not present in the definition above, although they may appear in economic applications. As you already know, this has to do with keeping the set of admissible choices closed, so that a solution of the problem has a better chance to exist. Sometimes strict inequalities are nevertheless added to the definition.

A roadmap for formulating an optimization problem (in economics)

  1. Determine which variables are choice variables and which are parameters according to what the economic agent has control over

  2. Determine whether the optimization problem is a maximization or a minimization problem

  3. Determine the objective function of the economic agent (and thus the optimization problem)

  4. Determine the constraints of the optimization problem: equality and inequality, paying particular attention to whether the inequalities should be strict or weak (this choice has major implications for the existence of a solution)

Example

Consider a decision maker who is deciding how to divide the money they have between food and services, a bank deposit, and buying some crypto. Discuss and write down the corresponding optimization problem. [class exercise]

Classes of optimization problems#

  1. Static optimization: finite number of choice variables

  • single instance of choice

  • deterministic finite-horizon dynamic choice models can be represented as static

  • our main focus in this course

  2. Dynamic programming: some choice variables are infinite sequences, solved using similar techniques as static optimization

  • we will touch upon this at the end of the course

  3. Deterministic optimal control: some “choice variables” are functions, completely new theory is needed

  4. Stochastic optimal control: “choice variables” are functions, the objective function is a stochastic process, yet more theory is needed

Review of one-dimensional optimization#

Let $f\colon[a,b]\to\mathbb{R}$ be a differentiable function

  • $f$ takes $x\in[a,b]\subset\mathbb{R}$ and returns the number $f(x)$

  • the derivative $f'(x)$ exists for all $x$ with $a<x<b$

Differentiability implies that f is continuous, and because the interval [a,b] is closed and bounded, f has a maximum and a minimum on [a,b] by the Weierstrass extreme value theorem.

Reminder of definitions

A point $x^*\in[a,b]$ is called a

  • maximizer of $f$ on $[a,b]$ if $f(x^*)\ge f(x)$ for all $x\in[a,b]$

  • minimizer of $f$ on $[a,b]$ if $f(x^*)\le f(x)$ for all $x\in[a,b]$

A point $x$ is called interior to $[a,b]$ if $a<x<b$

A stationary point of $f$ on $[a,b]$ is an interior point $x$ with $f'(x)=0$

[Figure: stationary.png]

Fig. 73 Both $x^*$ and $x^{**}$ are stationary#

Fact

If $f$ is differentiable and $x^*$ is either an interior minimizer or an interior maximizer of $f$ on $[a,b]$, then $x^*$ is stationary

  • any interior maximizer is stationary

  • $\{\text{interior maximizers}\}\subset\{\text{stationary points}\}$

  • $\{\text{maximizers}\}\subset\{\text{stationary points}\}\cup\{a\}\cup\{b\}$

Algorithm for finding maximizers/minimizers (implemented in the sketch below):

  1. Locate the stationary points

  2. Evaluate $y=f(x)$ for each stationary $x$ and for the endpoints $a$, $b$

  3. Pick the point giving the largest (for maximizers) or smallest (for minimizers) $y$ value
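A minimal sketch of this algorithm, assuming SymPy and a hypothetical polynomial $f$ whose stationary points can be found exactly:

import sympy as sp

x = sp.symbols('x')
f = x**3 - 3*x**2 + 2                    # hypothetical example function
a, b = -1, 3                             # the interval [a, b]

# Step 1: locate the stationary points (interior roots of f')
stationary = [r for r in sp.solveset(sp.diff(f, x), x, domain=sp.S.Reals)
              if a < r < b]

# Step 2: evaluate y = f(x) for each stationary x and for the endpoints a, b
candidates = stationary + [a, b]
values = {c: f.subs(x, c) for c in candidates}

# Step 3: pick the point giving the largest y value (the maximizer)
maximizer = max(values, key=values.get)
print('maximizer:', maximizer, ' value:', values[maximizer])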

First order conditions (FOC)#


In this lecture we focus on the unconstrained optimization problems of the form

$$\max_x f(x,\theta) \;\text{ also written as }\; f(x,\theta)\to\max_x,
\qquad
\min_x f(x,\theta) \;\text{ also written as }\; f(x,\theta)\to\min_x,$$

where $f(x,\theta)\colon\mathbb{R}^N\to\mathbb{R}$ is, unless stated otherwise, assumed to be continuous and twice continuously differentiable everywhere on $\mathbb{R}^N$. The parameter $\theta$ may or may not be present.

Note

Twice continuously differentiable functions are said to be $C^2$.

  • Every point in the whole space $\mathbb{R}^N$ is interior, therefore all maximizers/minimizers have to be stationary points

  • Assuming differentiability implies we can focus on derivative-based conditions

Definition

Given a function $f\colon\mathbb{R}^N\to\mathbb{R}$, a point $x\in\mathbb{R}^N$ is called a stationary point of $f$ if $\nabla f(x)=0$

Definition

Given a function $f\colon\mathbb{R}^N\to\mathbb{R}$, a point $x\in\mathbb{R}^N$ is called a local maximizer (minimizer) of $f$ if $\exists\,\epsilon>0$ such that $f(x)\ge f(x')$ (respectively $f(x)\le f(x')$) for all $x'\in B_\epsilon(x)$

If the inequality is strict for all $x'\ne x$, then $x$ is called a strict local maximizer/minimizer of $f$

A (global) maximizer/minimizer must also be a local one, but the opposite is not necessarily true.

Fact (Necessary condition for optima)

Let $f(x,\theta)\colon\mathbb{R}^N\to\mathbb{R}$ be a differentiable function and let $x^*\in\mathbb{R}^N$ be a local maximizer/minimizer of $f$.

Then $x^*$ is a stationary point of $f$, that is $\nabla f(x^*)=0$

Example

Consider the quadratic form $f(x)=x^TAx$ where $A=\begin{pmatrix}1 & 0.5\\ 0.5 & 2\end{pmatrix}$

Solving the FOC

$$\nabla f(x)=2x^TA=0 \implies x=0$$

The point $(0,0)$ should be an optimizer.

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm

# quadratic form f(x) = x'Ax with a positive definite matrix A
A = np.array([[1,.5],[.5,2]])
f = lambda x: x@A@x

# evaluate f on a square grid
x = y = np.linspace(-5.0, 5.0, 100)
X, Y = np.meshgrid(x, y)
zs = np.array([f((x,y)) for x,y in zip(np.ravel(X), np.ravel(Y))])
Z = zs.reshape(X.shape)

# surface plot
fig = plt.figure(dpi=160)
ax1 = fig.add_subplot(111, projection='3d')
ax1.plot_surface(X, Y, Z, 
            rstride=2, 
            cstride=2,
            cmap=cm.jet,
            alpha=0.7,
            linewidth=0.25)
plt.setp(ax1,xticks=[],yticks=[],zticks=[])

# contour plot
fig = plt.figure(dpi=160)
ax2 = fig.add_subplot(111)
ax2.set_aspect('equal', 'box')
ax2.contour(X, Y, Z, 50,
            cmap=cm.jet)
plt.setp(ax2, xticks=[],yticks=[])

# wireframe plot with the stationary point (0,0) marked
fig = plt.figure(dpi=160)
ax3 = fig.add_subplot(111, projection='3d')
ax3.plot_wireframe(X, Y, Z, 
            rstride=2, 
            cstride=2,
            alpha=0.7,
            linewidth=0.25)
f0 = f(np.zeros((2)))+0.1   # lift the marker slightly above the surface
ax3.scatter(0, 0, f0, c='black', marker='o', s=10)
ax3.plot([-3,3],[0,0],[f0,f0],color='black')
ax3.plot([0,0],[-3,3],[f0,f0],color='black')
plt.setp(ax3,xticks=[],yticks=[],zticks=[])

plt.show()

Example

Consider the quadratic form $f(x)=x^TAx$ where $A=\begin{pmatrix}1 & 0.5\\ 0.5 & -2\end{pmatrix}$

Solving the FOC

$$\nabla f(x)=2x^TA=0 \implies x=0$$

The point $(0,0)$ should be an optimizer?

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from matplotlib import cm

# quadratic form f(x) = x'Ax with an indefinite matrix A: (0,0) is a saddle point
A = np.array([[1,.5],[.5,-2]])
f = lambda x: x@A@x

# evaluate f on a square grid
x = y = np.linspace(-5.0, 5.0, 100)
X, Y = np.meshgrid(x, y)
zs = np.array([f((x,y)) for x,y in zip(np.ravel(X), np.ravel(Y))])
Z = zs.reshape(X.shape)

# surface plot
fig = plt.figure(dpi=160)
ax1 = fig.add_subplot(111, projection='3d')
ax1.plot_surface(X, Y, Z, 
            rstride=2, 
            cstride=2,
            cmap=cm.jet,
            alpha=0.7,
            linewidth=0.25)
plt.setp(ax1,xticks=[],yticks=[],zticks=[])

# contour plot
fig = plt.figure(dpi=160)
ax2 = fig.add_subplot(111)
ax2.set_aspect('equal', 'box')
ax2.contour(X, Y, Z, 50,
            cmap=cm.jet)
plt.setp(ax2, xticks=[],yticks=[])

# wireframe plot with the stationary point (0,0) marked
fig = plt.figure(dpi=160)
ax3 = fig.add_subplot(111, projection='3d')
ax3.plot_wireframe(X, Y, Z, 
            rstride=2, 
            cstride=2,
            alpha=0.7,
            linewidth=0.25)
f0 = f(np.zeros((2)))+0.1   # lift the marker slightly above the surface
ax3.scatter(0, 0, f0, c='black', marker='o', s=10)
ax3.plot([-3,3],[0,0],[f0,f0],color='black')
ax3.plot([0,0],[-3,3],[f0,f0],color='black')
plt.setp(ax3,xticks=[],yticks=[],zticks=[])

plt.show()
  • This is an example of a saddle point, where the FOC hold, yet the point is not a local maximizer/minimizer!

  • Similar to $x=0$ for $f(x)=x^3$: the derivative is zero, yet the point is not an optimizer

  • How to distinguish saddle points from optima? Key insight: the function has second order derivatives of different signs in different directions (see the sketch below)!
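A tiny sketch of this insight, reusing the indefinite matrix $A$ from the code above: the second derivative of $t\mapsto f(td)$ equals $2d^TAd$, which is positive along $d=(1,0)$ and negative along $d=(0,1)$.

import numpy as np

A = np.array([[1, .5], [.5, -2]])        # the indefinite A from the saddle example
for d in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    # f(t*d) = t**2 * (d @ A @ d), so the second derivative in t is 2 * d @ A @ d
    print(d, 2 * d @ A @ d)              # prints 2.0 and -4.0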

Second order conditions (SOC)#

  • allow us to establish whether a stationary point is a local maximizer/minimizer or a saddle point

  • help to determine whether an optimizer is a maximizer or a minimizer

  • unfortunately, do not give a definitive answer in all cases

Fact (necessary SOC)

Let $f(x)\colon\mathbb{R}^N\to\mathbb{R}$ be a twice continuously differentiable function and let $x\in\mathbb{R}^N$ be a local maximizer/minimizer of $f$. Then:

  • $f$ has a local maximum at $x$ $\implies$ $Hf(x)$ is negative semi-definite

  • $f$ has a local minimum at $x$ $\implies$ $Hf(x)$ is positive semi-definite

  • recall the definition of semi-definiteness

Fact (sufficient SOC)

Let $f(x)\colon\mathbb{R}^N\to\mathbb{R}$ be a twice continuously differentiable function. Then:

  • if for some $x\in\mathbb{R}^N$, $\nabla f(x)=0$ (FOC satisfied) and $Hf(x)$ is negative definite, then $x$ is a strict local maximum of $f$

  • if for some $x\in\mathbb{R}^N$, $\nabla f(x)=0$ (FOC satisfied) and $Hf(x)$ is positive definite, then $x$ is a strict local minimum of $f$

  • observe that the SOC are only necessary in the “weak” (semi-definite) form, but are sufficient in the “strong” (definite) form

  • this leaves room for ambiguity when we cannot arrive at a conclusion: a particular stationary point may be a local maximum or a local minimum

  • but we can rule out saddle points for sure: when the Hessian is indefinite, neither semi-definiteness nor definiteness can be established, and the point must be a saddle

Example

Consider the one dimensional function $f(x)=(x-1)^2$, $f'(x)=2x-2$, $Hf(x)=2$.

Point $x=1$ is a stationary point where the FOC is satisfied.

Treating $Hf(x)$ as a $1\times1$ matrix, we can see it is positive definite at $x=1$ ($y[2]y=2y^2>0$ for all $y\ne 0$), therefore $x=1$ is a strict local minimum of $f$.

Example

Consider the one dimensional function $f(x)=x^2-1$, $f'(x)=2x$, $Hf(x)=2$.

Point $x=0$ is a stationary point where the FOC is satisfied.

Treating $Hf(x)$ as a $1\times1$ matrix, we can see it is positive definite at $x=0$ ($y[2]y=2y^2>0$ for all $y\ne 0$), therefore $x=0$ is a strict local minimum of $f$.

Example

Consider the one dimensional function $f(x)=(x-1)^4$, $f'(x)=4(x-1)^3$, $Hf(x)=12(x-1)^2$.

Point $x=1$ is a stationary point where the FOC is satisfied.

Treating $Hf(x)$ as a $1\times1$ matrix, we can see it is positive semi-definite (and negative semi-definite) at $x=1$ ($y[0]y=0\ge 0$ and $y[0]y=0\le 0$ for all $y\in\mathbb{R}$), therefore at $x=1$ the function $f$ may have a local minimum. But it may have a local maximum as well. No definite conclusion! (In reality it is a local and global minimum.)

Example

Consider the one dimensional function $f(x)=(x+1)^3$, $f'(x)=3(x+1)^2$, $Hf(x)=6(x+1)$.

Point $x=-1$ is a stationary point where the FOC is satisfied.

Treating $Hf(x)$ as a $1\times1$ matrix, we can see it is positive semi-definite (and negative semi-definite) at $x=-1$ ($y[0]y=0\ge 0$ and $y[0]y=0\le 0$ for all $y\in\mathbb{R}$), therefore at $x=-1$ the function $f$ may have a local minimum. But it may have a local maximum as well. No definite conclusion! (In reality it is neither a local minimum nor a local maximum.)
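The four examples above can be verified mechanically; a minimal sketch, assuming SymPy:

import sympy as sp

x = sp.symbols('x')
for f in [(x - 1)**2, x**2 - 1, (x - 1)**4, (x + 1)**3]:
    for xstar in sp.solveset(sp.diff(f, x), x, domain=sp.S.Reals):
        f2 = sp.diff(f, x, 2).subs(x, xstar)     # the 1x1 "Hessian" at x*
        verdict = ('strict local min' if f2 > 0 else
                   'strict local max' if f2 < 0 else 'inconclusive')
        print(f, 'at', xstar, '->', verdict)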

Establishing definiteness of the Hessian in the $\mathbb{R}^2$ case#

Recall that a symmetric matrix $A$ is

  • positive definite $\iff$ all eigenvalues are strictly positive

  • negative definite $\iff$ all eigenvalues are strictly negative

  • nonpositive definite (negative semi-definite) $\iff$ all eigenvalues are nonpositive

  • nonnegative definite (positive semi-definite) $\iff$ all eigenvalues are nonnegative

  • indefinite $\iff$ there are both strictly positive and strictly negative eigenvalues

Let $H$ be a $2\times2$ matrix $\implies$ its eigenvalues are the roots of a quadratic equation

$$H=\begin{pmatrix}a & b\\ c & d\end{pmatrix}
\quad\implies\quad
H-\lambda I=\begin{pmatrix}a-\lambda & b\\ c & d-\lambda\end{pmatrix}$$

$$\det(H-\lambda I)=(a-\lambda)(d-\lambda)-bc
=\lambda^2-(a+d)\lambda+(ad-bc)
=\lambda^2-\mathrm{trace}(H)\lambda+\det(H)$$

Hence the two eigenvalues $\lambda_1$ and $\lambda_2$ of $H$ are given by the two roots of

$$\lambda^2-\mathrm{trace}(H)\lambda+\det(H)=0$$

From Vieta's formulas for a quadratic polynomial we have

$$\lambda_1+\lambda_2=\mathrm{trace}(H), \qquad \lambda_1\lambda_2=\det(H)$$

Applying this result to the Hessian of a function $f\colon\mathbb{R}^2\to\mathbb{R}$ we have

Fact

Given a twice continuously differentiable function $f\colon\mathbb{R}^2\to\mathbb{R}$ and a stationary point $x\colon\nabla f(x)=0$, the second order conditions provide (see the classification sketch after this list):

  1. if $\det(Hf(x))>0$ and $\mathrm{trace}(Hf(x))>0$

    • $\lambda_1>0$ and $\lambda_2>0$,

    • $Hf(x)$ is positive definite,

    • $f$ has a strict local minimum at $x$

  2. if $\det(Hf(x))>0$ and $\mathrm{trace}(Hf(x))<0$

    • $\lambda_1<0$ and $\lambda_2<0$,

    • $Hf(x)$ is negative definite,

    • $f$ has a strict local maximum at $x$

  3. if $\det(Hf(x))=0$ and $\mathrm{trace}(Hf(x))>0$

    • $\lambda_1=0$ and $\lambda_2>0$,

    • $Hf(x)$ is positive semi-definite (nonnegative definite),

    • $f$ may have a local minimum at $x$

    • inconclusive!

  4. if $\det(Hf(x))=0$ and $\mathrm{trace}(Hf(x))<0$

    • $\lambda_1=0$ and $\lambda_2<0$,

    • $Hf(x)$ is negative semi-definite (nonpositive definite),

    • $f$ may have a local maximum at $x$

    • inconclusive!

  5. if $\det(Hf(x))<0$

    • $\lambda_1$ and $\lambda_2$ have different signs,

    • $Hf(x)$ is indefinite,

    • $x$ is a saddle point of $f$
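A minimal sketch of this classification, assuming SymPy, applied to the function from the example that follows:

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = (x1 - 1)**2 + x1 * x2**2             # the function from the example below

H = sp.hessian(f, (x1, x2))              # symbolic 2x2 Hessian

def classify(point):
    # apply the determinant/trace test at a stationary point
    Hx = H.subs(point)
    d, t = Hx.det(), Hx.trace()
    if d > 0:
        return 'strict local minimum' if t > 0 else 'strict local maximum'
    if d < 0:
        return 'saddle point'
    return 'inconclusive (semi-definite Hessian)'

for p in [{x1: 1, x2: 0}, {x1: 0, x2: sp.sqrt(2)}, {x1: 0, x2: -sp.sqrt(2)}]:
    print(p, '->', classify(p))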

Example

Consider the two dimensional function $f(x)=(x_1-1)^2+x_1x_2^2$

$$\nabla f(x)=\big(2(x_1-1)+x_2^2,\; 2x_1x_2\big),
\qquad
Hf(x)=\begin{pmatrix}2 & 2x_2\\ 2x_2 & 2x_1\end{pmatrix}$$

The FOC require $2x_1x_2=0$, so either $x_2=0$ (and then $x_1=1$) or $x_1=0$ (and then $x_2=\pm\sqrt{2}$).

Point $x^1=(1,0)$ is a stationary point where the FOC is satisfied.

$$Hf(x^1)=\begin{pmatrix}2 & 0\\ 0 & 2\end{pmatrix},
\quad \det\big(Hf(x^1)\big)=4>0,
\quad \mathrm{trace}\big(Hf(x^1)\big)=4>0$$

Therefore at $x^1=(1,0)$ the function $f$ has a strict local minimum.

Point $x^2=(0,\sqrt{2})$ is also a stationary point where the FOC is satisfied.

$$Hf(x^2)=\begin{pmatrix}2 & 2\sqrt{2}\\ 2\sqrt{2} & 0\end{pmatrix},
\quad \det\big(Hf(x^2)\big)=-8<0$$

Therefore at $x^2=(0,\sqrt{2})$ the function $f$ has a saddle point.

Point $x^3=(0,-\sqrt{2})$ is yet another stationary point where the FOC is satisfied.

$$Hf(x^3)=\begin{pmatrix}2 & -2\sqrt{2}\\ -2\sqrt{2} & 0\end{pmatrix},
\quad \det\big(Hf(x^3)\big)=-8<0$$

Therefore again, at $x^3=(0,-\sqrt{2})$ the function $f$ has a saddle point.

References and reading#

References
  • Simon & Blume: 13.1, 13.2, 13.3, 14.1, 14.3, 14.4, 14.5, 14.8, whole of chapter 17

  • Sundaram: 1.4, 1.5, 2.1, 2.2, 4.1–4.4