Exercise set F

Please see the general comment on the tutorial exercises.

Question F.1

Consider a function $f \colon \mathbb{R}^N \ni x \mapsto x'Bx \in \mathbb{R}$, where the $N \times N$ matrix $B$ is square but not symmetric. Show that the same function can be represented as $x'Ax$, where $A$ is symmetric.

Hint

Recall that the definition of a quadratic form calls for the symmetry of the matrix $A$. This exercise shows that this assumption is without loss of generality.

Hint

Given a square matrix $M$, you can use the identity $M = \frac{1}{2}(M + M') + \frac{1}{2}(M - M')$, where the first component is symmetric and the second is antisymmetric (skew-symmetric).
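The identity is also easy to verify numerically. Below is a minimal NumPy sketch (illustrative only, not a substitute for the proof): it draws a random non-symmetric $B$, forms the symmetric part $A = \frac{1}{2}(B + B')$, and confirms that both matrices generate the same quadratic form.

```python
# Numerical sanity check of the symmetrization identity (a sketch).
import numpy as np

rng = np.random.default_rng(seed=0)
N = 5
B = rng.normal(size=(N, N))               # square, generically non-symmetric
A = (B + B.T) / 2                         # symmetric part of B
x = rng.normal(size=N)

print(np.allclose(A, A.T))                # True: A is symmetric
print(np.allclose(x @ B @ x, x @ A @ x))  # True: x'Bx == x'Ax
```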

Question F.2

Consider a quadratic form $f \colon \mathbb{R}^N \ni x \mapsto x'Ax \in \mathbb{R}$, where the $N \times N$ matrix $A$ is symmetric.

Using the product rule of multivariate calculus, derive the gradient and Hessian of f. Make sure that all multiplied vectors and matrices are conformable.

Hint

You can assume that $x$ is a column vector, and that any vector function of $x$ is also a column vector.

Question F.3

This exercise takes you on a tour of a binary logit model and its properties.

Consider a model in which a decision maker makes a choice between $J = 2$ alternatives, each of which has a scalar characteristic $x_j \in \mathbb{R}$, $j = 1, 2$. The econometrician observes data on these characteristics, the choice made by the decision maker $y_i \in \{0, 1\}$, and an attribute of the decision maker, $z_i \in \mathbb{R}$. A positive value of $y_i$ denotes that the first alternative was chosen. The data is indexed with $i$ and has $N$ observations, i.e. $i \in \{1, \dots, N\}$.

To rationalize the data, the econometrician assumes that the utility of each alternative is given by the scalar product of a vector of parameters $\beta \in \mathbb{R}^2$ and a vector function $h \colon \mathbb{R}^2 \to \mathbb{R}^2$ of the alternative and decision maker attributes. Let

$$h \colon \begin{pmatrix} x \\ z \end{pmatrix} \mapsto \begin{pmatrix} x \\ xz \end{pmatrix}$$

In line with the random utility model, the econometrician also assumes that the utility of each alternative contains an additively separable random component with an appropriately centered type I extreme value distribution, such that the choice probabilities for the two alternatives are given by a vector function $p \colon \mathbb{R}^2 \to (0,1)^2 \subset \mathbb{R}^2$

$$p \colon \begin{pmatrix} u_1 \\ u_2 \end{pmatrix} \mapsto \begin{pmatrix} \dfrac{\exp(u_1)}{\exp(u_1) + \exp(u_2)} \\[2ex] \dfrac{\exp(u_2)}{\exp(u_1) + \exp(u_2)} \end{pmatrix}$$

In order to estimate the vector of parameters of the model $\beta$, the econometrician maximizes the likelihood of observing the data $D = \big( \{x_j\}_{j \in \{1,2\}}, \{z_i, y_i\}_{i \in \{1,\dots,N\}} \big)$. The log-likelihood function $\log L \colon \mathbb{R}^{2+J+2N} \to \mathbb{R}$ is given by

$$\log L(\beta, D) = \sum_{i=1}^{N} \ell_i(\beta, x_1, x_2, z_i, y_i),$$

where the individual log-likelihood contribution is given by a scalar product function $\ell_i \colon \mathbb{R}^6 \to \mathbb{R}$

$$\ell_i(\beta, x_1, x_2, z_i, y_i) = \begin{pmatrix} y_i \\ 1 - y_i \end{pmatrix} \cdot \log \left( p \begin{pmatrix} \beta' h(x_1, z_i) \\ \beta' h(x_2, z_i) \end{pmatrix} \right)$$
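For concreteness, here is a minimal NumPy transcription of this log-likelihood. It is a sketch: the names `choice_probs` and `loglik` are illustrative and not part of the exercise.

```python
# Sketch: evaluate log L(beta, D) exactly as defined above.
import numpy as np

def choice_probs(u):
    """Logit choice probabilities p(u1, u2); shifting by max(u) avoids overflow."""
    e = np.exp(u - np.max(u))
    return e / e.sum()

def loglik(beta, x, z, y):
    """x = (x1, x2); z and y are arrays of length N; beta has two elements."""
    total = 0.0
    for zi, yi in zip(z, y):
        u = np.array([beta @ np.array([xj, xj * zi]) for xj in x])  # u_j = beta' h(x_j, z_i)
        p = choice_probs(u)
        total += yi * np.log(p[0]) + (1 - yi) * np.log(p[1])
    return total
```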

Assignments:

  1. Write down the optimization problem the econometrician is solving. Explain the meaning of each part.

    • What are the variables the econometrician has control over in the estimation exercise?

    • What variables should be treated as parameters of the optimization problem?

  2. Elaborate on whether the solution can be guaranteed to exist.

    • What theorem should be applied?

    • What conditions of the theorem are met?

    • What conditions of the theorem are not met?

  3. Derive the gradient and Hessian of the log-likelihood function. Make sure that all multiplied vectors and matrices are conformable.

  4. Derive conditions under which the likelihood function has a unique maximizer (and thus the logit model has a unique maximum likelihood estimator).

Solutions

Question F.1

See the last answer in this math stackexchange post.

Question F.2

A possible answer:

Represent the quadratic form as the dot product of two functions, $f(x) = x'Ax = h(x) \cdot g(x)$, where $h(x) = x$ and $g(x) = Ax$. Then $Dh(x) = I$ (the identity matrix) and $Dg(x) = A$.

The last Jacobian can be easily derived by representing the matrix multiplication as a linear combination of columns. Differentiating with respect to each element of $x$ then yields a Jacobian composed of the columns of matrix $A$, and therefore equal to it.

$$g(x) = Ax = \begin{pmatrix} a_{11} & \cdots & a_{1N} \\ \vdots & \ddots & \vdots \\ a_{N1} & \cdots & a_{NN} \end{pmatrix} \begin{pmatrix} x_1 \\ \vdots \\ x_N \end{pmatrix} = \begin{pmatrix} a_{11} \\ \vdots \\ a_{N1} \end{pmatrix} x_1 + \dots + \begin{pmatrix} a_{1N} \\ \vdots \\ a_{NN} \end{pmatrix} x_N$$

Applying the dot product rule of differentiation we have

$$D(h \cdot g)(x) = [h(x)]' Dg(x) + [g(x)]' Dh(x) = x'A + [Ax]'I = x'A + x'A' = 2x'A = 2[Ax]'$$

The last transformation uses the transpose of a product together with the symmetry of $A$.

The final answer is the $1 \times N$ matrix (row vector) $Df(x) = \nabla f(x) = 2x'A = 2[Ax]'$. Since the gradient is linear in $x$, differentiating it once more yields the Hessian $Hf(x) = 2A$.
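As a quick sanity check, the derived gradient can be compared against central finite differences. This is a hedged sketch with a random symmetric $A$:

```python
# Verify grad f(x) = 2[Ax]' (and, implicitly, Hess f(x) = 2A) by finite differences.
import numpy as np

rng = np.random.default_rng(seed=1)
N, eps = 4, 1e-6
M = rng.normal(size=(N, N))
A = (M + M.T) / 2                          # random symmetric A
x = rng.normal(size=N)
f = lambda v: v @ A @ v

grad_fd = np.array([(f(x + eps * e) - f(x - eps * e)) / (2 * eps) for e in np.eye(N)])
print(np.allclose(grad_fd, 2 * A @ x, atol=1e-5))  # True: gradient equals 2Ax

# The gradient 2x'A is linear in x, so one more differentiation gives Hessian 2A.
```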

Question F.3

  1. The optimization problem is:

$$\max_{\beta \in \mathbb{R}^2} \sum_{i=1}^{N} \ell_i(\beta, x_1, x_2, z_i, y_i).$$

  • We can control $\beta = [\beta_1, \beta_2]'$ (the coefficients to be estimated).

  • We treat $x_1, x_2, \{z_i, y_i\}_{i=1}^{N}$ as parameters (the data).

  2. We can try to apply the Weierstrass extreme value theorem.

  • The objective function is continuous.

  • But the domain is not compact ($\mathbb{R}^2$ is closed but unbounded).

  3. Denote by $D_\beta \ell_i$ the Jacobian of $\ell_i$ with respect to $\beta$, and by $H_\beta \ell_i$ the Hessian of $\ell_i$ with respect to $\beta$.

Notice that $\ell_i(\beta) = \ell_i(p_i(u_i(\beta)))$; then by the chain rule:

$$D_\beta \ell_i = D_{p_i} \ell_i \, D_{u_i} p_i \, D_\beta u_i.$$

We calculate the three terms on the r.h.s. one by one:

$$D_{p_i} \ell_i = \begin{bmatrix} y_i / p_{i1} & (1 - y_i) / p_{i2} \end{bmatrix}, \qquad D_{u_i} p_i = \begin{bmatrix} p_{i1} p_{i2} & -p_{i1} p_{i2} \\ -p_{i1} p_{i2} & p_{i1} p_{i2} \end{bmatrix} = p_{i1} p_{i2} \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix}, \qquad D_\beta u_i = \begin{bmatrix} x_1 & x_1 z_i \\ x_2 & x_2 z_i \end{bmatrix}.$$

Thus,

$$D_\beta \ell_i = \begin{bmatrix} y_i / p_{i1} & (1 - y_i) / p_{i2} \end{bmatrix} \, p_{i1} p_{i2} \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 & x_1 z_i \\ x_2 & x_2 z_i \end{bmatrix} = (x_1 - x_2) \begin{bmatrix} y_i p_{i2} - (1 - y_i) p_{i1} & \big( y_i p_{i2} - (1 - y_i) p_{i1} \big) z_i \end{bmatrix}.$$

The Jacobian (gradient) of the MLE objective function is:

$$D_\beta \log L = \sum_{i=1}^{N} D_\beta \ell_i = (x_1 - x_2) \sum_{i=1}^{N} \begin{bmatrix} y_i p_{i2} - (1 - y_i) p_{i1} & \big( y_i p_{i2} - (1 - y_i) p_{i1} \big) z_i \end{bmatrix}$$

Set $g_i(\beta) \equiv (D_\beta \ell_i)'$, the gradient of $\ell_i$ written as a column vector, and note that $g_i(\beta) = g_i(p_i(u_i(\beta)))$. Applying the chain rule again, the Hessian of $\ell_i$ with respect to $\beta$ is:

$$H_\beta \ell_i = D_\beta g_i(\beta) = D_{p_i} g_i \, D_{u_i} p_i \, D_\beta u_i = (x_1 - x_2) \begin{bmatrix} -(1 - y_i) & y_i \\ -(1 - y_i) z_i & y_i z_i \end{bmatrix} p_{i1} p_{i2} \begin{bmatrix} 1 & -1 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 & x_1 z_i \\ x_2 & x_2 z_i \end{bmatrix} = -(x_1 - x_2)^2 p_{i1} p_{i2} \begin{bmatrix} 1 & z_i \\ z_i & z_i^2 \end{bmatrix}$$

Thus, the Hessian of the MLE objective function is:

$$H_\beta \log L = \sum_{i=1}^{N} H_\beta \ell_i = -(x_1 - x_2)^2 \sum_{i=1}^{N} p_{i1} p_{i2} \begin{bmatrix} 1 & z_i \\ z_i & z_i^2 \end{bmatrix} = -(x_1 - x_2)^2 \begin{bmatrix} \sum_{i=1}^{N} p_{i1} p_{i2} & \sum_{i=1}^{N} p_{i1} p_{i2} z_i \\ \sum_{i=1}^{N} p_{i1} p_{i2} z_i & \sum_{i=1}^{N} p_{i1} p_{i2} z_i^2 \end{bmatrix}$$
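These expressions can be sanity-checked against finite differences. Below is a self-contained sketch on small simulated data (the names `probs`, `loglik`, `grad`, and `hess` are illustrative):

```python
# Finite-difference check of the analytic gradient and Hessian (a sketch).
import numpy as np

rng = np.random.default_rng(seed=2)
x = np.array([1.0, -0.5])                      # alternative characteristics x1, x2
z = rng.normal(size=50)                        # decision maker attributes z_i
y = rng.integers(0, 2, size=50).astype(float)  # observed choices y_i
beta = np.array([0.3, -0.7])                   # arbitrary evaluation point

def probs(b, zi):
    u = b[0] * x + b[1] * x * zi               # u_j = beta' h(x_j, z_i)
    e = np.exp(u - u.max())
    return e / e.sum()

def loglik(b):
    return sum(yi * np.log(probs(b, zi)[0]) + (1 - yi) * np.log(probs(b, zi)[1])
               for zi, yi in zip(z, y))

def grad(b):
    # (x1 - x2) * sum_i [c_i, c_i z_i],  with c_i = y_i p_i2 - (1 - y_i) p_i1
    P = np.array([probs(b, zi) for zi in z])
    c = y * P[:, 1] - (1 - y) * P[:, 0]
    return (x[0] - x[1]) * np.array([c.sum(), (c * z).sum()])

def hess(b):
    # -(x1 - x2)^2 * sum_i p_i1 p_i2 [[1, z_i], [z_i, z_i^2]]
    w = np.array([probs(b, zi).prod() for zi in z])
    return -(x[0] - x[1]) ** 2 * np.array([[w.sum(), (w * z).sum()],
                                           [(w * z).sum(), (w * z**2).sum()]])

eps, I = 1e-6, np.eye(2)
grad_fd = np.array([(loglik(beta + eps * e) - loglik(beta - eps * e)) / (2 * eps) for e in I])
hess_fd = np.array([(grad(beta + eps * e) - grad(beta - eps * e)) / (2 * eps) for e in I])
print(np.allclose(grad(beta), grad_fd, atol=1e-4))  # True
print(np.allclose(hess(beta), hess_fd, atol=1e-4))  # True
```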
  4. If the Hessian $H_\beta \log L$ is negative definite for all $\beta \in \mathbb{R}^2$, then $\det(H_\beta \log L) > 0$ always holds, and by the inverse function theorem there must be at least one solution to the first order conditions $D_\beta \log L = 0$. Moreover, if the Hessian is negative definite, then the MLE objective is strictly concave, i.e., there is a unique maximizer of the log-likelihood function, which is the unique solution to the first order conditions.

Let us find the conditions under which the Hessian is negative definite. Notice that $(x_1 - x_2)^2 > 0$ if and only if $x_1 \neq x_2$, and $p_{i1} p_{i2} > 0$ for all $i$. Thus, if $x_1 \neq x_2$, we only need to check the condition $\det(H_\beta \log L) > 0$. Notice that:

$$\begin{aligned} \det(H_\beta \log L) > 0 &\iff \Big( \sum_i p_{i1} p_{i2} \Big) \Big( \sum_i p_{i1} p_{i2} z_i^2 \Big) - \Big( \sum_i p_{i1} p_{i2} z_i \Big)^2 > 0 \\ &\iff \sum_{i,j} p_{i1} p_{i2} p_{j1} p_{j2} z_j^2 - \sum_{i,j} p_{i1} p_{i2} p_{j1} p_{j2} z_i z_j > 0 \\ &\iff \sum_{i > j} p_{i1} p_{i2} p_{j1} p_{j2} (z_i^2 + z_j^2) - \sum_{i > j} p_{i1} p_{i2} p_{j1} p_{j2} (2 z_i z_j) > 0 \\ &\iff \sum_{i > j} p_{i1} p_{i2} p_{j1} p_{j2} (z_i - z_j)^2 > 0 \\ &\iff z_i \neq z_j \text{ for some } i, j. \end{aligned}$$

Thus, we get a sufficient condition for a unique maximizer:

  • $x_1 \neq x_2$,

  • $z_i \neq z_j$ for some $i, j$.

It is easy to show that this condition is also necessary: if $x_1 = x_2$, then $\beta$ has no impact on the likelihood function, and if $z_i = z_j$ for all $i, j$, we cannot distinguish $\beta_1$ from $\beta_2$ (this is called strict multicollinearity in econometrics).

Thus, the logit model has a unique ML estimator if and only if $x_1 \neq x_2$ and $z_i \neq z_j$ for some $i, j$.

The intuition is that for the model to be estimable (identifiable, in econometrics jargon), the two alternatives cannot be the same ($x_1 \neq x_2$), and at least two people in the data set should have different characteristics ($z_i \neq z_j$ for some $i, j$).
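To see these conditions at work, here is a hedged end-to-end sketch (illustrative names such as `prob1` and `negloglik`; assumes SciPy is available): it simulates data with $x_1 \neq x_2$ and varying $z_i$, then maximizes the log-likelihood with `scipy.optimize.minimize`. Under these conditions the objective is strictly concave, so the maximizer found is the unique ML estimator.

```python
# Sketch: simulate from the binary logit model and recover beta by MLE.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(seed=3)
beta_true = np.array([0.5, -1.0])
x = np.array([1.0, -0.5])                  # x1 != x2 (first identification condition)
z = rng.normal(size=500)                   # z_i vary across i (second condition)

def prob1(b):
    # p_i1 depends only on the utility difference u1 - u2 = (b1 + b2 z)(x1 - x2)
    return 1 / (1 + np.exp(-(b[0] + b[1] * z) * (x[0] - x[1])))

y = (rng.uniform(size=z.size) < prob1(beta_true)).astype(float)

def negloglik(b):
    p1 = prob1(b)
    return -np.sum(y * np.log(p1) + (1 - y) * np.log(1 - p1))

res = minimize(negloglik, x0=np.zeros(2), method="BFGS")
print(res.x)  # close to beta_true
```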