
Mini Project 2

Watchanan Chantapakul (wcgzm)


A two-class dataset has Gaussian likelihood functions and priors $P(\omega_1) = 4P(\omega_2)$. Let the parameters of the likelihoods be $\mu_1 = \begin{bmatrix}7\\1\end{bmatrix}$, $\mu_2 = \begin{bmatrix}1\\7\end{bmatrix}$, and $\Sigma_1 = \Sigma_2 = \begin{bmatrix}3.1 & 0\\0 & 2.6\end{bmatrix}$.

Question A

a) Write a generic Matlab function¹ to compute the Mahalanobis distances between two arbitrary samples $\mathbf{x}_1$ and $\mathbf{x}_2$, or the distance between a sample $\mathbf{x}_1$ and the center of any given Gaussian distribution with covariance $\Sigma$, mean $\mu$, and dimension $d$.

¹ you may use any computer language/package, but you may NOT use any function other than the basic operations: i.e. +, -, *, / (for scalars, vectors, or matrices)

Solution

(i,j)-Minor of a Matrix

$$M_{i,j} = \det\left((A_{p,q})_{p \neq i,\, q \neq j}\right) = \left|(A_{p,q})_{p \neq i,\, q \neq j}\right|$$

Determinant

Expanding along row $i = 0$ (0-indexed):

$$\det(A) = |A| = \sum_{j}(-1)^{i+j} A_{i,j} M_{i,j} = \sum_{j}(-1)^{j} A_{0,j} M_{0,j}$$
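The cofactor expansion above can be sketched as a short recursive routine. The assignment text says Matlab, but footnote 1 allows any language, so this is a Python sketch; `det` and its helper `minor` are names I introduce, and only +, -, * are used for the arithmetic:

```python
# Determinant by cofactor expansion along row 0, using only +, -, *.

def minor(A, i, j):
    """Submatrix of A with row i and column j removed."""
    n = len(A)
    return [[A[p][q] for q in range(n) if q != j]
            for p in range(n) if p != i]

def det(A):
    """Recursive cofactor expansion: det(A) = sum_j (-1)^j A[0][j] M[0][j]."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0.0
    sign = 1.0
    for j in range(n):
        total = total + sign * A[0][j] * det(minor(A, 0, j))
        sign = -sign            # tracks (-1)**j without calling pow()
    return total

print(det([[3.1, 0.0], [0.0, 2.6]]))   # ≈ 8.06
```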

Matrix of Cofactors

$$C_{i,j} = (-1)^{i+j} M_{i,j}$$

Transpose

$$(A^T)_{i,j} = A_{j,i}$$

Adjugate Matrix

adj(A)=CT

Inverse Matrix

$$A^{-1} = |A|^{-1}\operatorname{adj}(A) = \frac{\operatorname{adj}(A)}{|A|}$$

Identity matrix

$$I_{ij} = \begin{cases}1 & i = j\\ 0 & \text{otherwise}\end{cases}$$

Mahalanobis Distance

$$r^2 = (\mathbf{x} - \mathbf{y})^T \Sigma^{-1} (\mathbf{x} - \mathbf{y})$$
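Putting the pieces above together, here is a minimal Python sketch of the requested routine (the assignment says Matlab, but footnote 1 allows any language). Only +, -, *, / are used for the linear algebra, and all names (`det`, `inverse`, `mahalanobis2`) are my own:

```python
# Squared Mahalanobis distance between two samples, or between a sample and
# the center mu of a Gaussian, built only from +, -, *, /.

def det(A):
    """Determinant by cofactor expansion along row 0."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total, sign = 0.0, 1.0
    for j in range(n):
        M = [[A[p][q] for q in range(n) if q != j] for p in range(1, n)]
        total = total + sign * A[0][j] * det(M)
        sign = -sign
    return total

def inverse(A):
    """A^{-1} = adj(A)/|A|; entry (i, j) is the (j, i) cofactor over det(A)."""
    n = len(A)
    d = det(A)
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Minor M_{j,i}: remove row j and column i (transpose built in).
            M = [[A[p][q] for q in range(n) if q != i]
                 for p in range(n) if p != j]
            sign = 1.0 if (i + j) % 2 == 0 else -1.0
            inv[i][j] = sign * det(M) / d
    return inv

def mahalanobis2(x, y, Sigma):
    """r^2 = (x - y)^T Sigma^{-1} (x - y); pass y = mu for distance to a center."""
    d = len(x)
    Sinv = inverse(Sigma)
    diff = [x[k] - y[k] for k in range(d)]
    r2 = 0.0
    for i in range(d):
        for j in range(d):
            r2 = r2 + diff[i] * Sinv[i][j] * diff[j]
    return r2

print(mahalanobis2([7.0, 1.0], [1.0, 7.0], [[3.1, 0.0], [0.0, 2.6]]))
```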

Question B

b) Write another Matlab function¹ to call the function above and compute the discriminant function with the following generic form

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\mu_i)^t \Sigma_i^{-1} (\mathbf{x}-\mu_i) - \frac{d}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$

for any given $d$-dimensional data, mean, covariance matrix, and prior probabilities.


Discriminant function gi()
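A Python sketch of the discriminant, self-contained so it carries its own basic-operations determinant and inverse (all names are mine). I assume footnote 1 restricts the *linear-algebra* steps, so `math.log` and `math.pi` are still allowed for the scalar log terms:

```python
import math

def det(A):
    """Determinant by cofactor expansion along row 0 (+, -, * only)."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total, sign = 0.0, 1.0
    for j in range(n):
        M = [[A[p][q] for q in range(n) if q != j] for p in range(1, n)]
        total = total + sign * A[0][j] * det(M)
        sign = -sign
    return total

def inverse(A):
    """A^{-1} = adj(A)/|A|; entry (i, j) is the (j, i) cofactor over det(A)."""
    n, d = len(A), det(A)
    return [[(1.0 if (i + j) % 2 == 0 else -1.0)
             * det([[A[p][q] for q in range(n) if q != i]
                    for p in range(n) if p != j]) / d
             for j in range(n)] for i in range(n)]

def g(x, mu, Sigma, prior):
    """g_i(x) = -r^2/2 - (d/2) ln(2 pi) - ln|Sigma|/2 + ln P(w_i)."""
    d = len(x)
    Sinv = inverse(Sigma)
    diff = [x[k] - mu[k] for k in range(d)]
    quad = 0.0
    for i in range(d):
        for j in range(d):
            quad = quad + diff[i] * Sinv[i][j] * diff[j]
    return (-quad / 2.0 - d / 2.0 * math.log(2.0 * math.pi)
            - math.log(det(Sigma)) / 2.0 + math.log(prior))

# Decide by the larger discriminant; at (4, 4) both classes are equally far,
# so the prior P(w1) = 4 P(w2) tips the decision to class 1.
Sigma = [[3.1, 0.0], [0.0, 2.6]]
g1 = g([4.0, 4.0], [7.0, 1.0], Sigma, 0.8)
g2 = g([4.0, 4.0], [1.0, 7.0], Sigma, 0.2)
print(g1 > g2)
```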

Question C

c) write a Matlab program that generates (say, 1000) samples from the two classes with the parameters in part a); and plot the two classes in 3D. (your plot should be similar to figure 2.10 (b) in the textbook). The class samples above MUST be created from a Gaussian distribution with $N(\mathbf{0}, I)$ (i.e. use the concept of whitening in an inverse manner).²

² That is, do NOT use a Matlab Toolbox or any other library function, to generate the distributions above directly from the parameters in part a). You MUST do a “dewhitening” instead. In that case, the following Matlab functions can still be useful for this assignment: randn(), peaks(), meshgrid(), surf(), and mesh()

Eigenvalue λ

Eigenvector ϕ

Cross-check with the example in the lecture note

Creating random samples

Now, we create 5,000 samples for each class.

Overlay all samples of all classes in the same figure.

Whitening Transformation

$$Y = \Lambda^{-1/2} \Phi^T X$$

Dewhitening Transformation

$$\begin{aligned}
\Lambda^{1/2} Y &= \Lambda^{1/2}\Lambda^{-1/2}\Phi^T X = I\,\Phi^T X = \Phi^T X\\
(\Phi^T)^{-1}\Lambda^{1/2} Y &= (\Phi^T)^{-1}\Phi^T X = I X = X\\
\Phi\,\Lambda^{1/2} Y &= X \qquad \text{since } (\Phi^T)^{-1} = \Phi \text{ for an orthonormal } \Phi
\end{aligned}$$
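The dewhitening above can be sketched as follows. This is a Python/numpy sketch of the idea (numpy is my assumption; the notebook may use Matlab). Footnote 2 only forbids sampling $N(\mu, \Sigma)$ *directly*, and explicitly allows `randn()`, so I draw $Y \sim N(0, I)$ with the numpy equivalent and compute the eigendecomposition with `np.linalg.eigh`:

```python
import numpy as np

# Dewhitening: draw Y ~ N(0, I), then map X = Phi Lambda^{1/2} Y + mu
# so that X ~ N(mu, Sigma).

rng = np.random.default_rng(0)

def dewhitened_samples(mu, Sigma, n):
    # Sigma = Phi diag(lam) Phi^T, with orthonormal eigenvectors Phi
    lam, Phi = np.linalg.eigh(np.asarray(Sigma, dtype=float))
    Y = rng.standard_normal((len(mu), n))          # whitened N(0, I) samples
    X = Phi @ np.diag(np.sqrt(lam)) @ Y            # color with Phi Lambda^{1/2}
    return X + np.asarray(mu, dtype=float).reshape(-1, 1)   # shape (d, n)

Sigma = [[3.1, 0.0], [0.0, 2.6]]
X1 = dewhitened_samples([7.0, 1.0], Sigma, 5000)
X2 = dewhitened_samples([1.0, 7.0], Sigma, 5000)
print(X1.mean(axis=1), np.cov(X1))   # sample mean ≈ mu1, sample cov ≈ Sigma
```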

Question D

d) derive the decision boundary and plot this boundary on top of the generated samples.

Discriminant function:

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x}-\mu_i)^t \Sigma_i^{-1} (\mathbf{x}-\mu_i) - \frac{d}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)$$

Decision boundary:

$$g_i(\mathbf{x}) = g_j(\mathbf{x})$$
$$-\tfrac{1}{2}(\mathbf{x}-\mu_i)^t\Sigma_i^{-1}(\mathbf{x}-\mu_i) - \tfrac{d}{2}\ln(2\pi) - \tfrac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i) = -\tfrac{1}{2}(\mathbf{x}-\mu_j)^t\Sigma_j^{-1}(\mathbf{x}-\mu_j) - \tfrac{d}{2}\ln(2\pi) - \tfrac{1}{2}\ln|\Sigma_j| + \ln P(\omega_j)$$

The $-\tfrac{d}{2}\ln(2\pi)$ terms cancel:

$$-\tfrac{1}{2}(\mathbf{x}-\mu_i)^t\Sigma_i^{-1}(\mathbf{x}-\mu_i) - \tfrac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i) = -\tfrac{1}{2}(\mathbf{x}-\mu_j)^t\Sigma_j^{-1}(\mathbf{x}-\mu_j) - \tfrac{1}{2}\ln|\Sigma_j| + \ln P(\omega_j)$$

From $\Sigma_1 = \Sigma_2 \equiv \Sigma$, the $\ln|\Sigma_i|$ terms also cancel:

$$-\tfrac{1}{2}(\mathbf{x}-\mu_i)^t\Sigma^{-1}(\mathbf{x}-\mu_i) + \ln P(\omega_i) = -\tfrac{1}{2}(\mathbf{x}-\mu_j)^t\Sigma^{-1}(\mathbf{x}-\mu_j) + \ln P(\omega_j)$$

Expanding the quadratic forms:

$$-\tfrac{1}{2}\left[\mathbf{x}^T\Sigma^{-1}\mathbf{x} - 2\mu_i^T\Sigma^{-1}\mathbf{x} + \mu_i^T\Sigma^{-1}\mu_i\right] + \ln P(\omega_i) = -\tfrac{1}{2}\left[\mathbf{x}^T\Sigma^{-1}\mathbf{x} - 2\mu_j^T\Sigma^{-1}\mathbf{x} + \mu_j^T\Sigma^{-1}\mu_j\right] + \ln P(\omega_j)$$

The $-\tfrac{1}{2}\mathbf{x}^T\Sigma^{-1}\mathbf{x}$ terms cancel:

$$\mu_i^T\Sigma^{-1}\mathbf{x} - \tfrac{1}{2}\mu_i^T\Sigma^{-1}\mu_i + \ln P(\omega_i) = \mu_j^T\Sigma^{-1}\mathbf{x} - \tfrac{1}{2}\mu_j^T\Sigma^{-1}\mu_j + \ln P(\omega_j)$$

Let $i=1$ and $j=2$; from $P(\omega_1) = 4P(\omega_2)$ we get

$$\mu_1^T\Sigma^{-1}\mathbf{x} - \tfrac{1}{2}\mu_1^T\Sigma^{-1}\mu_1 + \ln 4P(\omega_2) = \mu_2^T\Sigma^{-1}\mathbf{x} - \tfrac{1}{2}\mu_2^T\Sigma^{-1}\mu_2 + \ln P(\omega_2)$$
$$\mu_1^T\Sigma^{-1}\mathbf{x} - \tfrac{1}{2}\mu_1^T\Sigma^{-1}\mu_1 + \ln\frac{4P(\omega_2)}{P(\omega_2)} = \mu_2^T\Sigma^{-1}\mathbf{x} - \tfrac{1}{2}\mu_2^T\Sigma^{-1}\mu_2$$
$$\mu_1^T\Sigma^{-1}\mathbf{x} - \tfrac{1}{2}\mu_1^T\Sigma^{-1}\mu_1 + \ln 4 = \mu_2^T\Sigma^{-1}\mathbf{x} - \tfrac{1}{2}\mu_2^T\Sigma^{-1}\mu_2$$

Now substitute $\mu_1 = \begin{bmatrix}7\\1\end{bmatrix}$, $\mu_2 = \begin{bmatrix}1\\7\end{bmatrix}$, and $\Sigma = \begin{bmatrix}3.1&0\\0&2.6\end{bmatrix}$.

With $\Sigma^{-1} = \begin{bmatrix}\frac{1}{3.1}&0\\0&\frac{1}{2.6}\end{bmatrix}$:

$$\begin{bmatrix}7&1\end{bmatrix}\Sigma^{-1}\begin{bmatrix}x_1\\x_2\end{bmatrix} - \tfrac{1}{2}\begin{bmatrix}7&1\end{bmatrix}\Sigma^{-1}\begin{bmatrix}7\\1\end{bmatrix} + \ln 4 = \begin{bmatrix}1&7\end{bmatrix}\Sigma^{-1}\begin{bmatrix}x_1\\x_2\end{bmatrix} - \tfrac{1}{2}\begin{bmatrix}1&7\end{bmatrix}\Sigma^{-1}\begin{bmatrix}1\\7\end{bmatrix}$$
$$\frac{7x_1}{3.1} + \frac{x_2}{2.6} - \tfrac{1}{2}\left(\frac{49}{3.1} + \frac{1}{2.6}\right) + \ln 4 = \frac{x_1}{3.1} + \frac{7x_2}{2.6} - \tfrac{1}{2}\left(\frac{1}{3.1} + \frac{49}{2.6}\right)$$
$$\frac{7x_1}{3.1} + \frac{x_2}{2.6} - 8.0955 + \ln 4 = \frac{x_1}{3.1} + \frac{7x_2}{2.6} - 9.5844$$
$$\frac{7x_1}{3.1} + \frac{x_2}{2.6} + 2.8751 = \frac{x_1}{3.1} + \frac{7x_2}{2.6}$$

Multiplying both sides by $3.1 \times 2.6 = 8.06$:

$$18.2x_1 + 3.1x_2 + 23.1733 = 2.6x_1 + 21.7x_2$$
$$15.6x_1 + 23.1733 = 18.6x_2$$
$$x_2 = 0.8387x_1 + 1.2459$$
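The line $x_2 = 0.8387x_1 + 1.2459$ can be cross-checked numerically: $g_1 - g_2$ should vanish along it. A small Python sketch (names are mine), using the fact that with a shared diagonal covariance the quadratic and $\ln|\Sigma|$ terms cancel:

```python
import math

# Cross-check of the part (d) boundary: g1(x) - g2(x) ≈ 0 on the line.

mu1, mu2 = (7.0, 1.0), (1.0, 7.0)
s11, s22 = 3.1, 2.6              # diagonal entries of the shared Sigma
ln_prior_ratio = math.log(4.0)   # ln(P(w1)/P(w2)), since P(w1) = 4 P(w2)

def g_diff(x1, x2):
    """g1(x) - g2(x); shared quadratic, 2*pi and |Sigma| terms cancel."""
    lhs = (mu1[0] * x1 / s11 + mu1[1] * x2 / s22
           - 0.5 * (mu1[0] ** 2 / s11 + mu1[1] ** 2 / s22) + ln_prior_ratio)
    rhs = (mu2[0] * x1 / s11 + mu2[1] * x2 / s22
           - 0.5 * (mu2[0] ** 2 / s11 + mu2[1] ** 2 / s22))
    return lhs - rhs

for x1 in (-2.0, 0.0, 4.0, 10.0):
    x2 = 0.8387 * x1 + 1.2459
    print(round(g_diff(x1, x2), 3))   # ≈ 0 at every point on the line
```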

Likelihood

3D Plot

The way I do this plot is to take the maximum likelihood over all classes at each point $(x_1, x_2)$. So, instead of plotting two surfaces separately, I combine them into one single surface, which is then colored based on the values of the discriminant function.
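The single-surface idea can be sketched as below. This is a Python/numpy/matplotlib sketch of the approach (these libraries and all names are my assumptions; matplotlib's `plot_surface` stands in for Matlab's `meshgrid`/`surf`):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # non-interactive backend; swap for your own
import matplotlib.pyplot as plt

# Evaluate both class likelihoods on a grid and keep the pointwise maximum,
# giving one combined surface instead of two overlapping ones.

def gauss2(X1, X2, mu, Sigma):
    """Bivariate Gaussian density evaluated on a meshgrid."""
    Sinv = np.linalg.inv(Sigma)
    d1, d2 = X1 - mu[0], X2 - mu[1]
    q = Sinv[0, 0] * d1**2 + 2 * Sinv[0, 1] * d1 * d2 + Sinv[1, 1] * d2**2
    return np.exp(-q / 2) / (2 * np.pi * np.sqrt(np.linalg.det(Sigma)))

Sigma = np.array([[3.1, 0.0], [0.0, 2.6]])
X1, X2 = np.meshgrid(np.linspace(-4, 12, 200), np.linspace(-4, 12, 200))
Z = np.maximum(gauss2(X1, X2, (7, 1), Sigma), gauss2(X1, X2, (1, 7), Sigma))

ax = plt.figure().add_subplot(projection="3d")
ax.plot_surface(X1, X2, Z, cmap="viridis")
plt.savefig("likelihood_surface.png")
```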

Note that I intentionally do not put the scatter plot of all samples on the z=0 plane as it looks too messy.

Let's take a look at the 3D plot from the bottom.

Here is the view when we look at the $z=0$ plane edge-on, so that it appears as a line.

Question E

e) plot the posterior probabilities.

Posterior probability

$$P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i)\, P(\omega_i)}{p(\mathbf{x})}$$

where the evidence is defined as:

$$p(\mathbf{x}) = \sum_{j=1}^{C} p(\mathbf{x} \mid \omega_j)\, P(\omega_j)$$

(in this case C=2)
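The posterior computation above can be sketched in a few lines. A Python/numpy sketch (numpy and all names here are my assumptions):

```python
import numpy as np

# P(w_i | x) = p(x | w_i) P(w_i) / p(x), with the evidence p(x) summed
# over all C classes.

def likelihood(x, mu, Sigma):
    """Multivariate Gaussian density p(x | w_i)."""
    d = len(mu)
    diff = np.asarray(x, float) - np.asarray(mu, float)
    q = diff @ np.linalg.inv(Sigma) @ diff
    return np.exp(-q / 2) / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))

def posteriors(x, params, priors):
    """params: list of (mu, Sigma); priors: list of P(w_i). Returns P(w_i | x)."""
    joint = np.array([likelihood(x, mu, S) * P for (mu, S), P in zip(params, priors)])
    return joint / joint.sum()          # divide by the evidence p(x)

Sigma = [[3.1, 0.0], [0.0, 2.6]]
p = posteriors([4.0, 4.0], [((7, 1), Sigma), ((1, 7), Sigma)], [0.8, 0.2])
print(p)   # at (4, 4) the likelihoods are equal, so the priors decide
```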


Question F

f) redo part c), d) and e) using the same parameters except for $\Sigma_1 = \Sigma_2 = \begin{bmatrix}3.1 & 0.35\\0.35 & 2.6\end{bmatrix}$

Decision boundary

Just like the derivation in the previous question, but with a different covariance matrix:

With $\Sigma^{-1} = \begin{bmatrix}0.3276 & -0.0441\\-0.0441 & 0.3906\end{bmatrix}$:

$$\begin{bmatrix}7&1\end{bmatrix}\Sigma^{-1}\begin{bmatrix}x_1\\x_2\end{bmatrix} - \tfrac{1}{2}\begin{bmatrix}7&1\end{bmatrix}\Sigma^{-1}\begin{bmatrix}7\\1\end{bmatrix} + \ln 4 = \begin{bmatrix}1&7\end{bmatrix}\Sigma^{-1}\begin{bmatrix}x_1\\x_2\end{bmatrix} - \tfrac{1}{2}\begin{bmatrix}1&7\end{bmatrix}\Sigma^{-1}\begin{bmatrix}1\\7\end{bmatrix}$$
$$\begin{bmatrix}7&1\end{bmatrix}\Sigma^{-1}\begin{bmatrix}x_1\\x_2\end{bmatrix} - 7.9118 + \ln 4 = \begin{bmatrix}1&7\end{bmatrix}\Sigma^{-1}\begin{bmatrix}x_1\\x_2\end{bmatrix} - 9.4236$$
$$\begin{bmatrix}7&1\end{bmatrix}\begin{bmatrix}0.3276x_1 - 0.0441x_2\\-0.0441x_1 + 0.3906x_2\end{bmatrix} + 2.8981 = \begin{bmatrix}1&7\end{bmatrix}\begin{bmatrix}0.3276x_1 - 0.0441x_2\\-0.0441x_1 + 0.3906x_2\end{bmatrix}$$
$$7(0.3276x_1 - 0.0441x_2) + (-0.0441x_1 + 0.3906x_2) + 2.8981 = (0.3276x_1 - 0.0441x_2) + 7(-0.0441x_1 + 0.3906x_2)$$
$$2.2932x_1 - 0.3087x_2 - 0.0441x_1 + 0.3906x_2 + 2.8981 = 0.3276x_1 - 0.0441x_2 - 0.3087x_1 + 2.7342x_2$$

Collecting terms:

$$2.2491x_1 + 0.0819x_2 + 2.8981 = 0.0189x_1 + 2.6901x_2$$
$$2.6082x_2 = 2.2302x_1 + 2.8981$$
$$x_2 = 0.8551x_1 + 1.1111$$
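This boundary can be cross-checked directly from the parameters. A Python/numpy sketch (numpy and the names are my assumptions), using the standard equal-covariance form $\mathbf{w}^T\mathbf{x} + w_0 = 0$ with $\mathbf{w} = \Sigma^{-1}(\mu_1 - \mu_2)$:

```python
import numpy as np

# Cross-check of the part (f) boundary from the raw parameters.

mu1, mu2 = np.array([7.0, 1.0]), np.array([1.0, 7.0])
Sinv = np.linalg.inv(np.array([[3.1, 0.35], [0.35, 2.6]]))

w = Sinv @ (mu1 - mu2)
w0 = -0.5 * (mu1 @ Sinv @ mu1 - mu2 @ Sinv @ mu2) + np.log(4.0)

# w[0]*x1 + w[1]*x2 + w0 = 0  =>  x2 = slope*x1 + intercept
slope, intercept = -w[0] / w[1], -w0 / w[1]
print(slope, intercept)   # ≈ 0.8551 and ≈ 1.111 (tiny differences come
                          # from the 4-decimal rounding in the hand derivation)
```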

Question G

g) redo part c), d) and e) for $\Sigma_1 = \begin{bmatrix}2.1 & 1.5\\1.5 & 3.8\end{bmatrix}$, $\Sigma_2 = \begin{bmatrix}3.1 & 0.35\\0.35 & 2.6\end{bmatrix}$ and $P(\omega_1) = 2 \times P(\omega_2)$

Decision boundary

$$g_i(\mathbf{x}) = g_j(\mathbf{x})$$
$$-\tfrac{1}{2}\mathbf{x}^T\Sigma_i^{-1}\mathbf{x} + \mu_i^T\Sigma_i^{-1}\mathbf{x} - \tfrac{1}{2}\mu_i^T\Sigma_i^{-1}\mu_i - \tfrac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i) = -\tfrac{1}{2}\mathbf{x}^T\Sigma_j^{-1}\mathbf{x} + \mu_j^T\Sigma_j^{-1}\mathbf{x} - \tfrac{1}{2}\mu_j^T\Sigma_j^{-1}\mu_j - \tfrac{1}{2}\ln|\Sigma_j| + \ln P(\omega_j)$$

With $i=1$, $j=2$ and $P(\omega_1) = 2P(\omega_2)$, the prior terms reduce to $\ln\frac{2P(\omega_2)}{P(\omega_2)} = \ln 2$:

$$-\tfrac{1}{2}\mathbf{x}^T\Sigma_1^{-1}\mathbf{x} + \mu_1^T\Sigma_1^{-1}\mathbf{x} - \tfrac{1}{2}\mu_1^T\Sigma_1^{-1}\mu_1 - \tfrac{1}{2}\ln|\Sigma_1| + \ln 2 = -\tfrac{1}{2}\mathbf{x}^T\Sigma_2^{-1}\mathbf{x} + \mu_2^T\Sigma_2^{-1}\mathbf{x} - \tfrac{1}{2}\mu_2^T\Sigma_2^{-1}\mu_2 - \tfrac{1}{2}\ln|\Sigma_2|$$

Multiplying by 2:

$$-\mathbf{x}^T\Sigma_1^{-1}\mathbf{x} + 2\mu_1^T\Sigma_1^{-1}\mathbf{x} - \mu_1^T\Sigma_1^{-1}\mu_1 - \ln|\Sigma_1| + 2\ln 2 = -\mathbf{x}^T\Sigma_2^{-1}\mathbf{x} + 2\mu_2^T\Sigma_2^{-1}\mathbf{x} - \mu_2^T\Sigma_2^{-1}\mu_2 - \ln|\Sigma_2|$$
With

$$\Sigma_1^{-1} = \begin{bmatrix}0.6632 & -0.2618\\-0.2618 & 0.3665\end{bmatrix}, \qquad \Sigma_2^{-1} = \begin{bmatrix}0.3276 & -0.0441\\-0.0441 & 0.3906\end{bmatrix}$$
$$\mu_1^T\Sigma_1^{-1}\mu_1 = 29.1972, \quad \mu_2^T\Sigma_2^{-1}\mu_2 = 18.8472, \quad \ln|\Sigma_1| = 1.7457, \quad \ln|\Sigma_2| = 2.0716$$

the boundary becomes

$$-\mathbf{x}^T\Sigma_1^{-1}\mathbf{x} + 2\mu_1^T\Sigma_1^{-1}\mathbf{x} - 29.1972 - 1.7457 + 2\ln 2 = -\mathbf{x}^T\Sigma_2^{-1}\mathbf{x} + 2\mu_2^T\Sigma_2^{-1}\mathbf{x} - 18.8472 - 2.0716$$

Collecting the constants ($-29.1972 - 1.7457 + 1.3863 + 18.8472 + 2.0716 = -8.6378$):

$$-\mathbf{x}^T\Sigma_1^{-1}\mathbf{x} + 2\mu_1^T\Sigma_1^{-1}\mathbf{x} - 8.6378 = -\mathbf{x}^T\Sigma_2^{-1}\mathbf{x} + 2\mu_2^T\Sigma_2^{-1}\mathbf{x}$$

Expanding the quadratic forms (note the cross terms $2\Sigma^{-1}_{12}x_1x_2$, which do not vanish here):

$$\mathbf{x}^T\Sigma_1^{-1}\mathbf{x} = 0.6632x_1^2 - 0.5236x_1x_2 + 0.3665x_2^2, \qquad \mathbf{x}^T\Sigma_2^{-1}\mathbf{x} = 0.3276x_1^2 - 0.0882x_1x_2 + 0.3906x_2^2$$

and the linear forms:

$$2\mu_1^T\Sigma_1^{-1}\mathbf{x} = 8.7609x_1 - 2.9319x_2, \qquad 2\mu_2^T\Sigma_2^{-1}\mathbf{x} = 0.0378x_1 + 5.3795x_2$$

Moving everything to the left-hand side gives the quadratic boundary

$$-0.3356x_1^2 + 0.4354x_1x_2 + 0.0241x_2^2 + 8.7231x_1 - 8.3114x_2 - 8.6378 = 0$$

Solving for $x_2$ as a quadratic in $x_2$:

$$0.0241x_2^2 + (0.4354x_1 - 8.3114)x_2 + (-0.3356x_1^2 + 8.7231x_1 - 8.6378) = 0$$
$$x_2 = \frac{(8.3114 - 0.4354x_1) \pm \sqrt{(0.4354x_1 - 8.3114)^2 + 4 \times 0.0241 \times (0.3356x_1^2 - 8.7231x_1 + 8.6378)}}{2 \times 0.0241}$$

The two roots are the two branches of the quadratic decision boundary.
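The coefficients of this quadratic boundary can be cross-checked from the raw parameters. A Python/numpy sketch (numpy and the names are my assumptions), using the general form $g_1(\mathbf{x}) - g_2(\mathbf{x}) = \mathbf{x}^T W \mathbf{x} + \mathbf{w}^T\mathbf{x} + w_0$:

```python
import numpy as np

# Cross-check of the part (g) boundary g1(x) = g2(x):
#   W  = -(S1inv - S2inv) / 2
#   w  = S1inv mu1 - S2inv mu2
#   w0 = -(mu1' S1inv mu1 - mu2' S2inv mu2)/2 - (ln|S1| - ln|S2|)/2 + ln 2
# (doubling W, w, w0 reproduces the hand-derived coefficients above).

mu1, mu2 = np.array([7.0, 1.0]), np.array([1.0, 7.0])
S1 = np.array([[2.1, 1.5], [1.5, 3.8]])
S2 = np.array([[3.1, 0.35], [0.35, 2.6]])
S1i, S2i = np.linalg.inv(S1), np.linalg.inv(S2)

W = -0.5 * (S1i - S2i)
w = S1i @ mu1 - S2i @ mu2
w0 = (-0.5 * (mu1 @ S1i @ mu1 - mu2 @ S2i @ mu2)
      - 0.5 * (np.log(np.linalg.det(S1)) - np.log(np.linalg.det(S2)))
      + np.log(2.0))

def boundary(x):
    """g1(x) - g2(x); zero exactly on the decision boundary."""
    x = np.asarray(x, float)
    return x @ W @ x + w @ x + w0

# Doubled coefficients: x1^2, x1*x2, x2^2, linear terms, constant.
print(2 * W[0, 0], 4 * W[0, 1], 2 * W[1, 1], 2 * w, 2 * w0)
```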