Mini Project 1

Watchanan Chantapakul (wcgzm)


Part A: Original feature space and Euclidean distance

Given the dataset with four classes (available for download from the link provided on Canvas as a .mat file), where each class follows a specific distribution:

1. Estimate the mean and covariance of each class distribution using a library function (e.g., a Matlab toolbox, a Python statistics package, etc.). Report their values.

Compute means $\mu$

$$ \vec{\mu_c} = \frac{1}{N_c}\sum_{i=1}^{N_c}\vec{x_i} $$

Compute variances $\sigma^2$

$$ \vec{\sigma^2_c} = \frac{1}{N_c - \text{ddof}} \sum_{i=1}^{N_c}(\vec{x_i}-\vec{\mu_c})^2 $$

Note: NumPy computes the variance based on the delta degrees of freedom (ddof) parameter. To obtain the sample variance, ddof must be set to 1.
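
As an illustration, a minimal NumPy sketch of the per-class estimates (the data array below is a synthetic stand-in; the actual samples come from the provided .mat file):

```python
import numpy as np

# Stand-in for one class of the dataset; in the actual report the samples are
# loaded from the .mat file provided on Canvas (its variable names are omitted here).
rng = np.random.default_rng(0)
X_c = rng.multivariate_normal(mean=[2.0, 1.0], cov=[[1.0, 0.4], [0.4, 0.5]], size=200)

mu_c = np.mean(X_c, axis=0)            # estimated class mean vector
Sigma_c = np.cov(X_c, rowvar=False)    # sample covariance (N-1 normalization by default)
var_c = np.var(X_c, axis=0, ddof=1)    # per-feature sample variance (ddof=1)

print(mu_c, Sigma_c, var_c, sep="\n")
```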

Check that the computed eigenvectors are perpendicular to each other.
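
One way to check this numerically, sketched with an assumed example covariance matrix (the report uses the per-class estimates):

```python
import numpy as np

# Assumed example covariance matrix; the report uses the per-class estimates.
Sigma_c = np.array([[1.0, 0.4],
                    [0.4, 0.5]])

eigvals, eigvecs = np.linalg.eigh(Sigma_c)    # eigh: for symmetric matrices, returns orthonormal eigenvectors
phi_1, phi_2 = eigvecs[:, 0], eigvecs[:, 1]   # eigenvectors are the columns

print(np.dot(phi_1, phi_2))                          # ~0: the eigenvectors are perpendicular
print(np.allclose(eigvecs.T @ eigvecs, np.eye(2)))   # True: the eigenvector matrix is orthogonal
```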

2. Plot the data in each of the four classes using different colors and display their eigenvectors.

3. Consider the four test samples in Table 1 below:

| Test Samples | x-value | y-value |
|---|---|---|
| s1 | 2.3 | 1.9 |
| s2 | 7 | -0.3 |
| s3 | 10 | 0.5 |
| s4 | -1.2 | 0.6 |

Table 1: Test Samples to be classified

(a) On the same previous plot, display the four test samples.

Ellipse equation

$$ x(\alpha) = \sigma^2_{c,x} \cos(\alpha)\cos(\theta) - \sigma^2_{c,y} \sin(\alpha)\sin(\theta) + \mu_{c,x} $$

$$ y(\alpha) = \sigma^2_{c,x} \cos(\alpha)\sin(\theta) + \sigma^2_{c,y} \sin(\alpha)\cos(\theta) + \mu_{c,y} $$

Here, $\alpha \in [0, 2\pi)$ sweeps along the ellipse, and the rotation angle $\theta$ is computed from the components of the principal eigenvector: $$ \theta = \operatorname{arctan} \left( \frac{\phi_{i, 2}}{\phi_{i, 1}} \right) $$

Each eigenvector is scaled by its associated eigenvalue $\lambda_i$, so the axis of the distribution along $\vec{\phi_i}$ is $\lambda_i \vec{\phi_i}$.
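
A minimal plotting sketch of this ellipse construction, assuming example values for the mean, eigenvalues, and eigenvectors (the report uses the per-class estimates) and following the eigenvalue-scaled-axis convention above:

```python
import numpy as np
import matplotlib.pyplot as plt

# Assumed example values; the report uses the estimated mean, eigenvalues, and
# eigenvectors of each class.
mu_c = np.array([2.0, 1.0])
eigvals = np.array([0.3, 1.2])                 # ascending, as returned by np.linalg.eigh
eigvecs = np.array([[0.8, -0.6],
                    [0.6,  0.8]])              # columns are the eigenvectors

theta = np.arctan2(eigvecs[1, 1], eigvecs[0, 1])   # rotation angle from the principal eigenvector
a, b = eigvals[1], eigvals[0]                      # axis lengths scaled by the eigenvalues

alpha = np.linspace(0.0, 2.0 * np.pi, 200)
x = a * np.cos(alpha) * np.cos(theta) - b * np.sin(alpha) * np.sin(theta) + mu_c[0]
y = a * np.cos(alpha) * np.sin(theta) + b * np.sin(alpha) * np.cos(theta) + mu_c[1]

plt.plot(x, y)
plt.gca().set_aspect("equal")
plt.show()
```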

(b) Compute the Euclidean distances $d(\mu_i, s_j)$ between the center of each class $i = 1, 2, 3, 4$ and the test samples $j = 1, 2, 3, 4.$

$$ d(\mu_c, s_j) = ||\vec{\mu_c} - \vec{s_j}||_2 $$

Classification based on Euclidean distance

$$ \omega^* = \underset{i}{\operatorname{argmin}}\; d(\vec{\mu_i}, \vec{s_j}) $$

In the figures plotted below, the purple line marks the minimum of the four distances.
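
A sketch of the distance computation and the arg-min rule; the class means below are placeholders (the report uses the means estimated in A.1), while the test samples are the ones from Table 1:

```python
import numpy as np

# Placeholder class means (the report uses the estimates from A.1).
mu = np.array([[ 8.0,  6.0],    # mu_1 (placeholder)
               [ 6.0,  2.0],    # mu_2 (placeholder)
               [-1.0,  3.0],    # mu_3 (placeholder)
               [ 0.5,  4.0]])   # mu_4 (placeholder)

s = np.array([[ 2.3,  1.9],     # s_1
              [ 7.0, -0.3],     # s_2
              [10.0,  0.5],     # s_3
              [-1.2,  0.6]])    # s_4

d = np.linalg.norm(s[:, None, :] - mu[None, :, :], axis=2)   # d[j, i] = ||s_j - mu_i||_2
assignments = np.argmin(d, axis=1) + 1                       # 1-based class index per test sample

print(np.round(d, 4))
print(assignments)
```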

(c) Classify the test samples accordingly and report the results in Table 2 below:

| Test Samples | d($\mu_1$, $s_j$) | d($\mu_2$, $s_j$) | d($\mu_3$, $s_j$) | d($\mu_4$, $s_j$) | Class Assignment |
|---|---|---|---|---|---|
| s1 | 8.2553 | 6.7662 | 5.8157 | 4.2925 | class 4 |
| s2 | 6.0986 | 2.7688 | 9.1359 | 9.4408 | class 2 |
| s3 | 4.5482 | 4.7291 | 12.2354 | 12.1118 | class 1 |
| s4 | 11.9873 | 8.8865 | 2.7610 | 2.4548 | class 4 |

Table 2: Euclidean distances and classification results in the original feature space

Part B: Whitened space and Euclidean distance

1. Apply a whitening transformation to the data in each of the classes according to their own parameters (i.e., mean and covariance).

Whitened mean

$$ \vec{\mu_{W, c}}(\mathbf{x}) = \Lambda^{-\frac{1}{2}}_c \Phi^\text{T}_c \vec{\mu_{x, c}} $$

Whitened covariance

$$ \Sigma_{W, c} = \Lambda^{-\frac{1}{2}}_c \Phi^\text{T}_c \Sigma_{c} \Phi_c \Lambda^{-\frac{1}{2}}_c = I $$

The whitened covariance matrix $\Sigma_{W, c}$ becomes the identity matrix because the transformation first rotates the data (via $\Phi^\text{T}_c$) and then rescales it (via $\Lambda^{-\frac{1}{2}}_c$). Since the eigenvectors diagonalize the covariance matrix, $\Phi^\text{T}_c \Sigma_{c} \Phi_c$ equals $\Lambda_c$, which cancels against the two $\Lambda^{-\frac{1}{2}}_c$ factors:

$$ \Phi^\text{T}_c \Sigma_{c} \Phi_c = \Lambda_c $$

$$ \Lambda^{-\frac{1}{2}}_c \Lambda_c \Lambda^{-\frac{1}{2}}_c = I $$

Whitened data sample

We can whiten a data sample in the same way as we whiten the mean vector $\vec{\mu_{x, c}}$; it just has to be transformed with the $\Lambda_c$ and $\Phi_c$ of the corresponding class. $$ \vec{x_{W, c}} = \Lambda^{-\frac{1}{2}}_c \Phi^\text{T}_c \vec{x} $$

Notice the values of all the whitened covariance matrices: they are identity matrices, as stated above. The off-diagonal entries are not exactly zero, but they are negligibly small (numerical round-off).
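
A sketch of the per-class whitening transformation and the identity check, using a synthetic stand-in for one class of the data:

```python
import numpy as np

# Synthetic stand-in for one class; the report applies this to each class's own estimates.
rng = np.random.default_rng(0)
X_c = rng.multivariate_normal(mean=[2.0, 1.0], cov=[[1.0, 0.4], [0.4, 0.5]], size=500)

mu_c = np.mean(X_c, axis=0)
Sigma_c = np.cov(X_c, rowvar=False)

lam, Phi = np.linalg.eigh(Sigma_c)
A_w = np.diag(lam ** -0.5) @ Phi.T     # whitening matrix Lambda^{-1/2} Phi^T

X_w = (A_w @ X_c.T).T                  # whitened samples
mu_w = A_w @ mu_c                      # whitened mean
Sigma_w = np.cov(X_w, rowvar=False)    # whitened covariance

print(np.round(Sigma_w, 6))            # ~ identity matrix, off-diagonals near zero
```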

Check that the computed eigenvectors are perpendicular to each other.

Whitened test sample

In order to whiten a test sample, since there are four classes, each test sample has to be whitened with respect to each class distribution, one at a time:

$$ \vec{s_{W, i, j}} = \Lambda^{-\frac{1}{2}}_i \Phi^\text{T}_i \vec{s_j} $$

Classification based on Euclidean distance in the whitened spaces

$$ \omega^* = \underset{i}{\operatorname{argmin}}\; d(\vec{\mu_{W, i}}, \vec{s_{W, i, j}}) $$
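
A sketch of this per-class whitening of a test sample and the whitened-space classification (the means and covariances below are placeholders; the report uses the estimates from A.1):

```python
import numpy as np

# Placeholder per-class means and covariances; s is one test sample, e.g. s_1 from Table 1.
mu = [np.array([8.0, 6.0]), np.array([6.0, 2.0]),
      np.array([-1.0, 3.0]), np.array([0.5, 4.0])]
Sigma = [np.array([[1.0, 0.3], [0.3, 0.8]]) for _ in range(4)]
s = np.array([2.3, 1.9])

d_w = []
for mu_i, Sigma_i in zip(mu, Sigma):
    lam, Phi = np.linalg.eigh(Sigma_i)
    A_w = np.diag(lam ** -0.5) @ Phi.T                 # class i's whitening matrix
    d_w.append(np.linalg.norm(A_w @ mu_i - A_w @ s))   # Euclidean distance in class i's whitened space

omega_star = int(np.argmin(d_w)) + 1                   # 1-based class assignment
print(np.round(d_w, 4), omega_star)
```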

2. Repeat questions A.1, A.2, and A.3, but this time using the whitened data and whitened test samples, and report the results in Table 3 below:

| Test Samples | d($\mu_1$, $s_j$) | d($\mu_2$, $s_j$) | d($\mu_3$, $s_j$) | d($\mu_4$, $s_j$) | Class Assignment |
|---|---|---|---|---|---|
| s1 | 4.4632 | 8.2269 | 3.6661 | 3.0176 | class 4 |
| s2 | 2.7344 | 2.5084 | 3.4876 | 6.5090 | class 2 |
| s3 | 2.4367 | 2.6259 | 4.6832 | 8.6817 | class 1 |
| s4 | 6.5137 | 10.3474 | 2.5280 | 3.0785 | class 3 |

Table 3: Euclidean distances and classification results in the whitened space

Part C: Original feature space and Mahalanobis distance

1. Using the original dataset from Part A (i.e., before whitening), repeat question A.3 using the Mahalanobis distance $r(\mu_i, s_j)$ instead of the Euclidean distance, and report the results in Table 4 below.

Mahalanobis distance

The Mahalanobis distance measures the distance from a pattern to a distribution. It is convenient because the computation is carried out in the original feature space, i.e., we do not have to apply a whitening transformation to the data. It is defined as follows: $$ r^2(\vec{\mu_c}, \vec{x_j}) = (\vec{x_j} - \vec{\mu_c})^{\text{T}} \Sigma_c^{-1} (\vec{x_j} - \vec{\mu_c}) $$
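
A minimal sketch of this computation (the class mean, covariance, and test sample below are placeholders; the report uses the per-class estimates from Part A and the samples from Table 1):

```python
import numpy as np

# Placeholder class mean/covariance and test sample.
mu_c = np.array([0.5, 4.0])
Sigma_c = np.array([[1.0, 0.3],
                    [0.3, 0.8]])
x_j = np.array([-1.2, 0.6])

diff = x_j - mu_c
r2 = diff @ np.linalg.inv(Sigma_c) @ diff   # squared Mahalanobis distance
print(np.sqrt(r2))                          # Mahalanobis distance r(mu_c, x_j)
```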

Classification based on Mahalanobis distance in the original feature spaces

$$ \omega^* = \underset{i}{\operatorname{argmin}}\; r(\vec{\mu_i}, \vec{s_j}) $$

| Test Samples | r($\mu_1$, $s_j$) | r($\mu_2$, $s_j$) | r($\mu_3$, $s_j$) | r($\mu_4$, $s_j$) | Class Assignment |
|---|---|---|---|---|---|
| s1 | 4.4632 | 8.2269 | 3.6661 | 3.0176 | class 4 |
| s2 | 2.7344 | 2.5084 | 3.4876 | 6.5090 | class 2 |
| s3 | 2.4367 | 2.6259 | 4.6832 | 8.6817 | class 1 |
| s4 | 6.5137 | 10.3474 | 2.5280 | 3.0785 | class 3 |

Table 4: Mahalanobis distances and classification results in the original feature space

2. Compare Tables 2, 3, 4 and comment on the classification results.


Report:

  1. Write and submit a Mini-Project Report 1 containing the answers to all the questions above, including a discussion on the results – i.e. the mean and covariance before and after the whitening; the class assignments in all three cases; etc.
  2. Submit your implementations.

| Test Samples | x-value | y-value | Class (Euclidean, original space) | Class (Euclidean, whitened space) | Class (Mahalanobis, original space) |
|---|---|---|---|---|---|
| s1 | 2.3 | 1.9 | class 4 | class 4 | class 4 |
| s2 | 7 | -0.3 | class 2 | class 2 | class 2 |
| s3 | 10 | 0.5 | class 1 | class 1 | class 1 |
| s4 | -1.2 | 0.6 | class 4 | class 3 | class 3 |

Table 5: Classification results of the four test samples based on the different methods

The classification results for the test samples $s_1, s_2, s_3$ are the same across all three approaches: they are classified as class 4, class 2, and class 1, respectively. Interestingly, the test sample $s_4$ differs. In the original feature space, comparing Euclidean distances to the four class means assigns $s_4$ to class 4, whereas the other two methods (Euclidean distance after whitening and the Mahalanobis distance) assign it to class 3. The Mahalanobis distances between $s_4$ and the distributions of classes 3 and 4 are very close. The plain Euclidean distance is not appropriate here because the data of classes 3 and 4 are not aligned; they spread in different directions.

Comparing mean vectors

Comparing covariance matrices

Whitened covariance matrices = identity matrices

This means that the covariance matrices become identity matrices, which is the result of applying the whitening transformation to the original space. The whitened data spread equally in every direction, which can also be seen from the unit standard deviation of each whitened class distribution.

Comparing 3 distances

Why are they the same?

We can prove that the Euclidean distance in the whitened space equals the Mahalanobis distance in the original space.

Solve for $\Sigma_c$ from the equation that arises from the whitening transformation:

$$ \Phi^\text{T}_c \Sigma_{c} \Phi_c = \Lambda_c $$

$$ (\Phi^\text{T}_c)^{-1} \Phi^\text{T}_c \Sigma_{c} \Phi_c = (\Phi^\text{T}_c)^{-1} \Lambda_c $$

$$ \Sigma_{c} \Phi_c = (\Phi^\text{T}_c)^{-1} \Lambda_c $$

$$ \Sigma_{c} \Phi_c \Phi_c^{-1} = (\Phi^\text{T}_c)^{-1} \Lambda_c \Phi_c^{-1} $$

$$ \Sigma_{c} = (\Phi^\text{T}_c)^{-1} \Lambda_c \Phi_c^{-1} $$

Since $\Phi_c$ is orthogonal, $(\Phi^\text{T}_c)^{-1} = \Phi_c$ and $\Phi_c^{-1} = \Phi^\text{T}_c$, so

$$ \Sigma_{c} = \Phi_c \Lambda_c \Phi^\text{T}_c $$

Substitute $\Sigma_c$ into the Mahalanobis distance equation:

$$ r^2(\vec{\mu_c}, \vec{x_j}) = (\vec{x_j} - \vec{\mu_c})^{\text{T}} \Sigma_c^{-1} (\vec{x_j} - \vec{\mu_c}) $$

$$ = (\vec{x_j} - \vec{\mu_c})^{\text{T}} (\Phi_c \Lambda_c \Phi^\text{T}_c)^{-1} (\vec{x_j} - \vec{\mu_c}) $$

$$ = (\vec{x_j} - \vec{\mu_c})^{\text{T}} (\Phi_c \Lambda_c^{-1} \Phi^\text{T}_c) (\vec{x_j} - \vec{\mu_c}) $$

$$ = (\vec{x_j} - \vec{\mu_c})^{\text{T}} (\Phi_c \Lambda_c^{-\frac{1}{2}} \Lambda_c^{-\frac{1}{2}} \Phi^\text{T}_c) (\vec{x_j} - \vec{\mu_c}) $$

$$ = \left[ (\Phi_c \Lambda_c^{-\frac{1}{2}})^{\text{T}} (\vec{x_j} - \vec{\mu_c}) \right]^{\text{T}} \left[ (\Lambda_c^{-\frac{1}{2}} \Phi^\text{T}_c) (\vec{x_j} - \vec{\mu_c}) \right] $$

Since $\Lambda_c^{-\frac{1}{2}}$ is diagonal, $(\Phi_c \Lambda_c^{-\frac{1}{2}})^{\text{T}} = \Lambda_c^{-\frac{1}{2}} \Phi^\text{T}_c$, so both factors are the same whitened difference vector:

$$ = \left[ \vec{x_{W, j}} - \vec{\mu_{W, c}} \right]^{\text{T}} \left[ \vec{x_{W, j}} - \vec{\mu_{W, c}} \right] $$

$$ = ||\vec{\mu_{W, c}} - \vec{x_{W, j}}||_2^2 $$

$$ = d^2(\vec{\mu_{W, c}}, \vec{x_{W, j}}) $$

Taking the square root of both sides:

$$ r^2(\vec{\mu_c}, \vec{x_j}) = d^2(\vec{\mu_{W, c}}, \vec{x_{W, j}}) $$

$$ r(\vec{\mu_c}, \vec{x_j}) = d(\vec{\mu_{W, c}}, \vec{x_{W, j}}) $$
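
A quick numerical sanity check of this identity, using arbitrary example values for the mean, covariance, and sample:

```python
import numpy as np

# Arbitrary example mean, covariance, and sample used only to check the identity.
mu_c = np.array([0.5, 4.0])
Sigma_c = np.array([[1.0, 0.3],
                    [0.3, 0.8]])
x_j = np.array([-1.2, 0.6])

# Mahalanobis distance in the original space
diff = x_j - mu_c
r = np.sqrt(diff @ np.linalg.inv(Sigma_c) @ diff)

# Euclidean distance in the whitened space
lam, Phi = np.linalg.eigh(Sigma_c)
A_w = np.diag(lam ** -0.5) @ Phi.T
d = np.linalg.norm(A_w @ x_j - A_w @ mu_c)

print(r, d, np.isclose(r, d))   # the two distances agree
```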