SML_Assignment1

Task 1: Machine Learning Introduction

1a) Model Fitting

We can use a linear model (classification).
The right classification is

\(\begin{array}{c|cc} & \text{circle} & \text{triangle} \\ \hline \text{striped} & 10 & 1 \\ \text{black} & 11 & 1 \end{array}\)

but we get

\(\begin{array}{c|cc} & \text{circle} & \text{triangle} \\ \hline \text{striped} & 11 & 1 \\ \text{black} & 10 & 1 \end{array}\)

This is because a linear classification model cannot classify this training set perfectly.

Task 2: Linear Algebra Refresher

2a) Matrix Properties

1. Disproof of commutativity of matrix multiplication
Reference: https://proofwiki.org/wiki/Matrix_Multiplication_is_not_Commutative
For an order n square matrix D, let D′ be the square matrix of order n+1 defined as:
\(d'_{ij} = \begin{cases} d_{ij}, & i<n+1 \land j<n+1 \\ 0, & i=n+1 \lor j=n+1 \end{cases}\)
Thus D′ is just D with a zero row and zero column added at the ends.
We have that D is a submatrix of D′.
Now:
\((\mathbf{A'}\mathbf{B'})_{ij} = \begin{cases} \sum_{r=1}^{n+1} a'_{ir}b'_{rj}, & i<n+1 \land j<n+1 \\ 0, & i=n+1 \lor j=n+1 \end{cases}\)
But:
\(\sum_{r=1}^{n+1} a'_{ir}b'_{rj} = a'_{i(n+1)}b'_{(n+1)j} + \sum_{r=1}^n a'_{ir}b'_{rj} = \sum_{r=1}^n a_{ir}b_{rj}\)
and so:
\(\mathbf{A'B'}(n+1,n+1)=(\mathbf{AB})'(n+1,n+1)=\mathbf{AB}\neq\mathbf{BA}=(\mathbf{BA})'(n+1,n+1)=\mathbf{B'A'}(n+1,n+1)\)
where \(\mathbf{M}(n+1,n+1)\) denotes the submatrix of \(\mathbf{M}\) obtained by deleting row and column \(n+1\).
Thus it is seen that:
\(\exists \mathbf{A'},\mathbf{B'}\in \mathcal{M}_{(n+1)\times(n+1)}:\mathbf{A'B'}\neq\mathbf{B'A'}\)
So \(P(k)\Rightarrow P(k+1)\); together with a base case (see the \(2\times2\) example below), the result follows by the Principle of Mathematical Induction.
Therefore:
\(\exists \mathbf{A},\mathbf{B}\in\mathcal{M}_R(n):\mathbf{AB}\neq\mathbf{BA}\)
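A concrete base case (one standard example, not the only possible choice) in \(\mathcal{M}_R(2)\):

\(\mathbf{A}=\begin{bmatrix}0 & 1 \\ 0 & 0\end{bmatrix},\quad \mathbf{B}=\begin{bmatrix}0 & 0 \\ 1 & 0\end{bmatrix},\quad \mathbf{AB}=\begin{bmatrix}1 & 0 \\ 0 & 0\end{bmatrix}\neq\begin{bmatrix}0 & 0 \\ 0 & 1\end{bmatrix}=\mathbf{BA}\)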
2. Proof of the distributive property of matrix multiplication
Reference: https://proofwiki.org/wiki/Matrix_Multiplication_Distributes_over_Matrix_Addition
Let \(\mathbf{A}=[a]_{nn},\mathbf{B}=[b]_{nn},\mathbf{C}=[c]_{nn}\) be matrices over a ring \((\mathit{R},+,\circ)\)
Consider \(\mathbf{A}(\mathbf{B}+\mathbf{C})\)
Let \(\mathbf{R}=[r]_{nn}=\mathbf{B}+\mathbf{C}\) and \(\mathbf{S}=[s]_{nn}=\mathbf{A}(\mathbf{B}+\mathbf{C})\)
Let \(\mathbf{G}=[g]_{nn}=\mathbf{A}\mathbf{B},\mathbf{H}=[h]_{nn}=\mathbf{A}\mathbf{C}\)
Then:
\(s_{ij}=\sum_{k=1}^n a_{ik}\circ r_{kj}, \quad r_{kj}=b_{kj}+c_{kj} \Rightarrow s_{ij}=\sum_{k=1}^n a_{ik}\circ(b_{kj}+c_{kj}) =\sum_{k=1}^n a_{ik}\circ b_{kj}+\sum_{k=1}^n a_{ik}\circ c_{kj} =g_{ij}+h_{ij}\)
Thus:
\(\mathbf{A}(\mathbf{B}+\mathbf{C})=(\mathbf{A}\mathbf{B})+(\mathbf{A}\mathbf{C})\)
A similar construction shows that:
\((\mathbf{B}+\mathbf{C})\mathbf{A}=(\mathbf{B}\mathbf{A})+(\mathbf{C}\mathbf{A})\)
3. Proof of the associative property of matrix multiplication
Reference: https://proofwiki.org/wiki/Matrix_Multiplication_is_Associative
Let \(\mathbf{A}=[a]_{nn},\mathbf{B}=[b]_{nn},\mathbf{C}=[c]_{nn}\) be matrices
From inspection of the subscripts, we can see that both \((\mathbf{A}\mathbf{B})\mathbf{C}\) and \(\mathbf{A}(\mathbf{B}\mathbf{C})\) are defined
Consider \((\mathbf{A}\mathbf{B})\mathbf{C}\)
Let \(\mathbf{R}=[r]_{nn}=\mathbf{A}\mathbf{B}\) and \(\mathbf{S}=[s]_{nn}=(\mathbf{A}\mathbf{B})\mathbf{C}\)
Then:
\(s_{ij}=\sum_{k=1}^n r_{ik}\circ c_{kj}, \quad r_{ik}=\sum_{l=1}^n a_{il}\circ b_{lk} \Rightarrow s_{ij}=\sum_{k=1}^n \left(\sum_{l=1}^n a_{il}\circ b_{lk}\right)\circ c_{kj} =\sum_{k=1}^n\sum_{l=1}^n (a_{il}\circ b_{lk})\circ c_{kj}\)
Now consider \(\mathbf{A}(\mathbf{B}\mathbf{C})\)
Let \(\mathbf{R'} = [r']_{nn} = \mathbf{B}\mathbf{C}\) and \(\mathbf{S'}=[s']_{nn}=\mathbf{A}(\mathbf{B}\mathbf{C})\)
Then:
\(s'_{ij}=\sum_{l=1}^n a_{il}\circ r'_{lj}, \quad r'_{lj}=\sum_{k=1}^n b_{lk}\circ c_{kj} \Rightarrow s'_{ij}=\sum_{l=1}^n a_{il}\circ\left(\sum_{k=1}^n b_{lk}\circ c_{kj}\right) =\sum_{l=1}^n\sum_{k=1}^n a_{il}\circ(b_{lk}\circ c_{kj})\)
Using Ring Axiom M1: Associativity of Product:
\(s_{ij} = \sum_{k=1}^n\sum_{l=1}^n (a_{il}\circ b_{lk})\circ c_{kj} = \sum_{l=1}^n\sum_{k=1}^n a_{il}\circ(b_{lk}\circ c_{kj})=s'_{ij}\)
It is concluded that:
\((\mathbf{A}\mathbf{B})\mathbf{C}=\mathbf{A}(\mathbf{B}\mathbf{C})\)
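As a quick numerical sanity check of these three properties (not a proof), a minimal NumPy sketch with arbitrary example matrices:

import numpy as np

# Arbitrary example matrices, chosen only for illustration
A = np.array([[1., 2.], [3., 4.]])
B = np.array([[0., 1.], [1., 0.]])
C = np.array([[2., 0.], [1., 3.]])

# Commutativity fails in general: for these matrices A @ B != B @ A
print(np.allclose(A @ B, B @ A))                 # False

# Distributivity holds: A(B + C) == AB + AC
print(np.allclose(A @ (B + C), A @ B + A @ C))   # True

# Associativity holds: (AB)C == A(BC)
print(np.allclose((A @ B) @ C, A @ (B @ C)))     # True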

2b) Matrix Inversion (7 Points)

\(A^{-1}=\frac{1}{|A|}A^*, \quad A^*=\begin{bmatrix} A_{11} & A_{21} & A_{31} \\ A_{12} & A_{22} & A_{32} \\ A_{13} & A_{23} & A_{33} \end{bmatrix}\)
where \(A_{ij}\) is the cofactor (algebraic complement) of the entry \(a_{ij}\). After computing the cofactors:
\(A^*=\begin{bmatrix} c & -a & ad-bc \\ -1 & 1 & b-d \\ 0 & 0 & c-a \end{bmatrix}\)
\(|A|=c+0+0-(0+0+a)=c-a\)
A is invertible only when \(c-a\neq 0\); \(b\) can take any value.
so
\(A^{-1}=\frac{1}{c-a}\begin{bmatrix} c & -a & ad-bc \\ -1 & 1 & b-d \\ 0 & 0 & c-a \end{bmatrix}\)
If we change the matrix to
\(A = \begin{bmatrix} 2 & 2 & 3 \\ 0 & 1 & 0 \\ 8 & 3 & 12 \end{bmatrix}\)
\(|A|=2\times1\times12+0+0-3\times1\times8-0-0 = 24-24 = 0\)
When \(|A|=0\), it is not invertible.
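A minimal NumPy check of this claim for the second matrix (determinant and rank only):

import numpy as np

A = np.array([[2., 2., 3.],
              [0., 1., 0.],
              [8., 3., 12.]])

# Determinant is (numerically) zero and the rank is 2 < 3,
# so A is singular and has no inverse.
print(np.linalg.det(A))          # ~0, up to floating-point error
print(np.linalg.matrix_rank(A))  # 2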

2c) Matrix Pseudoinverse

Left-Pseudo Inverse:
\(\color{red}{\mathbf{J'}}\mathbf{J}=\color{red}{\mathbf{(J^T J)^{-1}J^T}}\mathbf{J}=\mathbf{I_m}\)
Works if J has full column rank
Right-Pseudo Inverse:
\(\mathbf{J}\color{red}{\mathbf{J'}}=\mathbf{J}\color{red}{\mathbf{J^T(J J^T )^{-1}}}=\mathbf{I_n}\)
Works if J has full row rank
Given \(A \in \mathbb{R}^{2\times3}\)
First, calculate the dimensionality of the left pseudoinverse:
\(\color{red}{\mathbf{(J^T J)^{-1}J^T}}=(\mathbb{R}^{3\times2}\mathbb{R}^{2\times3})^{-1}\mathbb{R}^{3\times2}=\mathbb{R}^{3\times2}\)
Second, calculate the dimensionality of the right pseudoinverse:
\(\color{red}{\mathbf{J^T(J J^T )^{-1}}} = \mathbb{R}^{3\times2}(\mathbb{R}^{2\times3}\mathbb{R}^{3\times2})^{-1}=\mathbb{R}^{3\times2}\)
So both pseudoinverses would have dimension \(3\times2\). However, since \(A\in\mathbb{R}^{2\times3}\) has rank at most 2, \(A^TA\in\mathbb{R}^{3\times3}\) is never invertible, so the left pseudoinverse does not exist; only the right pseudoinverse exists (provided \(A\) has full row rank).
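A small NumPy sketch, using an arbitrary full-row-rank \(2\times3\) example matrix (assumed only for illustration), showing that the right pseudoinverse has shape \(3\times2\) while \(A^TA\) is rank-deficient:

import numpy as np

# Arbitrary full-row-rank 2x3 example matrix (illustration only)
A = np.array([[1., 0., 2.],
              [0., 1., 1.]])

# Right pseudoinverse: A^T (A A^T)^{-1}
A_right = A.T @ np.linalg.inv(A @ A.T)
print(A_right.shape)                        # (3, 2)
print(np.allclose(A @ A_right, np.eye(2)))  # True: A A' = I_2

# A^T A is 3x3 but only rank 2, so the left pseudoinverse does not exist
print(np.linalg.matrix_rank(A.T @ A))       # 2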

2d) Basis Transformation

(1) \(T = wv^{-1}=\begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}^{-1} =\begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}\)
(2) \(v = Yw, \quad Y = \begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}, \quad w = Y^{-1}v =\begin{bmatrix} 2 & 3 \\ 3 & 4 \end{bmatrix}^{-1} \begin{bmatrix} 2 \\ 5 \end{bmatrix} =\begin{bmatrix} -4 & 3 \\ 3 & -2 \end{bmatrix} \begin{bmatrix} 2 \\ 5 \end{bmatrix} =\begin{bmatrix} 7 \\ -4 \end{bmatrix}\)
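A quick NumPy check of the computation in (2):

import numpy as np

Y = np.array([[2., 3.],
              [3., 4.]])
v = np.array([2., 5.])

# Solve Y w = v for the coordinates w in the new basis
w = np.linalg.solve(Y, v)
print(w)  # [ 7. -4.]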

Task 3: Statistics Refresher

3a) Expectation and Variance

(1)
Expectation:
\(\mathbb{E}_{x\sim p(x)}[f] = \mathbb{E}_{x}[f] = \mathbb{E}[f] = \begin{cases} \sum_{x} p(x)f(x), & \text{discrete case} \\ \int p(x)f(x)\,{\rm d}x, & \text{continuous case} \end{cases}\)
Variance:
\(var[x] = \mathbb{E}[(x - \mathbb{E}[x])^2] = \mathbb{E}[x^2] - \mathbb{E}[x]^2\)
Expectation is a linear operator: \(\mathbb{E}[ax+by]=a\,\mathbb{E}[x]+b\,\mathbb{E}[y]\). Variance, however, is not linear; for example \(var[ax+b]=a^2\,var[x]\).
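A short NumPy check of these two facts (the distribution values are chosen only for illustration):

import numpy as np

# Arbitrary discrete distribution, chosen only for illustration
x = np.array([1., 2., 3., 4.])
p = np.array([0.1, 0.2, 0.3, 0.4])

E = np.sum(p * x)
var = np.sum(p * x**2) - E**2

a, b = 2.0, 5.0
E_ab = np.sum(p * (a * x + b))
var_ab = np.sum(p * (a * x + b)**2) - E_ab**2

print(np.isclose(E_ab, a * E + b))     # True: expectation is linear
print(np.isclose(var_ab, a**2 * var))  # True: var[ax+b] = a^2 var[x], not linear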
(2)
Estimate the expectation:
\(E(A) = 1\times1+5\times2+6\times3+3\times4+2\times5+1\times6 = 57 \\ E(B) = 6\times1+1\times2+1\times3+4\times4+1\times5+5\times6 = 62 \\ E(C) = 3\times1+2\times2+3\times3+3\times4+4\times5+3\times6 = 66\)
Estimate the variance:
\(\mu = 3\), so
\(var(A) = \frac{(1-3)^2+(5-3)^2+(6-3)^2+(3-3)^2+(2-3)^2+(1-3)^2}{5} = 4.4 \\ var(B) = \frac{(6-3)^2+(1-3)^2+(1-3)^2+(4-3)^2+(1-3)^2+(5-3)^2}{5} = 5.2 \\ var(C) = \frac{(3-3)^2+(2-3)^2+(3-3)^2+(3-3)^2+(4-3)^2+(3-3)^2}{5} = 0.4\)
(3) Don't know yet; to do.

3b) It is a Cold World

(1)
p(back): Probability that a person has back pain
p(cold): Probability that a person has a cold
p(ncold): Probability that a person doesn't have a cold
(2) \(x\in \mathbb{N}\)
(3)
\(p(cold) = 4\%, \quad p(back|cold) = 25\%, \quad p(back|ncold) = 10\%\)
(4)
\(p(cold|back) = \frac{p(cold,back)}{p(back)} =\frac{p(back|cold)\,p(cold)}{p(back|cold)\,p(cold)+p(back|ncold)\,p(ncold)} =\frac{25\%\times4\%}{25\%\times4\%+10\%\times96\%} \approx 9.43\%\)
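The same computation as a few lines of Python (values from the problem statement):

# Bayes' rule with the law of total probability in the denominator
p_cold = 0.04
p_back_given_cold = 0.25
p_back_given_ncold = 0.10

p_back = p_back_given_cold * p_cold + p_back_given_ncold * (1 - p_cold)
p_cold_given_back = p_back_given_cold * p_cold / p_back
print(p_cold_given_back)  # ~0.0943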

3c) Cure the Virus

(1)
\(mutated = 42\%+2.6\% = 44.6\%, \quad unmutated = 58\%-2.6\% = 55.4\%\)
\(\begin{array}{c|cc} \text{State} & m & \tilde{m} \\ \hline \tilde{m} & 44.6\% & 55.4\% \end{array}\)
(2)

import numpy as np
import matplotlib.pyplot as plt

# State space
states = ["Mutated","Unmutated"]

# Possible events
transitionName = [["UM","UU"],["MU","MM"]]

# Transition probabilities (used for both states)
transitionMatrix = [0.446,0.554]



def activity_forecast(days):
    # Choose initial state
    activityToday = "Unmutated"
    print("Start state: " + activityToday)
    # Initial state list
    activityList = [activityToday]
    prob_list = []
    i = 0
    # Calculate the probability of activityList
    prob = 1
    while i != days:
        if activityToday == "Unmutated":
            change = np.random.choice(transitionName[0],replace=True,p=transitionMatrix)
            if change == "UM":
                prob = prob * 0.446
                activityToday = "Mutated"
                activityList.append("Mutated")
                prob_list.append(prob)
                pass
            elif change == "UU":
                prob = prob * 0.554
                activityList.append("Unmutated")
                prob_list.append(prob)
        elif activityToday == "Mutated":
            change = np.random.choice(transitionName[1], replace=True, p=transitionMatrix)
            if change == "MU":
                prob = prob * 0.446
                activityToday = "Unmutated"
                activityList.append("Unmutated")
                prob_list.append(prob)
                pass
            elif change == "MM":
                prob = prob * 0.554
                activityList.append("Mutated")
                prob_list.append(prob)
        i += 1
    print("Possible states: " + str(activityList))
    print("End state after "+ str(days) + " days: " + activityToday)
    print("Probability of the possible sequence of states: " + str(prob))

    x = np.arange(0, days, 1)
    prob_list = np.array(prob_list)
    plt.plot(x,prob_list)
    plt.show()

# predict states after 18 days
activity_forecast(18)

The result is

Start state: Unmutated
Possible states: ['Unmutated', 'Unmutated', 'Mutated', 'Unmutated', 'Unmutated', 'Mutated', 'Mutated', 'Unmutated', 'Unmutated', 'Unmutated', 'Unmutated', 'Mutated', 'Unmutated', 'Mutated', 'Mutated', 'Mutated', 'Mutated', 'Unmutated', 'Mutated']
End state after 18 days: Mutated
Probability of the possible sequence of states: 3.432429382297346e-06

Plot:
(3) Don't know yet; to do.

Task 4: Information Theory

(1)
Measure the information content of a single symbol:
\(h(p_i) = -\log_{2} p_i\)
So
\(h(S_1) = -\log_{2} 0.04 = 4.64 \\ h(S_2) = -\log_{2} 0.22 = 2.18 \\ h(S_3) = -\log_{2} 0.67 = 0.58 \\ h(S_4) = -\log_{2} 0.07 = 3.83\)
(2)
Measure average information:
\(H(p) = \mathbb{E}[h(.)] = \sum_{i} p_i h(p_i) = -\sum_{i} p_i \log_{2} p_i\)
So the entropy is
\(0.04\times4.64 + 0.22\times2.18 + 0.67\times0.58 + 0.07\times3.83 = 1.3219\)
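The same numbers can be reproduced with a few lines of NumPy:

import numpy as np

p = np.array([0.04, 0.22, 0.67, 0.07])

# Self-information (surprisal) of each symbol, in bits
h = -np.log2(p)
print(h)

# Entropy: expected self-information
H = np.sum(p * h)
print(H)  # ~1.32 bits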

Task 5: Bayesian Decision Theory

(5a) Optimal Boundary

Bayesian decision theory combines prior knowledge about the classes with the class-conditional likelihoods of the observation in order to make optimal decisions.
Its goal is to minimize the misclassification rate.
The Bayes-optimal classifier is based on the products \(p(x|C_k)p(C_k)\): we compare \(p(x|C_1)p(C_1)\) with \(p(x|C_2)p(C_2)\) and assign \(x\) to the class with the larger value; for example, if \(p(x|C_1)p(C_1) > p(x|C_2)p(C_2)\), we choose \(C_1\). The optimal decision boundary lies where the two products are equal.
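A minimal sketch of this decision rule in Python for two univariate Gaussian class-conditional densities; the means, variances, and priors are arbitrary illustration values, not taken from the assignment:

import numpy as np

def gaussian_pdf(x, mu, sigma):
    # Univariate Gaussian density
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Illustration values only (not from the assignment)
mu1, sigma1, prior1 = 0.0, 1.0, 0.5
mu2, sigma2, prior2 = 2.0, 1.0, 0.5

def classify(x):
    # Bayes decision rule: pick the class with the larger p(x|C_k) p(C_k)
    score1 = gaussian_pdf(x, mu1, sigma1) * prior1
    score2 = gaussian_pdf(x, mu2, sigma2) * prior2
    return "C1" if score1 > score2 else "C2"

print(classify(0.3))  # C1 (closer to mu1)
print(classify(1.7))  # C2 (closer to mu2)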

(5b) Decision Boundaries

\(p(x|C_1) = \frac{p(x,C_1)}{p(C_1)} = \frac{p(C_1|x)p(x)}{p(C_1)}, \quad p(x|C_2) = \frac{p(x,C_2)}{p(C_2)} = \frac{p(C_2|x)p(x)}{p(C_2)}\)
Because \(p(C_1) = p(C_2)\) and \(\sigma_1 = \sigma_2\)
the boundary \(x^*\) is where the class-conditional densities are equal:
\(p(x^*|C_1) = p(x^*|C_2)\)
For Gaussian densities with equal variance this gives \((x^*-\mu_1)^2 = (x^*-\mu_2)^2\), so
\(x^* = \frac{\mu_1+\mu_2}{2}\)

5c) Different Misclassification Costs

See lecture_04_bayesian_decision_theory, page 17.

