ECE 595: Machine Learning I
Spring 2019
Homework 1: Linear Algebra and Probability Review
(Due: Friday, Jan 18, 2019 )
Homework is due at 4:30pm. Please put your homework in the dropbox located at MSEE 330. No late
homework will be accepted.
Objective
As the first homework assignment, we would like you to refresh some of the concepts in the Background
chapter, and have some hands-on experience with Python. Here are the specific objectives.
(a) Familiarize yourself with tools in Python that will be helpful to you in later parts of the course. In
addition to basic Python functions and objects, you will gain experience working with functions that
simulate random data sampling from probability distributions and visualize the data;
(b) Review some important concepts in linear algebra and probability, and warm up with some proof
techniques that will be used later in the course.
Exercise 1: Installing Python and Getting Started (0 points)
To get started with the homework, please download and install Python on your local machine. Here are
a few steps to guide you through. For additional information, e.g., video demonstrations, please visit our
course website.
(a) If you are a beginner to Python, we suggest you download Anaconda at
https://www.anaconda.com/download/. Follow the instructions and install it on your local machine.
(b) Once you have installed Anaconda, open an environment and install Spyder.
(c) Make sure you have standard packages installed: scipy, numpy, matplotlib, cvxpy, cvxopt, and
imageio.
(d) After you have installed all these packages, open Spyder and type your hello world program.
import numpy as np
import scipy
import matplotlib.pyplot as plt
import cvxpy as cp
import csv
import imageio
print("Hello World!")
If you are already familiar with Python, you may skip this exercise. Please contact our teaching assistants
if you need any help.
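One way to confirm that the packages listed in part (c) are installed is to ask Python whether each one can be located. The sketch below uses only the standard library; the helper name find_missing is made up for this illustration and is not part of the assignment.

```python
import importlib.util

# Package names taken from part (c) of this exercise.
REQUIRED = ["scipy", "numpy", "matplotlib", "cvxpy", "cvxopt", "imageio"]


def find_missing(packages):
    """Return the names in `packages` that cannot be located for import."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are installed.")
```

Any package reported as missing can then be installed from within your Anaconda environment before you start the remaining exercises.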
© 2019 Stanley Chan. All Rights Reserved.
Exercise 2: Generating 1D Random Variables
In this exercise, we will use Python to draw random samples from a 1D Gaussian and visualize the data
using a histogram.
(a) Let X be a random variable with X ~ N(μ, σ^2). The PDF of X is written explicitly as
    f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right). (1)
(b) Let μ = 0 and σ = 1 so that X ~ N(0, 1). Plot f_X(x) using matplotlib.pyplot.plot for the range
x ∈ [-3, 3]. Use matplotlib.pyplot.savefig to save your figure.
(c) Let us investigate the use of histograms in data visualization.
(i) Use numpy.random.normal to draw 1000 random samples from N (0, 1).
(ii) Make two histogram plots using matplotlib.pyplot.hist, with the number of bins m set to 4
and 1000.
(iii) Use scipy.stats.norm.fit to estimate the mean and standard deviation of your data. Report
the estimated values.
(iv) Plot the fitted Gaussian curve on top of the two histogram plots using scipy.stats.norm.pdf.
(v) Are the two histograms representative of your data’s distribution? How are they different in terms
of data representation?
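A minimal sketch of the workflow in part (c), under a few assumptions: the random seed, the output file names, and the use of density=True (so that the histogram and the fitted PDF share a vertical scale) are illustrative choices, not requirements of the assignment.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # write figures to files; no display is needed
import matplotlib.pyplot as plt
from scipy.stats import norm

np.random.seed(0)  # assumed seed, only for reproducibility

# (i) Draw 1000 samples from N(0, 1).
samples = np.random.normal(loc=0.0, scale=1.0, size=1000)

# (iii) Maximum-likelihood estimates of the mean and standard deviation.
mu_hat, sigma_hat = norm.fit(samples)
print("estimated mean:", mu_hat, "estimated std:", sigma_hat)

# (ii) and (iv) One histogram per bin count, with the fitted PDF on top.
x = np.linspace(-3.0, 3.0, 200)
for m in (4, 1000):
    plt.figure()
    plt.hist(samples, bins=m, density=True, alpha=0.6)
    plt.plot(x, norm.pdf(x, loc=mu_hat, scale=sigma_hat))
    plt.title(f"m = {m} bins")
    plt.savefig(f"hist_m{m}.png")  # hypothetical file name
    plt.close()
```

Comparing the two saved figures gives a starting point for the discussion in (v).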
(d) A practical way to estimate the optimal bin width is to use the cross-validation estimator of
risk (CVER) of the dataset. Let h = (max data value - min data value)/m denote the bin width, where
m is the number of bins (assuming you applied no rescaling to your raw data). We seek the h* that
minimizes the CVER \hat{J}(h), expressed as follows:
    \hat{J}(h) = \frac{2}{(n-1)h} - \frac{n+1}{(n-1)h} \sum_{j=1}^{m} \hat{p}_j^2, (2)
where \{\hat{p}_j\}_{j=1}^{m} are the empirical probabilities of a sample falling into each bin, and n is
the total number of samples.
Plot \hat{J}(h) with respect to the number of bins m, for m = 1, 2, ..., 200. Find the m* that minimizes
\hat{J}(h), plot the histogram of your data with that m*, and plot the Gaussian curve fitted to your data on
top of your histogram. How is your current histogram different from those you obtained in part (c)?
Note: If you are interested in why \hat{J}(h) plays an important role in estimating the optimal bin width
of the histogram, see the additional note of this homework.
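The search over m in part (d) can be sketched as follows. The code assumes the cross-validation risk has the standard histogram form J(h) = 2/((n-1)h) - (n+1)/((n-1)h) * sum_j p_j^2, with p_j the fraction of samples in bin j; the function name cver and the seed are hypothetical.

```python
import numpy as np


def cver(data, m):
    """Cross-validation estimator of risk for a histogram with m equal-width bins."""
    n = data.size
    h = (data.max() - data.min()) / m       # bin width
    counts, _ = np.histogram(data, bins=m)  # equal-width bins over [min, max]
    p_hat = counts / n                      # empirical bin probabilities
    return 2.0 / ((n - 1) * h) - (n + 1) / ((n - 1) * h) * np.sum(p_hat ** 2)


rng = np.random.default_rng(0)  # assumed seed
data = rng.normal(0.0, 1.0, size=1000)

ms = np.arange(1, 201)
risks = np.array([cver(data, m) for m in ms])
m_star = int(ms[np.argmin(risks)])
print("bin count minimizing the estimated risk:", m_star)
```

Plotting risks against ms with matplotlib.pyplot.plot gives the requested curve, and m_star can be passed as the bins argument of matplotlib.pyplot.hist.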
Exercise 3: Generating 2D Random Variables
In this exercise, we consider the following question: suppose that we are given a random number generator
that can only generate zero-mean unit variance Gaussians, i.e., X ~ N (0, I), how do we transform the
distribution of X to an arbitrary Gaussian distribution? We will first derive a few equations, and then verify
them with an empirical example, by drawing samples from the 2D Gaussian, applying the transform to the
dataset, and checking if the transformed dataset really takes the form of the desired Gaussian.
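(a) Let X ~ N(μ, Σ) be a 2D Gaussian. The PDF of X is given by
    f_X(x) = \frac{1}{\sqrt{(2\pi)^2 |\Sigma|}} \exp\left( -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right), (3)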
where in this exercise we assume
(4)
(i) Simplify the expression f_X(x) for the particular choices of μ and Σ here. Show your derivation.
(ii) Using matplotlib.pyplot.contour, plot the contour of f_X(x) for the range x ∈ [-1, 5] × [0, 10].
(b) Suppose X ~ N (0, I). We would like to derive a transformation that can map X to an arbitrary
Gaussian.
(i) Let X ~ N(0, I) be a d-dimensional random vector. Let A ∈ R^{d×d} and b ∈ R^d. Let Y = AX + b
be an affine transformation of X. Let μ_Y \overset{def}{=} E[Y] be the mean vector and
Σ_Y \overset{def}{=} E[(Y - μ_Y)(Y - μ_Y)^T] be the covariance matrix. Show that
    μ_Y = b, and Σ_Y = AA^T. (5)
(ii) Show that ΣY is symmetric positive semi-definite.
(iii) Under what condition on A would ΣY become a symmetric positive definite matrix?
(iv) Consider a random variable Y ~ N (μY , ΣY ) such that
Determine A and b which could satisfy Equation (5).
Hint: Consider the eigen-decomposition of ΣY . You may compute the eigen-decomposition numerically.
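The hint can be illustrated numerically. The sketch below is one possible construction, not the only valid choice of A: it takes A = U Λ^{1/2} from the eigen-decomposition Σ_Y = U Λ U^T and b = μ_Y. The values of mu_Y and Sigma_Y are hypothetical placeholders, not the ones from the assignment, and numpy.linalg.eigh is used here because the covariance matrix is symmetric.

```python
import numpy as np

# Hypothetical target parameters, chosen only for illustration.
mu_Y = np.array([1.0, -1.0])
Sigma_Y = np.array([[3.0, 1.0],
                    [1.0, 2.0]])


def affine_from_gaussian(mu, Sigma):
    """Return (A, b) with A @ A.T == Sigma and b == mu, via eigen-decomposition."""
    eigvals, U = np.linalg.eigh(Sigma)     # Sigma = U diag(eigvals) U^T
    A = U @ np.diag(np.sqrt(eigvals))      # A = U Lambda^{1/2}
    return A, mu.copy()


A, b = affine_from_gaussian(mu_Y, Sigma_Y)
print("A A^T reproduces Sigma_Y:", np.allclose(A @ A.T, Sigma_Y))
```

Any A of the form U Λ^{1/2} Q with Q orthogonal would also satisfy A A^T = Σ_Y, which is why the question asks for an A and b that could satisfy Equation (5).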
(c) Now let us verify our results from part (b) with an empirical example.
(i) Use numpy.random.multivariate_normal to draw 5000 random samples from the 2D standard
normal distribution, and make a scatter plot of the data points using matplotlib.pyplot.scatter.
(ii) Apply the affine transformation you derived in part (b)(iv) to the data points, and make a
scatter plot of the transformed data points. Now check your answer by using the Python function
numpy.linalg.eig to obtain the transformation and making a new scatter plot of the transformed
data points.
(iii) Do your results from parts (c)(i) and (ii) support your theoretical findings from part (b)? You
are welcome to utilize Python functions you find useful and include plots in your answer.
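Beyond scatter plots, a numerical check is to compare the sample mean and sample covariance of the transformed data with the target parameters. This is a sketch under assumptions: mu_Y and Sigma_Y are hypothetical placeholders rather than the assignment's values, the seed is arbitrary, and numpy.linalg.eigh is swapped in for numpy.linalg.eig because the covariance is symmetric.

```python
import numpy as np

np.random.seed(0)  # assumed seed

# 5000 samples from the 2D standard normal; each row is one sample.
X = np.random.multivariate_normal(mean=np.zeros(2), cov=np.eye(2), size=5000)

# Hypothetical target parameters, chosen only for illustration.
mu_Y = np.array([1.0, -1.0])
Sigma_Y = np.array([[3.0, 1.0],
                    [1.0, 2.0]])

# A = U Lambda^{1/2} from the eigen-decomposition of Sigma_Y, and b = mu_Y.
eigvals, U = np.linalg.eigh(Sigma_Y)
A = U @ np.diag(np.sqrt(eigvals))

# Row-wise version of y = A x + b.
Y = X @ A.T + mu_Y

sample_mean = Y.mean(axis=0)
sample_cov = np.cov(Y, rowvar=False)
print("sample mean:", sample_mean)
print("sample covariance:\n", sample_cov)
```

The scatter plots themselves can be produced with matplotlib.pyplot.scatter(X[:, 0], X[:, 1]) and matplotlib.pyplot.scatter(Y[:, 0], Y[:, 1]).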
Exercise 4: Norm and Positive Semi-Definiteness
The aim of this exercise is to reinforce your understanding of the vital concepts of norms, the two famous
inequalities, eigen-decomposition, and the notion of positive (semi-)definiteness, which will be ubiquitous
throughout the semester.
(a) Schur's lemma (one of several named after Issai Schur) is one of the most commonly used inequalities
for estimating quadratic forms. Given a matrix A ∈ R^{m×n} and vectors x ∈ R^m and y ∈ R^n, the inequality
takes the form
    |x^T A y| \le \sqrt{RC} \, \|x\|_2 \|y\|_2, where R = \max_j \sum_{k=1}^{n} |[A]_{j,k}|, C = \max_k \sum_{j=1}^{m} |[A]_{j,k}|. (6)
Prove this inequality.
Hint: Use the Cauchy-Schwarz inequality.
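Before attempting the proof, it can help to test the inequality on random inputs. The sketch below assumes the bound reads |x^T A y| <= sqrt(R C) ||x||_2 ||y||_2, with R the largest absolute row sum and C the largest absolute column sum of A; the function name, matrix sizes, and seed are arbitrary. A numerical check like this is not a proof.

```python
import numpy as np


def schur_bound_holds(A, x, y):
    """Check |x^T A y| <= sqrt(R*C) * ||x||_2 * ||y||_2 for one triple (A, x, y)."""
    R = np.abs(A).sum(axis=1).max()  # largest absolute row sum
    C = np.abs(A).sum(axis=0).max()  # largest absolute column sum
    lhs = abs(x @ A @ y)
    rhs = np.sqrt(R * C) * np.linalg.norm(x) * np.linalg.norm(y)
    return bool(lhs <= rhs + 1e-12)  # small slack for floating-point rounding


rng = np.random.default_rng(0)  # assumed seed
results = [
    schur_bound_holds(rng.normal(size=(4, 6)), rng.normal(size=4), rng.normal(size=6))
    for _ in range(1000)
]
print("bound held in every trial:", all(results))
```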
(b) Recall from the lectures the concepts related to positive (semi-)definite matrices.
(i) Prove that any positive definite matrix A is invertible.
(ii) Find a function f : R^2 → R whose Hessian is invertible but not positive definite anywhere in R^2.
(iii) Under what extra condition is any positive semi-definite matrix positive definite? Justify your
answer.
(c) Recall the concept of eigen-decomposition: for any symmetric matrix A ∈ R^{n×n}, there exist a diagonal
matrix Λ ∈ R^{n×n} with the eigenvalues of A on its diagonal, and an orthonormal matrix U ∈ R^{n×n} with
eigenvectors of A as its columns, such that A = UΛU^T. Prove that there exists A^† ∈ R^{n×n} such that
the following holds:
    A A^† A = A. (7)
Hint: You can use the fact that, for symmetric A with rank k ≤ n, it is possible to eigen-decompose
A such that the first k diagonal entries of Λ are nonzero, and the rest are all zeros. Then define Λ^† as
the diagonal matrix with [Λ^†]_{j,j} = 1/[Λ]_{j,j} for 1 ≤ j ≤ k, and 0 everywhere else. A^† is what is
called the pseudoinverse of A.
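The construction in the hint can be tried out numerically. This sketch assumes the pseudoinverse is built as U Λ^† U^T, where Λ^† inverts the nonzero eigenvalues and leaves the zeros in place; the function name pinv_symmetric, the tolerance, and the rank-1 test matrix are illustrative choices. It checks the property A A^† A = A on one example and does not replace the proof.

```python
import numpy as np


def pinv_symmetric(A, tol=1e-10):
    """Pseudoinverse of a symmetric matrix via its eigen-decomposition."""
    eigvals, U = np.linalg.eigh(A)             # A = U diag(eigvals) U^T
    inv_vals = np.zeros_like(eigvals)
    nonzero = np.abs(eigvals) > tol            # treat tiny eigenvalues as zero
    inv_vals[nonzero] = 1.0 / eigvals[nonzero]
    return U @ np.diag(inv_vals) @ U.T


# A rank-1 symmetric 3x3 matrix, so the ordinary inverse does not exist.
v = np.array([[1.0], [2.0], [3.0]])
A = v @ v.T

A_pinv = pinv_symmetric(A)
print("A A_pinv A equals A:", np.allclose(A @ A_pinv @ A, A))
print("matches numpy.linalg.pinv:", np.allclose(A_pinv, np.linalg.pinv(A)))
```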