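I recently took Andrew Ng's Machine Learning course on Coursera. The course page is: https://www.coursera.org/course/ml
In the Introduction, Ng gives a fairly systematic overview of the basic concepts of machine learning. It also introduced me to several terms that come up frequently in the later lectures: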
Machine Learning | Supervised Learning | Unsupervised Learning | Regression Problem | Classification Problem | Octave |
What is Machine Learning
Definition: A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
Example of Machine Learning
Suppose your email program watches which emails you do or do not mark as spam, and based on that learns how to better filter spam.
T: Classifying emails as spam or not spam; (the goal)
E: Watching you label emails as spam or not spam; (algorithm + data)
P: The number (or fraction) of emails correctly classified as spam/not spam. (evaluation method -> loss function)
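To make the performance measure P concrete, here is a minimal Python sketch that computes the fraction of correctly classified emails. The function name `accuracy` and the two label lists are hypothetical, made up only for illustration; they are not from the course.

```python
def accuracy(predicted, actual):
    """Performance measure P: fraction of emails classified correctly."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical labels (not real data): True = spam, False = not spam.
actual = [True, False, True, True, False]
predicted = [True, False, False, True, False]

# 4 of the 5 predictions match the user's labels.
p = accuracy(predicted, actual)
print(p)
```

In Mitchell's terms, the program "learns" if this number goes up as it sees more of your spam labels (experience E).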
Supervised Learning
Definition: The goal is, given a labeled training set, to learn a function h so that h(x) is a “good” predictor for the corresponding value of y. A pair (x, y) is called a training example, with x denoting the “input” variables, also called features, and y denoting the “output” or target variable that we are trying to predict.
When the target variable that we are trying to predict is continuous, we call the learning problem a regression problem. When the target can take on only a small number of discrete values, the learning problem is called a classification problem.
A. Example of a Regression Problem
Suppose we have a dataset giving the living areas and prices of 11 houses from Portland, Oregon:
Living area (feet²) | Price ($1000s) |
450 | 100 |
600 | 140 |
620 | 210 |
... | ... |
We can plot this data:
So the regression problem is to find a function h that fits these points.
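As a rough illustration of "finding a function h", the sketch below fits a straight line h(x) = theta0 + theta1 * x by ordinary least squares. Two assumptions are mine, not the course's: that h is a straight line, and that least squares is the fitting criterion. Only the three rows listed in the table are used, since the remaining rows are elided.

```python
def fit_line(xs, ys):
    """Closed-form least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# The three rows shown in the table: living area (feet^2), price ($1000s).
areas = [450, 600, 620]
prices = [100, 140, 210]

theta0, theta1 = fit_line(areas, prices)


def h(x):
    """Predicted price (in $1000s) for a living area x."""
    return theta0 + theta1 * x


print(theta0, theta1)
```

With so few points the fitted line is only a toy; the point is that regression turns the table into a function that can be evaluated at living areas not in the data.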
B. Example of a Classification Problem
Suppose we have a dataset giving, for each patient, the tumor size, the patient's age, and whether the tumor is malignant or benign. We plot the data as follows:
So the classification problem is to find a function h that separates these points.
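PS: Regression means finding the function (in the plane, in higher-dimensional space, ...) that fits the samples; classification means finding the function (in the plane, in higher-dimensional space, ...) that separates samples of different classes. In some settings, such as logistic regression, a classification problem can be treated and solved as a regression problem.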
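The logistic regression idea mentioned in the note can be sketched as follows: fit a real-valued function h(x) = sigmoid(theta0 + theta1 * x), then classify by thresholding it at 0.5. Everything specific here is an assumption of mine for illustration: the tumor sizes and labels are made-up toy values, only one feature is used, the feature is centered by subtracting its mean, and plain batch gradient descent with a hand-picked learning rate does the fitting.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, steps=2000):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        # Prediction error h(x) - y for every training example.
        errors = [sigmoid(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        theta0 -= lr * sum(errors) / m
        theta1 -= lr * sum(e * x for e, x in zip(errors, xs)) / m
    return theta0, theta1


# Hypothetical toy data: tumor size and label (1 = malignant, 0 = benign).
sizes = [1.0, 1.5, 2.0, 2.5, 3.5, 4.0, 4.5, 5.0]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Center the feature so gradient descent behaves well (a preprocessing choice).
mean_size = sum(sizes) / len(sizes)
centered = [s - mean_size for s in sizes]

theta0, theta1 = train_logistic(centered, labels)


def predict(size):
    """Classify by thresholding the fitted probability at 0.5."""
    prob = sigmoid(theta0 + theta1 * (size - mean_size))
    return 1 if prob >= 0.5 else 0


print([predict(s) for s in sizes])
```

The fitted h outputs a continuous value between 0 and 1, which is why the method carries "regression" in its name, while the threshold turns it into a separator between the two classes.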
Unsupervised Learning
In the clustering problem, we are given a training set {x^(1), ..., x^(m)}, and want to group the data into a few cohesive “clusters”. Here, no labels y^(i) are given. So, this is an unsupervised learning problem.
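PS: In practice, "unsupervised learning" often implicitly refers to clustering algorithms. Clustering algorithms are further divided into hard clustering (K-means, hierarchical clustering, density-based methods, and so on) and soft clustering (the EM algorithm).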
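As an example of hard clustering, here is a minimal K-means sketch on one-dimensional data. The points are made-up toy values, and initializing the centroids with the first k points is my simplification to keep the run deterministic; practical implementations usually pick initial centroids randomly or with a smarter scheme.

```python
def kmeans_1d(points, k, max_iters=100):
    """Plain K-means on 1-D points; returns (centroids, assignments)."""
    centroids = list(points[:k])  # assumption: first k points as initial centroids
    assignments = None
    for _ in range(max_iters):
        # Assignment step: each point goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: abs(p - centroids[j])) for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the algorithm has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, assignments


# Hypothetical unlabeled data with two visible groups.
points = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
centroids, assignments = kmeans_1d(points, k=2)
print(centroids, assignments)
```

Note that no labels y are passed in anywhere: the grouping comes only from the distances between the points, and each point ends up in exactly one cluster, which is what makes this "hard" clustering.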