集成学习(上)task1

1.什么是机器学习?

下面摘抄上学期机器学习slides里的几个定义,上学期的机器学习课学的东西又还给老师了,希望通过这次组队学习复习+学习~
Arthur Samuel (1959): Machine learning is a “field of study that gives computers the ability to learn without being explicitly programmed”.

Machine learning is the science of getting machines to learn and act in a similar way to humans while also autonomously learning from real-world interactions and sets of training data that we feed them.

Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed. Machine learning focuses on the development of computer programs that can access data and use it learn for themselves.

Machine Learning is the science of making computer artifacts improve their performance with respect to a certain performance criterion using example data or past experience, without requiring humans to program their behavior explicitly.

Machine Learning is a set of methods that automatically detect patterns in data, use the uncovered patterns to for prediction or decision making.

机器学习分为有监督的学习,无监督的学习。区别在于是否有因变量。

根据因变量的是否连续,有监督学习又分为回归和分类:
回归:因变量是连续型变量,如:房价,体重等。
分类:因变量是离散型变量,如:是否患癌症,西瓜是好瓜还是坏瓜等。

# 引入相关科学计算包
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline 
plt.style.use("ggplot")      
import seaborn as sns

这里的%matplotlib inline表示:当你调用matplotlib.pyplot的绘图函数plot()进行绘图的时候,或者生成一个figure画布的时候,可以直接在你的python console里面生成图像。

1.1回归

使用sklearn内置数据集Boston房价数据集。sklearn中所有内置数据集都封装在datasets对象内: 返回的对象有:
data:特征X的矩阵(ndarray)
target:因变量的向量(ndarray)
feature_names:特征名称(ndarray)

from sklearn import datasets
boston = datasets.load_boston()     # 返回一个类似于字典的类
X = boston.data
y = boston.target
features = boston.feature_names
boston_data = pd.DataFrame(X,columns=features)
boston_data["Price"] = y
boston_data.head()
sns.scatterplot(boston_data['NOX'],boston_data['Price'],color="r",alpha=0.6)
plt.title("Price~NOX")
plt.show()

1.2分类

from sklearn import datasets
iris = datasets.load_iris()
X = iris.data
y = iris.target
features = iris.feature_names
iris_data = pd.DataFrame(X,columns=features)
iris_data['target'] = y
iris_data.head()

特征可视化

marker = ['s','x','o']
for index,c in enumerate(np.unique(y)):
    plt.scatter(x=iris_data.loc[y==c,"sepal length (cm)"],y=iris_data.loc[y==c,"sepal width (cm)"],alpha=0.8,label=c,marker=marker[c])
plt.xlabel("sepal length (cm)")
plt.ylabel("sepal width (cm)")
plt.legend()
plt.show()

1.3无监督学习

生成月牙型非凸集

from sklearn import datasets
x, y = datasets.make_moons(n_samples=2000, shuffle=True,
                  noise=0.05, random_state=None)
for index,c in enumerate(np.unique(y)):
    plt.scatter(x[y==c,0],x[y==c,1],s=7)
plt.show()

生成符合正态分布的聚类数据

from sklearn import datasets
x, y = datasets.make_blobs(n_samples=5000, n_features=2, centers=3)
for index,c in enumerate(np.unique(y)):
    plt.scatter(x[y==c, 0], x[y==c, 1],s=7)
plt.show()
上一篇:1.Java8新特性简介&Lambda表达式


下一篇:20210315_23期_集成学习(上)_Task01