NumPy (Part 2) 3

1. Order statistics
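1.1 Computing the minimum

numpy.amin(a[, axis=None, out=None, keepdims=np._NoValue, initial=np._NoValue, where=np._NoValue])

How to read axis=0, 1, 2 in amin() for a 3-D array:

axis=0 compares along the outermost dimension (two slices of bread are pressed into one).
axis=1 compares along the middle dimension (rows are compared, so multiple rows collapse into one row).
axis=2 compares along the innermost dimension (columns are compared, so multiple columns collapse into one column).

For the (2, 3, 4) array below, the three results have shapes (3, 4), (2, 4) and (2, 3). Passing the tuple (0, 2) reduces over both axes at once and leaves shape (3,).

    import numpy as np

    a = np.random.randint(2, 40, size=(2, 3, 4))
    print(a)
    print("=" * 90)
    print(np.amin(a, 0))       # minimum across the two outer blocks
    print("=" * 90)
    print(np.amin(a, 1))       # minimum across the rows of each block
    print("=" * 90)
    print(np.amin(a, 2))       # minimum across the columns of each row
    print("=" * 90)
    print(np.amin(a, (0, 2)))  # minimum across axes 0 and 2 together

1.2 Computing the maximum

numpy.amax() computes the maximum. Its parameters are the same as those of np.amin().

1.3 Computing the range

numpy.ptp(a, axis=None, out=None, keepdims=np._NoValue)

Returns the maximum minus the minimum (ptp means "peak to peak").

    import numpy as np

    np.random.seed(20200623)
    x = np.random.randint(0, 20, size=[4, 5])
    print(x)
    # [[10  2  1  1 16]
    #  [18 11 10 14 10]
    #  [11  1  9 18  8]
    #  [16  2  0 15 16]]

    print(np.ptp(x))          # 18
    print(np.ptp(x, axis=0))  # [ 8 10 10 17  8]
    print(np.ptp(x, axis=1))  # [15  8 17 16]

1.4 Computing percentiles

numpy.percentile(a, q, axis=None, out=None, overwrite_input=False, interpolation='linear', keepdims=False)

In NumPy 1.22 and later the interpolation parameter is named method, and interpolation is deprecated.

1) The first quartile (Q1), also called the "lower quartile", is the value at the 25% position once all sample values are sorted in ascending order.
2) The second quartile (Q2), also called the "median", is the value at the 50% position of the sorted sample.
3) The third quartile (Q3), also called the "upper quartile", is the value at the 75% position of the sorted sample.

The difference between the third and the first quartile is called the interquartile range.

1.4.1 Percentile method 1

Goal: find the P-quantile, where P is a fraction no greater than 1. With n sorted points there are only n - 1 gaps between them, so using n * P directly as the position gives the wrong value. The true (1-based) position of the P-quantile is:

    (n - 1) * P + 1

The problem then reduces to reading the value at that position. Let X be the position and Y be X rounded down; then the result is:

    data[Y] + (data[Y+1] - data[Y]) * (X - Y)

1.4.2 Percentile method 2

The setup is the same as in method 1. Here n * P is always less than or equal to the true position, so take X as the first integer at or above n * P. Point X sits at quantile (X - 1) / (N - 1), and neighbouring points are 1 / (N - 1) apart in quantile terms, so the offset P - (X - 1) / (N - 1) has to be scaled by (N - 1):

    data[X] + (data[X+1] - data[X]) * (P - (X - 1) / (N - 1)) * (N - 1)

This agrees with method 1 whenever X equals the floor of (N - 1) * P + 1.

    import numpy as np

    np.random.seed(20200623)
    x = np.random.randint(0, 20, size=[4, 5])
    print(x)
    # [[10  2  1  1 16]
    #  [18 11 10 14 10]
    #  [11  1  9 18  8]
    #  [16  2  0 15 16]]

    print(np.percentile(x, [25, 50]))
    # [ 2. 10.]

    print(np.percentile(x, [25, 50], axis=0))
    # [[10.75  1.75  0.75 10.75  9.5 ]
    #  [13.5   2.    5.   14.5  13.  ]]

    print(np.percentile(x, [25, 50], axis=1))
    # [[ 1. 10.  8.  2.]
    #  [ 2. 11.  9. 15.]]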
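The position formula from method 1 in section 1.4.1 can be sketched in a few lines. This is a minimal illustration, not NumPy's own implementation: the helper name percentile_linear is hypothetical, it takes P as a fraction between 0 and 1, and it uses 0-based indexing, so the position becomes (n - 1) * P instead of (n - 1) * P + 1. The values are the 20 numbers of the example array, flattened.

```python
import math

import numpy as np


def percentile_linear(data, p):
    """Linear-interpolation quantile following method 1 (hypothetical helper).

    p is a fraction in [0, 1]. Indexing is 0-based, so the position is
    (n - 1) * p rather than the 1-based (n - 1) * p + 1 used in the text.
    """
    s = sorted(data)
    n = len(s)
    pos = (n - 1) * p              # fractional position in the sorted data
    lo = math.floor(pos)           # Y in the text (shifted to 0-based)
    hi = min(lo + 1, n - 1)        # clamp so p = 1 does not run off the end
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


# The 4 x 5 example array from the text, flattened.
values = [10, 2, 1, 1, 16,
          18, 11, 10, 14, 10,
          11, 1, 9, 18, 8,
          16, 2, 0, 15, 16]

for p in (0.25, 0.5, 0.75):
    by_hand = percentile_linear(values, p)
    by_numpy = np.percentile(values, 100 * p)  # default linear interpolation
    print(p, by_hand, by_numpy)
```

With the default linear interpolation, np.percentile should print the same value as the hand-written helper on each row.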
2. Mean and variance

2.1 Median

numpy.median(a, axis=None, out=None, overwrite_input=False, keepdims=False)

The median is the 50th percentile, so the two calls in each pair below print the same result.

    import numpy as np

    np.random.seed(20200623)
    x = np.random.randint(0, 20, size=[4, 5])
    print(x)
    # [[10  2  1  1 16]
    #  [18 11 10 14 10]
    #  [11  1  9 18  8]
    #  [16  2  0 15 16]]

    print(np.percentile(x, 50))
    print(np.median(x))
    # 10.0

    print(np.percentile(x, 50, axis=0))
    print(np.median(x, axis=0))
    # [13.5  2.   5.  14.5 13. ]

    print(np.percentile(x, 50, axis=1))
    print(np.median(x, axis=1))
    # [ 2. 11.  9. 15.]

2.2 Mean

numpy.mean(a[, axis=None, dtype=None, out=None, keepdims=np._NoValue])

    import numpy as np

    x = np.array([[11, 12, 13, 14, 15],
                  [16, 17, 18, 19, 20],
                  [21, 22, 23, 24, 25],
                  [26, 27, 28, 29, 30],
                  [31, 32, 33, 34, 35]])

    y = np.mean(x)
    print(y)  # 23.0

    y = np.mean(x, axis=0)
    print(y)  # [21. 22. 23. 24. 25.]

    y = np.mean(x, axis=1)
    print(y)  # [13. 18. 23. 28. 33.]

2.3 Weighted average

numpy.average(a[, axis=None, weights=None, returned=False])

Both mean and average compute the mean. Without weights, average gives the same result as mean. With weights, average computes the weighted mean, sum(x * w) / sum(w).

    import numpy as np

    x = np.array([[11, 12, 13, 14, 15],
                  [16, 17, 18, 19, 20],
                  [21, 22, 23, 24, 25],
                  [26, 27, 28, 29, 30],
                  [31, 32, 33, 34, 35]])

    y = np.average(x)
    print(y)  # 23.0

    y = np.average(x, axis=0)
    print(y)  # [21. 22. 23. 24. 25.]

    y = np.average(x, axis=1)
    print(y)  # [13. 18. 23. 28. 33.]

    # Weights with the same shape as x
    y = np.arange(1, 26).reshape([5, 5])
    print(y)
    # [[ 1  2  3  4  5]
    #  [ 6  7  8  9 10]
    #  [11 12 13 14 15]
    #  [16 17 18 19 20]
    #  [21 22 23 24 25]]

    z = np.average(x, weights=y)
    print(z)  # 27.0

    z = np.average(x, axis=0, weights=y)
    print(z)
    # [25.54545455 26.16666667 26.84615385 27.57142857 28.33333333]

    z = np.average(x, axis=1, weights=y)
    print(z)
    # [13.66666667 18.25       23.15384615 28.11111111 33.08695652]

2.4 Computing the variance

numpy.var(a[, axis=None, dtype=None, out=None, ddof=0, keepdims=np._NoValue])

Take care to distinguish the variance from the unbiased estimate of the sample variance: the variance formula has n in the denominator, while the unbiased sample estimate has n - 1 (n is the number of samples).

ddof=0: ddof stands for "Delta Degrees of Freedom". The divisor used in the calculation is n - ddof, so the default ddof=0 gives the variance and ddof=1 gives the unbiased sample estimate.

Variance formula:

    Var(x) = (1 / n) * sum((x_i - mean(x)) ** 2)

    import numpy as np

    x = np.array([[11, 12, 13, 14, 15],
                  [16, 17, 18, 19, 20],
                  [21, 22, 23, 24, 25],
                  [26, 27, 28, 29, 30],
                  [31, 32, 33, 34, 35]])

    y = np.var(x)
    print(y)  # 52.0

    y = np.mean((x - np.mean(x)) ** 2)
    print(y)  # 52.0

    y = np.var(x, ddof=1)
    print(y)  # 54.166666666666664

    y = np.sum((x - np.mean(x)) ** 2) / (x.siz
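The effect of ddof described in section 2.4 can be checked by hand. The sketch below assumes the same 25 values as the 5 x 5 example array (built here with np.arange for brevity) and divides the summed squared deviations once by n and once by n - 1. The variable names are illustrative only.

```python
import numpy as np

# Same values as the 5 x 5 example array: 11, 12, ..., 35.
x = np.arange(11, 36).reshape(5, 5)

n = x.size
squared_dev = np.sum((x - np.mean(x)) ** 2)  # sum of squared deviations

population_var = squared_dev / n        # divisor n, matches ddof=0
sample_var = squared_dev / (n - 1)      # divisor n - 1, matches ddof=1

print(population_var, np.var(x))
print(sample_var, np.var(x, ddof=1))
```

Each printed pair should agree, which shows that ddof only changes the divisor and not the sum of squared deviations.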
