I. pandas
1. Importing with an alias (as)
import pandas as pd
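2. Core data structures
pandas has two especially important data structures: the Series and the DataFrame.
- Series is similar to a one-dimensional NumPy array, and most functions and methods that work on 1-D arrays also work on it. In addition, its data can be retrieved by index label, and its index is aligned automatically in arithmetic.
- DataFrame is similar to a two-dimensional NumPy array, and likewise supports most NumPy array functions and methods.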
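A minimal sketch of the points in this list. The labels 'x', 'y', 'z' and the sample values are arbitrary choices for illustration. It shows that a NumPy function applied to a Series keeps its index, that a value can be fetched by label, and that the same function applied to a DataFrame returns a DataFrame.

```python
import numpy as np
import pandas as pd

# A Series with custom labels (labels and values chosen only for this example)
s = pd.Series([1.0, 4.0, 9.0], index=['x', 'y', 'z'])

# NumPy functions work element-wise and the result keeps the index
roots = np.sqrt(s)
print(roots)

# Values can also be fetched by index label
print(s['y'])

# A DataFrame behaves like a 2-D array under the same NumPy function
df = pd.DataFrame(np.arange(6).reshape(2, 3))
df_sqrt = np.sqrt(df)
print(df_sqrt)
```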
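II. Series and DataFrame
1. Creating a Series
In the printed output, the left column is the index and the right column holds the corresponding values. The dtype line may differ on your machine: on Windows with NumPy 1.x, integer NumPy arrays default to int32, which is why some outputs below show int32 instead of int64.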
name=pd.Series(…)
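The two printed columns correspond to two attributes of the Series. A small sketch, assuming a default integer index:

```python
import pandas as pd

s = pd.Series([3, 4, 5])

# Left column of the printout: the index labels
print(s.index)
# Right column of the printout: the data, as a NumPy array
print(s.values)
```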
(1) From a 1-D NumPy array
# 1-D ndarray
import pandas as pd
import numpy as np
a = pd.Series(np.array([3, 4, 5]))
print(a)
'''
0 3
1 4
2 5
dtype: int32
'''
print(type(a))
#<class 'pandas.core.series.Series'>
(2) From a list
# 1-D Python list
import pandas as pd
a = pd.Series([3, 4, 5])
print(a)
'''
0 3
1 4
2 5
dtype: int64
'''
(3) From a dictionary
import pandas as pd
# Avoid naming the variable `dict`: that would shadow the built-in type
d = {'a': 3, 'b': 4, 'c': 5}
ds = pd.Series(d)
print(ds)
'''
a 3
b 4
c 5
dtype: int64
'''
(4) From a row or column of a DataFrame
import pandas as pd
dic3 = {'one':{'a':1,'b':2,'c':3,'d':4},'two':{'a':5,'b':6,'c':7,'d':8},'three':{'a':9,'b':10,'c':11,'d':12}}
df3 = pd.DataFrame(dic3)
s3 = df3['one']
print(s3)
'''
a 1
b 2
c 3
d 4
Name: one, dtype: int64
'''
print(type(s3))
#<class 'pandas.core.series.Series'>
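The heading also mentions rows. A row can be taken out as a Series too; this sketch assumes label-based selection with .loc, which returns the row whose index label is 'a'. The resulting Series is indexed by the column names.

```python
import pandas as pd

dic3 = {'one': {'a': 1, 'b': 2, 'c': 3, 'd': 4},
        'two': {'a': 5, 'b': 6, 'c': 7, 'd': 8},
        'three': {'a': 9, 'b': 10, 'c': 11, 'd': 12}}
df3 = pd.DataFrame(dic3)

# Select the row labelled 'a'; the result is a Series named 'a'
row_a = df3.loc['a']
print(row_a)
print(type(row_a))
```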
2. Creating a DataFrame
Both the left side (the row index) and the top (the column labels) are indexes.
name=pd.DataFrame(…)
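The two indexes are exposed as separate attributes. A minimal sketch; the row labels 'r0', 'r1' and column labels 'c0' to 'c2' are hypothetical names chosen for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3),
                  index=['r0', 'r1'], columns=['c0', 'c1', 'c2'])

print(df.index)    # row labels (left side)
print(df.columns)  # column labels (top)
print(df.shape)    # (number of rows, number of columns)
```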
(1) From a 2-D NumPy array
# 2-D ndarray
import pandas as pd
import numpy as np
arr1 = np.arange(12).reshape(4, 3)
df1 = pd.DataFrame(arr1)
print(df1)
'''
0 1 2
0 0 1 2
1 3 4 5
2 6 7 8
3 9 10 11
'''
print(type(df1))
#<class 'pandas.core.frame.DataFrame'>
(2) From a nested list
# list of lists (2-D)
import pandas as pd
arr2 = [[1, 2, 3], [4, 5, 6]]
df2 = pd.DataFrame(arr2)
print(df2)
'''
0 1 2
0 1 2 3
1 4 5 6
'''
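(3) From a dictionary
The examples below build a DataFrame from two kinds of dictionaries: a dict of lists and a nested dict (a dict of dicts).
# dict of lists: each key becomes a column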
import pandas as pd
dic1 = {'a':[1,2,3,4],'b':[5,6,7,8],'c':[9,10,11,12],'d':[13,14,15,16]}
df1 = pd.DataFrame(dic1)
print(df1)
'''
a b c d
0 1 5 9 13
1 2 6 10 14
2 3 7 11 15
3 4 8 12 16
'''
# nested dict: outer keys become columns, inner keys become row labels
import pandas as pd
dic3 = {'one':{'a':1,'b':2,'c':3,'d':4},'two':{'a':5,'b':6,'c':7,'d':8},'three':{'a':9,'b':10,'c':11,'d':12}}
df3 = pd.DataFrame(dic3)
print(df3)
'''
   one  two  three
a    1    5      9
b    2    6     10
c    3    7     11
d    4    8     12
'''
# pandas versions before 0.23 sorted the dict keys, printing the columns as: one three two
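To make the nested-dict mapping explicit, a short sketch that inspects the labels and reads one cell. It assumes a pandas version that preserves dict insertion order (0.23 or later on Python 3.7+).

```python
import pandas as pd

dic3 = {'one': {'a': 1, 'b': 2, 'c': 3, 'd': 4},
        'two': {'a': 5, 'b': 6, 'c': 7, 'd': 8},
        'three': {'a': 9, 'b': 10, 'c': 11, 'd': 12}}
df3 = pd.DataFrame(dic3)

# Outer keys -> column labels, inner keys -> row labels
print(list(df3.columns))
print(list(df3.index))

# The cell at row 'b', column 'three' comes from dic3['three']['b']
print(df3.loc['b', 'three'])
```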
(4) From an existing DataFrame (selecting columns)
import pandas as pd
dic3 = {'one':{'a':1,'b':2,'c':3,'d':4},'two':{'a':5,'b':6,'c':7,'d':8},'three':{'a':9,'b':10,'c':11,'d':12}}
df3 = pd.DataFrame(dic3)
df4 = df3[['one','three']]
print(df4)
'''
one three
a 1 9
b 2 10
c 3 11
d 4 12
'''
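Note the double brackets in df3[['one','three']]. A minimal sketch of the difference: a single column label returns a Series (as in the Series section), while a list of labels returns a DataFrame, even when the list holds only one label.

```python
import pandas as pd

dic3 = {'one': {'a': 1, 'b': 2, 'c': 3, 'd': 4},
        'two': {'a': 5, 'b': 6, 'c': 7, 'd': 8},
        'three': {'a': 9, 'b': 10, 'c': 11, 'd': 12}}
df3 = pd.DataFrame(dic3)

col = df3['one']    # single label -> Series
sub = df3[['one']]  # list of labels -> DataFrame with one column
print(type(col))
print(type(sub))
```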
III. Indexes
1. Setting the index
(1) Default index
If no index is specified, pandas generates an integer index that starts at 0 and increases by 1.
import pandas as pd
import numpy as np
s4 = pd.Series(np.array([0, 1, 2, 3, 4, 5]))
print(s4)
'''
0 0
1 1
2 2
3 3
4 4
5 5
dtype: int32
'''
import pandas as pd
import numpy as np
arr1 = np.arange(12).reshape(4, 3)
df1=pd.DataFrame(arr1)
print(df1)
'''
0 1 2
0 0 1 2
1 3 4 5
2 6 7 8
3 9 10 11
'''
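(2) Viewing the index
Series
The index attribute shows the index of a Series. A default index is a RangeIndex: it begins at start, stops before stop, and advances by step.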
print(s4.index)
#RangeIndex(start=0, stop=6, step=1)
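The start/stop/step fields of a RangeIndex are available as attributes, and they follow the same rule as Python's built-in range. A short sketch:

```python
import numpy as np
import pandas as pd

s4 = pd.Series(np.array([0, 1, 2, 3, 4, 5]))
idx = s4.index

print(idx.start, idx.stop, idx.step)
# Same labels as range(start, stop, step): stop itself is excluded
print(list(idx) == list(range(idx.start, idx.stop, idx.step)))
```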
DataFrame
index holds the row labels (the vertical axis) and columns holds the column labels (the horizontal axis).
import pandas as pd
import numpy as np
arr1 = np.arange(12).reshape(4, 3)
df1=pd.DataFrame(arr1,index=['a','b','c','d'],columns=[3,4,5])
print(df1)
'''
3 4 5
a 0 1 2
b 3 4 5
c 6 7 8
d 9 10 11
'''
print(df1.index)
#Index(['a', 'b', 'c', 'd'], dtype='object')
print(df1.columns)
#Int64Index([3, 4, 5], dtype='int64')   (pandas 2.x prints: Index([3, 4, 5], dtype='int64'))
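(3) Custom index
The concept
After a custom index is assigned, data can be retrieved by the custom labels and still by integer position. Use .iloc for positional access: plain s[i] falls back to positions on a non-integer index, but pandas 2.1 and later warn that this fallback is deprecated.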
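A minimal sketch of the two access styles on a Series with custom labels: .loc looks up by label, .iloc by the original integer position.

```python
import numpy as np
import pandas as pd

s4 = pd.Series(np.array([0, 1, 2, 3, 4, 5]),
               index=['a', 'b', 'c', 'd', 'e', 'f'])

by_label = s4.loc['b']     # label-based lookup
by_position = s4.iloc[1]   # position-based lookup (the default integer index)
print(by_label, by_position)
```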
Assigning the index after creation
import pandas as pd
import numpy as np
s4 = pd.Series(np.array([0, 1, 2, 3, 4, 5]))
print(s4)
'''
0 0
1 1
2 2
3 3
4 4
5 5
dtype: int32
'''
s4.index = ['a','b','c','d','e','f']
print(s4)
'''
a 0
b 1
c 2
d 3
e 4
f 5
dtype: int32
'''
import pandas as pd
import numpy as np
arr1 = np.arange(12).reshape(4, 3)
df1 = pd.DataFrame(arr1)
print(df1)
'''
0 1 2
0 0 1 2
1 3 4 5
2 6 7 8
3 9 10 11
'''
df1.index=['a','b','c','d']
print(df1)
'''
0 1 2
a 0 1 2
b 3 4 5
c 6 7 8
d 9 10 11
'''
df1.columns=[3,4,5]
print(df1)
'''
3 4 5
a 0 1 2
b 3 4 5
c 6 7 8
d 9 10 11
'''
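When assigning index or columns after creation, the new labels must match the length of that axis. A short sketch of what happens otherwise (the label list is deliberately one element short):

```python
import numpy as np
import pandas as pd

s = pd.Series(np.array([0, 1, 2]))

raised = False
try:
    s.index = ['a', 'b']  # only 2 labels for 3 values
except ValueError as err:
    raised = True
    print('ValueError:', err)
```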
Setting the index at creation time
import pandas as pd
import numpy as np
s = pd.Series(np.array([1, 2, 3]), index=['a', 'b', 'c'])
print(s)
'''
a 1
b 2
c 3
dtype: int32
'''
print(s.index)
#Index(['a', 'b', 'c'], dtype='object')
import pandas as pd
import numpy as np
arr1 = np.arange(12).reshape(4, 3)
df1=pd.DataFrame(arr1,index=['a','b','c','d'],columns=[3,4,5])
print(df1)
'''
3 4 5
a 0 1 2
b 3 4 5
c 6 7 8
d 9 10 11
'''
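2. Retrieving data by index
Single label: one index value, e.g. s4[1] or s4['b'].
Multiple labels: a list of index values, e.g. s4[[1,3,5]] or s4[['b','d','f']].
Slicing: when slicing by custom labels, the value at the end label is included; when slicing by integer positions, the end position is excluded.
So [:4] and [:'d'] return the same rows, as do [2:] and ['c':]. But ['b':'d'] runs from b up to and including d, while [1:3] runs from position 1 up to but not including 3.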
import pandas as pd
import numpy as np
s4 = pd.Series(np.array([0, 1, 2, 3, 4, 5]))
print(s4)
'''
0 0
1 1
2 2
3 3
4 4
5 5
dtype: int32
'''
s4.index = ['a','b','c','d','e','f']
print(s4)
'''
a 0
b 1
c 2
d 3
e 4
f 5
dtype: int32
'''
# Single label
print('s4[3]: ', s4[3])  # positional fallback; newer pandas deprecates this, prefer s4.iloc[3]
print("s4['e']: ", s4['e'])
'''
s4[3]: 3
s4['e']: 4
'''
# Multiple labels
print("s4[[1,3,5]]: ",s4[[1,3,5]])  # positional; prefer s4.iloc[[1,3,5]] in newer pandas
'''
s4[[1,3,5]]: b 1
d 3
f 5
dtype: int32
'''
print("s4[['b','d','f']]: ",s4[['b','d','f']])
'''
s4[['b','d','f']]: b 1
d 3
f 5
dtype: int32
'''
# Slicing: equivalent results
print('s4[:4]: ',s4[:4])
print("s4[:'d']:",s4[:'d'])
print('s4[2:]',s4[2:])
print("s4['c':]: ",s4['c':])
'''
s4[:4]: a 0
b 1
c 2
d 3
dtype: int32
s4[:'d']: a 0
b 1
c 2
d 3
dtype: int32
s4[2:] c 2
d 3
e 4
f 5
dtype: int32
s4['c':]: c 2
d 3
e 4
f 5
dtype: int32
'''
# Slicing: different results
print("s4['b':'d']: ",s4['b':'d'])
print("s4[1:3]:",s4[1:3])
'''
s4['b':'d']: b 1
c 2
d 3
dtype: int32
s4[1:3]: b 1
c 2
dtype: int32
'''
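Because a plain [] slice decides between labels and positions from the key type, it can help to be explicit. A minimal sketch using .loc for label slices (end included) and .iloc for position slices (end excluded):

```python
import numpy as np
import pandas as pd

s4 = pd.Series(np.arange(6), index=list('abcdef'))

label_slice = s4.loc['b':'d']  # labels: includes the end label 'd'
pos_slice = s4.iloc[1:3]       # positions: excludes position 3
print(list(label_slice.index), list(pos_slice.index))
```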
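3. Automatic alignment
When performing arithmetic on two Series, the index shows its value: the operands are aligned by label automatically.
Where a label exists in both operands, the result is computed from the matching values.
Where a label exists in only one operand, the result is NaN.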
import pandas as pd
import numpy as np
s1=pd.Series(np.array([1,2,3]),index=['a','b','c'])
s2=pd.Series(np.array([10,20,30]),index=['b','a','d'])
print(s1+s2)
'''
a 21.0
b 12.0
c NaN
d NaN
dtype: float64
'''
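If NaN is not wanted for labels that appear on only one side, the method form of the operator accepts a fill value. A short sketch using Series.add with fill_value=0, which treats a missing label as 0 before adding:

```python
import numpy as np
import pandas as pd

s1 = pd.Series(np.array([1, 2, 3]), index=['a', 'b', 'c'])
s2 = pd.Series(np.array([10, 20, 30]), index=['b', 'a', 'd'])

# Labels present on only one side are treated as 0 instead of producing NaN
total = s1.add(s2, fill_value=0)
print(total)
```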
import pandas as pd
d1=pd.DataFrame({'a':[1,2,3],'b':[4,5,6],'c':[7,8,9]})
d2=pd.DataFrame({'b':[10,10,10],'a':[20,20,20],'d':[30,30,30]})
print(d1+d2)
'''
a b c d
0 21 14 NaN NaN
1 22 15 NaN NaN
2 23 16 NaN NaN
'''
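In the DataFrame example both frames share the row index 0 to 2, so only the columns differ. Alignment applies to rows as well; a minimal sketch with shifted row labels (the labels 0 to 3 are chosen for illustration):

```python
import pandas as pd

d1 = pd.DataFrame({'a': [1, 2, 3]}, index=[0, 1, 2])
d2 = pd.DataFrame({'a': [10, 20, 30]}, index=[1, 2, 3])

# Rows 1 and 2 exist in both frames; rows 0 and 3 exist in only one, so they become NaN
result = d1 + d2
print(result)
```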