Common methods for selecting data
In the IPython interpreter, build some test data:
In [1]: import numpy as np
In [2]: import pandas as pd
In [3]: dates = pd.date_range('20211107', periods=6)
In [4]: data = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=['A', 'B', 'C', 'D'])
In [5]: data
Out[5]:
                   A         B         C         D
2021-11-07 -0.877244  1.713848  0.144780  0.011379
2021-11-08 -0.213253  0.344520 -0.658004  0.648373
2021-11-09  0.408933 -0.145004 -1.644283 -0.402420
2021-11-10  0.687184  0.412389  0.571787  0.069970
2021-11-11  0.984974 -1.827038  1.206369 -1.701110
2021-11-12  2.179770 -0.607483 -0.170085  0.617194
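Because `np.random.randn` draws new values on every call, the numbers above will differ from run to run. If repeatable output is wanted, one option is a seeded generator. This is a minimal sketch, not part of the original session: the name `rng` and the seed value `0` are arbitrary choices made here for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: the seed 0 is arbitrary; any fixed integer gives repeatable data.
rng = np.random.default_rng(0)

dates = pd.date_range('20211107', periods=6)
data = pd.DataFrame(
    rng.standard_normal((6, 4)),   # 6 rows x 4 columns of standard normal values
    index=dates,
    columns=['A', 'B', 'C', 'D'],
)

print(data.shape)      # (6, 4)
print(data.index[0])   # 2021-11-07 00:00:00
```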
Select a row by its row label:
In [11]: data.loc['2021-11-07']
Out[11]:
A -0.877244
B 1.713848
C 0.144780
D 0.011379
Name: 2021-11-07 00:00:00, dtype: float64
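The row label here is a timestamp, so the string `'2021-11-07'` is parsed into one before the lookup. The sketch below suggests that passing a `pd.Timestamp` directly selects the same row. To keep the output predictable it fills the frame with `np.arange` values instead of random numbers, which is an assumption made only for this example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20211107', periods=6)
# Assumption: deterministic stand-in values (0.0 .. 23.0) replace the random data.
data = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=['A', 'B', 'C', 'D'])

by_string = data.loc['2021-11-07']                    # label given as a date string
by_timestamp = data.loc[pd.Timestamp('2021-11-07')]   # label given as a Timestamp

print(by_string.equals(by_timestamp))   # True
print(list(by_string.index))            # ['A', 'B', 'C', 'D']
```

The result is a Series whose index is the column labels and whose name is the selected row label.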
Select multiple columns by column label:
In [15]: data.loc[:, ['A', 'B']]
Out[15]:
                   A         B
2021-11-07 -0.877244  1.713848
2021-11-08 -0.213253  0.344520
2021-11-09  0.408933 -0.145004
2021-11-10  0.687184  0.412389
2021-11-11  0.984974 -1.827038
2021-11-12  2.179770 -0.607483
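For column-only selection, plain bracket indexing with a list of labels should give the same result as `data.loc[:, [...]]`. A short sketch for comparison, using `np.arange` stand-in values (an assumption for this example) rather than the random data:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20211107', periods=6)
# Assumption: deterministic stand-in values replace the random data.
data = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=['A', 'B', 'C', 'D'])

cols_loc = data.loc[:, ['A', 'B']]   # all rows, columns A and B
cols_brackets = data[['A', 'B']]     # list of labels inside brackets: a DataFrame
single = data['A']                   # a single label: a Series

print(cols_loc.equals(cols_brackets))   # True
print(type(single).__name__)            # Series
```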
Combine row labels and column labels to select multiple rows and columns:
In [16]: data.loc['2021-11-07':'2021-11-09', ['A', 'B']]
Out[16]:
                   A         B
2021-11-07 -0.877244  1.713848
2021-11-08 -0.213253  0.344520
2021-11-09  0.408933 -0.145004
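Note that the output contains the row for 2021-11-09: a label slice with `loc` includes both endpoints, unlike ordinary Python slicing. The sketch below contrasts it with the equivalent positional slice, where the end is exclusive. It uses `np.arange` stand-in values, an assumption made for this example only.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20211107', periods=6)
# Assumption: deterministic stand-in values replace the random data.
data = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=['A', 'B', 'C', 'D'])

# Label slice: both '2021-11-07' and '2021-11-09' are included (3 rows).
by_label = data.loc['2021-11-07':'2021-11-09', ['A', 'B']]

# Positional slice: the end position is excluded, so 0:3 is needed for 3 rows.
by_position = data.iloc[0:3, 0:2]

print(len(by_label))                  # 3
print(by_label.equals(by_position))   # True
```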
Select several columns of a single row by row label and column labels:
In [17]: data.loc['2021-11-07', ['A', 'B']]
Out[17]:
A -0.877244
B 1.713848
Name: 2021-11-07 00:00:00, dtype: float64
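A single row label yields a Series, as shown above. If a one-row DataFrame is preferred, wrapping the row label in a list is one way to get it. A minimal sketch, using `np.arange` stand-in values and a `pd.Timestamp` label (both assumptions for this example):

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20211107', periods=6)
# Assumption: deterministic stand-in values replace the random data.
data = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=['A', 'B', 'C', 'D'])

label = pd.Timestamp('2021-11-07')

row_as_series = data.loc[label, ['A', 'B']]    # scalar row label: Series
row_as_frame = data.loc[[label], ['A', 'B']]   # list of row labels: DataFrame

print(type(row_as_series).__name__)   # Series
print(row_as_frame.shape)             # (1, 2)
```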
Get a single value from a specific row label and a specific column label:
# Method 1:
In [19]: data.loc['2021-11-07', 'A']
Out[19]: -0.877244243377641
# Method 2:
In [20]: data.at['2021-11-07', 'A']
Out[20]: -0.877244243377641
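Both methods return the same scalar. `at` only accepts a single row label and a single column label, and `iat` is its position-based counterpart. The following sketch checks that all three agree and that `at` can also assign a value. It uses `np.arange` stand-in values, and the replacement value `100.0` is made up for illustration.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20211107', periods=6)
# Assumption: deterministic stand-in values replace the random data.
data = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=['A', 'B', 'C', 'D'])

label = pd.Timestamp('2021-11-07')

v_loc = data.loc[label, 'A']   # general label-based lookup
v_at = data.at[label, 'A']     # scalar-only label-based lookup
v_iat = data.iat[0, 0]         # scalar-only position-based lookup

print(v_loc == v_at == v_iat)  # True

# `at` also supports assignment of a single cell (100.0 is an illustrative value).
data.at[label, 'A'] = 100.0
print(data.iat[0, 0])          # 100.0
```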
Select data by integer position (positions are zero-based, so position 0 is the first row):
In [22]: data
Out[22]:
                   A         B         C         D
2021-11-07 -0.877244  1.713848  0.144780  0.011379
2021-11-08 -0.213253  0.344520 -0.658004  0.648373
2021-11-09  0.408933 -0.145004 -1.644283 -0.402420
2021-11-10  0.687184  0.412389  0.571787  0.069970
2021-11-11  0.984974 -1.827038  1.206369 -1.701110
2021-11-12  2.179770 -0.607483 -0.170085  0.617194
In [23]: data.iloc[0]
Out[23]:
A -0.877244
B 1.713848
C 0.144780
D 0.011379
Name: 2021-11-07 00:00:00, dtype: float64
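`iloc` accepts more than a single integer. The sketch below shows a few other positional forms (slices, lists of positions, negative positions) as an illustration. It uses `np.arange` stand-in values instead of the random data, which is an assumption made for this example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('20211107', periods=6)
# Assumption: deterministic stand-in values replace the random data.
data = pd.DataFrame(np.arange(24.0).reshape(6, 4), index=dates, columns=['A', 'B', 'C', 'D'])

first_row = data.iloc[0]             # single position: a Series
first_two_rows = data.iloc[0:2]      # slice: positions 0 and 1 (end excluded)
picked = data.iloc[[0, 2], [0, 1]]   # rows 0 and 2, columns 0 and 1 (A and B)
last_row = data.iloc[-1]             # negative positions count from the end

print(first_two_rows.shape)   # (2, 4)
print(picked.shape)           # (2, 2)
print(last_row.name)          # 2021-11-12 00:00:00
```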