重塑层次化索引
层次化索引为DataFrame的重排提供了良好的一致性操作,主要方法有
stack :将数据的列旋转为行
unstack:将数据的行转换为列
用一个dataframe对象举例
In [4]: data = DataFrame(np.arange(6).reshape((2,3)),index = pd.Index(['Ohio','Colorado'],name='state'),columns = pd.Index(['one','two','three'],name = 'number')) In [5]: data
Out[5]:
number one two three
state
Ohio 0 1 2
Colorado 3 4 5 In [6]: data.stack()#将列索引转换为行索引
Out[6]:
state number
Ohio one 0
two 1
three 2
Colorado one 3
two 4
three 5
dtype: int32 In [7]: data.unstack()#将行索引转换为列索引
Out[7]:
number state
one Ohio 0
Colorado 3
two Ohio 1
Colorado 4
three Ohio 2
Colorado 5
dtype: int32 In [9]: data.unstack().index
Out[9]:
MultiIndex(levels=[['one', 'two', 'three'], ['Ohio', 'Colorado']],
labels=[[0, 0, 1, 1, 2, 2], [0, 1, 0, 1, 0, 1]],
names=['number', 'state']) In [10]:
对于DataFrame,无论是使用unstack,还是stack,得到都是一个Series对象
Series对象,只有unstack方法。
默认情况下,unstack操作的是最内层,传入分层级别的编号或名称即可对相应级别的索引做操作。
In [21]: result.unstack(0)
Out[21]:
state Ohio Colorado
number
one 0 3
two 1 4
three 2 5 In [22]: result.unstack()
Out[22]:
number one two three
state
Ohio 0 1 2
Colorado 3 4 5 In [23]: result.unstack('state')
Out[23]:
state Ohio Colorado
number
one 0 3
two 1 4
three 2 5
如果不是所有的级别的值都能在个分组中找到的话,则unstack会引入缺失值
In [24]: s1 =Series([0,1,2,3],index = ['a','b','c','d']) In [25]: s2 = Series([4,5,6],index = ['c','d','e']) In [26]: data2 = pd.concat([s1,s2],keys = ['one','two']) In [27]: data2
Out[27]:
one a 0
b 1
c 2
d 3
two c 4
d 5
e 6
dtype: int64 In [28]: data2.unstack()
Out[28]:
a b c d e
one 0.0 1.0 2.0 3.0 NaN
two NaN NaN 4.0 5.0 6.0 In [29]: data2.unstack(0)
Out[29]:
one two
a 0.0 NaN
b 1.0 NaN
c 2.0 4.0
d 3.0 5.0
e NaN 6.0
而stack默认会滤除缺失值。
在对DataFrame进行旋转操作时,旋转的轴会成为旋转后索引的最低级别。也就是最内层索引。