Sorting an index whose values are alphanumeric

2023-09-22 23:51:57

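I'd like to know how to handle this data-manipulation problem.
What is the best way to sort a level of a MultiIndex in a DataFrame when that level's values are alphanumeric?
The values are: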

[u'0', u'1', u'10', u'11', u'2', u'2Y', u'3', u'3Y', u'4', u'4Y', u'5', u'5Y', u'6', u'7', u'8', u'9', u'9Y']

The result I'm looking for is:

[u'0', u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'10', u'11', u'2Y', u'3Y', u'4Y', u'5Y', u'9Y']

Plain numbers represent months, while an integer followed by 'Y' represents years.

Is there a way to sort the index in this order?

Duration is one level of the MultiIndex; the second level is the sum.
Please find a sample dataset below:

Duration         2       2Y        3       3Y
customer
Invoice A    25.50     0.00     0.00    20.00
Invoice B    50.00    25.00   -10.50     0.00
Invoice C   125.00     0.00    11.20     0.50
Invoice D     0.00    15.00     0.00    80.10

Solution:

You can use the natsort package to sort the columns in natural order. Here is an example:

import natsort as ns

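c = ['0', '1', '10', ...]  # the Duration values from the question
# natsorted puts '2' before '10'; the stable sort on `not x.isdigit()` then
# moves the 'Y' values after all plain numbers while keeping their order
c = sorted(ns.natsorted(c), key=lambda x: not x.isdigit())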

print(c)
['0',
 '1',
 '2',
 '3',
 '4',
 '5',
 '6',
 '7',
 '8',
 '9',
 '10',
 '11',
 '2Y',
 '3Y',
 '4Y',
 '5Y',
 '9Y']
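If you would rather avoid the natsort dependency, the same ordering can be expressed as a plain sort key. This is a minimal sketch, assuming every value is either a bare integer (months) or an integer followed by 'Y' (years). Converting each value to a number of months sorts chronologically, which for this data gives exactly the requested order, because every month value is at most 11 and every year value is at least 2Y (24 months). The helper name `duration_in_months` is hypothetical.

```python
def duration_in_months(value):
    """Hypothetical sort key: '9' -> 9 months, '2Y' -> 24 months."""
    if value.endswith('Y'):
        return int(value[:-1]) * 12  # years expressed as months
    return int(value)  # bare numbers are already months


values = ['0', '1', '10', '11', '2', '2Y', '3', '3Y', '4', '4Y',
          '5', '5Y', '6', '7', '8', '9', '9Y']
ordered = sorted(values, key=duration_in_months)
print(ordered)
```

Unlike the natsort answer, this orders by elapsed time, so a bare '30' would sort after '2Y' (30 > 24 months), whereas the natsort version keeps all bare numbers first.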

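For your problem, take the same approach and add a reindexing step. Note that reindex_axis was removed in pandas 1.0; reindex(columns=...) does the same job:

import natsort as ns
import pandas as pd

c = df.columns.levels[1]  # assumes Duration is the second column level
c = sorted(ns.natsorted(c), key=str.isdigit, reverse=True)  # numbers first, then 'nY'
df = df.reindex(columns=pd.MultiIndex.from_product([df.columns.levels[0], c], names=df.columns.names))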
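To see the reordering end to end, here is a small self-contained sketch built from the sample data. It assumes the columns are a two-level MultiIndex with a hypothetical top level labelled 'sum' and the Duration values as the second level (the question does not show the exact layout). Instead of natsort it uses a month-based key (bare numbers are months, 'nY' is n years), so it needs only pandas; `duration_key` is a hypothetical helper.

```python
import pandas as pd


def duration_key(value):
    # Hypothetical helper: bare numbers are months, 'nY' is n years (12 * n months)
    return int(value[:-1]) * 12 if value.endswith('Y') else int(value)


# Sample data from the question; the top column level 'sum' is an assumption
columns = pd.MultiIndex.from_product([['sum'], ['2', '2Y', '3', '3Y']],
                                     names=[None, 'Duration'])
df = pd.DataFrame(
    [[25.50, 0.00, 0.00, 20.00],
     [50.00, 25.00, -10.50, 0.00],
     [125.00, 0.00, 11.20, 0.50],
     [0.00, 15.00, 0.00, 80.10]],
    index=pd.Index(['Invoice A', 'Invoice B', 'Invoice C', 'Invoice D'],
                   name='customer'),
    columns=columns,
)

# Sort the column tuples by (top level, duration in months), then reindex
new_columns = sorted(df.columns, key=lambda col: (col[0], duration_key(col[1])))
df = df.reindex(columns=pd.MultiIndex.from_tuples(new_columns,
                                                  names=df.columns.names))
print(df)
```

Reindexing with the full list of tuples moves whole columns, so each invoice keeps its values under the right Duration label.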
