When a Series or DataFrame has duplicate index labels, calling reindex() on it raises the error above:
import pandas as pd
a = pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'a'])
print(a)
a 1
b 2
c 3
d 4
e 5
a 6
a.reindex(['b','c','e','b'])
This raises:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Python\lib\site-packages\pandas\core\series.py", line 3325, in reindex
return super(Series, self).reindex(index=index, **kwargs)
File "C:\Python\lib\site-packages\pandas\core\generic.py", line 3689, in reindex
fill_value, copy).__finalize__(self)
File "C:\Python\lib\site-packages\pandas\core\generic.py", line 3707, in _reindex_axes
copy=copy, allow_dups=False)
File "C:\Python\lib\site-packages\pandas\core\generic.py", line 3810, in _reindex_with_indexers
copy=copy)
File "C:\Python\lib\site-packages\pandas\core\internals.py", line 4414, in reindex_indexer
self.axes[axis]._can_reindex(indexer)
File "C:\Python\lib\site-packages\pandas\core\indexes\base.py", line 3576, in _can_reindex
raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
You can locate the duplicated index labels with index.duplicated(), which marks every label that repeats an earlier one:
a.index.duplicated()
array([False, False, False, False, False, True])
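Once the duplicates are located, one possible way to make reindex() work is to drop the repeated labels first. The sketch below assumes that keeping the first occurrence of each label is acceptable for your data. That choice is not part of the original example, and the names dup_mask, deduped and result are only illustrative.

```python
import pandas as pd

a = pd.Series([1, 2, 3, 4, 5, 6], index=['a', 'b', 'c', 'd', 'e', 'a'])

# Boolean mask: True for every label that repeats an earlier one.
dup_mask = a.index.duplicated()

# Keep only the first occurrence of each label (an assumption about
# which row should win; duplicated(keep='last') would keep the last one).
deduped = a[~dup_mask]

# The source index is now unique, so reindex() no longer raises,
# even though the target list itself repeats 'b'.
result = deduped.reindex(['b', 'c', 'e', 'b'])
print(result)
```

If the duplicated rows carry different values that all matter, aggregating them (for example with a.groupby(level=0).sum()) may fit better than discarding rows.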