In the previous article we simplified the code with list comprehensions. Let's start by looking at the output we ended up with:
>>> ================================ RESTART ================================
>>>
['2.01', '2.01', '2.22', '2.34', '2.34', '2.45', '3.01', '3.10', '3.21']
['2.11', '2.11', '2.23', '2.23', '2.59', '3.10', '3.10', '3.21', '3.21']
['2.22', '2.38', '2.49', '3.01', '3.01', '3.02', '3.02', '3.02', '3.22']
['2.18', '2.25', '2.39', '2.54', '2.55', '2.55', '2.55', '2.58', '2.58']
As you can see, these lists contain duplicate entries, so we need to find a way to remove them.
Do it yourself: removing duplicates with a loop
Let's modify the previous code to strip out the duplicates:
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

with open('james.txt') as jaf:
    data = jaf.readline()
    james = data.strip().split(',')
with open('julie.txt') as juf:
    data = juf.readline()
    julie = data.strip().split(',')
with open('mikey.txt') as mif:
    data = mif.readline()
    mikey = data.strip().split(',')
with open('sarah.txt') as saf:
    data = saf.readline()
    sarah = data.strip().split(',')

james = sorted([sanitize(t) for t in james])
julie = sorted([sanitize(t) for t in julie])
mikey = sorted([sanitize(t) for t in mikey])
sarah = sorted([sanitize(t) for t in sarah])

unique_james = []
for each in james:
    if each not in unique_james:
        unique_james.append(each)

unique_julie = []
for each in julie:
    if each not in unique_julie:
        unique_julie.append(each)

unique_mikey = []
for each in mikey:
    if each not in unique_mikey:
        unique_mikey.append(each)

unique_sarah = []
for each in sarah:
    if each not in unique_sarah:
        unique_sarah.append(each)

# print the four lists
print(unique_james)
print(unique_julie)
print(unique_mikey)
print(unique_sarah)

The output looks like this:
>>> ================================ RESTART ================================
>>>
['2.01', '2.22', '2.34', '2.45', '3.01', '3.10', '3.21']
['2.11', '2.23', '2.59', '3.10', '3.21']
['2.22', '2.38', '2.49', '3.01', '3.02', '3.22']
['2.18', '2.25', '2.39', '2.54', '2.55', '2.58']
As you can see, the duplicates are gone. If we only want to print the first three times in each list, we just need to adjust the print statements:
"""打印四个列表""" print(unique_james[0:3]) print(unique_julie[0:3]) print(unique_mikey[0:3]) print(unique_sarah[0:3])结果如下:
>>> ================================ RESTART ================================
>>>
['2.01', '2.22', '2.34']
['2.11', '2.23', '2.59']
['2.22', '2.38', '2.49']
['2.18', '2.25', '2.39']
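As a side note, a slice such as [0:3] asks for at most the first three items; if a list happens to hold fewer than three entries, the slice simply returns whatever is there instead of raising an error. A quick sketch with made-up times (the times variable here is invented for illustration, not read from the coach files):

>>> times = ['2.01', '2.22']    # only two times recorded
>>> times[0:3]                  # ask for the first three
['2.01', '2.22']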
With this code we have successfully removed the duplicates from the lists. But look back at the code: it is long and repetitive. Is there a better way?
Python to the rescue: removing duplicates with sets
Python has a built-in function, set(), that creates a set. Because a set cannot contain duplicate members, converting a list to a set removes any repeated elements. Let's look at a quick example first:
>>> data = [1.1, 1.2, 1.1, 1.5, 1.6]
>>> data
[1.1, 1.2, 1.1, 1.5, 1.6]
>>> data = set(data)
>>> data
{1.1, 1.6, 1.5, 1.2}

As you can see, converting data to a set removed the duplicate element from the original list. Note one detail: before the set() call, data printed inside square brackets [], meaning it was a list; afterwards it prints inside curly braces {}, meaning it is now a set.
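One more detail is worth knowing before we apply this to our code: sorted() accepts any iterable, including a set, and always returns a new list, which is why we can chain set() and sorted() and then slice the result. A small sketch with the same made-up data as above:

>>> data = set([1.1, 1.2, 1.1, 1.5, 1.6])    # duplicates removed
>>> sorted(data)                             # sorted() returns a list
[1.1, 1.2, 1.5, 1.6]
>>> sorted(data)[0:3]                        # so we can slice it like any other list
[1.1, 1.2, 1.5]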
Now let's modify the earlier code to use set() for deduplication:
def sanitize(time_string):
    if '-' in time_string:
        splitter = '-'
    elif ':' in time_string:
        splitter = ':'
    else:
        return time_string
    (mins, secs) = time_string.split(splitter)
    return mins + '.' + secs

with open('james.txt') as jaf:
    data = jaf.readline()
    james = data.strip().split(',')
with open('julie.txt') as juf:
    data = juf.readline()
    julie = data.strip().split(',')
with open('mikey.txt') as mif:
    data = mif.readline()
    mikey = data.strip().split(',')
with open('sarah.txt') as saf:
    data = saf.readline()
    sarah = data.strip().split(',')

# print the first three unique times in each list
print(sorted(set([sanitize(t) for t in james]))[0:3])
print(sorted(set([sanitize(t) for t in julie]))[0:3])
print(sorted(set([sanitize(t) for t in mikey]))[0:3])
print(sorted(set([sanitize(t) for t in sarah]))[0:3])

Here we wrap each list comprehension in set() to remove the duplicates, pass the result to sorted() to sort it, and then slice off the first three times. The output:
>>> ================================ RESTART ================================
>>>
['2.01', '2.22', '2.34']
['2.11', '2.23', '2.59']
['2.22', '2.38', '2.49']
['2.18', '2.25', '2.39']

The result is exactly the same as before.
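The four print statements still repeat the same pattern. One possible further cleanup, shown here only as a sketch rather than part of the original exercise, is to wrap the read-sanitize-deduplicate-sort steps in a small helper function; the name get_top_times() and its count parameter are invented for this example, and it reuses the sanitize() function defined above:

def get_top_times(filename, count=3):
    # read one line of comma-separated times, clean each one with sanitize(),
    # drop duplicates with set(), sort, and return the fastest `count` times
    with open(filename) as f:
        data = f.readline()
    times = data.strip().split(',')
    return sorted(set([sanitize(t) for t in times]))[0:count]

print(get_top_times('james.txt'))
print(get_top_times('julie.txt'))
print(get_top_times('mikey.txt'))
print(get_top_times('sarah.txt'))

Each call returns the same three-item list we printed above, so the behaviour is unchanged; only the repetition is gone.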