Building a CSV comparison feature in Python: notes and demos

Python 2.7

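A pitfall with csv.reader(csvfile, dialect='excel', **fmtparams):
The object returned by csv.reader is an iterator that consumes csvfile as it is read. Once the first traversal reaches the end of the file, a second traversal yields no rows, even if a new csv.reader is created on the same file object.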


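iterator (definition from the Python glossary; in Python 2.7 the method is named next() rather than __next__())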

An object representing a stream of data. Repeated calls to the iterator’s __next__() method (or passing it to the built-in function next()) return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its __next__() method just raise StopIteration again. Iterators are required to have an __iter__() method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container.

More information can be found in Iterator Types.
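
The difference between a container and an iterator described above can be seen in a few lines. This is only a minimal sketch; the variable names are made up for illustration.

```python
# A list is a container: every for loop (or iter() call) gets a fresh iterator.
rows = ['a', 'b', 'c']
first_pass = [x for x in rows]
second_pass = [x for x in rows]

# An iterator is consumed as it is read; a second pass finds nothing left.
it = iter(rows)
iter_first_pass = [x for x in it]
iter_second_pass = [x for x in it]

print("list:     %r then %r" % (first_pass, second_pass))
print("iterator: %r then %r" % (iter_first_pass, iter_second_pass))
```

The object returned by csv.reader behaves like `it` here, not like `rows`.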

# -*- coding: utf-8 -*-
import csv

with open('eggs.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
with open('eggs.csv', 'rb') as csvfile:
    print("==============")
    for row in csv.reader(csvfile, delimiter=' ', quotechar='|'):
        print row
        for x in row:
            print x
    print("======first travel end. next:========")
    # csvfile is already at EOF here, so this second reader yields no rows.
    for row in csv.reader(csvfile):
        print row
    print("=======end=======")

Output:

==============
['Spam', 'Spam', 'Spam', 'Spam', 'Spam', 'Baked Beans']
Spam
Spam
Spam
Spam
Spam
Baked Beans
['Spam', 'Lovely Spam', 'Wonderful Spam']
Spam
Lovely Spam
Wonderful Spam
======first travel end. next:========
=======end=======
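
Two common ways around the empty second pass are to rewind the file with seek(0) before creating a new reader, or to copy the rows into a list once and loop over the list. The sketch below assumes a small throwaway file (the name eggs_demo.csv and its contents are made up) and is written so that it runs on both Python 2.7 and Python 3; in real Python 2.7 code the file would normally be opened with 'rb'.

```python
import csv
import os
import tempfile

# Create a small throwaway CSV file (hypothetical name and contents).
path = os.path.join(tempfile.mkdtemp(), 'eggs_demo.csv')
with open(path, 'w') as f:
    f.write('Spam,Eggs\nLovely,Spam\n')

with open(path, 'r') as csvfile:
    first = list(csv.reader(csvfile))      # reads the file up to EOF
    exhausted = list(csv.reader(csvfile))  # new reader, same file position: nothing left
    csvfile.seek(0)                        # option 1: rewind the file
    rewound = list(csv.reader(csvfile))    # a fresh reader sees every row again

# Option 2: 'first' is a plain list, so it can be looped over as often as needed.
print(first)
print(exhausted)
print(rewound)
```

Copying into a list is simpler when the file fits in memory; seek(0) avoids holding all rows at once.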

Demo based on the docs:

# -*- coding: utf-8 -*-
import csv

with open('eggs.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=' ',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(['Spam'] * 5 + ['Baked Beans'])
    spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])
with open('eggs.csv', 'rb') as csvfile:
    for row in csv.reader(csvfile, delimiter=' ', quotechar='|'):
        print row
        for x in row:
            print x

# In Python 2 the csv module expects files opened in binary mode ('wb' / 'rb').
with open('names.csv', 'wb') as csvfile:
    fieldnames = ['first_name', 'last_name']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

    writer.writeheader()
    writer.writerow({'first_name': '中Baked', 'last_name': 'Beans'})
    writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})
    writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'})

with open('names.csv', 'rb') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        # print(",".join(row['first_name']).decode('GBK'), row['last_name'])
        # In Python 2, print(a, b) prints the repr of a tuple, so use the print statement.
        print row['first_name'], row['last_name']
        # print(row['first_name'].decode('GBK').encode('UTF-8'), row['last_name'])

https://docs.python.org/2/library/csv.html#csv-examples
https://docs.python.org/2.7/tutorial/datastructures.html#dictionaries
https://docs.python.org/2.7/library/stdtypes.html#bltin-file-objects

    def is_same_csv_file(self, compare_csv_files_path, baseline_csv_files_path, csv_file_name):
        # get_csv_files / get_csv_reader are helper methods of the same class (not shown here).
        # Compare line counts first; 'with' closes each file once it has been counted.
        with open(self.get_csv_files(baseline_csv_files_path, csv_file_name), 'rb') as baseline_file:
            base_line_count = len(baseline_file.readlines())
        with open(self.get_csv_files(compare_csv_files_path, csv_file_name), 'rb') as compare_file:
            compare_line_count = len(compare_file.readlines())
        if base_line_count != compare_line_count:
            print("line_num is not equal\nbase_line_count:%d\ncompare_line_count:%d" % (base_line_count,
                                                                                       compare_line_count))
            return False

        baseline_reader = self.get_csv_reader(baseline_csv_files_path, csv_file_name)
        # A csv reader can be traversed only once, so copy the rows to be searched into a list.
        compare_rows = list(self.get_csv_reader(compare_csv_files_path, csv_file_name))
        for base_row in baseline_reader:
            if self.is_base_record_exist(base_row, compare_rows):
                continue
            else:
                print("Missing record:line_num:%d" % baseline_reader.line_num)
                print "Expected Data:"
                print(",".join(base_row).decode('gb2312'))
                return False
        return True

    @staticmethod
    def is_base_record_exist(base_row, compare_reader):
        for result_row in compare_reader:
            if base_row == result_row:
                return True
        return False
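
As an alternative to searching the comparison rows once per baseline row, the rows can be counted with collections.Counter and the two counts subtracted. This is a different technique from the method above, shown only as a sketch: the function name diff_csv_rows and the sample data are hypothetical, and it assumes both files fit in memory and that row order does not matter.

```python
import csv
from collections import Counter


def diff_csv_rows(baseline_lines, compare_lines):
    """Return (missing, unexpected) rows as sorted lists of tuples, ignoring row order."""
    # csv rows are lists, which are unhashable, so convert each row to a tuple before counting.
    baseline = Counter(tuple(row) for row in csv.reader(baseline_lines))
    compare = Counter(tuple(row) for row in csv.reader(compare_lines))
    missing = sorted((baseline - compare).elements())     # in baseline, absent from compare
    unexpected = sorted((compare - baseline).elements())  # in compare, absent from baseline
    return missing, unexpected


# csv.reader accepts any iterable of lines, so plain lists stand in for open files here.
baseline_lines = ['id,name', '1,Spam', '2,Eggs', '2,Eggs']
compare_lines = ['id,name', '2,Eggs', '1,Spam', '3,Beans']
missing, unexpected = diff_csv_rows(baseline_lines, compare_lines)
print(missing)
print(unexpected)
```

Unlike a plain membership check, counting also catches a row that appears twice in the baseline but only once in the compared file.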

 

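Related notes:
String formatting (the % operator)

Template
To format a string, Python uses a string as a template. The template contains format specifiers, which reserve positions for the actual values and describe how each value should be rendered. The values are passed to the template as a tuple, one value per format specifier.
For example: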

print("I'm %s. I'm %d year old" % ('Vamei', 99))
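In the example above,

"I'm %s. I'm %d year old" is the template. %s is the first format specifier and stands for a string; %d is the second and stands for an integer. The two elements of ('Vamei', 99), 'Vamei' and 99, are the actual values that replace %s and %d.
The % sign between the template and the tuple denotes the formatting operation.

The whole of "I'm %s. I'm %d year old" % ('Vamei', 99) is itself a string expression, so it can be assigned to a variable like any other string. For example: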

a = "I'm %s. I'm %d year old" % ('Vamei', 99)
print(a)

The actual values can also be passed with a dictionary:

print("I'm %(name)s. I'm %(age)d year old" % {'name':'Vamei', 'age':99})
Here the two format specifiers are named. Each name is enclosed in parentheses and corresponds to a key in the dictionary.
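
The tuple and dictionary forms can be put side by side as follows. This is a small sketch; the variable names are made up, and the last case shows one detail worth knowing: a value that is itself a tuple has to be wrapped in a one-element tuple.

```python
template = "I'm %s. I'm %d year old"
by_position = template % ('Vamei', 99)

named_template = "I'm %(name)s. I'm %(age)d year old"
by_name = named_template % {'name': 'Vamei', 'age': 99}

# A value that is itself a tuple must be wrapped in a one-element tuple,
# otherwise its items are treated as separate arguments to the template.
point = (3, 4)
wrapped = "point: %s" % (point,)

print(by_position)
print(by_name)
print(wrapped)
```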

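Format specifiers

A format specifier reserves a position for the actual value and controls how it is displayed. It can include a type code that controls the display type:
%s string (rendered with str())
%r string (rendered with repr())
%c single character
%d signed decimal integer
%i signed decimal integer
%o octal integer
%x hexadecimal integer (lowercase)
%e exponential notation (exponent marker written as e)
%E exponential notation (exponent marker written as E)
%f floating-point decimal
%F floating-point decimal, same as above
%g general format: uses %e when the exponent is less than -4 or not less than the precision, %f otherwise
Note: the % operator has no %b code for binary integers ('%b' % 5 raises ValueError); use bin(n) or format(n, 'b') instead.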
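
The type codes can be tried out directly. The sketch below uses arbitrary sample values and collects the results in a dictionary (a name chosen for this example only); it also shows that the % operator rejects %b, with format(n, 'b') as a fallback.

```python
results = {
    's': "%s" % 'spam',       # str() rendering
    'r': "%r" % 'spam',       # repr() rendering, keeps the quotes
    'c': "%c" % 65,           # single character from its code point
    'd': "%d" % 255,
    'o': "%o" % 255,
    'x': "%x" % 255,
    'e': "%e" % 1234.5,
    'f': "%f" % 1234.5,
    'g_small_exp': "%g" % 1234.5,       # exponent within range: decimal format
    'g_large_exp': "%g" % 0.00001234,   # exponent below -4: exponential format
}

# The % operator has no binary type code; fall back to format().
try:
    binary = "%b" % 5
except ValueError:
    binary = format(5, 'b')

for code in sorted(results):
    print("%-12s %s" % (code, results[code]))
print("binary       %s" % binary)
```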
