Python Basics: File Handling

2022-05-25 00:59:48

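Reading and Writing Files

Reading and writing files is the most common kind of I/O. Python has built-in functions for it, and their usage mirrors the C file API.

Before reading or writing files, note that disk access is provided by the operating system. Modern operating systems do not let ordinary programs touch the disk directly, so reading or writing a file means asking the operating system to open a file object (identified by what is usually called a file descriptor) and then using the interface the operating system provides to read data from that object or write data to it.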

Operations

Python 2.x

class file(object)

    def close(self): # real signature unknown; restored from __doc__

        # Close the file.

        """

        close() -> None or (perhaps) an integer.  Close the file.

        Sets data attribute .closed to True.  A closed file cannot be used for

        further I/O operations.  close() may be called more than once without

        error.  Some kinds of file objects (for example, opened by popen())

        may return an exit status upon closing.

        """

    def fileno(self): # real signature unknown; restored from __doc__

        # Return the file descriptor.

         """

        fileno() -> integer "file descriptor".

        This is needed for lower-level file interfaces, such as os.read().

        """

        return 0    

    def flush(self): # real signature unknown; restored from __doc__

        # Flush the file's internal buffer.

        """ flush() -> None.  Flush the internal I/O buffer. """

        pass

    def isatty(self): # real signature unknown; restored from __doc__

        # Check whether the file is connected to a tty device.

        """ isatty() -> true or false.  True if the file is connected to a tty device. """

        return False

    def next(self): # real signature unknown; restored from __doc__

        # Return the next line; raise StopIteration when there is none.

        """ x.next() -> the next value, or raise StopIteration """

        pass

    def read(self, size=None): # real signature unknown; restored from __doc__

        # Read up to the given number of bytes.

        """

        read([size]) -> read at most size bytes, returned as a string.

        If the size argument is negative or omitted, read until EOF is reached.

        Notice that when in non-blocking mode, less data than what was requested

        may be returned, even if no size parameter was given.

        """

        pass

    def readinto(self): # real signature unknown; restored from __doc__

        # Read into a buffer; do not use this, it may be removed.

        """ readinto() -> Undocumented.  Don't use this; it may go away. """

        pass

    def readline(self, size=None): # real signature unknown; restored from __doc__

        # Read a single line.

        """

        readline([size]) -> next line from the file, as a string.

        Retain newline.  A non-negative size argument limits the maximum

        number of bytes to return (an incomplete line may be returned then).

        Return an empty string at EOF.

        """

        pass

    def readlines(self, size=None): # real signature unknown; restored from __doc__

        # Read all lines and return them as a list, split on newlines.

        """

        readlines([size]) -> list of strings, each a line from the file.

        Call readline() repeatedly and return a list of the lines so read.

        The optional size argument, if given, is an approximate bound on the

        total number of bytes in the lines returned.

        """

        return []

    def seek(self, offset, whence=None): # real signature unknown; restored from __doc__

        # Move the file pointer to the given position.

        """

        seek(offset[, whence]) -> None.  Move to new file position.

        Argument offset is a byte count.  Optional argument whence defaults to

        0 (offset from start of file, offset should be >= 0); other values are 1

        (move relative to current position, positive or negative), and 2 (move

        relative to end of file, usually negative, although many platforms allow

        seeking beyond the end of a file).  If the file is opened in text mode,

        only offsets returned by tell() are legal.  Use of other offsets causes

        undefined behavior.

        Note that not all file objects are seekable.

        """

        pass

    def tell(self): # real signature unknown; restored from __doc__

        # Return the current position of the file pointer.

        """ tell() -> current file position, an integer (may be a long integer). """

        pass

    def truncate(self, size=None): # real signature unknown; restored from __doc__

        # Truncate the file, keeping only the data before the given size.

        """

        truncate([size]) -> None.  Truncate the file to at most size bytes.

        Size defaults to the current file position, as returned by tell().

        """

        pass

    def write(self, p_str): # real signature unknown; restored from __doc__

        # Write content.

        """

        write(str) -> None.  Write string str to file.

        Note that due to buffering, flush() or close() may be needed before

        the file on disk reflects the data written.

        """

        pass

    def writelines(self, sequence_of_strings): # real signature unknown; restored from __doc__

        # Write a list of strings to the file.

        """

        writelines(sequence_of_strings) -> None.  Write the strings to the file.

        Note that newlines are not added.  The sequence can be any iterable object

        producing strings. This is equivalent to calling write() for each string.

        """

        pass

    def xreadlines(self): # real signature unknown; restored from __doc__

        # Iterate over the file line by line instead of reading it all at once.

        """

        xreadlines() -> returns self.

        For backward compatibility. File objects now include the performance

        optimizations previously implemented in the xreadlines module.

        """

        pass

Python 3.x

class TextIOWrapper(_TextIOBase):

    """

    Character and line based layer over a BufferedIOBase object, buffer.

    encoding gives the name of the encoding that the stream will be

    decoded or encoded with. It defaults to locale.getpreferredencoding(False).

    errors determines the strictness of encoding and decoding (see

    help(codecs.Codec) or the documentation for codecs.register) and

    defaults to "strict".

    newline controls how line endings are handled. It can be None, '',

    '\n', '\r', and '\r\n'.  It works as follows:

    * On input, if newline is None, universal newlines mode is

      enabled. Lines in the input can end in '\n', '\r', or '\r\n', and

      these are translated into '\n' before being returned to the

      caller. If it is '', universal newline mode is enabled, but line

      endings are returned to the caller untranslated. If it has any of

      the other legal values, input lines are only terminated by the given

      string, and the line ending is returned to the caller untranslated.

    * On output, if newline is None, any '\n' characters written are

      translated to the system default line separator, os.linesep. If

      newline is '' or '\n', no translation takes place. If newline is any

      of the other legal values, any '\n' characters written are translated

      to the given string.

    If line_buffering is True, a call to flush is implied when a call to

    write contains a newline character.

    """

    def close(self, *args, **kwargs): # real signature unknown

        # Close the file.

        pass

    def fileno(self, *args, **kwargs): # real signature unknown

        # Return the file descriptor.

        pass

    def flush(self, *args, **kwargs): # real signature unknown

        # Flush the file's internal buffer.

        pass

    def isatty(self, *args, **kwargs): # real signature unknown

        # Check whether the file is connected to a tty device.

        pass

    def read(self, *args, **kwargs): # real signature unknown

        # Read up to the given number of characters.

        pass

    def readable(self, *args, **kwargs): # real signature unknown

        # Return whether the file is readable.

        pass

    def readline(self, *args, **kwargs): # real signature unknown

        # Read a single line.

        pass

    def seek(self, *args, **kwargs): # real signature unknown

        # Move the file pointer to the given position.

        pass

    def seekable(self, *args, **kwargs): # real signature unknown

        # Return whether the file pointer can be moved (the file is seekable).

        pass

    def tell(self, *args, **kwargs): # real signature unknown

        # Return the current position of the file pointer.

        pass

    def truncate(self, *args, **kwargs): # real signature unknown

        # Truncate the file, keeping only the data before the given size.

        pass

    def writable(self, *args, **kwargs): # real signature unknown

        # Return whether the file is writable.

        pass

    def write(self, *args, **kwargs): # real signature unknown

        # Write content.

        pass

    def __getstate__(self, *args, **kwargs): # real signature unknown

        pass

    def __init__(self, *args, **kwargs): # real signature unknown

        pass

    @staticmethod # known case of __new__

    def __new__(*args, **kwargs): # real signature unknown

        """ Create and return a new object.  See help(type) for accurate signature. """

        pass

    def __next__(self, *args, **kwargs): # real signature unknown

        """ Implement next(self). """

        pass

    def __repr__(self, *args, **kwargs): # real signature unknown

        """ Return repr(self). """

        pass

    buffer = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default

    closed = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default

    encoding = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default

    errors = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default

    line_buffering = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default

    name = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default

    newlines = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default

    _CHUNK_SIZE = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default

    _finalizing = property(lambda self: object(), lambda self, v: None, lambda self: None)  # default


Context Management

To avoid forgetting to close a file after opening it, use a context manager:

with open('log','r') as f:

    ...

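With this form, the file is closed and its resources are released automatically when the with block finishes.

Since Python 2.7, with also supports managing several files in one statement: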

with open('log1') as obj1, open('log2') as obj2:

    pass
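
As a rough sketch of the multi-file form, the snippet below copies one file into another while upper-casing each line. The file names and the temporary directory are placeholders chosen for illustration, not part of the original article.

```python
import os
import tempfile

# Work in a throwaway directory so the sketch does not touch real files.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "log1")
dst_path = os.path.join(workdir, "log2")

with open(src_path, "w", encoding="utf-8") as f:
    f.write("first line\nsecond line\n")

# Both files are closed when the block exits, even if an exception is raised.
with open(src_path, encoding="utf-8") as src, open(dst_path, "w", encoding="utf-8") as dst:
    for line in src:
        dst.write(line.upper())

with open(dst_path, encoding="utf-8") as f:
    copied = f.read()

print(copied)
print(src.closed, dst.closed)
```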

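File Handling Workflow

1. Open the file and assign the resulting file handle to a variable.
2. Operate on the file through the handle.
3. Close the file.

open() returns a file object. The basic syntax is: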

open(filename, mode)

>>> f = open('/Users/michael/test.txt', 'r')

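filename: a string containing the name of the file you want to access.
mode: determines how the file is opened: read-only, write, append, and so on. The full list of values is given below. This argument is optional; the default mode is read-only ('r').

The mode 'r' means read. With this call we have opened the file successfully.

If the file does not exist, open() raises FileNotFoundError in Python 3 (a subclass of OSError; Python 2 raises IOError), with an error code and a message saying that the file does not exist: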

>>> f=open('/Users/michael/notfound.txt', 'r')

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>

FileNotFoundError: [Errno 2] No such file or directory: '/Users/michael/notfound.txt'
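
In a program, the missing-file case is usually handled with try/except rather than left to crash. The following is a minimal sketch; the helper name read_or_default and the file path are made up for this illustration.

```python
import os
import tempfile

# A path that does not exist (hypothetical file name inside a fresh temp dir).
missing_path = os.path.join(tempfile.mkdtemp(), "notfound.txt")

def read_or_default(path, default=""):
    """Return the file's text, or `default` when the file does not exist."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError as exc:
        # exc.filename holds the path that could not be opened.
        print("cannot open:", exc.filename)
        return default

result = read_or_default(missing_path, default="<empty>")
print(result)
```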

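The full list of file-opening modes:

Mode	Description
r	Open for reading only. The file pointer is placed at the beginning of the file. This is the default mode.
rb	Open in binary format for reading only. The file pointer is placed at the beginning of the file.
r+	Open for reading and writing. The file pointer is placed at the beginning of the file.
rb+	Open in binary format for reading and writing. The file pointer is placed at the beginning of the file.
w	Open for writing only. If the file exists, its contents are discarded; otherwise a new file is created.
wb	Open in binary format for writing only. If the file exists, its contents are discarded; otherwise a new file is created.
w+	Open for reading and writing. If the file exists, its contents are discarded; otherwise a new file is created.
wb+	Open in binary format for reading and writing. If the file exists, its contents are discarded; otherwise a new file is created.
a	Open for appending. If the file exists, the file pointer is placed at the end, so new content is written after the existing content. If the file does not exist, a new file is created for writing.
ab	Open in binary format for appending. If the file exists, the file pointer is placed at the end, so new content is written after the existing content. If the file does not exist, a new file is created for writing.
a+	Open for reading and appending. If the file exists, the file pointer is placed at the end and writes are appended. If the file does not exist, a new file is created for reading and writing.
ab+	Open in binary format for reading and appending. If the file exists, the file pointer is placed at the end. If the file does not exist, a new file is created for reading and writing.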

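To open a file you specify its path and the mode in which to open it. Opening returns a file handle, and all later operations on the file go through that handle.

The modes are:

r: read-only [default; the file must exist, otherwise an exception is raised]
w: write-only [not readable; created if missing; emptied if it exists]
x: write-only [not readable; created if missing; raises an error if it exists]
a: append [not readable; created if missing; writes always go to the end]

"+" means the file can be both read and written:

r+: read and write [readable, writable; the file must exist]
w+: write and read [readable, writable; an existing file is emptied]
x+: write and read [readable, writable; fails if the file exists]
a+: append and read [readable, writable; writes go to the end]

"b" means the file is handled as bytes:

rb or r+b
wb or w+b
xb or x+b
ab or a+b

Note: in b mode, reads return bytes and writes require bytes; an encoding cannot be specified.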
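
The differences between the modes are easiest to see by running them against one small file. This is only a sketch; the file name modes.txt and the temporary directory are assumptions made for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")

with open(path, "x", encoding="utf-8") as f:   # create; fails if the file exists
    f.write("one\n")

try:
    open(path, "x")                             # second exclusive create
    exclusive_failed = False
except FileExistsError:
    exclusive_failed = True

with open(path, "a", encoding="utf-8") as f:   # append to the end
    f.write("two\n")

with open(path, "r+", encoding="utf-8") as f:  # read and write, nothing discarded
    before = f.read()

with open(path, "w+", encoding="utf-8") as f:  # empties the file first
    after_truncate = f.read()
    f.write("three\n")

with open(path, "rb") as f:                     # binary mode returns bytes
    raw = f.read()

print(exclusive_failed, repr(before), repr(after_truncate), raw)
```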

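The following example writes a string to the file foo.txt:

#!/usr/bin/python3

# Open a file for writing

f = open("/tmp/foo.txt", "w")

f.write( "Python 是一个非常好的语言。\n是的，的确非常好!!\n" )

# Close the file

f.close()

The first argument is the name of the file to open.
The second argument is a string describing how the file will be used. mode can be 'r' when the file will only be read, 'w' for writing only (an existing file with the same name is erased), and 'a' for appending, where any data written is added to the end automatically. 'r+' opens the file for both reading and writing. The mode argument is optional; 'r' is assumed if it is omitted.

The file foo.txt now contains: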

$ cat /tmp/foo.txt

Python 是一个非常好的语言。

是的，的确非常好!!

file-like Object

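An object with a read() method, like the one returned by open(), is called a file-like object in Python. Besides files, it can be an in-memory byte stream, a network stream, a custom stream, and so on. A file-like object does not have to inherit from a particular class; providing a read() method is enough.

StringIO is a file-like object created in memory, often used as a temporary buffer.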
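
A short sketch of the idea, using io.StringIO and io.BytesIO from the standard library. The helper count_lines is a hypothetical function written for this example; it works on any object that yields lines when iterated.

```python
import io

def count_lines(stream):
    """Count lines from any object that can be iterated line by line."""
    return sum(1 for _ in stream)

# io.StringIO behaves like a text file held in memory.
text_buffer = io.StringIO("alpha\nbeta\ngamma\n")
line_count = count_lines(text_buffer)

# Writing works the same way; getvalue() returns everything written so far.
out = io.StringIO()
out.write("hello ")
out.write("world")
written = out.getvalue()

# io.BytesIO is the binary counterpart.
byte_buffer = io.BytesIO(b"\x00\x01\x02")
first_two = byte_buffer.read(2)

print(line_count, written, first_two)
```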

Binary Files

The examples so far read text files, assumed to be UTF-8 encoded. To read a binary file such as an image or a video, open it in 'rb' mode:

>>> f = open('/Users/michael/test.jpg', 'rb')

>>> f.read()

b'\xff\xd8\xff\xe1\x00\x18Exif\x00\x00...' # bytes shown in hexadecimal

Character Encoding

To read a text file that is not UTF-8 encoded, pass the encoding argument to open(). For example, to read a GBK-encoded file:

>>> f = open('/Users/michael/gbk.txt', 'r', encoding='gbk')

>>> f.read()

'测试'

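With files whose encoding is not clean you may run into UnicodeDecodeError, because the text may contain some illegally encoded characters. For this case open() also accepts an errors argument that says what to do when an encoding error occurs. The simplest choice is to ignore the bad bytes: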

>>> f = open('/Users/michael/gbk.txt', 'r', encoding='gbk', errors='ignore')

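Notes:

1. In Python 2 the default source encoding is ASCII; in Python 3 it is UTF-8. When encoding is omitted, open() uses the locale's preferred encoding, which is not always UTF-8.

2. Unicode is a character set; UTF-32 (4 bytes per character), UTF-16 (2 or 4 bytes) and UTF-8 (1 to 4 bytes) are different encodings of it. UTF-8 is therefore one way of storing Unicode, not Unicode itself.

3. In Python 3, encode() converts a str to bytes, and decode() converts bytes back to str.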
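
The str/bytes round trip can be sketched in a few lines. The sample string is the one used in the GBK example above; everything else is ordinary standard-library behaviour.

```python
text = "测试"

gbk_bytes = text.encode("gbk")      # str -> bytes
utf8_bytes = text.encode("utf-8")

# Same two characters, different byte lengths depending on the encoding.
print(len(text), len(gbk_bytes), len(utf8_bytes))

# Decoding with the matching codec restores the original string.
round_trip = gbk_bytes.decode("gbk")

# Decoding with the wrong codec fails unless an error handler is given.
try:
    gbk_bytes.decode("utf-8")
    wrong_codec_ok = True
except UnicodeDecodeError:
    wrong_codec_ok = False

# errors="ignore" silently drops the bytes that cannot be decoded.
ignored = gbk_bytes.decode("utf-8", errors="ignore")
```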

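Methods of File Objects

The remaining examples in this section assume that a file object called f has already been created.

f.read()

To read a file's contents, call f.read(size), which reads some quantity of data and returns it as a string or a bytes object.

size is an optional numeric argument. When size is omitted or negative, the entire contents of the file are read and returned.

The following example assumes that the file foo.txt already exists (it was created in the example above):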

#!/usr/bin/python3

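# Open a file for reading

f = open("/tmp/foo.txt", "r")

content = f.read()   # named content so the built-in str is not shadowed

print(content)

# Close the file

f.close()

Running the program above prints: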

Python 是一个非常好的语言。

是的，的确非常好!!
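
For large files, reading everything at once may use too much memory. A common pattern, sketched here under the assumption of a small sample file created on the spot, is to call read(size) in a loop until it returns an empty string.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "chunks.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("0123456789" * 3)   # 30 characters

chunks = []
with open(path, "r", encoding="utf-8") as f:
    while True:
        chunk = f.read(8)       # at most 8 characters per call in text mode
        if not chunk:           # an empty string signals end of file
            break
        chunks.append(chunk)

print([len(c) for c in chunks])
```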

f.readline()

f.readline() reads a single line from the file; the line ends with the newline character '\n'. If f.readline() returns an empty string, the end of the file has been reached.

#!/usr/bin/python3

# Open a file for reading

f = open("/tmp/foo.txt", "r")

line = f.readline()   # named line so the built-in str is not shadowed

print(line)

# Close the file

f.close()

Running the program above prints:

Python 是一个非常好的语言。
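
The end-of-file rule can be turned into a loop. In this sketch io.StringIO stands in for an open text file so the example runs without touching the disk; the sample text is made up.

```python
import io

# io.StringIO stands in for an open text file here.
f = io.StringIO("first\nsecond\nlast line without newline")

lines = []
while True:
    line = f.readline()
    if line == "":          # empty string: end of file
        break
    lines.append(line)      # the trailing "\n" is kept when present

print(lines)
```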

f.readlines()

f.readlines() returns all the lines in the file.

If the optional argument sizehint is given, it reads roughly that many bytes and splits them into lines.

#!/usr/bin/python3

# Open a file for reading

f = open("/tmp/foo.txt", "r")

lines = f.readlines()   # named lines so the built-in str is not shadowed

print(lines)

# Close the file

f.close()

Running the program above prints:

['Python 是一个非常好的语言。\n', '是的，的确非常好!!\n']

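Another approach is to iterate over the file object and read it line by line:

#!/usr/bin/python3

# Open a file for reading

f = open("/tmp/foo.txt", "r")

for line in f:

    print(line, end='')

# Close the file

f.close()

Output:

Python 是一个非常好的语言。

是的，的确非常好!!

This approach is simple but gives you less control. Because iteration and the read methods handle buffering differently, it is best not to mix them.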

f.write()

f.write(string) writes string to the file and returns the number of characters written.

#!/usr/bin/python3

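# Open a file for writing

f = open("/tmp/foo.txt", "w")

num = f.write( "Python 是一个非常好的语言。\n是的，的确非常好!!\n" )

print(num)

# Close the file

f.close()

Running the program above prints the number of characters written:

29

To write something that is not a string, convert it first: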

#!/usr/bin/python3

# Open a file for writing

f = open("/tmp/foo1.txt", "w")

value = ('www.baidu.com', 14)

s = str(value)

f.write(s)

# Close the file

f.close()

After running the program above, look at the file foo1.txt:

$ cat /tmp/foo1.txt

('www.baidu.com', 14)

The with Statement

To avoid forgetting to close a file after opening it, use a context manager:

with open('log','r') as f:

    ...

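With this form, the file is closed and its resources are released automatically when the with block finishes.

Since Python 2.7, with also supports managing several files in one statement: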

with open('log1') as obj1, open('log2') as obj2:

    pass

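You can call write() repeatedly, but you must call f.close() when you are done. When we write to a file, the data is usually not written to disk immediately; it is held in memory buffers and written out later. Calling close() flushes whatever is still buffered. If you forget to call close(), part of the data may never reach the disk. Using the with statement is therefore the safer choice.

To write a text file in a specific encoding, pass the encoding argument to open(); strings are then converted to that encoding automatically.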
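
Putting the two points together, a write inside a with block with an explicit encoding might look like the sketch below. The file name gbk.txt and the temporary directory are assumptions for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "gbk.txt")

# The with block flushes and closes the file on exit.
with open(path, "w", encoding="gbk") as f:
    f.write("测试")

# Reading the raw bytes shows the GBK encoding on disk (2 bytes per character here).
with open(path, "rb") as f:
    raw = f.read()

# Reading with the same encoding restores the text.
with open(path, "r", encoding="gbk") as f:
    text = f.read()

print(len(raw), text)
```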

Summary

In Python, files are read and written through the file object returned by open(). Using the with statement for file I/O is a good habit.

f.tell()

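f.tell() returns the current position of the file object, counted in bytes from the beginning of the file when the file is opened in binary mode.

f.seek()

To change the current position, use f.seek(offset, from_what).

from_what selects the reference point: 0 is the beginning of the file, 1 the current position, and 2 the end of the file. For example:

seek(x, 0): move x bytes forward from the beginning of the file
seek(x, 1): move x bytes forward from the current position
seek(-x, 2): move x bytes back from the end of the file

from_what defaults to 0, the beginning of the file. A complete example: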

>>> f = open('/tmp/foo.txt', 'rb+')

>>> f.write(b'0123456789abcdef')

16

>>> f.seek(5)     # move to the 6th byte of the file

5

>>> f.read(1)

b'5'

>>> f.seek(-3, 2) # move to the 3rd byte from the end

13

>>> f.read(1)

b'd'
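
The same session can be written as a script that starts from a fresh file, so the positions are predictable. This is a sketch; the file name seek.bin is a placeholder, and os.SEEK_CUR and os.SEEK_END are the named constants for 1 and 2.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "seek.bin")

with open(path, "wb+") as f:
    f.write(b"0123456789abcdef")
    end_pos = f.tell()                      # position after the last byte written
    f.seek(5)                               # absolute offset from the start
    sixth = f.read(1)
    f.seek(2, os.SEEK_CUR)                  # skip 2 bytes forward from the current position
    ninth = f.read(1)
    pos_from_end = f.seek(-3, os.SEEK_END)  # 3 bytes before the end
    third_last = f.read(1)

print(end_pos, sixth, ninth, pos_from_end, third_last)
```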

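The flush Method

How flush works:

File operations go through software that reads the file from disk into memory.
Writes also go to an in-memory buffer first. Memory is much faster than disk, so if every write went straight to disk, the program would spend most of its time waiting for the disk. Instead, data is collected in a small region of memory (the buffer) and written to disk in one batch some time later.
flush() forces the buffered data to be written out immediately.

A simple progress bar: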

import sys,time

for i in  range(10):

    sys.stdout.write('#')

    sys.stdout.flush()

    time.sleep(0.2)
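
The effect of flush() on a regular file can be observed by reading the file through a second handle. The sketch below assumes default buffering and a made-up file name; os.fsync() is added to show the extra step that asks the operating system to commit its own cache.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "buffered.log")

writer = open(path, "w", encoding="utf-8")
writer.write("hello")

# Small writes usually sit in Python's buffer, so another reader sees nothing yet.
with open(path, encoding="utf-8") as reader:
    before_flush = reader.read()

writer.flush()                   # push Python's buffer to the operating system
os.fsync(writer.fileno())        # ask the OS to commit its cache to the disk

with open(path, encoding="utf-8") as reader:
    after_flush = reader.read()

writer.close()
print(repr(before_flush), repr(after_flush))
```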

f.close()

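In text files (those opened without b in the mode string), only seeks relative to the beginning of the file are allowed.

When you are done with a file, call f.close() to close it and release its system resources. Any further attempt to use the file raises an exception.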

>>> f.close()

>>> f.read()

Traceback (most recent call last):

  File "<stdin>", line 1, in ?

ValueError: I/O operation on closed file

When working with a file object, the with keyword is a very good approach: it closes the file properly when the block ends, and it is also shorter than the equivalent try-finally block:

>>> with open('/tmp/foo.txt', 'r') as f:

...     read_data = f.read()

>>> f.closed

True
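
For comparison, this is roughly what the with statement replaces. The sketch creates its own sample file in a temporary directory; the path is an assumption for the example.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "foo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("data\n")

# Long form: close() runs in finally, whether or not read() raises.
f = open(path, "r", encoding="utf-8")
try:
    read_data = f.read()
finally:
    f.close()

closed_after_finally = f.closed

# Reading from a closed file raises ValueError.
try:
    f.read()
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(closed_after_finally, error_message)
```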

File objects have other methods as well, such as isatty() and truncate(), but they are used less often.

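The pickle Module

In Python, the pickle module can convert an object into a form that is saved to a file on disk, then read back and restored when needed. Usage is as follows:

pickle.dump(obj, file[, protocol])

This is the function that persists an object. Its arguments are:
obj: the object to persist;
file: an object with a write() method that accepts a single string argument (a bytes argument in Python 3). It can be a file opened for writing, a StringIO object (io.BytesIO in Python 3), or any custom object that meets this requirement.
protocol: optional. In Python 2 it defaults to 0, the ASCII format; 1 selects the older binary format and 2 a more efficient binary format. A negative value or pickle.HIGHEST_PROTOCOL selects the highest protocol available. Python 3 defaults to a binary protocol.

How is a persisted object restored? The pickle module provides a matching function:

pickle.load(file)

It takes one argument, file, which corresponds to the file argument of dump. The object must have a read() method that accepts an integer argument and a readline() method that takes no arguments, and both must return strings (bytes in Python 3). It can be a file opened for reading, a StringIO object, or any other object that meets this requirement.

pickle is the standard serialization tool in the Python library. It can export an in-memory object as a string in text or binary format, or write it to a file, and later restore the object from that string or file. In Python 2, a faster C implementation is available as cPickle; Python 3 has no cPickle module, and import pickle uses the C accelerator automatically. The code below targets Python 2 (print statements, cPickle, StringIO) and demonstrates the common pickle interfaces: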

import cPickle as pickle

# dumps and loads

# Dump an in-memory object to a string, or load a string back into an object

def test_dumps_and_loads():

    t = {'name': ['v1', 'v2']}

    print t

    o = pickle.dumps(t)

    print o

    print 'len o: ', len(o)

    p = pickle.loads(o)

    print p

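# About HIGHEST_PROTOCOL: in Python 2, pickle supports three protocols, 0, 1 and 2:

# http://*.com/questions/23582489/python-pickle-protocol-choice

# 0: ASCII protocol, compatible with older versions of Python

# 1: binary format, compatible with older versions of Python

# 2: binary format, added in Python 2.3, with better support for new-style classes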

def test_dumps_and_loads_HIGHEST_PROTOCOL():

    print 'HIGHEST_PROTOCOL: ', pickle.HIGHEST_PROTOCOL

    t = {'name': ['v1', 'v2']}

    print t

    o = pickle.dumps(t, pickle.HIGHEST_PROTOCOL)

    print 'len o: ', len(o)

    p = pickle.loads(o)

    print p

# new-style class

def test_new_sytle_class():

    class TT(object):

        def __init__(self, arg, **kwargs):

            super(TT, self).__init__()

            self.arg = arg

            self.kwargs = kwargs

        def test(self):

            print self.arg

            print self.kwargs

    # ASCII protocol

    t = TT('test', a=1, b=2)

    o1 = pickle.dumps(t)

    print o1

    print 'o1 len: ', len(o1)

    p = pickle.loads(o1)

    p.test()

    # HIGHEST_PROTOCOL supports new-style classes better and is faster

    o2 = pickle.dumps(t, pickle.HIGHEST_PROTOCOL)

    print 'o2 len: ', len(o2)

    p = pickle.loads(o2)

    p.test()

# dump and load

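# Serialize an in-memory object directly to a file or to an object with a file interface

# For dump, the object needs a write method that takes one string argument, e.g. StringIO

# For load, the object needs a read method that takes an int and a readline method with no arguments, e.g. StringIO

# Using a file, ASCII format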

def test_dump_and_load_with_file():

    t = {'name': ['v1', 'v2']}

    # ASCII format

    with open('test.txt', 'w') as fp:

        pickle.dump(t, fp)

    with open('test.txt', 'r') as fp:

        p = pickle.load(fp)

        print p

# Using a file, binary format

def test_dump_and_load_with_file_HIGHEST_PROTOCOL():

    t = {'name': ['v1', 'v2']}

    with open('test.bin', 'wb') as fp:

        pickle.dump(t, fp, pickle.HIGHEST_PROTOCOL)

    with open('test.bin', 'rb') as fp:

        p = pickle.load(fp)

        print p

# Using StringIO, binary format

def test_dump_and_load_with_StringIO():

    import StringIO

    t = {'name': ['v1', 'v2']}

    fp = StringIO.StringIO()

    pickle.dump(t, fp, pickle.HIGHEST_PROTOCOL)

    fp.seek(0)

    p = pickle.load(fp)

    print p

    fp.close()

# Using a custom class

# A user-defined class can serve as the file argument of dump and load

# as long as it implements write, read and readline

def test_dump_and_load_with_user_def_class():

    import StringIO

    class FF(object):

        def __init__(self):

            self.buf = StringIO.StringIO()

        def write(self, s):

            self.buf.write(s)

            print 'len: ', len(s)

        def read(self, n):

            return self.buf.read(n)

        def readline(self):

            return self.buf.readline()

        def seek(self, pos, mod=0):

            return self.buf.seek(pos, mod)

        def close(self):

            self.buf.close()

    fp = FF()

    t = {'name': ['v1', 'v2']}

    pickle.dump(t, fp, pickle.HIGHEST_PROTOCOL)

    fp.seek(0)

    p = pickle.load(fp)

    print p

    fp.close()

# Pickler/Unpickler

# Pickler(file, protocol).dump(obj) is equivalent to pickle.dump(obj, file[, protocol])

# Unpickler(file).load() is equivalent to pickle.load(file)

# Pickler/Unpickler encapsulate the stream, which makes it easy to swap the file

def test_pickler_unpickler():

    t = {'name': ['v1', 'v2']}

    f = file('test.bin', 'wb')

    pick = pickle.Pickler(f, pickle.HIGHEST_PROTOCOL)

    pick.dump(t)

    f.close()

    f = file('test.bin', 'rb')

    unpick = pickle.Unpickler(f)

    p = unpick.load()

    print p

    f.close()
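
The listing above is written for Python 2. A rough Python 3 counterpart of its main ideas is sketched below: pickle replaces cPickle, io.BytesIO replaces StringIO, and pickled data is always bytes. The sample dictionary is the one used above; the variable names are chosen for this illustration.

```python
import io
import pickle

data = {"name": ["v1", "v2"]}

# dumps/loads: object <-> bytes
blob = pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL)
restored = pickle.loads(blob)

# dump/load: object <-> binary file-like object
buffer = io.BytesIO()
pickle.dump(data, buffer)
buffer.seek(0)                       # rewind before reading back
from_buffer = pickle.load(buffer)

# Protocol 0 is still available and produces ASCII-only bytes for this data.
ascii_blob = pickle.dumps(data, protocol=0)

print(restored, from_buffer, len(blob), len(ascii_blob))
```

Unpickling can execute arbitrary code, so pickle.load() and pickle.loads() should only be used on data from a trusted source.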

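Python's pickle module implements basic serialization and deserialization of data.

Serializing with pickle lets a program save the state of its objects to a file for permanent storage.

Deserializing with pickle recreates, from the file, the objects a previous run saved.

Basic interface:

pickle.dump(obj, file[, protocol])

An object saved this way can be restored from a file opened for reading:

x = pickle.load(file)

Note: this reads a pickled representation from file and reconstructs the original Python object.

file: a file-like object with read() and readline() methods.

Example 1: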

#!/usr/bin/python3

import pickle

# Use the pickle module to save data objects to a file

data1 = {'a': [1, 2.0, 3, 4+6j],

         'b': ('string', u'Unicode string'),

         'c': None}

selfref_list = [1, 2, 3]

selfref_list.append(selfref_list)

output = open('data.pkl', 'wb')

# Pickle the dictionary using the default protocol (protocol 0 only in Python 2).

pickle.dump(data1, output)

# Pickle the list using the highest protocol available.

pickle.dump(selfref_list, output, -1)

output.close()

Example 2:

#!/usr/bin/python3

import pprint, pickle

# Use the pickle module to rebuild Python objects from a file

pkl_file = open('data.pkl', 'rb')

data1 = pickle.load(pkl_file)

pprint.pprint(data1)

data2 = pickle.load(pkl_file)

pprint.pprint(data2)

pkl_file.close()

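Python3 File Methods

A file object is created with the open function. The table below lists the commonly used methods of file objects:

No.	Method and description
1	file.close() Close the file. After closing, the file can no longer be read or written.
2	file.flush() Flush the file's internal buffer, writing its data to the file immediately instead of waiting for the output buffer to be written.
3	file.fileno() Return the integer file descriptor (FD), which can be used in low-level operations such as os.read().
4	file.isatty() Return True if the file is connected to a terminal device, otherwise False.
5	next(file) Return the next line of the file (Python 2 spelled this file.next()).
6	file.read([size]) Read the given number of bytes from the file; read everything if size is omitted or negative.
7	file.readline([size]) Read a whole line, including the "\n" character.
8	file.readlines([sizehint]) Read all lines and return them as a list. If sizehint > 0 is given, return lines totaling approximately sizehint bytes; the amount actually read may be larger because the buffer needs to be filled.
9	file.seek(offset[, whence]) Set the current position in the file.
10	file.tell() Return the current position in the file.
11	file.truncate([size]) Truncate the file to size bytes; size defaults to the current position.
12	file.write(str) Write a string to the file. Returns the number of characters written (Python 2 returned None).
13	file.writelines(sequence) Write a sequence of strings to the file. Newlines are not added; include them in each string if needed.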
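
Several of the less common methods from the table can be tried together on one scratch file. The sketch below uses a made-up file name in a temporary directory and assumes a platform where "\n" is written as a single byte.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "methods.txt")

with open(path, "w+", encoding="utf-8") as f:
    # writelines() does not add newlines, so each item carries its own.
    f.writelines(["one\n", "two\n", "three\n"])
    f.seek(0)
    first = next(f)              # Python 3 spelling of "give me the next line"
    f.seek(0)
    all_lines = f.readlines()
    f.truncate(4)                # keep only the first 4 bytes
    f.seek(0)
    remaining = f.read()
    is_terminal = f.isatty()     # False for a regular file

print(repr(first), all_lines, repr(remaining), is_terminal)
```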

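Python3 OS File/Directory Methods

To work with files and directories, we can type the commands the operating system provides, such as dir or cp, at the command line.

What if we want to perform these operations from a Python program? Those commands simply call interface functions provided by the operating system, and Python's built-in os module can call the same functions directly.

Open the Python interactive prompt and let's look at the basic features of the os module: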

>>> import os

>>> os.name # operating system type

'posix'

If it is posix, the system is Linux, Unix or Mac OS X; if it is nt, the system is Windows.

For detailed system information, call uname():

>>> os.uname()

posix.uname_result(sysname='Darwin', nodename='MichaelMacPro.local', release='14.3.0', version='Darwin Kernel Version 14.3.0: Mon Mar 23 11:59:05 PDT 2015; root:xnu-2782.20.48~5/RELEASE_X86_64', machine='x86_64')

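Note that uname() is not available on Windows; some functions of the os module depend on the operating system.

Environment Variables

All environment variables defined in the operating system are stored in os.environ, which can be inspected directly: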

>>> os.environ

environ({'VERSIONER_PYTHON_PREFER_32_BIT': 'no', 'TERM_PROGRAM_VERSION': '326', 'LOGNAME': 'michael', 'USER': 'michael', 'PATH': '/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/opt/X11/bin:/usr/local/mysql/bin', ...})

To get the value of a single environment variable, call os.environ.get('key'):

>>> os.environ.get('PATH')

'/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/opt/X11/bin:/usr/local/mysql/bin'

>>> os.environ.get('x', 'default')

'default'

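Working with Files and Directories

Note that the functions for working with files and directories are split between the os module and the os.path module. Inspecting, creating and removing directories looks like this:

# Show the absolute path of the current directory:

>>> os.path.abspath('.')

'/Users/michael'

# To create a new directory inside some directory, first build the full path of the new directory:

>>> os.path.join('/Users/michael', 'testdir')

'/Users/michael/testdir'

# Then create the directory:

>>> os.mkdir('/Users/michael/testdir')

# Remove a directory:

>>> os.rmdir('/Users/michael/testdir')

When combining two paths, do not concatenate the strings yourself; use os.path.join(), which handles the path separator of each operating system correctly. On Linux/Unix/Mac, os.path.join() returns a string like this:

part-1/part-2

On Windows it returns a string like this: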

part-1\part-2
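
The interactive session above can also be written as a script. This sketch works inside a temporary directory instead of /Users/michael; the directory name testdir is taken from the session, the rest is illustrative.

```python
import os
import tempfile

base = tempfile.mkdtemp()                     # throwaway parent directory
new_dir = os.path.join(base, "testdir")

os.mkdir(new_dir)
exists_after_mkdir = os.path.isdir(new_dir)

# split() and splitext() take a path apart again.
parent, name = os.path.split(new_dir)
stem, ext = os.path.splitext("/path/to/file.txt")

os.rmdir(new_dir)                             # only works on an empty directory
exists_after_rmdir = os.path.exists(new_dir)
os.rmdir(base)

print(exists_after_mkdir, name, stem, ext, exists_after_rmdir)
```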

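The os module provides a rich set of functions for working with files and directories. The commonly used ones are listed in the table below:

No.	Method and description
1	os.access(path, mode) Test the access permissions for path.
2	os.chdir(path) Change the current working directory.
3	os.chflags(path, flags) Set the flags of path to the numeric flags.
4	os.chmod(path, mode) Change the permissions.
5	os.chown(path, uid, gid) Change the owner of a file.
6	os.chroot(path) Change the root directory of the current process.
7	os.close(fd) Close file descriptor fd.
8	os.closerange(fd_low, fd_high) Close all file descriptors from fd_low (inclusive) to fd_high (exclusive), ignoring errors.
9	os.dup(fd) Duplicate file descriptor fd.
10	os.dup2(fd, fd2) Duplicate file descriptor fd to fd2.
11	os.fchdir(fd) Change the current working directory through a file descriptor.
12	os.fchmod(fd, mode) Change the permissions of the file given by fd; mode is a Unix permission mode.
13	os.fchown(fd, uid, gid) Change the ownership (user ID and group ID) of the file given by file descriptor fd.
14	os.fdatasync(fd) Force the file given by fd to be written to disk, without forcing an update of its status information.
15	os.fdopen(fd[, mode[, bufsize]]) Create a file object from file descriptor fd and return it.
16	os.fpathconf(fd, name) Return system configuration information for an open file. name is the configuration value to retrieve; it may be a string naming a defined system value, as specified by various standards (POSIX.1, Unix 95, Unix 98, and others).
17	os.fstat(fd) Return the status of file descriptor fd, like stat().
18	os.fstatvfs(fd) Return information about the file system containing the file of fd, like statvfs().
19	os.fsync(fd) Force the file of file descriptor fd to be written to disk.
20	os.ftruncate(fd, length) Truncate the file of file descriptor fd so that it is at most length bytes in size.
21	os.getcwd() Return the current working directory.
22	os.getcwdu() Return the current working directory as a Unicode object (Python 2 only; in Python 3, os.getcwd() already returns str).
23	os.isatty(fd) Return True if file descriptor fd is open and connected to a tty(-like) device, otherwise False.
24	os.lchflags(path, flags) Set the flags of path to the numeric flags, like chflags(), but do not follow symbolic links.
25	os.lchmod(path, mode) Change the permissions of a symbolic link itself.
26	os.lchown(path, uid, gid) Change the file owner, like chown, but do not follow symbolic links.
27	os.link(src, dst) Create a hard link named dst pointing to src.
28	os.listdir(path) Return a list of the names of the files and directories in the directory path.
29	os.lseek(fd, pos, how) Set the current position of file descriptor fd to pos, modified by how: SEEK_SET or 0 counts pos from the beginning of the file; SEEK_CUR or 1 counts from the current position; os.SEEK_END or 2 counts from the end of the file. Available on Unix and Windows.
30	os.lstat(path) Like stat(), but do not follow symbolic links.
31	os.major(device) Extract the device major number from a raw device number (usually the st_dev or st_rdev field from stat).
32	os.makedev(major, minor) Compose a raw device number from the major and minor device numbers.
33	os.makedirs(path[, mode]) Recursive directory creation. Like mkdir(), but also creates all intermediate-level directories needed to contain the leaf directory.
34	os.minor(device) Extract the device minor number from a raw device number (usually the st_dev or st_rdev field from stat).
35	os.mkdir(path[, mode]) Create a directory named path with numeric mode mode. The default mode is 0777 (octal; written 0o777 in Python 3).
36	os.mkfifo(path[, mode]) Create a named pipe; mode is numeric and defaults to 0666 (octal).
37	os.mknod(filename[, mode=0600, device]) Create a file system node (file, device special file or named pipe) named filename.
38	os.open(file, flags[, mode]) Open a file with the given open flags and return a file descriptor; the mode argument is optional.
39	os.openpty() Open a new pseudo-terminal pair. Return the file descriptors of the pty and the tty.
40	os.pathconf(path, name) Return system configuration information for the given file.
41	os.pipe() Create a pipe. Return a pair of file descriptors (r, w) for reading and writing.
42	os.popen(command[, mode[, bufsize]]) Open a pipe to or from command.
43	os.read(fd, n) Read at most n bytes from file descriptor fd and return them (a bytes object in Python 3). If the end of the file has been reached, an empty result is returned.
44	os.readlink(path) Return the path that the symbolic link points to.
45	os.remove(path) Remove the file path. If path is a directory, OSError is raised; use rmdir() to remove a directory.
46	os.removedirs(path) Remove directories recursively.
47	os.rename(src, dst) Rename a file or directory from src to dst.
48	os.renames(old, new) Rename directories recursively; can also rename files.
49	os.rmdir(path) Remove the empty directory path; if the directory is not empty, OSError is raised.
50	os.stat(path) Get information about path; equivalent to the stat() system call in the C API.
51	os.stat_float_times([newvalue]) Determine whether stat_result reports timestamps as float objects (removed in Python 3.7).
52	os.statvfs(path) Get file system statistics for the given path.
53	os.symlink(src, dst) Create a symbolic link.
54	os.tcgetpgrp(fd) Return the process group associated with the terminal fd (an open file descriptor returned by os.open()).
55	os.tcsetpgrp(fd, pg) Set the process group associated with the terminal fd (an open file descriptor returned by os.open()) to pg.
56	os.tempnam([dir[, prefix]]) Return a unique path name for creating a temporary file (Python 2 only; use the tempfile module in Python 3).
57	os.tmpfile() Return a new file object opened in mode w+b. The file has no directory entry and is deleted automatically (Python 2 only).
58	os.tmpnam() Return a unique path for creating a temporary file (Python 2 only).
59	os.ttyname(fd) Return a string naming the terminal device associated with file descriptor fd. If fd is not associated with a terminal device, an exception is raised.
60	os.unlink(path) Remove the file path (same as remove()).
61	os.utime(path, times) Set the access and modification times of the file path.
62	os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]]) Generate the file names in a directory tree by walking the tree top-down or bottom-up.
63	os.write(fd, str) Write the string (bytes in Python 3) to file descriptor fd. Return the number of bytes actually written.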
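
A few of the functions from the table, used together on a small directory tree. This is only a sketch: the tree layout and file names are invented for the example, and everything happens inside a temporary directory.

```python
import os
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b"))          # creates intermediate directories

for rel in ("top.txt", os.path.join("a", "mid.txt"), os.path.join("a", "b", "leaf.txt")):
    with open(os.path.join(root, rel), "w") as f:
        f.write("x")

found = []
for dirpath, dirnames, filenames in os.walk(root):
    dirnames.sort()                                 # make the traversal order predictable
    for filename in sorted(filenames):
        found.append(os.path.relpath(os.path.join(dirpath, filename), root))

os.rename(os.path.join(root, "top.txt"), os.path.join(root, "renamed.txt"))
size = os.stat(os.path.join(root, "renamed.txt")).st_size
os.remove(os.path.join(root, "renamed.txt"))
top_level = sorted(os.listdir(root))

print(found, size, top_level)
```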
