Processing files with Python

2021-07-25 03:53:18

I wanted to do some simple file operations. Java is too heavyweight for that, so Python is a good choice.

One task was to extract the contents of every file in a folder and put each file's contents into its own cell in an Excel sheet.

The os module is enough to walk the files in a folder and read them:
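For comparison, the standard library's os.walk already performs this kind of recursive traversal. The sketch below is an alternative to hand-written recursion, not the author's code; list_files is a hypothetical helper name, and it returns only regular files (no directory entries), sorted for a stable order.

```python
import os


def list_files(root):
    """Return the paths of all regular files under root, recursively (hypothetical helper)."""
    paths = []
    # os.walk yields (directory path, subdirectory names, file names) for root and every subdirectory.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            paths.append(os.path.join(dirpath, name))
    return sorted(paths)
```

Because directories never appear in the result, every returned path can be opened directly.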

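import os

# Get all files and directories under the given directory (path), recursively.
def get_recursive_file_list(path):
    current_files = os.listdir(path)
    all_files = []
    for file_name in current_files:
        full_file_name = os.path.join(path, file_name)
        all_files.append(full_file_name)

        # Descend into subdirectories.
        if os.path.isdir(full_file_name):
            next_level_files = get_recursive_file_list(full_file_name)
            all_files.extend(next_level_files)

    return all_files


# Raw string, so the backslashes in the Windows path are not treated as escapes.
all_files = get_recursive_file_list(r'C:\Users\green_pasture\Desktop\korea\key_words')

for filename in all_files:
    # The list also contains directories, which cannot be opened; skip them.
    if not os.path.isfile(filename):
        continue
    print(filename)
    # "with" closes the file automatically (the original "f1.close" lacked parentheses and never ran).
    with open(filename, 'r', encoding='utf-8') as f1:
        for line1 in f1:
            print("\n")
            print(line1, end="")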

But I ran into a problem: IndentationError: unindent does not match any outer indentation level. So I looked up how indentation works in Python:

http://www.secnetix.de/olli/Python/block_indentation.hawk

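The article "Myths about Indentation" says that only the whitespace at the indentation level (that is, at the very left of a statement) is significant. The exact amount of indentation does not matter; only the indentation of a block relative to its enclosing block does.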
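A small illustration of that point (example code written for this post, not taken from the article): the two functions below use different indentation widths, and both are valid because each block is indented consistently relative to the block that contains it.

```python
def sign_two_spaces(x):
  # Function body indented by 2 spaces.
  if x > 0:
      return "positive"  # nested block indented 4 more spaces: still valid
  return "non-positive"


def sign_four_spaces(x):
    # Function body indented by 4 spaces.
    if x > 0:
        return "positive"
    return "non-positive"
```

Both functions behave identically; the interpreter only checks that lines in the same block line up.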

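Indentation is also ignored on continuation lines, whether the continuation is explicit (a trailing backslash) or implicit (inside parentheses, brackets, or braces).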
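A short example of both kinds of continuation (illustrative code, not from the article); the odd indentation of the continued lines is deliberate and has no effect.

```python
# Implicit continuation: inside parentheses, continued lines may be indented arbitrarily.
total = (1 +
             2 +
   3)

# Explicit continuation: the trailing backslash joins the next line, whose indentation is ignored.
product = 2 * \
                  5

# A call split across lines inside brackets works the same way.
words = ", ".join(["tabs",
    "spaces"])
```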

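You can write the statements of an inner block on a single line, separated by semicolons. If you put them on separate lines, Python enforces its indentation rules. In Python, the indentation levels match the logical structure of the code.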
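Applied to a loop, a sketch with hypothetical function names: the body can sit on the same line as the for header, with several simple statements separated by semicolons, and all of them belong to the loop body. This only works for simple statements; a nested if or for still needs its own indented line.

```python
def count_and_sum(values):
    count = 0
    total = 0
    # Both statements after the colon form the loop body.
    for v in values: count += 1; total += v
    return count, total


def count_and_sum_indented(values):
    count = 0
    total = 0
    # The same loop on separate lines: now indentation defines the body.
    for v in values:
        count += 1
        total += v
    return count, total
```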

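Don't mix tabs and spaces. Python 2 expands each tab to the next multiple of 8 columns, while Python 3 rejects indentation that mixes tabs and spaces inconsistently and raises TabError.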
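One way to see these errors without running a whole script is to pass the source to the built-in compile(). This is a diagnostic sketch for Python 3 (indentation_problem is a hypothetical helper); it catches IndentationError, of which TabError is a subclass.

```python
def indentation_problem(source):
    """Return the indentation error Python reports for source, or None if it compiles."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError as exc:  # TabError is a subclass of IndentationError
        return f"{type(exc).__name__}: {exc.msg}"
    return None


# Dedenting to a column that matches no enclosing block.
bad_dedent = "if True:\n    x = 1\n  y = 2\n"
# One line of the block indented with a tab, the next with 8 spaces.
mixed = "if True:\n\tx = 1\n        y = 2\n"
# Consistent indentation.
ok = "if True:\n    x = 1\n    y = 2\n"

print(indentation_problem(bad_dedent))
# IndentationError: unindent does not match any outer indentation level
print(indentation_problem(mixed))
print(indentation_problem(ok))
# None
```

The first case reproduces the error message quoted earlier; the second shows Python 3 refusing a tab/space mix that only lines up when a tab counts as 8 columns.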

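import os

# Get all files and directories under the given directory (path), recursively.
def get_recursive_file_list(path):
    current_files = os.listdir(path)
    all_files = []
    for file_name in current_files:
        full_file_name = os.path.join(path, file_name)
        all_files.append(full_file_name)

        # Descend into subdirectories.
        if os.path.isdir(full_file_name):
            next_level_files = get_recursive_file_list(full_file_name)
            all_files.extend(next_level_files)

    return all_files


# Raw string, so the backslashes in the Windows path are not treated as escapes.
all_files = get_recursive_file_list(r'C:\Users\green_pasture\Desktop\korea\key_words')

for filename in all_files:
    # Skip directories; only regular files can be opened.
    if not os.path.isfile(filename):
        continue
    print(filename)
    with open(filename, 'r', encoding='utf-8') as f1:
        # The loop body is written on the same line as the for header.
        for line1 in f1: print(line1, end="")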

I moved the statement inside the for loop onto the same line as the for header, and the error went away.

Next, I wanted to write the printed file contents into Excel cells.

Libraries that can do this include XlsxWriter https://xlsxwriter.readthedocs.org/

xlwt http://www.python-excel.org/

openpyxl http://pythonhosted.org/openpyxl/

XlsxWriter's documentation is quite thorough, so I decided to use it.

The sample code is:

##############################################################################
#
# A simple example of some of the features of the XlsxWriter Python module.
#
# Copyright 2013-2014, John McNamara, jmcnamara@cpan.org
#

import xlsxwriter

# Create an new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('demo.xlsx')
worksheet = workbook.add_worksheet()

# Widen the first column to make the text clearer.
worksheet.set_column('A:A', 20)

# Add a bold format to use to highlight cells.
bold = workbook.add_format({'bold': True})

# Write some simple text.
worksheet.write('A1', 'Hello')

# Text with formatting.
worksheet.write('A2', 'World', bold)

# Write some numbers, with row/column notation.
worksheet.write(2, 0, 123)
worksheet.write(3, 0, 123.456)

# Insert an image.
worksheet.insert_image('B5', 'logo.png')

workbook.close()
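The sample addresses cells in two ways: A1 notation ('A1', 'A2') and zero-based (row, col) numbers, so write(2, 0, 123) lands in A3 and write(3, 0, 123.456) in A4. The function below is a hypothetical, standard-library-only sketch of that mapping (XlsxWriter ships its own helper, xl_rowcol_to_cell in xlsxwriter.utility); columns are lettered A to Z, then AA, AB, and so on.

```python
def rowcol_to_a1(row, col):
    """Convert a zero-based (row, col) pair to an A1-style reference (hypothetical helper)."""
    letters = ""
    col += 1  # switch to a 1-based column number: A=1, ..., Z=26, AA=27
    while col > 0:
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return f"{letters}{row + 1}"


print(rowcol_to_a1(2, 0))   # A3
print(rowcol_to_a1(3, 0))   # A4
print(rowcol_to_a1(0, 27))  # AB1
```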

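How do you install it? You can use the pip installer. My Python directory C:\Python33\Scripts already contained pip.exe, so I added that directory to the PATH environment variable and ran pip install XlsxWriter from the command line.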

It failed with this error:

Fatal error in launcher: Unable to create process using C:\Python33\Scripts\pip.exe install XlsxWriter。

So I installed it from GitHub instead:

$ git clone https://github.com/jmcnamara/XlsxWriter.git
 
$ cd XlsxWriter
$ sudo python setup.py install
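Another route that usually sidesteps the launcher error is to run pip as a module of the interpreter, python -m pip install XlsxWriter, which starts pip from the running Python instead of through pip.exe. The helper below is a hypothetical sketch of the same idea from inside Python; it only installs when the module cannot be found. Note that the import name is lowercase xlsxwriter while the package name is XlsxWriter.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(module_name, package_name):
    """Install package_name with pip if module_name is not importable (hypothetical helper).

    Runs "<this interpreter> -m pip install <package>", so the pip.exe launcher is not used.
    Returns True if the module was already available, False if it had to be installed.
    """
    if importlib.util.find_spec(module_name) is not None:
        return True
    subprocess.check_call([sys.executable, "-m", "pip", "install", package_name])
    return False


# Usage: ensure_installed("xlsxwriter", "XlsxWriter")
```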

Then I wrote a test program:

import xlsxwriter

workbook = xlsxwriter.Workbook('hello.xlsx')
worksheet = workbook.add_worksheet()

worksheet.write('A1', 'Hello world')

workbook.close()

The test worked. Here is the final program:

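import os

import xlsxwriter

# Get all files and directories under the given directory (path), recursively.
def get_recursive_file_list(path):
    current_files = os.listdir(path)
    all_files = []
    for file_name in current_files:
        full_file_name = os.path.join(path, file_name)
        all_files.append(full_file_name)

        # Descend into subdirectories.
        if os.path.isdir(full_file_name):
            next_level_files = get_recursive_file_list(full_file_name)
            all_files.extend(next_level_files)

    return all_files


workbook = xlsxwriter.Workbook('keywords.xlsx')
worksheet = workbook.add_worksheet()

# Raw string, so the backslashes in the Windows path are not treated as escapes.
all_files = get_recursive_file_list(r'C:\Users\green_pasture\Desktop\korea\key_words')

row = 0
for filename in all_files:
    # The list also contains directories, which cannot be opened; skip them.
    if not os.path.isfile(filename):
        continue
    print(filename)

    # Decode the UTF-8 files while reading, so no .decode() call is needed later.
    with open(filename, 'r', encoding='utf-8') as f1:
        # Strip each line's own newline so joining with "\n" does not double them.
        lines = [line.rstrip("\n") for line in f1]
    keywords = "\n".join(lines)
    print(keywords)

    worksheet.write(row, 0, filename)  # column A: file path
    worksheet.write(row, 1, keywords)  # column B: file contents
    row = row + 1

workbook.close()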
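A possible restructuring of the final script, offered as a sketch rather than the author's version: collect_rows and write_rows are hypothetical names. collect_rows walks the folder with os.walk and returns (path, text) pairs, and write_rows writes them to columns A and B, importing XlsxWriter only when called so the reading part can be tried without the library. Like the original, it assumes the files are UTF-8 text.

```python
import os


def collect_rows(root):
    """Return sorted (path, text) pairs for every file under root (hypothetical helper)."""
    rows = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Assumes UTF-8 text files, as the original script does.
            with open(path, "r", encoding="utf-8") as f:
                rows.append((path, f.read()))
    return sorted(rows)


def write_rows(rows, xlsx_path):
    """Write each (path, text) pair to columns A and B of a new workbook (hypothetical helper)."""
    import xlsxwriter  # imported here so collect_rows works without the library installed

    workbook = xlsxwriter.Workbook(xlsx_path)
    worksheet = workbook.add_worksheet()
    for row, (path, text) in enumerate(rows):
        # write_string stores the value as text even if it starts with "=".
        worksheet.write_string(row, 0, path)
        worksheet.write_string(row, 1, text)
    workbook.close()


# Usage (folder from the original post):
# write_rows(collect_rows(r'C:\Users\green_pasture\Desktop\korea\key_words'), 'keywords.xlsx')
```

write_string is used instead of write so that file contents beginning with "=" are not interpreted as formulas. Keep in mind that an Excel cell holds at most 32,767 characters, so very large files will not fit in a single cell.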
