I wanted to do some simple file operations. Java felt too heavyweight for that, and Python is a good fit.
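One task was to extract the contents of every file in a folder and put each file's contents into its own cell in an Excel sheet.
The os module is enough to walk through the files and read them: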
```python
import os

# Get all files & directories in the specified directory (path), recursively.
def get_recursive_file_list(path):
    current_files = os.listdir(path)
    all_files = []
    for file_name in current_files:
        full_file_name = os.path.join(path, file_name)
        all_files.append(full_file_name)
        if os.path.isdir(full_file_name):
            next_level_files = get_recursive_file_list(full_file_name)
            all_files.extend(next_level_files)
    return all_files

# Raw string, so that "\U" in "C:\Users" is not read as an escape sequence.
all_files = get_recursive_file_list(r'C:\Users\green_pasture\Desktop\korea\key_words')
for filename in all_files:
    if os.path.isdir(filename):
        continue  # the list also contains directories, which cannot be opened
    print(filename)
    f1 = open(filename, 'r', encoding='utf-8')
    for line1 in f1:
        print("\n")
        print(line1, end="")
    f1.close()  # the parentheses are needed to actually close the file
```
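The standard library also offers os.walk, which does the recursion itself and reports files and subdirectories separately. A minimal alternative sketch, where list_files is a hypothetical name, that returns only regular files so every path can be opened directly:

```python
import os


def list_files(root):
    """Return the paths of all regular files under root, recursively.

    Hypothetical helper: unlike get_recursive_file_list, directories are
    not included, so every returned path can be passed to open().
    """
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable, sorted order
        for name in sorted(filenames):
            paths.append(os.path.join(dirpath, name))
    return paths
```

For example, list_files(r'C:\Users\green_pasture\Desktop\korea\key_words') returns the same files as get_recursive_file_list, without the directory entries and sorted within each folder.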
But my script failed with IndentationError: unindent does not match any outer indentation level, so I read up on Python's indentation rules:
http://www.secnetix.de/olli/Python/block_indentation.hawk
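The article "Myths about Python indentation" says that only the indentation level of a statement (the whitespace at its very left) is significant, and that the exact amount of indentation does not matter, only the indentation of nested blocks relative to each other.
Indentation is also ignored on continuation lines, whether the continuation is explicit (a backslash at the end of the line) or implicit (inside parentheses, brackets, or braces).
You can write an inner block on the same line as its header, separating multiple statements with semicolons. If you put them on separate lines, Python enforces its indentation rules. In Python, the indentation level reflects the logical structure of the code.
Don't mix tabs and spaces. Python treats a tab as advancing to the next multiple of 8 columns, and Python 3 rejects indentation that mixes tabs and spaces inconsistently with a TabError.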
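To see these rules in action, here is a small sketch that feeds a few made-up snippets to the built-in compile() and reports whether each one compiles. The check helper and the snippets are hypothetical, written for illustration under Python 3:

```python
def check(source):
    """Compile source and return "ok", or the name of the SyntaxError raised."""
    try:
        compile(source, "<example>", "exec")
    except SyntaxError as exc:  # IndentationError and TabError are subclasses
        return type(exc).__name__
    return "ok"


snippets = {
    # Only relative indentation matters: 2 or 7 spaces both work.
    "two_spaces": "for i in range(3):\n  x = i\n  y = i\n",
    "seven_spaces": "for i in range(3):\n       x = i\n       y = i\n",
    # Dedenting to a column that no enclosing block uses.
    "bad_unindent": "if True:\n    x = 1\n  y = 2\n",
    # The body may share the header line, statements separated by ";".
    "same_line": "for i in range(3): x = i; y = i\n",
    # Indentation inside brackets (implicit continuation) is ignored.
    "continuation": "total = (1 +\n             2 +\n   3)\n",
    # One line indented with 8 spaces, the next with a tab.
    "tabs_and_spaces": "if True:\n        x = 1\n\ty = 2\n",
}

for name, source in snippets.items():
    print(name, check(source))
```

The bad_unindent case raises the same "unindent does not match any outer indentation level" error seen above, while tabs_and_spaces raises TabError.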
```python
import os

# Get all files & directories in the specified directory (path), recursively.
def get_recursive_file_list(path):
    current_files = os.listdir(path)
    all_files = []
    for file_name in current_files:
        full_file_name = os.path.join(path, file_name)
        all_files.append(full_file_name)
        if os.path.isdir(full_file_name):
            next_level_files = get_recursive_file_list(full_file_name)
            all_files.extend(next_level_files)
    return all_files

all_files = get_recursive_file_list(r'C:\Users\green_pasture\Desktop\korea\key_words')
for filename in all_files:
    if os.path.isdir(filename):
        continue  # skip directories; only files can be opened
    print(filename)
    f1 = open(filename, 'r', encoding='utf-8')
    for line1 in f1: print(line1, end="")
    f1.close()
```
I moved the statement inside the for loop onto the same line as the for, and the error went away.
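Writing the body on the same line avoids the error, but given the advice above about tabs and spaces, a likely underlying cause is leading whitespace that mixes tabs and spaces, which most editors do not show. The sketch below, where indentation_report is a hypothetical helper and not a library function, lists which lines are indented with tabs, with spaces, or with both on the same line:

```python
def indentation_report(source):
    """Classify the leading whitespace of each indented, non-blank line.

    Returns a dict with the 1-based line numbers indented with tabs only,
    spaces only, and a mix of both on the same line.
    """
    report = {"tabs": [], "spaces": [], "mixed": []}
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if not indent or not line.strip():
            continue  # unindented or blank line
        if "\t" in indent and " " in indent:
            report["mixed"].append(number)
        elif "\t" in indent:
            report["tabs"].append(number)
        else:
            report["spaces"].append(number)
    return report


sample = "for line1 in f1:\n\tprint(line1)\n    print(line1)\n"
print(indentation_report(sample))  # {'tabs': [2], 'spaces': [3], 'mixed': []}
```

If both the tabs and spaces lists are non-empty, the file mixes styles. The standard library's tabnanny module, run as python -m tabnanny script.py, performs a stricter check of the same problem.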
Next, I wanted to write the printed file contents into Excel cells.
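Libraries that can do this include:
XlsxWriter https://xlsxwriter.readthedocs.org/ (writes .xlsx files)
xlwt http://www.python-excel.org/ (writes the older .xls format)
openpyxl http://pythonhosted.org/openpyxl/ (reads and writes .xlsx files)
XlsxWriter's documentation is quite complete, so I went with it.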
Its sample code:
```python
##############################################################################
#
# A simple example of some of the features of the XlsxWriter Python module.
#
# Copyright 2013-2014, John McNamara, jmcnamara@cpan.org
#
import xlsxwriter

# Create a new Excel file and add a worksheet.
workbook = xlsxwriter.Workbook('demo.xlsx')
worksheet = workbook.add_worksheet()

# Widen the first column to make the text clearer.
worksheet.set_column('A:A', 20)

# Add a bold format to use to highlight cells.
bold = workbook.add_format({'bold': True})

# Write some simple text.
worksheet.write('A1', 'Hello')

# Text with formatting.
worksheet.write('A2', 'World', bold)

# Write some numbers, with row/column notation.
worksheet.write(2, 0, 123)
worksheet.write(3, 0, 123.456)

# Insert an image.
worksheet.insert_image('B5', 'logo.png')

workbook.close()
```
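In the sample, worksheet.write() takes either an A1-style reference ('A1') or zero-based row and column numbers, so write(2, 0, 123) puts 123 in cell A3. To make that mapping explicit, here is a hypothetical helper, rowcol_to_a1; XlsxWriter ships its own converter, xl_rowcol_to_cell in xlsxwriter.utility, which real code would normally use instead:

```python
def rowcol_to_a1(row, col):
    """Convert zero-based (row, col) indices to an A1-style cell reference."""
    letters = ""
    col += 1  # Excel columns are 1-based: A=1, ..., Z=26, AA=27
    while col > 0:
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters + str(row + 1)


# The sample's write(2, 0, 123) and write(3, 0, 123.456) target A3 and A4.
print(rowcol_to_a1(2, 0), rowcol_to_a1(3, 0))  # A3 A4
```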
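How do you install it? One way is pip. My Python installation already had pip.exe in C:\Python33\Scripts, so I added that directory to the PATH environment variable and ran pip install XlsxWriter on the command line.
It failed with:
Fatal error in launcher: Unable to create process using C:\Python33\Scripts\pip.exe install XlsxWriter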
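A commonly suggested workaround for this launcher error, assuming pip is installed for the interpreter in question, is to run pip as a module of that interpreter (python -m pip install XlsxWriter), which does not go through pip.exe. The sketch below only checks whether the xlsxwriter module is importable and builds that command; is_installed, pip_install_command and install are hypothetical helper names, and importlib.util.find_spec needs Python 3.4 or later:

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name):
    """True if module_name can be imported by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


def pip_install_command(package):
    """Build a 'python -m pip install <package>' command for this interpreter.

    Running pip as a module of sys.executable bypasses the pip.exe
    launcher in the Scripts directory.
    """
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    """Run pip for this interpreter; raises CalledProcessError on failure."""
    subprocess.check_call(pip_install_command(package))


# The package is named XlsxWriter on PyPI; the module it provides is xlsxwriter.
if is_installed("xlsxwriter"):
    print("xlsxwriter is already installed")
else:
    print("xlsxwriter not found; to install it, run:",
          " ".join(pip_install_command("XlsxWriter")))
```

Calling install("XlsxWriter") runs the installation directly instead of just printing the command.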
So I installed it from the GitHub source instead:
```
$ git clone https://github.com/jmcnamara/XlsxWriter.git
$ cd XlsxWriter
$ sudo python setup.py install    # on Windows, run without sudo
```
Then I wrote a small test program:
```python
import xlsxwriter

workbook = xlsxwriter.Workbook('hello.xlsx')
worksheet = workbook.add_worksheet()

worksheet.write('A1', 'Hello world')

workbook.close()
```
The test succeeded.
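To double-check hello.xlsx without opening Excel, one option is to look inside it with the standard library, since an .xlsx file is a zip archive of XML parts. This sketch assumes XlsxWriter's default behavior of storing text cells in the shared string table xl/sharedStrings.xml; shared_strings is a hypothetical helper, not part of XlsxWriter. For hello.xlsx it should list 'Hello world':

```python
import os
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by the SpreadsheetML parts of an .xlsx file.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def shared_strings(xlsx_path):
    """Return the text of every entry in an .xlsx file's shared string table."""
    with zipfile.ZipFile(xlsx_path) as archive:  # .xlsx is a zip archive
        if "xl/sharedStrings.xml" not in archive.namelist():
            return []  # workbook without shared strings
        root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    # Each <si> element is one string; rich text may split it over several <t>.
    return ["".join(t.text or "" for t in si.iter(MAIN_NS + "t"))
            for si in root.iter(MAIN_NS + "si")]


if os.path.exists("hello.xlsx"):
    print(shared_strings("hello.xlsx"))
```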
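The complete script writes each file's path into column A and its contents into column B of keywords.xlsx:

```python
import os
import xlsxwriter

# Get all files & directories in the specified directory (path), recursively.
def get_recursive_file_list(path):
    current_files = os.listdir(path)
    all_files = []
    for file_name in current_files:
        full_file_name = os.path.join(path, file_name)
        all_files.append(full_file_name)
        if os.path.isdir(full_file_name):
            next_level_files = get_recursive_file_list(full_file_name)
            all_files.extend(next_level_files)
    return all_files

workbook = xlsxwriter.Workbook('keywords.xlsx')
worksheet = workbook.add_worksheet()

all_files = get_recursive_file_list(r'C:\Users\green_pasture\Desktop\korea\key_words')
row = 0
for filename in all_files:
    if os.path.isdir(filename):
        continue  # directories are in the list too, but only files can be read
    print(filename)
    # Python 3 decodes the UTF-8 text while reading, so no .decode() is needed.
    with open(filename, 'r', encoding='utf-8') as f1:
        # Each line already ends with "\n"; strip it before joining so the
        # lines are not separated by blank lines.
        keywords = "\n".join(line.rstrip("\n") for line in f1)
    print(keywords)
    # write_string() stores the text as-is; write() would, for example,
    # treat text starting with "=" as a formula.
    worksheet.write_string(row, 0, filename)
    worksheet.write_string(row, 1, keywords)
    row = row + 1

workbook.close()
```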
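If installing XlsxWriter is not possible, a different route (not the one used in this post) is the standard library's csv module: Excel opens .csv files directly, and the csv module quotes any field containing newlines, so a multi-line file's contents stay in one cell. A minimal sketch, where write_rows_to_csv is a hypothetical helper; the UTF-8 byte order mark is written because, depending on the version, Excel may otherwise not recognize UTF-8 text such as Korean keywords:

```python
import csv


def write_rows_to_csv(rows, csv_path):
    """Write (path, contents) rows to a CSV file that Excel can open.

    Fields containing commas, quotes or newlines are quoted by the csv
    module, so each file's full contents stay in a single cell.
    """
    # newline="" lets the csv module control line endings itself;
    # "utf-8-sig" prepends a byte order mark to mark the file as UTF-8.
    with open(csv_path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(rows)
```

In the script above, collecting (filename, keywords) pairs into a list and passing it to write_rows_to_csv(rows, 'keywords.csv') would replace the worksheet calls. Either way, Excel limits a single cell to 32,767 characters, so very long files will not fit in one cell.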