Datawhale Task 02 Python自动化之Excel

2.0 包的安装

操作难度:⭐

方法一:应用pip执行命令

安装openpyxl模块pip install openpyxl

​ 注:openpyxl可以读取xlsx的格式,但是不可以去读xls格式;读取xls格式,可以安装xlrd模块,pip install xlrd,本章节以xlsx格式为主。

方法二:在Pycharm中:File->Setting->左侧Project Interpreter

2.1 Excel读取

项目难度:⭐

  • Excel全称为Microsoft Office Excel,2003年版本的是xls格式,2007和2007年之后的版本是xlsx格式。
  • xlsx格式通过openpyxl模块打开; xls格式通过xlwt模块写,xlrd模块读取。
  • 本文以xlsx模式为例

In [ ]:

# 多行内容显现
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = 'all'

2.1.1 读取对应表格

In [ ]:

import os
import warnings
warnings.filterwarnings('ignore')
os.getcwd()

Out[ ]:

'd:\\打卡学习\\OfficeAutomation\\source\\task02'

关于路径:

文件应在当前工作目录才可引用,可导入os,使用函数os.getcwd()弄清楚当前工作目录是什么,可使用os.chdir()改变当前工作目录,具体可参考第一章节。(此处显现为相对路径)

  1. 查看属性

In [ ]:

os.chdir(r'..\..')

In [ ]:

# 导入模块,查看属性
os.getcwd()
os.listdir()
import openpyxl
wb = openpyxl.load_workbook('用户行为偏好.xlsx')
type(wb)

Out[ ]:

'd:\\打卡学习\\OfficeAutomation'

Out[ ]:

['new.zip',
 'pdf版本',
 'pdf版本.zip',
 'readme.md',
 'source',
 'Task01 文件自动化与邮件处理.md',
 'Task02 Python与Excel.md',
 'Task03 python与word.md',
 'Task04 python与pdf.md',
 'Task05 爬虫入门与综合应用.md',
 'test.xlsx',
 'test_1.xlsx',
 'test_makedir',
 '图片',
 '用户行为偏好.xlsx']

Out[ ]:

openpyxl.workbook.workbook.Workbook

import * 和from...import...

import *from...import...的区别

  • import导入一个模块,相当于导入的是一个文件夹,相对路径。
  • from...import...导入了一个模块中的一个函数,相当于文件夹中的文件,绝对路径。

2、打开已经存在的Excel表格,查询对应sheet的名称

In [ ]:

# 导入模块中得函数,查询对应表得名称
from openpyxl import load_workbook
exl = load_workbook('用户行为偏好.xlsx')
print(exl.sheetnames)
['订单时长分布', 'Sheet3']

In [ ]:

'''通过传递表名字符串读取表、类型和名称'''
sheet = exl.get_sheet_by_name('Sheet3')
sheet
type(sheet)
sheet.title
'''读取工作簿得活动表'''
# 活动表是工作簿在Excel中打开时出现得工作表,再取得Worksheet对象后,可通过title属性取得它的名称。
anotherSheet = exl.active
anotherSheet

Out[ ]:

'通过传递表名字符串读取表、类型和名称'

Out[ ]:

<Worksheet "Sheet3">

Out[ ]:

openpyxl.worksheet.worksheet.Worksheet

Out[ ]:

'Sheet3'

Out[ ]:

'读取工作簿得活动表'

Out[ ]:

<Worksheet "订单时长分布">

3、获取Excel 内容占据的大小

In [ ]:

sheet.dimensions

Out[ ]:

'A1:O29'

2.1.2 读取单元格

Cell

  • Cell对象有一个value属性,包含这个单元格中保存的值。
  • Cell对象也有row、column和coordinate属性,提供该单元格的位置信息。
  • Excel用字母指定列,在Z列之后,列开始使用两个字母:AA、AB等,所以在调用的cell()方法时,可传入整数作为row和column关键字参数,也可以得到一个单元格。
  • 注:第一行或第一列的整数取1,而不是0.

In [ ]:

# 从表中取得单元格
## 获取表格名称
from openpyxl import load_workbook
exl = load_workbook(r'D:\打卡学习\OfficeAutomation\用户行为偏好.xlsx')
exl.get_sheet_names()

# 读取单元格
sheet = exl.get_sheet_by_name('订单时长分布')

'''显现单元格格式'''
sheet['A1']

'''显现单元格文本内容'''
sheet['B1'].value
# 另一种表达方式
a = sheet['A1']
a.value
'''行、列和数值显现'''
print('Row' +str(a.row) +',Column'+str(a.column)+ ' is '+a.value)
'''显现单元格'''
'Cell' + a.coordinate +' is '+ a.value

Out[ ]:

['订单时长分布', 'Sheet3']

Out[ ]:

'显现单元格格式'

Out[ ]:

<Cell '订单时长分布'.A1>

Out[ ]:

'显现单元格文本内容'

Out[ ]:

'日期'

Out[ ]:

'编号'

Out[ ]:

'行、列和数值显现'
Row1,Column1 is 编号

Out[ ]:

'显现单元格'

Out[ ]:

'CellA1 is 编号'

In [ ]:

# 顺B列打出前8行的奇数行单元格值
for i in range(1,8,2):
    print(i,sheet.cell(row=i,column=2).value)
1 日期
3 2020-07-24 00:00:00
5 2020-07-24 00:00:00
7 2020-07-24 00:00:00

In [ ]:

# 确定表格的最大行数和最大列数,即表的大小
sheet.max_row
sheet.max_column

Out[ ]:

102883

Out[ ]:

4

2.1.3 读取多个格子的值

In [ ]:

#A1到C8区域的值
cells = sheet['A1:C8']
type(cells)
#用enumerate包装一个可迭代对象,同时使用索引和迭代项
for index, item in enumerate(sheet['A1:C8']):
    if index >= 1:
        print("\n")
    for cell in item:
        print(cell.value,end=" ")

Out[ ]:

tuple
编号 日期 行为时长 

71401.30952380953 2020-07-24 00:00:00 a 

71401.30952380953 2020-07-24 00:00:00 b 

71401.30952380953 2020-07-24 00:00:00 c 

71401.30952380953 2020-07-24 00:00:00 d 

71401.30952380953 2020-07-24 00:00:00 e 

71401.30952380953 2020-07-24 00:00:00 f 

71401.30952380953 2020-07-24 00:00:00 g 

In [ ]:

# 指定范围的值
# 行获取
for row in sheet.iter_rows(min_row = 1, max_row = 5,
						   min_col = 2, max_col = 6):
	print(row)
	# 一列由多个单元格组成,若需要获取每个单元格的值则循环获取即可
	for cell in row:
		print(cell.value)
		
# 列获取
for col in sheet.iter_cols(min_row = 1, max_row = 5,
						   min_col = 2, max_col = 6):
	print(col)
	
	for cell in col:
		print(cell.value)
(<Cell '订单时长分布'.B1>, <Cell '订单时长分布'.C1>, <Cell '订单时长分布'.D1>, <Cell '订单时长分布'.E1>, <Cell '订单时长分布'.F1>)
日期
行为时长
次数
None
None
(<Cell '订单时长分布'.B2>, <Cell '订单时长分布'.C2>, <Cell '订单时长分布'.D2>, <Cell '订单时长分布'.E2>, <Cell '订单时长分布'.F2>)
2020-07-24 00:00:00
a
718.832012012012
None
None
(<Cell '订单时长分布'.B3>, <Cell '订单时长分布'.C3>, <Cell '订单时长分布'.D3>, <Cell '订单时长分布'.E3>, <Cell '订单时长分布'.F3>)
2020-07-24 00:00:00
b
728.862012012012
None
None
(<Cell '订单时长分布'.B4>, <Cell '订单时长分布'.C4>, <Cell '订单时长分布'.D4>, <Cell '订单时长分布'.E4>, <Cell '订单时长分布'.F4>)
2020-07-24 00:00:00
c
390.792012012012
None
None
(<Cell '订单时长分布'.B5>, <Cell '订单时长分布'.C5>, <Cell '订单时长分布'.D5>, <Cell '订单时长分布'.E5>, <Cell '订单时长分布'.F5>)
2020-07-24 00:00:00
d
228.542012012012
None
None
(<Cell '订单时长分布'.B1>, <Cell '订单时长分布'.B2>, <Cell '订单时长分布'.B3>, <Cell '订单时长分布'.B4>, <Cell '订单时长分布'.B5>)
日期
2020-07-24 00:00:00
2020-07-24 00:00:00
2020-07-24 00:00:00
2020-07-24 00:00:00
(<Cell '订单时长分布'.C1>, <Cell '订单时长分布'.C2>, <Cell '订单时长分布'.C3>, <Cell '订单时长分布'.C4>, <Cell '订单时长分布'.C5>)
行为时长
a
b
c
d
(<Cell '订单时长分布'.D1>, <Cell '订单时长分布'.D2>, <Cell '订单时长分布'.D3>, <Cell '订单时长分布'.D4>, <Cell '订单时长分布'.D5>)
次数
718.832012012012
728.862012012012
390.792012012012
228.542012012012
(<Cell '订单时长分布'.E1>, <Cell '订单时长分布'.E2>, <Cell '订单时长分布'.E3>, <Cell '订单时长分布'.E4>, <Cell '订单时长分布'.E5>)
None
None
None
None
None
(<Cell '订单时长分布'.F1>, <Cell '订单时长分布'.F2>, <Cell '订单时长分布'.F3>, <Cell '订单时长分布'.F4>, <Cell '订单时长分布'.F5>)
None
None
None
None
None

2.1.4 练习题

找出用户行为偏好.xlsx中sheet1表中空着的格子,并输出这些格子的坐标

In [ ]:

from openpyxl import load_workbook
exl = load_workbook(r'D:\打卡学习\OfficeAutomation\用户行为偏好.xlsx')
sheet = exl.active
for row in sheet.iter_cols(min_row = 1, max_row = sheet.max_row,
						   min_col = 1, max_col = sheet.max_column):
						   #具体查看对应表格的行列数
    for cell in row:
        if not cell.value:
            print(cell.coordinate)
A102841
A102842
A102843
A102844
A102845
A102846
A102847
A102848
A102849
A102850
A102851
A102852
A102853
A102854
A102855
A102856
A102857
A102858
A102859
A102860
A102861
A102862
A102863
A102864
A102865
A102866
A102867
A102868
A102869
A102870
A102871
A102872
A102873
A102874
A102875
A102876
A102877
A102878
A102879
A102880
A102881
A102882
A102883
B102841
B102842
B102843
B102844
B102845
B102846
B102847
B102848
B102849
B102850
B102851
B102852
B102853
B102854
B102855
B102856
B102857
B102858
B102859
B102860
B102861
B102862
B102863
B102864
B102865
B102866
B102867
B102868
B102869
B102870
B102871
B102872
B102873
B102874
B102875
B102876
B102877
B102878
B102879
B102880
B102881
B102882
B102883
C102841
C102842
C102843
C102844
C102845
C102846
C102847
C102848
C102849
C102850
C102851
C102852
C102853
C102854
C102855
C102856
C102857
C102858
C102859
C102860
C102861
C102862
C102863
C102864
C102865
C102866
C102867
C102868
C102869
C102870
C102871
C102872
C102873
C102874
C102875
C102876
C102877
C102878
C102879
C102880
C102881
C102882
C102883
D102841
D102842
D102843
D102844
D102845
D102846
D102847
D102848
D102849
D102850
D102851
D102852
D102853
D102854
D102855
D102856
D102857
D102858
D102859
D102860
D102861
D102862
D102863
D102864
D102865
D102866
D102867
D102868
D102869
D102870
D102871
D102872
D102873
D102874
D102875
D102876
D102877
D102878
D102879
D102880
D102881
D102882
D102883

2.2 Excel写入

项目难度:⭐

2.2.1 写入数据并保存

  1. 原有工作簿中写入数据并保存

In [ ]:

# 已有的表格赋值保存
from openpyxl import load_workbook

exl = load_workbook(filename = r'D:\打卡学习\OfficeAutomation\用户行为偏好.xlsx')
sheet = exl.active
sheet['A1'] = 'hello world'       
#或者cell = sheet['A1'] 
#cell.value = 'hello world'
exl.save(filename = '用户行为偏好.xlsx') #存入原Excel表中,若创建新文件则可命名为不同名称
  1. 创建新的表格写入数据并保存

In [ ]:

# openpyxl 写入xsxl
from openpyxl import load_workbook
import openpyxl
wb = openpyxl.Workbook()
# 创建一个sheet
sh = wb.active
sh.title = 'My Worksheet' #注:此处在工作簿内的表格名称没有变。sheet 名

# 写入excel
# 参数对应 行, 列, 值
sh.cell(1,1).value = 'this is test'

# 保存
wb.save('new_test.xlsx')

2.2.2 将公式写入单元格保存

In [ ]:

# 公式写入单元格保存
from openpyxl import load_workbook

exl = load_workbook(r'D:\打卡学习\OfficeAutomation\用户行为偏好.xlsx')
sheet = exl.get_sheet_by_name('Sheet3')
sheet.dimensions       #先查看原有表格的单元格范围,防止替代原有数据

sheet['A30'] = '=SUM(A1:D1)'
exl.save(filename='用户行为偏好.xlsx')

Out[ ]:

'A1:O28'

2.2.3 插入数据

In [ ]:

#插入列数据
'''idx=2第2列,第2列前插入一列'''
sheet.insert_cols(idx=2) 
'''第2列前插入5列作为举例'''
sheet.insert_cols(idx=2, amount=5)

#插入行数据
'''插入一行'''
sheet.insert_rows(idx=2)
'''插入多行'''
sheet.insert_rows(idx=2, amount=5)

exl.save(filename='用户行为偏好.xlsx')

Out[ ]:

'idx=2第2列,第2列前插入一列'

Out[ ]:

'第2列前插入5列作为举例'

Out[ ]:

'插入一行'

Out[ ]:

'插入多行'

### 2.2.4 删除

In [ ]:

# 删除多列
sheet.delete_cols(idx=5, amount=2)
# 删除多行
sheet.delete_rows(idx=2, amount=5)

exl.save(filename='用户行为偏好.xlsx')

2.2.5 移动

当数字为正即向下或向右,为负即为向上或向左

In [ ]:

#移动
'''当数字为正即向下或向右,为负即为向上或向左'''
sheet.move_range('B3:E16',rows=1,cols=-1)
exl.save(filename='用户行为偏好.xlsx')

Out[ ]:

'当数字为正即向下或向右,为负即为向上或向左'

2.2.6 Sheet表操作

  1. 创建新的sheet

In [ ]:

from openpyxl import Workbook

workbook=Workbook()
sheet=workbook.active


workbook.create_sheet('new_sheet')
workbook.create_sheet('ssss_sheet')
workbook.save(filename='new_test.xlsx')

Out[ ]:

<Worksheet "new_sheet">

Out[ ]:

<Worksheet "ssss_sheet">

In [ ]:

os.getcwd()

Out[ ]:

'd:\\打卡学习\\OfficeAutomation'

2.修改sheet表名

In [ ]:

sheet = workbook.active
sheet.title = 'newname'
sheet.title
workbook.save(filename='new_test.xlsx')

Out[ ]:

'newname'

2.3 Excel 样式

项目难度:⭐⭐

2.3.1设置字体样式

  1. 设置字体样式

    Font(name字体名称,size大小,bold粗体,italic斜体,color颜色)

In [ ]:

from openpyxl import Workbook
from openpyxl.styles import Font

workbook = Workbook()
sheet = workbook.active
cell = sheet['A1']
cell.value = '你好'
font = Font(name='宋体', size=10, bold=True, italic=True, color='FF0000')
cell.font = font
workbook.save(filename='new_test.xlsx')

2.设置多个格子的字体样式

In [ ]:

from openpyxl import Workbook
from openpyxl.styles import Font

workbook = Workbook()
sheet = workbook.active
cells = sheet['B2:C3']
print('***\n',len(cells))
font = Font(name='宋体', size=10, bold=True, italic=True, color='FF000000')
for cells_ in cells:
    for cell in cells_:
        cell.font = font
sheet['B2'] = '你好啊a'
workbook.save(filename='new_test.xlsx')
***
 2

2.3.2 设置对齐样式

水平对齐:distributed, justify, center, left, fill, centerContinuous, right, general

垂直对齐:bottom, distributed, justify, center, top

  1. 设置单元格边框样式

Side:变现样式,边线颜色等

Border:左右上下边线

In [ ]:

from openpyxl import Workbook
from openpyxl.styles import Font
from openpyxl.styles import Side,Border
workbook = Workbook()
sheet = workbook.active
cell = sheet['A1']
side = Side(border_style='thin', color='FF000000')
#先定好side的格式
border = Border(left=side, right=side, top=side, bottom=side)
#代入边线中
cell.border = border
workbook.save(filename='new_test.xlsx')
  1. 设置单元格边框样式

变现样式:double, mediumDashDotDot, slantDashDot, dashDotDot, dotted, hair, mediumDashed, dashed, dashDot, thin, mediumDashDot, medium, thick

In [ ]:

from openpyxl import Workbook
from openpyxl.styles import PatternFill, Border, Side, Alignment, Font, GradientFill

workbook = Workbook()
sheet = workbook.active
cell = sheet['A1']
pattern_fill = PatternFill(fill_type='solid',fgColor="DDDDDD")
cell.fill = pattern_fill

#单色填充
cell2 = sheet['A3']
gradient_fill = GradientFill(stop=('FFFFFF', '99ccff','000000'))
cell2.fill = gradient_fill
#渐变填充
workbook.save(filename='new_test.xlsx')

2.3.3 设置行高与列宽

In [ ]:

from openpyxl import Workbook 

workbook = Workbook()
sheet = workbook.active
sheet.row_dimensions[1].height = 50
sheet.column_dimensions['C'].width = 20 
workbook.save(filename='new_test.xlsx')

2.3.4 合并、取消合并单元格

In [ ]:

sheet.merge_cells('A1:B2')
sheet.merge_cells(start_row=1, start_column=3,
				  end_row=2, end_column=4)

sheet.unmerge_cells('A1:B2')
sheet.unmerge_cells(start_row=1, start_column=3,
				    end_row=2, end_column=4)

2.3.5 练习题

打开test文件,找出文件中购买数量buy_mount超过5的行,并对其标红、加粗、附上边框。

In [ ]:

from openpyxl import load_workbook
from openpyxl.styles import Font, Side, Border 

workbook = load_workbook('./test.xlsx') 
sheet = workbook.active
buy_mount = sheet['F'] 
row_lst = []

for cell in buy_mount:
	if isinstance(cell.value, int) and cell.value > 5: 
		print(cell.row)
		row_lst.append(cell.row)
		
side = Side(style='thin', color='FF000000')
border = Border(left=side, right=side, top=side, bottom=side) 
font = Font(bold=True, color='FF0000')
for row in row_lst:
	for cell in sheet[row]: 
		cell.font = font 
		cell.border = border
workbook.save('new_test.xlsx')
2
3
4
8
9

In [ ]:

buy_mount[0].row
sheet.views

Out[ ]:

1

Out[ ]:

<openpyxl.worksheet.views.SheetViewList object>
Parameters:
sheetView=[<openpyxl.worksheet.views.SheetView object>
Parameters:
windowProtection=None, showFormulas=None, showGridLines=None, showRowColHeaders=None, showZeros=None, rightToLeft=None, tabSelected=True, showRuler=None, showOutlineSymbols=None, defaultGridColor=None, showWhiteSpace=None, view=None, topLeftCell=None, colorId=None, zoomScale=None, zoomScaleNormal=None, zoomScaleSheetLayoutView=None, zoomScalePageLayoutView=None, zoomToFit=None, workbookViewId=0, pane=None, selection=[<openpyxl.worksheet.views.Selection object>
Parameters:
pane=None, activeCell='H12', activeCellId=None, sqref='H12']]
上一篇:部分浏览器出现初始弹框一闪而过的情况


下一篇:伪类选择器