2.0 包的安装
操作难度:⭐
方法一:应用pip执行命令
安装openpyxl模块pip install openpyxl
注:openpyxl可以读取xlsx的格式,但是不可以去读xls格式;读取xls格式,可以安装xlrd模块,pip install xlrd
,本章节以xlsx格式为主。
方法二:在Pycharm中:File->Setting->左侧Project Interpreter
2.1 Excel读取
项目难度:⭐
- Excel全称为Microsoft Office Excel,2003年版本的是xls格式,2007和2007年之后的版本是xlsx格式。
- xlsx格式通过
openpyxl
模块打开; xls格式通过xlwt
模块写,xlrd
模块读取。 - 本文以xlsx模式为例
In [ ]:
# 多行内容显现 from IPython.core.interactiveshell import InteractiveShell InteractiveShell.ast_node_interactivity = 'all'
2.1.1 读取对应表格
In [ ]:
import os import warnings warnings.filterwarnings('ignore') os.getcwd()
Out[ ]:
'd:\\打卡学习\\OfficeAutomation\\source\\task02'
关于路径:
文件应在当前工作目录才可引用,可导入os
,使用函数os.getcwd()
弄清楚当前工作目录是什么,可使用os.chdir()
改变当前工作目录,具体可参考第一章节。(此处显现为相对路径)
- 查看属性
In [ ]:
os.chdir(r'..\..')
In [ ]:
# 导入模块,查看属性 os.getcwd() os.listdir() import openpyxl wb = openpyxl.load_workbook('用户行为偏好.xlsx') type(wb)
Out[ ]:
'd:\\打卡学习\\OfficeAutomation'
Out[ ]:
['new.zip', 'pdf版本', 'pdf版本.zip', 'readme.md', 'source', 'Task01 文件自动化与邮件处理.md', 'Task02 Python与Excel.md', 'Task03 python与word.md', 'Task04 python与pdf.md', 'Task05 爬虫入门与综合应用.md', 'test.xlsx', 'test_1.xlsx', 'test_makedir', '图片', '用户行为偏好.xlsx']
Out[ ]:
openpyxl.workbook.workbook.Workbook
import * 和from...import...
import *
和from...import...
的区别
-
import
导入一个模块,相当于导入的是一个文件夹,相对路径。 -
from...import...
导入了一个模块中的一个函数,相当于文件夹中的文件,绝对路径。
2、打开已经存在的Excel表格,查询对应sheet的名称
In [ ]:
# 导入模块中得函数,查询对应表得名称 from openpyxl import load_workbook exl = load_workbook('用户行为偏好.xlsx') print(exl.sheetnames)
['订单时长分布', 'Sheet3']
In [ ]:
'''通过传递表名字符串读取表、类型和名称''' sheet = exl.get_sheet_by_name('Sheet3') sheet type(sheet) sheet.title '''读取工作簿得活动表''' # 活动表是工作簿在Excel中打开时出现得工作表,再取得Worksheet对象后,可通过title属性取得它的名称。 anotherSheet = exl.active anotherSheet
Out[ ]:
'通过传递表名字符串读取表、类型和名称'
Out[ ]:
<Worksheet "Sheet3">
Out[ ]:
openpyxl.worksheet.worksheet.Worksheet
Out[ ]:
'Sheet3'
Out[ ]:
'读取工作簿得活动表'
Out[ ]:
<Worksheet "订单时长分布">
3、获取Excel 内容占据的大小
In [ ]:
sheet.dimensions
Out[ ]:
'A1:O29'
2.1.2 读取单元格
Cell
- Cell对象有一个value属性,包含这个单元格中保存的值。
- Cell对象也有row、column和coordinate属性,提供该单元格的位置信息。
- Excel用字母指定列,在Z列之后,列开始使用两个字母:AA、AB等,所以在调用的cell()方法时,可传入整数作为row和column关键字参数,也可以得到一个单元格。
- 注:第一行或第一列的整数取1,而不是0.
In [ ]:
# 从表中取得单元格 ## 获取表格名称 from openpyxl import load_workbook exl = load_workbook(r'D:\打卡学习\OfficeAutomation\用户行为偏好.xlsx') exl.get_sheet_names() # 读取单元格 sheet = exl.get_sheet_by_name('订单时长分布') '''显现单元格格式''' sheet['A1'] '''显现单元格文本内容''' sheet['B1'].value # 另一种表达方式 a = sheet['A1'] a.value '''行、列和数值显现''' print('Row' +str(a.row) +',Column'+str(a.column)+ ' is '+a.value) '''显现单元格''' 'Cell' + a.coordinate +' is '+ a.value
Out[ ]:
['订单时长分布', 'Sheet3']
Out[ ]:
'显现单元格格式'
Out[ ]:
<Cell '订单时长分布'.A1>
Out[ ]:
'显现单元格文本内容'
Out[ ]:
'日期'
Out[ ]:
'编号'
Out[ ]:
'行、列和数值显现'
Row1,Column1 is 编号
Out[ ]:
'显现单元格'
Out[ ]:
'CellA1 is 编号'
In [ ]:
# 顺B列打出前8行的奇数行单元格值 for i in range(1,8,2): print(i,sheet.cell(row=i,column=2).value)
1 日期 3 2020-07-24 00:00:00 5 2020-07-24 00:00:00 7 2020-07-24 00:00:00
In [ ]:
# 确定表格的最大行数和最大列数,即表的大小 sheet.max_row sheet.max_column
Out[ ]:
102883
Out[ ]:
4
2.1.3 读取多个格子的值
In [ ]:
#A1到C8区域的值 cells = sheet['A1:C8'] type(cells) #用enumerate包装一个可迭代对象,同时使用索引和迭代项 for index, item in enumerate(sheet['A1:C8']): if index >= 1: print("\n") for cell in item: print(cell.value,end=" ")
Out[ ]:
tuple
编号 日期 行为时长 71401.30952380953 2020-07-24 00:00:00 a 71401.30952380953 2020-07-24 00:00:00 b 71401.30952380953 2020-07-24 00:00:00 c 71401.30952380953 2020-07-24 00:00:00 d 71401.30952380953 2020-07-24 00:00:00 e 71401.30952380953 2020-07-24 00:00:00 f 71401.30952380953 2020-07-24 00:00:00 g
In [ ]:
# 指定范围的值 # 行获取 for row in sheet.iter_rows(min_row = 1, max_row = 5, min_col = 2, max_col = 6): print(row) # 一列由多个单元格组成,若需要获取每个单元格的值则循环获取即可 for cell in row: print(cell.value) # 列获取 for col in sheet.iter_cols(min_row = 1, max_row = 5, min_col = 2, max_col = 6): print(col) for cell in col: print(cell.value)
(<Cell '订单时长分布'.B1>, <Cell '订单时长分布'.C1>, <Cell '订单时长分布'.D1>, <Cell '订单时长分布'.E1>, <Cell '订单时长分布'.F1>) 日期 行为时长 次数 None None (<Cell '订单时长分布'.B2>, <Cell '订单时长分布'.C2>, <Cell '订单时长分布'.D2>, <Cell '订单时长分布'.E2>, <Cell '订单时长分布'.F2>) 2020-07-24 00:00:00 a 718.832012012012 None None (<Cell '订单时长分布'.B3>, <Cell '订单时长分布'.C3>, <Cell '订单时长分布'.D3>, <Cell '订单时长分布'.E3>, <Cell '订单时长分布'.F3>) 2020-07-24 00:00:00 b 728.862012012012 None None (<Cell '订单时长分布'.B4>, <Cell '订单时长分布'.C4>, <Cell '订单时长分布'.D4>, <Cell '订单时长分布'.E4>, <Cell '订单时长分布'.F4>) 2020-07-24 00:00:00 c 390.792012012012 None None (<Cell '订单时长分布'.B5>, <Cell '订单时长分布'.C5>, <Cell '订单时长分布'.D5>, <Cell '订单时长分布'.E5>, <Cell '订单时长分布'.F5>) 2020-07-24 00:00:00 d 228.542012012012 None None (<Cell '订单时长分布'.B1>, <Cell '订单时长分布'.B2>, <Cell '订单时长分布'.B3>, <Cell '订单时长分布'.B4>, <Cell '订单时长分布'.B5>) 日期 2020-07-24 00:00:00 2020-07-24 00:00:00 2020-07-24 00:00:00 2020-07-24 00:00:00 (<Cell '订单时长分布'.C1>, <Cell '订单时长分布'.C2>, <Cell '订单时长分布'.C3>, <Cell '订单时长分布'.C4>, <Cell '订单时长分布'.C5>) 行为时长 a b c d (<Cell '订单时长分布'.D1>, <Cell '订单时长分布'.D2>, <Cell '订单时长分布'.D3>, <Cell '订单时长分布'.D4>, <Cell '订单时长分布'.D5>) 次数 718.832012012012 728.862012012012 390.792012012012 228.542012012012 (<Cell '订单时长分布'.E1>, <Cell '订单时长分布'.E2>, <Cell '订单时长分布'.E3>, <Cell '订单时长分布'.E4>, <Cell '订单时长分布'.E5>) None None None None None (<Cell '订单时长分布'.F1>, <Cell '订单时长分布'.F2>, <Cell '订单时长分布'.F3>, <Cell '订单时长分布'.F4>, <Cell '订单时长分布'.F5>) None None None None None
2.1.4 练习题
找出用户行为偏好.xlsx中sheet1表中空着的格子,并输出这些格子的坐标
In [ ]:
from openpyxl import load_workbook exl = load_workbook(r'D:\打卡学习\OfficeAutomation\用户行为偏好.xlsx') sheet = exl.active for row in sheet.iter_cols(min_row = 1, max_row = sheet.max_row, min_col = 1, max_col = sheet.max_column): #具体查看对应表格的行列数 for cell in row: if not cell.value: print(cell.coordinate)
A102841 A102842 A102843 A102844 A102845 A102846 A102847 A102848 A102849 A102850 A102851 A102852 A102853 A102854 A102855 A102856 A102857 A102858 A102859 A102860 A102861 A102862 A102863 A102864 A102865 A102866 A102867 A102868 A102869 A102870 A102871 A102872 A102873 A102874 A102875 A102876 A102877 A102878 A102879 A102880 A102881 A102882 A102883 B102841 B102842 B102843 B102844 B102845 B102846 B102847 B102848 B102849 B102850 B102851 B102852 B102853 B102854 B102855 B102856 B102857 B102858 B102859 B102860 B102861 B102862 B102863 B102864 B102865 B102866 B102867 B102868 B102869 B102870 B102871 B102872 B102873 B102874 B102875 B102876 B102877 B102878 B102879 B102880 B102881 B102882 B102883 C102841 C102842 C102843 C102844 C102845 C102846 C102847 C102848 C102849 C102850 C102851 C102852 C102853 C102854 C102855 C102856 C102857 C102858 C102859 C102860 C102861 C102862 C102863 C102864 C102865 C102866 C102867 C102868 C102869 C102870 C102871 C102872 C102873 C102874 C102875 C102876 C102877 C102878 C102879 C102880 C102881 C102882 C102883 D102841 D102842 D102843 D102844 D102845 D102846 D102847 D102848 D102849 D102850 D102851 D102852 D102853 D102854 D102855 D102856 D102857 D102858 D102859 D102860 D102861 D102862 D102863 D102864 D102865 D102866 D102867 D102868 D102869 D102870 D102871 D102872 D102873 D102874 D102875 D102876 D102877 D102878 D102879 D102880 D102881 D102882 D102883
2.2 Excel写入
项目难度:⭐
2.2.1 写入数据并保存
- 原有工作簿中写入数据并保存
In [ ]:
# 已有的表格赋值保存 from openpyxl import load_workbook exl = load_workbook(filename = r'D:\打卡学习\OfficeAutomation\用户行为偏好.xlsx') sheet = exl.active sheet['A1'] = 'hello world' #或者cell = sheet['A1'] #cell.value = 'hello world' exl.save(filename = '用户行为偏好.xlsx') #存入原Excel表中,若创建新文件则可命名为不同名称
- 创建新的表格写入数据并保存
In [ ]:
# openpyxl 写入xsxl from openpyxl import load_workbook import openpyxl wb = openpyxl.Workbook() # 创建一个sheet sh = wb.active sh.title = 'My Worksheet' #注:此处在工作簿内的表格名称没有变。sheet 名 # 写入excel # 参数对应 行, 列, 值 sh.cell(1,1).value = 'this is test' # 保存 wb.save('new_test.xlsx')
2.2.2 将公式写入单元格保存
In [ ]:
# 公式写入单元格保存 from openpyxl import load_workbook exl = load_workbook(r'D:\打卡学习\OfficeAutomation\用户行为偏好.xlsx') sheet = exl.get_sheet_by_name('Sheet3') sheet.dimensions #先查看原有表格的单元格范围,防止替代原有数据 sheet['A30'] = '=SUM(A1:D1)' exl.save(filename='用户行为偏好.xlsx')
Out[ ]:
'A1:O28'
2.2.3 插入数据
In [ ]:
#插入列数据 '''idx=2第2列,第2列前插入一列''' sheet.insert_cols(idx=2) '''第2列前插入5列作为举例''' sheet.insert_cols(idx=2, amount=5) #插入行数据 '''插入一行''' sheet.insert_rows(idx=2) '''插入多行''' sheet.insert_rows(idx=2, amount=5) exl.save(filename='用户行为偏好.xlsx')
Out[ ]:
'idx=2第2列,第2列前插入一列'
Out[ ]:
'第2列前插入5列作为举例'
Out[ ]:
'插入一行'
Out[ ]:
'插入多行'
### 2.2.4 删除
In [ ]:
# 删除多列 sheet.delete_cols(idx=5, amount=2) # 删除多行 sheet.delete_rows(idx=2, amount=5) exl.save(filename='用户行为偏好.xlsx')
2.2.5 移动
当数字为正即向下或向右,为负即为向上或向左
In [ ]:
#移动 '''当数字为正即向下或向右,为负即为向上或向左''' sheet.move_range('B3:E16',rows=1,cols=-1) exl.save(filename='用户行为偏好.xlsx')
Out[ ]:
'当数字为正即向下或向右,为负即为向上或向左'
2.2.6 Sheet表操作
- 创建新的sheet
In [ ]:
from openpyxl import Workbook workbook=Workbook() sheet=workbook.active workbook.create_sheet('new_sheet') workbook.create_sheet('ssss_sheet') workbook.save(filename='new_test.xlsx')
Out[ ]:
<Worksheet "new_sheet">
Out[ ]:
<Worksheet "ssss_sheet">
In [ ]:
os.getcwd()
Out[ ]:
'd:\\打卡学习\\OfficeAutomation'
2.修改sheet表名
In [ ]:
sheet = workbook.active sheet.title = 'newname' sheet.title workbook.save(filename='new_test.xlsx')
Out[ ]:
'newname'
2.3 Excel 样式
项目难度:⭐⭐
2.3.1设置字体样式
-
设置字体样式
Font(name字体名称,size大小,bold粗体,italic斜体,color颜色)
In [ ]:
from openpyxl import Workbook from openpyxl.styles import Font workbook = Workbook() sheet = workbook.active cell = sheet['A1'] cell.value = '你好' font = Font(name='宋体', size=10, bold=True, italic=True, color='FF0000') cell.font = font workbook.save(filename='new_test.xlsx')
2.设置多个格子的字体样式
In [ ]:
from openpyxl import Workbook from openpyxl.styles import Font workbook = Workbook() sheet = workbook.active cells = sheet['B2:C3'] print('***\n',len(cells)) font = Font(name='宋体', size=10, bold=True, italic=True, color='FF000000') for cells_ in cells: for cell in cells_: cell.font = font sheet['B2'] = '你好啊a' workbook.save(filename='new_test.xlsx')
*** 2
2.3.2 设置对齐样式
水平对齐:distributed, justify, center, left, fill, centerContinuous, right, general
垂直对齐:bottom, distributed, justify, center, top
- 设置单元格边框样式
Side
:变现样式,边线颜色等
Border
:左右上下边线
In [ ]:
from openpyxl import Workbook from openpyxl.styles import Font from openpyxl.styles import Side,Border workbook = Workbook() sheet = workbook.active cell = sheet['A1'] side = Side(border_style='thin', color='FF000000') #先定好side的格式 border = Border(left=side, right=side, top=side, bottom=side) #代入边线中 cell.border = border workbook.save(filename='new_test.xlsx')
- 设置单元格边框样式
变现样式:double, mediumDashDotDot, slantDashDot, dashDotDot, dotted, hair, mediumDashed, dashed, dashDot, thin, mediumDashDot, medium, thick
In [ ]:
from openpyxl import Workbook from openpyxl.styles import PatternFill, Border, Side, Alignment, Font, GradientFill workbook = Workbook() sheet = workbook.active cell = sheet['A1'] pattern_fill = PatternFill(fill_type='solid',fgColor="DDDDDD") cell.fill = pattern_fill #单色填充 cell2 = sheet['A3'] gradient_fill = GradientFill(stop=('FFFFFF', '99ccff','000000')) cell2.fill = gradient_fill #渐变填充 workbook.save(filename='new_test.xlsx')
2.3.3 设置行高与列宽
In [ ]:
from openpyxl import Workbook workbook = Workbook() sheet = workbook.active sheet.row_dimensions[1].height = 50 sheet.column_dimensions['C'].width = 20 workbook.save(filename='new_test.xlsx')
2.3.4 合并、取消合并单元格
In [ ]:
sheet.merge_cells('A1:B2') sheet.merge_cells(start_row=1, start_column=3, end_row=2, end_column=4) sheet.unmerge_cells('A1:B2') sheet.unmerge_cells(start_row=1, start_column=3, end_row=2, end_column=4)
2.3.5 练习题
打开test文件,找出文件中购买数量buy_mount
超过5的行,并对其标红、加粗、附上边框。
In [ ]:
from openpyxl import load_workbook from openpyxl.styles import Font, Side, Border workbook = load_workbook('./test.xlsx') sheet = workbook.active buy_mount = sheet['F'] row_lst = [] for cell in buy_mount: if isinstance(cell.value, int) and cell.value > 5: print(cell.row) row_lst.append(cell.row) side = Side(style='thin', color='FF000000') border = Border(left=side, right=side, top=side, bottom=side) font = Font(bold=True, color='FF0000') for row in row_lst: for cell in sheet[row]: cell.font = font cell.border = border workbook.save('new_test.xlsx')
2 3 4 8 9
In [ ]:
buy_mount[0].row sheet.views
Out[ ]:
1
Out[ ]:
<openpyxl.worksheet.views.SheetViewList object> Parameters: sheetView=[<openpyxl.worksheet.views.SheetView object> Parameters: windowProtection=None, showFormulas=None, showGridLines=None, showRowColHeaders=None, showZeros=None, rightToLeft=None, tabSelected=True, showRuler=None, showOutlineSymbols=None, defaultGridColor=None, showWhiteSpace=None, view=None, topLeftCell=None, colorId=None, zoomScale=None, zoomScaleNormal=None, zoomScaleSheetLayoutView=None, zoomScalePageLayoutView=None, zoomToFit=None, workbookViewId=0, pane=None, selection=[<openpyxl.worksheet.views.Selection object> Parameters: pane=None, activeCell='H12', activeCellId=None, sqref='H12']]