poi生成excel大数据量的合并单元格操作优化
之前根据官网给的api和example,对excel单元格的合并操作使用下方的代码(poi版本 3.12)
public class MergedCells {
public static void main(String[] args) throws IOException {
HSSFWorkbook wb = new HSSFWorkbook();
HSSFSheet sheet = wb.createSheet("new sheet");
HSSFRow row = sheet.createRow(1);
HSSFCell cell = row.createCell(1);
cell.setCellValue("This is a test of merging");
sheet.addMergedRegion(new CellRangeAddress(1, 1, 1, 2));
// Write the output to a file
FileOutputStream fileOut = new FileOutputStream("workbook.xls");
wb.write(fileOut);
fileOut.close();
wb.close();
}
}
当使用支持07以上的XSSFWorkbook和SXSSFWorkbook的时候同样可以这样,代码如下:
private static void mergeWithXSSF() throws IOException{
XSSFWorkbook wb = new XSSFWorkbook();
XSSFSheet sheet = wb.createSheet("new sheet");
XSSFRow row = sheet.createRow(1);
XSSFCell cell = row.createCell(1);
cell.setCellValue("This is a test of merging");
sheet.addMergedRegion(new CellRangeAddress(1, 1, 1, 2));
// Write the output to a file
FileOutputStream fileOut = new FileOutputStream("d:/temp/workbook1.xlsx");
wb.write(fileOut);
fileOut.close();
wb.close();
}
但是当mergeCell的次数非常大(数万~数十万)时候,对cpu和内存消耗不仅大大增加,而且耗时也非常大,本地测试的时候,循环次数6W次,大约需要20-30分钟;耗时核心代码如下:
for(int i=0;i<100000;i++) {
XSSFRow row = sheet.createRow(i);
XSSFCell cell = row.createCell(1);
cell.setCellValue("This is a test of merging");
sheet.addMergedRegion(new CellRangeAddress(i, i, 1, 2));
}
官网版本是3.16以后,我更新后发现 sheet 多了一个合并单元格的方法 addMergedRegionUnsafe,比addMergedRegion少了一些检测异常的过程
源码如下:
public int addMergedRegion(CellRangeAddress region) {
return this.addMergedRegion(region, true);
}
public int addMergedRegionUnsafe(CellRangeAddress region) {
return this.addMergedRegion(region, false);
}
private int addMergedRegion(CellRangeAddress region, boolean validate) {
if(region.getNumberOfCells() < 2) {
throw new IllegalArgumentException("Merged region " + region.formatAsString() + " must contain 2 or more cells");
} else {
region.validate(SpreadsheetVersion.EXCEL2007);
if(validate) {
this.validateArrayFormulas(region);
this.validateMergedRegions(region);
}
CTMergeCells ctMergeCells = this.worksheet.isSetMergeCells()?this.worksheet.getMergeCells():this.worksheet.addNewMergeCells();
CTMergeCell ctMergeCell = ctMergeCells.addNewMergeCell();
ctMergeCell.setRef(region.formatAsString());
return ctMergeCells.sizeOfMergeCellArray();
}
}
使用这个方法之后,对于十万条数据合并单元格的本地测试就降低到了30多秒,感觉真的是质的飞跃,非常高兴,但是这只是开始,我想到既然通过减少了一些异常检测就有如此神威,是否合并单元格的方法还可以继续缩减呢?
合并单元格的核心代码在这:
//判断sheet是否已经有过合并单元格的经历,如果有就getMergeCells得到ctMergeCells 对象,否则就从addNewMergeCells获取对象(因为isSetMergeCells中使用了锁,还有一些复杂的操作,感觉会比较耗时)
CTMergeCells ctMergeCells = this.worksheet.isSetMergeCells()?this.worksheet.getMergeCells():this.worksheet.addNewMergeCells();
CTMergeCell ctMergeCell = ctMergeCells.addNewMergeCell();
ctMergeCell.setRef(region.formatAsString());
return ctMergeCells.sizeOfMergeCellArray();
自己控制是否合并了单元格。设置一个本地变量 int mergeCellsCount = 0;如果合并了单元格 mergeCellsCount ++; 刚才核心的代码可以改成
CTMergeCells ctMergeCells = mergeCellsCount >0 ?this.worksheet.getMergeCells():this.worksheet.addNewMergeCells();
CTMergeCell ctMergeCell = ctMergeCells.addNewMergeCell();
ctMergeCell.setRef(region.formatAsString());
mergeCellsCount ++
return ctMergeCells.sizeOfMergeCellArray();
return每次合并单元格的时候还要返回 已经 合并的单元格 的数目(我们上方定义的mergeCellsCount )
我们只要合并的过程,这个计数对我们没用,所以,自己重写了以下addMergedRegion方法
private void addMergeRegion(CellRangeAddress cra) {
XSSFWorkbook workbook = new XSSFWorkbook();
XSSFSheet sheet = workbook.getSheetAt(0);
CTWorksheet ctWorksheet = sheet.getCTWorksheet();
CTMergeCells ctMergeCells = mergeCellsCount > 0 ?ctWorksheet.getMergeCells():ctWorksheet.addNewMergeCells();
CTMergeCell ctMergeCell = ctMergeCells.addNewMergeCell();
ctMergeCell.setRef(cra.formatAsString());
mergeCellsCount ++;
}
总结一下:主要是CellRangeAddress 是自己定义的,自己会控制合并区域的单元格的合法性,所以去掉验证合法性的代码; 去掉返回count的代码
以上转载自:https://my.oschina.net/u/1417838/blog/edit
--------------------------------------------------------------------------------分割线--------------------------------------------------------------------------------------------
然后我自己的业务场景是有大量的合并操作,我把要合并的参数保存到map中,然后再开线程跑(我是用SXSSFWorkbook 操作的,该方法操作大数据可观)
//maps用来存要合并的参数
System.out.println("总数"+maps.size());
if (maps.size() > 0){
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
// 设置核心线程数
executor.setCorePoolSize(8);
// 设置最大线程数
executor.setMaxPoolSize(8);
// 设置队列容量
executor.setQueueCapacity(350);
// 设置线程活跃时间(秒)
executor.setKeepAliveSeconds(60);
// 等待所有任务结束后再关闭线程池
executor.setWaitForTasksToCompleteOnShutdown(true);
// rejection-policy:当pool已经达到max size的时候,如何处理新任务
// CALLER_RUNS:不在新线程中执行任务,而是有调用者所在的线程来执行
executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
executor.initialize();
int t = Runtime.getRuntime().availableProcessors();
System.out.println("当前可用的线程数:"+t);
BlockingQueue<List<Map<Integer, Integer>>> queue = new ArrayBlockingQueue(maps.size());
//以先入先出的顺序排序的阻塞队列
for (Map.Entry<Integer, Integer> entry : maps.entrySet()) {
List<Map<Integer, Integer>> listMap = new ArrayList<>();
Map<Integer, Integer> mapList = new HashMap<>();
mapList.put(entry.getKey(), entry.getValue());
listMap.add(mapList);
queue.add(listMap);
}
CountDownLatch latch = new CountDownLatch(queue.size());
int rowNum = 0;
while (queue.size()>0){
try {
System.out.println(rowNum);
rowNum++;
executor.execute(new MergedRegionForThread(latch,queue.take(),sheet));
} catch (InterruptedException e) {
e.printStackTrace();
}
}
try {
latch.await();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
/**
* 多线程合并表格
*/
public class MergedRegionForThread implements Runnable{
private CountDownLatch latch;
private List<Map<Integer, Integer>> dataList;
private int mergeCellsCount = 0;
private SXSSFSheet sheet;
public MergedRegionForThread(CountDownLatch latch, List<Map<Integer, Integer>> dataList, SXSSFSheet sheet) {
this.latch = latch;
this.dataList = dataList;
this.sheet = sheet;
}
@Override
public void run() {
try {
Map<Integer, Integer> map = (Map<Integer, Integer>) dataList.get(0);
for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 0, 0));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 1, 1));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 2, 2));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 3, 3));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 4, 4));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 5, 5));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 6, 6));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 7, 7));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 8, 8));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 9, 9));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 10, 10));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 11, 11));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 12, 12));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 13, 13));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 14, 14));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 15, 15));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 16, 16));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 17, 17));
addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 39, 39));
}
}catch (Exception e) {
e.printStackTrace();
} finally {
if (null != latch) {
latch.countDown();
}
}
}
/**
* 根据源码自定义表格合并方法(去掉验证和判断),只提取合并方法
* @param sheet
* @param cra
*/
private void addMergeRegion(SXSSFSheet sheet,CellRangeAddress cra) {
SXSSFWorkbook workbook = sheet.getWorkbook();
XSSFWorkbook xssfWorkbook = workbook.getXSSFWorkbook();
//获取第一个sheet
XSSFSheet sheet1 = xssfWorkbook.getSheetAt(0);
CTWorksheet ctWorksheet = sheet1.getCTWorksheet();
CTMergeCells ctMergeCells = mergeCellsCount > 0 ?ctWorksheet.getMergeCells():ctWorksheet.addNewMergeCells();
CTMergeCell ctMergeCell = ctMergeCells.addNewMergeCell();
ctMergeCell.setRef(cra.formatAsString());
//用来判断sheet是否已经有合并表格
mergeCellsCount ++;
}
}
亲测,本来合并要2多个小时以上,使用后20s,30s即刻完成。有错欢迎大家指出