poi生成excel大数据量的合并单元格操作优化

poi生成excel大数据量的合并单元格操作优化

之前根据官网给的api和example,对excel单元格的合并操作使用下方的代码(poi版本 3.12)

public class MergedCells {
   public static void main(String[] args) throws IOException {
        HSSFWorkbook wb = new HSSFWorkbook();
        HSSFSheet sheet = wb.createSheet("new sheet");

        HSSFRow row = sheet.createRow(1);
        HSSFCell cell = row.createCell(1);
        cell.setCellValue("This is a test of merging");

        sheet.addMergedRegion(new CellRangeAddress(1, 1, 1, 2));

        // Write the output to a file
        FileOutputStream fileOut = new FileOutputStream("workbook.xls");
        wb.write(fileOut);
        fileOut.close();
        wb.close();
    }
}

当使用支持07以上的XSSFWorkbook和SXSSFWorkbook的时候同样可以这样,代码如下:

private static void mergeWithXSSF() throws  IOException{
       XSSFWorkbook wb = new XSSFWorkbook();
       XSSFSheet sheet = wb.createSheet("new sheet");

       XSSFRow row = sheet.createRow(1);
       XSSFCell cell = row.createCell(1);
       cell.setCellValue("This is a test of merging");
       sheet.addMergedRegion(new CellRangeAddress(1, 1, 1, 2));
        
       // Write the output to a file
       FileOutputStream fileOut = new FileOutputStream("d:/temp/workbook1.xlsx");
       wb.write(fileOut);
       fileOut.close();
       wb.close();
 }

但是当mergeCell的次数非常大(数万~数十万)时候,对cpu和内存消耗不仅大大增加,而且耗时也非常大,本地测试的时候,循环次数6W次,大约需要20-30分钟;耗时核心代码如下:

for(int i=0;i<100000;i++) {
      XSSFRow row = sheet.createRow(i);
      XSSFCell cell = row.createCell(1);
      cell.setCellValue("This is a test of merging");
      sheet.addMergedRegion(new CellRangeAddress(i, i, 1, 2));
  }

官网版本是3.16以后,我更新后发现 sheet 多了一个合并单元格的方法 addMergedRegionUnsafe,比addMergedRegion少了一些检测异常的过程

源码如下:

public int addMergedRegion(CellRangeAddress region) {
        return this.addMergedRegion(region, true);
    }

    public int addMergedRegionUnsafe(CellRangeAddress region) {
        return this.addMergedRegion(region, false);
    }

    private int addMergedRegion(CellRangeAddress region, boolean validate) {
        if(region.getNumberOfCells() < 2) {
            throw new IllegalArgumentException("Merged region " + region.formatAsString() + " must contain 2 or more cells");
        } else {
            region.validate(SpreadsheetVersion.EXCEL2007);
            if(validate) {
                this.validateArrayFormulas(region);
                this.validateMergedRegions(region);
            }

            CTMergeCells ctMergeCells = this.worksheet.isSetMergeCells()?this.worksheet.getMergeCells():this.worksheet.addNewMergeCells();
            CTMergeCell ctMergeCell = ctMergeCells.addNewMergeCell();
            ctMergeCell.setRef(region.formatAsString());
            return ctMergeCells.sizeOfMergeCellArray();
        }
    }

使用这个方法之后,对于十万条数据合并单元格的本地测试就降低到了30多秒,感觉真的是质的飞跃,非常高兴,但是这只是开始,我想到既然通过减少了一些异常检测就有如此神威,是否合并单元格的方法还可以继续缩减呢?

合并单元格的核心代码在这:

//判断sheet是否已经有过合并单元格的经历,如果有就getMergeCells得到ctMergeCells 对象,否则就从addNewMergeCells获取对象(因为isSetMergeCells中使用了锁,还有一些复杂的操作,感觉会比较耗时)
CTMergeCells ctMergeCells = this.worksheet.isSetMergeCells()?this.worksheet.getMergeCells():this.worksheet.addNewMergeCells();
CTMergeCell ctMergeCell = ctMergeCells.addNewMergeCell();
ctMergeCell.setRef(region.formatAsString());
return ctMergeCells.sizeOfMergeCellArray();

自己控制是否合并了单元格。设置一个本地变量 int mergeCellsCount = 0;如果合并了单元格 mergeCellsCount ++; 刚才核心的代码可以改成

CTMergeCells ctMergeCells = mergeCellsCount >0 ?this.worksheet.getMergeCells():this.worksheet.addNewMergeCells();
CTMergeCell ctMergeCell = ctMergeCells.addNewMergeCell();
ctMergeCell.setRef(region.formatAsString());
mergeCellsCount  ++
return ctMergeCells.sizeOfMergeCellArray();

return每次合并单元格的时候还要返回 已经 合并的单元格 的数目(我们上方定义的mergeCellsCount )

我们只要合并的过程,这个计数对我们没用,所以,自己重写了以下addMergedRegion方法

private void addMergeRegion(CellRangeAddress cra) {
		XSSFWorkbook workbook = new XSSFWorkbook();
		XSSFSheet sheet = workbook.getSheetAt(0);
		CTWorksheet ctWorksheet = sheet.getCTWorksheet();

		CTMergeCells ctMergeCells = mergeCellsCount > 0 ?ctWorksheet.getMergeCells():ctWorksheet.addNewMergeCells();
		CTMergeCell ctMergeCell = ctMergeCells.addNewMergeCell();

		ctMergeCell.setRef(cra.formatAsString());
		mergeCellsCount ++;
	}

总结一下:主要是CellRangeAddress 是自己定义的,自己会控制合并区域的单元格的合法性,所以去掉验证合法性的代码; 去掉返回count的代码

以上转载自:https://my.oschina.net/u/1417838/blog/edit

--------------------------------------------------------------------------------分割线--------------------------------------------------------------------------------------------

然后我自己的业务场景是有大量的合并操作,我把要合并的参数保存到map中,然后再开线程跑(我是用SXSSFWorkbook 操作的,该方法操作大数据可观)

//maps用来存要合并的参数
System.out.println("总数"+maps.size());
        if (maps.size() > 0){
			ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
			// 设置核心线程数
			executor.setCorePoolSize(8);
			// 设置最大线程数
			executor.setMaxPoolSize(8);
			// 设置队列容量
			executor.setQueueCapacity(350);
			// 设置线程活跃时间(秒)
			executor.setKeepAliveSeconds(60);
			// 等待所有任务结束后再关闭线程池
			executor.setWaitForTasksToCompleteOnShutdown(true);
			// rejection-policy:当pool已经达到max size的时候,如何处理新任务
			// CALLER_RUNS:不在新线程中执行任务,而是有调用者所在的线程来执行
			executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
			executor.initialize();
			int t = Runtime.getRuntime().availableProcessors();
			System.out.println("当前可用的线程数:"+t);

			BlockingQueue<List<Map<Integer, Integer>>> queue = new ArrayBlockingQueue(maps.size());
			//以先入先出的顺序排序的阻塞队列
			for (Map.Entry<Integer, Integer> entry : maps.entrySet()) {
				List<Map<Integer, Integer>> listMap = new ArrayList<>();
				Map<Integer, Integer> mapList = new HashMap<>();
				mapList.put(entry.getKey(), entry.getValue());
				listMap.add(mapList);
				queue.add(listMap);
			}
			CountDownLatch latch = new CountDownLatch(queue.size());
			int rowNum = 0;
			while (queue.size()>0){
				try {
					System.out.println(rowNum);
					rowNum++;
					executor.execute(new MergedRegionForThread(latch,queue.take(),sheet));
				} catch (InterruptedException e) {
					e.printStackTrace();
				}
			}
			try {
				latch.await();
			} catch (InterruptedException e) {
				e.printStackTrace();
			}
		}
/**
 * 多线程合并表格
 */
public class MergedRegionForThread implements Runnable{

    private CountDownLatch latch;
    private List<Map<Integer, Integer>> dataList;
    private int mergeCellsCount = 0;
    private SXSSFSheet sheet;

    public MergedRegionForThread(CountDownLatch latch, List<Map<Integer, Integer>> dataList, SXSSFSheet sheet) {

        this.latch = latch;
       this.dataList = dataList;
        this.sheet = sheet;
    }

    @Override
    public void run() {
        try {
            Map<Integer, Integer> map = (Map<Integer, Integer>) dataList.get(0);
            for (Map.Entry<Integer, Integer> entry : map.entrySet()) {
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 0, 0));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 1, 1));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 2, 2));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 3, 3));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 4, 4));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 5, 5));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 6, 6));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 7, 7));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 8, 8));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 9, 9));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 10, 10));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 11, 11));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 12, 12));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 13, 13));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 14, 14));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 15, 15));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 16, 16));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 17, 17));
                addMergeRegion(sheet,new CellRangeAddress(entry.getKey(), entry.getValue(), 39, 39));
            }
        }catch (Exception e) {
            e.printStackTrace();
        } finally {
            if (null != latch) {
                latch.countDown();
            }
        }
    }

    /**
     * 根据源码自定义表格合并方法(去掉验证和判断),只提取合并方法
     * @param sheet
     * @param cra
     */
    private void addMergeRegion(SXSSFSheet sheet,CellRangeAddress cra) {
        SXSSFWorkbook workbook = sheet.getWorkbook();
        XSSFWorkbook xssfWorkbook = workbook.getXSSFWorkbook();
        //获取第一个sheet
        XSSFSheet sheet1 = xssfWorkbook.getSheetAt(0);
        CTWorksheet ctWorksheet = sheet1.getCTWorksheet();

        CTMergeCells ctMergeCells = mergeCellsCount > 0 ?ctWorksheet.getMergeCells():ctWorksheet.addNewMergeCells();
        CTMergeCell ctMergeCell = ctMergeCells.addNewMergeCell();

        ctMergeCell.setRef(cra.formatAsString());
        //用来判断sheet是否已经有合并表格
        mergeCellsCount ++;
    }
}

亲测,本来合并要2多个小时以上,使用后20s,30s即刻完成。有错欢迎大家指出

上一篇:HashMap详解


下一篇:TreeMap的使用和底层实现