The Most Detailed Walkthrough of Reading Multi-Line Records with a Spring Batch Reader


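Before we start:
I am 「境里婆娑」. I am still the boy I used to be, not changed one bit. Time is only a test, and the faith planted in my heart has not faded at all. The boy in front of you still has the same face he started with, and he will not back down however many dangers lie ahead.
I write this blog to share what I learn so that we can study and exchange ideas together. If you are interested in Java, follow me and let's learn together.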

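Preface: at work you may run into files in which a record read by Spring Batch spans multiple lines, or in which several different record formats are mixed. There is no need to worry: Spring Batch already provides the extension interfaces for this, and a small customization is enough to handle both cases.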

Reading a file whose records span multiple lines

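When a flat file is not in a standard format, you can handle it by implementing the record separator policy interface RecordSeparatorPolicy. Non-standard flat files come in several forms, for example records that span multiple lines, records that start with a specific character, or records that end with a specific character.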
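As a quick illustration of one of the other cases, a record that ends with a specific character, here is a minimal sketch. The SeparatorPolicy interface is a local stand-in that mirrors the three methods of RecordSeparatorPolicy so the example runs without Spring Batch on the classpath. The class name EndsWithPolicy and the ";" terminator are assumptions made up for this example. Spring Batch also ships a SuffixRecordSeparatorPolicy for this case, so in a real job you would probably start from that.

```java
// Local stand-in mirroring the three methods of Spring Batch's RecordSeparatorPolicy.
interface SeparatorPolicy {
    boolean isEndOfRecord(String record);
    String preProcess(String record);
    String postProcess(String record);
}

// Hypothetical policy: a record is complete once the accumulated text ends with the suffix.
class EndsWithPolicy implements SeparatorPolicy {

    private final String suffix;

    EndsWithPolicy(String suffix) {
        this.suffix = suffix;
    }

    @Override
    public boolean isEndOfRecord(String record) {
        return record != null && record.trim().endsWith(suffix);
    }

    @Override
    public String preProcess(String record) {
        // Called on an incomplete record before the next line is appended.
        return record;
    }

    @Override
    public String postProcess(String record) {
        // Called on the finished record: strip the terminator before tokenizing.
        if (!isEndOfRecord(record)) {
            return record;
        }
        String trimmed = record.trim();
        return trimmed.substring(0, trimmed.length() - suffix.length());
    }
}
```

With new EndsWithPolicy(";"), the text "412222,201,tom" is still incomplete, while "412222,201,tom,2020-02-27,china;" is a complete record that postProcess returns without the trailing ";".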

In the example below, every two lines make up one record:

412222,201,tom,2020-02-27
,china
412453,203,tqm,2020-03-27
,us
412222,205,tym,2020-05-27
,jap

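The built-in record separator policies SimpleRecordSeparatorPolicy and DefaultRecordSeparatorPolicy cannot handle this kind of file. We can implement the RecordSeparatorPolicy interface to define our own separator policy, MulitiLineRecordSeparatorPolicy.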
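To see why treating each physical line as a record does not work here, consider this small framework-free sketch. It only counts the comma-separated tokens on each sample line; the class name LineByLineCheck is made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LineByLineCheck {

    // Number of comma-separated tokens on a single physical line.
    static int tokenCount(String line) {
        // A limit of -1 keeps trailing empty tokens as well.
        return line.split(",", -1).length;
    }

    // Token count for every line, in file order.
    static List<Integer> tokenCounts(List<String> lines) {
        List<Integer> counts = new ArrayList<>();
        for (String line : lines) {
            counts.add(tokenCount(line));
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("412222,201,tom,2020-02-27", ",china");
        System.out.println(tokenCounts(lines)); // prints [4, 2]
    }
}
```

Neither line carries the five values of a full record, so the lines have to be joined before they are tokenized.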

The class diagram of the core components used to read a file whose records span multiple lines:
[Figure: class diagram of the core components]
In this class diagram, only MulitiLineRecordSeparatorPolicy and CommonFieldSetMapper are custom implementations; all the other components ship with Spring Batch.

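MulitiLineRecordSeparatorPolicy: decides when a complete record has been read from the file. In this implementation, a record is considered complete once it contains four comma delimiters.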

import org.springframework.batch.item.file.separator.RecordSeparatorPolicy;

/**
 * @author shuliangzhao
 * @date 2020/12/6 13:05
 */
public class MulitiLineRecordSeparatorPolicy implements RecordSeparatorPolicy {

    private String delimiter = ",";

    private int count = 0;

    public int getCount() {
        return count;
    }

    public void setCount(int count) {
        this.count = count;
    }

    public String getDelimiter() {
        return delimiter;
    }

    public void setDelimiter(String delimiter) {
        this.delimiter = delimiter;
    }

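    @Override
    public boolean isEndOfRecord(String record) {
        // The record is complete once it contains exactly `count` delimiters.
        return countDelimiter(record) == count;
    }

    // Counts how many times the configured delimiter occurs in the record.
    private int countDelimiter(String record) {
        int occurrences = 0;
        int index = record.indexOf(delimiter);
        while (index != -1) {
            occurrences++;
            index = record.indexOf(delimiter, index + delimiter.length());
        }
        return occurrences;
    }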

    @Override
    public String postProcess(String record) {
        return record;
    }

    @Override
    public String preProcess(String record) {
        return record;
    }
}

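delimiter: the delimiter used when reading the file.
count: the total number of delimiters. When the given string contains exactly this many delimiters, it is treated as a complete record.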
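How a reader drives such a policy can be sketched without Spring Batch: physical lines are appended to the current record until the delimiter count reaches the expected value. This is a simplified imitation of the idea, not the actual FlatFileItemReader code, and RecordGrouper and its methods are names made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RecordGrouper {

    // Same idea as countDelimiter above: count occurrences of the delimiter.
    static int countDelimiter(String record, String delimiter) {
        int count = 0;
        int index = record.indexOf(delimiter);
        while (index != -1) {
            count++;
            index = record.indexOf(delimiter, index + delimiter.length());
        }
        return count;
    }

    // Append physical lines until the accumulated text holds expectedCount delimiters.
    static List<String> group(List<String> lines, String delimiter, int expectedCount) {
        List<String> records = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String line : lines) {
            current.append(line);
            if (countDelimiter(current.toString(), delimiter) == expectedCount) {
                records.add(current.toString());
                current.setLength(0);
            }
        }
        // An incomplete trailing record is simply dropped in this sketch.
        return records;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "412222,201,tom,2020-02-27", ",china",
                "412453,203,tqm,2020-03-27", ",us");
        // Prints 412222,201,tom,2020-02-27,china and then 412453,203,tqm,2020-03-27,us
        group(lines, ",", 4).forEach(System.out::println);
    }
}
```

The first line holds only three commas, so it is not yet a record; after ",china" is appended the count reaches four and the joined text is emitted as one record.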

1、Job configuration for reading the multi-line file

The job is set up with Java-based bean configuration as follows:

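import org.springframework.batch.core.Job;
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

/**
 * Reads a file whose records span multiple lines
 * @author shuliangzhao
 * @date 2020/12/6 13:38
 */
@Configuration
@EnableBatchProcessing
public class MulitiLineConfiguration {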

    @Autowired
    private JobBuilderFactory jobBuilderFactory;

    @Autowired
    private StepBuilderFactory stepBuilderFactory;

    @Autowired
    private PartitonMultiFileProcessor partitonMultiFileProcessor;

    @Autowired
    private PartitionMultiFileWriter partitionMultiFileWriter;

    @Bean
    public Job mulitiLineJob() {
       return jobBuilderFactory.get("mulitiLineJob").start(mulitiLineStep()).build();
    }

    @Bean
    public Step mulitiLineStep() {
        return stepBuilderFactory.get("mulitiLineStep")
                .<CreditBill,CreditBill>chunk(12)
                .reader(mulitiLineRecordReader())
                .processor(partitonMultiFileProcessor)
                .writer(partitionMultiFileWriter)
                .build();
    }

    @Bean
    @StepScope
    public MulitiLineRecordReader mulitiLineRecordReader() {
        return new MulitiLineRecordReader(CreditBill.class);
    }
}

2、Reader for the multi-line file

MulitiLineRecordReader in detail:

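import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.transform.DefaultFieldSetFactory;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;

/**
 * @author shuliangzhao
 * @date 2020/12/6 13:09
 */
public class MulitiLineRecordReader extends FlatFileItemReader {

    public MulitiLineRecordReader(Class clz) {
        setResource(CommonUtil.createResource("D:\\aplus\\muliti\\muliti.csv"));
        // Column names for the tokenizer (CommonUtil is a project helper not shown here)
        String[] names = CommonUtil.names(clz);
        // Map each tokenized record onto an instance of clz
        DefaultLineMapper defaultLineMapper = new DefaultLineMapper();
        CommonFieldSetMapper commonFieldSetMapper = new CommonFieldSetMapper();
        commonFieldSetMapper.setTargetType(clz);
        defaultLineMapper.setFieldSetMapper(commonFieldSetMapper);
        // Split the joined record on commas and name the columns
        DelimitedLineTokenizer delimitedLineTokenizer = new DelimitedLineTokenizer();
        delimitedLineTokenizer.setFieldSetFactory(new DefaultFieldSetFactory());
        delimitedLineTokenizer.setNames(names);
        delimitedLineTokenizer.setDelimiter(",");
        defaultLineMapper.setLineTokenizer(delimitedLineTokenizer);
        // Treat the accumulated lines as one record once they contain four commas
        MulitiLineRecordSeparatorPolicy mulitiLineRecordSeparatorPolicy = new MulitiLineRecordSeparatorPolicy();
        mulitiLineRecordSeparatorPolicy.setCount(4);
        mulitiLineRecordSeparatorPolicy.setDelimiter(",");
        setRecordSeparatorPolicy(mulitiLineRecordSeparatorPolicy);
        setLineMapper(defaultLineMapper);
    }
}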
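Once the two physical lines have been joined, the tokenizer splits the record on the delimiter and pairs each value with a column name. The plain-Java sketch below imitates that step. The column order acctid, amout, name, date, address is an assumption (the real names come from CommonUtil.names, which is not shown in this article), and NamedTokens is a name made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class NamedTokens {

    // Pair each delimited value with the column name at the same position.
    static Map<String, String> tokenize(String record, String delimiter, String[] names) {
        String[] values = record.split(Pattern.quote(delimiter), -1);
        if (values.length != names.length) {
            throw new IllegalArgumentException(
                    "expected " + names.length + " tokens but found " + values.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < names.length; i++) {
            fields.put(names[i], values[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        // Assumed column order, for illustration only.
        String[] names = {"acctid", "amout", "name", "date", "address"};
        // prints {acctid=412222, amout=201, name=tom, date=2020-02-27, address=china}
        System.out.println(tokenize("412222,201,tom,2020-02-27,china", ",", names));
    }
}
```

Passing a single unjoined line such as "412222,201,tom,2020-02-27" to the same method fails the token-count check, which is why the separator policy has to run before the tokenizer.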

3、Custom FieldSetMapper

The custom CommonFieldSetMapper:

import java.lang.reflect.Field;

import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.FieldSet;
import org.springframework.validation.BindException;

/**
 * @author shuliangzhao
 * @date 2020/12/4 22:14
 */
public class CommonFieldSetMapper<T> implements FieldSetMapper<T> {

    private Class<? extends T> type;

    @Override
    public T mapFieldSet(FieldSet fieldSet) throws BindException {
        try {
            T t = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                // Skip the id field; it is not populated from the FieldSet.
                if (field.getName().equals("id")) {
                    continue;
                }
                field.setAccessible(true);
                // Pick the FieldSet read method that matches the field's declared type.
                String name = field.getType().getName();
                if (name.equals("java.lang.Integer")) {
                    field.set(t, fieldSet.readInt(field.getName()));
                } else if (name.equals("java.util.Date")) {
                    field.set(t, fieldSet.readDate(field.getName()));
                } else {
                    field.set(t, fieldSet.readString(field.getName()));
                }
            }
            return t;
        } catch (ReflectiveOperationException e) {
            // Fail loudly: a null return would be treated by the reader as end of input.
            throw new IllegalStateException("Cannot map FieldSet to " + type.getName(), e);
        }
    }

    public void setTargetType(Class<? extends T> type) {
        this.type = type;
    }
}
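The reflective mapping idea can also be tried outside Spring Batch. The sketch below fills a bean from a name-to-text map and converts each value according to the declared field type. Bill and ReflectiveMapper are hypothetical names made up for the example, a plain Map stands in for the FieldSet, and only Integer and String fields are handled.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical bean standing in for CreditBill.
class Bill {
    Integer id;      // skipped by the mapper, as in CommonFieldSetMapper
    Integer amout;
    String name;
}

class ReflectiveMapper {

    // Create an instance of type and set each field from the text value with the same name.
    static <T> T map(Class<T> type, Map<String, String> values) {
        try {
            T target = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                if (field.isSynthetic() || field.getName().equals("id")) {
                    continue;
                }
                field.setAccessible(true);
                String raw = values.get(field.getName());
                if (field.getType() == Integer.class) {
                    field.set(target, Integer.valueOf(raw));
                } else {
                    // This sketch treats every other field as a String.
                    field.set(target, raw);
                }
            }
            return target;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot map values to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> values = new HashMap<>();
        values.put("amout", "201");
        values.put("name", "tom");
        Bill bill = map(Bill.class, values);
        System.out.println(bill.amout + " " + bill.name); // prints 201 tom
    }
}
```

Because the field names drive the lookup, the column names handed to the tokenizer have to match the bean's field names exactly for this kind of mapper to work.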

4、Processor for the multi-line file

PartitonMultiFileProcessor in detail:

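import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.ItemProcessor;
import org.springframework.stereotype.Component;

@Component
@StepScope
public class PartitonMultiFileProcessor implements ItemProcessor<CreditBill,CreditBill> {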
    @Override
    public CreditBill process(CreditBill item) throws Exception {
        CreditBill creditBill = new CreditBill();
        creditBill.setAcctid(item.getAcctid());
        creditBill.setAddress(item.getAddress());
        creditBill.setAmout(item.getAmout());
        creditBill.setDate(item.getDate());
        creditBill.setName(item.getName());
        return creditBill;
    }
}

5、Writer for the multi-line file

PartitionMultiFileWriter in detail:

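import java.util.List;

import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.ItemWriter;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
@StepScope
public class PartitionMultiFileWriter implements ItemWriter<CreditBill> {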

    @Autowired
    private CreditBillMapper creditBillMapper;

    @Override
    public void write(List<? extends CreditBill> items) throws Exception {
        if (items != null && items.size() > 0) {
            items.stream().forEach(item -> {
                creditBillMapper.insert(item);
            });
        }
    }
}

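At this point, we have finished reading a file whose records span multiple lines.
If you want to look at all of the code above in more detail, head over to GitHub: detailed code for reading multi-line files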
