Pitfalls and Lessons Learned When Unzipping Files in Java (the MALFORMED Exception)

2022-10-18 10:12:25

This article is also posted on my CSDN blog: http://blog.csdn.net/u012881584/article/details/72615481


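Just this week, a tester came to me and said that the requirements documents in production (zip files) could no longer be previewed, and asked me to take a look.

I did not build this feature, so I started by checking the production logs for errors. Sure enough, the backend had thrown an exception.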

java.lang.IllegalArgumentException: MALFORMED
    at java.util.zip.ZipCoder.toString(ZipCoder.java:58)
    ...

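That is roughly what the exception looked like. The frontend could not preview the requirements document because the zip file failed to extract.

I first searched online for the cause of this exception. Everyone said it was an encoding problem and that changing UTF-8 to GBK would fix it.

Then I located the code and found a method, unzip():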

public static void unzip(File zipFile, String descDir) {
    try {
        File pathFile = new File(descDir);
        if (!pathFile.exists()) {
            pathFile.mkdirs();
        }
        ZipFile zip = getZipFile(zipFile);
        for (Enumeration entries = zip.entries(); entries.hasMoreElements(); ) {
            ZipEntry entry = (ZipEntry) entries.nextElement();
            String zipEntryName = entry.getName();
            // `pre` is not declared in this excerpt; it appears to be a path prefix stripped from each entry name
            if (StringUtils.isNotBlank(pre)) {
                zipEntryName = zipEntryName.substring(pre.length());
            }
            InputStream in = zip.getInputStream(entry);
            // As written, this regex replaces literal '*' characters with '/'
            String outPath = (descDir + "/" + zipEntryName).replaceAll("\\*", "/");
            // Create the parent directory if it does not exist yet
            File file = new File(outPath.substring(0, outPath.lastIndexOf('/')));
            if (!file.exists()) {
                file.mkdirs();
            }
            // If the full path is a directory, it was already created above, so there is nothing to extract
            if (new File(outPath).isDirectory()) {
                continue;
            }
            // Log the output path
            LOG.info("Extracting to: {}", outPath);
            OutputStream out = new FileOutputStream(outPath);
            IOUtils.copy(in, out);
            in.close();
            out.close();
        }
        zip.close();
        LOG.info("******************Extraction finished********************");
    } catch (Exception e) {
        LOG.error("[unzip] Failed to extract zip file", e);
    }
}

private static ZipFile getZipFile(File zipFile) throws Exception {
    ZipFile zip = new ZipFile(zipFile, Charset.forName("UTF-8"));
    Enumeration entries = zip.entries();
    while (entries.hasMoreElements()) {
        try {
            // Probe only the first entry: if its name decodes as UTF-8, reopen the file as UTF-8
            entries.nextElement();
            zip.close();
            zip = new ZipFile(zipFile, Charset.forName("UTF-8"));
            return zip;
        } catch (Exception e) {
            // The first name could not be decoded as UTF-8, so fall back to GBK
            zip.close();
            zip = new ZipFile(zipFile, Charset.forName("GBK"));
            return zip;
        }
    }
    return zip;
}
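The fallback above returns after looking at a single entry, so an archive whose first name is plain ASCII is still opened as UTF-8 even if a later name is not. As a minimal sketch (not the code that was running in production), the probe can walk every entry before deciding. `ZipCharsetProbe` and `detect` are hypothetical names introduced here for illustration. Depending on the JDK version, a badly encoded name surfaces either as an `IllegalArgumentException` while iterating or as a `ZipException` when the file is opened, so the sketch catches both.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipFile;

final class ZipCharsetProbe {

    /**
     * Hypothetical helper: returns UTF-8 if every entry name decodes as UTF-8,
     * otherwise returns the fallback charset supplied by the caller.
     */
    static Charset detect(File zipFile, Charset fallback) throws IOException {
        try (ZipFile zip = new ZipFile(zipFile, StandardCharsets.UTF_8)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                // Walk all entries, not just the first one
                entries.nextElement();
            }
            return StandardCharsets.UTF_8;
        } catch (IllegalArgumentException | ZipException e) {
            // A truly corrupt archive also lands here; opening it with the
            // fallback charset will then fail again, which is acceptable.
            return fallback;
        }
    }
}
```

A caller would then open the archive with `new ZipFile(file, ZipCharsetProbe.detect(file, Charset.forName("GBK")))`. This is still a guess about the encoding, not a guarantee: bytes that happen to be valid UTF-8 are accepted even if the archive was written in another charset.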

So I downloaded the zip file from production and debugged it locally. The exception was thrown on line 9 of unzip(), shown here:

ZipEntry entry = (ZipEntry) entries.nextElement();

Following the stack trace from the original log, I then looked at line 58 of ZipCoder:

String toString(byte[] ba, int length) {
    CharsetDecoder cd = decoder().reset();
    int len = (int)(length * cd.maxCharsPerByte());
    char[] ca = new char[len];
    if (len == 0)
        return new String(ca);
    // UTF-8 only for now. Other ArrayDeocder only handles
    // CodingErrorAction.REPLACE mode. ZipCoder uses
    // REPORT mode.
    if (isUTF8 && cd instanceof ArrayDecoder) {
        int clen = ((ArrayDecoder)cd).decode(ba, 0, length, ca);
        if (clen == -1)    // malformed
            throw new IllegalArgumentException("MALFORMED");
        return new String(ca, 0, clen);
    }
    ByteBuffer bb = ByteBuffer.wrap(ba, 0, length);
    CharBuffer cb = CharBuffer.wrap(ca);
    CoderResult cr = cd.decode(bb, cb, true);
    if (!cr.isUnderflow())
        throw new IllegalArgumentException(cr.toString());
    cr = cd.flush(cb);
    if (!cr.isUnderflow())
        throw new IllegalArgumentException(cr.toString());
    return new String(ca, 0, cb.position());
}

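So only the UTF-8 path enters this if branch and throws the MALFORMED error? Sure enough, just as people online said, switching the encoding to GBK makes the error go away.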
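To see why the decoder rejects these names, here is a small sketch using only the JDK's public `CharsetDecoder` API in REPORT mode, which is the mode the ZipCoder comment mentions. `StrictDecodeDemo` and `decodesCleanly` are hypothetical names, and the file name is an invented example, not one taken from the failing archive.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictDecodeDemo {

    /** Returns true if the bytes decode in the given charset without any malformed input. */
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)      // fail instead of substituting U+FFFD
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        // "\u9700\u6c42\u6587\u6863" is the Chinese for "requirements document"
        byte[] nameBytes = "\u9700\u6c42\u6587\u6863.docx".getBytes(gbk);
        System.out.println("as UTF-8: " + decodesCleanly(nameBytes, StandardCharsets.UTF_8));
        System.out.println("as GBK:   " + decodesCleanly(nameBytes, gbk));
    }
}
```

The GBK bytes of a Chinese name are not a valid UTF-8 sequence, so the strict UTF-8 decoder reports an error while the GBK decoder accepts them.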

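ZipCoder is a class from src.zip (the JDK sources). If it performs this check, there must be a reason for it, so simply switching to GBK to make the bug disappear is clearly not a sound fix.

So I needed a different approach. Some zip files in production could still be previewed. I extracted the failing zip, repacked it on my own machine (I use HaoZip), and ran the code above again. It extracted successfully?! Why? I searched online again and found the answer: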

Windows compresses using the system encoding, GB2312, whereas the default encoding on Mac is UTF-8, which is why the names come out garbled.

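Finally I asked the colleague who had uploaded the file. He had created it with WinRAR on Windows (so apparently different compression tools behave differently).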
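One concrete way archives differ is a flag in the ZIP format itself: bit 11 of each entry's general purpose flags marks the entry name as UTF-8. When the bit is not set, the name bytes are in whatever encoding the tool used. The sketch below reads that bit from the first local file header using only the JDK. `ZipUtf8Flag` and `firstEntryDeclaresUtf8` are hypothetical names, and the sketch assumes the file starts directly with a local file header (no self-extractor stub, not an empty archive).

```java
import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

final class ZipUtf8Flag {

    /** Reports whether the first entry sets general purpose bit 11 (names are UTF-8). */
    static boolean firstEntryDeclaresUtf8(File zipFile) throws IOException {
        try (DataInputStream in = new DataInputStream(new FileInputStream(zipFile))) {
            // Local file header: 4-byte signature, 2-byte version, 2-byte flags (little-endian)
            byte[] header = new byte[8];
            in.readFully(header);
            if (header[0] != 'P' || header[1] != 'K' || header[2] != 3 || header[3] != 4) {
                throw new IOException("file does not start with a zip local file header");
            }
            int flags = (header[6] & 0xFF) | ((header[7] & 0xFF) << 8);
            return (flags & 0x0800) != 0;
        }
    }
}
```

A result of `false` does not say which encoding was used, only that the archive does not declare UTF-8, so it is a hint for choosing a fallback and not a full answer.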

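Good, the problem was now mostly pinned down. The next question was how to fix it.

After another round of searching, I finally found this:

Extracting zip files with Apache commons-compress is a pleasure. It solves the cross-platform problem of garbled file names when a zip contains Chinese names: whether the file was compressed on Windows, Mac, or Linux, the names no longer come out garbled after extraction.

At this point the problem was basically solved, so I switched to Apache commons-compress. Here is the code, adapted from the code above.

First, add the dependency to the pom file: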

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-compress</artifactId>
    <version>1.8.1</version>
</dependency>

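import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;
import org.apache.commons.compress.archivers.zip.ZipArchiveEntry;
import org.apache.commons.compress.archivers.zip.ZipArchiveInputStream;
// Assumption: IOUtils and FileUtils come from Apache Commons IO, and LOG is an SLF4J logger
import org.apache.commons.io.FileUtils;
import org.apache.commons.io.IOUtils;

public static void main(String[] args) throws Exception {
    String path = "C:\\Users\\Isuzu\\Desktop\\test.zip";
    unzip(new File(path), "D:\\data");
}

public static void unzip(File zipFile, String descDir) {
    try (ZipArchiveInputStream inputStream = getZipFile(zipFile)) {
        File pathFile = new File(descDir);
        if (!pathFile.exists()) {
            pathFile.mkdirs();
        }
        ZipArchiveEntry entry = null;
        while ((entry = inputStream.getNextZipEntry()) != null) {
            if (entry.isDirectory()) {
                File directory = new File(descDir, entry.getName());
                directory.mkdirs();
            } else {
                File target = new File(descDir, entry.getName());
                // Some archives contain no explicit directory entries, so make sure the parent exists
                target.getParentFile().mkdirs();
                OutputStream os = null;
                try {
                    os = new BufferedOutputStream(new FileOutputStream(target));
                    // Log the output path
                    LOG.info("Extracting to: {}", target.getPath());
                    IOUtils.copy(inputStream, os);
                } finally {
                    IOUtils.closeQuietly(os);
                }
            }
        }
        final File[] files = pathFile.listFiles();
        if (files != null && files.length == 1 && files[0].isDirectory()) {
            // The archive contained a single top-level folder: move its contents up one level
            FileUtils.copyDirectory(files[0], pathFile);
            // Safety check so that nothing outside the expected data directory is ever deleted
            boolean isValid = files[0].getPath().contains("/data/www/");
            if (isValid) {
                FileUtils.forceDelete(files[0]);
            }
        }
        LOG.info("******************Extraction finished********************");
    } catch (Exception e) {
        LOG.error("[unzip] Failed to extract zip file", e);
    }
}

private static ZipArchiveInputStream getZipFile(File zipFile) throws Exception {
    return new ZipArchiveInputStream(new BufferedInputStream(new FileInputStream(zipFile)));
}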
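One more pitfall worth guarding against when entry names are joined onto the target directory: a name such as `../evil.txt` would resolve outside it. The following is a minimal sketch of a check that could run before each file is written. It is not part of the fix described in this post, and `SafeEntryPath` and `resolve` are hypothetical names.

```java
import java.io.File;
import java.io.IOException;

final class SafeEntryPath {

    /** Resolves entryName under destDir and rejects names that escape the directory. */
    static File resolve(File destDir, String entryName) throws IOException {
        File target = new File(destDir, entryName);
        String destPath = destDir.getCanonicalPath();
        String targetPath = target.getCanonicalPath();
        // Canonical paths have ".." segments resolved, so a prefix check is meaningful
        if (!targetPath.equals(destPath) && !targetPath.startsWith(destPath + File.separator)) {
            throw new IOException("entry escapes target directory: " + entryName);
        }
        return target;
    }
}
```

In the unzip() method this would replace the direct `new File(descDir, entry.getName())` calls, for example `File target = SafeEntryPath.resolve(pathFile, entry.getName());`.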

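And with that, the job was done. When I first ran into this problem I searched Baidu extensively, and almost every suggested fix was to change the encoding to GBK, which only treats the symptom and not the cause. That is all for the unzip pitfalls this time. I will keep writing up new ones as I run into them.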
