java - Unable to read a js file from a URL with the correct encoding

I want to read a js file as a string from the URL https://d3c3cq33003psk.cloudfront.net/opentag-67008-473432.js

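I tried several approaches (reading directly from the URL, or downloading the file and then reading it), but I always get unreadable characters such as: ( _.s d :`. …
What I tried:
1. Downloading the file from the URL: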

FileUtils.copyURLToFile(jsUrl, file);

2. Reading from the URL:

    StringBuilder sb = new StringBuilder();
    try {
        URL url = new URL(jsUrl);
        // read text returned by server
        BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream(), "UTF-8"));
        String line;
        while ((line = in.readLine()) != null) {
            sb.append(line).append("\n");
        }
        in.close();
    } catch (Exception e) {
    }
    return sb.toString();

If I download the file manually from the URL (page -> Save as…), it opens fine in Notepad as ordinary UTF-8.
Can anyone help me deal with this strange file?

Solution:

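The response is GZIPped, so the bytes you read are compressed data rather than UTF-8 text. Wrap the stream in a GZIPInputStream (java.util.zip) before decoding it. A browser decompresses the response transparently, which is why the manually saved copy looks fine.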
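
For the first approach (download with FileUtils.copyURLToFile, then read the file), no response header is available any more, so one option is to look at the first two bytes instead: gzip data starts with 0x1f 0x8b. The following is a minimal sketch under that assumption; the class name GzipSniffer and its methods are made up for illustration and are not part of any library.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: detects gzip data by its magic bytes instead of a header.
class GzipSniffer {

    // Returns a decompressing stream if the data starts with 0x1f 0x8b,
    // otherwise returns the (buffered) stream unchanged.
    static InputStream maybeGunzip(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(2);              // remember the start so we can rewind
        int b1 = in.read();
        int b2 = in.read();
        in.reset();              // rewind: GZIPInputStream needs the full header
        boolean gzipped = (b1 == 0x1f && b2 == 0x8b);
        return gzipped ? new GZIPInputStream(in) : in;
    }

    // Reads the whole stream, decompressing if needed, and decodes it as UTF-8.
    static String readUtf8(InputStream raw) throws IOException {
        try (InputStream in = maybeGunzip(raw)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // Test helper: gzip-compresses a string in memory.
    static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
            gz.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }
}
```

With the downloaded file this could be called as GzipSniffer.readUtf8(new FileInputStream(file)). Sniffing the bytes is a heuristic; when the response headers are available, checking Content-Encoding is the more direct test.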

Update

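        // NOTE: cnt is not defined in this snippet. It has to be the URLConnection
        // the stream is read from (see Update 2), because url.openStream()
        // gives no access to the response headers.
        InputStream stream = url.openStream();
        if ("gzip".equalsIgnoreCase(cnt.getHeaderField("Content-Encoding"))) {
            stream = new GZIPInputStream(stream);
        }
        BufferedReader in = new BufferedReader(new InputStreamReader(stream, "UTF-8"));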

Update 2

Using URLConnection:

        URLConnection cnt = url.openConnection();
        InputStream stream = cnt.getInputStream();
        if ("gzip".equalsIgnoreCase(cnt.getHeaderField("Content-Encoding"))) {
            stream = new GZIPInputStream(stream);
        }
        BufferedReader read = new BufferedReader(new InputStreamReader(stream, "UTF-8"));
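
Putting the pieces together, the whole read could look roughly like the sketch below. It assumes the server reports compression through the Content-Encoding header, as in the snippet above; the class name GzipAwareFetcher and the methods fetch and decode are hypothetical names chosen for this example. Compared with the code in the question, it closes the streams with try-with-resources and lets exceptions propagate instead of swallowing them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;

// Hypothetical helper class, not part of any library.
class GzipAwareFetcher {

    // Opens the URL and returns the body as text, un-gzipping it if the
    // server declares "Content-Encoding: gzip".
    static String fetch(String address) throws IOException {
        URLConnection cnt = new URL(address).openConnection();
        try (InputStream raw = cnt.getInputStream()) {
            return decode(raw, cnt.getContentEncoding());
        }
    }

    // Decodes a stream as UTF-8 text, decompressing first when
    // contentEncoding is "gzip" (case-insensitive; null means uncompressed).
    static String decode(InputStream raw, String contentEncoding) throws IOException {
        InputStream stream = "gzip".equalsIgnoreCase(contentEncoding)
                ? new GZIPInputStream(raw)
                : raw;
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append("\n");
            }
        }
        return sb.toString();
    }
}
```

A call such as GzipAwareFetcher.fetch(jsUrl) would then replace the reading code from the question. Like the original loop, this normalizes every line ending to "\n" and appends one after the last line.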