The requests library: response encoding issues

2023-11-22 19:01:03

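When making network requests with Python's requests module, we sometimes find that Chinese content in the response is not displayed correctly. This is usually an encoding problem.

For example: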

import requests

url = 'http://httpbin.org/post'
reqbody = {'张三': 19}
# Send the form data and print the decoded response body
res = requests.post(url=url, data=reqbody)
print(res.text)

The printed output is:

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "\u5f20\u4e09": "19"
  }

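We can see that the Chinese 张三 is not displayed as-is, but as a string of \u escapes: \u5f20\u4e09

If we print this escape sequence as a Python string literal, the output is 张三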
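A small sketch of that check, which also shows how a string literal in Python source differs from the text in the response body. The variable names are only illustrative:

```python
# Written as a normal string literal, Python interprets the escapes itself.
escaped_literal = "\u5f20\u4e09"
print(escaped_literal)

# The response body instead contains a literal backslash followed by "u5f20",
# which is what a raw string models here.
raw_text = r"\u5f20\u4e09"
print(raw_text)

# Two characters versus twelve characters.
print(len(escaped_literal), len(raw_text))
```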

Why does this happen?

Let's look at the encoding of the response:

res.encoding

# Output:
None
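The encoding reported here comes from the response headers. As a rough illustration only, the sketch below reads the charset parameter of a Content-Type value with the standard library. The helper name charset_from_content_type is hypothetical, and this is not how requests implements it (requests applies its own defaults for some content types):

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# No charset parameter, so nothing is declared.
print(charset_from_content_type("application/json"))

# An explicit charset parameter is returned in lower case.
print(charset_from_content_type("text/html; charset=UTF-8"))
```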

Next, let's look at the source of the text property:

@property
def text(self):
    """Content of the response, in unicode.

    If Response.encoding is None, encoding will be guessed using
    ``chardet``.

    The encoding of the response content is determined based solely on HTTP
    headers, following RFC 2616 to the letter. If you can take advantage of
    non-HTTP knowledge to make a better guess at the encoding, you should
    set ``r.encoding`` appropriately before accessing this property.
    """

Roughly, this means:

r.text returns a Unicode string. If the response encoding is None, chardet is called to guess the encoding.

Further down in the source:

if self.encoding is None:
    encoding = self.apparent_encoding

We can see that when the response object's encoding is None, text reads apparent_encoding to determine the encoding of the response.

Let's read that attribute ourselves:

res.apparent_encoding

Output:
ascii
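To make the fallback concrete, here is a simplified sketch under stated assumptions. It is not the requests source: guess_encoding is a hypothetical stand-in for chardet that just tries a few encodings in order, and body is a hand-written stand-in for the response bytes.

```python
def guess_encoding(content: bytes) -> str:
    """Hypothetical stand-in for chardet: try a few encodings in order."""
    for candidate in ("ascii", "utf-8"):
        try:
            content.decode(candidate)
            return candidate
        except UnicodeDecodeError:
            continue
    return "latin-1"  # latin-1 can decode any byte sequence


def decode_text(content: bytes, declared_encoding=None) -> str:
    """Use the declared encoding if there is one, otherwise guess."""
    encoding = declared_encoding
    if encoding is None:
        encoding = guess_encoding(content)
    return str(content, encoding, errors="replace")


# Stand-in for the response bytes: the escapes are literal ASCII characters.
body = b'{"form": {"\\u5f20\\u4e09": "19"}}'

print(guess_encoding(body))
print(decode_text(body))
```

Because every byte in body is in the ASCII range, the guess comes back as ascii and the decoded text still contains the escape sequences.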

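At this point we have found the cause:

The endpoint's response does not declare a character set, so encoding is None

res.text decodes the response bytes into a str, using the encoding guessed by chardet, which is ascii

In Python 3, str is always Unicode: whatever the input and output encodings are, bytes are decoded into Unicode on the way in and encoded into the required format on the way out.

The guess of ascii is accurate, because the body really is pure ASCII. The server serialized the JSON with every non-ASCII character written as a \uXXXX escape, so the Chinese characters arrive as literal escape sequences made of ASCII characters rather than as encoded Chinese bytes. Decoding as ascii leaves those sequences unchanged, which is why 张三 is displayed as \u5f20\u4e09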
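One way to see where such escapes come from is to serialize the same data locally. The sketch below assumes the server behaves like Python's json.dumps with its default settings, which is an assumption about httpbin rather than something verified here:

```python
import json

form = {"张三": "19"}

# By default json.dumps escapes every non-ASCII character (ensure_ascii=True).
escaped_body = json.dumps(form)
print(escaped_body)

# The serialized text is pure ASCII, so an ascii round trip changes nothing.
wire_bytes = escaped_body.encode("ascii")
print(wire_bytes.decode("ascii") == escaped_body)

# With ensure_ascii=False the characters are written as-is instead.
readable_body = json.dumps(form, ensure_ascii=False)
print(readable_body)
```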

What we need to do now is:

Convert the escaped output back into the Chinese text 张三

Using

res.text.encode('latin-1').decode('unicode_escape')

Output:

{
  "args": {},
  "data": "",
  "files": {},
  "form": {
    "张三": "19"
  }
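The same conversion can be tried without a network call. In this sketch, text is a hand-written stand-in for res.text. The second half illustrates a limitation: encode('latin-1') only works while the text contains nothing outside Latin-1, so a body that mixes real Chinese characters with escapes cannot be converted this way.

```python
# Stand-in for res.text: the escapes are literal characters in the string.
text = '{"form": {"\\u5f20\\u4e09": "19"}}'

# str -> bytes via latin-1, then interpret the \uXXXX escapes.
decoded = text.encode("latin-1").decode("unicode_escape")
print(decoded)

# A body that already contains real Chinese characters cannot be
# encoded as latin-1, so the trick fails at the first step.
mixed = '{"name": "张三", "alias": "\\u674e\\u56db"}'
try:
    mixed.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)
```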

Similarly, res.content returns the response body as bytes, and the same problem shows up in this scenario.

We use content.decode('unicode_escape') to turn the escapes into the output we want:

res.content.decode('unicode_escape')

Output:

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "张三": "19"
  }
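A local sketch of the bytes case, with content as a hand-written stand-in for res.content. Since the body is JSON, parsing it is another option worth considering: json.loads interprets the escapes as part of parsing, and requests exposes the same idea as res.json(). The ensure_ascii=False and indent=2 arguments are only there to print the parsed data readably.

```python
import json

# Stand-in for res.content: raw bytes with literal \uXXXX escapes.
content = b'{"form": {"\\u5f20\\u4e09": "19"}}'

# The approach described above: interpret the escapes directly.
unescaped = content.decode("unicode_escape")
print(unescaped)

# Alternative: parse the JSON, which handles the escapes as part of parsing.
parsed = json.loads(content)
print(parsed["form"])

# Re-serialize without escaping for display.
pretty = json.dumps(parsed, ensure_ascii=False, indent=2)
print(pretty)
```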

Summary:

1. str.encode() converts a string into its raw bytes form

2. bytes.decode() converts raw bytes into a string

When you hit a similar encoding problem, first check what type the response content text is

When type(text) is bytes:

text.decode('unicode_escape')

When type(text) is str:

text.encode('latin-1').decode('unicode_escape')
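The two cases can be folded into one small function. This is a minimal sketch: unescape_unicode is a hypothetical helper name, not part of requests, and it assumes the input contains only ASCII or Latin-1 characters plus \uXXXX escapes.

```python
def unescape_unicode(text):
    """Interpret literal \\uXXXX escapes in a str or bytes value."""
    if isinstance(text, bytes):
        return text.decode("unicode_escape")
    if isinstance(text, str):
        # Raises UnicodeEncodeError if text has characters outside Latin-1.
        return text.encode("latin-1").decode("unicode_escape")
    raise TypeError("expected str or bytes, got " + type(text).__name__)


print(unescape_unicode(b'"\\u5f20\\u4e09"'))
print(unescape_unicode('"\\u5f20\\u4e09"'))
```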

码农公寓
