ValueError: Unicode strings with encoding declaration are not supported. Please use bytes input or XML fragments without declaration.
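The error can be reproduced without TensorFlow. Below is a minimal sketch, assuming only that lxml is installed. It uses a shortened copy of the annotation file below. It shows the three cases the message describes: a str with an encoding declaration is rejected, while bytes, or a str without the declaration, are accepted.

```python
from lxml import etree

# Shortened Lara annotation held as a Python 3 str (what text-mode reads return)
xml_str = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<dataset name="Lara_UrbanSeq1"><frame number="6695"/></dataset>'
)

# 1. str + encoding declaration: lxml refuses to parse it
try:
    etree.fromstring(xml_str)
    error = None
except ValueError as exc:
    error = exc
print(type(error).__name__)

# 2. The same text encoded to bytes: lxml honours the declaration itself
root = etree.fromstring(xml_str.encode("utf-8"))
print(root.tag)

# 3. A str with the declaration stripped off (an "XML fragment") is also accepted
fragment = xml_str.split("\n", 1)[1]
print(etree.fromstring(fragment).get("name"))
```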
Contents of the original XML file:
<?xml version="1.0" encoding="UTF-8"?>
<dataset name="Lara_UrbanSeq1" version="0.5" comments="Public database: http://www.lara.prd.fr/benchmarks/trafficlightsrecognition">
<frame number="6695" sec="487" ms="829">
<objectlist>
<object id="18">
<orientation>90</orientation>
<box h="39" w="18" xc="294" yc="34"/>
<appearance>appear</appearance>
<hypothesislist>
<hypothesis evaluation="1.0" id="1" prev="1.0">
<type evaluation="1.0">Traffic Light</type>
<subtype evaluation="1.0">go</subtype>
</hypothesis>
</hypothesislist>
</object>
<object id="19">
<orientation>90</orientation>
<box h="15" w="6" xc="518" yc="123"/>
<appearance>appear</appearance>
<hypothesislist>
<hypothesis evaluation="1.0" id="1" prev="1.0">
<type evaluation="1.0">Traffic Light</type>
<subtype evaluation="1.0">go</subtype>
</hypothesis>
</hypothesislist>
</object>
<object id="20">
<orientation>90</orientation>
<box h="15" w="6" xc="382" yc="122"/>
<appearance>appear</appearance>
<hypothesislist>
<hypothesis evaluation="1.0" id="1" prev="1.0">
<type evaluation="1.0">Traffic Light</type>
<subtype evaluation="1.0">go</subtype>
</hypothesis>
</hypothesislist>
</object>
</objectlist>
<grouplist>
</grouplist>
</frame>
</dataset>
Original code for reading the file:
import numpy as np
import PIL.Image
import tensorflow as tf
from lxml import etree
from object_detection.dataset_tools import tf_record_creation_util
from object_detection.utils import dataset_util
from object_detection.utils import label_map_util
# xml_path = "./Annotations/Abyssinian_12_test.xml"
xml_path = "./Annotations/Lara_test.xml"
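with tf.gfile.GFile(xml_path, 'r') as fid:  # TF 1.x API; tf.io.gfile.GFile in TF 2.x
    xml_str = fid.read()  # 'r' mode: in Python 3 this is a str, not bytes
# xml = etree.fromstring(xml_str)  # raises the ValueError above
# xml = etree.fromstring(xml_str).encode('utf-8')  # also fails: fromstring() still receives the str
xml = etree.fromstring(xml_str.encode('utf-8'))  # pass bytes instead; with this change the bug disappeared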
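# recursive_parse_xml_to_dict returns {root_tag: ...}; the root of the Lara file above is
# <dataset>, not <annotation> as in PASCAL VOC files, so look it up by the actual root tag
data = dataset_util.recursive_parse_xml_to_dict(xml)[xml.tag]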
print(data)
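If TensorFlow is not needed just to inspect an annotation, the file can also be opened in binary mode. Reading it that way returns bytes, so the `.encode('utf-8')` step is unnecessary. The sketch below assumes lxml is installed. `xml_to_dict` and `load_annotation` are hypothetical helpers, loosely modeled on the idea behind `recursive_parse_xml_to_dict` but not a copy of it. Storing attributes (such as the Lara `<box h= w= xc= yc=/>` coordinates) under their own names and leaf text under `'#text'` is an illustrative choice of mine.

```python
import os
import tempfile

from lxml import etree


def xml_to_dict(element):
    """Hypothetical helper: turn an lxml element into nested dicts.

    Leaves become their stripped text, or a dict of their attributes
    (plus '#text' when they also carry text). Repeated <object>
    children are collected into a list.
    """
    if len(element) == 0:
        text = element.text.strip() if element.text else ""
        if not element.attrib:
            return {element.tag: text or None}
        value = dict(element.attrib)
        if text:
            value["#text"] = text
        return {element.tag: value}
    result = dict(element.attrib)  # e.g. frame number, object id
    for child in element:
        child_value = xml_to_dict(child)[child.tag]
        if child.tag == "object":
            result.setdefault("object", []).append(child_value)
        else:
            result[child.tag] = child_value
    return {element.tag: result}


def load_annotation(path):
    # Binary mode yields bytes, so lxml reads the encoding declaration itself
    with open(path, "rb") as fid:
        root = etree.fromstring(fid.read())
    # Key by the actual root tag ('dataset' here, 'annotation' in PASCAL VOC)
    return xml_to_dict(root)[root.tag]


# Usage with a shortened copy of the Lara annotation
sample = b'''<?xml version="1.0" encoding="UTF-8"?>
<dataset name="Lara_UrbanSeq1" version="0.5">
<frame number="6695" sec="487" ms="829">
<objectlist>
<object id="18"><box h="39" w="18" xc="294" yc="34"/><subtype evaluation="1.0">go</subtype></object>
<object id="19"><box h="15" w="6" xc="518" yc="123"/><subtype evaluation="1.0">go</subtype></object>
</objectlist>
</frame>
</dataset>'''

with tempfile.NamedTemporaryFile("wb", suffix=".xml", delete=False) as tmp:
    tmp.write(sample)
data = load_annotation(tmp.name)
os.remove(tmp.name)

objects = data["frame"]["objectlist"]["object"]
print(len(objects), objects[0]["box"])
```

Opening in `'rb'` mode passes lxml the raw bytes. Whatever encoding the declaration names is then applied by the parser, not by Python's text layer.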
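The error message looked as if the encoding itself was unsupported. I assumed the annotated XML file was broken and deleted the corresponding image and annotation file, but the error persisted. What the message actually means is this: in Python 3, reading a file opened in 'r' mode returns a str, and lxml's etree.fromstring() refuses a str that starts with an encoding declaration such as <?xml version="1.0" encoding="UTF-8"?>. It expects bytes instead, or a fragment without the declaration.

Many thanks to the author of this blog post: https://blog.csdn.net/Fkk921912333/article/details/78537726 . Its "Parsing XML strings" section covers the relevant lxml parsing methods. The key line is print(etree.tostring(root, pretty_print=True).decode('utf-8')). I compared that with my own creat_te_record file and changed how the file contents are handed to the parser by adding 'utf-8', i.e. encoding the string to UTF-8 bytes before parsing. After that, the data converted without errors. Since I was using the official TensorFlow conversion script, the concrete change was to replace xml = etree.fromstring(xml_str) with xml = etree.fromstring(xml_str.encode('utf-8')). You can make the same change in your own script.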
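The tostring() call quoted above returns bytes by default, which is why it is followed by .decode('utf-8'). Below is a small sketch of both ways to get printable text, assuming lxml is installed and using a shortened Lara snippet. `encoding="unicode"` is lxml's documented way to ask tostring() for a str directly.

```python
from lxml import etree

xml_str = '<?xml version="1.0" encoding="UTF-8"?><dataset name="Lara_UrbanSeq1"><frame number="6695"/></dataset>'

# Parse from bytes so the encoding declaration is accepted
root = etree.fromstring(xml_str.encode("utf-8"))

# tostring() returns bytes by default; decode them to get printable text
pretty_bytes = etree.tostring(root, pretty_print=True)
pretty_text = pretty_bytes.decode("utf-8")
print(pretty_text)

# Alternatively, ask lxml for a str directly
as_str = etree.tostring(root, encoding="unicode")
print(as_str)
```

Neither output carries an XML declaration, so both strings could be passed back to etree.fromstring() as str without triggering the ValueError.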
————————————————
The text above is reposted from:
Original link: https://blog.csdn.net/mingyang_wang/article/details/82912636