Parsing XML Documents with BeautifulSoup

There are more than 200 XML documents, each of which looks like this:

```xml
<?xml version="1.0"?>
<VehicleInfo>
  <FileHeader>
    <ScaleInfo>
      <SN>H00120030101081526</SN>
      <UserName>盛隆钢铁</UserName>
      <ScaleName>2#</ScaleName>
      <ScaleID>H001</ScaleID>
      <ScaleType>铁水秤开关</ScaleType>
      <WeighingType>铁水秤开关</WeighingType>
      <MeasureTime>2003-01-01 08:15:26</MeasureTime>
      <NodeNumber>2</NodeNumber>
      <WaveFile>20030101081424.wave</WaveFile>
      <VideoFile>20030101081424.wave</VideoFile>
      <Orientation>右方向来车&lt;&lt;&lt;&lt;&lt;&lt;</Orientation>
      <OperatorName>Admin</OperatorName>
      <SUMWeight>0</SUMWeight>
    </ScaleInfo>
  </FileHeader>
  <FileBody>
    <Node>
      <ID>1</ID>
      <_DateTime>2003-1-1 8:14:25</_DateTime>
      <VehicleType />
      <VehicleCardID />
      <Speed>17.5</Speed>
      <Weight>3.12</Weight>
      <FrontAxisWeight>.00</FrontAxisWeight>
      <BackAxisWeight>.00</BackAxisWeight>
      <InsideWheel1>.00</InsideWheel1>
      <OutsideWheel1>.00</OutsideWheel1>
      <InsideWheel2>.00</InsideWheel2>
      <OutsideWheel2>.00</OutsideWheel2>
      <InsideWheel3>.00</InsideWheel3>
      <OutsideWheel3>.00</OutsideWheel3>
      <InsideWheel4>.00</InsideWheel4>
      <OutsideWheel4>.00</OutsideWheel4>
      <Temperature>0123</Temperature>
      <Humidity>0123</Humidity>
      <PIC1>_1.bmp</PIC1>
      <PIC2>_2.bmp</PIC2>
      <PIC3>_3.bmp</PIC3>
      <PIC4>_4.bmp</PIC4>
    </Node>
    <Node>
      <ID>2</ID>
      <_DateTime>2003-1-1 8:14:26</_DateTime>
      <VehicleType />
      <VehicleCardID />
      <Speed>15.8</Speed>
      <Weight>4.77</Weight>
      <FrontAxisWeight>.00</FrontAxisWeight>
      <BackAxisWeight>.00</BackAxisWeight>
      <InsideWheel1>.00</InsideWheel1>
      <OutsideWheel1>.00</OutsideWheel1>
      <InsideWheel2>.00</InsideWheel2>
      <OutsideWheel2>.00</OutsideWheel2>
      <InsideWheel3>.00</InsideWheel3>
      <OutsideWheel3>.00</OutsideWheel3>
      <InsideWheel4>.00</InsideWheel4>
      <OutsideWheel4>.00</OutsideWheel4>
      <Temperature>0123</Temperature>
      <Humidity>0123</Humidity>
      <PIC1>_1.bmp</PIC1>
      <PIC2>_2.bmp</PIC2>
      <PIC3>_3.bmp</PIC3>
      <PIC4>_4.bmp</PIC4>
    </Node>
  </FileBody>
</VehicleInfo>
```

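The task is to extract MeasureTime, NodeNumber, Orientation, and the Weight under each Node. From these, compute the total number of passes and the total number of cars for the left and right directions, the total weight for each direction, and the difference between the two totals. Who knows how long this would take in C#, so let's use Python~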
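The bookkeeping described above (passes, cars, and weight per direction, plus the difference) can be kept separate from the file parsing. The sketch below is a hypothetical refactoring, not the author's code. `summarize` and the `(orientation, node_number, weights)` record shape are assumptions. Like the script below, it treats the right-hand direction as gross weight and the left-hand direction as tare, so the net weight is right minus left.

```python
# Orientation values as they read after the parser decodes &lt; to "<"
# (taken from the sample file).
LEFT = '左方向来车>>>>>>'
RIGHT = '右方向来车<<<<<<'


def summarize(records):
    """Tally passes, cars and weight per direction (hypothetical helper).

    Each record is an assumed (orientation, node_number, weights) tuple,
    i.e. what the script pulls out of one XML file.
    """
    totals = {d: {'passes': 0, 'cars': 0, 'weight': 0.0} for d in (LEFT, RIGHT)}
    for orientation, node_number, weights in records:
        if orientation not in totals:
            continue  # unknown direction: skipped, like the script's if/elif
        t = totals[orientation]
        t['passes'] += 1
        t['cars'] += node_number
        t['weight'] += sum(weights)
    # Net weight = gross (right) minus tare (left), as in the original script
    net = totals[RIGHT]['weight'] - totals[LEFT]['weight']
    return totals, net


totals, net = summarize([(RIGHT, 2, [3.12, 4.77]), (LEFT, 1, [1.5])])
print(f"net weight: {net:.2f}")  # net weight: 6.39
```

Keeping the tally in a pure function makes it easy to check by hand against a few files before running it over all 200.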

```python
#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'liulixiang'

import glob

from bs4 import BeautifulSoup

# Orientation values exactly as stored in the files; the parser decodes
# "&lt;" to "<", so the right-hand value ends in literal "<" characters.
LEFT = '左方向来车>>>>>>'
RIGHT = '右方向来车<<<<<<'

# Per-direction totals: number of cars, number of passes, summed weight
left, left_times, left_weight = 0, 0, 0.0
right, right_times, right_weight = 0, 0, 0.0

files = sorted(glob.glob(r'E:\工作\work-documents\2013凤矿计量系统\Debug\WY.WeightBridge.Data\*.xml'))
for index, filename in enumerate(files, 1):
    with open(filename, encoding='utf-8') as f:
        file = f.read()
    soup = BeautifulSoup(file, 'xml')  # 'xml' selects lxml's XML parser, which keeps tag case
    orientation = soup.Orientation.string
    node_number = int(soup.NodeNumber.string)
    print(index, 'Time:', soup.MeasureTime.string, 'Cars:', node_number, 'Direction:', orientation)
    # findChildren is an alias of find_all: it returns only the <Node> Tag objects
    for node in soup.FileBody.findChildren('Node'):
        print('\tNo.:', node.ID.string, 'Weight:', node.Weight.string)
        if orientation == LEFT:
            left_weight += float(node.Weight.string)
        elif orientation == RIGHT:
            right_weight += float(node.Weight.string)
    if orientation == LEFT:
        left += node_number
        left_times += 1
    elif orientation == RIGHT:
        right += node_number
        right_times += 1
    print()  # blank line between files

print('Left direction: {} passes, {} cars, total tare weight {:.2f}'.format(left_times, left, left_weight))
print('Right direction: {} passes, {} cars, total gross weight {:.2f}'.format(right_times, right, right_weight))
print('Total net weight: %.2f' % (right_weight - left_weight))
```
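BeautifulSoup's 'xml' mode needs lxml installed. If that is inconvenient, the standard library's `xml.etree.ElementTree` can read the same fields. This swaps in a different parser rather than BeautifulSoup. The `read_record` helper and the trimmed `SAMPLE` document are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of one sample document (only the fields the script reads).
SAMPLE = """<?xml version="1.0"?>
<VehicleInfo>
  <FileHeader>
    <ScaleInfo>
      <MeasureTime>2003-01-01 08:15:26</MeasureTime>
      <NodeNumber>2</NodeNumber>
      <Orientation>右方向来车&lt;&lt;&lt;&lt;&lt;&lt;</Orientation>
    </ScaleInfo>
  </FileHeader>
  <FileBody>
    <Node><ID>1</ID><Weight>3.12</Weight></Node>
    <Node><ID>2</ID><Weight>4.77</Weight></Node>
  </FileBody>
</VehicleInfo>"""


def read_record(xml_text):
    """Return (measure_time, node_number, orientation, weights) for one file."""
    root = ET.fromstring(xml_text)
    info = root.find('FileHeader/ScaleInfo')
    measure_time = info.findtext('MeasureTime')
    node_number = int(info.findtext('NodeNumber'))
    orientation = info.findtext('Orientation')  # &lt; is already decoded to '<'
    weights = [float(n.findtext('Weight')) for n in root.iterfind('FileBody/Node')]
    return measure_time, node_number, orientation, weights


print(read_record(SAMPLE))
```

To process the real files, read each one with `encoding='utf-8'` and pass its text to `read_record`. Unlike BeautifulSoup's attribute access, ElementTree paths are explicit, so a misspelled tag returns `None` from `find` rather than silently matching elsewhere in the tree.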

Notes:

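1. `soup = BeautifulSoup(file, 'xml')`: BeautifulSoup parses markup as HTML by default, so you must request the XML parser explicitly. An HTML parser also lowercases tag names, so lookups such as `soup.MeasureTime` would find nothing.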

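2. BeautifulSoup depends on lxml for XML parsing. On Windows, you can download a prebuilt binary of the lxml library (lxml also publishes Windows wheels, so `pip install lxml` usually works).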

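3. In BeautifulSoup, `.children` is an iterator (not a method) over all direct children, including NavigableString text nodes such as the whitespace between tags. `findChildren` (the older name for `find_all`) returns only Tag objects.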
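Notes 1 and 3 can be checked with a small experiment. This sketch uses the `html.parser` backend that ships with BeautifulSoup, so it runs without lxml. The fragment is a hypothetical excerpt shaped like the sample's FileBody. The experiment shows three things:

- The HTML parser lowercases tag names, so `soup.FileBody` finds nothing.
- `.children` yields whitespace strings alongside tags.
- `find_all` returns tags only.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment shaped like the FileBody section of the sample files.
FRAGMENT = """<FileBody>
  <Node><ID>1</ID><Weight>3.12</Weight></Node>
  <Node><ID>2</ID><Weight>4.77</Weight></Node>
</FileBody>"""

# html.parser lowercases tag names, so a CamelCase lookup finds nothing.
html_soup = BeautifulSoup(FRAGMENT, 'html.parser')
print(html_soup.FileBody)  # None

body = html_soup.filebody  # the lowercased name does match
# .children iterates over direct children: Tags *and* the whitespace
# strings between them.
kinds = [type(c).__name__ for c in body.children]
print(kinds)  # ['NavigableString', 'Tag', 'NavigableString', 'Tag', 'NavigableString']

# find_all (findChildren is its older alias) returns Tag objects only.
nodes = body.find_all('node')
print([str(n.weight.string) for n in nodes])  # ['3.12', '4.77']
```

With `BeautifulSoup(markup, 'xml')` (lxml installed), the original CamelCase names such as `FileBody` and `Node` are preserved, which is why the script can use `soup.MeasureTime` directly.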

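The world is full of temptations (all sorts of programming languages). Resist them: don't use whatever language is popular today (yes, Go, I'm talking about you) and then switch to whatever is popular tomorrow. People should master languages, not be mastered by them. A reminder to myself!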
