I have more than 200 XML documents, each of which looks roughly like this:
<?xml version="1.0"?>
<VehicleInfo>
  <FileHeader>
    <ScaleInfo>
      <SN>H00120030101081526</SN>
      <UserName>盛隆钢铁</UserName>
      <ScaleName>2#</ScaleName>
      <ScaleID>H001</ScaleID>
      <ScaleType>铁水秤开关</ScaleType>
      <WeighingType>铁水秤开关</WeighingType>
      <MeasureTime>2003-01-01 08:15:26</MeasureTime>
      <NodeNumber>2</NodeNumber>
      <WaveFile>20030101081424.wave</WaveFile>
      <VideoFile>20030101081424.wave</VideoFile>
      <Orientation>右方向来车&lt;&lt;&lt;&lt;&lt;&lt;</Orientation>
      <OperatorName>Admin</OperatorName>
      <SUMWeight>0</SUMWeight>
    </ScaleInfo>
  </FileHeader>
  <FileBody>
    <Node>
      <ID>1</ID>
      <_DateTime>2003-1-1 8:14:25</_DateTime>
      <VehicleType />
      <VehicleCardID />
      <Speed>17.5</Speed>
      <Weight>3.12</Weight>
      <FrontAxisWeight>.00</FrontAxisWeight>
      <BackAxisWeight>.00</BackAxisWeight>
      <InsideWheel1>.00</InsideWheel1>
      <OutsideWheel1>.00</OutsideWheel1>
      <InsideWheel2>.00</InsideWheel2>
      <OutsideWheel2>.00</OutsideWheel2>
      <InsideWheel3>.00</InsideWheel3>
      <OutsideWheel3>.00</OutsideWheel3>
      <InsideWheel4>.00</InsideWheel4>
      <OutsideWheel4>.00</OutsideWheel4>
      <Temperature>0123</Temperature>
      <Humidity>0123</Humidity>
      <PIC1>_1.bmp</PIC1>
      <PIC2>_2.bmp</PIC2>
      <PIC3>_3.bmp</PIC3>
      <PIC4>_4.bmp</PIC4>
    </Node>
    <Node>
      <ID>2</ID>
      <_DateTime>2003-1-1 8:14:26</_DateTime>
      <VehicleType />
      <VehicleCardID />
      <Speed>15.8</Speed>
      <Weight>4.77</Weight>
      <FrontAxisWeight>.00</FrontAxisWeight>
      <BackAxisWeight>.00</BackAxisWeight>
      <InsideWheel1>.00</InsideWheel1>
      <OutsideWheel1>.00</OutsideWheel1>
      <InsideWheel2>.00</InsideWheel2>
      <OutsideWheel2>.00</OutsideWheel2>
      <InsideWheel3>.00</InsideWheel3>
      <OutsideWheel3>.00</OutsideWheel3>
      <InsideWheel4>.00</InsideWheel4>
      <OutsideWheel4>.00</OutsideWheel4>
      <Temperature>0123</Temperature>
      <Humidity>0123</Humidity>
      <PIC1>_1.bmp</PIC1>
      <PIC2>_2.bmp</PIC2>
      <PIC3>_3.bmp</PIC3>
      <PIC4>_4.bmp</PIC4>
    </Node>
  </FileBody>
</VehicleInfo>
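The task is to extract MeasureTime, NodeNumber and Orientation from each file, plus the Weight under every Node, and then compute the total number of passes and the total number of nodes for the left and right directions, along with the total weight per direction and the difference between the two. In C# there is no telling how long the code would get, so let's just use Python.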
#!/usr/bin/env python
# -*- coding: utf-8 -*-
__author__ = 'liulixiang'

import glob

from bs4 import BeautifulSoup

# The Orientation values exactly as they appear in the data files.
LEFT = '左方向来车>>>>>>'
RIGHT = '右方向来车<<<<<<'

# Per direction: node count, number of passes, accumulated weight.
left, left_times, left_weight = 0, 0, 0.0
right, right_times, right_weight = 0, 0, 0.0

files = sorted(glob.glob(r'E:\工作\work-documents\2013凤矿计量系统\Debug\WY.WeightBridge.Data\*.xml'))
for index, filename in enumerate(files, 1):
    # Read the whole file and close it again right away.
    with open(filename, encoding='utf-8') as f:
        file = f.read()
    soup = BeautifulSoup(file, 'xml')
    orientation = soup.Orientation.string
    print(index, 'time:', soup.MeasureTime.string,
          'nodes:', int(soup.NodeNumber.string),
          'direction:', orientation)

    # Add every node's weight to the total for this file's direction.
    for node in soup.FileBody.findChildren('Node'):
        print('\tID:', node.ID.string, 'weight:', node.Weight.string)
        if orientation == LEFT:
            left_weight += float(node.Weight.string)
        elif orientation == RIGHT:
            right_weight += float(node.Weight.string)

    # Count the pass and its nodes once per file.
    if orientation == LEFT:
        left += int(soup.NodeNumber.string)
        left_times += 1
    elif orientation == RIGHT:
        right += int(soup.NodeNumber.string)
        right_times += 1
    print('\n')

print('Left direction: {} passes, {} nodes, total tare weight {:.2f}'.format(left_times, left, left_weight))
print('Right direction: {} passes, {} nodes, total gross weight {:.2f}'.format(right_times, right, right_weight))
print('Total net weight: %.2f' % (right_weight - left_weight))
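Notes:
1. soup = BeautifulSoup(file, 'xml'): BeautifulSoup parses HTML by default, so you have to ask for the XML parser explicitly when parsing XML.
2. BS relies on lxml to parse XML. On Windows you can download a prebuilt binary version of the lxml library.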
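3. Iterating over BS's .children also yields NavigableString objects (for example the whitespace between tags), whereas findChildren('Node') returns only tags.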
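On note 2: if installing lxml is inconvenient, the same tally can probably be done with the standard library alone. The following is only a minimal sketch using xml.etree.ElementTree instead of BeautifulSoup, so it is a swapped-in technique rather than the script above. The function name summarize, the dictionary layout and the two trimmed-down sample documents are made up for illustration; only the element names and the Orientation strings come from the data shown earlier.

```python
import xml.etree.ElementTree as ET

# Orientation values as they appear after parsing (taken from the data above).
LEFT = '左方向来车>>>>>>'
RIGHT = '右方向来车<<<<<<'


def summarize(documents):
    """Tally passes, nodes and weight per direction from XML strings.

    `documents` is any iterable of XML text (hypothetical helper).
    """
    totals = {
        LEFT: {'times': 0, 'nodes': 0, 'weight': 0.0},
        RIGHT: {'times': 0, 'nodes': 0, 'weight': 0.0},
    }
    for text in documents:
        root = ET.fromstring(text)  # root is the <VehicleInfo> element
        orientation = root.findtext('FileHeader/ScaleInfo/Orientation')
        if orientation not in totals:
            continue  # unknown direction: ignored, like in the BS script
        entry = totals[orientation]
        entry['times'] += 1
        entry['nodes'] += int(root.findtext('FileHeader/ScaleInfo/NodeNumber'))
        # findall returns elements only, so no text nodes to filter out.
        for node in root.findall('FileBody/Node'):
            entry['weight'] += float(node.findtext('Weight'))
    return totals


# Two trimmed-down, made-up documents in the same shape as the real files.
SAMPLE_RIGHT = """<?xml version="1.0"?>
<VehicleInfo>
  <FileHeader><ScaleInfo>
    <NodeNumber>2</NodeNumber>
    <Orientation>右方向来车&lt;&lt;&lt;&lt;&lt;&lt;</Orientation>
  </ScaleInfo></FileHeader>
  <FileBody>
    <Node><ID>1</ID><Weight>3.12</Weight></Node>
    <Node><ID>2</ID><Weight>4.77</Weight></Node>
  </FileBody>
</VehicleInfo>"""

SAMPLE_LEFT = """<?xml version="1.0"?>
<VehicleInfo>
  <FileHeader><ScaleInfo>
    <NodeNumber>1</NodeNumber>
    <Orientation>左方向来车&gt;&gt;&gt;&gt;&gt;&gt;</Orientation>
  </ScaleInfo></FileHeader>
  <FileBody>
    <Node><ID>1</ID><Weight>2.50</Weight></Node>
  </FileBody>
</VehicleInfo>"""

if __name__ == '__main__':
    result = summarize([SAMPLE_RIGHT, SAMPLE_LEFT])
    for name, key in (('Left', LEFT), ('Right', RIGHT)):
        entry = result[key]
        print('{}: {} passes, {} nodes, weight {:.2f}'.format(
            name, entry['times'], entry['nodes'], entry['weight']))
    print('Net weight: %.2f' % (result[RIGHT]['weight'] - result[LEFT]['weight']))
```

For the real files you would read each path returned by glob and pass the contents to summarize. Whether this is any nicer than the BeautifulSoup version is a matter of taste; the main difference is that nothing has to be installed.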
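There are too many temptations in this world (programming languages of every stripe). Please resist them: don't use this language today because it is popular (Go, I'm looking at you) and that one tomorrow because it is popular. People should be the masters of their languages, not the other way round. A reminder to myself!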