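I have a graphing/analysis problem I can't quite get my head around. I can brute-force it, but that is too slow. Maybe someone has a better idea, or knows of a fast Python library?
I have two time series data sets (x, y) that I want to aggregate (and subsequently plot). The issue is that the x values in the series don't match up, and I really don't want to resort to duplicating values into time bins.
So, given these two series: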
S1: (1;100) (5;100) (10;100)
S2: (4;150) (5;100) (18;150)
Adding them together should result in:
ST: (1;100) (4;250) (5;200) (10;200) (18;250)
Logic:
x=1 s1=100, s2=None, sum=100
x=4 s1=100, s2=150, sum=250 (note s1 value from previous value)
x=5 s1=100, s2=100, sum=200
x=10 s1=100, s2=100, sum=200
x=18 s1=100, s2=150, sum=250
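My current thinking is to iterate over a sorted list of keys (x), keep the previous value for each series, and check whether each set has a new y for that x.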
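The idea described above can be sketched roughly as follows. This is only a minimal illustration, not a tuned implementation: the name `sum_series` is hypothetical, and it assumes a series contributes 0 before its first point (which matches the `s2=None` row in the logic table).

```python
def sum_series(*series):
    """Sum step-wise series of (x, y) points, carrying each series' last y forward."""
    lookups = [dict(s) for s in series]                # x -> y for each series
    xs = sorted(set(x for s in series for x, _ in s))  # every x that appears anywhere
    last = [0] * len(series)                           # assumed: 0 before a series' first point
    total = []
    for x in xs:
        for i, lookup in enumerate(lookups):
            if x in lookup:                            # this series has a new y at x
                last[i] = lookup[x]
        total.append((x, sum(last)))
    return total


s1 = [(1, 100), (5, 100), (10, 100)]
s2 = [(4, 150), (5, 100), (18, 150)]
print(sum_series(s1, s2))
# [(1, 100), (4, 250), (5, 200), (10, 200), (18, 250)]
```

Each x is visited once and each series is checked with a dictionary lookup, so the cost is roughly the number of distinct x values times the number of series, plus the sort.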
Any ideas would be greatly appreciated!
Solution:
Here's another way to do it, putting more of the behaviour on the individual data streams:
class DataStream(object):
    """Wraps one series of (x, y) points and remembers the most recent y."""

    def __init__(self, iterable):
        self.iterable = iter(iterable)
        self.next_item = (None, 0)  # placeholder so the priming step sets current_y to 0
        self.next_x = None
        self.current_y = 0
        next(self)  # prime the stream: load the first point into next_item

    def __next__(self):
        if self.next_item is None:
            raise StopIteration()
        # The pending point becomes current, then we look ahead to the one after it.
        self.current_y = self.next_item[1]
        try:
            self.next_item = next(self.iterable)
            self.next_x = self.next_item[0]
        except StopIteration:
            self.next_item = None
            self.next_x = None
        return self.next_item

    def __iter__(self):
        return self


class MergedDataStream(object):
    """Yields (x, sum of current y values) for every x that occurs in any stream."""

    def __init__(self, *iterables):
        self.streams = [DataStream(i) for i in iterables]
        self.outseq = []

    def __next__(self):
        xs = [stream.next_x for stream in self.streams if stream.next_x is not None]
        if not xs:
            raise StopIteration()
        next_x = min(xs)
        current_y = 0
        for stream in self.streams:
            # Only streams that have a point at next_x advance; the rest keep their last y.
            if stream.next_x == next_x:
                next(stream)
            current_y += stream.current_y
        self.outseq.append((next_x, current_y))
        return self.outseq[-1]

    def __iter__(self):
        return self


if __name__ == '__main__':
    seqs = [
        [(1, 100), (5, 100), (10, 100)],
        [(4, 150), (5, 100), (18, 150)],
    ]
    sm = MergedDataStream(*seqs)
    for x, y in sm:
        print("%2s: %s" % (x, y))
    print(sm.outseq)
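For comparison, the same merge can probably be written more compactly with the standard library's `heapq.merge` and `itertools.groupby`. This is a sketch under assumptions rather than a drop-in replacement: the names `merge_sum` and `_tagged` are hypothetical, each input series must already be sorted by x, and a series is again assumed to contribute 0 before its first point.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def _tagged(index, points):
    """Yield (x, index, y) so each merged point remembers which series it came from."""
    for x, y in points:
        yield x, index, y


def merge_sum(*series):
    """Lazily yield (x, total) pairs; every input must already be sorted by x."""
    last = [0] * len(series)  # assumed: a series contributes 0 until its first point
    merged = heapq.merge(*(_tagged(i, s) for i, s in enumerate(series)))
    # Points that share an x are grouped so they produce a single output row.
    for x, group in groupby(merged, key=itemgetter(0)):
        for _, i, y in group:
            last[i] = y
        yield x, sum(last)


s1 = [(1, 100), (5, 100), (10, 100)]
s2 = [(4, 150), (5, 100), (18, 150)]
print(list(merge_sum(s1, s2)))
# [(1, 100), (4, 250), (5, 200), (10, 200), (18, 250)]
```

Because `heapq.merge` consumes its inputs lazily, this variant does not need to hold all the points in memory at once, which may matter for long series.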