c# – Rx:配对窗口持续时间与窗口内引发的事件数

我想使用Rx来计算2个事件流的统计数据.

输入流

//    stream1    --A---B----A-B-----A-----B----A--B|
//    stream2    ----X---X-----------X--X---XX---X--X|

中级结果

窗口持续时间,其中窗口在A上打开并在B上关闭以及在这些窗口内引发的stream2事件的计数

//    result     ------1------0-----------2-------1|    <-- count of stream2 events in [A-B] window
//                     4      2           6       3     <-- paired with window [A-B] window duration

最后结果

通过stream2事件的计数和每个组的返回窗口持续时间统计对中间结果进行分组,例如平均,最小和最大窗口持续时间

//    output     -----------------------------------0    1     2|    <-- count of stream2 events in [A-B] window
//                                                  2   3.5    6     <-- average [A-B] window duration for that count of stream2 events.

Rx查询

public enum EventKind
{
    START,
    STOP,
    OTHER
};

public struct Event1
{
    public EventKind  Kind;
    public DateTime   OccurenceTime;
};

var merge = stream1.Merge(stream2.Select(x => new Event1
                                        {
                                            Kind = EventKind.OTHER,
                                            OccurenceTime = x
                                        }))
           .RemoveDisorder(x => x.OccurenceTime, new TimeSpan(0,0,10));

var shared = merge.Publish().RefCount();

// Windows open on START and close on STOP
var windows = shared.Window(
            shared.Where(x => x.Kind == EventKind.START),
            opening => shared.Where(x => x.Kind == EventKind.STOP));

// For each window we're interested in the duration of the window along with
// the count of OTHER events that were raised inside the window
//
var pairs = windows.Select(window => new 
        {
            Duration = window
                .Where(x=>x.Kind != EventKind.OTHER) // we only want START & STOP events, not OTHER events
                .Buffer(2,1)                         // could use buffer(2) but this is more reliable if stream1 sometimes has multiple consecutive START events.
                .Where(x => x.Count == 2 && x[1].Kind == EventKind.STOP && x[0].Kind == EventKind.START)
                .Select(x => x[1].OccurenceTime - x[0].OccurenceTime), // compute the latency

            EventCount = window.Where(x=>x.Kind == EventKind.OTHER).Count() // count the number of OTHER events in the window
        }
    );

我想简化可观察的类型

>来自IObservable< {IObservable< int>,IObservable< TimeSpan>}>
>到IObservable< {int,TimeSpan}>

这应该是可能的,因为每个窗口只有1个持续时间和1个OTHER事件计数.

此时,定义通过EventCount对窗口进行分组的输出查询并选择窗口持续时间的统计信息(例如每组的最小值,最大值,平均值)应该不会太困难.

var result = pairs
        .GroupBy(pair => pair.EventCount)
        .Select(g => new 
            {
                EventCount = g.Key,
                Min = g.Min(x => x.Duration),
                Avg = g.Average(x => x.Duration),
                Max = g.Max(x => x.Duration)
            });

RemoveDisorder是一个扩展方法,我用它来对OccurenceTime上合并的obersvable的结果进行排序.我需要它,因为我的输入流不是直播事件(如本例所示),而是通过Tx从日志中读取.并且2个排序流的合并输出本身不再排序.

解决方法:

使用Rx一段时间后,您可能遇到的常见情况是启动和停止事件.要正确处理它有几种方法,它将取决于您的要求.

如果您的问题只是使用数据投影检查@Brandon解决方案,关键是以不同的方式进行组合,例如使用SelectMany.如果你想保留Select运算符,则必须返回IObservable< T>输入投影.

无论如何,我认为你的作文总体上有问题,我将尝试在下面说明.

像你一样使用Window运算符,如果在开始流中发生多个连续事件,它将创建多个组.在您的代码中可能会出现问题,因为主事件流将在下一个事件发生时多次处理.

这个例子只是为了向您展示许多组的创建:

var subject = new Subject<Event1>();
var shared = subject.Publish().RefCount();
var start = shared.Where(a => a.Kind == EventKind.START);
var stop = shared.Where(a => a.Kind == EventKind.STOP);
var values = shared.Where(a => a.Kind == EventKind.OTHER);

values.Window(start, a => stop).Subscribe(inner => 
 { 
    Console.WriteLine("New Group Started");
    inner.Subscribe(next => 
                    { 
                        Console.WriteLine("Next = "+ next.Kind + " | " + next.OccurenceTime.ToLongTimeString());
                    }, () => Console.WriteLine("Group Completed"));
 });

subject.OnNext(new Event1 { Kind = EventKind.START, OccurenceTime = DateTime.Now });
subject.OnNext(new Event1 { Kind = EventKind.START, OccurenceTime = DateTime.Now.AddSeconds(1) });
subject.OnNext(new Event1 { Kind = EventKind.OTHER, OccurenceTime = DateTime.Now.AddSeconds(2) });
subject.OnNext(new Event1 { Kind = EventKind.STOP, OccurenceTime = DateTime.Now.AddSeconds(3) });

结果:

New Group Started
New Group Started
Next = OTHER | 4:55:46 PM
Next = OTHER | 4:55:46 PM
Group Completed
Group Completed

也许这种行为是可取的,否则将是必要的其他组成.为了“驯服”事件流,我看到了三种不同的方法:

>仅使用第一个启动事件开始计算,忽略其他启动而不会停止. (例如:Create observable and consume only between events).
>使用最新的启动事件计算流,在这种情况下,先前的流将被内部的组合忽略(可能使用Switch运算符).
>独立计算,考虑到每个开始事件都需要一个结束事件,允许在合成中创建许多组流(对我来说没有任何意义,除非你有一个匹配起始和结束事件的标识符).

要实现这些选项中的一个,通常,您有许多不同的方法来实现它.如果我理解你的问题,你正在寻找选项ONE.现在回答:

>保持窗口,代码太多:

IObservable<Event1> sx= GetEventStream();
var shared = sx.Publish().RefCount();
var start = shared.Where(a => a.Kind == EventKind.START);
var stop = shared.Where(a => a.Kind == EventKind.STOP);
shared.Window(start, a => stop)
    .Select(sx => 
            sx.Publish(b =>
                        b.Take(1)
                        .Select(c => 
                        {
                            var final = b.LastOrDefaultAsync().Select(a => a.OccurenceTime);
                            var comp = b.Where(d => d.Kind == EventKind.OTHER).Count();
                            return final.Zip(comp, (d,e) => new { Count = e, Time = d - c.OccurenceTime });
                        })
                        .Switch()   // whatever operator here there's no difference
                    )               // because is just 1
            )
    .Concat()
    .Subscribe(next => 
    { 
        Console.WriteLine("Count = "+ next.Count + " | " + next.Time);
    });

>使用GroupByUntil,一种“黑客”,但这是我的偏好:

IObservable<Event1> sx = GetEventStream();
var shared = sx.Publish().RefCount();       
var stop = shared.Where(a => a.Kind == EventKind.STOP).Publish().RefCount();
var start = shared.Where(a => a.Kind == EventKind.START);       
start.GroupByUntil(a => Unit.Default, a => stop)
        .Select(newGroup => 
        { 
            var creation = newGroup.Take(1);
            var rightStream = shared.Where(a => a.Kind == EventKind.OTHER)
                                    .TakeUntil(newGroup.LastOrDefaultAsync())
                                    .Count();
            var finalStream = stop.Take(1);

            return creation.Zip(rightStream, finalStream, (a,b,c) => new { Count = b, Time = c.OccurenceTime - a.OccurenceTime });
        })
        .Concat()
        .Subscribe(next => 
        { 
            Console.WriteLine("Count = "+ next.Count + " | " + next.Time);
        });

>不使用带有Take的组/窗口(1)在组合的最后添加Repeat运算符,但由于“重新订阅”(因为它是冷或热可观察的,将取决于它)可能会导致不希望的行为,和调度程序使用).
>创建一个声明自己的扩展方法的自定义实现,并不像看起来那么难,可能是最好的选择,但需要一段时间才能实现.

您的组合的另一个问题是无法获得统计数据,因为您无法在GroupBy运算符中完成每个新组.

我建议重新考虑你的方法,可能解决方案是将时间结合起来.有关统计数据和Rx的更多信息,请检查:
http://www.codeproject.com/Tips/853256/Real-time-statistics-with-Rx-Statistical-Demo-App

上一篇:优化MySQL聚合查询


下一篇:python – Django Aggregation,计数总和