FLINK实例(134):自定义时间和窗口的操作符(15)Flink 时间窗口的起始时间

来源:https://blog.csdn.net/Accelerating/article/details/111556440

Flink 时间窗口的起始时间

话不多说,直接上手今天的主题,探索一个容易让人忽略和困惑的问题:Flink 时间窗口的起始时间
就以最简单的demo为例,定义一个步长为5s的滚动窗口,就以这个简单的入口进入Flink的源码开始探索

timeWindow(Time.seconds(5))

1)timeWindow的定义

public WindowedStream<T, KEY, TimeWindow> timeWindow(Time size) {
        if (environment.getStreamTimeCharacteristic() == TimeCharacteristic.ProcessingTime) {
            return window(TumblingProcessingTimeWindows.of(size));
        } else {
            return window(TumblingEventTimeWindows.of(size));
        }
    }

这段源码比较贴近大众,就是一个普通的判断,而且environment.getStreamTimeCharacteristic()这个东西我们再熟悉不过了,判断当前是ProcessingTime还是EventTime,当然除了EventTime还有IngestionTime,但是比较常用的还是ProcessingTime和EventTime,所以我们就非ProcessingTime即EventTime这样理解,因为生产环境比较常用的是EventTime,所以我们就进入else的代码继续查看

2)TumblingEventTimeWindows的定义

window(TumblingEventTimeWindows.of(size)) 定义了一个滚动窗口TumblingEventTimeWindows,下面给出了TumblingEventTimeWindows.of(size)的定义和TumblingEventTimeWindows构造函数。

public static TumblingEventTimeWindows of(Time size) {
        return new TumblingEventTimeWindows(size.toMilliseconds(), 0);
    }

 protected TumblingEventTimeWindows(long size, long offset) {
        if (Math.abs(offset) >= size) {
            throw new IllegalArgumentException("TumblingEventTimeWindows parameters must satisfy abs(offset) < size");
        }

        this.size = size;
        this.offset = offset;
    }

可以看到通过of(size)方法创建了一个offset为0,size为5的TumblingEventTimeWindows对象,然后就是我们需要的核心方法,assignWindows,窗口分配元素的核心方法

@Override
    public Collection<TimeWindow> assignWindows(Object element, long timestamp, WindowAssignerContext context) {
        if (timestamp > Long.MIN_VALUE) {
            // Long.MIN_VALUE is currently assigned when no timestamp is present
            long start = TimeWindow.getWindowStartWithOffset(timestamp, offset, size);
            return Collections.singletonList(new TimeWindow(start, start + size));
        } else {
            throw new RuntimeException("Record has Long.MIN_VALUE timestamp (= no timestamp marker). " +
                    "Is the time characteristic set to 'ProcessingTime', or did you forget to call " +
                    "'DataStream.assignTimestampsAndWatermarks(...)'?");
        }
    }

其中,assignWindows方法调用TimeWindow.getWindowStartWithOffset()方法,获取window窗口开始时间,我们再看该方法的定义:

/**
     * 方法用来获取给定时间戳timestamp下的窗口开始时间
     *
     * @param timestamp epoch millisecond to get the window start.
     * @param offset The offset which window start would be shifted by.
     * @param windowSize The size of the generated windows.
     * @return window start
     */
    public static long getWindowStartWithOffset(long timestamp, long offset, long windowSize) {
        return timestamp - (timestamp - offset + windowSize) % windowSize;
    }

getWindowStartWithOffset()方法就是获取窗口的开始时间,该方法的就是一行代码,也是其核心。

上一段代码小测一下

case class TestData(timestamp:Long,word:String)
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment
    env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime)
    // 便于输出,设置并行度为1
    env.setParallelism(1)
    val socketStream = env.socketTextStream("localhost",9999)
    val windowedStream = socketStream
        .map(row=>TestData(row.split(" ")(0).toLong,row.split(" ")(0)))
        .assignTimestampsAndWatermarks(new  BoundedOutOfOrdernessTimestampExtractor[TestData](Time.seconds(1)) {
            override def extractTimestamp(element: TestData): Long 
            = element.timestamp * 1000
           })
  .keyBy(_.word)
  .timeWindow(Time.seconds(5))
   .reduce((r1,r2)=>TestData(r1.timestamp,"hello "+r2.word))
windowedStream.print("window output is")
socketStream.print("input data is")
env.execute("window_test_job")
}

测试数据
1599623712 word(2020-09-09 11:55:12)
1599623715 word(2020-09-09 11:55:15)
根据公式算出开始时间:
1599623712 - (1599623712 - 0 + 5) % 5 == 1599623710
也就是开始时间为 1599623710,步长为5s,也就是下次触发窗口计算为1599623715

验证一下:
nc录入数据:
1599623712 word
1599623715 word
控制台输出结果:
input data is> 1599623712 word
input data is> 1599623715 word
window output is> TestData(1599623712,word)
结果验证了公式结果即为窗口的开始时间,ProcessingTime与之类似就不测试了,其实也可以看到公式的计算结果一般为自然时间的开始,如2020-09-09 11:55:12的开始时间为2020-09-09 11:55:10

 

上一篇:Kafka生产环境问题总结及性能优化实战:JVM参数设置、消息丢失、重复消费、消息乱序、延时队列、消息回溯、分区数量设置、消息传递保障、kafka的事务、kafka高性能的原因


下一篇:环形数组 js