Output Operations on DStreams

Finally, this can be further optimized by reusing connection objects across multiple RDDs/batches. One can maintain a static pool of connection objects that can be reused as the RDDs of multiple batches are pushed to the external system, further reducing the overhead.

Scala
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // ConnectionPool is a static, lazily initialized pool of connections
    val connection = ConnectionPool.getConnection()
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection)  // return to the pool for future reuse
  }
}
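Note that ConnectionPool is not part of Spark; the guide only describes it as a static, lazily initialized pool. Below is a minimal Scala sketch of what such a pool might look like, assuming a hypothetical Connection wrapper that exposes the send() method used above and a thread-safe queue as the pool; the host/port and the socket-based transport are illustrative assumptions, not part of the original example.

import java.io.PrintWriter
import java.net.Socket
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical connection wrapper providing the send() used in the foreachRDD example.
class Connection(host: String, port: Int) {
  private val socket = new Socket(host, port)
  private val writer = new PrintWriter(socket.getOutputStream, true)  // auto-flush each record
  def send(record: Any): Unit = writer.println(record.toString)
  def isOpen: Boolean = !socket.isClosed
}

// Static (object-level) pool backed by a thread-safe queue, created lazily per executor JVM.
object ConnectionPool {
  private val host = "localhost"  // assumed endpoint of the external system
  private val port = 9999         // assumed port
  private val pool = new ConcurrentLinkedQueue[Connection]()

  // Reuse an idle connection if one is available, otherwise open a new one.
  def getConnection(): Connection = {
    val conn = pool.poll()
    if (conn != null && conn.isOpen) conn else new Connection(host, port)
  }

  // Return the connection so later partitions/batches on the same executor can reuse it.
  def returnConnection(conn: Connection): Unit = pool.offer(conn)
}

Because the object lives in each executor's JVM, its connections are shared across all partitions and batches processed by that executor, which is exactly the reuse the paragraph above describes.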

http://spark.apache.org/docs/1.6.1/streaming-programming-guide.html#output-operations-on-dstreams
