Output Operations on DStreams

Finally, this can be further optimized by reusing connection objects across multiple RDDs/batches. One can maintain a static pool of connection objects that can be reused as the RDDs of multiple batches are pushed to the external system, further reducing the overhead.

Scala
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partitionOfRecords =>
    // ConnectionPool is a static, lazily initialized pool of connections
    val connection = ConnectionPool.getConnection()
    partitionOfRecords.foreach(record => connection.send(record))
    ConnectionPool.returnConnection(connection)  // return to the pool for future reuse
  }
}
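Note that ConnectionPool is not part of Spark; the guide only describes it as a static, lazily initialized pool. Below is a minimal Scala sketch of what such a pool might look like, assuming a hypothetical Connection wrapper that exposes the send() method used above and a thread-safe queue as the pool; the host/port and the socket-based transport are illustrative assumptions, not part of the original example.

import java.io.PrintWriter
import java.net.Socket
import java.util.concurrent.ConcurrentLinkedQueue

// Hypothetical connection wrapper providing the send() used in the foreachRDD example.
class Connection(host: String, port: Int) {
  private val socket = new Socket(host, port)
  private val writer = new PrintWriter(socket.getOutputStream, true)  // auto-flush each record
  def send(record: Any): Unit = writer.println(record.toString)
  def isOpen: Boolean = !socket.isClosed
}

// Static (object-level) pool backed by a thread-safe queue, created lazily per executor JVM.
object ConnectionPool {
  private val host = "localhost"  // assumed endpoint of the external system
  private val port = 9999         // assumed port
  private val pool = new ConcurrentLinkedQueue[Connection]()

  // Reuse an idle connection if one is available, otherwise open a new one.
  def getConnection(): Connection = {
    val conn = pool.poll()
    if (conn != null && conn.isOpen) conn else new Connection(host, port)
  }

  // Return the connection so later partitions/batches on the same executor can reuse it.
  def returnConnection(conn: Connection): Unit = pool.offer(conn)
}

Because the object lives in each executor's JVM, its connections are shared across all partitions and batches processed by that executor, which is exactly the reuse the paragraph above describes.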

http://spark.apache.org/docs/1.6.1/streaming-programming-guide.html#output-operations-on-dstreams
