COMP9313 Week 3b: Resilient Distributed Dataset (RDD), Part 2 (PySpark)

Resilient Distributed Dataset (RDD)

https://drive.google.com/drive/folders/13_vsxSIEU9TDg1TCjYEwOidh0x3dU6es

https://www.cse.unsw.edu.au/~cs9313/20T2/slides/L3.pdf

 

setting


wordCount MapReduce
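
The three phases of a MapReduce word count can be sketched in plain Python. This is only an illustration with no Spark dependency; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample lines are made up for this note and do not come from the lecture slides.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in every line."""
    return [(word, 1) for line in lines for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the grouped counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    """Chain the three phases: map, then shuffle, then reduce."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical input, standing in for the lines of a text file.
sample_lines = ["spark makes rdd", "rdd tracks lineage", "spark spark"]
counts = word_count(sample_lines)
```

In PySpark the same pipeline is usually written on an RDD as a chain of `flatMap`, `map` and `reduceByKey`, where `reduceByKey` is the step that triggers the shuffle.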


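Lineage:

  1) When a failure under this structure loses a partition file, the lineage can be inspected with r5.toDebugString()

  2) If a worker fails and data is lost, the lost data can be recovered from the original dataset by replaying the lineage

  3) If the driver fails, there is a backup driver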
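
The idea of recovering lost data through lineage can be sketched with a toy class. This is not how Spark implements it; `ToyRDD` and all of its methods are hypothetical names invented for this note. The only assumption carried over from the notes is that each dataset remembers its parent and the function that produced it, so a lost partition can be rebuilt from the original data.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: it stores how to rebuild itself."""

    def __init__(self, source=None, parent=None, fn=None, name="source"):
        self.source = source  # list of partitions, set only on the root dataset
        self.parent = parent  # the dataset this one was derived from
        self.fn = fn          # per-element function applied to the parent
        self.name = name
        self.cache = {}       # partition index -> already computed data

    def map(self, fn, name):
        """Record a new dataset in the lineage without computing anything."""
        return ToyRDD(parent=self, fn=fn, name=name)

    def compute(self, index):
        """Return partition `index`, rebuilding it from the lineage if needed."""
        if index in self.cache:
            return self.cache[index]
        if self.parent is None:
            data = list(self.source[index])
        else:
            data = [self.fn(x) for x in self.parent.compute(index)]
        self.cache[index] = data
        return data

    def lineage(self):
        """Describe the chain of parents, newest first."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return " <- ".join(names)


base = ToyRDD(source=[[1, 2], [3, 4]])
doubled = base.map(lambda x: x * 2, "doubled")
shifted = doubled.map(lambda x: x + 1, "shifted")

before = shifted.compute(1)
# Simulate a worker failure: the computed partitions are gone.
shifted.cache.clear()
doubled.cache.clear()
# The partition is rebuilt from the source by replaying the lineage.
after = shifted.compute(1)
```

Here `shifted.lineage()` plays a role loosely similar to `toDebugString()`: it shows the chain of datasets that a recomputation has to walk back through.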


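DAG: to be honest, I could not tell what this part was about. The preparation was far too lazy: only 11 slides for a two-hour lecture, and all of them text.

  DAG and RDD are two core components in Spark

  1) Stage 1: no shuffling (narrow transformations); stage 2: shuffling; stage 3: shuffling (wide transformations)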
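
A rough sketch of how a chain of transformations falls into stages at shuffle boundaries. It is a simplification and not Spark's actual scheduler: the `WIDE` set and the `split_into_stages` helper are assumptions made for this note, and the rule used is simply that every wide transformation opens a new stage.

```python
# Assumed set of transformations that typically need a shuffle (wide).
WIDE = {"reduceByKey", "groupByKey", "sortByKey", "join"}


def split_into_stages(ops):
    """Group transformation names into stages, cutting before each wide one."""
    stages = [[]]
    for op in ops:
        # A wide transformation needs shuffled input, so it starts a new stage
        # unless the current stage is still empty.
        if op in WIDE and stages[-1]:
            stages.append([])
        stages[-1].append(op)
    return stages


# Hypothetical job: a word count followed by a sort.
job = ["flatMap", "map", "reduceByKey", "mapValues", "sortByKey"]
stages = split_into_stages(job)
```

For this job the result has three stages: the first contains only narrow transformations, and the second and third each begin with a shuffle, which matches the three-stage picture in the notes.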

  
