Resilient Distributed Dataset (RDD)
https://drive.google.com/drive/folders/13_vsxSIEU9TDg1TCjYEwOidh0x3dU6es
https://www.cse.unsw.edu.au/~cs9313/20T2/slides/L3.pdf
setting
wordCount MapReduce
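The word count example is the classic MapReduce job expressed on RDDs. In PySpark it is usually written as `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(operator.add)`, where `sc` is a `SparkContext` and `path` is a placeholder. The sketch below does not need a Spark installation. It mimics those steps on an in-memory list, and `flat_map`, `reduce_by_key`, and `word_count` are hypothetical helpers, not Spark functions.

```python
from itertools import chain


def flat_map(fn, records):
    # Like RDD.flatMap: each input record may produce zero or more output records.
    return list(chain.from_iterable(fn(r) for r in records))


def reduce_by_key(fn, pairs):
    # Like RDD.reduceByKey: merge all values that share a key with fn.
    acc = {}
    for key, value in pairs:
        acc[key] = fn(acc[key], value) if key in acc else value
    return acc


def word_count(lines):
    words = flat_map(lambda line: line.split(), lines)  # map phase: split lines into words
    pairs = [(w, 1) for w in words]                     # map phase: emit (word, 1)
    return reduce_by_key(lambda a, b: a + b, pairs)     # reduce phase: sum counts per word


if __name__ == "__main__":
    counts = word_count(["to be or not to be", "to do"])
    print(sorted(counts.items()))
    # [('be', 2), ('do', 1), ('not', 1), ('or', 1), ('to', 3)]
```

In real Spark, `reduceByKey` also merges values within each partition before the shuffle (a map-side combine). This is why it is usually preferred over `groupByKey` for counting.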
Lineage:
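1) If a failure causes a partition to be lost, it can be recomputed from this lineage structure. The lineage of an RDD can be printed with r5.toDebugString()
2) If a worker fails and its data is lost, the data can be recovered by starting from the original dataset and replaying the transformations recorded in the lineage.
3) If the driver fails, there is a backup driver.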
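To make points 1) and 2) concrete, here is a toy, stdlib-only model of lineage-based recovery. The `Rdd` class, `lose_partition`, and `lineage()` are hypothetical names, not Spark's API. `lineage()` only loosely imitates what `toDebugString()` prints. There is one deliberate difference from Spark: this toy keeps every computed partition in memory so that losing one is visible. Spark only keeps intermediate partitions that you `cache()` or `persist()`. The backup driver in point 3) is not modeled.

```python
class Rdd:
    """Toy RDD: records its parent and a per-partition function (its lineage)."""

    def __init__(self, name, parent=None, fn=None, source=None):
        self.name = name
        self.parent = parent    # upstream Rdd, or None for the original dataset
        self.fn = fn            # narrow transformation applied to one partition
        self.source = source    # list of partitions; only set on the source Rdd
        self.cache = {}         # materialized partitions, keyed by partition index
        self.computations = 0   # how many times a partition of this Rdd was computed

    def map(self, name, f):
        return Rdd(name, parent=self, fn=lambda part: [f(x) for x in part])

    def filter(self, name, pred):
        return Rdd(name, parent=self, fn=lambda part: [x for x in part if pred(x)])

    def num_partitions(self):
        if self.parent is None:
            return len(self.source)
        return self.parent.num_partitions()

    def partition(self, i):
        if i not in self.cache:
            self.computations += 1
            if self.parent is None:
                # Re-read partition i from the original dataset.
                self.cache[i] = list(self.source[i])
            else:
                # Replay the lineage: rebuild the parent's partition, then apply fn.
                self.cache[i] = self.fn(self.parent.partition(i))
        return self.cache[i]

    def collect(self):
        return [x for i in range(self.num_partitions()) for x in self.partition(i)]

    def lineage(self):
        # Loose analogue of toDebugString(): this Rdd followed by its ancestors.
        names, node = [], self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return " <- ".join(names)


def lose_partition(rdd, i):
    """Simulate a worker failure: partition i disappears from rdd and its ancestors."""
    node = rdd
    while node is not None:
        node.cache.pop(i, None)
        node = node.parent


if __name__ == "__main__":
    r1 = Rdd("r1", source=[[1, 2, 3], [4, 5, 6]])
    r3 = r1.map("r2", lambda x: x * 10).filter("r3", lambda x: x > 20)
    print(r3.collect())      # [30, 40, 50, 60]
    lose_partition(r3, 1)
    print(r3.collect())      # [30, 40, 50, 60], partition 1 rebuilt from r1
    print(r3.lineage())      # r3 <- r2 <- r1
```

Only the lost partition is recomputed. Partition 0 is still available, so the cost of recovery is proportional to what was lost, not to the whole dataset.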
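DAG: To be honest, I don't know what this part was about. The lecture was seriously under-prepared: two hours of preparation produced only 11 slides, all text.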
DAG and RDD are the two core abstractions in Spark.
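1) Spark splits the DAG into stages at shuffle boundaries:
   stage 1: no shuffling (narrow transformations only)
   stage 2: shuffling
   stage 3: shuffling (wide transformation)
   A wide transformation (e.g. reduceByKey) requires a shuffle, so each one starts a new stage.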
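The three-stage example can be reproduced with a small sketch. It walks a linear chain of operations and starts a new stage at every wide transformation. This is a simplification of what Spark's DAG scheduler does on the full dependency graph. Here the wide operation is listed at the start of the next stage, while in real Spark its map-side work (for example the local combine of `reduceByKey`) runs at the end of the previous stage. The `WIDE` set and `split_into_stages` are hypothetical names.

```python
# Hypothetical, simplified classification of some common Spark transformations.
# (join is treated as always wide here; Spark skips the shuffle when both inputs
# are already co-partitioned.)
WIDE = {"reduceByKey", "groupByKey", "sortByKey", "distinct", "repartition", "join"}


def split_into_stages(ops):
    """Cut a linear chain of operations into stages at shuffle boundaries.

    Narrow transformations are pipelined into the current stage; a wide
    transformation needs a shuffle, so it is placed at the start of a new stage.
    """
    stages = [[]]
    for op in ops:
        if op in WIDE and stages[-1]:
            stages.append([])  # shuffle boundary
        stages[-1].append(op)
    return stages


if __name__ == "__main__":
    pipeline = ["textFile", "flatMap", "map", "reduceByKey", "map", "sortByKey"]
    for n, stage in enumerate(split_into_stages(pipeline), start=1):
        print(f"stage {n}: {stage}")
    # stage 1: ['textFile', 'flatMap', 'map']
    # stage 2: ['reduceByKey', 'map']
    # stage 3: ['sortByKey']
```

Pipelining narrow transformations inside one stage lets each task process its partition end to end without writing intermediate results. Only a shuffle forces data to be redistributed across the cluster.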