Reading Hive Data with SparkSQL

My Spark installation comes from Cloudera's CDH, and the cluster was installed and deployed automatically by the online installer. While learning SparkSQL I came across SparkSQL on Hive, so below I describe how to read Hive's data through SparkSQL.

(Note: if you did not use CDH's online automatic installation and deployment, you may need to compile Spark from source so that it is Hive-compatible.

Compilation is straightforward; just run the following command in the Spark source home directory (Spark_SRC_home):

./make-distribution.sh --tgz -Phadoop-2.2 -Pyarn -DskipTests -Dhadoop.version=2.6.0-cdh5.4.4 -Phive

After compilation, a few extra jar files will appear under the lib directory.)

Here is what I did in my setup:

1. To let Spark connect to Hive's existing data warehouse, copy Hive's hive-site.xml file into Spark's conf directory; through this configuration file Spark can locate Hive's metastore and data storage.

Since my Spark was installed and deployed automatically, I first had to find out where CDH puts hive-site.xml. After some digging, the default path turns out to be /etc/hive/conf.

Likewise, Spark's conf directory is /etc/spark/conf.

So, as described above, simply copy the hive-site.xml into the spark/conf directory.

  If Hive's metastore is stored in MySQL, you also need the MySQL JDBC driver, e.g. mysql-connector-java-5.1.22-bin.jar.
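The copy in step 1 and the MySQL driver can be wired up from the shell. This is a sketch assuming the CDH default paths mentioned above; the jar location, class name, and application jar below are illustrative placeholders, so adjust them for your cluster:

```shell
# Copy Hive's config so Spark can locate the metastore (CDH default paths)
sudo cp /etc/hive/conf/hive-site.xml /etc/spark/conf/

# When the metastore lives in MySQL, ship the JDBC driver with the job;
# the jar path, main class, and app jar here are assumptions for illustration.
spark-submit \
  --jars /opt/jars/mysql-connector-java-5.1.22-bin.jar \
  --class SparkHiveDemo \
  spark-hive-demo.jar
```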

2. Write the test code

    val conf = new SparkConf().setAppName("Spark-Hive").setMaster("local")
    val sc = new SparkContext(conf)

    // Create a HiveContext on top of the SparkContext
    val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)

    // Note: the field delimiter here must match the data file
    sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'")
    sqlContext.sql("LOAD DATA INPATH '/user/liujiyu/spark/kv1.txt' INTO TABLE src")
    sqlContext.sql("SELECT * FROM src").collect().foreach(println)

    sc.stop()

3. Problems I ran into:

(1) If hive-site.xml is not copied into spark/conf, you get:

[Screenshot of the error stack trace omitted]

Analysis: the error message shows that Spark cannot locate Hive's metastore, so it fails to instantiate the corresponding client.

The fix is simply to copy hive-site.xml into the spark/conf directory.

(2) If the test code does not call sc.stop(), the following error appears:

ERROR scheduler.LiveListenerBus: Listener EventLoggingListener threw an exception
java.lang.reflect.InvocationTargetException

Adding sc.stop() as the last line of the code fixed the problem.
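More robustly, you can guarantee the stop call with try/finally, so the context is released even when a query throws. Since a real SparkContext needs a cluster, the sketch below uses a hypothetical stand-in `Resource` class purely to illustrate the pattern:

```scala
object CleanupDemo {
  // Stand-in for SparkContext: something with a stop() that must always run.
  class Resource { var stopped = false; def stop(): Unit = stopped = true }

  // Run body, guaranteeing r.stop() — the same guarantee we want for sc.stop().
  def withStop[A](r: Resource)(body: => A): A =
    try body finally r.stop()

  def demo(): Boolean = {
    val r = new Resource
    try withStop(r) { throw new RuntimeException("query failed") }
    catch { case _: RuntimeException => () }
    r.stopped // true: stop() ran despite the failure
  }
}
```

With a real SparkContext the shape is the same: put the SQL statements in the body and call sc.stop() in the finally block.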
