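Prerequisites:
1. Spark 1.0 has been built with Hive support enabled: ./make-distribution.sh --hadoop 2.3.0-cdh5.0.0 --with-yarn --with-hive --tgz
2. Spark 1.0 is installed.
3. A CDH release of Hive that matches your Hadoop version is installed.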
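Before going further, it can help to confirm two things. First, that the Spark build really contains the Hive classes. Second, that the MySQL JDBC driver named in hive-site.xml can be loaded. Below is a minimal sketch that relies only on the JDK's Class.forName. The object name ClasspathCheck is hypothetical. The two class names come from this walkthrough: HiveContext from the example code and com.mysql.jdbc.Driver from hive-site.xml.

```scala
// Hypothetical helper: checks whether classes used in this walkthrough are on the classpath.
object ClasspathCheck {
  // Tries to load the class without running its static initializer;
  // returns false if the class (or one of its dependencies) cannot be found.
  def classAvailable(name: String): Boolean =
    try {
      Class.forName(name, false, Thread.currentThread().getContextClassLoader)
      true
    } catch {
      case _: ClassNotFoundException | _: NoClassDefFoundError => false
    }

  // HiveContext is only present in Spark builds made with --with-hive;
  // com.mysql.jdbc.Driver is the driver configured in hive-site.xml.
  val required: Seq[String] = Seq(
    "org.apache.spark.sql.hive.HiveContext",
    "com.mysql.jdbc.Driver"
  )

  // Prints one line per required class: "found" or "MISSING".
  def report(): Unit =
    required.foreach { c =>
      println(s"$c -> ${if (classAvailable(c)) "found" else "MISSING"}")
    }
}
```

In spark-shell, paste the object and call ClasspathCheck.report(). A MISSING entry for the driver means the MySQL connector jar is not yet on the classpath.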
Example: Spark SQL with Hive
1. Copy the hive-site.xml configuration file into $SPARK_HOME/conf.
The hive-site.xml file looks like this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop000:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
</configuration>
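You can catch typos in the metastore settings before launching spark-shell by reading the properties back with the JDK's built-in DOM parser. This is a sketch under assumptions. HiveSiteCheck and its methods are hypothetical helpers, not part of Spark or Hive. The list of required keys is simply the four javax.jdo.option.* properties shown in the file above.

```scala
import java.io.{File, StringReader}
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.{Document, Element}
import org.xml.sax.InputSource

// Hypothetical helper: reads <property><name>/<value> pairs from a Hadoop-style config file.
object HiveSiteCheck {
  private def builder = DocumentBuilderFactory.newInstance().newDocumentBuilder()

  // Collects name -> value for every <property> that has both child elements.
  def properties(doc: Document): Map[String, String] = {
    val nodes = doc.getElementsByTagName("property")
    (0 until nodes.getLength).flatMap { i =>
      val p = nodes.item(i).asInstanceOf[Element]
      def child(tag: String): Option[String] =
        Option(p.getElementsByTagName(tag).item(0)).map(_.getTextContent.trim)
      for (n <- child("name"); v <- child("value")) yield n -> v
    }.toMap
  }

  def fromFile(path: String): Map[String, String] =
    properties(builder.parse(new File(path)))

  def fromString(xml: String): Map[String, String] =
    properties(builder.parse(new InputSource(new StringReader(xml))))

  // The metastore connection settings used in this walkthrough's hive-site.xml.
  val metastoreKeys: Seq[String] = Seq(
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionDriverName",
    "javax.jdo.option.ConnectionUserName",
    "javax.jdo.option.ConnectionPassword"
  )

  // Returns the metastore keys that are absent, in the order listed above.
  def missingKeys(props: Map[String, String]): Seq[String] =
    metastoreKeys.filterNot(props.contains)
}
```

For example, HiveSiteCheck.missingKeys(HiveSiteCheck.fromFile(sys.env("SPARK_HOME") + "/conf/hive-site.xml")) should return an empty list when all four settings are present.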
2. Start Spark: spark-shell
The following example is taken from the official Spark documentation: http://spark.apache.org/docs/latest/sql-programming-guide.html
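// Create a HiveContext from the SparkContext (sc) that spark-shell provides
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
// Import implicit conversions and methods such as hql
import hiveContext._
// The table name below is qualified with a Hive database named "hive"; make sure it exists
hql("CREATE DATABASE IF NOT EXISTS hive")
// Create the Hive table
hql("CREATE TABLE IF NOT EXISTS hive.kv_src (key INT, value STRING)")
// Load data into the Hive table
hql("LOAD DATA LOCAL INPATH '/home/spark/app/spark-1.0.0-bin-2.3.0-cdh5.0.0/examples/src/main/resources/kv1.txt' INTO TABLE hive.kv_src")
// Query with HiveQL and print each row
hql("FROM hive.kv_src SELECT key, value").collect().foreach(println)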
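The CREATE TABLE statement has no ROW FORMAT clause, so Hive stores kv_src as plain text with its default field delimiter, Ctrl-A (\u0001). Loading kv1.txt as-is therefore assumes the file separates key and value with that same character. The sketch below approximates, in plain Scala, how one such line maps onto (key INT, value STRING). It is an illustration only, not Hive's SerDe. KvLine is a hypothetical name, and NULL columns are modeled as None.

```scala
import scala.util.Try

// Hypothetical illustration: split one line of a default-format Hive text table.
object KvLine {
  // Hive's default field delimiter for text tables without ROW FORMAT (Ctrl-A).
  val FieldDelimiter: String = "\u0001"

  // Returns (key, value); a missing or non-numeric key, or a missing value,
  // becomes None, roughly mirroring the NULLs Hive would return.
  def parse(line: String): (Option[Int], Option[String]) = {
    val fields = line.split(FieldDelimiter, -1) // -1 keeps trailing empty fields
    val key = fields.lift(0).flatMap(k => Try(k.toInt).toOption)
    val value = fields.lift(1)
    (key, value)
  }
}
```

For a quick look at the raw file, scala.io.Source.fromFile(path).getLines().take(3).map(KvLine.parse).foreach(println) prints the first few parsed rows.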