Spark 2.2 (Part 5): Reading and Writing Hive with Spark SQL

Using Spark SQL to Read and Write Hive from IDEA

Add the dependencies

libraryDependencies ++= Seq("org.apache.spark" %% "spark-core" % "2.2.0",
                            "org.apache.spark" %% "spark-sql" % "2.2.0",
                            "com.databricks" %% "spark-csv" % "1.5.0",
                            "org.apache.spark" %% "spark-hive" % "2.2.0",
                            "mysql" % "mysql-connector-java" % "8.0.15")

Copy hive-site.xml

Copy hive-site.xml into the resources directory of the IDEA project. This file records the address of the Hive metastore. If it is missing, Spark SQL falls back to its default configuration and creates a local spark-warehouse directory inside the project to hold its data and metadata, instead of connecting to the existing Hive installation.
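For reference, the part of hive-site.xml that matters here is the metastore connection. A minimal sketch is shown below; the thrift URI is an assumed example value, and a setup like this one, which uses a MySQL-backed metastore over JDBC, would instead carry the javax.jdo.option.* properties shown later in this post:

<configuration>
  <!-- Example address of a standalone Hive metastore service (assumed value) -->
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
</configuration>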

Code example

package doc

import java.io.File

import org.apache.spark.sql.{SaveMode, SparkSession}

/**
  * @Author huangwei
  * @Date 19-9-20
  * @Comments  Reading and writing Hive with Spark SQL
  **/
object HiveTable {

  def main(args: Array[String]): Unit = {
    val warehouseLocation = new File("spark-warehouse").getAbsolutePath
    val spark = SparkSession
      .builder()
      .appName("Spark Hive")
      .master("local")
      .config("spark.sql.warehouse.dir",warehouseLocation)
      .enableHiveSupport()
      .getOrCreate()

    import spark.sql

    sql("SHOW DATABASES").show()
//    +------------+
//    |databaseName|
//    +------------+
//    |     default|
//    |        test|
//    +------------+
    // Select the database
    sql("USE test")
    // Run a query
    sql("SELECT * FROM score").show()
//    +----------+------------+-------+----+-------+
//    |student_id|student_name|chinese|math|english|
//    +----------+------------+-------+----+-------+
//    |         1|      ZhaoSi|     88|  98|     90|
//    |         2|    ZhangSan|     80|  89|     78|
//    |         3|    WangMaZi|     65|  70|     79|
//    |         4|     LiuNeng|     78|  80|     82|
//    |         5|        LiSi|     85|  90|     72|
//    +----------+------------+-------+----+-------+

    // Create the table; `from` is a reserved word in HiveQL, so it is escaped with backticks
    sql("CREATE TABLE IF NOT EXISTS persons (name STRING, gender STRING, age INT, `from` STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'")
    // Load data from a local file
    sql("LOAD DATA LOCAL INPATH '/home/huangwei/person.txt' INTO TABLE persons")
    sql("SELECT * FROM persons").show()
//    +------+------+---+---------+
//    |  name|gender|age|     from|
//    +------+------+---+---------+
//    |  马云|    男| 55|浙江-杭州|
//    |马化腾|    男| 48|广东-深圳|
//    |李彦宏|    男| 51|山西-阳泉|
//    |刘强东|    男| 46|江苏-宿迁|
//    |  雷军|    男| 50|湖北-仙桃|
//    +------+------+---+---------+
    val result = sql("SELECT * FROM persons WHERE age < 50")
    // Export the query result to CSV
    result.repartition(1).write.format("csv").mode(SaveMode.Append).save("/home/huangwei/result")
  }

}
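Besides exporting to CSV, a result can also be written back into Hive itself with saveAsTable. A minimal sketch, assuming the same spark session and test database as above; the table name young_persons is a made-up example:

```scala
// Assumes an active SparkSession with Hive support, as created in main() above.
val result = spark.sql("SELECT * FROM persons WHERE age < 50")

// Write the DataFrame back into the Hive `test` database as a managed table.
// SaveMode.Overwrite replaces the table if it already exists.
result.write
  .mode(SaveMode.Overwrite)
  .saveAsTable("test.young_persons")

// Read it back through the metastore to verify.
spark.table("test.young_persons").show()
```

Because the table is registered in the metastore, it is then visible to other Hive clients as well, not just to this Spark application.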

Pitfalls encountered

Exception: File /hive/tmp does not exist

The /hive/tmp directory (as configured in hive-site.xml) cannot be found, even though it does exist on HDFS. The paths in hive-site.xml need to be changed from /hive/tmp to the full URI hdfs://master:9000/hive/tmp: from IDEA, a bare path is resolved against the local file system by default, so the HDFS directory is not found.
Solution: modify the hive-site.xml configuration

<property>
  <name>hive.exec.scratchdir</name>
  <value>hdfs://localhost:9000/hive/tmp</value>
</property>
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>hdfs://localhost:9000/hive/warehouse</value>
</property>
<property>
  <name>hive.querylog.location</name>
  <value>hdfs://localhost:9000/hive/log</value>
</property>

Exception: Hive Schema version 2.3.0 does not match metastore's schema version 1.2.0

Solution: modify hive-site.xml to disable schema version verification

<property>
  <name>hive.metastore.schema.verification</name>
  <value>false</value>
</property>

Exception: java.lang.RuntimeException: java.lang.RuntimeException: Unable to create log directory /usr/local/hive2.3.6/tmp/huangwei

The local tmp directory sits under /usr/local, where the user has no write permission.
Solution: point the local scratch directory at a writable location in hive-site.xml

<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/home/huangwei/tmp</value>
</property>

There is also a warning: Loading class `com.mysql.jdbc.Driver'. This is deprecated. The new driver class is `com.mysql.cj.jdbc.Driver'. The driver is automatically registered via the SPI and manual loading of the driver class is generally unnecessary.

Since the MySQL driver used here is the MySQL 8 version, hive-site.xml needs to be updated to change com.mysql.jdbc.Driver to com.mysql.cj.jdbc.Driver:

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>