17.3. Managing Hive

If you have used any relational database before, the operations in this section will look very familiar.

17.3.1. Table Management

17.3.1.1. Creating a Table


CREATE TABLE member (name string, age int, sex int);

				
hive> CREATE TABLE member (name string, age int, sex int);
OK
Time taken: 0.687 seconds
hive>
				
				

Create a table based on existing data:

				
hive> create table newtable as select * from oldtable;
				
				

17.3.1.2. Showing Tables

				
hive> SHOW TABLES;
OK
test
Time taken: 0.041 seconds, Fetched: 1 row(s)
hive> 
				
				

Match table names with a wildcard:

show tables '*t*';
				

17.3.1.3. Dropping a Table

				
hive> DROP TABLE test;
OK
Time taken: 1.337 seconds
hive>
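
If the table may not exist, the drop can be hedged with IF EXISTS so that no error is raised (the same form used for views later in this section):

hive> DROP TABLE IF EXISTS test;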
				
				

17.3.1.4. Viewing a Table's Structure

				
hive> CREATE TABLE member (name string, age int, sex int);
OK
Time taken: 0.273 seconds

hive> desc member;
OK
name                	string              	                    
age                 	int                 	                    
sex                 	int                 	                    
Time taken: 0.035 seconds, Fetched: 3 row(s)
hive> 
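
For more detail than desc shows, such as the storage location, input/output formats and table parameters, Hive also accepts desc formatted:

hive> desc formatted member;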
				
				

17.3.1.5. Adding Columns to a Table

Add a phone column of type string:

				
hive> ALTER TABLE member ADD COLUMNS (phone String);
OK
Time taken: 0.188 seconds
hive> desc member;
OK
name                	string              	                    
age                 	int                 	                    
sex                 	int                 	                    
phone               	string              	                    
Time taken: 0.033 seconds, Fetched: 4 row(s)
				
				

17.3.1.6. Renaming a Table

Rename the test table to vipuser:

				
hive> CREATE TABLE test (name string, age int, sex int);
OK
Time taken: 0.311 seconds
hive> ALTER TABLE test RENAME TO vipuser;
OK
Time taken: 0.115 seconds
hive> desc vipuser;
OK
name                	string              	                    
age                 	int                 	                    
sex                 	int                 	                    
Time taken: 0.032 seconds, Fetched: 3 row(s)
hive> 
				
				

17.3.1.7. Creating a New Table from an Existing Table's Structure

Only the table structure is created; no data is copied over.

				
hive> CREATE TABLE news_2017 LIKE news;
OK
Time taken: 0.311 seconds
				
				

17.3.2. Partitioned Tables

17.3.2.1. Creating a Partitioned Table

				
hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);
				
				

17.3.2.2. Showing Partitions

				
hive> SHOW PARTITIONS passwd;
OK
computer=hadoop
computer=hbase
computer=hive
Time taken: 0.056 seconds, Fetched: 3 row(s)
				
				

17.3.2.3. Adding a Partition

				
hive> alter table member add partition (province='shenzhen');
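
ADD PARTITION only succeeds if the table was declared with that partition column; the member table created earlier has no province partition, so the statement above assumes a table defined differently. A minimal sketch with a hypothetical member_p table:

hive> CREATE TABLE member_p (name string, age int) PARTITIONED BY (province string);
hive> ALTER TABLE member_p ADD PARTITION (province='shenzhen');
hive> SHOW PARTITIONS member_p;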
				
				

17.3.2.4. Loading Data into a Partitioned Table

				
hive> CREATE TABLE passwd (a string, b string, c string, d string, e string, f string) PARTITIONED BY (computer string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ':';
OK
Time taken: 0.323 seconds
hive> load data local inpath '/etc/passwd' overwrite into table passwd partition(computer="hive");
Loading data to table default.passwd partition (computer=hive)
OK
Time taken: 0.499 seconds
hive> select * from passwd;
OK
root	x	0	0	root	/root	hive
bin	x	1	1	bin	/bin	hive
daemon	x	2	2	daemon	/sbin	hive
adm	x	3	4	adm	/var/adm	hive
lp	x	4	7	lp	/var/spool/lpd	hive
sync	x	5	0	sync	/sbin	hive
shutdown	x	6	0	shutdown	/sbin	hive
halt	x	7	0	halt	/sbin	hive
mail	x	8	12	mail	/var/spool/mail	hive
operator	x	11	0	operator	/root	hive
games	x	12	100	games	/usr/games	hive
ftp	x	14	50	FTP User	/var/ftp	hive
nobody	x	99	99	Nobody	/	hive
dbus	x	81	81	System message bus	/	hive
polkitd	x	999	998	User for polkitd	/	hive
avahi	x	70	70	Avahi mDNS/DNS-SD Stack	/var/run/avahi-daemon	hive
avahi-autoipd	x	170	170	Avahi IPv4LL Stack	/var/lib/avahi-autoipd	hive
postfix	x	89	89		/var/spool/postfix	hive
sshd	x	74	74	Privilege-separated SSH	/var/empty/sshd	hive
ntp	x	38	38		/etc/ntp	hive
rpc	x	32	32	Rpcbind Daemon	/var/lib/rpcbind	hive
qemu	x	107	107	qemu user	/	hive
unbound	x	998	996	Unbound DNS resolver	/etc/unbound	hive
rpcuser	x	29	29	RPC Service User	/var/lib/nfs	hive
nfsnobody	x	65534	65534	Anonymous NFS User	/var/lib/nfs	hive
saslauth	x	997	76	"Saslauthd user"	/run/saslauthd	hive
radvd	x	75	75	radvd user	/	hive
nagios	x	1000	1000		/home/nagios	hive
apache	x	48	48	Apache	/usr/share/httpd	hive
exim	x	93	93		/var/spool/exim	hive
tss	x	59	59	Account used by the trousers package to sandbox the tcsd daemon	/dev/null	hive
git	x	996	994		/var/opt/gitlab	hive
gitlab-www	x	995	993		/var/opt/gitlab/nginx	hive
gitlab-redis	x	994	992		/var/opt/gitlab/redis	hive
gitlab-psql	x	993	991		/var/opt/gitlab/postgresql	hive
nginx	x	992	990	nginx user	/var/cache/nginx	hive
www	x	80	80	Web Application	/www	hive
mysql	x	27	27	MySQL Server	/var/lib/mysql	hive
redis	x	991	989	Redis Database Server	/var/lib/redis	hive
epmd	x	990	988	Erlang Port Mapper Daemon	/tmp	hive
rabbitmq	x	989	987	RabbitMQ messaging server	/var/lib/rabbitmq	hive
solr	x	1001	1001	Apache Solr	/srv/solr	hive
mongodb	x	184	986	MongoDB Database Server	/var/lib/mongodb	hive
test	x	1002	1002		/home/test	hive
sysaccount	x	988	985		/home/sysaccount	hive
systemd-bus-proxy	x	987	983	systemd Bus Proxy	/	hive
systemd-network	x	986	982	systemd Network Management	/	hive
elasticsearch	x	985	980	elasticsearch user	/home/elasticsearch	hive
zabbix	x	984	979	Zabbix Monitoring System	/var/lib/zabbix	hive
mysqlrouter	x	983	978	MySQL Router	/var/lib/mysqlrouter	hive
hadoop	x	1003	1003		/home/hadoop	hive
Time taken: 0.118 seconds, Fetched: 51 row(s)

hive> SHOW PARTITIONS passwd;
OK
computer=hive
Time taken: 0.058 seconds, Fetched: 1 row(s)
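
The three partitions listed in the previous section (computer=hadoop, computer=hbase, computer=hive) simply come from loading data into several partition values of the same table, for example (a sketch reusing the same source file):

hive> load data local inpath '/etc/passwd' into table passwd partition(computer="hadoop");
hive> load data local inpath '/etc/passwd' into table passwd partition(computer="hbase");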
				
				

17.3.3. View Management

17.3.3.1. Creating a View

				
hive> CREATE VIEW v_test AS SELECT name,age FROM member where age>20;
hive> select * from v_test;
				
				

17.3.3.2. Dropping a View

				
hive> drop view test;
OK
Time taken: 0.276 seconds			
				
				

Drop the view only if it exists:

				
hive> DROP VIEW IF EXISTS v_test;
OK
Time taken: 0.495 seconds
				
				

17.3.4. Data Management

17.3.4.1. Importing Data from a Text File

First, create a text file as follows:

				
[root@localhost ~]# cat /tmp/hive.txt 
1	2	3
2	3	4
3	4	5
6	7	8
				
				
				
hive> CREATE TABLE test (a int, b int, c int) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
OK
Time taken: 0.294 seconds
hive> LOAD DATA LOCAL INPATH '/tmp/hive.txt' OVERWRITE INTO TABLE test;
Loading data to table default.test
OK
Time taken: 0.541 seconds
hive> select * from test;
OK
1	2	3
2	3	4
3	4	5
6	7	8
Time taken: 0.952 seconds, Fetched: 5 row(s)
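
Without the LOCAL keyword, LOAD DATA reads from a path on HDFS rather than the local filesystem, and the source file is moved (not copied) into the table's warehouse directory. A sketch, assuming the file is first uploaded to HDFS:

[hadoop@localhost ~]$ hdfs dfs -put /tmp/hive.txt /tmp/hive.txt
hive> LOAD DATA INPATH '/tmp/hive.txt' INTO TABLE test;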
				
				

17.3.4.2. Creating a New Table from a Query on Another Table

				
hive> CREATE TABLE mytable AS SELECT * FROM anytable;
				
				

17.3.4.3. Inserting Query Results from Another Table into a Specified Table

				
INSERT OVERWRITE TABLE mytable SELECT * FROM other;
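
INSERT OVERWRITE replaces whatever the target table already contains; to append rows instead, Hive also accepts INSERT INTO:

INSERT INTO TABLE mytable SELECT * FROM other;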
				
				

17.3.4.4. Inserting Query Results from an Existing Table into a Partitioned Table

				
hive> insert into table table2 partition(created_date) select * from table1;
hive> insert into table newtable partition(type='1') select * from oldtable;
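
The first statement is a dynamic-partition insert: the created_date partition value comes from the query result instead of being written literally. Hive's default partition mode is strict, which rejects an insert with no static partition value, so the settings below are usually needed first; the partition column must also be the last column the SELECT produces. A minimal sketch:

hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> insert into table table2 partition(created_date) select * from table1;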
				
				

17.3.5. Managing HDFS and the Local Filesystem

17.3.5.1. HDFS Directory Migration

				
[hadoop@localhost ~]$ hdfs dfs -ls /user/hive/warehouse
Found 3 items
drwxrwxr-x   - hadoop supergroup          0 2017-06-29 03:36 /user/hive/warehouse/member
drwxrwxr-x   - hadoop supergroup          0 2017-06-29 03:32 /user/hive/warehouse/test
drwxrwxr-x   - hadoop supergroup          0 2017-06-29 03:41 /user/hive/warehouse/vipuser

[hadoop@localhost ~]$ hdfs dfs -cp /user/hive/warehouse/vipuser /user/hive/warehouse/vipuser2
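
hdfs dfs -cp leaves the original directory in place; hdfs dfs -mv performs a real move. Note that moving a managed table's warehouse directory by hand does not update the Hive metastore, so a live table should normally be renamed with ALTER TABLE ... RENAME TO as shown earlier. A sketch, with the target name vipuser_bak made up for illustration:

[hadoop@localhost ~]$ hdfs dfs -mv /user/hive/warehouse/vipuser2 /user/hive/warehouse/vipuser_bak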
				
				

17.3.5.2. Exporting Table Data to a Local File

				
hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/test' SELECT * FROM test;

hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/test' SELECT * FROM member;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = hadoop_20170629040540_ddeda146-efed-44c4-bb20-a6453c21cc8e
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1498716998098_0001, Tracking URL = http://localhost:8088/proxy/application_1498716998098_0001/
Kill Command = /srv/apache-hadoop/bin/hadoop job  -kill job_1498716998098_0001
Hadoop job information for Stage-1: number of mappers: 0; number of reducers: 0
2017-06-29 04:05:49,221 Stage-1 map = 0%,  reduce = 0%
Ended Job = job_1498716998098_0001
Moving data to local directory /tmp/test
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 0 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
Time taken: 10.54 seconds	
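
By default the exported files separate fields with Hive's ^A (Ctrl-A) character. Since Hive 0.11 the directory export can carry its own row format; a sketch writing comma-separated output instead:

hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/test'
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    > SELECT * FROM member;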
				
				

17.3.5.3. Exporting to HDFS

				
hive> insert overwrite directory '/usr/tmp/test' select * from test;				
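
The result lands as one or more files under the target directory (typically named along the lines of 000000_0) and can be inspected with ordinary hdfs dfs commands, for example:

[hadoop@localhost ~]$ hdfs dfs -ls /usr/tmp/test
[hadoop@localhost ~]$ hdfs dfs -cat /usr/tmp/test/000000_0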
				
				

Original source: Netkiller 系列 手札 (Netkiller series handbooks)
Author: 陈景峯
Please contact the author before reposting, and be sure to credit the original source and author and include this notice.
