1. Create the ODPS table
create table hbaseport.odps_test (
    key string,
    value1 string,
    value2 bigint);
2. Configure the MR cluster to access the cloud HBase environment
Open the HDFS port of the cloud HBase cluster.
Configure hdfs-site.xml so the cluster can access the HA HDFS of cloud HBase; for details, see here.
Configure hbase-site.xml so the cluster can access cloud HBase.
Create a temporary conf directory on the MR cluster and add it to the runtime classpath via the --config option when running hadoop or yarn commands. The directory contains the following files:
ls conf/
core-site.xml hbase-site.xml hdfs-site.xml
mapred-site.xml yarn-site.xml
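A minimal sketch of the hbase-site.xml entry needed for connectivity — the ZooKeeper hosts below are placeholders; substitute the connection string of your own cloud HBase instance:

```xml
<configuration>
  <!-- ZooKeeper quorum of the cloud HBase cluster (placeholder host names) -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1,zk2,zk3</value>
  </property>
</configuration>
```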
3. Create the Phoenix test table
DROP TABLE IF EXISTS TABLE1;
CREATE TABLE TABLE1 (
    ID VARCHAR NOT NULL PRIMARY KEY,
    V1 VARCHAR,
    V2 BIGINT)
    SALT_BUCKETS = 10, UPDATE_CACHE_FREQUENCY = 120000;
CREATE INDEX V1_IDX ON TABLE1(V1) INCLUDE(V2);
CREATE INDEX V2_IDX ON TABLE1(V2) INCLUDE(V1);
4. Import test data into the ODPS table
Load about 3 million rows of test data into the ODPS table from a CSV file.
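One way to produce a CSV in the right shape for the odps_test schema — the file name, row count, and value pattern below are illustrative, not the data used in the original import:

```shell
# Generate sample rows matching the odps_test columns: key,value1,value2
# (hypothetical data; the real import used roughly 3 million rows)
seq 1 1000 | awk '{printf "k%07d,v%d,%d\n", $1, $1, $1*10}' > test_data.csv
head -3 test_data.csv
```

The CSV can then be uploaded with the MaxCompute command-line client, e.g. `odpscmd -e "tunnel upload test_data.csv hbaseport.odps_test;"` (assuming odpscmd is already configured for the hbaseport project).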
5. Run the BulkLoad command
Run the BulkLoad command with the client jar that ships with Phoenix:
yarn --config conf \
jar ali-phoenix-4.12.0-AliHBase-1.1-0.4-Final/ali-phoenix-4.12.0-AliHBase-1.1-0.4-Final-client.jar \
org.apache.phoenix.mapreduce.ODPSBulkLoadTool \
--table "TABLE1" \
--access_id "xxx" \
--access_key "xxx" \
--odps_url "http://odps-ext.aliyun-inc.com/api" \
--odps_tunnel_url "http://dt-ext.odps.aliyun-inc.com" \
--odps_project "hbaseport" \
--odps_table "odps_test" \
--odps_partition_number 15 \
--zookeeper "zk1,zk2,zk3" \
--output "hdfs://emr-cluster/tmp/tmp_data"
6. Verify
Verify the data in the Phoenix table and its indexes:
0: jdbc:phoenix:localhost> select count(*) from TABLE1;
+-----------+
| COUNT(1) |
+-----------+
| 3124856 |
+-----------+
1 row selected (4.618 seconds)
0: jdbc:phoenix:localhost> select count(*) from V1_IDX;
+-----------+
| COUNT(1) |
+-----------+
| 3124856 |
+-----------+
1 row selected (3.149 seconds)
0: jdbc:phoenix:localhost> select count(*) from V2_IDX;
+-----------+
| COUNT(1) |
+-----------+
| 3124856 |
+-----------+
1 row selected (4.386 seconds)