OushuDB 快速入手

OushuDB 快速入手

快速入手

本节将通过RPM安装物理机版本的一个Centos/Redhat 7.x单节点集群。假设我们安装的服务器hostname为oushu(可以通过命令:hostname 直接获取,请将文中所有出现的oushu替换为实际的hostname)。此次部署大约需要您30分钟时间。

安装准备

首先使用root登录。 查看有无avx指令:

cat /proc/cpuinfo | grep avx

安装oushu yum源:

#Redhat/CentOS 7.0, 7.1, 7.2系统并且包含avx指令请配置以下YUM源:
wget -P /etc/yum.repos.d/ $获取的repo url

#Redhat/CentOS 7.0, 7.1, 7.2系统但不包含avx指令请配置以下YUM源:
wget -P /etc/yum.repos.d/ $获取的repo url

#Redhat/CentOS 7.3系统并且包含avx指令请配置以下YUM源:
wget -P /etc/yum.repos.d/ $获取的repo url

#Redhat/CentOS 7.3系统但不包含avx指令请配置以下YUM源:
wget -P /etc/yum.repos.d/ $获取的repo url

#Redhat/CentOS 7.4系统并且包含avx指令请配置以下YUM源:
wget -P /etc/yum.repos.d/ $获取的repo url

#Redhat/CentOS 7.4系统但不包含avx指令请配置以下YUM源:
wget -P /etc/yum.repos.d/ $获取的repo url

禁用selinux:

sed -i "s/^SELINUX\=enforcing/SELINUX\=disabled/g" /etc/selinux/configsetenforce 0

关闭防火墙:

systemctl stop iptablessystemctl disable iptablessystemctl stop firewalldsystemctl disable firewalld

安装Java:

yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-develmkdir -p /usr/java//注意查看本机的java版本ln -s /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.141-2.b16.el7_4.x86_64  /usr/java/default

安装HDFS

安装HDFS并且创建其使用的目录,这里我们假设我们的机器上有两个数据盘,分别mount在/data1和/data2目录,如果您有多块盘,下面的目录创建以及配置文件需要做相应的更改。尤其对HDFS的数据目录以及OushuDB的临时文件目录位置。

#由于hadoop依赖于特定版本的snappy,请先卸载snappy确保安装的顺利进行yum -y remove snappy#安装HDFS RPM,RPM安装会自动创建hdfs用户yum install -y hadoop hadoop-hdfs#在/data1上创建NameNode目录mkdir -p /data1/hdfs/namenode#在每块盘上创建DataNode数据目录,并更改权限mkdir -p /data1/hdfs/datanodechmod -R 755 /data1/hdfschown -R hdfs:hadoop /data1/hdfsmkdir -p /data2/hdfs/datanodechmod -R 755 /data2/hdfschown -R hdfs:hadoop /data2/hdfs
复制下列文件到/etc/hadoop/conf/中(遇到覆盖提示,请输入y,表示确认覆盖)
http://www.oushu.com/docs/ch/_downloads/908bee114673dff44292d2b51ed5a1ce/core-site.xml.
http://www.oushu.com/docs/ch/_downloads/a57b214c41f418570548204fdf5089b3/hdfs-site.xml.
http://www.oushu.com/docs/ch/_downloads/5caeda7d6d35f2ab18438c8994e855c1/hadoop-env.sh.

编辑/etc/hadoop/conf/core-site.xml文件中的fs.defaultFS属性,其他系统通过这个url来访问HDFS,注:在做format之前,请确认已经将core-site.xml中fs.defaultFS的值由oushu替换成hostname。:

<property><name>fs.defaultFS</name><value>hdfs://oushu:9000</value></property>

编辑 /etc/hadoop/conf/hadoop-env.sh,加入下面参数。这些参数配置了Java Home,Hadoop配置文件,日志文件目录,以及JVM选项。根据存储的HDFS数据量大小,需要适当修改NameNode的-Xmx值。HDFS数据量越大,-Xmx值应该设的越大。

export JAVA_HOME="/usr/java/default"
export HADOOP_CONF_DIR="/etc/hadoop/conf"
export HADOOP_NAMENODE_OPTS="-Xmx6144m -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70"
export HADOOP_DATANODE_OPTS="-Xmx2048m -Xss256k"
export HADOOP_LOG_DIR=/var/log/hadoop/$USER

因为/etc/hadoop/conf/hdfs-site.xml中默认使用/data1和/data2两块盘,如果你有多块盘,你需要更改dfs.data.dir属性,使得HDFS用到所有盘:

<property><name>dfs.data.dir</name><value>/data1/hdfs/datanode,/data2/hdfs/datanode</value><final>true</final></property>

格式化NameNode,并启动NameNode和DataNode。
注:在format过程中如果询问是否要format,请输入y,表示确认。

sudo -u -E hdfs hdfs namenode -formatsudo -u -E hdfs /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start namenodesudo -u -E hdfs /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh start datanode

HDFS日志在/var/log/hadoop/hdfs/中。如果因为配置出错,可以查看错误日志,并依据改正。
检查hdfs是否成功运行:

su - hdfshdfs dfsadmin -reporthdfs dfs -mkdir /testnodehdfs dfs -put /usr/hdp/current/hadoop-client/sbin/hadoop-daemon.sh /testnode/hdfs dfs -ls -R /

你也可以查看HDFS web界面:http://oushu:50070/

安装OushuDB

安装OushuDB RPM,OushuDB RPM安装会自动创建gpadmin用户。

yum install -y hawq

在配置文件/etc/sysctl.conf添加内容

kernel.shmmax = 1000000000kernel.shmmni = 4096kernel.shmall = 4000000000kernel.sem = 250 512000 100 2048kernel.sysrq = 1kernel.core_uses_pid = 1kernel.msgmnb = 65536kernel.msgmax = 65536kernel.msgmni = 2048net.ipv4.tcp_syncookies = 0net.ipv4.conf.default.accept_source_route = 0net.ipv4.tcp_tw_recycle = 1net.ipv4.tcp_max_syn_backlog = 200000net.ipv4.conf.all.arp_filter = 1net.ipv4.ip_local_port_range = 10000 65535net.core.netdev_max_backlog = 200000fs.nr_open = 3000000kernel.threads-max = 798720kernel.pid_max = 798720# increase networknet.core.rmem_max=2097152net.core.wmem_max=2097152net.core.somaxconn=4096

使系统配置生效:

sysctl -p

创建OushuDB本地元数据目录和临时文件目录:

#创建OushuDB本地元数据目录,下面两个目录分别为master和segment使用mkdir -p /data1/hawq/masterddmkdir -p /data1/hawq/segmentdd#创建OushuDB临时文件目录,每块盘需要创建一个临时文件目录,这样可以让OushuDB使用到所有盘。mkdir -p /data1/hawq/tmpchmod -R 755 /data1/hawqchown -R gpadmin:gpadmin /data1/hawqmkdir -p /data2/hawq/tmpchmod -R 755 /data2/hawqchown -R gpadmin:gpadmin /data2/hawq

在HDFS上创建OushuDB数据目录:

sudo -u hdfs hdfs dfs -mkdir -p /hawq_defaultsudo -u hdfs hdfs dfs -chown -R gpadmin /hawq_default

编辑/usr/local/hawq/etc/slaves,去掉文件中的localhost,并加入oushu。slaves文件中存放所有slave节点的地址,每行一个节点。修改后文件为:
oushu
编辑/usr/local/hawq/etc/hawq-site.xml, 因为/usr/local/hawq/etc/hawq-site.xml中默认使用/data1和/data2两块盘,如果你有多块盘,你需要更改hawq_master_temp_directory和hawq_segment_temp_directory值,用到所有盘:

<property><name>hawq_master_address_host</name><value>oushu</value><description>The host name of hawq master.</description></property><property><name>hawq_dfs_url</name><value>oushu:9000/hawq_default</value><description>URL for accessing HDFS.</description></property><property><name>magma_nodes_url</name><value>localhost:6666</value><description>urls for accessing magma.</description></property><property><name>hawq_master_directory</name><value>/data1/hawq/masterdd</value><description>The directory of hawq master.</description></property><property><name>hawq_segment_directory</name><value>/data1/hawq/segmentdd</value><description>The directory of hawq segment.</description></property><property><name>hawq_master_temp_directory</name><value>/data1/hawq/tmp,/data2/hawq/tmp</value><description>The temporary directory reserved for hawq master. Note: please DONOT add " " between directries.</description></property><property><name>hawq_segment_temp_directory</name><value>/data1/hawq/tmp,/data2/hawq/tmp</value><description>The temporary directory reserved for hawq segment. Note: please DONOT add " " between directories.</description></property>

OushuDB4.0版本新增Magma的单独配置和启停功能,使用magam服务时,首先创建magma node数据目录:

# 创建mamga node数据目录mkdir -p /data1/hawq/magma_segmentddmkdir -p /data2/hawq/magma_segmentddchown -R gpadmin:gpadmin /data1/hawqchown -R gpadmin:gpadmin /data2/hawq

然后编辑配置/usr/local/hawq/etc/magma-site.xml:

<property><name>nodes_file</name><value>slaves</value><description>The magma nodes file name at GPHOME/etc</description></property><property><name>node_data_directory</name><value>file:///data1/hawq/magma_segmentdd,file:///data2/hawq/magma_segmentdd</value><description>The data directory for magma node</description></property><property><name>node_log_directory</name><value>~/hawq-data-directory/segmentdd/pg_log</value><description>The log directory for magma node</description></property><property><name>node_address_port</name><value>6666</value><description>The port magma node listening</description></property><property><name>magma_range_number</name><value>2</value></property><property><name>magma_replica_number</name><value>3</value></property><property><name>magma_datadir_capacity</name><value>3</value></property><property><name>compact_trigger_ap_ratio_limit</name><value>0.2</value><description>The threshold of triggering compact in MAGMAAP format.</description></property><property><name>compact_trigger_tp_ratio_limit</name><value>0.5</value><description>The threshold of triggering compact in MAGMAAP catalog</description></property>

以gpadmin用户登录:

su - gpadmin

设置免密码ssh:

source /usr/local/hawq/greenplum_path.shhawq ssh-exkeys -h oushu
初始化OushuDB,在询问是否初始化时,请输入y,表示确认初始化。
hawq init cluster   //OushuDB4.0 默认不启动magma服务
hawq init cluster --with_magma  //OushuDB4.0新增,3.X版本不支持该选项
// OushuDB4.0版本新增--with_magma选项,但只有hawq init|start|stop cluster命令可以带--with_magma选项。

OushuDB管理工具日志在/home/gpadmin/hawqAdminLogs/中,OushuDB master日志和segment日志分别在/data1/hawq/masterdd/pg_log/ 和/data1/hawq/segmentdd/pg_log/中。如果因为配置出错,可以查看错误日志,并依据改正。
检查OushuDB是否运行正常:

su - gpadminsource /usr/local/hawq/greenplum_path.shpsql -d postgresselect * from gp_segment_configuration;  //确定所有节点是up状态create table t(i int);insert into t select generate_series(1,1000);select count(*) from t;

体验新执行器

本章节通过TPCH lineitem 表来展示新执行器的使用。
建立e_lineitem外部表用来生成TPCH lineitem 数据,

CREATE EXTERNAL WEB TABLE E_LINEITEM ( L_ORDERKEY    INT8 ,L_PARTKEY     INTEGER ,L_SUPPKEY     INTEGER ,L_LINENUMBER  INTEGER ,L_QUANTITY    FLOAT ,L_EXTENDEDPRICE  FLOAT ,L_DISCOUNT    FLOAT ,L_TAX         FLOAT ,L_RETURNFLAG  VARCHAR(1) ,L_LINESTATUS  VARCHAR(1) ,L_SHIPDATE    TEXT ,L_COMMITDATE  TEXT ,L_RECEIPTDATE TEXT ,L_SHIPINSTRUCT CHAR(25) ,L_SHIPMODE     VARCHAR(10) ,L_COMMENT      VARCHAR(44) )EXECUTE 'bash -c "$GPHOME/bin/dbgen -b $GPHOME/bin/dists.dss -T L -s 1 -N 6 -n $((GP_SEGMENT_ID + 1))"'on 6 format 'text' (delimiter '|');

创建ORC 表

CREATE TABLE lineitem(  L_ORDERKEY    INT8,L_PARTKEY     INTEGER,L_SUPPKEY     INTEGER,L_LINENUMBER  INTEGER,L_QUANTITY    FLOAT,L_EXTENDEDPRICE  FLOAT,L_DISCOUNT    FLOAT,L_TAX         FLOAT,L_RETURNFLAG  TEXT,L_LINESTATUS  TEXT,L_SHIPDATE    TEXT,L_COMMITDATE  TEXT,L_RECEIPTDATE TEXT,L_SHIPINSTRUCT TEXT,L_SHIPMODE     TEXT,L_COMMENT      TEXT)WITH (APPENDONLY = true, OIDS = FALSE, ORIENTATION = orc);

插入数据

INSERT INTO lineitem SELECT * FROM e_lineitem;

从下面的例子可以看到新执行器对于性能的大幅改进。

-----获取表行数------postgres=# set new_executor = on;SETpostgres=# SELECT COUNT(*) FROM lineitem;count---------6001215(1 row)Time: 17.006 mspostgres=# set new_executor = off;SETpostgres=# SELECT COUNT(*) FROM lineitem;count---------6001215(1 row)Time: 213.248 ms-----TPCH 查询 1 ------postgres=# set new_executor = on;SETpostgres=#  SELECTl_returnflag,l_linestatus,sum(l_quantity)::bigint as sum_qty,sum(l_extendedprice)::bigint as sum_base_price,sum(l_extendedprice * (1 - l_discount))::bigint as sum_disc_price,sum(l_extendedprice * (1 - l_discount) * (1 + l_tax))::bigint as sum_charge,avg(l_quantity)::bigint as avg_qty,avg(l_extendedprice)::bigint as avg_price,avg(l_discount)::bigint as avg_disc,count(*) as count_orderFROMlineitemWHEREl_shipdate <= '1998-08-20'GROUP BYl_returnflag,l_linestatus;l_returnflag | l_linestatus | sum_qty  | sum_base_price | sum_disc_price |  sum_charge  | avg_qty | avg_price | avg_disc | count_order--------------+--------------+----------+----------------+----------------+--------------+---------+-----------+----------+-------------R            | F            | 37719753 |    56568041381 |    53741292685 |  55889619120 |      26 |     38251 |        0 |     1478870N            | F            |   991417 |     1487504710 |     1413082168 |   1469649223 |      26 |     38284 |        0 |       38854A            | F            | 37734107 |    56586554401 |    53758257135 |  55909065223 |      26 |     38273 |        0 |     1478493N            | O            | 73808911 |   110700990251 |   105167436999 | 109377979031 |      26 |     38248 |        0 |     2894278(4 rows)Time: 234.376 mspostgres=# set new_executor = off;SETpostgres=#  SELECTl_returnflag,l_linestatus,sum(l_quantity)::bigint as sum_qty,sum(l_extendedprice)::bigint as sum_base_price,sum(l_extendedprice * (1 - l_discount))::bigint as sum_disc_price,sum(l_extendedprice * (1 - l_discount) * (1 + l_tax))::bigint as sum_charge,avg(l_quantity)::bigint as avg_qty,avg(l_extendedprice)::bigint as avg_price,avg(l_discount)::bigint as avg_disc,count(*) as count_orderFROMlineitemWHEREl_shipdate <= '1998-08-20'GROUP BYl_returnflag,l_linestatus;l_returnflag | l_linestatus | sum_qty  | sum_base_price | sum_disc_price |  sum_charge  | avg_qty | avg_price | avg_disc | count_order--------------+--------------+----------+----------------+----------------+--------------+---------+-----------+----------+-------------R            | F            | 37719753 |    56568041381 |    53741292685 |  55889619120 |      26 |     38251 |        0 |     1478870N            | F            |   991417 |     1487504710 |     1413082168 |   1469649223 |      26 |     38284 |        0 |       38854A            | F            | 37734107 |    56586554401 |    53758257135 |  55909065223 |      26 |     38273 |        0 |     1478493N            | O            | 73808911 |   110700990251 |   105167436999 | 109377979031 |      26 |     38248 |        0 |     2894278(4 rows)Time: 2341.147 ms
上一篇:OushuDB 创建和管理视图


下一篇:OushuDB 的主要功能