1.4 Setting Up Hadoop with CDH: Before You Install (Recommended Cluster Hosts and Role Assignments)

Recommended Cluster Hosts and Role Assignments

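Important: This topic describes suggested role assignments for a CDH cluster managed by Cloudera Manager. The actual assignments you choose for your deployment can vary depending on the type and volume of workloads, the services deployed in the cluster, hardware resources, configuration, and other factors.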

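When you install CDH using the Cloudera Manager installation wizard, Cloudera Manager attempts to spread roles among the cluster hosts (except for roles assigned to gateway hosts) based on the resources available on each host. You can change these assignments on the Customize Role Assignments page that appears in the wizard. You can also change and add roles later using Cloudera Manager. See Role Instances.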

If your cluster uses data-at-rest encryption, see Allocating Hosts for Key Trustee Server and Key Trustee KMS.

For information about where to locate the various databases required by Cloudera Manager and other services, see Step 4: Install and Configure Databases.

CDH Cluster Hosts and Role Assignments

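Cluster hosts can be broadly described as the following types:
  • Master hosts run Hadoop master processes such as the HDFS NameNode and YARN ResourceManager.
  • Utility hosts run other cluster processes that are not master processes, such as Cloudera Manager and the Hive Metastore.
  • Gateway hosts are client access points for launching jobs in the cluster. The number of gateway hosts required depends on the type and size of the workloads.
  • Worker hosts primarily run DataNodes and other distributed processes such as Impalad.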
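
The four host types can be modeled as a small lookup table. The following is a minimal sketch, not part of Cloudera Manager: the names `HostType`, `EXAMPLE_ROLES`, and `host_type_for` are hypothetical, the role lists contain only the examples named in this topic (the gateway entry is taken from the layout tables further down), and real deployments may place roles differently.

```python
from enum import Enum


class HostType(Enum):
    MASTER = "master"
    UTILITY = "utility"
    GATEWAY = "gateway"
    WORKER = "worker"


# Example roles only, keyed by the host type that typically runs them.
# This mapping is illustrative; it is not an exhaustive or authoritative list.
EXAMPLE_ROLES = {
    HostType.MASTER: ["NameNode", "YARN ResourceManager"],
    HostType.UTILITY: ["Cloudera Manager", "Hive Metastore"],
    HostType.GATEWAY: ["Gateway configuration"],
    HostType.WORKER: ["DataNode", "Impalad"],
}


def host_type_for(role: str) -> HostType:
    """Return the host type whose example list contains the given role."""
    for host_type, roles in EXAMPLE_ROLES.items():
        if role in roles:
            return host_type
    raise KeyError(f"no example host type recorded for role: {role}")


print(host_type_for("NameNode").value)
print(host_type_for("Impalad").value)
```

A lookup like this is only a planning aid; the layouts below show that the same role (for example ZooKeeper) can land on master or utility hosts depending on cluster size.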

Important: Cloudera recommends that you always enable high availability when CDH is used in a production environment.

3 - 10 Worker Hosts without High Availability

Roles by host type (Master Hosts, Utility Hosts, Gateway Hosts, Worker Hosts):
Master Host 1:
  • NameNode
  • YARN ResourceManager
  • JobHistory Server
  • ZooKeeper
  • Kudu master
  • Spark History Server
One host for all Utility and Gateway roles:
  • Secondary NameNode
  • Cloudera Manager
  • Cloudera Manager Management Service
  • Hive Metastore
  • HiveServer2
  • Impala Catalog Server
  • Impala StateStore
  • Hue
  • Oozie
  • Flume
  • Gateway configuration
3 - 10 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server

3 - 20 Worker Hosts with High Availability

Roles by host type (Master Hosts, Utility Hosts, Gateway Hosts, Worker Hosts):
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • JobHistory Server
  • Spark History Server
  • Kudu master
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
Master Host 3:
  • Kudu master (Kudu requires an odd number of masters for HA.)
Utility Host 1:
  • Cloudera Manager
  • Cloudera Manager Management Service
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
  • ZooKeeper (requires dedicated disk)
  • JournalNode (requires dedicated disk)
One or more Gateway Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
3 - 20 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server
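
The note that Kudu requires an odd number of masters follows from majority-quorum arithmetic, which is also why this layout places ZooKeeper and JournalNode on three hosts each. The sketch below assumes a simple strict-majority quorum; the function names are hypothetical and the code does not query any real service.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still available."""
    return members - quorum_size(members)


# Compare ensemble sizes: columns are members, quorum size, tolerated failures.
for n in (1, 2, 3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
```

Under this assumption, three members and four members both tolerate exactly one failure, so the fourth member adds a host without adding fault tolerance. That is the usual reason for choosing an odd count.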

20 - 80 Worker Hosts with High Availability

Roles by host type (Master Hosts, Utility Hosts, Gateway Hosts, Worker Hosts):
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
Master Host 3:
  • ZooKeeper
  • JournalNode
  • JobHistory Server
  • Spark History Server
  • Kudu master
Utility Host 1:
  • Cloudera Manager
Utility Host 2:
  • Cloudera Manager Management Service
  • Hive Metastore
  • Impala Catalog Server
  • Oozie
One or more Gateway Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
20 - 80 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server
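
One way to sanity-check a layout like this is to write it down as data and count how many instances of each role it contains. The sketch below transcribes only the three master hosts of the 20 - 80 worker layout; the names `MASTER_HOSTS` and `role_counts` are hypothetical, and nothing here talks to Cloudera Manager.

```python
from collections import Counter

# Master hosts of the 20 - 80 worker layout, transcribed from the list above.
MASTER_HOSTS = {
    "Master Host 1": [
        "NameNode", "JournalNode", "FailoverController",
        "YARN ResourceManager", "ZooKeeper", "Kudu master",
    ],
    "Master Host 2": [
        "NameNode", "JournalNode", "FailoverController",
        "YARN ResourceManager", "ZooKeeper", "Kudu master",
    ],
    "Master Host 3": [
        "ZooKeeper", "JournalNode", "JobHistory Server",
        "Spark History Server", "Kudu master",
    ],
}


def role_counts(layout: dict) -> Counter:
    """Count how many hosts in the layout run each role."""
    return Counter(role for roles in layout.values() for role in roles)


counts = role_counts(MASTER_HOSTS)
for role in ("NameNode", "JournalNode", "ZooKeeper", "Kudu master"):
    print(f"{role}: {counts[role]}")
```

For this layout the counts come out as two NameNodes and three each of JournalNode, ZooKeeper, and Kudu master, which matches the pairing of active and standby NameNodes and the odd-sized quorums used for high availability.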

80 - 200 Worker Hosts with High Availability

Roles by host type (Master Hosts, Utility Hosts, Gateway Hosts, Worker Hosts):
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • YARN ResourceManager
  • ZooKeeper
  • Kudu master
Master Host 3:
  • ZooKeeper
  • JournalNode
  • JobHistory Server
  • Spark History Server
  • Kudu master
Utility Host 1:
  • Cloudera Manager
Utility Host 2:
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
Utility Host 3:
  • Activity Monitor
Utility Host 4:
  • Host Monitor
Utility Host 5:
  • Navigator Audit Server
Utility Host 6:
  • Navigator Metadata Server
Utility Host 7:
  • Reports Manager
Utility Host 8:
  • Service Monitor
One or more Gateway Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
80 - 200 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server (Recommended maximum number of tablet servers is 100.)

200 - 500 Worker Hosts with High Availability

Roles by host type (Master Hosts, Utility Hosts, Gateway Hosts, Worker Hosts):
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master
Master Host 3:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
  • Kudu master
Master Host 4:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
Master Host 5:
  • JobHistory Server
  • Spark History Server
  • ZooKeeper
  • JournalNode

We recommend no more than three Kudu masters.

Utility Host 1:
  • Cloudera Manager
Utility Host 2:
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
Utility Host 3:
  • Activity Monitor
Utility Host 4:
  • Host Monitor
Utility Host 5:
  • Navigator Audit Server
Utility Host 6:
  • Navigator Metadata Server
Utility Host 7:
  • Reports Manager
Utility Host 8:
  • Service Monitor
One or more Gateway Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
200 - 500 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server (Recommended maximum number of tablet servers is 100.)

500 - 1000 Worker Hosts with High Availability

Roles by host type (Master Hosts, Utility Hosts, Gateway Hosts, Worker Hosts):
Master Host 1:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master
Master Host 2:
  • NameNode
  • JournalNode
  • FailoverController
  • ZooKeeper
  • Kudu master
Master Host 3:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
  • Kudu master
Master Host 4:
  • YARN ResourceManager
  • ZooKeeper
  • JournalNode
Master Host 5:
  • JobHistory Server
  • Spark History Server
  • ZooKeeper
  • JournalNode

We recommend no more than three Kudu masters.

Utility Host 1:
  • Cloudera Manager
Utility Host 2:
  • Hive Metastore
  • Impala Catalog Server
  • Impala StateStore
  • Oozie
Utility Host 3:
  • Activity Monitor
Utility Host 4:
  • Host Monitor
Utility Host 5:
  • Navigator Audit Server
Utility Host 6:
  • Navigator Metadata Server
Utility Host 7:
  • Reports Manager
Utility Host 8:
  • Service Monitor
One or more Gateway Hosts:
  • Hue
  • HiveServer2
  • Flume
  • Gateway configuration
500 - 1000 Worker Hosts:
  • DataNode
  • NodeManager
  • Impalad
  • Kudu tablet server (Recommended maximum number of tablet servers is 100.)
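
The high-availability layouts above are keyed by worker-host count, so choosing one can be sketched as a range lookup. The code below is a hypothetical planning helper, not a Cloudera tool. Two points are assumptions of this sketch: the published ranges share their boundary values (20, 80, 200, 500), and the helper resolves a boundary to the smaller layout; and the Kudu check assumes one tablet server per worker host.

```python
from typing import Tuple

# Worker-host ranges of the high-availability layouts in this topic.
HA_TIERS = [(3, 20), (20, 80), (80, 200), (200, 500), (500, 1000)]

# Recommended maximum number of Kudu tablet servers, as stated in the layouts.
KUDU_MAX_TABLET_SERVERS = 100


def ha_tier_for(worker_hosts: int) -> Tuple[int, int]:
    """Return the (low, high) range of the first HA layout that fits.

    Boundary values match the smaller layout first (an assumption).
    """
    for low, high in HA_TIERS:
        if low <= worker_hosts <= high:
            return (low, high)
    raise ValueError(f"no layout listed for {worker_hosts} worker hosts")


def exceeds_kudu_recommendation(worker_hosts: int) -> bool:
    """True if one tablet server per worker would pass the recommended maximum."""
    return worker_hosts > KUDU_MAX_TABLET_SERVERS


for workers in (12, 20, 150, 750):
    print(workers, ha_tier_for(workers), exceeds_kudu_recommendation(workers))
```

For clusters above 100 workers, the check is a reminder that Kudu tablet servers would need to run on a subset of the worker hosts to stay within the recommendation.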

Allocating Hosts for Key Trustee Server and Key Trustee KMS

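If you are enabling data-at-rest encryption for a CDH cluster, Cloudera recommends that you isolate Key Trustee Server from other enterprise data hub (EDH) services by deploying it on dedicated hosts in a separate cluster managed by Cloudera Manager. Cloudera also recommends deploying Key Trustee KMS on dedicated hosts in the same cluster as the EDH services that require access to Key Trustee Server. This architecture lets multiple clusters share the same Key Trustee Server and avoids restarting Key Trustee Server when a cluster is restarted.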

For more information about encrypting data at rest in an EDH, see Encrypting Data at Rest.

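For production environments in general, or if you have enabled high availability for HDFS and are using data-at-rest encryption, Cloudera recommends that you enable high availability for Key Trustee Server and Key Trustee KMS.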
