Winter Hands-on Camp Session 5 Learning Report

1 Hands-on Lab: Offline Data Analysis Based on EMR

1.1 Create resources and connect to the EMR cluster

The resources allocated for the scenario:

Log in with the RAM sub-account and locate the public IP address of the master node:

Connect to the EMR cluster. The terminal built into the lab scenario is awkward to use; a local PuTTY terminal can connect to the master node just as well, and is what I used for the remaining steps.
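
Any local SSH client works here; with plain OpenSSH the connection is a single line (the address is the master node's public IP found above; root login over port 22 is the usual EMR setup, adjust if your cluster differs):

ssh root@<master-node-public-ip>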

1.2 Import data into the EMR cluster

Create a directory on HDFS and put the edited file onto the HDFS file system:

[root@emr-header-1 ~]# hdfs dfs -mkdir -p /data/student
[root@emr-header-1 ~]# vim u.txt
[root@emr-header-1 ~]# hdfs dfs -put u.txt /data/student

Show the uploaded file and its contents:

[root@emr-header-1 ~]# hdfs dfs -ls /data/student
Found 1 items
-rw-r-----   2 root hadoop       2391 2022-02-28 09:30 /data/student/u.txt
[root@emr-header-1 ~]# hdfs dfs -cat /data/student/u.txt
196  242  3  881250949
186  302  3  891717742
22  377  1  878887116
244  51  2  880606923
166  346  1  886397596
298  474  4  884182806
115  265  2  881171488
253  465  5  891628467
305  451  3  886324817

Log in to Hive, create a table, and load the data:

[root@emr-header-1 ~]# hive
Logging initialized using configuration in file:/etc/ecm/hive-conf-2.3.2-1.0.1/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive> CREATE TABLE emrusers (
    userid INT,
    movieid INT,
    rating INT,
    unixtime STRING )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t';
OK
Time taken: 1.053 seconds
hive> LOAD DATA INPATH '/data/student/u.txt' INTO TABLE emrusers;
Loading data to table default.emrusers
OK
Time taken: 0.459 seconds
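
Before querying, the table definition can be double-checked with a standard HiveQL statement (optional, just a sanity check):

hive> DESCRIBE emrusers;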

1.3 Query the table and run statistical analysis SQL on it

View the first five rows of the table. Note that this simple query is not compiled into a MapReduce job at all: Hive answers it with a direct fetch task, which is why it returns in well under a second.

hive> select * from emrusers limit 5;
    OK
    196     242     3        881250949
    186     302     3        891717742
    22      377     1       878887116
    244     51      2       880606923
    166     346     1       886397596
    Time taken: 0.069 seconds, Fetched: 5 row(s)
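
The fetch-task behavior above is governed by a Hive configuration property (shown for reference; 'more' is the usual default in Hive 2 and converts simple SELECT/FILTER/LIMIT queries into a direct fetch):

hive> set hive.fetch.task.conversion=more;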

Count the total rows in the table. This statement is compiled into a MapReduce job, which takes noticeably longer (about 21 seconds end to end).

hive> select count(*) from emrusers;
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = root_20220228110103_9aec542e-2d15-49de-b0fe-388ee617b755
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1646010854736_0005, Tracking URL = http://emr-header-1.cluster-286405:20888/proxy/application_1646010854736_0005/
    Kill Command = /usr/lib/hadoop-current/bin/hadoop job  -kill job_1646010854736_0005
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2022-02-28 11:01:11,438 Stage-1 map = 0%,  reduce = 0%
    2022-02-28 11:01:16,722 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.99 sec
    2022-02-28 11:01:22,891 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.28 sec
    MapReduce Total cumulative CPU time: 2 seconds 280 msec
    Ended Job = job_1646010854736_0005
    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.28 sec   HDFS Read: 10079 HDFS Write: 103 SUCCESS
    Total MapReduce CPU Time Spent: 2 seconds 280 msec
    OK
    106
    Time taken: 20.893 seconds, Fetched: 1 row(s)
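
The deprecation warning above suggests moving off Hive-on-MR. Assuming Tez is installed on the cluster, the execution engine can be switched for the current session with a single setting:

hive> set hive.execution.engine=tez;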

Find the three movies with the highest total rating. This statement is compiled into two chained MapReduce jobs and takes the longest (about 36 seconds).

hive> select movieid,sum(rating) as rat from emrusers group by movieid order by rat desc limit 3;
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = root_20220228110213_6733e92a-00ed-4d71-b289-5be55aaa26af
    Total jobs = 2
    Launching Job 1 out of 2
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1646010854736_0006, Tracking URL = http://emr-header-1.cluster-286405:20888/proxy/application_1646010854736_0006/
    Kill Command = /usr/lib/hadoop-current/bin/hadoop job  -kill job_1646010854736_0006
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2022-02-28 11:02:21,418 Stage-1 map = 0%,  reduce = 0%
    2022-02-28 11:02:25,532 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.0 sec
    2022-02-28 11:02:30,664 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 2.0 sec
    MapReduce Total cumulative CPU time: 2 seconds 0 msec
    Ended Job = job_1646010854736_0006
    Launching Job 2 out of 2
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1646010854736_0007, Tracking URL = http://emr-header-1.cluster-286405:20888/proxy/application_1646010854736_0007/
    Kill Command = /usr/lib/hadoop-current/bin/hadoop job  -kill job_1646010854736_0007
    Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
    2022-02-28 11:02:38,922 Stage-2 map = 0%,  reduce = 0%
    2022-02-28 11:02:43,038 Stage-2 map = 100%,  reduce = 0%, Cumulative CPU 1.12 sec
    2022-02-28 11:02:48,162 Stage-2 map = 100%,  reduce = 100%, Cumulative CPU 2.14 sec
    MapReduce Total cumulative CPU time: 2 seconds 140 msec
    Ended Job = job_1646010854736_0007
    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 2.0 sec   HDFS Read: 9642 HDFS Write: 2131 SUCCESS
    Stage-Stage-2: Map: 1  Reduce: 1   Cumulative CPU: 2.14 sec   HDFS Read: 7869 HDFS Write: 143 SUCCESS
    Total MapReduce CPU Time Spent: 4 seconds 140 msec
    OK
    144     13
    274     10
    304     9
Time taken: 36.114 seconds, Fetched: 3 row(s)


2 Hands-on Lab: Quickly Building an Intelligent O&M System with Alibaba Cloud Elasticsearch

2.1 Apply for resources and log in to the Elasticsearch cluster

The resources allocated for the scenario are as follows:

After logging in with the sub-account, three Elasticsearch clusters are visible:

Double-checking: the cluster allocated for this lab should be the one whose ID starts with es-cn-jpy7:

Modify the Kibana configuration to enable public network access, so that Kibana can be reached from the Internet.

2.2 Enable automatic index creation

The annoying part of this step is that the Dev Tools entry sits at the very bottom of the left navigation bar; it is not obvious what ordering the navigation items follow.
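
For reference, the switch itself is a single cluster-settings call issued from Dev Tools (this is the standard Elasticsearch API; the lab handbook may word it slightly differently):

PUT _cluster/settings
{
  "persistent": {
    "action.auto_create_index": "true"
  }
}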

2.3 Create a Metricbeat collector

After selecting the ECS instance, start the collector:

Check the collector status:

The collector status shows Effective (已生效):

Three collectors were created in total, but only one runs successfully; a collector whose status reads Effective 0/1 has in fact failed to deploy.
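
One way to tell which collector is actually shipping data is to check from Kibana Dev Tools whether Metricbeat indices exist and are growing (the index pattern assumes Metricbeat's default naming):

GET _cat/indices/metricbeat-*?v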

View the dashboard:

The dashboard shows the ECS instance's process count, CPU usage, system load, and more.

2.4 Summary

This scenario is somewhat challenging. It is unclear why multiple Elasticsearch clusters appear in the lab environment. The sub-account can only create collectors; attempting to delete or restart one fails with an insufficient-permissions error. Two of the collectors failed to deploy, and the lab handbook offers no analysis or workaround.

3 Getting Started with Recommender Systems: Implementing Product Recommendation with Collaborative Filtering

Apart from having to switch to the legacy version because of a product version change, this scenario follows the lab handbook exactly; even the data and the results match the handbook.
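
For context, item-based collaborative filtering of this kind scores how similar two items are by how strongly their user sets overlap. One common measure (an option in PAI's collaborative-filtering component; the handbook does not say which one it uses) is the Jaccard similarity:

sim(i, j) = |U(i) ∩ U(j)| / |U(i) ∪ U(j)|

where U(i) is the set of users who interacted with item i. Items with the highest similarity to a user's purchased items become the recommendations.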

Open the experiment:

Check the data:

Run the experiment:

The run completes:

Check the result of the join-1 node, which lists the similar items:

View the 全表统计-1 (full-table statistics 1) node, which shows the recommendation results:

View the 全表统计-2 (full-table statistics 2) node, which shows the correlation scores.
