1. Problem example
[Hadoop@master Logs]$ jps
3728 ResourceManager
6976 RunJar
7587 Jps
4277 Master
3095 NameNode
3863 NodeManager
3450 SecondaryNameNode
4362 Worker
3245 DataNode
[Hadoop@master Logs]$ kill -9 6976
[Hadoop@master Logs]$ jps
3728 ResourceManager
6976 RunJar
4277 Master
3095 NameNode
3863 NodeManager
7607 Jps
3450 SecondaryNameNode
4362 Worker
3245 DataNode
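Problem description: Hive was started abnormally and left behind a RunJar process. That process cannot be removed with kill -9 because it has become a zombie process.
2. Problem analysis
Reference: https://blog.csdn.net/walykyy/article/details/113253060
A zombie process cannot be killed directly: it has already exited, and only its process-table entry remains until the parent collects its exit status. Killing the parent process clears it, because the orphaned zombie is then typically adopted by init (PID 1), which reaps it.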
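As a rough illustration of the point above, the following sketch lists zombie processes together with their parent PIDs using ps and awk. The function names filter_zombies and list_zombies are hypothetical helpers introduced here, not part of Hadoop or Hive, and the sketch assumes a ps that accepts the standard -e and -o options.

```shell
# filter_zombies: read "PID PPID STAT COMMAND" rows on stdin and keep only
# the rows whose state code starts with Z (zombie).
# Prints: PID PPID COMMAND (first word of the command only).
filter_zombies() {
    awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}

# list_zombies: feed the live process table into filter_zombies.
# Each "-o name=" selects a column and suppresses its header.
list_zombies() {
    ps -e -o pid= -o ppid= -o stat= -o comm= | filter_zombies
}
```

Running list_zombies on the machine in this example would be expected to report PID 6976 with parent 6975, which is the process to look at next.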
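3. Solution
Locate the zombie process. In /proc/<pid>/status a zombie is marked by State: Z (zombie), and the PPid field gives the PID of its parent process.
Proceed as follows: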
[Hadoop@master Logs]$ cd /proc/6976
[Hadoop@master 6976]$ ls
ls: cannot read symbolic link cwd: Permission denied
ls: cannot read symbolic link root: Permission denied
ls: cannot read symbolic link exe: Permission denied
attr coredump_filter gid_map mountinfo oom_score sched statm
autogroup cpuset io mounts oom_score_adj schedstat status
auxv cwd limits mountstats pagemap sessionid syscall
cgroup environ loginuid net patch_state setgroups task
clear_refs exe map_files ns personality smaps timers
cmdline fd maps numa_maps projid_map stack uid_map
comm fdinfo mem oom_adj root stat wchan
[Hadoop@master 6976]$ cat status
Name: java
State: Z (zombie)
Tgid: 6976
Ngid: 0
Pid: 6976
PPid: 6975
TracerPid: 0
Uid: 1001 1001 1001 1001
Gid: 1001 1001 1001 1001
FDSize: 0
Groups: 0 1001
Threads: 1
SigQ: 3/15023
SigPnd: 0000000000000000
ShdPnd: 0000000000004100
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: 2000000181005ccf
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
CapBnd: 0000001fffffffff
CapAmb: 0000000000000000
NoNewPrivs: 0
Seccomp: 0
Speculation_Store_Bypass: thread vulnerable
Cpus_allowed: 3
Cpus_allowed_list: 0-1
Mems_allowed:
*********(output omitted here)
Mems_allowed_list: 0
voluntary_ctxt_switches: 50
nonvoluntary_ctxt_switches: 14
[Hadoop@master 6976]$ kill -9 6975
[Hadoop@master 6976]$ jps
3728 ResourceManager
4277 Master
3095 NameNode
3863 NodeManager
7832 Jps
3450 SecondaryNameNode
4362 Worker
3245 DataNode
The steps above successfully cleared the zombie RunJar process 6976 by killing its parent process 6975.
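The manual steps above (read the status file, confirm the State line, take the PPid) can be sketched as a small shell function. This is a minimal sketch, not a definitive tool: zombie_parent is a hypothetical helper name, and it only reports the parent PID. Whether that parent is safe to kill is left to the operator.

```shell
# zombie_parent STATUS_FILE
# Reads a /proc/<pid>/status style file and prints the PPid value, but only
# when the State line marks the process as a zombie (Z).
# Exit status: 0 if a zombie parent was printed, 1 otherwise.
zombie_parent() {
    awk '
        $1 == "State:" { state = $2 }   # e.g. "State: Z (zombie)" -> Z
        $1 == "PPid:"  { ppid = $2 }    # e.g. "PPid: 6975"        -> 6975
        END {
            if (state == "Z" && ppid != "") { print ppid; exit 0 }
            exit 1
        }
    ' "$1"
}

# Example use (commented out so that nothing runs by accident):
# parent=$(zombie_parent /proc/6976/status) && echo "zombie parent: $parent"
```

Before sending kill -9 to the reported parent, it is worth checking what that process is (for example with ps -p on its PID), since killing it also ends any work it is still doing.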