Case study: harbor redeployment fails after an abnormal host shutdown in a kubernetes environment

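  The 沃佳云 (Wojiayun) cloud host shut down abnormally again, which brought down the k8s environment. In the past, restarting harbor was enough to get it working again, but today nothing I tried would bring harbor back.
  Start the docker service first: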

[root@hdss7-200 ~]# systemctl start docker

  Redeploy harbor:

[root@hdss7-200 ~]# cd /opt/harbor
[root@hdss7-200 harbor]# ./install.sh

Removing f3181ac0cf37_harbor-portal ... error

Creating harbor-log ... done "overlay2" failed to remove root filesystem: unlinkat /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp: input/output error
Removing network harborv183_harbor

Creating harbor-core ... 
Recreating f3181ac0cf37_harbor-portal ... 
Recreating f3181ac0cf37_harbor-portal ... error

ERROR: for f3181ac0cf37_harbor-portal  container f3181ac0cf37cd29594dd7fa499b06cee71473c4caf63b5282638513a79b081e: driver "overlay2" failed to remove root filesystem: unlinkat /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp: input/output error

ERROR: for portal  container f3181ac0cf37cd29594dd7fa499b06cee71473c4caf63b5282638513a79b081e: driver "overlay2" failed to remove root filesystem: unlinkat /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp: input/output error
ERROR: Encountered errors while bringing up the project.

  List the docker containers:

[root@hdss7-200 harbor]# docker ps -a
CONTAINER ID        IMAGE                                               COMMAND                   CREATED             STATUS                       PORTS                       NAMES
2991dd383a4d        goharbor/harbor-portal:v1.8.3                       "nginx -g 'daemon of…"    5 minutes ago       Up 5 minutes (healthy)       80/tcp                      harbor-portal
39a3a548010c        goharbor/harbor-jobservice:v1.8.3                   "/harbor/start.sh"        5 minutes ago       Up 5 minutes                                             harbor-jobservice
8e77c46135f8        goharbor/harbor-core:v1.8.3                         "/harbor/start.sh"        5 minutes ago       Up 5 minutes (healthy)                                   harbor-core
a451e99b8c61        goharbor/harbor-db:v1.8.3                           "/entrypoint.sh post…"    5 minutes ago       Up 5 minutes (healthy)       5432/tcp                    harbor-db
3e572508f475        goharbor/registry-photon:v2.7.1-patch-2819-v1.8.3   "/entrypoint.sh /etc…"    5 minutes ago       Up 5 minutes (healthy)       5000/tcp                    registry
473b7beada5a        goharbor/redis-photon:v1.8.3                        "docker-entrypoint.s…"    5 minutes ago       Up 5 minutes                 6379/tcp                    redis
160b88a8c778        goharbor/harbor-registryctl:v1.8.3                  "/harbor/start.sh"        5 minutes ago       Up 5 minutes (healthy)                                   registryctl
a5d2fc46e05e        goharbor/harbor-log:v1.8.3                          "/bin/sh -c /usr/loc…"    5 minutes ago       Up 5 minutes (healthy)       127.0.0.1:1514->10514/tcp   harbor-log
f3181ac0cf37        goharbor/harbor-portal:v1.8.3                       "nginx -g 'daemon of…"    2 weeks ago         Removal In Progress                                      f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_harbor-portal
b2ba0b1ac992        724c576ca3fb                                        "/bin/sh -c 'echo \" …"   12 months ago       Exited (2) 12 months ago                                 determined_hermann
9f7b7e876bde        724c576ca3fb                                        "/bin/sh -c 'echo \" …"   12 months ago       Exited (2) 12 months ago                                 elegant_mirzakhani
fa90411e33fe        bf20ac214571                                        "/bin/sh -c 'echo \" …"   12 months ago       Exited (2) 12 months ago                                 dazzling_euclid
d3517506aac9        bf20ac214571                                        "/bin/sh -c 'echo \" …"   12 months ago       Exited (100) 12 months ago                               friendly_snyder
096be691fbba        bf20ac214571                                        "/bin/sh -c 'echo \" …"   12 months ago       Exited (2) 12 months ago                                 zealous_rhodes
1e31fc83a48a        bf20ac214571                                        "/bin/sh -c 'echo \" …"   12 months ago       Exited (100) 12 months ago                               epic_chebyshev
eed93e842291        bf20ac214571                                        "/bin/sh -c 'echo \" …"   12 months ago       Exited (100) 12 months ago                               dreamy_shannon
6c2b6e889664        bf20ac214571                                        "/bin/sh -c 'echo \" …"   12 months ago       Exited (100) 12 months ago

  Bulk-remove the exited (Exited) docker containers:

[root@hdss7-200 harbor]# for i in `docker ps -a|grep -i exit|awk '{print $1}'`;do docker rm -f $i;done
b2ba0b1ac992
9f7b7e876bde
fa90411e33fe
d3517506aac9
096be691fbba
1e31fc83a48a
eed93e842291
6c2b6e889664
e9032d47ed14
84430f73fd64
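
  The `grep -i exit` filter above matches the word anywhere in the row, so a container whose name or command happens to contain "exit" would be removed as well. A slightly stricter sketch, assuming the default `docker ps -a` table layout (the function name `exited_ids` is my own, not a docker feature):

```shell
# Read `docker ps -a` table text on stdin and print the container ID
# (first column) of every row whose STATUS looks like "Exited (<code>)".
exited_ids() {
    awk '/ Exited \([0-9]+\)/ { print $1 }'
}
```

  With docker available this could be used as `docker ps -a | exited_ids | xargs -r docker rm` (`-r` is the GNU xargs flag that skips the run when the input is empty). Docker can also do the filtering itself with `docker ps -aq --filter status=exited`.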

  The leftover docker container cannot be removed:

[root@hdss7-200 harbor]# docker ps -a
CONTAINER ID        IMAGE                                               COMMAND                  CREATED             STATUS                    PORTS                       NAMES
2991dd383a4d        goharbor/harbor-portal:v1.8.3                       "nginx -g 'daemon of…"   10 minutes ago      Up 10 minutes (healthy)   80/tcp                      harbor-portal
39a3a548010c        goharbor/harbor-jobservice:v1.8.3                   "/harbor/start.sh"       10 minutes ago      Up 10 minutes                                         harbor-jobservice
8e77c46135f8        goharbor/harbor-core:v1.8.3                         "/harbor/start.sh"       10 minutes ago      Up 10 minutes (healthy)                               harbor-core
a451e99b8c61        goharbor/harbor-db:v1.8.3                           "/entrypoint.sh post…"   10 minutes ago      Up 10 minutes (healthy)   5432/tcp                    harbor-db
3e572508f475        goharbor/registry-photon:v2.7.1-patch-2819-v1.8.3   "/entrypoint.sh /etc…"   10 minutes ago      Up 10 minutes (healthy)   5000/tcp                    registry
473b7beada5a        goharbor/redis-photon:v1.8.3                        "docker-entrypoint.s…"   10 minutes ago      Up 10 minutes             6379/tcp                    redis
160b88a8c778        goharbor/harbor-registryctl:v1.8.3                  "/harbor/start.sh"       10 minutes ago      Up 10 minutes (healthy)                               registryctl
a5d2fc46e05e        goharbor/harbor-log:v1.8.3                          "/bin/sh -c /usr/loc…"   10 minutes ago      Up 10 minutes (healthy)   127.0.0.1:1514->10514/tcp   harbor-log
f3181ac0cf37        goharbor/harbor-portal:v1.8.3                       "nginx -g 'daemon of…"   2 weeks ago         Removal In Progress                                   f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_harbor-portal

  docker-compose down fails on it too:

[root@hdss7-200 harbor]# docker-compose down
Stopping harbor-portal     ... done
Stopping harbor-jobservice ... done
Stopping harbor-core       ... done
Stopping registryctl       ... done
Stopping redis             ... done
Stopping harbor-db         ... done
Stopping registry          ... done
Stopping harbor-log        ... done
Removing harbor-portal                                                                  ... done
Removing harbor-jobservice                                                              ... done
Removing harbor-core                                                                    ... done
Removing registryctl                                                                    ... done
Removing redis                                                                          ... done
Removing harbor-db                                                                      ... done
Removing registry                                                                       ... done
Removing harbor-log                                                                     ... done
Removing f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_harbor-portal ... error

ERROR: for f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_harbor-portal  container f3181ac0cf37cd29594dd7fa499b06cee71473c4caf63b5282638513a79b081e: driver "overlay2" failed to remove root filesystem: unlinkat /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp: input/output error
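
  Every failure so far points at the same file. A small sketch for pulling that path out of the error text so it can be inspected directly; the function name `unlink_paths` is hypothetical, and the pattern assumes the path itself contains no colon:

```shell
# Read docker / docker-compose error text on stdin and print each distinct
# path that follows "unlinkat " (the file the overlay2 driver failed to delete).
unlink_paths() {
    sed -n 's|.*unlinkat \(/[^:]*\): .*|\1|p' | sort -u
}
```

  For example, `docker-compose down 2>&1 | unlink_paths` would reduce the output above to the single fastcgi_temp path.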
Removing network harborv183_harbor

  Stopping the docker service does not help either:

[root@hdss7-200 ~]# systemctl stop docker.service
[root@hdss7-200 ~]# rm -fr /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7
rm: cannot remove '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error

  I tried deleting it every way I could think of; nothing worked:

[root@hdss7-200 harbor]# rm -fr /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7
rm: cannot remove '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error
[root@hdss7-200 harbor]# rm -fr /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/*    
rm: cannot remove '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error
[root@hdss7-200 harbor]# rm -fr /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp
rm: cannot remove '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error

  Checking the file type also reports an error:

[root@hdss7-200 harbor]# file /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp
/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp: cannot open (Input/output error)
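
  An `Input/output error` from both `rm` and `file` usually means the kernel itself could not read the entry, so the problem sits below docker, in the filesystem or the disk. The kernel log is the usual place to confirm that. A minimal sketch, assuming the relevant messages mention XFS or an I/O error (the function name `fs_error_lines` and the keyword list are my own choices):

```shell
# Read kernel log text on stdin and keep only the lines that hint at
# filesystem or disk trouble (case-insensitive keyword match).
fs_error_lines() {
    grep -i -E 'xfs|i/o error|input/output error'
}
```

  On the host this would be run as `dmesg | fs_error_lines`.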

  Moving it to a temporary directory fails too:

[root@hdss7-200 harbor]# mv /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7 /tmp
mv: cannot stat '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error

  Remounting does not help either:

[root@hdss7-200 harbor]# mount   
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=1910324k,nr_inodes=477581,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/centos-root on / type xfs (rw,relatime,attr2,inode64,noquota)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=17256)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
/dev/sda2 on /boot type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)
/dev/mapper/centos-home on /data type xfs (rw,relatime,attr2,inode64,noquota)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=384404k,mode=700)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
[root@hdss7-200 harbor]# df -h
Filesystem               Size  Used Avail Use% Mounted on
/dev/mapper/centos-root   50G  3.2G   47G   7% /
devtmpfs                 1.9G     0  1.9G   0% /dev
tmpfs                    1.9G     0  1.9G   0% /dev/shm
tmpfs                    1.9G  8.4M  1.9G   1% /run
tmpfs                    1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/sda2               1014M  179M  836M  18% /boot
/dev/sda1                200M   12M  189M   6% /boot/efi
/dev/mapper/centos-home   73G  9.9G   63G  14% /data
tmpfs                    376M     0  376M   0% /run/user/0
[root@hdss7-200 harbor]# mount -o remount,rw /
[root@hdss7-200 harbor]# rm -fr /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp 
rm: cannot remove '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error
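
  Note that the broken path lives under /data, which the `mount` output above shows as a separate XFS volume (/dev/mapper/centos-home), while the remount was applied to /. A small sketch for looking up which mounted filesystem holds a given path by parsing `mount` output; the function name `mount_for_path` is hypothetical, and mount points containing spaces are not handled:

```shell
# Read `mount`-style lines ("<dev> on <mountpoint> type <fstype> (...)") on
# stdin and print "<dev> <mountpoint> <fstype>" for the longest mount point
# that is the given path itself or one of its parent directories.
mount_for_path() {
    awk -v p="$1" '
        BEGIN { best_len = 0 }
        $2 == "on" && $4 == "type" {
            mp = $3
            if (mp == "/" || p == mp || index(p, mp "/") == 1) {
                if (length(mp) > best_len) {
                    best_len = length(mp)
                    best = $1 " " mp " " $5
                }
            }
        }
        END { if (best != "") print best }
    '
}
```

  Given the mount table shown above, `mount | mount_for_path /data/docker/overlay2` should print `/dev/mapper/centos-home /data xfs`; `df /data/docker/overlay2` answers the same question with less typing.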

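  I contacted 王导 on QQ; I had not expected him to be around instead of away for the May Day holiday.
  In the end he logged in to my 沃佳云 server. He said that after rebooting the VM into "rescue mode" and remounting, the file could be deleted once back in the normal system.
  Presumably he ran the following command: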

mount -o remount,rw /
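
  The exact rescue-mode commands were not recorded, and since the broken file sits on the /data volume rather than on /, a remount of / alone may not be the whole story. One common sequence for an XFS volume that returns I/O errors after a crash is to unmount it, check it with `xfs_repair -n`, repair it, and mount it again. The sketch below only prints such a plan and runs nothing; the function name `rescue_plan` is hypothetical, and the device and mount point are taken from the `mount` output earlier:

```shell
# Print (do NOT execute) a hypothetical repair plan for one XFS filesystem.
# $1 = block device, $2 = mount point.
rescue_plan() {
    printf 'umount %s\n' "$2"
    printf 'xfs_repair -n %s\n' "$1"   # -n: check only, modify nothing
    # If xfs_repair reports a dirty log, mounting and unmounting once replays it.
    printf 'xfs_repair %s\n' "$1"
    printf 'mount %s %s\n' "$1" "$2"
}

rescue_plan /dev/mapper/centos-home /data
```

  Printing instead of executing is deliberate: the steps have to run while the filesystem is unmounted (for example from rescue mode), and each one is worth reading before it is typed.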

  He presumably deleted the file from the normal (non-rescue) system. The stuck docker container can now be removed:

[root@hdss7-200 ~]# docker ps -a
CONTAINER ID        IMAGE                           COMMAND                  CREATED             STATUS              PORTS               NAMES
f3181ac0cf37        goharbor/harbor-portal:v1.8.3   "nginx -g 'daemon of…"   2 weeks ago         Dead                                    f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_harbor-portal
[root@hdss7-200 ~]# docker rm -f f318
f318
[root@hdss7-200 ~]# docker ps -a
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

  Redeploying harbor now succeeds:

[root@hdss7-200 ~]# cd /opt/harbor
[root@hdss7-200 harbor]# ./install.sh 

[Step 0]: checking installation environment ...

Note: docker version: 19.03.7

Note: docker-compose version: 1.18.0

[Step 1]: loading Harbor images ...
Loaded image: goharbor/harbor-db:v1.8.3
Loaded image: goharbor/redis-photon:v1.8.3
Loaded image: goharbor/notary-signer-photon:v0.6.1-v1.8.3
Loaded image: goharbor/chartmuseum-photon:v0.9.0-v1.8.3
Loaded image: goharbor/harbor-core:v1.8.3
Loaded image: goharbor/harbor-log:v1.8.3
Loaded image: goharbor/harbor-registryctl:v1.8.3
Loaded image: goharbor/notary-server-photon:v0.6.1-v1.8.3
Loaded image: goharbor/clair-photon:v2.0.8-v1.8.3
Loaded image: goharbor/harbor-migrator:v1.8.3
Loaded image: goharbor/prepare:v1.8.3
Loaded image: goharbor/harbor-portal:v1.8.3
Loaded image: goharbor/nginx-photon:v1.8.3
Loaded image: goharbor/harbor-jobservice:v1.8.3
Loaded image: goharbor/registry-photon:v2.7.1-patch-2819-v1.8.3


[Step 2]: preparing environment ...
prepare base dir is set to /opt/harbor-v1.8.3
Clearing the configuration file: /config/log/logrotate.conf
Clearing the configuration file: /config/nginx/nginx.conf
Clearing the configuration file: /config/core/env
Clearing the configuration file: /config/core/app.conf
Clearing the configuration file: /config/registry/config.yml
Clearing the configuration file: /config/registry/root.crt
Clearing the configuration file: /config/registryctl/env
Clearing the configuration file: /config/registryctl/config.yml
Clearing the configuration file: /config/db/env
Clearing the configuration file: /config/jobservice/env
Clearing the configuration file: /config/jobservice/config.yml
Generated configuration file: /config/log/logrotate.conf
Generated configuration file: /config/nginx/nginx.conf
Generated configuration file: /config/core/env
Generated configuration file: /config/core/app.conf
Generated configuration file: /config/registry/config.yml
Generated configuration file: /config/registryctl/env
Generated configuration file: /config/db/env
Generated configuration file: /config/jobservice/env
Generated configuration file: /config/jobservice/config.yml
Creating harbor-log ... done
Generated configuration file: /compose_location/docker-compose.yml
Clean up the input dir

Creating registry ... done
Creating harbor-core ... done
[Step 3]: starting Harbor ...
Creating harbor-portal ... done
Creating nginx ... done
Creating registry ... 
Creating harbor-db ... 
Creating registryctl ... 
Creating redis ... 
Creating harbor-core ... 
Creating harbor-jobservice ... 
Creating harbor-portal ... 
Creating nginx ... 

✔ ----Harbor has been installed and started successfully.----

Now you should be able to visit the admin portal at http://harbor.od.com. 
For more details, please visit https://github.com/goharbor/harbor .

  After refreshing the browser, the page is back to normal.


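  1. The problem was ultimately solved by asking for help; I also sent him a 20-yuan QQ red envelope.
  2. As for the cause: on the one hand, 沃佳云 really is terrible. It has rebooted abnormally more than 10 times over the past 2 years, and no company would dare run on a public cloud like this. Their prices are genuinely cheap, though. And the repeated abnormal reboots have forced me through many failures, which has actually improved my troubleshooting skills.
  3. The problem I hit this time is probably unlikely to show up in production. After all, if a production environment rebooted this often (沃佳云 reboots the machine almost every month), the company would have switched cloud platforms long ago.
  4. kubernetes is, in the end, still container orchestration. So if you only know k8s and are not fluent with docker, you cannot really be fluent with kubernetes either.
  5. On CentOS, all I learned back then with 老男孩 (Oldboy) was resetting the root password in single-user mode and editing /etc/fstab. That should count as Linux basics, but after so many years, and especially after using Alibaba Cloud these past few years, even those basics have gotten rusty.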
