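My 沃佳云 (Wojia Cloud) environment shut down unexpectedly again and took the k8s environment down with it. In the past, restarting Harbor was enough to get things working again, but today nothing I tried would bring Harbor back.
Start the docker service first: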
[root@hdss7-200 ~]# systemctl start docker
Redeploy Harbor:
[root@hdss7-200 ~]# cd /opt/harbor
[root@hdss7-200 harbor]# ./install.sh
Removing f3181ac0cf37_harbor-portal ... error
Creating harbor-log ... done
"overlay2" failed to remove root filesystem: unlinkat /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp: input/output error
Removing network harborv183_harbor
Creating harbor-core ...
Recreating f3181ac0cf37_harbor-portal ...
Recreating f3181ac0cf37_harbor-portal ... error
ERROR: for f3181ac0cf37_harbor-portal container f3181ac0cf37cd29594dd7fa499b06cee71473c4caf63b5282638513a79b081e: driver "overlay2" failed to remove root filesystem: unlinkat /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp: input/output error
ERROR: for portal container f3181ac0cf37cd29594dd7fa499b06cee71473c4caf63b5282638513a79b081e: driver "overlay2" failed to remove root filesystem: unlinkat /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp: input/output error
ERROR: Encountered errors while bringing up the project.
List the docker containers:
[root@hdss7-200 harbor]# docker ps -a
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
2991dd383a4d   goharbor/harbor-portal:v1.8.3   "nginx -g 'daemon of…"   5 minutes ago   Up 5 minutes (healthy)   80/tcp   harbor-portal
39a3a548010c   goharbor/harbor-jobservice:v1.8.3   "/harbor/start.sh"   5 minutes ago   Up 5 minutes      harbor-jobservice
8e77c46135f8   goharbor/harbor-core:v1.8.3   "/harbor/start.sh"   5 minutes ago   Up 5 minutes (healthy)      harbor-core
a451e99b8c61   goharbor/harbor-db:v1.8.3   "/entrypoint.sh post…"   5 minutes ago   Up 5 minutes (healthy)   5432/tcp   harbor-db
3e572508f475   goharbor/registry-photon:v2.7.1-patch-2819-v1.8.3   "/entrypoint.sh /etc…"   5 minutes ago   Up 5 minutes (healthy)   5000/tcp   registry
473b7beada5a   goharbor/redis-photon:v1.8.3   "docker-entrypoint.s…"   5 minutes ago   Up 5 minutes   6379/tcp   redis
160b88a8c778   goharbor/harbor-registryctl:v1.8.3   "/harbor/start.sh"   5 minutes ago   Up 5 minutes (healthy)      registryctl
a5d2fc46e05e   goharbor/harbor-log:v1.8.3   "/bin/sh -c /usr/loc…"   5 minutes ago   Up 5 minutes (healthy)   127.0.0.1:1514->10514/tcp   harbor-log
f3181ac0cf37   goharbor/harbor-portal:v1.8.3   "nginx -g 'daemon of…"   2 weeks ago   Removal In Progress      f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_harbor-portal
b2ba0b1ac992   724c576ca3fb   "/bin/sh -c 'echo \" …"   12 months ago   Exited (2) 12 months ago      determined_hermann
9f7b7e876bde   724c576ca3fb   "/bin/sh -c 'echo \" …"   12 months ago   Exited (2) 12 months ago      elegant_mirzakhani
fa90411e33fe   bf20ac214571   "/bin/sh -c 'echo \" …"   12 months ago   Exited (2) 12 months ago      dazzling_euclid
d3517506aac9   bf20ac214571   "/bin/sh -c 'echo \" …"   12 months ago   Exited (100) 12 months ago      friendly_snyder
096be691fbba   bf20ac214571   "/bin/sh -c 'echo \" …"   12 months ago   Exited (2) 12 months ago      zealous_rhodes
1e31fc83a48a   bf20ac214571   "/bin/sh -c 'echo \" …"   12 months ago   Exited (100) 12 months ago      epic_chebyshev
eed93e842291   bf20ac214571   "/bin/sh -c 'echo \" …"   12 months ago   Exited (100) 12 months ago      dreamy_shannon
6c2b6e889664   bf20ac214571   "/bin/sh -c 'echo \" …"   12 months ago   Exited (100) 12 months ago
Batch-delete the exited containers:
[root@hdss7-200 harbor]# for i in `docker ps -a|grep -i exit|awk '{print $1}'`;do docker rm -f $i;done
b2ba0b1ac992
9f7b7e876bde
fa90411e33fe
d3517506aac9
096be691fbba
1e31fc83a48a
eed93e842291
6c2b6e889664
e9032d47ed14
84430f73fd64
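The loop above matches the text "exit" anywhere in a `docker ps -a` row and then force-removes the match, so a running container whose name or command happens to contain "exit" would be removed too. A minimal sketch of a stricter filter follows, assuming the input is one `ID STATUS...` pair per line; the helper name `exited_ids` is made up for illustration.

```shell
#!/bin/sh
# exited_ids: read lines of the form "<container-id> <status text>" on stdin
# and print only the IDs whose status starts with the word "Exited".
# (Hypothetical helper; it never talks to the docker daemon itself.)
exited_ids() {
    awk '$2 == "Exited" { print $1 }'
}

# Possible usage against a live daemon (not executed here):
#   docker ps -a --format '{{.ID}} {{.Status}}' | exited_ids | xargs -r docker rm
```

Dropping `-f` in the usage line means a container that is still running is refused rather than killed. `docker ps -aq --filter status=exited` or `docker container prune` should give a similar result without any text parsing.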
The leftover container cannot be deleted:
[root@hdss7-200 harbor]# docker ps -a
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
2991dd383a4d   goharbor/harbor-portal:v1.8.3   "nginx -g 'daemon of…"   10 minutes ago   Up 10 minutes (healthy)   80/tcp   harbor-portal
39a3a548010c   goharbor/harbor-jobservice:v1.8.3   "/harbor/start.sh"   10 minutes ago   Up 10 minutes      harbor-jobservice
8e77c46135f8   goharbor/harbor-core:v1.8.3   "/harbor/start.sh"   10 minutes ago   Up 10 minutes (healthy)      harbor-core
a451e99b8c61   goharbor/harbor-db:v1.8.3   "/entrypoint.sh post…"   10 minutes ago   Up 10 minutes (healthy)   5432/tcp   harbor-db
3e572508f475   goharbor/registry-photon:v2.7.1-patch-2819-v1.8.3   "/entrypoint.sh /etc…"   10 minutes ago   Up 10 minutes (healthy)   5000/tcp   registry
473b7beada5a   goharbor/redis-photon:v1.8.3   "docker-entrypoint.s…"   10 minutes ago   Up 10 minutes   6379/tcp   redis
160b88a8c778   goharbor/harbor-registryctl:v1.8.3   "/harbor/start.sh"   10 minutes ago   Up 10 minutes (healthy)      registryctl
a5d2fc46e05e   goharbor/harbor-log:v1.8.3   "/bin/sh -c /usr/loc…"   10 minutes ago   Up 10 minutes (healthy)   127.0.0.1:1514->10514/tcp   harbor-log
f3181ac0cf37   goharbor/harbor-portal:v1.8.3   "nginx -g 'daemon of…"   2 weeks ago   Removal In Progress      f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_harbor-portal
docker-compose cannot bring it down cleanly either:
[root@hdss7-200 harbor]# docker-compose down
Stopping harbor-portal ... done
Stopping harbor-jobservice ... done
Stopping harbor-core ... done
Stopping registryctl ... done
Stopping redis ... done
Stopping harbor-db ... done
Stopping registry ... done
Stopping harbor-log ... done
Removing harbor-portal ... done
Removing harbor-jobservice ... done
Removing harbor-core ... done
Removing registryctl ... done
Removing redis ... done
Removing harbor-db ... done
Removing registry ... done
Removing harbor-log ... done
Removing f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_harbor-portal ... error
ERROR: for f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_harbor-portal container f3181ac0cf37cd29594dd7fa499b06cee71473c4caf63b5282638513a79b081e: driver "overlay2" failed to remove root filesystem: unlinkat /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp: input/output error
Removing network harborv183_harbor
Stopping the docker service does not help either:
[root@hdss7-200 ~]# systemctl stop docker.service
[root@hdss7-200 ~]# rm -fr /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7
rm: cannot remove '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error
I tried every variation of rm I could think of, and none of them worked:
[root@hdss7-200 harbor]# rm -fr /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7
rm: cannot remove '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error
[root@hdss7-200 harbor]# rm -fr /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/*
rm: cannot remove '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error
[root@hdss7-200 harbor]# rm -fr /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp
rm: cannot remove '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error
Inspecting the file also reports an error:
[root@hdss7-200 harbor]# file /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp
/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp: cannot open (Input/output error)
Moving it to a temporary directory does not work either:
[root@hdss7-200 harbor]# mv /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7 /tmp
mv: cannot stat '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error
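When rm, file and mv all fail on the same path with Input/output error, the fault is most likely below docker, in the filesystem or the block device. The kernel log is the usual place to confirm that. Below is a small sketch of a filter for it; the helper name `io_error_lines` and the choice of keywords are my own assumptions, not something taken from this incident.

```shell
#!/bin/sh
# io_error_lines: read kernel log text on stdin and keep only the lines that
# mention XFS, an I/O error, or corruption (case-insensitive).
# "|| true" keeps the exit status at 0 when nothing matches.
io_error_lines() {
    grep -iE 'xfs|i/o error|corrupt' || true
}

# Possible usage on the affected host (not executed here):
#   dmesg | io_error_lines | tail -n 20
```

If the filter prints nothing, the sketch has simply found no matching lines; that alone does not prove the disk is healthy.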
Remounting does not help either:
[root@hdss7-200 harbor]# mount
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime)
proc on /proc type proc (rw,nosuid,nodev,noexec,relatime)
devtmpfs on /dev type devtmpfs (rw,nosuid,size=1910324k,nr_inodes=477581,mode=755)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
tmpfs on /dev/shm type tmpfs (rw,nosuid,nodev)
devpts on /dev/pts type devpts (rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000)
tmpfs on /run type tmpfs (rw,nosuid,nodev,mode=755)
tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
pstore on /sys/fs/pstore type pstore (rw,nosuid,nodev,noexec,relatime)
efivarfs on /sys/firmware/efi/efivars type efivarfs (rw,nosuid,nodev,noexec,relatime)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,pids)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,hugetlb)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,net_prio,net_cls)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
configfs on /sys/kernel/config type configfs (rw,relatime)
/dev/mapper/centos-root on / type xfs (rw,relatime,attr2,inode64,noquota)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=27,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=17256)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
mqueue on /dev/mqueue type mqueue (rw,relatime)
hugetlbfs on /dev/hugepages type hugetlbfs (rw,relatime)
nfsd on /proc/fs/nfsd type nfsd (rw,relatime)
/dev/sda2 on /boot type xfs (rw,relatime,attr2,inode64,noquota)
/dev/sda1 on /boot/efi type vfat (rw,relatime,fmask=0077,dmask=0077,codepage=437,iocharset=ascii,shortname=winnt,errors=remount-ro)
/dev/mapper/centos-home on /data type xfs (rw,relatime,attr2,inode64,noquota)
sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw,relatime)
tmpfs on /run/user/0 type tmpfs (rw,nosuid,nodev,relatime,size=384404k,mode=700)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,relatime)
[root@hdss7-200 harbor]# df -h
Filesystem                Size  Used  Avail  Use%  Mounted on
/dev/mapper/centos-root    50G  3.2G    47G    7%  /
devtmpfs                  1.9G     0   1.9G    0%  /dev
tmpfs                     1.9G     0   1.9G    0%  /dev/shm
tmpfs                     1.9G  8.4M   1.9G    1%  /run
tmpfs                     1.9G     0   1.9G    0%  /sys/fs/cgroup
/dev/sda2                1014M  179M   836M   18%  /boot
/dev/sda1                 200M   12M   189M    6%  /boot/efi
/dev/mapper/centos-home    73G  9.9G    63G   14%  /data
tmpfs                     376M     0   376M    0%  /run/user/0
[root@hdss7-200 harbor]# mount -o remount,rw /
[root@hdss7-200 harbor]# rm -fr /data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp
rm: cannot remove '/data/docker/overlay2/39f18cbdc9fdb752772c8b4df19b53b5ca27a51969ba8d1c7870f38a02f02bf7/diff/etc/nginx/fastcgi_temp': Input/output error
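One detail in the mount output is worth noting: the broken path lives under /data, which is a separate XFS filesystem on /dev/mapper/centos-home, while the remount above targeted /. That may be part of why it changed nothing. The sketch below picks the mount entry that actually holds a given path by longest-prefix match; `mount_for_path` is a hypothetical helper and it assumes mount points without spaces.

```shell
#!/bin/sh
# mount_for_path PATH: read `mount`-style lines ("DEV on DIR type FS (OPTS)")
# on stdin and print "DEV DIR FS" for the mount point that is the longest
# directory prefix of PATH. (Hypothetical helper; assumes no spaces in DIR.)
mount_for_path() {
    awk -v p="$1" '
        $2 == "on" && $4 == "type" {
            dir = $3
            # A mount point holds the path if it is "/", is the path itself,
            # or is a parent directory of the path.
            if (dir == "/" || p == dir || index(p, dir "/") == 1) {
                if (length(dir) > best_len) {
                    best_len = length(dir)
                    best = $1 " " dir " " $5
                }
            }
        }
        END { if (best != "") print best }
    '
}

# Possible usage on the host (not executed here):
#   mount | mount_for_path /data/docker/overlay2
```

On a real system `findmnt -T <path>` or `df <path>` should answer the same question directly; the awk version only makes the prefix logic explicit.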
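I contacted 王导 on QQ; to my surprise, he had not gone away for the May Day holiday.
He eventually logged in to my 沃佳云 server. He said that after rebooting the virtual machine into "rescue mode" and remounting, the file could be deleted once back in the normal system.
It was probably the following command:
mount -o remount,rw /
He presumably then deleted the file from the normal environment. The stuck docker container can now be removed: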
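What was actually run in rescue mode is not recorded here. If the Input/output error came from XFS metadata damage after the unclean shutdown (an assumption on my part), a common approach is an offline check and repair of the affected filesystem. The sketch below only prints such a sequence and executes nothing; `print_repair_plan` is a hypothetical helper, and the device and mount point passed to it are the ones shown in the earlier mount output.

```shell
#!/bin/sh
# print_repair_plan DEVICE MOUNTPOINT: print (never run) an offline XFS
# check-and-repair sequence. This is an assumed procedure, not a record of
# what was done on this server.
print_repair_plan() {
    dev=$1
    mnt=$2
    cat <<EOF
systemctl stop docker
umount $mnt
xfs_repair -n $dev
xfs_repair $dev
mount $dev $mnt
systemctl start docker
EOF
}

print_repair_plan /dev/mapper/centos-home /data
```

`xfs_repair -n` is the no-modify mode and only reports problems, so it is a reasonable first step before the real repair. The filesystem has to be unmounted for the repair, which is one reason a rescue environment is convenient.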
[root@hdss7-200 ~]# docker ps -a
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
f3181ac0cf37   goharbor/harbor-portal:v1.8.3   "nginx -g 'daemon of…"   2 weeks ago   Dead      f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_f3181ac0cf37_harbor-portal
[root@hdss7-200 ~]# docker rm -f f318
f318
[root@hdss7-200 ~]# docker ps -a
CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
Redeploying Harbor now succeeds:
[root@hdss7-200 ~]# cd /opt/harbor
[root@hdss7-200 harbor]# ./install.sh
[Step 0]: checking installation environment ...
Note: docker version: 19.03.7
Note: docker-compose version: 1.18.0
[Step 1]: loading Harbor images ...
Loaded image: goharbor/harbor-db:v1.8.3
Loaded image: goharbor/redis-photon:v1.8.3
Loaded image: goharbor/notary-signer-photon:v0.6.1-v1.8.3
Loaded image: goharbor/chartmuseum-photon:v0.9.0-v1.8.3
Loaded image: goharbor/harbor-core:v1.8.3
Loaded image: goharbor/harbor-log:v1.8.3
Loaded image: goharbor/harbor-registryctl:v1.8.3
Loaded image: goharbor/notary-server-photon:v0.6.1-v1.8.3
Loaded image: goharbor/clair-photon:v2.0.8-v1.8.3
Loaded image: goharbor/harbor-migrator:v1.8.3
Loaded image: goharbor/prepare:v1.8.3
Loaded image: goharbor/harbor-portal:v1.8.3
Loaded image: goharbor/nginx-photon:v1.8.3
Loaded image: goharbor/harbor-jobservice:v1.8.3
Loaded image: goharbor/registry-photon:v2.7.1-patch-2819-v1.8.3
[Step 2]: preparing environment ...
prepare base dir is set to /opt/harbor-v1.8.3
Clearing the configuration file: /config/log/logrotate.conf
Clearing the configuration file: /config/nginx/nginx.conf
Clearing the configuration file: /config/core/env
Clearing the configuration file: /config/core/app.conf
Clearing the configuration file: /config/registry/config.yml
Clearing the configuration file: /config/registry/root.crt
Clearing the configuration file: /config/registryctl/env
Clearing the configuration file: /config/registryctl/config.yml
Clearing the configuration file: /config/db/env
Clearing the configuration file: /config/jobservice/env
Clearing the configuration file: /config/jobservice/config.yml
Generated configuration file: /config/log/logrotate.conf
Generated configuration file: /config/nginx/nginx.conf
Generated configuration file: /config/core/env
Generated configuration file: /config/core/app.conf
Generated configuration file: /config/registry/config.yml
Generated configuration file: /config/registryctl/env
Generated configuration file: /config/db/env
Generated configuration file: /config/jobservice/env
Generated configuration file: /config/jobservice/config.yml
Creating harbor-log ... done
Generated configuration file: /compose_location/docker-compose.yml
Clean up the input dir
Creating registry ... done
Creating harbor-core ... done
[Step 3]: starting Harbor ...
Creating harbor-portal ... done
Creating nginx ... done
Creating registry ...
Creating harbor-db ...
Creating registryctl ...
Creating redis ...
Creating harbor-core ...
Creating harbor-jobservice ...
Creating harbor-portal ...
Creating nginx ...
✔ ----Harbor has been installed and started successfully.----
Now you should be able to visit the admin portal at http://harbor.od.com.
For more details, please visit https://github.com/goharbor/harbor .
After a browser refresh, the Harbor page is back to normal.
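1. In the end the problem was solved by asking for help, and I sent him a 20-yuan QQ red envelope as thanks.
2. On the one hand, 沃佳云 really is terrible: it has rebooted unexpectedly more than ten times in the past two years, and no company would dare run on a public cloud like that. Its prices are genuinely cheap, though. And the repeated crashes have forced me through a lot of failures, which has actually improved my troubleshooting skills.
3. A problem like this one is probably less likely to show up in production. If a production environment rebooted this often (沃佳云 reboots my machine almost every month), the company would have switched cloud providers long ago.
4. Kubernetes is, in the end, container orchestration. If you only know k8s and are not fluent with docker, you will not really master Kubernetes either.
5. On CentOS, all I ever learned back when I studied with 老男孩 was resetting the root password in single-user mode and editing /etc/fstab. That should count as Linux basics, but after so many years, and especially after the last few years on Alibaba Cloud (阿里云), those basics have grown rusty.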