Deploying Sentry with docker-compose (with Sentry Data Cleanup)

2024-01-22 11:20:41

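Deploying Sentry on Ubuntu

Sentry is a widely used monitoring service that does well at API monitoring, error capture, and error alerting. I have also used Prometheus for monitoring before; each has its strengths, so it is worth getting to know both if you are interested.

Install Docker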

apt install curl -y
sh -c "$(curl -fsSL https://get.docker.com)"
systemctl start docker
systemctl enable docker

Install docker-compose

# Download docker-compose
wget https://github.com/docker/compose/releases/download/1.26.0/docker-compose-Linux-x86_64
# Move it to /usr/local/bin/docker-compose
sudo mv docker-compose-Linux-x86_64 /usr/local/bin/docker-compose
# Make docker-compose executable
sudo chmod +x /usr/local/bin/docker-compose
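
Before running the Sentry installer it can help to confirm that the binary just installed is on PATH and recent enough. The following is a minimal sketch: version_ge and check_compose are hypothetical helper names, and the minimum version passed in is a placeholder, since the real requirement is whatever install.sh checks for. It assumes GNU sort (for -V).

```shell
# version_ge A B: succeed when version A >= version B (relies on GNU sort -V).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_compose MIN: compare the installed docker-compose version against MIN.
check_compose() {
    min="$1"
    have="$(docker-compose version --short 2>/dev/null)" || have=""
    if [ -z "$have" ]; then
        echo "docker-compose not found on PATH" >&2
        return 1
    fi
    version_ge "$have" "$min"
}
```

Example use, with 1.24.1 as an illustrative minimum: check_compose 1.24.1 && echo "docker-compose is new enough"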

Download and install Sentry

wget https://github.com/getsentry/onpremise/archive/refs/tags/21.4.1.tar.gz

tar -zxvf 21.4.1.tar.gz

cd onpremise-21.4.1/

Edit sentry/config.yml under the onpremise-21.4.1 directory. Note that only one of the mail TLS and SSL options may be true.

mail.host: 'smtp.exmail.qq.com'
mail.port: 465
mail.username: 'xxx@xxx.com'
mail.password: 'xxx'
mail.use-tls: false
mail.use-ssl: true
#The email address to send on behalf of
mail.from: 'xxx@xxx.com'
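
Since having both flags set to true is an easy mistake, a small check can be run against the config file before installing. This is only a sketch: mail_flag and check_mail_flags are hypothetical helpers, and they assume the flat "key: value" layout shown above.

```shell
# mail_flag FILE KEY: print the value of the first "KEY: value" line in FILE.
mail_flag() {
    sed -n "s/^$2:[[:space:]]*\([A-Za-z]*\).*/\1/p" "$1" | head -n 1
}

# check_mail_flags FILE: fail when mail.use-tls and mail.use-ssl are both true.
check_mail_flags() {
    tls="$(mail_flag "$1" 'mail.use-tls')"
    ssl="$(mail_flag "$1" 'mail.use-ssl')"
    if [ "$tls" = "true" ] && [ "$ssl" = "true" ]; then
        echo "mail.use-tls and mail.use-ssl are both true; enable only one" >&2
        return 1
    fi
    return 0
}
```

Example use: check_mail_flags sentry/config.yml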

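Edit the .env file and add the automatic data cleanup period at the end. The default of 90 days is too long for my needs, but leaving it unset is also fine.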

SENTRY_EVENT_RETENTION_DAYS=14
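
If the key is already present in .env, appending a second copy leaves two conflicting lines. A minimal sketch of an idempotent edit follows; set_env_var is a hypothetical helper, and it assumes GNU sed (for -i) and a value that does not contain the | character.

```shell
# set_env_var FILE KEY VALUE: replace KEY's line if it exists, else append it.
set_env_var() {
    file="$1"; key="$2"; value="$3"
    if grep -q "^${key}=" "$file" 2>/dev/null; then
        # Key already present: rewrite that line in place.
        sed -i "s|^${key}=.*|${key}=${value}|" "$file"
    else
        # Key missing: add it as a new line at the end.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

Example use: set_env_var .env SENTRY_EVENT_RETENTION_DAYS 14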

Run:

./install.sh

You will be prompted to create a user partway through.

Then run:

docker-compose up -d

To stop the Sentry service, run:

docker-compose down

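Cleaning up Sentry data

After the Sentry service had been running for a while, I noticed the free disk space kept shrinking. I looked into many solutions; the two approaches below are offered for reference.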
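
Catching the problem early makes both approaches easier, so a simple usage check may be worth scheduling. The sketch below uses hypothetical helper names (disk_used_pct, disk_is_full); the path and threshold in the usage line are assumptions to adapt to wherever Docker stores its volumes.

```shell
# disk_used_pct PATH: print the used percentage (0-100) of PATH's filesystem.
disk_used_pct() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# disk_is_full PATH THRESHOLD: succeed when usage is at or above THRESHOLD percent.
disk_is_full() {
    [ "$(disk_used_pct "$1")" -ge "$2" ]
}
```

Example use: disk_is_full /var/lib/docker 80 && echo "disk usage is high, clean up Sentry data"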

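Method 1

The largest consumer is in fact a single ever-growing Postgres table, public.nodestore_node. By the time you notice, the disk is usually almost entirely filled by it, so the generic solutions found on Baidu may not even be able to run. In that case we can simply TRUNCATE the table. This discards the stored payloads of past events, so only do it if losing that history does not affect your ongoing work.

# First, enter the postgres container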
docker exec -it sentry_onpremise_postgres_1  /bin/bash

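# Switch user
su postgres

# Open the Postgres shell
psql

# Empty the offending table
TRUNCATE public.nodestore_node;

# Before and after the cleanup, you can check how much space each table occupies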
SELECT
    table_schema || '.' || table_name AS table_full_name,
    pg_size_pretty(pg_total_relation_size('"' || table_schema || '"."' || table_name || '"')) AS size
FROM information_schema.tables
ORDER BY
pg_total_relation_size('"' || table_schema || '"."' || table_name || '"') DESC limit 20;

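Method 2

Below is the top-ranked solution I found on Baidu. Its second step fails when there is not enough free disk space left; I record it here for reference.

1. Soft cleanup of Sentry data (does not release disk space; if it has not been run for a long time, the cleanup takes a long time)

# Log in to the worker container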
docker exec -it sentry_onpremise_worker_1 /bin/bash 

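# Number of days of data to keep. cleanup removes PostgreSQL rows with DELETE, but DELETE, UPDATE and similar operations only mark the affected rows as dead and do not actually release disk space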
sentry cleanup --days 7

2. Postgres cleanup (releases disk space when finished)

# Log in to the postgres container
docker exec -it sentry_onpremise_postgres_1  /bin/bash

# Run the cleanup
vacuumdb -U postgres -d postgres -v -f --analyze
vacuumdb -U postgres -d postgres -t nodestore_node -v -f --analyze
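
The -f flag runs VACUUM FULL, which writes a new copy of each table before dropping the old one, so it temporarily needs extra free space. That is why this step fails on a nearly full disk. A rough pre-check might look like the sketch below; free_bytes and fits_for_rewrite are hypothetical helpers, and comparing against the full table size is a conservative assumption.

```shell
# free_bytes PATH: print the free bytes on the filesystem holding PATH.
free_bytes() {
    kb="$(df -P -k "$1" | awk 'NR == 2 { print $4 }')"
    echo $(( kb * 1024 ))
}

# fits_for_rewrite TABLE_BYTES FREE_BYTES: succeed when the free space
# exceeds the table size, i.e. a full rewrite is likely to fit.
fits_for_rewrite() {
    [ "$2" -gt "$1" ]
}

# Example (run on the host; path and container name are assumptions):
#   size="$(docker exec sentry_onpremise_postgres_1 psql -U postgres -At \
#           -c "SELECT pg_total_relation_size('public.nodestore_node')")"
#   fits_for_rewrite "$size" "$(free_bytes /var/lib/docker)" || echo "not enough space"
```

If the check fails, the full vacuum is unlikely to complete, and Method 1 or freeing space elsewhere is the remaining option.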

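3. Scheduled cleanup (crontab entries; SHELL and PATH are set because cron defaults to /bin/sh, which does not understand &>, and to a PATH that omits /usr/local/bin)

SHELL=/bin/bash
PATH=/usr/local/bin:/usr/bin:/bin
0 1 * * * cd /data1/onpremise && { time docker-compose run --rm worker cleanup --days 7; } &> /var/log/cleanup.log
0 8 * * * { time docker exec -i $(docker ps --format "table {{.Names}}"|grep postgres) vacuumdb -U postgres -d postgres -v -f --analyze; } &> /data1/logs/vacuumdb.log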

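4. Troubleshooting

Because the disk was already full, the cleanup commands above could no longer run. The only option was to look for large files and delete some of them as a stopgap, which led me to these:

/var/lib/docker/volumes/sentry-kafka/_data/events-0/*.log

They end in .log and were very large, so I deleted them outright. These turned out to be Kafka data files rather than text logs: after a restart, Sentry could no longer receive reported events.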

Reference: https://forum.sentry.io/t/sentry-disk-cleanup-kafka/11337
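
To avoid repeating this mistake, a file name can be checked before deletion. Kafka names its segment files after a 20-digit zero-padded base offset, with .log, .index, or .timeindex extensions. The sketch below relies only on that naming; is_kafka_segment is a hypothetical helper.

```shell
# is_kafka_segment PATH: succeed when the file name looks like a Kafka
# segment file (20 digits followed by .log, .index, or .timeindex).
is_kafka_segment() {
    basename "$1" | grep -Eq '^[0-9]{20}\.(log|index|timeindex)$'
}
```

Example use: is_kafka_segment "$f" && echo "skip $f, it is Kafka data" || rm -- "$f"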

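There was no alternative but to reinstall:

cd /data1/onpremise
./install.sh

Restart for the change to take effect. Reinstalling does not wipe existing data, so skipping a backup is fine.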

docker-compose down
docker-compose build
docker-compose up -d

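5. Kafka using too much disk space

A search suggested that Kafka's excessive disk usage can be addressed by adding the following to .env, but it had no effect for me. One likely reason is that docker-compose uses .env only for variable substitution in docker-compose.yml, so these values reach the Kafka container only if the compose file references them.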

KAFKA_LOG_RETENTION_HOURS=24
KAFKA_LOG_RETENTION_BYTES=53687091200 #50G
KAFKA_LOG_SEGMENT_BYTES=1073741824 #1G
KAFKA_LOG_RETENTION_CHECK_INTERVAL_MS=300000
KAFKA_LOG_SEGMENT_DELETE_DELAY_MS=60000
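
The byte values above are plain GiB multiples, which is easy to get wrong by a digit when typed by hand. A one-line sketch for deriving them (gib_to_bytes is a hypothetical helper name):

```shell
# gib_to_bytes N: print N GiB expressed in bytes.
gib_to_bytes() {
    echo $(( $1 * 1024 * 1024 * 1024 ))
}

gib_to_bytes 50   # prints 53687091200
gib_to_bytes 1    # prints 1073741824
```
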
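
So I dug into it myself. First, enter the Kafka container:

docker exec -it sentry_onpremise_kafka_1 /bin/bash

List the topics:

kafka-topics --list --zookeeper zookeeper:2181

Edit the Kafka configuration file:

vi server.properties

Set retention to 7 hours (the default is 168):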

log.retention.hours=7
log.cleaner.enable=true
log.cleanup.policy=delete
log.cleanup.interval.mins=1

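Restart:

kafka-server-stop
kafka-server-start -daemon server.properties

Nothing changed shortly after the restart; the effect only showed up the next day, and the exact reason still needs investigation. Checking the directory size again, I found it had dropped from 20G to about 12G.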

cd /var/lib/docker/volumes/sentry-kafka/_data/events-0
du -h --max-depth=1
ls -alh # The oldest remaining log is from 3 days ago: 00000000000000146071.log
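
To see whether retention is actually taking effect, it may help to list segment files older than the configured window. One possible explanation for the delay, offered as an assumption rather than a finding, is that Kafka only deletes closed segments, so the segment currently being written stays until it rolls. The sketch below uses a hypothetical helper, stale_segments, and assumes GNU find.

```shell
# stale_segments DIR HOURS: list .log files in DIR last modified more than
# HOURS hours ago; with working retention this list should stay short.
stale_segments() {
    find "$1" -maxdepth 1 -type f -name '*.log' -mmin "+$(( $2 * 60 ))"
}
```

Example use: stale_segments /var/lib/docker/volumes/sentry-kafka/_data/events-0 7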

Workaround when the Docker container has no vi command

apt-get update
apt-get install vim

6. The official solution

In fact, the official project already provides a solution: change the following setting in the .env file

SENTRY_EVENT_RETENTION_DAYS=7

Then reinstall.

For details, see: https://github.com/getsentry/onpremise

码农公寓