Common ClickHouse Exceptions and Error Codes, with Solutions

2023-09-30 13:11:52

I. Exceptions

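1) DB::Exception: Nested type Array(String) cannot be inside Nullable type (version 20.4.6.53 (official build))
Cause: the column type is Nullable(String). String functions such as splitByString return Array(String), and an Array cannot be wrapped in Nullable, so these functions do not accept Nullable arguments. The column has to be converted to String first.
Solution: cast the column to String: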

select splitByString(',',cast(col as String)) col from test
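
If the column really contains NULL rows, a plain cast to String may itself fail with a "Cannot convert NULL value to non-Nullable type" error. A minimal sketch of an alternative, assuming the same test table and col column as above, and assuming an empty string is an acceptable stand-in for NULL:

```sql
-- Replace NULL with '' before splitting, so the argument is a plain String.
-- Assumption: treating NULL as an empty string is fine for this data.
SELECT splitByString(',', ifNull(col, '')) AS col
FROM test;
```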

2) DB::Exception: Cannot convert NULL value to non-Nullable type: while converting source column second_channel to destination column second_channel (version 20.4.6.53 (official build))
Cause: the destination column second_channel is non-Nullable, so inserting a NULL value into it fails.
Solution: change the column type to Nullable(String). Note that a Nullable column cannot be used in the table's ORDER BY (sorting key).
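
Both options can be sketched roughly as follows. The table names target_table and source_table are hypothetical; only second_channel comes from the error message. The second statement is an alternative for the case where the column has to stay non-Nullable, for example because it is part of the sorting key.

```sql
-- Option 1: make the destination column Nullable.
ALTER TABLE target_table MODIFY COLUMN second_channel Nullable(String);

-- Option 2: keep the column non-Nullable and replace NULL on the way in.
-- Assumption: an empty string is an acceptable default.
INSERT INTO target_table (second_channel)
SELECT ifNull(second_channel, '')
FROM source_table;
```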

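3) DB::Exception: Memory limit (total) exceeded: would use 113.20 GiB (attempt to allocate chunk of 134200512 bytes), maximum: 113.14 GiB: While executing CreatingSetsTransform. (version 20.4.6.53 (official build))
Cause: the amount of data a single query has to hold in memory exceeds the memory still available on a single machine.
Solution: narrow the query, for example by adding a condition that takes the remainder (modulo) of a key so that the query runs over one slice of the data at a time. Alternatively, free up memory on the machine or add physical memory.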
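
A minimal sketch of the modulo approach. The table orders and the integer column user_id are hypothetical names chosen for illustration. Each run touches about a quarter of the keys, and the partial results are combined on the client side.

```sql
-- Run the same aggregation over one slice of the keys at a time.
-- For a String key, cityHash64(user_id) % 4 could be used instead.
SELECT user_id, count() AS cnt
FROM orders
WHERE user_id % 4 = 0   -- repeat with = 1, = 2 and = 3
GROUP BY user_id;
```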

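5) DB::Exception: Table columns structure in ZooKeeper is different from local table structure (version 20.12.3.3 (official build))
Cause: a Replicated table was dropped and recreated, but the table structure in ZooKeeper is removed asynchronously, by default after five minutes.
Solution: restart ClickHouse on that node, or wait a few minutes and try again.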

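6) Too many parts (300). Merges are processing significantly slower than inserts...

Cause: Flink consumes Kafka data in real time and sinks it to ClickHouse one row per insert. After the job had been running for a while, ClickHouse could no longer keep up with the insert load, because MergeTree merges fall behind the rate at which new data parts are created.

Solution: rework the Flink ClickHouse sink so that it buffers rows and flushes on a time or row-count trigger. An insert is executed only when one of the two thresholds is reached.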
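
To see how close a table is to the limit, the number of active parts per partition can be read from system.parts. This is only a monitoring sketch. As far as I know, the 300 in the message corresponds to the MergeTree setting parts_to_throw_insert in these versions.

```sql
-- Partitions with the most active parts; high counts mean merges are falling behind.
SELECT database, table, partition, count() AS active_parts
FROM system.parts
WHERE active
GROUP BY database, table, partition
ORDER BY active_parts DESC
LIMIT 10;
```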

7) Caused by: org.apache.spark.memory.SparkOutOfMemoryError: error while calling spill() on org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter@5915c8a5

Cause: insufficient disk space.

8) spark sql unsupported type array

Cause: the array type of the data source does not match Spark SQL's array type. The array can be converted to a string instead.

Solution:

select toString(field) from tablet

II. Error codes

1) Code: 48

Received exception from server (version 21.1.2.15):
Code: 48. DB::Exception: Received from localhost:9000, ::1. DB::Exception: Mutations are not supported by storage Distributed.

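Cause: a Distributed table cannot be mutated. ALTER TABLE ... UPDATE/DELETE is not supported by the Distributed engine.

Solution: run the update or delete against the local table on each node of the cluster.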
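
Instead of logging in to every node, the mutation can usually be sent to the local tables with ON CLUSTER. In the sketch below the names db.events_local, my_cluster, event_date, status and id are hypothetical; the local table is the one underneath the Distributed table.

```sql
-- Delete rows from the local table on every node of the cluster.
ALTER TABLE db.events_local ON CLUSTER my_cluster
    DELETE WHERE event_date < '2021-01-01';

-- Update works the same way.
ALTER TABLE db.events_local ON CLUSTER my_cluster
    UPDATE status = 'closed' WHERE id = 42;

-- Mutations run asynchronously; unfinished ones can be checked here.
SELECT database, table, mutation_id, command, is_done
FROM system.mutations
WHERE is_done = 0;
```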

2) Code: 1002
2021-02-22 07:31:31,656 ERROR [main] execute clickhouse Query Error
ru.yandex.clickhouse.except.ClickHouseUnknownException: ClickHouse exception, code: 1002, host: xxxx, port: 8123; xxxx:8123 failed to respond
Cause: the JDBC client and the server disagree on the `keep-alive` header of the HTTP connection.

Solution: upgrade the clickhouse-jdbc driver JAR, or the dependency version in the POM, to 0.2.6.

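3) Code: 159, read timeout

Cause: the query timed out.
Solution: some SQL statements take so long that the client finally fails with a read timeout. By default the JDBC driver waits at most 30 s for a single statement. For slow statements, raise the socket_timeout parameter in the JDBC URL (the unit is milliseconds; the default is 30000).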

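4) Code: 62, Max query size exceeded

Cause: a SELECT that uses a very long IN list fails.

Solution:
The query text itself is too large, and max_query_size defaults to 256 KiB. Open /etc/clickhouse-server/users.xml (which only defines a few common users). A setting such as max_query_size has to be changed in the profiles section.

Note that the unit is bytes. Setting it to 1024*1024*1024 = 1,073,741,824 solved the problem here. For users created through SQL, the setting has to be changed through SQL instead; see https://www.cnblogs.com/MrYang-11-GetKnow/p/15896355.html.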
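
A rough sketch of checking and changing the value through SQL. The user name app_user is hypothetical, and the ALTER USER statement assumes a server version with SQL-driven access management enabled.

```sql
-- Current value of the setting for this session (in bytes).
SELECT name, value, changed
FROM system.settings
WHERE name = 'max_query_size';

-- Raise the limit to 1 GiB for a user that was created through SQL.
ALTER USER app_user SETTINGS max_query_size = 1073741824;
```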

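5) Code: 168, AST is too big, Maximum: 50000

Cause: the query's AST (abstract syntax tree) has more elements than the limit allows.

Solution:
Add the settings below to users.xml, or change them through SQL. For the exact steps, follow the document on modifying permissions.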

<max_ast_elements>10000000</max_ast_elements>
<max_expanded_ast_elements>10000000</max_expanded_ast_elements>

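6) Code: 221, db::exception: no interserver io endpoint named…
Copying data between replicas fails, so nothing is synchronized. The error seen directly in err.log is: auto DB::StorageReplicatedMergeTree::processQueueEntry(ReplicatedMergeTreeQueue::SelectedEntryPtr)::(anonymous class)::operator()(DB::StorageReplicatedMergeTree::LogEntryPtr &) const: Poco::Exception. Code: 1000, e.code() = 111, Connection refused

Cause: interserver_http_host is not set. According to the description in the ClickHouse configuration file, this parameter is the host name that other replicas use to reach this server. If it is not specified, it is determined in the same way as the `hostname -f` command. The setting can also be used to move replication to another network interface (a server may be connected to several networks through several addresses). Without it, the server may end up trying to connect to itself, and if nothing is listening on the corresponding port, the result is a Connection refused error. Setting interserver_http_host explicitly on each replica avoids this.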

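7) Code: 253, Replica /clickhouse/tables/XXX/XXX/replicas/dba07 already exists
Cause: when a replicated table (ReplicatedMergeTree) lives in a database with the Atomic engine, recreating the table immediately after dropping it triggers this error. On drop, ClickHouse removes the data in ZooKeeper from a background thread, and an immediate CREATE may run before that thread has started. The simplest fix is to wait a little and rerun the CREATE statement. Alternatively, the delay before the ZooKeeper data is cleaned up can be set with the following parameter:

<!-- Set database_atomic_delay_before_drop_table_sec = 0 so that a replicated table can be recreated right after it is dropped -->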
<database_atomic_delay_before_drop_table_sec>0</database_atomic_delay_before_drop_table_sec>

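8) Code: 252
Code: 252, e.displayText() = DB::Exception: Too many partitions ,for single INSERT block (more than 100).
Cause: a single INSERT contains rows for too many partitions, more than the default limit of 100.
Solution:
1. Choose the partition key sensibly.
2. Raise the max_partitions_per_insert_block setting.
3. Avoid writing data that spans too many partitions in a single batch.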
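
Points 2 and 3 can be sketched as follows. The staging table and the event_date column are hypothetical, and the partition expression toYYYYMM(event_date) is an assumption about how the target table is partitioned.

```sql
-- How many partitions would one batch touch?
-- Assumption: the target table uses PARTITION BY toYYYYMM(event_date).
SELECT uniqExact(toYYYYMM(event_date)) AS partitions_in_batch
FROM staging;

-- Raise the limit for the current session if the batch legitimately spans more partitions.
SET max_partitions_per_insert_block = 1000;
```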

9) Code: 359
Code: 359,e.displayText()=DB::Exception: Table or Partition in xxx was not dropped.
Reason:
1. Size (158.40 GB) is greater than max_[table/partition]_size_to_drop (50.00 GB)
2. File '/data/clickhouse/clickhouse-server/flags/force_drop_table' intended to force DROP doesn't exist
Cause:
1) The data to be dropped is larger than the configured limit.
2) The size check can be bypassed to force the drop, but the flag file /data/clickhouse/clickhouse-server/flags/force_drop_table does not exist, so the check is not skipped and the drop is refused.
Solution:
Following hint 2 in the error message, run this on the affected node:

sudo touch '/data/clickhouse/clickhouse-server/flags/force_drop_table' &&
sudo chmod 666 '/data/clickhouse/clickhouse-server/flags/force_drop_table'

Then run the drop again. Note that the flag file is good for one drop only: it disappears once the drop has been executed.
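
Before forcing the drop, it can help to see where the size comes from. The sketch below uses the hypothetical names my_db and my_table. Dropping partition by partition is an alternative I would expect to work when each partition stays under the limit, but that is an assumption to verify against the server's max_partition_size_to_drop value.

```sql
-- Size of each partition of the table, largest first.
SELECT partition, formatReadableSize(sum(bytes_on_disk)) AS size
FROM system.parts
WHERE active AND database = 'my_db' AND table = 'my_table'
GROUP BY partition
ORDER BY sum(bytes_on_disk) DESC;

-- Possible alternative: drop one partition at a time (partition value is illustrative).
ALTER TABLE my_db.my_table DROP PARTITION '202101';
```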

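10) Code: 117

Code: 117, e.displayText() = DB::Exception: Unexpected NULL value of not Nullable type String (version 20.8.3.18)
Cause: NULL values. Hive stores NULL as \N in its underlying files, while ClickHouse handles NULL differently, so nullable columns have to be declared explicitly when the table is created.

Solution: see the document on handling NULL values.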

11) Code: 62

ERROR ApplicationMaster: User class threw exception: ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 62, host: 127.0.0.1, port: 8123; Code: 62, e.displayText() = DB::Exception: Syntax error: failed at position 1432 (end of query): . Expected one of: ENGINE, storage definition (version 20.8.3.18)
Cause: the target table does not exist. The writer then most likely tries to create it with a CREATE TABLE statement that has no ENGINE clause, which ClickHouse rejects as a syntax error.

Solution: create the table in ClickHouse beforehand.

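12) Code: 241

Code: 241. DB::Exception: Received from localhost:9000. DB::Exception: Memory limit (for query) exceeded: would use 9.31 GiB (attempt to allocate chunk of 4223048 bytes), maximum: 9.31 GiB: While executing MergeTreeThread: While executing CreatingSetsTransform.

Cause: the query exceeded its memory limit. The default per-query limit is 10 GB, which the message reports as 9.31 GiB.

Solution: raise the per-query memory limit in SQL, or change the user's settings (through SQL or users.xml), for example max_memory_usage = 20000000000000 (the value is in bytes).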
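
The SQL variants might look like this. The 20 GB value and the user name app_user are illustrative only; the limit should be chosen according to the memory actually available on the server.

```sql
-- For the current session.
SET max_memory_usage = 20000000000;

-- For a single query only.
SELECT sum(number)
FROM numbers(1000000)
SETTINGS max_memory_usage = 20000000000;

-- Persistently, for a user created through SQL.
ALTER USER app_user SETTINGS max_memory_usage = 20000000000;
```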

13) Code: 202

ClickHouse exception, code: 202, host: xxxxx, port: 8123; Code: 202, e.displayText() = DB::Exception: Too many simultaneous queries. Maximum: 100
Cause: the maximum number of concurrent queries is 100.

Solution: raise the limit in config.xml. The default is <max_concurrent_queries>100</max_concurrent_queries>.
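
Before raising the limit, it is worth checking what is actually running, since a few stuck or slow queries can exhaust the slots. A small sketch using the system tables:

```sql
-- Number of queries executing right now.
SELECT metric, value
FROM system.metrics
WHERE metric = 'Query';

-- The longest-running queries, with the first 80 characters of their text.
SELECT query_id, user, elapsed, substring(query, 1, 80) AS query_head
FROM system.processes
ORDER BY elapsed DESC
LIMIT 20;
```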

14) Code: 252

ru.yandex.clickhouse.except.ClickHouseException: ClickHouse exception, code: 252, host: xxxx, port: 8123; Code: 252, e.displayText() = DB::Exception: Too many parts (308). Merges are processing significantly slower than inserts. (version 20.8.3.18)
Cause: inserts arrive faster than ClickHouse can merge parts.

Solution: lower the write parallelism and reduce the number of batches being written.

15) Code: 159

Code: 159. DB::Exception: Received from localhost:9000. DB::Exception: Watching task /clickhouse/task_queue/ddl/query-0000000002 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 3 unfinished hosts (0 of them are currently active), they are going to execute the query in background.
Cause: the ClickHouse port in the cluster configuration may be wrong.
Solution: check the ClickHouse port configured in metrika.xml.
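
What the server actually loaded from the cluster configuration can be checked with system.clusters. As far as I know, the port there should be the native TCP port of each replica (9000 by default) rather than the HTTP port 8123.

```sql
-- Hosts and ports of every shard and replica as the server sees them.
SELECT cluster, shard_num, replica_num, host_name, port
FROM system.clusters
ORDER BY cluster, shard_num, replica_num;
```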
