AWS Redshift: typical errors and potential root causes

Full join issue: when using FULL JOIN, avoid the following kinds of join condition:

1. OR conditions
2. Conditions that are obviously always false or always true, such as 1 = 0 or 1 = 1
3. Join columns of TIMESTAMP type. This case is very common in PCP joins that use the report date as the join key.
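A FULL JOIN on a constant-false condition such as 1 = 0 is sometimes used only to stack two result sets. Because no row can ever match, every row of each side comes back padded with NULLs, so the same result can be produced with UNION ALL and no FULL JOIN at all. A minimal sketch, assuming hypothetical tables table_a(id INT, amount NUMERIC(18,2)) and table_b(id INT, qty INT):

```sql
-- Hypothetical tables: table_a(id INT, amount NUMERIC(18,2)), table_b(id INT, qty INT).
-- Pattern to avoid in Redshift:
--   SELECT a.id AS a_id, a.amount, b.id AS b_id, b.qty
--   FROM table_a a FULL JOIN table_b b ON 1 = 0;

-- Same rows without the FULL JOIN: each side padded with explicitly typed NULLs.
SELECT a.id AS a_id, a.amount, NULL::INT AS b_id, NULL::INT AS qty
FROM table_a a
UNION ALL
SELECT NULL::INT, NULL::NUMERIC(18,2), b.id, b.qty
FROM table_b b;
```

The NULL placeholders carry explicit casts so that each UNION ALL branch has a well-defined column type.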

ERROR: 42846: could not convert type "unknown" to numeric because of modifier

This error occurs when a column is supposed to be NUMERIC in a UNION ALL query (in practice, one with more than three subquery parts) and some subquery supplies a bare NULL for that column. The fix is to cast that NULL explicitly to the numeric data type, ideally with the same precision and scale as the other branches.
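A minimal sketch of the fix, assuming hypothetical quarterly tables q1_sales to q4_sales where the first three have a discount column of type NUMERIC(18,2) and q4_sales does not:

```sql
-- Hypothetical tables: q1_sales, q2_sales, q3_sales each have
-- store_id INT, amount NUMERIC(18,2), discount NUMERIC(18,2);
-- q4_sales has no discount column.
SELECT store_id, amount, discount FROM q1_sales
UNION ALL
SELECT store_id, amount, discount FROM q2_sales
UNION ALL
SELECT store_id, amount, discount FROM q3_sales
UNION ALL
-- A bare NULL here has type "unknown"; casting it to the same
-- NUMERIC(18,2) used by the other branches should avoid error 42846.
SELECT store_id, amount, NULL::NUMERIC(18,2) AS discount FROM q4_sales;
```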

Error info: [Assert error]

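This usually means the query is too heavy for Redshift to handle in one statement, typically because it joins several large tables at once. Split the query into steps and materialize intermediate results with more CTAS (CREATE TABLE AS) statements.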
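A sketch of splitting a heavy multi-table join into CTAS steps, assuming hypothetical tables orders, order_items and customers; the DISTKEY and SORTKEY choices are illustrative, not a recommendation for any particular schema:

```sql
-- Hypothetical tables: orders, order_items (both large) and customers.
-- Step 1: materialize the heaviest join once, keeping only the columns needed later.
CREATE TEMP TABLE stage_order_lines
DISTKEY (customer_id)
SORTKEY (order_date)
AS
SELECT o.order_id,
       o.customer_id,
       o.order_date,
       SUM(i.quantity * i.unit_price) AS order_value
FROM orders o
JOIN order_items i ON i.order_id = o.order_id
WHERE o.order_date >= '2016-01-01'
GROUP BY o.order_id, o.customer_id, o.order_date;

-- Step 2: join the smaller intermediate result to the remaining table.
SELECT c.segment, SUM(s.order_value) AS total_value
FROM stage_order_lines s
JOIN customers c ON c.customer_id = s.customer_id
GROUP BY c.segment;
```

Distributing the staging table on the column used in the next join is meant to keep that join local to each node.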

Error info: the query is killed automatically with the message "cancelled on user's request"

Error info: could not devise a query plan for the given query

This is probably caused by a join condition in the FULL JOIN clause. The join column is report_date, which is of type DATE, but a function is applied to report_date (for example DATEADD, which returns a TIMESTAMP), converting it to TIMESTAMP. Unfortunately, Redshift does not seem to support this data type in a FULL JOIN condition.

To solve this, cast the join expression back to DATE, as below:
DATEADD(MONTH, -12, CUR.REPORT_DATE)::DATE = RESULT_PCP.REPORT_DATE::DATE
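The condition above in the context of a full query, as a sketch that assumes hypothetical tables cur and result_pcp (treated here as current-period and prior-period results) with the column types shown in the comments:

```sql
-- Hypothetical tables: cur(report_date DATE, revenue NUMERIC(18,2)) and
-- result_pcp(report_date DATE, revenue NUMERIC(18,2)), assumed to hold prior-period results.
-- DATEADD returns TIMESTAMP, so the shifted date is cast back to DATE
-- and the FULL JOIN compares DATE with DATE.
SELECT cur.report_date,
       result_pcp.report_date AS pcp_report_date,
       cur.revenue,
       result_pcp.revenue     AS pcp_revenue
FROM cur
FULL JOIN result_pcp
  ON DATEADD(MONTH, -12, cur.report_date)::DATE = result_pcp.report_date::DATE;
```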

Tip: avoid using string concatenation in a join condition. Instead, create a temporary table that holds the concatenated value as its own column, and join on that column.
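A sketch of this tip, assuming hypothetical tables orders(region VARCHAR, store_id INT, amount NUMERIC(18,2)) and store_targets(store_key VARCHAR, target NUMERIC(18,2)) where store_key has the form region-store_id:

```sql
-- Hypothetical tables: orders(region VARCHAR, store_id INT, amount NUMERIC(18,2))
-- and store_targets(store_key VARCHAR, target NUMERIC(18,2)).
-- Instead of joining on an expression:
--   ... JOIN store_targets t ON o.region || '-' || CAST(o.store_id AS VARCHAR) = t.store_key
-- materialize the key once in a temporary table ...
CREATE TEMP TABLE orders_keyed
DISTKEY (store_key)
AS
SELECT region || '-' || CAST(store_id AS VARCHAR) AS store_key,
       amount
FROM orders;

-- ... and join on the plain column.
SELECT k.store_key,
       SUM(k.amount) AS amount,
       MAX(t.target) AS target
FROM orders_keyed k
JOIN store_targets t ON t.store_key = k.store_key
GROUP BY k.store_key;
```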

Redshift-related documents

https://blogs.aws.amazon.com/bigdata/post/Tx31034QG0G3ED1/Top-10-Performance-Tuning-Techniques-for-Amazon-Redshift

http://docs.aws.amazon.com/redshift/latest/dg/c_Byte_dictionary_encoding.html

http://content.infotrustllc.com/interleaved-sorting-a-novel-solution-for-optimizing-multiple-query-patterns/

http://docs.aws.amazon.com/redshift/latest/dg/welcome.html

https://aws.amazon.com/redshift/pricing/

Sort key testing
Tested so far on a table of about 10 million rows: no significant difference. A larger data set is needed to do the testing.

Previously, no column compression was applied to the table with the sort key, which could be one reason why there was no significant improvement.
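A sketch of a test setup that repeats the comparison with compression enabled. The table and column names are hypothetical, and the encodings (including leaving the leading sort key column as RAW) are one illustrative choice; ANALYZE COMPRESSION can be used to get Redshift's own suggestions for the source table:

```sql
-- Hypothetical source table sales_unsorted: no sort key, no column compression.
-- Ask Redshift to suggest encodings based on a sample of its rows.
ANALYZE COMPRESSION sales_unsorted;

-- Hypothetical test table with both a sort key and column encodings.
CREATE TABLE sales_sorted (
    report_date DATE          ENCODE RAW,      -- leading sort key column left uncompressed here
    store_id    INT           ENCODE AZ64,
    region      VARCHAR(32)   ENCODE BYTEDICT, -- low-cardinality text; see the byte-dictionary link above
    amount      NUMERIC(18,2) ENCODE AZ64
)
DISTSTYLE EVEN
COMPOUND SORTKEY (report_date, store_id);

INSERT INTO sales_sorted
SELECT report_date, store_id, region, amount
FROM sales_unsorted;

-- A range filter on the leading sort key column is where the sort key should help;
-- run the same query against sales_unsorted to compare.
SELECT region, SUM(amount) AS amount
FROM sales_sorted
WHERE report_date BETWEEN '2016-01-01' AND '2016-03-31'
GROUP BY region;
```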
