What is the difference between the utf8mb4 and utf8 character sets in MySQL?
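I already know about the ASCII, UTF-8, UTF-16 and UTF-32 encodings,
but I'm curious how the utf8mb4 group of encodings differs from the other encoding types defined in MySQL Server.
Are there any particular benefits of, or recommendations for, using utf8mb4 instead of utf8?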
Answer:
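UTF-8 is a variable-length encoding. In the case of UTF-8, this means that storing one code point requires one to four bytes. However, the MySQL encoding called "utf8" (an alias of "utf8mb3") stores a maximum of only three bytes per code point.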
So the character set "utf8"/"utf8mb3" cannot store all Unicode code points: it only supports the range 0x0000 to 0xFFFF, which is called the "Basic Multilingual Plane" (BMP).
See also Comparison of Unicode encodings.
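The byte counts above can be checked with a minimal Python 3 sketch. It uses only the built-in `str.encode("utf-8")`; `utf8_length` and `fits_utf8mb3` are hypothetical helper names, and `fits_utf8mb3` merely models MySQL's three-byte limit by code point (it does not talk to a MySQL server).

```python
# Sample characters from different Unicode ranges, with a short description.
samples = {
    "A": "U+0041, ASCII",
    "é": "U+00E9, Latin-1 Supplement",
    "中": "U+4E2D, CJK (BMP)",
    "\uffff": "U+FFFF, last BMP code point",
    "\U00010000": "U+10000, first supplementary code point",
    "😀": "U+1F600, emoji (supplementary)",
}


def utf8_length(ch: str) -> int:
    """Number of bytes one code point occupies in standard UTF-8."""
    return len(ch.encode("utf-8"))


def fits_utf8mb3(ch: str) -> bool:
    """Hypothetical model of utf8/utf8mb3: only BMP code points (<= U+FFFF),
    which are exactly the ones that take at most 3 bytes in UTF-8."""
    return ord(ch) <= 0xFFFF


for ch, label in samples.items():
    print(f"{label:42} bytes={utf8_length(ch)} utf8mb3={fits_utf8mb3(ch)}")
```

Every BMP character takes one to three bytes, while the first code point past U+FFFF already needs four, which is exactly the case utf8mb3 cannot represent.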
This is what (a previous version of the same page of) the MySQL documentation has to say:
The character set named utf8[/utf8mb3] uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character [and] supports supplemental characters:
For a BMP character, utf8[/utf8mb3] and utf8mb4 have identical storage characteristics: same code values, same encoding, same length.
For a supplementary character, utf8[/utf8mb3] cannot store the character at all, while utf8mb4 requires four bytes to store it. Since utf8[/utf8mb3] cannot store the character at all, you do not have any supplementary characters in utf8[/utf8mb3] columns and you need not worry about converting characters or losing data when upgrading utf8[/utf8mb3] data from older versions of MySQL.
So if you want a column to support storing characters that lie outside the BMP (and you usually do), such as emoji, use "utf8mb4". See also What are the most common non-BMP Unicode characters in actual use?
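To decide whether existing or incoming text actually needs utf8mb4, one option is to scan it for code points above U+FFFF. The following is a minimal Python 3 sketch under that assumption; `non_bmp_characters` and `required_charset` are hypothetical helpers, and `required_charset` only chooses between utf8mb3 and utf8mb4 (it ignores other MySQL character sets such as latin1).

```python
from typing import List


def non_bmp_characters(text: str) -> List[str]:
    """Return the distinct characters in text whose code point is above U+FFFF,
    in order of first appearance. utf8/utf8mb3 cannot store these."""
    found: List[str] = []
    for ch in text:
        if ord(ch) > 0xFFFF and ch not in found:
            found.append(ch)
    return found


def required_charset(text: str) -> str:
    """Hypothetical helper: 'utf8mb4' if any character lies outside the BMP,
    otherwise 'utf8mb3' is sufficient."""
    return "utf8mb4" if non_bmp_characters(text) else "utf8mb3"


sample = "Hello 世界 😀👍"
for ch in non_bmp_characters(sample):
    print(f"U+{ord(ch):04X} {ch!r} -> {len(ch.encode('utf-8'))} bytes")
print(required_charset(sample))
```

CJK text such as "世界" stays inside the BMP, so it fits utf8mb3; a single emoji is enough to require utf8mb4.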