关于MySQL的字符集


系统变量

字符集相关的系统变量

mysql> show variables like '%char%';
+--------------------------+------------------------------------------------------------------------------+
| Variable_name            | Value                                                                        |
+--------------------------+------------------------------------------------------------------------------+
| character_set_client     | utf8                                                                         |
| character_set_connection | utf8                                                                         |
| character_set_database   | utf8                                                                         |
| character_set_filesystem | binary                                                                       |
| character_set_results    | utf8                                                                         |
| character_set_server     | utf8                                                                         |
| character_set_system     | utf8                                                                         |
| character_sets_dir       | /usr/local/Percona-Server-5.6.29-rel76.2-Linux.x86_64.ssl101/share/charsets/ |
+--------------------------+------------------------------------------------------------------------------+
8 rows in set (0.01 sec) 

各个变量的含义概述如下:

  • character_set_client :客户端发给服务端的SQL的字符集
  • character_set_connection : 字符常量的缺省字符集
  • character_set_database:缺省数据库(即use指定的数据库)的缺省字符集
  • character_set_filesystem:文件系统字符集,用于解释文件名字符常量
  • character_set_results:结果集和错误消息的字符集
  • character_set_server: 服务器的缺省字符集
  • character_set_system: 系统标识符的字符集
  • character_sets_dir: 字符集安装目录

详细定义参考官网说明:

排序规则相关的系统变量:

mysql> show variables like '%collation%';
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_general_ci |
| collation_server     | utf8_general_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec) 

排序规则和上面的字符集是对应的,就不解释了。但有一个问题,UTF8编码下该设置utf8_general_ci 还是utf8_unicode_ci有一些讨论。 比如:What's the difference between utf8_general_ci and utf8_unicode_ci

utf8_general_ci排序略快一些,utf8_unicode_ci对某些语义排序更准确。然而,所谓的"更快",快的程度可以无视;"更准确"所适用的场景对使用中文的我们没啥意义。所以个人认为设啥都没关系,干脆顺气自然不设,让MySQL自己根据字符集选择缺省值吧(即utf8_general_ci)。

数据存储

字符数据的最终存储到表的字符类型的列上,所以存储的最终体现形式是列的字符集。至于表的字符集不过是生成列时的缺省字符集;数据库的字符集不过建表时的缺省字符集。

一劳永逸的字符设置

谈到字符主要让人操心的是乱码问题,最简单有效的解决办法是统一设置UTF8编码。 只要在my.cnf的[mysqld]上设置character_set_server即可。

character_set_server           = utf8mb4 

这样,新创建的数据库和该数据库中的对象将默认采用'utf8mb4'编码;

JDBC(5.1.13以后版本)客户端将根据服务端的character_set_server设置合适的客户端编码;

http://dev.mysql.com/doc/relnotes/connector-j/5.1/en/news-5-1-14.html

Connector/J mapped both 3-byte and 4-byte UTF8 encodings to the same Java UTF8 encoding.

To use 3-byte UTF8 with Connector/J set characterEncoding=utf8 and set useUnicode=true in the connection string.

To use 4-byte UTF8 with Connector/J configure the MySQL server with character_set_server=utf8mb4. Connector/J will then use that setting as long as characterEncoding has not been set in the connection string. This is equivalent to autodetection of the character set. (Bug #58232) 

排序规则无需专门设置,让它跟随编码自己变化。

参考



上一篇:【体系结构】Oracle重做日志文件(Redo Log Files)


下一篇:TCP连接解释及连接过程描述