字符集:查看mysql数据库当前都支持哪些字符集:
system@(none)>show character set;
+----------+-----------------------------+---------------------+--------+
| Charset | Description | Default collation | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5 | Big5 Traditional Chinese | big5_chinese_ci | 2 |
| dec8 | DEC West European | dec8_swedish_ci | 1 |
| cp850 | DOS West European | cp850_general_ci | 1 |
| hp8 | HP West European | hp8_english_ci | 1 |
| koi8r | KOI8-R Relcom Russian | koi8r_general_ci | 1 |
| latin1 | cp1252 West European | latin1_swedish_ci | 1 |
| latin2 | ISO 8859-2 Central European | latin2_general_ci | 1 |
| swe7 | 7bit Swedish | swe7_swedish_ci | 1 |
| ascii | US ASCII | ascii_general_ci | 1 |
| ujis | EUC-JP Japanese | ujis_japanese_ci | 3 |
| sjis | Shift-JIS Japanese | sjis_japanese_ci | 2 |
| hebrew | ISO 8859-8 Hebrew | hebrew_general_ci | 1 |
| tis620 | TIS620 Thai | tis620_thai_ci | 1 |
| euckr | EUC-KR Korean | euckr_korean_ci | 2 |
| koi8u | KOI8-U Ukrainian | koi8u_general_ci | 1 |
| gb2312 | GB2312 Simplified Chinese | gb2312_chinese_ci | 2 |
| greek | ISO 8859-7 Greek | greek_general_ci | 1 |
| cp1250 | Windows Central European | cp1250_general_ci | 1 |
| gbk | GBK Simplified Chinese | gbk_chinese_ci | 2 |
| latin5 | ISO 8859-9 Turkish | latin5_turkish_ci | 1 |
| armscii8 | ARMSCII-8 Armenian | armscii8_general_ci | 1 |
| utf8 | UTF-8 Unicode | utf8_general_ci | 3 |
| ucs2 | UCS-2 Unicode | ucs2_general_ci | 2 |
| cp866 | DOS Russian | cp866_general_ci | 1 |
| keybcs2 | DOS Kamenicky Czech-Slovak | keybcs2_general_ci | 1 |
| macce | Mac Central European | macce_general_ci | 1 |
| macroman | Mac West European | macroman_general_ci | 1 |
| cp852 | DOS Central European | cp852_general_ci | 1 |
| latin7 | ISO 8859-13 Baltic | latin7_general_ci | 1 |
| utf8mb4 | UTF-8 Unicode | utf8mb4_general_ci | 4 |
| cp1251 | Windows Cyrillic | cp1251_general_ci | 1 |
| utf16 | UTF-16 Unicode | utf16_general_ci | 4 |
| utf16le | UTF-16LE Unicode | utf16le_general_ci | 4 |
| cp1256 | Windows Arabic | cp1256_general_ci | 1 |
| cp1257 | Windows Baltic | cp1257_general_ci | 1 |
| utf32 | UTF-32 Unicode | utf32_general_ci | 4 |
| binary | Binary pseudo charset | binary | 1 |
| geostd8 | GEOSTD8 Georgian | geostd8_general_ci | 1 |
| cp932 | SJIS for Windows Japanese | cp932_japanese_ci | 2 |
| eucjpms | UJIS for Windows Japanese | eucjpms_japanese_ci | 3 |
+----------+-----------------------------+---------------------+--------+
40 rows in set (0.00 sec)
如何确定某个字符集支持哪些校对规则:
一个字符集至少会拥有一个校对规则,查看latin1字符集所拥有的校队规则:
system@(none)> show collation like 'latin1%';
+-------------------+---------+----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
| latin1_german1_ci | latin1 | 5 | | Yes | 1 |
| latin1_swedish_ci | latin1 | 8 | Yes | Yes | 1 |
| latin1_danish_ci | latin1 | 15 | | Yes | 1 |
| latin1_german2_ci | latin1 | 31 | | Yes | 2 |
| latin1_bin | latin1 | 47 | | Yes | 1 |
| latin1_general_ci | latin1 | 48 | | Yes | 1 |
| latin1_general_cs | latin1 | 49 | | Yes | 1 |
| latin1_spanish_ci | latin1 | 94 | | Yes | 1 |
+-------------------+---------+----+---------+----------+---------+
8 rows in set (0.01 sec)
--character_set_server 指定全局粒度的默认字符集。
--collation_server指定全局粒度的默认校队规则。
查看字符集和校对规则:show global variables like '%server';
system@(none)>show global variables like '%server';
+----------------------+-----------------+
| Variable_name | Value |
+----------------------+-----------------+
| character_set_server | utf8 |
| collation_server | utf8_general_ci |
+----------------------+-----------------+
2 rows in set (0.00 sec)
将全局粒度默认的字符集改为gbk:
system@(none)>set global character_set_server=gbk;
Query OK, 0 rows affected (0.00 sec)
system@(none)>show global variables like '%server';
+----------------------+----------------+
| Variable_name | Value |
+----------------------+----------------+
| character_set_server | gbk |
| collation_server | gbk_chinese_ci |
+----------------------+----------------+
2 rows in set (0.00 sec)
小知识:全局粒度是什么意思?
在mysql服务运行期间,修改系统变量的值也有作用域的。mysql中的系统变量的作用域分为全局(global)和当前会话(session)两类。对于全局的修改,作用域修改成功后新创建的会话,但对当前执行修改的会话无效,如果是会话级的修改(执行set命令并且未指定global选项就是会话级修改)则只作用于当前会话,本次会话结束后所做的修改也自动结束。此外需要注意mysql中即使是全局的参数修改,并作用域最多也只在当前mysql服务的生命周期内。mysql服务一旦重启那么值钱的设置也全部无效(不管是全局还是会话)因此要是希望所做的设置永久生效,那么除了在全局粒度修改外,还需要手动修改初始化参数文件,或者是在启动mysql服务时,在命令行中显式指定相关选项值。很多情况下,显示的字符出现乱码,也有可能是因为客户端当前的字符集设置与mysql服务端保存字符时所用的字符集不相符所致
system@jason>show variables like 'character_set_client';
+----------------------+-------+
| Variable_name | Value |
+----------------------+-------+
| character_set_client | gbk |
+----------------------+-------+
1 row in set (0.00 sec)
修改客户端所使用的字符集:set character_set_results=gbk;执行后与server端的字符集一致就不会出现乱码了。
~客户端发出的sql语句所使用的字符集由系统变量character_set_client来指定
~mysql服务端接收到语句后会用到character_set_connection和collation_connection两个系统变量中的设置。
并且会将客户端发送的语句字符集由character_set_client转换到character_set_connection(除非用户执行语句时已对字符列明确指定了字符集)对于语句中指定的字符串的比较或排序,还需要应用collation_connection中指定的校对规则处理而对于语句中指定的列的比较则无关collation_connection的设置了,因为对象的表列拥有自己的校对规则,他们拥有更高的优先级。
~mysql服务端执行完语句后会按照character_set_results系统变量设定的字符集返回结果集(或错误信息)到客户端。
system@jason>show global variables like 'character_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | gbk |
| character_set_system | utf8 |
+--------------------------+--------+
7 rows in set (0.00 sec)
这几个系统变量的值,默认继承自服务端启动时默认的字符集设置,也就是我们编译时指定的utf8.不过有一项例外,那是因为前面我们将character_set_server的值设置成了gbk
system@jason> show variables like 'character_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | gbk |
| character_set_connection | gbk |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | gbk |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+
7 rows in set (0.00 sec)
一次性修改所有与连接相关的字符集变量设置,例如设置当前会话字符集为utf8执行命令如下:
system@jason> set names utf8;
Query OK, 0 rows affected (0.00 sec)
system@jason>show variables like 'character_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+
7 rows in set (0.00 sec)
system@jason>set names gbk;
Query OK, 0 rows affected (0.00 sec)
system@jason>show variables like 'character_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | gbk |
| character_set_connection | gbk |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | gbk |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+
7 rows in set (0.00 sec)
system@jason>set names utf8;
Query OK, 0 rows affected (0.00 sec)
system@jason>show variables like 'character_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+
7 rows in set (0.00 sec)
set character set latin1;
system@jason>set character set latin1;
Query OK, 0 rows affected (0.00 sec)
system@jason>show variables like 'character_set\_%';
+--------------------------+--------+
| Variable_name | Value |
+--------------------------+--------+
| character_set_client | latin1 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | utf8 |
| character_set_system | utf8 |
+--------------------------+--------+
7 rows in set (0.00 sec)
system@jason>
创建5ienet数据库并指定该库的默认字符集为latin1
system@jason>create database 5ienet charset latin1;
Query OK, 1 row affected (0.01 sec)
修改5ienet数据库的字符集为utf8
system@(none)>alter database 5ienet charset utf8;
Query OK, 1 row affected (0.00 sec)
数据库级的字符集在数据库同名操作系统目录下的db.opt文件中。
[mysql@linux01 5ienet]$ cat db.opt
default-character-set=utf8
default-collation=utf8_general_ci
system@(none)>show create database 5ienet;
+----------+-----------------------------------------------------------------+
| Database | Create Database |
+----------+-----------------------------------------------------------------+
| 5ienet | CREATE DATABASE `5ienet` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+----------+-----------------------------------------------------------------+
1 row in set (0.00 sec)
表级指定创建一个t1表现不指定字符集:
system@5ienet>create table t1 (id int);
Query OK, 0 rows affected (0.02 sec)
再重建一个t2表指定该表的默认字符集为latin1
system@5ienet>create table t2 (id int) charset latin1;
Query OK, 0 rows affected (0.02 sec)
system@5ienet>show create table t1 \G
*************************** 1. row ***************************
Table: t1
Create Table: CREATE TABLE `t1` (
`id` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
system@5ienet>show create table t2 \G
*************************** 1. row ***************************
Table: t2
Create Table: CREATE TABLE `t2` (
`id` int(11) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
1 row in set (0.00 sec)
所以创建对象时没有指定字符集那么它会继承数据库粒度的字符集,如果明确指定了字符集那么该表的字符集就是我们指定的字符集。
操作示例:
创建表t3,拥有三个列 每个列指定不同的字符集
create table t3(
v1 varchar(20) charset latin1,
v2 varchar(20) charset gbk,
v3 varchar(20) charset utf8
);
设置当前客户端连接会话的字符集为utf8
system@5ienet>set names utf8;
Query OK, 0 rows affected (0.00 sec)
system@5ienet>show variables like 'character%';
+--------------------------+------------------------+
| Variable_name | Value |
+--------------------------+------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | gbk |
| character_set_system | utf8 |
| character_sets_dir | /mysql/share/charsets/ |
+--------------------------+------------------------+
8 rows in set (0.00 sec)
system@5ienet> insert into t3 values('cn中国','cn中国','cn中国');
ERROR 1366 (HY000): Incorrect string value: '\xD6\xD0\xB9\xFA' for column 'v1' at row 1
报错提示v1列指定了错误的字符串值,导致插入失败。这是因为latin1是西文字符集,并不支持中文这样的多字节字符。
1366在mysql5.6之前是警告级别,可以插入数据,在5.6之后改为错误级别了。由sql_mode控制。
system@5ienet>insert into t3 values('china','cn中国','cn中国');
ERROR 1366 (HY000): Incorrect string value: '\xD6\xD0\xB9\xFA' for column 'v2' at row 1
这次提示v2有错误
system@5ienet>insert into t3 values('china','china','cn中国');
ERROR 1366 (HY000): Incorrect string value: '\xD6\xD0\xB9\xFA' for column 'v3' at row 1
提示v3错误
插入时报错是因为客户端的字符集和存储所用的字符集不匹配所致。
system@5ienet> show variables like 'character%';
+--------------------------+------------------------+
| Variable_name | Value |
+--------------------------+------------------------+
| character_set_client | utf8 |
| character_set_connection | utf8 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | utf8 |
| character_set_server | gbk |
| character_set_system | utf8 |
| character_sets_dir | /mysql/share/charsets/ |
+--------------------------+------------------------+
8 rows in set (0.00 sec)
system@5ienet>set names gbk;
Query OK, 0 rows affected (0.00 sec)
system@5ienet> show variables like 'character%';
+--------------------------+------------------------+
| Variable_name | Value |
+--------------------------+------------------------+
| character_set_client | gbk |
| character_set_connection | gbk |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | gbk |
| character_set_server | gbk |
| character_set_system | utf8 |
| character_sets_dir | /mysql/share/charsets/ |
+--------------------------+------------------------+
8 rows in set (0.00 sec)
system@5ienet>insert into t3 values('china','cn中国','cn中国');
Query OK, 1 row affected (0.00 sec)
成功插入。
system@5ienet>select * from t3;
+-------+--------+--------+
| v1 | v2 | v3 |
+-------+--------+--------+
| china | cn中国 | cn中国 |
+-------+--------+--------+
1 row in set (0.00 sec)
所以连接时的字符集和存储时的字符集一致结果才能正常显示。那为啥utf8的v3列为什么也能正常显示,这是因为character_result被转换成了gbk字符集。把当前会话字符集改为utf8再做查询。
system@5ienet> set names utf8;
Query OK, 0 rows affected (0.00 sec)
system@5ienet>select v1,v2,v3,length(v1),length(v2),length(v3) from t3;
+-------+----------+----------+------------+------------+------------+
| v1 | v2 | v3 | length(v1) | length(v2) | length(v3) |
+-------+----------+----------+------------+------------+------------+
| china | cn涓噁浗 | cn涓噁浗 | 5 | 6 | 8 |
+-------+----------+----------+------------+------------+------------+
1 row in set (0.00 sec)
system@5ienet>set names gbk;
Query OK, 0 rows affected (0.00 sec)
system@5ienet>select * from t3;
+-------+--------+--------+
| v1 | v2 | v3 |
+-------+--------+--------+
| china | cn中国 | cn中国 |
+-------+--------+--------+
1 row in set (0.00 sec)
system@5ienet> alter table t3 modify v3 varchar(20) charset gb2312;
Query OK, 1 row affected (0.04 sec)
Records: 1 Duplicates: 0 Warnings: 0
system@5ienet>select * from t3;
+-------+--------+--------+
| v1 | v2 | v3 |
+-------+--------+--------+
| china | cn中国 | cn中国 |
+-------+--------+--------+
1 row in set (0.00 sec)
system@5ienet>alter table t3 modify v3 varchar(20) charset latin1;
ERROR 1366 (HY000): Incorrect string value: '\xD6\xD0\xB9\xFA' for column 'v3' at row 1
把v3改为latin1报错。