1、先说重点:
不同的编码格式占字节数是不同的,UTF-8编码下一个中文所占字节也是不确定的,可能是2个、3个、4个字节;
2、以下是源码:
1 @Test
2 public void test1() throws UnsupportedEncodingException {
3 String a = "名";
4 System.out.println("UTF-8编码长度:"+a.getBytes("UTF-8").length);
5 System.out.println("GBK编码长度:"+a.getBytes("GBK").length);
6 System.out.println("GB2312编码长度:"+a.getBytes("GB2312").length);
7 System.out.println("==========================================");
8
9 String c = "0x20001";
10 System.out.println("UTF-8编码长度:"+c.getBytes("UTF-8").length);
11 System.out.println("GBK编码长度:"+c.getBytes("GBK").length);
12 System.out.println("GB2312编码长度:"+c.getBytes("GB2312").length);
13 System.out.println("==========================================");
14
15 char[] arr = Character.toChars(0x20001);
16 String s = new String(arr);
17 System.out.println("char array length:" + arr.length);
18 System.out.println("content:| " + s + " |");
19 System.out.println("String length:" + s.length());
20 System.out.println("UTF-8编码长度:"+s.getBytes("UTF-8").length);
21 System.out.println("GBK编码长度:"+s.getBytes("GBK").length);
22 System.out.println("GB2312编码长度:"+s.getBytes("GB2312").length);
23 System.out.println("==========================================");
24 }
3、运行结果
1 UTF-8编码长度:3
2 GBK编码长度:2
3 GB2312编码长度:2
4 ==========================================
5 UTF-8编码长度:4
6 GBK编码长度:1
7 GB2312编码长度:1
8 ==========================================
9 char array length:2
10 content:|