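Where bytes comes from
Python 3.x introduced bytes as a distinct type. In Python 2.x (2.6 and later) the name bytes existed only as an alias for str.
bytes means "byte string": it stores data in units of bytes, and one byte is 8 binary bits.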
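Byte strings (bytes) compared with character strings (str):
Excerpted from: http://c.biancheng.net/view/2175.html
- A string consists of characters and is operated on character by character; a byte string consists of bytes and is operated on byte by byte.
- Apart from the unit of data they operate on, byte strings and strings support largely the same methods.
- Both byte strings and strings are immutable sequences; their contents cannot be added to or removed in place.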
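The comparison above can be checked with a short sketch. The sample text "héllo" and the helper name try_assign are illustrative choices, not part of the original article.

```python
text = "héllo"                  # a str: a sequence of characters
data = text.encode("utf-8")     # a bytes object: a sequence of bytes

# Length is counted in characters for str and in bytes for bytes.
char_count = len(text)          # 5 characters
byte_count = len(data)          # 6 bytes, because "é" takes two bytes in UTF-8

# Indexing a str gives a one-character str; indexing bytes gives an int (0-255).
first_char = text[0]            # 'h'
first_byte = data[0]            # 104

# Many methods share the same name on both types.
upper_text = text.upper()       # 'HÉLLO'
upper_data = data.upper()       # b'H\xc3\xa9LLO' (only ASCII letters change)


def try_assign(seq):
    """Return True if item assignment works on seq, False if it raises TypeError."""
    try:
        seq[0] = seq[0]
    except TypeError:
        return False
    return True


print(try_assign(text), try_assign(data))   # both immutable
```

A bytearray is the mutable counterpart of bytes, so try_assign(bytearray(b"a")) returns True.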
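bytes is only responsible for storing data as a sequence of bytes (in binary form). What that data actually represents (a string, a number, an image, audio, and so on) is decided entirely by how the program interprets it. With a suitable character encoding (character set), a byte string can be turned back into a string; conversely, a string can be converted into a byte string.
Put simply, bytes just records the raw data in memory. It does not care how the data is used and places no constraints on what you do with it.
bytes data is well suited to transmission over the internet and is used in network programming; bytes can also hold binary file content such as images, audio, and video.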
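As a minimal sketch of "the interpretation decides the meaning", the same four bytes are read below as text, as one integer, and as two 16-bit integers. The variable names and the choice of big-endian byte order are assumptions made for illustration.

```python
import struct

raw = b"\x41\x42\x43\x44"   # four raw bytes; bytes itself attaches no meaning

as_text = raw.decode("ascii")                    # read as ASCII text: 'ABCD'
as_int = int.from_bytes(raw, "big")              # read as one big-endian integer
as_two_shorts = struct.unpack(">HH", raw)        # read as two unsigned 16-bit integers
as_values = list(raw)                            # read as individual byte values

print(as_text, hex(as_int), as_two_shorts, as_values)
```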
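Strings and bytes are closely related: a bytes object can be created from a string, or in other words a string can be converted to bytes. There are three ways to do this:
- If the string contains only ASCII characters, adding the b prefix in front of the string literal produces a bytes object directly.
- bytes is a class; calling its constructor, bytes(), converts a string to bytes using the specified character set. When the argument is a string, the encoding must be given explicitly, otherwise bytes() raises TypeError.
- A string has its own encode() method, which converts the string to the corresponding byte string using the specified character set; if no character set is given, UTF-8 is used.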
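The three approaches side by side, as a small sketch. The sample strings are arbitrary. Note that in Python 3, bytes() called on a str without an encoding raises TypeError, while str.encode() falls back to UTF-8.

```python
# 1. b prefix: only for literals made of ASCII characters.
b1 = b"hello"

# 2. bytes() constructor: an encoding is required when the argument is a str.
b2 = bytes("你好", encoding="utf-8")

# 3. str.encode(): the encoding defaults to UTF-8 when omitted.
b3 = "你好".encode()

# Calling bytes() on a str without an encoding raises TypeError.
try:
    bytes("hello")
    needs_encoding = False
except TypeError:
    needs_encoding = True

# Decoding with the same encoding restores the original string.
restored = b2.decode("utf-8")

print(b1, b2, b3, needs_encoding, restored)
```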
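How bytes is displayed in Python
By default Python displays each byte as its ASCII character, but of the 256 possible byte values only the 95 printable ASCII characters (codes 32 to 126) can be shown this way.
When some bytes in a bytes value (a "byte string" for short) have no printable character, Python shows them as escapes instead: \t, \n, and \r for tab, line feed, and carriage return, and the hexadecimal form \xNN for everything else.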
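# Create bytes from an ASCII string literal with the b prefix
b3 = b'http://c.biancheng.net/python/'
print("b3:", b3)
# Create bytes from a non-ASCII string with bytes() and an explicit encoding
b4 = bytes('C语言中文网8岁了', encoding='UTF-8')
print("b4:", b4)
Output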
b3: b'http://c.biancheng.net/python/'
b4: b'C\xe8\xaf\xad\xe8\xa8\x80\xe4\xb8\xad\xe6\x96\x87\xe7\xbd\x918\xe5\xb2\x81\xe4\xba\x86'
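To see the display rule in isolation, the sketch below builds a bytes object from five hand-picked values (two printable, a tab, a NUL, and 255) and inspects its repr. The values are arbitrary examples.

```python
data = bytes([72, 105, 9, 0, 255])

# Printable ASCII bytes appear as characters, the tab as \t,
# and the remaining bytes as \xNN hexadecimal escapes.
shown = repr(data)              # b'Hi\t\x00\xff'

# The underlying values are unchanged, whatever the display looks like.
values = list(data)             # [72, 105, 9, 0, 255]
as_hex = data.hex()             # '48690900ff'

print(shown, values, as_hex)
```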
Appendix: ASCII code tables
Appendix 1: the 128 basic ASCII codes, 95 of which are printable
http://ascii.911cha.com/
ASCII control characters
Binary | Decimal | Hex | Abbreviation | Printable representation | Name/meaning |
---|---|---|---|---|---|
0000 0000 | 0 | 00 | NUL | ␀ | Null character |
0000 0001 | 1 | 01 | SOH | ␁ | Start of heading |
0000 0010 | 2 | 02 | STX | ␂ | Start of text |
0000 0011 | 3 | 03 | ETX | ␃ | End of text |
0000 0100 | 4 | 04 | EOT | ␄ | End of transmission |
0000 0101 | 5 | 05 | ENQ | ␅ | Enquiry |
0000 0110 | 6 | 06 | ACK | ␆ | Acknowledge |
0000 0111 | 7 | 07 | BEL | ␇ | Bell |
0000 1000 | 8 | 08 | BS | ␈ | Backspace |
0000 1001 | 9 | 09 | HT | ␉ | Horizontal tab |
0000 1010 | 10 | 0A | LF | ␊ | Line feed |
0000 1011 | 11 | 0B | VT | ␋ | Vertical tab |
0000 1100 | 12 | 0C | FF | ␌ | Form feed |
0000 1101 | 13 | 0D | CR | ␍ | Carriage return |
0000 1110 | 14 | 0E | SO | ␎ | Shift out |
0000 1111 | 15 | 0F | SI | ␏ | Shift in |
0001 0000 | 16 | 10 | DLE | ␐ | Data link escape |
0001 0001 | 17 | 11 | DC1 | ␑ | Device control 1 (XON, enables software flow control) |
0001 0010 | 18 | 12 | DC2 | ␒ | Device control 2 |
0001 0011 | 19 | 13 | DC3 | ␓ | Device control 3 (XOFF, disables software flow control) |
0001 0100 | 20 | 14 | DC4 | ␔ | Device control 4 |
0001 0101 | 21 | 15 | NAK | ␕ | Negative acknowledge |
0001 0110 | 22 | 16 | SYN | ␖ | Synchronous idle |
0001 0111 | 23 | 17 | ETB | ␗ | End of transmission block |
0001 1000 | 24 | 18 | CAN | ␘ | Cancel |
0001 1001 | 25 | 19 | EM | ␙ | End of medium |
0001 1010 | 26 | 1A | SUB | ␚ | Substitute |
0001 1011 | 27 | 1B | ESC | ␛ | Escape |
0001 1100 | 28 | 1C | FS | ␜ | File separator |
0001 1101 | 29 | 1D | GS | ␝ | Group separator |
0001 1110 | 30 | 1E | RS | ␞ | Record separator |
0001 1111 | 31 | 1F | US | ␟ | Unit separator |
0111 1111 | 127 | 7F | DEL | ␡ | Delete |
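ASCII printable characters
The 95 printable characters are codes 32 (space) through 126 (~): the space, the digits, the upper- and lower-case letters, and the punctuation marks.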
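The printable range can be listed directly in Python rather than read from a table. This is a small sketch; the cross-check against the string module constants is an added illustration, not something from the original article.

```python
import string

# Codes 32 (space) through 126 (~) are the printable ASCII characters.
printable = "".join(chr(code) for code in range(32, 127))
count = len(printable)          # 95

# Cross-check: space + digits + letters + punctuation covers the same set.
same_set = set(printable) == set(
    " " + string.digits + string.ascii_letters + string.punctuation
)

print(count, same_set)
print(printable)
```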
Appendix 2: extended ASCII codes (128 to 255); in a bytes display none of them is shown as a character
https://tool.ip138.com/ascii_code/
DEC | OCT | HEX | BIN | Abbreviation/symbol | HTML entity | Description |
---|---|---|---|---|---|---|
128 | 200 | 80 | 10000000 | € | | Euro sign |
129 | 201 | 81 | 10000001 | |||
130 | 202 | 82 | 10000010 | ‚ | | Single low-9 quotation mark |
131 | 203 | 83 | 10000011 | ƒ | | Latin small letter f with hook |
132 | 204 | 84 | 10000100 | „ | | Double low-9 quotation mark |
133 | 205 | 85 | 10000101 | … | | Horizontal ellipsis |
134 | 206 | 86 | 10000110 | † | | Dagger |
135 | 207 | 87 | 10000111 | ‡ | | Double dagger |
136 | 210 | 88 | 10001000 | ˆ | | Modifier letter circumflex accent |
137 | 211 | 89 | 10001001 | ‰ | | Per mille sign |
138 | 212 | 8A | 10001010 | Š | | Latin capital letter S with caron |
139 | 213 | 8B | 10001011 | ‹ | | Single left-pointing angle quotation |
140 | 214 | 8C | 10001100 | Œ | | Latin capital ligature OE |
141 | 215 | 8D | 10001101 | |||
142 | 216 | 8E | 10001110 | Ž | | Latin capital letter Z with caron |
143 | 217 | 8F | 10001111 | |||
144 | 220 | 90 | 10010000 | |||
145 | 221 | 91 | 10010001 | ‘ | | Left single quotation mark |
146 | 222 | 92 | 10010010 | ’ | | Right single quotation mark |
147 | 223 | 93 | 10010011 | “ | | Left double quotation mark |
148 | 224 | 94 | 10010100 | ” | | Right double quotation mark |
149 | 225 | 95 | 10010101 | • | | Bullet |
150 | 226 | 96 | 10010110 | – | | En dash |
151 | 227 | 97 | 10010111 | — | | Em dash |
152 | 230 | 98 | 10011000 | ˜ | | Small tilde |
153 | 231 | 99 | 10011001 | ™ | | Trade mark sign |
154 | 232 | 9A | 10011010 | š | | Latin small letter S with caron |
155 | 233 | 9B | 10011011 | › | | Single right-pointing angle quotation mark |
156 | 234 | 9C | 10011100 | œ | | Latin small ligature oe |
157 | 235 | 9D | 10011101 | |||
158 | 236 | 9E | 10011110 | ž | | Latin small letter z with caron |
159 | 237 | 9F | 10011111 | Ÿ | | Latin capital letter Y with diaeresis |
160 | 240 | A0 | 10100000 | | | Non-breaking space |
161 | 241 | A1 | 10100001 | ¡ | ¡ | Inverted exclamation mark |
162 | 242 | A2 | 10100010 | ¢ | ¢ | Cent sign |
163 | 243 | A3 | 10100011 | £ | £ | Pound sign |
164 | 244 | A4 | 10100100 | ¤ | ¤ | Currency sign |
165 | 245 | A5 | 10100101 | ¥ | ¥ | Yen sign |
166 | 246 | A6 | 10100110 | ¦ | ¦ | Pipe, Broken vertical bar |
167 | 247 | A7 | 10100111 | § | § | Section sign |
168 | 250 | A8 | 10101000 | ¨ | ¨ | Spacing diaeresis - umlaut |
169 | 251 | A9 | 10101001 | © | © | Copyright sign |
170 | 252 | AA | 10101010 | ª | ª | Feminine ordinal indicator |
171 | 253 | AB | 10101011 | « | « | Left double angle quotes |
172 | 254 | AC | 10101100 | ¬ | ¬ | Not sign |
173 | 255 | AD | 10101101 | | | Soft hyphen |
174 | 256 | AE | 10101110 | ® | ® | Registered trade mark sign |
175 | 257 | AF | 10101111 | ¯ | ¯ | Spacing macron - overline |
176 | 260 | B0 | 10110000 | ° | ° | Degree sign |
177 | 261 | B1 | 10110001 | ± | ± | Plus-or-minus sign |
178 | 262 | B2 | 10110010 | ² | ² | Superscript two - squared |
179 | 263 | B3 | 10110011 | ³ | ³ | Superscript three - cubed |
180 | 264 | B4 | 10110100 | ´ | ´ | Acute accent - spacing acute |
181 | 265 | B5 | 10110101 | µ | µ | Micro sign |
182 | 266 | B6 | 10110110 | ¶ | ¶ | Pilcrow sign - paragraph sign |
183 | 267 | B7 | 10110111 | · | · | Middle dot - Georgian comma |
184 | 270 | B8 | 10111000 | ¸ | ¸ | Spacing cedilla |
185 | 271 | B9 | 10111001 | ¹ | ¹ | Superscript one |
186 | 272 | BA | 10111010 | º | º | Masculine ordinal indicator |
187 | 273 | BB | 10111011 | » | » | Right double angle quotes |
188 | 274 | BC | 10111100 | ¼ | ¼ | Fraction one quarter |
189 | 275 | BD | 10111101 | ½ | ½ | Fraction one half |
190 | 276 | BE | 10111110 | ¾ | ¾ | Fraction three quarters |
191 | 277 | BF | 10111111 | ¿ | ¿ | Inverted question mark |
192 | 300 | C0 | 11000000 | À | À | Latin capital letter A with grave |
193 | 301 | C1 | 11000001 | Á | Á | Latin capital letter A with acute |
194 | 302 | C2 | 11000010 | Â | Â | Latin capital letter A with circumflex |
195 | 303 | C3 | 11000011 | Ã | Ã | Latin capital letter A with tilde |
196 | 304 | C4 | 11000100 | Ä | Ä | Latin capital letter A with diaeresis |
197 | 305 | C5 | 11000101 | Å | Å | Latin capital letter A with ring above |
198 | 306 | C6 | 11000110 | Æ | Æ | Latin capital letter AE |
199 | 307 | C7 | 11000111 | Ç | Ç | Latin capital letter C with cedilla |
200 | 310 | C8 | 11001000 | È | È | Latin capital letter E with grave |
201 | 311 | C9 | 11001001 | É | É | Latin capital letter E with acute |
202 | 312 | CA | 11001010 | Ê | Ê | Latin capital letter E with circumflex |
203 | 313 | CB | 11001011 | Ë | Ë | Latin capital letter E with diaeresis |
204 | 314 | CC | 11001100 | Ì | Ì | Latin capital letter I with grave |
205 | 315 | CD | 11001101 | Í | Í | Latin capital letter I with acute |
206 | 316 | CE | 11001110 | Î | Î | Latin capital letter I with circumflex |
207 | 317 | CF | 11001111 | Ï | Ï | Latin capital letter I with diaeresis |
208 | 320 | D0 | 11010000 | Ð | Ð | Latin capital letter ETH |
209 | 321 | D1 | 11010001 | Ñ | Ñ | Latin capital letter N with tilde |
210 | 322 | D2 | 11010010 | Ò | Ò | Latin capital letter O with grave |
211 | 323 | D3 | 11010011 | Ó | Ó | Latin capital letter O with acute |
212 | 324 | D4 | 11010100 | Ô | Ô | Latin capital letter O with circumflex |
213 | 325 | D5 | 11010101 | Õ | Õ | Latin capital letter O with tilde |
214 | 326 | D6 | 11010110 | Ö | Ö | Latin capital letter O with diaeresis |
215 | 327 | D7 | 11010111 | × | × | Multiplication sign |
216 | 330 | D8 | 11011000 | Ø | Ø | Latin capital letter O with slash |
217 | 331 | D9 | 11011001 | Ù | Ù | Latin capital letter U with grave |
218 | 332 | DA | 11011010 | Ú | Ú | Latin capital letter U with acute |
219 | 333 | DB | 11011011 | Û | Û | Latin capital letter U with circumflex |
220 | 334 | DC | 11011100 | Ü | Ü | Latin capital letter U with diaeresis |
221 | 335 | DD | 11011101 | Ý | Ý | Latin capital letter Y with acute |
222 | 336 | DE | 11011110 | Þ | Þ | Latin capital letter THORN |
223 | 337 | DF | 11011111 | ß | ß | Latin small letter sharp s - ess-zed |
224 | 340 | E0 | 11100000 | à | à | Latin small letter a with grave |
225 | 341 | E1 | 11100001 | á | á | Latin small letter a with acute |
226 | 342 | E2 | 11100010 | â | â | Latin small letter a with circumflex |
227 | 343 | E3 | 11100011 | ã | ã | Latin small letter a with tilde |
228 | 344 | E4 | 11100100 | ä | ä | Latin small letter a with diaeresis |
229 | 345 | E5 | 11100101 | å | å | Latin small letter a with ring above |
230 | 346 | E6 | 11100110 | æ | æ | Latin small letter ae |
231 | 347 | E7 | 11100111 | ç | ç | Latin small letter c with cedilla |
232 | 350 | E8 | 11101000 | è | è | Latin small letter e with grave |
233 | 351 | E9 | 11101001 | é | é | Latin small letter e with acute |
234 | 352 | EA | 11101010 | ê | ê | Latin small letter e with circumflex |
235 | 353 | EB | 11101011 | ë | ë | Latin small letter e with diaeresis |
236 | 354 | EC | 11101100 | ì | ì | Latin small letter i with grave |
237 | 355 | ED | 11101101 | í | í | Latin small letter i with acute |
238 | 356 | EE | 11101110 | î | î | Latin small letter i with circumflex |
239 | 357 | EF | 11101111 | ï | ï | Latin small letter i with diaeresis |
240 | 360 | F0 | 11110000 | ð | ð | Latin small letter eth |
241 | 361 | F1 | 11110001 | ñ | ñ | Latin small letter n with tilde |
242 | 362 | F2 | 11110010 | ò | ò | Latin small letter o with grave |
243 | 363 | F3 | 11110011 | ó | ó | Latin small letter o with acute |
244 | 364 | F4 | 11110100 | ô | ô | Latin small letter o with circumflex |
245 | 365 | F5 | 11110101 | õ | õ | Latin small letter o with tilde |
246 | 366 | F6 | 11110110 | ö | ö | Latin small letter o with diaeresis |
247 | 367 | F7 | 11110111 | ÷ | ÷ | Division sign |
248 | 370 | F8 | 11111000 | ø | ø | Latin small letter o with slash |
249 | 371 | F9 | 11111001 | ù | ù | Latin small letter u with grave |
250 | 372 | FA | 11111010 | ú | ú | Latin small letter u with acute |
251 | 373 | FB | 11111011 | û | û | Latin small letter u with circumflex |
252 | 374 | FC | 11111100 | ü | ü | Latin small letter u with diaeresis |
253 | 375 | FD | 11111101 | ý | ý | Latin small letter y with acute |
254 | 376 | FE | 11111110 | þ | þ | Latin small letter thorn |
255 | 377 | FF | 11111111 | ÿ | ÿ | Latin small letter y with diaeresis |
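A byte above 127 only becomes a character once an encoding is chosen. The sketch below assumes the table above corresponds to Python's "cp1252" codec (Windows-1252), which matches its symbols such as the Euro sign at 128; that mapping is an inference, not something the source states.

```python
high = bytes([0xE9])                       # decimal 233 in the table above
shown = repr(high)                         # displayed as an escape: b'\xe9'

as_cp1252 = high.decode("cp1252")          # 'é'
as_latin1 = high.decode("latin-1")         # 'é' (same character in ISO 8859-1)
euro = bytes([0x80]).decode("cp1252")      # '€'


def can_decode(data, encoding):
    """Return True if data decodes cleanly with the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# 0x81 is one of the blank rows: Python's cp1252 codec has no character for it.
blank_row_ok = can_decode(bytes([0x81]), "cp1252")     # False
# A lone 0xE9 is an incomplete multi-byte sequence in UTF-8.
utf8_ok = can_decode(high, "utf-8")                    # False

print(shown, as_cp1252, euro, blank_row_ok, utf8_ok)
```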
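Difference between str and bytes
In Python 3, bytes explicitly represents binary data.
Converting between strings and binary bytes
String --> binary: use encode()
Binary --> string: use decode('<character encoding>')
decode('unicode_escape')
decode('utf-8')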
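A round trip through both codecs mentioned above, as a minimal sketch with arbitrary sample strings. "utf-8" decodes real UTF-8 bytes, while "unicode_escape" interprets ASCII bytes that spell out escape sequences such as \u4e2d.

```python
# str --> bytes with encode(), bytes --> str with decode()
text = "C语言"
encoded = text.encode("utf-8")              # b'C\xe8\xaf\xad\xe8\xa8\x80'
decoded = encoded.decode("utf-8")           # back to 'C语言'

# unicode_escape: the bytes below are 12 plain ASCII characters, \u4e2d\u6587,
# which decode to the two characters they describe.
escaped = b"\\u4e2d\\u6587"
unescaped = escaped.decode("unicode_escape")        # '中文'
re_escaped = unescaped.encode("unicode_escape")     # b'\\u4e2d\\u6587'

print(encoded, decoded, unescaped, re_escaped)
```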
... See https://www.cnblogs.com/zhangmingda/p/9030229.html