11. Python built-in data types: strings

2023-01-02 12:04:25

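Strings:

An ordered sequence made up of individual characters; a collection of characters
A sequence of characters enclosed in single, double, or triple quotes
Strings are immutable objects
Since Python 3, strings are Unicode (type str)

    >>> a = 'ab' + 'c' + 'd'
    >>> type(a)
    <class 'str'>
    >>> id(a)
    12128640
    >>> a = 'ab' + 'c' + 'd' + 'e'
    >>> id(a)
    12128672
    # A different id: a now refers to a new object (the exact values vary between runs)
    >>> a
    'abcde'
    >>> a[1]
    'b'
    >>> a[1] = 'l'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'str' object does not support item assignment
    # Indexing works as it does for lists, but a string cannot be modified in place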
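Because strings are immutable, every "modification" really builds a new string. The following is a minimal sketch of that idea; the variable names are illustrative and not from the original notes.

```python
s = 'abcde'

# Item assignment is rejected because str is immutable
try:
    s[1] = 'X'
except TypeError as exc:
    error = str(exc)

# Build a new string from slices instead; the name s is rebound to it
s = s[:1] + 'X' + s[2:]
print(s)        # aXcde
print(error)    # 'str' object does not support item assignment
```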

Defining and initializing strings

    s1 = 'string'
    print(s1)    # string

    s2 = "string2"
    print(s2)    # string2

    s3 = '''this's a "String"'''
    print(s3)    # this's a "String"

    s4 = 'hello \n magedu.com'
    print(s4)
    # hello 
    #  magedu.com

    s5 = r"hello \n magedu.com"    # raw string: backslashes are kept literally
    print(s5)    # hello \n magedu.com

    s6 = 'c:\windows\nt'    # \n becomes a newline; \w is an invalid escape that recent Python versions warn about
    print(s6)
    # c:\windows
    # t

    s7 = R"c:\windows\nt"    # the raw prefix may be r or R
    print(s7)    # c:\windows\nt

    s8 = 'c:\\windows\\nt'    # escape each backslash explicitly
    print(s8)    # c:\windows\nt

    sql = """select * from user where name='tom'"""
    print(sql)    # select * from user where name='tom'

    # Pay attention to the quote characters and escape sequences
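One way to see what the r prefix changes is to compare lengths. This small sketch (names are illustrative) shows that an escape sequence is a single character, while the raw form keeps the backslash and the letter as two characters.

```python
escaped = 'a\nb'    # \n is one newline character
raw = r'a\nb'       # a backslash followed by the letter n

print(len(escaped))     # 3
print(len(raw))         # 4
print(raw == 'a\\nb')   # True
```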

Accessing string elements

Strings support access by index

    sql = """select * from user where name='tom'"""
    print(sql[4])    # c
    # The result is the one-character string 'c'; Python has no separate character type
    sql[4] = 'o'
    # TypeError: 'str' object does not support item assignment
    # Strings cannot be modified in place

An ordered collection of characters (a character sequence)

    sql = """select * from user where name='tom'"""
    for c in sql:
        print(c, end=' ')
    print()
    print(type(c))
    # Output:
    # s e l e c t   *   f r o m   u s e r   w h e r e   n a m e = ' t o m ' 
    # <class 'str'>

Iterable

    sql = """select * from user where name='tom'"""
    lst = list(sql)
    print(lst)
    t = tuple(sql)
    print(t)
    # Output:
    # ['s', 'e', 'l', 'e', 'c', 't', ' ', '*', ' ', 'f', 'r', 'o', 'm', ' ', 'u', 's', 'e', 'r', ' ', 'w', 'h', 'e', 'r', 'e', ' ', 'n', 'a', 'm', 'e', '=', "'", 't', 'o', 'm', "'"]
    # ('s', 'e', 'l', 'e', 'c', 't', ' ', '*', ' ', 'f', 'r', 'o', 'm', ' ', 'u', 's', 'e', 'r', ' ', 'w', 'h', 'e', 'r', 'e', ' ', 'n', 'a', 'm', 'e', '=', "'", 't', 'o', 'm', "'")

String concatenation

+ concatenation: for example, a = 'a' + 'b' gives a == 'ab'. It joins two strings and returns a new string
join concatenation: "string".join(iterable) -> str
- Joins the items of the iterable, using string as the separator
- Every item of the iterable must itself be a string
- Returns a new string

    a = 'abcd'
    b = ' '.join(a)
    print(b)    # a b c d

    c = "@".join(a)
    print(c)    # a@b@c@d

    lst = ['1', '2', '3']
    print(' '.join(lst))    # 1 2 3

    print("\n".join(lst))
    # 1
    # 2
    # 3

    lst1 = ['1', ['a', 'b'], '3']
    print(' '.join(lst1))
    # Traceback (most recent call last):
    #   File "D:/untitled/project2/day1/zifuchuan.py", line 2, in <module>
    #     print(' '.join(lst1))
    # TypeError: sequence item 1: expected str instance, list found
    # A nested list raises an error because one of the items is not a string
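Since join requires every item to be a string, non-string items need converting first. The sketch below shows two possible workarounds; the flattening rule for the nested list (inner lists joined without a separator) is an arbitrary choice for illustration.

```python
numbers = [1, 2, 3]                        # integers, not strings
joined = ','.join(map(str, numbers))       # convert each item first
print(joined)                              # 1,2,3

nested = ['1', ['a', 'b'], '3']
# Flatten one level by hand: inner lists are joined without a separator
flat = ' '.join(x if isinstance(x, str) else ''.join(x) for x in nested)
print(flat)                                # 1 ab 3
```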

Splitting strings

split family
- Splits the string on a separator into several strings and returns them as a list
- split(sep=None, maxsplit=-1) -> list of strings
  - Works from left to right
  - sep is the separator string; by default, runs of whitespace act as the separator
  - maxsplit is the maximum number of splits; -1 means no limit

    >>> s1 = "I'm \ta super student."

    >>> s1.split()
    ["I'm", 'a', 'super', 'student.']

    >>> s1.split('s')
    ["I'm \ta ", 'uper ', 'tudent.']

    >>> s1.split('super')
    ["I'm \ta ", ' student.']

    >>> s1.split(' ')
    ["I'm", '\ta', 'super', 'student.']

    >>> s1.split(' ',2)
    ["I'm", '\ta', 'super student.']

    >>> s1.split('\t',2)
    ["I'm ", 'a super student.']
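The difference between the default separator and an explicit ' ' is easiest to see with consecutive spaces. A short sketch with a made-up input string:

```python
s = 'a  b   c'                 # two, then three consecutive spaces

default_split = s.split()      # runs of whitespace count as one separator
space_split = s.split(' ')     # every single space is a separator
limited = s.split(' ', 1)      # at most one split, the rest is left untouched

print(default_split)   # ['a', 'b', 'c']
print(space_split)     # ['a', '', 'b', '', '', 'c']
print(limited)         # ['a', ' b   c']
```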

rsplit() works from the opposite end.
- Works from right to left; otherwise see split()
splitlines([keepends]) -> list of strings
- Splits the string into lines
- keepends controls whether the line separators are kept
- Line separators include \n, \r\n, \r, and others

    >>> 'ab c\n\nde fg\rk|\r\n'.splitlines()
    ['ab c', '', 'de fg', 'k|']
    # Line separators include \n, \r\n, \r, and others

    >>> 'ab c\n\nde fg\rk|\r\n'.splitlines(True)
    ['ab c\n', '\n', 'de fg\r', 'k|\r\n']
    # True keeps the line separators

    >>> s1 = '''I'm a super student.You're a super teacher.'''
    >>> print(s1)
    I'm a super student.You're a super teacher.

    >>> print(s1.splitlines())
    ["I'm a super student.You're a super teacher."]

    >>> print(s1.splitlines(True))
    ["I'm a super student.You're a super teacher."]
    # The string contains no line separator, so the result has a single element
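The direction only matters once maxsplit limits the number of splits. The sketch below uses a hypothetical file name and a made-up text with mixed line endings to contrast split with rsplit, and splitlines with split('\n').

```python
path = 'archive.tar.gz'                 # hypothetical file name
from_left = path.split('.', 1)          # split once, starting at the left
from_right = path.rsplit('.', 1)        # split once, starting at the right
print(from_left)     # ['archive', 'tar.gz']
print(from_right)    # ['archive.tar', 'gz']

text = 'line1\nline2\r\nline3'          # mixed line endings
by_lines = text.splitlines()            # handles \n and \r\n alike
by_newline = text.split('\n')           # leaves the \r behind
print(by_lines)      # ['line1', 'line2', 'line3']
print(by_newline)    # ['line1', 'line2\r', 'line3']
```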

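partition family
- Splits the string at a separator into two parts and returns a tuple of those two parts and the separator
- partition(sep) -> (head, sep, tail)
  - Works from left to right: at the first occurrence of the separator the string is split in two, and a 3-tuple of head, separator, and tail is returned; if the separator is not found, the 3-tuple is the whole string followed by two empty strings
  - sep is the separator string and must be given

    >>> s1 = "I'm a super student."
    >>> s1.partition('s')
    ("I'm a ", 's', 'uper student.')

    >>> s1.partition('stu')
    ("I'm a super ", 'stu', 'dent.')

    >>> s1.partition('')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: empty separator
    # The separator must not be empty

    >>> s1.partition(' ')
    ("I'm", ' ', 'a super student.')

    >>> s1.partition('abc')
    ("I'm a super student.", '', '')

- rpartition(sep) -> (head, sep, tail)
  - Works from right to left: at the last occurrence of the separator the string is split in two, and a 3-tuple of head, separator, and tail is returned; if the separator is not found, the 3-tuple is two empty strings followed by the whole string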
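A common use of partition is splitting a key from a value at the first separator only. The sketch below uses a made-up input line with two '=' signs to show how partition and rpartition differ, including the not-found case.

```python
line = 'name=tom=cat'                      # made-up input with two separators

key, sep, value = line.partition('=')      # split at the first '='
print(key, value)                          # name tom=cat

head, sep2, tail = line.rpartition('=')    # split at the last '='
print(head, tail)                          # name=tom cat

# When the separator is missing, the whole string lands at opposite ends
missing_left = 'abc'.partition('=')
missing_right = 'abc'.rpartition('=')
print(missing_left)                        # ('abc', '', '')
print(missing_right)                       # ('', '', 'abc')
```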

String case

upper() -> all uppercase
lower() -> all lowercase
Useful for normalizing case before a comparison
swapcase() -> swaps uppercase and lowercase

    >>> s1 = "I'm a super student."
    >>> s1.upper()
    "I'M A SUPER STUDENT."

    >>> s1.lower()
    "i'm a super student."

    >>> s1.swapcase()
    "i'M A SUPER STUDENT."
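Normalizing case before a comparison could look like the sketch below. The helper same_text is hypothetical, written only for this illustration.

```python
def same_text(a, b):
    """Hypothetical helper: compare two strings while ignoring case."""
    return a.lower() == b.lower()

match = same_text('Python', 'PYTHON')
mismatch = same_text('Python', 'Java')
print(match)      # True
print(mismatch)   # False
```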

String layout

title() -> str
- Capitalizes the first letter of every word
capitalize() -> str
- Capitalizes the first character and lowercases the rest
center(width[,fillchar]) -> str
- width is the output width
- fillchar is the fill character
zfill(width) -> str
- width is the output width; the text is right-aligned and padded with 0 on the left
ljust(width[,fillchar]) -> str, left-aligned
rjust(width[,fillchar]) -> str, right-aligned

    >>> s = "I'm a super STUDENT."
    >>> s.title()
    "I'M A Super Student."

    >>> s.capitalize()
    "I'm a super student."

    >>> s.center(20)
    "I'm a super STUDENT."

    >>> s.center(50)
    "               I'm a super STUDENT.               "

    >>> s.center(50,'#')
    "###############I'm a super STUDENT.###############"

    >>> s.zfill(50)
    "000000000000000000000000000000I'm a super STUDENT."

    >>> s.ljust(50,'#')
    "I'm a super STUDENT.##############################"

    >>> s.rjust(50,'#')
    "##############################I'm a super STUDENT."
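These methods are handy for lining up columns of text. A minimal sketch with made-up data, assuming a name column of width 8 and a quantity column of width 4:

```python
rows = [('apple', 3), ('kiwi', 12)]      # made-up data

# Left-align the name with dots, right-align the quantity with spaces
lines = [name.ljust(8, '.') + str(qty).rjust(4) for name, qty in rows]
for line in lines:
    print(line)
# apple...   3
# kiwi....  12

# zfill pads after the sign, so negative numbers stay readable
padded = '-42'.zfill(6)
print(padded)    # -00042
```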

Modifying strings

replace(old,new[,count]) -> str
- Finds matches of old in the string, replaces them with new, and returns a new string
- count is the number of replacements; if omitted, all matches are replaced

    >>> 'www..magedu.com'.replace('w','p')
    'ppp..magedu.com'

    >>> 'www..magedu.com'.replace('w','p',2)
    'ppw..magedu.com'

    >>> 'www..magedu.com'.replace('w','p',1)
    'pww..magedu.com'

    >>> 'www..magedu.com'.replace('ww','p',2)
    'pw..magedu.com'

    >>> 'www..magedu.com'.replace('www','python',2)
    'python..magedu.com'

strip([chars]) -> str
- Removes from both ends of the string every character that belongs to the set chars
- If chars is not given, whitespace is removed from both ends
- lstrip([chars]) -> str, strips the left end only
- rstrip([chars]) -> str, strips the right end only

    >>> s = "\r \n \t Hello Python \n \t"
    >>> s.strip()
    'Hello Python'
    # By default, whitespace is removed from both ends

    >>> s1 = " I am very very very sorry    "
    >>> s1.strip()
    'I am very very very sorry'

    >>> s1.strip('r')
    ' I am very very very sorry    '
    # Neither end starts or ends with r, so nothing changes

    >>> s1.strip('r y')
    'I am very very very so'
    # Note that the set is the three characters r, space, and y: the leading space is removed; at the tail the spaces go first, then y, then the two r characters

    >>> s1.strip('r yIamso')
    'very very ve'

    >>> s1.lstrip('r yIamso')
    'very very very sorry    '

    >>> s1.rstrip('r yIamso')
    ' I am very very ve'
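Because chars is a set of characters rather than a substring, strip can remove more than expected. The sketch below uses made-up values to show the pitfall and one possible way to remove an exact suffix instead.

```python
url = 'www.example.www'                 # made-up value
stripped = url.strip('w.')              # strips any 'w' or '.' from both ends
print(stripped)                         # example

name = 'report.txt'
wrong = name.rstrip('.txt')             # also eats the final 't' of 'report'
print(wrong)                            # repor

# Remove an exact suffix by checking for it and slicing
suffix = '.txt'
right = name[:-len(suffix)] if name.endswith(suffix) else name
print(right)                            # report
```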

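Searching strings

find(sub[,start[,end]]) -> int
- Searches for the substring sub from left to right within [start,end). Returns the index if found, -1 otherwise
rfind(sub[,start[,end]]) -> int
- Searches for the substring sub from right to left within [start,end). Returns the index if found, -1 otherwise

    >>> s = "I am very very very sorry"
    >>> s.find('very')
    5
    # Every letter and every space counts as one character
    # Indexing starts at 0: I, space, a, m, space occupy 0..4, so 'very' begins at index 5
    >>> s.find('very',5)
    5
    # Start searching at index 5
    >>> s.find('very',6,13)
    -1
    # Search within [6,13): the next 'very' spans indices 10..13 and index 13 is excluded, so the result is -1
    >>> info = 'abca'
    >>> info.find('a')
    0
    # Searching from index 0, the first occurrence is at index 0
    >>> info.find('a',1)
    3
    # Searching from index 1, the first occurrence is at index 3
    >>> info.find('3')
    -1
    # Returns -1 when nothing is found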

index(sub[,start[,end]]) -> int
- Searches for the substring sub from left to right within [start,end). Returns the index if found, raises ValueError otherwise
rindex(sub[,start[,end]]) -> int
- Searches for the substring sub from right to left within [start,end). Returns the index if found, raises ValueError otherwise

    >>> a = "I am very very very sorry"
    >>> a.index('very')
    5
    >>> a.index('very',5)
    5
    >>> a.index('very',6,13)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: substring not found
    >>> a.rindex('very',10)
    15
    >>> a.rindex('very',10,15)
    10
    >>> a.rindex('very',-10,-1)
    15
    # Compare with find(), which returns -1 instead of raising
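The start argument makes it possible to walk through every occurrence. The helper find_all below is a hypothetical sketch, not a built-in; it assumes a non-empty substring and reports non-overlapping matches.

```python
def find_all(text, sub):
    """Hypothetical helper: start index of every non-overlapping match."""
    if not sub:
        raise ValueError('sub must not be empty')
    positions = []
    start = text.find(sub)
    while start != -1:
        positions.append(start)
        # Continue searching just past the match that was found
        start = text.find(sub, start + len(sub))
    return positions

s = "I am very very very sorry"
positions = find_all(s, 'very')
print(positions)    # [5, 10, 15]

# index() signals a miss with an exception instead of -1
try:
    s.index('python')
    found = True
except ValueError:
    found = False
print(found)        # False
```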

count(sub[,start[,end]]) -> int
- Counts, from left to right within [start,end), how many times the substring sub occurs

    >>> a = "I am very very very sorry"
    >>> a.count('very')
    3
    >>> a.count('very',5)
    3
    >>> a.count('very',10,14)
    1
    # Note that count returns the number of occurrences, not a position
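Two details of count are worth checking by hand: matches do not overlap, and a plain membership test is enough when only presence matters. A brief sketch:

```python
s = "I am very very very sorry"
total = s.count('very')
overlap = 'aaaa'.count('aa')    # matches do not overlap
present = 'very' in s           # use `in` when only presence matters

print(total)      # 3
print(overlap)    # 2
print(present)    # True
```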

Time complexity
- index and count are both O(n)
- They slow down as the string grows
len(string)
- Returns the length of the string, that is, the number of characters

String predicates

endswith(suffix[,start[,end]]) -> bool
- Whether the string, within [start,end), ends with suffix
startswith(prefix[,start[,end]]) -> bool
- Whether the string, within [start,end), starts with prefix

    >>> a = "I am very very very sorry"
    >>> a.startswith('very')
    False
    >>> a.startswith('very',5)
    True
    >>> a.startswith('very',5,9)
    True
    >>> a.endswith('very',5,9)
    True
    >>> a.endswith('very',5)
    False
    >>> a.endswith('very',5,-1)
    False
    >>> a.endswith('very',5,100)
    False
    >>> a.endswith('sorry',5)
    True
    >>> a.endswith('sorry',5,-1)
    False
    >>> a.endswith('sorry',5,100)
    True
    # The range is closed on the left and open on the right: 5,9 covers indices 5 through 8 and excludes 9
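Both methods also accept a tuple of candidates, which avoids chaining several checks with or. A small sketch with made-up values:

```python
filename = 'photo.JPG'                  # made-up file name
is_image = filename.lower().endswith(('.jpg', '.png'))
print(is_image)     # True

url = 'https://magedu.com'
is_web = url.startswith(('http://', 'https://'))
print(is_web)       # True
```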

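is family
- isalnum() -> bool, whether all characters are letters or digits
- isalpha(), whether all characters are letters
- isdecimal(), whether all characters are decimal digits
- isdigit(), whether all characters are digits (0~9; some other Unicode digit characters also count)
- isidentifier(), whether the string starts with a letter or underscore and otherwise contains only letters, digits, and underscores
- islower(), whether all cased characters are lowercase
- isupper(), whether all cased characters are uppercase
- isspace(), whether the string contains only whitespace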
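The differences between these predicates show up on edge cases. The sketch below collects a few checks in a dictionary; the keys are illustrative labels only.

```python
checks = {
    'decimal': '123'.isdecimal(),          # True
    'sup_decimal': '²'.isdecimal(),        # False: superscript two is not a decimal digit
    'sup_digit': '²'.isdigit(),            # True: but it does count as a digit
    'alnum': 'abc123'.isalnum(),           # True
    'alnum_space': 'abc 123'.isalnum(),    # False: the space is neither letter nor digit
    'ident': 'my_var'.isidentifier(),      # True
    'ident_digit': '2nd'.isidentifier(),   # False: starts with a digit
    'space': ' \t\n'.isspace(),            # True
    'empty': ''.isalnum(),                 # False: an empty string fails the test
}
for label, result in checks.items():
    print(label, result)
```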

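String formatting (C style)

String formatting is a way of assembling strings into a desired output layout; it is more flexible and convenient
- join can only insert a separator, and what it joins must be an iterable
- + is reasonably convenient, but non-string values must be converted to strings first
Before version 2.6, only printf-style formatting was available
- printf-style formatting comes from the printf function in C
- Format rules
  - Placeholders: % followed by a format character, for example %s, %d
    - s calls str(), r calls repr(). Any object can be converted by these two
  - Modifiers can be inserted in a placeholder; for example %03d prints at least 3 characters, zero-padded on the left
  - format % values: the format string and the values to be formatted are separated by %
  - values must be a single object, a tuple with as many items as there are placeholders, or a dictionary

    >>> "I am %03d" % (20,)
    'I am 020'
    # (20,) is a one-element tuple; note the trailing comma
    >>> "I am %013d" % (20,)
    'I am 0000000000020'
    # %013d pads to a total width of 13 characters (including the digits 20), zero-filled on the left
    >>> "I am %03d" % (20)
    'I am 020'
    # (20) is just the single object 20, which is also allowed
    >>> "I am %013d" % ([20])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: %d format: a number is required, not list
    # A list is not accepted as the value for %d (the exact wording of the message varies by Python version)
    >>> "I am %013d" % (20,21)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: not all arguments converted during string formatting
    # Placeholders and values correspond one to one, so their counts must match
    >>> 'I like %s.' % 'Python'
    'I like Python.'
    >>> '%3.2f%%,0x%x,0X%02X' % (89.7654,10,15)
    '89.77%,0xa,0X0F'
    # %% prints a literal %
    # In %3.2f, 3 is the minimum total field width and 2 is the number of digits after the decimal point
    # 0x%x converts 10 to hexadecimal
    >>> "I am %50s%%" % (20)
    'I am                                                 20%'
    >>> "I am %-5d" % (20,)
    'I am 20   '
    >>> "I am %5d" % (20,)
    'I am    20'
    # - means left-aligned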
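The dictionary form mentioned in the format rules uses %(key)s placeholders. The sketch below, with made-up values, also shows how to format a tuple as a single value by wrapping it in a one-element tuple.

```python
values = {'name': 'tom', 'age': 20}         # made-up values
by_name = '%(name)s is %(age)03d' % values
print(by_name)        # tom is 020

both = '%s and %r' % ('a', 'a')             # str() versus repr()
print(both)           # a and 'a'

point = (1, 2)
wrapped = 'point: %s' % (point,)            # format the tuple itself
print(wrapped)        # point: (1, 2)
```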

String formatting (Python style)

The format method and its format string syntax, the style Python encourages
- "{} {xxx}".format(*args, **kwargs) -> str
- args holds the positional arguments, collected as a tuple
- kwargs holds the keyword arguments, collected as a dictionary
- Braces mark placeholders
- {} matches positional arguments in order; {n} takes the positional argument at index n
- {xxx} looks up the keyword argument with that name
- {{ and }} print literal braces
Positional arguments
- "{}:{}".format('192.168.1.100',8888) fills the placeholders of the format string with the positional arguments in order
Keyword (named) arguments
- "{server} {1}:{0}".format(8888,'192.168.1.100', server='Web Server Info : ') matches positional arguments by index and keyword arguments by name
Element access
- "{0[0]}.{0[1]}".format(('magedu','com'))
Attribute access
- from collections import namedtuple
- Point = namedtuple('Point','x y')
- p = Point(4,5)
- "{{{0.x},{0.y}}}".format(p)

    >>> "{}:{}".format('192.168.1.100',8888)
    '192.168.1.100:8888'
    >>> "{server}{1}:{0}".format(8888,'192.168.1.100',server = 'Web Server Info:')
    'Web Server Info:192.168.1.100:8888'
    >>> "{0[0]}.{0[1]}".format(('magedu','com'))
    'magedu.com'
    >>> from collections import namedtuple
    >>> Point = namedtuple('Point','x y')
    >>> p = Point(4,5)
    >>> "{{{0.x},{0.y}}}".format(p)
    '{4,5}'
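Since kwargs is a dictionary, an existing dictionary can be unpacked into format with **. A minimal sketch; the conf dictionary is a made-up example.

```python
conf = {'host': '192.168.1.100', 'port': 8888}    # made-up settings
address = '{host}:{port}'.format(**conf)          # keys become keyword arguments
print(address)     # 192.168.1.100:8888

repeated = '{0}-{0}-{1}'.format('a', 'b')         # an index may be reused
print(repeated)    # a-a-b

braces = '{{{}}}'.format('x')                     # doubled braces are literal
print(braces)      # {x}
```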

Alignment

    >>> '{0}*{1}={2:<2}'.format(3,2,2*3)
    '3*2=6 '
    >>> '{0}*{1}={2:<02}'.format(3,2,2*3)
    '3*2=60'
    >>> '{0}*{1}={2:>02}'.format(3,2,2*3)
    '3*2=06'
    >>> '{0}*{1}={2:>2}'.format(3,2,2*3)
    '3*2= 6'
    # < aligns to the left, > aligns to the right

    >>> '{:^30}'.format('centered')
    '           centered           '
    >>> '{:*^30}'.format('centered')
    '***********centered***********'
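Alignment, width, and precision can be combined in one placeholder. The sketch below formats made-up rows into two columns, assuming a name column of width 8 and a price column of width 7 with two decimals.

```python
rows = [('apple', 1.5), ('kiwi', 12.25)]     # made-up data

# Name left-aligned in 8 columns, price right-aligned in 7 with 2 decimals
lines = ['{:<8}|{:>7.2f}'.format(name, price) for name, price in rows]
for line in lines:
    print(line)
# apple   |   1.50
# kiwi    |  12.25

padded = '{:08.3f}'.format(3.14159)          # zero-padded to width 8
print(padded)    # 0003.142
```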

Number bases

    >>> "int:{0:d}; hex:{0:x}; oct:{0:o}; bin:{0:b}".format(42)
    'int:42; hex:2a; oct:52; bin:101010'
    # The first is decimal; the second is hexadecimal, 2*16 + 10 (a) = 42; the third is octal, 5*8 + 2 = 42; the fourth is binary

    >>> "int:{0:d}; hex:{0:#x}; oct:{0:#o}; bin:{0:#b}".format(42)
    'int:42; hex:0x2a; oct:0o52; bin:0b101010'
    >>> "int:{0:#d}; hex:{0:#x}; oct:{0:#o}; bin:{0:#b}".format(42)
    'int:42; hex:0x2a; oct:0o52; bin:0b101010'

    >>> octets = [192,168,0,1]
    >>> '{:02X}{:02X}{:02X}{:02X}'.format(*octets)
    'C0A80001'
    # C0 is 192, A8 is 168, 00 is 0, 01 is 1
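The hexadecimal form of the address can be converted back with int(text, 16). The round trip below is a sketch of one possible approach; the variable names are illustrative.

```python
octets = [192, 168, 0, 1]

# Encode: two uppercase hex digits per octet
hex_ip = ''.join('{:02X}'.format(o) for o in octets)
print(hex_ip)    # C0A80001

# Decode: read the string two characters at a time as base-16 numbers
decoded = [int(hex_ip[i:i + 2], 16) for i in range(0, len(hex_ip), 2)]
dotted = '.'.join(str(o) for o in decoded)
print(dotted)    # 192.168.0.1
```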
