Repost: How to remove unwanted characters from a string in Python

2022-02-21 21:18:09

Reposted material from my notes on learning Python

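Problems:

1. Strip extra whitespace from the beginning and end of user input
' ++++++ABC23----- '
2. Remove the '\r' from text edited on Windows
'hello world \r\n'
3. Remove Unicode combining characters (tone marks) from text
'Zhào Qián Sūn Lǐ Zhōu Wú Zhèng Wáng'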

Solutions:

I. Removing characters from both ends of a string: strip()

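# String basics
s = '   +++++abc123----  '
# Remove whitespace from both ends
print(s.strip())              # '+++++abc123----'
# Remove whitespace from the right end
print(s.rstrip())             # '   +++++abc123----'
# Remove whitespace from the left end
print(s.lstrip())             # '+++++abc123----  '
# Remove whitespace first, then '-' and '+', from both ends
print(s.strip().strip('-+'))  # 'abc123'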
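
One detail worth illustrating: strip() treats its argument as a set of characters, not as a literal prefix or suffix, so the whitespace and the +/- signs can be removed in a single call. A minimal sketch, assuming the set of unwanted characters is known in advance; the helper name clean_input and its extra_chars parameter are made up for this note and are not part of the original post:

```python
import string


def clean_input(text, extra_chars='+-'):
    """Strip whitespace plus every character in extra_chars from both ends.

    Hypothetical helper for illustration only.
    """
    # strip() removes any leading/trailing character found in this set
    return text.strip(string.whitespace + extra_chars)


s = '   +++++abc123----  '
cleaned = clean_input(s)
print(cleaned)

# Characters in the middle of the string are never touched by strip()
middle = clean_input('--a-b--')
print(middle)
```

Because only the two ends are examined, this does not help with characters in the middle of a string; the later sections cover that case.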

II. Removing a single character at a fixed position: slicing + concatenation

s = 'abc:123'
# Remove the colon by concatenating the slices on either side of it
new_s = s[:3] + s[4:]
print(new_s)

Output: abc123
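
The hard-coded indices 3 and 4 only work for this exact string. A small sketch of the same slicing idea with the position looked up at run time; remove_at is a hypothetical helper name, not something from the original post:

```python
def remove_at(text, index):
    """Return text with the single character at position index removed.

    Hypothetical helper for illustration only.
    """
    if not 0 <= index < len(text):
        raise IndexError('index out of range')
    # Everything before the index, plus everything after it
    return text[:index] + text[index + 1:]


s = 'abc:123'
# str.index() finds the position of the first colon
result = remove_at(s, s.index(':'))
print(result)
```

Strings are immutable in Python, so this always builds a new string rather than modifying s.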

III. Removing characters at any position, including several different characters at once: replace(), re.sub()

import re

# Remove every occurrence of a single character
s1 = '\tabc\t123\tisk'
print('s1:', s1.replace('\t', ''))

# Remove \t, \r and \n in one pass with a character class
s2 = '\r\nabc\t123\nxyz'
print('s2:', re.sub(r'[\r\n\t]', '', s2))

Output:
s1: abc123isk
s2: abc123xyz
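
Applied to problem 2 (the '\r' left behind by Windows line endings), the two approaches might look like the sketch below. The sample text and the names no_cr, flat and CONTROL_CHARS are assumptions made for this illustration:

```python
import re

text = 'hello world \r\nsecond line\r\n'

# replace() is enough when only one kind of character has to go
no_cr = text.replace('\r', '')

# A compiled pattern is convenient when the same cleanup runs many times
CONTROL_CHARS = re.compile(r'[\t\r\n]')
flat = CONTROL_CHARS.sub('', text)

# repr() makes the remaining control characters visible
print(repr(no_cr))
print(repr(flat))
```

Note that the second variant also removes the '\n', so the two lines are joined; whether that is wanted depends on the input.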

IV. Removing several different characters at once: translate()

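s = 'abc123xyz'
# a -> x, b -> y, c -> z (and back): a simple character-mapping cipher
# str.maketrans() builds the mapping table
print(str.maketrans('abcxyz', 'xyzabc'))
# translate() applies the table and returns the new string
print(s.translate(str.maketrans('abcxyz', 'xyzabc')))

Output: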
{97: 120, 98: 121, 99: 122, 120: 97, 121: 98, 122: 99}
xyz123abc
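
The example above remaps characters rather than deleting them. For deletion, str.maketrans() accepts a third argument listing characters that should map to None. A short sketch; the sample string and variable names are chosen for this note only:

```python
s = '\tabc\r\n123\txyz\n'

# The first two arguments are empty, so nothing is remapped;
# every character in the third argument is mapped to None (deleted).
delete_table = str.maketrans('', '', '\t\r\n')
print(delete_table)

cleaned = s.translate(delete_table)
print(cleaned)
```

The table is an ordinary dict keyed by code point, which is why a hand-written dict such as {ord('\t'): None} works with translate() as well.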

V. Removing tone marks from Unicode text

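import sys
import unicodedata

s = "Zhào Qián Sūn Lǐ Zhōu Wú Zhèng Wáng"
remap = {
    # ord() returns the character's Unicode code point
    ord('\t'): '',
    ord('\f'): '',
    ord('\r'): None
}
# Remove \t, \f and \r (mapping to '' or to None both delete the character)
a = s.translate(remap)
# Build a dict with dict.fromkeys() whose keys are the code points of all
# Unicode combining characters and whose values are all None.
# sys.maxunicode: the largest Unicode code point, 1114111 (0x10FFFF).
# unicodedata.combining(ch): returns the canonical combining class of ch
# as an integer, or 0 if no combining class is defined.
cmb_chrs = dict.fromkeys(c for c in range(sys.maxunicode + 1)
                         if unicodedata.combining(chr(c)))
# Normalize the input to decomposed form (NFD), so that each accented
# letter becomes a base letter followed by a combining mark
b = unicodedata.normalize('NFD', a)
# translate() then deletes every combining mark
print(b.translate(cmb_chrs))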

Output: Zhao Qian Sun Li Zhou Wu Zheng Wang
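
Building the table of every combining character means scanning the whole Unicode range once. For occasional use, a possible alternative is to filter the decomposed string directly. This is a different technique from the translate() approach above, shown only as a sketch; the function name strip_tone_marks is hypothetical:

```python
import unicodedata


def strip_tone_marks(text):
    """Remove combining marks (such as pinyin tone marks) from text.

    Hypothetical helper for illustration only.
    """
    # NFD splits each accented letter into base letter + combining mark
    decomposed = unicodedata.normalize('NFD', text)
    # combining() returns 0 for ordinary characters, non-zero for marks
    stripped = ''.join(ch for ch in decomposed
                       if not unicodedata.combining(ch))
    # Recompose anything that is still in decomposed form
    return unicodedata.normalize('NFC', stripped)


plain = strip_tone_marks('Zhào Qián Sūn Lǐ Zhōu Wú Zhèng Wáng')
print(plain)
```

When many strings have to be processed, the prebuilt table with translate() is likely the faster choice, since the table is built once and reused.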
