My string is: "sooo dear how areeeee youuuuuu"
I want to check whether the words in the string are elongated.
Elongated means that a character in the word is repeated more than twice in a row, e.g. "too" is not elongated, but "tooo" is.
>>> import itertools
>>> my_str = 'soooo hiiiii whyyyy done'
>>> print([[g[0], sum(1 for _ in g[1])] for g in itertools.groupby(my_str)])
[['s', 1], ['o', 4], [' ', 1], ['h', 1], ['i', 5], [' ', 1], ['w', 1], ['h', 1],
['y', 4], [' ', 1], ['d', 1], ['o', 1], ['n', 1], ['e', 1]]
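I want to show that sooo, areeeee and youuuuuu are elongated. I have counted the individual characters, but I want to check each word and see whether it is elongated.
Solution:
A regular expression comes to mind: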
>>> my_str = 'soooo hiiiii whyyyy done'
>>> import re
>>> regex = re.compile(r"(.)\1{2}")
>>> [word for word in my_str.split() if regex.search(word)]
['soooo', 'hiiiii', 'whyyyy']
Explanation:
(.) # Match any character, capture it in group number 1
\1{2} # Try to match group number 1 here, twice.
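As a rough sketch, the same pattern can be wrapped in a small helper and run on the string from the question. The names `ELONGATED` and `elongated_words` are made up for illustration, and splitting on whitespace is an assumption about what counts as a word:

```python
import re

# Same pattern as above: any character followed by two more copies of itself.
ELONGATED = re.compile(r"(.)\1{2}")

def elongated_words(text):
    """Return the whitespace-separated words that contain a tripled character."""
    return [word for word in text.split() if ELONGATED.search(word)]

print(elongated_words("sooo dear how areeeee youuuuuu"))
# ['sooo', 'areeeee', 'youuuuuu']
```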
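Note that this algorithm will also match the occasional real word that legitimately contains the same letter three times in a row (certain long medical terms, for example), but I guess those false positives are rare :)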
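If you would rather stay with the itertools.groupby approach from the question, one possible sketch is to run it per word instead of over the whole string. `is_elongated` and `min_run` are hypothetical names chosen here, and this variant has the same false positives as the regex:

```python
import itertools

def is_elongated(word, min_run=3):
    """True if any character in `word` occurs at least `min_run` times in a row."""
    return any(sum(1 for _ in run) >= min_run
               for _, run in itertools.groupby(word))

my_str = 'soooo hiiiii whyyyy done'
print([word for word in my_str.split() if is_elongated(word)])
# ['soooo', 'hiiiii', 'whyyyy']
```

Making the run length a parameter lets you tighten or loosen the definition of "elongated" without touching the rest of the code.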