python - Count the jump between the first two occurrences of a "string" in a file (number of lines)

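I have a huge data file in which a specific string repeats after a fixed number of lines.

I want to count the jump between the first two occurrences of "Rank". For example, the file looks like this: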

  1 5 6 8 Rank                     line-start
  2 4 8 5
  7 5 8 6
  5 4 6 4
  1 5 7 4 Rank                     line-end  
  4 8 6 4
  2 4 8 5
  3 6 8 9
  5 4 6 4 Rank

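You will notice that the string Rank recurs with 3 lines in between, so the number of lines in a block is 4 for the example above. My question is how to get that number of lines using Python's readline().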

This is what I currently do:

data = open(filename).readlines()
count = 0
for j in range(len(data)):
  if(data[j].find('Rank') != -1): 
    if count == 0: line1 = j
    count = count +1 
  if(count == 2):
    no_of_lines = j - line1
    break

Any improvements or suggestions are welcome.

Solution:

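I assume you want to find the number of lines in a block, where each block starts with a line that contains 'Rank'. For instance, your sample has 3 blocks: the 1st has 4 lines, the 2nd has 4 lines, and the 3rd has 1 line: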

from itertools import groupby

def block_start(line, start=[None]):
    # the mutable default keeps state between calls; flip it on every 'Rank' line
    if 'Rank' in line:
        start[0] = not start[0]
    return start[0]

with open(filename) as file:
    block_sizes = [sum(1 for line in block)  # number of lines in a block
                   for _, block in groupby(file, key=block_start)]  # group lines by block
print(block_sizes)
# -> [4, 4, 1]
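
Note that the mutable default `start=[None]` carries its state over between calls, so reusing `block_start` on a second file would begin from whatever the first file left behind. As a rough sketch of a variant without that hidden state, the block sizes can also be counted by hand. The helper name `block_sizes_of` and the `marker` parameter are made up for this illustration, and it assumes that lines before the first marker should be ignored (the `groupby` version reports them as a block of their own):

```python
def block_sizes_of(lines, marker='Rank'):
    """Return the size of each block; a block starts at a line containing marker."""
    sizes = []
    for line in lines:
        if marker in line:
            sizes.append(1)    # a marker line opens a new block
        elif sizes:
            sizes[-1] += 1     # any other line extends the current block
        # lines before the first marker are skipped (assumption, see above)
    return sizes

# the sample data from the question, as a list of lines
sample = [
    "1 5 6 8 Rank", "2 4 8 5", "7 5 8 6", "5 4 6 4",
    "1 5 7 4 Rank", "4 8 6 4", "2 4 8 5", "3 6 8 9",
    "5 4 6 4 Rank",
]
print(block_sizes_of(sample))
# -> [4, 4, 1]
```

Since the function only iterates over `lines`, an open file object can be passed in directly and the file is still read one line at a time.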

If all blocks have the same number of lines, or you only want the number of lines in the first block that starts with 'Rank':

count = None
with open(filename) as file:
    for line in file:
        if 'Rank' in line:
            if count is None:  # found the start of the 1st block
                count = 1
            else:  # found the start of the 2nd block
                break
        elif count is not None:  # inside the 1st block
            count += 1
print(count) # -> 4
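
The same idea can be written closer to the index subtraction used in the question, while still reading the file lazily instead of calling readlines(). This is only a sketch: the function name `rank_period` is hypothetical, and `io.StringIO` stands in for the real file so the example is self-contained.

```python
import io
from itertools import islice

def rank_period(lines, marker='Rank'):
    """Distance in lines between the first two lines containing marker, or None."""
    # lazily yield the index of every line that contains the marker
    hits = (i for i, line in enumerate(lines) if marker in line)
    first_two = list(islice(hits, 2))  # stops reading after the second hit
    if len(first_two) < 2:
        return None                    # fewer than two marker lines in the input
    return first_two[1] - first_two[0]

# stand-in for open(filename), holding the sample data from the question
sample_file = io.StringIO(
    "1 5 6 8 Rank\n"
    "2 4 8 5\n"
    "7 5 8 6\n"
    "5 4 6 4\n"
    "1 5 7 4 Rank\n"
    "4 8 6 4\n"
    "2 4 8 5\n"
    "3 6 8 9\n"
    "5 4 6 4 Rank\n"
)
print(rank_period(sample_file))  # -> 4
```

Unlike the loop above, this returns None rather than a partial count when the marker appears fewer than twice, which may or may not be what you want for a truncated file.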