A DNA sequence consists of four letters, A, C, G, and T. The GC-ratio of a DNA sequence is the number of Cs and Gs of the sequence divided by the length of the sequence. GC-ratio is important in gene finding because DNA sequences with relatively high GC-ratios might be good candidates for the starting parts of genes. Given a very long DNA sequence, researchers are usually interested in locating a subsequence whose GC-ratio is maximum over all subsequences of the sequence. Since short subsequences with high GC-ratios are sometimes meaningless in gene finding, a length lower bound is given to ensure that a long subsequence with high GC-ratio could be found. If, in a DNA sequence, a 0 is assigned to every A and T and a 1 to every C and G, the DNA sequence is transformed into a binary sequence of the same length. GC-ratios in the DNA sequence are now equivalent to averages in the binary sequence.
Position | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | |||||||||
Index | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
Sequence | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 |
For the binary sequence above, if the length lower bound is 7, the maximum average is 6/8 which happens in the subsequence [7,14]. Its length is 8, which is greater than the length lower bound 7. If the length lower bound is 5, then the subsequence [7,11] gives
the maximum average 4/5. The length is 5 which is equal to the length lower bound. For the subsequence [7,11], 7 is its starting index and 11 is its ending index.
Given a binary sequence and a length lower bound L, write a program to find a subsequence of the binary sequence whose length is at least L and whose average is maximum over all subsequences of the binary sequence. If two or more subsequences have the maximum average, then find the shortest one; and if two or more shortest subsequences with the maximum average exist, then find the one with the smallest starting index.
Input
Your program is to read from standard input. The input consists of T test cases. The number of test cases Tis given in the first line of the input. Each test case starts with a line containing two integers n (1n100, 000) and L (1L1, 000) which are the length of a binary sequence and a length lower bound, respectively. In the next line, a string, binary sequence, of length n is given.
Output
Your program is to write to standard output. Print the starting and ending index of the subsequence.
The following shows sample input and output for two test cases.
Sample Input
2 17 5 00101011011011010 20 4 11100111100111110000
Sample Output
7 11 6 9
题意:给一个长度n的串,求长度不小于L的连续字串平均值最大。平均值=总和/长度
思路:《浅谈数形结合思想在信息学竞赛中的应用》中的例题,把每个位置都当成坐标系上的一个点,平均值即为两点的斜率。如此一来问题便转化为了求2点使得斜率最大。然后维护一个下凸点集(可证明,参考论文),每一点找到和这个下凸点集的切点即为斜率的最大值。在查找过程中,如果找到一个切点,那么切点之前的点就不需要在去考虑了,因为算出来斜率肯定不会比该切点或切点之后的更大(可简单证明出)
代码:
#include <stdio.h> #include <string.h> const int N = 100005; int t, n, L; char str[N]; struct Point { int x, y; Point(){x = 0; y = 0;} Point(int xx, int yy) { x = xx; y = yy; } }p[N], Q[N]; struct Line { Point p1, p2; Line(){} Line(Point pp1, Point pp2) { p1 = pp1; p2 = pp2; } bool operator > (const Line &a) { return (p2.y - p1.y) * (a.p2.x - a.p1.x) > (a.p2.y - a.p1.y) * (p2.x - p1.x); } bool operator == (const Line &a) { return (p2.y - p1.y) * (a.p2.x - a.p1.x) == (a.p2.y - a.p1.y) * (p2.x - p1.x); } bool operator >= (const Line &a) { return *this > a || *this == a; } Line operator = (const Line &a) { p1 = a.p1; p2 = a.p2; return *this; } int xlen() { return p2.x - p1.x; } }; void init() { int y = 0; scanf("%d%d", &n, &L); scanf("%s", str + 1); for (int i = 1; i <= n; i++) { p[i].x = i; if (str[i] == ‘0‘) p[i].y = y; else p[i].y = ++y; } } void solve() { int l = 0, r = -1, ansl = 0, ansr = L; Line Max(p[0], p[L]); for (int k = L; k <= n; k++) { int i = k - L; while (r > l && Line(Q[r - 1], p[i]) >= Line(Q[r], p[i])) r--; Q[++r] = p[i]; while (r > l && Line(Q[l + 1], p[k]) >= Line(Q[l], p[k])) l++; Line t = Line(Q[l], p[k]); if (t > Max || (t == Max && t.xlen() < Max.xlen())) { Max = t; ansl = Max.p1.x; ansr = Max.p2.x; } } printf("%d %d\n", ansl + 1, ansr); } int main() { scanf("%d", &t); while (t--) { init(); solve(); } return 0; }