MapReduce API Basic Concepts: Serialization, the Reporter Parameter, Callback Mechanism --- Reading Notes on 《hadoop技术内幕》

Crazy Search

Time Limit: 10000/5000 MS (Java/Others)    Memory Limit: 65536/32768 K (Java/Others)
Total Submission(s): 1611    Accepted Submission(s): 586


Problem Description
Many people like to solve hard puzzles some of which may lead them to madness. One such puzzle could be finding a hidden prime number in a given text. Such number could be the number of different substrings of a given size that exist in the text. As you soon will discover, you really need the help of a computer and a good algorithm to solve such a puzzle.

Your task is to write a program that given the size, N, of the substring, the number of different characters that may occur in the text, NC, and the text itself, determines the number of different substrings of size N that appear in the text.

As an example, consider N=3, NC=4 and the text "daababac". The different substrings of size 3 that can be found in this text are: "daa", "aab", "aba", "bab", "bac". Therefore, the answer should be 5.
 

Input
The first line of input consists of two numbers, N and NC, separated by exactly one space. This is followed by the text where the search takes place. You may assume that the maximum number of substrings formed by the possible set of characters does not exceed 16 Millions.
 

Output
The program should output just an integer corresponding to the number of different substrings of size N found in the given text.
The first line of a multiple input is an integer N, then a blank line followed by N input blocks. Each input block is in the format indicated in the problem description. There is a blank line between input blocks.

The output format consists of N output blocks. There is a blank line between output blocks.

 

Sample Input
1

3 4
daababac
 

Sample Output
5
 

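Problem summary:
You are given a string in which only nc distinct letters can appear. The statement guarantees that the number of possible substrings (nc^n) does not exceed 16 million, and the code below sizes its text buffer at about 16*10^6 characters. Count how many distinct substrings of length n the string contains.
Approach:
Enumerate every substring of length n, letting the end position run from n to len, where len is the length of the string. All that remains is to decide whether each substring has already been seen.
The first method uses map from the STL: build a mapping from string to int, with value 1 if the string has appeared and 0 if it has not. That is enough to filter out duplicates.
The code is as follows: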
#include<algorithm>
#include<iostream>
#include<string.h>
#include<sstream>
#include<stdio.h>
#include<math.h>
#include<vector>
#include<string>
#include<queue>
#include<set>
#include<map>// Commonly used headers; look them up if you are unsure what one is for.
//#pragma comment(linker,"/STACK:1024000000,1024000000")
using namespace std;
const int INF=0x3f3f3f3f;
const double eps=1e-8;
const double PI=acos(-1.0);
const int maxn=16000010;
typedef long long ll;// __int64 is MSVC-specific; long long is standard C++
char txt[maxn],tp;
map<string,int> mp;// map from substring to a "seen" flag
int main()
{
    int i,j,n,nc,t,len,ans,one=1;

    scanf("%d",&t);
    while(t--)
    {
        if(!one)// not the first test case: output blocks are separated by a blank line
            printf("\n");
        scanf("%d%d",&n,&nc);
        scanf("%s",txt);
        len=strlen(txt);
        ans=0;
        mp.clear();// reset the map for this test case
        for(j=n;j<=len;j++)
        {
            tp=txt[j];
            txt[j]='\0';
            string tt(txt+j-n);// extract the substring; string tt(txt+j-n,n) would also work, I forgot it during the contest
            txt[j]=tp;
            if(!mp[tt])
                ans++,mp[tt]=1;
        }
        printf("%d\n",ans);
        one=0;
    }
    return 0;
}
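
The comment in the listing above notes that `string tt(txt+j-n,n)` would have avoided patching a temporary '\0' into the buffer. A minimal sketch of that idea follows. It is an illustration rather than the submitted solution: `countDistinct` is a hypothetical helper name, and it uses `std::set` in place of `map<string,int>` on the assumption that only membership matters.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Hypothetical helper (not from the original post): counts the distinct
// substrings of length n in text by storing each window in a std::set.
int countDistinct(const std::string& text, std::size_t n)
{
    if (n == 0 || n > text.size())
        return 0;                            // no window of that size fits
    std::set<std::string> seen;
    for (std::size_t start = 0; start + n <= text.size(); ++start)
        seen.insert(text.substr(start, n));  // the set ignores duplicates
    return static_cast<int>(seen.size());
}
```

For the sample from the problem statement, `countDistinct("daababac", 3)` yields 5, matching the five substrings listed there.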


The second method detects duplicates by hashing.
#include<algorithm>
#include<iostream>
#include<string.h>
#include<sstream>
#include<stdio.h>
#include<math.h>
#include<vector>
#include<string>
#include<queue>
#include<set>
#include<map>// Commonly used headers; look them up if you are unsure what one is for.
//#pragma comment(linker,"/STACK:1024000000,1024000000")
using namespace std;
const int INF=0x3f3f3f3f;
const double eps=1e-8;
const double PI=acos(-1.0);
const int maxn=16000010;
const int maxm=8000007;
const int mod=1000007;
typedef long long ll;// __int64 is MSVC-specific; long long is standard C++
char txt[maxn],tp;
int nc;
int Hash[maxm],id[28];
int getkey(int st,int len)// compute the hash key: the window read as a base-nc number
{
    int i,key=0;
    for(i=0;i<len;i++)
        key=key*nc+id[txt[st+i]-'a'];
    return key;
}
int main()
{
    int i,n,t,tp,len,ans,key,cnt;

    scanf("%d",&t);
    while(t--)
    {
        scanf("%d%d",&n,&nc);
        scanf("%s",txt);
        memset(Hash,0,sizeof Hash);
        memset(id,-1,sizeof id);
        ans=cnt=0;
        len=strlen(txt);
        for(i=0;i<len;i++)
        {
            tp=txt[i]-'a';
            if(id[tp]==-1)
                id[tp]=cnt++;// compress each letter to a small digit
            if(cnt==nc)
                break;
        }
        for(i=n;i<=len;i++)
        {
            key=getkey(i-n,n);
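            key%=maxm;// reduce into the table range; maxn here could index past the end of Hash[maxm]. Distinct keys may still collide.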
            if(!Hash[key])
                ans++,Hash[key]=1;
        }
        printf("%d\n",ans);
        if(t)// more test cases remain: output blocks are separated by a blank line
            printf("\n");
    }
    return 0;
}

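Comparing the two approaches, I prefer the first: it is simple and quick to write and, most importantly, more accurate. The second method breaks down completely once the data gets large! Still, it is another way of thinking about the problem. If your hash function is good and the collision rate is very low, it is worth a try.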
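
Since the problem statement guarantees that the number of possible substrings (NC^N) does not exceed 16 million, one way to get a collision-free key is to read each window as an N-digit number in base NC and use it directly as an index into a flag table, with no modulo at all. The sketch below illustrates that idea under stated assumptions and is not the code from this post. `countByIndex` is a hypothetical name, and it assumes that base^N is small enough to allocate; if the text contains more than nc distinct characters, it falls back to the observed count as the base.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post). Assumes base^n flags fit
// in memory, where base is the larger of nc and the distinct characters seen.
int countByIndex(const std::string& text, std::size_t n, int nc)
{
    if (n == 0 || n > text.size() || nc <= 0)
        return 0;

    // Map each distinct character to a digit, in order of first appearance.
    std::vector<int> digit(256, -1);
    int used = 0;
    for (unsigned char c : text)
        if (digit[c] == -1)
            digit[c] = used++;

    const std::size_t base = static_cast<std::size_t>(std::max(nc, used));
    std::size_t total = 1;                   // base^n possible keys
    for (std::size_t i = 0; i < n; ++i)
        total *= base;
    std::vector<bool> seen(total, false);    // one flag per possible key

    int count = 0;
    for (std::size_t start = 0; start + n <= text.size(); ++start) {
        std::size_t key = 0;                 // window as a base-`base` number
        for (std::size_t i = 0; i < n; ++i)
            key = key * base + digit[static_cast<unsigned char>(text[start + i])];
        if (!seen[key]) {
            seen[key] = true;
            ++count;
        }
    }
    return count;
}
```

Because every window maps to its own index, two different substrings can never share a flag. The trade-off is that the table needs one entry for every possible key, not just the ones that occur.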
