离散化,作为一种较为基础的算法,可以说极其常见。那么何为离散化呢?根据百度百科说法,离散化就是将无限空间的有限元素映射到有限空间上。通俗地说,就是在不改变数据相对大小的情况下,对数据进行缩小操作。举个例子,给定一串数据1,99,72,72,6,100.我们可以先对其排序,变为1,6,72,72,99,100.接着将其值改变为1,2,3,3,4,5.最后按照原有顺序放回1,4,3,3,2,5.这样,我们就成功对这组数据进行了离散化处理。
为了进行离散化操作,我们需要使用sort()函数以O(nlogn)排序,还需要unique()函数将重复的元素置于后方,以及lower_bound(a)函数二分得到不小于a的元素最小位置指针。程序如下所示。
#include<bits/stdc++.h> using namespace std; int a[300005],c[300005]; int main(){ int n,k; cin>>n>>k; for(int i=1;i<=n;i++){ scanf("%d",&c[i]); a[i]=c[i]; } sort(c+1,c+n+1);//排序之后才可以使用unique int m=unique(c+1,c+n+1)-c;//离散化后的元素个数 for(int i=1;i<=n;i++){ a[i]=lower_bound(c+1,c+m+1,a[i])-c;//得到离散化数组 } }
我们使用离散化,往往是为了降低内存使用量。例如当我们想要求一个数组中某个元素的出现次数(数组元素个数n不超过1e6,元素大小不超过1e9),其最为简单的做法就是建立一个cnt数组,记录每个元素出现次数,由于元素大小可达到1e9,我们就要建立一个长达1e9的数组,显然这超出了内存限制。但是我们发现这个数组的元素个数n不超过1e6,我们可以建立一个长度为1e6的离散化数组,将原数组每个元素映射到这个离散化数组上。由于其相对大小不变,既可以完成题目要求,又不会超过内存限制。
然而,这种做法依然不够方便,离散化之后对于原来的元素的查询很不便利,代码量较大,我们可以使用stl中的map容器来直接实现离散化。
map函数本质是一棵由键值对组成的红黑树。其基本操作有:
1.map<A,B>mp: 建立一个名字叫mp,下标类型为A,元素类型为B的映射表。
2.mp[a]=b:将这个“数组"中下标为a的位置的值改为b。
3.mp[a]:访问下标为a的值。
4.mp.size():返回映射表中的键值对个数。
5.mp.count(key):返回键为key的键值对个数。
6.mp.erase(key):删除键为key的键值对。
即我们可以通过建立一个map,将给定的数组的元素当作键存入map中,而每个元素出现的次数为值,这样同样避免了内存超出限制的风险(不过没有缩小数据,使数据的提取更方便)。
下面以一道atcoder题目为例子。
C - Colorful Candies
Time Limit: 2 sec / Memory Limit: 1024 MB
Score : 300 points
Problem Statement
There are N candies arranged in a row from left to right.
Each of these candies has one color that is one of the 1e9 colors called Color 1, Color 2, ……, and Color 1e9.
For each i=1,2,…,N the color of the ii-th candy from the left is Color ci.
From this row, Takahashi can choose KK consecutive candies and get them.
That is, he can choose an integer ii such that 1≤i≤N?K+1 and get the i-th, (i+1)-th, (i+2)-th, ……, (i+K?1)-th candy from the left.
Takahashi likes to eat colorful candies, so the more variety of colors his candies have, the happier he will be.
Print the maximum possible number of distinct colors in candies he gets.
Constraints
- 1≤K≤N≤3×1e5
- 1≤ci≤1e9
- All values in input are integers.
Sample Input 1
7 3
1 2 1 2 3 3 1
Sample Output 1
3
Sample Input 2
5 5
4 4 4 4 4
Sample Output 2
1
Sample Input 3
10 6
304621362 506696497 304621362 506696497 834022578 304621362 414720753 304621362 304621362 414720753
Sample Output 3
4
本题就是一道典型的离散化题目。我们先求出前k个点的cnt值,之后移动这个区间,每移动一步,cnt[i]--,cnt[i+k]++,每次移动cnt的值加到1时这个区间的答案加一,减到0时这个区间答案减一,遍历一遍求出最大值即为所求。很明显,这样会超出内存限制,于是我们可以使用map容器进行离散化处理。
代码如下所示。
#include<iostream> #include<cstdio> #include<map> using namespace std; map<int,int>cnt;//以颜色为键,以该颜色个数为值 int c[300005]; int main(){ int n,k; scanf("%d %d",&n,&k); for(int i=1;i<=n;i++)scanf("%d",&c[i]); for(int i=1;i<=k;i++)cnt[c[i]]++; int ans=cnt.size();//答案初始化为第一个区间颜色数 for(int i=1;i<=n-k;i++){//区间向右移动 cnt[c[i]]--;//左端点去除 cnt[c[i+k]]++;//右端点加上 if(cnt[c[i]]==0)cnt.erase(c[i]);//没有这个颜色 ans=max(ans,(int)cnt.size());//更新答案 } printf("%d\n",ans); return 0; }
再举一道经典例题
Moscow is hosting a major international conference, which is attended by n scientists from different countries. Each of the scientists knows exactly one language. For convenience, we enumerate all languages of the world with integers from 1 to 1e9.
In the evening after the conference, all n scientists decided to go to the cinema. There are m movies in the cinema they came to. Each of the movies is characterized by two distinct numbers — the index of audio language and the index of subtitles language. The scientist, who came to the movie, will be very pleased if he knows the audio language of the movie, will be almost satisfied if he knows the language of subtitles and will be not satisfied if he does not know neither one nor the other (note that the audio language and the subtitles language for each movie are always different).
Scientists decided to go together to the same movie. You have to help them choose the movie, such that the number of very pleased scientists is maximum possible. If there are several such movies, select among them one that will maximize the number of almost satisfied scientists.
The first line of the input contains a positive integer n (1?≤?n?≤?200?000) — the number of scientists.
The second line contains n positive integers a1,?a2,?...,?an (1?≤?ai?≤?1e9), where ai is the index of a language, which the i-th scientist knows.
The third line contains a positive integer m (1?≤?m?≤?200?000) — the number of movies in the cinema.
The fourth line contains m positive integers b1,?b2,?...,?bm (1?≤?bj?≤?1e9), where bj is the index of the audio language of the j-th movie.
The fifth line contains m positive integers c1,?c2,?...,?cm (1?≤?cj?≤?1e9), where cj is the index of subtitles language of the j-th movie.
It is guaranteed that audio languages and subtitles language are different for each movie, that is bj?≠?cj.
Print the single integer — the index of a movie to which scientists should go. After viewing this movie the number of very pleased scientists should be maximum possible. If in the cinema there are several such movies, you need to choose among them one, after viewing which there will be the maximum possible number of almost satisfied scientists.
If there are several possible answers print any of them.
Examples
input
3
2 3 2
2
3 2
2 3
output
2
input
6
6 3 1 1 3 7
5
1 2 3 4 5
2 3 4 5 1
output
1