Bloom Filter

2024-02-15 16:12:10

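A bitmap is fast and takes very little space. Its drawback is just as obvious: it can only handle integers, because each bit stands for exactly one integer value. Much of everyday data consists of strings, so a structure that can handle them is needed. Adapting the bitmap into a Bloom filter solves this problem.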
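The filter further down relies on a bitmap class from the author's own bitset.h, which is not shown in the article. To give a rough idea of what such a class does, here is a minimal sketch, assuming a set/test interface over unsigned integers. The class name BitMap and the reset method are hypothetical and are not taken from the original header.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal bitmap: one bit per unsigned integer in [0, n].
// It only mirrors the set/test interface the article's bitset.h appears to have.
class BitMap {
public:
    explicit BitMap(std::size_t n) : _bits(n / 32 + 1, 0) {}

    // Mark x as present: word x / 32, bit x % 32 inside that word.
    void set(std::size_t x) { _bits[x / 32] |= (std::uint32_t{1} << (x % 32)); }

    // Clear the bit for x.
    void reset(std::size_t x) { _bits[x / 32] &= ~(std::uint32_t{1} << (x % 32)); }

    // Each integer owns exactly one bit, so a set bit means x really was inserted.
    bool test(std::size_t x) const { return ((_bits[x / 32] >> (x % 32)) & 1u) != 0; }

private:
    std::vector<std::uint32_t> _bits;
};
```

Because the mapping from integer to bit is one-to-one, `test` on a plain bitmap never gives a false positive; that guarantee is exactly what is lost once strings are hashed into it.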

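The first step, naturally, is to turn a string into an integer. There are many string hash algorithms, so this part is not hard. The real problem is this: in a bitmap (all integers here are unsigned), every integer is distinct, so the mapping is one-to-one, and a set bit means the value really is present. A string, however, may collide with another string no matter which hash algorithm is used, because a hash value does not describe a string uniquely. Hash collisions between strings therefore have to be dealt with. Bloom's mitigation is to map each string to several bits instead of one: a lookup reports "present" only when all of those bits are set, so a single colliding bit is no longer enough to produce a wrong answer, which greatly reduces the effect of collisions.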
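To make the "several bits per string" idea concrete, the sketch below computes the three bit positions a string would occupy in a bitmap of m bits. It reuses the three multiplicative hash formulas from the article's code (multipliers 131, 63689/378551 and 65599). The function name bloomPositions is hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper: the k = 3 bit positions a string occupies in a
// bitmap of m bits, using the same three hash formulas as the article.
std::array<std::size_t, 3> bloomPositions(const std::string& s, std::size_t m) {
    std::size_t h1 = 0, h2 = 0, h3 = 0;
    std::size_t magic = 63689;  // multiplier for h2, itself updated after each character
    for (char c : s) {
        h1 = h1 * 131 + c;
        h2 = h2 * magic + c;
        magic *= 378551;
        h3 = h3 * 65599 + c;
    }
    // Reduce each hash into [0, m) so it can index the bitmap.
    return {{h1 % m, h2 % m, h3 % m}};
}
```

For example, with a 64-bit `size_t` and m = 1000, `bloomPositions("ab", 1000)` yields the positions 805, 81 and 201. Note that for a one-character string all three formulas return the character code itself, so such a key occupies only one bit.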

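As its name suggests, a Bloom filter filters: it reliably filters out items that were never inserted. For deciding whether an item does exist, however, it is not exact. If any of a key's bits is 0, the key was definitely never inserted; if all of them are 1, the key was probably inserted, but those bits may have been set by other keys (a false positive). In short, it can tell you for certain that something is absent, but not that something is present.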
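How often does "probably present" turn out to be wrong? A commonly used approximation for the false-positive rate after inserting n keys into m bits with k hash functions is p ≈ (1 - e^(-kn/m))^k, and the k that minimizes it is (m/n)·ln 2. The helpers below simply evaluate these two formulas; the function names are hypothetical.

```cpp
#include <cmath>
#include <cstddef>

// Approximate false-positive rate after inserting n keys into m bits
// using k hash functions: p = (1 - e^(-k*n/m))^k
double falsePositiveRate(std::size_t m, std::size_t n, std::size_t k) {
    double exponent = -static_cast<double>(k) * static_cast<double>(n) / static_cast<double>(m);
    return std::pow(1.0 - std::exp(exponent), static_cast<double>(k));
}

// Number of hash functions that minimizes p for given m and n: k = (m / n) * ln 2
double optimalHashCount(std::size_t m, std::size_t n) {
    return static_cast<double>(m) / static_cast<double>(n) * std::log(2.0);
}
```

With the parameters used in the code below (m = 5n bits, k = 3 hash functions), `falsePositiveRate(5000, 1000, 3)` is about 0.092, i.e. roughly 9%, and `optimalHashCount(5000, 1000)` is about 3.47, so three hash functions is close to the best choice for that size.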

The code is as follows:

#pragma once
#include "bitset.h"
#include <string>

namespace whc
{
	struct Hashs1
	{
		// BKDR-style hash: multiply by 131, then add the next character
		size_t operator()(const std::string& s) const
		{
			size_t hash = 0;
			for (size_t i = 0; i < s.size(); i++)
			{
				hash *= 131;
				hash += s[i];
			}

			return hash;
		}
	};

	struct Hashs2
	{
		// Multiplicative hash whose multiplier changes after every character
		size_t operator() (const std::string& s) const
		{
			size_t hash = 0;
			size_t magic = 63689; // magic number
			for (size_t i = 0; i < s.size(); ++i)
			{
				hash *= magic;
				hash += s[i];
				magic *= 378551;
			}

			return hash;
		}
	};

	struct Hashs3
	{
		// Multiply by 65599, then add the next character
		size_t operator() (const std::string& s) const
		{
			size_t hash = 0;
			for (size_t i = 0; i < s.size(); ++i)
			{
				hash *= 65599;
				hash += s[i];
			}

			return hash;
		}
	};

	template<class K = std::string, class Hash1 = Hashs1,
		class Hash2 = Hashs2, class Hash3 = Hashs3>
	class bloomfiter
	{
	public:
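		// num is the expected number of keys; allocate 5 bits per key.
		// With 3 hash functions this gives a false-positive rate of roughly 9%.
		bloomfiter(size_t num)
			:_bs(num*5)
			,_N(num*5)
		{}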


		void set(const K& key)
		{
			size_t index1 = Hash1()(key) % _N;
			size_t index2 = Hash2()(key) % _N;
			size_t index3 = Hash3()(key) % _N;

			_bs.set(index1);
			_bs.set(index2);
			_bs.set(index3);
		}

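		// Returns false if the key was definitely never inserted,
		// true if it was probably inserted (false positives are possible)
		bool test(const K& key)
		{
			// Each index must be reduced modulo _N, exactly as in set();
			// otherwise it can fall outside the bitmap
			size_t index1 = Hash1()(key) % _N;
			if (_bs.test(index1) == false)
				return false;

			size_t index2 = Hash2()(key) % _N;
			if (_bs.test(index2) == false)
				return false;

			size_t index3 = Hash3()(key) % _N;
			if (_bs.test(index3) == false)
				return false;

			return true;
		}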

	private:
		bitset _bs;
		size_t _N;
	};
}
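Because bitset.h is not included in the article, the class above cannot be compiled on its own. Below is a self-contained sketch of the same design, assuming `std::vector<bool>` as a stand-in for the author's bitmap and keeping the same three hash formulas and the same 5-bits-per-key sizing. The class name SimpleBloomFilter is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical self-contained variant of the article's filter:
// std::vector<bool> replaces bitset.h; hashes and sizing are the same.
class SimpleBloomFilter {
public:
    // 5 bits per expected key (at least 1 bit, to avoid a modulo by zero).
    explicit SimpleBloomFilter(std::size_t expectedKeys)
        : _bits(std::max<std::size_t>(1, expectedKeys * 5), false), _m(_bits.size()) {}

    void set(const std::string& key) {
        _bits[hash1(key) % _m] = true;
        _bits[hash2(key) % _m] = true;
        _bits[hash3(key) % _m] = true;
    }

    // false: definitely never inserted; true: probably inserted.
    bool test(const std::string& key) const {
        return _bits[hash1(key) % _m] && _bits[hash2(key) % _m] && _bits[hash3(key) % _m];
    }

private:
    static std::size_t hash1(const std::string& s) {
        std::size_t h = 0;
        for (char c : s) h = h * 131 + c;
        return h;
    }
    static std::size_t hash2(const std::string& s) {
        std::size_t h = 0, magic = 63689;
        for (char c : s) { h = h * magic + c; magic *= 378551; }
        return h;
    }
    static std::size_t hash3(const std::string& s) {
        std::size_t h = 0;
        for (char c : s) h = h * 65599 + c;
        return h;
    }

    std::vector<bool> _bits;  // declared before _m, so _m can read its size
    std::size_t _m;
};
```

After `SimpleBloomFilter bf(100); bf.set("apple");`, `bf.test("apple")` is always `true` (no false negatives), while `test` on a key that was never inserted is usually, but not always, `false`.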

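The Bloom filter is built on top of a bitmap. We use three string hash algorithms to map each key three times, and pass them in as functors (function objects) through template parameters to keep the filter generic. A Bloom filter cannot support deletion: one bit may be shared by several strings, so clearing a key's bits could also clear bits that other keys depend on, and those keys would then wrongly be reported as absent.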
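The deletion problem can be shown with a tiny demo. To guarantee an overlap, the bit positions below are hard-coded rather than hashed; keys A and B and their positions are hypothetical and chosen so that both keys share bit 4.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical demo: hard-coded positions instead of hashes, so that
// key A and key B are guaranteed to share bit 4.
using Positions = std::array<std::size_t, 3>;

const Positions kKeyA = {1, 4, 7};
const Positions kKeyB = {4, 9, 12};

void insertKey(std::vector<bool>& bits, const Positions& p) {
    for (std::size_t i : p) bits[i] = true;
}

bool mayContain(const std::vector<bool>& bits, const Positions& p) {
    for (std::size_t i : p)
        if (!bits[i]) return false;
    return true;
}

// Naive "delete": clear every bit of the key. This also clears the
// shared bit 4, which key B still needs.
void naiveErase(std::vector<bool>& bits, const Positions& p) {
    for (std::size_t i : p) bits[i] = false;
}
```

After inserting both keys into a 16-bit array and calling `naiveErase` for A, `mayContain` for B returns `false` even though B was inserted, which breaks the filter's one guarantee that "absent" answers are always correct.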

码农公寓
