缓存算法（内存回收算法）- LRU、FIFO、LFU

2022-05-27 09:47:15

Design and implement a data structure for Least Recently Used (LRU) cache. It should support the following operations: get and set.

get(key) - Get the value (will always be positive) of the key if the key exists in the cache, otherwise return -1.
set(key, value) - Set or insert the value if the key is not already present. When the cache reached its capacity, it should invalidate the least recently used item before inserting a new item.

题目大意：为LRU Cache设计一个数据结构，它支持两个操作：

　　　1）get(key)：如果key在cache中，则返回对应的value值，否则返回-1

　　　2）set(key,value):如果key不在cache中，则将该(key,value)插入cache中（注意，如果cache已满，则必须把最近最久未使用的元素从cache中删除）；如果key在cache中，则重置value的值。

解题思路：题目让设计一个LRU Cache，即根据LRU算法设计一个缓存。在这之前需要弄清楚LRU算法的核心思想，LRU全称是Least

Recently Used，即最近最久未使用的意思。在操作系统的内存管理中，有一类很重要的算法就是内存页面置换算法（包括FIFO，LRU,LFU等几种常见页面置换算法）。事实上，Cache算法和内存页面置换算法的核心思想是一样的：都是在给定一个限定大小的空间的前提下，设计一个原则如何来更新和访问其中的元素。下面说一下LRU算法的核心思想，LRU算法的设计原则是：如果一个数据在最近一段时间没有被访问到，那么在将来它被访问的可能性也很小。也就是说，当限定的空间已存满数据时，应当把最久没有被访问到的数据淘汰。

　　而用什么数据结构来实现LRU算法呢？可能大多数人都会想到：用一个数组来存储数据，给每一个数据项标记一个访问时间戳，每次插入新数据项的时候，先把数组中存在的数据项的时间戳自增，并将新数据项的时间戳置为0并插入到数组中。每次访问数组中的数据项的时候，将被访问的数据项的时间戳置为0。当数组空间已满时，将时间戳最大的数据项淘汰。

　　这种实现思路很简单，但是有什么缺陷呢？需要不停地维护数据项的访问时间戳，另外，在插入数据、删除数据以及访问数据时，时间复杂度都是O(n)。

　　那么有没有更好的实现办法呢？

　　那就是利用链表和hashmap。当需要插入新的数据项的时候，如果新数据项在链表中存在（一般称为命中），则把该节点移到链表头部，如果不存在，则新建一个节点，放到链表头部，若缓存满了，则把链表最后一个节点删除即可。在访问数据的时候，如果数据项在链表中存在，则把该节点移到链表头部，否则返回-1。这样一来在链表尾部的节点就是最近最久未访问的数据项。

　　总结一下：根据题目的要求，LRU Cache具备的操作：

　　1）set(key,value)：如果key在hashmap中存在，则先重置对应的value值，然后获取对应的节点cur，将cur节点从链表删除，并移动到链表的头部；若果key在hashmap不存在，则新建一个节点，并将节点放到链表的头部。当Cache存满的时候，将链表最后一个节点删除即可。

　　2）get(key)：如果key在hashmap中存在，则把对应的节点放到链表头部，并返回对应的value值；如果不存在，则返回-1。

　　仔细分析一下，如果在这地方利用单链表和hashmap，在set和get中，都有一个相同的操作就是将在命中的节点移到链表头部，如果按照传统的遍历办法来删除节点可以达到题目的要求么？第二，在删除链表末尾节点的时候，必须遍历链表，然后将末尾节点删除，这个能达到题目的时间要求么？

　　试一下便知结果：

　　第一个版本实现：

#include <iostream>
#include <map>
#include <algorithm>

using namespace std;
 
struct Node
{

    int key;

    int value;

    Node *next;
};
 
class LRUCache{

private:

    int count;

    int size ;

    map<int,Node *> mp;

    Node *cacheList;

public:

    LRUCache(int capacity) {

      size = capacity;

      cacheList = NULL;

      count = 0;

    }

    int get(int key) {

        if(cacheList==NULL)

            return -1;

        map<int,Node *>::iterator it=mp.find(key);

        if(it==mp.end())  //如果在Cache中不存在该key， 则返回-1

        {

            return -1; 

        }

        else

        {

            Node *p = it->second;   

            pushFront(p);    //将节点p置于链表头部

        }

        return cacheList->value;  

    }

    void set(int key, int value) {

        if(cacheList==NULL)   //如果链表为空，直接放在链表头部

        {

            cacheList = (Node *)malloc(sizeof(Node));

            cacheList->key = key;

            cacheList->value = value;

            cacheList->next = NULL;

            mp[key] = cacheList;

            count++;

        }

        else   //否则，在map中查找

        {

            map<int,Node *>::iterator it=mp.find(key);

            if(it==mp.end())   //没有命中

            {

                if(count == size)  //cache满了

                {

                    Node * p = cacheList;

                    Node *pre = p;

                    while(p->next!=NULL)

                    {

                        pre = p;

                        p= p->next; 

                    }

                    mp.erase(p->key);

                    count--;

                    if(pre==p)         //说明只有一个节点

                        p=NULL;

                    else

                        pre->next = NULL;

                    free(p);

                }

                Node * newNode = (Node *)malloc(sizeof(Node)); 

                newNode->key = key;

                newNode->value = value;

                newNode->next = cacheList;

                cacheList = newNode;

                mp[key] = cacheList;

                count++;   

            }

            else

            {

                Node *p = it->second;   

                p->value = value;

                pushFront(p);

            }

        }

    }

    void pushFront(Node *cur)   //单链表删除节点,并将节点移动链表头部，O(n)

    {

        if(count==1)

            return;

        if(cur==cacheList)

            return;

        Node *p = cacheList;

        while(p->next!=cur)

        {

            p=p->next;      

        }

        p->next = cur->next;   //删除cur节点

        cur->next = cacheList;

        cacheList = cur;

    }

    void printCache(){

        Node *p = cacheList;

        while(p!=NULL)

        {

            cout<<p->key<<" ";

            p=p->next;

        }

        cout<<endl;

    }
};
 
int main(void)
{

    /*LRUCache cache(3);

    cache.set(2,10);

    cache.printCache();

    cache.set(1,11);

    cache.printCache();

    cache.set(2,12);

    cache.printCache();

    cache.set(1,13);

    cache.printCache();

    cache.set(2,14);

    cache.printCache();

    cache.set(3,15);

    cache.printCache();

    cache.set(4,100);

    cache.printCache();

    cout<<cache.get(2)<<endl;

    cache.printCache();*/

    LRUCache cache(2);

    cout<<cache.get(2)<<endl;

    cache.set(2,6);

    cache.printCache();

    cout<<cache.get(1)<<endl;

    cache.set(1,5);

    cache.printCache();

    cache.set(1,2);

    cache.printCache();

    cout<<cache.get(1)<<endl;

    cout<<cache.get(2)<<endl;

    return 0;
}

　　提交之后，提示超时：

　　因此要对算法进行改进，如果把pushFront时间复杂度改进为O(1)的话是不是就能达到要求呢？

　　但是 在已知要删除的节点的情况下，如何在O(1)时间复杂度内删除节点？

　　如果知道当前节点的前驱节点的话，则在O(1)时间复杂度内删除节点是很容易的。而在无法获取当前节点的前驱节点的情况下，能够实现么？对，可以实现的。

　　具体的可以参照这几篇博文：

　　http://www.cnblogs.com/xwdreamer/archive/2012/04/26/2472102.html

　　http://www.nowamagic.net/librarys/veda/detail/261

　　原理：假如要删除的节点是cur，通过cur可以知道cur节点的后继节点curNext，如果交换cur节点和curNext节点的数据域，然后删除curNext节点（curNext节点是很好删除地），此时便在O(1)时间复杂度内完成了cur节点的删除。

　　改进版本1：

void pushFront(Node *cur)  //单链表删除节点,并将节点移动链表头部，O(1)

    {

        if(count==1)

            return;

        //先删除cur节点 ，再将cur节点移到链表头部

        Node *curNext = cur->next;

        if(curNext==NULL)  //如果是最后一个节点

        {

            Node * p = cacheList;

            while(p->next!=cur)

            {

                p=p->next;

            }

            p->next = NULL; 

            cur->next = cacheList;

            cacheList = cur;

        }

        else    //如果不是最后一个节点

        {

            cur->next = curNext->next;   

            int tempKey = cur->key;

            int tempValue = cur->value;

            cur->key = curNext->key;

            cur->value = curNext->value;

            curNext->key = tempKey;

            curNext->value = tempValue;

            curNext->next = cacheList;

            cacheList  = curNext;

            mp[curNext->key] = curNext;

            mp[cur->key] = cur;

        }

    }

　　提交之后，提示Accepted，耗时492ms，达到要求。

　　有没有更好的实现办法，使得删除末尾节点的复杂度也在O(1)？那就是利用双向链表，并提供head指针和tail指针，这样一来，所有的操作都是O(1)时间复杂度。

　　改进版本2：

#include <iostream>
#include <map>
#include <algorithm>

using namespace std;
 
struct Node
{

    int key;

    int value;

    Node *pre;

    Node *next;
};
 
class LRUCache{

private:

    int count;

    int size ;

    map<int,Node *> mp;

    Node *cacheHead;

    Node *cacheTail;

public:

    LRUCache(int capacity) {

      size = capacity;

      cacheHead = NULL;

      cacheTail = NULL;

      count = 0;

    }

    int get(int key) {

        if(cacheHead==NULL)

            return -1;

        map<int,Node *>::iterator it=mp.find(key);

        if(it==mp.end())  //如果在Cache中不存在该key， 则返回-1

        {

            return -1; 

        }

        else

        {

            Node *p = it->second;   

            pushFront(p);    //将节点p置于链表头部

        }

        return cacheHead->value;  

    }

    void set(int key, int value) {

        if(cacheHead==NULL)   //如果链表为空，直接放在链表头部

        {

            cacheHead = (Node *)malloc(sizeof(Node));

            cacheHead->key = key;

            cacheHead->value = value;

            cacheHead->pre = NULL;

            cacheHead->next = NULL;

            mp[key] = cacheHead;

            cacheTail = cacheHead;

            count++;

        }

        else   //否则，在map中查找

        {

            map<int,Node *>::iterator it=mp.find(key);

            if(it==mp.end())   //没有命中

            {

                if(count == size)  //cache满了

                {

                    if(cacheHead==cacheTail&&cacheHead!=NULL)  //只有一个节点

                    {

                        mp.erase(cacheHead->key);

                        cacheHead->key = key;

                        cacheHead->value = value;

                        mp[key] = cacheHead;

                    }

                    else

                    {

                        Node * p =cacheTail;

                        cacheTail->pre->next = cacheTail->next;  

                        cacheTail = cacheTail->pre;
 
                        mp.erase(p->key);

                        p->key= key;

                        p->value = value;

                        p->next = cacheHead;

                        p->pre = cacheHead->pre;

                        cacheHead->pre = p;

                        cacheHead = p;

                        mp[cacheHead->key] = cacheHead;

                    }

                }

                else

                {

                    Node * p = (Node *)malloc(sizeof(Node));

                    p->key = key;

                    p->value = value;

                    p->next = cacheHead;

                    p->pre = NULL;

                    cacheHead->pre = p;

                    cacheHead = p;

                    mp[cacheHead->key] = cacheHead;

                    count++;

                }

            }

            else

            {

                Node *p = it->second;   

                p->value = value;

                pushFront(p);

            }

        }

    }
 
    void pushFront(Node *cur)   //双向链表删除节点,并将节点移动链表头部，O(1)

    {

        if(count==1)

            return;

        if(cur==cacheHead)

            return;

        if(cur==cacheTail)

        {

            cacheTail = cur->pre;

        }

        cur->pre->next = cur->next;   //删除节点

        if(cur->next!=NULL)

            cur->next->pre = cur->pre;

        cur->next = cacheHead;

        cur->pre = NULL;

        cacheHead->pre = cur;

        cacheHead = cur;   

    }

    void printCache(){

        Node *p = cacheHead;

        while(p!=NULL)

        {

            cout<<p->key<<" ";

            p=p->next;

        }

        cout<<endl;

    }
};
 
int main(void)
{

    LRUCache cache(3);

    cache.set(1,1);

    //cache.printCache();

    cache.set(2,2);

    //cache.printCache();

    cache.set(3,3);

    cache.printCache();

    cache.set(4,4);

    cache.printCache();

    cout<<cache.get(4)<<endl;

    cache.printCache();

    cout<<cache.get(3)<<endl;

    cache.printCache();

    cout<<cache.get(2)<<endl;

    cache.printCache();

    cout<<cache.get(1)<<endl;

    cache.printCache();

    cache.set(5,5);

    cache.printCache();

    cout<<cache.get(1)<<endl;

    cout<<cache.get(2)<<endl;

    cout<<cache.get(3)<<endl;

    cout<<cache.get(4)<<endl;

    cout<<cache.get(5)<<endl;

    return 0;
}

　　提交测试结果：

　　可以发现，效率有进一步的提升。

　　其实在STL中的list就是一个双向链表，如果希望代码简短点，可以用list来实现：

　　具体实现：

#include <iostream>
#include <map>
#include <algorithm>
#include <list>

using namespace std;
 
struct Node
{

    int key;

    int value;
};
 
class LRUCache{

private:

    int maxSize ;

    list<Node> cacheList;

    map<int, list<Node>::iterator > mp;

public:

    LRUCache(int capacity) {

      maxSize = capacity;

    }

    int get(int key) {

        map<int, list<Node>::iterator >::iterator it = mp.find(key);

        if(it==mp.end())      //没有命中

        {

            return -1;

        }

        else  //在cache中命中了

        {

            list<Node>::iterator listIt = mp[key];

            Node newNode;

            newNode.key = key;

            newNode.value = listIt->value;

            cacheList.erase(listIt);               //先删除命中的节点      

            cacheList.push_front(newNode);   //将命中的节点放到链表头部

            mp[key] = cacheList.begin();

        }

        return cacheList.begin()->value;

    }

    void set(int key, int value) {

        map<int, list<Node>::iterator >::iterator it = mp.find(key);

        if(it==mp.end())   //没有命中

        {

            if(cacheList.size()==maxSize)  //cache满了

            {

                mp.erase(cacheList.back().key);    

                cacheList.pop_back();

            }

            Node newNode;

            newNode.key = key;

            newNode.value = value;

            cacheList.push_front(newNode);

            mp[key] = cacheList.begin();

        }

        else  //命中

        {

            list<Node>::iterator listIt = mp[key];

            cacheList.erase(listIt);               //先删除命中的节点          

            Node newNode;

            newNode.key = key;

            newNode.value = value;

            cacheList.push_front(newNode);   //将命中的节点放到链表头部

            mp[key] = cacheList.begin();

        }

    }
};
 
int main(void)
{

    LRUCache cache(3);

    cache.set(1,1);

    cache.set(2,2);

    cache.set(3,3);

    cache.set(4,4);

    cout<<cache.get(4)<<endl;

    cout<<cache.get(3)<<endl;

    cout<<cache.get(2)<<endl;

    cout<<cache.get(1)<<endl;

    cache.set(5,5);

    cout<<cache.get(1)<<endl;

    cout<<cache.get(2)<<endl;

    cout<<cache.get(3)<<endl;

    cout<<cache.get(4)<<endl;

    cout<<cache.get(5)<<endl;

    return 0;
}

作者：海子
出处：http://www.cnblogs.com/dolphin0520/
本博客中未标明转载的文章归作者海子和博客园共有，欢迎转载，但未经作者同意必须保留此段声明，且在文章页面明显位置给出原文连接，否则保留追究法律责任的权利。

xxxxxxxxxx

在前一篇文章中通过leetcode的一道题目了解了LRU算法的具体设计思路，下面继续来探讨一下另外两种常见的Cache算法：FIFO、LFU

1.FIFO算法

　　FIFO（First in First out），先进先出。其实在操作系统的设计理念中很多地方都利用到了先进先出的思想，比如作业调度（先来先服务），为什么这个原则在很多地方都会用到呢？因为这个原则简单、且符合人们的惯性思维，具备公平性，并且实现起来简单，直接使用数据结构中的队列即可实现。

　　在FIFO Cache设计中，核心原则就是：如果一个数据最先进入缓存中，则应该最早淘汰掉。也就是说，当缓存满的时候，应当把最先进入缓存的数据给淘汰掉。在FIFO Cache中应该支持以下操作;

　　get(key)：如果Cache中存在该key，则返回对应的value值，否则，返回-1；

　　set(key,value)：如果Cache中存在该key，则重置value值；如果不存在该key，则将该key插入到到Cache中，若Cache已满，则淘汰最早进入Cache的数据。

　　举个例子：假如Cache大小为3，访问数据序列为set(1,1),set(2,2),set(3,3),set(4,4),get(2),set(5,5)

　　则Cache中的数据变化为：

　　(1,1) set(1,1)

　　(1,1) (2,2) set(2,2)

　　(1,1) (2,2) (3,3) set(3,3)

　　(2,2) (3,3) (4,4) set(4,4)

　　(2,2) (3,3) (4,4) get(2)

　　(3,3) (4,4) (5,5) set(5,5)

　　那么利用什么数据结构来实现呢？

　　下面提供一种实现思路：

　　利用一个双向链表保存数据，当来了新的数据之后便添加到链表末尾，如果Cache存满数据，则把链表头部数据删除，然后把新的数据添加到链表末尾。在访问数据的时候，如果在Cache中存在该数据的话，则返回对应的value值；否则返回-1。如果想提高访问效率，可以利用hashmap来保存每个key在链表中对应的位置。

2.LFU算法

　　LFU（Least Frequently Used）最近最少使用算法。它是基于“如果一个数据在最近一段时间内使用次数很少，那么在将来一段时间内被使用的可能性也很小”的思路。

　　注意LFU和LRU算法的不同之处，LRU的淘汰规则是基于访问时间，而LFU是基于访问次数的。举个简单的例子：

　　假设缓存大小为3，数据访问序列为set(2,2),set(1,1),get(2),get(1),get(2),set(3,3),set(4,4)，

　　则在set(4,4)时对于LFU算法应该淘汰(3,3)，而LRU应该淘汰(1,1)。

　　那么LFU Cache应该支持的操作为：

　　get(key)：如果Cache中存在该key，则返回对应的value值，否则，返回-1；

　　set(key,value)：如果Cache中存在该key，则重置value值；如果不存在该key，则将该key插入到到Cache中，若Cache已满，则淘汰最少访问的数据。

　　为了能够淘汰最少使用的数据，因此LFU算法最简单的一种设计思路就是利用一个数组存储数据项，用hashmap存储每个数据项在数组中对应的位置，然后为每个数据项设计一个访问频次，当数据项被命中时，访问频次自增，在淘汰的时候淘汰访问频次最少的数据。这样一来的话，在插入数据和访问数据的时候都能达到O(1)的时间复杂度，在淘汰数据的时候，通过选择算法得到应该淘汰的数据项在数组中的索引，并将该索引位置的内容替换为新来的数据内容即可，这样的话，淘汰数据的操作时间复杂度为O(n)。

　　另外还有一种实现思路就是利用小顶堆+hashmap，小顶堆插入、删除操作都能达到O(logn)时间复杂度，因此效率相比第一种实现方法更加高效。

　　如果哪位朋友有更高效的实现方式（比如O(1)时间复杂度），不妨探讨一下，不胜感激。

原文链接：[http://wely.iteye.com/blog/2227844]

码农公寓

1.FIFO算法

2.LFU算法

相关文章