Repost: The Author of Redis on Redis Use Cases

Without a doubt, Redis pioneered a new way of thinking about data storage. With Redis, instead of facing a functionally monotonous database and spending our energy on problems like how to fit an elephant into a refrigerator, we can use Redis's flexible data structures and operations to build a different refrigerator for each elephant. I hope you like the metaphor.

Below is a freshly published article by Redis author @antirez, describing some application scenarios that Redis is well suited for. NoSQLFan lists them briefly here for your reference:

1. Fetching the latest N items

A typical example is fetching the latest articles on your site. With the approach below, we can keep the IDs of the latest 5000 comments in a Redis list and fall back to the database for anything beyond that range:

  • Use LPUSH latest.comments <ID> to push each new comment ID onto the list
  • After each insert, run LTRIM latest.comments 0 5000 so the list only ever keeps the most recent 5000 IDs
  • When the client fetches a page of comments, we can use the following logic (pseudocode):
FUNCTION get_latest_comments(start, num_items):
    id_list = redis.lrange("latest.comments", start, start + num_items - 1)
    IF id_list.length < num_items
        id_list = SQL_DB("SELECT ... ORDER BY time LIMIT ...")
    END
    RETURN id_list
END

If you have other filtering dimensions, such as the latest N items in a given category, you can simply build another list for that category. If you store only IDs, Redis handles this very efficiently.

2. Leaderboards: taking the top N

This differs from the previous case in that the earlier pattern uses time as the weight, while this one uses some other criterion, such as the number of upvotes. This is where the sorted set comes in: set the value you want to sort by as the sorted set's score and the actual data as the member, and each update then takes just a single ZADD command.

3. Applications that need precisely scheduled expiration

For example, you can set the score of the sorted set mentioned above to the expiration timestamp; then you can simply sort by expiration time and periodically purge expired data. And beyond clearing expired data inside Redis itself, you can treat these expiration times in Redis as an index over your database: use Redis to find which records have expired, then delete exactly those records from the database.

4. Counters

Redis commands are atomic, so you can easily build counter systems with INCR and DECR.

5. Uniqueness: deduplicating all values seen over a period of time

The Redis set is the perfect fit here: just keep adding data to the set. A set holds unique members, so duplicates are removed automatically.

6. Real-time systems and anti-spam systems

With the set operations described above, you can tell whether a user has performed a given action, and you can retrieve the set of their actions for analysis, statistics, and comparison. The only limit is your imagination.

7. Building real-time messaging with Pub/Sub

Redis's Pub/Sub can power real-time messaging systems; there are many examples of real-time chat systems built on top of it.

8. Building queue systems

Lists can be used to build queues, and with sorted sets you can even build priority queues.

9. Caching

This one needs no explanation: performance better than Memcached, with much richer data structures.

Source: antirez.com

Original article:

How to take advantage of Redis just adding it to your stack

Tuesday, 28 June 11
 
Redis is different from other database solutions in many ways: it uses memory as its main storage and disk only for persistence, the data model is pretty unique, it is single-threaded, and so forth. I think that another big difference is that in order to take advantage of Redis in your production environment you don't need to switch to Redis. You can just use it to do new things that were not possible before, or to fix old problems.

Switching to Redis is of course an option, and many users use Redis as their primary database because they need its features, write speed, low latency, or something else, but as you can guess switching is a big step if you already have an application running in production. Also, for some kinds of applications Redis may not be the right database: for instance a Redis data set can't be bigger than the available memory, so if you have a big data application with a mostly-reads access pattern, Redis is not the right pick.

However, one thing I like about Redis is that it can solve a lot of problems just by being added to your stack, doing things that were too slow or impossible with your existing database. This way you start to gain confidence with Redis incrementally, using it at first just to optimize or to create new features in your application. This blog post explores a few use cases showing how people added Redis to existing environments to take advantage of its feature set. I'll not report specific use cases with site names and exact configurations; I'll just try to show you classes of problems that Redis can solve without being your primary database.

Slow latest items listings in your home page

Can I have a penny for every instance of the following query that is running too slow please?

SELECT * FROM foo WHERE ... ORDER BY time DESC LIMIT 10

Having listings like "latest items added by our users" or "latest something else" is very common in web applications, and often a scalability problem. It is pretty counterintuitive that you need to sort things when you just want to list items in the same order they were created.

Similar problems can be fixed using a Redis pattern that I'll show you with an example. We have a web application where we want to show the latest 20 comments posted by our users. Next to the latest comments box we also have a "show all" link leading to a page where it is possible to show more than the latest 20 comments, with pagination, so that the whole comments "time line" can be browsed.

We also assume that every comment is stored in our database and has a unique incremental ID field.

We can make both the home page box and the comments time line page with pagination fast using a simple Redis pattern:

    • Every time a new comment is added, we add its ID into a Redis list: LPUSH latest.comments <ID>.
    • We also trim the list to a given length, so that Redis will just hold the latest 5000 items: LTRIM latest.comments 0 5000.
    • Every time we need to get a range of items for our latest comments usage, we call a function that will do the following (in pseudo code):
FUNCTION get_latest_comments(start, num_items):
    id_list = redis.lrange("latest.comments", start, start + num_items - 1)
    IF id_list.length < num_items
        id_list = SQL_DB("SELECT ... ORDER BY time LIMIT ...")
    END
    RETURN id_list
END

What we are doing here is simple. In Redis we are keeping a live cache, always updated, of the latest IDs. But we are limited to 5000 IDs, and right after the system is started for the first time there may be no IDs at all, as the list did not exist yet. So our function to get the IDs of the latest comments will always try asking Redis first. If our start/count parameters are out of range, we fall back to the database.

We never need to "refresh" the cache with this system, and the SQL database (or other type of on-disk data store) will only be pinged if the user is paginating "far" intervals. So never for the home page, and never for the first pages of our comments time line.

As you can see, Redis is working here as a new element. It is not working as a traditional cache: there are no cache refreshes, and the information in the Redis instance is always coherent. Nor is it working as a database, since you can flush the key and everything will continue working. I just call it a "live cache", but I bet there are better names.
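
As a concrete illustration, here is a minimal sketch of this pattern in Python with the redis-py client. The db_fetch_comment_ids helper is a hypothetical stand-in for your SQL fallback query; everything else follows the commands above.

import redis

r = redis.Redis(decode_responses=True)

def add_comment(comment_id):
    # Push the new ID and trim, exactly as in the pattern above.
    r.lpush("latest.comments", comment_id)
    r.ltrim("latest.comments", 0, 5000)

def get_latest_comments(start, num_items):
    ids = r.lrange("latest.comments", start, start + num_items - 1)
    if len(ids) < num_items:
        # Out of the cached range: fall back to the database.
        ids = db_fetch_comment_ids(start, num_items)  # hypothetical SQL helper
    return ids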

Deletion and filtering

Note that it is possible to handle comment deletion using LREM. If deletions are pretty rare, another option is to simply skip the entry when rendering the specific comment, since our DB query to fetch the comment by ID will tell us the comment is no longer there.

Also, many times you want to have different listings with different filters. When these filters are limited in number (for example, categories), you can simply use a different Redis list for every filter you have. After all you are just taking 5000 items per list, and Redis can hold millions of items with little memory. As usual it is a compromise: use your creativity!

Leaderboards and related problems

Another very common need that is hard to model with good performance in DBs that are not in-memory is taking a list of items, sorted by a score, updated in real time, with many updates arriving every second.

The classical example is the leaderboard of an online game, for instance a Facebook game, but this pattern can be applied to a number of different scenarios. In the online game example you receive a very high number of score updates from different users. With these scores you usually want to:

  • Show a leaderboard with the top 100 scores.
  • Show each user their current global rank.

These operations are trivial using a Redis sorted set, even if you have millions of users and millions of new scores per minute.

This is how to implement the pattern: every time a new score is received for a user, we do:

ZADD leaderboard <score> <username>

Note: you may want to use the user ID instead of the username; it is up to your design.

Getting the top 100 users by score is as easy as ZREVRANGE leaderboard 0 99.

Similarly, to tell a user their global rank you just do ZREVRANK leaderboard <username> (ZRANK would give the rank counted from the lowest score up instead).

Note that you can do more than this: for instance, it is trivial to show a user the scores of the users "near" their position, that is, the portion of the leaderboard that includes our user's score.
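
As a concrete sketch, the whole leaderboard flow in Python with redis-py might look like this (using user IDs as members, per the note above; key and function names are just examples):

import redis

r = redis.Redis(decode_responses=True)

def submit_score(user_id, score):
    # One ZADD per update keeps the set ordered at all times.
    r.zadd("leaderboard", {user_id: score})

def top_100():
    # Highest scores first.
    return r.zrevrange("leaderboard", 0, 99, withscores=True)

def global_rank(user_id):
    # 0-based rank in descending score order; add 1 for display.
    rank = r.zrevrank("leaderboard", user_id)
    return None if rank is None else rank + 1

def around_user(user_id, radius=5):
    # The slice of the leaderboard surrounding this user's position.
    rank = r.zrevrank("leaderboard", user_id)
    if rank is None:
        return []
    return r.zrevrange("leaderboard", max(rank - radius, 0), rank + radius, withscores=True)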

Order by user votes and time

A notable variation of the above leaderboard pattern is the implementation of a site like Reddit or Hacker News, where news items are ordered according to a formula similar to:

score = points / time^alpha

So user votes raise a news item proportionally, while the passage of time pushes it down. The actual algorithm is up to you; it will not change our pattern.

This pattern works as follows, starting from the observation that probably only the latest, say, 1000 news items are good candidates for the home page, so we can ignore all the others. The implementation is simple:

  • Every time a news item is posted, we add its ID into a list, with LPUSH + LTRIM, in order to keep only the latest 1000 items.
  • A worker gets this list and continually computes the final score of each of these 1000 news items. The result is used to populate a sorted set with ZADD; in the meantime, old news items are removed from the sorted set as a cleanup operation.

At this point we have a sorted set composed of 1000 news items sorted by our score. This sorted set can be queried 100k times per second for the top news, so it will be easy to scale the site this way.

The key idea here is that the sorting done by the background worker is not work proportional to the number of users viewing the news site.

For the "just posted" section the list of IDs can be used raw, or using the first pattern proposed in this blog post.

Implement expires on items

Another way to use sorted sets is to index things by time: we just use the unix time as the score. This can be used in general to index items by time, but a notable use is expiring things in our main database once a given amount of time has elapsed.

This is the pattern:

  • Every time a new item is added to our (non-Redis) database, we also add it into the sorted set. As the score we use the time at which the item should expire, in other words current_time + time_to_live.
  • A background worker queries the sorted set, for instance with ZRANGE ... WITHSCORES, fetching the 10 items due to expire soonest (the lowest scores). If any scores are unix times already in the past, we delete those items from the database.
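
A minimal sketch of this pattern in Python, assuming a hypothetical db_delete helper that removes the row from your primary database:

import time
import redis

r = redis.Redis(decode_responses=True)

def track_item(item_id, time_to_live):
    # The score is the absolute unix time at which the item should expire.
    r.zadd("expire.index", {item_id: time.time() + time_to_live})

def expire_pass():
    # Items whose score is in the past are due for deletion.
    for item_id in r.zrangebyscore("expire.index", 0, time.time()):
        db_delete(item_id)  # hypothetical: delete the row from the main database
        r.zrem("expire.index", item_id)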

Counting stuff

Redis is a good counter, thanks to INCRBY and other similar commands.

How many times have you wanted to add new counters to your database, to gather statistics or to show new information to your users, but avoided it because counting is a write-intensive task for your database? It happened to me many times in the past.

Well, just use Redis and stop worrying! With atomic increments you can keep all your counts, reset them atomically with GETSET if needed, and put expires on your counters, so that you count events only when the time between them is less than a given number of seconds.

For instance using just:

INCR user:<id>
EXPIRE user:<id> 60

You can count how many page views the user performed recently, without ever pausing more than 60 seconds between page views. When this count reaches, for instance, 20, it is time to show some banner, or reminder, or tip, or whatever you want.
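
In Python with redis-py the same idea could look like the sketch below; the pipeline (a MULTI/EXEC transaction by default in redis-py) keeps the increment and the expire together, and show_banner is a hypothetical UI hook:

import redis

r = redis.Redis(decode_responses=True)

def hit(user_id):
    key = "user:%s" % user_id
    pipe = r.pipeline()
    pipe.incr(key)        # count this page view
    pipe.expire(key, 60)  # restart the 60 second inactivity window
    count, _ = pipe.execute()
    return count

if hit(42) >= 20:
    show_banner()  # hypothetical: time to show a banner, reminder, or tip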

Unique N items in a given amount of time

Another interesting example of a statistic that is trivial to compute using Redis but very hard with other kinds of databases is the number of unique users that visited a given resource in a given amount of time. For instance, I want to know the number of unique registered users, or IP addresses, that accessed a given article in an online newspaper.

Every time I get a new pageview I just do the following:

SADD page:day1:<page_id> <user_id>

Of course instead of day1 you may want to use the first second of today as unix time, like time()-(time()%(3600*24)), or something like that.

Want to know the number of unique users? Just do SCARD page:day1:<page_id>.

Need to test whether a specific user already accessed that page? Just do SISMEMBER page:day1:<page_id> <user_id>.
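
A sketch of the whole pattern in Python; the day-bucketed key naming follows the commands above, with the bucket computed as suggested:

import time
import redis

r = redis.Redis(decode_responses=True)

def day_key(page_id):
    # First second of today, as unix time, used as the day bucket.
    day = int(time.time()) - (int(time.time()) % (3600 * 24))
    return "page:%d:%s" % (day, page_id)

def record_view(page_id, user_id):
    r.sadd(day_key(page_id), user_id)

def unique_visitors(page_id):
    return r.scard(day_key(page_id))

def has_visited(page_id, user_id):
    return r.sismember(day_key(page_id), user_id)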

Real time analysis of what is happening, for stats, anti spam, or whatever

These are just a few examples, but if you study the Redis command set and combine the data structures in interesting ways, you can model a huge number of real-time stats with little effort, to power your anti-spam systems or to improve the quality of service you provide to users thanks to the new information.

Pub/Sub

Do you know that Redis includes a fairly high performance implementation of Pub/Sub?

Redis Pub/Sub is very simple to use, stable, and fast, with support for pattern matching, the ability to subscribe/unsubscribe to channels on the fly, and so forth. You can read more about it in the official Redis Pub/Sub documentation.
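
A minimal subscribe/publish sketch with redis-py; the channel name chat.room1 is just an example:

import redis

r = redis.Redis(decode_responses=True)

# Subscriber side: block and print incoming messages.
p = r.pubsub()
p.subscribe("chat.room1")
for message in p.listen():
    if message["type"] == "message":
        print(message["data"])

# Publisher side, from another connection or process:
# r.publish("chat.room1", "hello everyone")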

Queues

You probably already noticed how Redis commands like list push and list pop make it suitable for implementing queues, but you can do more than that: Redis has blocking variants of the list pop commands that block if the list is empty.

A common usage of Redis as a queue is the Resque library, implemented and popularized by GitHub's folks.

With the RPOPLPUSH list rotation command (http://redis.io/commands/rpoplpush) it is possible to implement queues with interesting semantics that will make your background workers happier! (For instance, you can implement a rotating list to fetch RSS feeds again and again, so that every worker picks the RSS feed that was fetched longest ago, and thus most needs to be updated.) Similarly, using sorted sets it is possible to implement priority queues easily.
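
Here is a sketch of both flavors in Python: a simple blocking work queue using BRPOP, and the RPOPLPUSH rotation applied to a list of feed URLs (the jobs and rss.feeds keys and the process handler are assumptions):

import redis

r = redis.Redis(decode_responses=True)

# Simple blocking queue: producers LPUSH, workers block on BRPOP.
def enqueue(job):
    r.lpush("jobs", job)

def worker_loop():
    while True:
        _, job = r.brpop("jobs")  # blocks until a job is available
        process(job)              # hypothetical job handler

# Rotating list: each call returns the feed fetched longest ago
# and moves it back to the front of the same list.
def next_feed():
    return r.rpoplpush("rss.feeds", "rss.feeds")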

Caching

This section alone would deserve a specific blog post... so in short, here I'll just say that Redis can be used as a replacement for memcached, turning your cache into something able to store data in a simpler-to-update way, so that there is no need to regenerate the data every time. See for reference the first pattern published in this article.
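
For instance, storing a cached object as a hash lets you update a single field in place instead of regenerating and re-setting the whole blob; a sketch, with key and field names as assumptions:

import redis

r = redis.Redis(decode_responses=True)

# Cache the object as a hash of independently updatable fields.
r.hset("cache:user:42", mapping={"name": "antirez", "karma": "100"})
r.expire("cache:user:42", 300)  # still expires like a normal cache entry

# Later: update just one field, no need to regenerate the whole object.
r.hset("cache:user:42", "karma", "101")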

Redis can fix your problems now!

You can use Redis right now to make your users happier, your systems less complex, and your site more responsive. You don't need to replace your current setup in order to use it; just start using Redis to do new things that were otherwise impossible, hard, or too costly.

Have fun!

You can discuss this entry here or on Hacker News.
