mysql – Django ORM – Grouped aggregates with different select clauses

Imagine we have a Django ORM model, Meetup, defined as follows:

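from django.db import models

class Meetup(models.Model):
    # CharField requires max_length; 100 is an assumed value
    language = models.CharField(max_length=100)
    speaker = models.CharField(max_length=100)
    date = models.DateField(auto_now=True)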

I'd like to use a single query to fetch the language, speaker and date
of the latest event for each language.

>>> Meetup.objects.create(language='python', speaker='mike')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='python', speaker='ryan')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='node', speaker='noah')
<Meetup: Meetup object>
>>> Meetup.objects.create(language='node', speaker='shawn')
<Meetup: Meetup object>
>>> Meetup.objects.values("language").annotate(latest_date=models.Max("date")).values("language", "speaker", "latest_date")
[
    {'speaker': u'mike', 'language': u'python', 'latest_date': ...}, 
    {'speaker': u'ryan', 'language': u'python', 'latest_date': ...}, 
    {'speaker': u'noah', 'language': u'node', 'latest_date': ...}, 
    {'speaker': u'shawn', 'language': u'node', 'latest_date': ...}, 
]

D'oh! We're getting the latest event date, but with the wrong grouping!

It seems like I need a way to GROUP BY the language but SELECT
a different set of fields?

Update – This sort of query seems fairly easy to express in SQL:

SELECT language, speaker, MAX(date)
FROM app_meetup
GROUP BY language;

I'd love a way to do this without dropping down to Django's raw() – is that possible?

Update 2 – After much searching, it seems there are similar questions on SO:

> Django Query that gets the most recent objects
> How can I do a greatest n per group query in Django
> MySQL calls this sort of query a group-wise maximum of a certain column.

Update 3 – In the end, with @danihp's help, it seems the best you can do
is two queries. I used the following approach:

from django.db.models import Max

# Abuse the fact that the latest Meetup always has a higher PK to build
# a ValuesList of the latest Meetups grouped by "language".
latest_meetup_pks = (Meetup.objects.values("language")
                                   .annotate(latest_pk=Max("pk"))
                                   .values_list("latest_pk", flat=True))

# Use a second query to grab those latest Meetups!
Meetup.objects.filter(pk__in=latest_meetup_pks)
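
For reference, this is roughly what the approach above boils down to in SQL. It is only a minimal sketch, run against an in-memory SQLite database through Python's sqlite3 module. The table name app_meetup comes from the question; the id column and the sample dates are assumptions for illustration, and the SQL Django actually generates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup ("
    "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
# Assumed sample rows: within each language, the higher id has the later date.
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [
        ("python", "mike", "2012-01-01"),
        ("python", "ryan", "2012-02-01"),
        ("node", "noah", "2012-01-15"),
        ("node", "shawn", "2012-03-01"),
    ],
)

# Pick the highest id per language, then fetch the matching rows.
latest_rows = conn.execute(
    "SELECT language, speaker, date FROM app_meetup "
    "WHERE id IN (SELECT MAX(id) FROM app_meetup GROUP BY language) "
    "ORDER BY language"
).fetchall()
print(latest_rows)
```

The result is only correct while primary keys grow together with the date, which is exactly the assumption the code comment above admits to abusing.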

This question is a follow-up to my previous question:

Django ORM – Get latest record for group

Solution:

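This kind of query is easy to explain but hard to write. If this were SQL, I would suggest a CTE-filtered query that ranks rows over a partition by language, ordered by date (desc).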
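
To make that idea concrete, here is a sketch of such a ranked query, run on an in-memory SQLite database with Python's sqlite3 module. It assumes a SQLite build with window-function support (3.25 or newer). The app_meetup table name is taken from the question, while the id column and the sample dates are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup ("
    "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
# Assumed sample rows with distinct dates per language.
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [
        ("python", "mike", "2012-01-01"),
        ("python", "ryan", "2012-02-01"),
        ("node", "noah", "2012-01-15"),
        ("node", "shawn", "2012-03-01"),
    ],
)

# Rank rows inside each language by date (newest first), keep rank 1.
query = """
WITH ranked AS (
    SELECT language, speaker, date,
           ROW_NUMBER() OVER (
               PARTITION BY language
               ORDER BY date DESC
           ) AS rn
    FROM app_meetup
)
SELECT language, speaker, date
FROM ranked
WHERE rn = 1
ORDER BY language
"""
latest_per_language = conn.execute(query).fetchall()
print(latest_per_language)
```

Because ROW_NUMBER() assigns a unique rank within each partition, this returns exactly one row per language, even if two meetups share the latest date (which of the tied rows wins is then arbitrary).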

But this is not SQL, this is the Django query API. The easy way is to run one query per language:

# values_list(..., flat=True) yields plain language strings
languages = Meetup.objects.values_list("language", flat=True).distinct().order_by()
last_by_language = [
    Meetup.objects.filter(language=l).latest("date")
    for l in languages
]

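This fails if some language has no meetups, because latest() raises DoesNotExist on an empty queryset.
Another approach is to get the maximum date for each language: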

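from functools import reduce  # reduce is not a builtin in Python 3

from django.db import models
from django.db.models import Q

last_dates = (Meetup.objects
              .values("language")
              .annotate(ldate=models.Max("date"))
              .order_by())

# OR together one (language, latest date) condition per language
q = reduce(
    lambda q, meetup: q | (Q(language=meetup["language"]) & Q(date=meetup["ldate"])),
    last_dates, Q())

your_query = Meetup.objects.filter(q)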

Perhaps someone can explain how to do this in a single query without raw SQL.

Edited due to OP's comment

You are looking for:

"SELECT language, speaker, MAX(date) FROM app_meetup GROUP BY language"

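Not all RDBMSs support this expression, because every field in the select clause that is not enclosed in an aggregate function should appear in the group by clause. In your case, speaker is in the select clause (without an aggregate function) but does not appear in the group by.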

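In MySQL there is no guarantee that the speaker shown in the result is the one that matches the max date. Because of this, we are not dealing with a simple query.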

Quoting the MySQL docs:

In standard SQL, a query that includes a GROUP BY clause cannot refer
to nonaggregated columns in the select list that are not named in the
GROUP BY clause…However, this is useful primarily when all values
in each nonaggregated column not named in the GROUP BY are the same
for each group.
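
One portable way to express the group-wise maximum without selecting a non-aggregated column outside the GROUP BY is to join the table against its own per-language maxima. The sketch below runs that query on an in-memory SQLite database through Python's sqlite3 module. The app_meetup table name comes from the question; the id column and the sample dates (including the deliberate tie for node) are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE app_meetup ("
    "id INTEGER PRIMARY KEY, language TEXT, speaker TEXT, date TEXT)"
)
# Assumed sample rows; both node meetups share the same latest date.
conn.executemany(
    "INSERT INTO app_meetup (language, speaker, date) VALUES (?, ?, ?)",
    [
        ("python", "mike", "2012-01-01"),
        ("python", "ryan", "2012-02-01"),
        ("node", "noah", "2012-03-01"),
        ("node", "shawn", "2012-03-01"),
    ],
)

# The subquery is a valid GROUP BY (only language and an aggregate);
# the join then recovers the full rows that carry each maximum date.
query = """
SELECT m.language, m.speaker, m.date
FROM app_meetup AS m
JOIN (
    SELECT language, MAX(date) AS ldate
    FROM app_meetup
    GROUP BY language
) AS latest
  ON latest.language = m.language AND latest.ldate = m.date
ORDER BY m.language, m.speaker
"""
groupwise_max = conn.execute(query).fetchall()
print(groupwise_max)
```

Note that ties are kept: when several meetups in one language share the latest date, all of them come back.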

The closest query that matches your requirement is:

results = (Meetup.objects
           .values("language", "speaker")
           .annotate(ldate=models.Max("date"))
           .order_by())
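
Keep in mind that this groups by the pair (language, speaker), so it yields one row per speaker within each language rather than one row per language. A plain-Python sketch of the same grouping, using made-up sample data, shows the shape of the result:

```python
from datetime import date

# Assumed sample data, for illustration only.
meetups = [
    {"language": "python", "speaker": "mike", "date": date(2012, 1, 1)},
    {"language": "python", "speaker": "mike", "date": date(2012, 4, 1)},
    {"language": "python", "speaker": "ryan", "date": date(2012, 2, 1)},
    {"language": "node", "speaker": "noah", "date": date(2012, 1, 15)},
]

# Group by (language, speaker) and keep the latest date in each group.
latest = {}
for m in meetups:
    key = (m["language"], m["speaker"])
    if key not in latest or m["date"] > latest[key]:
        latest[key] = m["date"]

results = [
    {"language": lang, "speaker": spk, "ldate": d}
    for (lang, spk), d in sorted(latest.items())
]
print(results)
```

Here python appears twice (once for mike, once for ryan), so a further step is still needed if exactly one row per language is required.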