Django3中分组查询的一个坑

最近在用Django3的分组查询时,发现一个问题,通过查阅官方文档最后总算搞清楚了。

一、问题描述

1.1 问题

我的数据模型Teacher,有这么几个字段:id, name, department, rank, 我想查询每个部分有多少人。

1.2 我的代码

# 就是根据depantment分组,然后利用count函数数出每一组有多少个名字
models.Teacher.objects.values('department').annotate(t_num=Count('name'))

1.3 出现的结果:

# 查询出的结果不是我想要的,似乎分组的依据不对。加上query
models.Teacher.objects.values('department').annotate(t_num=Count('name')).query
# 根据打印出的SQL语句,GROUP_BY 后面竟然有两个依据字段,那么出来的分组是根据这两个字段的唯一性来分组的。并且多出来的那个字段似乎是最后一个字段。

1.4 那多出来的那个分组依据字段是哪里来的呢?

二、解决问题

2.1 经过多次尝试,发现先加上order_by('department') 可以查询出正确的结果

models.Teacher.objects.order_by(
  'department').values('department').annotate(t_num=Count('name'))

2.2 可问题究竟出在哪儿?那个神秘的字段究竟是从哪儿来的?百度了半天也没有一个合理的解释。

三、查阅官方文档

有问题实在解决不了,最好不是去看官方文档吧!这段话的意思就是说:如果前面的queryset有用过order_by(),那么这个字段还会影响后面的分组查询操作。你要想得到正确结果,后面要加上一个order_by()。这个括号里就什么也不要写。这样就可以清除前面order_by的影响。

  • Interaction with order_by()

Fields that are mentioned in the order_by() part of a queryset are used when selecting the output data, even if they are not otherwise specified in the values() call. These extra fields are used to group “like” results together and they can make otherwise identical result rows appear to be separate. This shows up, particularly, when counting things.

By way of example, suppose you have a model like this:

from django.db import models

class Item(models.Model):
    name = models.CharField(max_length=10)
    data = models.IntegerField()

If you want to count how many times each distinct data value appears in an ordered queryset, you might try this:

items = Item.objects.order_by('name')
# Warning: not quite correct!
items.values('data').annotate(Count('id'))

…which will group the Item objects by their common data values and then count the number of id values in each group. Except that it won’t quite work. The ordering by name will also play a part in the grouping, so this query will group by distinct (data, name) pairs, which isn’t what you want. Instead, you should construct this queryset:

items.values('data').annotate(Count('id')).order_by()

…clearing any ordering in the query. You could also order by, say, data without any harmful effects, since that is already playing a role in the query.

This behavior is the same as that noted in the queryset documentation for distinct() and the general rule is the same: normally you won’t want extra columns playing a part in the result, so clear out the ordering, or at least make sure it’s restricted only to those fields you also select in a values() call.

Note

You might reasonably ask why Django doesn’t remove the extraneous columns for you. The main reason is consistency with distinct() and other places: Django never removes ordering constraints that you have specified (and we can’t change those other methods’ behavior, as that would violate our API stability policy).

四、终极答案

官方文档虽然解释了确实是order_by()在影响后面的操作,也给出了清除影响的解决方案。可我们的查询在前面并没有使用order_by。于是,那个困扰我的依据字段究竟是从哪里来的呢?

# 我尝试了一下这样的语句
models.Teacher.objects.all().query
# 结果出来后,看到SQL语句后面有一个ORDER_BY()。依据字段就是最后一个字段。
# 也就是说,这个all()查询操作会自动加上一个默认排序。到这儿,才算是明白了。
# 后来经过测试,果然在后面加上ORDER_BY(),一切就正常了。
上一篇:【数据分析面试】大厂高频SQL笔试题(四)


下一篇:数据库基础笔记(MySQL)2 —— 基础查询 ①