如何优化/简化Django对象的堆排序? (不能使用模块和数据库排序)

我必须为我作为django实习考试而得到的作业寻求帮助.我必须用兔子和它们的胡萝卜制作和想象的api.每只兔子都应该有许多胡萝卜,但是必须将api设计为允许轻松添加其他种类的蔬菜.我拒绝每种蔬菜的整数字段,而是使用具有蔬菜类型和值的蔬菜对象.

问题是,任务还包括列出按胡萝卜排序的兔子,降落的兔子.他们要我实现堆排序,不允许数据库排序,没有外部库.尽管我对此没有任何问题,但他们在考虑到时间限制时遇到了麻烦-将20 000只兔子在30秒以内分类,理想情况下是5秒. 200只兔子已经花了5秒(只需排序并序列化为json).

我进行了一个查询集,只包含带有“胡萝卜”蔬菜的兔子.然后,我将其强制进入正常列表并在其上运行heapsort函数.

我需要如何将其更改为更快?可能吗?如果有人帮助我,我将非常高兴.先感谢您!

我的模特:

class Bunny(models.Model):
    """Bunny model for bunny usage"""
    def __str__(self):
        return self.name + " " + str(list(self.vegetables.all()))

    name = models.CharField("Name", max_length=50)
    userAccount = models.ForeignKey(User, on_delete=models.CASCADE)

    def getVegetable(self, vegetableType):
        for vegetable in self.vegetables.all():
            if vegetable.vegetableType == vegetableType:
                return vegetable
        return False


class Vegetable(models.Model):
    """Vegetable model for storing vegetable counts"""
    def __str__(self):
        return self.vegetableType + ":" + str(self.value)

    vegetableType = models.CharField(max_length=30, choices=vegetableChoices)
    value = models.PositiveIntegerField(default=0, validators=[MinValueValidator(0)])
    bunny = models.ForeignKey(Bunny, related_name="vegetables", on_delete=models.CASCADE)

我的堆排序函数:

def heapsort(bunnies, vegetableType):
    """Heapsort function for bunnies, works in place, descending"""

    for start in range((len(bunnies) - 2) // 2, -1, -1):
        siftdown(bunnies, start, len(bunnies) - 1, vegetableType)

    for end in range(len(bunnies) - 1, 0, -1):
        bunnies[end], bunnies[0] = bunnies[0], bunnies[end]
        siftdown(bunnies, 0, end - 1, vegetableType)
    return bunnies


def siftdown(bunnies, start, end, vegetableType):
    """helper function for heapsort"""
    root = start
    while True:
        child = root * 2 + 1
        if child > end:
            break
        if child + 1 <= end and bunnies[child].vegetables.get(vegetableType=vegetableType).value > bunnies[
                    child + 1].vegetables.get(vegetableType=vegetableType).value:
            child += 1
        if bunnies[root].vegetables.get(vegetableType=vegetableType).value > bunnies[child].vegetables.get(
                vegetableType=vegetableType).value:
            bunnies[root], bunnies[child] = bunnies[child], bunnies[root]
            root = child
        else:
            break

以及他们要求的性能测试(我不知道有更好的方法.只创建兔子需要很长时间)

def test_20000_rabbits_performance(self):
    print("Creating bunnies")
    register20000Bunnies()

    print("Created bunnies")
    timestart = time()

    url = reverse("api:list", args=["carrots"])

    response = self.client.get(url)
    timeMeasured = time() - timestart
    print("Sorted. Took: " + str(timeMeasured))

    self.assertEqual(response.status_code, status.HTTP_200_OK)

我的观点:

@api_view(["GET"])
def bunnyList(request, vegetableType):
""" Displays heap-sorted list of bunnies, in decreasing order.
    Takes word after list ("/list/xxx") as argument to determine
    which vegetable list to display"""
    if vegetableType in vegetablesChoices:
        bunnies =
    Bunny.objects.filter(vegetables__vegetableType=vegetableType)
        bunnies = list(bunnies)  # force into normal list

        if len(bunnies) == 0:
            return Response({"No bunnies": "there is %d bunnies with this vegetable" % len(bunnies)},
                        status=status.HTTP_204_NO_CONTENT)

        heapsort(bunnies, vegetableType)
        serialized = BunnySerializerPartial(bunnies, many=True)
        return Response(serialized.data, status=status.HTTP_200_OK)
    else:
        raise serializers.ValidationError("No such vegetable. Available are: " + ", ".join(vegetablesChoices))

编辑:现在才检查,当前排序需要1202秒…我的机器是2核1.86GHz,但是仍然.

Edit2,新代码:

@api_view(["GET"])
def bunnyList(request, vegetableType):
""" Displays heap-sorted list of bunnies, in decreasing order.
    Takes word after list ("/list/xxx") as argument to determine
    which vegetable list to display"""
if vegetableType in vegetablesChoices:
    vegetables =  Vegetable.objects.filter(vegetableType=vegetableType).select_related('bunny')
    vegetables = list(vegetables)

    if len(vegetables) == 0:
        return Response({"No bunnies": "there is 0 bunnies with this vegetable"},
                        status=status.HTTP_204_NO_CONTENT)

    heapsort(vegetables)

    bunnies = [vegetable.bunny for vegetable in vegetables]
    serialized = BunnySerializerPartial(bunnies, many=True)
    return Response(serialized.data, status=status.HTTP_200_OK)
else:
    raise serializers.ValidationError("No such vegetable. Available are: " + ", ".join(vegetablesChoices))

更新的堆排序:

def heapsort(vegetables):
"""Heapsort function for vegetables, works in place, descending"""

for start in range((len(vegetables) - 2) // 2, -1, -1):
    siftdown(vegetables, start, len(vegetables) - 1)

for end in range(len(vegetables) - 1, 0, -1):
    vegetables[end], vegetables[0] = vegetables[0], vegetables[end]
    siftdown(vegetables, 0, end - 1)
return vegetables


def siftdown(vegetables, start, end):
"""helper function for heapsort"""
root = start
while True:
    child = root * 2 + 1
    if child > end:
        break
    if child + 1 <= end and vegetables[child].value > vegetables[child+1].value:
        child += 1
    if vegetables[root].value > vegetables[child].value:
        vegetables[root], vegetables[child] = vegetables[child], vegetables[root]
        root = child
    else:
        break

我的序列化器:

class BunnySerializerPartial(serializers.ModelSerializer):
"""Used in list view, mirrors BunnySerializerFull but without account details"""
    vegetables = VegetableSerializer(many=True)

    class Meta:
        model = Bunny
        fields = ("name", "vegetables")


class VegetableSerializer(serializers.ModelSerializer):
"""Used for displaying vegetables, for example in list view"""
    class Meta:
        model = Vegetable
        fields = ("vegetableType", "value")

并从工具栏查询:

SELECT ••• FROM "zajaczkowskiBoardApi_vegetable" INNER JOIN "zajaczkowskiBoardApi_bunny" ON ("zajaczkowskiBoardApi_vegetable"."bunny_id" = "zajaczkowskiBoardApi_bunny"."id") WHERE "zajaczkowskiBoardApi_vegetable"."vegetableType" = '''carrots'''


SELECT ••• FROM "zajaczkowskiBoardApi_vegetable" WHERE "zajaczkowskiBoardApi_vegetable"."bunny_id" = '141'

第二个重复2万次

解决方法:

这是经典的N 1查询问题.您执行单个查询以获取所有兔子,但是接着继续为每个兔子执行bunnies [child] .vegetables.get(vegetableType = vegetableType),这将为每个兔子执行附加查询,从而执行附加数据库往返兔子.因此,您对N个兔子执行1个查询,再对N个查询进行查询以获取所有蔬菜(因此N 1).

数据库往返是Web开发人员可以使用的最昂贵的资源之一.虽然比较花费的时间大约是纳秒级,但数据库往返花费的时间大约是毫秒级.进行约2万次查询,这很快就会花费几分钟.

快速的解决方案是使用prefetch_related(‘vegetables’),而专门使用bunny.getVegetable(‘carrot’)来获得胡萝卜. prefetch_related()将执行单个查询以获取所有兔子的所有蔬菜并将其缓存,因此在getVegetables()中迭代self.vegetables.all()不会执行任何其他查询.

不过,还有更好的解决方案.在这种情况下,每个兔子似乎最多只能有1个特定蔬菜类型的蔬菜对象.如果您在数据库级别实施此操作,那么当有人决定向兔子添加第二种“胡萝卜”蔬菜时,您就不必担心排序算法中的错误.相反,数据库将首先阻止他们这样做.为此,您需要一个unique_together约束:

class Vegetable(models.Model):
    ...
    class Meta:
        unique_together = [
            ('vegetableType', 'bunny'),
        ]

然后,您可以获取所有类型为“ carrot”和join相关兔子的蔬菜,而不是获取所有兔子的图像并预取所有相关的蔬菜.现在您将只有一个查询:

carrots = Vegetable.objects.filter(vegetableType='carrot').select_related('bunny')

由于VegetablesType和Bunny的组合是唯一的,因此您将不会得到任何重复的兔子,并且您仍然会获得所有带有一些胡萝卜的兔子.

当然,您必须调整算法以适用于蔬菜,而不是兔子.

上一篇:Python-完美数搜索的优化


下一篇:Java的联合几何在JTS更快?