我必须为我作为django实习考试而得到的作业寻求帮助.我必须用兔子和它们的胡萝卜制作和想象的api.每只兔子都应该有许多胡萝卜,但是必须将api设计为允许轻松添加其他种类的蔬菜.我拒绝每种蔬菜的整数字段,而是使用具有蔬菜类型和值的蔬菜对象.
问题是,任务还包括列出按胡萝卜排序的兔子,降落的兔子.他们要我实现堆排序,不允许数据库排序,没有外部库.尽管我对此没有任何问题,但他们在考虑到时间限制时遇到了麻烦-将20 000只兔子在30秒以内分类,理想情况下是5秒. 200只兔子已经花了5秒(只需排序并序列化为json).
我进行了一个查询集,只包含带有“胡萝卜”蔬菜的兔子.然后,我将其强制进入正常列表并在其上运行heapsort函数.
我需要如何将其更改为更快?可能吗?如果有人帮助我,我将非常高兴.先感谢您!
我的模特:
class Bunny(models.Model):
"""Bunny model for bunny usage"""
def __str__(self):
return self.name + " " + str(list(self.vegetables.all()))
name = models.CharField("Name", max_length=50)
userAccount = models.ForeignKey(User, on_delete=models.CASCADE)
def getVegetable(self, vegetableType):
for vegetable in self.vegetables.all():
if vegetable.vegetableType == vegetableType:
return vegetable
return False
class Vegetable(models.Model):
"""Vegetable model for storing vegetable counts"""
def __str__(self):
return self.vegetableType + ":" + str(self.value)
vegetableType = models.CharField(max_length=30, choices=vegetableChoices)
value = models.PositiveIntegerField(default=0, validators=[MinValueValidator(0)])
bunny = models.ForeignKey(Bunny, related_name="vegetables", on_delete=models.CASCADE)
我的堆排序函数:
def heapsort(bunnies, vegetableType):
"""Heapsort function for bunnies, works in place, descending"""
for start in range((len(bunnies) - 2) // 2, -1, -1):
siftdown(bunnies, start, len(bunnies) - 1, vegetableType)
for end in range(len(bunnies) - 1, 0, -1):
bunnies[end], bunnies[0] = bunnies[0], bunnies[end]
siftdown(bunnies, 0, end - 1, vegetableType)
return bunnies
def siftdown(bunnies, start, end, vegetableType):
"""helper function for heapsort"""
root = start
while True:
child = root * 2 + 1
if child > end:
break
if child + 1 <= end and bunnies[child].vegetables.get(vegetableType=vegetableType).value > bunnies[
child + 1].vegetables.get(vegetableType=vegetableType).value:
child += 1
if bunnies[root].vegetables.get(vegetableType=vegetableType).value > bunnies[child].vegetables.get(
vegetableType=vegetableType).value:
bunnies[root], bunnies[child] = bunnies[child], bunnies[root]
root = child
else:
break
以及他们要求的性能测试(我不知道有更好的方法.只创建兔子需要很长时间)
def test_20000_rabbits_performance(self):
print("Creating bunnies")
register20000Bunnies()
print("Created bunnies")
timestart = time()
url = reverse("api:list", args=["carrots"])
response = self.client.get(url)
timeMeasured = time() - timestart
print("Sorted. Took: " + str(timeMeasured))
self.assertEqual(response.status_code, status.HTTP_200_OK)
我的观点:
@api_view(["GET"])
def bunnyList(request, vegetableType):
""" Displays heap-sorted list of bunnies, in decreasing order.
Takes word after list ("/list/xxx") as argument to determine
which vegetable list to display"""
if vegetableType in vegetablesChoices:
bunnies =
Bunny.objects.filter(vegetables__vegetableType=vegetableType)
bunnies = list(bunnies) # force into normal list
if len(bunnies) == 0:
return Response({"No bunnies": "there is %d bunnies with this vegetable" % len(bunnies)},
status=status.HTTP_204_NO_CONTENT)
heapsort(bunnies, vegetableType)
serialized = BunnySerializerPartial(bunnies, many=True)
return Response(serialized.data, status=status.HTTP_200_OK)
else:
raise serializers.ValidationError("No such vegetable. Available are: " + ", ".join(vegetablesChoices))
编辑:现在才检查,当前排序需要1202秒…我的机器是2核1.86GHz,但是仍然.
Edit2,新代码:
@api_view(["GET"])
def bunnyList(request, vegetableType):
""" Displays heap-sorted list of bunnies, in decreasing order.
Takes word after list ("/list/xxx") as argument to determine
which vegetable list to display"""
if vegetableType in vegetablesChoices:
vegetables = Vegetable.objects.filter(vegetableType=vegetableType).select_related('bunny')
vegetables = list(vegetables)
if len(vegetables) == 0:
return Response({"No bunnies": "there is 0 bunnies with this vegetable"},
status=status.HTTP_204_NO_CONTENT)
heapsort(vegetables)
bunnies = [vegetable.bunny for vegetable in vegetables]
serialized = BunnySerializerPartial(bunnies, many=True)
return Response(serialized.data, status=status.HTTP_200_OK)
else:
raise serializers.ValidationError("No such vegetable. Available are: " + ", ".join(vegetablesChoices))
更新的堆排序:
def heapsort(vegetables):
"""Heapsort function for vegetables, works in place, descending"""
for start in range((len(vegetables) - 2) // 2, -1, -1):
siftdown(vegetables, start, len(vegetables) - 1)
for end in range(len(vegetables) - 1, 0, -1):
vegetables[end], vegetables[0] = vegetables[0], vegetables[end]
siftdown(vegetables, 0, end - 1)
return vegetables
def siftdown(vegetables, start, end):
"""helper function for heapsort"""
root = start
while True:
child = root * 2 + 1
if child > end:
break
if child + 1 <= end and vegetables[child].value > vegetables[child+1].value:
child += 1
if vegetables[root].value > vegetables[child].value:
vegetables[root], vegetables[child] = vegetables[child], vegetables[root]
root = child
else:
break
我的序列化器:
class BunnySerializerPartial(serializers.ModelSerializer):
"""Used in list view, mirrors BunnySerializerFull but without account details"""
vegetables = VegetableSerializer(many=True)
class Meta:
model = Bunny
fields = ("name", "vegetables")
class VegetableSerializer(serializers.ModelSerializer):
"""Used for displaying vegetables, for example in list view"""
class Meta:
model = Vegetable
fields = ("vegetableType", "value")
并从工具栏查询:
SELECT ••• FROM "zajaczkowskiBoardApi_vegetable" INNER JOIN "zajaczkowskiBoardApi_bunny" ON ("zajaczkowskiBoardApi_vegetable"."bunny_id" = "zajaczkowskiBoardApi_bunny"."id") WHERE "zajaczkowskiBoardApi_vegetable"."vegetableType" = '''carrots'''
SELECT ••• FROM "zajaczkowskiBoardApi_vegetable" WHERE "zajaczkowskiBoardApi_vegetable"."bunny_id" = '141'
第二个重复2万次
解决方法:
这是经典的N 1查询问题.您执行单个查询以获取所有兔子,但是接着继续为每个兔子执行bunnies [child] .vegetables.get(vegetableType = vegetableType),这将为每个兔子执行附加查询,从而执行附加数据库往返兔子.因此,您对N个兔子执行1个查询,再对N个查询进行查询以获取所有蔬菜(因此N 1).
数据库往返是Web开发人员可以使用的最昂贵的资源之一.虽然比较花费的时间大约是纳秒级,但数据库往返花费的时间大约是毫秒级.进行约2万次查询,这很快就会花费几分钟.
快速的解决方案是使用prefetch_related(‘vegetables’),而专门使用bunny.getVegetable(‘carrot’)来获得胡萝卜. prefetch_related()将执行单个查询以获取所有兔子的所有蔬菜并将其缓存,因此在getVegetables()中迭代self.vegetables.all()不会执行任何其他查询.
不过,还有更好的解决方案.在这种情况下,每个兔子似乎最多只能有1个特定蔬菜类型的蔬菜对象.如果您在数据库级别实施此操作,那么当有人决定向兔子添加第二种“胡萝卜”蔬菜时,您就不必担心排序算法中的错误.相反,数据库将首先阻止他们这样做.为此,您需要一个unique_together约束:
class Vegetable(models.Model):
...
class Meta:
unique_together = [
('vegetableType', 'bunny'),
]
然后,您可以获取所有类型为“ carrot”和join相关兔子的蔬菜,而不是获取所有兔子的图像并预取所有相关的蔬菜.现在您将只有一个查询:
carrots = Vegetable.objects.filter(vegetableType='carrot').select_related('bunny')
由于VegetablesType和Bunny的组合是唯一的,因此您将不会得到任何重复的兔子,并且您仍然会获得所有带有一些胡萝卜的兔子.
当然,您必须调整算法以适用于蔬菜,而不是兔子.