Opened 8 years ago
Closed 8 years ago
#27159 closed Bug (fixed)
Pickling query with an __in=inner_qs lookup causes evaluation evaluation of inner_qs
Reported by: | Jani Tiainen | Owned by: | Jani Tiainen |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.10 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Accepted | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
According to documentation pickling queryset.query should be correct approach and it should only pickle only relevant information to recreate queryset.query again.
Given models:
from django.db import models # Create your models here. class ParentModel(models.Model): name = models.CharField(max_length=64) class ChildModel(models.Model): name = models.CharField(max_length=64) parent = models.ForeignKey('ParentModel', related_name='children')
Following querys, when pickled causes unwanted evaluation and creating quite large pickled result:
for x in range(1,10000): ParentModel(name='Parent {}'.format(x), ).save() ChildModel(name='Child 1', parent=ParentModel.objects.all()[0]).save() parents_1 = ParentModel.objects.all().values_list('pk', flat=True) children_1 = ChildModel.objects.filter(parent__in=parents_1) pickled_stuff_1 = pickle.dumps(children_1.query) parents_2 = ParentModel.objects.all() children_2 = ChildModel.objects.filter(parent__in=parents_2) pickled_stuff_2 = pickle.dumps(children_2.query) # First len is about 74 kilobytes, second len is about 2 megabytes. print len(pickled_stuff_1), len(pickled_stuff_2)
When comparing sizes of pickled queries both are relatively big. When inspecting latter pickle it can be seen that it actually contains all instances from fully evaluated queryset. Same behavior exists in 1.8 and 1.10 as well.
Attachments (1)
Change History (13)
comment:1 by , 8 years ago
Component: | Uncategorized → Database layer (models, ORM) |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | Uncategorized → Bug |
comment:2 by , 8 years ago
I added crude, initial test to queryset_pickle
to https://github.com/jtiai/django/tree/issue_27159
Interesting enough, if you check pickled output it's clearly visible that it contains all instances from Parent-model.
If you compare query output as a string both are equal.
comment:3 by , 8 years ago
After doing some debugging it looks like pickle goes to WhereNode
which in turn goes to RelatedIn
query, which goes to QuerySet.__getstate__
which, by design populates full cache.
comment:4 by , 8 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
by , 8 years ago
Attachment: | ticket_27159.diff added |
---|
comment:5 by , 8 years ago
I deleted self._fetch_all()
from django/db/models/query.py QuerySet.__getstate__
, so now it doesn't populate the cache before pickling. Is it a valid solution? All tests are passing.
comment:7 by , 8 years ago
Patch needs improvement: | set |
---|
I left some comments for improvement on the PR.
comment:8 by , 8 years ago
Patch needs improvement: | unset |
---|
comment:9 by , 8 years ago
Owner: | changed from | to
---|---|
Summary: | Pickling query from queryset causes unintended queryset evaluation → Pickling query with an __in=inner_qs lookup causes evaluation evaluation of inner_qs |
comment:11 by , 8 years ago
Patch needs improvement: | unset |
---|
Made all suggested changes and improved test cases.
I didn't attempt to reproduce but I trust the reporter. If you could transform the reproduction information into a test case for
queryset_pickle
, that's always helpful.