Opened 17 years ago
Closed 17 years ago
#6308 closed (fixed)
QueryDict won't release objects between requests when request.POST is copy() ed
Reported by: | Owned by: | Yasushi Masuda | |
---|---|---|---|
Component: | Core (Other) | Version: | dev |
Severity: | Keywords: | request.POST.copy, memory issue | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hi,
There seems to be the population of container argument in QueryDict.deepcopy(), which should cause incremental
(and unreleased) population of POST data between requests. Here is the relevant code in django.http.init.py:
174 def __deepcopy__(self, memo={}): 175 import copy 176 result = self.__class__('', mutable=True) 177 memo[id(self)] = result 178 for key, value in dict.items(self): 179 dict.__setitem__(result, copy.deepcopy(key, memo), copy.deepcopy(value, memo))
The behaviour can be easily demonstrated as follows:
from django.conf.urls.defaults import * from django.http import HttpResponse, QueryDict from django.template import Template, Context def demo(request): request.POST.copy() t = Template( '<html><body>' '<form method="POST" action="/"><input name="llama" type="submit"></form>' '{% for item in memo %}llama at {{ item.0 }}, {% endfor %}' '</body></html>') memo = QueryDict.__deepcopy__.im_func.func_defaults[0].items() c = Context({'memo': memo}) return HttpResponse(t.render(c)) urlpatterns = patterns('', url(r'^$', demo))
You should see llamas are populating between requests, and after enormous POST requests,
You can see the process size is significantly increased.
I think (I'm not sure the reason why memo is directly populated) it can be fixed rewriting deepcopy as follows:
def __deepcopy__(self, memo={}): import copy result = self.__class__('', mutable=True) m = dict(memo.items()) m[id(self)] = result for key, value in dict.items(self): dict.__setitem__(result, copy.deepcopy(key, m), copy.deepcopy(value, m)) return result
Thanks.
Attachments (1)
Change History (14)
comment:1 by , 17 years ago
by , 17 years ago
Attachment: | against_post_heaping_patch.diff added |
---|
A patch fixing POST data heaping
comment:2 by , 17 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:3 by , 17 years ago
Owner: | changed from | to
---|---|
Status: | assigned → new |
comment:4 by , 17 years ago
This patch completely breaks the deepcopy behaviour. The memo dictionary must be updated directly in order to avoid circular copying loops. Making a copy of that dictionary is wrong on any level: it's an internal Python data structure, not something we ever copy. We only update it.
I need to study what's going on here, but I your example isn't particularly convincing, either. If you intentionally create a circular reference to something deep in the code object and then copy it, yes, it will keep references around. That's how Python works. That's your own doing and not something we will protect you against (since you might well have meant to do it). Do you have a more realistic example of where this comes up in practice?
follow-up: 6 comment:5 by , 17 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
I've thought about this some more and I cannot see how there's problem unless you deliberately create a circular reference precisely as you've done (by diving into the bytecode). Even if you do that, Python behaves as expected, so there's no bug here.
Reopen if there's a legitimate use-case you are trying to address here (with a better example).
comment:6 by , 17 years ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
Replying to mtredinnick:
I've thought about this some more and I cannot see how there's problem unless you deliberately create a circular reference precisely as you've done (by diving into the bytecode). Even if you do that, Python behaves as expected, so there's no bug here.
Reopen if there's a legitimate use-case you are trying to address here (with a better example).
The problem I meant to address was that QueryDict.deepcopy' s memo argument keeps content of request.POST.copy() after deepcopy has finished -- memo should be cleared, at least before next deepcopy call, otherwise repeating request.POST.copy() would populate POST data
ever submitted and leave them in memo, resulting the memory-leak-like behavior. That was why I create a separate dictionary from memo instead of just updating it.
Instead of doing that, how about clearing memo explicitly:
def __deepcopy__(self, memo={}): import copy result = self.__class__('', mutable=True) memo[id(self)] = result for key, value in dict.items(self): dict.__setitem__(result, copy.deepcopy(key, m), copy.deepcopy(value, m)) memo.clear() return result
comment:7 by , 17 years ago
Please read up on how Python's deepcopy is implemented. The memo parameter is not something we are using. It is needed by deepcopy() and, as I mentioned above, we only update it. Anything other than updating it is wrong.
follow-up: 9 comment:8 by , 17 years ago
The error here looks to be specifying a mutable default argument, which is invariably an error. I'm pretty sure that isn't needed, although it's not clear we should ever be using the default value of {} there -- deepcopy() will always pass in its own.
comment:9 by , 17 years ago
Replying to mtredinnick:
The error here looks to be specifying a mutable default argument, which is invariably an error. I'm pretty sure that isn't needed, although it's not clear we should ever be using the default value of {} there -- deepcopy() will always pass in its own.
I'm getting the point -- if we should avoid memo's default value ('{}') left updated (and also keep it in the argument list as a parameter), it would be better default to None, then checked in the deepcopy as follows:
def __deepcopy__(self, memo=None): import copy if memo is None: memo = {} result = self.__class__('', mutable=True) memo[id(self)] = result for key, value in dict.items(self): dict.__setitem__(result, copy.deepcopy(key, memo), copy.deepcopy(value, memo)) return result
comment:10 by , 17 years ago
Resolution: | → invalid |
---|---|
Status: | reopened → closed |
It's irrelevant, really. The deepcopy function will never be called without a dictionary being passed in. So we just ignore it. I might remove the default one day, but it's nmot really urgent. There really is no bug here except a cosmetic thing.
comment:11 by , 17 years ago
comment:12 by , 17 years ago
Resolution: | invalid |
---|---|
Status: | closed → reopened |
So it turns out you were probably on the trail of a real bug, but it wasn't triggered by calling deepcopy() at all. It was the copy() method on QueryDict that was broken (because it was the only thing relying on the default argument to QueryDict.__deepcopy__()
).
In hindsight, I understand what you were now trying to say: it doesn't keep a copy of request.POST.copy()
, it's triggered by that call. Sorry for the confusion, but it's fixed now.
comment:13 by , 17 years ago
Resolution: | → fixed |
---|---|
Status: | reopened → closed |
Aporogize for misleading summary: I mean that request.POST itself won't be populated between requests, but reference for it (data referenced as request.POST in previous requests) remains in QueryDict.