Opened 16 years ago

Closed 16 years ago

#6308 closed (fixed)

QueryDict won't release objects between requests when request.POST is copy() ed

Reported by: Yasushi Masuda <whosaysni@…> Owned by: Yasushi Masuda
Component: Core (Other) Version: dev
Severity: Keywords: request.POST.copy, memory issue
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hi,
There seems to be the population of container argument in QueryDict.deepcopy(), which should cause incremental
(and unreleased) population of POST data between requests. Here is the relevant code in django.http.init.py:

174    def __deepcopy__(self, memo={}):
175	        import copy
176	        result = self.__class__('', mutable=True)
177	        memo[id(self)] = result
178	        for key, value in dict.items(self):
179	            dict.__setitem__(result, copy.deepcopy(key, memo), copy.deepcopy(value, memo))

The behaviour can be easily demonstrated as follows:

from django.conf.urls.defaults import *
from django.http import HttpResponse, QueryDict
from django.template import Template, Context

def demo(request):
  request.POST.copy()
  t = Template(
    '<html><body>'
    '<form method="POST" action="/"><input name="llama" type="submit"></form>'
    '{% for item in memo %}llama at {{ item.0 }}, {% endfor %}'
    '</body></html>')
  memo = QueryDict.__deepcopy__.im_func.func_defaults[0].items()
  c = Context({'memo': memo})
  return HttpResponse(t.render(c))

urlpatterns = patterns('', url(r'^$', demo))

You should see llamas are populating between requests, and after enormous POST requests,
You can see the process size is significantly increased.

I think (I'm not sure the reason why memo is directly populated) it can be fixed rewriting deepcopy as follows:

    def __deepcopy__(self, memo={}):
        import copy
        result = self.__class__('', mutable=True)
        m = dict(memo.items())
        m[id(self)] = result
        for key, value in dict.items(self):
            dict.__setitem__(result, copy.deepcopy(key, m), copy.deepcopy(value, m))
        return result

Thanks.

Attachments (1)

against_post_heaping_patch.diff (696 bytes ) - added by Yasushi Masuda <whosaysni@…> 16 years ago.
A patch fixing POST data heaping

Download all attachments as: .zip

Change History (14)

comment:1 by Yasushi Masuda <whosaysni@…>, 16 years ago

Aporogize for misleading summary: I mean that request.POST itself won't be populated between requests, but reference for it (data referenced as request.POST in previous requests) remains in QueryDict.

by Yasushi Masuda <whosaysni@…>, 16 years ago

A patch fixing POST data heaping

comment:2 by Yasushi Masuda <whosaysni@…>, 16 years ago

Owner: changed from nobody to anonymous
Status: newassigned

comment:3 by Yasushi Masuda, 16 years ago

Owner: changed from anonymous to Yasushi Masuda
Status: assignednew

comment:4 by Malcolm Tredinnick, 16 years ago

This patch completely breaks the deepcopy behaviour. The memo dictionary must be updated directly in order to avoid circular copying loops. Making a copy of that dictionary is wrong on any level: it's an internal Python data structure, not something we ever copy. We only update it.

I need to study what's going on here, but I your example isn't particularly convincing, either. If you intentionally create a circular reference to something deep in the code object and then copy it, yes, it will keep references around. That's how Python works. That's your own doing and not something we will protect you against (since you might well have meant to do it). Do you have a more realistic example of where this comes up in practice?

comment:5 by Malcolm Tredinnick, 16 years ago

Resolution: invalid
Status: newclosed

I've thought about this some more and I cannot see how there's problem unless you deliberately create a circular reference precisely as you've done (by diving into the bytecode). Even if you do that, Python behaves as expected, so there's no bug here.

Reopen if there's a legitimate use-case you are trying to address here (with a better example).

in reply to:  5 comment:6 by Yasushi Masuda, 16 years ago

Resolution: invalid
Status: closedreopened

Replying to mtredinnick:

I've thought about this some more and I cannot see how there's problem unless you deliberately create a circular reference precisely as you've done (by diving into the bytecode). Even if you do that, Python behaves as expected, so there's no bug here.

Reopen if there's a legitimate use-case you are trying to address here (with a better example).

The problem I meant to address was that QueryDict.deepcopy' s memo argument keeps content of request.POST.copy() after deepcopy has finished -- memo should be cleared, at least before next deepcopy call, otherwise repeating request.POST.copy() would populate POST data
ever submitted and leave them in memo, resulting the memory-leak-like behavior. That was why I create a separate dictionary from memo instead of just updating it.

Instead of doing that, how about clearing memo explicitly:

    def __deepcopy__(self, memo={}):
        import copy
        result = self.__class__('', mutable=True)
        memo[id(self)] = result
        for key, value in dict.items(self):
            dict.__setitem__(result, copy.deepcopy(key, m), copy.deepcopy(value, m))
        memo.clear()
        return result

comment:7 by Malcolm Tredinnick, 16 years ago

Please read up on how Python's deepcopy is implemented. The memo parameter is not something we are using. It is needed by deepcopy() and, as I mentioned above, we only update it. Anything other than updating it is wrong.

comment:8 by Malcolm Tredinnick, 16 years ago

The error here looks to be specifying a mutable default argument, which is invariably an error. I'm pretty sure that isn't needed, although it's not clear we should ever be using the default value of {} there -- deepcopy() will always pass in its own.

in reply to:  8 comment:9 by Yasushi Masuda, 16 years ago

Replying to mtredinnick:

The error here looks to be specifying a mutable default argument, which is invariably an error. I'm pretty sure that isn't needed, although it's not clear we should ever be using the default value of {} there -- deepcopy() will always pass in its own.

I'm getting the point -- if we should avoid memo's default value ('{}') left updated (and also keep it in the argument list as a parameter), it would be better default to None, then checked in the deepcopy as follows:

    def __deepcopy__(self, memo=None):
        import copy
        if memo is None:
            memo = {}
        result = self.__class__('', mutable=True)
        memo[id(self)] = result
        for key, value in dict.items(self):
            dict.__setitem__(result, copy.deepcopy(key, memo), copy.deepcopy(value, memo))
        return result

comment:10 by Malcolm Tredinnick, 16 years ago

Resolution: invalid
Status: reopenedclosed

It's irrelevant, really. The deepcopy function will never be called without a dictionary being passed in. So we just ignore it. I might remove the default one day, but it's nmot really urgent. There really is no bug here except a cosmetic thing.

comment:11 by Malcolm Tredinnick, 16 years ago

(In [7167]) Removed an unnecessary default argument in one deepcopy() method and fixed
up the one place that was mistakenly relying on that.

Refs #6308.

comment:12 by Malcolm Tredinnick, 16 years ago

Resolution: invalid
Status: closedreopened

So it turns out you were probably on the trail of a real bug, but it wasn't triggered by calling deepcopy() at all. It was the copy() method on QueryDict that was broken (because it was the only thing relying on the default argument to QueryDict.__deepcopy__()).

In hindsight, I understand what you were now trying to say: it doesn't keep a copy of request.POST.copy(), it's triggered by that call. Sorry for the confusion, but it's fixed now.

comment:13 by Malcolm Tredinnick, 16 years ago

Resolution: fixed
Status: reopenedclosed
Note: See TracTickets for help on using tickets.
Back to Top