Code

Opened 6 years ago

Closed 6 years ago

#6308 closed (fixed)

QueryDict won't release objects between requests when request.POST is copy() ed

Reported by: Yasushi Masuda <whosaysni@…> Owned by: ymasuda
Component: Core (Other) Version: master
Severity: Keywords: request.POST.copy, memory issue
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

Hi,
There seems to be the population of container argument in QueryDict.deepcopy(), which should cause incremental
(and unreleased) population of POST data between requests. Here is the relevant code in django.http.init.py:

174    def __deepcopy__(self, memo={}):
175	        import copy
176	        result = self.__class__('', mutable=True)
177	        memo[id(self)] = result
178	        for key, value in dict.items(self):
179	            dict.__setitem__(result, copy.deepcopy(key, memo), copy.deepcopy(value, memo))

The behaviour can be easily demonstrated as follows:

from django.conf.urls.defaults import *
from django.http import HttpResponse, QueryDict
from django.template import Template, Context

def demo(request):
  request.POST.copy()
  t = Template(
    '<html><body>'
    '<form method="POST" action="/"><input name="llama" type="submit"></form>'
    '{% for item in memo %}llama at {{ item.0 }}, {% endfor %}'
    '</body></html>')
  memo = QueryDict.__deepcopy__.im_func.func_defaults[0].items()
  c = Context({'memo': memo})
  return HttpResponse(t.render(c))

urlpatterns = patterns('', url(r'^$', demo))

You should see llamas are populating between requests, and after enormous POST requests,
You can see the process size is significantly increased.

I think (I'm not sure the reason why memo is directly populated) it can be fixed rewriting deepcopy as follows:

    def __deepcopy__(self, memo={}):
        import copy
        result = self.__class__('', mutable=True)
        m = dict(memo.items())
        m[id(self)] = result
        for key, value in dict.items(self):
            dict.__setitem__(result, copy.deepcopy(key, m), copy.deepcopy(value, m))
        return result

Thanks.

Attachments (1)

against_post_heaping_patch.diff (696 bytes) - added by Yasushi Masuda <whosaysni@…> 6 years ago.
A patch fixing POST data heaping

Download all attachments as: .zip

Change History (14)

comment:1 Changed 6 years ago by Yasushi Masuda <whosaysni@…>

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Aporogize for misleading summary: I mean that request.POST itself won't be populated between requests, but reference for it (data referenced as request.POST in previous requests) remains in QueryDict.

Changed 6 years ago by Yasushi Masuda <whosaysni@…>

A patch fixing POST data heaping

comment:2 Changed 6 years ago by Yasushi Masuda <whosaysni@…>

  • Owner changed from nobody to anonymous
  • Status changed from new to assigned

comment:3 Changed 6 years ago by ymasuda

  • Owner changed from anonymous to ymasuda
  • Status changed from assigned to new

comment:4 Changed 6 years ago by mtredinnick

This patch completely breaks the deepcopy behaviour. The memo dictionary must be updated directly in order to avoid circular copying loops. Making a copy of that dictionary is wrong on any level: it's an internal Python data structure, not something we ever copy. We only update it.

I need to study what's going on here, but I your example isn't particularly convincing, either. If you intentionally create a circular reference to something deep in the code object and then copy it, yes, it will keep references around. That's how Python works. That's your own doing and not something we will protect you against (since you might well have meant to do it). Do you have a more realistic example of where this comes up in practice?

comment:5 follow-up: Changed 6 years ago by mtredinnick

  • Resolution set to invalid
  • Status changed from new to closed

I've thought about this some more and I cannot see how there's problem unless you deliberately create a circular reference precisely as you've done (by diving into the bytecode). Even if you do that, Python behaves as expected, so there's no bug here.

Reopen if there's a legitimate use-case you are trying to address here (with a better example).

comment:6 in reply to: ↑ 5 Changed 6 years ago by ymasuda

  • Resolution invalid deleted
  • Status changed from closed to reopened

Replying to mtredinnick:

I've thought about this some more and I cannot see how there's problem unless you deliberately create a circular reference precisely as you've done (by diving into the bytecode). Even if you do that, Python behaves as expected, so there's no bug here.

Reopen if there's a legitimate use-case you are trying to address here (with a better example).

The problem I meant to address was that QueryDict.deepcopy' s memo argument keeps content of request.POST.copy() after deepcopy has finished -- memo should be cleared, at least before next deepcopy call, otherwise repeating request.POST.copy() would populate POST data
ever submitted and leave them in memo, resulting the memory-leak-like behavior. That was why I create a separate dictionary from memo instead of just updating it.

Instead of doing that, how about clearing memo explicitly:

    def __deepcopy__(self, memo={}):
        import copy
        result = self.__class__('', mutable=True)
        memo[id(self)] = result
        for key, value in dict.items(self):
            dict.__setitem__(result, copy.deepcopy(key, m), copy.deepcopy(value, m))
        memo.clear()
        return result

comment:7 Changed 6 years ago by mtredinnick

Please read up on how Python's deepcopy is implemented. The memo parameter is not something we are using. It is needed by deepcopy() and, as I mentioned above, we only update it. Anything other than updating it is wrong.

comment:8 follow-up: Changed 6 years ago by mtredinnick

The error here looks to be specifying a mutable default argument, which is invariably an error. I'm pretty sure that isn't needed, although it's not clear we should ever be using the default value of {} there -- deepcopy() will always pass in its own.

comment:9 in reply to: ↑ 8 Changed 6 years ago by ymasuda

Replying to mtredinnick:

The error here looks to be specifying a mutable default argument, which is invariably an error. I'm pretty sure that isn't needed, although it's not clear we should ever be using the default value of {} there -- deepcopy() will always pass in its own.

I'm getting the point -- if we should avoid memo's default value ('{}') left updated (and also keep it in the argument list as a parameter), it would be better default to None, then checked in the deepcopy as follows:

    def __deepcopy__(self, memo=None):
        import copy
        if memo is None:
            memo = {}
        result = self.__class__('', mutable=True)
        memo[id(self)] = result
        for key, value in dict.items(self):
            dict.__setitem__(result, copy.deepcopy(key, memo), copy.deepcopy(value, memo))
        return result

comment:10 Changed 6 years ago by mtredinnick

  • Resolution set to invalid
  • Status changed from reopened to closed

It's irrelevant, really. The deepcopy function will never be called without a dictionary being passed in. So we just ignore it. I might remove the default one day, but it's nmot really urgent. There really is no bug here except a cosmetic thing.

comment:11 Changed 6 years ago by mtredinnick

(In [7167]) Removed an unnecessary default argument in one deepcopy() method and fixed
up the one place that was mistakenly relying on that.

Refs #6308.

comment:12 Changed 6 years ago by mtredinnick

  • Resolution invalid deleted
  • Status changed from closed to reopened

So it turns out you were probably on the trail of a real bug, but it wasn't triggered by calling deepcopy() at all. It was the copy() method on QueryDict that was broken (because it was the only thing relying on the default argument to QueryDict.__deepcopy__()).

In hindsight, I understand what you were now trying to say: it doesn't keep a copy of request.POST.copy(), it's triggered by that call. Sorry for the confusion, but it's fixed now.

comment:13 Changed 6 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from reopened to closed

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.