Opened 7 years ago

Closed 7 years ago

Last modified 7 years ago

#6748 closed (fixed)

Ensure that debug error pages never evaluate QuerySets

Reported by: kcarnold Owned by: mtredinnick
Component: Core (Other) Version: master
Severity: Keywords: debug slow queryset repr memory
Cc: kenneth.arnold@…, ehs@… Triage Stage: Design decision needed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

The local variable display on debug 500 pages unintentionally evaluates QuerySets. At best, the result is a delay [1]. At worst, the attempted evaluation may cause another exception, which will mask the root cause of the error.

Based on my suggestions, Mike Axiak of MIT ESP wrote the attached code (originally bundled with other code). I have attempted to factor out this part in a form that could conceivably apply to trunk. Patch attached.

[1] The load delay may be significantly greater than a normal page load, for two reasons:

  1. The debug page will include all local QuerySets, even if some of them were intermediary results in constructing the final QuerySet. This is particularly bad if the intermediate one does not yet have a row limit; I have occasionally found a debug page iterating over an entire huge table because of this.
  1. The model __repr__ often returns ForeignKey fields that may not be used on the page. The patch doesn't address this yet.

Attachments (4)

django-safereprqueryset-in-debug.patch (3.5 KB) - added by kcarnold 7 years ago.
safe-queryset-debug-v2.patch (2.0 KB) - added by kcarnold 7 years ago.
Revised, simplified version of the patch
project.zip (13.8 KB) - added by edsu 7 years ago.
A simple demo application to demo QuerySet evaluation in debug view
queryset-repr.patch (759 bytes) - added by korpios 7 years ago.

Download all attachments as: .zip

Change History (20)

Changed 7 years ago by kcarnold

comment:1 Changed 7 years ago by jacob

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Design decision needed

Changed 7 years ago by kcarnold

Revised, simplified version of the patch

comment:2 Changed 7 years ago by kcarnold

I've added a smaller and actually more robust patch that takes advantage of the pprint module and the recent qs-rf merge to make this a really simple change. It will now even avoid evaluating querysets in lists or other objects.

The most prominent case for this bug to pop up will probably be views that wrap generic views, since the querysets passed around there rarely have row limits.

(I still have not solved the foreign key problem. But at this point we should consider it a separate issue, I think.)

comment:3 Changed 7 years ago by edsu

I got burned by this pretty bad, with the result being django devserver sucking up all available memory in my development environment where DEBUG was set to True.

I've recreated the problem as a very minimalistic django app, attached to the ticket as project.zip. This app contains a simple view:

def word(request, word):
    w = Word.objects.get(name=word)
    return HttpResponse(w)

First you'll need to load some demo data, and run the server:

% python manage.py syncdb
% python manage runserver

Run top in another to watch memory usage. Don't worry, this example won't grow past 11MB.

First hit the view for a Model that exists:

http://localhost:8080/word/time

Notice the very boring display and low memory footprint. Now try a URL like:

http://localhost:8080/word/thisworddoesnotexist

Notice your browser pause for a bit. Eventually you should get a debug view back, with the DoesNotExist exception. Look at the 4th stack frame and observe all the Word objects that are present in the self variable?

The reason for this is the debug view wants to display self in that frame, and self is a QuerySet(project.app.Word), which when you call __repr__ on it causes the query to run and to return the entire table into memory.

See line 301 of django/db/models/query.py for where the exception is being thrown. Notice how self is a local variable in that stack frame? The debug view tries to display that QuerySet using __repr__ on line 508 in the inline template in django/views/debug.py.

If the Model in question happens to have millions of wide rows (as ours did) you'll never see the Debug screen since our machine started swapping like mad trying to suck back all the results into memory.

I'm not sure what the fix is. Perhaps QuerySet.__repr__ should simply say it's a QuerySet instead of executing the query and displaying the results?

Changed 7 years ago by edsu

A simple demo application to demo QuerySet evaluation in debug view

comment:4 Changed 7 years ago by edsu

Note, examples using port 8080 above should be 8000.

comment:5 Changed 7 years ago by edsu

I had to reduce the amount of initial data in the project.zip demo app, since it exceeded the attachment size. You should still be able to see the large amount of Word objects in the 4th stack frame in the debug view though.

comment:6 Changed 7 years ago by kcarnold

  • Cc kenneth.arnold@… added

Did you try the patch? It should still apply, perhaps with fuzz; I've been using it locally ever since.

comment:7 Changed 7 years ago by edsu

  • Keywords memory added

Yes, the patch worked beautifully on r8210 of the trunk. Would be nice to see this make it in.

comment:8 Changed 7 years ago by edsu

  • Cc ehs@… added

comment:9 Changed 7 years ago by korpios

I just got bit by this; I'm wondering if the root issue isn't so much that debug pages evaluate QuerySets, but that QuerySet.__repr__ evaluates the entire thing. An object's __repr__ should be a value useful to a programmer, and a QuerySet containing 14149 objects (to use my current example) is not a useful string while sitting at a Python console.

Attaching my first crack at this; a better implementation might take the result cache into account so you don't end up having the __repr__ of a QuerySet slightly over the cutoff (here, CHUNK_SIZE) being more expensive than it would be in the original implementation. I intentionally kept the output the same if the QuerySet's length was below the cutoff, as I don't want to break countless doctests. ^_^

Changed 7 years ago by korpios

comment:10 Changed 7 years ago by korpios

  • Cc korpios@… added

comment:11 Changed 7 years ago by cgrady

I was also just bitten by this, we have a very large table, and when a query died in one spot, it sucked up ram and timed out the connection trying to build an oversized repr of the queryset - applying a patch along these lines fixed it entirely

comment:12 Changed 7 years ago by korpios

  • Cc korpios@… removed

comment:13 Changed 7 years ago by mtredinnick

The first patch here uses some GPL code as the genesis of the patch. It would be illegal for us to include that. Please don't submit any more code that started out life in the ESP project for inclusion in Django unless you are the sole author of that code and permitted to relicense it.

I also don't really like the idea of fundamentally changing the repr output. Querysets are basically lists, so a repr that looks like a list is reasonable. Plus it would break a few hundred thousand tests and tutorials around the planet, which isn't particularly nice.

The final patch suffers from the Butch Cassidy problem -- it's the fall that'll kill you. Printing out the results isn't the problem. It's loading them into memory in the first place (len(qs) loads all results).

Once the test suite finishes running, I'll commit a patch that has the same effect as Tom's patch without loading all the results (it doesn't print the total number, since that requires more database queries and we can live without it).

comment:14 Changed 7 years ago by mtredinnick

  • Owner changed from nobody to mtredinnick
  • Status changed from new to assigned

comment:15 Changed 7 years ago by mtredinnick

  • Resolution set to fixed
  • Status changed from assigned to closed

(In [9202]) Fixed #6748 -- When printing the repr() of querysets, don't load or display
more than 20 objects.

This means that accidentally executing HugeStoryArchive.objects.all() at the
interactive prompt (or in the debug template) won't try to load all 4,233,010
stories into memory and print them out. That would previously cause resource
starvation and other "interesting" crashes.

If you really, really want the previous behaviour (e.g. in a doctest that
prints more than 20 items), display "list(qs)" instead of just "qs".

comment:16 Changed 7 years ago by mtredinnick

(In [9205]) [1.0.X] Fixed #6748 -- When printing the repr() of querysets, don't load or
display more than 20 objects.

This means that accidentally executing HugeStoryArchive.objects.all() at the
interactive prompt (or in the debug template) won't try to load all 4,233,010
stories into memory and print them out. That would previously cause resource
starvation and other "interesting" crashes.

If you really, really want the previous behaviour (e.g. in a doctest that
prints more than 20 items), display "list(qs)" instead of just "qs".

Backport of r9202 from trunk.

Note: See TracTickets for help on using tickets.
Back to Top