Opened 5 years ago

Closed 9 months ago

Last modified 9 months ago

#12581 closed Cleanup/optimization (fixed)

connection.queries should use a ring buffer to avoid large memory consumption under runserver

Reported by: robhudson Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: rob@…, bmispelon@…, pztrick44@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Currently, when running using runserver, each query is appended to a list. For long running runserver processes, this list can grow quite large.

Attachments (1)

12581.diff (3.0 KB) - added by robhudson 5 years ago.
RingBuffer, based on connections.deque (Python 2.4+)

Download all attachments as: .zip

Change History (24)

Changed 5 years ago by robhudson

RingBuffer, based on connections.deque (Python 2.4+)

comment:1 Changed 5 years ago by robhudson

  • Has patch set
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 5 years ago by robhudson

FYI: Original patch by Alex Gaynor, tests and ticket added by Rob Hudson.

comment:3 Changed 5 years ago by russellm

  • milestone set to 1.2
  • Triage Stage changed from Unreviewed to Accepted
  • Version 1.1 deleted

comment:4 Changed 5 years ago by ubernostrum

  • milestone 1.2 deleted

1.2 is feature-frozen, moving this feature request off the milestone.

comment:5 Changed 5 years ago by adurdin

Note that the django debug toolbar’s sql panel does not work with this patch.

comment:6 Changed 4 years ago by aaugustin

This is a duplicate of #3711 (closed as wontfix) and #6734 (closed as duplicate).

Since the ticket was accepted by a core dev, I won't close it, though.

comment:7 Changed 4 years ago by mattmcc

  • Severity set to Normal
  • Type set to New feature

comment:8 Changed 4 years ago by julien

  • Patch needs improvement set
  • Type changed from New feature to Cleanup/optimization

The tests would need to be rewritten using unittests since this is now Django's preferred way. Also, the datastructures tests seem to have been moved to source:django/trunk/tests/regressiontests/utils/datastructures.py

comment:9 Changed 3 years ago by aaugustin

  • UI/UX unset

Change UI/UX from NULL to False.

comment:10 Changed 3 years ago by aaugustin

  • Easy pickings unset

Change Easy pickings from NULL to False.

comment:11 Changed 23 months ago by bmispelon

  • Cc bmispelon@… added
  • Needs tests set
  • Patch needs improvement unset
  • Version set to master

The situation has changed a bit since the original patch was written. collections.deque now takes a maxlen argument in its constructor, making the need of a custom RingBuffer implementation obsolete.

The patch becomes much simpler too: only two lines need to be changed: https://github.com/bmispelon/django/compare/master...ticket-12581

Regarding the potential problems with django debug toolbar, I haven't looked into it very thoroughly but it seemed to work on my first attempt.

The original patch had tests for the RingBuffer implementation which are not needed anymore since we rely 100% on python's stdlib now.
We should still test the new feature somehow but I'm not quite sure how to achieve this.

I'll look into this a bit deeper but if anyone has ideas on how to implement the tests, or some test failures with django-debug-toolbar, I'd love some feedback.

comment:12 Changed 23 months ago by bmispelon

I had to tweak django.test.utils.CapturedQueriesContext in order to get the test suite to run but after that, all the tests pass without further modification.

I also took a closer look at how this affects DDT and it really seems to run without any problem. It's not actually affected by the change since it has its own logic for counting queries (I tried setting the maxlen to 1 and DDT would still show the correct number of queries).

One thing I'm concerned about is hardcoding this maxlen value to 100, but I don't know how to do it in a more flexible manner (creating another setting is likely out of question).

I'll look into adding some tests and maybe some documentation in the upcoming days.

comment:13 Changed 23 months ago by charettes

A maxlen of 100 looks like sensible default to me.

comment:14 Changed 23 months ago by aaugustin

  • Summary changed from connection.queries should use a ring buffer to avoid large memory consumption under runserver to connection.queries should use a ring buffer to avoid large memory consumption under runserverontrib.comments inline on Postgres 8.3 fails to cast integer to text

At least since magic-removal (which is the horizon of events as far as I am concerned), connections.queries is reset at the end of each request-response cycle by django.db.reset_queries. I fail to see how it could grow indefinitely. I must have missed something; could someone explain what problem this ticket is trying to solve and how to reproduce it?

comment:15 Changed 23 months ago by aaugustin

  • Summary changed from connection.queries should use a ring buffer to avoid large memory consumption under runserverontrib.comments inline on Postgres 8.3 fails to cast integer to text to connection.queries should use a ring buffer to avoid large memory consumption under runserver

Restore the summary...

comment:16 Changed 23 months ago by carljm

It can grow indefinitely when the Django ORM is used outside the request/response cycle, in long-running data-processing tasks.

comment:17 Changed 23 months ago by aaugustin

I understand that, however, the summary talks of runserver which doesn't exhibit this behavior.

If we want to protect against memory exhaustion only in long running processes, can we put a limit that's outside the realm of plausibility for a single request eg. 10 000?

I'm not wild about changing the type of connection.queries, but as long as the DDT works...

comment:18 Changed 17 months ago by claudep

#21245 has been marked as a duplicate (and includes a valid use case).

comment:19 Changed 17 months ago by pztrick

  • Cc pztrick44@… added

comment:20 Changed 9 months ago by aaugustin

The self.queries = deque(maxlen=100) doesn't survive when the query log is reset with connection.queries = []. The bulk_create tests do that.

comment:21 Changed 9 months ago by aaugustin

  • Needs tests unset
  • Triage Stage changed from Accepted to Ready for checkin

I'm going to complete this patch. However, an important aspect hasn't been considered yet. connection.queries is documented in faq/models.txt as "a list of dictionaries in order of query execution". We cannot change its type.

To preserve it, we can use a deque internally and provide a queries property that turns the deque into a list. The property should also emit a warning when the limit is reached and queries are truncated.

This requires making connection.queries read-only. I suspect some people use connection.queries = [] to reset queries, even though the documentation only talks about db.reset_queries(). But since there's a documented API, I don't feel too bad about breaking undocumented usage.

I've set a high limit to minimize chances that people will hit it -- if it's over 9000, you're really in trouble -- and I've made it a class attribute to make it possible to change.

Pull request: https://github.com/django/django/pull/2772

comment:22 Changed 9 months ago by Aymeric Augustin <aymeric.augustin@…>

  • Resolution set to fixed
  • Status changed from new to closed

In cfcca7ccce3dc527d16757ff6dc978e50c4a2e61:

Fixed #3711, #6734, #12581 -- Bounded connection.queries.

Prevented unlimited memory consumption when running background tasks
with DEBUG=True.

Thanks Rob, Alex, Baptiste, and others.

comment:23 Changed 9 months ago by Aymeric Augustin <aymeric.augustin@…>

In fc31319752945d4d39f70c8278ecd42af240bc64:

Fixed test again. Refs #12581.

Note: See TracTickets for help on using tickets.
Back to Top