#31582 closed New feature (wontfix)
Django template engine render pre-fetches data even when iterator() is used
Reported by: | Sümer Cip | Owned by: | nobody |
---|---|---|---|
Component: | Template system | Version: | 3.0 |
Severity: | Normal | Keywords: | |
Cc: | | Triage Stage: | Unreviewed |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
Hi,
I was testing template engines with a memory profiler we have written, and I noticed an interesting behaviour in the Django template engine. I'm still not sure whether this is a bug or intended.
The database backend I use is django.db.backends.sqlite3. I am using the following code to retrieve a huge list of objects:
```python
comments = Comment.objects.all().iterator()
template = loader.get_template('polls/comments.html')
context = {
    'comments': comments,
}
return HttpResponse(template.render(context, request))
```
And the template code:
```
{% for comment in comments %}
a
{% endfor %}
```
When I run the view, I can see django.db.models.base.Comment.__init__ being called for every row and the resulting objects staying cached in memory. If I change the TEMPLATES backend to Jinja2, I have verified the results are not cached.
Is this intended behaviour? If so, how can we disable model caching for Django templates?
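For reference, this is a minimal sketch of how I measured it with tracemalloc; the polls app and Comment model are from my test project, not part of Django:

```python
import tracemalloc

from django.http import HttpResponse
from django.template import loader

from polls.models import Comment  # test model from my example project


def comments_view(request):
    tracemalloc.start()
    comments = Comment.objects.all().iterator()
    template = loader.get_template('polls/comments.html')
    response = HttpResponse(template.render({'comments': comments}, request))
    # (current, peak) bytes allocated while building the response
    print(tracemalloc.get_traced_memory())
    tracemalloc.stop()
    return response
```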
Change History (12)
comment:1 by , 5 years ago
Resolution: | → needsinfo |
---|---|
Status: | new → closed |
comment:2 by , 5 years ago
Let me clarify this a bit more.
I have modified ModelIterable.__iter__ as follows:
```python
class ModelIterable(BaseIterable):
    """Iterable that yields a model instance for each row."""

    def __iter__(self):
        print("Enter ModelIterable.__iter__", tracemalloc.get_traced_memory())
        ...
        print("Leave ModelIterable.__iter__", tracemalloc.get_traced_memory())
```
The code above prints the memory currently in use in that function plus the peak memory reached.
When I run it without an iterator() call, I get the following values:
```
DTL, no iterator (7K items)
Enter ModelIterable.__iter__ (1340851, 1342301)
Leave ModelIterable.__iter__ (5063404, 5063988)
```
Now, as you indicated, this seems OK: each Comment object is created, its values are initialized, and the results are cached. Let's see what happens if we run this with iterator():
```
DTL, iterator (7K items)
Enter ModelIterable.__iter__ (92488, 92995)
Leave ModelIterable.__iter__ (5113261, 5219668)
```
Now the memory usage and peak start from smaller values, but somehow both still grow from roughly 92 KB to 5 MB. We confirmed from the previous run that 5 MB is the memory needed to hold all the objects in the cache. So either the objects are being cached somewhere, or we are not freeing memory inside this function.
After returning from this function, the used/peak values return to normal, but peak memory still grows with the number of items, which should not be the case for iterator(), right?
I have verified that the same code behaves fine with the Jinja2 engine. See the results:
```
Jinja2, no iterator (7K items)
Enter ModelIterable.__iter__ (1362882, 1364332)
Leave ModelIterable.__iter__ (5080351, 5080935)

Jinja2, iterator (7K items)
Enter ModelIterable.__iter__ (115616, 116123)
Leave ModelIterable.__iter__ (878651, 1343215)
```
You can see that when iterator() is used, the memory numbers do not grow by a factor of roughly 50; they never reach the 5080351 bytes we see in the DTL case.
I suspect that in the DTL case something is holding on to references from this loop in ModelIterable.__iter__, maybe:
```python
for row in compiler.results_iter(results):
    obj = model_cls.from_db(db, init_list, row[model_fields_start:model_fields_end])
    ...
```
comment:3 by , 5 years ago
If you can observe a difference between the DTL and Jinja2, I'm inclined to think it's due to the defaulttags.ForNode implementation, and a potential candidate for the 'leak' would be:
```python
if not hasattr(values, "__len__"):
    values = list(values)
```
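A quick standalone illustration of why that line matters: generator-style iterables don't define __len__, so ForNode materializes the whole result set before looping (a sketch, not Django code):

```python
def rows():
    # Stands in for ModelIterable lazily yielding model instances.
    yield from range(7000)

values = rows()
print(hasattr(values, "__len__"))  # False: generators have no length

# The ForNode line in question: it pulls every item into memory at once,
# which defeats the point of .iterator().
if not hasattr(values, "__len__"):
    values = list(values)

print(len(values))  # 7000, now fully materialized
```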
comment:4 by , 5 years ago
Hi Keryn,
I think you are exactly right about that:
I traced memory allocation before/after that function call:
```python
if not hasattr(values, '__len__'):
    print("render len enter", tracemalloc.get_traced_memory())
    values = list(values)
    print("render len exit", tracemalloc.get_traced_memory())

# render len enter (48421, 48653)
# render len exit (9390626, 9462231)
```
But we also have this later in that function, which complicates things further:
```python
for node in self.nodelist_loop:
    nodelist.append(node.render_annotated(context))
```
Here context is pointing at item, which is the model instance itself! So even if we somehow fix the len() call above, this will hurt us again.
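To make the two retention points explicit, here is a heavily simplified, standalone paraphrase of the ForNode.render flow as I understand it; forloop_render_sketch is my own name, and this is not the actual Django source:

```python
def forloop_render_sketch(values, render_item):
    """Paraphrase of ForNode.render, marking where memory is retained."""
    if not hasattr(values, "__len__"):
        values = list(values)   # retention point 1: iterator fully materialized
    len_values = len(values)    # needed to compute revcounter, last, ...
    nodelist = []
    for i, item in enumerate(values):
        # Retention point 2: the context (here, the render_item call) holds a
        # reference to item, and nodelist keeps every rendered chunk until the
        # final join.
        nodelist.append(render_item(item, i, len_values))
    return "".join(nodelist)


print(forloop_render_sketch((str(x) for x in range(5)), lambda item, i, n: item))
```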
I am not sure how to proceed further on this, as it seems to require more internal Django knowledge than I have right now :)
Any ideas?
comment:6 by , 5 years ago
Resolution: | needsinfo → |
---|---|
Status: | closed → new |
comment:7 by , 5 years ago
Summary: | Django template backend allocates model cache even iterator() is used → Django template engine render pre-fetches data even when iterator() is used |
---|
comment:8 by , 5 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Bug → New feature |
Good follow-up both, that's super!
At first glance, certainly :), consuming the iterator into a list isn't ideal.
Question is, what can we do about it?
The next line is:

```python
len_values = len(values)
```
Then len_values is used to populate the various loop counter variables: revcounter, last, and so on.
I haven't thought it through, but I could see these possibly being made optional, maybe along the lines of adding an extra option like reversed?
(It would require editing the do_for() function to parse another option. Not sure what it should be called: ...iterator? 😬)
I think we'd have to call that a new feature. The behaviour isn't incorrect, it just uses more memory than we'd like in this case.
Let's provisionally accept on that basis. It might be worth pinging the DevelopersMailingList to seek advice on how to proceed, particularly with regard to the possible API for such a change.
comment:9 by , 5 years ago
Ah, I didn't properly read this bit:
> Here context is pointing at item, which is the model instance itself! So even if we somehow fix the len() call above, this will hurt us again.
There's an effort to allow a generator-based approach to template rendering in #13910. There's a PR for that which got quite far but needs pushing over the line.
I don't know if that would solve the second issue here...?
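Roughly, the idea there is to yield rendered chunks instead of accumulating them. A minimal sketch of the concept (TextNode and render_nodelist_streaming are illustrative stand-ins, not the actual PR):

```python
class TextNode:
    """Stand-in node with a render() method, for illustration only."""

    def __init__(self, text):
        self.text = text

    def render(self, context):
        return self.text


def render_nodelist_streaming(nodes, context):
    # Yield each chunk as it is produced instead of collecting every
    # rendered string in a list and joining at the end.
    for node in nodes:
        yield node.render(context)


for chunk in render_nodelist_streaming([TextNode("a"), TextNode("b")], {}):
    print(chunk)  # a streaming response could write chunks one at a time
```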
comment:10 by , 5 years ago
Yielding during rendering, rather than building up the nodelist, would presumably assist in fixing the problem, but the problem itself has to remain as long as revcounter and revcounter0 are documented as being available (I've no idea who'd use those, but presumably they exist for a reason). For last you can probably rework the loop around next(...) to find out whether there *is* another item to yield; if there isn't, the current one is the last. Similarly, for rendering the {% empty %} nodes you can check whether the first next() raised StopIteration (a sketch of this approach follows below).
But until revcounter and revcounter0 become optional (i.e. the guarantee is only made if the iterable has a __len__), you can't avoid consuming the iterator to find out its overall length. I certainly don't think adding another argument is a good idea, because you could then end up with something absurd like {% for x in mymodel.objects.all.iterator iterator %} or {% for x in mymodel.objects.all iterator %}, right? Both look like typos or accidents.
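A minimal standalone sketch of that next()-based approach, detecting last and the empty case without materializing the iterable; iter_with_last is a hypothetical helper, not proposed Django code:

```python
def iter_with_last(iterable):
    """Yield (item, is_last) pairs using one item of lookahead."""
    it = iter(iterable)
    try:
        current = next(it)
    except StopIteration:
        return  # nothing yielded: the caller would render the {% empty %} branch
    for upcoming in it:
        yield current, False
        current = upcoming
    yield current, True  # lookahead exhausted, so this item is last


for item, is_last in iter_with_last(x * x for x in range(4)):
    print(item, is_last)
# revcounter/revcounter0 still need len(), which is exactly the sticking point.
```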
I don't know whether the exception raised by doing reversed(mymodel.objects.all().iterator()) would get swallowed by the DTL or surfaced, but that'd be another consideration. If it's surfaced I think that's fine; if it's not, that's another issue.
comment:11 by , 5 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
> I certainly don't think adding another argument is a good idea, because you could then end up with something absurd like...
OK. I think Keryn's analysis is correct. I agree the extra param doesn't look nice, but I can't see another way.
So, short of a concrete proposal, I think we have to say wontfix. (Happy to reopen if an acceptable proposal is forthcoming.)
comment:12 by , 5 years ago
Triage Stage: | Accepted → Unreviewed |
---|
Hi. I'm not sure this is the right place (as yet) for this query.
It seems like a usage question rather than a specific bug report, so I'd point you to TicketClosingReasons/UseSupportChannels.
Maybe there's an issue there... but if there's a concrete behaviour that's wrong, I can't see it from the report.
- QuerySet has an internal results cache.
- Model.__init__() creates the instance in memory, yes, but ModelIterable (which is what .iterator() gives you) instantiates the objects when you iterate it, so that's expected behaviour, independent of Jinja2 vs the DTL.

If you can narrow this down to something specific that Django is doing wrong, we're very happy to have a look.
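To illustrate the distinction, here is a minimal sketch of the documented QuerySet behaviour, reusing the reporter's Comment test model (not part of Django):

```python
from polls.models import Comment  # the reporter's test model

qs = Comment.objects.all()
first_pass = list(qs)   # one query; instances stored in the queryset's result cache
second_pass = list(qs)  # no new query: served from that internal cache

# .iterator() bypasses the cache: rows are fetched in chunks and each instance
# can be garbage-collected once nothing else references it.
for comment in Comment.objects.all().iterator():
    pass  # nothing is retained on the queryset afterwards
```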