Opened 5 years ago

Closed 5 years ago

Last modified 5 years ago

#31582 closed New feature (wontfix)

Django template engine render pre-fetches data even when iterator() is used

Reported by: Sümer Cip Owned by: nobody
Component: Template system Version: 3.0
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hi,

I was testing template engines with a memory profiler we have written and I have seen an interesting issue with Django template engine. Still not sure if this is an issue or intended.

The database backend I use is django.db.backends.sqlite3. I am using following code to retrieve a huge list of objects:

comments = Comment.objects.all().iterator()
template = loader.get_template('polls/comments.html')
context = {
        'comments': comments,
    }
return HttpResponse(template.render(context, request))

And the template code:

{% for comment in comments %}
    a
{% endfor %}

When I ran the view, I see the django.db.models.base.Comment.__init__ caches the results. If I change the TEMPLATE backend to Jinja2 I verified it is not being cached.

Is this somewhat intended behaviour? If so, how can we disable Model caching for Django templates?

Change History (12)

comment:1 by Carlton Gibson, 5 years ago

Resolution: needsinfo
Status: newclosed

Hi. I'm not sure this is the right place (as yet) for this query.

It seems like a usage question, rather than a specific bug reports, so I'd point you to TicketClosingReasons/UseSupportChannels.

Maybe there's an issue there... if there's a concrete behaviour that's wrong but I can't see that from the report.

I see the django.db.models.base.Comment.init caches the results.

QuerySet has an internal results cache. Model.__init__() creates the instance in memory yes, but ModelIterable (which is what .iterator() gives you) instantiates the objects when you iterate it, so that's expected behaviour, independent of Jinja vs the DTL.

If you can narrow this down to something specific that Django is doing wrong, we're very happy to have a look.

comment:2 by Sümer Cip, 5 years ago

Let me clarify this a bit more.

I have modified ModelIterable.__iter__ like following:

class ModelIterable(BaseIterable):
    """Iterable that yields a model instance for each row."""

    def __iter__(self):
        print("Enter ModelIterable.__iter__", tracemalloc.get_traced_memory())
        ...
        print("Leave ModelIterable.__iter__", tracemalloc.get_traced_memory())

Above code prints out the current used memory in that function + the peak memory reached.
When I run it without using an iterator() call I get following values:

DTL no iterator (7K items)
Enter ModelIterable.__iter__ (1340851, 1342301)
Leave ModelIterable.__iter__ (5063404, 5063988)

Now, as you indicated this seems OK because the Comment object is created and its values initialized and cached. Let's see what happens if we run this with iterator:

DTL iterator (7K items)
Enter ModelIterable.__iter__ (92488, 92995)
Leave ModelIterable.__iter__ (5113261, 5219668)

Now, the memory usage and peak starts with a smaller value and somehow the memory usage+peak values go from 92KB to 5MB. We have confirmed from previous run that 5MB is the memory needed to hold the objects in cache. So, either memory is being cached somewhere or we are not freeing memory in this function.

After returning from this function, the memory used+peak returns to normal, but we are peaking memory with the growth of items which should not be the case for iterator(), right?

I have verified above code is working fine with Jinja2 engine. See the results:

Jinja2 no iterator (7K ITEMS)
Enter ModelIterable.__iter__ (1362882, 1364332)
Leave ModelIterable.__iter__ (5080351, 5080935)

Jinja2 iterator (7K ITEMS)
Enter ModelIterable.__iter__ (115616, 116123)
Leave ModelIterable.__iter__ (878651, 1343215)

You can see that when iterator is being used, the memory usage numbers do not grow by a factor of 50. It is not reaching the value of 5080351 like we have in DTL.

I am suspecting, somehow the loop in ModelIterable.init is holding the reference somewhere else in DTL case, maybe?

for row in compiler.results_iter(results):
            obj = model_cls.from_db(db, init_list, row[model_fields_start:model_fields_end])
            ....


Version 0, edited 5 years ago by Sümer Cip (next)

comment:3 by Keryn Knight, 5 years ago

If you can observe a difference between DTL & Jinja2, I'm inclined to think it's due to the defaulttags.ForNode implementation, and a potential candidate for the 'leak' would be:

if not hasattr(values, "__len__"):
    values = list(values)

comment:4 by Sümer Cip, 5 years ago

Hi Keryn,

I think you are exactly right about that:

I traced memory allocation before/after that function call:

if not hasattr(values, '__len__'):
    print("render len enter", tracemalloc.get_traced_memory())
    values = list(values)
    print("render len exit", tracemalloc.get_traced_memory())
'''
render len enter (48421, 48653)
render len exit (9390626, 9462231)
'''

But we also have this afterwards in that function which complicates things more:

for node in self.nodelist_loop:
      nodelist.append(node.render_annotated(context))

context is pointing to the item which is the model itself! Even we fix above len() call somehow, this will hurt us again.

I am not sure how to proceed further on this as this seems to require more internal Django knowledge than I have right now :)

Any ideas?

comment:5 by Sümer Cip, 5 years ago

And I think we should reopen this?

comment:6 by Sümer Cip, 5 years ago

Resolution: needsinfo
Status: closednew

comment:7 by Sümer Cip, 5 years ago

Summary: Django template backend allocates model cache even iterator() is usedDjango template engine render pre-fetches data even when iterator() is used

comment:8 by Carlton Gibson, 5 years ago

Triage Stage: UnreviewedAccepted
Type: BugNew feature

Good follow up both, that's super!

At first glance, certainly :), consuming the iterator into a list isn't ideal.

Question is, what can we do about it?

The next line is:

len_values = len(values)

Then len_values is used to populate the various loop counter variables. revcounter, last and so on.

I haven't thought it through but, I could see these possibly being made optional, maybe along the lines of adding an extra option like reversed?
(It would require editing the do_for() function to parse another option. Not sure what it should be: ...iterator? 😬)

I think we'd have to call that a new feature. The behaviour isn't incorrect, it just uses more memory than we'd like in this case.

Let's provisionally accept on that basis. It might be worth pinging the DevelopersMailingList to seek advice on how to proceed, particularly with regard to the possible API for such a change.

comment:9 by Carlton Gibson, 5 years ago

Ah, I didn't properly read this bit:

context is pointing to the item which is the model itself! Even we fix above len() call somehow, this will hurt us again.

There's an effort to allow a generator based approach to template rendering in #13910. There's a PR for that which got so far but needs pushing over the line.
I don't know if that would solve second issues here...?

comment:10 by Keryn Knight, 5 years ago

Yielding for all rendering rather than build up the nodelist would presumably assist in fixing the problem, but the problem itself would still have to remain as long as revcounter and revcounter0 are documented as being there (I've no idea who'd use those, but presumably they exist for a reason). For last you can probably rework the loop to be based around using next(...) to find out if there *is* another item to yield, if there's not. That's last. Similarly for empty nodes rendering you can check whether the first next() raised StopIteration

But until revcounter and revcounter0 become optional (ie: the guarantee is only made if the items has a __len__) you can't avoid consuming the iterator to find out the overall span. I certainly don't think adding another argument is a good idea, because you could then end up with something absurd like {% for x in mymodel.objects.all.iterator iterator %} or {% for x in mymodel.objects.all iterator %}, right? which both look like typos or accidents.

I don't know if the exception made by doing reversed(mymodel.objects.all().iterator()) would get swallowed by DTL or surfaced, but that'd be another consideration. If it's surfaced I think that's fine, if it's not that's another issue.

comment:11 by Carlton Gibson, 5 years ago

Resolution: wontfix
Status: newclosed

I certainly don't think adding another argument is a good idea, because you could then end up with something absurd like...

OK. I think Keryn's analysis is correct. I agree the extra param doesn't look nice, but I can't see another way.

So short of a concrete proposal, I think we have to say wontfix. (Happy to reopen if an acceptable proposal is forthcoming.)

comment:12 by Carlton Gibson, 5 years ago

Triage Stage: AcceptedUnreviewed
Note: See TracTickets for help on using tickets.
Back to Top