Opened 4 years ago

Closed 4 years ago

Last modified 4 years ago

#12374 closed (worksforme)

QuerySet .iterator() loads everything into memory anyway

Reported by: Nick Welch <mackstann@…>
Owned by: nobody
Component: Database layer (models, ORM)
Version: 1.1
Severity:
Keywords: orm, cache, iterator
Cc:
Triage Stage: Unreviewed
Has patch: no
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings:
UI/UX:

Description

Iterating through the result of .iterator() still causes a huge spike in memory consumption. In contrast, loading only one record with [:1] does not.

Others have run into this problem:

http://stackoverflow.com/questions/1443279/django-iterate-over-a-query-set-without-cache

Notice his follow-up comment to the suggestion of using .iterator():

"Its still chewing through a ton of RAM when I use your call. :("

This has been my experience as well.
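
(For concreteness, a minimal sketch of the reported pattern; BigModel and the myapp import path are hypothetical stand-ins for any model backed by a large table:)

    from myapp.models import BigModel  # hypothetical app and model

    # Reported behaviour: memory spikes even though iterator() is supposed
    # to avoid populating the queryset's result cache.
    for obj in BigModel.objects.all().iterator():
        pass

    # By contrast, slicing to a single record stays flat:
    first = list(BigModel.objects.all()[:1])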

Attachments (0)

Change History (5)

comment:1 Changed 4 years ago by Alex

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to worksforme
  • Status changed from new to closed

The author there identifies the issue as being the MySQL query cache (in MySQLdb, I imagine). There's nothing we can do about that; when you use .iterator(), Django doesn't cache the data.
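
(For context on the MySQL side: MySQLdb's default cursor buffers the complete result set in client memory on execute(), which would produce exactly this spike regardless of what Django does. A sketch of the streaming alternative, going through MySQLdb directly since Django doesn't expose the cursor choice; the connection details are placeholders:)

    import MySQLdb
    import MySQLdb.cursors

    # The default cursor class stores the whole result set client-side;
    # SSCursor streams rows from the server one at a time instead.
    conn = MySQLdb.connect(db='mydb', user='me',
                           cursorclass=MySQLdb.cursors.SSCursor)
    cur = conn.cursor()
    cur.execute("SELECT id FROM some_big_table")
    for row in cur:
        pass  # rows arrive incrementally; client memory stays bounded
    conn.close()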

comment:2 Changed 4 years ago by Nick Welch <mackstann@…>

comment:3 Changed 4 years ago by walk_n_wind

  • Resolution worksforme deleted
  • Status changed from closed to reopened

I'm having this problem with QuerySet.iterator() as well - we have a table with about 2 million records in it, and looping through it the following way causes the Python process to grow to upwards of 2GB of RAM (from about 50MB at the start):

    for p in Property.objects.all().iterator():
        print "hello"

I'm definitely confused, because there's very little out there saying that iterator() didn't help with memory consumption, and a lot more saying that it did. I'm running Django 1.1 with Python 2.6.4.

Does Django make use of the psycopg2 mentioned above?
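
(On the psycopg2 question: as far as I know, Django's ORM at this point uses ordinary client-side psycopg2 cursors, which fetch the entire result set on execute(), so .iterator() can't keep memory flat on PostgreSQL either. psycopg2's server-side "named" cursors do stream - a sketch outside the ORM, with a placeholder DSN and table:)

    import psycopg2

    conn = psycopg2.connect("dbname=mydb")  # placeholder DSN

    # An anonymous cursor pulls every row into client memory up front.
    # A named cursor is server-side: rows are fetched in batches of itersize.
    cur = conn.cursor(name='big_scan')
    cur.itersize = 2000                     # rows per round trip
    cur.execute("SELECT id FROM some_big_table")
    for row in cur:
        pass                                # memory stays roughly constant
    conn.close()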

comment:4 Changed 4 years ago by russellm

  • Resolution set to worksforme
  • Status changed from reopened to closed

The most likely cause of problems here is running with DEBUG=True; this causes the debug cursor to log every query for the life of the process, soaking up memory.

If you can validate that this problem still exists with DEBUG=False (and without the MySQL query cache problems), please reopen.
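
(To illustrate: with DEBUG=True, every executed query is appended to django.db.connection.queries and never trimmed. A sketch reusing the Property model from comment 3; p.owner is a hypothetical foreign key, just to force one extra query per row:)

    from django.db import connection, reset_queries

    for p in Property.objects.all().iterator():
        p.owner  # hypothetical FK access: one query per row

    print len(connection.queries)  # one entry per executed query
    reset_queries()                # drops the accumulated log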

comment:5 Changed 4 years ago by zombie_ninja

I have experienced a similar problem, and I have set DEBUG=False in settings. I have over 7 million rows in my PostgreSQL database.

My current setup is Debian lenny running Django 1.2 (from lenny-backports) and PostgreSQL 8.3.7.

Currently I have been using a pagination approach to iterate through all records, as iterator() uses up close to all of my system memory.
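
(I don't know the exact scheme used, but one common shape of that pagination approach - a sketch assuming an integer primary key on Article:)

    CHUNK = 1000
    last_pk = 0
    while True:
        # Only CHUNK rows are materialised per iteration; ordering by pk
        # and filtering past the last seen pk never loads the full table.
        chunk = list(Article.objects.filter(pk__gt=last_pk)
                                    .order_by('pk')[:CHUNK])
        if not chunk:
            break
        for article in chunk:
            print article.id, article.title, article.author
        last_pk = chunk[-1].pk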

However, it was interesting to see that I could iterate through the dataset in the following way, using only a fraction of my system memory:

    query_set = Article.objects.all()
    cache = query_set._fill_cache  # bound method; im_self is the queryset itself
    i = 0
    while True:
        try:
            r = cache.im_self[i]
            print r.id, r.title, r.author
            i += 1
        except Exception, e:
            break

I would appreciate it if someone could give some insight into this.

Regards.

