Code

Opened 3 years ago

Closed 3 years ago

Last modified 3 years ago

#16022 closed Bug (needsinfo)

Cyclic reference in FieldFile causes memory usage to grow considerably

Reported by: Gustavo Owned by: nobody
Component: Database layer (models, ORM) Version: 1.1
Severity: Normal Keywords: memory leak
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX:

Description

django.db.models.fields.files:FieldFile creates a cyclic reference to the model instance it's attached to via django.db.models.fields.files:FileDescriptor.

The effects can be considerable. In our Web site, for example, it causes the process running Django to increase the memory used by over 1MB after every request.

The patch is generated against Django trunk, even though it's only been tested under Django 1.1.4 (but the cyclic reference doesn't seem to have been fixed).

Attachments (1)

fieldfile-weakref.patch (675 bytes) - added by Gustavo 3 years ago.
Fixed cyclic reference by making it a weak one

Download all attachments as: .zip

Change History (10)

Changed 3 years ago by Gustavo

Fixed cyclic reference by making it a weak one

comment:1 Changed 3 years ago by Gustavo

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Please disregard the patch, it breaks file uploads. I'm trying to improve it at the moment (and suggestions are much appreciated).

comment:2 Changed 3 years ago by Gustavo

I've tried replacing weakref.ref() with weakref.proxy(), which does work but then memory keeps growing as usual. I think a proper way to fix it is to break the cyclic reference, but that'd be backwards incompatible.

I cannot afford spending more time on this, so I'll leave it to someone else. I'm happy to test any eventual fix though. In the meantime, we'll be using mod_wsgi's maximum-requests to mitigate the consequences of this problem.

A hint to whoever will have a look at this: Try setting self.instance to None in FieldFile.__init__ and you won't notice any memory growth when you query many records. This breaks creation, modification and deletion of files, but helps demonstrate that this is the problematic reference.

comment:3 Changed 3 years ago by Gustavo

  • Has patch unset
  • Keywords memory leak added

comment:4 Changed 3 years ago by Gustavo

  • Summary changed from Cyclic reference in FieldFile to Cyclic reference in FieldFile causes memory usage to grow considerably

comment:5 Changed 3 years ago by aaugustin

Python's gc is supposed to be able to collect reference cycles these days, unless there are __del__ methods. I didn't see any that could by linked with this bug.

comment:6 Changed 3 years ago by aaugustin

  • Resolution set to worksforme
  • Status changed from new to closed

OK, I tried to reproduce this seriously. I created a project called madupload and an app called receiver inside.

madupload
├── __init__.py
├── manage.py
├── receiver
│   ├── __init__.py
│   ├── models.py
│   ├── views.py
├── settings.py
└── urls.py

receiver/models.py:

from django import forms
from django.db import models

class UploadModel(models.Model):
    upload = models.FileField()

class UploadForm(forms.ModelForm):
    class Meta:
        model = UploadModel

receiver/views.py:

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

from .models import UploadForm

@csrf_exempt
def upload(request):
    form = UploadForm(request.POST, request.FILES)
    try:
        model = form.save(commit=False)
        return HttpResponse("Saved %i bytes.\n" % model.upload.size,
                status=201, content_type='text/plain')
    except ValueError:
        return HttpResponse(repr(form.errors) + "\n",
                status=400, content_type='text/plain')

urls.py:

from django.conf.urls.defaults import patterns, url

urlpatterns = patterns('',
    url(r'^$', 'receiver.views.upload'),
)

All other files are unchanged from Django's template.

Then I launched runserver and started uploading:

myk@mYk madupload % dd if=/dev/random of=1mb.bin bs=1024 count=1024
1024+0 records in
1024+0 records out
1048576 bytes transferred in 0.146655 secs (7149958 bytes/sec)
myk@mYk madupload % while true; do curl -F "upload=@1mb.bin" http://localhost:8000; done
Saved 1048576 bytes.
Saved 1048576 bytes.
Saved 1048576 bytes.
...

I've let it run more than one hundred of uploads, and the memory footprint of the Python process handling runserver is flat.

At this point, I have proven that the cyclic reference in FileField is not a problem in itself.

I still don't know what's causing your bug. You may have a global variable referencing your FileField objects somewhere.

comment:7 Changed 3 years ago by Gustavo

  • Resolution worksforme deleted
  • Status changed from closed to reopened

Hello, aaugustin.

Thank you very much for looking into this!

I forgot to add a useful piece of information here, which I mentioned on Google Groups: Memory increases considerably when you query a model that has a FileField. I'm sorry about that.

To reproduce this, you could create, say, 2000 records and iterate over them like this inside a Django view:

for record in MyModel.objects.all():
   echo record.upload

As you refresh the page, you'll notice that memory increases.

Cheers.

comment:8 Changed 3 years ago by aaugustin

  • Resolution set to needsinfo
  • Status changed from reopened to closed

I read your email to django-users which had more info and I updated my test code above as follows.

receiver/models.py:

from django import forms
from django.db import models

class UploadModel(models.Model):
    upload = models.FileField(upload_to='receiver')

class UploadForm(forms.ModelForm):
    class Meta:
        model = UploadModel

receiver/views.py:

from django.http import HttpResponse
from django.views.decorators.csrf import csrf_exempt

from .models import UploadForm, UploadModel

@csrf_exempt
def upload(request, commit=False):
    form = UploadForm(request.POST, request.FILES)
    try:
        model = form.save(commit=commit)
        action = "Saved" if commit else "Received"
        return HttpResponse("%s %i bytes.\n" % (action, model.upload.size),
                status=201, content_type='text/plain')
    except ValueError:
        return HttpResponse(repr(form.errors) + "\n",
                status=400, content_type='text/plain')

def names(request):
    upload_names = [model.upload.name for model in UploadModel.objects.all()]
    return HttpResponse("%i objects \n" % len(upload_names),
            status=200, content_type='text/plain')

urls.py:

from django.conf.urls.defaults import patterns, url

urlpatterns = patterns('',
    url(r'^fake/$', 'receiver.views.upload', {'commit': False}),
    url(r'^save/$', 'receiver.views.upload', {'commit': True}),
    url(r'^read/$', 'receiver.views.names'),
)

I added a database, etc. to the settings and ran syncdb.

I uploaded 100 1kb files:

myk@mYk madupload % dd if=/dev/random of=1kb.bin bs=1024 count=1
1+0 records in
1+0 records out
1024 bytes transferred in 0.000232 secs (4414149 bytes/sec)
myk@mYk madupload % for i in {1..100}; do curl -F "upload=@1kb.bin" http://localhost:8000/save/; done
Saved 1024 bytes.
Saved 1024 bytes.
Saved 1024 bytes.
...

And then I read them over and over:

myk@mYk madupload % while true; do curl http://localhost:8000/read/; done
100 objects 
100 objects 
100 objects 
...

I've been heating the planet like this for a few minutes and the memory curve climbed a little bit at the beginning, then stabilized.

Initially, I was using Python 2.6; I switched to Python 2.5 but the result was the same.

So, unfortunately, we still don't have a proof that the bug is in Django itself.

At this point, I am afraid you didn't provide sufficient information for me to reproduce the bug. You could:

  • try to play with gc.garbage to see what happens in your app, see http://docs.python.org/library/gc.html
  • modify my example until it exhibits the memory leak
  • come up with your own reproducible example, and provide complete instructions of how to run it

Thanks!

comment:9 Changed 3 years ago by aaugustin

PS : closing the ticket as "needsinfo" is part of the standard triaging procedure when we can't reproduce a problem. It doesn't mean the problem doesn't exist.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.