Opened 9 years ago

Last modified 9 years ago

#24523 new Cleanup/optimization

django.apps.registry.populate() does not handle failures in app_config.ready()

Reported by: Jerry Londergaard Owned by: nobody
Component: Core (Other) Version: dev
Severity: Normal Keywords:
Cc: kalium99@…, someuniquename@… Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hi,

I'm using DJango 1.7, Apache 2.2 + mod_wsgi.

Here is a snippet of the error that I see on the server side logs:

[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] mod_wsgi (pid=27852): Target WSGI script '/home/test/prod_wsgi.py' cannot be loaded as Python module.

[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] mod_wsgi (pid=27852): Exception occurred processing WSGI script '/home/test/prod_wsgi.py'.
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] Traceback (most recent call last):
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] File "/home/test/example/prod_wsgi.py", line 24, in <module>
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] application = get_wsgi_application()
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] File "/home/testenv/lib/python2.7/site-packages/django/core/wsgi.py", line 14, in get_wsgi_application
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] django.setup()
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] File "/home/testenv/lib/python2.7/site-packages/django/init.py", line 21, in setup
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] apps.populate(settings.INSTALLED_APPS)
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] File "/home/testenv/lib/python2.7/site-packages/django/apps/registry.py", line 115, in populate
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] app_config.ready()
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] File "/home/testenv/lib/python2.7/site-packages/django/contrib/admin/apps.py", line 22, in ready
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] self.module.autodiscover()
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] File "/home/testenv/lib/python2.7/site-packages/django/contrib/admin/init.py", line 23, in autodiscover
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] autodiscover_modules('admin', register_to=site)
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] File "/home/testenv/lib/python2.7/site-packages/django/utils/module_loading.py", line 74, in autodiscover_modules
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] import_module('%s.%s' % (app_config.name, module_to_search))
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] File "/usr/local/lib/python2.7/importlib/init.py", line 37, in import_module
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] import(name)
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] File "/home/test/apps/partners/admin.py", line 5, in <module>
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] from apps.partners.forms import url_status_change
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] File "/home/test/apps/partners/forms.py", line 42, in <module>
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] class PartnerSelection(forms.Form):
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] File "/home/test/apps/partners/forms.py", line 44, in PartnerSelection
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] (p.id, p.company_name) for p in Partner.objects.all()])
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] File "/home/testenv/lib/python2.7/site-packages/django/db/models/query.py", line 141, in iter
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] self._fetch_all()
...
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] File "/home/testenv/lib/python2.7/site-packages/MySQLdb/connections.py", line 193, in init
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] super(Connection, self).init(*args, kwargs2)
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] OperationalError: (2003, "Can't connect to MySQL server on '127.0.0.1' (111)")

So obviously, can't connect the the database. This is fine and expected (apart from the DB being down part...).

The problem is that this leaves the 'App' object in an erroneous state. The 'app_config' var has been populated, but the local 'ready' variable has not been set to True.

This means that even after the DB issue has been cleared, subsequent calls to app.populate() will fail for no obvious reason (notice the PID is the same as the previous DB error).

[Sun Mar 22 23:57:17 2015] [error] [client 127.0.0.1] mod_wsgi (pid=27852): Target WSGI script '/home/test/example/prod_wsgi.py' cannot be loaded as Python module., referer: http://www.example.tv/
[Sun Mar 22 23:57:17 2015] [error] [client 127.0.0.1] mod_wsgi (pid=27852): Exception occurred processing WSGI script '/home/test/example/prod_wsgi.py'., referer: http://www.example.tv/
[Sun Mar 22 23:57:17 2015] [error] [client 127.0.0.1] Traceback (most recent call last):, referer: http://www.example.tv/
[Sun Mar 22 23:57:17 2015] [error] [client 127.0.0.1] File "/home/test/example/prod_wsgi.py", line 24, in <module>, referer: http://www.example.tv/
[Sun Mar 22 23:57:17 2015] [error] [client 127.0.0.1] application = get_wsgi_application(), referer: http://www.example.tv/
[Sun Mar 22 23:57:17 2015] [error] [client 127.0.0.1] File "/home/testenv/lib/python2.7/site-packages/django/core/wsgi.py", line 14, in get_wsgi_application, referer: http://www.example.tv/
[Sun Mar 22 23:57:17 2015] [error] [client 127.0.0.1] django.setup(), referer: http://www.example.tv/
[Sun Mar 22 23:57:17 2015] [error] [client 127.0.0.1] File "/home/testenv/lib/python2.7/site-packages/django/init.py", line 21, in setup, referer: http://www.example.tv/
[Sun Mar 22 23:57:17 2015] [error] [client 127.0.0.1] apps.populate(settings.INSTALLED_APPS), referer: http://www.example.tv/
[Sun Mar 22 23:57:17 2015] [error] [client 127.0.0.1] File "/home/testenv/lib/python2.7/site-packages/django/apps/registry.py", line 78, in populate, referer: http://www.example.tv/
[Sun Mar 22 23:57:17 2015] [error] [client 127.0.0.1] raise RuntimeError("populate() isn't reentrant"), referer: http://www.example.tv/
[Sun Mar 22 23:57:17 2015] [error] [client 127.0.0.1] RuntimeError: populate() isn't reentrant, referer: http://www.example.tv/

I'm no Django expert, but it seems the way set and test 'app_config' and 'ready', are a bit off. I'm guessing that it is just simpler to not bother testing app_config for truth. If we are not 'ready', then just set app_config to the empty list and re-populate it, as if ready is not True, then we probably can't make too many assumptions about anything else (i.e that app_config has even bee populated correctly).

It's highly possible that there is a gap in my knowledge and there are better solutions to this.

Change History (17)

comment:1 by Tim Graham, 9 years ago

I'm not sure it would resolve the issue, but I'm concerned that your PartnerSelection form is running database queries at import time. Can you confirm that or paste the entire form so we can see about that?

comment:2 by Jerry Londergaard, 9 years ago

That's right, the PartnerSelection form is running database queries. This situation is not ideal for a couple of reasons (not in the least that the field is not updated until a process is recycled).

  class PartnerSelection(forms.Form):
      partner = forms.ChoiceField(label='Partner', choices=[ (p.id, p.company_name) 
          for p in Partner.objects.all()])

So moving this (and anything similar) would solve this particular instance of the problem, but not the underlying issue in Django.

comment:3 by Tim Graham, 9 years ago

I don't quite understand the conditions needed to reproduce the error. Could you elaborate on that? Maybe your Apache config would be useful?

comment:4 by Jerry Londergaard, 9 years ago

Sure. The simplest way to recreate it is to have your prefork (or equivalent) MPM module configured as follows

<IfModule prefork.c>
StartServers       1
ServerLimit      1
MaxClients       1
</IfModule>

And have your WSGI conf setup just to use a single process and thread

WSGIDaemonProcess example processes=1 threads=1

Have a DB call (as in my case, although really you could throw any kind of exception) at app load time somewhere.

Do the equivalent of the following (this assumes DB is running locally):

$ service mysqld stop
$ service httpd restart

Then go to a URL in your app, and you should get a 500. At this point, the server will always 500 regardless of whether you bring your mysql server back up or not. The only way to fix it is to restart apache (after fixing the underlying DB issue). In this kind of contrived example, it seems like it's not that big a deal.

It only really becomes annoying when you have a multiprocess model, and flaky infrastructure. Where you will randomly get 500's when using your app (depending on whether the process that your request is given to does or does not have this problem).

Technically you could say this is NOTABUG, because the method clearly says it is not re-entrant. However I got the impression the intent is that it's not re-entrant when it has already *succesfully* run.

If you think it's worthwhile fixing (and I'm not in fact missing something), I can write a patch for it.

comment:5 by Jerry Londergaard, 9 years ago

Just to clarify this pont:

it only really becomes annoying when you have a multiprocess model, and flaky infrastructure. Where you will randomly get 500's when using your app (depending on whether the process that your request is given to does or does not have this problem).

Of course, if your infrastructure is going up and down all the time you can expect your app to behave it all sorts of strange ways. What I specifically meant though, is that even if your DB fails for just a moment and quickly recovers you could have potentially have any number of wsgi processes being in a persistent erroneous state due to them being started up in that window where the DB was down.

comment:6 by Aymeric Augustin, 9 years ago

I'm not opposed to adding some strategically placed exception handling to solve this but I'm afraid it isn't possible in general.

What if two apps provide ready() methods, the first one successfully registers a signal and the second one fails? Re-running the app-loading process will register the signal a second time, which could create a variety of issues.

If we want to do this, we must require ready() methods to be idempotent.

comment:7 by Tim Graham, 9 years ago

Component: UncategorizedCore (Other)
Triage Stage: UnreviewedAccepted
Type: BugCleanup/optimization
Version: 1.7master

in reply to:  7 comment:8 by Jerry Londergaard, 9 years ago

In my reading of the docs, they do explicitly say (although, in a weakly worded way) that the ready() method should potentially be idempotent, and that the burden is on the developer to ensure their application is robust enough to handle multiple calls to ready().

...In the usual initialization process, the ready method is only called once by Django. But in some corner cases, particularly in tests which are fiddling with installed applications, ready might be called more than once. In that case, either write idempotent methods, or put a flag on your AppConfig classes to prevent re-running code which should be executed exactly one time.

https://docs.djangoproject.com/en/1.8/ref/applications/#django.apps.AppConfig.ready

comment:9 by Evstifeev Roman, 9 years ago

Cc: someuniquename@… added

comment:10 by Shai Berger, 9 years ago

I suspect this report to be invalid.

Just above the quoted note in AppConfig.ready()'s documentation, there's a warning: Avoid interacting with the database. The whole point of this report is "ready() failed when interacting with the database".

The issue is "concealed" because the app whose ready() happened to interact with the database is django.contrib.admin; the interaction happened because, as appears in the log above,

[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] File "/home/test/apps/partners/forms.py", line 44, in PartnerSelection
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] (p.id, p.company_name) for p in Partner.objects.all()])

a query is executed in the import of a forms.py file.

The code is, of course, cut, but this looks like a "choices" argument, and if so, it is not only a problem because of database-interaction-in-ready(), but because it is a classic application bug -- if you add a Partner, it won't show in the form; the form only lets you select the partners that were recorded when the app started. To solve both issues, use a ModelChoiceField instead.

in reply to:  10 comment:11 by Jerry Londergaard, 9 years ago

Replying to shaib:

I suspect this report to be invalid.

Just above the quoted note in AppConfig.ready()'s documentation, there's a warning: Avoid interacting with the database. The whole point of this report is "ready() failed when interacting with the database".

Hi,

I'm not sure where that above quoted text is from, and the whole point of this report is certainly not that ready() failed when interacting with the database.

The whole point of this bug report is that the code does one or more of the following:

1) Gives really unhelpful errors that do not confer the cause of the problem.
2) The specific block of code in DJango does not actually work in the way it was intended.

The issue is "concealed" because the app whose ready() happened to interact with the database is django.contrib.admin; the interaction happened because, as appears in the log above,

[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] File "/home/test/apps/partners/forms.py", line 44, in PartnerSelection
[Sun Mar 22 23:47:26 2015] [error] [client 127.0.0.1] (p.id, p.company_name) for p in Partner.objects.all()])

a query is executed in the import of a forms.py file.

The code is, of course, cut, but this looks like a "choices" argument, and if so, it is not only a problem because of database-interaction-in-ready(), but because it is a classic application bug -- if you add a Partner, it won't show in the form; the form only lets you select the partners that were recorded when the app started. To solve both issues, use a ModelChoiceField instead.

Both of these points are well understood and accepted. In fact in an earlier message I said "Technically you could say this is NOTABUG, because the method clearly says it is not re-entrant. " , as well as saying "This situation is not ideal for a couple of reasons (not in the least that the field is not updated until a process is recycled)."

This ticket is about the interplay between the 'app_config' and 'ready' variables, and how perhaps they don't work quite as intended, and how changing the error message, or the code itself, could give a better user experience in such circumstances.

comment:12 by Aymeric Augustin, 9 years ago

Clearly the immediate answer to the OP is "don't execute queries at import time". But a ready() method could fail for another reasons. For instance it could import a dependency which isn't installed.

In the scenario discussed here, mod_wsgi:

  1. fails to import the application object (that's your first traceback, which is fine)
  2. retries to import it in the same Python process (that's your second traceback)

It's unclear to me that this is a good idea. If importing the WSGI application fails, I don't expect the Python environment to be reused. I'd like to know what Graham Dumpleton thinks about this.

In the same situation, gunicorn reports that workers have failed to boot and doesn't do anything.

comment:13 by Aymeric Augustin, 9 years ago

It appears that mod_wsgi has good reasons not to kill the Python interpreter in that case: https://twitter.com/aymericaugustin/status/636173067350376449

I think we must require ready() methods to be idempotent. Strictly speaking this is backwards-incompatible. But it's also a good property.

in reply to:  13 comment:14 by Carl Meyer, 9 years ago

Replying to aaugustin:

I think we must require ready() methods to be idempotent. Strictly speaking this is backwards-incompatible. But it's also a good property.

Ouch. That's quite a painful requirement, actually, given that e.g. we recommend registering signal handlers in ready(). I really think imposing an idempotency requirement on ready() must be a last resort, if there's any possible way Django could avoid double-calling it instead.

comment:15 by Aymeric Augustin, 9 years ago

Well, there's dispatch_uid... Not elegant for sure, but not really painful either.

comment:16 by Carl Meyer, 9 years ago

Oh - yeah, I forgot about dispatch_uid. That's what comes of not using signals myself :-)

And reviewing the thread here, it looks like the docs already suggest that ready methods are well-advised to be idempotent, so that's not really a change in our recommendations.

I don't have any other cases immediately coming to mind where idempotency is difficult, so I withdraw my comment. (Though it also seems like it would be quite easy for Django to mark on the AppConfig instance when ready has been called successfully, and skip calling it again. That sort of flag on AppConfig is essentially what we propose in our docs that ready authors should do to provide idempotency; I'm not sure why Django shouldn't just do that itself.)

The "interesting" process model of mod_wsgi strikes again!

comment:17 by Graham Dumpleton, 9 years ago

FWIW. The workaround for mod_wsgi is to do:

# Should only really do this in mod_wsgi daemon mode.

try:
    application = get_wsgi_application()
except Exception:
    time.sleep(0.25)
    os.kill(os.getpid(), signal.SIGINT)

This is only of use though for transient errors such as a database not being available. If the code is broken, then killing the process and restarting it isn't going to help as the code is still going to be broken.

This issue of state full failure on loading an application also affects other WSGI servers besides mod_wsgi but in other ways depending on the mode they run in.

From memory uWSGI has some option one can set which will force process restart after a number of successive failures to load a WSGI script file/module, when it is in a mode where it does deferred loading. Other WSGI servers may simply crash on startup and so if not using something like supervisord will stay dead and not be restarted automatically.

I am looking at what is best solution for Apache/mod_wsgi, but I have to look at some problems with signal handling during WSGI script file preloading at the same time. I just haven't had the time to look at it properly. I can only really implement a solution for mod_wsgi daemon mode though where I control process existence. In the case of embedded mode where Apache is the real owner of the process then I can't simply go nuking the process as it would be a bit antisocial given that other non Python stuff could be running in the same process. The other option of destroying the Python interpreter in process and reinitialising it has proven to have problems in the past and likely would cause more problems than it solves.

Note: See TracTickets for help on using tickets.
Back to Top