Opened 8 years ago

Closed 8 years ago

Last modified 8 years ago

#27365 closed Bug (invalid)

Fields suddenly not found.

Reported by: Brandon Owned by: nobody
Component: Database layer (models, ORM) Version: 1.9
Severity: Normal Keywords: FieldDoesNotExist
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hello,
We just moved from Django 1.7.10 to 1.9.9. After the move a new error began occurring. The exception is:
FieldDoesNotExist: No related field named '...'

We see the exception when accessing the reverse OneToOne of a non-abstract models.Model subclass that has been subclassed.

class Case(models.Model):
   ...lots of fields...
class SubclassCase(Case):
  ...lots more fields...
SubclassCase.objects.get(id=12345)

generates traceback

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 149, in get_response
    response = self.process_exception_by_middleware(e, request)
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 147, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/contrib/auth/decorators.py", line 23, in _wrapped_view
    return view_func(request, *args, **kwargs)
  File "/var/www/vhosts/prod/lib/workpath/workflow/views.py", line 1730, in view_case
    additional_view = case_module.view(case)
  File "/var/www/vhosts/prod/lib/workpath/workflow/cases/subclasscase/views.py", line 286, in view
    case = SubclassCase.objects.get(id=case.id)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/manager.py", line 122, in manager_method
    return getattr(self.get_queryset(), name)(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 378, in get
    clone = self.filter(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 790, in filter
    return self._filter_or_exclude(False, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/query.py", line 808, in _filter_or_exclude
    clone.query.add_q(Q(*args, **kwargs))
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/query.py", line 1243, in add_q
    clause, _ = self._add_q(q_object, self.used_aliases)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/query.py", line 1269, in _add_q
    allow_joins=allow_joins, split_subq=split_subq,
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/query.py", line 1149, in build_filter
    lookups, parts, reffed_expression = self.solve_lookup_type(arg)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/query.py", line 1035, in solve_lookup_type
    _, field, _, lookup_parts = self.names_to_path(lookup_splitted, self.get_meta())
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/sql/query.py", line 1345, in names_to_path
    targets = (final_field.remote_field.get_related_field(),)
  File "/usr/local/lib/python2.7/dist-packages/django/db/models/fields/reverse_related.py", line 244, in get_related_field
    self.field_name)
FieldDoesNotExist: No related field named 'subclasscase'

Whenever the exception occurs, it is exactly this exception every time once _filter_or_exclude is reached. We've seen it in the context of function based views, class based views, the views of our tastypie backed API (XML and JSON) and from view functions that support light-weight JSON functions.
We see it across multiple disconnected deployments with completely separate machines, datastores, databases, etc.
We do not see it all the time.
We do not see it when running under the development server.
An apache restart fixes it... for a little while.

We are using mpm_event with WSGI.
Our mpm_event configuration looks like this:

<IfModule mpm_event_module>
        StartServers               2
        MinSpareThreads           25
        MaxSpareThreads           75
        ThreadLimit               64
        ThreadsPerChild           25
        MaxRequestWorkers        150
        MaxConnectionsPerChild  5000
</IfModule>

Our WSGI (libapache2-mod-wsgi 3.4-4ubuntu2.1.14.04.2) configuration looks like this:

WSGIDaemonProcess prod processes=2 python-path=/usr/local/lib/python2.7/dist-packages:/var/www/vhosts/prod/lib

Based on our experience that an apache restart fixed it we changed our WSGI configuration to this:

WSGIDaemonProcess prod processes=10 threads=1 maximum-requests=50 python-path=/usr/local/lib/python2.7/dist-packages:/var/www/vhosts/prod/lib

This causes the apache children to cycle quickly and the error goes away but as you can imagine this slows down the site considerably.

The error occurs much more frequently with 1.8.15 than with 1.9.9 - but again it *never* occurred in 1.7.10. Based on this, and looking at git diff 1.7.10...1.9.9 -- django/db/models/options.py; I believe it is somehow related to the _meta API change and the introduction of _get_fields_cache in the Options class.

However, I am very unfamiliar with core django code... so that could be completely wrong.

The real difficulty here is that it seems to be time or number-of-requests based. A apache child, immediately after a restart, hums along nicely. But it *seems* that if any particular code path that needs to do that kind of "reverse OneToOne" lookup is crossed too many times the error, that exact exception, occurs. Once it has occurred for a child, that child is done. If you access the site and get that child looking for that object, the exception occurs. But if you get another apache child or are looking for another object, the error has a good chance of not occurring. Again, if that child cycles out or if Apache is restarted (not reloaded) the error vanishes completely... for a little while, seemingly dependent on site traffic.

I'll be glad to supply any debug information I can. As pervasive as this error was in production, I have been unable to reproduce the problem in a development environment. As soon as I have, I'll update the ticket with whatever information I've found.

Any insight anyone could provide in debugging this problem would be much appreciated. I would not hesitate to submit a patch if I had one.

Thanks,
--Brandon

Change History (13)

comment:1 by Brandon, 8 years ago

Component: UncategorizedDatabase layer (models, ORM)
Type: UncategorizedBug

comment:2 by Tim Graham, 8 years ago

Resolution: needsinfo
Status: newclosed

Please see TicketClosingReasons/UseSupportChannels for ways to get help and reopen the ticket if you can say why Django is at fault.

comment:3 by Aymeric Augustin, 8 years ago

This looks like it could be a bug in Django, but I'm not sure. Is there any chance you're defining Django models and not importing them in an application's models module or package? This might explain what you're seeing (in that case the answer is "don't do that").

comment:4 by Brandon, 8 years ago

I don't understand what you mean by "defining Django models and not importing them in an application's models module or package".

Every model that is being used is beneath a package that is listed in INSTALLED_APPS.

Let me describe the process.
Given this structure:

class Case(models.Model):
   ...lots of fields...
class SubclassCase(Case):
  ...lots more fields...
  1. A user accesses a generic page that retrieves and renders the state of a Case instance.
  2. The user then, if they so choose, accesses a subclass specific page that uses the id of the Case instance to retrieve the subclass instance via a SubclassCase.objects.get(id=case.id). The class SubclassCase was imported via a __import__ just moments before the {{.get}} is done.
  3. The subclass instance is then used to render a subclass specific page that includes both Case and SubclassCase parameters.

This is one of 3 different but similar situations where I have seen this bug expressed. Each situation centers around following an FK backwards via on a subclass instance using SubclassCase.objects.get(id=case.id) or SubclassCase.objects.filter(id=case.id), where id is an AutoField. As was noted in the original issue and in my post to the forums (https://groups.google.com/d/msg/django-users/S4AnygYSJqU/JttIqqTLAQAJ) once you reach _filter_or_exclude the exception is *always* the same and the manner of the error is always the same (i.e. The field variable seems to be populated by self in get_related_field at line 244). The difference between the 3 situations is, 2 of the 3 situations do *not* do the __import__, only 1 of the 3 situations do the dynamic import.

comment:5 by Aymeric Augustin, 8 years ago

Every model that is being used is beneath a package that is listed in INSTALLED_APPS.

This is good. If I may ask a more precise question:

  • are all your models in simple models.py module? or do you have a models/__init__.py package with models/cases.py, etc. submodules?
  • if the latter are you importing all your models in __init__.py? If not, that could solve your problem.

You mention __import__. Did you mean import? Or do you really use the low level __import__? Could you try using apps.get_models instead? apps.get_models is intended to raise an exception in situations that may lead to such inconsistencies.

comment:6 by Brandon, 8 years ago

Yes we do actually use __import__. We are bringing in models and views dynamically. Like this:

__import__("workpath.workflow.cases.%s.views" % casetype, fromlist=["workpath.workflow.cases"])

and

__import__("workpath.workflow.cases.%s.models" % casetype, fromlist=["workpath.workflow.cases"])

But again, this was *never* a problem before 1.9.9. This basic structure and usage has been around since early 1.4. It has actually worked quite well! =)

Version 1, edited 8 years ago by Brandon (previous) (next) (diff)

comment:7 by Brandon, 8 years ago

Resolution: needsinfo
Status: closednew

I reopened this issue because this bug is still occurring after 4 months of our 2-week development cycles. We have refactored and gone entirely to apps.get_models and even reduced most all instances of field__id to field_id, because that seems to express the bug *less* often. We also could not tell if this was Apache so we moved several customer environments over from Apache2+mod_wsgi to NGINX+uWSGI (still Ubuntu). And yet the bug still occurs in *exactly* the way as described above.

Some important things to note:

  1. It occurs seemingly at random. But when it happens it will happen in waves.
  2. It occurs equally in multi-processed + single-thread or multi-processed + multi-threaded web server configurations.
  3. Recycling Apache children or NGINX vassals quickly (every 50-150 requests) causes the bug to occur *less* but it will still occur.
  4. The stacktrace is *always* the same, has remained the same, and is the same in *every* instance. There is zero deviation.
  5. It *never* occurred from Django 1.4.x to 1.7.x; but has not stopped in 1.9.9 and occurred *more* frequently in 1.8.x

Any attention at all would be most appreciated.
--Brandon

comment:8 by Tim Graham, 8 years ago

Given the esoteric nature of the problem, I think the onus is on you to explain why Django is at fault and to propose a fix. I'm not sure if I could reproduce the problem given the information you provided so far.

comment:9 by Tim Graham, 8 years ago

Resolution: needsinfo
Status: newclosed

comment:10 by Brandon, 8 years ago

Resolution: needsinfo
Status: closednew

After stepping completely through the call stack down from MySubclassOfaModelSubclass.objects.filter(id=1234) we arrive at get_related_field in reverse_related.py Line# 237

The call in this function that fails is field = self.model._meta.get_field(self.field_name)
It fails because self.field_name has somehow been set to self.name.

I cannot directly reproduce the error on demand. But setting self.field_name = self.name from a breakpoint at Line# 237 reproduces *exactly* the stack trace I am (and have been) receiving on multiple independent production instances running first under Apache+mod_wsgi then nginx+uWSGI since moving from 1.7 to 1.9 months ago.

I replicated the production stack trace via pudb checking every function scoped local variable along the way against a production stack trace that logged those variables on a failure. I checked them down from .filter to .get_related_field; all populated variables are identical down the complete call stack. The only way to reproduce that complete call stack and exact exception is to set self.field_name = self.name where self is a OneToOneRel representing MySubclassOfaModelSubclass - note that it actually shows up from repr as <OneToOneRel: myapp.mysubclassofamodelsubclass>

I apologize for continually reopening this ticket. If there is another (or better) avenue to continue delivering detail around this issue, I'm certainly open to it. This has been a major thorn in my side. I'll provide any and all information necessary to get to the bottom of this very subtle and difficult issue. I'll be happy to work-up and submit a patch once the root cause is located.

Thanks,
--Brandon

P.S. Please note that it is not just MySubclassOfaModelSubclass.objects.filter(id=1234) that fails. Any forward FK relation can fail (so MySubclassOfaModelSubclass.objects.filter(foo_id=55)) - and fails in exactly the same way. The MySubclassOfaModelSubclass class has many forward FKs in it. It doesn't seem to matter which FK field is accessed, the stack trace is always the same.

comment:11 by Tim Graham, 8 years ago

Resolution: needsinfo
Status: newclosed

You can continue to comment on the ticket without reopening it. Until you provide the root cause or a way for someone else to reproduce the issue, I don't think there's value in keeping the ticket open since no one else can do anything about it.

comment:12 by Brandon, 8 years ago

Further down the rabbit hole...

After figuring out that it was the occurance of self.field_name == self.name that was producing the problem I instrumented field_name in ForeignObjectRel so that I could compare it to name.
Here is the instrumentation I added to ForeignObjectRel

    def _set_field_name(self,value):
        name = getattr(self,'name',None)
        field_name = value
        if field_name and name and field_name == name:
            import traceback
            stacktrace = traceback.format_stack()
            message = ''.join(["Bug#27365 - self.field_name == self.name\n"]+stacktrace)
            warnings.warn(message, RuntimeWarning, stacklevel=2)
        self._field_name=value
    def _get_field_name(self):
        name = getattr(self,'name',None)
        field_name = getattr(self,'_field_name',None)
        if field_name and name and field_name == name:
            import traceback
            stacktrace = traceback.format_stack()
            message = ''.join(["Bug#27365 - self.field_name == self.name\n"]+stacktrace)
            warnings.warn(message, RuntimeWarning, stacklevel=2)
        return self._field_name
    field_name = property(_get_field_name,_set_field_name)

This allowed me to see if *anything* in the system was setting self.field_name = self.name, plus the exact trace to it in-situ.

As it turns out, it seems an old 3rd party application that adds metadata to fields as attributes was reusing the attribute name .field_name, unaware that django was using it. I believe that the introduction of the aggressive caching to relational objects that occurred between 1.7.x and 1.9.x allowed this errant use of .field_name to persist all the way up until django attempted to reuse that object. Then when django reused the object, self.field_name had been *sometimes* been incorrectly set based on the caching + 3rd party app usage + life-time of the wsgi process.

I have since deployed a monkey patch that alters the name of the metadata field used by the 3rd party application so that it does not collide with the relational object attribute .field_name.
I have not seen the error since.

--Brandon

comment:13 by Tim Graham, 8 years ago

Resolution: needsinfoinvalid

Thanks for posting about the resolution. Glad to hear that Django isn't at fault.

Note: See TracTickets for help on using tickets.
Back to Top