Code

Opened 5 years ago

Closed 2 years ago

#11670 closed Bug (fixed)

Model fields named 'year', 'month', 'gt', 'lt' etc. get mistaken for lookup types in lookups across relations

Reported by: andy@… Owned by: julien
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: joel@…, anssi.kaariainen@…, akaariai Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Using a field named 'year' in a related table:

Event.objects.filter(eventinfo__year=settings.YEAR)

Throws a type error:

TypeError at /events/submit_review/
Related Field has invalid lookup: year

The problem is easily worked around by specifying a type of match to use (I used the exact keyword):

Event.objects.filter(eventinfo__year__exact=settings.YEAR)

I suspect this is a clash with the date field lookup functions. I'm not sure if this is a bug or if it's just something to mention in the documentation for related searches.

Attachments (8)

11670.diff (5.4 KB) - added by jpwatts 4 years ago.
11670.field-lookup-collisions.diff (8.1 KB) - added by julien 3 years ago.
11670.field-lookup-collisions.2.diff (8.0 KB) - added by julien 3 years ago.
11670.field-lookup-collisions.3.diff (21.4 KB) - added by julien 3 years ago.
11670.field-lookup-collisions.4.diff (35.5 KB) - added by julien 3 years ago.
11670.field-lookup-collisions.5.diff (36.9 KB) - added by julien 3 years ago.
11670.field-lookup-collisions.6.diff (37.1 KB) - added by julien 3 years ago.
11670.field-lookup-collisions.7.diff (7.6 KB) - added by julien 2 years ago.

Download all attachments as: .zip

Change History (35)

comment:1 Changed 4 years ago by russellm

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Triage Stage changed from Unreviewed to Accepted

comment:2 Changed 4 years ago by jpwatts

  • Cc joel@… added

comment:3 Changed 4 years ago by jpwatts

  • Has patch set
  • Owner changed from nobody to jpwatts
  • Version changed from 1.1 to SVN

I'm attaching a patch that fixes this problem by restricting which kinds of lookups will be attempted on related fields. I've included a regression test and verified that the rest of Django's test suite passes (with the exception of the test that isn't passing on trunk either).

Changed 4 years ago by jpwatts

comment:4 Changed 4 years ago by jpwatts

  • Owner jpwatts deleted

comment:5 Changed 3 years ago by julien

  • Severity set to Normal
  • Type set to Bug

comment:6 Changed 3 years ago by julien

  • Patch needs improvement set

The tests would need to be rewritten using unittests since this is now Django's preferred way.

comment:7 Changed 3 years ago by julien

  • Easy pickings unset
  • UI/UX unset

This was again reported in #16714. Note that the same issue occurs with fields named 'month' and 'day' as well as 'year'. It'd be nice to reflect that in the tests.

Last edited 3 years ago by julien (previous) (diff)

comment:8 Changed 3 years ago by ramiro

See also ticket:8424#comment:13 for a similar problem and #16187 (possibly a long term solution). There is also another ticket or thread I can't find right now that proposed changing lookup names to all uppercase.

Changed 3 years ago by julien

Changed 3 years ago by julien

comment:9 Changed 3 years ago by julien

OK so I've updated the patch simplifying the code's algorithm and the tests.

comment:10 Changed 3 years ago by akaariai

  • Cc anssi.kaariainen@… added

I read somewhat quickly through the patch. This looks simple and efficient.

I think this could be extended in the future if need be (referring for example to ticket #16187).

However, I wonder if it would be better to make this a separate method. This could be usable in checking the validitity of other calls, order_by for example. I would actually go so far as to make this model._meta method, returning 3-tuple field_path, final_field (which can be in field_path or not), and lookup_parts. This could be useful outside of sql.query, and this would make for proper separation of concerns.

I don't have time for more thorough review at the moment. I am not changing patch needs improvement. I wonder if it is set for the latest patch or just forgotten to checked?

comment:11 Changed 3 years ago by julien

  • Patch needs improvement unset

Thanks for your review and ideas! Splitting that code out into a separate method sounds interesting -- could you elaborate on where else and how it would be useful?

I'm removing the Patch needs improvement flag, since I think this patch fixes the original bug, which appears to be the highest priority at this stage -- Larger refactors could be tracked elsewhere like in #16187. The patch would still need a close review before it can proceed, though.

comment:12 Changed 3 years ago by akaariai

Well, the idea would be that when for example order by is called, we could check if the order is valid at that point. It would be an trivial check to do:

for field in fields:
    if self.resolve_lookup_path(field)[LOOKUP_PART]:
        raise ValueError('Invalid order by!')

and that is pretty much it. We should first remove the leading '-' though...

But I also think it really belongs to model._meta. Why does sql/query.py need to know how to travel the fields? IMHO it doesn't need to know that. Abstract that away in a model._meta method and just call it when needed. There are also other parts in Django which would benefit from this. Take a look at django/contrib/admin/options.py:L236.

comment:13 Changed 3 years ago by julien

Thanks, I'll like this idea. I've made good progress writing a new patch for that and I'll try to post it here tomorrow.

Changed 3 years ago by julien

comment:14 Changed 3 years ago by julien

Actually, I can post my patch right now (see above).

As suggested by akaariai I've created a new Options.resolve_lookup_path() method and used it in multiple places, which allows to simplify the code quite a bit, in particular in the admin. I've tried to make the algorithm as efficient as possible and used some caching mechanism too.

Feedback welcome. Thank you!

Changed 3 years ago by julien

comment:15 Changed 3 years ago by julien

  • Patch needs improvement set

I've pushed the refactoring further, notably by modifying setup_joins. Now a given lookup wouldn't be parsed multiple times -- only once. I'm still getting 6 test errors that I haven't got the time to get to the bottom of yet, but I think we're getting pretty close now.

Changed 3 years ago by julien

comment:16 Changed 3 years ago by julien

  • Patch needs improvement unset

All tests are now passing. I'm reasonably happy of where the patch is at now. I'm just wondering whether the values fetched from the resolve_lookup_path cache should systematically be copied (as currently in the the patch) or if it should be the calling code's responsibility to make copies before modifying the values -- if the later, then there would probably be a slight speed increase.

At this point I'd really like to get the patch reviewed. Any feedback is welcome. Thanks!

Last edited 3 years ago by julien (previous) (diff)

comment:17 Changed 3 years ago by akaariai

Why not cache and return tuples? That way there is no way the caller can modify them. If caller needs to do it, the caller can turn them into lists. However, I doubt you can show any real speed difference, as clone will take all the time in the interesting cases.

I would like to review this, but I don't know when I will have the time to do it.

comment:18 Changed 3 years ago by julien

Yeah using tuples does the trick (updated in latest patch). That removes my only concern with the patch. I'm pretty sure this refactoring would increase speed in the Python code processing. I'll try to do some benchmarking.

Changed 3 years ago by julien

comment:19 Changed 3 years ago by julien

  • Summary changed from Minor limitation using fields named 'year' in related searches. to Model fields named 'year', 'month', 'gt', 'lt' etc. get mistaken for lookup types in lookups across relations

comment:20 Changed 3 years ago by julien

To make it easier to track changes, I've pushed everything to a github branch: https://github.com/jphalip/django/compare/master...lookup_refactor

The benchmarks I've run with djangobench seem to show a very slight speed increase but nothing significant. Therefore this patch is mostly about simplifying and DRY-ing up the lookup system's codebase.

comment:21 Changed 3 years ago by akaariai

  • Patch needs improvement set

Some notes:

  • TYPO: Lookups on non-existants fields are ok, since they're ignored
  • Shouldn't the cache be cleared in add_field like the other field caches?
  • Is it absolutely necessary to pass in the query into resolve_lookup_path, it is only used in:
                    field_name not in query.query_terms and
                    field_name not in query.aggregate_select.keys() and
                    field_name not in query.aggregates.keys() and
                    LOOKUP_SEP.join(parts) not in query.aggregates.keys()
    

It would be IMHO better to pass in two sets, one containing the first three lines unioned, and the other the query.aggregates.keys(). Even better would be to move some of this logic into query.py. One idea is just to ignore the whole FieldDoesNotExist and let the caller work out what the result should be. So if you resolve 'foo__bar_id', you would only resolve the foo part and let the caller figure out what to do with the bar_id part. There is also a related bug here: the cache key is the passed in path, but the result which is cached depends also on the query and allow_explicit_fk, so if first called with 'foo__bar_id', True, we would cache the wrong result for the call 'foo__bar_id', False.

Otherwise the patch looks good to me. This is a big cleanup to lookup path handling in my opinion.

comment:22 Changed 3 years ago by julien

@akaariai: Thanks for your review. I've updated the branch: https://github.com/jphalip/django/compare/master...lookup_refactor#files_bucket

The cache is now flushed in add_field(). As for your comment re: not passing the query to resolve_lookup_path(), I've tried but couldn't come up with a clean implementation. Have you got any specific ideas for it?

comment:23 Changed 3 years ago by akaariai

  • Cc akaariai added

I have an idea, but maybe it is not workable. So, my first idea is to get the patch in as is, and revisit this issue later on.

The maybe-unworkable idea is that the code for traveling the lookups translates to this logic (at least I think it does):

while lookup_leads_to_field:
    fields.append(found_field)
    if found_field is related_field:
        related_field = found_field.related_field
    else:
        goto marker1

if lookups_left:
    if not allow_explicit_fk:
        raise FieldError
marker1:
...

Now, if you get a lookup like rel_field__non_existing_field you are going to raise FieldError. But if you instead do nonrel_field__non_existing_field you are going to say that nonrel_field was found and non_existing_field part wasn't looked. This is because nonrel_field will "goto marker1" (via NotRelatedField exception) and skip the if lookups_left part.

So, the only idea I have is to return the rel_field in the rel_field__non_existing_field lookup case. So, something like this:

# pseudoish...
for counter, field_name in enumerate(parts):
    try:
        if field_name == 'pk':
            field_name = opts.pk.name
        field_info = opts.get_field_by_name(field_name)
        fields += (field_info,)
        if (counter + 1) < num_parts:
            # If we haven't reached the end of the list of
            # lookups yet, then let's attempt to continue
            # traversing relations.
            related_model = get_model_from_relation(field_info[0])
            opts = related_model._meta
    except FieldDoesNotExist, NotRelatedField:
        break
if fields:
    # We found at least one field, lets return that
    last_field = fields[-1]
else:
    # The lookup was totally bogus - it didn't resolve any field from
    # the base model
    raise FieldError
return parts, fields, last_field

The caller is then responsible to raising FieldError if that is the appropriate action, doing the explicit_fk logic and checking for existing aggregates, etc. I haven't tried this at all, so it is possible this idea is bogus, or will make the caller's job more complicate than it needs to be.

As said before: getting the patch in as is could be the thing to do. Polish it later on when we know how custom lookups are going to be handled. I have a tendency to over-complicate things, and I have a feeling I am in over-complicating mode right now... :)

Changed 2 years ago by julien

comment:24 Changed 2 years ago by julien

The patch above is a slight improvement on 11670.field-lookup-collisions.2.diff, as the full lookup resolution now only occurs if the last part isn't a lookup type. That means there wouldn't be any extra computation for the most common lookups.

Since it's a bit late before 1.4 gets released to do a full refactor as discussed earlier, I suggest we start by fixing the bug using that latest patch, and then look into a proper refactor later for 1.5.

comment:25 Changed 2 years ago by akaariai

  • Patch needs improvement unset

If the patch is good to go, it would be nice to get this into 1.4.

In that case opening a new ticket about the larger refactor would be nice, so that the patches about the refactoring do not get lost.

I unchecked the "patch needs improvement" box, as I think that is per the triaging guide: patch author removes that box when he feels he has something ready, reviewer rechecks it, or marks as ready for checkin as needed. Unfortunately, I don't have time for review just now.

comment:26 Changed 2 years ago by julien

I'm going to commit the fix for the originally reported issue. A deeper refactoring would still be welcome but it should be tackled as part of a separate ticket. #16187 could be a good candidate.

comment:27 Changed 2 years ago by julien

  • Owner set to julien
  • Resolution set to fixed
  • Status changed from new to closed

In [17450]:

Fixed #11670 -- Prevented genuine model fields named 'year', 'month', 'gt', 'lt' etc. from being mistaken for lookup types in lookups across relations. Thanks to andy for the report, to jpwatts for the initial patch and to Anssi Kääriäinen and Alex Gaynor for the reviews.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.