Opened 5 years ago

Closed 5 years ago

Last modified 4 years ago

#27985 closed Bug (fixed)

Converting `Foo.objects.filter(bar=None)` to an `IsNull` too early.

Reported by: Jarek Glowacki Owned by: Sergey Fedoseev
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: sql None NULL transform
Cc: Simon Charette Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This is a little similar to #25946, but different angle.

Case Study:

We define a custom Field which converts emptystring to None before sending to DB and then None back to emptystring when reading from DB:

WHY (optional reading):

For want of a CharField that enforces uniqueness on non-empty elements only. '' == '' in the DB, but NULL != NULL. We don't want to just allow passing None into it in the code though. Hence NulledCharField.

Example Implementation:

class NulledCharField(models.CharField):
    description = _("CharField that stores NULL for emptystrings but returns ''.")

    def from_db_value(self, value, expression, connection, context):
        if value is None:
            return ''
        else:
            return value

    def to_python(self, value):
        if isinstance(value, models.CharField):
            return value
        if value is None:
            return ''
        return value

    def get_prep_value(self, value):
        if value is '':
            return None
        else:
            return value

This works for the most part, but not with filtering!
I took a look at the SQL, and Foo.objects.filter(bar='') resolves to ... WHERE "Foo"."bar" = NULL; args=(None,)
Compare this to Foo.objects.filter(bar=None), which resolves to ... WHERE "Foo"."bar" IS NULL', ()

The reason for this is that the =None -> __isnull=True conversion happens before any transformation of the value.

Specifically, in django/db/models/sql/query.py,
Query.build_filter() calls on self.prepare_lookup_value() (line1139) before it calls self.build_lookup() (line 1187).
The former does the isnull replacement, while the latter transforms the value.

This also applies (more severely) to the converse: If my get_prep_value() were to for some reason convert None to something else, then this would get ignored as Django would just decide to use the IsNull lookup off the bat.

Solution seems like we just need to rearrange the order here a little. Happy to give it a go, but if anyone has context as to whether it's maybe this way by design, do tell. The above cases would at the very least serve as good failing tests if we think this kind of custom field support is appropriate.

Change History (14)

comment:1 Changed 5 years ago by Tim Graham

Triage Stage: UnreviewedAccepted

I haven't investigated, but please provide a patch and I'll take a closer look at it.

comment:2 Changed 5 years ago by Jarek Glowacki

Here's a failing testcase for now; will have a crack at writing a patch soon..

Also realised it gets even messier for the .exclude() case, where it adds an extra query.
This causes a mutually exclusive pair of conditions, that'll always evaluate to false.
It does this:
Foo.objects.exclude(bar='') -> ... WHERE NOT ("foo"."bar" = NULL AND "foo"."bar" IS NOT NULL)
Instead of just
Foo.objects.exclude(bar=None) -> ... WHERE NOT ("foo"."bar" IS NULL)

comment:3 Changed 5 years ago by Jarek Glowacki

Owner: changed from nobody to Jarek Glowacki
Status: newassigned

comment:4 Changed 5 years ago by Jarek Glowacki

Owner: Jarek Glowacki deleted
Status: assignednew

Yeahh, nope.
The current workflow is too convoluted to just slot in a patch for this.

I couldn't find a way to untangle the ordering such that it'd cover this use case without breaking multiple other use cases.
I'd suggest taking this ticket under consideration when someone comes round to refactor this file.

Specifically, the functions build_filter and prepare_lookup_value need to be broken down into more managable bits. It's evident they've had extra conditional steps shoehorned in over time, which has made them very difficult to work with.

For anyone who comes across this in the future, all we need to address this case is ensure that this snippet of code from prepare_lookup_value:

        # Interpret '__exact=None' as the sql 'is NULL'; otherwise, reject all
        # uses of None as a query value.
        if value is None:
            if lookups[-1] not in ('exact', 'iexact'):
                raise ValueError("Cannot use None as a query value")
            return True, ['isnull'], used_joins

is evaluated AFTER the field's get_prep_value has processed the value.
Here are the failing tests again: Failing tests

And here's a bandaid workaround that hacks two lookups and the form field to get around the problem.

class NulledCharField(models.CharField):
    """
    This is required when we want uniqueness on a CharField whilst also allowing it to be empty.
    In the DB, '' == '', but NULL != NULL. So we store emptystring as NULL in the db, but treat it
    as emptystring in the code.
    """
    description = _("CharField that stores emptystring as NULL but returns ''.")

    def from_db_value(self, value, expression, connection, context):
        if value is None:
            return ''
        else:
            return value

    def to_python(self, value):
        if isinstance(value, models.CharField):
            return value
        if value is None:
            return ''
        return value

    def get_prep_value(self, value):
        if value is '':
            return None
        else:
            return value

    class NulledCharField(forms.CharField):
        """ A form field for the encompassing model field. """
        def clean(self, value):
            if value == '':
                return None
            return value

    def formfield(self, form_class=None, **kwargs):
        return super().formfield(form_class=self.__class__.NulledCharField, **kwargs)


@NulledCharField.register_lookup
class NulledCharExactLookup(Exact):
    """ Generate ISNULL in the `exact` lookup, because Django won't use the `isnull` lookup correctly. """
    lookup_name = 'exact'

    def as_sql(self, compiler, connection):
        sql, params = compiler.compile(self.lhs)
        if self.rhs in ('', None):
            return "%s IS NULL" % sql, params
        else:
            return super().as_sql(compiler, connection)


@NulledCharField.register_lookup
class NulledCharIsNullLookup(IsNull):
    """
    Doing something like Foo.objects.exclude(bar='') has django trying to shove
    an extra isnull check where it doesn't need one. We solve this by just crippling
    this lookup altogether. We don't need it as all ISNULL checks that we actually
    do want, we generate in the `exact` lookup. :(
    """
    lookup_name = 'isnull'
    prepare_rhs = False

    def as_sql(self, compiler, connection):
        sql, params = compiler.compile(self.lhs)
        return "TRUE", params

comment:5 Changed 5 years ago by Sergey Fedoseev

Owner: set to Sergey Fedoseev
Status: newassigned

comment:6 Changed 5 years ago by Sergey Fedoseev

Has patch: set

comment:7 Changed 5 years ago by Tim Graham <timograham@…>

Resolution: fixed
Status: assignedclosed

In 58da81a5:

Fixed #27985 -- Fixed query for exact=value when get_prep_value() converts value to None.

Also fixed crash of .filter(fieldtransform=None).

comment:8 Changed 5 years ago by Tim Graham <timograham@…>

In 10bfa876:

Refs #27985 -- Reallowed using exact=None as an alias for isnull=True if a custom lookup class with lookup_name != None is registered as the exact lookup.

Regression in 58da81a5a372a69f0bac801c412b57f3cce5f188 and prerequisite
for refs #28896.

comment:9 Changed 5 years ago by Tim Graham <timograham@…>

In a5c6040:

[2.0.x] Refs #27985 -- Reallowed using exact=None as an alias for isnull=True if a custom lookup class with lookup_name != None is registered as the exact lookup.

Regression in 58da81a5a372a69f0bac801c412b57f3cce5f188 and prerequisite
for refs #28896.

Backport of 10bfa876be59feec24bb6a40fa11bece808ee405 from master

comment:10 Changed 4 years ago by Jon Dufresne

This change caused a regression for my project. Here is a simplified example:

class Group(models.Model):
    pass

class User(models.Model):
    groups = models.ManyToManyField('Group')

class MyTest(TestCase):
    def test_unsaved(self):
        Group.objects.create()
        u = User()
        self.assertFalse(Group.objects.filter(user=u).exists())

This test passes on 1.11 but fails on 2.0.

On 1.11, passing an unsaved model to a M2M filter would result in an INNER JOIN. A bisect shows that since commit a5c60404476124c682c996bfb1eb077d59f1ec53, it now results in a LEFT OUTER JOIN.

At this stage, only looking for clarification on what is correct to see if this change in behavior is intentional. Is this now correct as a LEFT JOIN or should this be considered a regression for this specific use?

comment:11 Changed 4 years ago by Simon Charette

Cc: Simon Charette added

John, I feel like using a LOUTER join is the appropriate behavior here.

I assume the previous query was

SELECT *
FROM group
INNER JOIN user_groups ON (user_groups.group_id = group.id)
WHERE user_groups.user_id = NULL

And the new one is

SELECT *
FROM group
LEFT OUTER JOIN user_groups ON (user_groups.group_id = group.id)
WHERE user_groups.user_id IS NULL

Using m2m__isnull=True (or reverse_fk__isnull=True) always triggered this LOUTER AFAIK. I guess what broke here was to make m2m=None -> m2m__isnull=True which seems more appropriate given how SQL deals with = NULL.

I guess we could get related and reverse related field __exact lookups to warn/error out when an unsaved object is passed as a value though, it looks like this was undefined behavior if it wasn't caught by the test suite. An other option would be to make the related __exact=unsaved_object lookup raise EmptyResultSet on compilation to prevent any query from taking place and restore the 1.11 behavior.

comment:12 Changed 4 years ago by Simon Charette

Here's a PR implementing the EmptyResultSet approach that could be used in a 2.0 backport.

https://github.com/django/django/compare/master...charettes:ticket-27985

I really think we should deprecating this path in 2.1+ though given the ambiguity of the lookup when instance._state.adding, I feel like this would be coherent with the checks preventing instance.m2m.add(unsaved_instance).

Last edited 4 years ago by Simon Charette (previous) (diff)

comment:13 Changed 4 years ago by Jon Dufresne

Thanks Simon.

I assume the previous query was ...

Yes, that is right. And looking at this again, now, I think the = in = NULL was certainly wrong or unintended. So it would seem I was relying on undefined behavior and got lucky.

Here's a PR implementing the EmptyResultSet approach that could be used in a 2.0 backport. ... I really think we should deprecating this path in 2.1+ though

Please don't feel the need to add a workaround or hack for my project. Now that it is identified, I'm happy to fix my buggy code to rely only on defined behavior.

Or are you suggesting that in a future version of Django, RelatedExact raise an exception when used with an unsaved object?

comment:14 Changed 4 years ago by Simon Charette

Please don't feel the need to add a workaround or hack for my project. Now that it is identified, I'm happy to fix my buggy code to rely only on defined behavior.

Or are you suggesting that in a future version of Django, RelatedExact raise an exception when used with an unsaved object?

While it was undefined I'm kind of shocked we didn't have any tests for this. I'm not sure if you added this test on purpose in your suite but I could see it happening often in cases where users mix model factories with the ORM in tests.

I feel like preventing unsaved objects from being used in all related filters in future versions of Django would be the best course of action here. Surprisingly erroring on if value._state.adding in get_normalized_values yields quite a few test failures in the suite. Nothing that looks unsolvable though.

Note: See TracTickets for help on using tickets.
Back to Top