Context Navigation

← Previous Ticket
Next Ticket →

#20024 assigned Bug

QuerySet.exclude() does not work with lists containing a 'None' element.

Reported by:	stillwater.ke@…	Owned by:	Eddy ADEGNANDJOU
Component:	Database layer (models, ORM)	Version:	dev
Severity:	Normal	Keywords:
Cc:	Carsten Fuchs, Simon Charette	Triage Stage:	Accepted
Has patch:	yes	Needs documentation:	no
Needs tests:	no	Patch needs improvement:	yes
Easy pickings:	no	UI/UX:	no

Description (last modified by Jason Hall)

For example:

Entry.objects.exclude(foo__in=[None, 1])

It is supposed to return all items whose foo field is not None or 1, but it actually returns an empty query set.

Change History (32)

comment:1 by Anssi Kääriäinen, 13 years ago

Triage Stage:	Unreviewed → Accepted
Type:	Uncategorized → Bug

This is a known failure (tested in queries/tests.py, test_col_not_in_list_containing_null()).

This is somewhat hard to fix as SQL's "somecol NOT IN lst_containing_null" is weird - it will always return False (well, technically UNKNOWN but this doesn't matter here), and this is what causes the bug.

It would be possible to check for presence of None in the given list. If present, remove it and add in a proper IS NULL check instead. But then a query like .exclude(somecol__in=qs_containig_null) would work differently from .exclude(somecol__in=list(qs_containig_null)). I guess that could be classified as known abstraction leak.

comment:2 by Aymeric Augustin, 13 years ago

Let's not be too smart... NULL in SQL is nowhere near as consistent as None in Python because of SQL's tri-valued boolean logic; we cannot hide this.

comment:3 by Luke Plant, 13 years ago

Description:	modified (diff)

comment:4 by Tim Graham, 11 years ago

#13768 is a duplicate.

comment:5 by Tim Graham, 11 years ago

Summary:	'exclude' does not work with lists containing a 'None' element. → QuerySet.exclude() does not work with lists containing a 'None' element.

comment:6 by Carsten Fuchs, 7 years ago

Cc:	Carsten Fuchs added

comment:7 by Adam Sołtysik, 5 years ago

I think this should be fixed, this a seriously surprising flaw in the ORM. I've proposed several possible fixes for this issue in #31883, I'll try to summarize them here.

The simplest would be to just ignore Nones in the list. This would be consistent with how filter currently works. When dealing with querysets (__in=qs.values('col')), it should be enough to add WHERE col IS NOT NULL in the SELECT subquery.

A better solution, but a more breaking change, would be to add a proper IS [NOT] NULL check, in both exclude and filter cases, if None is in the list. This would allow writing filter(col__in=[something, None]) instead of filter(Q(col=something) | Q(col=None)). Some additional checks for inner querysets would also be required, so that the behaviour is consistent, or the queryset could be just converted to a list beforehand.

comment:8 by Adam Sołtysik, 5 years ago

Owner:	changed from nobody to Adam Sołtysik
Status:	new → assigned
Version:	1.4 → master

I wanted to fix this, and I've found that it has already been partially fixed in #31667. However, there are no tests for exclude, and the behaviour is now inconsistent, since exclude(col__in=queryset.values('col2')) still gives no results when col2 contains NULL. I'll try to provide a patch to clean this up.

comment:9 by Simon Charette, 5 years ago

Cc:	Simon Charette added

Thanks for surfacing this inconsistency Adam.

It looks like adding an else clause to deal with a query right hand side value that has explicitly selected fields that can be null (primary keys can't)

https://github.com/django/django/blob/632ccffc49658de5348c51a8918719364be54f37/django/db/models/lookups.py#L387-L389

The RelatedIn lookup will also need to be adjusted

https://github.com/django/django/blob/e74b3d724e5ddfef96d1d66bd1c58e7aae26fc85/django/db/models/fields/related_lookups.py#L98-L99

comment:10 by Adam Sołtysik, 5 years ago

Has patch:	set

I've created a patch. No PR yet, though, because one test fails: WindowFunctionTests.test_window_expression_within_subquery. SQL doesn't support filtering by window functions, so I guess the code should be corrected to use another subquery (or maybe an EXISTS query). Unfortunately I don't know how to write this without using QuerySet utilities.

As for RelatedIn, the things are weird here. It seems this case is handled differently and already works correctly, and worked even before #31667 (test_ticket20024_related_in). It was probably fixed in #13815 long ago. There is IS NOT NULL in the subquery, actually there are two next to each other. One is coming from split_exclude() (called in build_filter()), but the other one -- I couldn't find. Also, what looks like another thing to take a closer look at -- there is NOT IN (None) instead of NOT IN (NULL) in the SQL for the first query in that test, when the rhs.discard(None) line is removed.

Edit: I've moved the code to build_filter(), it fits better. New patch here.

Version 1, edited 5 years ago by Adam Sołtysik (previous) (next) (diff)

comment:11 by Adam Sołtysik, 5 years ago

Owner:	Adam Sołtysik removed
Status:	assigned → new

It would be best to make the behaviour of In lookup consistent with RelatedIn. Sadly, I wasn't able to understand how the latter works. So my patch currently crashes with window functions, and it also crashes with OuterRef due to #31714. As much as I'd like to finish this, I don't have enough time and knowledge to look into it further. If anyone comes up with a solution, feel free to claim this issue.

comment:12 by Mariusz Felisiak, 5 years ago

Has patch:	unset

comment:13 by Simon Charette, 5 years ago

Owner:	set to Simon Charette
Status:	new → assigned

comment:14 by Simon Charette, 5 years ago

Owner:	Simon Charette removed
Status:	assigned → new

I fear I won't be able to work on this issue in the near future so I'm deassigning myself.

Now that we filter out literal None values from __in=[None] lookup (#31667) we do have a mismatch with __in=nullable_expression but at least the behaviour is consistent with the nullable_field=F('nullable_field') and nullable_field=None asymmetry (#32043).

comment:15 by Jason Hall, 6 months ago

Description:	modified (diff)
Owner:	set to Jason Hall
Status:	new → assigned

I’ve been reading up on SQL’s tri-valued logic (TRUE / FALSE / UNKNOWN), which explains why SQL needs IS NULL and IS NOT NULL for comparisons involving NULL. I knew that when writing queries involving blank values one is supposed to use IS NULL, now I understand why. Neat.

As of ticket #31667, filter(fieldin=[None]) now silently ignores None, which means some of the earlier concerns about fixing this no longer apply.

I believe this can be resolved cleanly by checking if None is present in the rhs, and if so, splitting the clause into an IN condition and an IS NULL condition joined by OR. That should align exclude(fieldin=[None, X]) with user expectations and prevent SQL NULL logic from leaking into the ORM unnecessarily.

Assigning to myself and will begin a patch.

comment:16 by Jason Hall, 6 months ago

Has patch:	set

comment:17 by Jason Hall, 6 months ago

Has patch:	unset

comment:18 by Jason Hall, 6 months ago

Has patch:	set

comment:19 by Simon Charette, 5 months ago

Needs tests:	set
Patch needs improvement:	set

Composite primary key tests still need adjustments.

comment:20 by Jason Hall, 5 months ago

Needs tests:	unset
Patch needs improvement:	unset

comment:21 by Jason Hall, 5 months ago

Has patch:	unset
Owner:	Jason Hall removed
Status:	assigned → new

comment:22 by Simon Charette, 5 months ago

Has patch:	set

comment:23 by Sarah Boyce, 5 months ago

Owner:	set to Jason Hall
Status:	new → assigned

Hello Jason
I see you have closed a few of your PRs but left this one open, so I will keep this assigned to you in case this update was a mistake
If you do not wish to work on the ticket, you can unassign yourself and close the PR and I will mark "Patch needs improvement" as Yes to indicate the ticket needs a new owner before being put back in the review queue

I appreciate that you might have gotten tired of waiting for a review and that might be why you've closed some PRs. It's been a busy period with the 6.0 feature freeze, various security reports and conferences - this means it takes a really long time for us to get round to all PRs and I'm sorry if that's been your experience.

comment:24 by Jason Hall, 5 months ago

Has patch:	unset
Owner:	Jason Hall removed
Status:	assigned → new

comment:25 by Eddy ADEGNANDJOU, 4 months ago

Owner:	set to Eddy ADEGNANDJOU
Status:	new → assigned

comment:26 by Eddy ADEGNANDJOU, 3 months ago

Considering this model:

class Entry(models.Model):
    foo = models.IntegerField(null=True, blank=True)

    def __str__(self):
        return f"Entry {self.id} with foo={self.foo}"

Here is the current behavior of exclude with the __in lookup and a list containing None:

>>> Entry.objects.bulk_create([Entry(foo=1), Entry(foo=2), Entry(foo=None)])
[<Entry: Entry 1 with foo=1>, <Entry: Entry 2 with foo=2>, <Entry: Entry 3 with foo=None>]
>>> Entry.objects.exclude(foo__in=[None])
<QuerySet [<Entry: Entry 1 with foo=1>, <Entry: Entry 2 with foo=2>, <Entry: Entry 3 with foo=None>]>
>>> str(Entry.objects.exclude(foo__in=[None]).query)
'SELECT "core_entry"."id", "core_entry"."foo" FROM "core_entry"'
>>> Entry.objects.exclude(foo__in=[None, 1])
<QuerySet [<Entry: Entry 2 with foo=2>, <Entry: Entry 3 with foo=None>]>
>>> str(Entry.objects.exclude(foo__in=[None, 1]).query)
'SELECT "core_entry"."id", "core_entry"."foo" FROM "core_entry" WHERE NOT ("core_entry"."foo" IN (1) AND "core_entry"."foo" IS NOT NULL)'

Let's analyse the previous SQL when foo=None (NULL in SQL)

"core_entry"."foo" IN (1)  --->  NULL IN (1)  --->  UNKNOWN 
"core_entry"."foo" IS NOT NULL  --->  NULL IS NOT NULL  ---> FALSE  
("core_entry"."foo" IN (1) AND "core_entry"."foo" IS NOT NULL)  --->  (UNKNOWN AND FALSE)  --->  FALSE
NOT ("core_entry"."foo" IN (1) AND "core_entry"."foo" IS NOT NULL)  --->  NOT FALSE  --->  TRUE

For foo=None, the SQL condition evaluates to TRUE. So Entry(foo=None) is also returned by Entry.objects.exclude(foo__in=[None, 1])

This is not specific to Django. All frameworks or ORMs that generate IN / NOT IN SQL queries can face this issue, because the root cause is SQL’s three-valued logic. Most ORMs in other languages do not automatically “fix” this. Automatic fixing could introduce inconsistencies and subtle bugs.

Should we still try to find a Django-specific solution to this, or should we promote explicit queries using Q-objects for example ?

>>> from django.db.models import Q
>>> Entry.objects.exclude(Q(foo=None) | Q(foo=1))
<QuerySet [<Entry: Entry 2 with foo=2>]>
>>> str(Entry.objects.exclude(Q(foo=None) | Q(foo=1)).query)
'SELECT "core_entry"."id", "core_entry"."foo" FROM "core_entry" WHERE NOT (("core_entry"."foo" IS NULL OR ("core_entry"."foo" = 1 AND "core_entry"."foo" IS NOT NULL)))'

comment:27 by Eddy ADEGNANDJOU, 3 months ago

Has patch:	set

comment:28 by Jacob Walls, 3 months ago

Needs tests:	set
Patch needs improvement:	set

comment:29 by Jacob Walls, 3 months ago

Needs tests:	unset

comment:30 by Jacob Walls, 3 months ago

Patch needs improvement:	unset

comment:31 by Jacob Walls, 3 months ago

Triage Stage:	Accepted → Ready for checkin

comment:32 by Jacob Walls, 3 months ago

Patch needs improvement:	set
Triage Stage:	Ready for checkin → Accepted

Note: See TracTickets for help on using tickets.

Download in other formats:

Issues