Opened 13 years ago
Last modified 3 days ago
#20024 assigned Bug
QuerySet.exclude() does not work with lists containing a 'None' element.
| Reported by: | Owned by: | Eddy ADEGNANDJOU | |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | dev |
| Severity: | Normal | Keywords: | |
| Cc: | Carsten Fuchs, Simon Charette | Triage Stage: | Accepted |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | yes | Patch needs improvement: | yes |
| Easy pickings: | no | UI/UX: | no |
Description (last modified by )
For example:
Entry.objects.exclude(foo__in=[None, 1])
It is supposed to return all items whose foo field is not None or 1, but it actually returns an empty query set.
Change History (28)
comment:1 by , 13 years ago
| Triage Stage: | Unreviewed → Accepted |
|---|---|
| Type: | Uncategorized → Bug |
comment:2 by , 13 years ago
Let's not be too smart... NULL in SQL is nowhere near as consistent as None in Python because of SQL's tri-valued boolean logic; we cannot hide this.
comment:3 by , 13 years ago
| Description: | modified (diff) |
|---|
comment:5 by , 10 years ago
| Summary: | 'exclude' does not work with lists containing a 'None' element. → QuerySet.exclude() does not work with lists containing a 'None' element. |
|---|
comment:6 by , 7 years ago
| Cc: | added |
|---|
comment:7 by , 5 years ago
I think this should be fixed, this a seriously surprising flaw in the ORM. I've proposed several possible fixes for this issue in #31883, I'll try to summarize them here.
The simplest would be to just ignore Nones in the list. This would be consistent with how filter currently works. When dealing with querysets (__in=qs.values('col')), it should be enough to add WHERE col IS NOT NULL in the SELECT subquery.
A better solution, but a more breaking change, would be to add a proper IS [NOT] NULL check, in both exclude and filter cases, if None is in the list. This would allow writing filter(col__in=[something, None]) instead of filter(Q(col=something) | Q(col=None)). Some additional checks for inner querysets would also be required, so that the behaviour is consistent, or the queryset could be just converted to a list beforehand.
comment:8 by , 5 years ago
| Owner: | changed from to |
|---|---|
| Status: | new → assigned |
| Version: | 1.4 → master |
I wanted to fix this, and I've found that it has already been partially fixed in #31667. However, there are no tests for exclude, and the behaviour is now inconsistent, since exclude(col__in=queryset.values('col2')) still gives no results when col2 contains NULL. I'll try to provide a patch to clean this up.
comment:9 by , 5 years ago
| Cc: | added |
|---|
Thanks for surfacing this inconsistency Adam.
It looks like adding an else clause to deal with a query right hand side value that has explicitly selected fields that can be null (primary keys can't)
The RelatedIn lookup will also need to be adjusted
comment:10 by , 5 years ago
| Has patch: | set |
|---|
I've created a patch. No PR yet, though, because one test fails: WindowFunctionTests.test_window_expression_within_subquery. SQL doesn't support filtering by window functions, so I guess the code should be corrected to use another subquery (or maybe an EXISTS query). Unfortunately I don't know how to write this without using QuerySet utilities.
As for RelatedIn, the things are weird here. It seems this case is handled differently and already works correctly, and worked even before #31667 (test_ticket20024_related_in). It was probably fixed in #13815 long ago. There is IS NOT NULL in the subquery, actually there are two next to each other. One is coming from split_exclude() (called in build_filter()), but the other one -- I couldn't find. Also, what looks like another thing to take a closer look at -- there is NOT IN (None) instead of NOT IN (NULL) in the SQL for the first query in that test, when the rhs.discard(None) line is removed.
Edit: I've moved the code to build_filter(), it fits better. New patch here.
comment:11 by , 5 years ago
| Owner: | removed |
|---|---|
| Status: | assigned → new |
It would be best to make the behaviour of In lookup consistent with RelatedIn. Sadly, I wasn't able to understand how the latter works. So my patch currently crashes with window functions, and it also crashes with OuterRef due to #31714. As much as I'd like to finish this, I don't have enough time and knowledge to look into it further. If anyone comes up with a solution, feel free to claim this issue.
comment:12 by , 5 years ago
| Has patch: | unset |
|---|
comment:13 by , 5 years ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
comment:14 by , 5 years ago
| Owner: | removed |
|---|---|
| Status: | assigned → new |
I fear I won't be able to work on this issue in the near future so I'm deassigning myself.
Now that we filter out literal None values from __in=[None] lookup (#31667) we do have a mismatch with __in=nullable_expression but at least the behaviour is consistent with the nullable_field=F('nullable_field') and nullable_field=None asymmetry (#32043).
comment:15 by , 3 months ago
| Description: | modified (diff) |
|---|---|
| Owner: | set to |
| Status: | new → assigned |
I’ve been reading up on SQL’s tri-valued logic (TRUE / FALSE / UNKNOWN), which explains why SQL needs IS NULL and IS NOT NULL for comparisons involving NULL. I knew that when writing queries involving blank values one is supposed to use IS NULL, now I understand why. Neat.
As of ticket #31667, filter(fieldin=[None]) now silently ignores None, which means some of the earlier concerns about fixing this no longer apply.
I believe this can be resolved cleanly by checking if None is present in the rhs, and if so, splitting the clause into an IN condition and an IS NULL condition joined by OR. That should align exclude(fieldin=[None, X]) with user expectations and prevent SQL NULL logic from leaking into the ORM unnecessarily.
Assigning to myself and will begin a patch.
comment:16 by , 3 months ago
| Has patch: | set |
|---|
comment:17 by , 3 months ago
| Has patch: | unset |
|---|
comment:18 by , 3 months ago
| Has patch: | set |
|---|
comment:19 by , 2 months ago
| Needs tests: | set |
|---|---|
| Patch needs improvement: | set |
Composite primary key tests still need adjustments.
comment:20 by , 2 months ago
| Needs tests: | unset |
|---|---|
| Patch needs improvement: | unset |
comment:21 by , 7 weeks ago
| Has patch: | unset |
|---|---|
| Owner: | removed |
| Status: | assigned → new |
comment:22 by , 7 weeks ago
| Has patch: | set |
|---|
comment:23 by , 6 weeks ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
Hello Jason
I see you have closed a few of your PRs but left this one open, so I will keep this assigned to you in case this update was a mistake
If you do not wish to work on the ticket, you can unassign yourself and close the PR and I will mark "Patch needs improvement" as Yes to indicate the ticket needs a new owner before being put back in the review queue
I appreciate that you might have gotten tired of waiting for a review and that might be why you've closed some PRs. It's been a busy period with the 6.0 feature freeze, various security reports and conferences - this means it takes a really long time for us to get round to all PRs and I'm sorry if that's been your experience.
comment:24 by , 6 weeks ago
| Has patch: | unset |
|---|---|
| Owner: | removed |
| Status: | assigned → new |
comment:25 by , 5 weeks ago
| Owner: | set to |
|---|---|
| Status: | new → assigned |
comment:26 by , 2 weeks ago
Considering this model:
class Entry(models.Model): foo = models.IntegerField(null=True, blank=True) def __str__(self): return f"Entry {self.id} with foo={self.foo}"
Here is the current behavior of exclude with the __in lookup and a list containing None:
>>> Entry.objects.bulk_create([Entry(foo=1), Entry(foo=2), Entry(foo=None)]) [<Entry: Entry 1 with foo=1>, <Entry: Entry 2 with foo=2>, <Entry: Entry 3 with foo=None>] >>> Entry.objects.exclude(foo__in=[None]) <QuerySet [<Entry: Entry 1 with foo=1>, <Entry: Entry 2 with foo=2>, <Entry: Entry 3 with foo=None>]> >>> str(Entry.objects.exclude(foo__in=[None]).query) 'SELECT "core_entry"."id", "core_entry"."foo" FROM "core_entry"' >>> Entry.objects.exclude(foo__in=[None, 1]) <QuerySet [<Entry: Entry 2 with foo=2>, <Entry: Entry 3 with foo=None>]> >>> str(Entry.objects.exclude(foo__in=[None, 1]).query) 'SELECT "core_entry"."id", "core_entry"."foo" FROM "core_entry" WHERE NOT ("core_entry"."foo" IN (1) AND "core_entry"."foo" IS NOT NULL)'
Let's analyse the previous SQL when foo=None (NULL in SQL)
"core_entry"."foo" IN (1) ---> NULL IN (1) ---> UNKNOWN "core_entry"."foo" IS NOT NULL ---> NULL IS NOT NULL ---> FALSE ("core_entry"."foo" IN (1) AND "core_entry"."foo" IS NOT NULL) ---> (UNKNOWN AND FALSE) ---> FALSE NOT ("core_entry"."foo" IN (1) AND "core_entry"."foo" IS NOT NULL) ---> NOT FALSE ---> TRUE
For foo=None, the SQL condition evaluates to TRUE. So Entry(foo=None) is also returned by Entry.objects.exclude(foo__in=[None, 1])
This is not specific to Django. All frameworks or ORMs that generate IN / NOT IN SQL queries can face this issue, because the root cause is SQL’s three-valued logic. Most ORMs in other languages do not automatically “fix” this. Automatic fixing could introduce inconsistencies and subtle bugs.
Should we still try to find a Django-specific solution to this, or should we promote explicit queries using Q-objects for example ?
>>> from django.db.models import Q >>> Entry.objects.exclude(Q(foo=None) | Q(foo=1)) <QuerySet [<Entry: Entry 2 with foo=2>]> >>> str(Entry.objects.exclude(Q(foo=None) | Q(foo=1)).query) 'SELECT "core_entry"."id", "core_entry"."foo" FROM "core_entry" WHERE NOT (("core_entry"."foo" IS NULL OR ("core_entry"."foo" = 1 AND "core_entry"."foo" IS NOT NULL)))'
comment:27 by , 3 days ago
| Has patch: | set |
|---|
comment:28 by , 3 days ago
| Needs tests: | set |
|---|---|
| Patch needs improvement: | set |
This is a known failure (tested in queries/tests.py, test_col_not_in_list_containing_null()).
This is somewhat hard to fix as SQL's "somecol NOT IN lst_containing_null" is weird - it will always return False (well, technically UNKNOWN but this doesn't matter here), and this is what causes the bug.
It would be possible to check for presence of None in the given list. If present, remove it and add in a proper IS NULL check instead. But then a query like
.exclude(somecol__in=qs_containig_null)would work differently from.exclude(somecol__in=list(qs_containig_null)). I guess that could be classified as known abstraction leak.