Opened 2 hours ago
Last modified 72 minutes ago
#37140 new Bug
NULL handling of exclude() with __in varies between nullable expressions versus literal None
| Reported by: | Sebastian Vera | Owned by: | |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | dev |
| Severity: | Normal | Keywords: | exclude in subquery NULL NOT IN |
| Cc: | Simon Charette | Triage Stage: | Unreviewed |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
The in field lookup reference (docs/ref/models/querysets.txt) documents
passing a queryset on the right-hand side of __in, but does not mention
that exclude(field__in=qs) where qs.values(...) can return NULL produces
a silently-empty result set on Postgres (and any database following SQL's
three-valued logic).
This is the subquery counterpart of #20024 (literal-list flavor, fixed in
cec10f99). The subquery flavor was identified by Anssi Kääriäinen and Simon
Riggs on django-developers in 2013 [1], but the ORM-side fix never landed,
and there is currently no warning in the docs.
We hit this in production on a real invoice queryset and tracked it down to
the documented NOT IN / NULL semantics. The workaround we shipped is the
same one Simon Riggs recommended in the 2013 thread: rewrite to
~Exists(OuterRef(...)), which compiles to NULL-safe NOT EXISTS.
Proposal: add a .. warning:: block to the in field lookup reference that
- describes the failure mode (NULL in subquery -> NOT IN evaluates to UNKNOWN -> empty queryset)
- points readers to
Exists+OuterRefas the NULL-safe replacement - cross-references #20024 for the literal-list case
Plus a one-line cross-reference under the exclude() method docs pointing
to the new warning.
[1] https://groups.google.com/g/django-developers/c/OG1unUV-MOU
Change History (2)
comment:2 by , 72 minutes ago
| Cc: | added |
|---|---|
| Component: | Documentation → Database layer (models, ORM) |
| Has patch: | unset |
| Summary: | Document NULL handling of exclude() with __in subqueries → NULL handling of exclude() with __in varies between nullable expressions versus literal None |
| Type: | Cleanup/optimization → Bug |
Thanks for the suggestion. It's rare for us to document bugs and workarounds, but we have done it in some cases, and it might make sense here.
To decide that, I want to get a handle first on whether we even plan to fix this.
ticket:31667#comment:7 (2020) mentioned the fact that filter(field__in=list(nullable_subquery)) started behaving differently from filter(field__in=nullable_subquery) as a reason to revert that change, a decision we never took.
ticket:20024#comment:14 (2020) suggests we can live with that drift after all.
ticket:32043#comment:3 (2020) suggested we wouldn't even try to fix this for nullable expressions, due to difficulty, but in a comment around the same time (ticket:20024#comment:9) Simon provided some hints. I'm not sure if Simon gave that a shot at some point and decided it was not practical.
Given all that, I'd like to triage the bug aspect first before accepting a docs fix.
I checked the WIP for a nullable expression flag (#24267), and it doesn't currently handle this. In it, I see this explanation:
class Query(... def is_nullable(self, nullable_aliases): # A subquery can always return no rows and thus be nullable but the ORM # currently has some expectations with regards to IN vs ANY lookups that # must be revisited to allow this logic to work properly. return False
I have a patch ready:
https://github.com/django/django/pull/21414