Opened 3 weeks ago
Last modified 7 days ago
#37140 assigned Bug
Document how to handle NULL when using filter()/exclude() with __in subqueries
| Reported by: | Sebastian Vera | Owned by: | SnippyCodes |
|---|---|---|---|
| Component: | Documentation | Version: | dev |
| Severity: | Normal | Keywords: | exclude in subquery NULL NOT IN |
| Cc: | Simon Charette, Joseano Sousa | Triage Stage: | Accepted |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
The in field lookup reference (docs/ref/models/querysets.txt) documents
passing a queryset on the right-hand side of __in, but does not mention
that exclude(field__in=qs) where qs.values(...) can return NULL produces
a silently-empty result set on Postgres (and any database following SQL's
three-valued logic).
This is the subquery counterpart of #20024 (literal-list flavor, fixed in
cec10f99). The subquery flavor was identified by Anssi Kääriäinen and Simon
Riggs on django-developers in 2013 [1], but the ORM-side fix never landed,
and there is currently no warning in the docs.
We hit this in production on a real invoice queryset and tracked it down to
the documented NOT IN / NULL semantics. The workaround we shipped is the
same one Simon Riggs recommended in the 2013 thread: rewrite to
~Exists(OuterRef(...)), which compiles to NULL-safe NOT EXISTS.
Proposal: add a .. warning:: block to the in field lookup reference that
- describes the failure mode (NULL in subquery -> NOT IN evaluates to UNKNOWN -> empty queryset)
- points readers to
Exists+OuterRefas the NULL-safe replacement - cross-references #20024 for the literal-list case
Plus a one-line cross-reference under the exclude() method docs pointing
to the new warning.
[1] https://groups.google.com/g/django-developers/c/OG1unUV-MOU
Change History (8)
comment:2 by , 3 weeks ago
| Cc: | added |
|---|---|
| Component: | Documentation → Database layer (models, ORM) |
| Has patch: | unset |
| Summary: | Document NULL handling of exclude() with __in subqueries → NULL handling of exclude() with __in varies between nullable expressions versus literal None |
| Type: | Cleanup/optimization → Bug |
Thanks for the suggestion. It's rare for us to document bugs and workarounds, but we have done it in some cases, and it might make sense here.
To decide that, I want to get a handle first on whether we even plan to fix this.
ticket:31667#comment:7 (2020) mentioned the fact that filter(field__in=list(nullable_subquery)) started behaving differently from filter(field__in=nullable_subquery) as a reason to revert that change, a decision we never took.
ticket:20024#comment:14 (2020) suggests we can live with that drift after all.
ticket:32043#comment:3 (2020) suggested we wouldn't even try to fix this for nullable expressions, due to difficulty, but in a comment around the same time (ticket:20024#comment:9) Simon provided some hints. I'm not sure if Simon gave that a shot at some point and decided it was not practical.
Given all that, I'd like to triage the bug aspect first before accepting a docs fix.
I checked the WIP for a nullable expression flag (#24267), and it doesn't currently handle this. In it, I see this explanation:
class Query(... def is_nullable(self, nullable_aliases): # A subquery can always return no rows and thus be nullable but the ORM # currently has some expectations with regards to IN vs ANY lookups that # must be revisited to allow this logic to work properly. return False
comment:3 by , 9 days ago
Replacing __in to perform EXISTS query is likely never going to happen for the reasons discussed in the thread so we'd likely need to introduce an __exists lookup instead to avoid breaking backward compatibility so knowing that I wouldn't be opposed to documenting this pitfall even if I suspect folks would only stumble upon the edge case after the fact.
As for what should be documented as a workaround having the subquery explicitly exclude NULL (e.g. filter(inner_field__isnull=False)) is likely going to be closer to what the ORM would normally generate and I would avoid tying the problem to exclude as the same behaviour exists for filter as well as demonstrated in this fiddle.
As alluded to ticket:20024#comment:14 fixing this for all nullable expressions is quite hard to do the right way. Even if we reduced the problem to only queryset like right-hand-side we'd still have to find the column being matched against and alter the query to do the equivalent of filter(inner_field__isnull=False) without causing side effects (e.g. what if the field is an aggregation or window function?).
comment:4 by , 8 days ago
| Component: | Database layer (models, ORM) → Documentation |
|---|---|
| Summary: | NULL handling of exclude() with __in varies between nullable expressions versus literal None → Document how to handle NULL when using filter()/exclude() with __in subqueries |
| Triage Stage: | Unreviewed → Accepted |
Sebastian, if you would like to resubmit a documentation patch following the above feedback, you are welcome to.
comment:5 by , 8 days ago
| Cc: | added |
|---|---|
| Has patch: | set |
| Needs documentation: | set |
| Needs tests: | set |
| Owner: | set to |
| Status: | new → assigned |
comment:6 by , 8 days ago
| Has patch: | unset |
|---|---|
| Needs documentation: | unset |
| Needs tests: | unset |
| Owner: | removed |
| Status: | assigned → new |
Claude patch wasn't close enough to iterate on.
comment:7 by , 7 days ago
I'd like to take this over! I will update the in field lookup documentation to warn about the NULL subquery behavior for both filter() and exclude(). Following Simon's advice, I will document filter(inner_fieldisnull=False) as the recommended workaround.
comment:8 by , 7 days ago
| Has patch: | set |
|---|---|
| Owner: | set to |
| Status: | new → assigned |
I have a patch ready:
https://github.com/django/django/pull/21414