Opened 4 years ago
Closed 4 years ago
#31883 closed Bug (duplicate)
exclude(pk__in=[None]) returns no objects
Reported by: | Adam Sołtysik | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 3.1 |
Severity: | Normal | Keywords: | exclude in nulls |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
The problem: Model.objects.exclude(pk__in=[None])
always returns 0 objects.
This behaviour is very surprising (especially in more complicated cases) and is essentially useless, so it should be considered a bug. Model.objects.filter(pk__in=[None])
and Model.objects.filter(pk=None)
also (correctly) return no objects, and Model.objects.exclude(pk=None)
always (correctly) returns all objects.
The issue has already been raised on the internet, see e.g. https://stackoverflow.com/questions/47058497/django-queryset-exclude-why-are-all-excluded, but I haven't found a bug report in Django. I know this is caused by how NULL
s are handled in SQL, but broken SQL is certainly not a reason for Django's ORM to be broken the same way.
It could be fixed by filtering out the None
s and splitting the query, so that the first example would be equivalent to Model.objects.exclude(pk__in=[]).exclude(pk=None)
, which does correctly return all objects. The same should be done in the filter
case, so that Model.objects.filter(field__in=[something, None]
would also find objects with field=None
, which it currently ignores (this is quite unexpected as well, and one needs to use two Q
objects to get around this). Alternatively, to avoid a breaking change, None
s could be just ignored in exclude
too.
There is also a case with passing a queryset instead of a list, e.g. Model.objects.exclude(pk__in=Model2.objects.values('field'))
. Even if there are no NULL
s and it works correctly, there is another problem: it is terribly slow when the tables are large (at least in PostgreSQL). Both problems could be solved by always using a JOIN
instead of field [NOT] IN (SELECT ...)
, like if related_name
were used: Model.objects.filter(model2=None)
. Or, if this is too complicated, the inner query could receive WHERE field IS NOT NULL
and NULL
s should be handled separately (or ignored). Or the queryset could be simply converted to a list right away (which would of course make things even slower, but at least it would be clear what really happens).
Duplicate of #20024.