Opened 4 years ago

Closed 4 years ago

#31883 closed Bug (duplicate)

exclude(pk__in=[None]) returns no objects

Reported by: Adam Sołtysik Owned by: nobody
Component: Database layer (models, ORM) Version: 3.1
Severity: Normal Keywords: exclude in nulls
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The problem: Model.objects.exclude(pk__in=[None]) always returns 0 objects.

This behaviour is very surprising (especially in more complicated cases) and is essentially useless, so it should be considered a bug. Model.objects.filter(pk__in=[None]) and Model.objects.filter(pk=None) also (correctly) return no objects, and Model.objects.exclude(pk=None) always (correctly) returns all objects.

The issue has already been raised on the internet, see e.g. https://stackoverflow.com/questions/47058497/django-queryset-exclude-why-are-all-excluded, but I haven't found a bug report in Django. I know this is caused by how NULLs are handled in SQL, but broken SQL is certainly not a reason for Django's ORM to be broken the same way.

It could be fixed by filtering out the Nones and splitting the query, so that the first example would be equivalent to Model.objects.exclude(pk__in=[]).exclude(pk=None), which does correctly return all objects. The same should be done in the filter case, so that Model.objects.filter(field__in=[something, None] would also find objects with field=None, which it currently ignores (this is quite unexpected as well, and one needs to use two Q objects to get around this). Alternatively, to avoid a breaking change, Nones could be just ignored in exclude too.

There is also a case with passing a queryset instead of a list, e.g. Model.objects.exclude(pk__in=Model2.objects.values('field')). Even if there are no NULLs and it works correctly, there is another problem: it is terribly slow when the tables are large (at least in PostgreSQL). Both problems could be solved by always using a JOIN instead of field [NOT] IN (SELECT ...), like if related_name were used: Model.objects.filter(model2=None). Or, if this is too complicated, the inner query could receive WHERE field IS NOT NULL and NULLs should be handled separately (or ignored). Or the queryset could be simply converted to a list right away (which would of course make things even slower, but at least it would be clear what really happens).

Change History (1)

comment:1 by Simon Charette, 4 years ago

Resolution: duplicate
Status: newclosed

Duplicate of #20024.

Note: See TracTickets for help on using tickets.
Back to Top