Opened 18 years ago
Last modified 2 months ago
#2361 assigned Bug
QuerySet.filter(m2mfield__isnull=False) may return duplicates
Reported by: | Owned by: | Norbert Stüken | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | normal | Keywords: | |
Cc: | Fabio Sangiovanni | Triage Stage: | Accepted |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
My Django installation is SVN Revision: 3238
Filtering models by a ManyToMany field does not appear to work.
See Google groups thread http://groups.google.com/group/django-users/browse_thread/thread/d0d799a45cb92d35
I have two classes -- Blog and Submission:
class Blog( models.Model ): entries = models.ManyToManyField( Submission ) class Submission( models.Model ): [... whatever ]
I want to fetch a list of all Blog instances which have at least one
Submission , i.e. entries.count() > 0.
The suggested
Blog.objects.filter(entries__isnull = False)
(suggested by Malcolm Tredinnick) returns:
psycopg.ProgrammingError: FEHLER: Spalte m2m_models_blog__entries.entries existiert nicht SELECT "models_blog"."id","models_blog"."title","models_blog"."description","models_blog"."region_id","models_blog"."regionname","models_blog"."date_submitted","models_blog"."author_id","models_blog"."visible" FROM "models_blog" LEFT OUTER JOIN "models_blog_entries" AS "m2m_models_blog__entries" ON "models_blog"."id" = "m2m_models_blog__entries"."blog_id" WHERE ("m2m_models_blog__entries"."entries" IS NOT NULL)
m2m_models_blogentries is an alias for models_blog_entries -- I
*have* this table, but it has no column "entries". This is what it looks like:
# \d models_blog_entries Tabelle »public.models_blog_entries« Spalte | Typ | Attribute ---------------+---------+--------------------------------------------------------------------- id | integer | not null default nextval('public.models_blog_entries_id_seq'::text) blog_id | integer | not null submission_id | integer | not null Indexe: »models_blog_entries_pkey« PRIMARY KEY, btree (id) »models_blog_entries_blog_id_key« UNIQUE, btree (blog_id, submission_id) Fremdschlüssel-Constraints: »models_blog_entries_blog_id_fkey« FOREIGN KEY (blog_id) REFERENCES models_blog(id) »models_blog_entries_submission_id_fkey« FOREIGN KEY (submission_id) REFERENCES models_submission(id)
Also, if I do this:
Blog.objects.filter(entries__blog__id__isnull = False)
I get a list of Blogs which have entries, however I have duplicates in
the list, one for each Submission which is in a M2M relationship (i.e.
each Blog is shown n times if it has n entries).
Attachments (1)
Change History (24)
comment:1 by , 18 years ago
comment:2 by , 18 years ago
Hi Daniel,
in the meantime a lot of stuff in m2m fields has changed. Has this bug been resolved, or is it still there?
comment:3 by , 18 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
Closing. If anyone's having this problem still, please reopen.
comment:4 by , 8 years ago
Easy pickings: | unset |
---|---|
Resolution: | invalid |
Status: | closed → new |
UI/UX: | unset |
As far as I can tell I am having this issue too, reopened.
Unlike Daniel I can run Blog.objects.filter(entries__isnull = False)
but I get the same result Daniel described as such:
"I get a list of Blogs which have entries, however I have duplicates in
the list, one for each Submission which is in a M2M relationship (i.e.
each Blog is shown n times if it has n entries)."
I can lookup something like Blog.objects.filter(entries__name = 'blah')
just fine, but the isnull is returning the duplicate results and that's what I want to use...
comment:5 by , 8 years ago
Summary: | Filtering on count of ManyToManyField → QuerySet.filter(m2mfield__isnull=False) may return duplicates |
---|---|
Triage Stage: | Unreviewed → Accepted |
Type: | defect → Bug |
I'm not positive this can be fixed and shouldn't just be solved by using .distinct()
but if that's the case, maybe the documentation could be clarified.
I'm attaching a test which currently passes but demonstrates that duplicates are returned in the results.
by , 8 years ago
Attachment: | 2361-test.diff added |
---|
comment:6 by , 6 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:7 by , 4 years ago
Not relevant anymore.
Model.objects.filter(ManyToManyField__isnull=False)
In Django 3.1 works properly.
Tested here:
https://github.com/almazkun/django_ticket_2361
However, in Django 1.11 shows duplicates as described above.
comment:8 by , 4 years ago
Resolution: | → fixed |
---|---|
Status: | assigned → closed |
comment:9 by , 4 years ago
Resolution: | fixed |
---|---|
Status: | closed → new |
Almaz, thanks for checking this ticket, however it still doesn't work for me. Have you checked test attached by Tim? It returns duplicates on master:
>>> Item.objects.filter(tags__isnull=False) [<Item: four>, <Item: one>, <Item: one>, <Item: two>, <Item: two>]
comment:10 by , 2 years ago
Hit the same bug in 3.2.13 (and was *very* surprised!). Adding .distinct() helps, but it feels like a workaround.
comment:11 by , 2 years ago
Owner: | removed |
---|---|
Status: | new → assigned |
comment:12 by , 2 years ago
Status: | assigned → new |
---|
comment:13 by , 2 years ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:14 by , 2 years ago
I was able to reproduce the problem in Django 4.2.dev20220923141503 and have adapted and extended Tim 's test a bit:
def test_ticket_2361(self): """ Prove that QuerySet.filter(m2mfield__isnull=False) may return duplicates. """ # Tags without items are returned, but several times if they link to multiple tags. self.assertQuerysetEqual( Item.objects.filter(tags__isnull=False), ['<Item: four>', '<Item: one>', '<Item: one>', '<Item: two>', '<Item: two>'], transform=repr ) # Adding distinct helps, but feels like a workaround self.assertQuerysetEqual( Item.objects.filter(tags__isnull=False).distinct(), ['<Item: four>', '<Item: one>', '<Item: two>'], transform=repr )
To finally close the ticket after 16 years, it needs a decision from the Django team with the following options:
- We change the ORM in such a way that it automatically executes a
distinct()
in the described edge case. This would be a breaking change. - We don't change anything in the Django code, but add a note to the Django documentation that in this particular case the unexpected behavior may occur and can be fixed with a
distinct()
. - Another solution is found.
comment:15 by , 2 years ago
Hi Norbert. Thanks for looking at this!
An option may be to document the current situation whilst also waiting for the other solution, so 2 and 3.
Would you like to prepare a suggestion for 2?
comment:16 by , 2 years ago
Hi Carlton,
it took me a while to work through the contributor documentation, but here are my suggested changes:
https://github.com/django/django/compare/main...stueken:django:ticket_2361
The first commit proves the mentioned behavior in Django, the second adds a note with an example to the documentation.
Could you have a look at it?
comment:17 by , 2 years ago
The current behavior is already documented in "Spanning multi-valued relationship" section. I think there is no need to document the same thing twice 🤔.
comment:18 by , 2 years ago
Hi Mariusz,
both examples address potential duplicates when using m2m relationships. However, the already documented example at the end of the "Spanning multi-valued relationship" section states that the duplicates are yielded since multiple filters are chained and therefore resulting in multiple joins compared to using only one filter.
My example does not address multiple filters, but shows the quite unexpected behavior of m2m relations using just one filter which is the reason this ticket was created.
I can't follow this behavior out of the example in "Spanning multi-valued relationships".
As stated, in my opinion, both examples show different screnarios of how m2m relationships can yield duplicates. Maybe linking both examples to each other and explaining their difference would make it clearer.
comment:19 by , 2 years ago
FWIW I also didn't read the "Spanning multi-valued relationships" filter(A).filter(B)
vs filter(A, B)
example as entailing the filter(m2m__isnull)
example here.
(I'd have to sit down with a piece of paper to see why that's equivalent. 🙂)
Maybe expanding that section though, so it's on topic (by section title at least)?
comment:20 by , 2 years ago
Thanks for the quick replies!
I like your example namings, very clear (could be used for example namings):
- Example 1:
filter(A).filter(B) vs filter(A, B)
- Example 2:
filter(m2m__isnull)
I think, both examples are placed correctly under the respective headings and would possibly be missed if put below the other section:
- Making Queries > Retrieving objects > Lookups that span relationships > Spanning multi-valued relationships --> Note with example 1 for possible duplicates when using multiple filters
- Making Queries > Related objects > Many-to-many relationships --> Note with example 2 for possible duplicates when using
isnull
in a filter
So if the examples address different causes for duplicate entries in m2m relationships, I would prefer to link the examples to each other.
comment:21 by , 13 months ago
Cc: | added |
---|
comment:22 by , 2 months ago
I think this is a well-known Django "gotcha", distinct()
has a big performance impact so I would definitely advise against it.
comment:23 by , 2 months ago
FWIW the most straightforward way to work around this issue today is to use Exists
instead
Blog.objects.filter( Exists(Entry.objects.filter(blog=OuterRef("pk"))) )
Unfortunately the framework doesn't allow you to register transforms on related fields lookups otherwise this could be as simple as
Blog.objects.filter(entries__exists=True)
Here's a very stale and early attempt at playing this concept if anyone is interested in trying.
My thoughts at the time was that allowing such transforms could greatly reduce a lot of the boilerplate associated with multi-valued relationships and filtering in hope to eventually resolve #2361.
For example
Blog.objects.filter(entries__exists=Q(published=True)) # Instead of Blog.objects.filter( Exists(Entry.objects.filter(blog=OuterRef("pk"), published=True))) ) Blog.objects.filter(entries__count__gte=10) # Instead of Blog.objects.annotate( entries__count=Count("entries") ).filter(entries__count__gte=10)
Can you please attach a model file that fails. The example fragment you have posted works correctly for me (with three different databases, so it's not backend dependent). Without a failing example this will not be easy to fix.
A couple of side notes about things not relevant to the problem at hand: You seem to have called your applcation "models". This is certainly going to cause problems because doing anything like
from models import ...
inside the models application is ambiguous (you will have a models.py file in there as well). Also, the multiple results you are seeing is completely unrelated and correct behaviour: there are multiple results in the database so we return them all. That is why distinct() exists for QuerySets.