Opened 2 years ago

Closed 2 years ago

#34393 closed Uncategorized (invalid)

A filter query returns more items than the original queryset provides after applying INNER JOIN

Reported by: Ľuboš Mjachky Owned by: nobody
Component: Database layer (models, ORM) Version: 3.2
Severity: Normal Keywords: filter query duplicate distinct
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no
Pull Requests:How to create a pull request

Description

In our project, we identified that the filter query returns more entries than the number of entries stored in the initial queryset.

The following piece of code is involved:

# qs.count() == 4
scoped_repos = repo_viewset.get_queryset().values_list("pk", flat=True)
filtered_content = qs.filter(repositories__in=scoped_repos)
# filtered_content.count() == 8

The generated query:

SELECT * FROM "rpm_package" INNER JOIN "core_content" ON ("rpm_package"."content_ptr_id" = "core_content"."pulp_id") 
INNER JOIN "core_repositorycontent" ON ("core_content"."pulp_id" = "core_repositorycontent"."content_id") 
WHERE "core_repositorycontent"."repository_id" IN (c35b7039-2c2c-48e3-8f4f-b0eeabad8af1, ee39a78b-9dd5-4bdf-85d9-eb6406b6ef49)

One of the things being noticed is that the query is constructed with an INNER JOIN clause instead of a LEFT JOIN clause. The core_repositorycontent table contains a lot of duplicates. We believe that this should not be a problem. Adding the distinct() query at the end of the call resolves the issue. See https://github.com/pulp/pulpcore/pull/3642.

The question is whether this is a bug in Django (i.e., a filter query can return more elements than there are in the original queryset) or on our side, and we should restructure the query in a specific way. Any advice is welcome.

Change History (1)

comment:1 by Mariusz Felisiak, 2 years ago

Resolution: invalid
Status: newclosed

This is a documented behavior. You can use Subquery() to prevent duplicates.

Note: See TracTickets for help on using tickets.
Back to Top