﻿id	summary	reporter	owner	description	type	status	component	version	severity	resolution	keywords	cc	stage	has_patch	needs_docs	needs_tests	needs_better_patch	easy	ui_ux
34393	A filter query returns more items than the original queryset provides after applying INNER JOIN	Ľuboš Mjachky	nobody	"In our project, we identified that the filter query returns more entries than the number of entries stored in the initial queryset.

The following piece of code is involved:
{{{
# qs.count() == 4
scoped_repos = repo_viewset.get_queryset().values_list(""pk"", flat=True)
filtered_content = qs.filter(repositories__in=scoped_repos)
# filtered_content.count() == 8
}}}

The generated query:
{{{
SELECT * FROM ""rpm_package"" INNER JOIN ""core_content"" ON (""rpm_package"".""content_ptr_id"" = ""core_content"".""pulp_id"") 
INNER JOIN ""core_repositorycontent"" ON (""core_content"".""pulp_id"" = ""core_repositorycontent"".""content_id"") 
WHERE ""core_repositorycontent"".""repository_id"" IN (c35b7039-2c2c-48e3-8f4f-b0eeabad8af1, ee39a78b-9dd5-4bdf-85d9-eb6406b6ef49)
}}}

One of the things being noticed is that the query is constructed with an INNER JOIN clause instead of a LEFT JOIN clause. The core_repositorycontent table contains a lot of duplicates. We believe that this should not be a problem. Adding the distinct() query at the end of the call resolves the issue. See https://github.com/pulp/pulpcore/pull/3642.

The question is whether this is a bug in Django (i.e., a filter query can return more elements than there are in the original queryset) or on our side, and we should restructure the query in a specific way. Any advice is welcome."	Uncategorized	closed	Database layer (models, ORM)	3.2	Normal	invalid	filter query duplicate distinct		Unreviewed	0	0	0	0	0	0
