id,summary,reporter,owner,description,type,status,component,version,severity,resolution,keywords,cc,stage,has_patch,needs_docs,needs_tests,needs_better_patch,easy,ui_ux 25544,prefetch_related sends duplicate ids to database,Julien Hartmann,Ian Foote,"That may be intended, but it seems dubious, so… Issue happens when prefetching a model relation from another one, linked by a M2M field. Sample case: {{{#!python class Picture(Model): # some fields class Collection(Model): pictures = ManyToManyField(Picture, blank=True, related_name='collections') # some fields class CollectionPermission(Model): content_object = ForeignKey(Collection, related_name='permissions') # fk to users and fk to permissions }}} When prefetching collection permissions from a picture queryset, like this: {{{#!python Picture.objects.prefetch_related('collections__permissions') }}} The third query (the second level of prefetching) looks like this: {{{#!sql SELECT ""gallery_collectionpermission"".""id"", ""gallery_collectionpermission"".""user_id"", ""gallery_collectionpermission"".""permission_id"", ""gallery_collectionpermission"".""content_object_id"" FROM ""gallery_collectionpermission"" WHERE ""gallery_collectionpermission"".""content_object_id"" IN ( 11, 11, 11, 11, 11, 11, 11, 11, 11, 11, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 9, 9, 9, 9, 9 ) }}} The issue arises in two steps: * As prefetch has no identity mapping, the first prefetching step will build a different Collection instance for every picture that uses it. * Then the second prefetching step will gather all instances, and dump their ids to the database without looking for duplicates. Ideally, this could be fixed at the first step: when a queryset prefetches the same row several times, it would make sense to return a single instance. In this example, all Pictures' cache should point to the same Gallery instances. The second step could also be fixed: the related manager could eliminate duplicates when building the query. For some reason the ReverseSingleRelatedObjectDescriptor does it, but none of the other descriptors or managers do. The cost seems huge. Typically, I have collections of hundreds of pictures, but each picture will only appear in 4 or 5 collections. The query thus sends the db server tens of thousands of ids instead of a few dozens. Is there something I missed? Or is this use case out of the normal scope of `prefetch_related`?",Cleanup/optimization,closed,"Database layer (models, ORM)",dev,Normal,fixed,prefetch duplicate,,Accepted,1,0,0,0,0,0