Opened 2 years ago

Closed 2 years ago

Last modified 2 years ago

#33835 closed New feature (wontfix)

Select_related().only() in the Prefetch() should automatically add primary keys for reverse relations.

Reported by: Ipakeev Owned by: nobody
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Simon Charette Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

I'm trying to optimize graphQL queries. When I use Prefetch with select_related and only optimizations, I encounter an increase in database queries. This is because I'm prefetching Book via InverseManyToOne relation and It needs a publisher_id in the only method.

class Author(models.Model):
    name = models.CharField(max_length=32)


class Book(models.Model):
    title = models.CharField(max_length=32)
    author = models.ForeignKey("Author", on_delete=models.CASCADE)
    publisher = models.ForeignKey(
        "Publisher",
        on_delete=models.CASCADE,
        related_name="books",
    )


class Publisher(models.Model):
    address = models.CharField(max_length=32)


class AuthorType(DjangoObjectType):
    class Meta:
        model = Author


class BookType(DjangoObjectType):
    class Meta:
        model = Book


class PublisherType(DjangoObjectType):
    class Meta:
        model = Publisher

Fill in the database:

        with transaction.atomic():
            authors = Author.objects.bulk_create([
                Author(name=f"Name{i}") for i in range(10)
            ])
            publishers = Publisher.objects.bulk_create([
                Publisher(address=f"Address{i}") for i in range(10)
            ])
            Book.objects.bulk_create([
                Book(
                    title=f"Title{i}",
                    author=random.choice(authors),
                    publisher=random.choice(publishers),
                ) for i in range(20)
            ])

GraphQL query:

{
  publishers {
    address
    books {
      title
      author {
        name
      }
    }
  }
}

22 db queries:

class Query(graphene.ObjectType):
    publishers = graphene.List(PublisherType)

    def resolve_publishers(self, info):
        queryset = Book.objects.select_related("author").only("title", "author__name")
        return Publisher.objects.prefetch_related(Prefetch("books", queryset)).only("address")

2 db queries:

class Query(graphene.ObjectType):
    publishers = graphene.List(PublisherType)

    def resolve_publishers(self, info):
        queryset = Book.objects.select_related("author").only("title", "author__name", "publisher_id")
        return Publisher.objects.prefetch_related(Prefetch("books", queryset)).only("address")

I fixed function prefetch_related_objects at django/db/models/query.py and now I don't need publisher_id in only method:

...
prefetcher, descriptor, attr_found, is_fetched = get_prefetcher(
    first_obj, through_attr, to_attr
)

from django.db.models.fields.related_descriptors import ReverseManyToOneDescriptor
if type(descriptor) is ReverseManyToOneDescriptor:
    lookup.queryset.query.deferred_loading[0].add(descriptor.field.attname)

if not attr_found:
    raise AttributeError(
...

Attachments (2)

PR1.PNG (32.6 KB ) - added by Ipakeev 2 years ago.
PR2.PNG (32.6 KB ) - added by Ipakeev 2 years ago.

Download all attachments as: .zip

Change History (4)

by Ipakeev, 2 years ago

Attachment: PR1.PNG added

by Ipakeev, 2 years ago

Attachment: PR2.PNG added

comment:1 by Mariusz Felisiak, 2 years ago

Cc: Simon Charette added
Resolution: wontfix
Status: newclosed
Summary: Select_related().only() in the Prefetch class increases the number of queries for the InverseManyToOne relation.Select_related().only() in the Prefetch() should automatically add primary keys for reverse relations.
Type: BugNew feature

Thanks for this proposition, however, prefetch_related_objects() shouldn't implicitly add any columns, this could be unexpected and misleading. Users who use only()/defer() are fully responsible for providing a proper set of columns, see a note in docs:

"The defer() method (and its cousin, only(), below) are only for advanced use-cases. They provide an optimization for when you have analyzed your queries closely and understand exactly what information you need and have measured that the difference between returning the fields you need and the full set of fields for the model will be significant."

comment:2 by Simon Charette, 2 years ago

I agree with Mariusz's conclusion. If you're looking for a solution to catch these problems early through in development on in your test suite without peppering with it assertNumQueries I suggest you look into third-party solutions that will emit warnings when such problems occur.

In your particular case you would have gotten a warning along the lines of

UnsealedAttributeAccess: Attempt to fetch deferred field "publisher_id" on sealed <Book instance>

Which can be elevated to an error during development or tests via warnings.filterwarnings('error', category=UnsealedAttributeAccess)

This origin of this package was the exact same as yours; we needed a tool to allow the efficient creation of Queryset for a GraphQL querying endpoint and wanted to catch N+1 query problems due to missing select_related, prefetch_related or too eager usage of field deferral.

Note: See TracTickets for help on using tickets.
Back to Top