Opened 6 years ago

Last modified 6 years ago

#29244 closed Cleanup/optimization

Passing dict to a Func subclass fails silently on hash step — at Version 2

Reported by: Rob Jauquet
Owned by: nobody
Component: Core (Other)
Version: 2.0
Severity: Normal
Keywords:
Cc:
Triage Stage: Accepted
Has patch: yes
Needs documentation: no
Needs tests: no
Patch needs improvement: no
Easy pickings: no
UI/UX: no

Description (last modified by Rob Jauquet)

With this code:

from django.contrib.postgres.search import SearchQuery
from django.core.paginator import Paginator
from django.db.models import DateTimeField, F, Func, Model, TextField, Value
from django.utils import timezone

class Entry(Model):
    text = TextField()
    # Pass the callable so the default is computed on each save, not once at import time.
    published_on = DateTimeField(default=timezone.now)

class Headline(Func):
    function = 'ts_headline'
    output_field = TextField()

    def __init__(self, text, query, options=None):
        extra = {}
        expressions = [text, query]
        if options:
            # Render the options as the comma-separated 'key=value' string ts_headline expects.
            opt_str = ','.join(f'{key}={value}' for key, value in options.items())
            expressions.append(Value(opt_str))
        super().__init__(*expressions, **extra)

entries = Entry.objects.annotate(
    text_highlighted=Headline(
        F('text'), 
        SearchQuery('house'), 
        options={'HighlightAll': False},
    )
).order_by('-published_on')

paginator = Paginator(entries, 20)

# evaluates the queryset
paginator.page(1)

Passing the options as a dict to the Func subclass makes the count step of pagination fail silently at paginator.py line 85 (1). The underlying cause is a silenced TypeError raised while hashing the expression at expressions.py line 372 (2).

Then, because of the fallback to len at paginator.py line 90 (3), the entire queryset is evaluated instead of just the requested page. In this specific case, ts_headline is called on every row instead of only on the rows of the first page.
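
To illustrate the failure mode without a database, here is a minimal plain-Python sketch. FakeQuerySet and count_with_fallback are hypothetical stand-ins, not Django code. The sketch assumes that the count step hashes the constructor arguments of the annotation and that the paginator catches TypeError before falling back to len.

```python
class FakeQuerySet:
    """Hypothetical stand-in for a queryset; this is not Django's QuerySet."""

    def __init__(self, rows, annotation_kwargs):
        self.rows = list(rows)
        self.annotation_kwargs = annotation_kwargs
        self.fully_evaluated = False

    def count(self):
        # Assumption: counting hashes the annotation's constructor arguments.
        # hash() raises TypeError when any value is unhashable, such as a dict.
        hash(tuple(self.annotation_kwargs.items()))
        return len(self.rows)

    def __len__(self):
        # Stands in for fetching every row from the database.
        self.fully_evaluated = True
        return len(self.rows)


def count_with_fallback(object_list):
    """Mimic the try/except-then-len pattern described for the paginator."""
    try:
        return object_list.count()
    except (AttributeError, TypeError):
        # The TypeError is swallowed here, so the caller never sees it.
        return len(object_list)


# Dict passed as a single keyword argument: the hash fails, len() is used.
bad = FakeQuerySet(range(1000), {"options": {"HighlightAll": False}})
bad_total = count_with_fallback(bad)

# Options passed as separate keyword arguments: the hash succeeds.
good = FakeQuerySet(range(1000), {"HighlightAll": False})
good_total = count_with_fallback(good)
```

Both calls return the same total, so nothing looks wrong from the outside. Only bad.fully_evaluated ends up True, which corresponds to the unexpected full evaluation of the queryset.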

Passing the options correctly, with **options instead of options=None, fixes this case. However, the silenced error and the fallback to calling len on the queryset result in an unexpected, expensive query when using pagination.
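
A rough sketch of why the **options form avoids the problem, again in plain Python. hash_constructor_kwargs is a simplified, hypothetical model of hashing the arguments an expression was built with, not the actual code in expressions.py, and format_headline_options is a hypothetical helper that mirrors the string building in Headline.__init__.

```python
def format_headline_options(**options):
    """Build the comma-separated 'key=value' string passed to ts_headline."""
    return ",".join(f"{key}={value}" for key, value in options.items())


def hash_constructor_kwargs(**kwargs):
    """Simplified model: hash the keyword arguments an expression was built with."""
    return hash(tuple(kwargs.items()))


options = {"HighlightAll": False, "MaxWords": 10}

# options=<dict>: the dict itself becomes a value inside the hashed tuple.
try:
    hash_constructor_kwargs(options=options)
    dict_form_hashable = True
except TypeError:
    dict_form_hashable = False

# **options: each option becomes its own hashable (key, value) pair.
keyword_form_hash = hash_constructor_kwargs(**options)

option_string = format_headline_options(**options)
```

With **options the captured values are plain booleans and integers, so hashing succeeds and the count step does not need the len fallback.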

(1) https://github.com/django/django/blob/281c0223b376d6fa1a11e0726d824ed35cfe7524/django/core/paginator.py#L85
(2) https://github.com/django/django/blob/281c0223b376d6fa1a11e0726d824ed35cfe7524/django/db/models/expressions.py#L372
(3) https://github.com/django/django/blob/281c0223b376d6fa1a11e0726d824ed35cfe7524/django/core/paginator.py#L90

Change History (2)

comment:1 by Tim Graham, 6 years ago

Do you have a proposal of how to address this? By the way, if you press the "y" key when viewing a GitHub page, you can get a link that includes a commit hash. Without that, the links can quickly go stale as line numbers change.

comment:2 by Rob Jauquet, 6 years ago

Description: modified (diff)