Opened 9 years ago

Closed 9 years ago

#25470 closed Cleanup/optimization (fixed)

Serious performance impact (500x slower!) when querying distinct model dates objects

Reported by: Qian Xu Owned by: Mariusz Felisiak
Component: Database layer (models, ORM) Version: 1.8
Severity: Normal Keywords:
Cc: felisiak.mariusz@…, Simon Charette Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Qian Xu)

I'm using django 1.8.4 writing a webapp. The backend uses MySQL 5.6 (MyISAM). Recently the number of table records reaches 1 million, it will take 1-1.5 seconds to query all distinct record dates. But using MySQL client, it takes less than 0.001 second.

Django Code

class Model1(models.Model):
    date = models.DateField(db_index=True)

# benchmark code
db_dates = Model1.objects.dates("date", kind="day")

I dumped the django queries.

{u'time': u'1.989', u'sql': u"SELECT DISTINCT CAST(DATE_FORMAT(`app1_model1`.`date`, '%Y-%m-%d 00:00:00') AS DATETIME) AS `datefield` FROM `app1_model1` WHERE `app1_model1`.`date` IS NOT NULL ORDER BY `datefield` ASC"}

As you can see, the query did possibly an unnecessary type cast. The performance impact is scaled to the amount of records. The table field is exactly DATE type. I don't know, if there is some reason for the type casting.

Currently, I use another query to get these dates, which takes 0.04 second.

db_dates = Model1.objects.values_list('date', flat=True).distinct()

I posted a question at StackOverflow.

Change History (19)

comment:1 by Qian Xu, 9 years ago

Description: modified (diff)

comment:2 by Qian Xu, 9 years ago

Description: modified (diff)

comment:3 by Qian Xu, 9 years ago

Description: modified (diff)

comment:4 by Qian Xu, 9 years ago

Description: modified (diff)

comment:5 by Qian Xu, 9 years ago

Description: modified (diff)

comment:6 by Qian Xu, 9 years ago

Description: modified (diff)

comment:7 by Qian Xu, 9 years ago

Description: modified (diff)
Summary: Django cast DATE field to DATETIME unexpectedlySerious performance impact (500x slower!) when querying distinct model dates objects

comment:8 by Tim Graham, 9 years ago

Triage Stage: UnreviewedAccepted

If you'd like to investigate and propose a fix, that would be great!

comment:9 by Qian Xu, 9 years ago

Thanks for agreeing this is a serious performance impact. It is still unclear, why DATE is first converted to DATETIME. The Django team must know, whether the code is written by design?

comment:10 by Tim Graham, 9 years ago

Did you try modifying the code and seeing if any tests fail? That's often helpful.

comment:11 by Mariusz Felisiak, 9 years ago

Cc: felisiak.mariusz@… added

in reply to:  10 comment:12 by Qian Xu, 9 years ago

Replying to timgraham:

Did you try modifying the code and seeing if any tests fail? That's often helpful.

I dont have the time to do any further investigation. I don't realize that when I report issue to django community, I'd provide patch and do necessary tests. Really sorry about that. Bye.

Version 1, edited 9 years ago by Qian Xu (previous) (next) (diff)

comment:13 by Mariusz Felisiak, 9 years ago

Owner: changed from nobody to Mariusz Felisiak
Status: newassigned

comment:14 by Mariusz Felisiak, 9 years ago

Has patch: set

I made PR to omit unnecessary type cast.

In my opinion, the second step, is to change default order in dates, datetimes methods

  • django/db/models/query.py

    a b class QuerySet(object):  
    726726        clone._iterable_class = FlatValuesListIterable if flat else ValuesListIterable
    727727        return clone
    728728
    729     def dates(self, field_name, kind, order='ASC'):
     729    def dates(self, field_name, kind, order=None):
    730730        """

or give a way to disable it

  • django/db/models/query.py

    a b class QuerySet(object):  
    733733        """
    734734        assert kind in ("year", "month", "day"), \
    735735            "'kind' must be one of 'year', 'month' or 'day'."
    736         assert order in ('ASC', 'DESC'), \
    737             "'order' must be either 'ASC' or 'DESC'."
    738         return self.annotate(
     736        assert order is None or order in ('ASC', 'DESC'), \
     737            "'order' must be either 'ASC', 'DESC' or None."
     738        qs = self.annotate(
    739739            datefield=Date(field_name, kind),
    740740            plain_field=F(field_name)
    741741        ).values_list(
    742742            'datefield', flat=True
    743         ).distinct().filter(plain_field__isnull=False).order_by(('-' if order == 'DESC' else '') + 'datefield')
     743        ).distinct().filter(plain_field__isnull=False)
     744        if order is not None:
     745            return qs.order_by(('-' if order == 'DESC' else '') + 'datefield')
     746        else:
     747            return qs

in reply to:  14 comment:15 by Mariusz Felisiak, 9 years ago

Any suggestions? I can't find any indications of default sort.

comment:16 by Simon Charette, 9 years ago

Cc: Simon Charette added

The default sort order of ASC seems to be documented and simply making results unordered would be backward incompatible.

I guess you should open a new ticket for a feature request allowing the use of order=None to get unordered results.

comment:17 by Tim Graham, 9 years ago

Needs tests: set

comment:18 by Mariusz Felisiak, 9 years ago

Needs tests: unset

comment:19 by Tim Graham <timograham@…>, 9 years ago

Resolution: fixed
Status: assignedclosed

In 0f6d51e6:

Fixed #25470 -- Avoided unnecessary, expensive DATETIME typecast on MySQL.

Note: See TracTickets for help on using tickets.
Back to Top