Opened 7 months ago

Last modified 5 months ago

#23242 new New feature

Add an option to avoid slow date_hierarchy queries on a big tables

Reported by: meteozond Owned by: nobody
Component: contrib.admin Version: master
Severity: Normal Keywords: date_hierarchy datetimes dates
Cc: herve Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Because of using DISTINCT query date_hierarchy incredibly slows down big tables rendering (~106 records),
May be it is more convenient to simply iterate between min and max values rather than using dates or datetimes queryset methods?

Change History (6)

comment:1 Changed 6 months ago by timgraham

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Resolution set to needsinfo
  • Status changed from new to closed

I'm not sure where the distinct query is that you're referring to. I also don't understand your proposed solution. Please reopen if you can provide more details (or even a patch).

comment:2 Changed 6 months ago by aaugustin

  • Easy pickings unset

When the date_hierarchy option is used, the admin shows a list of available years, months or days. The implementation is designed to show only dates for which objects exist.

It provides a better UX for sparsely populated tables, where iterating between min and max dates would show links to pages with no objects, but performs badly on large tables.

In general, the admin isn't designed for dealing with large amounts of data. This choice can be traced back to Django being designed for editorial sites.

Certainly, the situation could be improved, but how?

comment:3 Changed 5 months ago by herve

  • Cc herve added
  • Resolution needsinfo deleted
  • Status changed from closed to new

Here is an example of such a request:

SELECT DISTINCT DATE_TRUNC('day', "messages_message"."sent_at" AT TIME ZONE E'Europe/Paris') FROM "messages_message" WHERE (EXTRACT('month' FROM "messages_message"."sent_at" AT TIME ZONE E'Europe/Paris') = 9 AND "messages_message"."sent_at" BETWEEN '2014-01-01T00:00:00+01:00'::timestamptz and '2014-12-31T23:59:59.999999+01:00'::timestamptz AND "messages_message"."sent_at" IS NOT NULL) ORDER BY 1 ASC

The explain/analyze on ~6 million entries:

 Sort  (cost=62488.59..62488.60 rows=1 width=8) (actual time=18791.784..18791.784 rows=1 loops=1)
   Sort Key: (date_trunc('day'::text, timezone('Europe/Paris'::text, sent_at)))
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=62488.57..62488.58 rows=1 width=8) (actual time=18791.465..18791.466 rows=1 loops=1)
         ->  Bitmap Heap Scan on messages_message  (cost=856.42..62488.18 rows=153 width=8) (actual time=1276.004..17066.794 rows=6135296 loops=1)
               Recheck Cond: ((sent_at >= '2014-01-01 00:00:00+01'::timestamp with time zone) AND (sent_at <= '2014-12-31 23:59:59.999999+01'::timestamp with time zone) AND (sent_at IS NOT NULL))
               Filter: (date_part('month'::text, timezone('Europe/Paris'::text, sent_at)) = 9::double precision)
               ->  Bitmap Index Scan on messages_message_sent_at_id_idx  (cost=0.00..856.38 rows=30655 width=0) (actual time=1274.039..1274.039 rows=6136069 loops=1)
                     Index Cond: ((sent_at >= '2014-01-01 00:00:00+01'::timestamp with time zone) AND (sent_at <= '2014-12-31 23:59:59.999999+01'::timestamp with time zone) AND (sent_at IS NOT NULL))
 Total runtime: 18791.933 ms

comment:4 Changed 5 months ago by timgraham

@herve, do you have ideas for how to improve the situation? Absent that, I don't see a reason to keep this ticket open besides maybe documenting "Don't use date_hierarchy if you have millions of entries."

comment:5 Changed 5 months ago by aaugustin

The alternative would be an option to show all months and days regardless of whether items exist on these dates.

date_hierarchy_existing_only = True  # default

The year part can be handled with range(min, max) which will be fast if the field is indexed.

comment:6 Changed 5 months ago by timgraham

  • Summary changed from slow date_hierarchy on a big tables to Add an option to avoid slow date_hierarchy queries on a big tables
  • Triage Stage changed from Unreviewed to Accepted
  • Type changed from Cleanup/optimization to New feature
Note: See TracTickets for help on using tickets.
Back to Top