Opened 11 years ago

Closed 10 years ago

#22394 closed Cleanup/optimization (fixed)

Built-in datetime Lookups should actually be Transforms

Reported by: jarshwah Owned by: Marc Tamlyn
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: lookups
Cc: jon.dufresne@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The following built in lookups should be Transforms rather than Lookups, because we're transforming the date/time into a date/time part:

Year, Month, Day, WeekDay, Hour, Minute, Second.

Converting the above into Transforms then allows a Lookup to proceed:

qs.filter(date_field__year__gte=2001)

Change History (13)

comment:1 by jarshwah, 11 years ago

Discussion on IRC mentions that the Year lookup does a range query (BETWEEN x AND y) which is able to use date based indexes. If a Transform were to be used, any field based indexes would be invalid, as a computed/function-based index would need to be created instead to preserve performance.

However, all other operations based on date fields do an EXTRACT(), which can't use traditional indexes anyway. Having Year as a Lookup but the other date/time lookups as Transforms doesn't really make sense either, and would probably be more confusing than it is now.

comment:2 by Marc Tamlyn, 11 years ago

Owner: changed from nobody to Marc Tamlyn
Status: newassigned

Doing something with this issue is part of my kickstarter, so it would probably be unfair for someone else to work on it.

I believe the current machinery is enough to provide "optimal" queries in all circumstances. This will work by making __year a Transform which is actually a no-op, and that transform having a set of lookups available to it distinct from the normal lookups, which each both extract the year *and* do the comparison. This allows us to use EXTRACT for the exact case but not for the less than case (for example). (Note that if __year becomes a transform all calls to __year explicitly become __year__exact). At present, lookups and only used in filter queries. We will need to be more intelligent here when considering their use in values calls, custom index etc as then a __year transform cannot be a noop.

To work out the best option for each possible call here, I will be running benchmarks and checking query plans, with and without indexes, for all the possible queries. We can then work out what the best query is for each db in each case - they need not necessarily be the same.

It is also possible with custom indexes that there may be a case for the __year transform having a __extract transform which forces the use of EXTRACT() at SQL level. This would be an unusual use case for year, but perhaps more likely for __date or __week where a user could add an index for this particular extract and then perform efficient queries using it without needing to index the entire table. This could be much more efficient with logging-like tables, but until I try it out I don't know.

comment:3 by Anssi Kääriäinen, 11 years ago

There is a way to have year lookup as a Transform, and also have specialized Lookups for exact and friends. The way to do this is to write YearExact's as_sql() in a way that it accesses directly self.lhs.lhs, that is YearExact doesn't execute self.lhs.as_sql() at all, instead it skips directly to the column of self.lhs. This works as self.lhs is known to be a YearTransform instance, only way to access YearExact is through YearTransform.

As it happens this has already been written as a test case in custom_lookups. Check YearTransform and YearExact in custom_lookups/tests.py. The implementation is PostgreSQL specific, so there is some work to make it generic.

BTW for the kickstarter part - I think it is completely fair if somebody else has time to work on this and implements it for you for your kickstarter project. If you would be able to just merge patches others wrote I think that would be completely fair way to accomplish your kickstarter project. Of course, this is just my opinion.

comment:4 by Marc Tamlyn, 11 years ago

Ah yes, I'd forgotten about accessing lhs.lhs directly that is a good approach to this.

And that is a good point re the kickstarter - worst case is it gives me more time to work towards supporting transforms elsewhere in the code base.

comment:5 by Tim Graham, 11 years ago

Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

comment:6 by Jon Dufresne, 10 years ago

Cc: jon.dufresne@… added
Has patch: set

Added an initial PR: https://github.com/django/django/pull/4393

This PR does not yet address the year BETWEEN indexing issue outline above. I will look into it. Feedback on the initial status is welcome.

comment:7 by Jon Dufresne, 10 years ago

My PR has addressed the index concern for __year lookups. The PR is ready for review. Thanks.

comment:8 by Tim Graham, 10 years ago

Triage Stage: AcceptedReady for checkin

Looks good to me (awaiting +1 from Josh).

comment:9 by Josh Smeaton, 10 years ago

+1 once extract_type is removed.

comment:10 by Tim Graham, 10 years ago

Patch needs improvement: set
Triage Stage: Ready for checkinAccepted

comment:11 by Jon Dufresne, 10 years ago

Patch needs improvement: unset

Added docs and tests to PR.

comment:12 by Tim Graham, 10 years ago

Summary: Several built in Lookups should actually be TransformsBuilt-in datetime Lookups should actually be Transforms
Triage Stage: AcceptedReady for checkin

comment:13 by Tim Graham <timograham@…>, 10 years ago

Resolution: fixed
Status: assignedclosed

In b5e0eede:

Fixed #22394 -- Refactored built-in datetime lookups to transforms.

Note: See TracTickets for help on using tickets.
Back to Top