Opened 12 months ago

Last modified 4 days ago

#22394 assigned Cleanup/optimization

Several built in Lookups should actually be Transforms

Reported by: smeatonj Owned by: mjtamlyn
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: lookups
Cc: jon.dufresne@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

The following built in lookups should be Transforms rather than Lookups, because we're transforming the date/time into a date/time part:

Year, Month, Day, WeekDay, Hour, Minute, Second.

Converting the above into Transforms then allows a Lookup to proceed:

qs.filter(date_field__year__gte=2001)

Change History (6)

comment:1 Changed 12 months ago by smeatonj

  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

Discussion on IRC mentions that the Year lookup does a range query (BETWEEN x AND y) which is able to use date based indexes. If a Transform were to be used, any field based indexes would be invalid, as a computed/function-based index would need to be created instead to preserve performance.

However, all other operations based on date fields do an EXTRACT(), which can't use traditional indexes anyway. Having Year as a Lookup but the other date/time lookups as Transforms doesn't really make sense either, and would probably be more confusing than it is now.

comment:2 Changed 12 months ago by mjtamlyn

  • Owner changed from nobody to mjtamlyn
  • Status changed from new to assigned

Doing something with this issue is part of my kickstarter, so it would probably be unfair for someone else to work on it.

I believe the current machinery is enough to provide "optimal" queries in all circumstances. This will work by making __year a Transform which is actually a no-op, and that transform having a set of lookups available to it distinct from the normal lookups, which each both extract the year *and* do the comparison. This allows us to use EXTRACT for the exact case but not for the less than case (for example). (Note that if __year becomes a transform all calls to __year explicitly become __year__exact). At present, lookups and only used in filter queries. We will need to be more intelligent here when considering their use in values calls, custom index etc as then a __year transform cannot be a noop.

To work out the best option for each possible call here, I will be running benchmarks and checking query plans, with and without indexes, for all the possible queries. We can then work out what the best query is for each db in each case - they need not necessarily be the same.

It is also possible with custom indexes that there may be a case for the __year transform having a __extract transform which forces the use of EXTRACT() at SQL level. This would be an unusual use case for year, but perhaps more likely for __date or __week where a user could add an index for this particular extract and then perform efficient queries using it without needing to index the entire table. This could be much more efficient with logging-like tables, but until I try it out I don't know.

comment:3 Changed 12 months ago by akaariai

There is a way to have year lookup as a Transform, and also have specialized Lookups for exact and friends. The way to do this is to write YearExact's as_sql() in a way that it accesses directly self.lhs.lhs, that is YearExact doesn't execute self.lhs.as_sql() at all, instead it skips directly to the column of self.lhs. This works as self.lhs is known to be a YearTransform instance, only way to access YearExact is through YearTransform.

As it happens this has already been written as a test case in custom_lookups. Check YearTransform and YearExact in custom_lookups/tests.py. The implementation is PostgreSQL specific, so there is some work to make it generic.

BTW for the kickstarter part - I think it is completely fair if somebody else has time to work on this and implements it for you for your kickstarter project. If you would be able to just merge patches others wrote I think that would be completely fair way to accomplish your kickstarter project. Of course, this is just my opinion.

comment:4 Changed 12 months ago by mjtamlyn

Ah yes, I'd forgotten about accessing lhs.lhs directly that is a good approach to this.

And that is a good point re the kickstarter - worst case is it gives me more time to work towards supporting transforms elsewhere in the code base.

comment:5 Changed 12 months ago by timo

  • Triage Stage changed from Unreviewed to Accepted
  • Type changed from Uncategorized to Cleanup/optimization

comment:6 Changed 4 days ago by jdufresne

  • Cc jon.dufresne@… added
  • Has patch set

Added an initial PR: https://github.com/django/django/pull/4393

This PR does not yet address the year BETWEEN indexing issue outline above. I will look into it. Feedback on the initial status is welcome.

Note: See TracTickets for help on using tickets.
Back to Top