Code

Opened 2 years ago

Closed 14 months ago

#17260 closed Bug (fixed)

`QuerySet.dates()` is computed in UTC when time zone support is enabled

Reported by: aaugustin Owned by: aaugustin
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords:
Cc: kmike84@…, anssi.kaariainen@…, semente+django@… Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Aggregation on dates is performed in UTC (the database timezone) when settings.USE_TZ = True, while an end user may expect its current time zone to be taken into account. This is a known problem in the initial implementation of time zone support.

It affects:

  • QuerySet.dates(),
  • the year, month, day, and week_day lookups,
  • date_hierarchy in the admin,
  • the date based generic views.

Attachments (2)

dates_in_tz.diff (11.6 KB) - added by akaariai 2 years ago.
dates_in_tz.2.diff (12.2 KB) - added by akaariai 2 years ago.
AT TIME ZONE syntax for PostgreSQL

Download all attachments as: .zip

Change History (26)

comment:1 Changed 2 years ago by aaugustin

Here are the comments that were sent to the mailing list:

Aymeric Augustin (myself)

Under PostgreSQL, provided the current time zone has a valid name (e.g. it's provided by pytz), we could temporarily set the connection's timezone to this time zone and restore it to UTC afterwards. With other backends, I can't think of a simpler solution than fetching all values and doing the aggregation in Python. Neither of these thrills me, so for the time being, I've documented this as a known limitation.

Luke Plant

You could argue a case for saying that the behaviour has some advantages: if one admin user filters a list of objects by date, and sends the link to a colleague, they would probably expect the colleague to end up with the same list of objects. But even if that isn't very convincing, I'm happy with this limitation given the implementation difficulties of fixing it.

Anssi Kääriäinen

I don't see this as a major issue, except if Django needs to support this behaviour for backwards compatibility reasons. But as this is documented as a limitation, not as a "feature", this is fine as is. It would be nice to fix this in the future. For PostgreSQL, tracking the current time zone for the connection seems a bit hard - we can't rely a SET TIME ZONE is really going to be in effect, unless we also track rollbacks/commits (see #17062). And changing it for every DB query seems bad, too. One idea is to use "SELECT col AT TIME ZONE 'GMT+2' as col". This way the session time zone doesn't need to be changed. For example:

> SET TIME ZONE 'UTC'; 
> create table testt(dt timestamptz); 
> insert into testt values(now()); 
> select dt at time zone 'GMT+2' as dt from testt; 
 2011-11-18 07:06:47.834771 
> select dt at time zone 'Europe/Helsinki' as dt from testt; 
 2011-11-18 11:06:47.834771 

Oracle seems to have a similar feature. I don't know if other databases offer similar functionality. MySQL has probably support for something similar when using TIMESTAMP columns. But as said, I don't see this as a major issue. It would be nice to support this some day.

comment:2 Changed 2 years ago by aaugustin

  • Owner changed from nobody to aaugustin

comment:4 Changed 2 years ago by aaugustin

For the record, I'm afraid this will quickly become contentious once 1.4 is out...

comment:5 Changed 2 years ago by akaariai

You mean contentious as in "wontfix, breaks existing programs"? In my opinion that would be bad. It is pretty clear to me that the current behavior is broken. The dates should be in the currently active time zone.

I think I know how to fix this:

  • aggregates can return parameters (useful for other features, too "conditional aggregates" for example). This is returning parameters from the cols part of the query.
  • other needed parts already can return parameters (where conditions etc).
  • use the AT TIME ZONE trick above, or if that is not available, use just " + timedelta offset minutes".
  • hope that everything works.

The problem is there is a _lot_ of places to fix. In addition, the ORM is a little complex to work with sometimes. I just can't see getting this into 1.4 in time.

Any way to mark this as a known bug which _will_ be fixed in the future? The other way forward is introducing "at_timezone" argument for the operations, but how to do that for __date for example?

comment:6 Changed 2 years ago by aaugustin

I mean "we're going to take some flak".

I plan to fix this eventually.

Version 0, edited 2 years ago by aaugustin (next)

comment:7 Changed 2 years ago by akaariai

Yes, I can see this is very complex to fix in a way that actually works even closely the same on every backend.

It might be possible to just add the offset into the datetime, then do the conversion to the date. This all in the DB. This could actually work, as DST is usually around 3-4 o'clock AM, so there would not be error in date conversions.

I am tempted to give this a try...

comment:8 Changed 2 years ago by kmike

  • Cc kmike84@… added

comment:9 Changed 2 years ago by akaariai

  • Cc anssi.kaariainen@… added

I have a "POC" version ready. It just adds some timedeltas to the values where needed. In the database everything is still done at time zone UTC, we just add the offsets as Python reports them.

The biggest problem with this approach is DST handling. We use the offset as it is _now_, however the database date might have a different offset (offset now is 3 hours, in the summer 2 hours). So, we have an error of one hour for summer dates.

This means that this issue is really hard to fix: if we use current offsets, we will make an error of one hour across DST (not to mention actual time zone changes in South America...). That is not correct enough to call this fixed. We could use "AT TIME ZONE 'tz name'" where available, but that has just one little problem: we don't know the time zone name.

So, I really can't see an easy fix for this. We could invent a mapping mechanism for the Python time zone name -> DB time zone name. That isn't easy either, I believe PostgreSQL can have different time zone names on Windows and Linux. So, the user should actually define his converter dictionary in the settings.py, for each database. Or we could have our own mappings: time zone name -> list of time zones to try. On first connect inspect what time zones are available. Endless source of bugs, but on the other hand that isn't much different from supporting the different locales. This is the best approach available.

After all this MySQL would still be broken. I don't believe there is anything Django can do to make time zones work correctly together with dates queries on that database. I really wonder if it is a good idea to encourage people to use USE_TZ with MySQL. The setup is broken, and as far as I see, will be broken.

The attached patch is, as said, just a proof of concept, or more correctly a poor of the concept not working.

Changed 2 years ago by akaariai

comment:10 Changed 2 years ago by akaariai

One more possibility: just require that every time zone used is usable as database time zone as is. Current time zones follow this rule (I mean the configured settings' time zone). It might be hard to require this one after 1.4 is released, and it might be also hard for those who do not use pytz.

comment:11 Changed 2 years ago by akaariai

OK, another POC patch which has some support for AT TIME ZONE when using PostgreSQL.

As far as I can see SQLite is somewhat easy to fix, PostgreSQL and Oracle are fixable if we can translate the Python time zone names to something the database understands. It seems the best fix for MySQL is adding the offset which has the DST problem.

I am now done with tinkering with this ticket. I hope the POCs have some value for solving this properly after 1.4 has been released. At this point my only worry is that the option of requiring that all time zone names used must be usable as-is in the database will be lost after 1.4 is released.

Lesson of the day: In POSIX syntax GMT+3 is GMT-3. Of course. As if the timezone handling wasn't confusing enough already...

Changed 2 years ago by akaariai

AT TIME ZONE syntax for PostgreSQL

comment:12 Changed 2 years ago by akaariai

An update about MySQL: it has convert_tz function which does something equivalent to AT TIME ZONE. However, there are some problems with convert_tz:

  • The timezone table for MySQL has to be loaded, if I understand the documentation correctly this is not done by default
  • If you try to convert a datetime using an unknown (to MySQL) timezone name, you get null back
  • If the datetime falls out of the range 1970-2038 you will not get any conversion, you get the input datetime back.

So, the convert_tz function would cure the problem for MySQL, assuming the datetime converted is in range 1970-2038. That could be documented as known limitation. This was the last missing piece of this puzzle:

  • SQLite should be easy to support by using Python code directly in the query.
  • PostgreSQL and Oracle should work using AT TIME ZONE syntax. Python <-> DB timezone names must match or there must be a converter.
  • MySQL should work with convert_tz (with the above cave-eats), Again Python <-> DB timezone names must match or there must be a converter.

There is a lot of work left to actually implement the above properly for all core databases. However, that should be pretty straightforward coding to do. The big problem is converting the timezone names from Python to DB format. Any ideas how to do that in somewhat clean fashion?

comment:13 Changed 2 years ago by aaugustin

#18217 was a duplicate.

comment:14 Changed 2 years ago by aaugustin

I've given a lot of thought to this problem when preparing a talk for DjangoCon.eu. I suggest to:

1) Fetch all the datetimes and perform the aggregation in Python to compute QuerySet.dates() on a DateTimeField when USE_TZ = True.

  • That shouldn't be hard to implement.
  • Performance may suffer, but it has always been an expensive operation. This would just shift the load from the database server to the application server.
  • We'll need a parameter to specify which time zone to use. It should default to the default time zone, because that's what the model layer always does. In many cases, programmers will want to use the current time zone instead, but for the sake of explicitness and consistency, I still prefer to use the default time zone by default. I'm open to discussing this.
  • If some backends can easily do this operation in the database (PostgreSQL) they can provide their own efficient implementation.

2) Deprecate the __day/month/year/week_day lookups on a DateTimeField when USE_TZ = True

  • There's no way we can pass an explicit time zone argument with the lookup syntax.
  • Even if the __year lookup can be emulated with a range lookup, the __day and __month lookups can't. They allow finding "all objects on the 1st of any month" or "all objects in January of any year" which is slightly pathological.
  • Like akaariai says, "there's a lot of work left", and I think that if we start on the path of hacking that, we'll be stuck forever trying to fix a fundamentally broken operation.

comment:15 Changed 23 months ago by aaugustin

I'm still making up my mind on this ticket. I like Anssi's last proposal more and more for 1).

comment:16 Changed 23 months ago by akaariai

Not directly related to this ticket, but if we can convert the datetimes to correct time zone in the DB, there are other uses than solving this ticket, too. For example it would be very useful to be able to generate "sum of payments per month and client" aggregate using the ORM. In these cases getting the timezones correctly can be important. Of course, the TZ issue isn't the only issue preventing this type of query currently...

BTW there could be qs.timezone() operation which sets the whole queryset to work in the given timezone.

Still, as said before, in-db support will take a lot of time to get working correctly.

comment:17 Changed 23 months ago by aaugustin

My last idea was to introduce a datetime_trunc_sql operation in addition to date_trunc_sql.

When USE_TZ = False, datetime_trunc_sql would be equivalent to date_trunc_sql. When USE_TZ = True, datetime_trunc_sql would convert to the appropriate time zone before truncating. It could take the timezone as an optional argument.

Then QuerySet.dates() would use datetime_trunc_sql for DateTimeField and date_trunc_sql for DateField.


By the way, your use case is totally valid — I've implemented a lot of reports like this in Python, since it's currently not possible in DB.

Last edited 14 months ago by aaugustin (previous) (diff)

comment:18 Changed 23 months ago by akaariai

That datetime_trunc_sql seems like a clean solution.

I believe the timezone must be a cursor.execute() parameter, otherwise there could be potential for sql injection. Not 100% sure of this though.

What should one do if the timezone isn't available in the DB? Just error out? This seems like the cleanest approach, but of course, this isn't backwards compatible. Maybe raise a warning, and continue using no conversions? Basically we would then need a method get_db_tz_name for the backends, which would convert the Python timezone to the DB timezone, and error/warn if the timezone can't be converted to db timezone. This should prevent the SQL-injection possibility too. I believe we could ask the get_db_tz_name() to escape the time zone properly.

So, +1 for the datetime_trunc_sql approach. I really like the idea of using the DB for this, as aggregation for example can be a lot cheaper to do directly in the DB instead of fetching the data into Python.

comment:19 Changed 23 months ago by aaugustin

I'd better keep things simple by documenting that tzname must return a string that the database will understand as a time zone name. The tzinfo database, used by pytz, is the de-facto standard. Adding a conversion functions sounds like overengineering to me.

While SQL injections through time zone names are unlikely, indeed, we must prevent them.


NB: SQLite will be restricted to using the default time zone, as defined by os.environ['TZ']. I'm willing to live with this limitation. We can put that under the "use a more feature complete database" umbrella.

comment:20 Changed 23 months ago by akaariai

I don't buy the argument that tzinfo names are usable on every DB. Are they on Oracle, DB2, and MSSQL? Are they usable on databases running on Windows? If I understand correctly Windows doesn't use the tzinfo database.

MySQL will be interesting once again. If the timezone table isn't populated, it returns silently unconverted data. This is other part of the get_db_tz_name() - the idea is that this function could warn/error in this case. It would basically query the DB (on MySQL at least) if the timezone is valid. If not, it would do the warn/error, if yes, then it would cache that the tzinfo name is usable.

But, I think you are right here about not adding the conversion functions. They would be absolutely horrible to maintain.

If we require the user to explicitly set the tzname by using .use_tz(tzname) queryset method, then it is possible to document that the user must ensure that the tzname is usable in the DB. The tzname could be a callable (most likely one returning the current active time zone name) which is resolved at execute time. Is this the idea you were referring to by "documenting that tzname must return a string that the database will understand"? If so, +1 to that approach.

comment:21 Changed 15 months ago by semente+django@…

  • Cc semente+django@… added

comment:22 Changed 14 months ago by aaugustin

I've started a branch to work on this problem: https://github.com/aaugustin/django/compare/queryset-datetimes

This branch is unstable and will be rebased often, don't fork it.

comment:23 Changed 14 months ago by aaugustin

I've started by documenting the target public API.

Conversions/truncations will use the current time zone by default because:

  • aggregation should generally happen in the end user's time zone;
  • it allows using timezone.override as shown below.

The design is sufficiently flexible to deal with all the cases discussed above.

Even "uncooperative" setups (MySQL, Windows) can be made to work with a custom tzinfo whose tzname returns the appropriate value for the database:

qs.datetimes('dt', 'day', timezone=non_standard_tzinfo_object)
with timezone.override(non_standard_tzinfo_object):
    qs.filter(dt__day=1)

I don't care if MySQL silently returns wrong data when it's misconfigured. This isn't worse than all the other data truncation or corruption bugs (for instance, when inserting a string longer than the field size) that happen with MySQL's default configuration. If you use MySQL and care about data integrity, you have to read its manual (and much more).

Finally I don't see the need for a get_db_tz_name method; the standard tzname suffices.

Last edited 14 months ago by aaugustin (previous) (diff)

comment:25 Changed 14 months ago by Aymeric Augustin <aymeric.augustin@…>

  • Resolution set to fixed
  • Status changed from new to closed

In e74e207cce54802f897adcb42149440ee154821e:

Fixed #17260 -- Added time zone aware aggregation and lookups.

Thanks Carl Meyer for the review.

Squashed commit of the following:

commit 4f290bdb60b7d8534abf4ca901bd0844612dcbda
Author: Aymeric Augustin <aymeric.augustin@…>
Date: Wed Feb 13 21:21:30 2013 +0100

Used '0:00' instead of 'UTC' which doesn't always exist in Oracle.

Thanks Ian Kelly for the suggestion.

commit 01b6366f3ce67d57a58ca8f25e5be77911748638
Author: Aymeric Augustin <aymeric.augustin@…>
Date: Wed Feb 13 13:38:43 2013 +0100

Made tzname a parameter of datetime_extract/trunc_sql.

This is required to work around a bug in Oracle.

commit 924a144ef8a80ba4daeeafbe9efaa826566e9d02
Author: Aymeric Augustin <aymeric.augustin@…>
Date: Wed Feb 13 14:47:44 2013 +0100

Added support for parameters in SELECT clauses.

commit b4351d2890cd1090d3ff2d203fe148937324c935
Author: Aymeric Augustin <aymeric.augustin@…>
Date: Mon Feb 11 22:30:22 2013 +0100

Documented backwards incompatibilities in the two previous commits.

commit 91ef84713c81bd455f559dacf790e586d08cacb9
Author: Aymeric Augustin <aymeric.augustin@…>
Date: Mon Feb 11 09:42:31 2013 +0100

Used QuerySet.datetimes for the admin's date_hierarchy.

commit 0d0de288a5210fa106cd4350961eb2006535cc5c
Author: Aymeric Augustin <aymeric.augustin@…>
Date: Mon Feb 11 09:29:38 2013 +0100

Used QuerySet.datetimes in date-based generic views.

commit 9c0859ff7c0b00734afe7fc15609d43d83215072
Author: Aymeric Augustin <aymeric.augustin@…>
Date: Sun Feb 10 21:43:25 2013 +0100

Implemented QuerySet.datetimes on Oracle.

commit 68ab511a4ffbd2b811bf5da174d47e4dd90f28fc
Author: Aymeric Augustin <aymeric.augustin@…>
Date: Sun Feb 10 21:43:14 2013 +0100

Implemented QuerySet.datetimes on MySQL.

commit 22d52681d347a8cdf568dc31ed032cbc61d049ef
Author: Aymeric Augustin <aymeric.augustin@…>
Date: Sun Feb 10 21:42:29 2013 +0100

Implemented QuerySet.datetimes on SQLite.

commit f6800fd04c93722b45f9236976389e0b2fe436f5
Author: Aymeric Augustin <aymeric.augustin@…>
Date: Sun Feb 10 21:43:03 2013 +0100

Implemented QuerySet.datetimes on PostgreSQL.

commit 0c829c23f4cf4d6804cadcc93032dd4c26b8c65e
Author: Aymeric Augustin <aymeric.augustin@…>
Date: Sun Feb 10 21:41:08 2013 +0100

Added datetime-handling infrastructure in the ORM layers.

commit 104d82a7778cf3f0f5d03dfa53709c26df45daad
Author: Aymeric Augustin <aymeric.augustin@…>
Date: Mon Feb 11 10:05:55 2013 +0100

Updated null_queries tests to avoid clashing with the second lookup.

commit c01bbb32358201b3ac8cb4291ef87b7612a2b8e6
Author: Aymeric Augustin <aymeric.augustin@…>
Date: Sun Feb 10 23:07:41 2013 +0100

Updated tests of .dates().

Replaced .dates() by .datetimes() for DateTimeFields.
Replaced dates with datetimes in the expected output for DateFields.

commit 50fb7a52462fecf0127b38e7f3df322aeb287c43
Author: Aymeric Augustin <aymeric.augustin@…>
Date: Sun Feb 10 21:40:09 2013 +0100

Updated and added tests for QuerySet.datetimes.

commit a8451a5004c437190e264667b1e6fb8acc3c1eeb
Author: Aymeric Augustin <aymeric.augustin@…>
Date: Sun Feb 10 22:34:46 2013 +0100

Documented the new time lookups and updated the date lookups.

commit 29413eab2bd1d5e004598900c0dadc0521bbf4d3
Author: Aymeric Augustin <aymeric.augustin@…>
Date: Sun Feb 10 16:15:49 2013 +0100

Documented QuerySet.datetimes and updated QuerySet.dates.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.