Opened 18 months ago
Last modified 2 months ago
#34699 assigned Cleanup/optimization
Filtering on annotated TruncSecond expression gives unexpected result.
Reported by: | Stefan | Owned by: | Wes P. |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.2 |
Severity: | Normal | Keywords: | |
Cc: | Aymeric Augustin | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
With a non-UTC time zone:
TIME_ZONE = 'Europe/Berlin'
and a simply query like so
from django.db.models.functions import TruncSecond from django.utils import timezone book = Book.objects.get(id=2) now = timezone.now() book.published = now book.save() Book.objects.annotate(_published_trunc=TruncSecond('published')).filter(id=2, _published_trunc__lte=now)
The result is empty; I have simply filtered now
against a second-trunced version of now
so I would expect a result.
However under the hood the _published_now
column is converted to a naive timestamp using AT TIME ZONE 'Europe/Berlin'
and is thus a naive timestamp 2 hours ahead of UTC.
SELECT "book"."id", "book"."published", DATE_TRUNC('second', "book"."published" AT TIME ZONE 'Europe/Berlin') AS "_published_trunc" FROM "book" WHERE ("book"."id" = 2 AND DATE_TRUNC('second', "book"."_published_trunc" AT TIME ZONE 'Europe/Berlin') <= '2023-07-04 11:59:00+02:00'::timestamptz)
The filter compares a naive timestamp to an aware one, but assumes the LHS naive timestamp is a UTC timestamp - which it is not, it is Berlin time.
Workaround
1) Use TruncSecond(now)
in the filter so the compared naive timestamps are the same.
2) Use _published_trunc=TruncSecond('published', tzinfo=datetime.timezone.utc)
- I don't like this though. It's not clear without a comment why the tzinfo is needed and it assumes the database will compare timezones using UTC.
Change History (15)
comment:1 by , 18 months ago
Cc: | added |
---|---|
Component: | Uncategorized → Database layer (models, ORM) |
comment:2 by , 18 months ago
comment:3 by , 18 months ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:4 by , 18 months ago
I've been trying to understand this report to properly triage it. From the docs for Trunc
and derivatives, when describing DateTimeField truncation, the example given shows this:
>>> Experiment.objects.annotate( ... date=TruncDate("start_datetime"), ... day=TruncDay("start_datetime", tzinfo=melb), ... hour=TruncHour("start_datetime", tzinfo=melb), ... minute=TruncMinute("start_datetime"), ... second=TruncSecond("start_datetime"), ... ).values("date", "day", "hour", "minute", "second").get() {'date': datetime.date(2014, 6, 15), 'day': datetime.datetime(2014, 6, 16, 0, 0, tzinfo=zoneinfo.ZoneInfo('Australia/Melbourne')), 'hour': datetime.datetime(2014, 6, 16, 0, 0, tzinfo=zoneinfo.ZoneInfo('Australia/Melbourne')), 'minute': 'minute': datetime.datetime(2014, 6, 15, 14, 30, tzinfo=timezone.utc), 'second': datetime.datetime(2014, 6, 15, 14, 30, 50, tzinfo=timezone.utc) }
The text and examples seem to indicate that when the stored datetime is in a given timezone, that tz would be used in the truncation unless otherwise specified by the tzinfo
param.
@stefan, can you double check that the values for published
are actually datetimes in the tz you configured you app?
comment:6 by , 18 months ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Uncategorized → Cleanup/optimization |
I did some more research on this. I'm accepting this ticket because I do believe there is an unexpected behavior, which may be a bug or just a documentation issue.
The simplest test of getting all books and printing their published
date and their truncated-to-seconds published date is, IMHO, unexpected: the published
value is in UTC but the annotated value is in the timezone the app is configured. My settings:
USE_TZ = True TIME_ZONE = "Europe/Berlin"
The model:
from django.db import models from django.utils import timezone class Book(models.Model): published = models.DateTimeField(default=timezone.now)
In the shell:
import time from django.db.models.functions import TruncSecond from testapp.models import Book for i in range(3): Book.objects.create(); time.sleep(1) annotated_books = Book.objects.annotate(_published_trunc=TruncSecond('published')) for i in annotated_books.all(): print(i.published, i._published_trunc)
With output:
2023-07-14 18:39:22.620603+00:00 2023-07-14 20:39:22+02:00 2023-07-14 18:39:28.585856+00:00 2023-07-14 20:39:28+02:00 2023-07-14 18:39:29.590469+00:00 2023-07-14 20:39:29+02:00 2023-07-14 18:39:30.595811+00:00 2023-07-14 20:39:30+02:00
comment:7 by , 18 months ago
I'm pretty sure it's not a valid ticket, but couldn't find quickly find a proper justification. Any change to the current behavior would be backward incompatible.
comment:8 by , 17 months ago
Thanks Mariusz for your insight. I was wondering what are your thoughts for my example above, where the result of calling TruncSecond
on a timestamp would have a timezone different than the argument.
follow-up: 10 comment:9 by , 17 months ago
It works as documented:
timezone.now()
- "If USE_TZ is True, this will be an aware datetime representing the current time in UTC. Note that now() will always return times in UTC regardless of the value of TIME_ZONE; you can use localtime() to get the time in the current time zone."Trunc()
- "If a different timezone like Australia/Melbourne is active in Django, then the datetime is converted to the new timezone before the value is truncated."
comment:10 by , 17 months ago
Replying to Mariusz Felisiak:
It works as documented:
timezone.now()
- "If USE_TZ is True, this will be an aware datetime representing the current time in UTC. Note that now() will always return times in UTC regardless of the value of TIME_ZONE; you can use localtime() to get the time in the current time zone."Trunc()
- "If a different timezone like Australia/Melbourne is active in Django, then the datetime is converted to the new timezone before the value is truncated."
Right, I understand the docs above and how that match the results I got. But, at the same time, the first paragraph about "Time Zones" says:
When support for time zones is enabled, Django stores datetime information in UTC in the database, uses time-zone-aware datetime objects internally, and translates them to the end user’s time zone in templates and forms.
[...]
Even if your website is available in only one time zone, it’s still good practice to store data in UTC in your database.
The above aligns perfectly (and makes sense) with timezone.now
returning an aware datetime in UTC. But then, at least to me, is quite surprising that TruncSecond
, which is an operation fully occurring in the DB, would not "respect" that "UTC invariant" and have the TIME_ZONE
setting affecting the results. Does my point make sense? Do you have historic information about why TruncSecond
(and related functions) would not operate with/keep the tz defined in the datetime being processed?
comment:11 by , 17 months ago
@Natalia, apologies I should've said, it was PSQL 14.3.
I want to clarify that from a purely ORM perspective the result is surprising, given
Book.objects.update(published=timezone.now()) Book.objects.annotate(_published_trunc=TruncSecond('published')).filter(_published_trunc__lte=timezone.now()).count() # 0
the result is 0. One has to dive in to the SQL to understand why this is happening, and it's because AT TIME ZONE 'Europe/Berlin'
is making the timezone naive in the DB level. The docs do say:
Trunc() - "If a different timezone like Australia/Melbourne is active in Django, then the datetime is converted to the new timezone before the value is truncated."
Although it says the datetime is converted to the new timezone before the value is truncated, converted to new timezone doesn't necessarily suggest that the resulting time zone is naive - one could still assume it an aware timezone just with an offset of +11:00. The latter is further suggested because the immediately following examples in that doc show the returned value on the annotation are an aware value with an offset of +11:00, and not naive.
comment:12 by , 17 months ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:13 by , 2 months ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:14 by , 2 months ago
I did a little research into this and I would say Stefan and Mariusz are both correct. The behavior of TruncSecond
is unexpected but any changes to the feature would break things for people who have used any Trunc functions in a filter and have manually adjusted for the timezone naive result. My proposal is a change to the documentation with a warning about using any of the Trunc functions in a filter when the set timezone is not UTC and to recommend a way of constructing the filter in such away as to produce the expected results. I have a number of tests that confirm the documented warning so that if someone changes the behavior in the future the documentation should be updated.
The way that Django uses DATE_TRUNC
function in Postgres leads us to believe that the underlying filter will also be timezone aware, but the way that Django calls DATE_TRUNC
in Postgres returns a timezone naive result (see: https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC). Django returns a timezone aware result by converting the naive value which is why you see the different timezone values in your sample query earlier in the ticket.
The TruncSecond
(and other Trunc functions) were added (I think) in 2013 and at that time, Postgres did not have the option for doing a timezone aware DATE_TRUNC
without doing it the way that Django is currently doing it which returns a timezone naive result. In 2019 Postgres added a new feature to the function that does return a timezone aware result so that the filter in Stefan's situation would work as expected. I did play with this and it would be possible to add this as a new feature, possibly in the same way that Postgres has, by passing an optional parameter to the TruncBase class that would access the feature so that it won't break existing code.
comment:15 by , 2 months ago
Has patch: | set |
---|
This is my first ticket and pull request. https://github.com/django/django/pull/18660
I believe this would be the case for all other Trunc Date expressions as they are subclassed with
TruncBase
.Replying to Stefan: