Opened 2 years ago
Last modified 2 months ago
#34699 assigned Cleanup/optimization
Filtering on annotated TruncSecond expression gives unexpected result.
Reported by: | Stefan | Owned by: | Wes P. |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Aymeric Augustin, Simon Charette, David Sanders | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
With a non-UTC time zone:
TIME_ZONE = 'Europe/Berlin'
and a simply query like so
from django.db.models.functions import TruncSecond from django.utils import timezone book = Book.objects.get(id=2) now = timezone.now() book.published = now book.save() Book.objects.annotate(_published_trunc=TruncSecond('published')).filter(id=2, _published_trunc__lte=now)
The result is empty; I have simply filtered now
against a second-trunced version of now
so I would expect a result.
However under the hood the _published_now
column is converted to a naive timestamp using AT TIME ZONE 'Europe/Berlin'
and is thus a naive timestamp 2 hours ahead of UTC.
SELECT "book"."id", "book"."published", DATE_TRUNC('second', "book"."published" AT TIME ZONE 'Europe/Berlin') AS "_published_trunc" FROM "book" WHERE ("book"."id" = 2 AND DATE_TRUNC('second', "book"."_published_trunc" AT TIME ZONE 'Europe/Berlin') <= '2023-07-04 11:59:00+02:00'::timestamptz)
The filter compares a naive timestamp to an aware one, but assumes the LHS naive timestamp is a UTC timestamp - which it is not, it is Berlin time.
Workaround
1) Use TruncSecond(now)
in the filter so the compared naive timestamps are the same.
2) Use _published_trunc=TruncSecond('published', tzinfo=datetime.timezone.utc)
- I don't like this though. It's not clear without a comment why the tzinfo is needed and it assumes the database will compare timezones using UTC.
Attachments (1)
Change History (23)
comment:1 by , 2 years ago
Cc: | added |
---|---|
Component: | Uncategorized → Database layer (models, ORM) |
comment:2 by , 2 years ago
comment:3 by , 2 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:4 by , 2 years ago
I've been trying to understand this report to properly triage it. From the docs for Trunc
and derivatives, when describing DateTimeField truncation, the example given shows this:
>>> Experiment.objects.annotate( ... date=TruncDate("start_datetime"), ... day=TruncDay("start_datetime", tzinfo=melb), ... hour=TruncHour("start_datetime", tzinfo=melb), ... minute=TruncMinute("start_datetime"), ... second=TruncSecond("start_datetime"), ... ).values("date", "day", "hour", "minute", "second").get() {'date': datetime.date(2014, 6, 15), 'day': datetime.datetime(2014, 6, 16, 0, 0, tzinfo=zoneinfo.ZoneInfo('Australia/Melbourne')), 'hour': datetime.datetime(2014, 6, 16, 0, 0, tzinfo=zoneinfo.ZoneInfo('Australia/Melbourne')), 'minute': 'minute': datetime.datetime(2014, 6, 15, 14, 30, tzinfo=timezone.utc), 'second': datetime.datetime(2014, 6, 15, 14, 30, 50, tzinfo=timezone.utc) }
The text and examples seem to indicate that when the stored datetime is in a given timezone, that tz would be used in the truncation unless otherwise specified by the tzinfo
param.
@stefan, can you double check that the values for published
are actually datetimes in the tz you configured you app?
comment:6 by , 2 years ago
Triage Stage: | Unreviewed → Accepted |
---|---|
Type: | Uncategorized → Cleanup/optimization |
I did some more research on this. I'm accepting this ticket because I do believe there is an unexpected behavior, which may be a bug or just a documentation issue.
The simplest test of getting all books and printing their published
date and their truncated-to-seconds published date is, IMHO, unexpected: the published
value is in UTC but the annotated value is in the timezone the app is configured. My settings:
USE_TZ = True TIME_ZONE = "Europe/Berlin"
The model:
from django.db import models from django.utils import timezone class Book(models.Model): published = models.DateTimeField(default=timezone.now)
In the shell:
import time from django.db.models.functions import TruncSecond from testapp.models import Book for i in range(3): Book.objects.create(); time.sleep(1) annotated_books = Book.objects.annotate(_published_trunc=TruncSecond('published')) for i in annotated_books.all(): print(i.published, i._published_trunc)
With output:
2023-07-14 18:39:22.620603+00:00 2023-07-14 20:39:22+02:00 2023-07-14 18:39:28.585856+00:00 2023-07-14 20:39:28+02:00 2023-07-14 18:39:29.590469+00:00 2023-07-14 20:39:29+02:00 2023-07-14 18:39:30.595811+00:00 2023-07-14 20:39:30+02:00
comment:7 by , 2 years ago
I'm pretty sure it's not a valid ticket, but couldn't find quickly find a proper justification. Any change to the current behavior would be backward incompatible.
comment:8 by , 2 years ago
Thanks Mariusz for your insight. I was wondering what are your thoughts for my example above, where the result of calling TruncSecond
on a timestamp would have a timezone different than the argument.
follow-up: 10 comment:9 by , 2 years ago
It works as documented:
timezone.now()
- "If USE_TZ is True, this will be an aware datetime representing the current time in UTC. Note that now() will always return times in UTC regardless of the value of TIME_ZONE; you can use localtime() to get the time in the current time zone."Trunc()
- "If a different timezone like Australia/Melbourne is active in Django, then the datetime is converted to the new timezone before the value is truncated."
comment:10 by , 2 years ago
Replying to Mariusz Felisiak:
It works as documented:
timezone.now()
- "If USE_TZ is True, this will be an aware datetime representing the current time in UTC. Note that now() will always return times in UTC regardless of the value of TIME_ZONE; you can use localtime() to get the time in the current time zone."Trunc()
- "If a different timezone like Australia/Melbourne is active in Django, then the datetime is converted to the new timezone before the value is truncated."
Right, I understand the docs above and how that match the results I got. But, at the same time, the first paragraph about "Time Zones" says:
When support for time zones is enabled, Django stores datetime information in UTC in the database, uses time-zone-aware datetime objects internally, and translates them to the end user’s time zone in templates and forms.
[...]
Even if your website is available in only one time zone, it’s still good practice to store data in UTC in your database.
The above aligns perfectly (and makes sense) with timezone.now
returning an aware datetime in UTC. But then, at least to me, is quite surprising that TruncSecond
, which is an operation fully occurring in the DB, would not "respect" that "UTC invariant" and have the TIME_ZONE
setting affecting the results. Does my point make sense? Do you have historic information about why TruncSecond
(and related functions) would not operate with/keep the tz defined in the datetime being processed?
comment:11 by , 2 years ago
@Natalia, apologies I should've said, it was PSQL 14.3.
I want to clarify that from a purely ORM perspective the result is surprising, given
Book.objects.update(published=timezone.now()) Book.objects.annotate(_published_trunc=TruncSecond('published')).filter(_published_trunc__lte=timezone.now()).count() # 0
the result is 0. One has to dive in to the SQL to understand why this is happening, and it's because AT TIME ZONE 'Europe/Berlin'
is making the timezone naive in the DB level. The docs do say:
Trunc() - "If a different timezone like Australia/Melbourne is active in Django, then the datetime is converted to the new timezone before the value is truncated."
Although it says the datetime is converted to the new timezone before the value is truncated, converted to new timezone doesn't necessarily suggest that the resulting time zone is naive - one could still assume it an aware timezone just with an offset of +11:00. The latter is further suggested because the immediately following examples in that doc show the returned value on the annotation are an aware value with an offset of +11:00, and not naive.
comment:12 by , 2 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:13 by , 9 months ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:14 by , 9 months ago
I did a little research into this and I would say Stefan and Mariusz are both correct. The behavior of TruncSecond
is unexpected but any changes to the feature would break things for people who have used any Trunc functions in a filter and have manually adjusted for the timezone naive result. My proposal is a change to the documentation with a warning about using any of the Trunc functions in a filter when the set timezone is not UTC and to recommend a way of constructing the filter in such away as to produce the expected results. I have a number of tests that confirm the documented warning so that if someone changes the behavior in the future the documentation should be updated.
The way that Django uses DATE_TRUNC
function in Postgres leads us to believe that the underlying filter will also be timezone aware, but the way that Django calls DATE_TRUNC
in Postgres returns a timezone naive result (see: https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC). Django returns a timezone aware result by converting the naive value which is why you see the different timezone values in your sample query earlier in the ticket.
The TruncSecond
(and other Trunc functions) were added (I think) in 2013 and at that time, Postgres did not have the option for doing a timezone aware DATE_TRUNC
without doing it the way that Django is currently doing it which returns a timezone naive result. In 2019 Postgres added a new feature to the function that does return a timezone aware result so that the filter in Stefan's situation would work as expected. I did play with this and it would be possible to add this as a new feature, possibly in the same way that Postgres has, by passing an optional parameter to the TruncBase class that would access the feature so that it won't break existing code.
comment:15 by , 9 months ago
Has patch: | set |
---|
This is my first ticket and pull request. https://github.com/django/django/pull/18660
comment:16 by , 3 months ago
Cc: | added |
---|---|
Patch needs improvement: | set |
Version: | 4.2 → dev |
Thank you Wes for working on this ticket. I think the PR needs some adjusting in terms of docs and code formatting, but before getting into those, details, I would like to first define a plan in what would desired in terms of fixing this ticket.
To refresh my memory, I have re-read the docs and tried the reproducing examples again. I still think there is an unexpected result when calling any of the truncation functions (when there is a TIME_ZONE
defined in the Django project). Following the case by case explanation in these docs:
Trunc takes a single expression, representing a DateField, TimeField, or DateTimeField, a kind representing a date or time part, and an output_field that’s either DateTimeField(), TimeField(), or DateField(). It returns a datetime, date, or time depending on output_field, with fields up to kind set to their minimum value. If output_field is omitted, it will default to the output_field of expression. A tzinfo subclass, usually provided by zoneinfo, can be passed to truncate a value in a specific timezone.
To me, the above paragraph does not imply that the tzinfo
SHOULD be passed to get results that make sense in the context of a custom project-wideTIME_ZONE
. Moreover, I have transformed the example that follows that paragraph into a test case (PostgreSQL only) and I get test failures for all of the Australia/Melbourne
cases because the truncated date is in UTC
:
class Ticket34699Tests(TestCase): def test_docs_example(self): self.assertSequenceEqual(DTModel.objects.all(), []) dt = datetime.datetime(2015,6, 15, 14, 30, 50, 321, zoneinfo.ZoneInfo("UTC")) # From the docs: Given the datetime 2015-06-15 14:30:50.000321+00:00... docs_dt = "2015-06-15T14:30:50.000321+00:00" self.assertEqual(dt.isoformat(), docs_dt) utc = { "year": "2015-01-01T00:00:00+00:00", "quarter": "2015-04-01T00:00:00+00:00", "month": "2015-06-01T00:00:00+00:00", "week": "2015-06-15T00:00:00+00:00", "day": "2015-06-15T00:00:00+00:00", "hour": "2015-06-15T14:00:00+00:00", "minute": "2015-06-15T14:30:00+00:00", "second": "2015-06-15T14:30:50+00:00", } melbourne = { "year": "2015-01-01T00:00:00+11:00", "quarter": "2015-04-01T00:00:00+10:00", "month": "2015-06-01T00:00:00+10:00", "week": "2015-06-16T00:00:00+10:00", "day": "2015-06-16T00:00:00+10:00", "hour": "2015-06-16T00:00:00+10:00", "minute": "2015-06-16T00:30:00+10:00", "second": "2015-06-16T00:30:50+10:00", } for tz, cases in [("UTC", utc), ("Australia/Melbourne", melbourne)]: for kind, expected in cases.items(): with (self.settings(USE_TZ=True, TIME_ZONE="UTC"), self.subTest(kind=kind, tz=tz)): instance = DTModel.objects.create(start_datetime=dt) self.assertEqual(instance.start_datetime.isoformat(), docs_dt) result = DTModel.objects.annotate( truncated=Trunc("start_datetime", kind, output_field=DateTimeField()) ).get() self.assertEqual(result.truncated.isoformat(), expected) DTModel.objects.all().delete()
If we are looking for a docs-only fix at this stage, I think we need to do a heavier rework of these docs, to be much more upfront about the need to pass tzinfo
to get expected results for any project with a custom TIME_ZONE
. Personally I feel the code should behave differently, but I don't see clearly how such fix would look or we would handle the potential deprecation/behavior change.
Simon, David, would you have further thoughts?
comment:17 by , 3 months ago
Without a proper notion of typing / distinct types for Django naive vs aware datetime fields it is quite hard to make the experience better here. The way things are currently designed DateTimeField
is expected to be naive if USE_TZ = False
and aware otherwise, there's no way to represent naive DateTimeField
at the ORM level when USE_TZ = True
.
Even on Postgres which has distinct types for naive and aware timestamps (timestamp
and timestamptz
) there's no warning emitted when mixing both; it simply defaults to the globally configured timezone as opposed to how it's implemented on Python
>>> timezone.now() > datetime.datetime.now() TypeError: can't compare offset-naive and offset-aware datetimes
Personally I feel the code should behave differently, but I don't see clearly how such fix would look or we would handle the potential deprecation/behavior change.
Assuming we reach consensus that we want to make this behavior less of a footgun we could introduce a deprecation towards making Trunc(tzinfo)
a required parameter when USE_TZ = True; TIME_ZONE != "UTC"
to eventually make it default to datetime.timezone.utc
when the deprecation period ends.
comment:18 by , 3 months ago
Cc: | added; removed |
---|
follow-up: 20 comment:19 by , 3 months ago
Thanks for responding to the PR. I look forward to the feedback.
I tried running the test you describe in your comment and I also had errors.
I was wondering about the line:
with (self.settings(USE_TZ=True, TIME_ZONE="UTC"), self.subTest(kind=kind, tz=tz))
It seems like it would always assume that we are using the UTC timezone instead of switching to the value in tz. I modified your test running it twice, first in the UTC timezone only but using tzinfo and the second time adjusting the TIME_ZONE
value in the settings without using tzinfo.
I still encountered errors and I realized that the original documentation does have two errors for the Australia timezone:
- "2015-04-01T00:00:00+10:00" should be "2015-04-01T00:00:00+11:00", daylight savings time starts after April 1 at 3 am (DST is the first Sunday in April)
- "“week”: 2015-06-16 00:00:00+10:00" should be "“week”: 2015-06-15 00:00:00+10:00" since the 15th is the start of the week
I made the above changes and all the tests pass.
class Ticket34699Tests(TestCase): def test_docs_example(self): self.assertSequenceEqual(DTModel.objects.all(), []) dt = datetime(2015, 6, 15, 14, 30, 50, 321, zoneinfo.ZoneInfo("UTC")) # From the docs: Given the datetime 2015-06-15 14:30:50.000321+00:00... docs_dt = "2015-06-15T14:30:50.000321+00:00" self.assertEqual(dt.isoformat(), docs_dt) utc = { "year": "2015-01-01T00:00:00+00:00", "quarter": "2015-04-01T00:00:00+00:00", "month": "2015-06-01T00:00:00+00:00", "week": "2015-06-15T00:00:00+00:00", "day": "2015-06-15T00:00:00+00:00", "hour": "2015-06-15T14:00:00+00:00", "minute": "2015-06-15T14:30:00+00:00", "second": "2015-06-15T14:30:50+00:00", } melbourne = { "year": "2015-01-01T00:00:00+11:00", "quarter": "2015-04-01T00:00:00+11:00", "month": "2015-06-01T00:00:00+10:00", "week": "2015-06-15T00:00:00+10:00", "day": "2015-06-16T00:00:00+10:00", "hour": "2015-06-16T00:00:00+10:00", "minute": "2015-06-16T00:30:00+10:00", "second": "2015-06-16T00:30:50+10:00", } for tz, cases in [("UTC", utc), ("Australia/Melbourne", melbourne)]: for kind, expected in cases.items(): with ( self.settings(USE_TZ=True, TIME_ZONE="UTC"), self.subTest(kind=kind, tz=tz, with_tzinfo=True) ): test_zone = zoneinfo.ZoneInfo(tz) instance = DTModel.objects.create(start_datetime=dt) self.assertEqual(instance.start_datetime.isoformat(), docs_dt) result = DTModel.objects.annotate( truncated=Trunc( "start_datetime", kind, output_field=DateTimeField(), tzinfo=test_zone, ) ).get() self.assertEqual(result.truncated.isoformat(), expected) DTModel.objects.all().delete() for tz, cases in [("UTC", utc), ("Australia/Melbourne", melbourne)]: for kind, expected in cases.items(): with (self.settings(USE_TZ=True, TIME_ZONE=tz), self.subTest(kind=kind, tz=tz, with_tzinfo=False)): instance = DTModel.objects.create(start_datetime=dt) self.assertEqual(instance.start_datetime.isoformat(), docs_dt) result = DTModel.objects.annotate( truncated=Trunc( "start_datetime", kind, output_field=DateTimeField() ) ).get() self.assertEqual(result.truncated.isoformat(), expected) DTModel.objects.all().delete()
comment:20 by , 3 months ago
Replying to Wes P.:
Thanks for responding to the PR. I look forward to the feedback.
I tried running the test you describe in your comment and I also had errors.
I was wondering about the line:
with (self.settings(USE_TZ=True, TIME_ZONE="UTC"), self.subTest(kind=kind, tz=tz))
It seems like it would always assume that we are using the UTC timezone instead of switching to the value in tz. I modified your test running it twice, first in the UTC timezone only but using tzinfo and the second time adjusting the
TIME_ZONE
value in the settings without using tzinfo.
Great catch Wes, my test should have:
with (self.settings(USE_TZ=True, TIME_ZONE=tz), self.subTest(kind=kind, tz=tz))
I had a typo there from multiple refactors. So now I need to review again my understanding of the issue, because I convinced myself that the docs were incorrect (vs having some minor issues, which seem to be the case).
Do you have further thoughts on how to proceed based on these findings?
comment:21 by , 3 months ago
My PR basically adds a warning in the documentation when using Trunc
functions as a filter because it can have unexpected behavior. I also added some tests that verify that basic issue exists (the unexpected behavior) and my suggested work-around are valid. It would be nice to have someone look over my test but more importantly that the warning makes sense (I'll attach a screenshot from my dev environment):
Warning
Trunc functions, at the database level, return a timezone naive value which is converted to a timezone aware value by the ORM. When you use a Trunc function in a filter you will need to remember that it is a timezone naive value. This can lead to unexpected results if you are using timezones other than UTC. Django will store date/time values in the database in the UTC timezone. The following example demonstrates what happens when using the timezone “Europe/Berlin” and how to adjust for this:
>>> from django.utils import timezone >>> from datetime import datetime >>> from django.db.models.functions import TruncSecond >>> import zoneinfo >>> start = datetime(2015, 6, 15, 14, 30, 50, 321) >>> start = timezone.make_aware(start) >>> exp = Experiment.objects.create(start_datetime=start) >>> find_this_exp = Experiment.objects.annotate( ... trunc_start=TruncSecond("start_datetime") ... ).filter(trunc_start__lte=start) >>> find_this_exp.count() 0 # We expect to find one result but 0 are found >>> start_adjusted = timezone.localtime(start).replace( ... tzinfo=zoneinfo.ZoneInfo(key="UTC") ... ) >>> find_this_exp_adjusted = Experiment.objects.annotate( ... trunc_start=TruncSecond("start_datetime") ... ).filter(trunc_start__lte=start_adjusted) >>> find_this_exp.count() 1
by , 3 months ago
Attachment: | Trunc_function_warning_34699.png added |
---|
comment:22 by , 2 months ago
Patch needs improvement: | unset |
---|
I've updated the pull request to work with the main branch changes.
I believe this would be the case for all other Trunc Date expressions as they are subclassed with
TruncBase
.Replying to Stefan: