Opened 2 years ago

Last modified 2 months ago

#34699 assigned Cleanup/optimization

Filtering on annotated TruncSecond expression gives unexpected result.

Reported by: Stefan Owned by: Wes P.
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords:
Cc: Aymeric Augustin, Simon Charette, David Sanders Triage Stage: Accepted
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

With a non-UTC time zone:

  TIME_ZONE = 'Europe/Berlin'

and a simply query like so

  from django.db.models.functions import TruncSecond
  from django.utils import timezone

  book = Book.objects.get(id=2)
  now = timezone.now()
  book.published = now
  book.save()
  Book.objects.annotate(_published_trunc=TruncSecond('published')).filter(id=2, _published_trunc__lte=now)

The result is empty; I have simply filtered now against a second-trunced version of now so I would expect a result.

However under the hood the _published_now column is converted to a naive timestamp using AT TIME ZONE 'Europe/Berlin' and is thus a naive timestamp 2 hours ahead of UTC.

SELECT "book"."id",
       "book"."published",
       DATE_TRUNC('second', "book"."published" AT TIME ZONE 'Europe/Berlin') AS "_published_trunc"
FROM "book"
WHERE ("book"."id" = 2 AND
       DATE_TRUNC('second', "book"."_published_trunc" AT TIME ZONE 'Europe/Berlin') <=
       '2023-07-04 11:59:00+02:00'::timestamptz)

The filter compares a naive timestamp to an aware one, but assumes the LHS naive timestamp is a UTC timestamp - which it is not, it is Berlin time.

Workaround

1) Use TruncSecond(now) in the filter so the compared naive timestamps are the same.

2) Use _published_trunc=TruncSecond('published', tzinfo=datetime.timezone.utc) - I don't like this though. It's not clear without a comment why the tzinfo is needed and it assumes the database will compare timezones using UTC.

Attachments (1)

Trunc_function_warning_34699.png (78.1 KB ) - added by Wes P. 3 months ago.

Download all attachments as: .zip

Change History (23)

comment:1 by Mariusz Felisiak, 2 years ago

Cc: Aymeric Augustin added
Component: UncategorizedDatabase layer (models, ORM)

in reply to:  description comment:2 by Abdulla Dlshad, 2 years ago

I believe this would be the case for all other Trunc Date expressions as they are subclassed with TruncBase.
Replying to Stefan:

With a non-UTC time zone:

  TIME_ZONE = 'Europe/Berlin'

and a simply query like so

  from django.db.models.functions import TruncSecond
  from django.utils import timezone

  book = Book.objects.get(id=2)
  now = timezone.now()
  book.published = now
  book.save()
  Book.objects.annotate(_published_trunc=TruncSecond('published')).filter(id=2, _published_trunc__lte=now)

The result is empty; I have simply filtered now against a second-trunced version of now so I would expect a result.

However under the hood the _published_now column is converted to a naive timestamp using AT TIME ZONE 'Europe/Berlin' and is thus a naive timestamp 2 hours ahead of UTC.

SELECT "book"."id",
       "book"."published",
       DATE_TRUNC('second', "book"."published" AT TIME ZONE 'Europe/Berlin') AS "_published_trunc"
FROM "book"
WHERE ("book"."id" = 2 AND
       DATE_TRUNC('second', "book"."_published_trunc" AT TIME ZONE 'Europe/Berlin') <=
       '2023-07-04 11:59:00+02:00'::timestamptz)

The filter compares a naive timestamp to an aware one, but assumes the LHS naive timestamp is a UTC timestamp - which it is not, it is Berlin time.

Workaround

1) Use TruncSecond(now) in the filter so the compared naive timestamps are the same.

2) Use _published_trunc=TruncSecond('published', tzinfo=datetime.timezone.utc) - I don't like this though. It's not clear without a comment why the tzinfo is needed and it assumes the database will compare timezones using UTC.

comment:3 by Francesco Panico, 2 years ago

Owner: changed from nobody to Francesco Panico
Status: newassigned

comment:4 by Natalia Bidart, 2 years ago

I've been trying to understand this report to properly triage it. From the docs for Trunc and derivatives, when describing DateTimeField truncation, the example given shows this:

 >>> Experiment.objects.annotate(
...     date=TruncDate("start_datetime"),
...     day=TruncDay("start_datetime", tzinfo=melb),
...     hour=TruncHour("start_datetime", tzinfo=melb),
...     minute=TruncMinute("start_datetime"),
...     second=TruncSecond("start_datetime"),
... ).values("date", "day", "hour", "minute", "second").get()
{'date': datetime.date(2014, 6, 15),
 'day': datetime.datetime(2014, 6, 16, 0, 0, tzinfo=zoneinfo.ZoneInfo('Australia/Melbourne')),
 'hour': datetime.datetime(2014, 6, 16, 0, 0, tzinfo=zoneinfo.ZoneInfo('Australia/Melbourne')),
 'minute': 'minute': datetime.datetime(2014, 6, 15, 14, 30, tzinfo=timezone.utc),
 'second': datetime.datetime(2014, 6, 15, 14, 30, 50, tzinfo=timezone.utc)
}

The text and examples seem to indicate that when the stored datetime is in a given timezone, that tz would be used in the truncation unless otherwise specified by the tzinfo param.

@stefan, can you double check that the values for published are actually datetimes in the tz you configured you app?

comment:5 by Natalia Bidart, 2 years ago

Also @stefan, what database are you using in your example? Thanks!

comment:6 by Natalia Bidart, 2 years ago

Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

I did some more research on this. I'm accepting this ticket because I do believe there is an unexpected behavior, which may be a bug or just a documentation issue.

The simplest test of getting all books and printing their published date and their truncated-to-seconds published date is, IMHO, unexpected: the published value is in UTC but the annotated value is in the timezone the app is configured. My settings:

USE_TZ = True
TIME_ZONE = "Europe/Berlin"

The model:

from django.db import models
from django.utils import timezone


class Book(models.Model):
    published = models.DateTimeField(default=timezone.now)

In the shell:

import time
from django.db.models.functions import TruncSecond
from testapp.models import Book
for i in range(3):
    Book.objects.create(); time.sleep(1)
annotated_books = Book.objects.annotate(_published_trunc=TruncSecond('published'))
for i in annotated_books.all():
    print(i.published, i._published_trunc)

With output:

2023-07-14 18:39:22.620603+00:00 2023-07-14 20:39:22+02:00
2023-07-14 18:39:28.585856+00:00 2023-07-14 20:39:28+02:00
2023-07-14 18:39:29.590469+00:00 2023-07-14 20:39:29+02:00
2023-07-14 18:39:30.595811+00:00 2023-07-14 20:39:30+02:00

comment:7 by Mariusz Felisiak, 2 years ago

I'm pretty sure it's not a valid ticket, but couldn't find quickly find a proper justification. Any change to the current behavior would be backward incompatible.

comment:8 by Natalia Bidart, 2 years ago

Thanks Mariusz for your insight. I was wondering what are your thoughts for my example above, where the result of calling TruncSecond on a timestamp would have a timezone different than the argument.

comment:9 by Mariusz Felisiak, 2 years ago

It works as documented:

  • timezone.now() - "If USE_TZ is True, this will be an aware datetime representing the current time in UTC. Note that now() will always return times in UTC regardless of the value of TIME_ZONE; you can use localtime() to get the time in the current time zone."
  • Trunc() - "If a different timezone like Australia/Melbourne is active in Django, then the datetime is converted to the new timezone before the value is truncated."

in reply to:  9 comment:10 by Natalia Bidart, 2 years ago

Replying to Mariusz Felisiak:

It works as documented:

  • timezone.now() - "If USE_TZ is True, this will be an aware datetime representing the current time in UTC. Note that now() will always return times in UTC regardless of the value of TIME_ZONE; you can use localtime() to get the time in the current time zone."
  • Trunc() - "If a different timezone like Australia/Melbourne is active in Django, then the datetime is converted to the new timezone before the value is truncated."

Right, I understand the docs above and how that match the results I got. But, at the same time, the first paragraph about "Time Zones" says:

When support for time zones is enabled, Django stores datetime information in UTC in the database, uses time-zone-aware datetime objects internally, and translates them to the end user’s time zone in templates and forms.
[...]
Even if your website is available in only one time zone, it’s still good practice to store data in UTC in your database.

The above aligns perfectly (and makes sense) with timezone.now returning an aware datetime in UTC. But then, at least to me, is quite surprising that TruncSecond, which is an operation fully occurring in the DB, would not "respect" that "UTC invariant" and have the TIME_ZONE setting affecting the results. Does my point make sense? Do you have historic information about why TruncSecond (and related functions) would not operate with/keep the tz defined in the datetime being processed?

comment:11 by Stefan, 2 years ago

@Natalia, apologies I should've said, it was PSQL 14.3.

I want to clarify that from a purely ORM perspective the result is surprising, given

Book.objects.update(published=timezone.now())
Book.objects.annotate(_published_trunc=TruncSecond('published')).filter(_published_trunc__lte=timezone.now()).count()  # 0

the result is 0. One has to dive in to the SQL to understand why this is happening, and it's because AT TIME ZONE 'Europe/Berlin' is making the timezone naive in the DB level. The docs do say:

Trunc() - "If a different timezone like Australia/Melbourne is active in Django, then the datetime is converted to the new timezone before the value is truncated."

Although it says the datetime is converted to the new timezone before the value is truncated, converted to new timezone doesn't necessarily suggest that the resulting time zone is naive - one could still assume it an aware timezone just with an offset of +11:00. The latter is further suggested because the immediately following examples in that doc show the returned value on the annotation are an aware value with an offset of +11:00, and not naive.

comment:12 by Francesco Panico, 2 years ago

Owner: Francesco Panico removed
Status: assignednew

comment:13 by Wes P., 9 months ago

Owner: set to Wes P.
Status: newassigned

comment:14 by Wes P., 9 months ago

I did a little research into this and I would say Stefan and Mariusz are both correct. The behavior of TruncSecond is unexpected but any changes to the feature would break things for people who have used any Trunc functions in a filter and have manually adjusted for the timezone naive result. My proposal is a change to the documentation with a warning about using any of the Trunc functions in a filter when the set timezone is not UTC and to recommend a way of constructing the filter in such away as to produce the expected results. I have a number of tests that confirm the documented warning so that if someone changes the behavior in the future the documentation should be updated.

The way that Django uses DATE_TRUNC function in Postgres leads us to believe that the underlying filter will also be timezone aware, but the way that Django calls DATE_TRUNC in Postgres returns a timezone naive result (see: https://www.postgresql.org/docs/current/functions-datetime.html#FUNCTIONS-DATETIME-TRUNC). Django returns a timezone aware result by converting the naive value which is why you see the different timezone values in your sample query earlier in the ticket.

The TruncSecond (and other Trunc functions) were added (I think) in 2013 and at that time, Postgres did not have the option for doing a timezone aware DATE_TRUNC without doing it the way that Django is currently doing it which returns a timezone naive result. In 2019 Postgres added a new feature to the function that does return a timezone aware result so that the filter in Stefan's situation would work as expected. I did play with this and it would be possible to add this as a new feature, possibly in the same way that Postgres has, by passing an optional parameter to the TruncBase class that would access the feature so that it won't break existing code.

comment:15 by Wes P., 9 months ago

Has patch: set

This is my first ticket and pull request. https://github.com/django/django/pull/18660

comment:16 by Natalia Bidart, 3 months ago

Cc: Simon Charette David Sanders added
Patch needs improvement: set
Version: 4.2dev

Thank you Wes for working on this ticket. I think the PR needs some adjusting in terms of docs and code formatting, but before getting into those, details, I would like to first define a plan in what would desired in terms of fixing this ticket.

To refresh my memory, I have re-read the docs and tried the reproducing examples again. I still think there is an unexpected result when calling any of the truncation functions (when there is a TIME_ZONE defined in the Django project). Following the case by case explanation in these docs:

Trunc takes a single expression, representing a DateField, TimeField, or DateTimeField, a kind representing a date or time part, and an output_field that’s either DateTimeField(), TimeField(), or DateField(). It returns a datetime, date, or time depending on output_field, with fields up to kind set to their minimum value. If output_field is omitted, it will default to the output_field of expression. A tzinfo subclass, usually provided by zoneinfo, can be passed to truncate a value in a specific timezone.

To me, the above paragraph does not imply that the tzinfo SHOULD be passed to get results that make sense in the context of a custom project-wideTIME_ZONE. Moreover, I have transformed the example that follows that paragraph into a test case (PostgreSQL only) and I get test failures for all of the Australia/Melbourne cases because the truncated date is in UTC:

class Ticket34699Tests(TestCase):
    def test_docs_example(self):
        self.assertSequenceEqual(DTModel.objects.all(), [])

        dt = datetime.datetime(2015,6, 15, 14, 30, 50, 321, zoneinfo.ZoneInfo("UTC"))
        # From the docs: Given the datetime 2015-06-15 14:30:50.000321+00:00...
        docs_dt = "2015-06-15T14:30:50.000321+00:00"
        self.assertEqual(dt.isoformat(), docs_dt)

        utc = {
            "year": "2015-01-01T00:00:00+00:00",
            "quarter": "2015-04-01T00:00:00+00:00",
            "month": "2015-06-01T00:00:00+00:00",
            "week": "2015-06-15T00:00:00+00:00",
            "day": "2015-06-15T00:00:00+00:00",
            "hour": "2015-06-15T14:00:00+00:00",
            "minute": "2015-06-15T14:30:00+00:00",
            "second": "2015-06-15T14:30:50+00:00",
        }
        melbourne = {
            "year": "2015-01-01T00:00:00+11:00",
            "quarter": "2015-04-01T00:00:00+10:00",
            "month": "2015-06-01T00:00:00+10:00",
            "week": "2015-06-16T00:00:00+10:00",
            "day": "2015-06-16T00:00:00+10:00",
            "hour": "2015-06-16T00:00:00+10:00",
            "minute": "2015-06-16T00:30:00+10:00",
            "second": "2015-06-16T00:30:50+10:00",
        }
        for tz, cases in [("UTC", utc), ("Australia/Melbourne", melbourne)]:
            for kind, expected in cases.items():
                with (self.settings(USE_TZ=True, TIME_ZONE="UTC"), self.subTest(kind=kind, tz=tz)):
                    instance = DTModel.objects.create(start_datetime=dt)
                    self.assertEqual(instance.start_datetime.isoformat(), docs_dt)

                    result = DTModel.objects.annotate(
                        truncated=Trunc("start_datetime", kind, output_field=DateTimeField())
                    ).get()
                    self.assertEqual(result.truncated.isoformat(), expected)

                DTModel.objects.all().delete()

If we are looking for a docs-only fix at this stage, I think we need to do a heavier rework of these docs, to be much more upfront about the need to pass tzinfo to get expected results for any project with a custom TIME_ZONE. Personally I feel the code should behave differently, but I don't see clearly how such fix would look or we would handle the potential deprecation/behavior change.

Simon, David, would you have further thoughts?

comment:17 by Simon Charette, 3 months ago

Without a proper notion of typing / distinct types for Django naive vs aware datetime fields it is quite hard to make the experience better here. The way things are currently designed DateTimeField is expected to be naive if USE_TZ = False and aware otherwise, there's no way to represent naive DateTimeField at the ORM level when USE_TZ = True.

Even on Postgres which has distinct types for naive and aware timestamps (timestamp and timestamptz) there's no warning emitted when mixing both; it simply defaults to the globally configured timezone as opposed to how it's implemented on Python

>>> timezone.now() > datetime.datetime.now()
TypeError: can't compare offset-naive and offset-aware datetimes

Personally I feel the code should behave differently, but I don't see clearly how such fix would look or we would handle the potential deprecation/behavior change.

Assuming we reach consensus that we want to make this behavior less of a footgun we could introduce a deprecation towards making Trunc(tzinfo) a required parameter when USE_TZ = True; TIME_ZONE != "UTC" to eventually make it default to datetime.timezone.utc when the deprecation period ends.

Last edited 3 months ago by Simon Charette (previous) (diff)

comment:18 by Natalia Bidart, 3 months ago

Cc: David Sanders added; David Sanders removed

comment:19 by Wes P., 3 months ago

Thanks for responding to the PR. I look forward to the feedback.

I tried running the test you describe in your comment and I also had errors.

I was wondering about the line:

with (self.settings(USE_TZ=True, TIME_ZONE="UTC"), self.subTest(kind=kind, tz=tz))

It seems like it would always assume that we are using the UTC timezone instead of switching to the value in tz. I modified your test running it twice, first in the UTC timezone only but using tzinfo and the second time adjusting the TIME_ZONE value in the settings without using tzinfo.

I still encountered errors and I realized that the original documentation does have two errors for the Australia timezone:

  • "2015-04-01T00:00:00+10:00" should be "2015-04-01T00:00:00+11:00", daylight savings time starts after April 1 at 3 am (DST is the first Sunday in April)
  • "“week”: 2015-06-16 00:00:00+10:00" should be "“week”: 2015-06-15 00:00:00+10:00" since the 15th is the start of the week

I made the above changes and all the tests pass.

class Ticket34699Tests(TestCase):
    def test_docs_example(self):
        self.assertSequenceEqual(DTModel.objects.all(), [])

        dt = datetime(2015, 6, 15, 14, 30, 50, 321, zoneinfo.ZoneInfo("UTC"))
        # From the docs: Given the datetime 2015-06-15 14:30:50.000321+00:00...
        docs_dt = "2015-06-15T14:30:50.000321+00:00"
        self.assertEqual(dt.isoformat(), docs_dt)

        utc = {
            "year": "2015-01-01T00:00:00+00:00",
            "quarter": "2015-04-01T00:00:00+00:00",
            "month": "2015-06-01T00:00:00+00:00",
            "week": "2015-06-15T00:00:00+00:00",
            "day": "2015-06-15T00:00:00+00:00",
            "hour": "2015-06-15T14:00:00+00:00",
            "minute": "2015-06-15T14:30:00+00:00",
            "second": "2015-06-15T14:30:50+00:00",
        }
        melbourne = {
            "year": "2015-01-01T00:00:00+11:00",
            "quarter": "2015-04-01T00:00:00+11:00",
            "month": "2015-06-01T00:00:00+10:00",
            "week": "2015-06-15T00:00:00+10:00",
            "day": "2015-06-16T00:00:00+10:00",
            "hour": "2015-06-16T00:00:00+10:00",
            "minute": "2015-06-16T00:30:00+10:00",
            "second": "2015-06-16T00:30:50+10:00",
        }
        for tz, cases in [("UTC", utc), ("Australia/Melbourne", melbourne)]:
            for kind, expected in cases.items():
                with (
                    self.settings(USE_TZ=True, TIME_ZONE="UTC"),
                    self.subTest(kind=kind, tz=tz, with_tzinfo=True)
                ):
                    test_zone = zoneinfo.ZoneInfo(tz)
                    instance = DTModel.objects.create(start_datetime=dt)
                    self.assertEqual(instance.start_datetime.isoformat(), docs_dt)

                    result = DTModel.objects.annotate(
                        truncated=Trunc(
                            "start_datetime",
                            kind,
                            output_field=DateTimeField(),
                            tzinfo=test_zone,
                        )
                    ).get()
                    self.assertEqual(result.truncated.isoformat(), expected)

                DTModel.objects.all().delete()
        for tz, cases in [("UTC", utc), ("Australia/Melbourne", melbourne)]:
            for kind, expected in cases.items():
                with (self.settings(USE_TZ=True, TIME_ZONE=tz),
                      self.subTest(kind=kind, tz=tz, with_tzinfo=False)):
                    instance = DTModel.objects.create(start_datetime=dt)
                    self.assertEqual(instance.start_datetime.isoformat(), docs_dt)

                    result = DTModel.objects.annotate(
                        truncated=Trunc(
                            "start_datetime",
                            kind,
                            output_field=DateTimeField()
                        )
                    ).get()
                    self.assertEqual(result.truncated.isoformat(), expected)

                DTModel.objects.all().delete()

in reply to:  19 comment:20 by Natalia Bidart, 3 months ago

Replying to Wes P.:

Thanks for responding to the PR. I look forward to the feedback.

I tried running the test you describe in your comment and I also had errors.

I was wondering about the line:

with (self.settings(USE_TZ=True, TIME_ZONE="UTC"), self.subTest(kind=kind, tz=tz))

It seems like it would always assume that we are using the UTC timezone instead of switching to the value in tz. I modified your test running it twice, first in the UTC timezone only but using tzinfo and the second time adjusting the TIME_ZONE value in the settings without using tzinfo.

Great catch Wes, my test should have:
with (self.settings(USE_TZ=True, TIME_ZONE=tz), self.subTest(kind=kind, tz=tz))

I had a typo there from multiple refactors. So now I need to review again my understanding of the issue, because I convinced myself that the docs were incorrect (vs having some minor issues, which seem to be the case).

Do you have further thoughts on how to proceed based on these findings?

comment:21 by Wes P., 3 months ago

My PR basically adds a warning in the documentation when using Trunc functions as a filter because it can have unexpected behavior. I also added some tests that verify that basic issue exists (the unexpected behavior) and my suggested work-around are valid. It would be nice to have someone look over my test but more importantly that the warning makes sense (I'll attach a screenshot from my dev environment):

Warning
Trunc functions, at the database level, return a timezone naive value which is converted to a timezone aware value by the ORM. When you use a Trunc function in a filter you will need to remember that it is a timezone naive value. This can lead to unexpected results if you are using timezones other than UTC. Django will store date/time values in the database in the UTC timezone. The following example demonstrates what happens when using the timezone “Europe/Berlin” and how to adjust for this:

>>> from django.utils import timezone
>>> from datetime import datetime
>>> from django.db.models.functions import TruncSecond
>>> import zoneinfo
>>> start = datetime(2015, 6, 15, 14, 30, 50, 321)
>>> start = timezone.make_aware(start)
>>> exp = Experiment.objects.create(start_datetime=start)
>>> find_this_exp = Experiment.objects.annotate(
...     trunc_start=TruncSecond("start_datetime")
... ).filter(trunc_start__lte=start)
>>> find_this_exp.count()
0  # We expect to find one result but 0 are found
>>> start_adjusted = timezone.localtime(start).replace(
...     tzinfo=zoneinfo.ZoneInfo(key="UTC")
... )
>>> find_this_exp_adjusted = Experiment.objects.annotate(
...     trunc_start=TruncSecond("start_datetime")
... ).filter(trunc_start__lte=start_adjusted)
>>> find_this_exp.count()
1

by Wes P., 3 months ago

comment:22 by Wes P., 2 months ago

Patch needs improvement: unset

I've updated the pull request to work with the main branch changes.

Note: See TracTickets for help on using tickets.
Back to Top