Opened 10 months ago

Last modified 10 months ago

#34699 new Cleanup/optimization

Filtering on annotated TruncSecond expression gives unexpected result.

Reported by: Stefan Owned by:
Component: Database layer (models, ORM) Version: 4.2
Severity: Normal Keywords:
Cc: Aymeric Augustin Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

With a non-UTC time zone:

  TIME_ZONE = 'Europe/Berlin'

and a simply query like so

  from django.db.models.functions import TruncSecond
  from django.utils import timezone

  book = Book.objects.get(id=2)
  now = timezone.now()
  book.published = now
  book.save()
  Book.objects.annotate(_published_trunc=TruncSecond('published')).filter(id=2, _published_trunc__lte=now)

The result is empty; I have simply filtered now against a second-trunced version of now so I would expect a result.

However under the hood the _published_now column is converted to a naive timestamp using AT TIME ZONE 'Europe/Berlin' and is thus a naive timestamp 2 hours ahead of UTC.

SELECT "book"."id",
       "book"."published",
       DATE_TRUNC('second', "book"."published" AT TIME ZONE 'Europe/Berlin') AS "_published_trunc"
FROM "book"
WHERE ("book"."id" = 2 AND
       DATE_TRUNC('second', "book"."_published_trunc" AT TIME ZONE 'Europe/Berlin') <=
       '2023-07-04 11:59:00+02:00'::timestamptz)

The filter compares a naive timestamp to an aware one, but assumes the LHS naive timestamp is a UTC timestamp - which it is not, it is Berlin time.

Workaround

1) Use TruncSecond(now) in the filter so the compared naive timestamps are the same.

2) Use _published_trunc=TruncSecond('published', tzinfo=datetime.timezone.utc) - I don't like this though. It's not clear without a comment why the tzinfo is needed and it assumes the database will compare timezones using UTC.

Change History (12)

comment:1 by Mariusz Felisiak, 10 months ago

Cc: Aymeric Augustin added
Component: UncategorizedDatabase layer (models, ORM)

in reply to:  description comment:2 by Abdulla Dlshad, 10 months ago

I believe this would be the case for all other Trunc Date expressions as they are subclassed with TruncBase.
Replying to Stefan:

With a non-UTC time zone:

  TIME_ZONE = 'Europe/Berlin'

and a simply query like so

  from django.db.models.functions import TruncSecond
  from django.utils import timezone

  book = Book.objects.get(id=2)
  now = timezone.now()
  book.published = now
  book.save()
  Book.objects.annotate(_published_trunc=TruncSecond('published')).filter(id=2, _published_trunc__lte=now)

The result is empty; I have simply filtered now against a second-trunced version of now so I would expect a result.

However under the hood the _published_now column is converted to a naive timestamp using AT TIME ZONE 'Europe/Berlin' and is thus a naive timestamp 2 hours ahead of UTC.

SELECT "book"."id",
       "book"."published",
       DATE_TRUNC('second', "book"."published" AT TIME ZONE 'Europe/Berlin') AS "_published_trunc"
FROM "book"
WHERE ("book"."id" = 2 AND
       DATE_TRUNC('second', "book"."_published_trunc" AT TIME ZONE 'Europe/Berlin') <=
       '2023-07-04 11:59:00+02:00'::timestamptz)

The filter compares a naive timestamp to an aware one, but assumes the LHS naive timestamp is a UTC timestamp - which it is not, it is Berlin time.

Workaround

1) Use TruncSecond(now) in the filter so the compared naive timestamps are the same.

2) Use _published_trunc=TruncSecond('published', tzinfo=datetime.timezone.utc) - I don't like this though. It's not clear without a comment why the tzinfo is needed and it assumes the database will compare timezones using UTC.

comment:3 by Francesco Panico, 10 months ago

Owner: changed from nobody to Francesco Panico
Status: newassigned

comment:4 by Natalia Bidart, 10 months ago

I've been trying to understand this report to properly triage it. From the docs for Trunc and derivatives, when describing DateTimeField truncation, the example given shows this:

 >>> Experiment.objects.annotate(
...     date=TruncDate("start_datetime"),
...     day=TruncDay("start_datetime", tzinfo=melb),
...     hour=TruncHour("start_datetime", tzinfo=melb),
...     minute=TruncMinute("start_datetime"),
...     second=TruncSecond("start_datetime"),
... ).values("date", "day", "hour", "minute", "second").get()
{'date': datetime.date(2014, 6, 15),
 'day': datetime.datetime(2014, 6, 16, 0, 0, tzinfo=zoneinfo.ZoneInfo('Australia/Melbourne')),
 'hour': datetime.datetime(2014, 6, 16, 0, 0, tzinfo=zoneinfo.ZoneInfo('Australia/Melbourne')),
 'minute': 'minute': datetime.datetime(2014, 6, 15, 14, 30, tzinfo=timezone.utc),
 'second': datetime.datetime(2014, 6, 15, 14, 30, 50, tzinfo=timezone.utc)
}

The text and examples seem to indicate that when the stored datetime is in a given timezone, that tz would be used in the truncation unless otherwise specified by the tzinfo param.

@stefan, can you double check that the values for published are actually datetimes in the tz you configured you app?

comment:5 by Natalia Bidart, 10 months ago

Also @stefan, what database are you using in your example? Thanks!

comment:6 by Natalia Bidart, 10 months ago

Triage Stage: UnreviewedAccepted
Type: UncategorizedCleanup/optimization

I did some more research on this. I'm accepting this ticket because I do believe there is an unexpected behavior, which may be a bug or just a documentation issue.

The simplest test of getting all books and printing their published date and their truncated-to-seconds published date is, IMHO, unexpected: the published value is in UTC but the annotated value is in the timezone the app is configured. My settings:

USE_TZ = True
TIME_ZONE = "Europe/Berlin"

The model:

from django.db import models
from django.utils import timezone


class Book(models.Model):
    published = models.DateTimeField(default=timezone.now)

In the shell:

import time
from django.db.models.functions import TruncSecond
from testapp.models import Book
for i in range(3):
    Book.objects.create(); time.sleep(1)
annotated_books = Book.objects.annotate(_published_trunc=TruncSecond('published'))
for i in annotated_books.all():
    print(i.published, i._published_trunc)

With output:

2023-07-14 18:39:22.620603+00:00 2023-07-14 20:39:22+02:00
2023-07-14 18:39:28.585856+00:00 2023-07-14 20:39:28+02:00
2023-07-14 18:39:29.590469+00:00 2023-07-14 20:39:29+02:00
2023-07-14 18:39:30.595811+00:00 2023-07-14 20:39:30+02:00

comment:7 by Mariusz Felisiak, 10 months ago

I'm pretty sure it's not a valid ticket, but couldn't find quickly find a proper justification. Any change to the current behavior would be backward incompatible.

comment:8 by Natalia Bidart, 10 months ago

Thanks Mariusz for your insight. I was wondering what are your thoughts for my example above, where the result of calling TruncSecond on a timestamp would have a timezone different than the argument.

comment:9 by Mariusz Felisiak, 10 months ago

It works as documented:

  • timezone.now() - "If USE_TZ is True, this will be an aware datetime representing the current time in UTC. Note that now() will always return times in UTC regardless of the value of TIME_ZONE; you can use localtime() to get the time in the current time zone."
  • Trunc() - "If a different timezone like Australia/Melbourne is active in Django, then the datetime is converted to the new timezone before the value is truncated."

in reply to:  9 comment:10 by Natalia Bidart, 10 months ago

Replying to Mariusz Felisiak:

It works as documented:

  • timezone.now() - "If USE_TZ is True, this will be an aware datetime representing the current time in UTC. Note that now() will always return times in UTC regardless of the value of TIME_ZONE; you can use localtime() to get the time in the current time zone."
  • Trunc() - "If a different timezone like Australia/Melbourne is active in Django, then the datetime is converted to the new timezone before the value is truncated."

Right, I understand the docs above and how that match the results I got. But, at the same time, the first paragraph about "Time Zones" says:

When support for time zones is enabled, Django stores datetime information in UTC in the database, uses time-zone-aware datetime objects internally, and translates them to the end user’s time zone in templates and forms.
[...]
Even if your website is available in only one time zone, it’s still good practice to store data in UTC in your database.

The above aligns perfectly (and makes sense) with timezone.now returning an aware datetime in UTC. But then, at least to me, is quite surprising that TruncSecond, which is an operation fully occurring in the DB, would not "respect" that "UTC invariant" and have the TIME_ZONE setting affecting the results. Does my point make sense? Do you have historic information about why TruncSecond (and related functions) would not operate with/keep the tz defined in the datetime being processed?

comment:11 by Stefan, 10 months ago

@Natalia, apologies I should've said, it was PSQL 14.3.

I want to clarify that from a purely ORM perspective the result is surprising, given

Book.objects.update(published=timezone.now())
Book.objects.annotate(_published_trunc=TruncSecond('published')).filter(_published_trunc__lte=timezone.now()).count()  # 0

the result is 0. One has to dive in to the SQL to understand why this is happening, and it's because AT TIME ZONE 'Europe/Berlin' is making the timezone naive in the DB level. The docs do say:

Trunc() - "If a different timezone like Australia/Melbourne is active in Django, then the datetime is converted to the new timezone before the value is truncated."

Although it says the datetime is converted to the new timezone before the value is truncated, converted to new timezone doesn't necessarily suggest that the resulting time zone is naive - one could still assume it an aware timezone just with an offset of +11:00. The latter is further suggested because the immediately following examples in that doc show the returned value on the annotation are an aware value with an offset of +11:00, and not naive.

comment:12 by Francesco Panico, 10 months ago

Owner: Francesco Panico removed
Status: assignednew
Note: See TracTickets for help on using tickets.
Back to Top