Opened 6 years ago

Last modified 9 months ago

#29884 new Bug

QuerySet.filter() with TruncBase functions not working as expected when USE_TZ= True — at Version 13

Reported by: slide333333 Owned by:
Component: Database layer (models, ORM) Version: 2.1
Severity: Normal Keywords:
Cc: Ülgen Sarıkavak Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Ülgen Sarıkavak)

Tested PostgreSQL only
USE_TZ=True

Consider the following model:

class TimeStampModel(models.Model):
    timestamp = models.DateTimeField()

Create some data:

TimeStampModel.objects.bulk_create([
    TimeStampModel(timestamp=datetime.datetime(2018, 10, 24, 5, 13, tzinfo=pytz.utc)),
    TimeStampModel(timestamp=datetime.datetime(2018, 10, 24, 7, 4, tzinfo=pytz.utc)),
    TimeStampModel(timestamp=datetime.datetime(2018, 10, 24, 8, 56, tzinfo=pytz.utc)),
    TimeStampModel(timestamp=datetime.datetime(2018, 10, 24, 13, 49, tzinfo=pytz.utc)),
    TimeStampModel(timestamp=datetime.datetime(2018, 10, 24, 15, 33, tzinfo=pytz.utc)),
    TimeStampModel(timestamp=datetime.datetime(2018, 10, 24, 18, 29, tzinfo=pytz.utc)),
    TimeStampModel(timestamp=datetime.datetime(2018, 10, 24, 19, 12, tzinfo=pytz.utc)),
    TimeStampModel(timestamp=datetime.datetime(2018, 10, 24, 21, 37, tzinfo=pytz.utc)),
    TimeStampModel(timestamp=datetime.datetime(2018, 10, 24, 21, 9, tzinfo=pytz.utc)),
    TimeStampModel(timestamp=datetime.datetime(2018, 10, 24, 23, 23, tzinfo=pytz.utc)),
])

The following code shoud definitely return the data. But it returns an empty queryset, because the SQL generated is not correct.

>>> from django.db.models.functions import *
>>> from django.utils import timezone
>>> import datetime, pytz
>>> TimeStampModel.objects.annotate(
...     day=TruncDay('timestamp', tzinfo=pytz.timezone('Europe/Berlin'))).filter(
...     day=timezone.make_aware(datetime.datetime(2018, 10, 24), pytz.timezone('Europe/Berlin'))
... )
DEBUG: (0.000) SELECT "truncbase_timestampmodel"."id", "truncbase_timestampmodel"."timestamp", DATE_TRUNC('day', "truncbase_timestampmodel"."timestamp" AT TIME ZONE 'Europe/Berlin') AS "day" FROM "truncbase_timestampmodel" WHERE DATE_TRUNC('day', "truncbase_timestampmodel"."timestamp" AT TIME ZONE 'Europe/Berlin') = '2018-10-24T00:00:00+02:00'::timestamptz LIMIT 21; args=('datetime.datetime(2018, 10, 24, 0, 0, tzinfo=<DstTzInfo 'Europe/Berlin' CEST+2:00:00 DST>)')
<QuerySet []>

The SQL should be (note the additional AT TIME ZONE 'Europe/Berlin'):

SELECT "truncbase_timestampmodel"."id", 
       "truncbase_timestampmodel"."timestamp", 
       DATE_TRUNC('day', "truncbase_timestampmodel"."timestamp" AT TIME ZONE 'Europe/Berlin') AT TIME ZONE 'Europe/Berlin' AS "day" 
FROM "truncbase_timestampmodel" 
WHERE DATE_TRUNC('day', "truncbase_timestampmodel"."timestamp" AT TIME ZONE 'Europe/Berlin') AT TIME ZONE 'Europe/Berlin' = '2018-10-24T00:00:00+02:00'::timestamptz 
LIMIT 21;

https://www.postgresql.org/docs/9.2/static/functions-datetime.html#FUNCTIONS-DATETIME-ZONECONVERT
This fix will also make sure that the returned value from the database driver is an aware dateime. Currently the returned value is native and manually made aware in django.db.models.functions.datetime.TruncBase.convert_value!

I'm not sure how the problem relates to databases without timezone support. But for those with time zone support like PostgreSQL this should work as expected. For Postgres the fix should be pretty easy: django.db.backends.postgresql.operations.DatabaseOperations.datetime_trunc_sql has to be patched and django.db.models.functions.datetime.TruncBase.convert_value should be patched. The latter will also affect other database engines.

Change History (13)

comment:1 by slide333333, 6 years ago

Description: modified (diff)

comment:2 by slide333333, 6 years ago

Description: modified (diff)

comment:3 by slide333333, 6 years ago

Description: modified (diff)

comment:4 by Dan Davis, 6 years ago

I've started the work to triage, and find out which backends may be affected. Work is in https://github.com/danizen/django-review-29984.
Will test sqlite3, postgresql, mysql, and oracle for good measure.

comment:5 by Tim Graham, 6 years ago

Summary: Queryset.filter with TruncBase functions not working as expected when USE_TZ= TrueQuerySet.filter() with TruncBase functions not working as expected when USE_TZ= True
Type: UncategorizedBug

Can you provide tests for tests/db_functions/datetime/test_extract_trunc.py? It would be easier to evaluate the ticket with a patch.

in reply to:  5 comment:6 by Dan Davis, 6 years ago

Replying to Tim Graham:

Can you provide tests for tests/db_functions/datetime/test_extract_trunc.py? It would be easier to evaluate the ticket with a patch. tim

I can, but I am wondering about the use of timezone within the database. I know that Django uses Oracle's DATE type, which is not timezone aware. The same has got to be true for sqlite3. Really, the timezone in use for Oracle DATE types is the timezone for the process running Oracle. Also, evaluating the submission, it is clear that TruncDay still returns a datetime.datetime, which I would expect.

I will go directly to that once I understand which types Django uses for each of these. I'm on the track to do this sometime tonight.

comment:7 by Dan Davis, 6 years ago

Needs tests: set
Owner: changed from nobody to Dan Davis
Status: newassigned
Triage Stage: UnreviewedAccepted

Only for PostgreSQL does Django use a timezone aware datetime:

Oracle - data type DATE is not timezone aware.
SQLite 3 - data type DATETIME is advisory and not a datetime at all, but is not timezone aware
MySQL - data type DATETIME(6), stored in UTC but not timezone aware
PostgreSQL - data type TIMESTAMP with TIMEZONE

query generated by TimeStampModel.objects.annotate(day=TruncDay('timestamp', tzinfo=pytz.timezone('Europe/Berlin'))) is:

MySQL

SELECT `app_timestampmodel`.`id`, `app_timestampmodel`.`timestamp`, CAST(DATE_FORMAT(CONVERT_TZ(`app_timestampmodel`.`timestamp`, 'UTC', 'Europe/Berlin'), '%Y-%m-%d 00:00:00') AS DATETIME) AS `day` FROM `app_timestampmodel`

PostgreSQL

SELECT "app_timestampmodel"."id", "app_timestampmodel"."timestamp", DATE_TRUNC('day', "app_timestampmodel"."timestamp" AT TIME ZONE 'Europe/Berlin') AS "day" FROM "app_timestampmodel"

This makes the intent of Django clear:

  • Always return a datetime type
  • Do truncation after converting the datetime from database native (usually UTC) to the given timezone

I've looked at the SQL produced for the filter query as well, and the DateTrunc doesn't work.

Last edited 6 years ago by Dan Davis (previous) (diff)

in reply to:  5 comment:8 by Dan Davis, 6 years ago

Replying to Tim Graham:

Can you provide tests for tests/db_functions/datetime/test_extract_trunc.py? It would be easier to evaluate the ticket with a patch.

After looking at this more deeply, a solution is proposed for PostgreSQL, and the problem also appears in multiple backends. TruncBase delegates to the backend to do this, calling connection.ops.datetime_trunc_sql for target fields that are datetime fields. I don't see any specific backend tests for datetime truncation, but I see that seetest/backends/base/test_operations.py might be a good place. I do not see how backend specific tests are invoked, but I see some with skips.

comment:9 by Tim Graham, 6 years ago

I don't expect this would require backend-specific tests. The queries results should be the same across all databases, shouldn't they? I haven't looking into this issue to say whether or not there's a bug here.

in reply to:  9 comment:10 by Dan Davis, 6 years ago

Needs tests: unset
Owner: Dan Davis removed
Status: assignednew

Replying to Tim Graham:

I don't expect this would require backend-specific tests. The queries results should be the same across all databases, shouldn't they? I haven't looking into this issue to say whether or not there's a bug here.

This would be fixed in each backend separately, because the method to truncate a datetime is delegated to each backend separately. Nevertheless, since the unit tests are run against a specific backend, by default sqlite3, the test can be in tests/db_functions/datetime/test_extract_trunc.py.

Pull request https://github.com/django/django/pull/10611 provides a failing test for the filtering case. It isn't ready, but anyone else can see the diff.

I've run this against sqlite3 and also against postgresql like this:

./runtests.py --settings postgresql_settings db_functions.datetime.test_extract_trunc

I'm out of this one - I would want someone with more experience in the project to decide whether the single ticket should be fixed for all backends or just for PostgreSQL.

One thing I notice is that make_aware is surprisingly not the same thing as adding the tzinfo to the datetime in the first place. The following raises:

from datetime import datetime
import pytz
from django.utils.timezone import make_aware

euberlin = pytz.timezone('Europe/Berlin')s a
dt1 = datetime(2018, 10, 24, tzinfo=euberlin)
dt2 = make_aware(datetime(2018, 10, 24))
assert dt1 == dt2

comment:11 by Simon Charette, 6 years ago

One thing I notice is that make_aware is surprisingly not the same thing as adding the tzinfo to the datetime in the first place.

Except for the TIME_ZONE setting omission that Aymeric mentioned on the mailing list there's a big difference between datetime(2018, 10, 24, tzinfo=tz) and tz.localize(datetime(2018, 10, 24)) which is what make_aware does. The first will assign the timezone Europe/Berlin timezone as it was first defined and the latter will assign the timezone as it is defined on the specified datetime.

In other words direct tzinfo, assignment will create highly likely bogus datetime objects (unless a timezone definition such as UTC is used) while localize will make sure the right timezone is used. In your case datetime(2018, 10, 24, tzinfo=euberlin) creates a datetime for 2018-10-24 with the timezone definition of Europe/Berlin in 1901.

in reply to:  11 comment:12 by Dan Davis, 6 years ago

Replying to Simon Charette:

Except for the TIME_ZONE setting omission that Aymeric mentioned on the mailing list there's a big difference between datetime(2018, 10, 24, tzinfo=tz) and tz.localize(datetime(2018, 10, 24)) which is what make_aware does. The first will assign the timezone Europe/Berlin timezone as it was first defined and the latter will assign the timezone as it is defined on the specified datetime.

In other words direct tzinfo, assignment will create highly likely bogus datetime objects (unless a timezone definition such as UTC is used) while localize will make sure the right timezone is used. In your case datetime(2018, 10, 24, tzinfo=euberlin) creates a datetime for 2018-10-24 with the timezone definition of Europe/Berlin in 1901.

Thanks for the explanation. Got it.

comment:13 by Ülgen Sarıkavak, 9 months ago

Cc: Ülgen Sarıkavak added
Description: modified (diff)
Note: See TracTickets for help on using tickets.
Back to Top