Code

Opened 2 years ago

Closed 2 years ago

#17728 closed Bug (fixed)

Filtering of annotated querysets broken with timezone-aware datetimes

Reported by: gg Owned by: aaugustin
Component: Database layer (models, ORM) Version: 1.4-beta-1
Severity: Release blocker Keywords: regression
Cc: anssi.kaariainen@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

There appears to be a regression with timezone-aware datetimes that causes them to not match "exact" queries properly when added as an annotation.

This is easiest explained with an example. Say we have a series of "events" which are grouped into "sessions". We wish to find the start time of the session (i.e. the time of the session's earliest event)

models.py:

from django.db import models

from django.contrib.auth.models import User

class Event(models.Model):
    time = models.DateTimeField()
    session = models.ForeignKey('Session', related_name='events', null=True)

class Session(models.Model):
    user = models.ForeignKey(User, related_name='km_sessions', null=True)

With USE_TZ = False (or with Django 1.3) it works as expected:

>>> from datetime import datetime
>>> from django.db import models
>>> from django.conf import settings
>>> print settings.USE_TZ
False
>>> from aggregate_fields.models import Session, Event
>>> now = datetime.now()
>>> s = Session.objects.create()
>>> e = Event.objects.create(time=now, session=s)
>>> now_tz = Event.objects.get(pk=e.pk).time  # ensure we're querying with the as-saved datetime
>>> now == now_tz  # timezone are disabled, so these should be equivalent
True
>>> Session.objects.annotate(start=models.Min('events__time')).filter(start=now_tz)
[<Session: Session object>]
>>> now_tz in Session.objects.annotate(start=models.Min('events__time')).values_list('start', flat=True)
True
>>> Session.objects.annotate(start=models.Min('events__time')).filter(start=now_tz).count()
1
>>> Session.objects.annotate(start=models.Min('events__time')).count()
1
>>> Session.objects.annotate(start=models.Min('events__time')).filter(start__lt=now_tz).count()
0
>>> Event.objects.filter(time=now_tz).count()
1
>>> Event.objects.get(time=now_tz)
<Event: Event object>

But with USE_TZ = False the results are inconsistent:

>>> from datetime import datetime
>>> from django.db import models
>>> from django.conf import settings
>>> print settings.USE_TZ
True
>>> from aggregate_fields.models import Session, Event
>>> now = datetime.now()
>>> s = Session.objects.create()
>>> e = Event.objects.create(time=now, session=s)
  RuntimeWarning: DateTimeField received a naive datetime (2012-02-19 15:03:55.892547) while time zone support is active.
  RuntimeWarning)
>>> now_tz = Event.objects.get(pk=e.pk).time  # ensure we're querying with the timezone-aware datetime
>>> now == now_tz  # these shouldn't be comparable
Traceback (most recent call last):
  File "<console>", line 1, in <module>
TypeError: can't compare offset-naive and offset-aware datetimes
>>> Session.objects.annotate(start=models.Min('events__time')).filter(start=now_tz)
[]
>>> now_tz in Session.objects.annotate(start=models.Min('events__time')).values_list('start', flat=True)
True
>>> Session.objects.annotate(start=models.Min('events__time')).filter(start=now_tz).count()
0
>>> Session.objects.annotate(start=models.Min('events__time')).count()
1
>>> Session.objects.annotate(start=models.Min('events__time')).filter(start__lt=now_tz).count()
1
>>> Event.objects.filter(time=now_tz).count()
1
>>> Event.objects.get(time=now_tz)
<Event: Event object>

When the QuerySet is annotated with the start time, it seems to not use the timezone in the subsequent filtering, though it is present when the list of values is reconstituted as Python objects.

These tests were done on SQLite, in case this is a databse-specific problem. I don't have another DB easily accessible to me at the moment, but I can set up a test later, if needed.

Attachments (3)

17728-failing-tests.diff (4.4 KB) - added by aaugustin 2 years ago.
17728.diff (5.5 KB) - added by akaariai 2 years ago.
17728-2.diff (10.4 KB) - added by aaugustin 2 years ago.

Download all attachments as: .zip

Change History (17)

comment:1 Changed 2 years ago by aaugustin

  • Needs documentation unset
  • Needs tests unset
  • Owner changed from nobody to aaugustin
  • Patch needs improvement unset

comment:2 Changed 2 years ago by aaugustin

  • Severity changed from Normal to Release blocker

I haven't assessed the bug report yet, but it's probably a release blocker.

comment:3 Changed 2 years ago by aaugustin

  • Triage Stage changed from Unreviewed to Accepted

I can reproduce the problem under SQLite and MySQL. Under PostgreSQL everything works as expected.

comment:4 Changed 2 years ago by aaugustin

It appears that connection.ops.value_to_db_datetime isn't called when filtering on an annotation. Strictly speaking, it's a bug of annotations rather than a bug of time zones.

At this point, I haven't managed to fix it without introducing other regressions.

The core of the problem is the fallback to Field.get_db_prep_lookup that happens in two places in WhereNode:

  146              params = [connection.ops.value_to_db_datetime(params_or_value)]
  147          else:
  148:             params = Field().get_db_prep_lookup(lookup_type, params_or_value,
  149                  connection=connection, prepared=True)
  150          if isinstance(lvalue, tuple):
  ...
  339                  # (we don't really want to waste time looking up the associated
  340                  # field object at the calling location).
  341:                 params = Field().get_db_prep_lookup(lookup_type, value,
  342                      connection=connection, prepared=True)
  343                  db_type = None

When the aggregation happens on a DateTimeField, we should call DateTimeField.get_db_prep_lookup instead, so that DateTimeField.get_db_prep_value is inkoved, and calls connection.ops.value_to_db_datetime. Overall, ignoring the subclasses of Field entirely is approximative, and in this case, wrong.

Changed 2 years ago by aaugustin

comment:5 Changed 2 years ago by akaariai

  • Has patch set

The attached patch solves this issue at least on SQLite, but I must say it is just ugly. It fixes the immediate symptom without doing anything for the real problem.

Changed 2 years ago by akaariai

comment:6 Changed 2 years ago by aaugustin

I'm attaching a patch that solves the general problem: converting the value used to perform a lookup on the results of an annotation from a Python object to its database representation.

With the patch, Django uses the methods provided by the Field that describes the output of the aggregate, as if it were a regular model field.

There's a little bit of noise in the patch because I renamed value_annotation consistently (sometimes it was confusingly called annotation or value_annot).

Changed 2 years ago by aaugustin

comment:7 Changed 2 years ago by akaariai

I looked through the patch and it looks good. It not only solves this problem but also cleans the is_ordinal aggregate handling.

I was wondering that shouldn't this be about the type of the param_or_value, not about the field you are comparing against. After checking the code I think what is done in the patch is good, as DateField does its own transformations for the param_or_value so that it will be a date string in the query. There could be a problem if you compared a datetime field against something which doesn't expect datetime values, but I can't give any sane example. It doesn't make sense to compare datetime values against IntegerFields for example...

I am not at all sure that what DateField does for datetime values is sane. An example, where datef is a DateField and dtf is a DateTimeField:

    def test_date_compare_datetime(self):
        time = datetime.datetime(2012, 02, 22, 01, 00, 00).replace(tzinfo=EAT)
        WithDate.objects.create(datef=time, dtf=time)
        with override_settings(DEBUG=True):
            print len(WithDate.objects
                .filter(datef__gte=time)
                .filter(dtf__lte=time))
            print connection.queries
OUT:
SELECT ...
  FROM "timezones_withdate"
 WHERE ("timezones_withdate"."datef" >= 2012-02-22  
        AND "timezones_withdate"."dtf" <= 2012-02-21 22:00:00 )

So, what happens? The timestamp is 2012-02-21 22:00:00 in the query when viewed as a datefield, but it is 2012-02-22 as a date. Note that the date of the datetime value would not be equal to the date value. I think this boils down to in whose timezone the date is.

One viewpoint is that it should be in UTC, as in Python and in the database all datetimes should be viewed in UTC, so why not dates taken from datetimes, too?

Another viewpoint is to save in the time zone of the active time zone. I think this is what PostgreSQL does: if you save now() to a date column, the date that gets saved is dependent on what time zone is active.

However, I don't think just turning the aware datetime into a date in the time zone the datetime happens to be is correct choice from any viewpoint. I think that is what happens now, at least according to my tests using SQLite3.

Anyways, this is probably another ticket's problem.

comment:8 Changed 2 years ago by akaariai

  • Cc anssi.kaariainen@… added

comment:9 Changed 2 years ago by carljm

  • Triage Stage changed from Accepted to Ready for checkin

Reviewed the patch, and it looks excellent to me. Confirmed that without the patch, the tests fail on MySQL and SQLite and pass on Postgres, and with the patch they pass on all three. Marking RFC; you can do the commit, Aymeric.

I do have two questions/comments; neither is a blocker for the patch as they're both pre-existing issues, but I'll note them here as they're related:

  1. I think the docstring for WhereNode.make_atom has been out of date for some time (the tuple it claims to handle is not the tuple it actually handles); this might be a good time to fix it.
  1. If there are a number of tests (all of them?) that should be shared verbatim between NewDatabaseTests and LegacyDatabaseTests, why not put those test methods on a mixin class that both inherit, to reduce code duplication and ensure that the tests actually do stay in sync?

Anssi, I haven't entirely gotten my head around what should happen in the case you're discussing, but I think it's clear that it should be its own ticket.

comment:10 Changed 2 years ago by akaariai

I'll open another ticket for the datetime as date one.

comment:11 Changed 2 years ago by aaugustin

@Carl: 1. yes, I'll update the docstring of WhereNode.make_atom.

@Carl: 2. in fact, the tests are subtly different between NewDatabaseTests and LegacyDatabaseTests. It may be possible to factor them in a mixin and use different class variables; I remember decided against that when I initially wrote them because it made them hard to read. I'll look again, but outside of the scope of this ticket.

@Anssi: yes, the fact that DateTimeFields can be set or compared to date values is a problem when time zone support is enabled, since dates are naive objects. I've created #17742 to track this.

comment:12 Changed 2 years ago by akaariai

I think this is related to this ticket: why aren't aware datetimes converted to non-aware UTC datetimes using .register_adapter for SQLite3? MySQL has something similar (a "conv" dicitionary). This way the bug for this ticket would not exists. Granted that the patch should be committed even without any bugs in this ticket.

For example if you are using raw cursors datetimes aren't converted correctly. If there was an adapter, they would work correctly in raw queries, too. Am I missing some design choice here?

comment:13 Changed 2 years ago by aaugustin

By design, the ORM's puts the responsibility of conversions between Python objects and their database representation in Field classes. These classes provide more flexibility, on a per-field basis, than registering converters and adapters.

My feeling is that we should walk away from adapters / converters, since their job is already performed by fields, and their existence could hide bugs this one. I can't think of a use case where the ORM would send to the database a value (a Python object) that isn't either associated to a Field, or a known type (typically, an int for a LIMIT clause).

I think the way forward is to improve DateField and DateTimeField so they handle appropriately both datetimes and dates.

comment:14 Changed 2 years ago by aaugustin

  • Resolution set to fixed
  • Status changed from new to closed

In [17576]:

Fixed #17728 -- When filtering an annotation, ensured the values used in the filter are properly converted to their database representation. This bug was particularly visible with timezone-aware DateTimeFields. Thanks gg for the report and Carl for the review.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.