Opened 12 years ago

Closed 12 years ago

#17728 closed Bug (fixed)

Filtering of annotated querysets broken with timezone-aware datetimes

Reported by: Gabriel Grant Owned by: Aymeric Augustin
Component: Database layer (models, ORM) Version: 1.4-beta-1
Severity: Release blocker Keywords: regression
Cc: anssi.kaariainen@… Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

There appears to be a regression with timezone-aware datetimes that causes them to not match "exact" queries properly when added as an annotation.

This is easiest explained with an example. Say we have a series of "events" which are grouped into "sessions". We wish to find the start time of the session (i.e. the time of the session's earliest event)

models.py:

from django.db import models

from django.contrib.auth.models import User

class Event(models.Model):
    time = models.DateTimeField()
    session = models.ForeignKey('Session', related_name='events', null=True)

class Session(models.Model):
    user = models.ForeignKey(User, related_name='km_sessions', null=True)

With USE_TZ = False (or with Django 1.3) it works as expected:

>>> from datetime import datetime
>>> from django.db import models
>>> from django.conf import settings
>>> print settings.USE_TZ
False
>>> from aggregate_fields.models import Session, Event
>>> now = datetime.now()
>>> s = Session.objects.create()
>>> e = Event.objects.create(time=now, session=s)
>>> now_tz = Event.objects.get(pk=e.pk).time  # ensure we're querying with the as-saved datetime
>>> now == now_tz  # timezone are disabled, so these should be equivalent
True
>>> Session.objects.annotate(start=models.Min('events__time')).filter(start=now_tz)
[<Session: Session object>]
>>> now_tz in Session.objects.annotate(start=models.Min('events__time')).values_list('start', flat=True)
True
>>> Session.objects.annotate(start=models.Min('events__time')).filter(start=now_tz).count()
1
>>> Session.objects.annotate(start=models.Min('events__time')).count()
1
>>> Session.objects.annotate(start=models.Min('events__time')).filter(start__lt=now_tz).count()
0
>>> Event.objects.filter(time=now_tz).count()
1
>>> Event.objects.get(time=now_tz)
<Event: Event object>

But with USE_TZ = False the results are inconsistent:

>>> from datetime import datetime
>>> from django.db import models
>>> from django.conf import settings
>>> print settings.USE_TZ
True
>>> from aggregate_fields.models import Session, Event
>>> now = datetime.now()
>>> s = Session.objects.create()
>>> e = Event.objects.create(time=now, session=s)
  RuntimeWarning: DateTimeField received a naive datetime (2012-02-19 15:03:55.892547) while time zone support is active.
  RuntimeWarning)
>>> now_tz = Event.objects.get(pk=e.pk).time  # ensure we're querying with the timezone-aware datetime
>>> now == now_tz  # these shouldn't be comparable
Traceback (most recent call last):
  File "<console>", line 1, in <module>
TypeError: can't compare offset-naive and offset-aware datetimes
>>> Session.objects.annotate(start=models.Min('events__time')).filter(start=now_tz)
[]
>>> now_tz in Session.objects.annotate(start=models.Min('events__time')).values_list('start', flat=True)
True
>>> Session.objects.annotate(start=models.Min('events__time')).filter(start=now_tz).count()
0
>>> Session.objects.annotate(start=models.Min('events__time')).count()
1
>>> Session.objects.annotate(start=models.Min('events__time')).filter(start__lt=now_tz).count()
1
>>> Event.objects.filter(time=now_tz).count()
1
>>> Event.objects.get(time=now_tz)
<Event: Event object>

When the QuerySet is annotated with the start time, it seems to not use the timezone in the subsequent filtering, though it is present when the list of values is reconstituted as Python objects.

These tests were done on SQLite, in case this is a databse-specific problem. I don't have another DB easily accessible to me at the moment, but I can set up a test later, if needed.

Attachments (3)

17728-failing-tests.diff (4.4 KB ) - added by Aymeric Augustin 12 years ago.
17728.diff (5.5 KB ) - added by Anssi Kääriäinen 12 years ago.
17728-2.diff (10.4 KB ) - added by Aymeric Augustin 12 years ago.

Download all attachments as: .zip

Change History (17)

comment:1 by Aymeric Augustin, 12 years ago

Owner: changed from nobody to Aymeric Augustin

comment:2 by Aymeric Augustin, 12 years ago

Severity: NormalRelease blocker

I haven't assessed the bug report yet, but it's probably a release blocker.

comment:3 by Aymeric Augustin, 12 years ago

Triage Stage: UnreviewedAccepted

I can reproduce the problem under SQLite and MySQL. Under PostgreSQL everything works as expected.

comment:4 by Aymeric Augustin, 12 years ago

It appears that connection.ops.value_to_db_datetime isn't called when filtering on an annotation. Strictly speaking, it's a bug of annotations rather than a bug of time zones.

At this point, I haven't managed to fix it without introducing other regressions.

The core of the problem is the fallback to Field.get_db_prep_lookup that happens in two places in WhereNode:

  146              params = [connection.ops.value_to_db_datetime(params_or_value)]
  147          else:
  148:             params = Field().get_db_prep_lookup(lookup_type, params_or_value,
  149                  connection=connection, prepared=True)
  150          if isinstance(lvalue, tuple):
  ...
  339                  # (we don't really want to waste time looking up the associated
  340                  # field object at the calling location).
  341:                 params = Field().get_db_prep_lookup(lookup_type, value,
  342                      connection=connection, prepared=True)
  343                  db_type = None

When the aggregation happens on a DateTimeField, we should call DateTimeField.get_db_prep_lookup instead, so that DateTimeField.get_db_prep_value is inkoved, and calls connection.ops.value_to_db_datetime. Overall, ignoring the subclasses of Field entirely is approximative, and in this case, wrong.

by Aymeric Augustin, 12 years ago

Attachment: 17728-failing-tests.diff added

comment:5 by Anssi Kääriäinen, 12 years ago

Has patch: set

The attached patch solves this issue at least on SQLite, but I must say it is just ugly. It fixes the immediate symptom without doing anything for the real problem.

by Anssi Kääriäinen, 12 years ago

Attachment: 17728.diff added

comment:6 by Aymeric Augustin, 12 years ago

I'm attaching a patch that solves the general problem: converting the value used to perform a lookup on the results of an annotation from a Python object to its database representation.

With the patch, Django uses the methods provided by the Field that describes the output of the aggregate, as if it were a regular model field.

There's a little bit of noise in the patch because I renamed value_annotation consistently (sometimes it was confusingly called annotation or value_annot).

by Aymeric Augustin, 12 years ago

Attachment: 17728-2.diff added

comment:7 by Anssi Kääriäinen, 12 years ago

I looked through the patch and it looks good. It not only solves this problem but also cleans the is_ordinal aggregate handling.

I was wondering that shouldn't this be about the type of the param_or_value, not about the field you are comparing against. After checking the code I think what is done in the patch is good, as DateField does its own transformations for the param_or_value so that it will be a date string in the query. There could be a problem if you compared a datetime field against something which doesn't expect datetime values, but I can't give any sane example. It doesn't make sense to compare datetime values against IntegerFields for example...

I am not at all sure that what DateField does for datetime values is sane. An example, where datef is a DateField and dtf is a DateTimeField:

    def test_date_compare_datetime(self):
        time = datetime.datetime(2012, 02, 22, 01, 00, 00).replace(tzinfo=EAT)
        WithDate.objects.create(datef=time, dtf=time)
        with override_settings(DEBUG=True):
            print len(WithDate.objects
                .filter(datef__gte=time)
                .filter(dtf__lte=time))
            print connection.queries
OUT:
SELECT ...
  FROM "timezones_withdate"
 WHERE ("timezones_withdate"."datef" >= 2012-02-22  
        AND "timezones_withdate"."dtf" <= 2012-02-21 22:00:00 )

So, what happens? The timestamp is 2012-02-21 22:00:00 in the query when viewed as a datefield, but it is 2012-02-22 as a date. Note that the date of the datetime value would not be equal to the date value. I think this boils down to in whose timezone the date is.

One viewpoint is that it should be in UTC, as in Python and in the database all datetimes should be viewed in UTC, so why not dates taken from datetimes, too?

Another viewpoint is to save in the time zone of the active time zone. I think this is what PostgreSQL does: if you save now() to a date column, the date that gets saved is dependent on what time zone is active.

However, I don't think just turning the aware datetime into a date in the time zone the datetime happens to be is correct choice from any viewpoint. I think that is what happens now, at least according to my tests using SQLite3.

Anyways, this is probably another ticket's problem.

comment:8 by Anssi Kääriäinen, 12 years ago

Cc: anssi.kaariainen@… added

comment:9 by Carl Meyer, 12 years ago

Triage Stage: AcceptedReady for checkin

Reviewed the patch, and it looks excellent to me. Confirmed that without the patch, the tests fail on MySQL and SQLite and pass on Postgres, and with the patch they pass on all three. Marking RFC; you can do the commit, Aymeric.

I do have two questions/comments; neither is a blocker for the patch as they're both pre-existing issues, but I'll note them here as they're related:

  1. I think the docstring for WhereNode.make_atom has been out of date for some time (the tuple it claims to handle is not the tuple it actually handles); this might be a good time to fix it.
  1. If there are a number of tests (all of them?) that should be shared verbatim between NewDatabaseTests and LegacyDatabaseTests, why not put those test methods on a mixin class that both inherit, to reduce code duplication and ensure that the tests actually do stay in sync?

Anssi, I haven't entirely gotten my head around what should happen in the case you're discussing, but I think it's clear that it should be its own ticket.

comment:10 by Anssi Kääriäinen, 12 years ago

I'll open another ticket for the datetime as date one.

comment:11 by Aymeric Augustin, 12 years ago

@Carl: 1. yes, I'll update the docstring of WhereNode.make_atom.

@Carl: 2. in fact, the tests are subtly different between NewDatabaseTests and LegacyDatabaseTests. It may be possible to factor them in a mixin and use different class variables; I remember decided against that when I initially wrote them because it made them hard to read. I'll look again, but outside of the scope of this ticket.

@Anssi: yes, the fact that DateTimeFields can be set or compared to date values is a problem when time zone support is enabled, since dates are naive objects. I've created #17742 to track this.

comment:12 by Anssi Kääriäinen, 12 years ago

I think this is related to this ticket: why aren't aware datetimes converted to non-aware UTC datetimes using .register_adapter for SQLite3? MySQL has something similar (a "conv" dicitionary). This way the bug for this ticket would not exists. Granted that the patch should be committed even without any bugs in this ticket.

For example if you are using raw cursors datetimes aren't converted correctly. If there was an adapter, they would work correctly in raw queries, too. Am I missing some design choice here?

comment:13 by Aymeric Augustin, 12 years ago

By design, the ORM's puts the responsibility of conversions between Python objects and their database representation in Field classes. These classes provide more flexibility, on a per-field basis, than registering converters and adapters.

My feeling is that we should walk away from adapters / converters, since their job is already performed by fields, and their existence could hide bugs this one. I can't think of a use case where the ORM would send to the database a value (a Python object) that isn't either associated to a Field, or a known type (typically, an int for a LIMIT clause).

I think the way forward is to improve DateField and DateTimeField so they handle appropriately both datetimes and dates.

comment:14 by Aymeric Augustin, 12 years ago

Resolution: fixed
Status: newclosed

In [17576]:

Fixed #17728 -- When filtering an annotation, ensured the values used in the filter are properly converted to their database representation. This bug was particularly visible with timezone-aware DateTimeFields. Thanks gg for the report and Carl for the review.

Note: See TracTickets for help on using tickets.
Back to Top