Opened 13 years ago
Closed 13 years ago
#17728 closed Bug (fixed)
Filtering of annotated querysets broken with timezone-aware datetimes
Reported by: | Gabriel Grant | Owned by: | Aymeric Augustin |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.4-beta-1 |
Severity: | Release blocker | Keywords: | regression |
Cc: | anssi.kaariainen@… | Triage Stage: | Ready for checkin |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
There appears to be a regression with timezone-aware datetimes that causes them to not match "exact" queries properly when added as an annotation.
This is easiest explained with an example. Say we have a series of "events" which are grouped into "sessions". We wish to find the start time of the session (i.e. the time of the session's earliest event)
models.py:
from django.db import models from django.contrib.auth.models import User class Event(models.Model): time = models.DateTimeField() session = models.ForeignKey('Session', related_name='events', null=True) class Session(models.Model): user = models.ForeignKey(User, related_name='km_sessions', null=True)
With USE_TZ = False (or with Django 1.3) it works as expected:
>>> from datetime import datetime >>> from django.db import models >>> from django.conf import settings >>> print settings.USE_TZ False >>> from aggregate_fields.models import Session, Event >>> now = datetime.now() >>> s = Session.objects.create() >>> e = Event.objects.create(time=now, session=s) >>> now_tz = Event.objects.get(pk=e.pk).time # ensure we're querying with the as-saved datetime >>> now == now_tz # timezone are disabled, so these should be equivalent True >>> Session.objects.annotate(start=models.Min('events__time')).filter(start=now_tz) [<Session: Session object>] >>> now_tz in Session.objects.annotate(start=models.Min('events__time')).values_list('start', flat=True) True >>> Session.objects.annotate(start=models.Min('events__time')).filter(start=now_tz).count() 1 >>> Session.objects.annotate(start=models.Min('events__time')).count() 1 >>> Session.objects.annotate(start=models.Min('events__time')).filter(start__lt=now_tz).count() 0 >>> Event.objects.filter(time=now_tz).count() 1 >>> Event.objects.get(time=now_tz) <Event: Event object>
But with USE_TZ = False the results are inconsistent:
>>> from datetime import datetime >>> from django.db import models >>> from django.conf import settings >>> print settings.USE_TZ True >>> from aggregate_fields.models import Session, Event >>> now = datetime.now() >>> s = Session.objects.create() >>> e = Event.objects.create(time=now, session=s) RuntimeWarning: DateTimeField received a naive datetime (2012-02-19 15:03:55.892547) while time zone support is active. RuntimeWarning) >>> now_tz = Event.objects.get(pk=e.pk).time # ensure we're querying with the timezone-aware datetime >>> now == now_tz # these shouldn't be comparable Traceback (most recent call last): File "<console>", line 1, in <module> TypeError: can't compare offset-naive and offset-aware datetimes >>> Session.objects.annotate(start=models.Min('events__time')).filter(start=now_tz) [] >>> now_tz in Session.objects.annotate(start=models.Min('events__time')).values_list('start', flat=True) True >>> Session.objects.annotate(start=models.Min('events__time')).filter(start=now_tz).count() 0 >>> Session.objects.annotate(start=models.Min('events__time')).count() 1 >>> Session.objects.annotate(start=models.Min('events__time')).filter(start__lt=now_tz).count() 1 >>> Event.objects.filter(time=now_tz).count() 1 >>> Event.objects.get(time=now_tz) <Event: Event object>
When the QuerySet is annotated with the start time, it seems to not use the timezone in the subsequent filtering, though it is present when the list of values is reconstituted as Python objects.
These tests were done on SQLite, in case this is a databse-specific problem. I don't have another DB easily accessible to me at the moment, but I can set up a test later, if needed.
Attachments (3)
Change History (17)
comment:1 by , 13 years ago
Owner: | changed from | to
---|
comment:2 by , 13 years ago
Severity: | Normal → Release blocker |
---|
comment:3 by , 13 years ago
Triage Stage: | Unreviewed → Accepted |
---|
I can reproduce the problem under SQLite and MySQL. Under PostgreSQL everything works as expected.
comment:4 by , 13 years ago
It appears that connection.ops.value_to_db_datetime
isn't called when filtering on an annotation. Strictly speaking, it's a bug of annotations rather than a bug of time zones.
At this point, I haven't managed to fix it without introducing other regressions.
The core of the problem is the fallback to Field.get_db_prep_lookup
that happens in two places in WhereNode
:
146 params = [connection.ops.value_to_db_datetime(params_or_value)] 147 else: 148: params = Field().get_db_prep_lookup(lookup_type, params_or_value, 149 connection=connection, prepared=True) 150 if isinstance(lvalue, tuple): ... 339 # (we don't really want to waste time looking up the associated 340 # field object at the calling location). 341: params = Field().get_db_prep_lookup(lookup_type, value, 342 connection=connection, prepared=True) 343 db_type = None
When the aggregation happens on a DateTimeField
, we should call DateTimeField.get_db_prep_lookup
instead, so that DateTimeField.get_db_prep_value
is inkoved, and calls connection.ops.value_to_db_datetime
. Overall, ignoring the subclasses of Field
entirely is approximative, and in this case, wrong.
by , 13 years ago
Attachment: | 17728-failing-tests.diff added |
---|
comment:5 by , 13 years ago
Has patch: | set |
---|
The attached patch solves this issue at least on SQLite, but I must say it is just ugly. It fixes the immediate symptom without doing anything for the real problem.
by , 13 years ago
Attachment: | 17728.diff added |
---|
comment:6 by , 13 years ago
I'm attaching a patch that solves the general problem: converting the value used to perform a lookup on the results of an annotation from a Python object to its database representation.
With the patch, Django uses the methods provided by the Field
that describes the output of the aggregate, as if it were a regular model field.
There's a little bit of noise in the patch because I renamed value_annotation
consistently (sometimes it was confusingly called annotation
or value_annot
).
by , 13 years ago
Attachment: | 17728-2.diff added |
---|
comment:7 by , 13 years ago
I looked through the patch and it looks good. It not only solves this problem but also cleans the is_ordinal aggregate handling.
I was wondering that shouldn't this be about the type of the param_or_value, not about the field you are comparing against. After checking the code I think what is done in the patch is good, as DateField
does its own transformations for the param_or_value so that it will be a date string in the query. There could be a problem if you compared a datetime field against something which doesn't expect datetime values, but I can't give any sane example. It doesn't make sense to compare datetime values against IntegerFields
for example...
I am not at all sure that what DateField
does for datetime values is sane. An example, where datef is a DateField
and dtf is a DateTimeField
:
def test_date_compare_datetime(self): time = datetime.datetime(2012, 02, 22, 01, 00, 00).replace(tzinfo=EAT) WithDate.objects.create(datef=time, dtf=time) with override_settings(DEBUG=True): print len(WithDate.objects .filter(datef__gte=time) .filter(dtf__lte=time)) print connection.queries OUT: SELECT ... FROM "timezones_withdate" WHERE ("timezones_withdate"."datef" >= 2012-02-22 AND "timezones_withdate"."dtf" <= 2012-02-21 22:00:00 )
So, what happens? The timestamp is 2012-02-21 22:00:00 in the query when viewed as a datefield, but it is 2012-02-22 as a date. Note that the date of the datetime value would not be equal to the date value. I think this boils down to in whose timezone the date is.
One viewpoint is that it should be in UTC, as in Python and in the database all datetimes should be viewed in UTC, so why not dates taken from datetimes, too?
Another viewpoint is to save in the time zone of the active time zone. I think this is what PostgreSQL does: if you save now() to a date column, the date that gets saved is dependent on what time zone is active.
However, I don't think just turning the aware datetime into a date in the time zone the datetime happens to be is correct choice from any viewpoint. I think that is what happens now, at least according to my tests using SQLite3.
Anyways, this is probably another ticket's problem.
comment:8 by , 13 years ago
Cc: | added |
---|
comment:9 by , 13 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
Reviewed the patch, and it looks excellent to me. Confirmed that without the patch, the tests fail on MySQL and SQLite and pass on Postgres, and with the patch they pass on all three. Marking RFC; you can do the commit, Aymeric.
I do have two questions/comments; neither is a blocker for the patch as they're both pre-existing issues, but I'll note them here as they're related:
- I think the docstring for
WhereNode.make_atom
has been out of date for some time (the tuple it claims to handle is not the tuple it actually handles); this might be a good time to fix it.
- If there are a number of tests (all of them?) that should be shared verbatim between NewDatabaseTests and LegacyDatabaseTests, why not put those test methods on a mixin class that both inherit, to reduce code duplication and ensure that the tests actually do stay in sync?
Anssi, I haven't entirely gotten my head around what should happen in the case you're discussing, but I think it's clear that it should be its own ticket.
comment:11 by , 13 years ago
@Carl: 1. yes, I'll update the docstring of WhereNode.make_atom
.
@Carl: 2. in fact, the tests are subtly different between NewDatabaseTests and LegacyDatabaseTests. It may be possible to factor them in a mixin and use different class variables; I remember decided against that when I initially wrote them because it made them hard to read. I'll look again, but outside of the scope of this ticket.
@Anssi: yes, the fact that DateTimeField
s can be set or compared to date
values is a problem when time zone support is enabled, since date
s are naive objects. I've created #17742 to track this.
comment:12 by , 13 years ago
I think this is related to this ticket: why aren't aware datetimes converted to non-aware UTC datetimes using .register_adapter for SQLite3? MySQL has something similar (a "conv" dicitionary). This way the bug for this ticket would not exists. Granted that the patch should be committed even without any bugs in this ticket.
For example if you are using raw cursors datetimes aren't converted correctly. If there was an adapter, they would work correctly in raw queries, too. Am I missing some design choice here?
comment:13 by , 13 years ago
By design, the ORM's puts the responsibility of conversions between Python objects and their database representation in Field
classes. These classes provide more flexibility, on a per-field basis, than registering converters and adapters.
My feeling is that we should walk away from adapters / converters, since their job is already performed by fields, and their existence could hide bugs this one. I can't think of a use case where the ORM would send to the database a value (a Python object) that isn't either associated to a Field
, or a known type (typically, an int
for a LIMIT
clause).
I think the way forward is to improve DateField
and DateTimeField
so they handle appropriately both datetime
s and date
s.
I haven't assessed the bug report yet, but it's probably a release blocker.