#18833 closed Bug (invalid)
Model returns incorrect values from Postgres DB View
Reported by: | Owned by: | nobody | |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 1.4 |
Severity: | Normal | Keywords: | |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
This is essentially the same report as http://stackoverflow.com/questions/11781867/django-queryset-returns-wrong-values-from-postgresql-view
I have a database view that returns aggregate data from table.
=> \d+ valid_logins_dow_popularity View "public.valid_logins_dow_popularity" Column | Type | Modifiers | Storage | Description ------------+---------+-----------+---------+------------- logins_avg | integer | | plain | dow | integer | | plain | View definition: WITH by_dow AS ( SELECT valid_logins_over_time.count, date_part('dow'::text, valid_logins_over_time.date) AS dow FROM valid_logins_over_time ) SELECT avg(by_dow.count)::integer AS logins_avg, by_dow.dow::integer AS dow FROM by_dow GROUP BY by_dow.dow;
I have a Django model that uses that view as its data source
from django.db import models # Create your models here. class ValidLoginsDowPopularity(models.Model): class Meta: db_table = 'valid_logins_dow_popularity' managed = False logins_avg = models.IntegerField(db_column='logins_avg') # Day of Week (dow) dow = models.IntegerField(db_column='dow', primary_key=True) def __unicode__(self): return u"%d : " % (self.dow, self.logins_avg )
I have a test program that simply prints the values returned via the model
$ cat test_model.py from core.models import * v = ValidLoginsDowPopularity.objects.all().order_by('dow') for i in v: print "logins_avg : %d | dow : %d" % (i.logins_avg, i.dow)
and the values printed
$ python test_model.py logins_avg : 45 | dow : 0 logins_avg : 137 | dow : 1 logins_avg : 141 | dow : 2 logins_avg : 140 | dow : 3 logins_avg : 142 | dow : 4 logins_avg : 114 | dow : 5 logins_avg : 45 | dow : 6
Unfortunately, these values are not the same as returned directly from the database. I cannot tell if there is a pattern to how they are different, but they vary nevertheless. Here is a direct query result.
=> select * from valid_logins_dow_popularity order by dow; logins_avg | dow ------------+----- 51 | 0 153 | 1 142 | 2 145 | 3 142 | 4 102 | 5 45 | 6 (7 rows)
Finally, here is a test in pure python using the psycopg2 driver (same as in use in Django) that gets the correct data and prints it as expected.
$ cat test_psycopg2.py import psycopg2 def main(): conn_string = "dbname='auser' user='auser'" conn = psycopg2.connect(conn_string) cursor = conn.cursor() sql = "select * from valid_logins_dow_popularity order by dow" cursor.execute(sql) for rec in cursor.fetchall(): print rec if __name__ == '__main__': main()
and the correct data
$ python test_psycopg2.py (51, 0) (153, 1) (142, 2) (145, 3) (142, 4) (102, 5) (45, 6)
finally, here is the critical version info
Django==1.4 psycopg2==2.4.5
Change History (4)
comment:1 by , 12 years ago
comment:2 by , 12 years ago
test_model.py is now
$ cat test_model.py && !! cat test_model.py && python test_model.py from django.db import connection from core.models import * v = ValidLoginsDowPopularity.objects.all().order_by('dow') for i in v: print "logins_avg : %d | dow : %d" % (i.logins_avg, i.dow) print connection.queries
and this results in the values
logins_avg : 47 | dow : 0 logins_avg : 137 | dow : 1 logins_avg : 157 | dow : 2 logins_avg : 154 | dow : 3 logins_avg : 159 | dow : 4 logins_avg : 128 | dow : 5 logins_avg : 51 | dow : 6 [{'time': '0.049', 'sql': 'SELECT "valid_logins_dow_popularity"."logins_avg", "valid_logins_dow_popularity"."dow" FROM "valid_logins_dow_popularity" ORDER BY "valid_logins_dow_popularity"."dow" ASC'}]
In the postgresql log this is shown:
2012-08-27 09:00:47 CDT LOG: statement: SET TIME ZONE 'UTC' 2012-08-27 09:00:47 CDT LOG: statement: SET default_transaction_isolation TO 'read committed' 2012-08-27 09:00:47 CDT LOG: statement: BEGIN 2012-08-27 09:00:47 CDT LOG: statement: SELECT "valid_logins_dow_popularity"."logins_avg", "valid_logins_dow_popularity"."dow" FROM "valid_logins_dow_popularity" ORDER BY "valid_logins_dow_popularity"."dow" ASC 2012-08-27 09:00:47 CDT LOG: unexpected EOF on client connection
the correct set of values should be
=> SELECT "valid_logins_dow_popularity"."logins_avg", "valid_logins_dow_popularity"."dow" FROM "valid_logins_dow_popularity" ORDER BY "valid_logins_dow_popularity"."dow" ASC; logins_avg | dow ------------+----- 54 | 0 153 | 1 158 | 2 158 | 3 160 | 4 115 | 5 50 | 6 (7 rows)
USE_TZ is set to True
The postgresql definition for valid_logins_over_time is itself a view:
=> \d+ valid_logins_over_time View "public.valid_logins_over_time" Column | Type | Modifiers | Storage | Description --------+--------------------------+-----------+---------+------------- count | bigint | | plain | date | timestamp with time zone | | plain | View definition: WITH dates AS ( SELECT date_trunc('day'::text, ('now'::text::date - s.a)::timestamp with time zone) AS date FROM generate_series(1, 50) s(a) ORDER BY date_trunc('day'::text, ('now'::text::date - s.a)::timestamp with time zone) ) SELECT count(*) AS count, dates.date FROM dates JOIN ( SELECT ib_account_data.account_number, date_trunc('day'::text, ib_account_data.last_login_date) AS last_login_day FROM ib_account_data GROUP BY ib_account_data.account_number, date_trunc('day'::text, ib_account_data.last_login_date)) uniq_i_login ON dates.date = uniq_i_login.last_login_day GROUP BY dates.date;
To pre-empt the next question, the definition for ib_account_data is:
=> \d ib_account_data Table "public.ib_account_data" Column | Type | Modifiers -------------------------+--------------------------+------------------------------------------------- entry_time | timestamp with time zone | not null default now() account_number | integer | not null account_holder_name | character varying | email_address | character varying | last_login_date | timestamp with time zone | last_login_ip | inet | invalid_login_count | integer | last_invalid_login_date | timestamp with time zone | invalid_answer_count | integer | last_pin_change_date | timestamp with time zone | optin_status | character varying | optin_date | timestamp with time zone | optin_two_way | character varying | optin_two_way_date | timestamp with time zone | cu_email | character varying | batch | integer | not null default currval('batch_num'::regclass) Indexes: "ib_account_data_pkey" PRIMARY KEY, btree (entry_time, account_number) "batch_idx" btree (batch)
Thanks for looking into this!
follow-up: 4 comment:3 by , 12 years ago
Resolution: | → invalid |
---|---|
Status: | new → closed |
If USE_TZ = True
, the database connection will be set to UTC, meaning aggregation will be performed in UTC.
If USE_TZ = False
, the database connection is set to TIME_ZONE
.
When you're connecting directly to PostgreSQL, the connection uses the system time zone.
As a consequence, aggregating on a DateTimeField by day can return different values, since the day of a datetime depends on the time zone.
comment:4 by , 12 years ago
Replying to aaugustin:
If
USE_TZ = True
, the database connection will be set to UTC, meaning aggregation will be performed in UTC.
If
USE_TZ = False
, the database connection is set toTIME_ZONE
.
When you're connecting directly to PostgreSQL, the connection uses the system time zone.
As a consequence, aggregating on a DateTimeField by day can return different values, since the day of a datetime depends on the time zone.
Exactly the right diagnosis. Changin USE_TZ to False returned all the expected information. Thank you.
Can you print (django.db.)
connection.queries
at the end of test_models.py? Also (django.conf.)settings.USE_TZ
and the postgresql table definition forvalid_logins_over_time
?