#18833 closed Bug (invalid)
Model returns incorrect values from Postgres DB View
| Reported by: | Owned by: | nobody | |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 1.4 |
| Severity: | Normal | Keywords: | |
| Cc: | Triage Stage: | Unreviewed | |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
This is essentially the same report as http://stackoverflow.com/questions/11781867/django-queryset-returns-wrong-values-from-postgresql-view
I have a database view that returns aggregate data from table.
=> \d+ valid_logins_dow_popularity
View "public.valid_logins_dow_popularity"
Column | Type | Modifiers | Storage | Description
------------+---------+-----------+---------+-------------
logins_avg | integer | | plain |
dow | integer | | plain |
View definition:
WITH by_dow AS (
SELECT valid_logins_over_time.count, date_part('dow'::text, valid_logins_over_time.date) AS dow
FROM valid_logins_over_time
)
SELECT avg(by_dow.count)::integer AS logins_avg, by_dow.dow::integer AS dow
FROM by_dow
GROUP BY by_dow.dow;
I have a Django model that uses that view as its data source
from django.db import models
# Create your models here.
class ValidLoginsDowPopularity(models.Model):
class Meta:
db_table = 'valid_logins_dow_popularity'
managed = False
logins_avg = models.IntegerField(db_column='logins_avg')
# Day of Week (dow)
dow = models.IntegerField(db_column='dow',
primary_key=True)
def __unicode__(self):
return u"%d : " % (self.dow, self.logins_avg )
I have a test program that simply prints the values returned via the model
$ cat test_model.py
from core.models import *
v = ValidLoginsDowPopularity.objects.all().order_by('dow')
for i in v:
print "logins_avg : %d | dow : %d" % (i.logins_avg, i.dow)
and the values printed
$ python test_model.py logins_avg : 45 | dow : 0 logins_avg : 137 | dow : 1 logins_avg : 141 | dow : 2 logins_avg : 140 | dow : 3 logins_avg : 142 | dow : 4 logins_avg : 114 | dow : 5 logins_avg : 45 | dow : 6
Unfortunately, these values are not the same as returned directly from the database. I cannot tell if there is a pattern to how they are different, but they vary nevertheless. Here is a direct query result.
=> select * from valid_logins_dow_popularity order by dow;
logins_avg | dow
------------+-----
51 | 0
153 | 1
142 | 2
145 | 3
142 | 4
102 | 5
45 | 6
(7 rows)
Finally, here is a test in pure python using the psycopg2 driver (same as in use in Django) that gets the correct data and prints it as expected.
$ cat test_psycopg2.py
import psycopg2
def main():
conn_string = "dbname='auser' user='auser'"
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()
sql = "select * from valid_logins_dow_popularity order by dow"
cursor.execute(sql)
for rec in cursor.fetchall():
print rec
if __name__ == '__main__':
main()
and the correct data
$ python test_psycopg2.py (51, 0) (153, 1) (142, 2) (145, 3) (142, 4) (102, 5) (45, 6)
finally, here is the critical version info
Django==1.4 psycopg2==2.4.5
Change History (4)
comment:1 by , 13 years ago
comment:2 by , 13 years ago
test_model.py is now
$ cat test_model.py && !!
cat test_model.py && python test_model.py
from django.db import connection
from core.models import *
v = ValidLoginsDowPopularity.objects.all().order_by('dow')
for i in v:
print "logins_avg : %d | dow : %d" % (i.logins_avg, i.dow)
print connection.queries
and this results in the values
logins_avg : 47 | dow : 0
logins_avg : 137 | dow : 1
logins_avg : 157 | dow : 2
logins_avg : 154 | dow : 3
logins_avg : 159 | dow : 4
logins_avg : 128 | dow : 5
logins_avg : 51 | dow : 6
[{'time': '0.049', 'sql': 'SELECT "valid_logins_dow_popularity"."logins_avg", "valid_logins_dow_popularity"."dow" FROM "valid_logins_dow_popularity" ORDER BY "valid_logins_dow_popularity"."dow" ASC'}]
In the postgresql log this is shown:
2012-08-27 09:00:47 CDT LOG: statement: SET TIME ZONE 'UTC' 2012-08-27 09:00:47 CDT LOG: statement: SET default_transaction_isolation TO 'read committed' 2012-08-27 09:00:47 CDT LOG: statement: BEGIN 2012-08-27 09:00:47 CDT LOG: statement: SELECT "valid_logins_dow_popularity"."logins_avg", "valid_logins_dow_popularity"."dow" FROM "valid_logins_dow_popularity" ORDER BY "valid_logins_dow_popularity"."dow" ASC 2012-08-27 09:00:47 CDT LOG: unexpected EOF on client connection
the correct set of values should be
=> SELECT "valid_logins_dow_popularity"."logins_avg", "valid_logins_dow_popularity"."dow" FROM "valid_logins_dow_popularity" ORDER BY "valid_logins_dow_popularity"."dow" ASC;
logins_avg | dow
------------+-----
54 | 0
153 | 1
158 | 2
158 | 3
160 | 4
115 | 5
50 | 6
(7 rows)
USE_TZ is set to True
The postgresql definition for valid_logins_over_time is itself a view:
=> \d+ valid_logins_over_time
View "public.valid_logins_over_time"
Column | Type | Modifiers | Storage | Description
--------+--------------------------+-----------+---------+-------------
count | bigint | | plain |
date | timestamp with time zone | | plain |
View definition:
WITH dates AS (
SELECT date_trunc('day'::text, ('now'::text::date - s.a)::timestamp with time zone) AS date
FROM generate_series(1, 50) s(a)
ORDER BY date_trunc('day'::text, ('now'::text::date - s.a)::timestamp with time zone)
)
SELECT count(*) AS count, dates.date
FROM dates
JOIN ( SELECT ib_account_data.account_number, date_trunc('day'::text, ib_account_data.last_login_date) AS last_login_day
FROM ib_account_data
GROUP BY ib_account_data.account_number, date_trunc('day'::text, ib_account_data.last_login_date)) uniq_i_login ON dates.date = uniq_i_login.last_login_day
GROUP BY dates.date;
To pre-empt the next question, the definition for ib_account_data is:
=> \d ib_account_data
Table "public.ib_account_data"
Column | Type | Modifiers
-------------------------+--------------------------+-------------------------------------------------
entry_time | timestamp with time zone | not null default now()
account_number | integer | not null
account_holder_name | character varying |
email_address | character varying |
last_login_date | timestamp with time zone |
last_login_ip | inet |
invalid_login_count | integer |
last_invalid_login_date | timestamp with time zone |
invalid_answer_count | integer |
last_pin_change_date | timestamp with time zone |
optin_status | character varying |
optin_date | timestamp with time zone |
optin_two_way | character varying |
optin_two_way_date | timestamp with time zone |
cu_email | character varying |
batch | integer | not null default currval('batch_num'::regclass)
Indexes:
"ib_account_data_pkey" PRIMARY KEY, btree (entry_time, account_number)
"batch_idx" btree (batch)
Thanks for looking into this!
follow-up: 4 comment:3 by , 13 years ago
| Resolution: | → invalid |
|---|---|
| Status: | new → closed |
If USE_TZ = True, the database connection will be set to UTC, meaning aggregation will be performed in UTC.
If USE_TZ = False, the database connection is set to TIME_ZONE.
When you're connecting directly to PostgreSQL, the connection uses the system time zone.
As a consequence, aggregating on a DateTimeField by day can return different values, since the day of a datetime depends on the time zone.
comment:4 by , 13 years ago
Replying to aaugustin:
If
USE_TZ = True, the database connection will be set to UTC, meaning aggregation will be performed in UTC.
If
USE_TZ = False, the database connection is set toTIME_ZONE.
When you're connecting directly to PostgreSQL, the connection uses the system time zone.
As a consequence, aggregating on a DateTimeField by day can return different values, since the day of a datetime depends on the time zone.
Exactly the right diagnosis. Changin USE_TZ to False returned all the expected information. Thank you.
Can you print (django.db.)
connection.queriesat the end of test_models.py? Also (django.conf.)settings.USE_TZand the postgresql table definition forvalid_logins_over_time?