#18833 closed Bug (invalid)
Model returns incorrect values from Postgres DB View
| Reported by: | Owned by: | nobody | |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 1.4 | 
| Severity: | Normal | Keywords: | |
| Cc: | Triage Stage: | Unreviewed | |
| Has patch: | no | Needs documentation: | no | 
| Needs tests: | no | Patch needs improvement: | no | 
| Easy pickings: | no | UI/UX: | no | 
Description
This is essentially the same report as http://stackoverflow.com/questions/11781867/django-queryset-returns-wrong-values-from-postgresql-view
I have a database view that returns aggregate data from table.
=> \d+ valid_logins_dow_popularity
        View "public.valid_logins_dow_popularity"
   Column   |  Type   | Modifiers | Storage | Description
------------+---------+-----------+---------+-------------
 logins_avg | integer |           | plain   |
 dow        | integer |           | plain   |
View definition:
 WITH by_dow AS (
         SELECT valid_logins_over_time.count, date_part('dow'::text, valid_logins_over_time.date) AS dow
           FROM valid_logins_over_time
        )
 SELECT avg(by_dow.count)::integer AS logins_avg, by_dow.dow::integer AS dow
   FROM by_dow
  GROUP BY by_dow.dow;
I have a Django model that uses that view as its data source
from django.db import models
# Create your models here.
class ValidLoginsDowPopularity(models.Model):
    class Meta:
        db_table = 'valid_logins_dow_popularity'
        managed = False
    logins_avg = models.IntegerField(db_column='logins_avg')
    # Day of Week (dow)
    dow = models.IntegerField(db_column='dow',
                              primary_key=True)
    def __unicode__(self):
        return u"%d : " % (self.dow, self.logins_avg )
I have a test program that simply prints the values returned via the model
$ cat test_model.py
from core.models import *
v = ValidLoginsDowPopularity.objects.all().order_by('dow')
for i in v:
    print "logins_avg : %d | dow : %d" % (i.logins_avg, i.dow)
and the values printed
$ python test_model.py logins_avg : 45 | dow : 0 logins_avg : 137 | dow : 1 logins_avg : 141 | dow : 2 logins_avg : 140 | dow : 3 logins_avg : 142 | dow : 4 logins_avg : 114 | dow : 5 logins_avg : 45 | dow : 6
Unfortunately, these values are not the same as returned directly from the database.  I cannot tell if there is a pattern to how they are different, but they vary nevertheless.  Here is a direct query result.
=> select * from valid_logins_dow_popularity order by dow;
 logins_avg | dow
------------+-----
         51 |   0
        153 |   1
        142 |   2
        145 |   3
        142 |   4
        102 |   5
         45 |   6
(7 rows)
Finally, here is a test in pure python using the psycopg2 driver (same as in use in Django) that gets the correct data and prints it as expected.
$ cat test_psycopg2.py
import psycopg2
def main():
    conn_string = "dbname='auser' user='auser'"
    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()
    sql = "select * from valid_logins_dow_popularity order by dow"
    cursor.execute(sql)
    for rec in cursor.fetchall():
        print rec
if __name__ == '__main__':
    main()
and the correct data
$ python test_psycopg2.py (51, 0) (153, 1) (142, 2) (145, 3) (142, 4) (102, 5) (45, 6)
finally, here is the critical version info
Django==1.4 psycopg2==2.4.5
Change History (4)
comment:1 by , 13 years ago
comment:2 by , 13 years ago
test_model.py is now
$ cat test_model.py && !!
cat test_model.py && python test_model.py
from django.db import connection
from core.models import *
v = ValidLoginsDowPopularity.objects.all().order_by('dow')
for i in v:
    print "logins_avg : %d | dow : %d" % (i.logins_avg, i.dow)
print connection.queries
and this results in the values
logins_avg : 47 | dow : 0
logins_avg : 137 | dow : 1
logins_avg : 157 | dow : 2
logins_avg : 154 | dow : 3
logins_avg : 159 | dow : 4
logins_avg : 128 | dow : 5
logins_avg : 51 | dow : 6
[{'time': '0.049', 'sql': 'SELECT "valid_logins_dow_popularity"."logins_avg", "valid_logins_dow_popularity"."dow" FROM "valid_logins_dow_popularity" ORDER BY "valid_logins_dow_popularity"."dow" ASC'}]
In the postgresql log this is shown:
2012-08-27 09:00:47 CDT LOG: statement: SET TIME ZONE 'UTC' 2012-08-27 09:00:47 CDT LOG: statement: SET default_transaction_isolation TO 'read committed' 2012-08-27 09:00:47 CDT LOG: statement: BEGIN 2012-08-27 09:00:47 CDT LOG: statement: SELECT "valid_logins_dow_popularity"."logins_avg", "valid_logins_dow_popularity"."dow" FROM "valid_logins_dow_popularity" ORDER BY "valid_logins_dow_popularity"."dow" ASC 2012-08-27 09:00:47 CDT LOG: unexpected EOF on client connection
the correct set of values should be
=> SELECT "valid_logins_dow_popularity"."logins_avg", "valid_logins_dow_popularity"."dow" FROM "valid_logins_dow_popularity" ORDER BY "valid_logins_dow_popularity"."dow" ASC;
 logins_avg | dow
------------+-----
         54 |   0
        153 |   1
        158 |   2
        158 |   3
        160 |   4
        115 |   5
         50 |   6
(7 rows)
USE_TZ is set to True
The postgresql definition for valid_logins_over_time is itself a view:
=> \d+ valid_logins_over_time
                 View "public.valid_logins_over_time"
 Column |           Type           | Modifiers | Storage | Description
--------+--------------------------+-----------+---------+-------------
 count  | bigint                   |           | plain   |
 date   | timestamp with time zone |           | plain   |
View definition:
 WITH dates AS (
         SELECT date_trunc('day'::text, ('now'::text::date - s.a)::timestamp with time zone) AS date
           FROM generate_series(1, 50) s(a)
          ORDER BY date_trunc('day'::text, ('now'::text::date - s.a)::timestamp with time zone)
        )
 SELECT count(*) AS count, dates.date
   FROM dates
   JOIN ( SELECT ib_account_data.account_number, date_trunc('day'::text, ib_account_data.last_login_date) AS last_login_day
           FROM ib_account_data
          GROUP BY ib_account_data.account_number, date_trunc('day'::text, ib_account_data.last_login_date)) uniq_i_login ON dates.date = uniq_i_login.last_login_day
  GROUP BY dates.date;
To pre-empt the next question, the definition for ib_account_data is:
=> \d ib_account_data
                                    Table "public.ib_account_data"
         Column          |           Type           |                    Modifiers
-------------------------+--------------------------+-------------------------------------------------
 entry_time              | timestamp with time zone | not null default now()
 account_number          | integer                  | not null
 account_holder_name     | character varying        |
 email_address           | character varying        |
 last_login_date         | timestamp with time zone |
 last_login_ip           | inet                     |
 invalid_login_count     | integer                  |
 last_invalid_login_date | timestamp with time zone |
 invalid_answer_count    | integer                  |
 last_pin_change_date    | timestamp with time zone |
 optin_status            | character varying        |
 optin_date              | timestamp with time zone |
 optin_two_way           | character varying        |
 optin_two_way_date      | timestamp with time zone |
 cu_email                | character varying        |
 batch                   | integer                  | not null default currval('batch_num'::regclass)
Indexes:
    "ib_account_data_pkey" PRIMARY KEY, btree (entry_time, account_number)
    "batch_idx" btree (batch)
Thanks for looking into this!
follow-up: 4 comment:3 by , 13 years ago
| Resolution: | → invalid | 
|---|---|
| Status: | new → closed | 
If USE_TZ = True, the database connection will be set to UTC, meaning aggregation will be performed in UTC.
If USE_TZ = False, the database connection is set to TIME_ZONE.
When you're connecting directly to PostgreSQL, the connection uses the system time zone.
As a consequence, aggregating on a DateTimeField by day can return different values, since the day of a datetime depends on the time zone.
comment:4 by , 13 years ago
Replying to aaugustin:
If
USE_TZ = True, the database connection will be set to UTC, meaning aggregation will be performed in UTC.
If
USE_TZ = False, the database connection is set toTIME_ZONE.
When you're connecting directly to PostgreSQL, the connection uses the system time zone.
As a consequence, aggregating on a DateTimeField by day can return different values, since the day of a datetime depends on the time zone.
Exactly the right diagnosis.  Changin USE_TZ to False returned all the expected information.  Thank you.
Can you print (django.db.)
connection.queriesat the end of test_models.py? Also (django.conf.)settings.USE_TZand the postgresql table definition forvalid_logins_over_time?