Opened 12 years ago

Closed 12 years ago

Last modified 12 years ago

#18833 closed Bug (invalid)

Model returns incorrect values from Postgres DB View

Reported by: paul.ortman@… Owned by: nobody
Component: Database layer (models, ORM) Version: 1.4
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

This is essentially the same report as http://stackoverflow.com/questions/11781867/django-queryset-returns-wrong-values-from-postgresql-view

I have a database view that returns aggregate data from table.

=> \d+ valid_logins_dow_popularity
        View "public.valid_logins_dow_popularity"
   Column   |  Type   | Modifiers | Storage | Description
------------+---------+-----------+---------+-------------
 logins_avg | integer |           | plain   |
 dow        | integer |           | plain   |
View definition:
 WITH by_dow AS (
         SELECT valid_logins_over_time.count, date_part('dow'::text, valid_logins_over_time.date) AS dow
           FROM valid_logins_over_time
        )
 SELECT avg(by_dow.count)::integer AS logins_avg, by_dow.dow::integer AS dow
   FROM by_dow
  GROUP BY by_dow.dow;

I have a Django model that uses that view as its data source

from django.db import models

# Create your models here.
class ValidLoginsDowPopularity(models.Model):
    class Meta:
        db_table = 'valid_logins_dow_popularity'
        managed = False

    logins_avg = models.IntegerField(db_column='logins_avg')

    # Day of Week (dow)
    dow = models.IntegerField(db_column='dow',
                              primary_key=True)

    def __unicode__(self):
        return u"%d : " % (self.dow, self.logins_avg )

I have a test program that simply prints the values returned via the model

$ cat test_model.py
from core.models import *

v = ValidLoginsDowPopularity.objects.all().order_by('dow')
for i in v:
    print "logins_avg : %d | dow : %d" % (i.logins_avg, i.dow)

and the values printed

$ python test_model.py
logins_avg : 45 | dow : 0
logins_avg : 137 | dow : 1
logins_avg : 141 | dow : 2
logins_avg : 140 | dow : 3
logins_avg : 142 | dow : 4
logins_avg : 114 | dow : 5
logins_avg : 45 | dow : 6

Unfortunately, these values are not the same as returned directly from the database. I cannot tell if there is a pattern to how they are different, but they vary nevertheless. Here is a direct query result.

=> select * from valid_logins_dow_popularity order by dow;
 logins_avg | dow
------------+-----
         51 |   0
        153 |   1
        142 |   2
        145 |   3
        142 |   4
        102 |   5
         45 |   6
(7 rows)

Finally, here is a test in pure python using the psycopg2 driver (same as in use in Django) that gets the correct data and prints it as expected.

$ cat test_psycopg2.py
import psycopg2

def main():
    conn_string = "dbname='auser' user='auser'"

    conn = psycopg2.connect(conn_string)
    cursor = conn.cursor()

    sql = "select * from valid_logins_dow_popularity order by dow"
    cursor.execute(sql)

    for rec in cursor.fetchall():
        print rec

if __name__ == '__main__':
    main()

and the correct data

$ python test_psycopg2.py
(51, 0)
(153, 1)
(142, 2)
(145, 3)
(142, 4)
(102, 5)
(45, 6)

finally, here is the critical version info

Django==1.4
psycopg2==2.4.5

Change History (4)

comment:1 by Melvyn Sopacua, 12 years ago

Can you print (django.db.)connection.queries at the end of test_models.py? Also (django.conf.)settings.USE_TZ and the postgresql table definition for valid_logins_over_time?

comment:2 by paul.ortman@…, 12 years ago

test_model.py is now

$ cat test_model.py && !!
cat test_model.py && python test_model.py
from django.db import connection
from core.models import *

v = ValidLoginsDowPopularity.objects.all().order_by('dow')
for i in v:
    print "logins_avg : %d | dow : %d" % (i.logins_avg, i.dow)

print connection.queries

and this results in the values

logins_avg : 47 | dow : 0
logins_avg : 137 | dow : 1
logins_avg : 157 | dow : 2
logins_avg : 154 | dow : 3
logins_avg : 159 | dow : 4
logins_avg : 128 | dow : 5
logins_avg : 51 | dow : 6
[{'time': '0.049', 'sql': 'SELECT "valid_logins_dow_popularity"."logins_avg", "valid_logins_dow_popularity"."dow" FROM "valid_logins_dow_popularity" ORDER BY "valid_logins_dow_popularity"."dow" ASC'}]

In the postgresql log this is shown:

2012-08-27 09:00:47 CDT LOG:  statement: SET TIME ZONE 'UTC'
2012-08-27 09:00:47 CDT LOG:  statement: SET default_transaction_isolation TO 'read committed'
2012-08-27 09:00:47 CDT LOG:  statement: BEGIN
2012-08-27 09:00:47 CDT LOG:  statement: SELECT "valid_logins_dow_popularity"."logins_avg", "valid_logins_dow_popularity"."dow" FROM "valid_logins_dow_popularity" ORDER BY "valid_logins_dow_popularity"."dow" ASC
2012-08-27 09:00:47 CDT LOG:  unexpected EOF on client connection

the correct set of values should be

=> SELECT "valid_logins_dow_popularity"."logins_avg", "valid_logins_dow_popularity"."dow" FROM "valid_logins_dow_popularity" ORDER BY "valid_logins_dow_popularity"."dow" ASC;
 logins_avg | dow
------------+-----
         54 |   0
        153 |   1
        158 |   2
        158 |   3
        160 |   4
        115 |   5
         50 |   6
(7 rows)

USE_TZ is set to True

The postgresql definition for valid_logins_over_time is itself a view:

=> \d+ valid_logins_over_time
                 View "public.valid_logins_over_time"
 Column |           Type           | Modifiers | Storage | Description
--------+--------------------------+-----------+---------+-------------
 count  | bigint                   |           | plain   |
 date   | timestamp with time zone |           | plain   |
View definition:
 WITH dates AS (
         SELECT date_trunc('day'::text, ('now'::text::date - s.a)::timestamp with time zone) AS date
           FROM generate_series(1, 50) s(a)
          ORDER BY date_trunc('day'::text, ('now'::text::date - s.a)::timestamp with time zone)
        )
 SELECT count(*) AS count, dates.date
   FROM dates
   JOIN ( SELECT ib_account_data.account_number, date_trunc('day'::text, ib_account_data.last_login_date) AS last_login_day
           FROM ib_account_data
          GROUP BY ib_account_data.account_number, date_trunc('day'::text, ib_account_data.last_login_date)) uniq_i_login ON dates.date = uniq_i_login.last_login_day
  GROUP BY dates.date;

To pre-empt the next question, the definition for ib_account_data is:

=> \d ib_account_data
                                    Table "public.ib_account_data"
         Column          |           Type           |                    Modifiers
-------------------------+--------------------------+-------------------------------------------------
 entry_time              | timestamp with time zone | not null default now()
 account_number          | integer                  | not null
 account_holder_name     | character varying        |
 email_address           | character varying        |
 last_login_date         | timestamp with time zone |
 last_login_ip           | inet                     |
 invalid_login_count     | integer                  |
 last_invalid_login_date | timestamp with time zone |
 invalid_answer_count    | integer                  |
 last_pin_change_date    | timestamp with time zone |
 optin_status            | character varying        |
 optin_date              | timestamp with time zone |
 optin_two_way           | character varying        |
 optin_two_way_date      | timestamp with time zone |
 cu_email                | character varying        |
 batch                   | integer                  | not null default currval('batch_num'::regclass)
Indexes:
    "ib_account_data_pkey" PRIMARY KEY, btree (entry_time, account_number)
    "batch_idx" btree (batch)

Thanks for looking into this!

comment:3 by Aymeric Augustin, 12 years ago

Resolution: invalid
Status: newclosed

If USE_TZ = True, the database connection will be set to UTC, meaning aggregation will be performed in UTC.

If USE_TZ = False, the database connection is set to TIME_ZONE.

When you're connecting directly to PostgreSQL, the connection uses the system time zone.

As a consequence, aggregating on a DateTimeField by day can return different values, since the day of a datetime depends on the time zone.

in reply to:  3 comment:4 by paul.ortman@…, 12 years ago

Replying to aaugustin:

If USE_TZ = True, the database connection will be set to UTC, meaning aggregation will be performed in UTC.

If USE_TZ = False, the database connection is set to TIME_ZONE.

When you're connecting directly to PostgreSQL, the connection uses the system time zone.

As a consequence, aggregating on a DateTimeField by day can return different values, since the day of a datetime depends on the time zone.

Exactly the right diagnosis. Changin USE_TZ to False returned all the expected information. Thank you.

Note: See TracTickets for help on using tickets.
Back to Top