Opened 7 years ago

Closed 7 years ago

#27498 closed Bug (fixed)

Filtering annotated field in SQLite returns wrong results

Reported by: Tim Düllmann Owned by: Peter Inglesby
Component: Database layer (models, ORM) Version: dev
Severity: Normal Keywords: sqlite, annotations, filter
Cc: Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by Anton Samarchyan)

When filtering an annotated field with SQLite backend the result from Django QuerySet differs from what the raw SQL-query returns. This does not affect other databases, only SQLite.

The following is the problematic queryset:

qs = Product.objects.annotate(qty_available_sum=Sum('stock__qty_available')) \
                    .annotate(qty_needed=F('qty_target') - F('qty_available_sum')) \
                    .filter(qty_needed__gt=0)

This translates to the following Query, which returns the correct results, when run natively:

SELECT "testapp_product"."id", "testapp_product"."name", "testapp_product"."qty_target", CAST(SUM("testapp_stock"."qty_available") AS NUMERIC) AS "qty_available_sum", ("testapp_product"."qty_target" - CAST(SUM("testapp_stock"."qty_available") AS NUMERIC)) AS "qty_needed" 
FROM "testapp_product" 
LEFT OUTER JOIN "testapp_stock" ON ("testapp_product"."id" = "testapp_stock"."product_id") 
GROUP BY "testapp_product"."id", "testapp_product"."name", "testapp_product"."qty_target" 
HAVING ("testapp_product"."qty_target" - CAST(SUM("testapp_stock"."qty_available") AS NUMERIC)) > 0

Using the QuerySet, it does not respect the filter condition. For details see the test below.

Here is a simple App with a test that shows the difference. The complete files with imports are attached.

######
# models.py:
######

from django.db import models

class Product(models.Model):
    name = models.CharField(max_length=80)
    qty_target = models.DecimalField(max_digits=6, decimal_places=2)

class Stock(models.Model):
    product = models.ForeignKey(Product, related_name="stock")
    qty_available = models.DecimalField(max_digits=6, decimal_places=2)

######
# tests.py:
######

from django.test import TestCase
from django.db.models.aggregates import Sum
from django.db import connection
from django.db.models.expressions import F
from testapp.models import Product, Stock

class ErrorTestCase(TestCase):

    def setUp(self):
        p1 = Product.objects.create(name="Product1", qty_target=10)
        p2 = Product.objects.create(name="Product2", qty_target=10)
        p3 = Product.objects.create(name="Product3", qty_target=10)

        s1 = Stock.objects.create(product=p1, qty_available=5)
        s2 = Stock.objects.create(product=p1, qty_available=5)
        s3 = Stock.objects.create(product=p1, qty_available=3)  # 3 over target

        s4 = Stock.objects.create(product=p2, qty_available=5)
        s5 = Stock.objects.create(product=p2, qty_available=4)  # 1 under target

        s6 = Stock.objects.create(product=p3, qty_available=0)  # 10 under target

    def testError(self):
        # This is the exciting stuff:
        qs = Product.objects.annotate(qty_available_sum=Sum('stock__qty_available'))\
                            .annotate(qty_needed=F('qty_target') - F('qty_available_sum'))\
                            .filter(qty_needed__gt=0)

        # Retrieve by raw query:
        query = str(qs.query)
        print("# Query from QuerySet:")
        print(query)
        rows = []
        with connection.cursor() as cursor:
            cursor.execute(query)
            rows = cursor.fetchall()
        print("# Results from above Query:")
        for r in rows:
            print(r)
        self.assertEqual(len(rows), 2, "Two products need stock by SQL query.")

        # retrieve by Django QuerySet:
        print("# Results from QuerySet:")
        for q in qs:
            print(q, q.qty_needed)
        self.assertEqual(qs.count(), 2, "Two products need stock by Django QuerySet.")

And here is the output of the test. You can see the full query as it is retrieved from the queryset and then run natively to retrieve the correct amount of two rows. After that the same queryset is used to retreive the data, but it retreived no row at all.

######
# Output with SQLite:
######
Creating test database for alias 'default'...

# Query from QuerySet:
SELECT "testapp_product"."id", "testapp_product"."name", "testapp_product"."qty_target", CAST(SUM("testapp_stock"."qty_available") AS NUMERIC) AS "qty_available_sum", ("testapp_product"."qty_target" - CAST(SUM("testapp_stock"."qty_available") AS NUMERIC)) AS "qty_needed" FROM "testapp_product" LEFT OUTER JOIN "testapp_stock" ON ("testapp_product"."id" = "testapp_stock"."product_id") GROUP BY "testapp_product"."id", "testapp_product"."name", "testapp_product"."qty_target" HAVING ("testapp_product"."qty_target" - CAST(SUM("testapp_stock"."qty_available") AS NUMERIC)) > 0
# Results from above Query:
(2, 'Product2', Decimal('10'), 9, 1)
(3, 'Product3', Decimal('10'), 0, 10)

# Results from QuerySet:
F
======================================================================
FAIL: testError (testapp.tests.ErrorTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/testtmp/testapp/tests.py", line 44, in testError
    self.assertEqual(qs.count(), 2, "Two products need stock by Django QuerySet.")
AssertionError: 0 != 2 : Two products need stock by Django QuerySet.

----------------------------------------------------------------------
Ran 1 test in 0.004s

FAILED (failures=1)
Destroying test database for alias 'default'...

If the filter is inverted to .filter(qty_needed__lte=0) , the raw SQL correctly returns one row, while the QueySet returns all three rows. So I guess there is something wrong with the filter(). I tried to debug this problem, but failed because of lack of experience in Python debugging.

With the attached models- and tests-files it is possible to recreate this problem with a vanilla django project and app (including adding the app to config, and migrating of course).

Other important information:

  • Python version 3.4 on Gentoo Linux
  • Django versions tested: 1.8 to 1.10.3
  • SQLite Library in system: 3.13.0

Attachments (2)

models.py (502 bytes ) - added by Tim Düllmann 7 years ago.
tests.py (1.7 KB ) - added by Tim Düllmann 7 years ago.

Download all attachments as: .zip

Change History (15)

by Tim Düllmann, 7 years ago

Attachment: models.py added

by Tim Düllmann, 7 years ago

Attachment: tests.py added

comment:1 by Tim Düllmann, 7 years ago

Description: modified (diff)

comment:2 by Tim Graham, 7 years ago

Triage Stage: UnreviewedAccepted

comment:3 by Anton Samarchyan, 7 years ago

Description: modified (diff)
Version: 1.10master

comment:4 by Peter Inglesby, 7 years ago

Owner: changed from nobody to Peter Inglesby
Status: newassigned

comment:5 by Peter Inglesby, 7 years ago

https://www.sqlite.org/datatype3.html is a useful reference to help understand what's going on here. I can expand on anything in this comment if that would be helpful.

The reason that the two queries produce different results is because the original queryset has a single parameter, 0, which is handled differently in the two cases.

In both cases, the parameter is converted to Decimal('0'). (This is because the the lookup generated by filter(qty_needed__gt=0) knows that its left hand side's output_field is a DecimalField.)

In the first case, the parameter is interpolated into the SQL in Query.__str__, which gives us the ...HAVING (...) > 0.

In the second case, the parameter is passed through to the sqlite3 library as an extra argument to sqlite3.dbapi2.Cursor.execute. Django registers an adapter (rev_typecast_decimal) for Decimal parameters which converts the parameter to a str, so the query that's evaluated in the second case is equivalent to ...HAVING (...) > '0'.

SQLite's rules for comparing values of different types includes: "An INTEGER or REAL value is less than any TEXT or BLOB value". So in the second case, the HAVING clause is always false, since the left hand side of the > operation is NUMERIC (so either an INTEGER or a REAL) and the right hand side is TEXT. However, in the first case, the right hand side is an INTEGER, and so the HAVING behaves as expected.

So there are two bugs here:

Bug one: the SQL produced by converting a queryset to a string is often incorrect

This has been reported before and closed as wontfix. See ticket:1261 (Incorrect quoting in QuerySet.query.str()). I agree with wontfixing this, but I do think it would be good to document the limitations here. See ticket:27587.

It's worth noting that in this case, the SQL is incorrect in such a way to produce a query that does the right thing!

Bug two: on SQLite, the SQL produced by a queryset involving magnitude comparison against a DecimalField is sometimes incorrect

I believe the correct fix here is that when we have a magnitude comparison whose left hand side is a DecimalField, the right hand side should always be CAST AS NUMERIC. This is because the right hand side will have been converted to a string by the rev_typecast_decimal adapter.

Note that in some situations, this CAST will be unnecessary, and this explains why we haven't seen this bug more often. For instance, if the left hand side of the comparison is a column whose SQL type is decimal then the column has NUMERIC affinity, and so SQLite will apply NUMERIC affinity to the right hand side. If the left hand side is more complicated than this then it will have no affinity, and so no affinity will be applied to the right hand side. Rather than trying to keep track of whether the left hand side of a comparison has affinity or not, it is easier to always include the CAST, and I believe that doing so is harmless.

I have a fix for this, and will create a PR shortly.

comment:6 by Aymeric Augustin, 7 years ago

Regarding bug one, we could use the same approach as #14091 for SQLite. It's annoying to have to do this in Django. But if we're concerned about people feeding these incorrectly generated and thus insecure queries to databases, then the logical conclusion is to escape parameters in Django before substituting them. This won't make the problem worse.

This ticket is really about bug two. Let's focus on that one here.

comment:7 by Peter Inglesby, 7 years ago

Has patch: set
Last edited 7 years ago by Tim Graham (previous) (diff)

comment:8 by Tim Graham, 7 years ago

Patch needs improvement: set

Comments for improvement are on the PR.

comment:9 by Peter Inglesby, 7 years ago

Patch needs improvement: unset

comment:10 by Tim Graham, 7 years ago

Patch needs improvement: set

comment:11 by Peter Inglesby, 7 years ago

Patch needs improvement: unset

comment:12 by Tim Graham, 7 years ago

Triage Stage: AcceptedReady for checkin

comment:13 by Tim Graham <timograham@…>, 7 years ago

Resolution: fixed
Status: assignedclosed

In a4cac172:

Fixed #27498 -- Fixed filtering on annotated DecimalField on SQLite.

Note: See TracTickets for help on using tickets.
Back to Top