Opened 8 years ago
Closed 8 years ago
#27498 closed Bug (fixed)
Filtering annotated field in SQLite returns wrong results
Reported by: | Tim Düllmann | Owned by: | Peter Inglesby |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | sqlite, annotations, filter |
Cc: | Triage Stage: | Ready for checkin | |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description (last modified by )
When filtering an annotated field with SQLite backend the result from Django QuerySet differs from what the raw SQL-query returns. This does not affect other databases, only SQLite.
The following is the problematic queryset:
qs = Product.objects.annotate(qty_available_sum=Sum('stock__qty_available')) \ .annotate(qty_needed=F('qty_target') - F('qty_available_sum')) \ .filter(qty_needed__gt=0)
This translates to the following Query, which returns the correct results, when run natively:
SELECT "testapp_product"."id", "testapp_product"."name", "testapp_product"."qty_target", CAST(SUM("testapp_stock"."qty_available") AS NUMERIC) AS "qty_available_sum", ("testapp_product"."qty_target" - CAST(SUM("testapp_stock"."qty_available") AS NUMERIC)) AS "qty_needed" FROM "testapp_product" LEFT OUTER JOIN "testapp_stock" ON ("testapp_product"."id" = "testapp_stock"."product_id") GROUP BY "testapp_product"."id", "testapp_product"."name", "testapp_product"."qty_target" HAVING ("testapp_product"."qty_target" - CAST(SUM("testapp_stock"."qty_available") AS NUMERIC)) > 0
Using the QuerySet, it does not respect the filter condition. For details see the test below.
Here is a simple App with a test that shows the difference. The complete files with imports are attached.
###### # models.py: ###### from django.db import models class Product(models.Model): name = models.CharField(max_length=80) qty_target = models.DecimalField(max_digits=6, decimal_places=2) class Stock(models.Model): product = models.ForeignKey(Product, related_name="stock") qty_available = models.DecimalField(max_digits=6, decimal_places=2) ###### # tests.py: ###### from django.test import TestCase from django.db.models.aggregates import Sum from django.db import connection from django.db.models.expressions import F from testapp.models import Product, Stock class ErrorTestCase(TestCase): def setUp(self): p1 = Product.objects.create(name="Product1", qty_target=10) p2 = Product.objects.create(name="Product2", qty_target=10) p3 = Product.objects.create(name="Product3", qty_target=10) s1 = Stock.objects.create(product=p1, qty_available=5) s2 = Stock.objects.create(product=p1, qty_available=5) s3 = Stock.objects.create(product=p1, qty_available=3) # 3 over target s4 = Stock.objects.create(product=p2, qty_available=5) s5 = Stock.objects.create(product=p2, qty_available=4) # 1 under target s6 = Stock.objects.create(product=p3, qty_available=0) # 10 under target def testError(self): # This is the exciting stuff: qs = Product.objects.annotate(qty_available_sum=Sum('stock__qty_available'))\ .annotate(qty_needed=F('qty_target') - F('qty_available_sum'))\ .filter(qty_needed__gt=0) # Retrieve by raw query: query = str(qs.query) print("# Query from QuerySet:") print(query) rows = [] with connection.cursor() as cursor: cursor.execute(query) rows = cursor.fetchall() print("# Results from above Query:") for r in rows: print(r) self.assertEqual(len(rows), 2, "Two products need stock by SQL query.") # retrieve by Django QuerySet: print("# Results from QuerySet:") for q in qs: print(q, q.qty_needed) self.assertEqual(qs.count(), 2, "Two products need stock by Django QuerySet.")
And here is the output of the test. You can see the full query as it is retrieved from the queryset and then run natively to retrieve the correct amount of two rows. After that the same queryset is used to retreive the data, but it retreived no row at all.
###### # Output with SQLite: ###### Creating test database for alias 'default'... # Query from QuerySet: SELECT "testapp_product"."id", "testapp_product"."name", "testapp_product"."qty_target", CAST(SUM("testapp_stock"."qty_available") AS NUMERIC) AS "qty_available_sum", ("testapp_product"."qty_target" - CAST(SUM("testapp_stock"."qty_available") AS NUMERIC)) AS "qty_needed" FROM "testapp_product" LEFT OUTER JOIN "testapp_stock" ON ("testapp_product"."id" = "testapp_stock"."product_id") GROUP BY "testapp_product"."id", "testapp_product"."name", "testapp_product"."qty_target" HAVING ("testapp_product"."qty_target" - CAST(SUM("testapp_stock"."qty_available") AS NUMERIC)) > 0 # Results from above Query: (2, 'Product2', Decimal('10'), 9, 1) (3, 'Product3', Decimal('10'), 0, 10) # Results from QuerySet: F ====================================================================== FAIL: testError (testapp.tests.ErrorTestCase) ---------------------------------------------------------------------- Traceback (most recent call last): File "/tmp/testtmp/testapp/tests.py", line 44, in testError self.assertEqual(qs.count(), 2, "Two products need stock by Django QuerySet.") AssertionError: 0 != 2 : Two products need stock by Django QuerySet. ---------------------------------------------------------------------- Ran 1 test in 0.004s FAILED (failures=1) Destroying test database for alias 'default'...
If the filter is inverted to .filter(qty_needed__lte=0)
, the raw SQL correctly returns one row, while the QueySet returns all three rows. So I guess there is something wrong with the filter(). I tried to debug this problem, but failed because of lack of experience in Python debugging.
With the attached models- and tests-files it is possible to recreate this problem with a vanilla django project and app (including adding the app to config, and migrating of course).
Other important information:
- Python version 3.4 on Gentoo Linux
- Django versions tested: 1.8 to 1.10.3
- SQLite Library in system: 3.13.0
Attachments (2)
Change History (15)
by , 8 years ago
by , 8 years ago
comment:1 by , 8 years ago
Description: | modified (diff) |
---|
comment:2 by , 8 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:3 by , 8 years ago
Description: | modified (diff) |
---|---|
Version: | 1.10 → master |
comment:4 by , 8 years ago
Owner: | changed from | to
---|---|
Status: | new → assigned |
comment:5 by , 8 years ago
https://www.sqlite.org/datatype3.html is a useful reference to help understand what's going on here. I can expand on anything in this comment if that would be helpful.
The reason that the two queries produce different results is because the original queryset has a single parameter, 0, which is handled differently in the two cases.
In both cases, the parameter is converted to Decimal('0')
. (This is because the the lookup generated by filter(qty_needed__gt=0)
knows that its left hand side's output_field
is a DecimalField
.)
In the first case, the parameter is interpolated into the SQL in Query.__str__
, which gives us the ...HAVING (...) > 0
.
In the second case, the parameter is passed through to the sqlite3
library as an extra argument to sqlite3.dbapi2.Cursor.execute
. Django registers an adapter (rev_typecast_decimal
) for Decimal
parameters which converts the parameter to a str
, so the query that's evaluated in the second case is equivalent to ...HAVING (...) > '0'
.
SQLite's rules for comparing values of different types includes: "An INTEGER or REAL value is less than any TEXT or BLOB value". So in the second case, the HAVING
clause is always false, since the left hand side of the >
operation is NUMERIC (so either an INTEGER or a REAL) and the right hand side is TEXT. However, in the first case, the right hand side is an INTEGER, and so the HAVING
behaves as expected.
So there are two bugs here:
Bug one: the SQL produced by converting a queryset to a string is often incorrect
This has been reported before and closed as wontfix. See ticket:1261 (Incorrect quoting in QuerySet.query.str()). I agree with wontfixing this, but I do think it would be good to document the limitations here. See ticket:27587.
It's worth noting that in this case, the SQL is incorrect in such a way to produce a query that does the right thing!
Bug two: on SQLite, the SQL produced by a queryset involving magnitude comparison against a DecimalField
is sometimes incorrect
I believe the correct fix here is that when we have a magnitude comparison whose left hand side is a DecimalField
, the right hand side should always be CAST AS NUMERIC
. This is because the right hand side will have been converted to a string by the rev_typecast_decimal
adapter.
Note that in some situations, this CAST
will be unnecessary, and this explains why we haven't seen this bug more often. For instance, if the left hand side of the comparison is a column whose SQL type is decimal
then the column has NUMERIC
affinity, and so SQLite will apply NUMERIC
affinity to the right hand side. If the left hand side is more complicated than this then it will have no affinity, and so no affinity will be applied to the right hand side. Rather than trying to keep track of whether the left hand side of a comparison has affinity or not, it is easier to always include the CAST
, and I believe that doing so is harmless.
I have a fix for this, and will create a PR shortly.
comment:6 by , 8 years ago
Regarding bug one, we could use the same approach as #14091 for SQLite. It's annoying to have to do this in Django. But if we're concerned about people feeding these incorrectly generated and thus insecure queries to databases, then the logical conclusion is to escape parameters in Django before substituting them. This won't make the problem worse.
This ticket is really about bug two. Let's focus on that one here.
comment:9 by , 8 years ago
Patch needs improvement: | unset |
---|
comment:10 by , 8 years ago
Patch needs improvement: | set |
---|
comment:11 by , 8 years ago
Patch needs improvement: | unset |
---|
comment:12 by , 8 years ago
Triage Stage: | Accepted → Ready for checkin |
---|
Reproduced at b1a9041535db5d03dab7f205669f0ab7a47de854.