Excessive GROUP BY clauses during aggregations.
|Reported by:||christian_oudard||Owned by:||Anssi Kääriäinen|
|Component:||Database layer (models, ORM)||Version:||1.3|
|Cc:||Michael Curry, kmike84@…, YenTheFirst, baryshev@…, anssi.kaariainen@…, hcarvalhoalves@…, dev@…||Triage Stage:||Accepted|
|Has patch:||yes||Needs documentation:||no|
|Needs tests:||no||Patch needs improvement:||no|
Doing a simple aggregation using the Django ORM on MySQL, it is producing a GROUP BY clause that includes an unnecessary field. In the particular case where I discovered this, the unnecessary field is very large, and was slowing down the query by 100-fold.
Here is a simplified version of the model:
class Document(models.Model): data = models.TextField() # Very large field class Attachment(models.Model): document = models.ForeignKey(Document)
And the query I am running:
And the SQL output:
SELECT `document_document`.`id`, `document_document`.`data`, COUNT(`document_attachment`.`id`) AS `num_attachments` FROM `document_document` LEFT OUTER JOIN `document_attachment` ON (`document_document`.`id` = `document_attachment`.`document_id`) GROUP BY `document_document`.`id`, `document_document`.`id`, `document_document`.`data` ORDER BY NULL
Doing GROUP BY on the data field is unnecessary, and it did not do this in Django 1.2.
From the Django mailing list:
Karen Tracey On Tue, Nov 1, 2011 at 6:19 PM
The SQL generated by the ORM for this query changed between Django version
1.2 and 1.3. The 1.2 SQL did a group by only on the id field. With 1.3
we're getting id twice and then all other fields in the model. Bisection
shows the change was made with r14715:
It certainly looks to me like the old SQL was correct and preferable for
this particular case. In a brief search I did not find a ticket reporting
this issue -- could you open one?
Change History (26)
comment:13 Changed 5 years ago by
|Owner:||changed from Aymeric Augustin to Anssi Kääriäinen|