Opened 4 years ago

Closed 4 years ago

#24570 closed Bug (invalid)

`group_by` clause does not resolve keywords defined in `extra` clause.

Reported by: user0007 Owned by: nobody
Component: Database layer (models, ORM) Version: 1.8
Severity: Normal Keywords:
Cc: Triage Stage: Unreviewed
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by user0007)

Hello,

during migrating from Django 1.7.5 to 1.8 I found a bug(?)

    truncate_date = connection.ops.date_trunc_sql('day', 'created_at')

    qs = MyModel.objects.filter(
        user_id=1
    ).annotate(
        max_date=models.Max('created_at')
    ).extra(
        {'created_at': truncate_date}
    ).values(
        'product_id', 'created_at', 'max_date'
    )
    qs.query.group_by = ['product_id', truncate_date]
    qs.order_by('-max_date')[:10]

The result is:

FieldError: Cannot resolve keyword u"CAST(DATE_FORMAT(created_at, '%%Y-%%m-%%d 00:00:00') AS DATETIME)" into field.

This code works perfectly on Django 1.7.5

Traceback:

Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/home/env/local/lib/python2.7/site-packages/django/db/models/query.py", line 138, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE + 1])
  File "/home/env/local/lib/python2.7/site-packages/django/db/models/query.py", line 162, in __iter__
    self._fetch_all()
  File "/home/env/local/lib/python2.7/site-packages/django/db/models/query.py", line 965, in _fetch_all
    self._result_cache = list(self.iterator())
  File "/home/env/local/lib/python2.7/site-packages/django/db/models/query.py", line 1085, in iterator
    for row in self.query.get_compiler(self.db).results_iter():
  File "/home/env/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 783, in results_iter
    results = self.execute_sql(MULTI)
  File "/home/env/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 818, in execute_sql
    sql, params = self.as_sql()
  File "/home/env/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 367, in as_sql
    extra_select, order_by, group_by = self.pre_sql_setup()
  File "/home/env/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 51, in pre_sql_setup
    group_by = self.get_group_by(self.select + extra_select, order_by)
  File "/home/env/local/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 102, in get_group_by
    expressions.append(self.query.resolve_ref(expr))
  File "/home/env/local/lib/python2.7/site-packages/django/db/models/sql/query.py", line 1522, in resolve_ref
    self.get_initial_alias(), reuse)
  File "/home/env/local/lib/python2.7/site-packages/django/db/models/sql/query.py", line 1461, in setup_joins
    names, opts, allow_many, fail_on_missing=True)
  File "/home/env/local/lib/python2.7/site-packages/django/db/models/sql/query.py", line 1386, in names_to_path
    "Choices are: %s" % (name, ", ".join(available)))
FieldError: Cannot resolve keyword u"CAST(DATE_FORMAT(created_at, '%%Y-%%m-%%d 00:00:00') AS DATETIME)" into field.

Change History (13)

comment:1 Changed 4 years ago by user0007

Description: modified (diff)

comment:2 Changed 4 years ago by user0007

Description: modified (diff)

comment:3 Changed 4 years ago by user0007

Summary: Cannot resolve keyword "CAST" in group_by clause.`group_by` clause does not resolve keywords defined in `extra` clause.

comment:4 Changed 4 years ago by user0007

Description: modified (diff)

comment:5 Changed 4 years ago by Baptiste Mispelon

Severity: NormalRelease blocker
Triage Stage: UnreviewedAccepted

Hi,

I can indeed reproduce this issue and I bisected it down to commit 0c7633178fa9410f102e4708cef979b873bccb76.

I'm bumping the severity to release blocker because of the regression.

Thanks!

comment:6 Changed 4 years ago by Shai Berger

is that extra(select= {'created_at': truncate_date})?

(I'm not sure about the Release Blocker status -- I don't think setting group_by on a QSet's query like this is public API)

comment:7 in reply to:  6 Changed 4 years ago by user0007

Replying to shaib:

is that extra(select= {'created_at': truncate_date})?

Yes. extra(select= {'created_at': truncate_date}) and extra({'created_at': truncate_date}) are equal.

Last edited 4 years ago by user0007 (previous) (diff)

comment:8 Changed 4 years ago by Baptiste Mispelon

Severity: Release blockerNormal
Triage Stage: AcceptedUnreviewed

I skimmed through the code too quickly and didn't spot the qs.query.group_by = ....

As mentionned by Shai, using the attribute like that is not documented so this doesn't technically count as a regression.

Is there no way to achieve what you're doing with the documented/supported queryset.group_by() method?
Edit: I've been informed that such a method doesn't actually exist.

Last edited 4 years ago by Baptiste Mispelon (previous) (diff)

comment:9 Changed 4 years ago by Josh Smeaton

What you were doing with your original queryset seems suspect to me, it looks like there are better ways to accomplish your goals in a completely supported way. Let me highlight some issues with your original queryset (I'll remove pieces that aren't relevant for brevity):

MyModel.objects.annotate(
        max_date=models.Max('created_at')
    ).extra(
        {'created_at': truncate_date}  # here you're overwriting the field called created_at, you should choose a new alias
    ).values(
        'product_id', 'created_at', 'max_date'
    )
    
    # this is private API, but you'd need to use the alias name of the extra clause ("created_at"),
    # which is ambiguous because you've used the same name. group_by=['product_id', 'created_extra']
    qs.query.group_by = ['product_id', truncate_date]   
    qs.order_by('-max_date')[:10]

I've written a comparable (but not exact) queryset with 1.8 that I'd like you to take a look at.

In [46]: truncate_date = connection.ops.date_trunc_sql('day', 'created')

In [47]: qs = Stats.objects.extra({'created_day': truncate_date}).values('created_day').annotate(max_growth=Max('growth'))

In [48]: print(qs.query)
SELECT (DATE_TRUNC('day', created)) AS "created_day", MAX("scratch_stats"."growth") AS "max_growth" FROM "scratch_stats" GROUP BY (DATE_TRUNC('day', created))

In [49]: for q in qs:
   ....:     print(q)
   ....:
{'created_day': datetime.datetime(2015, 3, 29, 0, 0, tzinfo=<UTC>), 'max_growth': 16}
{'created_day': datetime.datetime(2015, 3, 24, 0, 0, tzinfo=<UTC>), 'max_growth': 11}
{'created_day': datetime.datetime(2015, 3, 31, 0, 0, tzinfo=<UTC>), 'max_growth': 17}
{'created_day': datetime.datetime(2015, 3, 25, 0, 0, tzinfo=<UTC>), 'max_growth': 18}
{'created_day': datetime.datetime(2015, 3, 23, 0, 0, tzinfo=<UTC>), 'max_growth': 17}
...

Note that the query groups by the extra created_day clause because we've not shadowed any model fields, and we've named it in our values call. The above code I've written is fully supported too, although I'd encourage you to avoid using extra and to instead use Query Expressions. Django should provide proper db_functions for extracting date parts, but that's a work in progress (I would imagine Django 1.9 will have something).

What I think your query should look like:

truncate_date = connection.ops.date_trunc_sql('day', 'created_at')

    qs = MyModel.objects.filter(
        user_id=1
    ).annotate(
        max_date=models.Max('created_at')
    ).extra(
        {'created_day': truncate_date}
    ).values(
        'product_id', 'created_day', 'max_date'
    ).order_by('-max_date')[:10]

Can you give that a go and let us know the outcome please?

Last edited 4 years ago by Josh Smeaton (previous) (diff)

comment:10 in reply to:  9 Changed 4 years ago by user0007

Thank You for the answer.

Replying to jarshwah:

What I think your query should look like:

truncate_date = connection.ops.date_trunc_sql('day', 'created_at')

    qs = MyModel.objects.filter(
        user_id=1
    ).annotate(
        max_date=models.Max('created_at')
    ).extra(
        {'created_day': truncate_date}
    ).values(
        'product_id', 'created_day', 'max_date'
    ).order_by('-max_date')[:10]

Can you give that a go and let us know the outcome please?

I've tried in that way. This QuerySet generates incorrect SQL (it groups by id which is wrong and gives duplicate results).

SQL output:

SELECT (CAST(DATE_FORMAT(created_at, '%Y-%m-%d 00:00:00') AS DATETIME)) AS `created_day`, `mymodel`.`product_id`, MAX(`mymodel`.`created_at`) AS `max_date` FROM `mymodel` WHERE (`mymodel`.`user_id` = 1) GROUP BY `mymodel`.`id` ORDER BY `max_date` DESC LIMIT 10

The query which works is:

SELECT (CAST(DATE_FORMAT(created_at, '%Y-%m-%d 00:00:00') AS DATETIME)), `mymodel`.`product_id`, MAX(`mymodel`.`created_at`) AS `max_date` FROM `mymodel` WHERE (`mymodel`.`user_id` = 1) GROUP BY `mymodel`.`product_id`, (CAST(DATE_FORMAT(created_at, '%%Y-%%m-%%d 00:00:00') AS DATETIME)) ORDER BY `max_date` DESC LIMIT 10

And the question is how to write it using QuerySet? :-) I was able to that in Django 1.7.5 using query.group by, but now it throws errors.

comment:11 Changed 4 years ago by Josh Smeaton

Which database backend are you using? Is that MySQL?

comment:12 in reply to:  11 Changed 4 years ago by user0007

Replying to jarshwah:

Which database backend are you using? Is that MySQL?

Yup, MySQL.

comment:13 Changed 4 years ago by Tim Graham

Resolution: invalid
Status: newclosed

It looks like this has turned into a support ticket about how to use private APIs; see TicketClosingReasons/UseSupportChannels.

Note: See TracTickets for help on using tickets.
Back to Top