Opened 5 years ago

Closed 5 years ago

#30099 closed Bug (fixed)

Filter by Count annotated inside Subquery

Reported by: MrFus10n Owned by: Nasir Hussain
Component: Database layer (models, ORM) Version: 2.1
Severity: Normal Keywords: subquery annotate filter
Cc: Simon Charette Triage Stage: Ready for checkin
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description (last modified by MrFus10n)

I want to get authors and annotate minimal count of books in category if it is greater than three.

Book and Author models and are not connected with ForeignKey fields (this is abstract and simplified, there is a reason):

Author(models.Model):
    name = models.CharField(max_length=250)

Book(models.Model):
    author_name = models.CharField(max_length=250)
    book_category = models.CharField(max_length=250)

Here is simplest query I can get to reproduce:

(Author.objects
 .annotate(min_valuable_count=Subquery(
    Book.objects
        .filter(author_name=OuterRef('name'))
        .annotate(cnt=Count('book_category'))
        .filter(cnt__gt=3)
        .order_by('cnt')
        .values('cnt')[:1],
    output_field=models.IntegerField()
)))

And I get an error:

psycopg2.ProgrammingError: missing FROM-clause entry for table "U0"
LINE 1: ... "core_author" GROUP BY "core_author"."id", "U0"."id" ...
                                                       ^

Here is SQL:

SELECT "core_author"."id", "core_author"."name", (
    SELECT COUNT(U0."book_category") AS "cnt" 
    FROM "core_book" U0 WHERE U0."id" = ("core_author"."chat_id") 
    GROUP BY U0."id" HAVING COUNT(U0."book_category") > 3 
    ORDER BY "cnt" ASC  LIMIT 1) 
AS "min_valuable_count" 
FROM "core_author" 
GROUP BY "core_author"."id", "U0"."id"

If I remove line .filter(cnt__gt=3), last GROUP BY disappears and query stops raising error:

SELECT "core_author"."id", "core_author"."name", (
    SELECT COUNT(U0."book_category") AS "cnt" 
    FROM "core_book" U0 WHERE U0."id" = ("core_author"."chat_id") 
    GROUP BY U0."id"
    ORDER BY "cnt" ASC  LIMIT 1) 
AS "min_valuable_count" 
FROM "core_author" 

Is there any way to remove GROUP BY in outer query without removing .filter(cnt__gt=3) in subquery?
Is qs.query.group_by = None ok?

Change History (18)

comment:1 by MrFus10n, 5 years ago

Description: modified (diff)
Keywords: Subquery annotate Count filter added

comment:2 by MrFus10n, 5 years ago

Description: modified (diff)

comment:3 by MrFus10n, 5 years ago

Description: modified (diff)

comment:4 by MrFus10n, 5 years ago

Description: modified (diff)

comment:5 by MrFus10n, 5 years ago

Description: modified (diff)

comment:6 by Simon Charette, 5 years ago

Cc: Simon Charette added
Keywords: subquery added; Subquery Count removed
Triage Stage: UnreviewedAccepted

I'm pretty sure it's caused by Subquery.contains_aggregate returning True when it shouldn't; this attribute shouldn't cross the subquery boundary. Right now it defaults to BaseExpression.contains_aggregate which is based of the filters of the subquery and would explain why it's the .filter(cnt__gt=3) that triggers the issue.

Could you try setting Subquery.contains_aggregate = False and confirm it drops the outer GROUP BY?

Also, would you be interested in submitting a patch if it does fix it? It should consist of setting this attribute and adding a test in tests/expressions/tests.py that uses the existing models. Happy to walk you through a review if you are interested.

Last edited 5 years ago by Simon Charette (previous) (diff)

comment:7 by MrFus10n, 5 years ago

Thank you for response. Yes, setting contains_aggregate = False in subquery did the job. I will see your links and try to make a patch.

comment:8 by MrFus10n, 5 years ago

I ran tests before changing anything and got FAILED (failures=3, errors=3, skipped=948, expected failures=4). Is this ok?

comment:9 by Simon Charette, 5 years ago

It'd be great if you could submit a Github Pull Request with your changes but if it's not possible could you report the traceback of failures?

comment:10 by MrFus10n, 5 years ago

I didn't change anything yet. Just cloned repo, setup virtualenv and executed tests/runtests.py. I am using windows 7, python 3.5.3.

This is in the middle of testing:

Exception in thread Thread-805:
Traceback (most recent call last):
  File "C:\Python\35x32\lib\threading.py", line 914, in _bootstrap_inner
    self.run()
  File "c:\dev\projects\django\django\test\testcases.py", line 1364, in run
    connections.close_all()
  File "c:\dev\projects\django\django\db\utils.py", line 224, in close_all
    connection.close()
  File "c:\dev\projects\django\django\db\backends\sqlite3\base.py", line 237, in close
    self.validate_thread_sharing()
  File "c:\dev\projects\django\django\db\backends\base\base.py", line 531, in validate_thread_sharing
    % (self.alias, self._thread_ident, _thread.get_ident())
django.db.utils.DatabaseError: DatabaseWrapper objects created in a thread can only be used in that same thread. The object with alias 'other' was created in thread id 11452 and this is thread id 1148.

And this is after it's done:

======================================================================
ERROR: test_unicode_file_name (i18n.test_extraction.BasicExtractorTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\projects\django\tests\i18n\test_extraction.py", line 216, in test_unicode_file_name
    management.call_command('makemessages', locale=[LOCALE], verbosity=0)
  File "c:\dev\projects\django\django\core\management\__init__.py", line 148, in call_command
    return command.execute(*args, **defaults)
  File "c:\dev\projects\django\django\core\management\base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "c:\dev\projects\django\django\core\management\commands\makemessages.py", line 384, in handle
    potfiles = self.build_potfiles()
  File "c:\dev\projects\django\django\core\management\commands\makemessages.py", line 426, in build_potfiles
    self.process_files(file_list)
  File "c:\dev\projects\django\django\core\management\commands\makemessages.py", line 519, in process_files
    self.process_locale_dir(locale_dir, files)
  File "c:\dev\projects\django\django\core\management\commands\makemessages.py", line 583, in process_locale_dir
    input_files_list.write(('\n'.join(input_files)))
  File "C:\Python\35x32\lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xe9' in position 241: character maps to <undefined>

======================================================================
ERROR: test_strip_tags_files (utils_tests.test_html.TestUtilsHtml) (filename='strip_tags1.html')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\projects\django\tests\utils_tests\test_html.py", line 103, in test_strip_tags_files
    content = fp.read()
  File "C:\Python\35x32\lib\encodings\cp1251.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 33114: character maps to <undefined>

======================================================================
ERROR: test_content_saving (file_storage.tests.ContentFileStorageTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\projects\django\tests\file_storage\tests.py", line 968, in test_content_saving
    self.storage.save('unicode.txt', ContentFile("espa\xf1ol"))
  File "c:\dev\projects\django\django\core\files\storage.py", line 52, in save
    return self._save(name, content)
  File "c:\dev\projects\django\django\core\files\storage.py", line 274, in _save
    _file.write(chunk)
  File "C:\Python\35x32\lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xf1' in position 4: character maps to <undefined>

======================================================================
FAIL: test_all (admin_scripts.tests.DiffSettings)
The all option also shows settings with the default value.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\projects\django\tests\admin_scripts\tests.py", line 2247, in test_all
    self.assertNoOutput(err)
  File "C:\dev\projects\django\tests\admin_scripts\tests.py", line 189, in assertNoOutput
    self.assertEqual(len(stream), 0, "Stream should be empty: actually contains '%s'" % stream)
AssertionError: 1064 != 0 : Stream should be empty: actually contains 'Traceback (most recent call last):
  File "./manage.py", line 21, in <module>
    main()
  File "./manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "C:\dev\projects\django\django\core\management\__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "C:\dev\projects\django\django\core\management\__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "C:\dev\projects\django\django\core\management\base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "C:\dev\projects\django\django\core\management\base.py", line 373, in execute
    self.stdout.write(output)
  File "C:\dev\projects\django\django\core\management\base.py", line 145, in write
    self._out.write(style_func(msg))
  File "C:\Python\35x32\lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xe5' in position 4960: character maps to <undefined>
'

======================================================================
FAIL: test_unified_all (admin_scripts.tests.DiffSettings)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\projects\django\tests\admin_scripts\tests.py", line 2285, in test_unified_all
    self.assertNoOutput(err)
  File "C:\dev\projects\django\tests\admin_scripts\tests.py", line 189, in assertNoOutput
    self.assertEqual(len(stream), 0, "Stream should be empty: actually contains '%s'" % stream)
AssertionError: 1064 != 0 : Stream should be empty: actually contains 'Traceback (most recent call last):
  File "./manage.py", line 21, in <module>
    main()
  File "./manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "C:\dev\projects\django\django\core\management\__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "C:\dev\projects\django\django\core\management\__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "C:\dev\projects\django\django\core\management\base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "C:\dev\projects\django\django\core\management\base.py", line 373, in execute
    self.stdout.write(output)
  File "C:\dev\projects\django\django\core\management\base.py", line 145, in write
    self._out.write(style_func(msg))
  File "C:\Python\35x32\lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xe5' in position 4888: character maps to <undefined>
'

======================================================================
FAIL: test_accent (dbshell.test_postgresql_psycopg2.PostgreSqlDbshellCommandTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\projects\django\tests\dbshell\test_postgresql_psycopg2.py", line 100, in test_accent
    pgpass_string,
AssertionError: Tuples differ: (['ps[32 chars]st', '-p', '444', 'dbname'], None) != (['ps[32 chars]st', '-p', '444', 'dbname'], 'somehost:444:dbname:r\xf4le:s\xe9same')

First differing element 1:
None
'somehost:444:dbname:r\xf4le:s\xe9same'

- (['psql', '-U', 'r\xf4le', '-h', 'somehost', '-p', '444', 'dbname'], None)
?                                                                  ------

+ (['psql', '-U', 'r\xf4le', '-h', 'somehost', '-p', '444', 'dbname'],
+  'somehost:444:dbname:r\xf4le:s\xe9same')

----------------------------------------------------------------------
Ran 12577 tests in 2925.839s

FAILED (failures=3, errors=3, skipped=948, expected failures=4)
Destroying test database for alias 'default'\u2026
Destroying test database for alias 'other'\u2026

comment:11 by Simon Charette, 5 years ago

Alright, don't worry about these failures they are likely related to your Windows configuration. Do you feel comfortable submitting a Github PR with your changes?

comment:12 by Nasir Hussain, 5 years ago

Owner: changed from nobody to Nasir Hussain
Status: newassigned

comment:13 by Nasir Hussain, 5 years ago

Has patch: set

in reply to:  11 comment:14 by Nasir Hussain, 5 years ago

Replying to Simon Charette:
I've pushed a fix with tests in PR. Can you please review it?
Thanks

comment:15 by Simon Charette, 5 years ago

Patch needs improvement: set

Thanks for taking the time to submit a PR Nasir! I left a few comments for improvements on the PR, please uncheck patch needs improvement once they are addressed.

in reply to:  15 comment:16 by Nasir Hussain, 5 years ago

Patch needs improvement: unset

Replying to Simon Charette:

Thanks for taking the time to submit a PR Nasir! I left a few comments for improvements on the PR, please uncheck patch needs improvement once they are addressed.

Hi, thanks for reviewing. I've pushed the fixed according to the review. Can you please review it again?
Thanks,

comment:17 by Simon Charette, 5 years ago

Triage Stage: AcceptedReady for checkin

comment:18 by Tim Graham <timograham@…>, 5 years ago

Resolution: fixed
Status: assignedclosed

In f021c11:

Fixed #30099 -- Fixed invalid SQL when filtering a Subquery by an aggregate.

Note: See TracTickets for help on using tickets.
Back to Top