Opened 7 years ago
Closed 7 years ago
#30099 closed Bug (fixed)
Filter by Count annotated inside Subquery
| Reported by: | MrFus10n | Owned by: | Nasir Hussain | 
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 2.1 | 
| Severity: | Normal | Keywords: | subquery annotate filter | 
| Cc: | Simon Charette | Triage Stage: | Ready for checkin | 
| Has patch: | yes | Needs documentation: | no | 
| Needs tests: | no | Patch needs improvement: | no | 
| Easy pickings: | no | UI/UX: | no | 
Description (last modified by )
I want to get authors and annotate minimal count of books in category if it is greater than three.
Book and Author models and are not connected with ForeignKey fields (this is abstract and simplified, there is a reason):
Author(models.Model):
    name = models.CharField(max_length=250)
Book(models.Model):
    author_name = models.CharField(max_length=250)
    book_category = models.CharField(max_length=250)
Here is simplest query I can get to reproduce:
(Author.objects
 .annotate(min_valuable_count=Subquery(
    Book.objects
        .filter(author_name=OuterRef('name'))
        .annotate(cnt=Count('book_category'))
        .filter(cnt__gt=3)
        .order_by('cnt')
        .values('cnt')[:1],
    output_field=models.IntegerField()
)))
And I get an error:
psycopg2.ProgrammingError: missing FROM-clause entry for table "U0"
LINE 1: ... "core_author" GROUP BY "core_author"."id", "U0"."id" ...
                                                       ^
Here is SQL:
SELECT "core_author"."id", "core_author"."name", (
    SELECT COUNT(U0."book_category") AS "cnt" 
    FROM "core_book" U0 WHERE U0."id" = ("core_author"."chat_id") 
    GROUP BY U0."id" HAVING COUNT(U0."book_category") > 3 
    ORDER BY "cnt" ASC  LIMIT 1) 
AS "min_valuable_count" 
FROM "core_author" 
GROUP BY "core_author"."id", "U0"."id"
If I remove line .filter(cnt__gt=3), last GROUP BY disappears and query stops raising error:
SELECT "core_author"."id", "core_author"."name", (
    SELECT COUNT(U0."book_category") AS "cnt" 
    FROM "core_book" U0 WHERE U0."id" = ("core_author"."chat_id") 
    GROUP BY U0."id"
    ORDER BY "cnt" ASC  LIMIT 1) 
AS "min_valuable_count" 
FROM "core_author" 
Is there any way to remove GROUP BY in outer query without removing .filter(cnt__gt=3) in subquery? 
Is qs.query.group_by = None ok?
Change History (18)
comment:1 by , 7 years ago
| Description: | modified (diff) | 
|---|---|
| Keywords: | Subquery annotate Count filter added | 
comment:2 by , 7 years ago
| Description: | modified (diff) | 
|---|
comment:3 by , 7 years ago
| Description: | modified (diff) | 
|---|
comment:4 by , 7 years ago
| Description: | modified (diff) | 
|---|
comment:5 by , 7 years ago
| Description: | modified (diff) | 
|---|
comment:6 by , 7 years ago
| Cc: | added | 
|---|---|
| Keywords: | subquery added; Subquery Count removed | 
| Triage Stage: | Unreviewed → Accepted | 
comment:7 by , 7 years ago
Thank you for response. Yes, setting contains_aggregate = False in subquery did the job. I will see your links and try to make a patch.
comment:8 by , 7 years ago
I ran tests before changing anything and got FAILED (failures=3, errors=3, skipped=948, expected failures=4). Is this ok? 
comment:9 by , 7 years ago
It'd be great if you could submit a Github Pull Request with your changes but if it's not possible could you report the traceback of failures?
comment:10 by , 7 years ago
I didn't change anything yet. Just cloned repo, setup virtualenv and executed tests/runtests.py. I am using windows 7, python 3.5.3.
This is in the middle of testing:
Exception in thread Thread-805:
Traceback (most recent call last):
  File "C:\Python\35x32\lib\threading.py", line 914, in _bootstrap_inner
    self.run()
  File "c:\dev\projects\django\django\test\testcases.py", line 1364, in run
    connections.close_all()
  File "c:\dev\projects\django\django\db\utils.py", line 224, in close_all
    connection.close()
  File "c:\dev\projects\django\django\db\backends\sqlite3\base.py", line 237, in close
    self.validate_thread_sharing()
  File "c:\dev\projects\django\django\db\backends\base\base.py", line 531, in validate_thread_sharing
    % (self.alias, self._thread_ident, _thread.get_ident())
django.db.utils.DatabaseError: DatabaseWrapper objects created in a thread can only be used in that same thread. The object with alias 'other' was created in thread id 11452 and this is thread id 1148.
And this is after it's done:
======================================================================
ERROR: test_unicode_file_name (i18n.test_extraction.BasicExtractorTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\projects\django\tests\i18n\test_extraction.py", line 216, in test_unicode_file_name
    management.call_command('makemessages', locale=[LOCALE], verbosity=0)
  File "c:\dev\projects\django\django\core\management\__init__.py", line 148, in call_command
    return command.execute(*args, **defaults)
  File "c:\dev\projects\django\django\core\management\base.py", line 364, in execute
    output = self.handle(*args, **options)
  File "c:\dev\projects\django\django\core\management\commands\makemessages.py", line 384, in handle
    potfiles = self.build_potfiles()
  File "c:\dev\projects\django\django\core\management\commands\makemessages.py", line 426, in build_potfiles
    self.process_files(file_list)
  File "c:\dev\projects\django\django\core\management\commands\makemessages.py", line 519, in process_files
    self.process_locale_dir(locale_dir, files)
  File "c:\dev\projects\django\django\core\management\commands\makemessages.py", line 583, in process_locale_dir
    input_files_list.write(('\n'.join(input_files)))
  File "C:\Python\35x32\lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xe9' in position 241: character maps to <undefined>
======================================================================
ERROR: test_strip_tags_files (utils_tests.test_html.TestUtilsHtml) (filename='strip_tags1.html')
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\projects\django\tests\utils_tests\test_html.py", line 103, in test_strip_tags_files
    content = fp.read()
  File "C:\Python\35x32\lib\encodings\cp1251.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 33114: character maps to <undefined>
======================================================================
ERROR: test_content_saving (file_storage.tests.ContentFileStorageTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\projects\django\tests\file_storage\tests.py", line 968, in test_content_saving
    self.storage.save('unicode.txt', ContentFile("espa\xf1ol"))
  File "c:\dev\projects\django\django\core\files\storage.py", line 52, in save
    return self._save(name, content)
  File "c:\dev\projects\django\django\core\files\storage.py", line 274, in _save
    _file.write(chunk)
  File "C:\Python\35x32\lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xf1' in position 4: character maps to <undefined>
======================================================================
FAIL: test_all (admin_scripts.tests.DiffSettings)
The all option also shows settings with the default value.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\projects\django\tests\admin_scripts\tests.py", line 2247, in test_all
    self.assertNoOutput(err)
  File "C:\dev\projects\django\tests\admin_scripts\tests.py", line 189, in assertNoOutput
    self.assertEqual(len(stream), 0, "Stream should be empty: actually contains '%s'" % stream)
AssertionError: 1064 != 0 : Stream should be empty: actually contains 'Traceback (most recent call last):
  File "./manage.py", line 21, in <module>
    main()
  File "./manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "C:\dev\projects\django\django\core\management\__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "C:\dev\projects\django\django\core\management\__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "C:\dev\projects\django\django\core\management\base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "C:\dev\projects\django\django\core\management\base.py", line 373, in execute
    self.stdout.write(output)
  File "C:\dev\projects\django\django\core\management\base.py", line 145, in write
    self._out.write(style_func(msg))
  File "C:\Python\35x32\lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xe5' in position 4960: character maps to <undefined>
'
======================================================================
FAIL: test_unified_all (admin_scripts.tests.DiffSettings)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\projects\django\tests\admin_scripts\tests.py", line 2285, in test_unified_all
    self.assertNoOutput(err)
  File "C:\dev\projects\django\tests\admin_scripts\tests.py", line 189, in assertNoOutput
    self.assertEqual(len(stream), 0, "Stream should be empty: actually contains '%s'" % stream)
AssertionError: 1064 != 0 : Stream should be empty: actually contains 'Traceback (most recent call last):
  File "./manage.py", line 21, in <module>
    main()
  File "./manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "C:\dev\projects\django\django\core\management\__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "C:\dev\projects\django\django\core\management\__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "C:\dev\projects\django\django\core\management\base.py", line 323, in run_from_argv
    self.execute(*args, **cmd_options)
  File "C:\dev\projects\django\django\core\management\base.py", line 373, in execute
    self.stdout.write(output)
  File "C:\dev\projects\django\django\core\management\base.py", line 145, in write
    self._out.write(style_func(msg))
  File "C:\Python\35x32\lib\encodings\cp1251.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\xe5' in position 4888: character maps to <undefined>
'
======================================================================
FAIL: test_accent (dbshell.test_postgresql_psycopg2.PostgreSqlDbshellCommandTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\dev\projects\django\tests\dbshell\test_postgresql_psycopg2.py", line 100, in test_accent
    pgpass_string,
AssertionError: Tuples differ: (['ps[32 chars]st', '-p', '444', 'dbname'], None) != (['ps[32 chars]st', '-p', '444', 'dbname'], 'somehost:444:dbname:r\xf4le:s\xe9same')
First differing element 1:
None
'somehost:444:dbname:r\xf4le:s\xe9same'
- (['psql', '-U', 'r\xf4le', '-h', 'somehost', '-p', '444', 'dbname'], None)
?                                                                  ------
+ (['psql', '-U', 'r\xf4le', '-h', 'somehost', '-p', '444', 'dbname'],
+  'somehost:444:dbname:r\xf4le:s\xe9same')
----------------------------------------------------------------------
Ran 12577 tests in 2925.839s
FAILED (failures=3, errors=3, skipped=948, expected failures=4)
Destroying test database for alias 'default'\u2026
Destroying test database for alias 'other'\u2026
follow-up: 14 comment:11 by , 7 years ago
Alright, don't worry about these failures they are likely related to your Windows configuration. Do you feel comfortable submitting a Github PR with your changes?
comment:12 by , 7 years ago
| Owner: | changed from to | 
|---|---|
| Status: | new → assigned | 
comment:14 by , 7 years ago
Replying to Simon Charette:
I've pushed a fix with tests in PR. Can you please review it?
Thanks
follow-up: 16 comment:15 by , 7 years ago
| Patch needs improvement: | set | 
|---|
Thanks for taking the time to submit a PR Nasir! I left a few comments for improvements on the PR, please uncheck patch needs improvement once they are addressed.
comment:16 by , 7 years ago
| Patch needs improvement: | unset | 
|---|
Replying to Simon Charette:
Thanks for taking the time to submit a PR Nasir! I left a few comments for improvements on the PR, please uncheck patch needs improvement once they are addressed.
Hi, thanks for reviewing. I've pushed the fixed according to the review. Can you please review it again? 
Thanks,
comment:17 by , 7 years ago
| Triage Stage: | Accepted → Ready for checkin | 
|---|
I'm pretty sure it's caused by
Subquery.contains_aggregatereturningTruewhen it shouldn't; this attribute shouldn't cross the subquery boundary. Right now it defaults toBaseExpression.contains_aggregatewhich is based of the filters of the subquery and would explain why it's the.filter(cnt__gt=3)that triggers the issue.Could you try setting
Subquery.contains_aggregate = Falseand confirm it drops the outerGROUP BY?Also, would you be interested in submitting a patch if it does fixes it? It should consist of setting this attribute and adding a test in tests/expressions/tests.py that uses the existing models. Happy to walk you through a review if you are interested.