Opened 5 years ago
Closed 5 years ago
#31904 closed Bug (invalid)
Django queryset cannot call distinct() and order_by() on separate fields.
| Reported by: | Logan Knecht | Owned by: | nobody |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | 3.1 |
| Severity: | Normal | Keywords: | PostgreSQL, Django, order_by, distinct, queryset |
| Cc: | Triage Stage: | Unreviewed | |
| Has patch: | no | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description
Introduction
Hello there!
I want to start by saying thank you so much for this library! It is absolutely wonderful!
Data Model
There are Learners
There are Schools
A School has many Courses
A Course has many Lessons`
A School has many Resources
A Lesson has one Resource
Problem
I want all the Schools ordered by the most Learners that are enrolled, and I want the returned Schools to be distinct.
I have an issue where I've got to do a really complex query across multiple models in order to generate the full data of the school and return it. I have to use a heavily prefetched and select_related query to get what I need so that it's performant. When I query the School model that's returned it is duplicated multiple times.
I can de-duplicate the School by calling distinct("id").
However, now they're not in the correct order, so I want to call order_by("-learner_enrolled_count")
When I do this the query looks like this:
user_query_set = (UserModel.objects.all())
# ---
school_board_headmaster_query_set = (SchoolBoardHeadmasterModel.objects.all()
.select_related("learner",
"school_board"))
school_board_member_query_set = (SchoolBoardMemberModel.objects.all()
.select_related("learner",
"school_board"))
# ---
flash_card_set_query_set = (FlashCardSetModel.objects.all()
.select_related("school"))
flash_card_query_set = (FlashCardModel.objects.all()
.select_related("flash_card_set",
"playsheetmusicflashcardmodel",
"school",
"trueorfalseflashcardmodel"))
play_sheet_music_flash_card_query_set = (PlaySheetMusicFlashCardModel.objects.all()
.select_related("flash_card",
"school",
"sheet_music"))
true_or_false_flash_card_query_set = (TrueOrFalseFlashCardModel.objects.all()
.select_related("flash_card",
"school"))
sheet_music_query_set = (SheetMusicModel.objects.all()
.select_related("school"))
# ---
school_course_query_set = (SchoolCourseModel.objects.all()
.select_related("school"))
# ---
school_lesson_query_set = (SchoolLessonModel.objects.all()
.select_related("course",
"flash_card_set",
"school"))
# ----------------------------------------------------------------------
# Prefetch
# ----------------------------------------------------------------------
school_models = (super()
.get_queryset()
.select_related("school_board")
.prefetch_related(
# ------------------------------------------------------------------
# School Board
# ------------------------------------------------------------------
Prefetch("school_board__school_board_headmaster",
school_board_headmaster_query_set),
Prefetch("school_board__school_board_headmaster__learner__user",
user_query_set),
Prefetch("school_board__school_board_member",
school_board_member_query_set),
Prefetch("school_board__school_board_member__learner__user",
user_query_set),
# ------------------------------------------------------------------
# School Course
# ------------------------------------------------------------------
Prefetch("school_course",
school_course_query_set),
# ------------------------------------------------------------------
# School Course -> School Lesson
# ------------------------------------------------------------------
Prefetch("school_course__school_lesson",
school_lesson_query_set),
# ------------------------------------------------------------------
# School Course -> School Lesson -> Flash Card Set
# ------------------------------------------------------------------
Prefetch("school_course__school_lesson__flash_card_set",
flash_card_set_query_set),
Prefetch("school_course__school_lesson__flash_card_set__flash_card",
flash_card_query_set),
Prefetch("school_course__school_lesson__flash_card_set__flash_card__playsheetmusicflashcardmodel",
play_sheet_music_flash_card_query_set),
Prefetch("school_course__school_lesson__flash_card_set__flash_card__playsheetmusicflashcardmodel__sheet_music",
sheet_music_query_set),
Prefetch("school_course__school_lesson__flash_card_set__flash_card__trueorfalseflashcardmodel",
true_or_false_flash_card_query_set),
# ------------------------------------------------------------------
# School Resources
# ------------------------------------------------------------------
Prefetch("flashcardsetmodel_set",
flash_card_set_query_set),
# ---
Prefetch("flashcardsetmodel_set__flash_card",
flash_card_query_set),
# ---
Prefetch("flashcardsetmodel_set__flash_card__playsheetmusicflashcardmodel",
play_sheet_music_flash_card_query_set),
Prefetch("flashcardsetmodel_set__flash_card__playsheetmusicflashcardmodel__sheet_music",
sheet_music_query_set),
# ---
Prefetch("flashcardsetmodel_set__flash_card__trueorfalseflashcardmodel",
true_or_false_flash_card_query_set),
# ---
Prefetch("sheetmusicmodel_set",
sheet_music_query_set)
))
all_school_models = school_models.annotate(
learner_enrolled_count=Case(
When(learner_enrolled_school__learner_enrolled_course__learner_enrolled_lesson__is_enrolled=True,
then=1),
default=0,
output_field=IntegerField()
))
So I then call
all_school_models.distinct("id").order_by("-learner_enrolled_count")
Which then gives me this output:
Internal Server Error: /api/v1/schools/
Traceback (most recent call last):
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute
return self.cursor.execute(sql, params)
psycopg2.errors.InvalidColumnReference: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: SELECT DISTINCT ON ("piano_gym_api_schoolmodel"."id") "piano...
^
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/core/handlers/exception.py", line 34, in inner
response = get_response(request)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/core/handlers/base.py", line 115, in _get_response
response = self.process_exception_by_middleware(e, request)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/core/handlers/base.py", line 113, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
return view_func(*args, **kwargs)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/rest_framework/viewsets.py", line 114, in view
return self.dispatch(request, *args, **kwargs)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/rest_framework/views.py", line 505, in dispatch
response = self.handle_exception(exc)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/rest_framework/views.py", line 465, in handle_exception
self.raise_uncaught_exception(exc)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/rest_framework/views.py", line 476, in raise_uncaught_exception
raise exc
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/rest_framework/views.py", line 502, in dispatch
response = handler(request, *args, **kwargs)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/piano_gym_api/versions/v1/views/school_view.py", line 102, in list
request)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/piano_gym_api/versions/v1/pagination/header_link_pagination.py", line 91, in paginate_queryset
total_result_count = len(queryset)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/models/query.py", line 258, in __len__
self._fetch_all()
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/models/query.py", line 1261, in _fetch_all
self._result_cache = list(self._iterable_class(self))
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/models/query.py", line 57, in __iter__
results = compiler.execute_sql(chunked_fetch=self.chunked_fetch, chunk_size=self.chunk_size)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/models/sql/compiler.py", line 1144, in execute_sql
cursor.execute(sql, params)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/debug_toolbar/panels/sql/tracking.py", line 198, in execute
return self._record(self.cursor.execute, sql, params)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/debug_toolbar/panels/sql/tracking.py", line 133, in _record
return method(sql, params)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 100, in execute
return super().execute(sql, params)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 68, in execute
return self._execute_with_wrappers(sql, params, many=False, executor=self._execute)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 77, in _execute_with_wrappers
return executor(sql, params, many, context)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute
return self.cursor.execute(sql, params)
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/Users/hugbot/Repositories/piano-gym-api-web-server/piano_gym_back_end/venv/lib/python3.7/site-packages/django/db/backends/utils.py", line 86, in _execute
return self.cursor.execute(sql, params)
django.db.utils.ProgrammingError: SELECT DISTINCT ON expressions must match initial ORDER BY expressions
LINE 1: SELECT DISTINCT ON ("piano_gym_api_schoolmodel"."id") "piano...
^
[15/Aug/2020 06:42:27] "GET /api/v1/schools/?order_by=-learner_enrolled_count HTTP/1.1" 500 27740
Understanding
My understanding of this is that this is caused by PostgresSQL AND Django
If you read the documentation for Django it says
When you specify field names, you must provide an order_by() in the QuerySet, and the fields in order_by() must start with the fields in distinct(), in the same order.
- https://docs.djangoproject.com/en/3.0/ref/models/querysets/#django.db.models.query.QuerySet.order_by
- https://docs.djangoproject.com/en/3.0/ref/models/querysets/#distinct
So in this case the solution, if I'm lazy is just to do
all_school_models.distinct("learner_enrolled_count", "id").order_by("-learner_enrolled_count", "id")
But that doesn't return the correct information because now the results are distinct for the School models that have a distinct learner_enrolled_count AND id when I just want the distinct Schools and then now they are first ordered by learner_enrolled_count and immediately by id which is also incorrect.
Additional References
I'm struggling with this in the django-filter library and created an to issue to track it:
Conclusion
I have no idea how to achieve the ability to select distinct schools and order_by a separate column, and it doesn't seem like an unreasonable request? I have looked all over and see this issue repeated, but no work arounds provided.
Do you have guidance on how to achieve this?
This seems like an oversight of Django to not support the ability to distinctly select fields and order arbitrarily.
Change History (2)
comment:1 by , 5 years ago
| Type: | Uncategorized → Bug |
|---|
comment:2 by , 5 years ago
| Resolution: | → invalid |
|---|---|
| Status: | new → closed |
| Summary: | Django `queryset` Cannot Call `distinct()` and `order_by()` On Separate Fields When Using Postgres SQL → Django queryset cannot call distinct() and order_by() on separate fields. |
This is not a Django limitation that's how
DISTINCTworks in databases, it's also widely explained in docs.You should try to rewrite a queryset to omit duplicate rows, e.g. by using
Exists()instead ofCase()forlearner_enrolled_count.Closing per TicketClosingReasons/UseSupportChannels.