#34091 closed Bug (wontfix)
Invalid SQL: FROM clauses can be omitted when QuerySet is accessed from multiple threads
Reported by: | Marti Raudsepp | Owned by: | nobody |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | 4.1 |
Severity: | Normal | Keywords: | race-condition, orm, threading, SQLCompiler |
Cc: | Triage Stage: | Unreviewed | |
Has patch: | no | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | no |
Easy pickings: | no | UI/UX: | no |
Description
I am experiencing very rare cases, where Django generates invalid SQL for select_related()
tables. This occurs in a place where multiple threads are trying to execute the same QuerySet at the same time. (The access from multiple threads was unintentional, but I believe this deserves to be fixed in Django anyway)
When this occurs, tables/joins in the FROM clause can be missing entirely, but they are still referenced in the SELECT clause, and the query fails, for example sqlite3.OperationalError: no such column: app_fk18.id
or ORA-00904: "xxx"."xxx": invalid identifier
I could not reproduce this with vanilla Django, but after patching Django to add time.sleep(0)
in SQLCompiler.compile()
, I can reliably reproduce this.
The reproducer and longer explanation is here: https://github.com/intgr/bug-reports/tree/main/django-query-race-condition
I suspect some mutation of the Query
object is going on during SQL compilation, but by efforts to narrow down the mutation have been unsuccessful so far.
Change History (4)
comment:1 by , 2 years ago
comment:2 by , 2 years ago
Resolution: | → wontfix |
---|---|
Status: | new → closed |
comment:3 by , 2 years ago
Django doesn't provide any thread safety guarantees for ORM objects (model instances, querysets) and it's your responsibility
Do document this! Most Django APIs seem to be thread-safe, how is a user supposed to know, where the limits of thread (un)safety are?
make sure they create copies of the references before manipulating them.
So QuerySet.all()
is thread-safe, but iterating over QuerySet is not? How is one supposed to know?
Before investigating this, I explicitly searched to figure out whether QuerySets are thread safe, the only result that was sort-of authoritative was https://code.djangoproject.com/ticket/11906 and since it's marked as "fixed", it looked like this API aims to support thread safety.
comment:4 by , 2 years ago
Do document this! Most Django APIs seem to be thread-safe, how is a user supposed to know, where the limits of thread (un)safety are?
I was under the impression that this was formally documented but just like multi-processing gotchas when dealing with connections it doesn't seem to be the case.
So QuerySet.all() is thread-safe, but iterating over QuerySet is not? How is one supposed to know?
Accessing shared_qs: QuerySet
in a thread and calling thread_qs = shared_qs.all()
creates a copy of the queryset as document hence threads don't share the same instance.
he only result that was sort-of authoritative was #11906 and since it's marked as "fixed", it looked like this API aims to support thread safety.
I again don't agree with your conclusion here. If you read the ticket's evolution it was marked as fixed because QuerySet._fill_cache
was removed during a refactor, not because thread safety guarantees for ORM objects changed. There's even a comment in there that points to a page that is now long gone and stated
However, QuerySets are known not to be thread-safe, see #11906. Usually that does not pose problems as they are (or should be) not shared between threads in Django. The exception to that rule is the use of exotic global/class-level/shared instance variable querysets in your own code (e.g. when using the ORM outside of the Django dispatch system), where you are assumed to know what you are doing and protect them appropriately anyway.
Thank you for your report and investigation but I disagree with your conclusion.
Django doesn't provide any thread safety guarantees for ORM objects (model instances, querysets) and it's your responsibility to ensure that copies of these objects are used in threads instead of sharing them. That's the reason why thread safe built-in abstractions that interact with the ORM and share references to objects (e.g. generic model views) make sure they create copies of the references before manipulating them.
Even if we were to have the compiler make a copy of its query before altering its alias counts and then setting it back it's likely only one of the locations that make query compilation non-thread safe.
We can't accept this ticket without committing to making the ORM entirely thread-safe with the performance and complexity penalties it incurs. If you disagree with my conclusion feel free to open a thread on the developers mailing list to discuss the subject further.