Opened 2 months ago
Last modified 38 hours ago
#36912 assigned Cleanup/optimization
Q.create() doesn't validate connectors as Q.__init__() does
| Reported by: | Jacob Walls | Owned by: | Anna Makarudze |
|---|---|---|---|
| Component: | Database layer (models, ORM) | Version: | dev |
| Severity: | Normal | Keywords: | _connector security |
| Cc: | Triage Stage: | Accepted | |
| Has patch: | yes | Needs documentation: | no |
| Needs tests: | no | Patch needs improvement: | no |
| Easy pickings: | no | UI/UX: | no |
Description (last modified by )
In 98e642c69181c942d60a10ca0085d48c6b3068bb, we mitigated a SQL injection vector for user-controlled arguments to filter() and friends (CVE-2025-64459) by adding validation for the _connector argument.
We deliberately avoided adding the same validation to Q.create(), because Q.create is an undocumented internal not to be used with user-controlled field names and was created specifically for the purpose of speed.
The Security Team then received more than one report extrapolating from CVE-2025-64459, suggesting that Q.create was missing the same validation.
Although we don't consider this a security vulnerability, we would be interested to evaluate if adding validation to Q.create to match Q.__init__ would be cheap enough to implement. A concern is how many times Q.create might be called in loops. Thus, one part of the solution here might be a refactor to reduce the number of times Q.create is called in loops in Django's code:
in contrib.contenttypes:
https://github.com/django/django/blob/13299a6203f4bc3e5b2552c96a51ff2b15da3c43/django/contrib/contenttypes/fields.py#L682-L695
in the deletion Collector:
https://github.com/django/django/blob/13299a6203f4bc3e5b2552c96a51ff2b15da3c43/django/db/models/deletion.py#L357-L358
...others?
Tentatively passing over to Djangonaut Space navigators to see if a good fit for anyone (git archaeology, benchmarking, security, ORM internals)
Change History (8)
comment:1 by , 2 months ago
| Description: | modified (diff) |
|---|
comment:2 by , 2 months ago
| Description: | modified (diff) |
|---|
comment:3 by , 2 months ago
| Triage Stage: | Unreviewed → Accepted |
|---|
comment:4 by , 5 weeks ago
Hey, I am Anna. I would like to work on this as part of the Djangonaut 6 Session (Team Saturn).
comment:6 by , 2 weeks ago
Hey Jacob, just an update.
I was having a discussion with Tim Schilling, and he suggested that I investigate the following:
Is the original reason for Q.create() still valid? The docstring on tree.Node.create() indicates the signatures were different between Q.init() and Q.create() hence the reason for the extra method. It looks like they are slightly different, but the biggest difference is in the unpacking and the sort() call.
Is there a performance difference if we swap out all Q.create() usages with Q() instead. This could be determined by creating a commit to Django that uses that (confirm the tests pass), then using that with django-asv to compare performance
What is the performance impact of calling [*args, *sorted(kwargs.items())] in Q.init()? This could be calculated in a script. This is where AI can be helpful. I used Claude to generate a script that provided a framework to tweak options to better understand the impact of that: https://claude.ai/share/d7af3307-58c8-4c91-af82-0d30d0571dfc
I investigated the 2nd scenario only with django-asv and the benchmarks indicated better performance for Q() than Q.create(). I didn't go ahead with creating a commit to Django using that method. I will be investigating 1 and 3 this week. Any thoughts or insights on how to tackle this ticket?
comment:7 by , 12 days ago
Hi Anna, thanks for these details. I agree the unpacking is the main difference, and that was the main rationale for switching to Q.create() more in 9dff316be41847c505ebf397e4a52a0a71e0acc4. From that commit message:
In addition, we were often needing to unpack iterables into *args and can instead pass a list direct to Node.create().
In other words, we already had an iterable, so we were able to micro-optimize out the unpacking by using Q.create().
Maybe we could have our cake and eat it too by just overriding create() on Q to do the necessary checking of valid connectors. This would allow us to get the desired checking without the (possibly) undesired cost of unpacking and sorting/repacking.
I investigated the 2nd scenario only with django-asv and the benchmarks indicated better performance for Q() than Q.create().
(At first glance, this seems surprising to me -- it's worth a second look to ensure your bench is fully isolating what you wanted to measure.)
Thanks!
comment:8 by , 38 hours ago
| Has patch: | set |
|---|
I have two pull requests open, one for Django here https://github.com/django/django/pull/21022 and another for django-asv here https://github.com/django/django-asv/pull/98. I am familiar with benchmarks so hopefully I got them right this time.
Thanks!