Opened 9 years ago
Last modified 5 days ago
#25705 assigned Cleanup/optimization
Parameters are not adapted or quoted in Query.__str__
Reported by: | Dmitry Dygalo | Owned by: | Alex |
---|---|---|---|
Component: | Database layer (models, ORM) | Version: | dev |
Severity: | Normal | Keywords: | |
Cc: | Petr Přikryl, Ülgen Sarıkavak, Simon Charette, Mariusz Felisiak | Triage Stage: | Accepted |
Has patch: | yes | Needs documentation: | no |
Needs tests: | no | Patch needs improvement: | yes |
Easy pickings: | no | UI/UX: | no |
Description
Now it is just string interpolation of the SQL template with parameters and in most cases produces invalid queries for the following reasons:
- No quoting
- No adaptation. So, some python objects will be used as is, not like their SQL equivalents
Yes, there are situations, when output of Query.__str__
is equal to actual query. But for debugging reasons, it will be better to see real query here. Also it is logical and expected behavior of this method - to show actual query.
Attachments (1)
Change History (25)
comment:1 by , 9 years ago
Has patch: | set |
---|---|
Needs tests: | set |
Owner: | changed from | to
Patch needs improvement: | set |
Status: | new → assigned |
by , 9 years ago
Attachment: | ticket25705.patch added |
---|
comment:2 by , 9 years ago
Commit: https://github.com/Stranger6667/django/commit/b2e36668a6877fe29923493e5669402dd30d6421
This issue is more general case of #24991
comment:3 by , 9 years ago
comment:4 by , 9 years ago
Triage Stage: | Unreviewed → Accepted |
---|
comment:5 by , 6 years ago
Replying to Dmitry Dygalo:
Now it is just string interpolation of the SQL template with parameters and in most cases produces invalid queries for the following reasons:
- No quoting
- No adaptation. So, some python objects will be used as is, not like their SQL equivalents
Yes, there are situations, when output of
Query.__str__
is equal to actual query. But for debugging reasons, it will be better to see real query here. Also it is logical and expected behavior of this method - to show actual query.
Fascinating. I inadvertantly filed a duplicate of this it seems:
https://code.djangoproject.com/ticket/30132#ticket
and I would dearly love this fixed. In essence Query.__str__
should return exactly what is logged as per here:
so we can reliably see and extract teh SQL of a query in a production environment in which DEBUG is not enabled!
I would gladly fix this and PR it, if I had the skills and I am close to that, as I am coding and debugging python, but single stepping into Query.__str__
didn't find me a quick easy answer and I bailed for now. So if you have any tips as to where the code is that Query.__str__
uses to generate SQL and where the code is that logs SQL to connection.queries
I canbegin to look at it consider how this might be fixed.
It seems to me an experience Django coder could fix this in minutes and that there should be a regression test for this kind of code:
qs=model.objects.somequeryset sql=str(qs.query) raw_qs=model.objects.raw(sql)
I have a test bed in which I confirmed this failure here:
Let me know how I might help get this fixed sooner, given ti was reported 3 years ago and is still not fixed!
comment:7 by , 4 years ago
Cc: | added |
---|
comment:8 by , 3 years ago
comment:9 by , 3 years ago
Owner: | removed |
---|---|
Status: | assigned → new |
comment:10 by , 9 months ago
Cc: | added |
---|
comment:11 by , 6 months ago
Owner: | set to |
---|---|
Status: | new → assigned |
comment:12 by , 6 months ago
I've done some investigation.
The main issue comes to the Python DB API doesn't have a way to do this. The only way to see the query with the parameters binded correctly is after executing it.
As Mariusz commented here, only Mysql/MariaDB and Postgres have a way to do it via a mogrify
function which is their own extension to the API.
Django is already using the postgres mogrify in its own compose_sql function in the search backend, schema queries and ensuring the role of the connection
In Oracle and SQLite, none of the extensions to the API they add allows do this.
I see two approaches:
- We fix this issue for the first 3 backends and leave it as it is in Oracle and SQLite.
- Use the mogrify function in the first three backends, and manually quote the parameters in the other two. Something similar was already attempted in this PR and it was rejected. The amount of effort needed to implement and maintain this, example on how cPython does it for SQLite, would probably be too much since this seems to be the only use case.
follow-up: 15 comment:13 by , 5 months ago
My concerns with fixing this issue related to comments such as comment:5
It seems to me an experience Django coder could fix this in minutes and that there should be a regression test for this kind of code:
qs=model.objects.somequeryset sql=str(qs.query) raw_qs=model.objects.raw(sql)
We absolutely don't want to support this pattern in a context where we can't guarantee that the proper quoting is performed on all supported backends as that might result in SQL injection problems. In this sense I think that it's a good thing that sql.Query.__str__
doesn't attempt to perform the proper quoting to make it clear it should not be used for this purpose.
I'd much rather see us document sql.Query.sql_with_params(using: str = DEFAULT_DB_ALIAS)
which could be used as
sql, params = qs.query.sql_with_params() model.objects.raw(sql, params)
over spending time trying to dangerously emulate parameters quoting.
comment:14 by , 5 months ago
@Simon do you think we should close this ticket as wontfix
.
I can recall 2 or 3 (or 4?) occasions since say DjangoCon US 2018 where a (new) contributor has tried to pick this up, only to hit the same wall. The trail of breadcrumbs gets slightly clearer each time, but, is there a realistic fix for this issue, and if not can we not say so?
comment:15 by , 5 months ago
Replying to Simon Charette:
My concerns with fixing this issue related to comments such as comment:5
It seems to me an experience Django coder could fix this in minutes and that there should be a regression test for this kind of code:
qs=model.objects.somequeryset sql=str(qs.query) raw_qs=model.objects.raw(sql)We absolutely don't want to support this pattern in a context where we can't guarantee that the proper quoting is performed on all supported backends as that might result in SQL injection problems. In this sense I think that it's a good thing that
sql.Query.__str__
doesn't attempt to perform the proper quoting to make it clear it should not be used for this purpose.
I'm not sure I understand your point. I think the original commenter was suggesting that just as a simple (and probably not ideal) way to do the test. I wouldn't consider someone doing str(qs.query)
to then pass in to raw()
(which the docs already warn of SQL injections) a real use case that anyone would do.
Personally, the use case I've had with this issue is printing the query to then try to format it and maybe execute it in a sql editor connected that my database while testing stuff in local. For that case your suggestion of sql_with_params wouldn't cover it. I'm not against having that method either, but I don't see it as a replacement of a correctstr(qs.query)
.
Also my recommended choice was not trying to emulate parameter quoting, but to use the database library mogrify
method in 3 of the 5 supported backends.
follow-up: 17 comment:16 by , 5 months ago
Cc: | added |
---|
@Carlton I think the request for this feature is a symptom of a different problem so maybe we could re-purpose this ticket instead?
@Alex
I wouldn't consider someone doing str(qs.query) to then pass in to raw() (which the docs already warn of SQL injections) a real use case that anyone would do.
I don't agree on that point. From helping users through various forums in most cases the reason why they were bringing up the issue of improper quoting was when they tried executing the query as is. The duplicates of this ticket seem to support this position (see #17741 and #25092 SO) and I don't think would be hard to find other examples.
It might not be your personal use case as you are well versed into why this a bad idea but I think it would be irresponsible to our user base from an API design (newcomers footgun) and maintenance burden perspective (think of the security and bug reports about mogrify
not producing the exact same SQL as backend escaping normally does).
Also my recommended choice was not trying to emulate parameter quoting, but to use the database library mogrify method in 3 of the 5 supported backends.
You second alternative mentioned
Use the mogrify function in the first three backends, and manually quote the parameters in the other two.
I think that would qualify as a form of emulation or best-effort wouldn't it? What guarantees do we have that the quoting is appropriate for all datatypes that these backends support as parameters? Are we feeling confident, knowing that users will attempt to pipe str(qs.query)
into raw
and that Query.__str__
always use DEFAULT_DB_ALIAS
for compilation #25947 (even if the proper backend can often only be determined as execution time) that we are exposing the right tools to users and we can commit to them being safe for the years to come?
IMO the usage of raw(str(qs.query))
, and the main motive for this ticket, is a symptom of a lack of documented way for safely building and executing the SQL and parameters from a QuerySet
object which makes me believe the focus should be on documenting queryset.qs.sql_with_params()
first instead.
As for the printing and copy-pasting into a shell use case we've reached an agreement that the only way to see the query with the parameters binded correctly is after executing it. Knowing that, and the fact Query.__str__
is still pretty legible even without the exact mogrifying that only leaves the copy-pasting into a shell use case. Should we make it easier to retrieve the exact SQL associated with a particular queryset instead? Something like QuerySet.executed_query: str | None
that is only present on evaluated querysets.
These appears like solutions that prevent misuse of Query.__str__
and avoid maintaining correct and safe x-backend mogrifying solutions?
comment:17 by , 5 months ago
Replying to Simon Charette:
I wouldn't consider someone doing str(qs.query) to then pass in to raw() (which the docs already warn of SQL injections) a real use case that anyone would do.
I don't agree on that point. From helping users through various forums in most cases the reason why they were bringing up the issue of improper quoting was when they tried executing the query as is. The duplicates of this ticket seem to support this position (see #17741 and #25092 SO) and I don't think would be hard to find other examples.
If the worry is raw(str(qs.query))
, then not having the quoting is a security issue.
In fact I can actually do an SQL injection abusing this lack of quoting.
qs = User.objects.filter(first_name='"test")--', is_staff=True).only('id') >>> <QuerySet []> list(User.objects.raw(str(qs.query))) >>> [<User: test>] list(User.objects.raw(str(qs.query))) >>> 'SELECT "auth_user"."id" FROM "auth_user" WHERE ("auth_user"."first_name" = "test")-- AND "auth_user"."is_staff")'
With proper quoting, as the one done when executing normally, the query is SELECT "auth_user"."id" FROM "auth_user" WHERE ("auth_user"."first_name" = \'"test")--\' AND "auth_user"."is_staff")
So if raw(str(qs.query))
is a risk, then quoting the parameters would fix it.
Also my recommended choice was not trying to emulate parameter quoting, but to use the database library mogrify method in 3 of the 5 supported backends.
You second alternative mentioned
Use the mogrify function in the first three backends, and manually quote the parameters in the other two.
I think that would qualify as a form of emulation or best-effort wouldn't it? What guarantees do we have that the quoting is appropriate for all datatypes that these backends support as parameters? Are we feeling confident, knowing that users will attempt to pipe
str(qs.query)
intoraw
and thatQuery.__str__
always useDEFAULT_DB_ALIAS
for compilation #25947 (even if the proper backend can often only be determined as execution time) that we are exposing the right tools to users and we can commit to them being safe for the years to come?
And as I said just after that, that alternative was rejected in the past, would be too much effort and I was only mentioning it as the only way to fix it in every backend. I should have been clearer that I wasn't pushing for this option.
IMO the usage of
raw(str(qs.query))
, and the main motive for this ticket, is a symptom of a lack of documented way for safely building and executing the SQL and parameters from aQuerySet
object which makes me believe the focus should be on documentingqueryset.qs.sql_with_params()
first instead.
That idea was rejected 14 months ago, but it could be reconsidered in light of this discussion.
Also, I didn't actually find qs.query
being documented, mostly this mention about pickling querysets.
As for the printing and copy-pasting into a shell use case we've reached an agreement that the only way to see the query with the parameters binded correctly is after executing it. Knowing that, and the fact
Query.__str__
is still pretty legible even without the exact mogrifying that only leaves the copy-pasting into a shell use case. Should we make it easier to retrieve the exact SQL associated with a particular queryset instead? Something likeQuerySet.executed_query: str | None
that is only present on evaluated querysets.
I would be happy with this option. An alternative would be an error if the queryset hasn't been evaluated instead of None. But I'm not that big of a fan of the error idea.
These appears like solutions that prevent misuse of
Query.__str__
and avoid maintaining correct and safe x-backend mogrifying solutions?
However, my initial investigation was not deep enough and I found more reasons to discard the mogrify option:
- In mysqlclient, mogrify was added in version 2.2.0 and django supports 1.4.3 onwards, so there would have to be an additional check and a difference in behavior.
- In the MySQL Connector/Python driver I couldn't find a mogrify method. Not the recommended driver, but still supported.
- In psicopg3,
mogrify
is only on the ClientCursor class. While that's the default in Django, server side is also supported. Again, an additional check. Also, I'm not sure if this means that these usages of mogrify I mentioned in the postgres backend don't work with server side parameter binding.
Django is already using the postgres mogrify in its own compose_sql function in the search backend, schema queries and ensuring the role of the connection
My proposal is to close this as a wontfix and work on the QuerySet.executed_query
idea and maybe reconsider documenting sql_with_params
I'm happy to do both as long as I don't get yelled for reopenniing the wonfix sql_with_params
ticket #34636
comment:18 by , 5 months ago
Cc: | added |
---|
So if raw(str(qs.query)) is a risk, then quoting the parameters would fix it.
I was not arguing that it is not possible to cause SQL injection today by using raw(str(qs.query))
but that the moment we do fix it by quoting parameters we must ensure to safely support this anti-pattern going forward unless we document sql_with_params
as the blessed raw of doing that.
And as I said just after that, that alternative was rejected in the past, would be too much effort and I was only mentioning it as the only way to fix it in every backend. I should have been clearer that I wasn't pushing for this option.
Sorry I missed that, I thought you were pushing for this option.
That idea was rejected 14 months ago, but it could be reconsidered in light of this discussion.
I wasn't aware that documenting sql_with_params
was rejected in #34636. I do think we should reconsider.
I would be happy with this option. An alternative would be an error if the queryset hasn't been evaluated instead of None. But I'm not that big of a fan of the error idea.
I don't have a strong opinion on the subject, either works for me!
My proposal is to close this as a wontfix and work on the QuerySet.executed_query idea and maybe reconsider documenting sql_with_params. I'm happy to do both as long as I don't get yelled for reopenniing the wonfix sql_with_params ticket #34636.
I'm happy to support you towards re-opening #34636 but we should likely follow the normal process of gathering a bit more consensus. It'd be great to hear from Mariusz and others what their thoughts are on the matter.
follow-up: 20 comment:19 by , 5 months ago
FWIW I’ve wanted sql_with_params() to be public for other reasons, so I’d be keen there is there are no blockers
comment:20 by , 5 months ago
Replying to Carlton Gibson:
FWIW I’ve wanted sql_with_params() to be public for other reasons, so I’d be keen there is there are no blockers
I don't mind documenting sql_with_params()
, but would like to avoid encouraging users to use raw SQL queries. Also, sql_with_params()
is not a panacea, it has it's limitation e.g. it always uses the default database connection. For most users it may be tricky to include parameters into an SQL string which can lead to SQL injection vectors. IMO, sql_with_params()
is only for advanced users, who know the risks.
comment:21 by , 5 months ago
Thank you for chiming in Mariusz.
I don't mind documenting
sql_with_params()
, but would like to avoid encouraging users to use raw SQL queries.. [snip]
For most users it may be tricky to include parameters into an SQL string which can lead to SQL injection vectors. IMO, sql_with_params() is only for advanced users, who know the risks.
I share this sentiment but I believe that there will always be a need for an escape hatch and if we don't provide a safe one users will follow the path of least resistance that sql.Query.__str__
provides. My hope is that with the documentation of sql_with_params
we could even consider disallowing passing str(qs.query)
to raw
and cursor.execute
completely to prevent its misuse. Something like
-
django/db/backends/utils.py
diff --git a/django/db/backends/utils.py b/django/db/backends/utils.py index ab0ea8258b..9874b362b3 100644
a b 9 9 10 10 from django.apps import apps 11 11 from django.db import NotSupportedError 12 from django.db.utils import QueryStr 12 13 from django.utils.dateparse import parse_time 13 14 14 15 logger = logging.getLogger("django.db.backends") … … def _execute_with_wrappers(self, sql, params, many, executor): 94 95 def _execute(self, sql, params, *ignored_wrapper_args): 95 96 # Raise a warning during app initialization (stored_app_configs is only 96 97 # ever set during testing). 98 if isinstance(sql, QueryStr): 99 raise TypeError("str(qs.query) cannot be passed directly to execute(sql).") 97 100 if not apps.ready and not apps.stored_app_configs: 98 101 warnings.warn(self.APPS_NOT_READY_WARNING_MSG, category=RuntimeWarning) 99 102 self.db.validate_no_broken_transaction() … … def _execute(self, sql, params, *ignored_wrapper_args): 107 110 def _executemany(self, sql, param_list, *ignored_wrapper_args): 108 111 # Raise a warning during app initialization (stored_app_configs is only 109 112 # ever set during testing). 113 if isinstance(sql, QueryStr): 114 raise TypeError("str(qs.query) cannot be passed directly to execute(sql).") 110 115 if not apps.ready and not apps.stored_app_configs: 111 116 warnings.warn(self.APPS_NOT_READY_WARNING_MSG, category=RuntimeWarning) 112 117 self.db.validate_no_broken_transaction() -
django/db/models/sql/query.py
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py index f00eb1e5a5..9e7cc03527 100644
a b 18 18 19 19 from django.core.exceptions import FieldDoesNotExist, FieldError 20 20 from django.db import DEFAULT_DB_ALIAS, NotSupportedError, connections 21 from django.db.utils import QueryStr 21 22 from django.db.models.aggregates import Count 22 23 from django.db.models.constants import LOOKUP_SEP 23 24 from django.db.models.expressions import ( … … class RawQuery: 145 146 """A single raw SQL query.""" 146 147 147 148 def __init__(self, sql, using, params=()): 149 if isinstance(sql, QueryStr): 150 raise TypeError("Query strings cannot be safely used by RawQuery. Use sql_with_params().") 148 151 self.params = params 149 152 self.sql = sql 150 153 self.using = using … … def params_type(self): 191 194 192 195 def __str__(self): 193 196 if self.params_type is None: 194 return self.sql195 return self.sql % self.params_type(self.params)197 return QueryStr(self.sql) 198 return QueryStr(self.sql % self.params_type(self.params)) 196 199 197 200 def _execute_query(self): 198 201 connection = connections[self.using] … … def __str__(self): 340 343 done by the database interface at execution time. 341 344 """ 342 345 sql, params = self.sql_with_params() 343 return sql % params346 return QueryStr(sql % params) 344 347 345 348 def sql_with_params(self): 346 349 """ -
django/db/utils.py
diff --git a/django/db/utils.py b/django/db/utils.py index e45f1db249..a5f821cee7 100644
a b def get_migratable_models(self, app_config, db, include_auto_created=False): 276 276 """Return app models allowed to be migrated on provided db.""" 277 277 models = app_config.get_models(include_auto_created=include_auto_created) 278 278 return [model for model in models if self.allow_migrate_model(db, model)] 279 280 281 class QueryStr(str): 282 """ 283 String representation of a SQL query with interpolated params that is 284 unsafe to pass to the database in raw form. 285 """ 286 def __str__(self): 287 return self
Also, sql_with_params() is not a panacea, it has it's limitation e.g. it always uses the default database connection.
I don't think this is a big issue, the above discussions suggest allowing alias
to be passed
-
django/db/models/sql/query.py
diff --git a/django/db/models/sql/query.py b/django/db/models/sql/query.py index 9e7cc03527..f952c1bb86 100644
a b def __str__(self): 345 345 sql, params = self.sql_with_params() 346 346 return QueryStr(sql % params) 347 347 348 def sql_with_params(self ):348 def sql_with_params(self, alias=DEFAULT_DB_ALIAS): 349 349 """ 350 350 Return the query as an SQL string and the parameters that will be 351 351 substituted into the query. 352 352 """ 353 return self.get_compiler( DEFAULT_DB_ALIAS).as_sql()353 return self.get_compiler(alias).as_sql() 354 354 355 355 def __deepcopy__(self, memo): 356 356 """Limit the amount of work when a Query is deepcopied."""
With your approval I'll re-open #34636, Alex feel free to pick it up.
comment:22 by , 4 months ago
Added a draft PR https://github.com/django/django/pull/18468 copying the description for posterity and easy access.
No tests done yet. I'll like to know first if the approach I did is valid.
Since the DB connections are per-thread as far as I know, this approach should not have any race condition issues. And since it's done before the prefetch, it should pick the Queryset query. Also, since queries are only logged if DEBUG is enabled, this approach requires that too.
My discarded approach was to to get it somehow in the return of self._result_cache = list(self._iterable_class(self))
, or stored in the QuerySet.
However, it's much more complicated, because as far as I see the flow is something like
QuerySet when evaluating calls the IterableClass which then executes the query which then gets the connection cursor, which is wrapped in the previous mentioned one that can log the query.
So the a way would be to make all 4 SqlCompiler.execute_sql
return the result and the SQL query. Then you have to adapt all 6 Iterable.__iter__
to also return the query. Then you can store it in the Queryset. Since these methods are part of the public API, this would be a breaking change in both methods. Also I'm not 100% sure how to get the query in the wrapper since the Cursor.execute
is wrapped in a context manager which is the one logging the query.
comment:23 by , 8 weeks ago
Needs tests: | unset |
---|---|
Patch needs improvement: | unset |
comment:24 by , 5 days ago
Patch needs improvement: | set |
---|
Initial draft