Opened 2 years ago

Last modified 6 weeks ago

#20516 new New feature

Allow use of prepared statements

Reported by: est Owned by: nobody
Component: Database layer (models, ORM) Version: master
Severity: Normal Keywords: db, prepared statement
Cc: mpessas@…, aron45 Triage Stage: Accepted
Has patch: no Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Hello,

Since we have persistant connection in Django 1.6 now, can we consider adding prepared statements?

So we can create statements upon the first query, and use the prepared statement for all the rest and so on.

http://dev.mysql.com/doc/refman/5.0/en/sql-syntax-prepared-statements.html

This could be a huge speed up for heavy db IO applications. Reduce traffice size as well as RTT. Also improves security by parameterize queries.

Change History (14)

comment:1 Changed 2 years ago by akaariai

  • Keywords mysql removed
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset
  • Summary changed from Prepared statement for MySQL to Allow use of prepared statements
  • Triage Stage changed from Unreviewed to Accepted
  • Version changed from 1.6-alpha-1 to master

Yes, I think addition of prepared statements is something to consider. No idea of what would be a nice API, or if this could be somehow delegated to third party apps. The latter option means some hooks are needed somewhere in the query execution path. One such point is execute_sql() where it would be fairly straightforward to change the used SQL to prepared statements if the query is set to use prepared statements.

FWIW there are use cases where prepared statements can give easily over 50% speedup. Such cases are simple to execute queries which retrieve just a few rows of data, but which have a massive amount of joins (so that planning overhead is big). Using .select_related() and complicated .filter() clauses are easy ways to produce such queries.

I don't see any reason to restrict this to MySQL only.

(This feature seems to need at least these parts:

  • API for setting given queryset to use prepared statements - using them automatically is generally a bad idea
  • Convert the query's execution to use prepared statement instead of SQL. This consists of:
    • A check if the given query is already prepared for the used connection
    • If so, use the correct prepared statement
    • If not, create a new prepared statement and use that)

comment:2 Changed 21 months ago by mpessas

  • Cc mpessas@… added

comment:3 Changed 21 months ago by anonymous

with statements seem to be the correct way to handle this from an API standpoint.

comment:4 Changed 21 months ago by FunkyBob

Surely the security only improves if the DB driver is faulty? Django already passes parameterised queries to the DB driver to handle escaping [or pass through parameterised, as some do].

But in general, I'm happy to throw my brain behind this effort :)

comment:5 Changed 21 months ago by FunkyBob

One idea that struck me the other day-

A prepared statement could yield a function with keyword arguments matching its bound parameters.

Something like:

    stmt = MyModel.objects.filter(foo__bar=Param('able'), baz=Param('baker')).exclude(foo_quux=7)

Though perhaps an explicit "prepare()" would be better, the Param arguments mark where to bind parameters, and what to call them.

The, you can execute the query with:

for my in stmt(able=1, baker="wibble"):

Last edited 14 months ago by apollo13 (previous) (diff)

comment:6 Changed 18 months ago by david@…

This isn't just for MySQL. If possible, it would be really nice to have some facility for using server-side prepare statements not created by Django. https://github.com/dimitri/preprepare is an example of such a use case

comment:7 Changed 14 months ago by the_drow

PostgresSQL also supports prepare statements.

@FunkyBob: That syntax looks great but the PreparedQuerySet object should support a context manager.

comment:8 Changed 14 months ago by FunkyBob

Could you elaborate on how you see context manager working with this feature?

comment:9 Changed 14 months ago by the_drow

The way I see it a PreparedQuerySet should support the following:

  • Preparing on execution of a query for the entire session:
qs = list(MyModel.objects.filter(foo__bar=1).prepare())
qs2 = list(MyModel.objects.filter(foo__bar=2)) # Already prepared for this session (that is the connection)

The query parameters are inferred from the query itself.
This is very convenient and doesn't require anyone to learn the new syntax.

  • Preparing for a certain scope and deallocating:
with MyModel.objects.filter(foo__bar=Param('foobar')).prepare() as qs:
     results = qs(1)
     # Some logic
     results = qs(2)
     # Some logic

When the PreparedQuerySet enters the context the query is prepared. When it exists the prepared query is deallocated.
It's not always a good idea to prepare queries for a very long time for rapidly changing tables (small tables don't need to use indexes for example).

  • Preparing a query for usage in a Manager or QuerySet:
class MyQuerySet(QuerySet):
  foobars = MyModel.objects.filter(foo__bar=Param('foobar')).prepare()

This will turn foobars into a callable with an argument named foobar.
The query will be prepared once the first query using it will be executed.
You can prepare ahead of time:

class MyQuerySet(QuerySet):
  foobars = MyModel.objects.filter(foo__bar=Param('foobar')).prepare(ahead=True) # I need a better name for the argument here

You can also prepare a query that will be deallocated after a TTL:

class MyQuerySet(QuerySet):
  foobars = MyModel.objects.filter(foo__bar=Param('foobar')).prepare(ttl=60000)

Explicitly deallocating a prepared query should also be possible:

qs = MyModel.objects.filter(foo__bar=1).prepare()
results = list(qs)
qs2 = list(MyModel.objects.filter(foo__bar=2)) # Already prepared for this session (that is the connection)
qs.deallocate()
results = list(qs) # the query is prepared again

The query will be prepared again once the first query is executed.

What do you guys think?

comment:10 Changed 5 months ago by thedrow

I investigated about implementing it today and it turns out that the Query class has a method called sql_with_params() which returns the SQL query without concatenating the parameters.

MySQL uses the following syntax:

PREPARE usrrptplan FROM "SELECT * FROM users u, logs l WHERE u.usrid=? AND u.usrid=l.usrid
    AND l.date = ?;"
EXECUTE usrrptplan USING 1, NOW()

PostgreSQL uses the following syntax:

PREPARE usrrptplan (int) AS
    SELECT * FROM users u, logs l WHERE u.usrid=$1 AND u.usrid=l.usrid
    AND l.date = $2;
EXECUTE usrrptplan(1, NOW());

It seems that when you call prepare() you need to switch the query entirely with the EXECUTE syntax.
Moreover, at least in PostgreSQL you have to check if the statement is already prepared and there's no SQL syntax to do that.
You need to query the pg_prepared_statements table in order to verify that it doesn't exist.
Since a prepared statement requires a name we can either randomize one or have one provided for us.
I think that providing both options is a good idea.

comment:11 Changed 5 months ago by thedrow

As apollo13 mentioned on IRC, prepared statements are at the session level on PostgresSQL so we can just store them on the connection object.

comment:12 Changed 5 months ago by thedrow

akaariai says we can just subclass the Query object and replace the parameters with the prepared statements placeholders.

comment:13 Changed 5 months ago by shaib

On Oracle, the concept of prepared statements is linked to a connection-level statement cache. Prepared statements are not handled with different SQL, but with different API calls -- specifically, cx_Oracle provides a prepare method on its cursors. This method is not currently used by Django's Oracle backend.

comment:14 Changed 6 weeks ago by aron45

  • Cc aron45 added
Note: See TracTickets for help on using tickets.
Back to Top