Opened 16 years ago

Closed 16 years ago

Last modified 16 years ago

#9136 closed (fixed)

Oracle backend: slicing is using row_number() instead of rownum

Reported by: Guillaume Taglang <guillaume.taglang@…> Owned by: nobody
Component: Database layer (models, ORM) Version: 1.0
Severity: Keywords:
Cc: Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: no UI/UX: no

Description

Doing Entity.objects.all()[:10] is generating a query like:

SELECT *
FROM   (SELECT (ROW_NUMBER() OVER (ORDER BY "ENTITY"."ID")) AS "_RN",
               "ENTITY"."ID", "ENTITY"."NAME"
        FROM   "ENTITY")
WHERE  "_RN" > 0
AND    "_RN" <= 10

This is suboptimal as a full table scan is forced (at least in 10g).
Using rownum instead of row_number() is faster (it can use indexes), and doesn't require an order by clause when not required. The query would look like:

SELECT   "ENTITY"."ID", "ENTITY"."NAME"
FROM     "ENTITY"
WHERE    ROWNUM > 0 AND ROWNUM <= 10
ORDER BY "ENTITY"."ID"

It won't make a noticeable difference with tables with a couple of rows; but when you are reaching the million rows, you go from minutes to milliseconds.

Attachments (2)

rownum.patch (2.3 KB ) - added by Guillaume Taglang <guillaume.taglang@…> 16 years ago.
patch replacing row_number() with row
rownum.2.patch (1.6 KB ) - added by Guillaume Taglang <guillaume.taglang@…> 16 years ago.
Updated patch

Download all attachments as: .zip

Change History (10)

by Guillaume Taglang <guillaume.taglang@…>, 16 years ago

Attachment: rownum.patch added

patch replacing row_number() with row

comment:1 by Guillaume Taglang <guillaume.taglang@…>, 16 years ago

Has patch: set

comment:2 by Guillaume Taglang <guillaume.taglang@…>, 16 years ago

The patch is not extensively tested, but my basic tests work fine.

by Guillaume Taglang <guillaume.taglang@…>, 16 years ago

Attachment: rownum.2.patch added

Updated patch

comment:3 by Erin Kelly, 16 years ago

I tried to do some limited benchmarking on this on an Oracle XE installation and got conflicting results. The model I used was this:

class Test(models.Model):
    class Meta:
        ordering = ['id']

With 1,000,000 rows in the table, I tried taking slices of 100 rows from the beginning, middle, and end of the table. Using rownum, the time was consistently about 870 ms/query. Using row_number(), the time was about 20 ms/query from the beginning of the table, about the same as rownum from the middle, and about twice as long as rownum from the end. Removing the default ordering from the model slightly sped up rownum, but slightly slowed down row_number().

It's difficult to judge, but I guess that the most common scenario for slicing large table would be taking small slices from near the beginning of an ordered query. From that point of view, I'm in favor of keeping row_number() at this point, but I could be persuaded to change my mind with fresh evidence.

Note that I recently changed the overall structure of the query in [9221], which could have an impact on the results.

comment:4 by Guillaume Taglang, 16 years ago

The change to the query in [9221] fixes my main gripes about slicing with oracle.

My own test on a set of table with some real data do not show any significant difference between rownum and row_number() for the first few rows, however as the number of rows increase row_number() gets significantly slower.

If you look at the result of an explain plan you will easily understand why: row_number() is an analytic function and the database needs to apply the function first; rownum on the oder hand is built in the query and costs nothing. Depending on data and indexes the cost difference between rownum and row_number(), as calculated by the optimizer, can reach several order of magnitude. (At least for my release of Oracle: 10.2 Enterprise.)

(As a side note: 870ms per query for the table you mentioned is really bad. Are you sure you did execute the tests with warm and/or cold cache for both rownum and row_number() ?)

(Second side note: looks like the spam filter is a tad overzealous.)

comment:5 by Erin Kelly, 16 years ago

Yeah, that database is running on an old Pentium 3 desktop, and it's probably horrendously misconfigured to boot -- I'm not a DBA. I just tried running the same test on one of our real development databases, and in that scenario both methods run about 10k queries/second regardless of position, which probably just means that the network is now the bottleneck and I need to do a larger test.

comment:6 by Erin Kelly, 16 years ago

Scratch that, I messed up the second test and the queries weren't actually running (I knew 10k / second was too good to be true). I am seeing row_number() running significantly slower than rownum now.

comment:7 by Erin Kelly, 16 years ago

Resolution: fixed
Status: newclosed

(In [9235]) Fixed #9136: Do slicing in Oracle with rownum instead of row_number() for a speed improvement. Thanks, Guillaume Taglang.

comment:8 by Erin Kelly, 16 years ago

(In [9236]) [1.0.X] Fixed #9136: Do slicing in Oracle with rownum instead of
row_number() for a speed improvement. Thanks, Guillaume Taglang.

Backport of [9235] from trunk.

Note: See TracTickets for help on using tickets.
Back to Top