Code

Opened 6 years ago

Closed 6 years ago

Last modified 6 years ago

#9136 closed (fixed)

Oracle backend: slicing is using row_number() instead of rownum

Reported by: Guillaume Taglang <guillaume.taglang@…> Owned by: nobody
Component: Database layer (models, ORM) Version: 1.0
Severity: Keywords:
Cc: Triage Stage: Unreviewed
Has patch: yes Needs documentation: no
Needs tests: no Patch needs improvement: no
Easy pickings: UI/UX:

Description

Doing Entity.objects.all()[:10] is generating a query like:

SELECT *
FROM   (SELECT (ROW_NUMBER() OVER (ORDER BY "ENTITY"."ID")) AS "_RN",
               "ENTITY"."ID", "ENTITY"."NAME"
        FROM   "ENTITY")
WHERE  "_RN" > 0
AND    "_RN" <= 10

This is suboptimal as a full table scan is forced (at least in 10g).
Using rownum instead of row_number() is faster (it can use indexes), and doesn't require an order by clause when not required. The query would look like:

SELECT   "ENTITY"."ID", "ENTITY"."NAME"
FROM     "ENTITY"
WHERE    ROWNUM > 0 AND ROWNUM <= 10
ORDER BY "ENTITY"."ID"

It won't make a noticeable difference with tables with a couple of rows; but when you are reaching the million rows, you go from minutes to milliseconds.

Attachments (2)

rownum.patch (2.3 KB) - added by Guillaume Taglang <guillaume.taglang@…> 6 years ago.
patch replacing row_number() with row
rownum.2.patch (1.6 KB) - added by Guillaume Taglang <guillaume.taglang@…> 6 years ago.
Updated patch

Download all attachments as: .zip

Change History (10)

Changed 6 years ago by Guillaume Taglang <guillaume.taglang@…>

patch replacing row_number() with row

comment:1 Changed 6 years ago by Guillaume Taglang <guillaume.taglang@…>

  • Has patch set
  • Needs documentation unset
  • Needs tests unset
  • Patch needs improvement unset

comment:2 Changed 6 years ago by Guillaume Taglang <guillaume.taglang@…>

The patch is not extensively tested, but my basic tests work fine.

Changed 6 years ago by Guillaume Taglang <guillaume.taglang@…>

Updated patch

comment:3 Changed 6 years ago by ikelly

I tried to do some limited benchmarking on this on an Oracle XE installation and got conflicting results. The model I used was this:

class Test(models.Model):
    class Meta:
        ordering = ['id']

With 1,000,000 rows in the table, I tried taking slices of 100 rows from the beginning, middle, and end of the table. Using rownum, the time was consistently about 870 ms/query. Using row_number(), the time was about 20 ms/query from the beginning of the table, about the same as rownum from the middle, and about twice as long as rownum from the end. Removing the default ordering from the model slightly sped up rownum, but slightly slowed down row_number().

It's difficult to judge, but I guess that the most common scenario for slicing large table would be taking small slices from near the beginning of an ordered query. From that point of view, I'm in favor of keeping row_number() at this point, but I could be persuaded to change my mind with fresh evidence.

Note that I recently changed the overall structure of the query in [9221], which could have an impact on the results.

comment:4 Changed 6 years ago by gtaglang

The change to the query in [9221] fixes my main gripes about slicing with oracle.

My own test on a set of table with some real data do not show any significant difference between rownum and row_number() for the first few rows, however as the number of rows increase row_number() gets significantly slower.

If you look at the result of an explain plan you will easily understand why: row_number() is an analytic function and the database needs to apply the function first; rownum on the oder hand is built in the query and costs nothing. Depending on data and indexes the cost difference between rownum and row_number(), as calculated by the optimizer, can reach several order of magnitude. (At least for my release of Oracle: 10.2 Enterprise.)

(As a side note: 870ms per query for the table you mentioned is really bad. Are you sure you did execute the tests with warm and/or cold cache for both rownum and row_number() ?)

(Second side note: looks like the spam filter is a tad overzealous.)

comment:5 Changed 6 years ago by ikelly

Yeah, that database is running on an old Pentium 3 desktop, and it's probably horrendously misconfigured to boot -- I'm not a DBA. I just tried running the same test on one of our real development databases, and in that scenario both methods run about 10k queries/second regardless of position, which probably just means that the network is now the bottleneck and I need to do a larger test.

comment:6 Changed 6 years ago by ikelly

Scratch that, I messed up the second test and the queries weren't actually running (I knew 10k / second was too good to be true). I am seeing row_number() running significantly slower than rownum now.

comment:7 Changed 6 years ago by ikelly

  • Resolution set to fixed
  • Status changed from new to closed

(In [9235]) Fixed #9136: Do slicing in Oracle with rownum instead of row_number() for a speed improvement. Thanks, Guillaume Taglang.

comment:8 Changed 6 years ago by ikelly

(In [9236]) [1.0.X] Fixed #9136: Do slicing in Oracle with rownum instead of
row_number() for a speed improvement. Thanks, Guillaume Taglang.

Backport of [9235] from trunk.

Add Comment

Modify Ticket

Change Properties
<Author field>
Action
as closed
as The resolution will be set. Next status will be 'closed'
The resolution will be deleted. Next status will be 'new'
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.